Re: Updating Broadcast Variable in Spark Streaming 2.4.4

2022-09-28 Thread Sean Owen
…i...@ricobergmann.de> wrote: Hi folks! I'm trying to implement an update of a broadcast var in Spark Streaming. The idea is that whenever some configuration value has changed (this is periodically checked by the driver) the existing broadcast variable is unpersisted…

Updating Broadcast Variable in Spark Streaming 2.4.4

2022-09-28 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast var in Spark Streaming. The idea is that whenever some configuration value has changed (this is periodically checked by the driver) the existing broadcast variable is unpersisted and then (re-)broadcasted. In a local test setup

Re: Updating Broadcast Variable in Spark Streaming 2.4.4

2022-07-22 Thread Sean Owen
…checked by the driver) the existing broadcast variable is unpersisted and then (re-)broadcasted. In a local test setup (using a local Spark) it works fine, but on a real cluster it doesn't work. The broadcast variable never gets updated. Am I doing something wrong? Or is…

Updating Broadcast Variable in Spark Streaming 2.4.4

2022-07-22 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast var in Spark Streaming. The idea is that whenever some configuration value has changed (this is periodically checked by the driver) the existing broadcast variable is unpersisted and then (re-)broadcasted. In a local test setup
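The pattern discussed in this thread can be sketched as a driver-side wrapper that unpersists the old broadcast and creates a fresh one when the configuration changes. This is an illustrative sketch, not code from the thread; `RefreshableBroadcast` and `load` are hypothetical names. The key point, which also explains the "never gets updated on a real cluster" symptom, is that executors only see a new value if the *current* Broadcast handle is read when each batch's tasks are serialized; a handle captured once in a closure stays stale forever:

```scala
import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Hypothetical wrapper: the driver checks for config changes between
// micro-batches and swaps in a fresh Broadcast.
class RefreshableBroadcast[T: ClassTag](sc: SparkContext, load: () => T) {
  @volatile private var bc: Broadcast[T] = sc.broadcast(load())

  // Read this on the driver in each batch, so closures capture the latest handle.
  def current: Broadcast[T] = bc

  // Call on the driver only, between batches.
  def refresh(): Unit = {
    val old = bc
    bc = sc.broadcast(load())       // re-broadcast the new value first
    old.unpersist(blocking = false) // then drop the stale executor copies
  }
}
```

Inside `foreachRDD` (which runs on the driver) call `wrapper.current` and use its `.value` in the closure, so every micro-batch serializes the latest handle.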

Re: Broadcast Variable

2021-05-03 Thread Sean Owen
There is just one copy in memory. No different than if you have two variables pointing to the same dict. On Mon, May 3, 2021 at 7:54 AM Bode, Meikel, NMA-CFD <meikel.b...@bertelsmann.de> wrote: Hi all, when broadcasting a large dict containing several million entries to executors…

Broadcast Variable

2021-05-03 Thread Bode, Meikel, NMA-CFD
Hi all, when broadcasting a large dict containing several million entries to executors, what exactly happens when calling bc_var.value within a UDF like: .. d = bc_var.value .. Does d receive a copy of the dict inside value, or is this handled like a pointer? Thanks, Meikel
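As the reply above says, `broadcast.value` hands back a reference to the single deserialized object held by the executor, not a fresh copy per call. A minimal sketch (assuming an existing SparkContext `sc`):

```scala
val lookup = Map("a" -> 1, "b" -> 2) // imagine several million entries
val bc = sc.broadcast(lookup)

val rdd = sc.parallelize(Seq("a", "b", "c"))
rdd.map { key =>
  val d = bc.value    // a reference to the one deserialized Map on this executor
  d.getOrElse(key, 0) // concurrent tasks on the executor see the same object
}.collect()
```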

Broadcast Variable question

2020-10-04 Thread Eduardo
Is there any concurrent access to broadcast variables in the workers? My use case is that the broadcast variable is a large dataset object which is *only* read. However, there is some cache in this object which is written (so it speeds up generation of entries in this dataset). Everything
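Because all tasks on an executor share that single deserialized object and may run concurrently, any write path inside it (such as the memo cache described above) needs to be thread-safe. A hypothetical sketch using a ConcurrentHashMap for the cache:

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical dataset type: read-only entries plus an internal memo cache.
class CachedDataset extends Serializable {
  // @transient: the cache is rebuilt empty on each executor rather than shipped.
  // ConcurrentHashMap: several tasks on one executor may write to it at once.
  @transient private lazy val cache = new ConcurrentHashMap[Long, String]()

  def entry(id: Long): String =
    cache.computeIfAbsent(id, _ => generate(id)) // computed at most once per key

  private def generate(id: Long): String = s"entry-$id" // stand-in for the slow path
}
```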

Re: Spark stuck at removing broadcast variable

2020-04-18 Thread Waleed Fateem
This might be obvious but just checking anyways, did you confirm whether or not all of the messages have already been consumed by Spark? If that's the case then I wouldn't expect much to happen unless new data comes into your Kafka topic. If you're a hundred percent sure that there's still plenty

Re: Spark stuck at removing broadcast variable

2020-04-18 Thread Sean Owen
I don't think that means it's stuck on removing something; it was removed. Not sure what it is waiting on - more data perhaps? On Sat, Apr 18, 2020 at 2:22 PM Alchemist wrote: I am running a simple Spark structured streaming application that is pulling data from a Kafka Topic. I have a…

Spark stuck at removing broadcast variable

2020-04-18 Thread Alchemist
I am running a simple Spark structured streaming application that is pulling data from a Kafka Topic. I have a Kafka Topic with nearly 1000 partitions. I am running this app on 6 node EMR cluster with 4 cores and 16GB RAM. I observed that Spark is trying to pull data from all 1024 Kafka
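Separate from the "removing broadcast variable" log line, with 1000+ topic partitions each micro-batch reads from every partition, and in Structured Streaming the per-trigger volume can be capped. A hedged configuration sketch (broker and topic names are made up, and an existing `spark` session is assumed):

```scala
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
  .option("subscribe", "events")                     // hypothetical topic
  .option("maxOffsetsPerTrigger", 100000L)           // cap total records per micro-batch
  .load()
```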

Mutating broadcast variable from executors, any risks even if done in a thread-safe manner?

2019-03-12 Thread Jan Brabec (janbrabe)
Hello, I have quite specific usecase. I want to use an MXNet neural-net model in a distributed fashion to get predictions on a very large dataset. It is not possible to broadcast the model directly because the underlying implementation is not serializable. Instead the model has to be loaded
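The usual pattern for a non-serializable model is to broadcast only something small and serializable (for example the model path) and load the model once per executor JVM in a singleton. A sketch with a hypothetical `MxModel` type and path:

```scala
// Hypothetical holder: loads the non-serializable model once per executor JVM.
object ModelHolder {
  @volatile private var model: MxModel = _
  def get(path: String): MxModel = {
    if (model == null) synchronized {
      if (model == null) model = MxModel.loadFrom(path) // runs once per JVM
    }
    model
  }
}

val pathBc = sc.broadcast("/models/net.params") // hypothetical path; small & serializable

rdd.mapPartitions { rows =>
  val m = ModelHolder.get(pathBc.value) // all tasks on this executor reuse one model
  rows.map(m.predict)                   // thread-safety of predict() must be verified
}
```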

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-16 Thread Dillon Dukek
You keep mentioning that you're viewing this after the fact in the spark history server. Also the spark-shell isn't a UI so I'm not sure what you mean by saying that the storage tab is blank in the spark-shell. Just so I'm clear about what you're doing, are you looking at this info while your

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-16 Thread Venkat Dabri
The same problem is mentioned here: https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html https://stackoverflow.com/questions/44792213/blank-storage-tab-in-spark-history-server On Tue, Oct 16, 2018 at 8:06 AM Venkat Dabri wrote: I did try that…

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-16 Thread Venkat Dabri
I did try that mechanism before but the data never shows up in the storage tab. The storage tab is always blank. I have tried it in Zeppelin as well as spark-shell. scala> val classCount = spark.read.parquet("s3:// /classCount") scala> classCount.persist scala> classCount.count Nothing shows

Re: Spark seems to think that a particular broadcast variable is large in size

2018-10-15 Thread Dillon Dukek
In your program persist the smaller table and use count to force it to materialize. Then in the Spark UI go to the Storage tab. The size of your table as spark sees it should be displayed there. Out of curiosity what version / language of Spark are you using? On Mon, Oct 15, 2018 at 11:53 AM

Spark seems to think that a particular broadcast variable is large in size

2018-10-15 Thread Venkat Dabri
I am trying to do a broadcast join on two tables. The size of the smaller table will vary based upon the parameters but the size of the larger table is close to 2TB. What I have noticed is that if I don't set the spark.sql.autoBroadcastJoinThreshold to 10G some of these operations do a
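The suggestion from the reply in this thread can be sketched as follows (the path is made up; an existing `spark` session is assumed):

```scala
val small = spark.read.parquet("s3://bucket/small-table") // hypothetical path
small.persist()
small.count() // force materialization so the live UI's Storage tab shows its size
```

As the rest of the thread observes, the Storage tab is populated by the live UI; the history server commonly shows it blank after the application ends.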

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers

2018-10-09 Thread zakhavan
…, line 87, in _py2java File "/home/zeinab/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 555, in dumps File "/home/zeinab/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 315, in __getnewargs__ Exception: It appears that…
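The exception in this thread (raised here from PySpark) occurs when a closure captures the SparkContext, directly or via an enclosing object; only the driver may use it. The fix is to broadcast the needed data and capture just the Broadcast handle. An illustrative Scala sketch of the wrong and right shapes:

```scala
val table = Map("k" -> 1)
val rdd = sc.parallelize(Seq("k", "x"))

// Wrong: the closure would capture sc, which exists only on the driver,
// so task serialization fails with the exception quoted above.
// rdd.map { x => sc.broadcast(table).value.getOrElse(x, 0) }

// Right: broadcast on the driver; the closure captures only the handle.
val bc = sc.broadcast(table)
rdd.map { x => bc.value.getOrElse(x, 0) }
```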

Re: Any way to see the size of the broadcast variable?

2018-10-09 Thread V0lleyBallJunki3
Yes each of the executors have 60GB -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Any way to see the size of the broadcast variable?

2018-10-09 Thread Gourav Sengupta
Hi Venkat, do your executors have that much memory? Regards, Gourav Sengupta On Tue, Oct 9, 2018 at 4:44 PM V0lleyBallJunki3 wrote: Hello, I have set the value of spark.sql.autoBroadcastJoinThreshold to a very high value of 20 GB. I am joining a table that I am sure is below…

Any way to see the size of the broadcast variable?

2018-10-09 Thread V0lleyBallJunki3
Hello, I have set the value of spark.sql.autoBroadcastJoinThreshold to a very high value of 20 GB. I am joining a table that I am sure is below this variable, however spark is doing a SortMergeJoin. If I set a broadcast hint then spark does a broadcast join and job finishes much faster.

Refresh broadcast variable when it isn't the value.

2018-08-19 Thread Guillermo Ortiz Fernández
…the broadcast variable for the next micro-batch and finally reset the accumulator to 0. I don't know if there is a better solution or other ideas to do this. Has anyone faced this problem?

Re: Broadcast variable size limit?

2018-08-05 Thread Vadim Semenov
That’s the max size of a byte array in Java, limited by the length, which is defined as an integer; in most JVMs arrays can’t hold more than Int.MaxValue - 8 elements. Another way to overcome this is to create multiple broadcast variables. On Sunday, August 5, 2018, klrmowse wrote: I don't need…

Re: Broadcast variable size limit?

2018-08-05 Thread klrmowse
I don't need more, per se... I just need to watch the size of the variable; then, if it's within the size limit, go ahead and broadcast it; if not, then I won't broadcast... so, that would be a yes then? (2 GB, or which is it exactly?) -- Sent from:

Re: Broadcast variable size limit?

2018-08-05 Thread Jörn Franke
I think if you need more than that, you should think about something different than a broadcast variable... On 5. Aug 2018, at 16:51, klrmowse wrote: is it currently still ~2GB (Integer.MAX_VALUE)?? or am I misinformed, since that's what google-search and scouring…

Broadcast variable size limit?

2018-08-05 Thread klrmowse
is it currently still ~2GB (Integer.MAX_VALUE)?? Or am I misinformed, since that's what google-search and scouring this mailing list seem to say...? Thanks -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
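The workaround suggested in this thread, splitting data that approaches the ~2 GB single-broadcast ceiling across several broadcasts, can be sketched like this (`loadHugeArray` is a hypothetical loader):

```scala
val huge: Array[Byte] = loadHugeArray()  // hypothetical, near the 2 GB limit
val shardSize = 512 * 1024 * 1024        // 512 MB per shard, well under Int.MaxValue
val shards = huge.grouped(shardSize).map(sc.broadcast(_)).toIndexedSeq

val indexes = sc.parallelize(0L until huge.length.toLong)
indexes.map { i =>
  val shard = shards((i / shardSize).toInt) // tasks fetch only the shards they touch
  shard.value((i % shardSize).toInt)
}
```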

[PySpark] - Broadcast Variable Pickle Registry Usage?

2017-05-24 Thread Michael Mansour (CS)
Hi all, I’m poking around the Pyspark.Broadcast module, and I notice that one can pass in a `pickle_registry` and a `path`. The documentation does not outline the pickle registry use and I’m curious about how to use it, and if there are any advantages to it. Thanks, Michael Mansour

Re: [Spark Streaming] Dynamic Broadcast Variable Update

2017-05-05 Thread Pierce Lamb
…broadcast variables without requiring a downtime of the streaming service. The key to this implementation is a live re-broadcastVariable() interface, which can be triggered in between micro-batch executions, without any re-boot required for the streaming application…

Re: [Spark Streaming] Dynamic Broadcast Variable Update

2017-05-04 Thread Gene Pang
…implementation is a live re-broadcastVariable() interface, which can be triggered in between micro-batch executions, without any re-boot required for the streaming application. At a high level the task is done by re-fetching broadcast variable information from the spark driver, and then re-distributing…

Re: [Spark Streaming] Dynamic Broadcast Variable Update

2017-05-02 Thread Tim Smith
…our goal was to re-broadcast variables without requiring a downtime of the streaming service. The key to this implementation is a live re-broadcastVariable() interface, which can be triggered in between micro-batch executions, without any re-boot required for the streaming application…

[Spark Streaming] Dynamic Broadcast Variable Update

2017-05-02 Thread Nipun Arora
…this implementation is a live re-broadcastVariable() interface, which can be triggered in between micro-batch executions, without any re-boot required for the streaming application. At a high level the task is done by re-fetching broadcast variable information from the spark driver, and then re-distributing…

Access broadcast variable from within function passed to reduceByKey

2016-11-15 Thread coolgar
…=> val valueSum = aggLog1(f) + aggLog2(f); aggs ++ Map(f -> valueSum) } aggs } I can't pass the broadcast to the reduceKeyMapFunction. I create the broadcast variable (broadcastv) in the driver, but I fear it will not be initialized on the workers where the reduceKeyMapFunction…

Re: How do I convert a data frame to broadcast variable?

2016-11-04 Thread Jain, Nishit
…denny.g...@gmail.com, user@spark.apache.org Subject: Re: How do I convert a data frame to broadcast variable? Hi Nishit, Yes the JDBC connector…

Re: How do I convert a data frame to broadcast variable?

2016-11-03 Thread Silvio Fiorito
…:48 PM To: Denny Lee; user@spark.apache.org Subject: Re: How do I convert a data frame to broadcast variable? Thanks Denny! That does help. I will give that a shot. Question: If I am going this route, I am wondering how I can read only a few columns of a table (not the whole table) from JDBC as a data frame…

Re: How do I convert a data frame to broadcast variable?

2016-11-03 Thread Jain, Nishit
…user@spark.apache.org Subject: Re: How do I convert a data frame to broadcast variable? If you're able to read the data in as a DataFrame, perhaps you can use a BroadcastHashJoin, so that way you can join to that table, presuming it's small enough to distribute…

Re: How do I convert a data frame to broadcast variable?

2016-11-03 Thread Denny Lee
…%20SQL,%20DataFrames%20%26%20Datasets/05%20BroadcastHashJoin%20-%20scala.html HTH! On Thu, Nov 3, 2016 at 8:53 AM Jain, Nishit <nja...@underarmour.com> wrote: I have a lookup table in HANA database. I want to create a spark broadcast variable for it. What would be the suggested…

How do I convert a data frame to broadcast variable?

2016-11-03 Thread Jain, Nishit
I have a lookup table in a HANA database. I want to create a Spark broadcast variable for it. What would be the suggested approach? Should I read it as a data frame and convert the data frame into a broadcast variable? Thanks, Nishit
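The route discussed in this thread can be sketched as follows. This is a hedged sketch: the JDBC URL and table are made up, and an existing `spark` session is assumed; the subquery form of `dbtable` limits the read to the needed columns:

```scala
val lookupDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:sap://hana-host:30015")            // hypothetical HANA URL
  .option("dbtable", "(SELECT key, value FROM lookup) t") // only two columns
  .load()

// Collect the small table to the driver and broadcast it as a plain Map.
val lookupMap = lookupDf.collect()
  .map(r => r.getString(0) -> r.getString(1)).toMap
val bc = spark.sparkContext.broadcast(lookupMap)
```

The alternative from the thread is to skip the collect entirely: keep it a DataFrame and let a BroadcastHashJoin distribute it.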

BiMap BroadCast Variable - Kryo Serialization Issue

2016-11-02 Thread Kalpana Jalawadi
Hi, I am getting a NullPointerException due to a Kryo serialization issue while trying to read a BiMap broadcast variable. Attached are the code snippets. Pointers shared here didn't help - link1 <http://stackoverflow.com/questions/33156095/spark-serialization-issue-with-hashmap>, link2

Re: Spark 2.0.0 - Broadcast variable - What is ClassTag?

2016-08-08 Thread Holden Karau
If you're using this from Java you might find it easier to construct a JavaSparkContext; the broadcast function will automatically use a fake class tag. On Sun, Aug 7, 2016 at 11:57 PM, Aseem Bansal wrote: I am using the following to broadcast and it explicitly requires…

Re: Spark 2.0.0 - Broadcast variable - What is ClassTag?

2016-08-08 Thread Aseem Bansal
I am using the following to broadcast, and it explicitly requires a ClassTag: sparkSession.sparkContext().broadcast On Mon, Aug 8, 2016 at 12:09 PM, Holden Karau wrote: Classtag is a Scala concept (see http://docs.scala-lang.…

Re: Spark 2.0.0 - Broadcast variable - What is ClassTag?

2016-08-08 Thread Holden Karau
ClassTag is a Scala concept (see http://docs.scala-lang.org/overviews/reflection/typetags-manifests.html) - although this should not be explicitly required - looking at http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext we can see that in Scala the ClassTag is…

Spark 2.0.0 - Broadcast variable - What is ClassTag?

2016-08-08 Thread Aseem Bansal
Earlier for broadcasting we just needed to use sparkcontext.broadcast(objectToBroadcast) But now it is sparkcontext.broadcast(objectToBroadcast, classTag) What is classTag here?
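Putting the replies in this thread together: the ClassTag is supplied implicitly by the Scala compiler, and the Java-facing API does not need one at all. A small sketch (assuming an existing SparkContext `sc`):

```scala
import scala.reflect.ClassTag
import org.apache.spark.api.java.JavaSparkContext

// Scala: the compiler supplies the ClassTag implicitly.
val bc1 = sc.broadcast(Map("a" -> 1))

// The same call with the implicit argument spelled out.
val bc2 = sc.broadcast(Array(1, 2, 3))(ClassTag(classOf[Array[Int]]))

// Java-facing API: JavaSparkContext.broadcast needs no ClassTag
// (it uses a fake class tag internally, as noted in the thread).
val jsc = JavaSparkContext.fromSparkContext(sc)
val bc3 = jsc.broadcast(new java.util.HashMap[String, Integer]())
```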

Re: broadcast variable not picked up

2016-05-16 Thread Davies Liu
…'broadcast_var' is not defined. Any ideas on how to fix it? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/broadcast-variable-not-picked-up…

broadcast variable not picked up

2016-05-13 Thread abi
Any ideas on how to fix it ? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/broadcast-variable-not-picked-up-tp26955.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Getting the size of a broadcast variable

2016-02-02 Thread Ted Yu
…driver; 16/02/02 14:51:53 INFO BlockManagerInfo: Added rdd_2_12 in memory on localhost:58536 (size: 40.0 B, free: 510.7 MB) On Tue, Feb 2, 2016 at 8:20 AM, apu mishra . rr <apumishra...@gmail.com> wrote: How can I determine the size (in bytes)…

Getting the size of a broadcast variable

2016-02-01 Thread apu mishra . rr
How can I determine the size (in bytes) of a broadcast variable? Do I need to use the .dump method and then look at the size of the result, or is there an easier way? Using PySpark with Spark 1.6. Thanks! Apu

Re: Getting the size of a broadcast variable

2016-02-01 Thread Takeshi Yamamuro
wrote: How can I determine the size (in bytes) of a broadcast variable? Do I need to use the .dump method and then look at the size of the result, or is there an easier way? Using PySpark with Spark 1.6. Thanks! Apu -- --- Takeshi Yamamuro
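One driver-side way to answer this question (a sketch in Scala, not from the thread itself) is Spark's SizeEstimator utility:

```scala
import org.apache.spark.util.SizeEstimator

val lookup = (1 to 1000000).map(i => i -> i.toString).toMap
// Estimated in-memory footprint of the object, in bytes. Note this is an
// estimate of the deserialized size, not the serialized size that is shipped.
val bytes = SizeEstimator.estimate(lookup)
println(s"~${bytes / (1024 * 1024)} MB")
val bc = sc.broadcast(lookup)
```

The BlockManagerInfo INFO log lines quoted in the reply above report the size of the stored block itself.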

Re: Millions of entities in custom Hadoop InputFormat and broadcast variable

2015-11-27 Thread Jeff Zhang
…RecordReader will load the full entity (a plain Java Bean) from my data source for each ID in the specific split. My first observation is some Spark tasks were timed out, and it looks like the Spark broadcast variable is being used to distribute my splits; is that correct? If so, from…

Millions of entities in custom Hadoop InputFormat and broadcast variable

2015-11-26 Thread Anfernee Xu
…in the specific split. My first observation is some Spark tasks were timed out, and it looks like the Spark broadcast variable is being used to distribute my splits; is that correct? If so, from a performance perspective, what enhancement can I make to make it better? Thanks -- --Anfernee

How to create broadcast variable from Java String array?

2015-09-12 Thread unk1102
…creating a DataFrame: DataFrame df = sourceFrame.toDF(fieldNames); I want to do the above using a broadcast variable so that we don't ship a huge string array to executors. I believe we can do something like the following to create the broadcast: String[] brArray = sc.broadcast(fieldNames); DataFrame df = sourceFrame…

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2015-09-10 Thread swetha
Hi, How is the ContextCleaner different from spark.cleaner.ttl? Is spark.cleaner.ttl needed when there is a ContextCleaner in the Streaming job? Thanks, Swetha -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/broadcast-variable-get-cleaned-by-ContextCleaner…

Re: Access a Broadcast variable causes Spark to launch a second context

2015-09-07 Thread sstraub
…reproducible in my environment and the application runs fine as soon as I don't access this specific Map from this specific RDD. Any idea what might cause this problem? Can I provide you with any other information (besides posting >500 lines of code)? cheers…

Access a Broadcast variable causes Spark to launch a second context

2015-09-07 Thread sstraub
…with any other information (besides posting >500 lines of code)? cheers Sebastian -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Access-a-Broadcast-variable-causes-Spark-to-launch-a-second-context-tp24595.html Sent from the Apache Spark User List mailing…

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-19 Thread Marcin Kuthan
…Thanks for all your comments. On Tue, Aug 18, 2015 at 4:28 PM, Tathagata Das t...@databricks.com wrote: Why are you even trying to broadcast a producer? A broadcast variable is some immutable piece of serializable DATA that can be used for processing on the executors. A Kafka producer…

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-19 Thread Shenghua(Daniel) Wan
are you even trying to broadcast a producer? A broadcast variable is some immutable piece of serializable DATA that can be used for processing on the executors. A Kafka producer is neither DATA nor immutable, and definitely not serializable. The right way to do this is to create the producer

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Shenghua(Daniel) Wan
: Why are you even trying to broadcast a producer? A broadcast variable is some immutable piece of serializable DATA that can be used for processing on the executors. A Kafka producer is neither DATA nor immutable, and definitely not serializable. The right way to do this is to create

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Tathagata Das
Why are you even trying to broadcast a producer? A broadcast variable is some immutable piece of serializable DATA that can be used for processing on the executors. A Kafka producer is neither DATA nor immutable, and definitely not serializable. The right way to do this is to create the producer

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Marcin Kuthan
: Why are you even trying to broadcast a producer? A broadcast variable is some immutable piece of serializable DATA that can be used for processing on the executors. A Kafka producer is neither DATA nor immutable, and definitely not serializable. The right way to do this is to create

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Tathagata Das
to broadcast a producer? A broadcast variable is some immutable piece of serializable DATA that can be used for processing on the executors. A Kafka producer is neither DATA nor immutable, and definitely not serializable. The right way to do this is to create the producer in the executors. Please

broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Shenghua(Daniel) Wan
Hi, Did anyone see java.util.ConcurrentModificationException when using broadcast variables? I encountered this exception when wrapping a Kafka producer like this in the spark streaming driver. Here is what I did. KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Cody Koeninger
I wouldn't expect a kafka producer to be serializable at all... among other things, it has a background thread On Tue, Aug 18, 2015 at 4:55 PM, Shenghua(Daniel) Wan wansheng...@gmail.com wrote: Hi, Did anyone see java.util.ConcurrentModificationException when using broadcast variables? I
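The advice in this thread (create the producer in the executors, never broadcast it) is commonly done with a lazy per-JVM singleton. A sketch, assuming `stream` is some existing DStream[String]; the broker and topic names are made up:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Per-JVM lazy singleton: the producer is created on each executor the first
// time a task needs it, and is never serialized or broadcast.
object ProducerHolder {
  lazy val producer: KafkaProducer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }
}

stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    records.foreach(r => ProducerHolder.producer.send(new ProducerRecord("out", r)))
  }
}
```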

broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Shushant Arora
Hi I am using spark streaming 1.3 and using checkpointing. But job is failing to recover from checkpoint on restart. For broadcast variable it says : 1.WARN TaskSetManager: Lost task 15.0 in stage 7.0 (TID 1269, hostIP): java.lang.ClassCastException: [B cannot be cast

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
…For broadcast variable it says: 1. WARN TaskSetManager: Lost task 15.0 in stage 7.0 (TID 1269, hostIP): java.lang.ClassCastException: [B cannot be cast to pkg.broadcastvariableclassname at the point where I call bcvariable.value() in the map function. at mapfunction…

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Shushant Arora
…rdd.map { x => /// use accum } } On Wed, Jul 29, 2015 at 1:15 PM, Shushant Arora shushantaror...@gmail.com wrote: Hi I am using spark streaming 1.3 and using checkpointing. But the job is failing to recover from checkpoint on restart. For broadcast variable it says: 1. WARN TaskSetManager: Lost…

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
: Hi I am using spark streaming 1.3 and using checkpointing. But job is failing to recover from checkpoint on restart. For broadcast variable it says : 1.WARN TaskSetManager: Lost task 15.0 in stage 7.0 (TID 1269, hostIP): java.lang.ClassCastException: [B cannot be cast
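The standard workaround for this thread's problem is that broadcast variables (and accumulators) are not restored from a checkpoint, so they must be recreated lazily in a singleton, as in Spark's RecoverableNetworkWordCount example. A sketch, assuming `stream` is an existing DStream[String]:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Lazily (re)created singleton: the broadcast is rebuilt after checkpoint
// recovery instead of being (incorrectly) deserialized from the checkpoint.
object BlacklistHolder {
  @volatile private var instance: Broadcast[Seq[String]] = _
  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) synchronized {
      if (instance == null) instance = sc.broadcast(Seq("a", "b", "c"))
    }
    instance
  }
}

stream.foreachRDD { rdd =>
  val blacklist = BlacklistHolder.getInstance(rdd.sparkContext)
  rdd.filter(w => !blacklist.value.contains(w)).count()
}
```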

Re: broadcast variable question

2015-07-28 Thread Jonathan Coveney
…wrote: I am running in coarse grained mode, let's say with 8 cores per executor. If I use a broadcast variable, will all of the tasks in that executor share the same value? Or will each task broadcast its own value, i.e. in this case, would there be one value in the executor shared by the 8 tasks…

broadcast variable question

2015-07-28 Thread Jonathan Coveney
I am running in coarse grained mode, let's say with 8 cores per executor. If I use a broadcast variable, will all of the tasks in that executor share the same value? Or will each task broadcast its own value, i.e. in this case, would there be one value in the executor shared by the 8 tasks, or would…

Re: broadcast variable question

2015-07-28 Thread Ted Yu
If I understand correctly, there would be one value in the executor. Cheers On Tue, Jul 28, 2015 at 4:23 PM, Jonathan Coveney jcove...@gmail.com wrote: i am running in coarse grained mode, let's say with 8 cores per executor. If I use a broadcast variable, will all of the tasks

Re: Passing Broadcast variable as parameter

2015-07-18 Thread Gylfi
Hi. You can use a broadcast variable to make data available to all the nodes in your cluster, and it can live longer than just the current distributed task. For example, if you need to access a large structure in multiple sub-tasks, instead of sending that structure again and again with each sub-task…

Re: [Spark Streaming] Null Pointer Exception when accessing broadcast variable to store a hashmap in Java

2015-06-23 Thread Benjamin Fradet
…some operation based on the checkup. The following is a snippet of how I access the broadcast variable. JavaDStream<Tuple3<Long,Double,String>> split = matched.map(new GenerateType2Scores()); class GenerateType2Scores implements Function<String, Tuple3<Long, Double, String>> { @Override…

Re: [Spark Streaming] Null Pointer Exception when accessing broadcast variable to store a hashmap in Java

2015-06-23 Thread Nipun Arora
btw. just for reference I have added the code in a gist: https://gist.github.com/nipunarora/ed987e45028250248edc and a stackoverflow reference here: http://stackoverflow.com/questions/31006490/broadcast-variable-null-pointer-exception-in-spark-streaming On Tue, Jun 23, 2015 at 11:01 AM, Nipun

Re: [Spark Streaming] Null Pointer Exception when accessing broadcast variable to store a hashmap in Java

2015-06-23 Thread Nipun Arora
…on the checkup. The following is a snippet of how I access the broadcast variable. JavaDStream<Tuple3<Long,Double,String>> split = matched.map(new GenerateType2Scores()); class GenerateType2Scores implements Function<String, Tuple3<Long, Double, String>> { @Override public Tuple3<Long…

[Spark Streaming] Null Pointer Exception when accessing broadcast variable to store a hashmap in Java

2015-06-23 Thread Nipun Arora
…().broadcast(hm); I need to access this model in my mapper phase, and do some operation based on the checkup. The following is a snippet of how I access the broadcast variable. JavaDStream<Tuple3<Long,Double,String>> split = matched.map(new GenerateType2Scores()); class GenerateType2Scores implements…

Re: [Spark Streaming] Null Pointer Exception when accessing broadcast variable to store a hashmap in Java

2015-06-23 Thread Nipun Arora
: https://gist.github.com/nipunarora/ed987e45028250248edc and a stackoverflow reference here: http://stackoverflow.com/questions/31006490/broadcast-variable-null-pointer-exception-in-spark-streaming On Tue, Jun 23, 2015 at 11:01 AM, Nipun Arora nipunarora2...@gmail.com wrote: Hi, I

Re: [Spark Streaming] Null Pointer Exception when accessing broadcast variable to store a hashmap in Java

2015-06-23 Thread Tathagata Das
/questions/31006490/broadcast-variable-null-pointer-exception-in-spark-streaming On Tue, Jun 23, 2015 at 11:01 AM, Nipun Arora nipunarora2...@gmail.com wrote: Hi, I have a spark streaming application where I need to access a model saved in a HashMap. I have *no problems in running the same

Re: How Broadcast variable works

2015-05-30 Thread bit1...@163.com
Can someone help take a look at my questions? Thanks. bit1...@163.com From: bit1...@163.com Date: 2015-05-29 18:57 To: user Subject: How Broadcast variable works Hi, I have a spark streaming application. SparkContext uses broadcast variables to broadcast configuration information that each…

Re: How Broadcast variable works

2015-05-30 Thread ayan guha
1. No, that's the purpose of a broadcast variable, i.e. not to be shipped with every task. From the documentation on Broadcast Variables: Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. They can be used, for example…

NullPointerException when accessing broadcast variable in DStream

2015-05-18 Thread hotienvu
I'm running spark 1.3.1 in standalone mode with 2 nodes cluster. I tried with spark-shell and it works fine. Please help! Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-accessing-broadcast-variable-in-DStream-tp22934.html

Re: How Broadcast variable scale?.

2015-02-23 Thread Guillermo Ortiz
…/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf I don't know if they're talking about the current version (1.2.1) because the file was created in 2010. I took a look at the documentation and API and I read that there is a TorrentFactory for broadcast variables; is that what…

Re: How Broadcast variable scale?.

2015-02-23 Thread Mosharaf Chowdhury
…that there is a TorrentFactory for broadcast variables; is that what Spark uses right now? In the article they say that Spark uses another one (Centralized HDFS Broadcast). How does the current algorithm scale if I have a big cluster (about 300 nodes)? Is it linear? Are there other options…

How Broadcast variable scale?.

2015-02-23 Thread Guillermo Ortiz
…took a look at the documentation and API and I read that there is a TorrentFactory for broadcast variables; is that what Spark uses right now? In the article they say that Spark uses another one (Centralized HDFS Broadcast). How does it scale if I have a big cluster (about 300 nodes) with the current…

Broadcast variable questions

2015-01-21 Thread Hao Ren
Hi, Spark 1.2.0, standalone, local mode (for test). Here are several questions on broadcast variables: 1) Where is the broadcast variable cached on executors? In memory or on disk? I read somewhere that these variables are stored in spark.local.dir. But I can't find any info in Spark 1.2…

Re: how to blend a DStream and a broadcast variable?

2014-11-06 Thread Steve Reinhardt
one large data stream (DS1) that obviously maps to a DStream. The processing of DS1 involves filtering it for any of a set of known values, which will change over time, though slowly by streaming standards. If the filter data were static, it seems to obviously map to a broadcast variable

how to blend a DStream and a broadcast variable?

2014-11-05 Thread spr
in the Spark Streaming Programming Guide of broadcast variables. Do they coexist properly? 2) Once I have a broadcast variable in place in the periodic function that Spark Streaming executes, how can I update its value? Obviously I can't literally update the value of that broadcast variable, which

Re: how to blend a DStream and a broadcast variable?

2014-11-05 Thread Sean Owen
will change over time, though slowly by streaming standards. If the filter data were static, it seems to obviously map to a broadcast variable, but it's dynamic. (And I don't think it works to implement it as a DStream, because the new values need to be copied redundantly to all executors

Returned type of Broadcast variable is byte array

2014-10-30 Thread Stephen Boesch
As a template for creating a broadcast variable, the following code snippet within mllib was used: val bcIdf = dataset.context.broadcast(idf) dataset.mapPartitions { iter => val thisIdf = bcIdf.value The new code follows that model: import org.apache.spark.mllib.linalg.{Vector…

Re: Returned type of Broadcast variable is byte array

2014-10-30 Thread Stephen Boesch
…val bcRows = sc.broadcast(crows) .. val arrayVect = bcRows.value 2014-10-30 7:42 GMT-07:00 Stephen Boesch java...@gmail.com: As a template for creating a broadcast variable, the following code snippet within mllib was used: val bcIdf = dataset.context.broadcast(idf) dataset.mapPartitions…

spark-streaming-kafka with broadcast variable

2014-09-05 Thread Penny Espinoza
I need to use a broadcast variable inside the Decoder I use for class parameter T in org.apache.spark.streaming.kafka.KafkaUtils.createStream. I am using the override with this signature: createStreamhttps://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/kafka/KafkaUtils.html

Re: spark-streaming-kafka with broadcast variable

2014-09-05 Thread Tathagata Das
/StorageLevel.html storageLevel) Anyone know how I might do that? The actual Decoder instance is instantiated by Spark, so I don’t know how to access a broadcast variable inside the fromBytes method. thanks Penny

How to clear broadcast variable from driver memory?

2014-09-03 Thread Kevin Jung
in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-clear-broadcast-variable-from-driver-memory-tp13353.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user
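The API that answers this question can be sketched as follows, assuming a plain `SparkContext`; the method semantics are as documented in the `Broadcast` scaladoc:

```scala
// Sketch: releasing a broadcast variable's resources.
val bc = sc.broadcast(bigLookupTable)
// ... run jobs that use bc.value ...
bc.unpersist()  // removes cached copies on the executors; the value can be re-sent if used again
bc.destroy()    // also releases the driver-side copy; bc must not be used afterwards
```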

Re: How to clear broadcast variable from driver memory?

2014-09-03 Thread Andrew Or

Re: Got NotSerializableException when access broadcast variable

2014-08-21 Thread tianyi
at 8:53 PM, Vida Ha v...@databricks.com wrote: Hi, I doubt that the broadcast variable is your problem, since you are seeing: org.apache.spark.SparkException: Task not serializable Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3 We have

Got NotSerializableException when access broadcast variable

2014-08-20 Thread 田毅
Hi everyone! I got an exception when I ran my script with spark-shell. I added SPARK_JAVA_OPTS=-Dsun.io.serialization.extendedDebugInfo=true in spark-env.sh to show the following stack: org.apache.spark.SparkException: Task not serializable at

Re: Got NotSerializableException when access broadcast variable

2014-08-20 Thread tianyi
Thanks for the help. I ran this script again with bin/spark-shell --conf spark.serializer=org.apache.spark.serializer.KryoSerializer in the console. I can see: scala> sc.getConf.getAll.foreach(println) (spark.tachyonStore.folderName,spark-eaabe986-03cb-41bd-bde5-993c7db3f048)
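The flag passed on the spark-shell command line above can equivalently be set programmatically; a sketch:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Equivalent to passing --conf spark.serializer=... on the spark-shell command line.
val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)
```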

Re: Got NotSerializableException when access broadcast variable

2014-08-20 Thread Vida Ha
Hi, I doubt that the broadcast variable is your problem, since you are seeing: org.apache.spark.SparkException: Task not serializable Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3 We have a knowledgebase article that explains why this happens - it's

Re: Got NotSerializableException when access broadcast variable

2014-08-20 Thread Yin Huai
If you want to filter the table name, you can use hc.sql("show tables").filter(row => !"test".equals(row.getString(0))). Seems making functionRegistry transient can fix the error. On Wed, Aug 20, 2014 at 8:53 PM, Vida Ha v...@databricks.com wrote: Hi, I doubt that the broadcast variable is your

RE: Got NotSerializableException when access broadcast variable

2014-08-20 Thread Yin Huai
NotSerializableException when access broadcast variable If you want to filter the table name, you can use hc.sql("show tables").filter(row => !"test".equals(row.getString(0))). Seems making functionRegistry transient can fix the error. On Wed, Aug 20, 2014 at 8:53 PM, Vida Ha v...@databricks.com wrote: Hi, I
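The two fixes discussed in this thread can be sketched side by side; this is illustrative code under the assumption of a `HiveContext` named `hc`, not the original script:

```scala
import org.apache.spark.sql.hive.HiveContext

// Fix 1 (Yin Huai): keep the filtering on serializable Row data only,
// so the closure never needs to capture the HiveContext itself.
val nonTest = hc.sql("show tables").filter(row => !"test".equals(row.getString(0)))

// Fix 2: if a closure must live inside a class that holds a non-serializable
// member, mark that member @transient so it is excluded from serialization.
class TableJob(@transient val hc: HiveContext) extends Serializable
```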

broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
Hi, all When I run some Spark application (actually unit test of the application in Jenkins ), I found that I always hit the FileNotFoundException when reading broadcast variable The program itself works well, except the unit test Here is the example log: 14/07/21 19:49:13 INFO Executor

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
up data and metadata related to RDDs and broadcast variables, only when those variables are not in scope and get garbage-collected by the JVM. So the broadcast variable in question is probably somehow going out of scope even before the job using the broadcast variable is in progress

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
variables are not in scope and get garbage-collected by the JVM. So the broadcast variable in question is probably somehow going out of scope even before the job using the broadcast variable is in progress. Could you reproduce this behavior reliably in a simple code snippet that you

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Tathagata Das
That is definitely weird. spark.cores.max should not affect things when running in local mode. And I am trying to think of scenarios that could cause a broadcast variable used in the current job to fall out of scope, but they all seem very far-fetched. So I am really curious to see the code
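The behavior described in this thread boils down to a rule of thumb: the ContextCleaner only removes broadcast blocks after the driver-side `Broadcast` object becomes unreachable and is garbage-collected, so holding a strong reference for the life of the jobs that use it prevents premature cleanup. An illustrative sketch:

```scala
// Keep the driver-side reference alive for as long as jobs use it.
val bc = sc.broadcast(lookup)            // strong reference held here
rdd.map(x => bc.value.getOrElse(x, 0))   // safe: bc is still reachable on the driver
// Only after `bc` goes out of scope and is GC'd may the cleaner remove its blocks.
```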
