Hi folks!

I'm trying to implement an update of a broadcast var in Spark Streaming.
The idea is that whenever some configuration value has changed (this is
periodically checked by the driver) the existing broadcast variable is
unpersisted and then (re-)broadcasted.

In a local test setup (using a local Spark) it works fine but on a real
cluster it doesn't work. The broadcast variable never gets updated. Am I
doing something wrong?
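One pattern that fits what is described here (a minimal sketch; BroadcastHolder and every other name below are illustrative, not Spark API) is to keep the Broadcast handle behind a driver-side holder and swap it between micro-batches, re-resolving it inside transform/foreachRDD, whose bodies run on the driver once per batch:

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import scala.reflect.ClassTag

// Illustrative wrapper, not Spark API.
class BroadcastHolder[T: ClassTag](sc: SparkContext, initial: T) {
  @volatile private var bc: Broadcast[T] = sc.broadcast(initial)

  // Called on the driver whenever the configuration changes.
  def update(newValue: T): Unit = {
    bc.unpersist(blocking = true)  // drop the stale copies on the executors
    bc = sc.broadcast(newValue)    // re-broadcast the fresh value
  }

  def current: Broadcast[T] = bc
}

// Re-resolve the handle once per micro-batch; transform's body runs on
// the driver, so each batch picks up the latest broadcast:
// dstream.transform { rdd =>
//   val b = holder.current
//   rdd.map(x => lookup(b.value, x))   // lookup is a placeholder
// }

Note that in local mode the driver and executor share a JVM, so even a stale reference can appear to see the updated object, which may be why the local test behaves differently from the real cluster.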
There is just one copy in memory. No different than if you had two
variables pointing to the same dict.
On Mon, May 3, 2021 at 7:54 AM Bode, Meikel, NMA-CFD <
meikel.b...@bertelsmann.de> wrote:
Hi all,
when broadcasting a large dict containing several million entries to executors
what exactly happens when calling bc_var.value within a UDF like:
..
d = bc_var.value
..
Does d receive a copy of the dict inside value, or is this handled like a
pointer?
Thanks,
Meikel
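The question is about PySpark, but the JVM-side semantics are easy to demonstrate and, as far as I know, PySpark behaves the same way: the dict is unpickled once per worker process and d is a reference to it. A Scala sketch with made-up data:

val bigMap = (1 to 1000000).map(i => i.toString -> i).toMap  // made-up "large dict"
val bc = sc.broadcast(bigMap)

sc.parallelize(1 to 8).mapPartitions { iter =>
  val d1 = bc.value
  val d2 = bc.value
  assert(d1 eq d2)  // same object: .value hands back a reference, not a copy
  iter
}.count()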
Is there any concurrent access to broadcast variables in the workers? My
use case is that the broadcast variable is a large dataset object which is
*only* read. However, there is some cache in this object which is written
(so it speeds up generation of entries in this dataset). Everything
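Since every task in an executor shares the single deserialized instance, a mutable cache inside the broadcast object has to be thread-safe; one hedged sketch (LargeDataset and expensiveGenerate are stand-ins for the real classes):

import java.util.concurrent.ConcurrentHashMap

class LargeDataset(raw: Array[String]) extends Serializable {
  // transient + lazy: the cache is rebuilt empty on each executor and is
  // never serialized along with the broadcast.
  @transient private lazy val cache = new ConcurrentHashMap[Int, String]()

  def entry(i: Int): String = {
    val hit = cache.get(i)
    if (hit != null) hit
    else {
      val v = expensiveGenerate(i)  // may race with another task; harmless,
      cache.putIfAbsent(i, v)       // the losing write is simply dropped
      v
    }
  }

  private def expensiveGenerate(i: Int): String = raw(i).toUpperCase  // stand-in
}

// val bc = sc.broadcast(new LargeDataset(Array("a", "b")))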
This might be obvious but just checking anyways, did you confirm whether or
not all of the messages have already been consumed by Spark? If that's the
case then I wouldn't expect much to happen unless new data comes into your
Kafka topic.
If you're a hundred percent sure that there's still plenty
I don't think that means it's stuck on removing something; it was
removed. Not sure what it is waiting on - more data perhaps?
On Sat, Apr 18, 2020 at 2:22 PM Alchemist wrote:
I am running a simple Spark structured streaming application that is pulling
data from a Kafka Topic. I have a Kafka Topic with nearly 1000 partitions. I am
running this app on a 6-node EMR cluster with 4 cores and 16GB RAM. I observed
that Spark is trying to pull data from all 1024 Kafka
Hello,
I have quite a specific use case. I want to use an MXNet neural-net model in a
distributed fashion to get predictions on a very large dataset. It is not
possible to broadcast the model directly because the underlying implementation
is not serializable. Instead the model has to be loaded
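A common workaround, sketched below with placeholder names (ModelHolder, loadModel, the paths -- none of this is MXNet or Spark API), is to broadcast only something serializable, such as the model path or raw bytes, and load the model lazily once per executor JVM:

object ModelHolder {
  @volatile private var model: AnyRef = _
  def get(path: String): AnyRef = {
    if (model == null) synchronized {
      if (model == null) model = loadModel(path)  // runs at most once per executor JVM
    }
    model
  }
  private def loadModel(path: String): AnyRef = s"model-loaded-from-$path"  // stand-in loader
}

val pathBc = sc.broadcast("/mnt/models/mymodel")   // hypothetical path visible to all executors
val predictions = sc.textFile("/data/input")       // hypothetical input
  .mapPartitions { rows =>
    val m = ModelHolder.get(pathBc.value)          // loaded lazily, shared by tasks in this JVM
    rows.map(r => (r, m.toString))                 // stand-in for the real predict call
  }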
You keep mentioning that you're viewing this after the fact in the spark
history server. Also the spark-shell isn't a UI so I'm not sure what you
mean by saying that the storage tab is blank in the spark-shell. Just so
I'm clear about what you're doing, are you looking at this info while your
The same problem is mentioned here:
https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html
https://stackoverflow.com/questions/44792213/blank-storage-tab-in-spark-history-server
On Tue, Oct 16, 2018 at 8:06 AM Venkat Dabri wrote:
I did try that mechanism before but the data never shows up in the
storage tab. The storage tab is always blank. I have tried it in
Zeppelin as well as spark-shell.
scala> val classCount = spark.read.parquet("s3:// /classCount")
scala> classCount.persist
scala> classCount.count
Nothing shows
In your program persist the smaller table and use count to force it to
materialize. Then in the Spark UI go to the Storage tab. The size of your
table as spark sees it should be displayed there. Out of curiosity what
version / language of Spark are you using?
On Mon, Oct 15, 2018 at 11:53 AM
I am trying to do a broadcast join on two tables. The size of the
smaller table will vary based upon the parameters but the size of the
larger table is close to 2TB. What I have noticed is that if I don't
set the spark.sql.autoBroadcastJoinThreshold to 10G some of these
operations do a
..., line 87, in _py2java
File "/home/zeinab/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 555, in dumps
File "/home/zeinab/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 315, in __getnewargs__
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation.
Yes, each of the executors has 60GB.
Hi Venkat,
do your executors have that much memory?
Regards,
Gourav Sengupta
On Tue, Oct 9, 2018 at 4:44 PM V0lleyBallJunki3
wrote:
Hello,
I have set the value of spark.sql.autoBroadcastJoinThreshold to a very
high value of 20 GB. I am joining a table that I am sure is below this
variable, however spark is doing a SortMergeJoin. If I set a broadcast hint
then spark does a broadcast join and job finishes much faster.
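For reference, forcing the plan either way looks like this (a sketch; largeDF and smallDF are placeholders, and the threshold value is in bytes):

import org.apache.spark.sql.functions.broadcast

// Force a broadcast join regardless of the automatic threshold:
val joined = largeDF.join(broadcast(smallDF), Seq("id"))

// Or raise the threshold itself (10 GB here):
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024 * 1024)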
the broadcast variable for the
next micro-batch and finally reset the accumulator to 0.
I don't know if there is a better solution or other ideas to do this.
Has anyone faced this problem?
That's the max size of a byte array in Java, limited by the length which is
defined as an integer, and in most JVMs arrays can't hold more than
Int.MaxValue - 8 elements. One way to overcome this is to create multiple
broadcast variables
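A sketch of that workaround (sizes and data are made up; in practice you would route each lookup to the right chunk rather than scanning all of them):

import org.apache.spark.broadcast.Broadcast

// hugeSeq stands in for data too large for a single broadcast.
val hugeSeq: Vector[Long] = (1L to 1000000L).toVector
val chunkSize = 100000  // sized so each serialized chunk stays well under 2 GB
val chunks: Seq[Broadcast[Vector[Long]]] =
  hugeSeq.grouped(chunkSize).map(sc.broadcast(_)).toSeq

// Each task can reach any chunk through its small, serializable handle.
sc.parallelize(1 to 4).map { _ =>
  chunks.map(_.value.length).sum
}.collect()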
On Sunday, August 5, 2018, klrmowse wrote:
I don't need more, per se... I just need to watch the size of the variable;
then, if it's within the size limit, go ahead and broadcast it; if not, then
I won't broadcast...
So, that would be a yes then? (2 GB, or which is it exactly?)
I think if you need more than that, you should think about something other
than a broadcast variable ...
> On 5. Aug 2018, at 16:51, klrmowse wrote:
Is it currently still ~2GB (Integer.MAX_VALUE)?
Or am I misinformed, since that's what a Google search and scouring this
mailing list seem to say...?
Thanks
Hi all,
I’m poking around the Pyspark.Broadcast module, and I notice that one can pass
in a `pickle_registry` and a `path`. The documentation does not outline the
pickle registry use and I’m curious about how to use it, and if there are any
advantages to it.
Thanks,
Michael Mansour
...our goal was to re-broadcast variables without requiring a downtime of
the streaming service.
The key to this implementation is a live re-broadcastVariable() interface,
which can be triggered in between micro-batch executions, without any
re-boot required for the streaming application. At a high level the task is
done by re-fetching broadcast variable information from the spark driver,
and then re-distributing it.
... =>
  val valueSum = aggLog1(f) + aggLog2(f)
  aggs ++ Map(f -> valueSum)
}
aggs
}

I can't pass the broadcast to the reduceKeyMapFunction. I create the broadcast
variable (broadcastv) in the driver but I fear it will not be initialized on
the workers where the reduceKeyMapFunction runs.
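For what it's worth, the Broadcast handle itself is a small serializable object, so capturing it in a closure (or passing it to a function's constructor) is the supported pattern; only .value pulls the data, and it does so on the worker. A sketch with made-up names:

import org.apache.spark.broadcast.Broadcast

val weights: Broadcast[Map[String, Double]] =
  sc.broadcast(Map("a" -> 2.0, "b" -> 0.5))   // hypothetical lookup data

val pairs = sc.parallelize(Seq(("a", 1.0), ("a", 2.0), ("b", 3.0)))  // stand-in input
val reduced = pairs
  .reduceByKey(_ + _)
  .map { case (k, v) => (k, v * weights.value.getOrElse(k, 1.0)) }  // .value resolves on the worker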
Subject: Re: How do I convert a data frame to broadcast variable?
Hi Nishit,
Yes the JDBC connector
Thanks Denny! That does help. I will give that a shot.
Question: If I am going this route, I am wondering how I can only read a few
columns of a table (not the whole table) from JDBC as a data frame
If you're able to read the data in as a DataFrame, perhaps you can use a
BroadcastHashJoin so that way you can join to that table presuming it's small
enough to distribute.
%20SQL,%20DataFrames%20%26%20Datasets/05%20BroadcastHashJoin%20-%20scala.html
HTH!
On Thu, Nov 3, 2016 at 8:53 AM Jain, Nishit <nja...@underarmour.com> wrote:
I have a lookup table in HANA database. I want to create a spark broadcast
variable for it.
What would be the suggested approach? Should I read it as a data frame and
convert the data frame into a broadcast variable?
Thanks,
Nishit
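On the follow-up question about reading only a few columns over JDBC, one hedged approach (the connection details below are placeholders) is to push the projection down as a subquery, collect the small result on the driver, and broadcast that:

val props = new java.util.Properties()
props.setProperty("user", "placeholder")              // hypothetical credentials
val lookupDF = spark.read.jdbc(
  "jdbc:sap://hana-host:30015",                       // hypothetical HANA URL
  "(SELECT key_col, value_col FROM lookup_table) t",  // push down just the needed columns
  props)

// Small result -> driver -> broadcast variable
val lookupBc = sc.broadcast(
  lookupDF.collect().map(r => r.getString(0) -> r.getString(1)).toMap)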
Hi,
I am getting a NullPointerException due to a Kryo serialization issue while
trying to read a BiMap broadcast variable. Attached are the code snippets.
The pointers shared here didn't help - link1
<http://stackoverflow.com/questions/33156095/spark-serialization-issue-with-hashmap>,
link2
If you're using this from Java you might find it easier to construct a
JavaSparkContext, the broadcast function will automatically use a fake
class tag.
On Sun, Aug 7, 2016 at 11:57 PM, Aseem Bansal wrote:
I am using the following to broadcast and it explicitly requires classtag
sparkSession.sparkContext().broadcast
On Mon, Aug 8, 2016 at 12:09 PM, Holden Karau wrote:
Classtag is a Scala concept (see
http://docs.scala-lang.org/overviews/reflection/typetags-manifests.html) -
although this should not be explicitly required - looking at
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
we can see that in Scala the classtag is
Earlier for broadcasting we just needed to use
sparkcontext.broadcast(objectToBroadcast)
But now it is
sparkcontext.broadcast(objectToBroadcast, classTag)
What is classTag here?
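In Scala the ClassTag is filled in implicitly, so the extra argument only surfaces when calling the Scala SparkContext from Java; a small sketch of both spellings (JavaSparkContext.broadcast supplies a fake tag for you, as mentioned above):

import scala.reflect.ClassTag

// In Scala the compiler supplies the ClassTag implicitly:
val bc = sc.broadcast(Map("a" -> 1))

// Writing it out by hand, which is what the Java-facing signature exposes:
val tag = implicitly[ClassTag[Map[String, Int]]]
val bc2 = sc.broadcast(Map("a" -> 1))(tag)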
...'broadcast_var' is not defined
Any ideas on how to fix it?
16/02/02 14:51:53 INFO BlockManagerInfo: Added rdd_2_12 in memory on
localhost:58536 (size: 40.0 B, free: 510.7 MB)

On Tue, Feb 2, 2016 at 8:20 AM, apu mishra . rr <apumishra...@gmail.com>
wrote:
How can I determine the size (in bytes) of a broadcast variable? Do I need
to use the .dump method and then look at the size of the result, or is
there an easier way?
Using PySpark with Spark 1.6.
Thanks!
Apu
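One JVM-side option (the question is PySpark, so treat this as adjacent rather than a direct answer) is org.apache.spark.util.SizeEstimator, which estimates the in-memory footprint; the BlockManagerInfo INFO logs also show the stored size, as in the log line quoted above:

import org.apache.spark.util.SizeEstimator

val data = (1 to 1000000).toArray
println(SizeEstimator.estimate(data))  // estimated in-memory size, in bytes
val bc = sc.broadcast(data)            // stored size then appears in the INFO logs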
...the RecordReader will load the full entity (a plain Java Bean)
from my data source for each ID in the specific split.
My first observation is that some Spark tasks were timing out, and it looks
like a Spark broadcast variable is being used to distribute my splits; is
that correct? If so, from a performance perspective, what enhancement can I
make?
Thanks
--
--Anfernee
...when creating a DataFrame:
DataFrame df = sourceFrame.toDF(fieldNames);
I want to do the above using a broadcast variable so that we don't ship the
huge string array to executors. I believe we can do something like the
following to create the broadcast:
Broadcast<String[]> brArray = sc.broadcast(fieldNames);
DataFrame df = sourceF
Hi,
How is the ContextCleaner different from spark.cleaner.ttl? Is
spark.cleaner.ttl needed when there is a ContextCleaner in the Streaming job?
Thanks,
Swetha
...reproducible in my environment and the application runs fine as soon
as I don't access this specific Map from this specific RDD.
Any idea what might cause this problem?
Can I provide you with any other information (besides posting >500 lines
of code)?
cheers
Sebastian
Thanks for all your comments.
On Tue, Aug 18, 2015 at 4:28 PM, Tathagata Das t...@databricks.com
wrote:
Why are you even trying to broadcast a producer? A broadcast variable is
some immutable piece of serializable DATA that can be used for processing
on the executors. A Kafka producer is neither DATA nor immutable, and
definitely not serializable.
The right way to do this is to create the producer in the executors.
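The usual shape of "create the producer in the executors" is a lazily initialized per-JVM singleton used from foreachPartition; a sketch (broker, topic, and dstream are placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// One producer per executor JVM, created lazily on first use.
object ProducerHolder {
  lazy val producer: KafkaProducer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")  // placeholder
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }
}

// dstream is whatever DStream[String] the application produces:
// dstream.foreachRDD { rdd =>
//   rdd.foreachPartition { records =>
//     records.foreach { r =>
//       ProducerHolder.producer.send(
//         new ProducerRecord[String, String]("out-topic", r))  // placeholder topic
//     }
//   }
// }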
Hi,
Did anyone see java.util.ConcurrentModificationException when using
broadcast variables?
I encountered this exception when wrapping a Kafka producer like this in
the spark streaming driver.
Here is what I did.
KafkaProducer<String, String> producer = new KafkaProducer<String,
String>(properties);
I wouldn't expect a kafka producer to be serializable at all... among other
things, it has a background thread
Hi
I am using spark streaming 1.3 and using checkpointing.
But the job is failing to recover from checkpoint on restart.
For the broadcast variable it says:
1. WARN TaskSetManager: Lost task 15.0 in stage 7.0 (TID 1269, hostIP):
java.lang.ClassCastException: [B cannot be cast to
pkg.broadcastvariableclassname
at the point where I call bcvariable.value() in the map function.
rdd.map { x => /* use accum */ }
}
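The pattern the Spark Streaming programming guide recommends for using broadcast variables together with checkpointing is a lazily instantiated singleton that is re-created after restart rather than restored from the checkpoint (a sketch; the payload and names are made up):

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object ConfigBroadcast {
  @volatile private var instance: Broadcast[Map[String, String]] = _
  def getInstance(sc: SparkContext): Broadcast[Map[String, String]] = {
    if (instance == null) synchronized {
      if (instance == null)
        instance = sc.broadcast(Map("k" -> "v"))  // placeholder payload, rebuilt after recovery
    }
    instance
  }
}

// Resolve per batch instead of capturing a handle the checkpoint cannot restore:
// dstream.foreachRDD { rdd =>
//   val bc = ConfigBroadcast.getInstance(rdd.sparkContext)
//   rdd.map(x => (x, bc.value.size)).count()
// }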
I am running in coarse-grained mode, let's say with 8 cores per executor.
If I use a broadcast variable, will all of the tasks in that executor share
the same value? Or will each task broadcast its own value, i.e. in this case,
would there be one value in the executor shared by the 8 tasks, or would
If I understand correctly, there would be one value in the executor.
Cheers
On Tue, Jul 28, 2015 at 4:23 PM, Jonathan Coveney jcove...@gmail.com
wrote:
Hi.
You can use a broadcast variable to make data available to all the nodes in
your cluster that can live longer than just the current distributed task.
For example, if you need to access a large structure in multiple sub-tasks,
instead of sending that structure again and again with each sub-task...
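A minimal spark-shell flavoured example of that pattern (made-up data):

// Ship a lookup table to every executor once, instead of with every task.
val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two", 3 -> "three"))

val labelled = sc.parallelize(Seq(1, 2, 3, 2))
  .map(i => lookup.value.getOrElse(i, "?"))  // tasks read the executor-local copy

labelled.collect()  // Array(one, two, three, two)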
Hi,
I have a spark streaming application where I need to access a model
saved in a HashMap.
I have *no problems in running the same...

...().broadcast(hm);

I need to access this model in my mapper phase, and do some operation
based on the checkup. The following is a snippet of how I access the
broadcast variable.

JavaDStream<Tuple3<Long, Double, String>> split = matched.map(new
GenerateType2Scores());

class GenerateType2Scores implements Function<String, Tuple3<Long, Double,
String>> {
@Override
public Tuple3<Long, ...

btw. just for reference I have added the code in a gist:
https://gist.github.com/nipunarora/ed987e45028250248edc
and a stackoverflow reference here:
http://stackoverflow.com/questions/31006490/broadcast-variable-null-pointer-exception-in-spark-streaming
Can someone help take a look at my questions? Thanks.
bit1...@163.com
From: bit1...@163.com
Date: 2015-05-29 18:57
To: user
Subject: How Broadcast variable works
Hi,
I have a spark streaming application. SparkContext uses broadcast variables to
broadcast Configuration information that each
1. No. That's the purpose of a broadcast variable, i.e. not to be shipped with
every task. From the documentation:
Broadcast Variables
Broadcast variables allow the programmer to keep a read-only variable
cached on each machine rather than shipping a copy of it with tasks. They
can be used, for example
I'm running spark 1.3.1 in standalone mode with a 2-node cluster. I tried
with spark-shell and it works fine. Please help!
Thanks
.../~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf
I don't know if they're talking about the current version (1.2.1)
because the file was created in 2010.
I took a look at the documentation and API and I read that there is a
TorrentFactory for broadcast variables. Is that what Spark uses right now?
In the article they say that Spark uses another one (Centralized HDFS
Broadcast).
How does the current algorithm scale if I have a big cluster (about 300
nodes)? Is it linear? Are there other options?
Hi,
Spark 1.2.0, standalone, local mode (for test).
Here are several questions on broadcast variables:
1) Where is the broadcast variable cached on executors? In memory or on
disk?
I read somewhere that these variables are stored in spark.local.dir,
but I can't find any info in Spark 1.2
...one large data stream (DS1) that obviously maps to a DStream.
The processing of DS1 involves filtering it for any of a set of known
values, which will change over time, though slowly by streaming standards.
If the filter data were static, it seems to obviously map to a broadcast
variable, but it's dynamic. (And I don't think it works to implement it as
a DStream, because the new values need to be copied redundantly to all
executors...)
1) ...in the Spark Streaming Programming Guide of broadcast
variables. Do they coexist properly?
2) Once I have a broadcast variable in place in the periodic function that
Spark Streaming executes, how can I update its value? Obviously I can't
literally update the value of that broadcast variable, which...
As a template for creating a broadcast variable, the following code snippet
within mllib was used:

val bcIdf = dataset.context.broadcast(idf)
dataset.mapPartitions { iter =>
  val thisIdf = bcIdf.value
  ...

The new code follows that model:

import org.apache.spark.mllib.linalg.{Vector...
val bcRows = sc.broadcast(crows)
..
val arrayVect = bcRows.value

2014-10-30 7:42 GMT-07:00 Stephen Boesch java...@gmail.com:
I need to use a broadcast variable inside the Decoder I use for the class
parameter T in org.apache.spark.streaming.kafka.KafkaUtils.createStream. I am
using the override with this signature:
createStream
(https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/kafka/KafkaUtils.html)
(..., storageLevel)
Anyone know how I might do that? The actual Decoder instance is
instantiated by Spark, so I don't know how to access a broadcast variable
inside the fromBytes method.
thanks
Penny
Hi everyone!
I got an exception when I run my script with spark-shell. I added
SPARK_JAVA_OPTS=-Dsun.io.serialization.extendedDebugInfo=true
in spark-env.sh to show the following stack:
org.apache.spark.SparkException: Task not serializable
at
Thanks for the help.
I ran this script again with bin/spark-shell --conf
spark.serializer=org.apache.spark.serializer.KryoSerializer
In the console, I can see:
scala> sc.getConf.getAll.foreach(println)
(spark.tachyonStore.folderName,spark-eaabe986-03cb-41bd-bde5-993c7db3f048)
Hi,
I doubt the broadcast variable is your problem, since you are seeing:
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException:
org.apache.spark.sql.hive.HiveContext$$anon$3
We have a knowledgebase article that explains why this happens - it's
If you want to filter the table name, you can use
hc.sql("show tables").filter(row => !"test".equals(row.getString(0)))
Seems making functionRegistry transient can fix the error.
Hi all,
When I run some Spark applications (actually a unit test of the application
in Jenkins), I found that I always hit a FileNotFoundException when reading
a broadcast variable. The program itself works well, except for the unit
test.
Here is the example log:
14/07/21 19:49:13 INFO Executor
...up data and metadata related to RDDs and broadcast
variables, only when those variables are not in scope and get
garbage-collected by the JVM. So the broadcast variable in question is
probably somehow going out of scope even before the job using the broadcast
variable is in progress.
Could you reproduce this behavior reliably in a simple code snippet that
you...
That is definitely weird. spark.cores.max should not affect things when
running in local mode.
And I am trying to think of scenarios that could cause a broadcast
variable used in the current job to fall out of scope, but they all seem
very far-fetched. So I am really curious to see the code.