Sebastian,
*Update:* This is not possible, and it will probably remain that way for the
foreseeable future.
https://issues.apache.org/jira/browse/SPARK-3863
Srikanth
On Fri, Feb 19, 2016 at 10:20 AM, Sebastian Piu wrote:
> I don't have the code with me now, and I ended up
Sure. These may be unrelated.
On Fri, Feb 19, 2016 at 10:39 AM, Jerry Lam wrote:
> Hi guys,
>
> I also encountered a broadcast DataFrame issue, not for streaming jobs but for a
> regular DataFrame join. In my case, the executors died, probably due to OOM,
> though I don't think it should
Hmmm..OK.
Srikanth
On Fri, Feb 19, 2016 at 10:20 AM, Sebastian Piu wrote:
> I don't have the code with me now, and I ended up moving everything to RDDs in
> the end, using map operations to do some lookups, i.e. instead of
> broadcasting a DataFrame I ended up broadcasting
Hi guys,
I also encountered a broadcast DataFrame issue, not for streaming jobs but for a
regular DataFrame join. In my case, the executors died, probably due to OOM,
though I don't think it should use that much memory. Anyway, I'm going to craft
an example and send it here to see if it is a bug or something.
I don't have the code with me now, and I ended up moving everything to RDDs in
the end, using map operations to do some lookups, i.e. instead of
broadcasting a DataFrame I ended up broadcasting a Map.
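A minimal sketch of the workaround described above: broadcast a plain Scala Map
and do the lookup inside a map operation instead of joining two DataFrames. All
names and sample data here are illustrative, not from the thread.

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastMapSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-map").setMaster("local[2]"))

    // Small lookup side: collect it to the driver once and broadcast it as a Map.
    val meta = sc.parallelize(Seq((1, "US"), (2, "EU")))
    val lookup = sc.broadcast(meta.collectAsMap())

    // Large side: replace the join with a per-record lookup against the broadcast Map.
    val events = sc.parallelize(Seq((1, "click"), (2, "view"), (3, "view")))
    val joined = events.map { case (id, action) =>
      (id, action, lookup.value.getOrElse(id, "unknown"))
    }

    joined.collect().foreach(println)
    sc.stop()
  }
}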
On Fri, Feb 19, 2016 at 11:39 AM Srikanth wrote:
> It didn't fail. It
It didn't fail. It wasn't broadcasting. I just ran the test again and here
are the logs.
Every batch is reading the metadata file.
16/02/19 06:27:02 INFO HadoopRDD: Input split:
file:/shared/data/test-data.txt:0+27
16/02/19 06:27:02 INFO HadoopRDD: Input split:
I don't see anything obviously wrong with your second approach; I've done it
like that before and it worked. When you say that it didn't work, what do
you mean? Did it fail? Did it not broadcast?
On Thu, Feb 18, 2016 at 11:43 PM Srikanth wrote:
> Code with SQL broadcast hint.
Code with SQL broadcast hint. This worked, and I was able to see that a
broadcast join was performed.
val testDF = sqlContext.read.format("com.databricks.spark.csv")
  .schema(schema).load("file:///shared/data/test-data.txt")
val lines = ssc.socketTextStream("DevNode", )
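The snippet above is cut off in the archive (the socket port is missing, and
the join itself did not survive). One hedged guess at how the broadcast-hint
join could continue, using the broadcast() function from
org.apache.spark.sql.functions (available since Spark 1.5); the column names
and line format are assumptions, not the code Srikanth actually ran:

import org.apache.spark.sql.functions.broadcast
import sqlContext.implicits._  // for rdd.toDF(...) in Spark 1.x

// Hypothetical continuation: join each micro batch against testDF,
// marking testDF as the broadcast (small) side of the join.
lines.foreachRDD { rdd =>
  val batchDF = rdd.map(_.split(","))
    .map(a => (a(0), a(1)))
    .toDF("id", "value")                           // assumed column names
  batchDF.join(broadcast(testDF), "id").count()    // count() forces execution
}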
Can you paste the code where you use sc.broadcast?
On Thu, Feb 18, 2016 at 5:32 PM Srikanth wrote:
> Sebastian,
>
> I was able to broadcast using the SQL broadcast hint. The question is how to
> prevent this broadcast happening for each RDD.
> Is there a way where it can be broadcast once
Sebastian,
I was able to broadcast using the SQL broadcast hint. The question is how to
prevent this broadcast happening for each RDD.
Is there a way where it can be broadcast once and used locally for each RDD?
Right now the metadata file is read and the DF is broadcast in every batch.
I tried sc.broadcast and
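For reference, a sketch of the "broadcast once" shape being asked for here:
read the metadata a single time on the driver, broadcast it once outside the
streaming loop, and reuse the same broadcast variable in every batch. The
two-column file layout and all names are assumptions:

// Read the metadata once on the driver and broadcast it a single time.
val metaPairs = sqlContext.read.format("com.databricks.spark.csv")
  .schema(schema)
  .load("file:///shared/data/test-data.txt")
  .map(r => (r.getString(0), r.getString(1)))  // assumed two-column layout
  .collectAsMap()
val metaBC = ssc.sparkContext.broadcast(metaPairs)

// Every batch reuses the same broadcast value: no re-read, no re-broadcast.
lines.foreachRDD { rdd =>
  rdd.map(line => (line, metaBC.value.getOrElse(line, "n/a"))).count()
}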
You should be able to broadcast that DataFrame using sc.broadcast and join
against it.
On Wed, 17 Feb 2016, 21:13 Srikanth wrote:
> Hello,
>
> I have a streaming use case where I plan to keep a dataset broadcast and
> cached on each executor.
> Every micro batch in
Hello,
I have a streaming use case where I plan to keep a dataset broadcast and
cached on each executor.
Every micro batch in streaming will create a DF out of the RDD and join the
batch with this dataset.
The code below performs the broadcast operation for each RDD. Is there
a way to broadcast it just once?
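The code referred to here did not survive the archive. A hedged reconstruction
of the per-batch pattern being described, with every name (metaDF, lines, the
columns) assumed rather than taken from the original:

import org.apache.spark.sql.functions.broadcast
import sqlContext.implicits._

lines.foreachRDD { rdd =>
  val batchDF = rdd.map(_.split(",")).map(a => (a(0), a(1))).toDF("id", "value")
  // The broadcast of metaDF is repeated on every batch; avoiding this
  // repetition is exactly what the question asks about.
  batchDF.join(broadcast(metaDF), "id").count()
}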