I prepared a simple example to help reproduce the problem:
https://github.com/alberskib/spark-streaming-broadcast-issue
I think this way it will be easier for you to understand the problem
and find a solution (if one exists).
Thanks,
Bartek
2015-12-16 23:34 GMT+01:00 Bartłomiej Alberski <alb
+01:00 Tathagata Das <t...@databricks.com>:
> Could you test serializing and deserializing the MyClassReporter class
> separately?
>
> On Mon, Dec 14, 2015 at 8:57 AM, Bartłomiej Alberski <albers...@gmail.com>
> wrote:
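The suggested check can be done outside Spark with a plain Java-serialization round trip. This is only a sketch: the real `MyClassReporter` is not shown in the thread, so a stand-in class with one field is used here.

```scala
import java.io._

// Stand-in for the real MyClassReporter (its actual fields are not shown
// in the thread); replace with the real class when running the check.
class MyClassReporter(val endpoint: String) extends Serializable

object SerializationCheck {
  // Serialize and immediately deserialize an object, as Spark would when
  // shipping a broadcast value to an executor.
  def roundTrip[T <: Serializable](obj: T): T = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new MyClassReporter("http://example.com"))
    // If the class serializes cleanly, the state survives the round trip.
    println(copy.endpoint)
  }
}
```

If this standalone round trip already throws or loses state, the problem is in the class itself rather than in Spark's broadcast machinery.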
Below is the full stacktrace (real names of my classes were changed) with a
short description of the entries from my code:

rdd.mapPartitions { partition => // this is the line the second stacktrace entry points to
  val sender = broadcastedValue.value // this is the main place the first stacktrace entry points to
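A common workaround for failures when the broadcast value is read on an executor is to broadcast only serializable state and rebuild any heavy or non-serializable dependency lazily after deserialization. This is a sketch under that assumption; `Sender` and `ReporterHolder` are hypothetical names, not classes from the thread:

```scala
import java.io._

// Hypothetical non-serializable dependency (e.g. a network client).
class Sender(val host: String) {
  def send(msg: String): String = s"$host:$msg"
}

// Only the serializable state (host) is written out; the Sender itself is
// rebuilt on first access in each JVM because of @transient lazy val.
class ReporterHolder(host: String) extends Serializable {
  @transient lazy val sender: Sender = new Sender(host)
}

object WorkaroundDemo {
  // Simulates Spark shipping the object to an executor.
  def roundTrip[T <: Serializable](obj: T): T = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(obj)
    out.close()
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
      .readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new ReporterHolder("reporter.example.com"))
    println(copy.sender.send("ping")) // prints "reporter.example.com:ping"
  }
}
```

The `@transient lazy val` pattern avoids serializing the dependency at all: after deserialization the lazy val is simply re-initialized on first use.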
I mean that getResults is called only after foo has been called on all records.
This could be useful if foo is an asynchronous call to an external service
that returns a Future providing some additional data, e.g. a REST API (IO
operations).
If such an API has a latency of 100 ms, sending all requests (for 1000
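The pattern described above can be sketched as follows. The names `foo` and `getResults` come from the thread, but the REST call is simulated here; the point is that all requests are fired first so their latencies overlap, instead of paying the per-call latency serially:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object AsyncBatch {
  // Stands in for an asynchronous REST call returning extra data per record.
  def foo(id: Int): Future[Int] = Future { id * 2 }

  // Fire all requests, then collect the results once they all complete.
  def getResults(ids: Seq[Int]): Seq[Int] = {
    val inFlight = ids.map(foo)                    // all requests in flight
    Await.result(Future.sequence(inFlight), 10.seconds)
  }
}
```

With 1000 records and 100 ms latency per call, overlapping the requests this way takes roughly the latency of the slowest call rather than the sum of all of them.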
I knew that one possible solution would be to map the loaded object into
another class just after reading it from HDFS.
I was looking for a solution that allows reusing the Avro-generated classes.
It could be useful when your record has more than 22 fields,
because you do not need to write boilerplate
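The boilerplate in question looks roughly like the sketch below. `AvroRecord` is a stand-in for a (hypothetical) Avro-generated class; with more than 22 fields a Scala 2.10 case class cannot be used as the target, so a plain class and a hand-written copy method are needed:

```scala
// Stand-in for an Avro-generated record class (mutable, Java-style fields).
class AvroRecord(var f1: String, var f2: Int)

// Plain Serializable target class; a case class would be limited to
// 22 fields in Scala 2.10, hence the hand-written variant.
class PlainRecord(val f1: String, val f2: Int) extends Serializable

object AvroMapping {
  // Field-by-field copy right after reading from HDFS -- this is the
  // boilerplate that reusing the generated classes would avoid.
  def convert(r: AvroRecord): PlainRecord = new PlainRecord(r.f1, r.f2)
}
```

For a record with dozens of fields, this copy method has to enumerate every field, which is exactly the boilerplate the author wanted to avoid by reusing the generated classes directly.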