I agree, it was by mistake. I just updated so that the next person looking for torrent broadcast issues will have a hint :)
Thank you.
Daniel

On Sun, Jun 19, 2016 at 5:26 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> I think good practice is not to hold on to SparkContext in mapFunction.
>
> On Sun, Jun 19, 2016 at 7:10 AM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>
>> How about using `transient` annotations?
>>
>> // maropu
>>
>> On Sun, Jun 19, 2016 at 10:51 PM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:
>>
>>> Hi,
>>> Just updating on my findings for future reference.
>>> The problem was that after refactoring my code I ended up with a Scala
>>> object which held SparkContext as a member, e.g.:
>>> object A {
>>>   val sc: SparkContext = new SparkContext
>>>   def mapFunction {}
>>> }
>>>
>>> and when I called rdd.map(A.mapFunction) it failed as A.sc is not
>>> serializable.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> On Tue, Jun 7, 2016 at 10:13 AM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Since `HttpBroadcastFactory` has already been removed in master, you
>>>> cannot use that broadcast mechanism in future releases.
>>>>
>>>> Anyway, I couldn't find a root cause from the stacktraces alone...
>>>>
>>>> // maropu
>>>>
>>>> On Mon, Jun 6, 2016 at 2:14 AM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:
>>>>
>>>>> Hi,
>>>>> I've set spark.broadcast.factory to
>>>>> org.apache.spark.broadcast.HttpBroadcastFactory and it indeed resolved my
>>>>> issue.
>>>>>
>>>>> I'm creating a dataframe which creates a broadcast variable internally
>>>>> and then fails due to the torrent broadcast with the following stacktrace:
>>>>> Caused by: org.apache.spark.SparkException: Failed to get broadcast_3_piece0 of broadcast_3
>>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:137)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
>>>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:175)
>>>>>     at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1220)
>>>>>
>>>>> I'm using Spark 1.6.0 on CDH 5.7
>>>>>
>>>>> Thanks,
>>>>> Daniel
>>>>>
>>>>> On Wed, Jun 1, 2016 at 5:52 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>
>>>>>> I found spark.broadcast.blockSize but no parameter to switch
>>>>>> broadcast method.
>>>>>>
>>>>>> Can you describe the issues with torrent broadcast in more detail?
>>>>>>
>>>>>> Which version of Spark are you using?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, Jun 1, 2016 at 7:48 AM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> Our application is failing due to issues with the torrent broadcast,
>>>>>>> is there a way to switch to another broadcast method?
>>>>>>>
>>>>>>> Thank you.
>>>>>>> Daniel
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>
>> --
>> ---
>> Takeshi Yamamuro
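
For anyone landing on this thread later, here is a minimal sketch of the serialization point discussed above (a non-serializable member, and the `transient` suggestion). It does not use Spark at all: `FakeContext` is a hypothetical stand-in for SparkContext, and `Holder`, `TransientHolder` and `canSerialize` are names made up for illustration. It only relies on plain Java serialization, which is what Spark uses for closures, so treat it as an illustration of the mechanism rather than a reproduction of the original failure.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
class FakeContext

// Holds the context as an ordinary field, so serializing a Holder
// also tries to serialize the context.
class Holder(val ctx: FakeContext) extends Serializable {
  def mapFunction(x: Int): Int = x + 1
}

// Same shape, but the field is marked @transient and is skipped
// when the object is serialized.
class TransientHolder(@transient val ctx: FakeContext) extends Serializable {
  def mapFunction(x: Int): Int = x + 1
}

object SerializationSketch {
  // Returns true if Java serialization of `value` succeeds.
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(canSerialize(new Holder(new FakeContext)))          // false
    println(canSerialize(new TransientHolder(new FakeContext))) // true
  }
}
```

Note that a `@transient` field comes back as null after deserialization, so it must not be touched inside the function that runs on the executors. The simpler option, as suggested in the thread, is to keep the context out of whatever holds the map function.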