Broadcast HashMap much slower than Array

2015-07-24 Thread huanglr
Hi, when I try to broadcast a HashMap, it runs much slower than broadcasting the same data as an array. It hangs at "SparkContext: Created broadcast 0" for a few seconds (about 30 s), while an array does not. The broadcast dataset is about 1 GB. best! huanglr
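
One possible explanation, assuming the default Java serializer is in use: a broadcast value has to be serialized before it can be shipped, and a HashMap of boxed keys and values serializes to far more bytes, and far more individual objects, than a primitive array holding the same numbers. The sketch below is plain Java with no Spark involved; the class name `SerializedSizeDemo` and the entry count are made up for illustration. It only compares serialized sizes and does not reproduce the 30 s delay itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark: compares Java-serialized sizes.
final class SerializedSizeDemo {

    // Serialize a value with plain Java serialization and return the byte count.
    static int serializedSize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int n = 100_000; // assumed entry count, far smaller than the 1 GB case
        int[] array = new int[n];
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            array[i] = i;   // stored as 4 raw bytes per element
            map.put(i, i);  // stored as boxed Integer objects per entry
        }
        System.out.println("int[] bytes:   " + serializedSize(array));
        System.out.println("HashMap bytes: " + serializedSize(map));
    }
}
```

If this is indeed the cause, things worth trying are broadcasting parallel key and value arrays and rebuilding the map lazily on the executors, or switching `spark.serializer` to Kryo. Both are suggestions to measure, not confirmed fixes for this thread.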

Spark Broadcasting large dataset

2015-07-10 Thread huanglr
... But Spark broadcasts them 48 times (if I understand correctly). Is there a way to broadcast just one copy to each node, shared by all tasks running on that node? Much appreciated! best! huanglr
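
For context: a Spark broadcast variable is meant to be cached once per executor JVM and read by every task on that executor through `value`, so 48 copies usually suggests either 48 separate executor processes or data captured directly in the task closure instead of read from the broadcast handle. The underlying idea, one lazily loaded copy per JVM shared by many concurrent tasks, can be sketched in plain Java as below. This is not Spark code; `SharedLookupDemo`, the table contents, and the thread pool standing in for tasks are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for one executor JVM running many tasks.
final class SharedLookupDemo {

    // Counts how many times the expensive load actually ran.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Initialization-on-demand holder: the JVM initializes this class
    // exactly once, thread-safely, on first access to TABLE.
    private static final class Holder {
        static final Map<Integer, Integer> TABLE = load();
    }

    // Stand-in for reading the large dataset; builds a small table of squares.
    private static Map<Integer, Integer> load() {
        LOADS.incrementAndGet();
        Map<Integer, Integer> table = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(i, i * i);
        }
        return table;
    }

    static Map<Integer, Integer> table() {
        return Holder.TABLE;
    }

    // Run `tasks` lookups on a small thread pool; every task reads the same table.
    static int runTasks(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicInteger sum = new AtomicInteger();
        for (int t = 0; t < tasks; t++) {
            final int key = t;
            pool.execute(() -> sum.addAndGet(table().get(key)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum.get();
    }

    public static void main(String[] args) {
        int sum = runTasks(48);
        System.out.println("sum of lookups = " + sum);
        System.out.println("table loads    = " + LOADS.get());
    }
}
```

In a real job the equivalent check would be to confirm how many executors are running and that tasks read the data only through the broadcast handle; with one multi-core executor per node, all of its tasks should see the same copy.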

Re: RE: Spark Broadcasting large dataset

2015-07-10 Thread huanglr
... have 48 partitions corresponding to 48 tasks (or closures), where each task gets a broadcast value (I see this from the memory usage and the API doc). Is there a way to share the value across all 48 partitions and their 48 tasks? best! huanglr From: Ashic Mahtab Date: 2015-07-10 17:02 To: huanglr