Spark Broadcasting large dataset

2015-07-10 Thread huanglr
Hey, guys! I am using Spark for an NGS data application. In my case I have to broadcast a very big dataset to each task. However, there are several tasks (say 48 tasks) running on CPUs (also 48 cores) in the same node. These tasks, which run on the same node, could share the same dataset. But

RE: Spark Broadcasting large dataset

2015-07-10 Thread Ashic Mahtab
might look into putting the data into a fast store like Cassandra - that might help depending on your use case. Cheers, Ashic.
Date: Fri, 10 Jul 2015 15:52:42 +0200
From: huan...@cebitec.uni-bielefeld.de
To: user@spark.apache.org
Subject: Spark Broadcasting large dataset
Hey, Guys! I am using

Re: RE: Spark Broadcasting large dataset

2015-07-10 Thread huanglr
When you say tasks, do you mean different applications, or different tasks in the same application? If it's the same program, they should be able to share the broadcast value. But given you're asking the question, I imagine they're separate