Re: How to share large resources like dictionaries while processing data with Spark ?

2015-06-05 Thread Charles Earl
*Subject:* Re: How to share large resources like dictionaries while processing data with Spark ? Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas johngou...@gmail.com

RE: How to share large resources like dictionaries while processing data with Spark ?

2015-06-05 Thread Evo Eftimov
[mailto:dgoldenberg...@gmail.com] Sent: Friday, June 5, 2015 12:12 AM To: Yiannis Gkoufas Cc: Olivier Girardot; user@spark.apache.org Subject: Re: How to share large resources like dictionaries while processing data with Spark ? Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun 4, 2015 at 6

RE: How to share large resources like dictionaries while processing data with Spark ?

2015-06-05 Thread Evo Eftimov
@spark.apache.org' Subject: RE: How to share large resources like dictionaries while processing data with Spark ? It is called Indexed RDD https://github.com/amplab/spark-indexedrdd From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] Sent: Friday, June 5, 2015 3:15 PM To: Evo

Re: How to share large resources like dictionaries while processing data with Spark ?

2015-06-05 Thread Charles Earl
: How to share large resources like dictionaries while processing data with Spark ? Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas johngou...@gmail.com wrote: Hi there, I would recommend

RE: How to share large resources like dictionaries while processing data with Spark ?

2015-06-05 Thread Evo Eftimov
Goldenberg [mailto:dgoldenberg...@gmail.com] Sent: Friday, June 5, 2015 12:12 AM To: Yiannis Gkoufas Cc: Olivier Girardot; user@spark.apache.org Subject: Re: How to share large resources like dictionaries while processing data with Spark ? Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun

Re: How to share large resources like dictionaries while processing data with Spark ?

2015-06-05 Thread Dmitry Goldenberg
:* Friday, June 5, 2015 12:12 AM *To:* Yiannis Gkoufas *Cc:* Olivier Girardot; user@spark.apache.org *Subject:* Re: How to share large resources like dictionaries while processing data with Spark ? Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas

RE: How to share large resources like dictionaries while processing data with Spark ?

2015-06-05 Thread Evo Eftimov
, 2015 12:12 AM To: Yiannis Gkoufas Cc: Olivier Girardot; user@spark.apache.org Subject: Re: How to share large resources like dictionaries while processing data with Spark ? Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun 4, 2015 at 6

Re: How to share large resources like dictionaries while processing data with Spark ?

2015-06-04 Thread Olivier Girardot
You can use it as a broadcast variable, but if it's too large (more than 1 GB, I guess), you may need to share it by joining it to the other RDDs on some kind of key. But this is the kind of thing broadcast variables were designed for. Regards, Olivier. On Thu, Jun 4, 2015 at 23:50, dgoldenberg
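
For readers of the archive, the two options Olivier mentions might look roughly like the sketch below. It is only an illustration, not code from the thread: it assumes PySpark, a dictionary that fits in a plain Python dict for the broadcast case, and an RDD of (word, definition) pairs for the join case. The function names (annotate, annotate_with_broadcast, annotate_with_join) are made up for this example; sc is assumed to be an existing SparkContext.

```python
def annotate(word, dictionary):
    """Pair a word with its dictionary entry, or None if it is absent."""
    return (word, dictionary.get(word))


def annotate_with_broadcast(sc, words_rdd, dictionary):
    """Broadcast option: ship the read-only dict to each executor once.

    `sc` is assumed to be a live SparkContext, `words_rdd` an RDD of strings,
    and `dictionary` a plain dict small enough to hold in executor memory.
    """
    shared = sc.broadcast(dictionary)
    # Tasks read the executor-local copy through shared.value.
    return words_rdd.map(lambda w: annotate(w, shared.value))


def annotate_with_join(words_rdd, dictionary_rdd):
    """Join option, for a dictionary too large to broadcast.

    `dictionary_rdd` is assumed to hold (word, definition) pairs.
    """
    keyed = words_rdd.map(lambda w: (w, None))
    # leftOuterJoin yields (word, (None, definition_or_None)).
    return keyed.leftOuterJoin(dictionary_rdd).map(lambda kv: (kv[0], kv[1][1]))
```

The broadcast version avoids a shuffle but needs the whole dictionary in memory on every executor; the join version shuffles both RDDs by key, which is the price of letting the dictionary stay distributed.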

RE: How to share large resources like dictionaries while processing data with Spark ?

2015-06-04 Thread Huang, Roger
Is the dictionary read-only? Did you look at http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables ? -Original Message- From: dgoldenberg [mailto:dgoldenberg...@gmail.com] Sent: Thursday, June 04, 2015 4:50 PM To: user@spark.apache.org Subject: How to share

Re: How to share large resources like dictionaries while processing data with Spark ?

2015-06-04 Thread Yiannis Gkoufas
Hi there, I would recommend checking out https://github.com/spark-jobserver/spark-jobserver which I think gives the functionality you are looking for. I haven't tested it though. BR On 5 June 2015 at 01:35, Olivier Girardot ssab...@gmail.com wrote: You can use it as a broadcast variable, but

Re: How to share large resources like dictionaries while processing data with Spark ?

2015-06-04 Thread Dmitry Goldenberg
Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas johngou...@gmail.com wrote: Hi there, I would recommend checking out https://github.com/spark-jobserver/spark-jobserver which I think gives the functionality you are looking for. I haven't tested it