Re: How to share large resources like dictionaries while processing data with Spark?

Would the IndexedRDD feature provide what the Lookup RDD does? I've been using a broadcast variable map for a similar kind of thing -- it is probably within 1 GB, but I'm interested to know whether the lookup (or indexed) approach might be better.

- Charles
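P.S. What I mean by a broadcast variable map is roughly the following (an untested sketch in Scala; the dictionary contents and the input path are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("broadcast-dict"))

    // Load the dictionary once on the driver, then ship it to every executor.
    val dict: Map[String, String] = Map("color" -> "colour")   // illustrative contents
    val dictBc = sc.broadcast(dict)

    // Read (never mutate) the broadcast value inside transformations.
    val enriched = sc.textFile("hdfs:///input")                // hypothetical path
      .map(line => (line, dictBc.value.getOrElse(line, "unknown")))

Each executor gets one deserialized copy of the map, rather than one per task, which is what makes this workable for dictionaries up to around the size Olivier mentioned.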
RE: How to share large resources like dictionaries while processing data with Spark?

And RDD.lookup() cannot be invoked from transformations such as maps. lookup() is an action, which can be invoked only from the driver -- if you want functionality like that from within transformations executed on the cluster nodes, try IndexedRDD.

The other option is to load a batch (static) RDD once in your Spark Streaming app and then keep joining -- and then, e.g., filtering -- every incoming DStream RDD against the (big, static) batch RDD.
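Roughly, that join pattern looks like this (an untested sketch; `sc` is an existing SparkContext, and the stream source, path, and tab-separated dictionary format are illustrative assumptions):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))

    // Load the big dictionary once as a keyed, cached batch (static) RDD.
    val dictRdd = sc.textFile("hdfs:///dicts/terms.tsv")       // hypothetical path
      .map { line => val Array(k, v) = line.split("\t"); (k, v) }
      .cache()

    // Join every incoming micro-batch against it to emulate lookups.
    val lookedUp = ssc.socketTextStream("somehost", 9999)      // illustrative source
      .map(term => (term, 1))
      .transform(rdd => rdd.join(dictRdd))

    lookedUp.print()
    ssc.start()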
RE: How to share large resources like dictionaries while processing data with Spark?

It is called IndexedRDD: https://github.com/amplab/spark-indexedrdd
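Per its README, usage looks roughly like this (from memory and untested -- check the project for the current API; keys were Longs in the version I looked at):

    import edu.berkeley.cs.amplab.spark.indexedrdd.IndexedRDD
    import edu.berkeley.cs.amplab.spark.indexedrdd.IndexedRDD._

    // Build an index over a keyed RDD (`sc` is an existing SparkContext).
    val rdd = sc.parallelize((1L to 1000000L).map(x => (x, 0)))
    val indexed = IndexedRDD(rdd).cache()

    // Efficient point lookups and functional updates, usable without a
    // full scan of the RDD:
    indexed.get(42L)                      // => Some(0)
    val updated = indexed.put(42L, 1)     // returns a new IndexedRDD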
Re: How to share large resources like dictionaries while processing data with Spark?

Thanks, everyone. Evo, could you provide a link to the Lookup RDD project? I can't seem to locate it on GitHub. (Yes, to your point, our project is Spark Streaming based.) Thank you.
RE: How to share large resources like dictionaries while processing data with Spark ?
Spark uses Tachyon internally ie all SERIALIZED IN-MEMORY RDDs are kept there – so if you have a BATCH RDD which is SERIALIZED IN_MEMORY then you are using Tachyon implicitly – the only difference is that if you are using Tachyon explicitly ie as a distributed, in-memory file system you can share data between Jobs, while an RDD is ALWAYS visible within Jobs using the same Spark Context From: Charles Earl [mailto:charles.ce...@gmail.com] Sent: Friday, June 5, 2015 12:10 PM To: Evo Eftimov Cc: Dmitry Goldenberg; Yiannis Gkoufas; Olivier Girardot; user@spark.apache.org Subject: Re: How to share large resources like dictionaries while processing data with Spark ? Would tachyon be appropriate here? On Friday, June 5, 2015, Evo Eftimov wrote: Oops, @Yiannis, sorry to be a party pooper but the Job Server is for Spark Batch Jobs (besides anyone can put something like that in 5 min), while I am under the impression that Dmytiy is working on Spark Streaming app Besides the Job Server is essentially for sharing the Spark Context between multiple threads Re Dmytiis intial question – you can load large data sets as Batch (Static) RDD from any Spark Streaming App and then join DStream RDDs against them to emulate “lookups” , you can also try the “Lookup RDD” – there is a git hub project From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com ] Sent: Friday, June 5, 2015 12:12 AM To: Yiannis Gkoufas Cc: Olivier Girardot; user@spark.apache.org Subject: Re: How to share large resources like dictionaries while processing data with Spark ? Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas > wrote: Hi there, I would recommend checking out https://github.com/spark-jobserver/spark-jobserver which I think gives the functionality you are looking for. I haven't tested it though. BR On 5 June 2015 at 01:35, Olivier Girardot > wrote: You can use it as a broadcast variable, but if it's "too" large (more than 1Gb I guess), you may need to share it joining this using some kind of key to the other RDDs. But this is the kind of thing broadcast variables were designed for. Regards, Olivier. Le jeu. 4 juin 2015 à 23:50, dgoldenberg > a écrit : We have some pipelines defined where sometimes we need to load potentially large resources such as dictionaries. What would be the best strategy for sharing such resources among the transformations/actions within a consumer? Can they be shared somehow across the RDD's? I'm looking for a way to load such a resource once into the cluster memory and have it be available throughout the lifecycle of a consumer... Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-large-resources-like-dictionaries-while-processing-data-with-Spark-tp23162.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- - Charles
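For concreteness, the relevant storage levels look like this (an untested sketch; `sc` is an existing SparkContext and the path is hypothetical; note that in Spark 1.x it is StorageLevel.OFF_HEAP that targets Tachyon explicitly):

    import org.apache.spark.storage.StorageLevel

    // Serialized in-memory: one serialized buffer per cached partition.
    val batchRdd = sc.textFile("hdfs:///big-static-data")      // hypothetical path
      .persist(StorageLevel.MEMORY_ONLY_SER)

    // Alternatively, the explicitly Tachyon-backed level in Spark 1.x; either
    // way, the RDD stays visible only to jobs sharing this SparkContext:
    //   batchRdd.persist(StorageLevel.OFF_HEAP)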
Re: How to share large resources like dictionaries while processing data with Spark ?
Would tachyon be appropriate here? On Friday, June 5, 2015, Evo Eftimov wrote: > Oops, @Yiannis, sorry to be a party pooper but the Job Server is for Spark > Batch Jobs (besides anyone can put something like that in 5 min), while I > am under the impression that Dmytiy is working on Spark Streaming app > > > > Besides the Job Server is essentially for sharing the Spark Context > between multiple threads > > > > Re Dmytiis intial question – you can load large data sets as Batch > (Static) RDD from any Spark Streaming App and then join DStream RDDs > against them to emulate “lookups” , you can also try the “Lookup RDD” – > there is a git hub project > > > > *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com > ] > *Sent:* Friday, June 5, 2015 12:12 AM > *To:* Yiannis Gkoufas > *Cc:* Olivier Girardot; user@spark.apache.org > > *Subject:* Re: How to share large resources like dictionaries while > processing data with Spark ? > > > > Thanks so much, Yiannis, Olivier, Huang! > > > > On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas > wrote: > > Hi there, > > > > I would recommend checking out > https://github.com/spark-jobserver/spark-jobserver which I think gives > the functionality you are looking for. > > I haven't tested it though. > > > > BR > > > > On 5 June 2015 at 01:35, Olivier Girardot > wrote: > > You can use it as a broadcast variable, but if it's "too" large (more than > 1Gb I guess), you may need to share it joining this using some kind of key > to the other RDDs. > > But this is the kind of thing broadcast variables were designed for. > > > > Regards, > > > > Olivier. > > > > Le jeu. 4 juin 2015 à 23:50, dgoldenberg > a écrit : > > We have some pipelines defined where sometimes we need to load potentially > large resources such as dictionaries. > > What would be the best strategy for sharing such resources among the > transformations/actions within a consumer? Can they be shared somehow > across the RDD's? > > I'm looking for a way to load such a resource once into the cluster memory > and have it be available throughout the lifecycle of a consumer... > > Thanks. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-large-resources-like-dictionaries-while-processing-data-with-Spark-tp23162.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > > For additional commands, e-mail: user-h...@spark.apache.org > > > > > > -- - Charles
RE: How to share large resources like dictionaries while processing data with Spark ?
Oops, @Yiannis, sorry to be a party pooper but the Job Server is for Spark Batch Jobs (besides anyone can put something like that in 5 min), while I am under the impression that Dmytiy is working on Spark Streaming app Besides the Job Server is essentially for sharing the Spark Context between multiple threads Re Dmytiis intial question – you can load large data sets as Batch (Static) RDD from any Spark Streaming App and then join DStream RDDs against them to emulate “lookups” , you can also try the “Lookup RDD” – there is a git hub project From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] Sent: Friday, June 5, 2015 12:12 AM To: Yiannis Gkoufas Cc: Olivier Girardot; user@spark.apache.org Subject: Re: How to share large resources like dictionaries while processing data with Spark ? Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas wrote: Hi there, I would recommend checking out https://github.com/spark-jobserver/spark-jobserver which I think gives the functionality you are looking for. I haven't tested it though. BR On 5 June 2015 at 01:35, Olivier Girardot wrote: You can use it as a broadcast variable, but if it's "too" large (more than 1Gb I guess), you may need to share it joining this using some kind of key to the other RDDs. But this is the kind of thing broadcast variables were designed for. Regards, Olivier. Le jeu. 4 juin 2015 à 23:50, dgoldenberg a écrit : We have some pipelines defined where sometimes we need to load potentially large resources such as dictionaries. What would be the best strategy for sharing such resources among the transformations/actions within a consumer? Can they be shared somehow across the RDD's? I'm looking for a way to load such a resource once into the cluster memory and have it be available throughout the lifecycle of a consumer... Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-large-resources-like-dictionaries-while-processing-data-with-Spark-tp23162.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: How to share large resources like dictionaries while processing data with Spark ?
Thanks so much, Yiannis, Olivier, Huang! On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas wrote: > Hi there, > > I would recommend checking out > https://github.com/spark-jobserver/spark-jobserver which I think gives > the functionality you are looking for. > I haven't tested it though. > > BR > > On 5 June 2015 at 01:35, Olivier Girardot wrote: > >> You can use it as a broadcast variable, but if it's "too" large (more >> than 1Gb I guess), you may need to share it joining this using some kind of >> key to the other RDDs. >> But this is the kind of thing broadcast variables were designed for. >> >> Regards, >> >> Olivier. >> >> Le jeu. 4 juin 2015 à 23:50, dgoldenberg a >> écrit : >> >>> We have some pipelines defined where sometimes we need to load >>> potentially >>> large resources such as dictionaries. >>> >>> What would be the best strategy for sharing such resources among the >>> transformations/actions within a consumer? Can they be shared somehow >>> across the RDD's? >>> >>> I'm looking for a way to load such a resource once into the cluster >>> memory >>> and have it be available throughout the lifecycle of a consumer... >>> >>> Thanks. >>> >>> >>> >>> -- >>> View this message in context: >>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-large-resources-like-dictionaries-while-processing-data-with-Spark-tp23162.html >>> Sent from the Apache Spark User List mailing list archive at Nabble.com. >>> >>> - >>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>> For additional commands, e-mail: user-h...@spark.apache.org >>> >>> >
Re: How to share large resources like dictionaries while processing data with Spark ?
Hi there, I would recommend checking out https://github.com/spark-jobserver/spark-jobserver which I think gives the functionality you are looking for. I haven't tested it though. BR On 5 June 2015 at 01:35, Olivier Girardot wrote: > You can use it as a broadcast variable, but if it's "too" large (more than > 1Gb I guess), you may need to share it joining this using some kind of key > to the other RDDs. > But this is the kind of thing broadcast variables were designed for. > > Regards, > > Olivier. > > Le jeu. 4 juin 2015 à 23:50, dgoldenberg a > écrit : > >> We have some pipelines defined where sometimes we need to load potentially >> large resources such as dictionaries. >> >> What would be the best strategy for sharing such resources among the >> transformations/actions within a consumer? Can they be shared somehow >> across the RDD's? >> >> I'm looking for a way to load such a resource once into the cluster memory >> and have it be available throughout the lifecycle of a consumer... >> >> Thanks. >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-large-resources-like-dictionaries-while-processing-data-with-Spark-tp23162.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >> - >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >> For additional commands, e-mail: user-h...@spark.apache.org >> >>
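PS -- the Job Server feature closest to this use case is, I believe, its named RDDs, which let jobs running in the same long-lived context share a cached RDD. From memory of the project's README (untested; the names, path, and config key are illustrative, and the API may have moved on):

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

    object DictionaryLookupJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        SparkJobValid

      override def runJob(sc: SparkContext, config: Config): Any = {
        // Reuse the dictionary RDD if an earlier job in this context cached it;
        // otherwise load it once and register it under a shared name.
        val dict = namedRdds.get[(String, String)]("dictionary").getOrElse {
          val loaded = sc.textFile("hdfs:///dictionary.tsv")   // hypothetical path
            .map { line => val Array(k, v) = line.split("\t"); (k, v) }
          namedRdds.update("dictionary", loaded)
        }
        dict.lookup(config.getString("term"))
      }
    }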
Re: How to share large resources like dictionaries while processing data with Spark ?
You can use it as a broadcast variable, but if it's "too" large (more than 1Gb I guess), you may need to share it joining this using some kind of key to the other RDDs. But this is the kind of thing broadcast variables were designed for. Regards, Olivier. Le jeu. 4 juin 2015 à 23:50, dgoldenberg a écrit : > We have some pipelines defined where sometimes we need to load potentially > large resources such as dictionaries. > > What would be the best strategy for sharing such resources among the > transformations/actions within a consumer? Can they be shared somehow > across the RDD's? > > I'm looking for a way to load such a resource once into the cluster memory > and have it be available throughout the lifecycle of a consumer... > > Thanks. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-large-resources-like-dictionaries-while-processing-data-with-Spark-tp23162.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >
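P.S. A rough sketch of the join-by-key alternative (untested; `sc` is an existing SparkContext, and the paths and tab-separated layout are assumptions):

    // Too big to broadcast: load the dictionary as a keyed, partitioned RDD.
    val dictRdd = sc.textFile("hdfs:///dictionary.tsv")        // hypothetical path
      .map { line => val Array(k, v) = line.split("\t"); (k, v) }
      .partitionBy(new org.apache.spark.HashPartitioner(64))
      .cache()

    // Key the data the same way and join, instead of doing local lookups.
    val dataRdd = sc.textFile("hdfs:///input").map(term => (term, 1))
    val resolved = dataRdd.join(dictRdd)

Pre-partitioning and caching the dictionary means repeated joins against it avoid reshuffling the dictionary side each time.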
RE: How to share large resources like dictionaries while processing data with Spark ?
Is the dictionary read-only? Did you look at http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables ? -Original Message- From: dgoldenberg [mailto:dgoldenberg...@gmail.com] Sent: Thursday, June 04, 2015 4:50 PM To: user@spark.apache.org Subject: How to share large resources like dictionaries while processing data with Spark ? We have some pipelines defined where sometimes we need to load potentially large resources such as dictionaries. What would be the best strategy for sharing such resources among the transformations/actions within a consumer? Can they be shared somehow across the RDD's? I'm looking for a way to load such a resource once into the cluster memory and have it be available throughout the lifecycle of a consumer... Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-large-resources-like-dictionaries-while-processing-data-with-Spark-tp23162.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org