subject:"Datasets is extremely slow in comparison to RDD in standalone mode WordCount examlpe"

Re: Datasets is extremely slow in comparison to RDD in standalone mode WordCount examlpe

2016-05-13 Thread Amit Sela

Taking it to a more basic level, I compared a simple transformation with RDDs and with Datasets. This is far simpler than Renato's use case, and it brings up two good questions: 1. Is the time it takes to "spin up" a standalone instance of Spark (SQL) just an additional one-time overhead
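
One way to check whether a gap like this is one-time start-up cost is to time the first run separately from later runs in the same process. The following is a minimal sketch in plain Python, not Spark code: `time_runs` and `word_count` are hypothetical helper names, and the in-memory word count only stands in for whichever RDD or Dataset job is being measured.

```python
import statistics
import time
from collections import Counter


def time_runs(job, repeats=5):
    """Run `job` `repeats` times.

    Returns (first_run_seconds, median_seconds_of_the_remaining_runs).
    """
    if repeats < 2:
        raise ValueError("need at least 2 runs to separate warm-up from steady state")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    # The first run absorbs any one-time initialization; the rest approximate steady state.
    return durations[0], statistics.median(durations[1:])


def word_count(lines):
    """Stand-in workload (an assumption): count words in a small list of lines."""
    return Counter(word for line in lines for word in line.split())


if __name__ == "__main__":
    lines = ["hello world", "hello spark", "hello datasets"]
    first, steady = time_runs(lambda: word_count(lines))
    print(f"first run: {first * 1e3:.3f} ms, later runs (median): {steady * 1e3:.3f} ms")
```

If the first run is much slower than the later ones, the difference is probably initialization rather than per-job cost. If every run is slow, the overhead is being paid on each execution.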

Re: Datasets is extremely slow in comparison to RDD in standalone mode WordCount examlpe

2016-05-12 Thread Renato Marroquín Mogrovejo

Hi Amit, This is very interesting indeed because I have got similar results. I tried doing a filter + groupBy using a Dataset with a function, and using the inner RDD of the DF (RDD[Row]). I used the inner RDD of a DataFrame because apparently there is no straightforward way to create an RDD of

Re: Datasets is extremely slow in comparison to RDD in standalone mode WordCount examlpe

2016-05-11 Thread Amit Sela

Somehow missed that ;) Anything about the Datasets slowness?
On Wed, May 11, 2016, 21:02 Ted Yu wrote:
> Which release are you using?
>
> You can use the following to disable the UI:
> --conf spark.ui.enabled=false
>
> On Wed, May 11, 2016 at 10:59 AM, Amit Sela

Re: Datasets is extremely slow in comparison to RDD in standalone mode WordCount examlpe

2016-05-11 Thread Ted Yu

Which release are you using?
You can use the following to disable the UI:
--conf spark.ui.enabled=false
On Wed, May 11, 2016 at 10:59 AM, Amit Sela wrote:
> I ran a simple WordCount example with a very small List as
> input lines and ran it in standalone (local[*]), and

Datasets is extremely slow in comparison to RDD in standalone mode WordCount examlpe

2016-05-11 Thread Amit Sela

I ran a simple WordCount example with a very small List as input lines, in standalone (local[*]), and Datasets are very slow. We're talking ~700 msec for RDDs, while Datasets take ~3.5 sec. Is this just start-up overhead? Please note that I'm not timing the context creation... And

5 matches
