Re: How to make Dataset api as fast as DataFrame

2016-01-13 Thread Michael Armbrust

The focus of this release was to get the API out there, and there are a lot of low-hanging performance optimizations still to be done. That said, there is likely always going to be some cost to materializing objects. Another note: anytime you're comparing performance, it's useful to include the output of explain so we

Re: How to make Dataset api as fast as DataFrame

2016-01-13 Thread Arkadiusz Bicz

Hi,

Including query plan:

DataFrame:

== Physical Plan ==
SortBasedAggregate(key=[agreement#23], functions=[(MaxVectorAggFunction(values#3),mode=Final,isDistinct=false)], output=[agreement#23,maxvalues#27])
+- ConvertToSafe
   +- Sort [agreement#23 ASC], false, 0
      +- TungstenExchange

How to make Dataset api as fast as DataFrame

2016-01-13 Thread Arkadiusz Bicz

Hi, I have done some performance tests by repeating execution with different numbers of executors and amounts of memory on YARN-clustered Spark (version 1.6.0; the cluster contains 6 large nodes). I found Dataset joinWith or cogroup to be 3 to 5 times slower than a broadcast join in DataFrame, how to
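For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of a typed Dataset joinWith next to a DataFrame broadcast join, with explain called on both so the physical plans can be posted side by side. It is not the code from the original tests. The case classes, column names, and sample rows are hypothetical, and it assumes Spark 2.x or later with SparkSession (the tests in this thread ran on 1.6.0, where the entry point was SQLContext). It is written in script style, for example for pasting into spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical record types; the thread does not show the real schema.
case class Trade(agreement: String, value: Double)
case class Agreement(agreement: String, region: String)

// Assumption: local mode and the SparkSession entry point (Spark 2.x+).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dataset-vs-dataframe")
  .getOrCreate()
import spark.implicits._

// Made-up sample data, only large enough to produce a plan.
val trades = Seq(Trade("A", 1.0), Trade("A", 2.0), Trade("B", 3.0)).toDS()
val agreements = Seq(Agreement("A", "EU"), Agreement("B", "US")).toDS()

// Typed join: the result is a Dataset[(Trade, Agreement)], so both sides
// are kept as whole objects rather than flattened into columns.
val typedJoin = trades.joinWith(
  agreements,
  trades("agreement") === agreements("agreement"))

// Untyped join with an explicit broadcast hint on the smaller side.
val broadcastJoin = trades.toDF().join(broadcast(agreements.toDF()), "agreement")

// Print the physical plan of each variant for comparison.
typedJoin.explain()
broadcastJoin.explain()
```

Comparing the two plans shows which join strategy each variant actually uses, which is usually more informative than timings alone.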
