Re: Slowness of Spark Thrift Server

2017-07-17 Thread Maciej Bryński
I did the test on Spark 2.2.0 and the problem still exists. Any ideas how to fix it?
Regards,
Maciek
2017-07-11 11:52 GMT+02:00 Maciej Bryński <mac...@brynski.pl>:
> Hi,
> I have the following issue.
> I'm trying to use Spark as a proxy to Cassandra.
> The problem is the thri

Re: Difference between Data set and Data Frame in Spark 2

2016-09-01 Thread Maciej Bryński
I think there could be a performance reason. RDDs can be faster than Datasets. For example, check the query plan for this code:
    spark.range(100).map(_ * 2).filter(_ < 100).map(_ * 2).collect()
There are two serialize / deserialize pairs. Then compare with the RDD equivalent:
    sc.parallelize(1 to

Re: GraphFrames 0.2.0 released

2016-08-24 Thread Maciej Bryński
Hi,
Do you plan to add a tag for this release on GitHub?
https://github.com/graphframes/graphframes/releases
Regards,
Maciek
2016-08-17 3:18 GMT+02:00 Jacek Laskowski:
> Hi Tim,
>
> AWESOME. Thanks a lot for releasing it. That makes me even more eager
> to see it in Spark's

Re: MultiThreading in Spark 1.6.0

2016-07-20 Thread Maciej Bryński
RK Aduri,
Another idea is to union all the results and then run collect. The question is how big the collected data is.
2016-07-20 20:32 GMT+02:00 RK Aduri:
> Spark version: 1.6.0
> So, here is the background:
>
> I have a data frame (Large_Row_DataFrame) which I have

Re: transition SQLContext to SparkSession

2016-07-19 Thread Maciej Bryński
@Reynold Xin,
How will this work with Hive support? Does SparkSession.sqlContext return a HiveContext?
2016-07-19 0:26 GMT+02:00 Reynold Xin:
> Good idea.
>
> https://github.com/apache/spark/pull/14252
>
> On Mon, Jul 18, 2016 at 12:16 PM, Michael Armbrust