I did the test on Spark 2.2.0 and the problem still exists.
Any ideas on how to fix it?
Regards,
Maciek
2017-07-11 11:52 GMT+02:00 Maciej Bryński <mac...@brynski.pl>:
> Hi,
> I have the following issue.
> I'm trying to use Spark as a proxy to Cassandra.
> The problem is the thri
I think there could be a performance reason.
RDDs can be faster than Datasets.
For example, check the query plan for this code:
spark.range(100).map(_ * 2).filter(_ < 100).map(_ * 2).collect()
There are two serialize / deserialize pairs.
Then compare with the RDD equivalent:
sc.parallelize(1 to 100).map(_ * 2).filter(_ < 100).map(_ * 2).collect()
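You can inspect both plans yourself to see the difference. This is just a sketch for a spark-shell session (so spark and sc are already in scope); the point is that the typed lambdas force encoder round-trips in the Dataset plan:

// Dataset version: explain() makes the SerializeFromObject /
// DeserializeToObject steps visible in the physical plan.
spark.range(100).map(_ * 2).filter(_ < 100).map(_ * 2).explain()

// RDD version: plain JVM objects flow through the whole pipeline,
// with no encoder round-trips at all.
println(sc.parallelize(1 to 100).map(_ * 2).filter(_ < 100).map(_ * 2).toDebugString)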
Hi,
Do you plan to add a tag for this release on GitHub?
https://github.com/graphframes/graphframes/releases
Regards,
Maciek
2016-08-17 3:18 GMT+02:00 Jacek Laskowski :
> Hi Tim,
>
> AWESOME. Thanks a lot for releasing it. That makes me even more eager
> to see it in Spark's
RK Aduri,
Another idea is to union all the results and then run a single collect.
The question is how big the collected data is.
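A rough sketch of what I mean (df1, df2, df3 are hypothetical DataFrames sharing one schema; on Spark 1.6 the operator is unionAll, renamed to union in 2.0):

val combined = df1.unionAll(df2).unionAll(df3)
// One job and a single collect on the driver,
// instead of one collect per intermediate DataFrame.
val rows = combined.collect()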
2016-07-20 20:32 GMT+02:00 RK Aduri :
> Spark version: 1.6.0
> So, here is the background:
>
> I have a data frame (Large_Row_DataFrame) which I have
@Reynold Xin,
How will this work with Hive support?
Will SparkSession.sqlContext return a HiveContext?
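A minimal sketch of what I'm asking, against the 2.x API (enableHiveSupport on the builder, then look at what sqlContext actually returns):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-support-check")   // hypothetical app name
  .enableHiveSupport()
  .getOrCreate()

// Prints the runtime class of the legacy entry point; the question is
// whether this behaves like the old HiveContext when Hive support is on.
println(spark.sqlContext.getClass.getName)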
2016-07-19 0:26 GMT+02:00 Reynold Xin :
> Good idea.
>
> https://github.com/apache/spark/pull/14252
>
> On Mon, Jul 18, 2016 at 12:16 PM, Michael Armbrust