[jira] Lantao Jin shared "SPARK-20680: Spark-sql do not support for void column datatype of view" with you

2017-05-09 Thread Lantao Jin (JIRA)
Lantao Jin shared an issue with you: Spark-sql do not support for void column datatype of view. Key: SPARK-20680. URL:

Faster Spark on ORC with Apache ORC

2017-05-09 Thread Dong Joon Hyun
Hi, All. Apache Spark has always been a fast and general engine, and since SPARK-2883, Spark has supported Apache ORC inside the `sql/hive` module with a Hive dependency. With Apache ORC 1.4.0 (released yesterday), we can make Spark on ORC faster and get some benefits. - Speed: Use both Spark

Re: Any plans for making StateStore pluggable?

2017-05-09 Thread Tathagata Das
Thank you for creating the JIRA. I am working towards making it configurable very soon. On Tue, May 9, 2017 at 4:12 PM, Yogesh Mahajan wrote: > Hi Team, > > Any plans to make the StateStoreProvider/StateStore in structured > streaming pluggable? > Currently

Any plans for making StateStore pluggable?

2017-05-09 Thread Yogesh Mahajan
Hi Team, Any plans to make the StateStoreProvider/StateStore in structured streaming pluggable? Currently StateStore#loadedProviders has only one HDFSBackedStateStoreProvider and it's not configurable. If we make this configurable, users can bring in their own implementation of StateStore.
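What the poster is asking for is the standard config-driven provider-registry pattern: instead of hard-coding one provider class, look it up from configuration so users can plug in their own. The sketch below is a minimal plain-Python illustration of that pattern; the class and config-key names (`StateStoreProvider`, `stateStore.provider`) are hypothetical stand-ins, not Spark's actual API.

```python
# Hypothetical sketch of a pluggable provider registry -- NOT Spark's API.
# The idea: resolve the provider class from configuration rather than
# hard-coding a single implementation.
from abc import ABC, abstractmethod


class StateStoreProvider(ABC):
    """Minimal stand-in for a state-store provider interface."""

    @abstractmethod
    def get_store(self, version: int) -> str:
        ...


class HDFSBackedProvider(StateStoreProvider):
    """Plays the role of the built-in default provider."""

    def get_store(self, version: int) -> str:
        return f"hdfs-store@v{version}"


class MyCustomProvider(StateStoreProvider):
    """A user-supplied implementation, selected via configuration."""

    def get_store(self, version: int) -> str:
        return f"custom-store@v{version}"


# Registry keyed by a configurable name; real systems often resolve a
# fully-qualified class name via reflection instead.
PROVIDERS = {
    "hdfs": HDFSBackedProvider,
    "custom": MyCustomProvider,
}


def load_provider(conf: dict) -> StateStoreProvider:
    # Fall back to the default provider when nothing is configured.
    cls = PROVIDERS[conf.get("stateStore.provider", "hdfs")]
    return cls()
```

With this shape, `load_provider({"stateStore.provider": "custom"})` returns the user's implementation, while an empty config keeps the default.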

Submitting SparkR to CRAN

2017-05-09 Thread Shivaram Venkataraman
Closely related to the PyPi upload thread (https://s.apache.org/WLtM), I just wanted to give a heads up that we are working on submitting SparkR from Spark 2.1.1 as a package to CRAN. The package submission is under review with CRAN right now and I will post updates to this thread. The main

Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-05-09 Thread Kazuaki Ishizaki
+1 (non-binding) I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for core have passed. $ java -version openjdk version "1.8.0_111" OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14) OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) $

RDD.cacheDataSet() not working intermittently

2017-05-09 Thread jasbir.sing
Hi, I have a scenario in which I am caching my RDDs for future use. But I observed that when I use my RDD, the complete DAG is re-executed and the RDD gets created again. How can I avoid this scenario and make sure that RDD.cacheDataSet() caches the RDD every time? Regards, Jasbir Singh
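For context on the symptom described above: in Spark's standard RDD API the caching calls are `rdd.cache()` / `rdd.persist()`, and a common cause of "caching not working" is rebuilding the RDD from its lineage instead of reusing the reference that was cached. The plain-Python sketch below (illustrative only, not the PySpark API) shows why a lazily evaluated pipeline re-runs its whole "DAG" on every action unless the materialized result is kept and reused.

```python
# Illustrative sketch in plain Python -- not the Spark API.
# A lazy pipeline, like an RDD lineage, runs only when an "action" is
# triggered; without caching, every action re-executes the lineage.
compute_count = 0


def expensive_transform(data):
    global compute_count
    compute_count += 1          # count how often the lineage actually runs
    return [x * 2 for x in data]


def lazy_pipeline(data):
    # Like an RDD: nothing executes until you invoke the "action".
    return lambda: expensive_transform(data)


rdd = lazy_pipeline([1, 2, 3])
rdd()                           # first action: lineage executes
rdd()                           # second action: executes again (no cache)
assert compute_count == 2

cached = rdd()                  # materialize once and keep the reference
assert compute_count == 3
reused = cached                 # reuse the cached result: no recomputation
assert compute_count == 3
```

The analogous pitfall in Spark is calling `cache()` but then re-deriving the RDD from scratch (a new lineage) instead of holding on to and reusing the cached RDD variable.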

Re: Uploading PySpark 2.1.1 to PyPi

2017-05-09 Thread Holden Karau
So I have a PR to add this to the release process documentation - I'm waiting on the necessary approvals from PyPi folks before I merge that in case anything changes as a result of the discussion (like uploading to the legacy host or something). As for conda-forge, it's not something we need to do,