Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Hyukjin Kwon
I think we can deprecate it in 3.x.0 and remove it in Spark 4.0.0. Many people still use Python 2. Also, technically 2.7 support is not officially dropped yet - https://pythonclock.org/ On Mon, Sep 17, 2018 at 9:31 AM, Aakash Basu wrote: > Removing support for an API in a major release makes poor
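For illustration only, a minimal sketch of what deprecating (rather than removing) Python 2 support could look like on the PySpark side - not the actual change that went into Spark, just a startup check that warns when the interpreter is Python 2:

    import sys
    import warnings

    def warn_if_python2():
        """Illustrative only: warn when running under Python 2."""
        if sys.version_info[0] < 3:
            warnings.warn(
                "Support for Python 2 is deprecated and may be removed in a "
                "future Spark release; please migrate to Python 3.",
                DeprecationWarning,
                stacklevel=2,
            )

    warn_if_python2()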

Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Aakash Basu
Removing support for an API in a major release makes poor sense; deprecating is always better. Removal can always be done two or three minor releases later. On Mon, 17 Sep 2018, 6:49 AM, Felix Cheung wrote: > I don’t think we should remove any API even in a major release without > deprecating it

Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Felix Cheung
I don’t think we should remove any API even in a major release without deprecating it first... From: Mark Hamstra Sent: Sunday, September 16, 2018 12:26 PM To: Erik Erlandson Cc: user@spark.apache.org; dev Subject: Re: Should python-2 be supported in Spark 3.0?

Re: Is there any open source framework that converts Cypher to SparkSQL?

2018-09-16 Thread Matei Zaharia
GraphFrames (https://graphframes.github.io) offers a Cypher-like syntax that then executes on Spark SQL. > On Sep 14, 2018, at 2:42 AM, kant kodali wrote: > > Hi All, > > Is there any open source framework that converts Cypher to SparkSQL? > > Thanks!
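As a small hedged example of what that Cypher-like syntax looks like, here is a GraphFrames motif-finding query (the toy vertex/edge DataFrames and the two-hop pattern are made up for illustration; the package must be available on the Spark classpath):

    from pyspark.sql import SparkSession
    from graphframes import GraphFrame  # requires the graphframes package

    spark = SparkSession.builder.appName("graphframes-motif-example").getOrCreate()

    # Toy data; "id", "src", and "dst" are the column names GraphFrames expects.
    vertices = spark.createDataFrame(
        [("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"])
    edges = spark.createDataFrame(
        [("a", "b", "follows"), ("b", "c", "follows")], ["src", "dst", "relationship"])

    g = GraphFrame(vertices, edges)

    # Cypher-like motif pattern, executed as Spark SQL under the hood.
    two_hop = g.find("(x)-[e1]->(y); (y)-[e2]->(z)")
    two_hop.show()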

Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Mark Hamstra
We could also deprecate Py2 already in the 2.4.0 release. On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson wrote: > In case this didn't make it onto this thread: > > There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and remove > it entirely on a later 3.x release. > > On Sat, Sep

Run spark tests on Windows/docker

2018-09-16 Thread Shmuel Blitz
Hi, I'd like to build and run the Spark tests on my PC. The build works fine on my Windows machine, but the tests can't run for various reasons. 1. Is it possible to run the tests on Windows without special magic? 2. If you need some magic, how complicated is it? 3. I thought about running the tests

Best practices on how to use multiple Spark sessions

2018-09-16 Thread unk1102
Hi, I have an application which serves as an ETL job, and I have hundreds of such ETL jobs which run daily. As of now I have just one Spark session which is shared by all these jobs, and sometimes all of these jobs run at the same time, causing the Spark session to die, mostly due to memory issues. Is this a
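A minimal sketch of one way to give concurrent jobs some isolation without splitting them into separate applications (the job function and names below are hypothetical): SparkSession.newSession() shares the underlying SparkContext but gives each job its own SQL configuration and temporary-view namespace. It does not isolate executor memory, so truly independent jobs may still be better run as separate Spark applications.

    from pyspark.sql import SparkSession

    # One shared SparkContext for the whole application.
    base = SparkSession.builder.appName("etl-driver").getOrCreate()

    def run_job(session, job_name):
        """Hypothetical ETL job body; each job uses its own temp-view namespace."""
        df = session.range(0, 1000)           # stand-in for the job's real input
        df.createOrReplaceTempView(job_name)   # temp views do not collide across sessions
        session.sql(f"SELECT COUNT(*) AS n FROM {job_name}").show()

    for name in ["job_a", "job_b"]:
        run_job(base.newSession(), name)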

Please help me: when I write code to connect Kafka with Spark using Python and run the code on Jupyter, an error is displayed

2018-09-16 Thread hager
I wrote code to connect Kafka with Spark using Python, and I run the code on Jupyter. My code: import os #os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /home/hadoop/Desktop/spark-program/kafka/spark-streaming-kafka-0-8-assembly_2.10-2.0.0-preview.jar pyspark-shell' os.environ['PYSPARK_SUBMIT_ARGS'] =
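The key detail in notebook setups like this is that PYSPARK_SUBMIT_ARGS must be set before pyspark is imported. A hedged sketch along those lines, using the 0-8 connector API matching the assembly jar mentioned above (the jar path, broker address, and topic name are placeholders):

    import os

    # Must be set before importing pyspark; the jar path is a placeholder.
    os.environ['PYSPARK_SUBMIT_ARGS'] = (
        '--jars /path/to/spark-streaming-kafka-0-8-assembly_2.10-2.0.0-preview.jar '
        'pyspark-shell'
    )

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-notebook-example")
    ssc = StreamingContext(sc, batchDuration=5)

    # Direct stream from Kafka; broker and topic are placeholders.
    stream = KafkaUtils.createDirectStream(
        ssc, ["test-topic"], {"metadata.broker.list": "localhost:9092"}
    )
    stream.map(lambda kv: kv[1]).pprint()

    ssc.start()
    ssc.awaitTermination()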

Re: issue Running Spark Job on Yarn Cluster

2018-09-16 Thread sivasonai
We came across such an issue in our project and got it resolved by clearing the space under the HDFS directory "/user/spark". Please check whether you have enough space/privileges for this HDFS directory - "/user/spark" -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/