Re: [External Sender] Re: How to make pyspark use custom python?

2018-09-06 Thread Femi Anthony
Are you sure that pyarrow is deployed on your slave hosts? If not, you will either have to get it installed or ship it along when you call spark-submit by zipping it up and specifying the zipfile to be shipped using the --py-files zipfile.zip option. A quick check would be to ssh to a slave host,
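
A minimal sketch of a driver-side check in the same spirit (an illustration, not Femi's exact check): it asks each executor partition whether pyarrow is importable, assuming a running SparkSession named `spark`.

```python
# Sketch: probe the executors for pyarrow availability.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def probe(_):
    try:
        import pyarrow
        yield pyarrow.__version__
    except ImportError:
        yield "pyarrow missing"

# One probe per partition; the result shows what each executor can import.
print(spark.sparkContext.parallelize(range(4), 4).mapPartitions(probe).collect())
```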

Re: How to make pyspark use custom python?

2018-09-06 Thread mithril
The whole content of `spark-env.sh` is ``` SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=10.104.85.78:2181,10.104.114.131:2181,10.135.2.132:2181 -Dspark.deploy.zookeeper.dir=/spark" PYSPARK_PYTHON="/usr/local/miniconda3/bin/python" ``` I ran

Error in show()

2018-09-06 Thread dimitris plakas
Hello everyone, I am new to Pyspark and I am facing an issue. Let me explain exactly what the problem is. I have a dataframe and I apply a map() function to it: dataframe2 = dataframe1.rdd.map(custom_function()), then dataframe = sqlContext.createDataFrame(dataframe2). When I have
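
A runnable sketch of the pattern dimitris describes; the names custom_function and dataframe1 come from the post, the data is made up. Note that rdd.map() takes the function object itself, not the result of calling it.

```python
# Illustrative only: map over the underlying RDD and rebuild a DataFrame.
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()
dataframe1 = spark.createDataFrame([Row(value=1), Row(value=2)])

def custom_function(row):
    # placeholder transformation
    return Row(value=row.value * 2)

dataframe2 = dataframe1.rdd.map(custom_function)   # pass the function, don't call it
dataframe = spark.createDataFrame(dataframe2)
dataframe.show()
```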

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-06 Thread Mich Talebzadeh
OK, somehow this worked! // Save prices to MongoDB collection val document = sparkContext.parallelize((1 to 1).map(i =>

Re: How to make pyspark use custom python?

2018-09-06 Thread Patrick McCarthy
It looks like for whatever reason your cluster isn't using the python you distributed, or said distribution doesn't contain what you think. I've used the following with success to deploy a conda environment to my cluster at runtime:
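
The actual commands are cut off in this preview. One commonly used approach (not necessarily Patrick's exact one) is to ship a packed conda environment as an archive and point the executors' Python at it; the paths and names below are examples only.

```python
# Hedged sketch for a YARN deployment; archive path and env name are made up.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # archive containing the packed conda env, unpacked on each executor as ./myenv
    .config("spark.yarn.dist.archives", "hdfs:///envs/myenv.tar.gz#myenv")
    # make the executors use the interpreter inside that environment
    .config("spark.executorEnv.PYSPARK_PYTHON", "./myenv/bin/python")
    .getOrCreate()
)
```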

Re: CBO not working for Parquet Files

2018-09-06 Thread emlyn
rajat mishra wrote: > When I try to compute the statistics for a query where the partition column > is in the where clause, the statistics returned contain only sizeInBytes > and not the row count. We are also having the same issue. We have our data in partitioned parquet files and were
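
For context, a small sketch of the kind of check being discussed, assuming a partitioned table already exists; `my_table` and `dt` are placeholder names, not from the thread.

```python
# Illustration only: compute stats, then look at what the catalog reports.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("ANALYZE TABLE my_table COMPUTE STATISTICS")                               # table-level stats
spark.sql("ANALYZE TABLE my_table PARTITION (dt='2018-09-06') COMPUTE STATISTICS")   # per-partition stats
spark.sql("DESCRIBE EXTENDED my_table").show(truncate=False)                         # look for the Statistics row
```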

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-06 Thread Mich Talebzadeh
Thanks. If you define the columns class as below: scala> case class columns(KEY: String, TICKER: String, TIMEISSUED: String, *PRICE: Double)* defined class columns scala> var df = Seq(columns("key", "ticker", "timeissued", 1.23f)).toDF df: org.apache.spark.sql.DataFrame = [KEY: string, TICKER: string

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-06 Thread Jungtaek Lim
This code works with Spark 2.3.0 via spark-shell. scala> case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Float) defined class columns scala> import spark.implicits._ import spark.implicits._ scala> var df = Seq(columns("key", "ticker", "timeissued", 1.23f)).toDF

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-06 Thread Mich Talebzadeh
I am trying to understand why Spark cannot convert simple comma-separated columns to a DF. I did a test: I took one line of print output and stored it as a one-liner csv file as below var allInOne = key+","+ticker+","+timeissued+","+price println(allInOne) cat crap.csv

Unsubscribe

2018-09-06 Thread Anu B Nair
Hi, I have tried every possible way to unsubscribe from this group. Can anyone help? -- Anu

Re: How to make pyspark use custom python?

2018-09-06 Thread Hyukjin Kwon
Are you doubly sure that it is an issue in Spark? I have used a custom Python several times by setting it in PYSPARK_PYTHON before and it was no problem. On Thu, Sep 6, 2018 at 2:21 PM, mithril wrote: > For better formatting, please see > >
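
A minimal sketch of the approach Hyukjin mentions, using the interpreter path from mithril's spark-env.sh as an example; PYSPARK_PYTHON has to be set before the SparkContext is created.

```python
# Sketch: select a custom interpreter via PYSPARK_PYTHON before starting Spark.
import os
os.environ["PYSPARK_PYTHON"] = "/usr/local/miniconda3/bin/python"  # example path from the thread

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.sparkContext.pythonExec)  # should print the custom interpreter path
```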

How to make pyspark use custom python?

2018-09-06 Thread mithril
For better formatting, please see https://stackoverflow.com/questions/52178406/howto-make-pyspark-use-custom-python -- I am using Zeppelin to connect to a remote Spark cluster. The remote Spark is