Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
t 2017 at 11.24 To: Hemanth Gudela <hemanth.gud...@qvantel.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Subject: Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes Also,

Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Yes, I have tried with file:/// and the full path, as well as just the full path without the file:/// prefix. The Spark session has been closed too, but no luck ☹ Regards, Hemanth From: Femi Anthony <femib...@gmail.com> Date: Thursday, 10 August 2017 at 11.06 To: Hemanth Gudela <he

Re: spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
on worker nodes. I know spark.write.csv works best with HDFS, but with the current setup I have in my environment, I have to deal with Spark writing to the node’s local file system and not to HDFS. Regards, Hemanth From: Femi Anthony <femib...@gmail.com> Date: Thursday, 10 August 2017 at 10.38 To

spark.write.csv is not able to write files to the specified path, but is writing to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Hi, I’m running Spark in cluster mode on 4 nodes, and trying to write CSV files to the node’s local path (not HDFS). I’m using spark.write.csv to write the CSV files. On the master node: spark.write.csv creates a folder with the CSV file name and writes many files with a part-r-000n suffix. This is okay for
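A minimal sketch of the kind of write described above, assuming a spark-shell session (so spark is predefined); the input, output path and coalesce(1) are illustrative only. On a cluster without a shared file system, each task writes its part files on whichever worker it runs on, which is consistent with the stranded _temporary/0/task_xxx folders named in the subject.

// Illustrative only: df and the paths are placeholders.
val df = spark.read.json("file:///tmp/input.json")

df.coalesce(1)                      // optional: a single part file, if the data fits in one task
  .write
  .option("header", "true")
  .mode("overwrite")
  .csv("file:///tmp/output_csv")    // a folder is created; part-* files are written inside it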

Re: Spark SQL - Global Temporary View is not behaving as expected

2017-04-24 Thread Hemanth Gudela
. Thanks, Hemanth From: Gene Pang <gene.p...@gmail.com> Date: Monday, 24 April 2017 at 16.41 To: vincent gromakowski <vincent.gromakow...@gmail.com> Cc: Hemanth Gudela <hemanth.gud...@qvantel.com>, "user@spark.apache.org" <user@spark.apache.org>, Felix Cheung <f

Re: How to maintain order of key-value in DataFrame same as JSON?

2017-04-24 Thread Hemanth Gudela
Hi, One option, if you can use it, is to force the df to use the schema order you prefer, like this: DataFrame df = sqlContext.read().json(jsonPath).select("name","salary","occupation","address") -Hemanth From: Devender Yadav Date: Monday, 24 April 2017 at 15.45 To:
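The same idea in Scala, as a sketch; jsonPath and the column names are just the placeholders from the snippet above. Spark’s JSON schema inference typically orders fields alphabetically, so the explicit select is what pins the column order.

// Sketch only: path and column names are placeholders.
val jsonPath = "/tmp/employees.json"

val df = spark.read
  .json(jsonPath)
  .select("name", "salary", "occupation", "address")   // select fixes the column order

df.printSchema()   // columns now appear in the selected order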

Spark registered view in "Future" - View changes updated in "Future" are lost in main thread

2017-04-24 Thread Hemanth Gudela
Hi, I’m trying to write a background thread using “Future” which would periodically re-register a view with the latest data from an underlying database table. However, the data changes made in the “Future” thread are lost in the main thread. In the below code, 1. In the beginning, registered view
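A hedged sketch of that pattern, assuming a spark-shell session whose SparkSession is shared between the main thread and the Future, and assuming the relevant JDBC driver is on the classpath; the connection details, table name, view name and refresh interval are placeholders.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Sketch only: connection details, table and interval are placeholders.
Future {
  while (true) {
    val latest = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")
      .option("dbtable", "my_table")
      .option("user", "user")
      .option("password", "secret")
      .load()

    latest.createOrReplaceTempView("my_view")   // re-register the view with fresh data
    Thread.sleep(60 * 1000)                     // refresh once a minute
  }
}

// Main thread: each new query resolves the view name when it runs; whether it
// actually picks up the re-registration done in the Future is what this thread is about.
spark.sql("SELECT count(*) FROM my_view").show()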

Re: Spark SQL - Global Temporary View is not behaving as expected

2017-04-22 Thread Hemanth Gudela
d my use case a lot. But yeah, thanks for explaining the behavior of global temporary views; now it’s clear ☺ -Hemanth From: Felix Cheung <felixcheun...@hotmail.com> Date: Saturday, 22 April 2017 at 11.05 To: Hemanth Gudela <hemanth.gud...@qvantel.com>, "user@spark.apache.org"

Spark SQL - Global Temporary View is not behaving as expected

2017-04-22 Thread Hemanth Gudela
Hi, According to the documentation, global temporary views are cross-session accessible. But when I try to query a global temporary view from another spark-shell like this --> Instance 1 of spark-shell
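A small sketch of the documented scope, assuming two separate spark-shell processes; the view name and data are made up. Global temporary views live in the global_temp database and are shared across sessions of the same Spark application, but a second spark-shell is a different application.

// Instance 1 of spark-shell
import spark.implicits._
Seq((1, "a"), (2, "b")).toDF("id", "value").createGlobalTempView("my_global_view")

// Another session inside the same application can see it:
spark.newSession().sql("SELECT * FROM global_temp.my_global_view").show()

// Instance 2 of spark-shell is a separate application, so the same query there
// fails with "Table or view not found", which matches the behaviour reported here.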

Re: Spark structured streaming: Is it possible to periodically refresh static data frame?

2017-04-22 Thread Hemanth Gudela
This looks to be working for me now, but if this solution leads to other problems, I will look into persisted views in Hive / Alluxio. Regards, Hemanth From: Gene Pang <gene.p...@gmail.com> Date: Saturday, 22 April 2017 at 0.30 To: Georg Heiler <georg.kf.hei...@gmail.com> Cc: Hemanth Gudela

Re: Spark structured streaming: Is it possible to periodically refresh static data frame?

2017-04-21 Thread Hemanth Gudela
” the way to achieve this? Thanks, Hemanth From: Tathagata Das <tathagata.das1...@gmail.com> Date: Friday, 21 April 2017 at 0.03 To: Hemanth Gudela <hemanth.gud...@qvantel.com> Cc: Georg Heiler <georg.kf.hei...@gmail.com>, "user@spark.apache.org" <user@spark

Re: Spark structured streaming: Is it possible to periodically refresh static data frame?

2017-04-20 Thread Hemanth Gudela
<tathagata.das1...@gmail.com> Date: Friday, 21 April 2017 at 0.03 To: Hemanth Gudela <hemanth.gud...@qvantel.com> Cc: Georg Heiler <georg.kf.hei...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org> Subject: Re: Spark structured streaming: Is it possible to perio

Re: Spark structured streaming: Is it possible to periodically refresh static data frame?

2017-04-20 Thread Hemanth Gudela
red-streaming-programming-guide.html#data-sources>, Structured Streaming does not support a database as a streaming source. 2. Joining between two streams is not possible yet. Regards, Hemanth From: Georg Heiler <georg.kf.hei...@gmail.com> Date: Thursday, 20 April 2017 at 23.11 To: He

Spark structured streaming: Is it possible to periodically refresh static data frame?

2017-04-20 Thread Hemanth Gudela
Hello, I am working on a use case where there is a need to join a streaming data frame with a static data frame. The streaming data frame continuously gets data from Kafka topics, whereas the static data frame fetches data from a database table. However, as the underlying database table is getting
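A hedged sketch of the setup being described, with placeholder Kafka and JDBC options and a placeholder join key; whether the static side ever picks up later changes to the underlying table is exactly the question of this thread.

// Sketch only: broker, topic, connection details and join key are placeholders.
val streamingDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

val staticDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "reference_data")
  .option("user", "user")
  .option("password", "secret")
  .load()

val joined = streamingDF.join(staticDF, Seq("id"))   // stream-static join

joined.writeStream.format("console").start()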

Re: Does spark 2.1.0 structured streaming support jdbc sink?

2017-04-10 Thread Hemanth Gudela
vio.fior...@granturing.com>> wrote: JDBC sink is not in 2.1. See here for an example implementation using the ForeachWriter sink instead: https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html From: Hemanth
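Roughly what such a writer can look like, as a sketch only: the JDBC URL, target table, columns and INSERT statement are placeholders, and connection handling is deliberately simplified (one connection per partition, no batching or retries).

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.{ForeachWriter, Row}

// Sketch of a JDBC sink via ForeachWriter; URL, table and columns are placeholders.
class JdbcSinkWriter(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
  var conn: Connection = _
  var stmt: PreparedStatement = _

  override def open(partitionId: Long, version: Long): Boolean = {
    conn = DriverManager.getConnection(url, user, pwd)
    stmt = conn.prepareStatement("INSERT INTO agg_results(window_start, total) VALUES (?, ?)")
    true
  }

  override def process(row: Row): Unit = {
    stmt.setString(1, row.getAs[String]("window_start"))
    stmt.setLong(2, row.getAs[Long]("total"))
    stmt.executeUpdate()
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (conn != null) conn.close()
  }
}

// Attached to a streaming query in place of the (unavailable) jdbc format:
// aggregates.writeStream.foreach(new JdbcSinkWriter(url, user, pwd)).start()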

Does spark 2.1.0 structured streaming support jdbc sink?

2017-04-09 Thread Hemanth Gudela
Hello Everyone, I am new to Spark, especially Spark Streaming. I am trying to read an input stream from Kafka, perform windowed aggregations in Spark using structured streaming, and finally write the aggregates to a sink. - MySQL as an output sink doesn’t seem to be an
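A hedged sketch of the first two steps (Kafka source plus a windowed aggregation); the broker, topic and window length are placeholders, and the console sink stands in for the JDBC/MySQL sink discussed in the replies.

import org.apache.spark.sql.functions._

// Sketch only: broker, topic and window size are placeholders.
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "metrics")
  .load()
  .selectExpr("CAST(value AS STRING) AS value", "timestamp")

val aggregates = events
  .groupBy(window(col("timestamp"), "5 minutes"))   // tumbling 5-minute windows
  .count()

aggregates.writeStream
  .outputMode("complete")     // aggregations without a watermark need complete/update mode
  .format("console")          // stand-in for the JDBC sink discussed in the replies
  .start()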

Re: df.count() returns one more count than SELECT COUNT()

2017-04-06 Thread Hemanth Gudela
Nulls are excluded with spark.sql("SELECT count(distinct col) FROM Table").show(). I think it is ANSI SQL behaviour.

scala> spark.sql("select distinct count(null)").show(false)
+-----------+
|count(NULL)|
+-----------+
|0          |
+-----------+

scala> spark.sql("select distinct null").count
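A tiny self-contained illustration of the off-by-one being discussed, with made-up data containing one NULL; the column and view names are placeholders.

import spark.implicits._

// Illustration only: "col" has one NULL row.
val df = Seq(Some(1), Some(2), None).toDF("col")
df.createOrReplaceTempView("t")

df.count()                                      // 3: Dataset.count() counts every row
spark.sql("SELECT count(col) FROM t").show()    // 2: count(col) skips NULLs
spark.sql("SELECT count(*) FROM t").show()      // 3: count(*) matches df.count()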