Re: SQL-specific documentation for recent Spark releases

2017-08-10 Thread Stephen Boesch
The correct link is https://docs.databricks.com/spark/latest/spark-sql/index.html . This link does have the core syntax, such as the BNF for DDL, DML, and SELECT. It does *not* have a reference for date/string/numeric functions: is there any such reference at this point? It is not

Re: How can I tell if a Spark job is successful or not?

2017-08-10 Thread Ryan
You could exit with an error code just like a normal Java/Scala application, and get it from the driver/YARN. On Fri, Aug 11, 2017 at 9:55 AM, Wei Zhang wrote: > I suppose you can find the job status from the YARN UI application view. > > > > Cheers, > > -z > > > > *From:* 陈宇航
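
For the archive, a minimal Scala sketch of the exit-code approach Ryan describes (object and app names are illustrative, not from the thread): do the clean-up in the success/failure branches and exit non-zero on failure, so the driver's exit status, and hence the final status YARN reports, reflects the outcome.

    import org.apache.spark.sql.SparkSession

    object MyJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("my-job").getOrCreate()
        var exitCode = 0
        try {
          spark.sql("SELECT 1").collect()  // placeholder for the real job
          // clean-up for the success case goes here
        } catch {
          case e: Exception =>
            e.printStackTrace()
            // clean-up for the failure case goes here
            exitCode = 1
        } finally {
          spark.stop()
        }
        sys.exit(exitCode)  // non-zero status is what the driver / YARN sees
      }
    }

Note that sys.exit is deliberately placed after the try/finally: calling System.exit inside the catch block would terminate the JVM before the finally block runs.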

RE: How can I tell if a Spark job is successful or not?

2017-08-10 Thread Wei Zhang
I suppose you can find the job status from the YARN UI application view. Cheers, -z From: 陈宇航 [mailto:yuhang.c...@foxmail.com] Sent: Thursday, August 10, 2017 5:23 PM To: user Subject: How can I tell if a Spark job is successful or not? I want to do some clean-ups after a

Issues when trying to recover a textFileStream from a checkpoint in Spark Streaming

2017-08-10 Thread SRK
Hi, I am facing issues while trying to recover a textFileStream from a checkpoint. Basically, it tries to load files from the beginning of the job, whereas I delete the files after processing them. I have the following configs set, so I was thinking that it should not look for files
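
For reference, the standard recovery pattern requires all DStream setup to happen inside the function passed to StreamingContext.getOrCreate; a minimal sketch, with illustrative paths and batch interval:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///checkpoints/myApp"  // illustrative path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("textFileStreamApp")
      val ssc = new StreamingContext(conf, Seconds(30))
      ssc.checkpoint(checkpointDir)
      // All DStream setup must live inside this function; on restart,
      // getOrCreate rebuilds the context from the checkpoint instead.
      val lines = ssc.textFileStream("hdfs:///incoming/")  // illustrative path
      lines.count().print()
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()

This does not by itself control how far back textFileStream scans for files on recovery, but it is the baseline that any checkpoint-recovery debugging starts from.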

Re: SQL-specific documentation for recent Spark releases

2017-08-10 Thread Jules Damji
I refer to docs.databricks.com/spark/latest/spark-sql/index.html. Cheers Jules Sent from my iPhone Pardon the dumb thumb typos :) > On Aug 10, 2017, at 1:46 PM, Stephen Boesch wrote: > > > While DataFrames/Datasets are useful in many circumstances, they are >

Re: Does Spark SQL use Calcite?

2017-08-10 Thread Jules Damji
Yes, it's used more in Hive than in Spark. Sent from my iPhone Pardon the dumb thumb typos :) > On Aug 10, 2017, at 2:24 PM, Sathish Kumaran Vairavelu > wrote: > > I think it is for the Hive dependency. >> On Thu, Aug 10, 2017 at 4:14 PM kant kodali

Re: Does Spark SQL use Calcite?

2017-08-10 Thread Sathish Kumaran Vairavelu
I think it is for the Hive dependency. On Thu, Aug 10, 2017 at 4:14 PM kant kodali wrote: > Since I see a Calcite dependency in Spark, I wonder where Calcite is being > used. > > On Thu, Aug 10, 2017 at 1:30 PM, Sathish Kumaran Vairavelu < > vsathishkuma...@gmail.com> wrote: > >>

Re: Does Spark SQL use Calcite?

2017-08-10 Thread kant kodali
Since I see a Calcite dependency in Spark, I wonder where Calcite is being used. On Thu, Aug 10, 2017 at 1:30 PM, Sathish Kumaran Vairavelu < vsathishkuma...@gmail.com> wrote: > Spark SQL doesn't use Calcite > > On Thu, Aug 10, 2017 at 3:14 PM kant kodali wrote: > >> Hi All,

SQL-specific documentation for recent Spark releases

2017-08-10 Thread Stephen Boesch
While DataFrames/Datasets are useful in many circumstances, they are cumbersome for many types of complex SQL queries. Is there an up-to-date *SQL* reference - i.e. not DataFrame DSL operations - for version 2.2? An example of what is not clear: what constructs are supported within
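
To make the question concrete, the queries in question are plain SQL strings submitted through the session rather than DSL calls; a small illustrative example (table and column names are made up) of the kind of construct, here a window function over a subquery, whose support status such a reference should document:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("sql-example").getOrCreate()

    // Plain SQL, not the DataFrame DSL: a window function over a subquery.
    val df = spark.sql("""
      SELECT name, dept, salary,
             rank() OVER (PARTITION BY dept ORDER BY salary DESC) AS r
      FROM (SELECT * FROM employees WHERE salary > 0) t
    """)
    df.show()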

Re: Does Spark SQL use Calcite?

2017-08-10 Thread Sathish Kumaran Vairavelu
Spark SQL doesn't use Calcite. On Thu, Aug 10, 2017 at 3:14 PM kant kodali wrote: > Hi All, > > Does Spark SQL use Calcite? If so, what for? I thought Spark SQL has > Catalyst, which generates its own logical plans, physical plans, and > other optimizations. > >

Does Spark SQL use Calcite?

2017-08-10 Thread kant kodali
Hi All, Does Spark SQL use Calcite? If so, what for? I thought Spark SQL has Catalyst, which generates its own logical plans, physical plans, and other optimizations. Thanks, Kant

Re: How do I pass multiple Cassandra hosts in spark-submit?

2017-08-10 Thread shyla deshpande
Got the answer from https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/ETCZdCcaKq8 On Thu, Aug 10, 2017 at 11:59 AM, shyla deshpande wrote: > I have a 3-node Cassandra cluster. I want to pass all 3 nodes in > spark-submit. How do I
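
For the archive, the linked thread's answer comes down to the fact that the Spark Cassandra Connector's spark.cassandra.connection.host property accepts a comma-separated list of contact points; a sketch with illustrative addresses:

    import org.apache.spark.sql.SparkSession

    // The connector takes a comma-separated list of initial contact points;
    // the driver discovers the remaining ring members on its own.
    val spark = SparkSession.builder()
      .appName("cassandra-app")
      .config("spark.cassandra.connection.host", "10.0.0.1,10.0.0.2,10.0.0.3")
      .getOrCreate()

The same property can equally be passed on the command line via --conf spark.cassandra.connection.host=... in spark-submit.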

How do I pass multiple Cassandra hosts in spark-submit?

2017-08-10 Thread shyla deshpande
I have a 3-node Cassandra cluster. I want to pass all 3 nodes in spark-submit. How do I do that? Any code samples would help. Thanks

Re: KafkaUtils.createRDD: How do I read all the data from Kafka in a batch program for a given topic?

2017-08-10 Thread shyla deshpande
Thanks Cody. On Wed, Aug 9, 2017 at 8:46 AM, Cody Koeninger wrote: > org.apache.spark.streaming.kafka.KafkaCluster has methods > getLatestLeaderOffsets and getEarliestLeaderOffsets > > On Mon, Aug 7, 2017 at 11:37 PM, shyla deshpande > wrote: > >
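
For the archive, a rough sketch of how those KafkaCluster methods combine with KafkaUtils.createRDD in the old 0.8 connector (topic name and broker list are illustrative; the .right.get error handling is deliberately crude):

    import kafka.serializer.StringDecoder
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.kafka.{KafkaCluster, KafkaUtils, OffsetRange}

    val sc = new SparkContext(new SparkConf().setAppName("kafka-batch"))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // illustrative
    val kc = new KafkaCluster(kafkaParams)

    // Find every partition of the topic, then its earliest and latest offsets.
    val partitions = kc.getPartitions(Set("myTopic")).right.get
    val earliest = kc.getEarliestLeaderOffsets(partitions).right.get
    val latest = kc.getLatestLeaderOffsets(partitions).right.get

    // One OffsetRange per partition covers all data currently in the topic.
    val ranges = partitions.toArray.map { tp =>
      OffsetRange(tp.topic, tp.partition, earliest(tp).offset, latest(tp).offset)
    }
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, ranges)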

Spark streaming - Processing time keeps increasing under the following scenario

2017-08-10 Thread Ravi Gurram
Hi, I have a Spark streaming task that basically does the following: 1. read a batch using a custom receiver; 2. parse and apply transforms to the batch; 3. convert the raw fields to a bunch of features; 4. use a pre-built model to predict the class of each record in

How can I tell if a Spark job is successful or not?

2017-08-10 Thread 陈宇航
I want to do some clean-ups after a Spark job is finished, and the action I would take depends on whether the job was successful or not. So where can I get the result of the job? I already tried the SparkListener; it worked fine when the job was successful, but if the job fails, the listener

Re: spark.write.csv is not able to write files to the specified path, but writes to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Yeah, installing HDFS in our environment is unfortunately going to take a lot of time (approvals/planning etc.). I will have to live with the local FS for now. The other option I had already tried is collect() and sending everything to the driver node. But my data volume is too huge for the driver node to handle

Re: spark.write.csv is not able to write files to the specified path, but writes to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Femi Anthony
Also, why are you trying to write results locally if you're not using a distributed file system? Spark is geared towards writing to a distributed file system. I would suggest trying collect() so the data is sent to the master, and then doing the write there if the result set isn't too big, or
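
A minimal sketch of that collect-then-write fallback, assuming the myDataFrame from the earlier message is in scope; it is only viable when the result fits in driver memory, the local path is illustrative, and the CSV rendering is naive (no quoting or escaping):

    import java.io.PrintWriter

    // Bring the whole result to the driver -- only safe for small result sets.
    val rows = myDataFrame.collect()

    val out = new PrintWriter("/tmp/result.csv")  // local path on the driver node
    try {
      out.println(myDataFrame.columns.mkString(","))         // header row
      rows.foreach(r => out.println(r.toSeq.mkString(",")))  // one line per Row
    } finally {
      out.close()
    }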

Re: spark.write.csv is not able to write files to the specified path, but writes to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Yes, I have tried with file:/// and the full path, as well as just the full path without the file:/// prefix. The Spark session has been closed; no luck though ☹ Regards, Hemanth From: Femi Anthony Date: Thursday, 10 August 2017 at 11.06 To: Hemanth Gudela

Re: spark.write.csv is not able to write files to the specified path, but writes to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Femi Anthony
Is your filePath prefaced with file:/// and the full path, or is it relative? You might also try calling close() on the Spark context or session at the end of program execution to try and ensure that cleanup is completed. Sent from my iPhone > On Aug 10, 2017, at 3:58 AM, Hemanth Gudela

Re: spark.write.csv is not able to write files to the specified path, but writes to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Thanks for the reply, Femi! I’m writing the file like this --> myDataFrame.write.mode("overwrite").csv("myFilePath") There are absolutely no errors/warnings after the write. The _SUCCESS file is created on the master node, but the _temporary problem shows up only on worker nodes. I know

Re: spark.write.csv is not able to write files to the specified path, but writes to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Femi Anthony
Normally the *_temporary* directory gets deleted as part of the cleanup when the write is complete and a _SUCCESS file is created. I suspect that the writes are not being properly completed. How are you specifying the write? Any error messages in the logs? On Thu, Aug 10, 2017 at 3:17 AM, Hemanth

spark.write.csv is not able to write files to the specified path, but writes to an unintended _temporary/0/task_xxx subfolder on worker nodes

2017-08-10 Thread Hemanth Gudela
Hi, I’m running Spark in cluster mode on 4 nodes, and trying to write CSV files to each node’s local path (not HDFS). I’m using spark.write.csv to write the CSV files. On the master node: spark.write.csv creates a folder with the CSV file name and writes many files with the part-r-000n suffix. This is okay for

Re: Spark SVD benchmark for dense matrices

2017-08-10 Thread Anastasios Zouzias
Hi Jose, Just to note that in the Databricks blog they state that they compute the top-5 singular vectors, not all singular values/vectors. Computing all of them is much more computationally intensive. Cheers, Anastasios On 09.08.2017 15:19, "Jose Francisco Saray Villamizar" < jsa...@gmail.com> wrote:
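
For readers comparing numbers: a truncated SVD in MLlib looks roughly like the sketch below, assuming an existing SparkContext sc (the data is a toy matrix; the blog's benchmark computes the top 5 singular vectors of a large dense matrix, whereas here k must not exceed the 3 columns):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Toy dense rows; the benchmark uses a large dense matrix instead.
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(4.0, 5.0, 6.0),
      Vectors.dense(7.0, 8.0, 9.0)
    ))
    val mat = new RowMatrix(rows)

    // Only the top-k singular values/vectors; k near min(m, n) costs far more.
    val svd = mat.computeSVD(k = 2, computeU = true)
    println(svd.s)  // the k singular values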