Re: Spark Streaming Job completed without executing next batches

2017-11-16 Thread KhajaAsmath Mohammed
Here is a screenshot. The status shows Finished, but it should be Running so the next batch can pick up the data. On Thu, Nov 16, 2017 at 10:01 PM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote: > Hi, > > I have scheduled a Spark Streaming job to run every 30 minutes and it

Spark Streaming Job completed without executing next batches

2017-11-16 Thread KhajaAsmath Mohammed
Hi, I have scheduled a Spark Streaming job to run every 30 minutes and it was running fine for 32 hours, but suddenly I see a status of Finished instead of Running (it always runs in the background and shows up in the resource manager). Am I doing anything wrong here? How come the job was finished without

[ML] Spark Package Release: Deep Learning Pipelines 0.2.0

2017-11-16 Thread Siddharth Murching
Hi all, Just wanted to announce that Deep Learning Pipelines 0.2.0 has been released, providing utilities for transfer learning, parallelized hyperparameter tuning of Keras models, and applying neural networks to DataFrames as SQL UDFs. Spark packages:

Apache Spark Downloads Page Error

2017-11-16 Thread rjsullivan
I just noticed that there's a problem on the Apache Spark Downloads page at: https://spark.apache.org/downloads.html Regardless of which option is selected from the 'Choose a package type:' pulldown menu, the file listed for download is always: spark-2.2.0-bin-hadoop2.7.tgz I'm using Chrome

Re: Parquet files from spark not readable in Cascading

2017-11-16 Thread Yong Zhang
I don't have experience with Cascading, but we saw a similar issue when importing data generated in Spark into Hive. Did you try setting "spark.sql.parquet.writeLegacyFormat" to true? https://stackoverflow.com/questions/44279870/why-cant-impala-read-parquet-files-after-spark-sqls-write
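A minimal sketch of the setting mentioned above, assuming Spark 2.x with `SparkSession`; the paths and DataFrame are illustrative placeholders, not from the thread:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: write Parquet in the legacy (Spark 1.x-compatible) format so that
// older readers such as Hive, Impala, or Cascading can consume the files.
object LegacyParquetWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("legacy-parquet-write")
      .config("spark.sql.parquet.writeLegacyFormat", "true") // the key setting
      .getOrCreate()

    val df = spark.read.parquet("/data/input")        // placeholder input path
    df.write.mode("overwrite").parquet("/data/output") // placeholder output path
    spark.stop()
  }
}
```

The flag changes how Spark encodes decimal and nested types in the Parquet footer; it only affects files written after the setting is applied, so existing files would need to be rewritten.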

Re: Processing a splittable file from a single executor

2017-11-16 Thread Jeroen Miller
On 16 Nov 2017, at 10:22, Michael Shtelma wrote: > you call repartition(1) before starting processing your files. This > will ensure that you end up with just one partition. One question and one remark: Q) val ds = sqlContext.read.parquet(path).repartition(1) Am I
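A sketch of the single-partition approach discussed above, assuming a Spark 2.x `SparkSession` and a placeholder path. Note that `coalesce(1)` merges existing partitions without a full shuffle, which is usually cheaper than `repartition(1)` when only reducing the partition count:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: collapse a Parquet dataset into a single partition so that all
// subsequent processing runs as one task on one executor.
object SinglePartitionRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("single-partition").getOrCreate()
    val path = "/data/file.parquet" // placeholder

    // coalesce(1) avoids the shuffle that repartition(1) would trigger
    val ds = spark.read.parquet(path).coalesce(1)
    println(ds.rdd.getNumPartitions) // 1
    spark.stop()
  }
}
```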

Processing a splittable file from a single executor

2017-11-16 Thread Jeroen Miller
Dear Sparkers, A while back, I asked how to process non-splittable files in parallel, one file per executor. Vadim's suggested "scheduling within an application" approach worked out beautifully. I am now facing the 'opposite' problem: - I have a bunch of parquet files to process - Once
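The "scheduling within an application" approach mentioned above can be sketched as follows; this assumes the FAIR scheduler and uses placeholder file paths and a placeholder per-file action, since the thread does not show the actual code:

```scala
import org.apache.spark.sql.SparkSession
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Sketch: submit one Spark job per file from separate driver threads so
// several non-splittable files are processed concurrently.
object PerFileJobs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("per-file-jobs")
      .config("spark.scheduler.mode", "FAIR") // concurrent jobs share executors
      .getOrCreate()

    val files = Seq("/data/a.gz", "/data/b.gz") // placeholder file list
    val jobs = files.map { f =>
      Future {
        spark.read.text(f).count() // placeholder per-file work
      }
    }
    jobs.foreach(Await.result(_, Duration.Inf))
    spark.stop()
  }
}
```

Each `Future` runs an independent Spark action, so the scheduler can interleave their tasks across the cluster rather than running the files strictly one after another.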

Re: Restart Spark Streaming after deployment

2017-11-16 Thread Jacek Laskowski
Hi, You're right: killing the Spark Streaming job is the way to go. If a batch completed successfully, Spark Streaming will recover from the controlled failure and start where it left off. I don't think there's another way to do it. Pozdrawiam, Jacek Laskowski
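The recovery behavior described above relies on checkpointing. A minimal sketch, assuming a checkpoint directory on HDFS (the path, batch interval, and stream logic are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: checkpoint-based restart. On redeployment, getOrCreate rebuilds
// the context from the checkpoint directory and resumes from the last
// completed batch instead of starting from scratch.
object RecoverableStream {
  val checkpointDir = "hdfs:///checkpoints/my-stream" // placeholder

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("recoverable-stream")
    val ssc = new StreamingContext(conf, Seconds(30))
    ssc.checkpoint(checkpointDir)
    // ... define input DStreams and transformations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

One caveat worth knowing: checkpoint recovery requires the application code to be serialization-compatible with what was checkpointed, so redeploying a modified jar may require clearing the checkpoint directory.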