Are there any alternatives to Hive's "stored by" clause, as Spark 2.0 does not support it?

2018-02-07 Thread Pralabh Kumar
Hi, Spark 2.0 doesn't support STORED BY. Is there any alternative to achieve the same?
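
For context, Hive's STORED BY delegates a table to a storage-handler class; the closest analogue in Spark SQL is the data source API's USING clause. A minimal sketch, assuming a SparkSession named spark is in scope (the table name and path are made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("using-clause-demo").getOrCreate()

// Spark SQL has no STORED BY, but CREATE TABLE ... USING plugs a data
// source into a similar role. Table name and path are hypothetical.
spark.sql("""
  CREATE TABLE events
  USING parquet
  OPTIONS (path '/warehouse/events')
""")

For a storage handler like Hive's HBase handler, the equivalent would be a third-party Spark connector for that system rather than Spark SQL syntax.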

[CFP] DataWorks Summit, San Jose, 2018

2018-02-07 Thread Yanbo Liang
Hi All, DataWorks Summit, San Jose, 2018 is a good place to share your experience with advanced analytics, data science, machine learning, and deep learning. We have an Artificial Intelligence and Data Science track covering technologies such as Apache Spark, Scikit-learn, TensorFlow, Keras,

unsubscribe

2018-02-07 Thread dmp
unsubscribe - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Issue with EFS checkpoint

2018-02-07 Thread Khan, Obaidur Rehman
Hello, we have a Spark cluster with 3 worker nodes running as EC2 instances on AWS. The Spark application runs in cluster mode and the checkpoints are stored in EFS. The Spark version used is 2.2.0. We noticed the error below coming up; our understanding was that this intermittent checkpoint issue
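
For reference, a minimal sketch of the setup being described: a Structured Streaming query checkpointing to an EFS mount. The mount point and the query itself are assumptions, not from the original report; EFS must be mounted at the same path on every node so driver and executors see the same checkpoint directory.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("efs-checkpoint-demo")
  .getOrCreate()

// Built-in "rate" test source emits one row per second (Spark 2.2+).
val counts = spark.readStream
  .format("rate")
  .load()
  .groupBy("value")
  .count()

// checkpointLocation points at the EFS mount (path is an assumption).
counts.writeStream
  .format("console")
  .option("checkpointLocation", "file:///mnt/efs/checkpoints/demo")
  .outputMode("complete")
  .start()
  .awaitTermination()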

Re: Sharing Spark executor pool across multiple long-running Spark applications

2018-02-07 Thread Vadim Semenov
The other way might be to launch a single SparkContext and then run jobs inside of it. You can take a look at these projects:
- https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
- http://livy.incubator.apache.org
Problems with this
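
A minimal sketch of the single-SparkContext approach Vadim describes: jobs submitted from separate threads, with FAIR scheduling so they share the executor pool rather than queueing FIFO. Pool names and job bodies here are made up for illustration.

import org.apache.spark.sql.SparkSession

// One shared context for every job; FAIR mode lets concurrent jobs
// share executors instead of running one after another.
val spark = SparkSession.builder()
  .appName("shared-context")
  .config("spark.scheduler.mode", "FAIR")
  .getOrCreate()

// Pool assignment is per-thread, via a local property on SparkContext.
def runInPool(pool: String)(job: => Unit): Thread = {
  val t = new Thread(new Runnable {
    def run(): Unit = {
      spark.sparkContext.setLocalProperty("spark.scheduler.pool", pool)
      job
    }
  })
  t.start()
  t
}

val etl   = runInPool("etl")   { spark.range(1000000L).count() }
val adhoc = runInPool("adhoc") { spark.range(1000L).count() }
Seq(etl, adhoc).foreach(_.join())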

How to preserve the order of parquet files?

2018-02-07 Thread Kevin Jung
Hi all, in Spark 2.2.1, when I load parquet files, the result comes back in a different order from the original dataset. It seems the FileSourceScanExec.createNonBucketedReadRDD method sorts the parquet file splits by their lengths: val splitFiles = selectedPartitions.flatMap { partition =>
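
Since split scheduling (and parallel reads generally) make row order non-deterministic, the usual workaround is to make the ordering explicit in the data and sort after loading. A hedged sketch; the path and the "seq" ordering column are assumptions about the dataset:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-order").getOrCreate()

// Never rely on file or split order; re-impose order from a column.
val df = spark.read.parquet("/data/events").orderBy("seq")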

Spark CEP with files and no streams?

2018-02-07 Thread Esa Heikkinen
Hello, I am trying to use Spark's CEP on log files (as a batch job), not on streams (in real time). Is that possible? If yes, do you know of example Scala code for that? Or should I convert the log files (with timestamps) into streams? But how would I handle the timestamps in Spark? If I can
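
One way to treat timestamped log files as a stream is Structured Streaming's file source, taking event time from the log's own timestamp field. A hedged sketch; the directory, line layout, and timestamp format are assumptions, and this shows event-time windowing rather than a full CEP pattern library:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{to_timestamp, window}

val spark = SparkSession.builder().appName("log-cep").getOrCreate()
import spark.implicits._

// The file source replays files dropped into a directory as a stream,
// so batch log files get streaming (event-time) semantics. Assumes each
// line starts with a "yyyy-MM-dd HH:mm:ss" timestamp.
val logs = spark.readStream
  .text("/logs/incoming")
  .select(
    to_timestamp($"value".substr(1, 19), "yyyy-MM-dd HH:mm:ss").as("ts"),
    $"value".as("line"))
  .withWatermark("ts", "10 minutes")

// Example: count events per 1-minute event-time window.
val counts = logs.groupBy(window($"ts", "1 minute")).count()

counts.writeStream
  .format("console")
  .outputMode("update")
  .start()
  .awaitTermination()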