Re: How to control the number of files for dynamic partition in Spark SQL?

2016-01-30 Thread Deenar Toraskar
The following should work as long as your tables are created using Spark SQL: event_wk.repartition(2).write.partitionBy("eventDate").format("parquet").insertInto("event") If you want to stick to using "insert overwrite" for Hive compatibility, then you can repartition twice, instead of setting
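Spelled out, the suggestion above might look like the sketch below. It is not runnable standalone: it assumes a live Spark/Hive environment, and the source-table name `event_staging` and the session setup are illustrative assumptions; only `event`, `eventDate`, and the `repartition(2)` chain come from the thread.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: requires a running Spark + Hive metastore.
// Uses the Spark 2.x SparkSession API for brevity; the thread itself
// predates it (HiveContext era), but the write chain is the same.
val spark = SparkSession.builder()
  .appName("partition-file-count")
  .enableHiveSupport()
  .getOrCreate()

val event_wk = spark.table("event_staging") // hypothetical source table

// repartition(2) bounds the number of tasks, and therefore the number
// of files, written into each dynamic partition. Note: newer Spark
// versions infer partitioning from the target table and may reject
// combining partitionBy with insertInto; the chain is quoted as the
// thread wrote it for Spark 1.x.
event_wk.repartition(2)
  .write
  .partitionBy("eventDate")
  .format("parquet")
  .insertInto("event")
```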

Product similarity with TF/IDF and Cosine similarity (DIMSUM)

2016-01-30 Thread Alan Prando
Hi Folks! I am trying to implement a Spark job to calculate the similarity of my database's products, using only their names and descriptions. I would like to use TF-IDF to represent my text data and cosine similarity to calculate all similarities. My goal is, after the job completes, to get all similarities
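To make the approach concrete, here is a minimal sketch of TF-IDF plus cosine similarity in plain Scala, without Spark, just to illustrate the math. In a real job, MLlib's HashingTF/IDF and RowMatrix.columnSimilarities (the DIMSUM algorithm mentioned in the subject) would compute this at scale; the smoothing constant below is an assumption, not MLlib's exact formula.

```scala
object TfIdfCosine {
  // Term frequency: fraction of the document occupied by each term.
  def tf(doc: Seq[String]): Map[String, Double] =
    doc.groupBy(identity).map { case (t, ts) => t -> ts.size.toDouble / doc.size }

  // Smoothed inverse document frequency over the whole corpus.
  def idf(docs: Seq[Seq[String]]): Map[String, Double] = {
    val n = docs.size.toDouble
    docs.flatMap(_.distinct).groupBy(identity).map {
      case (t, occ) => t -> math.log((n + 1) / (occ.size + 1))
    }
  }

  // TF-IDF vector for one document, as a sparse term -> weight map.
  def tfidf(doc: Seq[String], idfs: Map[String, Double]): Map[String, Double] =
    tf(doc).map { case (t, f) => t -> f * idfs.getOrElse(t, 0.0) }

  // Cosine similarity of two sparse vectors.
  def cosine(a: Map[String, Double], b: Map[String, Double]): Double = {
    val dot = a.keySet.intersect(b.keySet).toSeq.map(t => a(t) * b(t)).sum
    val na  = math.sqrt(a.values.map(v => v * v).sum)
    val nb  = math.sqrt(b.values.map(v => v * v).sum)
    if (na == 0 || nb == 0) 0.0 else dot / (na * nb)
  }
}
```

Products sharing terms score higher; products with disjoint vocabularies score 0.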

deep learning with heterogeneous cloud computing using spark

2016-01-30 Thread Abid Malik
Dear all; Is there any work in this area? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/deep-learning-with-heterogeneous-cloud-computing-using-spark-tp26109.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: can't kill spark job in supervise mode

2016-01-30 Thread Tim Chen
Hi Duc, Are you running Spark on Mesos with cluster mode? And what is your cluster-mode submission command, and which version of Spark are you running? Tim On Sat, Jan 30, 2016 at 8:19 AM, PhuDuc Nguyen wrote: > I have a spark job running on Mesos in multi-master and supervise mode.

Re: deep learning with heterogeneous cloud computing using spark

2016-01-30 Thread Christopher Nguyen
Thanks Nick :) Abid, you may also want to check out http://conferences.oreilly.com/strata/big-data-conference-ny-2015/public/schedule/detail/43484, which describes our work on a combination of Spark and Tachyon for Deep Learning. We found significant gains in using Tachyon (with co-processing)

Re: can't kill spark job in supervise mode

2016-01-30 Thread PhuDuc Nguyen
Hi Tim, Yes, we are running Spark on Mesos in cluster mode with the supervise flag. The submit script looks like this:

spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+UseCompressedOops -XX:-UseGCOverheadLimit" \
  --supervise \
  --deploy-mode cluster \
  --class \
  --master

Re: Spark 1.5.2 - Programmatically launching spark on yarn-client mode

2016-01-30 Thread Nirav Patel
Thanks Ted. In my application jar there were no Spark 1.3.1 artifacts. Anyhow, I got it working via the Oozie Spark action. On Thu, Jan 28, 2016 at 7:42 PM, Ted Yu wrote: > Looks like '--properties-file' is no longer supported. > > Was it possible that Spark 1.3.1 artifact /

Re: deep learning with heterogeneous cloud computing using spark

2016-01-30 Thread Nick Pentreath
Spark ML offers a multi-layer perceptron and has some machinery in place that will support the development of further deep-learning models. There are also deeplearning4j and some work on distributed TensorFlow on Spark

can't kill spark job in supervise mode

2016-01-30 Thread PhuDuc Nguyen
I have a spark job running on Mesos in multi-master and supervise mode. If I kill it, it is resilient as expected and respawns on another node. However, I cannot kill it when I need to. I have tried 2 methods: 1) ./bin/spark-class org.apache.spark.deploy.Client kill 2)
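For reference, the two kill attempts above look like the commands below with placeholders filled in. The master URL and driver ID are illustrative stand-ins (the real driver ID appears in the Master/dispatcher UI), not values from the message:

```shell
# Method 1: the standalone-style deploy client (placeholders only).
./bin/spark-class org.apache.spark.deploy.Client kill \
  spark://master-host:7077 driver-20160130123456-0001

# For a Mesos cluster-mode dispatcher, spark-submit also has a --kill flag:
./bin/spark-submit --master mesos://dispatcher-host:7077 \
  --kill driver-20160130123456-0001
```

With --supervise set, the driver must be killed through one of these channels rather than by terminating its process, or the supervisor will simply respawn it.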