Re: Datastore for GraphX

2015-11-22 Thread Sonal Goyal
For GraphX, you should be able to read and write data from practically any datastore Spark supports: flat files, RDBMS, Hadoop, etc. If you want to save your graph as it is, look at something like Neo4j. http://neo4j.com/developer/apache-spark/ Best Regards, Sonal Founder, Nube Technologies
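
For illustration, a minimal Scala sketch of the flat-file route (the paths and sc are assumptions; GraphLoader reads only plain edge lists, so richer vertex attributes would need separate handling):

    import org.apache.spark.graphx.GraphLoader

    // Load a graph from an edge-list file (one "srcId dstId" pair per line).
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")

    // Persist edges and vertices back out as text; any sink Spark supports works.
    graph.edges.map(e => s"${e.srcId} ${e.dstId}").saveAsTextFile("hdfs:///data/edges-out")
    graph.vertices.map { case (id, attr) => s"$id $attr" }.saveAsTextFile("hdfs:///data/vertices-out")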

Re: How to kill spark applications submitted using spark-submit reliably?

2015-11-22 Thread Sudhanshu Janghel
I have noticed that the UI takes some time to reflect the requested changes. Is that the issue? Have you tried waiting a few minutes after killing the Spark job from the terminal? Kind Regards, Sudhanshu On 23 Nov 2015, at 1:43 a.m., Ted Yu wrote:

Re: Spark twitter streaming in Java

2015-11-22 Thread Yogs
Hi Soni, I think you need to start the JavaStreamingContext. Add something like this at the end of your program: jssc.start(); jssc.awaitTermination(6); jssc.stop(); - Yogesh On Thu, Nov 19, 2015 at 12:34 PM, Soni spark wrote: > Dear Friends, > > I am

RE: SparkR DataFrame , Out of memory exception for very small file.

2015-11-22 Thread Sun, Rui
Vipul, Not sure if I understand your question. DataFrame is immutable. You can't update a DataFrame. Could you paste some log info for the OOM error? -Original Message- From: vipulrai [mailto:vipulrai8...@gmail.com] Sent: Friday, November 20, 2015 12:11 PM To: user@spark.apache.org
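
The question is about SparkR, but immutability holds across all the language bindings; a minimal Scala sketch of the idea (df and the column name are assumptions):

    import org.apache.spark.sql.functions.lit

    // Transformations never mutate df; they return a new DataFrame.
    val updated = df.withColumn("flag", lit(true))
    df.printSchema()      // original schema, unchanged
    updated.printSchema() // new schema with the added "flag" column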

Re: Initial State

2015-11-22 Thread Tathagata Das
There is a way. Please see the Scala docs. http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions The first version of updateStateByKey has the parameter "initialRDD" On Fri, Nov 20, 2015 at 6:52 PM, Bryan wrote:
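
A sketch of that overload in Scala (ssc and pairDStream, here a DStream[(String, Long)], are assumptions):

    import org.apache.spark.HashPartitioner

    // Seed state for keys "a" and "b" before the first batch arrives.
    val initialRDD = ssc.sparkContext.parallelize(Seq(("a", 0L), ("b", 0L)))

    val updateFunc = (values: Seq[Long], state: Option[Long]) =>
      Some(values.sum + state.getOrElse(0L))

    val stateStream = pairDStream.updateStateByKey[Long](
      updateFunc,
      new HashPartitioner(ssc.sparkContext.defaultParallelism),
      initialRDD)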

Re: Need Help Diagnosing/operating/tuning

2015-11-22 Thread Jeremy Davis
It seems like the problem is related to --executor-cores. Is there possibly some sort of race condition when using multiple cores per executor? On Nov 22, 2015, at 12:38 PM, Jeremy Davis wrote: Hello, I’m at a loss trying to diagnose why
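
One way to test that hypothesis (a sketch; the app name is a placeholder, and spark.executor.cores is the config behind --executor-cores) is to rerun with a single core per executor and compare:

    import org.apache.spark.{SparkConf, SparkContext}

    // Pin one core per executor to check whether the failures are
    // tied to multi-core executors; the value here is illustrative.
    val conf = new SparkConf()
      .setAppName("repartition-debug")
      .set("spark.executor.cores", "1")
    val sc = new SparkContext(conf)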

Re: Spark-SQL idiomatic way of adding a new partition or writing to Partitioned Persistent Table

2015-11-22 Thread Deenar Toraskar
Thanks Michael, thanks for the response. Here is my understanding; correct me if I am wrong: 1) Partitioned tables written by Spark SQL do not write partition metadata to the Hive metastore. Spark SQL discovers partitions from the table location on the underlying DFS, and not the metastore. It does this the
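
A minimal Scala sketch of that discovery behavior (paths and column names are assumptions):

    // Writing with partitionBy lays out one directory per value,
    // e.g. /warehouse/events/date=2015-11-22/.
    df.write.partitionBy("date").parquet("/warehouse/events")

    // Reading the root path rediscovers the partitions from the
    // directory structure on the DFS, not from the Hive metastore.
    val events = sqlContext.read.parquet("/warehouse/events")
    events.printSchema() // "date" appears as a partition column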

RE: Initial State

2015-11-22 Thread Bryan
I am currently using updateStateByKey (which, as you pointed out, allows the introduction of an initial RDD) to introduce an initial RDD to my window counting function. I was hoping to essentially seed the window state on startup without the use of updateStateByKey, to avoid the associated cost.

Re: Spark-SQL idiomatic way of adding a new partition or writing to Partitioned Persistent Table

2015-11-22 Thread Stephen Boesch
>> and then use Hive's dynamic partitioned insert syntax What does this entail? Same SQL, but you need to do set hive.exec.dynamic.partition = true; in the Hive/SQL context (along with several other related dynamic partition settings). Is there anything else/special
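
For concreteness, a hedged sketch of those settings driven from a HiveContext in Scala (table and column names are illustrative):

    // Enable Hive's dynamic partitioned insert before writing.
    sqlContext.sql("SET hive.exec.dynamic.partition = true")
    sqlContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // The partition value comes from the last SELECT column rather
    // than being spelled out per partition in the INSERT statement.
    sqlContext.sql(
      """INSERT INTO TABLE events PARTITION (date)
        |SELECT id, payload, date FROM staging_events""".stripMargin)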

Re: How to adjust Spark shell table width

2015-11-22 Thread Ted Yu
Currently the width, if truncation is performed, is hardcoded to 20 characters. I wonder whether the capability for the user to specify the width should be added. If so, I can send a PR. Cheers On Sun, Nov 22, 2015 at 1:39 AM, Jagrut Sharma wrote: > Since version 1.5.0,

Re: thought experiment: use spark ML to real time prediction

2015-11-22 Thread Vincenzo Selvaggio
The Data Mining Group (http://dmg.org/) that created PMML is working on a new standard called PFA, which indeed uses JSON documents; see http://dmg.org/pfa/docs/motivation/ for details. PFA could be the answer to your option c. Regards, Vincenzo On Wed, Nov 18, 2015 at 12:03 PM, Nick Pentreath
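
Related to that: MLlib (1.4+) can already export some model types to PMML; a minimal sketch, assuming vectors is an RDD[Vector] of training data:

    import org.apache.spark.mllib.clustering.KMeans

    // Train a model (data preparation omitted), then export it as a
    // PMML XML string; toPMML(path) can write it to a file instead.
    val model = KMeans.train(vectors, 3, 20)
    val pmmlXml: String = model.toPMML()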

Re: spark shuffle

2015-11-22 Thread Shushant Arora
And will groupByKey keep all values of the pair RDD in an iterable list in memory on the reducer? That would lead to OutOfMemory if the values for a key exceed the memory of that node. 1. Is there a way to spill that to disk? 2. If not, is it feasible to partition the pair RDD using a custom
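
On point 1: the usual way to avoid materializing every value for a key is to combine map-side instead of grouping; a sketch with aggregateByKey:

    // groupByKey pulls all values for a key into one iterable on the
    // reducer; aggregateByKey folds them incrementally instead.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    val sums = pairs.aggregateByKey(0)(
      (acc, v) => acc + v,  // merge a value into the partition-local accumulator
      (a, b) => a + b)      // merge accumulators across partitions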

Re: How to adjust Spark shell table width

2015-11-22 Thread Jagrut Sharma
Since version 1.5.0, show(false) on a DataFrame prevents truncation of long strings in the output. By default, strings longer than 20 characters are truncated. Example usage: scala> df.show(false) -- Jagrut On Sat, Nov 21, 2015 at 6:24 AM, Fengdong Yu wrote: > Hi, > >

Re: thought experiment: use spark ML to real time prediction

2015-11-22 Thread Andy Davidson
Hi Nick, I started this thread. IMHO we need something like Spark to train our models. The resulting models are typically small enough to easily fit on a single machine. My real-time production system is not built on Spark. The real-time system needs to use the model to make predictions in real

Need Help Diagnosing/operating/tuning

2015-11-22 Thread Jeremy Davis
Hello, I’m at a loss trying to diagnose why my Spark job is failing (it works fine on small data). It fails during the repartition, or on the subsequent steps, which then seem to fail and fall back to repartitioning. I’ve tried adjusting every parameter I can find, but have had no success.

Re: How to kill spark applications submitted using spark-submit reliably?

2015-11-22 Thread Ted Yu
If you are asking about trapping the SIGKILL signal in your script, see the following: http://linuxcommand.org/wss0160.php Cheers On Fri, Nov 20, 2015 at 10:02 PM, Vikram Kone wrote: > I tried adding shutdown hook to my code but it didn't help. Still same > issue > > > On Fri,
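
On the Scala side, the shutdown-hook approach Vikram mentions would look roughly like this (a sketch; sc is the application's SparkContext, and note that SIGKILL, i.e. kill -9, can never be trapped, so hooks fire only on normal exit or signals like SIGTERM/SIGINT):

    // Register a JVM shutdown hook that stops the SparkContext cleanly.
    // Runs on normal exit or SIGTERM/SIGINT, but never on kill -9.
    sys.addShutdownHook {
      sc.stop()
    }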