Re: [Spark Streaming] Iterative programming on an ordered spark stream using Java?

2015-06-18 Thread twinkle sachdeva
Hi, UpdateStateByKey: if you can briefly describe the issue you are facing with this, that will be great. Regarding not keeping the whole dataset in memory, you can tweak the remember parameter so that it checkpoints at the appropriate time. Thanks Twinkle On Thursday, June 18, 2015, Nipun Arora
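
A minimal sketch of the updateStateByKey pattern under discussion, assuming a socket text source; the app name, checkpoint directory, and host/port below are illustrative:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))
    // Checkpointing is mandatory for updateStateByKey; only the running
    // state per key is kept across batches, not the full input history.
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    val counts = words.map((_, 1)).updateStateByKey[Int] {
      (newValues: Seq[Int], state: Option[Int]) =>
        Some(newValues.sum + state.getOrElse(0))
    }
    counts.print()
    ssc.start()
    ssc.awaitTermination()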

Re: Spark Streaming to Kafka

2015-05-20 Thread twinkle sachdeva
Thanks Saisai. On Wed, May 20, 2015 at 11:23 AM, Saisai Shao sai.sai.s...@gmail.com wrote: I think this is the PR https://github.com/apache/spark/pull/2994 that you could refer to. 2015-05-20 13:41 GMT+08:00 twinkle sachdeva twinkle.sachd...@gmail.com: Hi, As Spark streaming is being nicely

Spark Streaming to Kafka

2015-05-19 Thread twinkle sachdeva
Hi, As Spark Streaming is being nicely integrated with consuming messages from Kafka, I thought of asking the forum: is there any implementation available for pushing data to Kafka from Spark Streaming too? Any link(s) will be helpful. Thanks and Regards, Twinkle
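
Pending such an implementation, a common workaround is to write to Kafka from foreachRDD; below is a sketch assuming the new Kafka producer API, with the broker list and topic as placeholders. One producer is created per partition because producers are not serializable and cannot be shipped in the closure:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.spark.streaming.dstream.DStream

    def writeToKafka(stream: DStream[String], brokers: String, topic: String): Unit = {
      stream.foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          // Created on the executor side, once per partition.
          val props = new Properties()
          props.put("bootstrap.servers", brokers)
          props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
          props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
          val producer = new KafkaProducer[String, String](props)
          records.foreach(r => producer.send(new ProducerRecord[String, String](topic, r)))
          producer.close()
        }
      }
    }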

Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2015-05-06 Thread twinkle sachdeva
Hi, Can you please share the compression and related settings you are using? Thanks, Twinkle On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: I'm facing this error in Spark 1.3.1 https://issues.apache.org/jira/browse/SPARK-4105 Anyone knows what's the
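
For reference, these are the shuffle compression knobs usually asked for in such threads; the values shown are the Spark 1.x defaults, given as a sketch:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.shuffle.compress", "true")        // compress map output files
      .set("spark.shuffle.spill.compress", "true")  // compress data spilled during shuffles
      .set("spark.io.compression.codec", "snappy")  // snappy | lz4 | lzf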

Re: Addition of new Metrics for killed executors.

2015-04-22 Thread twinkle sachdeva
Thakur. On Mon, Apr 20, 2015 at 1:31 PM, twinkle sachdeva twinkle.sachd...@gmail.com wrote: Hi Archit, What is your use case and what kind of metrics are you planning to add? Thanks, Twinkle On Fri, Apr 17, 2015 at 4:07 PM, Archit Thakur archit279tha...@gmail.com wrote: Hi, We

Re: Addition of new Metrics for killed executors.

2015-04-20 Thread twinkle sachdeva
Hi Archit, What is your use case and what kind of metrics are you planning to add? Thanks, Twinkle On Fri, Apr 17, 2015 at 4:07 PM, Archit Thakur archit279tha...@gmail.com wrote: Hi, We are planning to add new Metrics in Spark for the executors that got killed during the execution. Was

Re: RDD generated on every query

2015-04-14 Thread twinkle sachdeva
Hi, If you have the same Spark context, then you can cache the query result by caching the table ( sqlContext.cacheTable(tableName) ). Maybe you can also have a look at the Ooyala server. On Tue, Apr 14, 2015 at 11:36 AM, Akhil Das ak...@sigmoidanalytics.com wrote: You can use a Tachyon based
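
A minimal sketch of the cacheTable suggestion, assuming an existing SQLContext and a registered table; the table name and query are hypothetical:

    // Repeated queries against the same SQLContext then reuse the
    // in-memory columnar data instead of recomputing the underlying RDD.
    sqlContext.cacheTable("people")
    val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
    adults.collect()
    // sqlContext.uncacheTable("people") releases the memory when done.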

Re: set spark.storage.memoryFraction to 0 when no cached RDD and memory area for broadcast value?

2015-04-14 Thread twinkle sachdeva
Hi, In one of the applications we have made, which did not cache any RDDs, we set the value of spark.storage.memoryFraction very low, and yes, that gave us performance benefits. Regarding that issue, you should also look at the data you are trying to broadcast, as sometimes creating that data
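
As a sketch of the setting referred to here (the value is illustrative; the Spark 1.x default is 0.6):

    import org.apache.spark.SparkConf

    // With no RDDs cached, shrinking the storage fraction hands most of the
    // executor heap to execution (shuffle/aggregation) memory instead.
    val conf = new SparkConf()
      .set("spark.storage.memoryFraction", "0.1")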

Regarding benefits of using more than one cpu for a task in spark

2015-04-07 Thread twinkle sachdeva
Hi, In Spark, there are two settings regarding the number of cores: one at the task level, spark.task.cpus, and another which drives the number of cores per executor, spark.executor.cores. Apart from using more than one core for a task which has to call some other external API etc., is there
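
The two settings contrasted in the question, as a sketch with illustrative values:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.cores", "4") // cores available to each executor
      .set("spark.task.cpus", "2")      // cores reserved per task; here at most
                                        // two tasks run concurrently per executor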

Re: Strategy regarding maximum number of executor failures for long running jobs / spark streaming jobs

2015-04-07 Thread twinkle sachdeva
a lot of them in a short span of time, it means there's probably something going wrong in the app or on the cluster. On Wed, Apr 1, 2015 at 7:08 PM, twinkle sachdeva twinkle.sachd...@gmail.com wrote: Hi, Thanks Sandy. Another way to look at this is: would we like to have our long

Strategy regarding maximum number of executor failures for long running jobs / spark streaming jobs

2015-04-01 Thread twinkle sachdeva
Hi, In Spark over YARN, there is a property spark.yarn.max.executor.failures which controls the maximum number of executor failures an application will survive. If the number of executor failures (due to any reason, like OOM or machine failure) exceeds this value, then the application quits.
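
A sketch of raising that budget for a long-running job; the value is illustrative (in Spark 1.x the default is derived from the requested executor count):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.yarn.max.executor.failures", "100")
    // equivalently: --conf spark.yarn.max.executor.failures=100 on spark-submit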

Re: Strategy regarding maximum number of executor failures for long running jobs / spark streaming jobs

2015-04-01 Thread twinkle sachdeva
solution could be to allow a maximum number of failures within any given time span. E.g. a max failures per hour property. -Sandy On Tue, Mar 31, 2015 at 11:52 PM, twinkle sachdeva twinkle.sachd...@gmail.com wrote: Hi, In spark over YARN, there is a property spark.yarn.max.executor.failures

Re: Priority queue in spark

2015-03-17 Thread twinkle sachdeva
will be submitted to the spark cluster based on the priority. Jobs with lower priority, or below some threshold, will be discarded. Thanks, Abhi On Mon, Mar 16, 2015 at 10:36 PM, twinkle sachdeva twinkle.sachd...@gmail.com wrote: Hi Abhi, You mean each task of a job can have different

Re: Priority queue in spark

2015-03-16 Thread twinkle sachdeva
Hi, Maybe this is what you are looking for: http://spark.apache.org/docs/1.2.0/job-scheduling.html#fair-scheduler-pools Thanks, On Mon, Mar 16, 2015 at 8:15 PM, abhi abhishek...@gmail.com wrote: Hi, Currently all the jobs in Spark get submitted using a queue. I have a requirement where
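
A minimal sketch of the fair scheduler pools from the linked docs; the pool name and allocation file path are assumptions:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .set("spark.scheduler.mode", "FAIR")
      .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
    val sc = new SparkContext(conf)

    // Jobs submitted from this thread go to the named pool; pools can be
    // given different weights and minShare values in fairscheduler.xml.
    sc.setLocalProperty("spark.scheduler.pool", "highPriority")
    // ... run high-priority jobs ...
    sc.setLocalProperty("spark.scheduler.pool", null) // back to the default pool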

Re: One of the executor not getting StopExecutor message

2015-03-03 Thread twinkle sachdeva
...@sigmoidanalytics.com wrote: Mostly, that particular executor is stuck on GC Pause, what operation are you performing? You can try increasing the parallelism if you see only 1 executor is doing the task. Thanks Best Regards On Fri, Feb 27, 2015 at 11:39 AM, twinkle sachdeva twinkle.sachd

delay between removing the block manager of an executor, and marking that as lost

2015-03-03 Thread twinkle sachdeva
Hi, Is there any relation between removing the block manager of an executor and marking that executor as lost? In my setup, even after removing the block manager (after failing to do some operation), it is taking more than 20 minutes to mark that executor as lost. Following are the logs: *15/03/03 10:26:49

One of the executor not getting StopExecutor message

2015-02-26 Thread twinkle sachdeva
Hi, I am running a Spark application on YARN in cluster mode. One of my executors appears to be in a hung state for a long time, and finally gets killed by the driver. Unlike the other executors, it has not received the StopExecutor message from the driver. Here are the logs at the end of this

Regarding shuffle data file format

2015-02-20 Thread twinkle sachdeva
Hi, What file format is used to write files during the shuffle write? Is it dependent on the Spark shuffle manager or the output format? Is it possible to change the file format for the shuffle, irrespective of the output format of the file? Thanks, Twinkle

Regarding minimum number of partitions while reading data from Hadoop

2015-02-19 Thread twinkle sachdeva
Hi, In our job, we need to process the data in small chunks, so as to avoid GC and other issues. For this, we are using the old Hadoop API, as that lets us specify parameters like minPartitions. Does anyone know if there is a way to do the same via the new Hadoop API also? How that way will be different
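
A sketch of the difference, assuming an existing SparkContext sc: the old API accepts a minimum-partitions hint directly, while with the new API the split size is driven from the Hadoop Configuration. Paths and sizes are illustrative:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Old Hadoop API: pass the minimum-partitions hint directly.
    val oldApi = sc.textFile("hdfs:///data/input", 200)

    // New Hadoop API: no minPartitions parameter; cap the split size instead,
    // so the input is cut into more, smaller partitions.
    val hadoopConf = new Configuration(sc.hadoopConfiguration)
    hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
      (64 * 1024 * 1024).toString) // 64 MB splits
    val newApi = sc.newAPIHadoopFile(
      "hdfs:///data/input",
      classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
      hadoopConf)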

Re: Spark can't find jars

2014-10-27 Thread twinkle sachdeva
Hi, Try running the following in the Spark folder: bin/run-example SparkPi 10 If this runs fine, just see the set of arguments being passed via this script, and try in a similar way. Thanks, On Thu, Oct 16, 2014 at 2:59 PM, Christophe Préaud christophe.pre...@kelkoo.com wrote: Hi, I have

Regarding using spark sql with yarn

2014-10-17 Thread twinkle sachdeva
Hi, I have been using Spark SQL with YARN. It works fine in yarn-client mode, but in yarn-cluster mode we are facing 2 issues. Is yarn-cluster mode not recommended for spark-sql using hiveContext? *Problem #1* We are not able to use any query with a very simple filtering operation like,

Regarding java version requirement in spark 1.2.0 or upcoming releases

2014-10-13 Thread twinkle sachdeva
Hi, Can somebody please share the plans regarding Java version support for Apache Spark 1.2.0 or near-future releases? Will Java 8 become the fully supported version in Apache Spark 1.2, or will Java 1.7 suffice? Thanks,

Re: Using one sql query's result inside another sql query

2014-09-28 Thread twinkle sachdeva
https://github.com/apache/spark/pull/2382. As a workaround, you can use lowercase letters in field names instead. Cheng On 9/25/14 1:18 PM, twinkle sachdeva wrote: Hi, I am using Hive Context to fire the sql queries inside spark. I have created a schemaRDD( Let's call it cachedSchema

Using one sql query's result inside another sql query

2014-09-24 Thread twinkle sachdeva
Hi, I am using Hive Context to fire the SQL queries inside Spark. I have created a SchemaRDD (let's call it cachedSchema) inside my code. If I fire a SQL query (Query 1) on top of it, then it works. But if I refer to Query 1's result inside another SQL query, that fails. Note that I have already
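
A minimal sketch of the pattern being attempted, assuming a HiveContext: the first query's SchemaRDD must itself be registered as a temporary table before a second query can refer to it. Table and column names are hypothetical, lowercase per the workaround mentioned in the reply above:

    // Register the SchemaRDD from the question so SQL can see it.
    cachedSchema.registerTempTable("cachedschema")
    val query1 = hiveContext.sql(
      "SELECT id, value FROM cachedschema WHERE value > 10")
    query1.registerTempTable("query1result")
    // The second query can now refer to the first query's result by name.
    val query2 = hiveContext.sql(
      "SELECT id FROM query1result WHERE id % 2 = 0")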

Has anybody faced SPARK-2604 issue regarding Application hang state

2014-09-01 Thread twinkle sachdeva
Hi, Has anyone else also experienced https://issues.apache.org/jira/browse/SPARK-2604? It is an edge-case scenario of misconfiguration, where the executor memory requested is the same as the maximum memory allowed by YARN. In such a situation, the application stays in a hung state, and the reason is not logged
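
A sketch of the headroom that avoids this edge case; the numbers are illustrative and assume an 8 GB yarn.scheduler.maximum-allocation-mb:

    import org.apache.spark.SparkConf

    // YARN must grant executor memory PLUS the overhead; if the sum exceeds
    // yarn.scheduler.maximum-allocation-mb, the container request is never
    // satisfied and the application appears to hang (SPARK-2604).
    val conf = new SparkConf()
      .set("spark.executor.memory", "7g")
      .set("spark.yarn.executor.memoryOverhead", "512") // in MB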