Re: Spark driver not reusing HConnection

2016-11-23 Thread Mukesh Jha
Corresponding HBase bug: https://issues.apache.org/jira/browse/HBASE-12629 On Wed, Nov 23, 2016 at 1:55 PM, Mukesh Jha <me.mukesh@gmail.com> wrote: > The solution is to disable the region size calculation check. > > hbase.regionsizecalculator.enable: false > > On Sun, No

Re: Spark driver not reusing HConnection

2016-11-23 Thread Mukesh Jha
The solution is to disable the region size calculation check. hbase.regionsizecalculator.enable: false On Sun, Nov 20, 2016 at 9:29 PM, Mukesh Jha <me.mukesh@gmail.com> wrote: > Any ideas folks? > > On Fri, Nov 18, 2016 at 3:37 PM, Mukesh Jha <me.mukesh@gmail.com
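
A minimal sketch of where this setting goes, assuming the "message" table named in the thread below and the stock HBase TableInputFormat; the wiring around it is illustrative, not the poster's actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseScan {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "hbase-scan");
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, "message");
        // Skip the per-region size lookup that stalls the driver when the
        // table has ~5k regions (see HBASE-12629); input splits lose their
        // size hints but are still created.
        conf.setBoolean("hbase.regionsizecalculator.enable", false);
        JavaPairRDD<ImmutableBytesWritable, Result> rdd = sc.newAPIHadoopRDD(
                conf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);
        System.out.println("rows: " + rdd.count());
    }
}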

Re: Spark driver not reusing HConnection

2016-11-20 Thread Mukesh Jha
Any ideas folks? On Fri, Nov 18, 2016 at 3:37 PM, Mukesh Jha <me.mukesh@gmail.com> wrote: > Hi > > I'm accessing multiple regions (~5k) of an HBase table using spark's > newAPIHadoopRDD. But the driver is trying to calculate the region size of > all the regions. >

Spark driver not reusing HConnection

2016-11-18 Thread Mukesh Jha
NFO Driver] RegionSizeCalculator: Calculating region sizes for table "message". -- Thanks & Regards, *Mukesh Jha <me.mukesh@gmail.com>*

Re: Spark kafka integration issues

2016-09-14 Thread Mukesh Jha
that only works on brokers 0.10 or higher. A pull request for > documenting it has been merged, but not deployed. > > On Tue, Sep 13, 2016 at 6:46 PM, Mukesh Jha <me.mukesh@gmail.com> > wrote: > > Hello fellow sparkers, > > > > I'm using spark to consume m

Spark kafka integration issues

2016-09-13 Thread Mukesh Jha
the same? 3) is there a newer version to consume from kafka-0.10 & kafka-0.9 clusters -- Thanks & Regards, *Mukesh Jha <me.mukesh@gmail.com>*

Re: how to make a spark streaming application start working on the next batch before completing the previous batch.

2015-12-15 Thread Mukesh Jha
Try setting spark.streaming.concurrentJobs to the number of concurrent jobs you want to run. On 15 Dec 2015 17:35, "ikmal" wrote: > The best practice is to set batch interval less than processing time. I'm > sure your application is suffering from constantly

Re: how to make a spark streaming application start working on the next batch before completing the previous batch.

2015-12-15 Thread Mukesh Jha
rted. There are issues with fault-tolerance and data loss if that is > set to more than 1. > > > > On Tue, Dec 15, 2015 at 9:19 AM, Mukesh Jha <me.mukesh@gmail.com> > wrote: > >> Try setting spark.streaming.concurrentJobs to the number of >> concurrent jo
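
A minimal sketch of setting this, assuming the property name is spark.streaming.concurrentJobs (it is undocumented); per the caution above, treat values above 1 as unsafe unless the job tolerates loss:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class ConcurrentJobsExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("local[4]").setAppName("concurrent-jobs-demo")
                // Undocumented knob: how many streaming jobs the scheduler
                // may run at once. Values > 1 risk data loss on failure.
                .set("spark.streaming.concurrentJobs", "2");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));
        // ... define the streams, then jssc.start(); jssc.awaitTermination();
    }
}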

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-27 Thread Mukesh Jha
Apart from that, a little more information about your job would be helpful. Thanks Best Regards On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha <me.mukesh@gmail.com> wrote: Hi Experts, My Spark Job is failing with the below error. From the logs I can see that input-3-1424842351600 was added at 5:32

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-27 Thread Mukesh Jha
Also my job is map only so there is no shuffle/reduce phase. On Fri, Feb 27, 2015 at 7:10 PM, Mukesh Jha <me.mukesh@gmail.com> wrote: I'm streaming data from a kafka topic using KafkaUtils, doing some computation, and writing records to HBase. Storage level is memory-and-disk-ser On 27 Feb
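
A minimal sketch of the pipeline shape described here (kafka in, the HBase write elided), with MEMORY_AND_DISK_SER passed explicitly so received blocks spill to disk instead of being dropped; the broker, group, and topic values are placeholders:

import java.util.HashMap;
import java.util.Map;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaToHBase {
    public static void main(String[] args) {
        // local[4]: a receiver occupies one core, so leave room for tasks.
        SparkConf conf = new SparkConf()
                .setMaster("local[4]").setAppName("kafka-to-hbase");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));
        Map<String, String> kafkaConf = new HashMap<>();
        kafkaConf.put("zookeeper.connect", "zkhost:2181");
        kafkaConf.put("group.id", "kafka-to-hbase");
        Map<String, Integer> topicMap = new HashMap<>();
        topicMap.put("mytopic", 1); // one consumer thread for this topic
        JavaPairReceiverInputDStream<String, String> stream =
                KafkaUtils.createStream(jssc, String.class, String.class,
                        StringDecoder.class, StringDecoder.class,
                        kafkaConf, topicMap,
                        StorageLevel.MEMORY_AND_DISK_SER());
        stream.print(); // the thread's map + HBase write would go here
        jssc.start();
        jssc.awaitTermination();
    }
}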

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-26 Thread Mukesh Jha
On Wed, Feb 25, 2015 at 8:09 PM, Mukesh Jha <me.mukesh@gmail.com> wrote: My application runs fine for ~3-4 hours and then hits this issue. On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha <me.mukesh@gmail.com> wrote: Hi Experts, My Spark Job is failing with the below error. From the logs I

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-25 Thread Mukesh Jha
My application runs fine for ~3-4 hours and then hits this issue. On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha <me.mukesh@gmail.com> wrote: Hi Experts, My Spark Job is failing with the below error. From the logs I can see that input-3-1424842351600 was added at 5:32:32 and was never purged

SparkStreaming failing with exception Could not compute split, block input

2015-02-24 Thread Mukesh Jha
:32:43 WARN scheduler.TaskSetManager: Lost task 36.1 in stage 451.0 (TID 22515, chsnmphbase19.usdc2.cloud.com): java.lang.Exception: Could not compute split, block input-3-1424842355600 not found at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51) -- Thanks & Regards, *Mukesh Jha

Re: spark streaming: stderr does not roll

2015-02-24 Thread Mukesh Jha
("spark.executor.logs.rolling.strategy", "size") .set("spark.executor.logs.rolling.size.maxBytes", "1024") .set("spark.executor.logs.rolling.maxRetainedFiles", "3") Yet it does not roll and continues to grow. Am I missing something obvious? thanks, Duc -- Thanks & Regards, *Mukesh Jha me.mukesh
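
For reference, the quoted settings assembled into one place, values exactly as posted (note these spark.executor.logs.rolling.* options apply to executor logs captured by the standalone worker; whether they roll stderr under YARN is a separate question):

import org.apache.spark.SparkConf;

public class RollingLogsConf {
    public static SparkConf build() {
        return new SparkConf()
                // Roll executor logs by size, keeping the 3 newest files.
                .set("spark.executor.logs.rolling.strategy", "size")
                .set("spark.executor.logs.rolling.size.maxBytes", "1024")
                .set("spark.executor.logs.rolling.maxRetainedFiles", "3");
    }
}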

Re: Cannot access Spark web UI

2015-02-24 Thread Mukesh Jha
spark-env.sh file and /etc/hosts file. Thanks Best Regards On Wed, Feb 18, 2015 at 2:06 PM, Mukesh Jha me.mukesh@gmail.com wrote: Hello Experts, I am running a spark-streaming app inside YARN. I have Spark History server running as well (Do we need it running to access UI

Cannot access Spark web UI

2015-02-18 Thread Mukesh Jha
) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Powered by Jetty:// -- Thanks & Regards, *Mukesh Jha <me.mukesh@gmail.com>*

Re: Spark streaming app shutting down

2015-02-09 Thread Mukesh Jha
think mostly the inflight data would be lost if you aren't using any of the fault-tolerance mechanisms. Thanks Best Regards On Wed, Feb 4, 2015 at 5:24 PM, Mukesh Jha <me.mukesh@gmail.com> wrote: Hello Sparkans, I'm running a spark streaming app which reads data from a kafka topic, does
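
One way to limit in-flight loss at shutdown, sketched on the assumption that a graceful stop is acceptable for this app: JavaStreamingContext.stop can stop the receivers first and drain the already-received batches before exiting.

import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class GracefulStop {
    // Stops receivers, processes the batches already received, then
    // shuts down the underlying SparkContext.
    static void shutdown(JavaStreamingContext jssc) {
        boolean stopSparkContext = true;
        boolean stopGracefully = true;
        jssc.stop(stopSparkContext, stopGracefully);
    }
}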

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2015-01-21 Thread Mukesh Jha
. On Wed, Jan 21, 2015 at 9:42 AM, Mukesh Jha <me.mukesh@gmail.com> wrote: Hello Guys, I've repartitioned my kafkaStream so that it gets evenly distributed among the executors and the results are better. Still, from the executors page it seems that only 1 executor's 8 cores are getting

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2015-01-21 Thread Mukesh Jha
-programming-guide.html#reducing-the-processing-time-of-each-batch On Tue, Dec 30, 2014 at 1:43 AM, Mukesh Jha <me.mukesh@gmail.com> wrote: Thanks Sandy, It was the issue with the number of cores. Another issue I was facing is that tasks are not getting distributed evenly among all executors

Re: SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException

2015-01-09 Thread Mukesh Jha
Is it possible you're still including the old jars on the classpath in some way? -Sandy On Thu, Jan 8, 2015 at 3:38 AM, Mukesh Jha <me.mukesh@gmail.com> wrote: Hi Experts, I am running spark inside YARN. The spark-streaming job is running fine in CDH-5.0.0 but after

SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException

2015-01-08 Thread Mukesh Jha
: container_e01_1420481081140_0006_01_01) -- Thanks & Regards, *Mukesh Jha <me.mukesh@gmail.com>*

KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread Mukesh Jha
); kafkaConf.put("zookeeper.connection.timeout.ms", "6000"); kafkaConf.put("zookeeper.sync.time.ms", "2000"); kafkaConf.put("rebalance.backoff.ms", "1"); kafkaConf.put("rebalance.max.retries", "20"); -- Thanks & Regards, *Mukesh Jha <me.mukesh@gmail.com>*

Re: KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread Mukesh Jha
); kafkaConf.put("zookeeper.session.timeout.ms", "6000"); kafkaConf.put("zookeeper.connection.timeout.ms", "6000"); kafkaConf.put("zookeeper.sync.time.ms", "2000"); kafkaConf.put("rebalance.backoff.ms", "1"); kafkaConf.put("rebalance.max.retries", "20"); -- Thanks & Regards, Mukesh Jha me.mukesh
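
One commonly suggested pattern for this symptom (an assumption on my part, not something stated in the truncated thread): each createStream call is a single receiver, so several streams can be created and unioned to put a consumer on every partition. A sketch:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import kafka.serializer.StringDecoder;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class MultiReceiver {
    // numReceivers should roughly match the topic's partition count.
    static JavaPairDStream<String, String> build(JavaStreamingContext jssc,
            Map<String, String> kafkaConf, String topic, int numReceivers) {
        Map<String, Integer> topicMap = new HashMap<>();
        topicMap.put(topic, 1); // one consumer thread per receiver
        List<JavaPairDStream<String, String>> streams = new ArrayList<>();
        for (int i = 0; i < numReceivers; i++) {
            streams.add(KafkaUtils.createStream(jssc, String.class,
                    String.class, StringDecoder.class, StringDecoder.class,
                    kafkaConf, topicMap, StorageLevel.MEMORY_AND_DISK_SER()));
        }
        // Union the per-receiver streams back into a single DStream.
        return jssc.union(streams.get(0), streams.subList(1, streams.size()));
    }
}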

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-30 Thread Mukesh Jha
though other executors are idle. I configured *spark.locality.wait=50* instead of the default 3000 ms, which forced the task rebalancing among nodes; let me know if there is a better way to deal with this. On Tue, Dec 30, 2014 at 12:09 AM, Mukesh Jha <me.mukesh@gmail.com> wrote: Makes sense, I've
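
The quoted setting in SparkConf form; 50 ms (versus the 3000 ms default) makes the scheduler give up on data locality almost immediately and hand tasks to any idle executor:

import org.apache.spark.SparkConf;

public class LocalityConf {
    public static SparkConf build() {
        // Wait only 50 ms for a data-local slot before falling back to
        // a less-local one; the 3000 ms default keeps tasks queued.
        return new SparkConf().set("spark.locality.wait", "50");
    }
}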

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-29 Thread Mukesh Jha
on your spark-submit command, it looks like you're only running with 2 executors on YARN. Also, how many cores does each machine have? -Sandy On Mon, Dec 29, 2014 at 4:36 AM, Mukesh Jha me.mukesh@gmail.com wrote: Hello Experts, I'm bench-marking Spark on YARN ( https://spark.apache.org/docs

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-29 Thread Mukesh Jha
And this is with spark version 1.2.0. On Mon, Dec 29, 2014 at 11:43 PM, Mukesh Jha <me.mukesh@gmail.com> wrote: Sorry Sandy, The command is just for reference but I can confirm that there are 4 executors and a driver, as shown in the spark UI page. Each of these machines is an 8-core box

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-29 Thread Mukesh Jha
sandy.r...@cloudera.com wrote: Are you setting --num-executors to 8? On Mon, Dec 29, 2014 at 10:13 AM, Mukesh Jha me.mukesh@gmail.com wrote: Sorry Sandy, The command is just for reference but I can confirm that there are 4 executors and a driver as shown in the spark UI page. Each

Re: SPARK-streaming app running 10x slower on YARN vs STANDALONE cluster

2014-12-29 Thread Mukesh Jha
: When running in standalone mode, each executor will be able to use all 8 cores on the box. When running on YARN, each executor will only have access to 2 cores. So the comparison doesn't seem fair, no? -Sandy On Mon, Dec 29, 2014 at 10:22 AM, Mukesh Jha me.mukesh@gmail.com wrote
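
To make the comparison fair, the YARN side has to request the same parallelism explicitly; a sketch using the standard properties (the equivalents of --num-executors and --executor-cores on spark-submit), with counts matching the 4-machine, 8-core setup described above and a placeholder memory value:

import org.apache.spark.SparkConf;

public class YarnSizingConf {
    public static SparkConf build() {
        return new SparkConf()
                .set("spark.executor.instances", "4") // one per machine
                .set("spark.executor.cores", "8")     // all 8 cores per box
                .set("spark.executor.memory", "4g");  // placeholder
    }
}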

Re: KafkaUtils explicit acks

2014-12-16 Thread Mukesh Jha
other things should also be taken care of. Thanks Jerry *From:* mukh@gmail.com [mailto:mukh@gmail.com] *On Behalf Of *Mukesh Jha *Sent:* Monday, December 15, 2014 1:31 PM *To:* Tathagata Das *Cc:* francois.garil...@typesafe.com; user@spark.apache.org *Subject:* Re: KafkaUtils

Re: KafkaUtils explicit acks

2014-12-14 Thread Mukesh Jha
not aware of any doc yet (did I miss something?) but you can look at the ReliableKafkaReceiver's test suite: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/ReliableKafkaStreamSuite.scala -- FG On Wed, Dec 10, 2014 at 11:17 AM, Mukesh Jha <me.mukesh@gmail.com> wrote
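
The ReliableKafkaReceiver path mentioned here is switched on by the receiver write-ahead log flag (Spark 1.2+); a sketch, with the checkpoint directory as a placeholder since the WAL requires one:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class ReliableReceiverSetup {
    public static JavaStreamingContext build() {
        SparkConf conf = new SparkConf().setAppName("reliable-kafka")
                // Persist received blocks to a write-ahead log before they
                // are acked, so they survive driver/receiver failure;
                // KafkaUtils.createStream then uses ReliableKafkaReceiver.
                .set("spark.streaming.receiver.writeAheadLog.enable", "true");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));
        jssc.checkpoint("hdfs:///tmp/streaming-checkpoint"); // placeholder
        return jssc;
    }
}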

Re: KafkaUtils explicit acks

2014-12-10 Thread Mukesh Jha
Hello Guys, Any insights on this? If I'm not clear enough, my question is: how can I use the kafka consumer and not lose any data in case of failures with spark-streaming. On Tue, Dec 9, 2014 at 2:53 PM, Mukesh Jha <me.mukesh@gmail.com> wrote: Hello Experts, I'm working on a spark app which

KafkaUtils explicit acks

2014-12-09 Thread Mukesh Jha
and it will continue to receive data. 2. https://github.com/dibbhatt/kafka-spark-consumer Txz, *Mukesh Jha me.mukesh@gmail.com*

Lifecycle of RDD in spark-streaming

2014-11-25 Thread Mukesh Jha
/assumptions. -- Thanks & Regards, *Mukesh Jha <me.mukesh@gmail.com>*

Re: Lifecycle of RDD in spark-streaming

2014-11-25 Thread Mukesh Jha
Any pointers, guys? On Tue, Nov 25, 2014 at 5:32 PM, Mukesh Jha <me.mukesh@gmail.com> wrote: Hey Experts, I wanted to understand in detail the lifecycle of rdd(s) in a streaming app. From my current understanding: - rdd gets created out of the realtime input stream. - Transform(s
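
Two knobs that bound how long a streaming app keeps its generated RDDs, offered as assumptions relevant to the question rather than a full answer: spark.streaming.unpersist (default true) releases RDDs once no computation needs them, and remember() extends that window.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class RddLifetime {
    public static JavaStreamingContext build() {
        SparkConf conf = new SparkConf().setAppName("rdd-lifecycle")
                // true (the default) lets Spark unpersist generated RDDs
                // automatically once nothing downstream needs them.
                .set("spark.streaming.unpersist", "true");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));
        // Keep each batch's RDDs at least 5 minutes, e.g. for ad-hoc
        // inspection of recent data.
        jssc.remember(Durations.minutes(5));
        return jssc;
    }
}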

Debugging spark java application

2014-11-19 Thread Mukesh Jha
Hello experts, Is there an easy way to debug a spark java application? I'm putting debug logs in the map function, but there aren't any logs on the console. Also, can I include my custom jars while launching spark-shell and do my PoC there? This might be a naive question but any help here is
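
On the first question: functions passed to map() run on the executors, so their log output lands in each executor's stderr/log4j files, not on the driver console (on YARN they can be pulled with yarn logs -applicationId <id>). On the second: spark-shell accepts a --jars flag, and jars can also be shipped programmatically. A sketch with placeholder paths:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DebugExample {
    private static final Logger LOG =
            LoggerFactory.getLogger(DebugExample.class);

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("local[*]").setAppName("debug-demo")
                // Ship custom jars to executors (same effect as --jars).
                .setJars(new String[] {"/path/to/my-custom.jar"}); // placeholder
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<Integer> doubled = sc.parallelize(Arrays.asList(1, 2, 3))
                .map(x -> {
                    // Runs on an executor: shows up in executor logs,
                    // not the driver console.
                    LOG.debug("processing {}", x);
                    return x * 2;
                });
        doubled.collect().forEach(System.out::println);
    }
}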

Re: Functions in Spark

2014-11-16 Thread Mukesh Jha
Thanks & Regards, *Mukesh Jha <me.mukesh@gmail.com>*