Corresponding HBase bug: https://issues.apache.org/jira/browse/HBASE-12629
On Wed, Nov 23, 2016 at 1:55 PM, Mukesh Jha <me.mukesh@gmail.com> wrote:
The solution is to disable the region size calculation check:
hbase.regionsizecalculator.enable: false
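For context, a minimal sketch (untested) of where that property goes when
the table is read through Spark's newAPIHadoopRDD. It assumes an existing
SparkContext sc; the table name "message" comes from the thread below, and
the rest is the stock HBase TableInputFormat wiring:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "message")
// Skip the per-region size lookup that stalls the driver on ~5k regions
hbaseConf.setBoolean("hbase.regionsizecalculator.enable", false)

val rdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])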
On Sun, Nov 20, 2016 at 9:29 PM, Mukesh Jha <me.mukesh@gmail.com> wrote:
Any ideas folks?
On Fri, Nov 18, 2016 at 3:37 PM, Mukesh Jha <me.mukesh@gmail.com> wrote:
> Hi
>
> I'm accessing multiple regions (~5k) of an HBase table using Spark's
> newAPIHadoopRDD, but the driver is trying to calculate the region size of
> every region.
>
INFO [Driver] RegionSizeCalculator: Calculating region sizes for table "message".
--
Thanks & Regards,
*Mukesh Jha <me.mukesh@gmail.com>*
that only works on brokers 0.10 or higher. A pull request for
> documenting it has been merged, but not deployed.
>
> On Tue, Sep 13, 2016 at 6:46 PM, Mukesh Jha <me.mukesh@gmail.com>
> wrote:
> > Hello fellow sparkers,
> >
> > I'm using spark to consume m
the same?
3) Is there a newer version to consume from kafka-0.10 & kafka-0.9 clusters?
--
Thanks & Regards,
*Mukesh Jha <me.mukesh@gmail.com>*
Try setting spark.streaming.concurrentJobs to the number of concurrent
jobs you want to run.
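For reference, a minimal sketch (untested) of where that knob lives; the
value 2 and the batch interval are illustrative, and note the caveat about
values greater than 1 quoted below:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("concurrent-batches")
  // Undocumented scheduler knob: how many batch jobs may run at once
  .set("spark.streaming.concurrentJobs", "2")
val ssc = new StreamingContext(conf, Seconds(10))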
On 15 Dec 2015 17:35, "ikmal" wrote:
> The best practice is to set the batch interval to less than the processing time. I'm
> sure your application is suffering from constantly
rted. There are issues with fault-tolerance and data loss if that is
> set to more than 1.
Apart from that, a little more information about your job would be helpful.
Thanks
Best Regards
On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
Hi Experts,
My Spark job is failing with the below error.
From the logs I can see that input-3-1424842351600 was added at 5:32
Also, my job is map-only, so there is no shuffle/reduce phase.
On Fri, Feb 27, 2015 at 7:10 PM, Mukesh Jha me.mukesh@gmail.com wrote:
I'm streaming data from a Kafka topic using KafkaUtils, doing some
computation, and writing records to HBase.
The storage level is MEMORY_AND_DISK_SER.
My application runs fine for ~3-4 hours and then hits this issue.
On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
Hi Experts,
My Spark job is failing with the below error.
From the logs I can see that input-3-1424842351600 was added at 5:32:32
and was never purged
:32:43 WARN scheduler.TaskSetManager: Lost task 36.1 in stage
451.0 (TID 22515, chsnmphbase19.usdc2.cloud.com): java.lang.Exception:
Could not compute split, block input-3-1424842355600 not found
at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
--
Thanks & Regards,
*Mukesh Jha <me.mukesh@gmail.com>*
.set("spark.executor.logs.rolling.strategy", "size")
.set("spark.executor.logs.rolling.size.maxBytes", "1024")
.set("spark.executor.logs.rolling.maxRetainedFiles", "3")
Yet it does not roll and continues to grow. Am I missing something
obvious?
thanks,
Duc
--
Thanks & Regards,
*Mukesh Jha <me.mukesh@gmail.com>*
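For reference, the full form of that configuration as a sketch (untested;
these are the pre-1.4 property names used above, Spark 1.4+ renamed
size.maxBytes to maxSize):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Roll executor stdout/stderr by size instead of letting them grow
  .set("spark.executor.logs.rolling.strategy", "size")
  .set("spark.executor.logs.rolling.size.maxBytes", "1024")
  .set("spark.executor.logs.rolling.maxRetainedFiles", "3")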
spark-env.sh
file and /etc/hosts file.
Thanks
Best Regards
On Wed, Feb 18, 2015 at 2:06 PM, Mukesh Jha me.mukesh@gmail.com
wrote:
Hello Experts,
I am running a Spark Streaming app on YARN. I have the Spark History
Server running as well (do we need it running to access the UI
)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Powered by Jetty://
--
Thanks & Regards,
*Mukesh Jha <me.mukesh@gmail.com>*
I think mostly the in-flight data would be lost if
you aren't using any of the fault-tolerance mechanisms.
Thanks
Best Regards
On Wed, Feb 4, 2015 at 5:24 PM, Mukesh Jha me.mukesh@gmail.com
wrote:
Hello Sparkers,
I'm running a Spark Streaming app which reads data from a Kafka topic, does
On Wed, Jan 21, 2015 at 9:42 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
Hello Guys,
I've repartitioned my kafkaStream so that it gets evenly distributed
among the executors, and the results are better.
Still, from the executors page it seems that only 1 executor's 8 cores
are getting
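A sketch (untested) of the repartition approach described above; the
ZooKeeper quorum, group id, topic map, and the 32-partition count are all
illustrative (e.g. 4 executors x 8 cores):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-repartition")
val ssc = new StreamingContext(conf, Seconds(10))

// Receiver-based stream, persisted as serialized bytes on heap + disk
val kafkaStream = KafkaUtils.createStream(
  ssc, "zk1:2181", "example-group", Map("message" -> 1),
  StorageLevel.MEMORY_AND_DISK_SER)

// Spread received blocks across every executor core before the map phase
val evenStream = kafkaStream.repartition(32)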
-programming-guide.html#reducing-the-processing-time-of-each-batch
On Tue, Dec 30, 2014 at 1:43 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
Thanks Sandy, it was the issue with the number of cores.
Another issue I was facing was that tasks were not getting distributed
evenly among all executors.
Is it possible you're still including the old jars on the classpath in
some way?
-Sandy
On Thu, Jan 8, 2015 at 3:38 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
Hi Experts,
I am running a Spark job inside YARN.
The Spark Streaming job runs fine in CDH-5.0.0 but after
container_e01_1420481081140_0006_01_01)
--
Thanks & Regards,
*Mukesh Jha <me.mukesh@gmail.com>*
);
kafkaConf.put("zookeeper.session.timeout.ms", "6000");
kafkaConf.put("zookeeper.connection.timeout.ms", "6000");
kafkaConf.put("zookeeper.sync.time.ms", "2000");
kafkaConf.put("rebalance.backoff.ms", "1");
kafkaConf.put("rebalance.max.retries", "20");
--
Thanks & Regards,
*Mukesh Jha <me.mukesh@gmail.com>*
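For context, a sketch (untested) of the same tuning knobs passed through
the kafkaParams overload of KafkaUtils.createStream; zookeeper.connect,
group.id, and the topic map are illustrative additions required by the
0.8 consumer:

import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

def kafkaStream(ssc: StreamingContext) = {
  // Mirrors the kafkaConf.put(...) calls above
  val kafkaParams = Map(
    "zookeeper.connect" -> "zk1:2181",
    "group.id" -> "example-group",
    "zookeeper.session.timeout.ms" -> "6000",
    "zookeeper.connection.timeout.ms" -> "6000",
    "zookeeper.sync.time.ms" -> "2000",
    "rebalance.backoff.ms" -> "1",
    "rebalance.max.retries" -> "20")
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Map("message" -> 1), StorageLevel.MEMORY_AND_DISK_SER)
}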
though other executors are idle.
I configured spark.locality.wait=50 instead of the default 3000 ms, which
forced task rebalancing among nodes. Let me know if there is a better
way to deal with this.
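That setting in SparkConf form, for reference:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Default is 3000 ms; a small value makes the scheduler give up on
  // data-local placement quickly and hand tasks to idle executors
  .set("spark.locality.wait", "50")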
On Tue, Dec 30, 2014 at 12:09 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
Makes sense, I've
on your spark-submit command, it looks like you're only running with
2 executors on YARN. Also, how many cores does each machine have?
-Sandy
On Mon, Dec 29, 2014 at 4:36 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
Hello Experts,
I'm benchmarking Spark on YARN (
https://spark.apache.org/docs
And this is with Spark version 1.2.0.
On Mon, Dec 29, 2014 at 11:43 PM, Mukesh Jha me.mukesh@gmail.com
wrote:
Sorry Sandy, the command is just for reference, but I can confirm that
there are 4 executors and a driver, as shown in the Spark UI page.
Each of these machines is an 8-core box.
sandy.r...@cloudera.com
wrote:
Are you setting --num-executors to 8?
On Mon, Dec 29, 2014 at 10:13 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
When running in standalone mode, each executor will be able to use all 8
cores on the box. When running on YARN, each executor will only have
access to 2 cores. So the comparison doesn't seem fair, no?
-Sandy
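To make the comparison fair on YARN, the cores have to be requested
explicitly on spark-submit; a sketch matching the 4-machine, 8-core setup
described above (class name, jar, and memory figure are illustrative):

spark-submit --master yarn-cluster \
  --num-executors 4 \
  --executor-cores 8 \
  --executor-memory 8g \
  --class com.example.Benchmark app.jar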
On Mon, Dec 29, 2014 at 10:22 AM, Mukesh Jha me.mukesh@gmail.com
wrote
other things should also be taken care of. :)
Thanks
Jerry
From: mukh@gmail.com [mailto:mukh@gmail.com] On Behalf Of Mukesh Jha
Sent: Monday, December 15, 2014 1:31 PM
To: Tathagata Das
Cc: francois.garil...@typesafe.com; user@spark.apache.org
Subject: Re: KafkaUtils
not aware of any doc yet (did I miss something?) but you can look at
the ReliableKafkaReceiver's test suite:
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/ReliableKafkaStreamSuite.scala
--
FG
On Wed, Dec 10, 2014 at 11:17 AM, Mukesh Jha me.mukesh@gmail.com
wrote
Hello Guys,
Any insights on this??
If I'm not clear enough, my question is: how can I use the Kafka consumer
and not lose any data in case of failures with spark-streaming?
On Tue, Dec 9, 2014 at 2:53 PM, Mukesh Jha me.mukesh@gmail.com wrote:
Hello Experts,
I'm working on a spark app which
and it will continue to receive data.
2. https://github.com/dibbhatt/kafka-spark-consumer
Txz,
*Mukesh Jha <me.mukesh@gmail.com>*
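On option 1, the WAL-backed receiver mentioned earlier in the thread
(ReliableKafkaReceiver, Spark 1.2+) is switched on with a config flag plus
a checkpoint directory; a sketch (untested; checkpoint path illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("reliable-kafka")
  // Persist received Kafka blocks to a write-ahead log so they can be
  // replayed after a failure instead of being lost with the receiver
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// The WAL lives under the checkpoint directory, so one must be set
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")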
/assumptions.
--
Thanks & Regards,
*Mukesh Jha <me.mukesh@gmail.com>*
Any pointers guys?
On Tue, Nov 25, 2014 at 5:32 PM, Mukesh Jha me.mukesh@gmail.com wrote:
Hey Experts,
I wanted to understand in detail the lifecycle of RDD(s) in a
streaming app.
From my current understanding:
- an RDD gets created out of the realtime input stream.
- Transform(s
Hello experts,
Is there an easy way to debug a Spark Java application?
I'm putting debug logs in the map function but there aren't any logs on
the console.
Also, can I include my custom jars while launching spark-shell and do my
PoC there?
This might be a naive question but any help here is
,
*Mukesh Jha <me.mukesh@gmail.com>*
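On the spark-shell question above: custom jars can be included at launch
with the standard --jars flag (path illustrative). Also note that logs
written inside map() run on the executors, so they land in the executors'
stderr (YARN container logs or the workers' work directory), not on the
driver console, which is why nothing shows up locally.

spark-shell --jars /path/to/my-poc.jar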