Hi
I have a Spark application where the driver starts a few tasks, and in each
task, which is a VoidFunction, I have a long-running infinite loop. I have
set speculative execution to false.
Will Spark kill my task after some time (timeout), or will the tasks run
indefinitely?
If tasks will be killed after
you may have to recreate your cluster with the configuration below at EMR
creation:
"Configurations": [
  {
    "Properties": {
      "maximizeResourceAllocation": "false"
    },
    "Classification": "spark"
  }
]
On
Hi
I have a workflow like below:
rdd1 = sc.textFile(input);
rdd2 = rdd1.filter(filterfunc1);
rdd3 = rdd1.filter(filterfunc2);
rdd4 = rdd2.map(maptrans1);
rdd5 = rdd3.map(maptrans2);
rdd6 = rdd4.union(rdd5);
rdd6.foreach(someFunction); // action
1. Do I need to
3. Also, can mapPartitions go out of memory if I return an ArrayList of the
whole partition after processing the partition? What is the alternative to
this if it can fail?
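A hedged sketch of the alternative: instead of materializing the whole
partition into an ArrayList, return a lazy Iterable that transforms records
one at a time, so only one record needs to be in memory at once. The String
types and processRecord are illustrative assumptions, not from the thread:

  import java.util.Iterator;
  import org.apache.spark.api.java.function.FlatMapFunction;

  rdd.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
    @Override
    public Iterable<String> call(final Iterator<String> records) {
      // Lazy Iterable: Spark pulls one transformed record at a time
      // instead of holding the whole partition's output in an ArrayList.
      return new Iterable<String>() {
        @Override
        public Iterator<String> iterator() {
          return new Iterator<String>() {
            public boolean hasNext() { return records.hasNext(); }
            public String next() { return processRecord(records.next()); } // hypothetical transform
            public void remove() { throw new UnsupportedOperationException(); }
          };
        }
      };
    }
  });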
On Fri, Jan 27, 2017 at 9:32 AM, Shushant Arora <shushantaror...@gmail.com>
wrote:
> Hi
>
> I have two
Hi
I have two transformations in series.
rdd1 = sourcerdd.map(new Function(...)); //step1
rdd2 = rdd1.mapPartitions(new Function(...)); //step2
1. Are map and mapPartitions narrow dependencies? Does Spark optimise the DAG
and execute step 1 and step 2 in a single stage, or will there be two stages?
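Both map and mapPartitions are narrow dependencies, so Spark pipelines them
into a single stage; a shuffle (e.g. repartition, reduceByKey) is what starts
a new stage. You can check on your own RDDs with toDebugString, which prints
the lineage with its shuffle boundaries. A minimal sketch; sourcerdd and the
function bodies are placeholders:

  import java.util.*;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.function.*;

  JavaRDD<String> rdd1 = sourcerdd.map(new Function<String, String>() {
    public String call(String s) { return s.trim(); }       // step 1, per record
  });
  JavaRDD<String> rdd2 = rdd1.mapPartitions(
      new FlatMapFunction<Iterator<String>, String>() {
    public Iterable<String> call(Iterator<String> it) {     // step 2, per partition
      List<String> out = new ArrayList<String>();
      while (it.hasNext()) out.add(it.next());
      return out;
    }
  });
  // No shuffle between step 1 and step 2, so the lineage shows one stage:
  System.out.println(rdd2.toDebugString());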
in their forum or something.
>
> // maropu
>
>
> On Mon, Nov 14, 2016 at 11:20 PM, Shushant Arora <
> shushantaror...@gmail.com> wrote:
>
>> 1. No, I want to implement a low-level consumer on the Kinesis stream,
>> so I need to stop the worker once it reads the latest seque
" not
> enough for your usecase?
>
> On Mon, Nov 14, 2016 at 10:23 PM, Shushant Arora <
> shushantaror...@gmail.com> wrote:
>
>> Thanks!
>> Is there a way to get the latest sequence number of all shards of a
>> kinesis stream?
>>
>>
>>
>>
maropu
>
>
> On Sun, Nov 13, 2016 at 12:08 AM, Shushant Arora <
> shushantaror...@gmail.com> wrote:
>
>> Hi
>>
>> Is spark.streaming.blockInterval for the Kinesis input stream
>> hardcoded to 1 sec, or is it configurable? The time interval at which rec
Hi
In receiver-based Spark Streaming, when the receiver gets data and stores it
in blocks for workers to process, how many blocks does the receiver give to
a worker?
Say I have a streaming app with a 30-sec batch interval; what will happen?
1. For the first batch (first 30 sec) there will not be any data for
Hi
Is spark.streaming.blockInterval for the Kinesis input stream hardcoded
to 1 sec, or is it configurable? The time interval at which the receiver
fetches data from Kinesis.
This means the stream batch interval cannot be less than
spark.streaming.blockInterval, and this should be configurable. Also is
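For receiver-based streams in general, the block interval is configurable via
spark.streaming.blockInterval (default 200 ms), and batch interval divided by
block interval determines how many blocks (= partitions) each batch gets.
Whether the Kinesis receiver's fetch cadence honours it is exactly the
question here; the reply below suggests it is fixed in the current
implementation. A sketch of setting the generic property, as an assumption
to verify:

  import org.apache.spark.SparkConf;

  SparkConf conf = new SparkConf()
      // Interval (ms) at which received data is chunked into blocks:
      .set("spark.streaming.blockInterval", "200");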
ntation, yes.
>
> Also, we currently cannot disable the interval checkpoints.
>
> On Tue, Oct 25, 2016 at 11:53 AM, Shushant Arora <
> shushantaror...@gmail.com> wrote:
>
>> Thanks!
>>
>> Are Kinesis streams receiver-based only? Is there a non-receiver b
> replicated across executors.
> However, if all the executors that have the replicated data crash,
> IIUC data loss occurs.
>
> // maropu
>
> On Mon, Oct 24, 2016 at 4:43 PM, Shushant Arora <shushantaror...@gmail.com
> > wrote:
>
>> Does spark streaming c
Does the Spark Streaming consumer for Kinesis use the Kinesis Client Library
and mandate checkpointing the sequence numbers of shards in DynamoDB?
Will it lead to data loss if consumed data records are not yet processed and
Kinesis has checkpointed the consumed sequence numbers in DynamoDB and Spark
Hi
I have a transformation on a pair RDD using a flatMap function.
1. Can I detect in flatMap whether the current record is the last record of
the partition being processed, and
2. what is the partition index of this partition?
public Iterable<...> call(Tuple2<...> t)
throws
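Both can be had by switching to mapPartitionsWithIndex, which hands you the
partition index and the raw iterator, so the last record is simply the one
for which hasNext() turns false afterwards. A hedged sketch with placeholder
String types:

  import java.util.*;
  import org.apache.spark.api.java.function.Function2;

  rdd.mapPartitionsWithIndex(
      new Function2<Integer, Iterator<String>, Iterator<String>>() {
    @Override
    public Iterator<String> call(Integer partitionIndex, Iterator<String> records) {
      List<String> out = new ArrayList<String>();
      while (records.hasNext()) {
        String rec = records.next();
        boolean isLast = !records.hasNext();  // last record of this partition
        out.add(partitionIndex + "\t" + isLast + "\t" + rec);
      }
      return out.iterator();
    }
  }, false);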
Hi
I want to enquire: does Spark Streaming have a limitation of 500 ms minimum
batch interval?
Is Storm better than Spark Streaming for real time (for a latency of just
50-100 ms)? In Spark Streaming, can parallel batches be run? If yes, is it
supported at production level?
Thanks
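On parallel batches: there is an undocumented scheduler setting,
spark.streaming.concurrentJobs, that allows jobs from more than one batch to
run at once. It is generally considered experimental rather than a supported
production feature, so treat this sketch as an assumption to verify:

  import org.apache.spark.SparkConf;

  SparkConf conf = new SparkConf()
      // Undocumented; default 1. Values >1 let jobs of multiple batches overlap.
      .set("spark.streaming.concurrentJobs", "2");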
Hi
I have a flow like below
1. rdd1 = somesource.transform();
2. transformedrdd1 = rdd1.transform(..);
3. transformedrdd2 = rdd1.transform(..);
4. transformedrdd1.action();
Do I need to persist rdd1 to optimise steps 2 and 3? Or, since there is no
lineage breakage, will it work without persist?
3. And does the same behavior apply to a streaming application also?
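Without persist, each action recomputes rdd1's lineage from the source, once
per branch; persisting at the branch point avoids that. A minimal sketch,
reusing the pseudo-names above (the transform functions and actions are
placeholders):

  import org.apache.spark.storage.StorageLevel;

  rdd1.persist(StorageLevel.MEMORY_AND_DISK()); // cache at the branch point
  transformedrdd1 = rdd1.transform(..);         // step 2 reads the cache
  transformedrdd2 = rdd1.transform(..);         // step 3 reads the cache
  transformedrdd1.action();                     // computes rdd1 once and caches it
  transformedrdd2.action();                     // reuses the cached rdd1
  rdd1.unpersist();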
On Sat, May 21, 2016 at 7:44 PM, Shushant Arora <shushantaror...@gmail.com>
wrote:
> And will it allocate rest executors when other containers get freed which
> were occupied by other hadoop jobs/spark
re you ask for enough memory: YARN is a lot more unforgiving about
> memory use than it is about CPU
>
> > On 20 Apr 2016, at 16:21, Shushant Arora <shushantaror...@gmail.com>
> wrote:
> >
> > I am running a spark application on yarn cluster.
> >
>
I am running a spark application on yarn cluster.
Say I have 100 vcores available in the cluster, and I start a Spark
application with --num-executors 200 --executor-cores 2 (so I need a total of
200*2 = 400 vcores), but only 100 are available in my cluster.
What will happen? Will the job abort, or will it be
Can two stages of a single job run in parallel in Spark?
e.g., one stage is a map transformation and another is a repartition on the
mapped RDD:
rdd.map(function, 100).repartition(30);
Can it happen that while the map transformation is running 100 tasks, after a
few of them, say 10, are finished, Spark
Hi
I have created a jira for this feature
https://issues.apache.org/jira/browse/SPARK-12524
Please vote for this feature if it is necessary. I would like to implement
this feature.
Thanks
Shushant
On Wed, Dec 2, 2015 at 1:14 PM, Rajat Kumar
wrote:
> What if I don't have
Hi
I have a JavaPairRDD pairrdd.
When I do rdd.persist(StorageLevel.MEMORY_AND_DISK()),
it throws an exception:
com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 100
Serialization trace:
familyMap (org.apache.hadoop.hbase.client.Put)
I have
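The usual fix for "Encountered unregistered class ID" is to register the
classes reached through the persisted objects with Kryo up front, so
serialization and deserialization agree on class IDs. A sketch, assuming
HBase Put (the class named in the trace) is what the pair RDD carries:

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.spark.SparkConf;

  SparkConf conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Register every class Kryo will meet through the Put, as needed:
      .registerKryoClasses(new Class<?>[]{ Put.class });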
, Abhijeet <absi...@informatica.com>
wrote:
> Yes, Parquet has min/max.
>
>
>
> *From:* Cheng Lian [mailto:l...@databricks.com]
> *Sent:* Monday, December 07, 2015 11:21 AM
> *To:* Ted Yu
> *Cc:* Shushant Arora; user@spark.apache.org
> *Subject:* Re: parquet file doub
partitioner and make all values of the same key land on the same node, with
the number of partitions equal to the number of distinct keys.
On Sat, Nov 21, 2015 at 11:21 PM, Shushant Arora <shushantaror...@gmail.com>
wrote:
> Hi
>
> I have few doubts
>
> 1.does
> rdd.saveasNewAPIHadoopFile(outpu
er or had a
> rebalance... why did you say "I am getting Connection timeout in my code."
>
> You've asked questions about this exact same situation before, the answer
> remains the same
>
> On Thu, Sep 10, 2015 at 9:44 AM, Shushant Arora <shushantaror...@gmail.com
> >
consuming messages from kafka");
}
On Thu, Sep 10, 2015 at 6:58 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Post the actual stacktrace you're getting
>
> On Thu, Sep 10, 2015 at 12:21 AM, Shushant Arora <
> shushantaror...@gmail.com> wrote:
>
>> Ex
Executors in Spark Streaming 1.3 fetch messages from Kafka in batches; what
happens when an executor takes a longer time to complete a fetch batch,
say in
directKafkaStream.foreachRDD(new Function<JavaRDD<...>, Void>() {
@Override
public Void call(JavaRDD<...> v1) throws Exception {
wrote:
> The answer already given is correct. You shouldn't doubt this, because
> you've already seen the shuffle data change accordingly.
>
> On Fri, Sep 4, 2015 at 11:25 AM, Shushant Arora <shushantaror...@gmail.com
> > wrote:
>
>> But the Kafka stream has an underlying RDD which
1. Does repartitioning a direct Kafka stream shuffle only the offsets, or the
actual Kafka messages, across executors?
Say I have a direct kafkastream
directKafkaStream.repartition(numexecutors).mapPartitions(new
FlatMapFunction<Iterator<Tuple2<...>>, String>(){
...
}
Say originally I have
t failing if the external server is
> down, and scripting monitoring / restarting of your job.
>
> On Tue, Sep 1, 2015 at 11:19 AM, Shushant Arora <shushantaror...@gmail.com
> > wrote:
>
>> Since in my app, after processing the events, I am posting the events to
>> some exter
Hi
In Spark Streaming 1.3 with Kafka, when does the driver fetch the latest
offsets for a run: at the start of each batch, or at the time the batch gets
queued?
Say a few of my batches take longer to complete than their batch interval,
so some batches will go in the queue. Will the driver wait for queued
, 2015 at 8:57 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Honestly I'd concentrate more on getting your batches to finish in a
> timely fashion, so you won't even have the issue to begin with...
>
> On Tue, Sep 1, 2015 at 10:16 AM, Shushant Arora <shushantaror...@gmail.com
t condition is set at previous batch run time.
On Tue, Sep 1, 2015 at 7:09 PM, Cody Koeninger <c...@koeninger.org> wrote:
> It's at the time compute() gets called, which should be near the time the
> batch should have been queued.
>
> On Tue, Sep 1, 2015 at 8:02 AM, Shush
th offset ranges after
> compute is called, things are going to get out of whack.
>
> e.g. checkpoints are no longer going to correspond to what you're actually
> processing
>
> On Tue, Sep 1, 2015 at 10:04 AM, Shushant Arora <shushantaror...@gmail.com
> > wrote:
>
>>
will "help", but it's
> better to figure out what's going on with kafka.
>
> On Wed, Aug 26, 2015 at 9:07 PM, Shushant Arora <shushantaror...@gmail.com
> > wrote:
>
>> Hi
>>
>> My streaming application gets killed with below e
What is the default buffer in Spark Streaming 1.3 for Kafka messages?
Say in this run it has to fetch messages from offset 1 to 1. Will it fetch
them all in one go, or does it internally fetch messages in smaller batches?
Is there any setting to configure the number of offsets fetched in one batch?
Hi
My streaming application gets killed with the below error:
15/08/26 21:55:20 ERROR kafka.DirectKafkaInputDStream:
ArrayBuffer(kafka.common.NotLeaderForPartitionException,
kafka.common.NotLeaderForPartitionException,
kafka.common.NotLeaderForPartitionException,
to unlimited again .
On Wed, Aug 26, 2015 at 9:32 PM, Cody Koeninger c...@koeninger.org wrote:
see http://kafka.apache.org/documentation.html#consumerconfigs
fetch.message.max.bytes
in the kafka params passed to the constructor
On Wed, Aug 26, 2015 at 10:39 AM, Shushant Arora
shushantaror
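Concretely, that means putting the consumer property into the kafkaParams
map handed to createDirectStream. A sketch against the Spark 1.3 Kafka API;
the broker address, size, jssc, and topics set are placeholder assumptions:

  import java.util.*;
  import kafka.serializer.DefaultDecoder;
  import org.apache.spark.streaming.api.java.JavaPairInputDStream;
  import org.apache.spark.streaming.kafka.KafkaUtils;

  Map<String, String> kafkaParams = new HashMap<String, String>();
  kafkaParams.put("metadata.broker.list", "broker1:9092");
  // Upper bound on bytes fetched per partition per request:
  kafkaParams.put("fetch.message.max.bytes", "1048576");
  JavaPairInputDStream<byte[], byte[]> stream = KafkaUtils.createDirectStream(
      jssc, byte[].class, byte[].class,
      DefaultDecoder.class, DefaultDecoder.class,
      kafkaParams, topics);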
task? Or is it created once only and that
is getting closed somehow?
On Sat, Aug 22, 2015 at 9:41 AM, Shushant Arora shushantaror...@gmail.com
wrote:
It comes at the start of each task when there is new data inserted in Kafka
(the data inserted is very little).
The Kafka topic has 300 partitions - data
...@sigmoidanalytics.com
wrote:
Can you try some other consumer and see if the issue still exists?
On Aug 22, 2015 12:47 AM, Shushant Arora shushantaror...@gmail.com
wrote:
The exception comes when the client also has many connections to some other
external server.
So I think the exception is coming
, Aug 22, 2015 at 7:24 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you try some other consumer and see if the issue still exists?
On Aug 22, 2015 12:47 AM, Shushant Arora shushantaror...@gmail.com
wrote:
The exception comes when the client also has many connections to some other
external
what's going on from a networking point of view,
post a minimal reproducible code sample that demonstrates the issue, so it
can be tested in a different environment.
On Fri, Aug 21, 2015 at 4:06 AM, Shushant Arora
shushantaror...@gmail.com wrote:
Hi
Getting below error in spark
Hi
Getting the below error in Spark Streaming 1.3 while consuming from Kafka
using the direct Kafka stream. A few tasks fail in each run.
What is the reason for / solution to this error?
15/08/21 08:54:54 ERROR executor.Executor: Exception in task 262.0 in
stage 130.0 (TID 16332)
, Shushant Arora shushantaror...@gmail.com
wrote:
To correct myself - KafkaRDD[K, V, U, T, R] is a subclass of RDD[R], but
Option[KafkaRDD[K, V, U, T, R]] is not a subclass of Option[RDD[R]];
In Scala, C[T'] is a subclass of C[T] (when C is covariant) as per
https://twitter.github.io/scala_school/type-basics.html
to achieve this in Java for overriding
DirectKafkaInputDStream?
On Wed, Aug 19, 2015 at 12:45 AM, Shushant Arora shushantaror...@gmail.com
wrote:
But KafkaRDD[K, V, U, T, R] is not a subclass of RDD[R]: as per Java, generic
(covariant) inheritance is not supported, so a derived class cannot return a
different
and KafkaRDD is a subclass of RDD
On Tue, Aug 18, 2015 at 1:12 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Is it that in Scala a derived class is allowed to have any return type?
And the streaming jar is originally created in Scala, so is it allowed for
DirectKafkaInputDStream to return
] so it should have failed?
On Tue, Aug 18, 2015 at 7:28 PM, Cody Koeninger c...@koeninger.org wrote:
The superclass method in DStream is defined as returning an Option[RDD[T]]
On Tue, Aug 18, 2015 at 8:07 AM, Shushant Arora shushantaror...@gmail.com
wrote:
Getting compilation error
.
On Wed, Aug 12, 2015 at 1:03 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Hi Cody
Can you help here: does streaming 1.3 have any API for not consuming any
messages in the next few runs?
Thanks
-- Forwarded message --
From: Shushant Arora shushantaror...@gmail.com
Date: Wed
calling
jssc.stop()- since that leads to deadlock.
On Tue, Aug 11, 2015 at 9:54 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Does stopping the streaming context in the onBatchCompleted event
of StreamingListener not kill the app?
I have below code in streaming listener
public void
=fXnNEq1v3VA
On Mon, Aug 10, 2015 at 4:32 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Hi
How can I avoid duplicate processing of Kafka messages in Spark Streaming
1.3 because of executor failure?
1. Can I somehow access the accumulators of a failed task in the retry task
to skip those many
down.
Runtime.getRuntime().addShutdownHook does not seem to be working. YARN
kills the application immediately and does not call the shutdown hook
callback.
On Sun, Aug 9, 2015 at 12:45 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Hi
How to ensure in spark streaming 1.3 with kafka
a flag. When you get the signal from RPC, you can just
call context.stop(stopGracefully = true). Though note that this is
blocking, so you gotta be careful about doing blocking calls on the RPC
thread.
On Mon, Aug 10, 2015 at 12:24 PM, Shushant Arora
shushantaror...@gmail.com wrote:
By RPC you
Hi
How can I avoid duplicate processing of Kafka messages in Spark Streaming 1.3
because of executor failure?
1. Can I somehow access the accumulators of the failed task in the retry task
to skip those events which were already processed by the failed task on this
partition?
2. Or will I have to persist each
stop the context and terminate. This is more robust than
leveraging shutdown hooks.
On Mon, Aug 10, 2015 at 11:56 AM, Shushant Arora
shushantaror...@gmail.com wrote:
Any help in best recommendation for gracefully shutting down a spark
stream application ?
I am running it on yarn
Hi
How do I ensure in Spark Streaming 1.3 with Kafka that when an application is
killed, the last running batch is fully processed and offsets are written to
the checkpoint dir?
On Fri, Aug 7, 2015 at 8:56 AM, Shushant Arora shushantaror...@gmail.com
wrote:
Hi
I am using spark stream 1.3 and using
In a streaming application, how many times is the map transformation object
created?
Say I have
directKafkaStream.repartition(numPartitions).mapPartitions(
    new FlatMapFunction_derivedclass(configs));
class FlatMapFunction_derivedclass {
  FlatMapFunction_derivedclass(Config config) {
  }
  @Override
which is the scheduler on your cluster. Just check the RM UI scheduler tab
and see your user and the max limit of vcores for that user. If other
applications of that user currently occupy up to this user's max vcores,
then that could be the reason vcores are not allocated to this user, but for
Hi
I am using Spark Streaming 1.3 and a custom checkpoint to save Kafka
offsets.
1.Is doing
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
jssc.stop(true, true);
System.out.println("Inside Add Shutdown Hook");
}
});
to handle stop is safe ?
2.And I
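A commonly suggested alternative to the shutdown hook (in line with the
advice elsewhere in these threads to call stop(stopGracefully = true) on a
signal) is to poll an external flag from a separate thread, since a YARN kill
may not run hooks. A sketch; the marker-file path is an assumption:

  final java.io.File marker = new java.io.File("/tmp/stop-streaming"); // assumed path
  new Thread(new Runnable() {
    @Override
    public void run() {
      try {
        while (!marker.exists()) {
          Thread.sleep(5000);                // poll for the stop signal
        }
        // stopSparkContext = true, stopGracefully = true:
        // finish the batches already received before shutting down.
        jssc.stop(true, true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }).start();
  jssc.start();
  jssc.awaitTermination();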
Hi
For checkpointing and using the fromOffsets argument - say for the first time
when my app starts I don't have any previous state stored, and I want to
start consuming from the largest offset.
1. Is it possible to specify that in the fromOffsets API? I don't want to use
another API which returns
1. In Spark 1.3 (non-receiver) - if my batch interval is 1 sec and I don't
set spark.streaming.kafka.maxRatePerPartition, is the default behaviour to
bring all messages from Kafka from the last offset to the current offset?
Say the number of messages was large and it took 5 sec to process them, so
will all jobs
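If left unset in 1.3, the direct stream's first batch tries to read
everything from the last checkpointed offset up to the current one, so a
backlog lands in a single batch. Capping the rate is the usual guard; the
value is messages per second per Kafka partition:

  import org.apache.spark.SparkConf;

  SparkConf conf = new SparkConf()
      // Max messages per second per partition for the direct Kafka stream:
      .set("spark.streaming.kafka.maxRatePerPartition", "1000");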
Is there any setting to allow --files to copy jars from the driver to
executor nodes?
When I pass some jar files using --files to executors and add them to the
executor classpath, it throws a file-not-found exception:
15/08/03 07:59:50 WARN TaskSetManager: Lost task 8.0 in stage 0.0 (TID 8,
Hi
I am using spark streaming 1.3 and using checkpointing.
But job is failing to recover from checkpoint on restart.
For broadcast variable it says :
1.WARN TaskSetManager: Lost task 15.0 in stage 7.0 (TID 1269, hostIP):
java.lang.ClassCastException: [B cannot be cast to
.
rdd.map { x => // use accum
}
On Wed, Jul 29, 2015 at 1:15 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Hi
I am using spark streaming 1.3 and using checkpointing.
But job is failing to recover from checkpoint on restart.
For broadcast variable it says :
1.WARN TaskSetManager: Lost
Hi
I am processing Kafka messages using Spark Streaming 1.3.
I am using the mapPartitions function to process Kafka messages.
How can I access the offset number of the individual message being processed?
JavaPairInputDStream<byte[], byte[]> directKafkaStream
=KafkaUtils.createDirectStream(..);
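With the direct stream, the usual pattern is to grab the offset ranges at the
RDD level in foreachRDD, before the per-partition processing; within a
partition, the n-th message's offset is that range's fromOffset + n. A sketch
against the Spark 1.3 API:

  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.spark.api.java.function.Function;
  import org.apache.spark.streaming.kafka.HasOffsetRanges;
  import org.apache.spark.streaming.kafka.OffsetRange;

  directKafkaStream.foreachRDD(new Function<JavaPairRDD<byte[], byte[]>, Void>() {
    @Override
    public Void call(JavaPairRDD<byte[], byte[]> rdd) {
      // Cast the underlying RDD to read each partition's Kafka offset range:
      OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
      for (OffsetRange r : ranges) {
        System.out.println(r.topic() + " " + r.partition() + " "
            + r.fromOffset() + ".." + r.untilOffset());
      }
      return null;
    }
  });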
running application where you want to check that
you didn't see the same value before, and check that for every value, you
probably need a key-value store, not RDD.
On Sun, Jul 26, 2015 at 7:38 PM Shushant Arora shushantaror...@gmail.com
wrote:
Hi
I have a requirement for processing large
Hi
I have a requirement for processing large events while ignoring duplicates at
the same time.
Events are consumed from Kafka, and each event has an eventid. It may happen
that an event was already processed and comes again at some other offset.
1. Can I use a Spark RDD to persist processed events and
Hi
I am running a Spark Streaming app on YARN, using Apache httpasyncclient 4.1.
This client jar internally depends on http-core-4.4.1.jar.
An old version of this jar (httpcore-4.2.5.jar) is also
present on the classpath and has higher priority (coming earlier
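One commonly tried workaround (experimental in Spark 1.3+, so an assumption
to verify against your version) is to give the jars shipped with the
application precedence over the cluster's classpath:

  import org.apache.spark.SparkConf;

  SparkConf conf = new SparkConf()
      // Experimental: prefer the application's jars over the cluster's.
      .set("spark.executor.userClassPathFirst", "true")
      .set("spark.driver.userClassPathFirst", "true");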
rounds.
If you are using the Kafka receiver based approach (not Direct), then the
raw Kafka data is stored in the executor memory. If you are using Direct
Kafka, then it is read from Kafka directly at the time of filtering.
TD
On Tue, Jul 21, 2015 at 9:34 PM, Shushant Arora shushantaror
...@sigmoidanalytics.com
wrote:
I'd suggest upgrading to 1.4 as it has better metrics and UI.
Thanks
Best Regards
On Mon, Jul 20, 2015 at 7:01 PM, Shushant Arora
shushantaror...@gmail.com wrote:
Is coalesce not applicable to kafkaStream? How do I do coalesce on a
kafkadirectstream? its
on a partition as
opposed to spawning a future per record in the RDD for example.
On Tue, Jul 21, 2015 at 4:11 AM, Shushant Arora
shushantaror...@gmail.com wrote:
Hi
Can I create user threads in executors.
I have a streaming app where after processing I have a requirement to
push events
Hi
Can I create user threads in executors.
I have a streaming app where, after processing, I have a requirement to push
events to an external system. Each post request costs ~90-100 ms.
To make the posts parallel, I cannot use the same thread, because that is
limited by the number of cores available in the system. Can I
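Yes; as the reply above suggests, the usual shape is one pool per partition
rather than a future per record. A hedged sketch: a fixed thread pool inside
foreachPartition posts records concurrently and is drained before the task
finishes. postEvent and the pool size are assumptions:

  import java.util.Iterator;
  import java.util.concurrent.*;
  import org.apache.spark.api.java.function.VoidFunction;

  rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
    @Override
    public void call(Iterator<String> records) throws Exception {
      // One pool per partition/task, not per record:
      ExecutorService pool = Executors.newFixedThreadPool(8); // size: assumption
      while (records.hasNext()) {
        final String event = records.next();
        pool.submit(new Runnable() {
          public void run() { postEvent(event); } // hypothetical ~90-100 ms HTTP post
        });
      }
      pool.shutdown();
      // Block so the Spark task does not complete before the posts do:
      pool.awaitTermination(10, TimeUnit.MINUTES);
    }
  });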
Hi
1. I am using Spark Streaming 1.3 for reading from a Kafka queue and pushing
events to an external source.
I passed 20 executors in my job, but it is showing only 6 in the executor tab?
When I used high-level streaming 1.2, it showed 20 executors. My cluster
is a 10-node YARN cluster with each node
call for getting offsets of each partition separately, or does it get all
partitions' new offsets in a single call? I mean, will reducing the number of
partitions in Kafka help improve the performance?
On Mon, Jul 20, 2015 at 4:52 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Hi
1.I am using spark
Does Spark Streaming 1.3 launch a task for each partition offset range,
whether that is 0 or not?
If yes, how can I enforce it not to launch tasks for empty RDDs? Not able
to use coalesce on directKafkaStream.
Shall we enforce repartitioning always before processing the direct stream?
use case
for
the same server.
Cheers
On Fri, Jul 17, 2015 at 5:15 AM, Shushant Arora shushantaror...@gmail.com
wrote:
Thanks !
My key is random (hexadecimal), so a hot spot should not be created.
Is there any concept of a bulk put? Say I want to raise one put request
for a batch of size 1000, which
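There is: the HBase client accepts a list of Puts in one call, so you can
buffer per partition and flush in batches. A sketch against the HBase
0.98/1.x client API, meant to run inside a foreachPartition; the table name,
batch size, and buildPut are assumptions:

  import java.util.*;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;

  Configuration hconf = HBaseConfiguration.create();
  HTable table = new HTable(hconf, "events");   // table name: assumption
  List<Put> batch = new ArrayList<Put>(1000);
  while (records.hasNext()) {
    batch.add(buildPut(records.next()));        // hypothetical row builder
    if (batch.size() == 1000) {                 // flush in bulk
      table.put(batch);
      batch.clear();
    }
  }
  if (!batch.isEmpty()) table.put(batch);
  table.close();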
an HBase issue when it comes to
design.
HTH
-Mike
On Jul 15, 2015, at 11:46 AM, Shushant Arora shushantaror...@gmail.com
wrote:
Hi
I have a requirement of writing in hbase table from Spark streaming app
after some processing.
Is Hbase put operation the only way of writing
Hi
I have a requirement of writing to an HBase table from a Spark Streaming app
after some processing.
Is the HBase put operation the only way of writing to HBase, or is there any
specialised connector or RDD of Spark for HBase writes?
Should bulk load to HBase from a streaming app be avoided if the output of
I am running a Spark application on a YARN-managed cluster.
When I specify --executor-cores 4, it fails to start the application.
I am starting the app as:
spark-submit --class classname --num-executors 10 --executor-cores
5 --master masteradd jarname
Exception in thread main
containers.
And these 10 containers will be released only at the end of the streaming
application, never in between, if none of them fails.
On Tue, Jul 14, 2015 at 11:32 PM, Marcelo Vanzin van...@cloudera.com
wrote:
On Tue, Jul 14, 2015 at 10:55 AM, Shushant Arora
shushantaror...@gmail.com wrote
15, 2015, at 01:57, Shushant Arora shushantaror...@gmail.com
wrote:
I am running spark application on yarn managed cluster.
When I specify --executor-cores 4 it fails to start the application.
I am starting the app as
spark-submit --class classname --num-executors 10 --executor-cores
Is yarn.scheduler.maximum-allocation-vcores the setting for max vcores per
container?
What is the setting for the max limit of --num-executors?
On Tue, Jul 14, 2015 at 11:18 PM, Marcelo Vanzin van...@cloudera.com
wrote:
On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora
shushantaror...@gmail.com
?
On Tue, Jul 14, 2015 at 10:52 PM, Marcelo Vanzin van...@cloudera.com
wrote:
On Tue, Jul 14, 2015 at 9:57 AM, Shushant Arora shushantaror...@gmail.com
wrote:
When I specify --executor-cores 4 it fails to start the application.
When I give --executor-cores as 4 , it works fine.
Do you have
, Shushant Arora shushantaror...@gmail.com
wrote:
1.spark streaming 1.3 creates as many RDD Partitions as there are kafka
partitions in topic. Say I have 300 partitions in topic and 10 executors
and each with 3 cores so , is it means at a time only 10*3=30 partitions
are processed and then 30 like
1. Spark Streaming 1.3 creates as many RDD partitions as there are Kafka
partitions in the topic. Say I have 300 partitions in the topic and 10
executors, each with 3 cores. Does that mean that at a time only 10*3 = 30
partitions are processed, and then 30 like that, since executors launch tasks per RDD
...@koeninger.org wrote:
It's the consumer version. Should work with 0.8.2 clusters.
On Thu, Jul 9, 2015 at 11:10 AM, Shushant Arora shushantaror...@gmail.com
wrote:
Does Spark Streaming 1.3 require Kafka version 0.8.1.1, and is it not
compatible with Kafka 0.8.2?
As per the Maven dependency of Spark
Does Spark Streaming 1.3 require Kafka version 0.8.1.1, and is it not
compatible with Kafka 0.8.2?
As per the Maven dependency of Spark Streaming 1.3 with Kafka:
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.3.0</version>
1. Is the creation of a read-only singleton object in each map function the
same as a broadcast object, since a singleton never gets garbage collected
unless the executor shuts down? The aim is to avoid creating a complex object
at each batch interval of a Spark Streaming app.
2. Why does JavaStreamingContext's sc()
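On point 1: a lazily initialized static holder is the usual per-JVM pattern.
It is created once per executor JVM on first use and reused across batches,
unlike per-batch construction; unlike a broadcast, it is never shipped from
the driver. A sketch; ComplexObject is a placeholder:

  class Holder {
    private static ComplexObject instance;   // hypothetical heavy object
    static synchronized ComplexObject get() {
      if (instance == null) {
        instance = new ComplexObject();      // built once per executor JVM
      }
      return instance;
    }
  }
  // Inside a map/mapPartitions function running on the executor:
  //   ComplexObject obj = Holder.get();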
Is it possible to pause and resume a streaming app?
I have a streaming app which reads events from Kafka and posts to an
external source. I want to pause the app when the external source is down
and resume it automatically when it comes back.
Is it possible to pause the app, and is it possible to
I have a requirement to write to a Kafka queue from a Spark Streaming
application.
I am using Spark 1.2 streaming. Since different executors in Spark are
allocated at each run, instantiating a new Kafka producer at each run
seems a costly operation. Is there a way to reuse objects in processing
In Spark Streaming 1.2, are offsets of consumed Kafka messages updated in
ZooKeeper only after writing to the WAL (if WAL and checkpointing are
enabled), or does it depend on the kafkaParams used when initialising the
kafkaDstream?
Map<String, String> kafkaParams = new HashMap<String, String>();
commit is disabled, no part will
call commitOffset; you need to call this API yourself.
Also, Kafka's offset commit mechanism is actually timer-based, so it is
asynchronous with replication.
*From:* Shushant Arora [mailto:shushantaror...@gmail.com]
*Sent:* Monday, July 6, 2015 8
.
On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora shushantaror...@gmail.com
wrote:
I have a requirement to write in kafka queue from a spark streaming
application.
I am using spark 1.2 streaming. Since different executors in spark are
allocated at each run so instantiating a new kafka producer
, Shushant Arora shushantaror...@gmail.com
wrote:
What's the difference between foreachPartition and mapPartitions for a
DStream? Both work at partition granularity.
One is a transformation and the other is an action, but if I call an action
afterwards on mapPartitions also, which one is more
Hi
Is it possible to write a custom RDD in Java?
The requirement is: I have a list of SQL Server tables that need to be dumped
into HDFS.
So I have a
List<String> tables = {dbname.tablename, dbname.tablename2, ..};
then
JavaRDD<String> rdd = javasparkcontext.parallelize(tables);
JavaRDD<String>
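You likely don't need a custom RDD for this; as the replies below also hint,
you can parallelize the table names and open a JDBC connection per partition
(or use the existing JdbcRDD). A sketch; the JDBC URL and the dump routine
are assumptions:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.util.Iterator;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.function.VoidFunction;

  JavaRDD<String> tableRdd = javasparkcontext.parallelize(tables, tables.size());
  tableRdd.foreachPartition(new VoidFunction<Iterator<String>>() {
    @Override
    public void call(Iterator<String> names) throws Exception {
      Connection conn = DriverManager.getConnection(jdbcUrl); // assumed URL
      while (names.hasNext()) {
        dumpTableToHdfs(conn, names.next()); // hypothetical: SELECT * -> HDFS file
      }
      conn.close();
    }
  });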
on customRDD directly to save in hdfs.
On Thu, Jul 2, 2015 at 12:59 AM, Feynman Liang fli...@databricks.com
wrote:
On Wed, Jul 1, 2015 at 7:19 AM, Shushant Arora shushantaror...@gmail.com
wrote:
JavaRDD<String> rdd = javasparkcontext.parallelize(tables);
You are already creating an RDD
this in Spark could you just use the
existing JdbcRDD?
From: Shushant Arora
Date: Wednesday, July 1, 2015 at 10:19 AM
To: user
Subject: custom RDD in java
Hi
Is it possible to write a custom RDD in Java?
The requirement is: I have a list of SQL Server tables that need to be
dumped into HDFS
-core_2.10(version 1.3) in my application but my
cluster has spark version 1.2 ?
On Mon, Jun 29, 2015 at 7:56 PM, Shushant Arora shushantaror...@gmail.com
wrote:
1. Here you are basically creating 2 receivers and asking each of them to
consume 3 kafka partitions each.
- In 1.2 we have high
too.
Best
Ayan
On Mon, Jun 29, 2015 at 4:02 AM, Shushant Arora shushantaror...@gmail.com
wrote:
A few doubts:
In 1.2 streaming, when I use a union of streams, my streaming application
sometimes hangs and nothing gets printed on the driver.
[Stage 2
planning is specific to your environment and what the job is
actually doing; you'll need to determine it empirically.
On Friday, June 26, 2015, Shushant Arora shushantaror...@gmail.com
wrote:
In 1.2, how do I handle offset management after the stream application starts,
in each job? I should commit
, Shushant Arora shushantaror...@gmail.com
wrote:
I am using Spark Streaming 1.2.
If processing executors crash, will the receiver reset the offset back to
the last processed offset?
If the receiver itself crashed, is there a way to reset the offset without
restarting the streaming application, other than