Correlation between 2 consecutive RDDs in DStream

2015-11-20 Thread anshu shukla
- Thanks & Regards, Anshu Shukla

Moving avg in Spark streaming

2015-11-19 Thread anshu shukla
Is there any formal way to do a moving average over a fixed window duration? I calculated a simple moving average by creating a count stream and a sum stream; then joined them and finally calculated the mean. This was not per time window, since time periods were part of the tuples. -- Thanks & Regards, A
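The approach described above (a count stream and a sum stream combined into a mean) can be sketched outside Spark in plain Java; this is a minimal illustration of the sliding-window logic only, and `MovingAvgSketch` and its method names are illustrative, not part of any Spark API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch of a moving average kept as a running sum over a fixed-size window. */
public class MovingAvgSketch {
    /** Returns the average of every full window of the given size. */
    public static List<Double> movingAverage(double[] values, int window) {
        List<Double> out = new ArrayList<>();
        Deque<Double> buf = new ArrayDeque<>();
        double sum = 0.0;                      // the "sum stream" for the current window
        for (double v : values) {
            buf.addLast(v);
            sum += v;
            if (buf.size() > window) {
                sum -= buf.removeFirst();      // slide the window forward
            }
            if (buf.size() == window) {
                out.add(sum / window);         // the "count stream" is fixed at `window`
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(movingAverage(new double[]{1, 2, 3, 4, 5}, 3)); // prints [2.0, 3.0, 4.0]
    }
}
```

In an actual DStream the same idea would run per batch, but the sum/count bookkeeping is identical.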

REST api to avoid spark context creation

2015-10-18 Thread anshu shukla
creation for every job. 2 - Can I have something like a pool to serve requests? -- Thanks & Regards, Anshu Shukla

Resource allocation in SPARK streaming

2015-09-01 Thread anshu shukla
I am not very clear about resource allocation (CPU/core/thread-level allocation) as per the parallelism set by the number of cores in Spark standalone mode. Any guidelines for that? -- Thanks & Regards, Anshu Shukla

Setting number of CORES from inside the Topology (JAVA code )

2015-08-26 Thread anshu shukla
)) { sparkConf.setExecutorEnv("SPARK_WORKER_CORES", "1"); } -- Thanks & Regards, Anshu Shukla

Re: Graceful shutdown for Spark Streaming

2015-07-30 Thread anshu shukla
) streamingContext.stop() On Wed, Jul 29, 2015 at 6:55 PM, anshu shukla anshushuk...@gmail.com wrote: If we want to stop the application after a fixed time period, how will it work? (How to give the duration in the logic? In my case sleep(t.s.) is not working.) So I used to kill the CoarseGrained job at each

Re: Graceful shutdown for Spark Streaming

2015-07-29 Thread anshu shukla
stop the context gracefully? How is it done? Is there a signal sent to the driver process? For EMR, is there a way to terminate an EMR cluster with a Spark Streaming graceful shutdown? Thanks! -- Thanks & Regards, Anshu Shukla

Parallelism of Custom receiver in spark

2015-07-25 Thread anshu shukla
1 - How to increase the level of *parallelism in a Spark Streaming custom receiver*? 2 - Will ssc.receiverStream(/* anything */) *delete the data stored in Spark memory via the store(s)* logic? -- Thanks & Regards, Anshu Shukla

ReceiverStream in Spark not able to cope with 20,000 events/sec

2015-07-25 Thread anshu shukla
} String s1 = MsgIdAddandRemove.addMessageId(tuple.toString(), msgId); store(s1); } -- Thanks & Regards, Anshu Shukla

Re: Ordering of Batches in Spark streaming

2015-07-12 Thread anshu shukla
Can anyone shed some light on *how Spark does ordering of batches*? On Sat, Jul 11, 2015 at 9:19 AM, anshu shukla anshushuk...@gmail.com wrote: Thanks Ayan, I was curious to know *how Spark does it*. Is there any *documentation* where I can get the details about

Ordering of Batches in Spark streaming

2015-07-10 Thread anshu shukla
1.4 in this context. Any comments please! -- Thanks & Regards, Anshu Shukla

Re: Ordering of Batches in Spark streaming

2015-07-10 Thread anshu shukla
in partitions like a normal RDD, so following rdd.zipWithIndex should give a way to order them by the time they are received. On Sat, Jul 11, 2015 at 12:50 PM, anshu shukla anshushuk...@gmail.com wrote: Hey, is there any *guarantee of fixed ordering among the batches/RDDs*? After searching a lot I

Error while taking union

2015-07-08 Thread anshu shukla
Hi all, I want to create a union of 2 DStreams: in one of them an *RDD is created every 1 second*; the other has RDDs generated by reduceByKeyAndWindow with the *window duration set to 60 sec* (slide duration also 60 sec). The main idea is to do some analysis over every minute's data and emit the union

Re: Applying functions over certain count of tuples .

2015-06-30 Thread anshu shukla
the final grouping doesn't have exactly 5 items, if that matters. On Mon, Jun 29, 2015 at 3:57 PM, anshu shukla anshushuk...@gmail.com wrote: I want to apply some logic on the basis of a FIXED count of tuples in each RDD. *Suppose we emit one RDD for every 5 tuples of the previous RDD

Applying functions over certain count of tuples .

2015-06-29 Thread anshu shukla
I want to apply some logic on the basis of a FIXED count of tuples in each RDD. *Suppose we emit one RDD for every 5 tuples of the previous RDD.* -- Thanks & Regards, Anshu Shukla
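The regrouping asked for here (one output group per 5 input tuples) can be sketched in plain Java; in Spark one might apply this inside `mapPartitions`, but the chunking logic itself is simple. `ChunkSketch` is an illustrative name, not an existing API:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: regroup a batch of tuples into fixed-size chunks (e.g. of 5). */
public class ChunkSketch {
    public static <T> List<List<T>> chunks(List<T> tuples, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < tuples.size(); i += size) {
            // the last chunk may hold fewer than `size` items, as noted in the reply above
            out.add(new ArrayList<>(tuples.subList(i, Math.min(i + size, tuples.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(chunks(List.of(1, 2, 3, 4, 5, 6, 7), 5)); // prints [[1, 2, 3, 4, 5], [6, 7]]
    }
}
```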

Re: Parsing a tsv file with key value pairs

2015-06-25 Thread anshu shukla
? Thanks Ravikant -- Thanks Regards, Anshu Shukla

Loss of data due to congestion

2015-06-24 Thread anshu shukla
How does Spark guarantee that no RDD will fail/be lost during its life cycle? Is there something like ack in Storm, or does it do this by default? -- Thanks & Regards, Anshu Shukla

Re: Loss of data due to congestion

2015-06-24 Thread anshu shukla
Thanks, I am talking about streaming. On 25 Jun 2015 5:37 am, ayan guha guha.a...@gmail.com wrote: Can you elaborate a little more? Are you talking about a receiver or streaming? On 24 Jun 2015 23:18, anshu shukla anshushuk...@gmail.com wrote: How does Spark guarantee that no RDD will fail/be lost

Calculating tuple count /input rate with time

2015-06-23 Thread anshu shukla
Function<JavaRDD<String>, Void>() { @Override public Void call(JavaRDD<String> stringJavaRDD) throws Exception { System.out.println(System.currentTimeMillis() + ",spout," + stringJavaRDD.count()); return null; } }); -- Thanks & Regards, Anshu Shukla

Re: Multiple executors writing file using java filewriter

2015-06-23 Thread anshu shukla
Thanks a lot. Because I just want to log the timestamp and a unique message id, and not the full RDD. On Tue, Jun 23, 2015 at 12:41 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Why don't you do a normal .saveAsTextFiles? Thanks Best Regards On Mon, Jun 22, 2015 at 11:55 PM, anshu shukla

Re: Multiple executors writing file using java filewriter

2015-06-22 Thread anshu shukla
(); throw e; } System.out.println("msgid," + msgId); return msgeditor.addMessageId(v1, msgId); } }); -- Thanks & Regards, Anshu Shukla On Mon, Jun 22, 2015 at 10:50 PM, anshu shukla anshushuk...@gmail.com wrote: Can we not write some data to a txt file

Multiple executors writing file using java filewriter

2015-06-22 Thread anshu shukla
Can we not write some data to a txt file in parallel, with multiple executors running in parallel? -- Thanks & Regards, Anshu Shukla
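The usual workaround, mirroring what `saveAsTextFile` does with one part-file per partition, is for each writer to own its own file rather than share one handle. A minimal plain-Java sketch of that pattern (the class name and `part-N.txt` layout are illustrative assumptions, not a Spark API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch: several writers in parallel, each appending to its OWN file,
 *  the way Spark emits one part-file per partition. */
public class ParallelWriteSketch {
    public static List<Path> writeInParallel(Path dir, int writers) throws Exception {
        List<Thread> threads = new ArrayList<>();
        List<Path> parts = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            Path part = dir.resolve("part-" + i + ".txt");
            parts.add(part);
            final int id = i;
            Thread t = new Thread(() -> {
                try {
                    // one file per writer: no contention on a shared handle
                    Files.writeString(part, "writer " + id + " done\n");
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();   // wait for all writers
        return parts;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("parts");
        System.out.println(writeInParallel(dir, 4).size()); // prints 4
    }
}
```

A shared `FileWriter` across executors cannot work anyway, since each executor has its own filesystem, as Richard points out in the reply below.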

Re: Using Accumulators in Streaming

2015-06-22 Thread anshu shukla
, anshu shukla anshushuk...@gmail.com wrote: In Spark Streaming, since we already have a StreamingContext, which does not allow us to have accumulators, we have to get a SparkContext for initializing the accumulator value. But having 2 Spark contexts will not solve the problem. Please help

Re: Multiple executors writing file using java filewriter

2015-06-22 Thread anshu shukla
wrote: Is spoutLog just a non-Spark file writer? If you run that in the map call on a cluster, it's going to be writing in the filesystem of the executor it's being run on. I'm not sure if that's what you intended. On Mon, Jun 22, 2015 at 1:35 PM, anshu shukla anshushuk...@gmail.com wrote

Updating a static variable inside the foreachRDD method

2015-06-21 Thread anshu shukla
, + msgeditor.getMessageId(s)); //System.out.println(msgeditor.getMessageId(s)); } return null; } }); -- Thanks Regards, Anshu Shukla

Using Accumulators in Streaming

2015-06-21 Thread anshu shukla
In Spark Streaming, since we already have a StreamingContext, which does not allow us to have accumulators, we have to get a SparkContext for initializing the accumulator value. But having 2 Spark contexts will not solve the problem. Please help!! -- Thanks & Regards, Anshu Shukla

Verifying number of workers in Spark Streaming

2015-06-20 Thread anshu shukla
How to know, in stream processing over a cluster of 8 machines, that all the machines/worker nodes are being used (my cluster has 8 slaves)? -- Thanks & Regards, Anshu Shukla

Fwd: Verifying number of workers in Spark Streaming

2015-06-20 Thread anshu shukla
not able to figure out whether my job is using all workers or not. -- Thanks & Regards, Anshu Shukla SERC-IISC

Re: Assigning number of workers in spark streaming

2015-06-19 Thread anshu shukla
documented in the online documentation. http://spark.apache.org/docs/latest/submitting-applications.html On Fri, Jun 19, 2015 at 2:29 PM, anshu shukla anshushuk...@gmail.com wrote: Hey, *[For Client Mode]* 1 - Is there any way to assign the number of workers from a cluster that should be used

Assigning number of workers in spark streaming

2015-06-19 Thread anshu shukla
-wordcount statistical analysis}, then on how many workers will it be scheduled? -- Thanks & Regards, Anshu Shukla SERC-IISC

Re: Latency between the RDD in Streaming

2015-06-19 Thread anshu shukla
, and when it is processed, isn't it? On Thu, Jun 18, 2015 at 2:28 PM, anshu shukla anshushuk...@gmail.com wrote: Thanks a lot. But I have already tried the second way; the problem with that is how to identify a particular RDD from source to sink (as we can do by passing a msg id in Storm

Latency between the RDD in Streaming

2015-06-18 Thread anshu shukla
Is there any fixed way to find among RDD in stream processing systems , in the Distributed set-up . -- Thanks Regards, Anshu Shukla

Re: Latency between the RDD in Streaming

2015-06-18 Thread anshu shukla
are asking. Find what among RDD? On Thu, Jun 18, 2015 at 11:24 AM, anshu shukla anshushuk...@gmail.com wrote: Is there any fixed way to find among RDD in stream processing systems , in the Distributed set-up . -- Thanks Regards, Anshu Shukla -- Thanks Regards, Anshu Shukla

Re: Latency between the RDD in Streaming

2015-06-18 Thread anshu shukla
...@databricks.com wrote: It's not clear what you are asking. Find what among RDD? On Thu, Jun 18, 2015 at 11:24 AM, anshu shukla anshushuk...@gmail.com wrote: Is there any fixed way to find among RDD in stream processing systems , in the Distributed set-up . -- Thanks Regards, Anshu Shukla

Implementing and Using a Custom Actor-based Receiver

2015-06-17 Thread anshu shukla
Is there any good sample code in Java for *implementing and using a custom actor-based receiver*? -- Thanks & Regards, Anshu Shukla

Using queueStream

2015-06-15 Thread anshu shukla
JavaDStream<String> inputStream = ssc.queueStream(rddQueue); Can this rddQueue be dynamic in nature? If yes, then how to make it run until rddQueue is finished? Any other way to get rddQueue from a dynamically updatable normal queue? -- Thanks & Regards, SERC-IISC Anshu Shukla
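One common way to consume a dynamically updated "normal" queue until it is finished is to block on a `BlockingQueue` and stop at a sentinel value. This is a plain-Java sketch of that pattern only; the `__END__` sentinel and class name are illustrative assumptions, not a Spark mechanism:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: drain a dynamically fed queue until a sentinel marks the end. */
public class DynamicQueueSketch {
    static final String END = "__END__";   // illustrative end-of-stream marker

    public static List<String> drain(BlockingQueue<String> queue) throws InterruptedException {
        List<String> seen = new ArrayList<>();
        while (true) {
            String item = queue.take();    // blocks until the producer adds more
            if (END.equals(item)) break;   // producer has signalled completion
            seen.add(item);
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            for (String s : new String[]{"a", "b", "c", END}) q.add(s);
        });
        producer.start();
        System.out.println(drain(q)); // prints [a, b, c]
        producer.join();
    }
}
```

With `queueStream` itself the queue is polled per batch interval, so items pushed after start are still picked up; the sentinel trick above is how a custom receiver can know when to stop.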

Problem: Custom Receiver for getting events from a Dynamic Queue

2015-06-15 Thread anshu shukla
try { this.eventQueue.put(event); } catch (InterruptedException e) { // TODO Auto-generated catch block e.printStackTrace(); } } } -- Thanks & Regards, Anshu Shukla

Union of two Diff. types of DStreams

2015-06-14 Thread anshu shukla
How to take the union of a JavaPairDStream<String, Integer> and a JavaDStream<String>? *a.union(b) works only with DStreams of the same type.* -- Thanks & Regards, Anshu Shukla

Re: Event generation

2015-05-12 Thread anshu shukla
, May 10, 2015 at 3:21 PM, anshu shukla anshushuk...@gmail.com wrote: http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps -- Thanks Regards, Anshu Shukla -- Thanks Regards, Anshu Shukla

Event generation

2015-05-10 Thread anshu shukla
http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps -- Thanks Regards, Anshu Shukla

Re: Map one RDD into two RDD

2015-05-08 Thread anshu shukla
On Fri, May 8, 2015 at 2:42 AM, anshu shukla anshushuk...@gmail.com wrote: One of the best discussions on the mailing list :-) ... Please help me in concluding -- The whole discussion concludes that: 1 - The framework does not support increasing the parallelism of any task just by any inbuilt function

Predict.scala using model for clustering In reference

2015-05-07 Thread anshu shukla
/blob/master/twitter_classifier/predict.md -- Thanks Regards, Anshu Shukla

Re: Map one RDD into two RDD

2015-05-07 Thread anshu shukla
an efficient way to do it. Any suggestions? Many thanks. Bill -- Thanks & Regards, Anshu Shukla

Re: Creating topology in spark streaming

2015-05-06 Thread anshu shukla
, Juan 2015-05-06 10:32 GMT+02:00 anshu shukla anshushuk...@gmail.com: But the main problem is how to increase the level of parallelism for any particular bolt logic. Suppose I want this type of topology: https://storm.apache.org/documentation/images/topology.png How can we manage

Re: Creating topology in spark streaming

2015-05-06 Thread anshu shukla
transformation on a DStream will create another DStream. You may want to take a look at foreachRDD? Also, kindly share your code so people can help better. On 6 May 2015 17:54, anshu shukla anshushuk...@gmail.com wrote: Please help guys, even after going through all the examples given I have

Creating topology in spark streaming

2015-05-06 Thread anshu shukla
of parallelism, since the logic of the topology is not clear. -- Thanks & Regards, Anshu Shukla Indian Institute of Science

[no subject]

2015-05-06 Thread anshu shukla
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.3.1" -- Thanks & Regards, Anshu Shukla Indian Institute of Science

Re: Event generator for SPARK-Streaming from csv

2015-05-05 Thread anshu shukla
anshu shukla anshushuk...@gmail.com: I have the real DEBS-Taxi data in a CSV file; in order to operate over it, how to simulate a Spout kind of thing as an event generator using the timestamps in the CSV file? -- Thanks & Regards, Anshu Shukla

spark log analyzer sample

2015-05-04 Thread anshu shukla
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4 I am not using any Hadoop facility (not even HDFS), so why is it giving this error? -- Thanks & Regards, Anshu Shukla

Fwd: Event generator for SPARK-Streaming from csv

2015-05-01 Thread anshu shukla
I have the real DEBS-Taxi data in a CSV file; in order to operate over it, how to simulate a Spout kind of thing as an event generator using the timestamps in the CSV file? -- Thanks & Regards, Anshu Shukla

Event generator for SPARK-Streaming from csv

2015-04-29 Thread anshu shukla
I have the real DEBS-Taxi data in a CSV file; in order to operate over it, how to simulate a Spout kind of thing as an event generator using the timestamps in the CSV file? -- SERC-IISC Thanks & Regards, Anshu Shukla

Support for Data flow graphs and not DAG only

2015-04-02 Thread anshu shukla
Hey, I didn't find any documentation regarding support for cycles in a Spark topology, although Storm supports this using manual configuration in the acker function logic (setting it to a particular count). By cycles I don't mean infinite loops. -- Thanks & Regards, Anshu Shukla