--
Thanks & Regards,
Anshu Shukla
Is there any formal way to do a moving average over a fixed window duration?
I calculated a simple moving average by creating a count stream and a sum
stream, then joined them and finally calculated the mean. This was not per
time window, since time periods were part of the tuples.
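One way to make this per-window (a minimal sketch, assuming a JavaDStream<Double> named values and a 60-second tumbling window; names and durations are illustrative, not from the original code) is to carry (sum, count) pairs through reduceByWindow and divide at the end:

import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import scala.Tuple2;

// Sketch: pair each value with a count of 1, reduce over the window,
// then divide sum by count to get the windowed mean.
JavaDStream<Tuple2<Double, Long>> sumAndCount =
    values.map(v -> new Tuple2<>(v, 1L));
JavaDStream<Double> windowedMean = sumAndCount
    .reduceByWindow(
        (a, b) -> new Tuple2<>(a._1() + b._1(), a._2() + b._2()),
        Durations.seconds(60),   // window duration
        Durations.seconds(60))   // slide duration
    .map(t -> t._1() / t._2()); // mean = sum / count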
--
Thanks & Regards,
Anshu Shukla
creation for every Job.
2 - Can I have something like a pool to serve requests?
--
Thanks & Regards,
Anshu Shukla
I am not very clear about resource allocation (CPU/core/thread-level
allocation) as per the parallelism set via the number of cores in Spark
standalone mode.
Any guidelines for that?
--
Thanks & Regards,
Anshu Shukla
))
{
    sparkConf.setExecutorEnv("SPARK_WORKER_CORES", "1");
}
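For what it's worth, setExecutorEnv is probably not the right knob for this: in standalone mode CPU usage is normally bounded through Spark properties (a hedged sketch; the values are illustrative):

import org.apache.spark.SparkConf;

// Sketch: cap CPU in standalone mode via properties rather than
// executor environment variables.
SparkConf sparkConf = new SparkConf()
    .setAppName("MyApp")                 // illustrative name
    .set("spark.cores.max", "8")         // total cores the app may take
    .set("spark.executor.memory", "2g"); // per-executor memory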
--
Thanks & Regards,
Anshu Shukla
)
streamingContext.stop()
On Wed, Jul 29, 2015 at 6:55 PM, anshu shukla anshushuk...@gmail.com
wrote:
If we want to stop the application after a fixed time period, how will it
work? (How to give the duration in the logic? In my case sleep(t.s.) is not
working.) So I used to kill the CoarseGrained job at each
stop the context gracefully? How is it done? Is
there a signal sent to the driver process?
For EMR, is there a way to terminate an EMR cluster with a Spark
Streaming graceful shutdown?
Thanks!
--
Thanks & Regards,
Anshu Shukla
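For the fixed-duration question above, one pattern (a minimal sketch; the timeout is illustrative and this is not EMR-specific) is to wait with a timeout and then stop gracefully, which lets already-received batches finish:

// Sketch: run for a fixed period, then shut down gracefully.
ssc.start();
ssc.awaitTerminationOrTimeout(10 * 60 * 1000); // run ~10 minutes
ssc.stop(true, true); // stopSparkContext = true, stopGracefully = true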
1 - How to increase the level of *parallelism in a Spark Streaming custom
RECEIVER*?
2 - Will ssc.receiverStream(/* anything */) *delete the data
stored in Spark memory using the store() * logic?
--
Thanks & Regards,
Anshu Shukla
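On question 1, the usual approach is to create several receiver streams and union them, so ingestion is spread over multiple executors (a sketch; MyReceiver stands in for the custom receiver class, and the count is illustrative):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.streaming.api.java.JavaDStream;

// Sketch: N instances of a custom receiver, unioned into one stream.
int numReceivers = 4; // illustrative
List<JavaDStream<String>> receivers = new ArrayList<>();
for (int i = 0; i < numReceivers; i++) {
    receivers.add(ssc.receiverStream(new MyReceiver()));
}
JavaDStream<String> unified =
    ssc.union(receivers.get(0), receivers.subList(1, receivers.size()));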
String s1 = MsgIdAddandRemove.addMessageId(tuple.toString(), msgId);
store(s1);
}
--
Thanks & Regards,
Anshu Shukla
Can anyone shed some light on HOW SPARK DOES *ORDERING OF
BATCHES*?
On Sat, Jul 11, 2015 at 9:19 AM, anshu shukla anshushuk...@gmail.com
wrote:
Thanks Ayan,
I was curious to know *how Spark does it*. Is there any *documentation*
where I can get the details about
1.4 in this context.
Any comments please!
--
Thanks & Regards,
Anshu Shukla
in partitions like a normal RDD, so following
rdd.zipWithIndex should give a way to order them by the time they are
received.
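Concretely, that could look like this (a sketch; stream is assumed to be a JavaDStream<String>):

import org.apache.spark.streaming.api.java.JavaPairDStream;

// Sketch: tag each element with its index inside the batch RDD,
// giving a per-batch arrival order.
JavaPairDStream<String, Long> indexed =
    stream.transformToPair(rdd -> rdd.zipWithIndex());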
On Sat, Jul 11, 2015 at 12:50 PM, anshu shukla anshushuk...@gmail.com
wrote:
Hey ,
Is there any *guarantee of fixed ordering among the batches/RDDs*?
After searching a lot I
Hi all,
I want to create a union of 2 DStreams; in one of them an *RDD is created
per 1 second*, the other has RDDs generated by reduceByKeyAndWindow
with *duration set to 60 sec.* (slide duration also 60 sec).
- The main idea is to do some analysis over every minute's data and emit the
union
the final grouping doesn't have exactly 5 items, if
that matters.
On Mon, Jun 29, 2015 at 3:57 PM, anshu shukla anshushuk...@gmail.com
wrote:
I want to apply some logic on the basis of a FIXED count of the number of
tuples in each RDD: *suppose emit one RDD for every 5 tuples of the
previous RDD.*
--
Thanks & Regards,
Anshu Shukla
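One way to approximate this within each batch (a sketch, assuming a JavaDStream<String> named stream; as noted above, the trailing group may have fewer than 5 items) is to index the tuples and bucket them by index / 5:

import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

// Sketch: bucket each batch's tuples into groups of 5 by arrival index.
JavaPairDStream<Long, Iterable<String>> groupsOf5 =
    stream.transformToPair(rdd -> rdd.zipWithIndex()
        .mapToPair(t -> new Tuple2<>(t._2() / 5, t._1()))
        .groupByKey());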
?
Thanks
Ravikant
--
Thanks & Regards,
Anshu Shukla
How does Spark guarantee that no RDD will fail / be lost during its life cycle?
Is there something like ack in Storm, or does it do it by default?
--
Thanks & Regards,
Anshu Shukla
Thanks,
I am talking about streaming.
On 25 Jun 2015 5:37 am, ayan guha guha.a...@gmail.com wrote:
Can you elaborate a little more? Are you talking about the receiver or streaming?
On 24 Jun 2015 23:18, anshu shukla anshushuk...@gmail.com wrote:
How does Spark guarantee that no RDD will fail / be lost
new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> stringJavaRDD) throws Exception {
        System.out.println(System.currentTimeMillis() + ",spoutstringJavaRDD,"
            + stringJavaRDD.count());
        return null;
    }
});
--
Thanks & Regards,
Anshu Shukla
Thanks a lot.
Because I just want to log the timestamp and a unique message id, and not the
full RDD.
On Tue, Jun 23, 2015 at 12:41 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Why don't you do a normal .saveAsTextFiles?
Thanks
Best Regards
On Mon, Jun 22, 2015 at 11:55 PM, anshu shukla
();
throw e;
}
System.out.println("msgid," + msgId);
return msgeditor.addMessageId(v1, msgId);
}
});
--
Thanks & Regards,
Anshu Shukla
On Mon, Jun 22, 2015 at 10:50 PM, anshu shukla anshushuk...@gmail.com
wrote:
Can we not write some data to a txt file in parallel, with multiple
executors running in parallel?
--
Thanks & Regards,
Anshu Shukla
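Each executor can write its own partitions to its local filesystem (a sketch using the VoidFunction overload of foreachRDD found in newer Spark versions; the path is illustrative, and note the files land on the worker machines, not the driver):

// Sketch: parallel writes, one file per partition per batch; each file
// is created on whichever executor runs that partition.
stream.foreachRDD(rdd -> rdd.foreachPartition(partition -> {
    java.io.PrintWriter out = new java.io.PrintWriter(
        "/tmp/part-" + java.util.UUID.randomUUID() + ".txt"); // illustrative path
    while (partition.hasNext()) {
        out.println(partition.next());
    }
    out.close();
}));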
, anshu shukla anshushuk...@gmail.com
wrote:
In Spark Streaming, since we already have a StreamingContext,
which does not allow us to have accumulators, we have to get a SparkContext
for initializing the accumulator value.
But having 2 Spark contexts will not solve the problem.
Please help
wrote:
Is spoutLog just a non-Spark file writer? If you run that in the map call
on a cluster, it's going to be writing to the filesystem of the executor it's
being run on. I'm not sure if that's what you intended.
On Mon, Jun 22, 2015 at 1:35 PM, anshu shukla anshushuk...@gmail.com
wrote:
"," +
msgeditor.getMessageId(s));
// System.out.println(msgeditor.getMessageId(s));
}
return null;
}
});
--
Thanks & Regards,
Anshu Shukla
In Spark Streaming, since we already have a StreamingContext, which
does not allow us to have accumulators, we have to get a SparkContext for
initializing the accumulator value.
But having 2 Spark contexts will not solve the problem.
Please help!
--
Thanks & Regards,
Anshu Shukla
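A second SparkContext should not be needed: the StreamingContext already wraps one, reachable via sparkContext() (a sketch against the 1.x Java API, where accumulator() still exists):

import org.apache.spark.Accumulator;

// Sketch: reuse the SparkContext embedded in the StreamingContext
// instead of creating a second one.
Accumulator<Integer> counter = ssc.sparkContext().accumulator(0);
stream.foreachRDD(rdd -> rdd.foreach(x -> counter.add(1)));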
How to know, in stream processing over a cluster of 8 machines, that
all the machines/worker nodes are being used (my cluster has 8 slaves)?
--
Thanks & Regards,
Anshu Shukla
not able to figure out whether my
job is using all the workers or not.
--
Thanks & Regards,
Anshu Shukla
SERC-IISC
documented in the online documentation.
http://spark.apache.org/docs/latest/submitting-applications.html
On Fri, Jun 19, 2015 at 2:29 PM, anshu shukla anshushuk...@gmail.com
wrote:
Hey,
*[For Client Mode]*
1 - Is there any way to assign the number of workers from a cluster that
should be used
{wordcount + statistical analysis}
then on how many workers will it be scheduled?
--
Thanks & Regards,
Anshu Shukla
SERC-IISC
, and when it is processed,
isn't it?
On Thu, Jun 18, 2015 at 2:28 PM, anshu shukla anshushuk...@gmail.com
wrote:
Thanks a lot, but I have already tried the second way. The problem with
that is how to identify a particular RDD from source to sink (as we
can do by passing a msg id in Storm
Is there any fixed way to find among RDDs in stream processing systems,
in the distributed set-up?
--
Thanks & Regards,
Anshu Shukla
are asking. Find what among RDD?
On Thu, Jun 18, 2015 at 11:24 AM, anshu shukla anshushuk...@gmail.com
wrote:
Is there any fixed way to find among RDDs in stream processing systems,
in the distributed set-up?
--
Thanks & Regards,
Anshu Shukla
--
Thanks & Regards,
Anshu Shukla
...@databricks.com
wrote:
It's not clear what you are asking. Find what among RDD?
On Thu, Jun 18, 2015 at 11:24 AM, anshu shukla anshushuk...@gmail.com
wrote:
Is there any fixed way to find among RDDs in stream processing systems,
in the distributed set-up?
--
Thanks & Regards,
Anshu Shukla
Is there any good sample code in Java for *Implementing and
Using a Custom Actor-based Receiver*?
--
Thanks & Regards,
Anshu Shukla
JavaDStream<String> inputStream = ssc.queueStream(rddQueue);
Can this rddQueue be dynamic in nature? If yes, then how to
make it run until rddQueue is finished?
Any other way to get rddQueue from a dynamically updatable normal Queue?
--
Thanks & Regards,
Anshu Shukla
SERC-IISC
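As far as I know, the queue handed to queueStream can keep being filled while the job runs (a sketch; moreData() and nextBatch() are hypothetical stand-ins for the real source, and queue streams are not covered by checkpointing):

import java.util.LinkedList;
import java.util.Queue;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.api.java.JavaDStream;

// Sketch: a feeder thread appends RDDs while the context runs;
// queueStream consumes one queued RDD per batch interval.
Queue<JavaRDD<String>> rddQueue = new LinkedList<>();
JavaDStream<String> inputStream = ssc.queueStream(rddQueue);

new Thread(() -> {
    while (moreData()) { // hypothetical termination test
        synchronized (rddQueue) {
            rddQueue.add(ssc.sparkContext().parallelize(nextBatch())); // hypothetical source
        }
    }
}).start();

ssc.start();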
);
try {
    this.eventQueue.put(event);
} catch (InterruptedException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
}
}
--
Thanks & Regards,
Anshu Shukla
How to take the union of a JavaPairDStream<String, Integer> and a
JavaDStream<String>?
*a.union(b) works only with DStreams of the same type.*
--
Thanks & Regards,
Anshu Shukla
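Since union requires matching element types, one workaround (a sketch; the flattening format is illustrative, with a as the JavaPairDStream<String, Integer> and b as the JavaDStream<String> above) is to map the pair stream down to the other stream's element type first:

// Sketch: flatten the (String, Integer) pairs to strings, then union.
JavaDStream<String> aAsStrings = a.map(t -> t._1() + "," + t._2());
JavaDStream<String> unioned = aAsStrings.union(b);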
, May 10, 2015 at 3:21 PM, anshu shukla anshushuk...@gmail.com
wrote:
http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps
--
Thanks & Regards,
Anshu Shukla
--
Thanks & Regards,
Anshu Shukla
http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps
--
Thanks & Regards,
Anshu Shukla
On Fri, May 8, 2015 at 2:42 AM, anshu shukla anshushuk...@gmail.com wrote:
One of the best discussions in the mailing list :-) ... Please help me in
concluding --
The whole discussion concludes that -
1 - The framework does not support increasing the parallelism of any task just
by any inbuilt function
/blob/master/twitter_classifier/predict.md
--
Thanks & Regards,
Anshu Shukla
an
efficient way to do it. Any suggestions?
Many thanks.
Bill
--
Many thanks.
Bill
--
Thanks & Regards,
Anshu Shukla
,
Juan
2015-05-06 10:32 GMT+02:00 anshu shukla anshushuk...@gmail.com:
But the main problem is how to increase the level of parallelism for any
particular bolt logic.
Suppose I want this type of topology:
https://storm.apache.org/documentation/images/topology.png
How can we manage
transformation on a DStream will create another DStream. You may
want to take a look at foreachRDD? Also, kindly share your code so people
can help better.
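One knob that often helps here (a sketch; lines is an assumed JavaDStream<String>, process() is hypothetical, and the partition count is illustrative) is repartitioning the stream so the bolt-like logic runs as more parallel tasks:

// Sketch: spread per-record processing over more tasks.
JavaDStream<String> widened = lines.repartition(8); // illustrative count
JavaDStream<String> processed = widened.map(s -> process(s));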
On 6 May 2015 17:54, anshu shukla anshushuk...@gmail.com wrote:
Please help guys. Even after going through all the examples given I
have
of parallelism, since the logic of the topology is not clear.
--
Thanks & Regards,
Anshu Shukla
Indian Institute of Science
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.3.1"
--
Thanks & Regards,
Anshu Shukla
Indian Institute of Science
anshu shukla anshushuk...@gmail.com:
I have the real DEBS-Taxi data in a CSV file; in order to operate over it,
how to simulate a Spout kind of thing as an event generator using the
timestamps in the CSV file?
--
Thanks & Regards,
Anshu Shukla
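A spout-like replayer can be approximated with a custom receiver that paces its store() calls by the gaps between consecutive CSV timestamps (a sketch; the class name, the timestamp column, and its unit are assumptions):

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

// Sketch: replay a timestamped CSV as a stream, sleeping between
// events according to the difference of consecutive timestamps.
public class CsvReplayReceiver extends Receiver<String> {
    private final String path;

    public CsvReplayReceiver(String path) {
        super(StorageLevel.MEMORY_ONLY());
        this.path = path;
    }

    @Override
    public void onStart() {
        new Thread(this::replay).start();
    }

    @Override
    public void onStop() { }

    private void replay() {
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;
            long prevTs = -1;
            while (!isStopped() && (line = br.readLine()) != null) {
                long ts = Long.parseLong(line.split(",")[0]); // assumed: epoch millis in column 0
                if (prevTs >= 0) {
                    Thread.sleep(Math.max(0, ts - prevTs));
                }
                prevTs = ts;
                store(line);
            }
        } catch (Exception e) {
            restart("Error replaying CSV", e);
        }
    }
}

It would then be plugged in with ssc.receiverStream(new CsvReplayReceiver("taxi.csv")) (path illustrative).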
--
Thanks & Regards,
Anshu Shukla
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot
communicate with client version 4
I am not using any Hadoop facility (not even HDFS), so why is it giving
this error?
--
Thanks & Regards,
Anshu Shukla
I have the real DEBS-Taxi data in a CSV file; in order to operate over it,
how to simulate a Spout kind of thing as an event generator using the
timestamps in the CSV file?
--
Thanks & Regards,
Anshu Shukla
I have the real DEBS-Taxi data in a CSV file; in order to operate over it,
how to simulate a Spout kind of thing as an event generator using the
timestamps in the CSV file?
--
Thanks & Regards,
Anshu Shukla
SERC-IISC
Hey,
I didn't find any documentation regarding support for cycles in a Spark
topology, although Storm supports this using manual configuration in the
acker function logic (setting it to a particular count). By cycles I
don't mean infinite loops.
--
Thanks & Regards,
Anshu Shukla