Hi,
I have set spark.streaming.receiver.maxRate to 100. My batch interval is 4 sec, but still sometimes there are more than 400 records per batch. I am using Spark 1.2.0.
Regards, Laeeq
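[A minimal sketch of the configuration being discussed, not from the thread; the app name is a placeholder. With maxRate = 100 records/sec and a 4 s batch, a single receiver should be capped at roughly 400 records per batch, but the limit applies per receiver, so several receivers together can exceed 400.]
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Sketch: rate-limited receiver ingest (Spark 1.2-era API).
val conf = new SparkConf()
  .setAppName("RateLimitedStream")                      // placeholder app name
  .set("spark.streaming.receiver.maxRate", "100")       // records per second, per receiver
val ssc = new StreamingContext(conf, Seconds(4))        // 100 rec/s * 4 s = ~400 records per receiver per batch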
Hi,
Any comments, please?
Regards, Laeeq
On Friday, April 17, 2015 11:37 AM, Laeeq Ahmed
laeeqsp...@yahoo.com.INVALID wrote:
Hi,
I am working with multiple Kafka streams (23 streams) and currently I am
processing them separately. I receive one stream from each topic. I have
And what is the message rate of each topic, mate? That was the other part of the required clarifications.
From: Laeeq Ahmed
[mailto:laeeqsp...@yahoo.com]
Sent: Monday, April 20, 2015 3:38 PM
To: Evo Eftimov; user@spark.apache.org
Subject: Re: Equal number of RDD Blocks
Hi,
I have two different receivers (i.e. running in parallel), giving rise to two different DStreams
From: Laeeq Ahmed [mailto:laeeqsp...@yahoo.com.INVALID]
Sent: Monday, April 20, 2015 3:15 PM
To: user@spark.apache.org
Subject: Equal number of RDD Blocks
Hi,
I have two streams of data from Kafka. How can I make
Hi,
I have two streams of data from Kafka. How can I make an approximately equal number of RDD blocks on the two executors? Please see the attachment: one worker has 1785 RDD blocks and the other has 26.
Regards, Laeeq
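[A possible mitigation, sketched below with assumed topic/variable names rather than taken from the thread: if both receivers were scheduled on the same worker, repartitioning the unioned stream at least spreads the processing across executors, even though the stored receiver blocks stay where the receivers run.]
import org.apache.spark.streaming.kafka.KafkaUtils
// Sketch: union the two receiver streams and repartition before further processing.
val stream1 = KafkaUtils.createStream(ssc, zkQuorum, group, Map("topic1" -> 1)).map(_._2)  // "topic1"/"topic2" are placeholders
val stream2 = KafkaUtils.createStream(ssc, zkQuorum, group, Map("topic2" -> 1)).map(_._2)
val balanced = stream1.union(stream2).repartition(ssc.sparkContext.defaultParallelism)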
Hi,
I am working with multiple Kafka streams (23 streams) and currently I am
processing them separately. I receive one stream from each topic. I have the
following questions.
1. The Spark Streaming guide suggests unioning these streams. Is it possible to get statistics for each stream even after the union?
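[One way that appears to keep per-stream statistics after the union; purely a sketch, with ssc, zkQuorum, group and args assumed from the surrounding mails: tag each record with its topic before unioning, then aggregate by topic.]
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.StreamingContext._   // pair-DStream implicits on older Spark
// Sketch: 23 topics, one receiver each, tagged with the topic name so per-topic counts survive the union.
val topics: Seq[String] = args.toSeq                    // assumed: one topic name per argument
val tagged = topics.map { t =>
  KafkaUtils.createStream(ssc, zkQuorum, group, Map(t -> 1), StorageLevel.MEMORY_AND_DISK_SER)
    .map { case (_, v) => (t, v) }                      // (topic, record)
}
val unioned = ssc.union(tagged)
unioned.map { case (topic, _) => (topic, 1L) }.reduceByKey(_ + _).print()  // per-topic record count per batch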
, Laeeq Ahmed laeeqsp...@yahoo.com.invalid
wrote:
Hi,
Earlier my code was like the following, but slow due to the repartition. I want the top K of each window in a stream.
val counts = keyAndValues.map(x => math.round(x._3.toDouble)).countByValueAndWindow(Seconds(4), Seconds(4))
val topCounts = counts.repartition
Hi,
I normally use dstream.transform whenever I need to use methods that are available in the RDD API but not in the streaming API, e.g. dstream.transform(x => x.sortByKey(true)).
But there are other RDD methods which return types other than RDD, e.g. dstream.transform(x => x.top(5)); top here returns
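[top() hands back an Array on the driver rather than an RDD, so transform cannot return it as-is. A possible workaround, sketched here rather than quoted from the thread, is to re-parallelize the result; this assumes the element type has an Ordering.]
// Sketch: take the per-RDD top 5 and wrap it back into an RDD so the result stays a DStream.
val top5 = dstream.transform(rdd => rdd.context.parallelize(rdd.top(5)))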
, Laeeq Ahmed laeeqsp...@yahoo.com.invalid
wrote:
Hi,
I normally use dstream.transform whenever I need to use methods that are available in the RDD API but not in the streaming API, e.g. dstream.transform(x => x.sortByKey(true)).
But there are other RDD methods which return types other than RDD, e.g.
as it's not the same as top()
On Fri, Mar 13, 2015 at 11:23 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
Like this?
dstream.repartition(1).mapPartitions(it => it.take(5))
Thanks
Best Regards
On Fri, Mar 13, 2015 at 4:11 PM, Laeeq Ahmed laeeqsp...@yahoo.com.invalid
wrote:
Hi,
I normally
Hi,
I have a streaming application where I am doing a top-10 count in each window, which seems slow. Is there a more efficient way to do this?
val counts = keyAndValues.map(x => math.round(x._3.toDouble)).countByValueAndWindow(Seconds(4), Seconds(4))
val topCounts =
Hi,
I am filtering the first DStream with the value in the second DStream. I also want to keep the value of the second DStream. I have done the following and am having a problem with returning the new RDD:
val transformedFileAndTime = fileAndTime.transformWith(anomaly, (rdd1: RDD[(String,String)], rdd2: RDD[Int])
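[A possible completion of that transformWith; purely a sketch, which assumes the anomaly RDD is small enough to collect on the driver and that the goal is to keep each (file, time) record together with the anomaly value.]
import org.apache.spark.rdd.RDD
// Sketch: attach the anomaly values from the second stream to the records of the first.
val transformedFileAndTime = fileAndTime.transformWith(anomaly,
  (rdd1: RDD[(String, String)], rdd2: RDD[Int]) => {
    val anomalies = rdd2.collect()                      // assumed small per batch
    rdd1.flatMap { case (file, time) =>
      anomalies.map(a => (file, time, a))               // keep the second stream's value alongside
    }
  })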
instantiate RDDs anyway.
On Fri, Mar 6, 2015 at 7:06 PM, Laeeq Ahmed
laeeqsp...@yahoo.com.invalid wrote:
Hi,
I am filtering the first DStream with the value in the second DStream. I also want to keep the value of the second DStream. I have done the following and am having a problem with returning the new RDD
= your_stream.mapPartitions(rdd => rdd.take(10))
Thanks
Best Regards
On Mon, Jan 5, 2015 at 11:08 PM, Laeeq Ahmed laeeqsp...@yahoo.com.invalid
wrote:
Hi,
I am counting values in each window to find the top values, and I want to save only the 10 most frequent values of each window to HDFS rather than all the values
Hi,
It worked out like this:
val topCounts = sortedCounts.transform(rdd => rdd.zipWithIndex().filter(x => x._2 <= 10))
Regards, Laeeq
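[An alternative that skips the full sort; again just a sketch, assuming counts is a DStream of (value, count) pairs from countByValueAndWindow.]
// Sketch: rdd.top with an Ordering on the count field gives the 10 most frequent values per window.
val top10 = counts.transform(rdd =>
  rdd.context.parallelize(rdd.top(10)(Ordering.by[(Long, Long), Long](_._2))))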
On Wednesday, January 7, 2015 5:25 PM, Laeeq Ahmed
laeeqsp...@yahoo.com.INVALID wrote:
Hi Yana,
I also think that val top10 = your_stream.mapPartitions(rdd
, Laeeq Ahmed laeeqsp...@yahoo.com.invalid
wrote:
Hi,
I applied it as follows:
eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) -> 1), StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
val counts = eegStreams(a).map(x => math.round(x.toDouble)).countByValueAndWindow(Seconds(4
Hi,
I am counting values in each window to find the top values, and I want to save only the 10 most frequent values of each window to HDFS rather than all the values.
eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) -> 1), StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
val
Hi,
I am using Spark standalone on EC2. I can access the ephemeral HDFS from the spark-shell interface, but I can't access HDFS from a standalone application. I am using Spark 1.2.0 with Hadoop 2.4.0 and launched the cluster from the ec2 folder on my local machine. In my pom file I have given the hadoop-client as
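[In case it helps, a bare-bones sketch of addressing the ephemeral HDFS by its full URI from the standalone application; the host, port and path are placeholders and assumptions, so check the cluster's core-site.xml.]
// Sketch only: read from the ephemeral HDFS using an explicit URI instead of a relative path.
val data = sc.textFile("hdfs://<master-host>:9000/user/root/input")  // host/port/path are placeholders
data.count()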
<= currentAppTimeWindowEnd &&
r._1 > currentAppTimeWindowStart) // filter and retain only the records that fall in the current app-time window
return filteredRDD
})
Hope this helps!
TD
On Thu, Jul 17, 2014 at 5:54 AM, Laeeq Ahmed laeeqsp...@yahoo.com wrote:
Hi TD,
I have been able to filter
) // filter and retain only the records that fall in the timestamp-based window
return filteredRDD
})
Consider the input tuples as (1,23), (1.2,34), ..., (3.8,54), (4,413), ..., where the key is the timestamp.
Regards,
Laeeq
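[A speculative end-to-end version of the transform-based app-time window sketched in the fragments above; the window bounds, how they advance, and the tuple type are all assumptions, not from the thread.]
// Sketch: keep only tuples whose timestamp key falls inside the current app-time window,
// then slide the window forward once per batch (driver-side state; an assumption).
var currentAppTimeWindowStart = 0.0
var currentAppTimeWindowEnd = 4.0
val windowed = stream.transform { rdd =>                // stream: DStream[(Double, Double)], key = timestamp
  val filteredRDD = rdd.filter(r =>
    r._1 <= currentAppTimeWindowEnd && r._1 > currentAppTimeWindowStart)
  currentAppTimeWindowStart += 4.0                      // advance the app-time window each batch
  currentAppTimeWindowEnd += 4.0
  filteredRDD
}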
On Saturday, July 12, 2014 8:29 PM, Laeeq Ahmed
Hi,
In the Spark Streaming paper, slack time has been suggested for delaying batch creation in the case of external timestamps. I don't see any such option in StreamingContext. Is it available in the API?
Also, going through the previous posts, queueStream has been suggested for this.
I
Hi,
For QueueRDD, have a look here.
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/QueueStream.scala
Regards,
Laeeq
On Friday, May 23, 2014 10:33 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
Well, it's hard to use text data
Hi,
First use foreachRDD and then use collect, as in:
DStream.foreachRDD(rdd => {
rdd.collect.foreach({
Also, it's better to use Scala. Less verbose.
Regards,
Laeeq
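[Filled in, the pattern above looks roughly like this; a sketch with the record type assumed to be String. collect brings everything to the driver, so it only suits small RDDs.]
// Sketch of the foreachRDD + collect pattern mentioned above.
stream.foreachRDD { rdd =>
  rdd.collect().foreach { record =>
    println(record)                 // runs on the driver
  }
}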
On Wednesday, July 9, 2014 3:29 PM, Madabhattula Rajesh Kumar
mrajaf...@gmail.com wrote:
Hi Team,
Could you please
Hi,
For QueueRDD, have a look here.
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/QueueStream.scala
Regards,
Laeeq,
PhD candidate,
KTH, Stockholm.
On Sunday, July 6, 2014 10:20 AM, alessandro finamore
alessandro.finam...@polito.it
Hi,
The window size in Spark Streaming is time based, which means we have a different number of elements in each window. For example, if you have two streams (maybe more) which are related to each other and you want to compare them in a specific time interval, I am not clear how it will work.
above is that the streams just pump data in
at different rates -- first one got 7462 points in the first batch
interval, whereas stream2 saw 10493
On Tue, Jul 1, 2014 at 5:34 AM, Laeeq Ahmed laeeqsp...@yahoo.com wrote:
Hi,
The window size in a spark streaming is time based which means we have
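[One possible way to line the two streams up for comparison; a sketch with assumed record shapes, where each element is a (timestampSeconds, value) pair: bucket each record by its own timestamp and join on the bucket, so the comparison is driven by application time rather than by arrival rate.]
import org.apache.spark.streaming.StreamingContext._   // pair-DStream implicits on older Spark
// Sketch: compare two streams over 4-second buckets of their own timestamps.
val bucketed1 = stream1.map { case (ts, v) => (ts / 4, v) }  // ts assumed to be epoch seconds (Long)
val bucketed2 = stream2.map { case (ts, v) => (ts / 4, v) }
val joined = bucketed1.join(bucketed2)                       // (bucket, (valueFromStream1, valueFromStream2))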
Hi,
I want to do a union of all the RDDs in each window of a DStream. I found DStream.union and haven't seen anything like DStream.windowRDDUnion.
Is there any way around it?
I want to find the mean and SD of all values which come under each sliding window, for which I need to union all the RDDs in each
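[For what it's worth, window() already hands each batch the union of all RDDs that fall in the window, so the mean and SD can be computed on the windowed DStream directly. A sketch; the window/slide lengths and record type are assumptions.]
import org.apache.spark.SparkContext._                  // DoubleRDDFunctions implicits on older Spark
// Sketch: per-window mean and standard deviation over all values in the sliding window.
val values = stream.map(_.toDouble)                     // assumed numeric text records
values.window(Seconds(8), Seconds(4)).foreachRDD { rdd =>
  val stats = rdd.stats()                               // org.apache.spark.util.StatCounter
  println(s"window mean=${stats.mean} stdev=${stats.stdev}")
}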
as a stream may not be able to use the entire data in the file for your analysis. Spark (given enough memory) can process large amounts of data quickly.
On May 15, 2014, at 9:52 AM, Laeeq Ahmed laeeqsp...@yahoo.com wrote:
Hi,
I have data in a file. Can I read it as Stream in spark? I know
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
</dependencies>
</project>
On Tue, May 6, 2014 at 4:10 PM, Laeeq Ahmed laeeqsp...@yahoo.com wrote:
Hi all,
If anyone is using maven for building scala classes with all dependencies
Hi,
I have data in a file. Can I read it as a stream in Spark? I know it seems odd to read a file as a stream, but it has practical applications in real life if I can read it as a stream. Are there any other tools which can give this file as a stream to Spark, or do I have to make batches manually, which is
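[One way to fake it that seems workable; a sketch where the file name, chunk size, and use of the QueueStream example linked elsewhere in this list are my assumptions: chop the file into chunks and replay them through queueStream.]
import scala.collection.mutable
import org.apache.spark.rdd.RDD
// Sketch: replay a local file as a sequence of pseudo-batches via queueStream.
val queue = new mutable.Queue[RDD[String]]()
val fileStream = ssc.queueStream(queue)                 // one queued RDD is consumed per batch
scala.io.Source.fromFile("data.txt").getLines()         // "data.txt" is a placeholder
  .grouped(1000)                                        // 1000 lines per pseudo-batch (arbitrary)
  .foreach(chunk => queue += ssc.sparkContext.makeRDD(chunk))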
Hi Ian,
Don't use SPARK_MEM in spark-env.sh. It will set it for all of your jobs.
The better way is to use only the second option,
sconf.setExecutorEnv("spark.executor.memory", "4g"), i.e. set it in the driver program. In this way every job will have memory according to its requirement.
For example
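[A sketch of the per-application route, not the original example that was cut off here; the app name and the "4g" value are purely illustrative, and spark.executor.memory is set on the SparkConf in the driver.]
// Sketch: per-application executor memory set from the driver program.
val sconf = new SparkConf()
  .setAppName("MyJob")                                  // placeholder name
  .set("spark.executor.memory", "4g")
val sc = new SparkContext(sconf)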
Hi,
I use the following code for calculating the average. The problem is that the reduce operation returns a DStream here and not a tuple as it normally does without streaming. So how can we get the sum and the count from the DStream? Can we cast it to a tuple?
val numbers =
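[One way to get at the sum and count; a sketch rather than the code from the mail, with "lines" as an assumed input DStream[String]. The result stays a DStream, producing one average per batch rather than a single tuple.]
// Sketch: per-batch average via a (sum, count) pair.
val numbers = lines.map(_.toDouble)
val sumAndCount = numbers.map(v => (v, 1L)).reduce((a, b) => (a._1 + b._1, a._2 + b._2))
val average = sumAndCount.map { case (sum, count) => sum / count }
average.print()                                         // one average per batch interval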
Hi all,
There seems to be a problem. I have not been getting mails from the Spark user group for two days.
Regards,
Laeeq
runs but
it does not scale...
On Apr 17, 2014 2:45 AM, Laeeq Ahmed laeeqsp...@yahoo.com wrote:
Hi,
For one of my applications, I want to use Random Forests (RF) on top of Spark. I see that currently MLlib does not have an implementation of RF. What other open-source RF implementations would be great
Hi,
For one of my applications, I want to use Random Forests (RF) on top of Spark. I see that currently MLlib does not have an implementation of RF. What other open-source RF implementations would be great to use with Spark in terms of speed?
Regards,
Laeeq Ahmed,
KTH, Sweden.
VerifyError: class
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$CreateSnapshotRequestProto
overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
What can be the cause of this error?
Regards,
Laeeq Ahmed,
PhD Student,
HPCViz, KTH.
http