spark.streaming.receiver.maxRate Not taking effect

2015-07-01 Thread Laeeq Ahmed
Hi, I have set spark.streaming.receiver.maxRate to 100. My batch interval is 4 sec but still sometimes there are more than 400 records per batch. I am using Spark 1.2.0. Regards, Laeeq
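
A minimal configuration sketch, assuming Spark 1.2.x and the standard SparkConf API: spark.streaming.receiver.maxRate is a per-second, per-receiver limit, so with a 4 s batch interval the expected ceiling is roughly 100 x 4 = 400 records per batch for each receiver, and several receivers in one application raise the total accordingly. The application name is a placeholder.

    // Sketch: capping the receive rate for receiver-based streams.
    // The limit is records per second *per receiver*, so with a 4 s batch
    // interval each receiver can still contribute up to ~400 records per batch.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("MaxRateExample")                       // hypothetical app name
      .set("spark.streaming.receiver.maxRate", "100")     // records/sec per receiver
    val ssc = new StreamingContext(conf, Seconds(4))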

Re: Some questions on Multiple Streams

2015-04-24 Thread Laeeq Ahmed
Hi, Any comments please. Regards, Laeeq On Friday, April 17, 2015 11:37 AM, Laeeq Ahmed laeeqsp...@yahoo.com.INVALID wrote: Hi, I am working with multiple Kafka streams (23 streams) and currently I am processing them separately. I receive one stream from each topic. I have

Re: Equal number of RDD Blocks

2015-04-20 Thread Laeeq Ahmed
And what is the message rate of each topic mate – that was the other part of the required clarifications  From: Laeeq Ahmed

Re: Equal number of RDD Blocks

2015-04-20 Thread Laeeq Ahmed
And what is the message rate of each topic mate – that was the other part of the required clarifications  From: Laeeq Ahmed [mailto:laeeqsp...@yahoo.com] Sent: Monday, April 20, 2015 3:38 PM To: Evo Eftimov; user@spark.apache.org Subject: Re: Equal number of RDD Blocks  Hi,  I have two different

Re: Equal number of RDD Blocks

2015-04-20 Thread Laeeq Ahmed
receivers (i.e. running in parallel) giving a start of two different DStreams   From: Laeeq Ahmed [mailto:laeeqsp...@yahoo.com.INVALID] Sent: Monday, April 20, 2015 3:15 PM To: user@spark.apache.org Subject: Equal number of RDD Blocks  Hi,  I have two streams of data from Kafka. How can I make

Equal number of RDD Blocks

2015-04-20 Thread Laeeq Ahmed
Hi, I have two streams of data from Kafka. How can I make an approx. equal number of RDD blocks on two executors? Please see the attachment: one worker has 1785 RDD blocks and the other has 26. Regards, Laeeq
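
A hedged sketch, not from the thread: receiver placement is decided by the scheduler, but repartitioning the received data spreads the processing work across executors even when one receiver's blocks dominate. The ZooKeeper quorum, group, topic names, and partition count below are illustrative placeholders.

    // Sketch: union the two Kafka streams and repartition so downstream work is
    // spread over the cluster, regardless of where the receiver blocks landed.
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("TwoStreams"), Seconds(4))
    val zkQuorum = "zk-host:2181"                          // placeholder
    val group = "my-group"                                 // placeholder
    val s1 = KafkaUtils.createStream(ssc, zkQuorum, group, Map("topicA" -> 1),
      StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
    val s2 = KafkaUtils.createStream(ssc, zkQuorum, group, Map("topicB" -> 1),
      StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
    val balanced = s1.union(s2).repartition(8)             // spread processing evenly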

Some questions on Multiple Streams

2015-04-17 Thread Laeeq Ahmed
Hi, I am working with multiple Kafka streams (23 streams) and currently I am processing them separately. I receive one stream from each topic. I have the following questions. 1. The Spark Streaming guide suggests unioning these streams. Is it possible to get statistics of each stream even after
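
One possible answer to question 1, sketched under the assumption that per-topic statistics are still wanted after the union: tag each record with its topic before unioning, then aggregate by that tag. Topic names and connection details are placeholders.

    // Sketch: keep per-stream statistics across a union by tagging records
    // with their source topic first.
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("MultiTopic"), Seconds(4))
    val (zkQuorum, group) = ("zk-host:2181", "my-group")   // placeholders
    val topics = Seq("topic01", "topic02", "topic03")      // placeholder topic names
    val tagged = topics.map { t =>
      KafkaUtils.createStream(ssc, zkQuorum, group, Map(t -> 1),
        StorageLevel.MEMORY_AND_DISK_SER).map { case (_, value) => (t, value) }
    }
    val unioned = ssc.union(tagged)                        // one DStream of (topic, value)
    val countsPerTopic = unioned.map { case (t, _) => (t, 1L) }.reduceByKey(_ + _)
    countsPerTopic.print()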

Re: Using rdd methods with Dstream

2015-03-14 Thread Laeeq Ahmed
, Laeeq Ahmed laeeqsp...@yahoo.com.invalid wrote: Hi, Earlier my code was like the following but slow due to repartition. I want the top K of each window in a stream. val counts = keyAndValues.map(x => math.round(x._3.toDouble)).countByValueAndWindow(Seconds(4), Seconds(4)) val topCounts = counts.repartition

Using rdd methods with Dstream

2015-03-13 Thread Laeeq Ahmed
Hi, I normally use dstream.transform whenever I need to use methods which are available in the RDD API but not in the streaming API, e.g. dstream.transform(x => x.sortByKey(true)). But there are other RDD methods which return types other than RDD, e.g. dstream.transform(x => x.top(5)). top here returns
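
A sketch of one common workaround, assuming the goal is to stay inside transform(): wrap the driver-side result (here an Array from top) back into an RDD with parallelize. This collects the top-k values to the driver, which is fine for a small k. The input stream below is a stand-in.

    // Sketch: rdd.top(k) returns an Array, not an RDD, so it cannot be returned
    // from transform() directly; parallelize the array back into an RDD instead.
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("TopInTransform"), Seconds(4))
    val numbers = ssc.socketTextStream("localhost", 9999).map(_.toDouble)  // stand-in input
    val top5 = numbers.transform { (rdd: RDD[Double]) =>
      rdd.sparkContext.parallelize(rdd.top(5))             // Array[Double] back into an RDD
    }
    top5.print()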

Re: Using rdd methods with Dstream

2015-03-13 Thread Laeeq Ahmed
, Laeeq Ahmed laeeqsp...@yahoo.com.invalid wrote: Hi, I normally use dstream.transform whenever I need to use methods which are available in the RDD API but not in the streaming API, e.g. dstream.transform(x => x.sortByKey(true)). But there are other RDD methods which return types other than RDD, e.g

Re: Using rdd methods with Dstream

2015-03-13 Thread Laeeq Ahmed
as it's not the same as top() On Fri, Mar 13, 2015 at 11:23 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Like this? dstream.repartition(1).mapPartitions(it => it.take(5)) Thanks Best Regards On Fri, Mar 13, 2015 at 4:11 PM, Laeeq Ahmed laeeqsp...@yahoo.com.invalid wrote: Hi, I normally

Efficient Top count in each window

2015-03-12 Thread Laeeq Ahmed
Hi, I have a streaming application where I am doing a top-10 count in each window, which seems slow. Is there an efficient way to do this? val counts = keyAndValues.map(x => math.round(x._3.toDouble)).countByValueAndWindow(Seconds(4), Seconds(4)) val topCounts =
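
A hedged alternative sketch, not from the thread: build the per-window counts with reduceByKeyAndWindow, then take the top 10 with rdd.top(), a parallel reduction, instead of repartitioning to one partition and sorting the whole window. The input parsing below is a placeholder for the keyAndValues stream in the post.

    // Sketch: top-10 most frequent rounded values per 4 s window without a
    // full sort on a single partition.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("TopKPerWindow"), Seconds(4))
    val keyAndValues = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .map(a => (a(0), a(1), a(2)))                        // stand-in for the real stream
    val counts = keyAndValues
      .map(x => (math.round(x._3.toDouble), 1L))
      .reduceByKeyAndWindow((a: Long, b: Long) => a + b, Seconds(4), Seconds(4))
    val top10 = counts.transform { rdd =>
      // top-10 by frequency, computed as a distributed reduction
      rdd.sparkContext.parallelize(rdd.top(10)(Ordering.by((p: (Long, Long)) => p._2)))
    }
    top10.print()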

Help with transformWith in SparkStreaming

2015-03-06 Thread Laeeq Ahmed
Hi, I am filtering the first DStream with the value in the second DStream. I also want to keep the value of the second DStream. I have done the following and am having a problem with returning the new RDD: val transformedFileAndTime = fileAndTime.transformWith(anomaly, (rdd1: RDD[(String,String)], rdd2: RDD[Int])
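
A minimal sketch of one way to finish this, assuming the anomaly stream carries at most a handful of values per batch: collect the second RDD on the driver inside transformWith and attach its value to every record of the first stream, so the function still returns an RDD. The stream construction below is a stand-in for the original inputs.

    // Sketch: transformWith must return an RDD; build one from both inputs.
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("TransformWith"), Seconds(4))
    val fileAndTime = ssc.socketTextStream("localhost", 9999)              // stand-in input
      .map { l => val a = l.split(","); (a(0), a(1)) }
    val anomaly = ssc.socketTextStream("localhost", 9998).map(_.trim.toInt) // stand-in input

    val transformedFileAndTime = fileAndTime.transformWith(anomaly,
      (rdd1: RDD[(String, String)], rdd2: RDD[Int]) => {
        val flags = rdd2.collect()                         // assumed small per batch
        if (flags.isEmpty) rdd1.sparkContext.emptyRDD[(String, String, Int)]
        else rdd1.map { case (file, time) => (file, time, flags.head) }
      })
    transformedFileAndTime.print()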

Re: Help with transformWith in SparkStreaming

2015-03-06 Thread Laeeq Ahmed
instantiate RDDs anyway. On Fri, Mar 6, 2015 at 7:06 PM, Laeeq Ahmed laeeqsp...@yahoo.com.invalid wrote: Hi, I am filtering the first DStream with the value in the second DStream. I also want to keep the value of the second DStream. I have done the following and am having a problem with returning the new RDD

Re: Saving partial (top 10) DStream windows to hdfs

2015-01-07 Thread Laeeq Ahmed
= your_stream.mapPartitions(rdd => rdd.take(10)) Thanks Best Regards On Mon, Jan 5, 2015 at 11:08 PM, Laeeq Ahmed laeeqsp...@yahoo.com.invalid wrote: Hi, I am counting values in each window and finding the top values and want to save only the top 10 frequent values of each window to hdfs rather than all the values

Re: Saving partial (top 10) DStream windows to hdfs

2015-01-07 Thread Laeeq Ahmed
Hi, It worked out like this: val topCounts = sortedCounts.transform(rdd => rdd.zipWithIndex().filter(x => x._2 <= 10)) Regards, Laeeq On Wednesday, January 7, 2015 5:25 PM, Laeeq Ahmed laeeqsp...@yahoo.com.INVALID wrote: Hi Yana, I also think that val top10 = your_stream.mapPartitions(rdd
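
A sketch putting the thread's pieces together end to end, with placeholder input source and output paths: count per window, sort by frequency, keep the first rows by index, and save each window with saveAsTextFiles. Note that countByValueAndWindow uses an inverse reduce internally, so a checkpoint directory is required.

    // Sketch: top-10 most frequent values per 4 s window, written to HDFS.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("Top10ToHdfs"), Seconds(4))
    ssc.checkpoint("hdfs:///tmp/streaming-ckpt")           // needed by countByValueAndWindow
    val stream = ssc.socketTextStream("localhost", 9999)   // stand-in for the Kafka stream
    val counts = stream.map(x => math.round(x.toDouble))
      .countByValueAndWindow(Seconds(4), Seconds(4))
    val sortedCounts = counts.map { case (value, count) => (count, value) }
      .transform(_.sortByKey(ascending = false))
    val top10 = sortedCounts.transform(_.zipWithIndex().filter(_._2 < 10).map(_._1))
    top10.saveAsTextFiles("hdfs:///user/output/top10")     // hypothetical output prefix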

Re: Saving partial (top 10) DStream windows to hdfs

2015-01-07 Thread Laeeq Ahmed
, Laeeq Ahmed laeeqsp...@yahoo.com.invalid wrote: Hi, I applied it as follows: eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) -> 1), StorageLevel.MEMORY_AND_DISK_SER).map(_._2) val counts = eegStreams(a).map(x => math.round(x.toDouble)).countByValueAndWindow(Seconds(4

Saving partial (top 10) DStream windows to hdfs

2015-01-05 Thread Laeeq Ahmed
Hi, I am counting values in each window and finding the top values and want to save only the top 10 frequent values of each window to hdfs rather than all the values. eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) -> 1), StorageLevel.MEMORY_AND_DISK_SER).map(_._2) val

Host Error on EC2 while accessing hdfs from standalone

2014-12-30 Thread Laeeq Ahmed
Hi, I am using Spark standalone on EC2. I can access ephemeral hdfs from the spark-shell interface but I can't access hdfs from a standalone application. I am using Spark 1.2.0 with Hadoop 2.4.0 and launched the cluster from the ec2 folder on my local machine. In my pom file I have given the hadoop client as

Re: Spark Streaming timing considerations

2014-07-21 Thread Laeeq Ahmed
<= currentAppTimeWindowEnd && r._1 > currentAppTimeWindowStart) // filter and retain only the records that fall in the current app-time window  return filteredRDD  }) Hope this helps! TD On Thu, Jul 17, 2014 at 5:54 AM, Laeeq Ahmed laeeqsp...@yahoo.com wrote: Hi TD, I have been able to filter
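
A sketch reconstructing the quoted approach with assumed variable names: the transform body runs on the driver for every batch, so it can advance an application-time window and filter each batch's records by their own timestamp (the tuple key). The window size and the input parsing are placeholders.

    // Sketch: window by the timestamp carried in the data, not by arrival time.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("AppTimeWindow"), Seconds(4))
    val timestamped = ssc.socketTextStream("localhost", 9999)               // stand-in input
      .map { l => val a = l.split(","); (a(0).toDouble, a(1).toDouble) }    // (timestamp, value)

    val appWindowSize = 4.0                                // assumed, in timestamp units
    var currentWindowEnd = 0.0
    val windowed = timestamped.transform { rdd =>
      currentWindowEnd += appWindowSize                    // runs on the driver each batch
      val (start, end) = (currentWindowEnd - appWindowSize, currentWindowEnd)
      rdd.filter(r => r._1 <= end && r._1 > start)         // keep records in the app-time window
    }
    windowed.print()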

Re: Spark Streaming timing considerations

2014-07-17 Thread Laeeq Ahmed
)     // filter and retain only the records that fall in the timestamp-based window  return filteredRDD }) Consider the input tuples as (1,23),(1.2,34) . . . . . (3.8,54)(4,413) . . .  where the key is the timestamp. Regards, Laeeq   On Saturday, July 12, 2014 8:29 PM, Laeeq Ahmed

Spark Streaming timing considerations

2014-07-11 Thread Laeeq Ahmed
Hi, In the Spark Streaming paper, slack time has been suggested for delaying batch creation in the case of external timestamps. I don't see any such option in StreamingContext. Is it available in the API? Also, going through the previous posts, queueStream has been suggested for this. I

Re: controlling the time in spark-streaming

2014-07-09 Thread Laeeq Ahmed
Hi, For QueueRDD, have a look here. https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/QueueStream.scala   Regards, Laeeq On Friday, May 23, 2014 10:33 AM, Mayur Rustagi mayur.rust...@gmail.com wrote: Well its hard to use text data
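
A minimal sketch in the spirit of the linked QueueStream example: RDDs are pushed into a queue on the driver and each batch consumes one, which gives manual control over what "arrives" in each interval.

    // Sketch: feeding a stream from a queue of RDDs built on the driver.
    import scala.collection.mutable
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("QueueStreamDemo"), Seconds(1))
    val rddQueue = new mutable.Queue[RDD[Int]]()
    val inputStream = ssc.queueStream(rddQueue)            // one queued RDD per batch
    inputStream.map(x => (x % 10, 1)).reduceByKey(_ + _).print()
    ssc.start()
    for (_ <- 1 to 5) {                                    // push a new RDD every second
      rddQueue.synchronized { rddQueue += ssc.sparkContext.makeRDD(1 to 1000, 10) }
      Thread.sleep(1000)
    }
    ssc.stop()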

Re: how to convert JavaDStreamString to JavaRDDString

2014-07-09 Thread Laeeq Ahmed
Hi, First use foreachRDD and then use collect, as DStream.foreachRDD(rdd => { rdd.collect.foreach({ Also, it's better to use Scala. Less verbose. Regards, Laeeq On Wednesday, July 9, 2014 3:29 PM, Madabhattula Rajesh Kumar mrajaf...@gmail.com wrote: Hi Team, Could you please
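
A short sketch of the pattern described above, suitable only when each batch is small, since collect() pulls every record to the driver. The input stream is a stand-in.

    // Sketch: get at the plain records of each batch on the driver.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("ForeachRDD"), Seconds(2))
    val dstream = ssc.socketTextStream("localhost", 9999)  // stand-in DStream[String]
    dstream.foreachRDD { rdd =>
      rdd.collect().foreach(record => println(record))     // driver-side handling per batch
    }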

Re: window analysis with Spark and Spark streaming

2014-07-09 Thread Laeeq Ahmed
Hi, For QueueRDD, have a look here. https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/QueueStream.scala Regards, Laeeq, PhD candidate, KTH, Stockholm.   On Sunday, July 6, 2014 10:20 AM, alessandro finamore alessandro.finam...@polito.it

Spark Streaming question batch size

2014-07-01 Thread Laeeq Ahmed
Hi, The window size in Spark Streaming is time based, which means we have a different number of elements in each window. For example, you might have two streams (possibly more) which are related to each other and you want to compare them in a specific time interval. I am not clear how it will work.

Window Size

2014-07-01 Thread Laeeq Ahmed
Hi, The window size in Spark Streaming is time based, which means we have a different number of elements in each window. For example, you might have two streams (possibly more) which are related to each other and you want to compare them in a specific time interval. I am not clear how it will

Re: Spark Streaming question batch size

2014-07-01 Thread Laeeq Ahmed
above is that the streams just pump data in at different rates -- first one got 7462 points in the first batch interval, whereas stream2 saw 10493 On Tue, Jul 1, 2014 at 5:34 AM, Laeeq Ahmed laeeqsp...@yahoo.com wrote: Hi, The window size in a spark streaming is time based which means we have

RDD union of a window in Dstream

2014-05-21 Thread Laeeq Ahmed
Hi, I want to do a union of all RDDs in each window of a DStream. I found DStream.union but haven't seen anything like DStream.windowRDDUnion. Is there any way around it? I want to find the mean and SD of all values which come under each sliding window, for which I need to union all the RDDs in each
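
A hedged sketch for the stated goal: window() already hands one RDD per window interval to transform/foreachRDD, so there is no need to union the member RDDs by hand, and StatCounter gives mean and standard deviation in one pass. Window sizes and the input source below are placeholders.

    // Sketch: per-window mean and standard deviation without manual RDD unions.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("WindowStats"), Seconds(4))
    val values = ssc.socketTextStream("localhost", 9999).map(_.toDouble)   // stand-in input
    values.window(Seconds(20), Seconds(4)).foreachRDD { rdd =>
      if (rdd.count() > 0) {
        val stats = rdd.stats()                            // StatCounter: count, mean, stdev, ...
        println(s"mean=${stats.mean} stdev=${stats.stdev}")
      }
    }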

Re: Historical Data as Stream

2014-05-17 Thread Laeeq Ahmed
as a stream may not be able to use the entire data in the file for your analysis. Spark (given enough memory) can process large amounts of data quickly.  On May 15, 2014, at 9:52 AM, Laeeq Ahmed laeeqsp...@yahoo.com wrote: Hi, I have data in a file. Can I read it as a stream in Spark? I know

Re: maven for building scala simple program

2014-05-16 Thread Laeeq Ahmed
<artifactId>scala-library</artifactId> <version>${scala.version}</version> </dependency> </dependencies> </project> On Tue, May 6, 2014 at 4:10 PM, Laeeq Ahmed laeeqsp...@yahoo.com wrote: Hi all, If anyone is using Maven for building Scala classes with all dependencies

Historical Data as Stream

2014-05-16 Thread Laeeq Ahmed
Hi, I have data in a file. Can I read it as a stream in Spark? I know it seems odd to read a file as a stream but it has practical applications in real life if I can read it as a stream. Are there any other tools which can give this file as a stream to Spark, or do I have to make batches manually, which is
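
A small sketch of one usual option, with a hypothetical path: textFileStream turns files dropped into a monitored directory into a stream (one batch per new file); queueStream can likewise replay pre-built RDDs batch by batch.

    // Sketch: replaying file data as a stream by monitoring a directory.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("FileReplay"), Seconds(4))
    // Files moved into this directory after the context starts become one batch each.
    val lines = ssc.textFileStream("hdfs:///user/replay/incoming")          // hypothetical directory
    lines.count().print()
    ssc.start()
    ssc.awaitTermination()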

Re: Easy one

2014-05-15 Thread Laeeq Ahmed
Hi Ian, Don't use SPARK_MEM in spark-env.sh. It will set it for all of your jobs. The better way is to use only the second option, sconf.setExecutorEnv("spark.executor.memory", "4g"), i.e. set it in the driver program. In this way every job will have memory according to its requirement. For example
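
A hedged sketch of per-application executor memory set in the driver program; note it uses conf.set on the spark.executor.memory property rather than setExecutorEnv, which configures executor environment variables. The application name is a placeholder.

    // Sketch: per-application executor memory, set on the SparkConf in the driver.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("PerJobMemory")                          // hypothetical app name
      .set("spark.executor.memory", "4g")                  // applies to this application only
    val sc = new SparkContext(conf)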

Average of each RDD in Stream

2014-05-15 Thread Laeeq Ahmed
Hi, I use the following code for calculating the average. The problem is that the reduce operation returns a DStream here and not a tuple, as it normally does without streaming. So how can we get the sum and the count from the DStream? Can we cast it to a tuple? val numbers =
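
A sketch of two standard ways to get the average out, assuming a numeric stream: reduce to a (sum, count) pair and divide, or compute it per RDD inside foreachRDD, where plain values are available on the driver. The input stream is a stand-in.

    // Sketch: per-batch average of a DStream[Double].
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("StreamAverage"), Seconds(4))
    val numbers = ssc.socketTextStream("localhost", 9999).map(_.toDouble)  // stand-in input

    // Option 1: stay in DStream land with a (sum, count) pair.
    val avg = numbers.map(x => (x, 1L))
      .reduce((a, b) => (a._1 + b._1, a._2 + b._2))
      .map { case (sum, count) => sum / count }
    avg.print()

    // Option 2: per-RDD on the driver, where ordinary values are available.
    numbers.foreachRDD { rdd => if (rdd.count() > 0) println("mean = " + rdd.mean()) }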

Not getting mails from user group

2014-05-15 Thread Laeeq Ahmed
Hi all, There seems to be a problem. I have not been getting mails from the Spark user group for two days. Regards, Laeeq

Average of each RDD in Stream

2014-05-12 Thread Laeeq Ahmed
Hi, I use the following code for calculating the average. The problem is that the reduce operation returns a DStream here and not a tuple, as it normally does without streaming. So how can we get the sum and the count from the DStream? Can we cast it to a tuple? val numbers =

Re: Random Forest on Spark

2014-04-18 Thread Laeeq Ahmed
runs but it does not scale... On Apr 17, 2014 2:45 AM, Laeeq Ahmed laeeqsp...@yahoo.com wrote: Hi, For one of my applications, I want to use Random Forests (RF) on top of Spark. I see that currently MLlib does not have an implementation of RF. What other open-source RF implementations will be great

Random Forest on Spark

2014-04-17 Thread Laeeq Ahmed
Hi, For one of my applications, I want to use Random Forests (RF) on top of Spark. I see that currently MLlib does not have an implementation of RF. What other open-source RF implementations will be great to use with Spark in terms of speed? Regards, Laeeq Ahmed, KTH, Sweden.

Error while reading from HDFS Simple application

2014-03-20 Thread Laeeq Ahmed
VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$CreateSnapshotRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; What can be the cause of this error? Regards, Laeeq Ahmed, PhD Student, HPCViz, KTH. http