Re: Can we delete topic in kafka

2016-05-11 Thread Eduardo Costa Alfaia
Hi,
It is better to create a script that deletes the Kafka folder containing the topic's data and then recreates the topic afterwards, if needed.
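
A more targeted option may be to enable topic deletion on the brokers instead of touching the data directories; a hedged sketch (assuming Kafka 0.8.2+, where delete support exists):

# in server.properties on every broker, then restart the brokers
delete.topic.enable=true
# after that, the delete command actually removes the topic
./kafka-topics.sh --delete --topic topic_billing --zookeeper localhost:2181

With delete.topic.enable left at its default (false), the topic only ever stays "marked for deletion".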

BR


Eduardo Costa Alfaia
Ph.D. Student in Telecommunications Engineering
Università degli Studi di Brescia
Tel: +39 3209333018








On 5/11/16, 09:48, "Snehalata Nagaje" <snehalata.nag...@harbingergroup.com> 
wrote:

>
>
>Hi , 
>
>Can we delete a certain topic in Kafka?
>
>I have deleted using command 
>
>./kafka-topics.sh --delete --topic topic_billing --zookeeper localhost:2181 
>
>It says the topic is marked for deletion, but it does not actually delete the topic.
>
>Thanks, 
>Snehalata 


-- 

Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Installing Spark on Mac

2016-03-08 Thread Eduardo Costa Alfaia
Hi Aida,
The build has detected Maven version 3.0.3. Update to 3.3.3 or later and
try again.
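
If the build keeps picking up the old system Maven (here "Using `mvn` from path: /usr/bin/mvn"), installing a newer one first is usually enough; a hedged sketch, assuming Homebrew is available on the Mac:

brew install maven   # Homebrew should provide Maven 3.3.x or newer
mvn -version         # confirm the newer Maven is now first on the PATH
build/mvn -DskipTests clean package
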
On 08/Mar/2016 14:06, "Aida" wrote:

> Hi all,
>
> Thanks everyone for your responses; really appreciate it.
>
> Eduardo - I tried your suggestions but ran into some issues, please see
> below:
>
> ukdrfs01:Spark aidatefera$ cd spark-1.6.0
> ukdrfs01:spark-1.6.0 aidatefera$ build/mvn -DskipTests clean package
> Using `mvn` from path: /usr/bin/mvn
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> MaxPermSize=512M;
> support was removed in 8.0
> [INFO] Scanning for projects...
> [INFO]
> 
> [INFO] Reactor Build Order:
> [INFO]
> [INFO] Spark Project Parent POM
> [INFO] Spark Project Test Tags
> [INFO] Spark Project Launcher
> [INFO] Spark Project Networking
> [INFO] Spark Project Shuffle Streaming Service
> [INFO] Spark Project Unsafe
> [INFO] Spark Project Core
> [INFO] Spark Project Bagel
> [INFO] Spark Project GraphX
> [INFO] Spark Project Streaming
> [INFO] Spark Project Catalyst
> [INFO] Spark Project SQL
> [INFO] Spark Project ML Library
> [INFO] Spark Project Tools
> [INFO] Spark Project Hive
> [INFO] Spark Project Docker Integration Tests
> [INFO] Spark Project REPL
> [INFO] Spark Project Assembly
> [INFO] Spark Project External Twitter
> [INFO] Spark Project External Flume Sink
> [INFO] Spark Project External Flume
> [INFO] Spark Project External Flume Assembly
> [INFO] Spark Project External MQTT
> [INFO] Spark Project External MQTT Assembly
> [INFO] Spark Project External ZeroMQ
> [INFO] Spark Project External Kafka
> [INFO] Spark Project Examples
> [INFO] Spark Project External Kafka Assembly
> [INFO]
> [INFO]
> 
> [INFO] Building Spark Project Parent POM 1.6.0
> [INFO]
> 
> [INFO]
> [INFO] --- maven-clean-plugin:2.6.1:clean (default-clean) @
> spark-parent_2.10 ---
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @
> spark-parent_2.10 ---
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
> failed with message:
> Detected Maven Version: 3.0.3 is not in the allowed range 3.3.3.
> [INFO]
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Spark Project Parent POM .. FAILURE [0.821s]
> [INFO] Spark Project Test Tags ... SKIPPED
> [INFO] Spark Project Launcher  SKIPPED
> [INFO] Spark Project Networking .. SKIPPED
> [INFO] Spark Project Shuffle Streaming Service ... SKIPPED
> [INFO] Spark Project Unsafe .. SKIPPED
> [INFO] Spark Project Core  SKIPPED
> [INFO] Spark Project Bagel ... SKIPPED
> [INFO] Spark Project GraphX .. SKIPPED
> [INFO] Spark Project Streaming ... SKIPPED
> [INFO] Spark Project Catalyst  SKIPPED
> [INFO] Spark Project SQL . SKIPPED
> [INFO] Spark Project ML Library .. SKIPPED
> [INFO] Spark Project Tools ... SKIPPED
> [INFO] Spark Project Hive  SKIPPED
> [INFO] Spark Project Docker Integration Tests  SKIPPED
> [INFO] Spark Project REPL  SKIPPED
> [INFO] Spark Project Assembly  SKIPPED
> [INFO] Spark Project External Twitter  SKIPPED
> [INFO] Spark Project External Flume Sink . SKIPPED
> [INFO] Spark Project External Flume .. SKIPPED
> [INFO] Spark Project External Flume Assembly . SKIPPED
> [INFO] Spark Project External MQTT ... SKIPPED
> [INFO] Spark Project External MQTT Assembly .. SKIPPED
> [INFO] Spark Project External ZeroMQ . SKIPPED
> [INFO] Spark Project External Kafka .. SKIPPED
> [INFO] Spark Project Examples  SKIPPED
> [INFO] Spark Project External Kafka Assembly . SKIPPED
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 1.745s
> [INFO] Finished at: Tue Mar 08 18:01:48 GMT 2016
> [INFO] Final Memory: 19M/183M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-enforcer-plugin:1.4:enforce
> (enforce-versions) on project spark-parent_2.10: Some Enforcer rules have
> failed. Look above for specific messages explaining why 

Re: Installing Spark on Mac

2016-03-04 Thread Eduardo Costa Alfaia
Hi Aida

Run only "build/mvn -DskipTests clean package”

BR

Eduardo Costa Alfaia
Ph.D. Student in Telecommunications Engineering
Università degli Studi di Brescia
Tel: +39 3209333018








On 3/4/16, 16:18, "Aida" <aida1.tef...@gmail.com> wrote:

>Hi all,
>
>I am a complete novice and was wondering whether anyone would be willing to
>provide me with a step by step guide on how to install Spark on a Mac; on
>standalone mode btw.
>
>I downloaded a prebuilt version, the second version from the top. However, I
>have not installed Hadoop and am not planning to at this stage.
>
>I also downloaded Scala from the Scala website, do I need to download
>anything else?
>
>I am very eager to learn more about Spark but am unsure about the best way
>to do it.
>
>I would be happy for any suggestions or ideas
>
>Many thanks,
>
>Aida
>
>
>
>--
>View this message in context: 
>http://apache-spark-user-list.1001560.n3.nabble.com/Installing-Spark-on-Mac-tp26397.html
>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>-
>To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>For additional commands, e-mail: user-h...@spark.apache.org
>


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Accessing Web UI

2016-02-19 Thread Eduardo Costa Alfaia

Hi,
try http://OAhtvJ5MCA:8080
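
(Port 8080 is the default web UI port of the standalone Master; each worker's UI is normally on 8081 and the running application's UI is on port 4040 of the driver, assuming the defaults have not been changed.)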

BR




On 2/19/16, 07:18, "vasbhat"  wrote:

>OAhtvJ5MCA


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Eduardo Costa Alfaia
Hi Gourav,

I did a test as you described and it works for me. I am using Spark in local mode,
with master and worker on the same machine. I ran the example with
spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 without errors.
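
If the goal is to have the package available without passing --packages every time, putting the coordinates into conf/spark-defaults.conf may also work, since spark.jars.packages is the property behind that flag (a hedged suggestion, not verified on this setup):

spark.jars.packages  com.databricks:spark-csv_2.10:1.3.0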

BR

From:  Gourav Sengupta 
Date:  Monday, February 15, 2016 at 10:03
To:  Jorge Machado 
Cc:  Spark Group 
Subject:  Re: Using SPARK packages in Spark Cluster

Hi Jorge/ All,

Please, please, please go through this link:
http://spark.apache.org/docs/latest/spark-standalone.html.
The link tells you how to start a SPARK cluster in local mode. If you have not
started or worked with a SPARK cluster in local mode, kindly do not attempt to
answer this question.

My question is how to use packages like
https://github.com/databricks/spark-csv when I am using a SPARK cluster in local
mode.

Regards,
Gourav Sengupta


On Mon, Feb 15, 2016 at 1:55 PM, Jorge Machado  wrote:
Hi Gourav, 

I did not understand your problem… the --packages option should not make
any difference whether you are running standalone or on YARN, for example.
Give us an example of which packages you are trying to load and what error you are
getting… If you want to use the libraries on spark-packages.org without
--packages, why do you not use Maven?
Regards 


On 12/02/2016, at 13:22, Gourav Sengupta  wrote:

Hi,

I am creating sparkcontext in a SPARK standalone cluster as mentioned here: 
http://spark.apache.org/docs/latest/spark-standalone.html using the following 
code:

--
sc.stop()
conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
                  .setMaster("spark://hostname:7077") \
                  .set('spark.shuffle.service.enabled', True) \
                  .set('spark.dynamicAllocation.enabled', 'true') \
                  .set('spark.executor.memory', '20g') \
                  .set('spark.driver.memory', '4g') \
                  .set('spark.default.parallelism', (multiprocessing.cpu_count() - 1))
conf.getAll()
sc = SparkContext(conf=conf)
-- (we should definitely be able to optimise the configuration but that is not the point here) --

I am not able to use packages (a list of which is available at
http://spark-packages.org) using this method.

Whereas if I use the standard "pyspark --packages" option, the packages
load just fine.

I will be grateful if someone could kindly let me know how to load packages 
when starting a cluster as mentioned above.


Regards,
Gourav Sengupta




-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


unsubscribe email

2016-02-01 Thread Eduardo Costa Alfaia
Hi Guys,
How could I unsubscribe the address e.costaalf...@studenti.unibs.it, which is an
alias of my email e.costaalf...@unibs.it and is registered on the mailing
list?

Thanks

Eduardo Costa Alfaia
PhD Student Telecommunication Engineering
Università degli Studi di Brescia-UNIBS


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Queue Full

2015-10-26 Thread Eduardo Costa Alfaia
Hi Guys,

How could I solve this problem?

% Failed to produce message: Local: Queue full
% Failed to produce message: Local: Queue full

Thanks

-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Queue Full

2015-10-26 Thread Eduardo Costa Alfaia
Hi Magnus
I think the answer is (c): producing messages at a higher rate than the network
or broker can handle.
How could I manage this?
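
(For reference, in librdkafka this generally has to be handled in the producer loop itself: when rd_kafka_produce() fails because the local queue is full, call rd_kafka_poll() so that delivery reports drain the queue and then retry the send, and/or raise queue.buffering.max.messages / queue.buffering.max.kbytes in the producer configuration. This is the general librdkafka pattern, not something confirmed in this thread.)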


> On 26 Oct 2015, at 17:45, Magnus Edenhill  wrote:
> 
> c) producing messages at a higher rate than the network or broker can
> handle


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Similar code in Java

2015-02-11 Thread Eduardo Costa Alfaia
Thanks Ted.

 On Feb 10, 2015, at 20:06, Ted Yu yuzhih...@gmail.com wrote:
 
 Please take a look at:
 examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaDirectKafkaWordCount.java
 which was checked in yesterday.
 
 On Sat, Feb 7, 2015 at 10:53 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it 
 mailto:e.costaalf...@unibs.it wrote:
 Hi Ted,
 
 I've seen the examples; I am using JavaKafkaWordCount.java, but I would like to
 reproduce in Java what I've done in Scala. Is it possible to do in Java the same
 thing that the Scala code does? Principally this code below, or something like it:
 
 val KafkaDStreams = (1 to numStreams) map {_ =
  KafkaUtils.createStream[String, String, StringDecoder, 
 StringDecoder](ssc, kafkaParams, topicMap,storageLevel = 
 StorageLevel.MEMORY_ONLY).map(_._2)
 
 
 

 On Feb 7, 2015, at 19:32, Ted Yu yuzhih...@gmail.com 
 mailto:yuzhih...@gmail.com wrote:
 
 Can you take a look at:
 
 ./examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
 ./external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaStreamSuite.java
 
 Cheers
 
 On Sat, Feb 7, 2015 at 9:45 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it 
 mailto:e.costaalf...@unibs.it wrote:
 Hi Guys,
 
 How could I doing in Java the code scala below?
 
 val KafkaDStreams = (1 to numStreams) map {_ =
  KafkaUtils.createStream[String, String, StringDecoder, 
 StringDecoder](ssc, kafkaParams, topicMap,storageLevel = 
 StorageLevel.MEMORY_ONLY).map(_._2)
   
   }
 val unifiedStream = ssc.union(KafkaDStreams)
 val sparkProcessingParallelism = 1
 unifiedStream.repartition(sparkProcessingParallelism)
 
 Thanks Guys
 
 Informativa sulla Privacy: http://www.unibs.it/node/8155 
 http://www.unibs.it/node/8155
 
 
 Informativa sulla Privacy: http://www.unibs.it/node/8155 
 http://www.unibs.it/node/8155


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Doubts Kafka

2015-02-08 Thread Eduardo Costa Alfaia
Hi Guys,

I have some doubts about Kafka. The first is: why do some applications prefer to
connect to ZooKeeper instead of the brokers? Connecting to ZooKeeper could create
overhead, because we are inserting another element between producer and
consumer. Another question is about the data sent by the producer: in my tests
the producer sends messages to the brokers and within a few minutes my hard disk
(250 GB) is full. Is there anything to do in the configuration
to minimize this?

Thanks 
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Similar code in Java

2015-02-07 Thread Eduardo Costa Alfaia
Hi Guys,

How could I do in Java the Scala code below?

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc,
    kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY).map(_._2)
}
val unifiedStream = ssc.union(KafkaDStreams)
val sparkProcessingParallelism = 1
unifiedStream.repartition(sparkProcessingParallelism)

Thanks Guys
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Error KafkaStream

2015-02-05 Thread Eduardo Costa Alfaia
Hi Guys, 
I'm getting this error in KafkaWordCount:

TaskSetManager: Lost task 0.0 in stage 4095.0 (TID 1281, 10.20.10.234): 
java.lang.ClassCastException: [B cannot be cast to java.lang.String 

 
at 
org.apache.spark.examples.streaming.KafkaWordCount$$anonfun$4$$anonfun$apply$1.apply(KafkaWordCount.scala:7


Any idea what it could be?


Below is the piece of code:



val kafkaStream = {
  val kafkaParams = Map[String, String](
    "zookeeper.connect" -> "achab3:2181",
    "group.id" -> "mygroup",
    "zookeeper.connect.timeout.ms" -> "1",
    "kafka.fetch.message.max.bytes" -> "400",
    "auto.offset.reset" -> "largest")

  val topicpMap = topics.split(",").map((_, numThreads.toInt)).toMap

  //val lines = KafkaUtils.createStream[String, String, DefaultDecoder,
  //  DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)

  val KafkaDStreams = (1 to numStreams).map { _ =>
    KafkaUtils.createStream[String, String, DefaultDecoder,
      DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
  }

  val unifiedStream = ssc.union(KafkaDStreams)
  unifiedStream.repartition(sparkProcessingParallelism)
}

Thanks Guys
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Error KafkaStream

2015-02-05 Thread Eduardo Costa Alfaia
I don’t think so Sean.

 On Feb 5, 2015, at 16:57, Sean Owen so...@cloudera.com wrote:
 
 Is SPARK-4905 / https://github.com/apache/spark/pull/4371/files the same 
 issue?
 
 On Thu, Feb 5, 2015 at 7:03 AM, Eduardo Costa Alfaia
 e.costaalf...@unibs.it wrote:
 Hi Guys,
 I’m getting this error in KafkaWordCount;
 
 TaskSetManager: Lost task 0.0 in stage 4095.0 (TID 1281, 10.20.10.234):
 java.lang.ClassCastException: [B cannot be cast to java.lang.String
at
 org.apache.spark.examples.streaming.KafkaWordCount$$anonfun$4$$anonfun$apply$1.apply(KafkaWordCount.scala:7
 
 
 Some idea that could be?
 
 
 Bellow the piece of code
 
 
 
 val kafkaStream = {
val kafkaParams = Map[String, String](
zookeeper.connect - achab3:2181,
group.id - mygroup,
zookeeper.connect.timeout.ms - 1,
kafka.fetch.message.max.bytes - 400,
auto.offset.reset - largest)
 
val topicpMap = topics.split(,).map((_,numThreads.toInt)).toMap
  //val lines = KafkaUtils.createStream[String, String, DefaultDecoder,
 DefaultDecoder](ssc, kafkaParams, topicpMa
 p, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
val KafkaDStreams = (1 to numStreams).map {_ =
KafkaUtils.createStream[String, String, DefaultDecoder,
 DefaultDecoder](ssc, kafkaParams, topicpMap, storageLe
 vel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
}
val unifiedStream = ssc.union(KafkaDStreams)
unifiedStream.repartition(sparkProcessingParallelism)
 }
 
 Thanks Guys
 
 Informativa sulla Privacy: http://www.unibs.it/node/8155


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Error KafkaStream

2015-02-05 Thread Eduardo Costa Alfaia
Hi Shao, 
When I changed to StringDecoder I got this compile error:

[error] 
/sata_disk/workspace/spark-1.2.0/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaW
ordCount.scala:78: not found: type StringDecoder
[error] KafkaUtils.createStream[String, String, StringDecoder, 
StringDecoder](ssc, kafkaParams, topicMap,stora
geLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
[error] ^
[error] 
/sata_disk/workspace/spark-1.2.0/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaW
ordCount.scala:85: value split is not a member of Nothing
[error] val words = unifiedStream.flatMap(_.split( ))
[error] ^
[error] 
/sata_disk/workspace/spark-1.2.0/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaW
ordCount.scala:86: value reduceByKeyAndWindow is not a member of 
org.apache.spark.streaming.dstream.DStream[(Nothing, 
Long)]
[error] val wordCounts = words.map(x = (x, 1L)).reduceByKeyAndWindow(_ + 
_, _ - _, Seconds(20), Seconds(10), 2)
[error]  ^
[error] three errors found
[error] (examples/compile:compile) Compilation failed
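
The "not found: type StringDecoder" errors usually just mean the decoder class is not imported; a minimal sketch of the change, assuming the Kafka 0.8.x jars are on the classpath and the rest of KafkaWordCount.scala stays as above:

import kafka.serializer.StringDecoder

val KafkaDStreams = (1 to numStreams).map { _ =>
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc,
    kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
}

With StringDecoder the stream elements are Strings again, so the later split and reduceByKeyAndWindow calls should also type-check.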


 On Feb 6, 2015, at 02:11, Shao, Saisai saisai.s...@intel.com wrote:
 
 Hi,
 
 I think you should change the `DefaultDecoder` of your type parameter into 
 `StringDecoder`, seems you want to decode the message into String. 
 `DefaultDecoder` is to return Array[Byte], not String, so here class casting 
 will meet error.
 
 Thanks
 Jerry
 
 -Original Message-
 From: Eduardo Costa Alfaia [mailto:e.costaalf...@unibs.it] 
 Sent: Friday, February 6, 2015 12:04 AM
 To: Sean Owen
 Cc: user@spark.apache.org
 Subject: Re: Error KafkaStream
 
 I don’t think so Sean.
 
 On Feb 5, 2015, at 16:57, Sean Owen so...@cloudera.com wrote:
 
 Is SPARK-4905 / https://github.com/apache/spark/pull/4371/files the same 
 issue?
 
 On Thu, Feb 5, 2015 at 7:03 AM, Eduardo Costa Alfaia 
 e.costaalf...@unibs.it wrote:
 Hi Guys,
 I’m getting this error in KafkaWordCount;
 
 TaskSetManager: Lost task 0.0 in stage 4095.0 (TID 1281, 10.20.10.234):
 java.lang.ClassCastException: [B cannot be cast to java.lang.String
   at
 org.apache.spark.examples.streaming.KafkaWordCount$$anonfun$4$$anonfu
 n$apply$1.apply(KafkaWordCount.scala:7
 
 
 Some idea that could be?
 
 
 Bellow the piece of code
 
 
 
 val kafkaStream = {
   val kafkaParams = Map[String, String](
   zookeeper.connect - achab3:2181,
   group.id - mygroup,
   zookeeper.connect.timeout.ms - 1,
   kafka.fetch.message.max.bytes - 400,
   auto.offset.reset - largest)
 
   val topicpMap = topics.split(,).map((_,numThreads.toInt)).toMap
 //val lines = KafkaUtils.createStream[String, String, DefaultDecoder,
 DefaultDecoder](ssc, kafkaParams, topicpMa p, storageLevel = 
 StorageLevel.MEMORY_ONLY_SER).map(_._2)
   val KafkaDStreams = (1 to numStreams).map {_ =
   KafkaUtils.createStream[String, String, DefaultDecoder, 
 DefaultDecoder](ssc, kafkaParams, topicpMap, storageLe vel = 
 StorageLevel.MEMORY_ONLY_SER).map(_._2)
   }
   val unifiedStream = ssc.union(KafkaDStreams)
   unifiedStream.repartition(sparkProcessingParallelism)
 }
 
 Thanks Guys
 
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 
 
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional 
 commands, e-mail: user-h...@spark.apache.org
 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


KafkaWordCount

2015-01-30 Thread Eduardo Costa Alfaia
Hi Guys,

I would like to add the Kafka parameter val kafkaParams =
Map("fetch.message.max.bytes" -> "400") to the KafkaWordCount Scala code. I've used
this variable like this:

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream(ssc, kafkaParams, zkQuorum, group, topicpMap).map(_._2)
}


However, I've gotten these errors:

 (jssc: org.apache.spark.streaming.api.java.JavaStreamingContext, zkQuorum: String, groupId: String, topics: java.util.Map[String,Integer], storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream[String,String] and
[error]   (ssc: org.apache.spark.streaming.StreamingContext, zkQuorum: String, groupId: String, topics: scala.collection.immutable.Map[String,Int], storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)]
Thanks
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Error Compiling

2015-01-30 Thread Eduardo Costa Alfaia
Hi Guys,

Any idea how to solve this error?

[error] 
/sata_disk/workspace/spark-1.1.1/examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:76:
 missing parameter type for expanded function ((x$6, x$7) = x$6.$plus(x$7))

[error] val wordCounts = words.map(x => (x, 1L)).reduceByWindow(_ + _, _ - _, Minutes(1), Seconds(2), 2)
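
The map produces (String, Long) tuples, which have no + operator, so the compiler cannot type the _ + _ closure for reduceByWindow. If the goal is per-word windowed counts, the pair-DStream operation is usually what is wanted; a hedged sketch, mirroring the reduceByKeyAndWindow call used in the KafkaWordCount threads above (note that using the inverse function also requires a checkpoint directory to be set):

val wordCounts = words.map(x => (x, 1L))
  .reduceByKeyAndWindow(_ + _, _ - _, Minutes(1), Seconds(2), 2)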

Thanks
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Issue size message

2015-01-19 Thread Eduardo Costa Alfaia
Hi All,
I am having an issue when using Kafka with librdkafka. I've changed
message.max.bytes to 2 MB in my server.properties config file, which is the size
of my messages. When I run the command line ./rdkafka_performance -C -t test -p
0 -b computer49:9092, after consuming some messages the consumer keeps waiting
for something that doesn't arrive, while my producer continues sending messages. Any idea?

% Using random seed 1421685059, verbosity level 1
% 214 messages and 1042835 bytes consumed in 20ms: 10518 msgs/s and 51.26 Mb/s, 
no compression
% 21788 messages and 106128192 bytes consumed in 1029ms: 21154 msgs/s and 
103.04 Mb/s, no compression
% 43151 messages and 210185259 bytes consumed in 2030ms: 21252 msgs/s and 
103.52 Mb/s, no compression
% 64512 messages and 314233575 bytes consumed in 3031ms: 21280 msgs/s and 
103.66 Mb/s, no compression
% 86088 messages and 419328692 bytes consumed in 4039ms: 21313 msgs/s and 
103.82 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 5719ms: 17571 msgs/s and 
85.67 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 6720ms: 14955 msgs/s and 
72.92 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 7720ms: 13018 msgs/s and 
63.47 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 8720ms: 11524 msgs/s and 
56.19 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 9720ms: 10339 msgs/s and 
50.41 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 10721ms: 9374 msgs/s and 
45.71 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 11721ms: 8574 msgs/s and 
41.81 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 12721ms: 7900 msgs/s and 
38.52 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 13721ms: 7324 msgs/s and 
35.71 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 14721ms: 6826 msgs/s and 
33.29 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 15722ms: 6392 msgs/s and 
31.17 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 16722ms: 6010 msgs/s and 
29.30 Mb/s, no 



When the software consumes all offsets it sends me this message:

% Consumer reached end of unibs.nec [0] message queue at offset 229790
RD_KAFKA_RESP_ERR__PARTITION_EOF: [-191]

However, having changed message.max.bytes to 2 MB, I don't receive that code from Kafka.

Does anyone have any idea?

Thanks guys.
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Issue size message

2015-01-19 Thread Eduardo Costa Alfaia
Hi guys,

OK, I've tried this and it works fine.

Thanks

 On Jan 19, 2015, at 19:10, Joe Stein joe.st...@stealth.ly wrote:
 
 If you increase the size of the messages for producing then you MUST also
 change replica.fetch.max.bytes in the broker server.properties, otherwise
 none of your replicas will be able to fetch from the leader and they will
 all fall out of the ISR. You also then need to change your consumers
 fetch.message.max.bytes in your consumers properties (however that might
 be configured for your specific consumer being used) so that they can read
 that data, otherwise you won't see messages downstream.
 
 /***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 /
 
 On Mon, Jan 19, 2015 at 1:03 PM, Magnus Edenhill mag...@edenhill.se wrote:
 
 (duplicating the github answer for reference)
 
 Hi Eduardo,
 
 the default maximum fetch size is 1 Meg which means your 2 Meg messages
 will not fit the fetch request.
 Try increasing it by appending -X fetch.message.max.bytes=400 to your
 command line.
 
 Regards,
 Magnus
 
 
 2015-01-19 17:52 GMT+01:00 Eduardo Costa Alfaia e.costaalf...@unibs.it:
 
 Hi All,
 I am having an issue when using kafka with librdkafka. I've changed the
 message.max.bytes to 2MB in my server.properties config file, that is the
 size of my message, when I run the command line ./rdkafka_performance -C
 -t
 test -p 0 -b computer49:9092, after consume some messages the consumer
 remain waiting something that don't arrive. My producer continues sending
 messages. Some idea?
 
 % Using random seed 1421685059, verbosity level 1
 % 214 messages and 1042835 bytes consumed in 20ms: 10518 msgs/s and 51.26
 Mb/s, no compression
 % 21788 messages and 106128192 bytes consumed in 1029ms: 21154 msgs/s and
 103.04 Mb/s, no compression
 % 43151 messages and 210185259 bytes consumed in 2030ms: 21252 msgs/s and
 103.52 Mb/s, no compression
 % 64512 messages and 314233575 bytes consumed in 3031ms: 21280 msgs/s and
 103.66 Mb/s, no compression
 % 86088 messages and 419328692 bytes consumed in 4039ms: 21313 msgs/s and
 103.82 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 5719ms: 17571 msgs/s
 and
 85.67 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 6720ms: 14955 msgs/s
 and
 72.92 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 7720ms: 13018 msgs/s
 and
 63.47 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 8720ms: 11524 msgs/s
 and
 56.19 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 9720ms: 10339 msgs/s
 and
 50.41 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 10721ms: 9374 msgs/s
 and
 45.71 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 11721ms: 8574 msgs/s
 and
 41.81 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 12721ms: 7900 msgs/s
 and
 38.52 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 13721ms: 7324 msgs/s
 and
 35.71 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 14721ms: 6826 msgs/s
 and
 33.29 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 15722ms: 6392 msgs/s
 and
 31.17 Mb/s, no compression
 % 100504 messages and 490022646 bytes consumed in 16722ms: 6010 msgs/s
 and
 29.30 Mb/s, no
 
 
 
 The software when consume all offset send me the message:
 
 % Consumer reached end of unibs.nec [0] message queue at offset 229790
 RD_KAFKA_RESP_ERR__PARTITION_EOF: [-191]
 
 However changed de message.max.bytes to 2MB I don’t receive the code from
 Kafka.
 
 Anyone has some idea?
 
 Thanks guys.
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 
 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


JavaKafkaWordCount

2014-11-18 Thread Eduardo Costa Alfaia
Hi Guys,

I am doing some tests with JavaKafkaWordCount; my cluster is composed of 8
workers and 1 driver with spark-1.1.0. I am using Kafka too and I have some
questions about it.

1 - When I launch the command:
bin/spark-submit --class org.apache.spark.examples.streaming.JavaKafkaWordCount
--master spark://computer8:7077 --driver-memory 1g --executor-memory 2g
--executor-cores 2
examples/target/scala-2.10/spark-examples-1.1.0-hadoop1.0.4.jar computer49:2181
test-consumer-group test 2

I see in the Spark WebAdmin that only 1 worker works. Why?

2 - In Kafka I can see the same thing:

Group                Topic  Pid  Offset  logSize  Lag  Owner
test-consumer-group  test   0    147092  147092   0    test-consumer-group_computer1-1416319543858-817b566f-0
test-consumer-group  test   1    232183  232183   0    test-consumer-group_computer1-1416319543858-817b566f-0
test-consumer-group  test   2    186805  186805   0    test-consumer-group_computer1-1416319543858-817b566f-0
test-consumer-group  test   3    0       0        0    test-consumer-group_computer1-1416319543858-817b566f-1
test-consumer-group  test   4    0       0        0    test-consumer-group_computer1-1416319543858-817b566f-1
test-consumer-group  test   5    0       0        0    test-consumer-group_computer1-1416319543858-817b566f-1

I would like to understand this behavior. Is it normal? Am I doing something
wrong?
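
A likely explanation for the single busy worker: the receiver-based Kafka stream created by JavaKafkaWordCount is a single receiver, and a single receiver runs on one executor. A common workaround is to create several streams and union them so that receiving is spread over more workers; a hedged Scala sketch of the same pattern used in the Scala snippets elsewhere in these threads (assuming ssc, zkQuorum, group and topicMap are defined accordingly, and numStreams is a hypothetical value):

val numStreams = 4  // hypothetical: one receiver per stream
val kafkaStreams = (1 to numStreams).map { _ =>
  KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
}
val unifiedStream = ssc.union(kafkaStreams)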

Thanks Guys
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Kafka examples

2014-11-13 Thread Eduardo Costa Alfaia
Hi guys,

Have the Kafka examples in the master branch been removed?

Thanks
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Information

2014-11-06 Thread Eduardo Costa Alfaia
Hi Guys

Could anyone explain this information to me?

208K), 0.0086120 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 
2014-11-06T12:20:55.673+0100: 1256.382: [GC2014-11-06T12:20:55.674+0100: 
1256.382: [ParNew: 551115K-2816K(613440K), 0.0204130 secs] 
560218K-13933K(4126208K), 0.0205130 secs] [Times: user=0.09 sys=0.01, 
real=0.02 secs] 
2014-11-06T12:21:03.372+0100: 1264.080: [GC2014-11-06T12:21:03.372+0100: 
1264.080: [ParNew: 547827K-1047K(613440K), 0.0073880 secs] 
558944K-12473K(4126208K), 0.0074770 secs] [Times: user=0.06 sys=0.00, 
real=0.00 secs] 
2014-11-06T12:21:10.416+0100: 1271.124: [GC2014-11-06T12:21:10.416+0100: 
1271.124: [ParNew: 545782K-2266K(613440K), 0.0069530 secs] 
557208K-13836K(4126208K), 0.0070420 secs] [Times: user=0.05 sys=0.00, 
real=0.01 secs] 
2014-11-06T12:21:18.307+0100: 1279.015: [GC2014-11-06T12:21:18.307+0100: 
1279.015: [ParNew: 546921K-2156K(613440K), 0.0071050 secs] 
558491K-13855K(4126208K), 0.0071900 secs] [Times: user=0.06 sys=0.00, 
real=0.01 secs] 
2014-11-06T12:21:26.394+0100: 1287.102: [GC2014-11-06T12:21:26.394+0100: 
1287.102: [ParNew: 546237K-3125K(613440K), 0.0071260 secs] 
557936K-14940K(4126208K), 0.0072170 secs] [Times: user=0.05 sys=0.00, 
real=0.00 secs] 
2014-11-06T12:21:33.913+0100: 1294.621: [GC2014-11-06T12:21:33.913+0100: 
1294.621: [ParNew: 547726K-2452K(613440K), 0.0070220 secs] 559541K-14367K(412

Thanks
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Consumer and Producer configs

2014-11-06 Thread Eduardo Costa Alfaia
Hi Guys,

How could I use the Consumer and Producer configs in my Kafka environment?

Thanks 

-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Spark and Kafka

2014-11-06 Thread Eduardo Costa Alfaia
Hi Guys,

I am doing some tests with Spark Streaming and Kafka, but I have seen something
strange. I have modified JavaKafkaWordCount to use reduceByKeyAndWindow and
to print on the screen the accumulated word counts. In the beginning Spark works
very well: in each iteration the word counts increase, but after 12 or 13 seconds
the results repeat continually.

My producer program keeps sending words to Kafka.

Does anyone have any idea about this?


---
Time: 1415272266000 ms
---
(accompanied
them,6)
(merrier,5)
(it
possessed,5)
(the
treacherous,5)
(Quite,12)
(offer,273)
(rabble,58)
(exchanging,16)
(Genoa,18)
(merchant,41)
...
---
Time: 1415272267000 ms
---
(accompanied
them,12)
(merrier,12)
(it
possessed,12)
(the
treacherous,11)
(Quite,24)
(offer,602)
(rabble,132)
(exchanging,35)
(Genoa,36)
(merchant,84)
...
---
Time: 1415272268000 ms
---
(accompanied
them,17)
(merrier,18)
(it
possessed,17)
(the
treacherous,17)
(Quite,35)
(offer,889)
(rabble,192)
(the
bed,1)
(exchanging,51)
(Genoa,54)
...
---
Time: 1415272269000 ms
---
(accompanied
them,17)
(merrier,18)
(it
possessed,17)
(the
treacherous,17)
(Quite,35)
(offer,889)
(rabble,192)
(the
bed,1)
(exchanging,51)
(Genoa,54)
...

---
Time: 141527227 ms
---
(accompanied
them,17)
(merrier,18)
(it
possessed,17)
(the
treacherous,17)
(Quite,35)
(offer,889)
(rabble,192)
(the
bed,1)
(exchanging,51)
(Genoa,54)
...


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Spark and Kafka

2014-11-06 Thread Eduardo Costa Alfaia
This is my window:

reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
);

 On Nov 6, 2014, at 18:37, Gwen Shapira gshap...@cloudera.com wrote:
 
 What's the window size? If the window is around 10 seconds and you are
 sending data at very stable rate, this is expected.
 
 
 
 On Thu, Nov 6, 2014 at 9:32 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it
 wrote:
 
 Hi Guys,
 
 I am doing some tests with Spark Streaming and Kafka, but I have seen
 something strange, I have modified the JavaKafkaWordCount to use
 ReducebyKeyandWindow and to print in the screen the accumulated numbers of
 the words, in the beginning spark works very well in each interaction the
 numbers of the words increase but after 12 a 13 sec the results repeats
 continually.
 
 My program producer remain sending the words toward the kafka.
 
 Does anyone have any idea about this?
 
 
 ---
 Time: 1415272266000 ms
 ---
 (accompanied
 them,6)
 (merrier,5)
 (it
 possessed,5)
 (the
 treacherous,5)
 (Quite,12)
 (offer,273)
 (rabble,58)
 (exchanging,16)
 (Genoa,18)
 (merchant,41)
 ...
 ---
 Time: 1415272267000 ms
 ---
 (accompanied
 them,12)
 (merrier,12)
 (it
 possessed,12)
 (the
 treacherous,11)
 (Quite,24)
 (offer,602)
 (rabble,132)
 (exchanging,35)
 (Genoa,36)
 (merchant,84)
 ...
 ---
 Time: 1415272268000 ms
 ---
 (accompanied
 them,17)
 (merrier,18)
 (it
 possessed,17)
 (the
 treacherous,17)
 (Quite,35)
 (offer,889)
 (rabble,192)
 (the
 bed,1)
 (exchanging,51)
 (Genoa,54)
 ...
 ---
 Time: 1415272269000 ms
 ---
 (accompanied
 them,17)
 (merrier,18)
 (it
 possessed,17)
 (the
 treacherous,17)
 (Quite,35)
 (offer,889)
 (rabble,192)
 (the
 bed,1)
 (exchanging,51)
 (Genoa,54)
 ...
 
 ---
 Time: 141527227 ms
 ---
 (accompanied
 them,17)
 (merrier,18)
 (it
 possessed,17)
 (the
 treacherous,17)
 (Quite,35)
 (offer,889)
 (rabble,192)
 (the
 bed,1)
 (exchanging,51)
 (Genoa,54)
 ...
 
 
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Spark Kafka Performance

2014-11-05 Thread Eduardo Costa Alfaia
Hi Bhavesh

I will collect the dump and send it to you.

I am using a program that I took from
https://github.com/edenhill/librdkafka/tree/master/examples and have
changed it to suit my tests. I have attached the files.






 On Nov 5, 2014, at 04:45, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote:
 
 Hi Eduardo,
 
 Can you please take a thread dump and see if there are blocking issues on
 the producer side? Do you have a single instance of Producers and multiple
 threads?
 
 Are you using Scala Producer or New Java Producer ?  Also, what is your
 producer property ?
 
 
 Thanks,
 
 Bhavesh
 
 On Tue, Nov 4, 2014 at 12:40 AM, Eduardo Alfaia e.costaalf...@unibs.it
 wrote:
 
 Hi Gwen,
 I have changed the java code kafkawordcount to use reducebykeyandwindow in
 spark.
 
 - Original message -
 From: Gwen Shapira gshap...@cloudera.com
 Sent: 03/11/2014 21:08
 To: users@kafka.apache.org users@kafka.apache.org
 Cc: u...@spark.incubator.apache.org u...@spark.incubator.apache.org
 Subject: Re: Spark Kafka Performance
 
 Not sure about the throughput, but:
 
 I mean that the words counted in spark should grow up - The spark
 word-count example doesn't accumulate.
 It gets an RDD every n seconds and counts the words in that RDD. So we
 don't expect the count to go up.
 
 
 
 On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Alfaia 
 e.costaalf...@unibs.it
 wrote:
 
 Hi Guys,
 Anyone could explain me how to work Kafka with Spark, I am using the
 JavaKafkaWordCount.java like a test and the line command is:
 
 ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount
 spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3
 
 and like a producer I am using this command:
 
 rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt
 -l 100 -n 10
 
 
 rdkafka_cachesender is a program that was developed by me which send to
 kafka the output.txt’s content where -l is the length of each send(upper
 bound) and -n is the lines to send in a row. Bellow is the throughput
 calculated by the program:
 
 File is 2235755 bytes
 throughput (b/s) = 699751388
 throughput (b/s) = 723542382
 throughput (b/s) = 662989745
 throughput (b/s) = 505028200
 throughput (b/s) = 471263416
 throughput (b/s) = 446837266
 throughput (b/s) = 409856716
 throughput (b/s) = 373994467
 throughput (b/s) = 366343097
 throughput (b/s) = 373240017
 throughput (b/s) = 386139016
 throughput (b/s) = 373802209
 throughput (b/s) = 369308515
 throughput (b/s) = 366935820
 throughput (b/s) = 365175388
 throughput (b/s) = 362175419
 throughput (b/s) = 358356633
 throughput (b/s) = 357219124
 throughput (b/s) = 352174125
 throughput (b/s) = 348313093
 throughput (b/s) = 355099099
 throughput (b/s) = 348069777
 throughput (b/s) = 348478302
 throughput (b/s) = 340404276
 throughput (b/s) = 339876031
 throughput (b/s) = 339175102
 throughput (b/s) = 327555252
 throughput (b/s) = 324272374
 throughput (b/s) = 322479222
 throughput (b/s) = 319544906
 throughput (b/s) = 317201853
 throughput (b/s) = 317351399
 throughput (b/s) = 315027978
 throughput (b/s) = 313831014
 throughput (b/s) = 310050384
 throughput (b/s) = 307654601
 throughput (b/s) = 305707061
 throughput (b/s) = 307961102
 throughput (b/s) = 296898200
 throughput (b/s) = 296409904
 throughput (b/s) = 294609332
 throughput (b/s) = 293397843
 throughput (b/s) = 293194876
 throughput (b/s) = 291724886
 throughput (b/s) = 290031314
 throughput (b/s) = 289747022
 throughput (b/s) = 289299632
 
 The throughput goes down after some seconds and it does not maintain the
 performance like the initial values:
 
 throughput (b/s) = 699751388
 throughput (b/s) = 723542382
 throughput (b/s) = 662989745
 
 Another question is about spark, after I have started the spark line
 command after 15 sec spark continue to repeat the words counted, but my
 program continue to send words to kafka, so I mean that the words counted
 in spark should grow up. I have attached the log from spark.
 
 My Case is:
 
 ComputerA(Kafka_cachsesender) - ComputerB(Kakfa-Brokers-Zookeeper) -
 ComputerC (Spark)
 
 If I don’t explain very well send a reply to me.
 
 Thanks Guys
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 
 
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Producer and Consumer properties

2014-11-05 Thread Eduardo Costa Alfaia
Hi Dudes,

I would like to know whether the producer and consumer properties files in the
config folder need to be configured. I have configured only server.properties;
is that enough? I am doing some tests on performance, for example network
throughput; my scenario is:

As a producer I am using this program in C:



And as a consumer, this one:




1 Server (zookeeper + 3 Brokers (8 partitions and Replication factor 3))
24GB RAM
5.0TB Hard Disc
eth0: Broadcom NetXtreme II BCM5709 1000Base-T 


There is a great difference in throughput between the producer and the consumer;
does someone have any idea why?

Results:

Producer                          Consumer
throughput (b/s) = 301393419      received = 4083875,  throughput (b/s) = 5571423
throughput (b/s) = 424807283      received = 7146741,  throughput (b/s) = 8061556
throughput (b/s) = 445245606      received = 13270522, throughput (b/s) = 12925199
throughput (b/s) = 466454739      received = 16333527, throughput (b/s) = 13890292
throughput (b/s) = 442368081      received = 18375214, throughput (b/s) = 13967440
throughput (b/s) = 436540119      received = 20416859, throughput (b/s) = 14127520
throughput (b/s) = 427105440      received = 24500066, throughput (b/s) = 15594622
throughput (b/s) = 426395933      received = 27563023, throughput (b/s) = 16177493
throughput (b/s) = 409344029      received = 34708625, throughput (b/s) = 18740726
throughput (b/s) = 403371185      received = 37771189, throughput (b/s) = 17961816
throughput (b/s) = 403325568      received = 39813038, throughput (b/s) = 17654058
throughput (b/s) = 397938415      received = 47979107, throughput (b/s) = 19686322
throughput (b/s) = 393364006      received = 53083307, throughput (b/s) = 20623441
throughput (b/s) = 387393832      received = 57166558, throughput (b/s) = 21050531
throughput (b/s) = 380266372      received = 59207558, throughput (b/s) = 20654404
throughput (b/s) = 376436729      received = 62269998, throughput (b/s) = 20740363
throughput (b/s) = 377043675      received = 65332901, throughput (b/s) = 20888135
throughput (b/s) = 368613683      received = 67374558, throughput (b/s) = 20467503
throughput (b/s) = 370020865      received = 71457763, throughput (b/s) = 20727773
throughput (b/s) = 373827848      received = 73499480, throughput (b/s) = 20171583
throughput (b/s) = 369647040      received = 75541289, throughput (b/s) = 19599155
throughput (b/s) = 363395680      received = 80645776, throughput (b/s) = 20033582


Thanks Guys



-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Spark Kafka Performance

2014-11-03 Thread Eduardo Costa Alfaia
Hi Guys,
Could anyone explain to me how Kafka works with Spark? I am using
JavaKafkaWordCount.java as a test and the command line is:

./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount 
spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3

and as a producer I am using this command:

rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt -l 100 
-n 10


rdkafka_cachesender is a program I developed which sends output.txt's content to
Kafka, where -l is the length of each send (upper bound) and -n is the number of
lines to send in a row. Below is the throughput calculated by the
program:

File is 2235755 bytes
throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745
throughput (b/s) = 505028200
throughput (b/s) = 471263416
throughput (b/s) = 446837266
throughput (b/s) = 409856716
throughput (b/s) = 373994467
throughput (b/s) = 366343097
throughput (b/s) = 373240017
throughput (b/s) = 386139016
throughput (b/s) = 373802209
throughput (b/s) = 369308515
throughput (b/s) = 366935820
throughput (b/s) = 365175388
throughput (b/s) = 362175419
throughput (b/s) = 358356633
throughput (b/s) = 357219124
throughput (b/s) = 352174125
throughput (b/s) = 348313093
throughput (b/s) = 355099099
throughput (b/s) = 348069777
throughput (b/s) = 348478302
throughput (b/s) = 340404276
throughput (b/s) = 339876031
throughput (b/s) = 339175102
throughput (b/s) = 327555252
throughput (b/s) = 324272374
throughput (b/s) = 322479222
throughput (b/s) = 319544906
throughput (b/s) = 317201853
throughput (b/s) = 317351399
throughput (b/s) = 315027978
throughput (b/s) = 313831014
throughput (b/s) = 310050384
throughput (b/s) = 307654601
throughput (b/s) = 305707061
throughput (b/s) = 307961102
throughput (b/s) = 296898200
throughput (b/s) = 296409904
throughput (b/s) = 294609332
throughput (b/s) = 293397843
throughput (b/s) = 293194876
throughput (b/s) = 291724886
throughput (b/s) = 290031314
throughput (b/s) = 289747022
throughput (b/s) = 289299632

The throughput goes down after some seconds and does not maintain the
initial values:

throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745

Another question is about Spark: about 15 seconds after I start the Spark command
line, Spark keeps repeating the same word counts, while my program continues to
send words to Kafka, so I would expect the word counts in Spark to keep growing.
I have attached the log from Spark.
  
My Case is:

ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + ZooKeeper) ->
ComputerC (Spark)

If I haven't explained this very well, send me a reply.

Thanks Guys
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Clean Kafka Queue

2014-10-21 Thread Eduardo Costa Alfaia
Hi Guys,

Is there a way of cleaning a Kafka queue after the consumer has consumed
the messages?

Thanks 
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Clean Kafka Queue

2014-10-21 Thread Eduardo Costa Alfaia
Ok guys,

Thanks for the help.

Regards
 On Oct 21, 2014, at 18:30, Joe Stein joe.st...@stealth.ly wrote:
 
 The concept of truncate topic comes up a lot.  I will add it as an item
 to https://issues.apache.org/jira/browse/KAFKA-1694
 
 It is a scary feature though, it might be best to wait until authorizations
 are in place before we release it.
 
 With 0.8.2 you can delete topics so at least you can start fresh easier.
 That should work in the mean time.  0.8.2-beta should be out this week :)
 
 /***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 /
 
 On Tue, Oct 21, 2014 at 12:03 PM, Harsha ka...@harsha.io wrote:
 
 you can use log.retention.hours or log.retention.bytes to prune the log
 more info on that config here
 https://kafka.apache.org/08/configuration.html
 if you want to delete a message after the consumer processed a message
 there is no api for it.
 -Harsha
 
 
 On Tue, Oct 21, 2014, at 08:00 AM, Eduardo Costa Alfaia wrote:
 Hi Guys,
 
 Is there a manner of cleaning  a kafka queue after that the consumer
 consume the messages?
 
 Thanks
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Spark's Behavior 2

2014-05-13 Thread Eduardo Costa Alfaia
Hi TD,

I have sent more information now, using 8 workers. The gap is now 27 seconds.
Have you seen it?
Thanks

BR
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Spark's behavior

2014-05-06 Thread Eduardo Costa Alfaia
Ok Andrew,
Thanks

I sent information from the test with 8 workers and the gap has grown.

 
On May 4, 2014, at 2:31, Andrew Ash and...@andrewash.com wrote:

 From the logs, I see that the print() starts printing stuff 10 seconds 
 after the context is started. And that 10 seconds is taken by the initial 
 empty job (50 map + 20 reduce tasks) that spark streaming starts to ensure 
 all the executors have started. Somehow the first empty task takes 7-8 
 seconds to complete. See if this can be reproduced by running a simple, 
 empty job in spark shell (in the same cluster) and see if the first task 
 takes 7-8 seconds. 
 
 Either way, I didnt see the 30 second gap, but a 10 second gap. And that 
 does not seem to be a persistent problem as after that 10 seconds, the data 
 is being received and processed.
 
 TD


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Spark's behavior

2014-05-03 Thread Eduardo Costa Alfaia
Hi TD, thanks for the reply.
I did this last experiment with one computer, in local mode, but I think the time
gap grows when I add more computers. I will do it again now with 8 workers and 1
word source and see what goes on. I will measure the time too, as suggested by
Andrew.
On May 3, 2014, at 1:19, Tathagata Das tathagata.das1...@gmail.com wrote:

 From the logs, I see that the print() starts printing stuff 10 seconds after 
 the context is started. And that 10 seconds is taken by the initial empty job 
 (50 map + 20 reduce tasks) that spark streaming starts to ensure all the 
 executors have started. Somehow the first empty task takes 7-8 seconds to 
 complete. See if this can be reproduced by running a simple, empty job in 
 spark shell (in the same cluster) and see if the first task takes 7-8 
 seconds. 
 
 Either way, I didnt see the 30 second gap, but a 10 second gap. And that does 
 not seem to be a persistent problem as after that 10 seconds, the data is 
 being received and processed.
 
 TD
 
 
 On Fri, May 2, 2014 at 2:14 PM, Eduardo Costa Alfaia e.costaalf...@unibs.it 
 wrote:
 Hi TD,
 
 I got the another information today using Spark 1.0 RC3 and the situation 
 remain the same:
 PastedGraphic-1.png
 
 The lines begin after 17 sec:
 
 14/05/02 21:52:25 INFO SparkDeploySchedulerBackend: Granted executor ID 
 app-20140502215225-0005/0 on hostPort computer8.ant-net:57229 with 2 cores, 
 2.0 GB RAM
 14/05/02 21:52:25 INFO AppClient$ClientActor: Executor updated: 
 app-20140502215225-0005/0 is now RUNNING
 14/05/02 21:52:25 INFO ReceiverTracker: ReceiverTracker started
 14/05/02 21:52:26 INFO ForEachDStream: metadataCleanupDelay = -1
 14/05/02 21:52:26 INFO SocketInputDStream: metadataCleanupDelay = -1
 14/05/02 21:52:26 INFO SocketInputDStream: Slide time = 1000 ms
 14/05/02 21:52:26 INFO SocketInputDStream: Storage level = 
 StorageLevel(false, false, false, false, 1)
 14/05/02 21:52:26 INFO SocketInputDStream: Checkpoint interval = null
 14/05/02 21:52:26 INFO SocketInputDStream: Remember duration = 1000 ms
 14/05/02 21:52:26 INFO SocketInputDStream: Initialized and validated 
 org.apache.spark.streaming.dstream.SocketInputDStream@5433868e
 14/05/02 21:52:26 INFO ForEachDStream: Slide time = 1000 ms
 14/05/02 21:52:26 INFO ForEachDStream: Storage level = StorageLevel(false, 
 false, false, false, 1)
 14/05/02 21:52:26 INFO ForEachDStream: Checkpoint interval = null
 14/05/02 21:52:26 INFO ForEachDStream: Remember duration = 1000 ms
 14/05/02 21:52:26 INFO ForEachDStream: Initialized and validated 
 org.apache.spark.streaming.dstream.ForEachDStream@1ebdbc05
 14/05/02 21:52:26 INFO SparkContext: Starting job: collect at 
 ReceiverTracker.scala:270
 14/05/02 21:52:26 INFO RecurringTimer: Started timer for JobGenerator at time 
 1399060346000
 14/05/02 21:52:26 INFO JobGenerator: Started JobGenerator at 1399060346000 ms
 14/05/02 21:52:26 INFO JobScheduler: Started JobScheduler
 14/05/02 21:52:26 INFO DAGScheduler: Registering RDD 3 (reduceByKey at 
 ReceiverTracker.scala:270)
 14/05/02 21:52:26 INFO ReceiverTracker: Stream 0 received 0 blocks
 14/05/02 21:52:26 INFO DAGScheduler: Got job 0 (collect at 
 ReceiverTracker.scala:270) with 20 output partitions (allowLocal=false)
 14/05/02 21:52:26 INFO DAGScheduler: Final stage: Stage 0(collect at 
 ReceiverTracker.scala:270)
 14/05/02 21:52:26 INFO DAGScheduler: Parents of final stage: List(Stage 1)
 14/05/02 21:52:26 INFO JobScheduler: Added jobs for time 1399060346000 ms
 14/05/02 21:52:26 INFO JobScheduler: Starting job streaming job 1399060346000 
 ms.0 from job set of time 1399060346000 ms
 14/05/02 21:52:26 INFO JobGenerator: Checkpointing graph for time 
 1399060346000 ms
 14/05/02 21:52:26 INFO DStreamGraph: Updating checkpoint data for time 1399060346000 ms
 -------------------------------------------
 Time: 1399060346000 ms
 -------------------------------------------
 
 14/05/02 21:52:26 INFO JobScheduler: Finished job streaming job 1399060346000 
 ms.0 from job set of time 1399060346000 ms
 14/05/02 21:52:26 INFO JobScheduler: Total delay: 0.325 s for time 
 1399060346000 ms (execution: 0.024 s)
 
 
 
 14/05/02 21:52:42 INFO JobScheduler: Added jobs for time 1399060362000 ms
 14/05/02 21:52:42 INFO JobGenerator: Checkpointing graph for time 
 1399060362000 ms
 14/05/02 21:52:42 INFO DStreamGraph: Updating checkpoint data for time 
 1399060362000 ms
 14/05/02 21:52:42 INFO DStreamGraph: Updated checkpoint data for time 
 1399060362000 ms
 14/05/02 21:52:42 INFO JobScheduler: Starting job streaming job 1399060362000 
 ms.0 from job set of time 1399060362000 ms
 14/05/02 21:52:42 INFO SparkContext: Starting job: take at DStream.scala:593
 14/05/02 21:52:42 INFO DAGScheduler: Got job 2 (take at DStream.scala:593) 
 with 1 output partitions (allowLocal=true)
 14/05/02 21:52:42 INFO DAGScheduler: Final stage: Stage 3(take at 
 DStream.scala:593)
 14/05/02 21:52:42 INFO DAGScheduler: Parents of final stage: List()
 14/05/02 21:52:42 INFO

Spark's behavior

2014-04-29 Thread Eduardo Costa Alfaia
Hi TD,

In my tests with Spark Streaming, I'm using the (modified) JavaNetworkWordCount code 
and a program that I wrote that sends words to the Spark worker; I use TCP as 
transport. I verified that after starting Spark, it connects to my source, which 
actually starts sending, but the first word count is reported approximately 
30 seconds after the context creation. So I'm wondering where the 30 seconds of data 
already sent by the source are stored. Is this normal Spark behaviour? I 
saw the same behaviour using the shipped JavaNetworkWordCount application.

Many thanks.
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Spark's behavior

2014-04-29 Thread Eduardo Costa Alfaia
Hi TD,
We are not using the streaming context with master "local"; we have 1 Master and 8 
Workers and 1 word source. The command line that we are using is:
bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount 
spark://192.168.0.13:7077
 
On Apr 30, 2014, at 0:09, Tathagata Das tathagata.das1...@gmail.com wrote:

 Is your batch size 30 seconds by any chance? 
 
 Assuming not, please check whether you are creating the streaming context 
 with master "local[n]" where n < 2. With "local" or "local[1]", the system 
 only has one processing slot, which is occupied by the receiver, leaving no 
 room for processing the received data. It could be that after 30 seconds, the 
 server disconnects, the receiver terminates, releasing the single slot for 
 the processing to proceed. 
 
 TD
 
 
 On Tue, Apr 29, 2014 at 2:28 PM, Eduardo Costa Alfaia 
 e.costaalf...@unibs.it wrote:
 Hi TD,
 
 In my tests with spark streaming, I'm using JavaNetworkWordCount(modified) 
 code and a program that I wrote that sends words to the Spark worker, I use 
 TCP as transport. I verified that after starting Spark, it connects to my 
 source which actually starts sending, but the first word count is advertised 
 approximately 30 seconds after the context creation. So I'm wondering where 
 is stored the 30 seconds data already sent by the source. Is this a normal 
 spark’s behaviour? I saw the same behaviour using the shipped 
 JavaNetworkWordCount application.
 
 Many thanks.
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: reduceByKeyAndWindow Java

2014-04-07 Thread Eduardo Costa Alfaia

Hi TD
Could you explain this part of the code to me?

.reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
  );

Thanks
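
For context: the two anonymous functions are the reduce and its inverse. The first adds the counts of records entering the window, the second subtracts the counts of records leaving it, so the 5-minute window (Duration(60 * 5 * 1000)) can be updated incrementally every second (Duration(1 * 1000)) instead of being recomputed from scratch. A rough Scala equivalent (a sketch, assuming wordCounts is a DStream of (String, Int) pairs and the streaming implicits are imported):

    val counts = wordCounts.reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b,   // counts entering the window
      (a: Int, b: Int) => a - b,   // counts leaving the window
      Seconds(300),                // window length: 5 minutes
      Seconds(1))                  // slide interval: 1 second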

On 4/4/14, 22:56, Tathagata Das wrote:
I haven't really compiled the code, but it looks good to me. Why? Is 
there any problem you are facing?


TD


On Fri, Apr 4, 2014 at 8:03 AM, Eduardo Costa Alfaia 
e.costaalf...@unibs.it wrote:



Hi guys,

I would like knowing if the part of code is right to use in Window.

 JavaPairDStream<String, Integer> wordCounts = words.map(
     new PairFunction<String, String, Integer>() {
       @Override
       public Tuple2<String, Integer> call(String s) {
         return new Tuple2<String, Integer>(s, 1);
       }
     }).reduceByKeyAndWindow(
     new Function2<Integer, Integer, Integer>() {
       public Integer call(Integer i1, Integer i2) { return i1 + i2; }
     },
     new Function2<Integer, Integer, Integer>() {
       public Integer call(Integer i1, Integer i2) { return i1 - i2; }
     },
     new Duration(60 * 5 * 1000),
     new Duration(1 * 1000)
   );



Thanks

-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155






--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Driver Out of Memory

2014-04-07 Thread Eduardo Costa Alfaia

Hi Guys,

I would like to understand why the driver's RAM goes down. Does the 
processing occur only in the workers?

Thanks
# Start Tests
computer1(Worker/Source Stream)
 23:57:18 up 12:03,  1 user,  load average: 0.03, 0.31, 0.44
 total   used   free sharedbuffers cached
Mem:  3945   1084   2860  0 44827
-/+ buffers/cache:212   3732
Swap:0  0  0
computer8 (Driver/Master)
 23:57:18 up 11:53,  5 users,  load average: 0.43, 1.19, 1.31
 total   used   free sharedbuffers cached
Mem:  5897   4430   1466  0 384   2662
-/+ buffers/cache:   1382   4514
Swap:0  0  0
computer10(Worker/Source Stream)
 23:57:18 up 12:02,  1 user,  load average: 0.55, 1.34, 0.98
 total   used   free sharedbuffers cached
Mem:  5897564   5332  0 18358
-/+ buffers/cache:187   5709
Swap:0  0  0
computer11(Worker/Source Stream)
 23:57:18 up 12:02,  1 user,  load average: 0.07, 0.19, 0.29
 total   used   free sharedbuffers cached
Mem:  3945603   3342  0 54355
-/+ buffers/cache:193   3751
Swap:0  0  0

 After 2 Minutes

computer1
 00:06:41 up 12:12,  1 user,  load average: 3.11, 1.32, 0.73
 total   used   free sharedbuffers cached
Mem:  3945   2950994  0 46 1095
-/+ buffers/cache:   1808   2136
Swap:0  0  0
computer8(Driver/Master)
 00:06:41 up 12:02,  5 users,  load average: 1.16, 0.71, 0.96
 total   used   free sharedbuffers cached
Mem:  5897   5191705  0 385   2792
-/+ buffers/cache:   2014   3882
Swap:0  0  0
computer10
 00:06:41 up 12:11,  1 user,  load average: 2.02, 1.07, 0.89
 total   used   free sharedbuffers cached
Mem:  5897   2567   3329  0 21647
-/+ buffers/cache:   1898   3998
Swap:0  0  0
computer11
 00:06:42 up 12:12,  1 user,  load average: 3.96, 1.83, 0.88
 total   used   free sharedbuffers cached
Mem:  3945   3542402  0 57 1099
-/+ buffers/cache:   2385   1559
Swap:0  0  0


--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Explain Add Input

2014-04-04 Thread Eduardo Costa Alfaia

Hi all,

Could anyone explain the lines below to me?

computer1 - worker
computer8 - driver(master)

14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614314800 in memory on computer1.ant-net:60820 (size: 1262.5 
KB, free: 540.3 MB)
14/04/04 14:24:56 INFO MemoryStore: ensureFreeSpace(1292780) called with 
curMem=49555672, maxMem=825439027
14/04/04 14:24:56 INFO MemoryStore: Block input-0-1396614314800 stored 
as bytes to memory (size 1262.5 KB, free 738.7 MB)
14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614314800 in memory on computer8.ant-net:49743 (size: 1262.5 
KB, free: 738.7 MB)


Why does Spark add the same input on computer8, which is the driver (master)?

Thanks guys

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


RAM high consume

2014-04-04 Thread Eduardo Costa Alfaia

 Hi all,

I am doing some tests using JavaNetworkWordCount and I have some 
questions about machine performance; my tests last 
approximately 2 minutes.


Why does the RAM decrease so significantly? I have done tests with 2 and 3 
machines and got the same behavior.


What should I do to get better performance in this case?


# Start Test

computer1
 total   used   free sharedbuffers cached
Mem:  3945711 3233 0  3430
-/+ buffers/cache:276   3668
Swap:0  0  0

 14:42:50 up 73 days,  3:32,  2 users,  load average: 0.00, 0.06, 0.21


14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614314400 in memory on computer1.ant-net:60820 (size: 826.1 
KB, free: 542.9 MB)
14/04/04 14:24:56 INFO MemoryStore: ensureFreeSpace(845956) called with 
curMem=47278100, maxMem=825439027
14/04/04 14:24:56 INFO MemoryStore: Block input-0-1396614314400 stored 
as bytes to memory (size 826.1 KB, free 741.3 MB)
14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614314400 in memory on computer8.ant-net:49743 (size: 826.1 
KB, free: 741.3 MB)
14/04/04 14:24:56 INFO BlockManagerMaster: Updated info of block 
input-0-1396614314400
14/04/04 14:24:56 INFO TaskSetManager: Finished TID 272 in 84 ms on 
computer1.ant-net (progress: 0/1)

14/04/04 14:24:56 INFO TaskSchedulerImpl: Remove TaskSet 43.0 from pool
14/04/04 14:24:56 INFO DAGScheduler: Completed ResultTask(43, 0)
14/04/04 14:24:56 INFO DAGScheduler: Stage 43 (take at 
DStream.scala:594) finished in 0.088 s
14/04/04 14:24:56 INFO SparkContext: Job finished: take at 
DStream.scala:594, took 1.872875734 s

---
Time: 1396614289000 ms
---
(Santiago,1)
(liveliness,1)
(Sun,1)
(reapers,1)
(offer,3)
(BARBER,3)
(shrewdness,1)
(truism,1)
(hits,1)
(merchant,1)



# End Test
computer1
 total   used   free sharedbuffers cached
Mem:  3945   2209 1735 0  5773
-/+ buffers/cache:   1430   2514
Swap:0  0  0

 14:46:05 up 73 days,  3:35,  2 users,  load average: 2.69, 1.07, 0.55


14/04/04 14:26:57 INFO TaskSetManager: Starting task 183.0:0 as TID 696 
on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/04/04 14:26:57 INFO TaskSetManager: Serialized task 183.0:0 as 1981 
bytes in 0 ms
14/04/04 14:26:57 INFO MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 81 to sp...@computer1.ant-net:44817
14/04/04 14:26:57 INFO MapOutputTrackerMaster: Size of output statuses 
for shuffle 81 is 212 bytes
14/04/04 14:26:57 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614336600 on disk on computer1.ant-net:60820 (size: 1441.7 KB)
14/04/04 14:26:57 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614435200 in memory on computer1.ant-net:60820 (size: 1295.7 
KB, free: 589.3 KB)
14/04/04 14:26:57 INFO TaskSetManager: Finished TID 696 in 56 ms on 
computer1.ant-net (progress: 0/1)

14/04/04 14:26:57 INFO TaskSchedulerImpl: Remove TaskSet 183.0 from pool
14/04/04 14:26:57 INFO DAGScheduler: Completed ResultTask(183, 0)
14/04/04 14:26:57 INFO DAGScheduler: Stage 183 (take at 
DStream.scala:594) finished in 0.057 s
14/04/04 14:26:57 INFO SparkContext: Job finished: take at 
DStream.scala:594, took 1.575268894 s

---
Time: 1396614359000 ms
---
(hapless,9)
(reapers,8)
(amazed,113)
(feebleness,7)
(offer,148)
(rabble,27)
(exchanging,7)
(merchant,20)
(incentives,2)
(quarrel,48)
...


Thanks Guys

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Driver increase memory utilization

2014-04-04 Thread Eduardo Costa Alfaia

   Hi Guys,

Could anyone help me understand this driver behavior when I start the 
JavaNetworkWordCount?


computer8
 16:24:07 up 121 days, 22:21, 12 users,  load average: 0.66, 1.27, 1.55
total   used   free shared buffers 
cached

Mem:  5897   4341 1555 0227   2798
-/+ buffers/cache:   1315   4581
Swap:0  0  0

in 2 minutes

computer8
 16:23:08 up 121 days, 22:20, 12 users,  load average: 0.80, 1.43, 1.62
 total   used   free shared buffers 
cached

Mem:  5897   5866 30 0230   3255
-/+ buffers/cache:   2380   3516
Swap:0  0  0


Thanks

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Parallelism level

2014-04-04 Thread Eduardo Costa Alfaia

Hi all,

I have put this line in my spark-env.sh:
-Dspark.default.parallelism=20

 Is this parallelism level correct?
 The machine's processor is a dual core.

Thanks
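
For reference, a sketch of setting the same property programmatically instead of in spark-env.sh (this assumes a Spark 0.9-style SparkConf; with a dual-core machine a small value is more realistic than 20):

    val conf = new SparkConf()
      .setAppName("NetworkWordCount")
      .set("spark.default.parallelism", "4")   // roughly 2x the cores actually available
    val ssc = new StreamingContext(conf, Seconds(1))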

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


RAM Increase

2014-04-04 Thread Eduardo Costa Alfaia

Hi Guys,

Could anyone explain this behavior to me? After 2 minutes of tests:

computer1- worker
computer10 - worker
computer8 - driver(master)

computer1
 18:24:31 up 73 days,  7:14,  1 user,  load average: 3.93, 2.45, 1.14
   total   used   free shared buffers 
cached

Mem:  3945   3925 19 0 18   1368
-/+ buffers/cache:   2539   1405
Swap:0  0  0
computer10
 18:22:38 up 44 days, 21:26,  2 users,  load average: 3.05, 2.20, 1.03
 total   used   free shared buffers 
cached

Mem:  5897   5292 604 0 46   2707
-/+ buffers/cache:   2538   3358
Swap:0  0  0
computer8
 18:24:13 up 122 days, 22 min, 13 users,  load average: 1.10, 0.93, 0.82
 total   used   free shared buffers 
cached

Mem:  5897   5841 55 0113   2747
-/+ buffers/cache:   2980   2916
Swap:0  0  0

Thanks Guys

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Parallelism level

2014-04-04 Thread Eduardo Costa Alfaia

What do you advise, Nicholas?

On 4/4/14, 19:05, Nicholas Chammas wrote:
If you're running on one machine with 2 cores, I believe all you can 
get out of it are 2 concurrent tasks at any one time. So setting your 
default parallelism to 20 won't help.



On Fri, Apr 4, 2014 at 11:41 AM, Eduardo Costa Alfaia 
e.costaalf...@unibs.it wrote:


Hi all,

I have put this line in my spark-env.sh:
-Dspark.default.parallelism=20

 this parallelism level, is it correct?
 The machine's processor is a dual core.

Thanks

-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155






--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: reduceByKeyAndWindow Java

2014-04-04 Thread Eduardo Costa Alfaia

Hi Tathagata,

You are right, this code compiles, but I am having some problems with high 
memory consumption. I sent some emails about this today, but no response 
so far.


Thanks
On 4/4/14, 22:56, Tathagata Das wrote:
I haven't really compiled the code, but it looks good to me. Why? Is 
there any problem you are facing?


TD


On Fri, Apr 4, 2014 at 8:03 AM, Eduardo Costa Alfaia 
e.costaalf...@unibs.it wrote:



Hi guys,

I would like knowing if the part of code is right to use in Window.

JavaPairDStream<String, Integer> wordCounts = words.map(
    new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    }).reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
  );



Thanks

-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155






--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Print line in JavaNetworkWordCount

2014-04-02 Thread Eduardo Costa Alfaia

Hi Guys

I would like to print the content inside lines in:

JavaDStream<String> lines = ssc.socketTextStream(args[1], 
Integer.parseInt(args[2]));
JavaDStream<String> words = lines.flatMap(new 
FlatMapFunction<String, String>() {

  @Override
  public Iterable<String> call(String x) {
    return Lists.newArrayList(x.split(" "));
  }
});

Is it possible? How could I do it?

Thanks
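
A minimal sketch of one way to do it, written in Scala for brevity (the Java version would wrap the same logic in a Function passed to foreachRDD, as in the other snippets in this archive); note that collect() pulls the whole batch to the driver, so this only makes sense for small test streams:

    lines.foreachRDD(rdd => {
      // print every line of the current batch on the driver
      rdd.collect().foreach(println)
    })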

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Change print() in JavaNetworkWordCount

2014-03-27 Thread Eduardo Costa Alfaia

Thank you very much, Sourav.

BR

On 3/26/14, 17:29, Sourav Chandra wrote:

def print() {
  def foreachFunc = (rdd: RDD[T], time: Time) => {
    val total = rdd.collect().toList
    println("-------------------------------------------")
    println("Time: " + time)
    println("-------------------------------------------")
    total.foreach(println)
//  val first11 = rdd.take(11)
//  println("-------------------------------------------")
//  println("Time: " + time)
//  println("-------------------------------------------")
//  first11.take(10).foreach(println)
//  if (first11.size > 10) println("...")
    println()
  }
  new ForEachDStream(this,
    context.sparkContext.clean(foreachFunc)).register()

  }
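
The commented-out take(11)/take(10) lines above are the stock implementation, which limits the output to the first 10 records of each batch; collecting the whole RDD prints every record instead, at the cost of pulling the entire batch to the driver.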



--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Change print() in JavaNetworkWordCount

2014-03-25 Thread Eduardo Costa Alfaia

Hi Guys,
I think I already asked this question, but I don't remember whether anyone 
has answered me. I would like to change, in the print() function, the 
number of words and the frequency figures that are sent to the driver's 
screen. The default value is 10.


Could anyone help me with this?

Best Regards

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Log Analyze

2014-03-10 Thread Eduardo Costa Alfaia

Hi Guys,
Could anyone help me understand this piece of the log (in red)? Why did this 
happen?


Thanks

14/03/10 16:55:20 INFO SparkContext: Starting job: first at 
NetworkWordCount.scala:87
14/03/10 16:55:20 INFO JobScheduler: Finished job streaming job 
1394466892000 ms.0 from job set of time 1394466892000 ms
14/03/10 16:55:20 INFO JobScheduler: Total delay: 28.537 s for time 
1394466892000 ms (execution: 4.479 s)
14/03/10 16:55:20 INFO JobScheduler: Starting job streaming job 
1394466893000 ms.0 from job set of time 1394466893000 ms
14/03/10 16:55:20 INFO JobGenerator: Checkpointing graph for time 
1394466892000 ms
14/03/10 16:55:20 INFO DStreamGraph: Updating checkpoint data for time 
1394466892000 ms
14/03/10 16:55:20 INFO DStreamGraph: Updated checkpoint data for time 
1394466892000 ms
14/03/10 16:55:20 INFO CheckpointWriter: Saving checkpoint for time 
1394466892000 ms to file 
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000'
14/03/10 16:55:20 INFO DAGScheduler: Registering RDD 496 (combineByKey 
at ShuffledDStream.scala:42)
14/03/10 16:55:20 INFO DAGScheduler: Got job 39 (first at 
NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true)
14/03/10 16:55:20 INFO DAGScheduler: Final stage: Stage 77 (first at 
NetworkWordCount.scala:87)

14/03/10 16:55:20 INFO DAGScheduler: Parents of final stage: List(Stage 78)
14/03/10 16:55:20 INFO DAGScheduler: Missing parents: List(Stage 78)
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed 
input-1-1394466782400 on computer10.ant-net:34062 in memory (size: 5.9 
MB, free: 502.2 MB)
14/03/10 16:55:20 INFO DAGScheduler: Submitting Stage 78 
(MapPartitionsRDD[496] at combineByKey at ShuffledDStream.scala:42), 
which has no missing parents
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-1-1394466816600 in memory on computer10.ant-net:34062 (size: 4.4 
MB, free: 497.8 MB)
14/03/10 16:55:20 INFO DAGScheduler: Submitting 15 missing tasks from 
Stage 78 (MapPartitionsRDD[496] at combineByKey at ShuffledDStream.scala:42)

14/03/10 16:55:20 INFO TaskSchedulerImpl: Adding task set 78.0 with 15 tasks
14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:9 as TID 539 
on executor 2: computer1.ant-net (PROCESS_LOCAL)
14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:9 as 4144 
bytes in 1 ms
14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:10 as TID 540 
on executor 1: computer10.ant-net (PROCESS_LOCAL)
14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:10 as 4144 
bytes in 0 ms
14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:11 as TID 541 
on executor 0: computer11.ant-net (PROCESS_LOCAL)
14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:11 as 4144 
bytes in 0 ms
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed 
input-0-1394466874200 on computer1.ant-net:51406 in memory (size: 2.9 
MB, free: 460.0 MB)
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed 
input-0-1394466874400 on computer1.ant-net:51406 in memory (size: 4.1 
MB, free: 468.2 MB)
14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:12 as TID 542 
on executor 1: computer10.ant-net (PROCESS_LOCAL)
14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:12 as 4144 
bytes in 1 ms

14/03/10 16:55:20 WARN TaskSetManager: Lost TID 540 (task 78.0:10)
14/03/10 16:55:20 INFO CheckpointWriter: Deleting 
hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000
14/03/10 16:55:20 INFO CheckpointWriter: Checkpoint for time 
1394466892000 ms saved to file 
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000', took 
3633 bytes and 93 ms
14/03/10 16:55:20 INFO DStreamGraph: Clearing checkpoint data for time 
1394466892000 ms
14/03/10 16:55:20 INFO DStreamGraph: Cleared checkpoint data for time 
1394466892000 ms
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed 
input-2-1394466789000 on computer11.ant-net:58332 in memory (size: 3.9 
MB, free: 536.0 MB)

14/03/10 16:55:20 WARN TaskSetManager: Loss was due to java.lang.Exception
java.lang.Exception: Could not compute split, block 
input-2-1394466794200 not found

at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:45)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at 

Re: Explain About Logs NetworkWordcount.scala

2014-03-09 Thread Eduardo Costa Alfaia

Yes TD,
I can use tcpdump to see whether the data are being accepted by the receiver 
and whether they are arriving in the IP packets.


Thanks
On 3/8/14, 4:19, Tathagata Das wrote:
I am not sure how to debug this without any more information about the 
source. Can you monitor on the receiver side that data is being 
accepted by the receiver but not reported?


TD


On Wed, Mar 5, 2014 at 7:23 AM, eduardocalfaia e.costaalf...@unibs.it 
wrote:


Hi TD,
I have seen in the web UI the stage whose result has been
zero, and in
the GC Time field there is nothing.

http://apache-spark-user-list.1001560.n3.nabble.com/file/n2306/CaptureStage.png



--
View this message in context:

http://apache-spark-user-list.1001560.n3.nabble.com/Explain-About-Logs-NetworkWordcount-scala-tp1835p2306.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com.





--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Explain About Logs NetworkWordcount.scala

2014-02-20 Thread Eduardo Costa Alfaia

Hi Guys,

Could anyone help me understand the logs below? Why is the result in the 
second log 0?


Thanks Guys

14/02/20 19:06:00 INFO JobScheduler: Finished job streaming job 
1392919557000 ms.0 from job set of time 1392919557000 ms
14/02/20 19:06:00 INFO JobScheduler: Total delay: 3.185 s for time 
1392919557000 ms (execution: 3.167 s)
14/02/20 19:06:00 INFO JobGenerator: Checkpointing graph for time 
1392919557000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updating checkpoint data for time 
1392919557000 ms
14/02/20 19:06:00 INFO DStreamGraph: Updated checkpoint data for time 
1392919557000 ms
14/02/20 19:06:00 INFO SparkContext: Starting job: first at 
NetworkWordCount.scala:87
14/02/20 19:06:00 INFO JobScheduler: Starting job streaming job 
1392919558000 ms.0 from job set of time 1392919558000 ms
14/02/20 19:06:00 INFO CheckpointWriter: Saving checkpoint for time 
1392919557000 ms to file 
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919557000'
14/02/20 19:06:00 INFO DAGScheduler: Registering RDD 812 (combineByKey 
at ShuffledDStream.scala:42)
14/02/20 19:06:00 INFO DAGScheduler: Got job 91 (first at 
NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true)
14/02/20 19:06:00 INFO DAGScheduler: Final stage: Stage 181 (first at 
NetworkWordCount.scala:87)

14/02/20 19:06:00 INFO DAGScheduler: Parents of final stage: List(Stage 182)
14/02/20 19:06:00 INFO DAGScheduler: Missing parents: List(Stage 182)
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 182 
(MapPartitionsRDD[812] at combineByKey at ShuffledDStream.scala:42), 
which has no missing parents
14/02/20 19:06:00 INFO DAGScheduler: Submitting 2 missing tasks from 
Stage 182 (MapPartitionsRDD[812] at combineByKey at 
ShuffledDStream.scala:42)

14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 182.0 with 2 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 182.0:1 as TID 609 
on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 182.0:1 as 3023 
bytes in 0 ms
14/02/20 19:06:00 INFO TaskSetManager: Starting task 182.0:0 as TID 610 
on executor 0: computer1.ant-net (NODE_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 182.0:0 as 3485 
bytes in 0 ms
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 609 in 17 ms on 
computer1.ant-net (progress: 0/2)

14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(182, 1)
14/02/20 19:06:00 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1392919527400 in memory on computer1.ant-net:41142 (size: 2018.6 
KB, free: 387.3 MB)
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 610 in 67 ms on 
computer1.ant-net (progress: 1/2)

14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 182.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(182, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 182 (combineByKey at 
ShuffledDStream.scala:42) finished in 0.080 s

14/02/20 19:06:00 INFO DAGScheduler: looking for newly runnable stages
14/02/20 19:06:00 INFO DAGScheduler: running: Set(Stage 4)
14/02/20 19:06:00 INFO DAGScheduler: waiting: Set(Stage 181)
14/02/20 19:06:00 INFO DAGScheduler: failed: Set()
14/02/20 19:06:00 INFO CheckpointWriter: Deleting 
hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919554000.bk

14/02/20 19:06:00 INFO DAGScheduler: Missing parents for Stage 181: List()
14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 181 
(MappedRDD[815] at map at MappedDStream.scala:35), which is now runnable
14/02/20 19:06:00 INFO CheckpointWriter: Checkpoint for time 
1392919557000 ms saved to file 
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919557000', took 
3270 bytes and 102 ms
14/02/20 19:06:00 INFO DStreamGraph: Clearing checkpoint data for time 
1392919557000 ms
14/02/20 19:06:00 INFO DStreamGraph: Cleared checkpoint data for time 
1392919557000 ms
14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from 
Stage 181 (MappedRDD[815] at map at MappedDStream.scala:35)

14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 181.0 with 1 tasks
14/02/20 19:06:00 INFO TaskSetManager: Starting task 181.0:0 as TID 611 
on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/02/20 19:06:00 INFO TaskSetManager: Serialized task 181.0:0 as 2057 
bytes in 1 ms
14/02/20 19:06:00 INFO MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 90 to sp...@computer1.ant-net:47226
14/02/20 19:06:00 INFO MapOutputTrackerMaster: Size of output statuses 
for shuffle 90 is 146 bytes
14/02/20 19:06:00 INFO TaskSetManager: Finished TID 611 in 25 ms on 
computer1.ant-net (progress: 0/1)

14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 181.0 from pool
14/02/20 19:06:00 INFO DAGScheduler: Completed ResultTask(181, 0)
14/02/20 19:06:00 INFO DAGScheduler: Stage 181 (first at 
NetworkWordCount.scala:87) finished in 0.027 s
14/02/20 19:06:00 INFO SparkContext: Job finished: first at 
NetworkWordCount.scala:87, took 0.133625862 s

118967 

NetworkWordCount Tests

2014-02-18 Thread Eduardo Costa Alfaia

Hi Guys,

I am doing some tests with the NetworkWordCount Scala code, counting and 
summing a stream of words received from the network using the foreach 
action (thanks, TD). I started with this scenario: 1 Master + 1 Worker 
(which also acts as the stream source), and I obtained the results below. 
The worker machine has 4 GB RAM and 2 cores. I am trying to understand why 
some results are equal to zero. I have seen that the RAM goes down very 
quickly.

Could anyone help me with this question?

Thanks Guys

308425
964276
628731
801010
711439
808223
507143
981999
853862
852054
581291
1078153
822553
860385
907984
792155
801966
860747
804655
827498
727398
834044
821059
708479
949565
796239
813312
717552
792051
811995
803358
762467
838375
803473
773933
824912
811991
605851
1012426
631953
725137
747702
907284
0
0
0
0
0
0

113534
1591325
861635
857057
815570
287201
0
0


--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: object scala not found

2014-02-11 Thread Eduardo Costa Alfaia
Hi Sai

Have you already tried running with JDK-7?

BR

On Feb 11, 2014, at 6:00, Sai Prasanna ansaiprasa...@gmail.com wrote:

 
 When I ran sbt/sbt assembly after installing Scala 2.9.3 and downloading the 
 Spark 0.8.1 binaries, with JDK-6 installed, for a standalone Spark, I got 
 the following error.
 I did sbt clean, and even then sbt assembly gives this error. 
 Can someone please help!!
 
 
 [info] Loading project definition from 
 /home/sparkslave/spark-0.8.1-incubating/project/project
 [info] Loading project definition from 
 /home/sparkslave/spark-0.8.1-incubating/project
 [info] Set current project to root (in build 
 file:/home/sparkslave/spark-0.8.1-incubating/)
 [info] Compiling 49 Scala sources to 
 /home/sparkslave/spark-0.8.1-incubating/streaming/target/scala-2.9.3/classes...
 [error] error while loading root, error in opening zip file
 scala.tools.nsc.MissingRequirementError: object scala not found.
 at 
 scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655)
 at 
 scala.tools.nsc.symtab.Definitions$definitions$.getModule(Definitions.scala:605)
 at 
 scala.tools.nsc.symtab.Definitions$definitions$.ScalaPackage(Definitions.scala:145)
 at 
 scala.tools.nsc.symtab.Definitions$definitions$.ScalaPackageClass(Definitions.scala:146)
 at 
 scala.tools.nsc.symtab.Definitions$definitions$.AnyClass(Definitions.scala:176)
 at 
 scala.tools.nsc.symtab.Definitions$definitions$.init(Definitions.scala:814)
 at scala.tools.nsc.Global$Run.init(Global.scala:697)
 at xsbt.CachedCompiler0.run(CompilerInterface.scala:86)
 at xsbt.CachedCompiler0.liftedTree1$1(CompilerInterface.scala:72)
 at xsbt.CachedCompiler0.run(CompilerInterface.scala:72)
 at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:73)
 at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:35)
 at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:29)
 at 
 sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply$mcV$sp(AggressiveCompile.scala:71)
 at 
 sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71)
 at 
 sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71)
 at 
 sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:101)
 at 
 sbt.compiler.AggressiveCompile$$anonfun$4.compileScala$1(AggressiveCompile.scala:70)
 at 
 sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:88)
 at 
 sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:60)
 at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:24)
 at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:22)
 at sbt.inc.Incremental$.cycle(Incremental.scala:52)
 at sbt.inc.Incremental$.compile(Incremental.scala:29)
 at sbt.inc.IncrementalCompile$.apply(Compile.scala:20)
 at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:96)
 at sbt.compiler.AggressiveCompile.compile1(AggressiveCompile.scala:44)
 at sbt.compiler.AggressiveCompile.apply(AggressiveCompile.scala:31)
 at sbt.Compiler$.apply(Compiler.scala:79)
 at sbt.Defaults$$anonfun$compileTask$1.apply(Defaults.scala:574)
 at sbt.Defaults$$anonfun$compileTask$1.apply(Defaults.scala:574)
 at sbt.Scoped$$anonfun$hf2$1.apply(Structure.scala:578)
 at sbt.Scoped$$anonfun$hf2$1.apply(Structure.scala:578)
 at scala.Function1$$anonfun$compose$1.apply(Function1.scala:49)
 at 
 sbt.Scoped$Reduced$$anonfun$combine$1$$anonfun$apply$12.apply(Structure.scala:311)
 at 
 sbt.Scoped$Reduced$$anonfun$combine$1$$anonfun$apply$12.apply(Structure.scala:311)
 at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:41)
 at sbt.std.Transform$$anon$5.work(System.scala:71)
 at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:232)
 at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:232)
 at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
 at sbt.Execute.work(Execute.scala:238)
 at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:232)
 at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:232)
 at 
 sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:160)
 at sbt.CompletionService$$anon$2.call(CompletionService.scala:30)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at 

Compiling NetworkWordCount scala code

2014-02-06 Thread Eduardo Costa Alfaia
Hi Guys,

I am getting this error when I compile NetworkWordCount.scala:

[info] Compiling 1 Scala source to 
/opt/unibs_test/incubator-spark-tdas/examples/target/scala-2.10/classes...
[error] 
/opt/unibs_test/incubator-spark-tdas/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala:63:
 missing parameter type for expanded function ((x$2, x$3) => x$2.$plus(x$3))
[error] val wordCounts = words.map(x = (x, 1)).reduceByKeyAndWindow(_ + _, 
Seconds(30), Seconds(10))
[error]  ^
[error] one error found
[error] (examples/compile:compile) Compilation failed
[error] Total time: 16 s, completed 07-Feb-2014 01:13:21

Could anyone help me?

Thanks
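
For reference, one common workaround for this kind of inference failure (a sketch, not from the original thread) is to give the reduce function explicit parameter types instead of the underscore shorthand:

    val wordCounts = words.map(x => (x, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))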


Re: Source code JavaNetworkWordcount

2014-02-05 Thread Eduardo Costa Alfaia
Hi Tathagata
I am playing with NetworkWordCount.scala; I made some changes like this (in red):

 // Create the context with a 1 second batch size
 67 val ssc = new StreamingContext(args(0), "NetworkWordCount", Seconds(1),
 68   System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass))
 69 ssc.checkpoint("hdfs://computer8:54310/user/root/INPUT")
 70 // Create a socket text stream on target ip:port and count the
 71 // words in the input stream of \n delimited text (eg. generated by 'nc')
 72 val lines1 = ssc.socketTextStream("localhost", "12345".toInt, StorageLevel.MEMORY_ONLY)
 73 val lines2 = ssc.socketTextStream("localhost", "12345".toInt, StorageLevel.MEMORY_ONLY)
 74 val lines3 = ssc.socketTextStream("localhost", "12345".toInt, StorageLevel.MEMORY_ONLY)
 75 val union2 = lines1.union(lines2)
 76 val union3 = union2.union(lines3)
 77 
 78 //val words = lines.flatMap(_.split(" "))
 79 val words = union3.flatMap(_.split(" "))
 80 //val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
 81 val wordCounts = words.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))

However, I got the error below:

[error] 
/opt/unibs_test/incubator-spark-tdas/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala:81:
 value reduceByKeyAndWindow is not a member of 
org.apache.spark.streaming.dstream.DStream[String]
[error] val wordCounts = words.reduceByKeyAndWindow(_ + _, Seconds(30), 
Seconds(10))
[error]^
[error] one error found
[error] (examples/compile:compile) Compilation failed
[error] Total time: 15 s, completed 05-Feb-2014 17:10:38


The classes are imported within the code:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.storage.StorageLevel


Thanks
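
For context, reduceByKeyAndWindow is only provided for DStreams of key-value pairs (via the implicit conversions in StreamingContext._), not for a plain DStream[String], which is what the compile error above is saying. Restoring the commented-out map to pairs is the usual shape (a sketch, with explicit parameter types to keep type inference happy):

    val wordCounts = words.map(x => (x, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))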

On Feb 5, 2014, at 5:22, Tathagata Das tathagata.das1...@gmail.com wrote:

 Seems good to me. BTW, it's fine to use MEMORY_ONLY (i.e. without replication)
 for testing, but you should turn on replication if you want
 fault-tolerance.
 
 TD
 
 
 On Mon, Feb 3, 2014 at 3:19 PM, Eduardo Costa Alfaia e.costaalf...@unibs.it
 wrote:
 
 Hi Tathagata,
 
 You were right when you told me to use Scala instead of Java; Scala
 is very easy. I have implemented the code you gave (in bold), but I
 have also implemented a union function (in red) because I am testing with 2
 stream sources; my idea is to add 3 or more stream sources and do the
 union.
 
 object NetworkWordCount {
 37   def main(args: Array[String]) {
 38 if (args.length < 1) {
 39   System.err.println("Usage: NetworkWordCount <master> <hostname> <port>\n" +
 40 "In local mode, <master> should be 'local[n]' with n > 1")
 41   System.exit(1)
 42 }
 43
 44 StreamingExamples.setStreamingLogLevels()
 45
 46 // Create the context with a 1 second batch size
 47 val ssc = new StreamingContext(args(0), "NetworkWordCount", Seconds(1),
 48   System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass))
 49 ssc.checkpoint("hdfs://computer22:54310/user/root/INPUT")
 50 // Create a socket text stream on target ip:port and count the
 51 // words in the input stream of \n delimited text (eg. generated by 'nc')
 52 *val lines1 = ssc.socketTextStream("localhost", "12345".toInt, StorageLevel.MEMORY_ONLY_SER)*
 53 *val lines2 = ssc.socketTextStream("localhost", "12345".toInt, StorageLevel.MEMORY_ONLY_SER)*
 54 *val union2 = lines1.union(lines2)*
 55 //val words = lines.flatMap(_.split(" "))
 56 *val words = union2.flatMap(_.split(" "))*
 57 val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
 58
 59 *words.count().foreachRDD(rdd => {*
 * 60 val totalCount = rdd.first()*
 * 61 *
 * 62 // print to screen*
 * 63 println(totalCount)*
 * 64 *
 * 65 // append count to file*
 * 66   //  ...*
 * 67 })*
 //wordCounts.print()
 70 ssc.start()
 71 ssc.awaitTermination()
 72   }
 73 }
 
 What do you think? Is my code right?
 
 I have obtained the following result:
 
 root@computer8:/opt/unibs_test/incubator-spark-tdas# bin/run-example
 org.apache.spark.streaming.examples.NetworkWordCount
 spark://192.168.0.13:7077SLF4J: Class path contains multiple SLF4J
 bindings.
 SLF4J: Found binding in
 
 [jar:file:/opt/unibs_test/incubator-spark-tdas/examples/target/scala-2.10/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in
 
 [jar:file:/opt/unibs_test/incubator-spark-tdas/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 14/02/04 00:02:07 INFO StreamingExamples: Using Spark's default log4j
 profile: org

Source code JavaNetworkWordcount

2014-01-30 Thread Eduardo Costa Alfaia
Hi Guys,

I'm not a very good Java programmer, so could anybody help me with this
piece of code from JavaNetworkWordCount:

JavaPairDStream<String, Integer> wordCounts = words.map(
    new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) throws Exception {
        return new Tuple2<String, Integer>(s, 1);
      }
    }).reduceByKey(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) throws Exception {
        return i1 + i2;
      }
    });

  JavaPairDStream<String, Integer> counts =
wordCounts.reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
  );

I would like to find a way of counting and then summing, to get a total of the
words counted in a single file, for example a book in txt format such as Don
Quixote. The counts function gives me the result for each word found, not a
total of the words in the file.
Tathagata has sent me a piece of Scala code. Thanks, Tathagata, for your
attention to my posts; I am very thankful.

  yourDStream.foreachRDD(rdd => {

   // Get and print first n elements
   val firstN = rdd.take(n)
   println("First N elements = " + firstN)

  // Count the number of elements in each batch
  println("RDD has " + rdd.count() + " elements")

})

yourDStream.count.print()

Could anybody help me?


Thanks Guys
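
A minimal sketch of one way to keep a running total across batches, building on the count()/foreachRDD snippet above (Scala; totalWords lives on the driver and is only for illustration):

    var totalWords = 0L
    words.count().foreachRDD(rdd => {
      val batchCount = rdd.first()   // count() produces a single-element RDD per batch
      totalWords += batchCount
      println("batch: " + batchCount + "  total so far: " + totalWords)
    })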



Re: Print in JavaNetworkWordCount

2014-01-28 Thread Eduardo Costa Alfaia
Hi Tathagata,

Is this code that you sent me Scala code?

yourDStream.foreachRDD(rdd => {

   // Get and print first n elements
   val firstN = rdd.take(n)
   println("First N elements = " + firstN)

  // Count the number of elements in each batch
  println("RDD has " + rdd.count() + " elements")

})

Thanks



On 20 January 2014 at 19:11, Tathagata Das
tathagata.das1...@gmail.com wrote:

 Hi Eduardo,

 You can do arbitrary stuff with the data in a DStream using the operation
 foreachRDD.

 yourDStream.foreachRDD(rdd => {

// Get and print first n elements
val firstN = rdd.take(n)
println("First N elements = " + firstN)

   // Count the number of elements in each batch
   println("RDD has " + rdd.count() + " elements")

 })


 Alternatively, just for printing the counts, you can also do

 yourDStream.count.print()

 Hope this helps!

 TD



 2014/1/20 Eduardo Costa Alfaia e.costaalf...@studenti.unibs.it

  Hi guys,
 
  Somebody help me, Where do I get change the print() function to print
 more
  than 10 lines in screen? Is there a manner to print the count total of
 all
  words in a batch?
 
  Best Regards
 




Re: Print in JavaNetworkWordCount

2014-01-28 Thread Eduardo Costa Alfaia
Hi Tathagata, don't worry. I am looking for a way, in the source code of
JavaNetworkWordCount, to print to the console the sum of the total words in a
file, not one word per line.

Thanks


On 28 January 2014 at 22:36, Tathagata Das
tathagata.das1...@gmail.com wrote:

 Yes, it was my intention to write scala code. But I may have failed to
 write a correct one that compiles. Apologies.

 Also, something to keep in mind: this is the dev mailing list for Spark
 developers. Questions related to using Spark should be sent to
 u...@spark.incubator.apache.org

 TD



 2014/1/28 Eduardo Costa Alfaia e.costaalf...@unibs.it

  Hi Tathagata,
 
  This code that you have sent me is it a scala code?
 
  yourDStream.foreachRDD(rdd => {
 
 // Get and print first n elements
 val firstN = rdd.take(n)
 println("First N elements = " + firstN)
 
// Count the number of elements in each batch
println("RDD has " + rdd.count() + " elements")
 
  })
 
  Thanks
 
 
 
  On 20 January 2014 at 19:11, Tathagata Das
  tathagata.das1...@gmail.com wrote:
 
   Hi Eduardo,
  
   You can do arbitrary stuff with the data in a DStream using the
 operation
   foreachRDD.
  
   yourDStream.foreachRDD(rdd => {
  
  // Get and print first n elements
  val firstN = rdd.take(n)
  println("First N elements = " + firstN)
  
 // Count the number of elements in each batch
 println("RDD has " + rdd.count() + " elements")
  
   })
  
  
   Alternatively, just for printing the counts, you can also do
  
   yourDStream.count.print()
  
   Hope this helps!
  
   TD
  
  
  
   2014/1/20 Eduardo Costa Alfaia e.costaalf...@studenti.unibs.it
  
Hi guys,
   
Somebody help me, Where do I get change the print() function to print
   more
than 10 lines in screen? Is there a manner to print the count total
 of
   all
words in a batch?
   
Best Regards
   
  
 
 




Re: Print in JavaNetworkWordCount

2014-01-21 Thread Eduardo Costa Alfaia
Thanks again Tathagata for your help

Best Regards
On Jan 20, 2014, at 19:11, Tathagata Das tathagata.das1...@gmail.com wrote:

 Hi Eduardo,
 
 You can do arbitrary stuff with the data in a DStream using the operation
 foreachRDD.
 
 yourDStream.foreachRDD(rdd => {
 
   // Get and print first n elements
   val firstN = rdd.take(n)
   println("First N elements = " + firstN)
 
  // Count the number of elements in each batch
  println("RDD has " + rdd.count() + " elements")
 
 })
 
 
 Alternatively, just for printing the counts, you can also do
 
 yourDStream.count.print()
 
 Hope this helps!
 
 TD
 
 
 
 2014/1/20 Eduardo Costa Alfaia e.costaalf...@studenti.unibs.it
 
 Hi guys,
 
 Somebody help me, Where do I get change the print() function to print more
 than 10 lines in screen? Is there a manner to print the count total of all
 words in a batch?
 
 Best Regards
 





Print in JavaNetworkWordCount

2014-01-20 Thread Eduardo Costa Alfaia
Hi guys,

Can somebody help me? Where do I change the print() function to print more than 
10 lines on screen? Is there a way to print the total count of all words in 
a batch?

Best Regards 


Re: JavaNetworkWordCount Researches

2014-01-16 Thread Eduardo Costa Alfaia
Hi Tathagata,
Thank you very much for the explanation.
Another curiosity: I ran some tests with this code yesterday using three machines
as workers, and I saw that one of these machines had its RAM usage increase to
about 90%, while on the others it hasn't changed drastically. On that same machine
I can also see that parts of the file (in this case I am using the book Don Quixote
as a txt file) are saved to the hard disk, specifically under /tmp/spark-local<numbers>,
increasing the used space. Sorry for the several questions; I am new to stream
processing and I am trying to understand better how Spark DStreams work.

Best Regards

On Jan 16, 2014, at 1:48, Tathagata Das tathagata.das1...@gmail.com wrote:

 All the computation on the data (that is, union, flatMap, map,
 reduceByKey, reduceByKeyAndWindow) is executed on the workers in a
 distributed manner. The data is received by the worker nodes and kept in
 memory, then the computation is executed on the workers on the in-memory
 data.
 
 After the count is computed for every batch of data, the first 10 elements
 of the generated counts are brought to the master to be printed on the
 screen. This is done by counts.print(), which pulls those 10 word-count
 pairs and prints them.
 
 On a related note, if you only want counts over a window, you don't need
 the first reduceByKey. The reduceByKeyAndWindow takes care of doing the
 reduceByKey per batch and then doing the reduce across a window.
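 For instance, a minimal Scala sketch of that streamlined form (it assumes
 `words` is a DStream[String] of individual words, that Seconds and the
 StreamingContext._ implicits are imported, and that checkpointing is
 enabled, since the invertible variant below requires it) could look like:
 
   // 5-minute window sliding every second, with no separate per-batch reduceByKey
   val windowedCounts = words.map(w => (w, 1)).reduceByKeyAndWindow(
     (a: Int, b: Int) => a + b,   // fold in counts from data entering the window
     (a: Int, b: Int) => a - b,   // remove counts from data leaving the window
     Seconds(300),                // window length: 5 minutes
     Seconds(1)                   // slide interval: 1 second
   )
   windowedCounts.print()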
 
 TD
 
 
 On Wed, Jan 15, 2014 at 6:01 AM, Eduardo Costa Alfaia 
 e.costaalf...@studenti.unibs.it wrote:
 
 Hi Guys,
 
 I made some changes to JavaNetworkWordCount for my research on stream
 processing, and I have added the following lines to the code:
 
 ssc1.checkpoint("hdfs://computer22:54310/user/root/INPUT");
 JavaDStream<String> lines1 = ssc1.socketTextStream("localhost",
     Integer.parseInt("12345"));
 JavaDStream<String> lines2 = ssc1.socketTextStream("localhost",
     Integer.parseInt("12345"));
 JavaDStream<String> union2 = lines1.union(lines2);
 JavaDStream<String> words = union2.flatMap(
     new FlatMapFunction<String, String>() {
       @Override
       public Iterable<String> call(String x) {
         return Lists.newArrayList(SPACE.split(x));
       }
     });
 JavaPairDStream<String, Integer> wordCounts = words.map(
     new PairFunction<String, String, Integer>() {
       @Override
       public Tuple2<String, Integer> call(String s) {
         return new Tuple2<String, Integer>(s, 1);
       }
     }).reduceByKey(new Function2<Integer, Integer, Integer>() {
       @Override
       public Integer call(Integer i1, Integer i2) {
         return i1 + i2;
       }
     });
 
 JavaPairDStream<String, Integer> counts = wordCounts.reduceByKeyAndWindow(
     new Function2<Integer, Integer, Integer>() {
       public Integer call(Integer i1, Integer i2) { return i1 + i2; }
     },
     new Function2<Integer, Integer, Integer>() {
       public Integer call(Integer i1, Integer i2) { return i1 - i2; }
     },
     new Duration(60 * 5 * 1000),
     new Duration(1 * 1000));
 
 counts.print();
 ssc1.start();
 
   }
 }
 
 
 - We wrote a program in C that sends words to the workers.
 
 - Result from the master terminal:
 
 Time: 1389794084000 ms
 ---
 (,14294)
 (impertinences,2)
 (protracted.,3)
 (burlesque.,3)
 (Dorothea,,85)
 (grant,,5)
 (temples,,2)
 (discord,17)
 (conscience,48)
 (singed,,2)
 ...
 
 ---
 Time: 1389794085000 ms
 ---
 (,38580)
 (impertinences,5)
 (protracted.,7)
 (burlesque.,7)
 (Dorothea,,259)
 (grant,,12)
 (temples,,7)
 (discord,47)
 (conscience,130)
 (singed,,5)
 ...
 
 My question is: where does the union() happen, on the worker nodes or on
 the master? I am using three machines (1 master + 2 workers). How could I
 get a total count of the words and show it in the terminal?
 
 Thanks all
 
 
 
 



Zookeeper and Spark

2013-12-03 Thread Eduardo Costa Alfaia
Hi Dears,

I recently (yesterday) cloned the master branch of incubator-spark from GitHub, and
I am configuring Spark with ZooKeeper. I have read the spark-standalone documentation
and done the ZooKeeper installation and configuration; it works. I am using 2 master
nodes, but when I run start-master.sh on each one I get this error:

13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Socket connection established to 
10.20.10.60/10.20.10.60:2181, initiating session
13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Unable to read additional data 
from server sessionid 0x0, likely server has closed socket, closing socket 
connection and attempting reconnect
13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Opening socket connection to 
server 10.20.10.60/10.20.10.60:2181. Will not attempt to authenticate using 
SASL (Unable to locate a login configuration)
13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Socket connection established to 
10.20.10.60/10.20.10.60:2181, initiating session
13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Unable to read additional data 
from server sessionid 0x0, likely server has closed socket, closing socket 
connection and attempting reconnect
13/12/03 15:41:27 ERROR master.SparkZooKeeperSession: Could not connect to 
ZooKeeper: system failure
13/12/03 15:41:27 ERROR master.ZooKeeperLeaderElectionAgent: ZooKeeper down! 
LeaderElectionAgent shutting down Master. 

13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Socket connection established to 
10.20.10.60/10.20.10.60:2181, initiating session
13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Unable to read additional data 
from server sessionid 0x0, likely server has closed socket, closing socket 
connection and attempting reconnect
13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Opening socket connection to 
server 10.20.10.61/10.20.10.61:2181. Will not attempt to authenticate using 
SASL (Unable to locate a login configuration)
13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Socket connection established to 
10.20.10.61/10.20.10.61:2181, initiating session
13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Unable to read additional data 
from server sessionid 0x0, likely server has closed socket, closing socket 
connection and attempting reconnect
13/12/03 15:30:29 ERROR master.SparkZooKeeperSession: Could not connect to 
ZooKeeper: system failure
13/12/03 15:30:29 ERROR master.ZooKeeperLeaderElectionAgent: ZooKeeper down! 
LeaderElectionAgent shutting down Master.


and the process goes down

Can anybody help me?

Best Regards
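
For reference, the ZooKeeper-based standby-master setup described in the
spark-standalone documentation is enabled by passing a few spark.deploy.*
properties to every master through SPARK_DAEMON_JAVA_OPTS; a sketch of
conf/spark-env.sh (the ensemble addresses are taken from the log above, and
the znode directory /spark is just an illustrative choice) would be:

  # conf/spark-env.sh on each master node
  export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
    -Dspark.deploy.zookeeper.url=10.20.10.60:2181,10.20.10.61:2181 \
    -Dspark.deploy.zookeeper.dir=/spark"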