Re: Installing Spark on Mac

2016-03-08 Thread Eduardo Costa Alfaia
Hi Aida,
The build has detected Maven version 3.0.3, which is below the required 3.3.3.
Update Maven and try again.
On 08/Mar/2016 14:06, "Aida"  wrote:

> Hi all,
>
> Thanks everyone for your responses; really appreciate it.
>
> Eduardo - I tried your suggestions but ran into some issues, please see
> below:
>
> ukdrfs01:Spark aidatefera$ cd spark-1.6.0
> ukdrfs01:spark-1.6.0 aidatefera$ build/mvn -DskipTests clean package
> Using `mvn` from path: /usr/bin/mvn
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> MaxPermSize=512M;
> support was removed in 8.0
> [INFO] Scanning for projects...
> [INFO]
> 
> [INFO] Reactor Build Order:
> [INFO]
> [INFO] Spark Project Parent POM
> [INFO] Spark Project Test Tags
> [INFO] Spark Project Launcher
> [INFO] Spark Project Networking
> [INFO] Spark Project Shuffle Streaming Service
> [INFO] Spark Project Unsafe
> [INFO] Spark Project Core
> [INFO] Spark Project Bagel
> [INFO] Spark Project GraphX
> [INFO] Spark Project Streaming
> [INFO] Spark Project Catalyst
> [INFO] Spark Project SQL
> [INFO] Spark Project ML Library
> [INFO] Spark Project Tools
> [INFO] Spark Project Hive
> [INFO] Spark Project Docker Integration Tests
> [INFO] Spark Project REPL
> [INFO] Spark Project Assembly
> [INFO] Spark Project External Twitter
> [INFO] Spark Project External Flume Sink
> [INFO] Spark Project External Flume
> [INFO] Spark Project External Flume Assembly
> [INFO] Spark Project External MQTT
> [INFO] Spark Project External MQTT Assembly
> [INFO] Spark Project External ZeroMQ
> [INFO] Spark Project External Kafka
> [INFO] Spark Project Examples
> [INFO] Spark Project External Kafka Assembly
> [INFO]
> [INFO]
> 
> [INFO] Building Spark Project Parent POM 1.6.0
> [INFO]
> 
> [INFO]
> [INFO] --- maven-clean-plugin:2.6.1:clean (default-clean) @
> spark-parent_2.10 ---
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @
> spark-parent_2.10 ---
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
> failed with message:
> Detected Maven Version: 3.0.3 is not in the allowed range 3.3.3.
> [INFO]
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Spark Project Parent POM .. FAILURE [0.821s]
> [INFO] Spark Project Test Tags ... SKIPPED
> [INFO] Spark Project Launcher  SKIPPED
> [INFO] Spark Project Networking .. SKIPPED
> [INFO] Spark Project Shuffle Streaming Service ... SKIPPED
> [INFO] Spark Project Unsafe .. SKIPPED
> [INFO] Spark Project Core  SKIPPED
> [INFO] Spark Project Bagel ... SKIPPED
> [INFO] Spark Project GraphX .. SKIPPED
> [INFO] Spark Project Streaming ... SKIPPED
> [INFO] Spark Project Catalyst  SKIPPED
> [INFO] Spark Project SQL . SKIPPED
> [INFO] Spark Project ML Library .. SKIPPED
> [INFO] Spark Project Tools ... SKIPPED
> [INFO] Spark Project Hive  SKIPPED
> [INFO] Spark Project Docker Integration Tests  SKIPPED
> [INFO] Spark Project REPL  SKIPPED
> [INFO] Spark Project Assembly  SKIPPED
> [INFO] Spark Project External Twitter  SKIPPED
> [INFO] Spark Project External Flume Sink . SKIPPED
> [INFO] Spark Project External Flume .. SKIPPED
> [INFO] Spark Project External Flume Assembly . SKIPPED
> [INFO] Spark Project External MQTT ... SKIPPED
> [INFO] Spark Project External MQTT Assembly .. SKIPPED
> [INFO] Spark Project External ZeroMQ . SKIPPED
> [INFO] Spark Project External Kafka .. SKIPPED
> [INFO] Spark Project Examples  SKIPPED
> [INFO] Spark Project External Kafka Assembly . SKIPPED
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 1.745s
> [INFO] Finished at: Tue Mar 08 18:01:48 GMT 2016
> [INFO] Final Memory: 19M/183M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-enforcer-plugin:1.4:enforce
> (enforce-versions) on project spark-parent_2.10: Some Enforcer rules have
> failed. Look above for specific messages explaining why 

Re: Installing Spark on Mac

2016-03-04 Thread Eduardo Costa Alfaia
Hi Aida

Run only "build/mvn -DskipTests clean package”

BR

Eduardo Costa Alfaia
Ph.D. Student in Telecommunications Engineering
Università degli Studi di Brescia
Tel: +39 3209333018








On 3/4/16, 16:18, "Aida" <aida1.tef...@gmail.com> wrote:

>Hi all,
>
>I am a complete novice and was wondering whether anyone would be willing to
>provide me with a step-by-step guide on how to install Spark on a Mac, in
>standalone mode by the way.
>
>I downloaded a prebuilt version, the second version from the top. However, I
>have not installed Hadoop and am not planning to at this stage.
>
>I also downloaded Scala from the Scala website, do I need to download
>anything else?
>
>I am very eager to learn more about Spark but am unsure about the best way
>to do it.
>
>I would be happy for any suggestions or ideas
>
>Many thanks,
>
>Aida
>
>
>
>--
>View this message in context: 
>http://apache-spark-user-list.1001560.n3.nabble.com/Installing-Spark-on-Mac-tp26397.html
>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>-
>To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>For additional commands, e-mail: user-h...@spark.apache.org
>


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Accessing Web UI

2016-02-19 Thread Eduardo Costa Alfaia

Hi,
try http://OAhtvJ5MCA:8080

BR




On 2/19/16, 07:18, "vasbhat"  wrote:

>OAhtvJ5MCA


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Eduardo Costa Alfaia
Hi Gourav,

I tried it as you said, and it works for me; I am using Spark in local mode, 
with master and worker on the same machine. I ran the example in spark-shell with 
--packages com.databricks:spark-csv_2.10:1.3.0 without errors.
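
(For reference, a minimal spark-shell sketch of actually reading a CSV once that
package is loaded; the file path and options below are purely illustrative:)

// Assumes the shell was started with:
//   spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
// sqlContext is provided by the shell; the path is hypothetical.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/people.csv")
df.printSchema()
df.show()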

BR

From:  Gourav Sengupta 
Date:  Monday, February 15, 2016 at 10:03
To:  Jorge Machado 
Cc:  Spark Group 
Subject:  Re: Using SPARK packages in Spark Cluster

Hi Jorge/ All,

Please please please go through this link  
http://spark.apache.org/docs/latest/spark-standalone.html. 
The link tells you how to start a SPARK cluster in local mode. If you have not 
started or worked with a SPARK cluster in local mode, kindly do not attempt to 
answer this question.

My question is how to use packages like 
https://github.com/databricks/spark-csv when I am using a SPARK cluster in local 
mode.

Regards,
Gourav Sengupta


On Mon, Feb 15, 2016 at 1:55 PM, Jorge Machado  wrote:
Hi Gourav, 

I did not understand your problem… the --packages option should not make 
any difference whether you are running standalone or on YARN, for example. 
Give us an example of which packages you are trying to load and what error you are 
getting… If you want to use the libraries on spark-packages.org without 
--packages, why not use Maven? 
Regards 


On 12/02/2016, at 13:22, Gourav Sengupta  wrote:

Hi,

I am creating sparkcontext in a SPARK standalone cluster as mentioned here: 
http://spark.apache.org/docs/latest/spark-standalone.html using the following 
code:

--
from pyspark import SparkConf, SparkContext   # imports implied by the snippet
import multiprocessing

sc.stop()
conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
                  .setMaster("spark://hostname:7077") \
                  .set('spark.shuffle.service.enabled', True) \
                  .set('spark.dynamicAllocation.enabled', 'true') \
                  .set('spark.executor.memory', '20g') \
                  .set('spark.driver.memory', '4g') \
                  .set('spark.default.parallelism', (multiprocessing.cpu_count() - 1))
conf.getAll()
sc = SparkContext(conf = conf)

-------(we should definitely be able to optimise the configuration but that is 
not the point here) -------

I am not able to use packages, a list of which is mentioned here 
http://spark-packages.org, using this method. 

Whereas if I use the standard "pyspark --packages" option then the packages 
load just fine.

I will be grateful if someone could kindly let me know how to load packages 
when starting a cluster as mentioned above.


Regards,
Gourav Sengupta




-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


unsubscribe email

2016-02-01 Thread Eduardo Costa Alfaia
Hi Guys,
How can I unsubscribe the address e.costaalf...@studenti.unibs.it, which is an 
alias of my address e.costaalf...@unibs.it and is registered on the mailing 
list?

Thanks

Eduardo Costa Alfaia
PhD Student Telecommunication Engineering
Università degli Studi di Brescia-UNIBS


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Similar code in Java

2015-02-11 Thread Eduardo Costa Alfaia
Thanks Ted.

 On Feb 10, 2015, at 20:06, Ted Yu yuzhih...@gmail.com wrote:
 
 Please take a look at:
 examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaDirectKafkaWordCount.java
 which was checked in yesterday.
 
 On Sat, Feb 7, 2015 at 10:53 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote:
 Hi Ted,
 
 I’ve seen the examples; I am using JavaKafkaWordCount.java, but I would like 
 to reproduce in Java what I’ve done in Scala. Is it possible to do in Java the 
 same thing that the Scala code does?
 Mainly this code below, or something like it:
 
 val KafkaDStreams = (1 to numStreams) map { _ =>
   KafkaUtils.createStream[String, String, StringDecoder,
     StringDecoder](ssc, kafkaParams, topicMap, storageLevel =
     StorageLevel.MEMORY_ONLY).map(_._2)
 
 
 

 On Feb 7, 2015, at 19:32, Ted Yu yuzhih...@gmail.com wrote:
 
 Can you take a look at:
 
 ./examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
 ./external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaStreamSuite.java
 
 Cheers
 
 On Sat, Feb 7, 2015 at 9:45 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote:
 Hi Guys,
 
 How could I do in Java the Scala code below?
 
 val KafkaDStreams = (1 to numStreams) map { _ =>
   KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc,
     kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY).map(_._2)
 }
 val unifiedStream = ssc.union(KafkaDStreams)
 val sparkProcessingParallelism = 1
 unifiedStream.repartition(sparkProcessingParallelism)
 
 Thanks Guys
 
 Informativa sulla Privacy: http://www.unibs.it/node/8155


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Similar code in Java

2015-02-07 Thread Eduardo Costa Alfaia
Hi Guys,

How could I do in Java the Scala code below?

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc,
    kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY).map(_._2)
}
val unifiedStream = ssc.union(KafkaDStreams)
val sparkProcessingParallelism = 1
unifiedStream.repartition(sparkProcessingParallelism)

Thanks Guys
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Error KafkaStream

2015-02-05 Thread Eduardo Costa Alfaia
Hi Guys, 
I’m getting this error in KafkaWordCount;

TaskSetManager: Lost task 0.0 in stage 4095.0 (TID 1281, 10.20.10.234): 
java.lang.ClassCastException: [B cannot be cast to java.lang.String 

 
at 
org.apache.spark.examples.streaming.KafkaWordCount$$anonfun$4$$anonfun$apply$1.apply(KafkaWordCount.scala:7


Any idea what it could be?


Below is the piece of code:



val kafkaStream = {
  val kafkaParams = Map[String, String](
    "zookeeper.connect" -> "achab3:2181",
    "group.id" -> "mygroup",
    "zookeeper.connect.timeout.ms" -> "1",
    "kafka.fetch.message.max.bytes" -> "400",
    "auto.offset.reset" -> "largest")

  val topicpMap = topics.split(",").map((_, numThreads.toInt)).toMap

  //val lines = KafkaUtils.createStream[String, String, DefaultDecoder,
  //  DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)

  val KafkaDStreams = (1 to numStreams).map { _ =>
    KafkaUtils.createStream[String, String, DefaultDecoder,
      DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
  }
  val unifiedStream = ssc.union(KafkaDStreams)
  unifiedStream.repartition(sparkProcessingParallelism)
}

Thanks Guys
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Error KafkaStream

2015-02-05 Thread Eduardo Costa Alfaia
I don’t think so Sean.

 On Feb 5, 2015, at 16:57, Sean Owen so...@cloudera.com wrote:
 
 Is SPARK-4905 / https://github.com/apache/spark/pull/4371/files the same 
 issue?
 
 On Thu, Feb 5, 2015 at 7:03 AM, Eduardo Costa Alfaia
 e.costaalf...@unibs.it wrote:
 Hi Guys,
 I’m getting this error in KafkaWordCount;
 
 TaskSetManager: Lost task 0.0 in stage 4095.0 (TID 1281, 10.20.10.234):
 java.lang.ClassCastException: [B cannot be cast to java.lang.String
at
 org.apache.spark.examples.streaming.KafkaWordCount$$anonfun$4$$anonfun$apply$1.apply(KafkaWordCount.scala:7
 
 
 Some idea that could be?
 
 
 Bellow the piece of code
 
 
 
  val kafkaStream = {
    val kafkaParams = Map[String, String](
      "zookeeper.connect" -> "achab3:2181",
      "group.id" -> "mygroup",
      "zookeeper.connect.timeout.ms" -> "1",
      "kafka.fetch.message.max.bytes" -> "400",
      "auto.offset.reset" -> "largest")
 
    val topicpMap = topics.split(",").map((_, numThreads.toInt)).toMap
    //val lines = KafkaUtils.createStream[String, String, DefaultDecoder,
    //  DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
    val KafkaDStreams = (1 to numStreams).map { _ =>
      KafkaUtils.createStream[String, String, DefaultDecoder,
        DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
    }
    val unifiedStream = ssc.union(KafkaDStreams)
    unifiedStream.repartition(sparkProcessingParallelism)
  }
 
 Thanks Guys
 
 Informativa sulla Privacy: http://www.unibs.it/node/8155


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Error KafkaStream

2015-02-05 Thread Eduardo Costa Alfaia
Hi Shao, 
When I changed to StringDecoder I got this compile error:

[error] /sata_disk/workspace/spark-1.2.0/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:78: not found: type StringDecoder
[error] KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
[error]                                         ^
[error] /sata_disk/workspace/spark-1.2.0/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:85: value split is not a member of Nothing
[error] val words = unifiedStream.flatMap(_.split(" "))
[error]                                     ^
[error] /sata_disk/workspace/spark-1.2.0/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:86: value reduceByKeyAndWindow is not a member of org.apache.spark.streaming.dstream.DStream[(Nothing, Long)]
[error] val wordCounts = words.map(x => (x, 1L)).reduceByKeyAndWindow(_ + _, _ - _, Seconds(20), Seconds(10), 2)
[error]                                          ^
[error] three errors found
[error] (examples/compile:compile) Compilation failed
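
(For reference: "not found: type StringDecoder" at KafkaWordCount.scala:78 usually
just means the decoder import is missing, and the two later "Nothing" errors follow
from it. A minimal sketch of the missing piece, assuming the Kafka client classes
are on the classpath:)

import kafka.serializer.StringDecoder

// same call as above, now with the decoder type in scope
KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)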


 On Feb 6, 2015, at 02:11, Shao, Saisai saisai.s...@intel.com wrote:
 
 Hi,
 
  I think you should change the `DefaultDecoder` type parameters into 
  `StringDecoder`, since you want to decode the message into String. 
  `DefaultDecoder` returns Array[Byte], not String, so the class cast here 
  will fail.
 
 Thanks
 Jerry
 
 -Original Message-
 From: Eduardo Costa Alfaia [mailto:e.costaalf...@unibs.it] 
 Sent: Friday, February 6, 2015 12:04 AM
 To: Sean Owen
 Cc: user@spark.apache.org
 Subject: Re: Error KafkaStream
 
 I don’t think so Sean.
 
 On Feb 5, 2015, at 16:57, Sean Owen so...@cloudera.com wrote:
 
 Is SPARK-4905 / https://github.com/apache/spark/pull/4371/files the same 
 issue?
 
 On Thu, Feb 5, 2015 at 7:03 AM, Eduardo Costa Alfaia 
 e.costaalf...@unibs.it wrote:
 Hi Guys,
 I’m getting this error in KafkaWordCount;
 
 TaskSetManager: Lost task 0.0 in stage 4095.0 (TID 1281, 10.20.10.234):
 java.lang.ClassCastException: [B cannot be cast to java.lang.String
   at
 org.apache.spark.examples.streaming.KafkaWordCount$$anonfun$4$$anonfu
 n$apply$1.apply(KafkaWordCount.scala:7
 
 
 Some idea that could be?
 
 
 Bellow the piece of code
 
 
 
  val kafkaStream = {
    val kafkaParams = Map[String, String](
      "zookeeper.connect" -> "achab3:2181",
      "group.id" -> "mygroup",
      "zookeeper.connect.timeout.ms" -> "1",
      "kafka.fetch.message.max.bytes" -> "400",
      "auto.offset.reset" -> "largest")
 
    val topicpMap = topics.split(",").map((_, numThreads.toInt)).toMap
    //val lines = KafkaUtils.createStream[String, String, DefaultDecoder,
    //  DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
    val KafkaDStreams = (1 to numStreams).map { _ =>
      KafkaUtils.createStream[String, String, DefaultDecoder,
        DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
    }
    val unifiedStream = ssc.union(KafkaDStreams)
    unifiedStream.repartition(sparkProcessingParallelism)
  }
 
 Thanks Guys
 
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 
 
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional 
 commands, e-mail: user-h...@spark.apache.org
 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


KafkaWordCount

2015-01-30 Thread Eduardo Costa Alfaia
Hi Guys,

I would like to put the Kafka parameter  val kafkaParams =
Map("fetch.message.max.bytes" -> "400")  into the KafkaWordCount Scala code. I’ve
used this variable like this:

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream(ssc, kafkaParams, zkQuorum, group,
    topicpMap).map(_._2)
}


However I’ve gotten these errors:

[error]   (jssc: org.apache.spark.streaming.api.java.JavaStreamingContext, zkQuorum: String, groupId: String, topics: java.util.Map[String,Integer], storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream[String,String] and
[error]   (ssc: org.apache.spark.streaming.StreamingContext, zkQuorum: String, groupId: String, topics: scala.collection.immutable.Map[String,Int], storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)]

Thanks
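
(For reference: neither overload listed above accepts a kafkaParams map together
with zkQuorum/groupId. The variant that takes kafkaParams is the one with explicit
type and decoder parameters, with zookeeper.connect and group.id folded into the
map. A rough sketch, assuming StringDecoder and illustrative values:)

import kafka.serializer.StringDecoder

val kafkaParams = Map(
  "zookeeper.connect" -> zkQuorum,           // e.g. "computer49:2181"
  "group.id" -> group,                       // e.g. "test-consumer-group"
  "fetch.message.max.bytes" -> "4000000")    // illustrative value

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topicpMap, StorageLevel.MEMORY_ONLY).map(_._2)
}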
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Error Compiling

2015-01-30 Thread Eduardo Costa Alfaia
Hi Guys,

Any idea how to solve this error?

[error] /sata_disk/workspace/spark-1.1.1/examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:76: missing parameter type for expanded function ((x$6, x$7) => x$6.$plus(x$7))

[error] val wordCounts = words.map(x => (x, 1L)).reduceByWindow(_ + _, _ - _, Minutes(1), Seconds(2), 2)

Thanks
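
(For reference: this error usually comes from calling reduceByWindow, which reduces
whole elements of the stream, on the (word, count) pairs where reduceByKeyAndWindow
was probably intended, as in the shipped KafkaWordCount example; with the mismatched
call the compiler cannot infer the parameter types of _ + _. A hedged sketch of the
pair version:)

// Requires the pair-DStream implicits on Spark 1.1, and, because an inverse
// function is used, ssc.checkpoint(...) must be set.
import org.apache.spark.streaming.StreamingContext._

val wordCounts = words.map(x => (x, 1L))
  .reduceByKeyAndWindow(_ + _, _ - _, Minutes(1), Seconds(2), 2)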
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


JavaKafkaWordCount

2014-11-18 Thread Eduardo Costa Alfaia
Hi Guys,

I am doing some tests with JavaKafkaWordCount; my cluster is composed of 8 
workers and 1 driver with spark-1.1.0. I am using Kafka too, and I have some 
questions about it.

1 - When I launch the command:
bin/spark-submit --class org.apache.spark.examples.streaming.JavaKafkaWordCount 
--master spark://computer8:7077 --driver-memory 1g --executor-memory 2g 
--executor-cores 2 
examples/target/scala-2.10/spark-examples-1.1.0-hadoop1.0.4.jar computer49:2181 
test-consumer-group test 2

I see in the Spark web UI that only 1 worker works. Why?

2 - In Kafka I can see the same thing:

Group                Topic  Pid  Offset  logSize  Lag  Owner
test-consumer-group  test   0    147092  147092   0    test-consumer-group_computer1-1416319543858-817b566f-0
test-consumer-group  test   1    232183  232183   0    test-consumer-group_computer1-1416319543858-817b566f-0
test-consumer-group  test   2    186805  186805   0    test-consumer-group_computer1-1416319543858-817b566f-0
test-consumer-group  test   3    0       0        0    test-consumer-group_computer1-1416319543858-817b566f-1
test-consumer-group  test   4    0       0        0    test-consumer-group_computer1-1416319543858-817b566f-1
test-consumer-group  test   5    0       0        0    test-consumer-group_computer1-1416319543858-817b566f-1

I would like to understand this behavior. Is it normal? Am I doing something 
wrong?

Thanks Guys
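
(For reference: each KafkaUtils.createStream call creates a single receiver, and a
receiver runs on one executor, which is why only one worker shows activity. The
usual pattern is to create several streams and union them; a rough Scala sketch of
the idea — the Java API has equivalent calls — with the host, group and topic taken
from the command line above and the stream count purely illustrative:)

val numStreams = 4   // illustrative; typically one per Kafka partition or receiver
val kafkaStreams = (1 to numStreams).map { _ =>
  KafkaUtils.createStream(ssc, "computer49:2181", "test-consumer-group", Map("test" -> 1))
}
val unified = ssc.union(kafkaStreams)
// spread the downstream word-count work across the cluster
val repartitioned = unified.repartition(8)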
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Kafka examples

2014-11-13 Thread Eduardo Costa Alfaia
Hi guys,

Were the Kafka examples removed from the master branch?

Thanks
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark and Kafka

2014-11-06 Thread Eduardo Costa Alfaia
Hi Guys,

I am doing some tests with Spark Streaming and Kafka, but I have seen something 
strange. I have modified JavaKafkaWordCount to use reduceByKeyAndWindow and to 
print on the screen the accumulated word counts. In the beginning Spark works 
very well: in each interval the word counts increase, but after 12 or 13 seconds 
the results repeat continually. 

My producer program keeps sending words to Kafka.

Does anyone have any idea about this?


---
Time: 1415272266000 ms
---
(accompanied
them,6)
(merrier,5)
(it
possessed,5)
(the
treacherous,5)
(Quite,12)
(offer,273)
(rabble,58)
(exchanging,16)
(Genoa,18)
(merchant,41)
...
---
Time: 1415272267000 ms
---
(accompanied
them,12)
(merrier,12)
(it
possessed,12)
(the
treacherous,11)
(Quite,24)
(offer,602)
(rabble,132)
(exchanging,35)
(Genoa,36)
(merchant,84)
...
---
Time: 1415272268000 ms
---
(accompanied
them,17)
(merrier,18)
(it
possessed,17)
(the
treacherous,17)
(Quite,35)
(offer,889)
(rabble,192)
(the
bed,1)
(exchanging,51)
(Genoa,54)
...
---
Time: 1415272269000 ms
---
(accompanied
them,17)
(merrier,18)
(it
possessed,17)
(the
treacherous,17)
(Quite,35)
(offer,889)
(rabble,192)
(the
bed,1)
(exchanging,51)
(Genoa,54)
...

---
Time: 141527227 ms
---
(accompanied
them,17)
(merrier,18)
(it
possessed,17)
(the
treacherous,17)
(Quite,35)
(offer,889)
(rabble,192)
(the
bed,1)
(exchanging,51)
(Genoa,54)
...


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Spark and Kafka

2014-11-06 Thread Eduardo Costa Alfaia
This is my window:

reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
);

 On Nov 6, 2014, at 18:37, Gwen Shapira gshap...@cloudera.com wrote:
 
 What's the window size? If the window is around 10 seconds and you are
 sending data at very stable rate, this is expected.
 
 
 
 On Thu, Nov 6, 2014 at 9:32 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it
 wrote:
 
 Hi Guys,
 
 I am doing some tests with Spark Streaming and Kafka, but I have seen
 something strange, I have modified the JavaKafkaWordCount to use
 ReducebyKeyandWindow and to print in the screen the accumulated numbers of
 the words, in the beginning spark works very well in each interaction the
 numbers of the words increase but after 12 a 13 sec the results repeats
 continually.
 
 My program producer remain sending the words toward the kafka.
 
 Does anyone have any idea about this?
 
 
 ---
 Time: 1415272266000 ms
 ---
 (accompanied
 them,6)
 (merrier,5)
 (it
 possessed,5)
 (the
 treacherous,5)
 (Quite,12)
 (offer,273)
 (rabble,58)
 (exchanging,16)
 (Genoa,18)
 (merchant,41)
 ...
 ---
 Time: 1415272267000 ms
 ---
 (accompanied
 them,12)
 (merrier,12)
 (it
 possessed,12)
 (the
 treacherous,11)
 (Quite,24)
 (offer,602)
 (rabble,132)
 (exchanging,35)
 (Genoa,36)
 (merchant,84)
 ...
 ---
 Time: 1415272268000 ms
 ---
 (accompanied
 them,17)
 (merrier,18)
 (it
 possessed,17)
 (the
 treacherous,17)
 (Quite,35)
 (offer,889)
 (rabble,192)
 (the
 bed,1)
 (exchanging,51)
 (Genoa,54)
 ...
 ---
 Time: 1415272269000 ms
 ---
 (accompanied
 them,17)
 (merrier,18)
 (it
 possessed,17)
 (the
 treacherous,17)
 (Quite,35)
 (offer,889)
 (rabble,192)
 (the
 bed,1)
 (exchanging,51)
 (Genoa,54)
 ...
 
 ---
 Time: 141527227 ms
 ---
 (accompanied
 them,17)
 (merrier,18)
 (it
 possessed,17)
 (the
 treacherous,17)
 (Quite,35)
 (offer,889)
 (rabble,192)
 (the
 bed,1)
 (exchanging,51)
 (Genoa,54)
 ...
 
 
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Spark Kafka Performance

2014-11-03 Thread Eduardo Costa Alfaia
Hi Guys,
Could anyone explain to me how to make Kafka work with Spark? I am using 
JavaKafkaWordCount.java as a test and the command line is:

./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount 
spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3

and as a producer I am using this command:

rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt -l 100 
-n 10


rdkafka_cachesender is a program I developed which sends the content of 
output.txt to Kafka, where -l is the length of each send (upper bound) and -n is 
the number of lines to send in a row. Below is the throughput calculated by the 
program:

File is 2235755 bytes
throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745
throughput (b/s) = 505028200
throughput (b/s) = 471263416
throughput (b/s) = 446837266
throughput (b/s) = 409856716
throughput (b/s) = 373994467
throughput (b/s) = 366343097
throughput (b/s) = 373240017
throughput (b/s) = 386139016
throughput (b/s) = 373802209
throughput (b/s) = 369308515
throughput (b/s) = 366935820
throughput (b/s) = 365175388
throughput (b/s) = 362175419
throughput (b/s) = 358356633
throughput (b/s) = 357219124
throughput (b/s) = 352174125
throughput (b/s) = 348313093
throughput (b/s) = 355099099
throughput (b/s) = 348069777
throughput (b/s) = 348478302
throughput (b/s) = 340404276
throughput (b/s) = 339876031
throughput (b/s) = 339175102
throughput (b/s) = 327555252
throughput (b/s) = 324272374
throughput (b/s) = 322479222
throughput (b/s) = 319544906
throughput (b/s) = 317201853
throughput (b/s) = 317351399
throughput (b/s) = 315027978
throughput (b/s) = 313831014
throughput (b/s) = 310050384
throughput (b/s) = 307654601
throughput (b/s) = 305707061
throughput (b/s) = 307961102
throughput (b/s) = 296898200
throughput (b/s) = 296409904
throughput (b/s) = 294609332
throughput (b/s) = 293397843
throughput (b/s) = 293194876
throughput (b/s) = 291724886
throughput (b/s) = 290031314
throughput (b/s) = 289747022
throughput (b/s) = 289299632

The throughput goes down after some seconds and it does not maintain the 
performance like the initial values:

throughput (b/s) = 699751388
throughput (b/s) = 723542382
throughput (b/s) = 662989745

Another question is about Spark: about 15 seconds after I start the Spark 
command, Spark keeps repeating the same word counts, even though my program 
continues to send words to Kafka, so the word counts in Spark should keep 
growing. I have attached the log from Spark.
  
My case is:

ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + Zookeeper) -> 
ComputerC (Spark)
 
If I haven’t explained it well, send me a reply.

Thanks Guys
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Spark's Behavior 2

2014-05-13 Thread Eduardo Costa Alfaia
Hi TD,

I have sent more information now, using 8 workers. The gap is now 27 seconds. 
Have you seen it?
Thanks

BR
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Spark's behavior

2014-05-06 Thread Eduardo Costa Alfaia
Ok Andrew,
Thanks

I sent information about a test with 8 workers, and the gap has grown.

 
On May 4, 2014, at 2:31, Andrew Ash and...@andrewash.com wrote:

 From the logs, I see that the print() starts printing stuff 10 seconds 
 after the context is started. And that 10 seconds is taken by the initial 
 empty job (50 map + 20 reduce tasks) that spark streaming starts to ensure 
 all the executors have started. Somehow the first empty task takes 7-8 
 seconds to complete. See if this can be reproduced by running a simple, 
 empty job in spark shell (in the same cluster) and see if the first task 
 takes 7-8 seconds. 
 
 Either way, I didnt see the 30 second gap, but a 10 second gap. And that 
 does not seem to be a persistent problem as after that 10 seconds, the data 
 is being received and processed.
 
 TD


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Spark's behavior

2014-05-03 Thread Eduardo Costa Alfaia
Hi TD, thanks for the reply.
I did this last experiment with one computer, in local mode, but I think the time 
gap grows when I add more computers. I will do it again now with 8 workers and 1 
word source and see what goes on. I will also measure the time, as suggested by 
Andrew. 
On May 3, 2014, at 1:19, Tathagata Das tathagata.das1...@gmail.com wrote:

 From the logs, I see that the print() starts printing stuff 10 seconds after 
 the context is started. And that 10 seconds is taken by the initial empty job 
 (50 map + 20 reduce tasks) that spark streaming starts to ensure all the 
 executors have started. Somehow the first empty task takes 7-8 seconds to 
 complete. See if this can be reproduced by running a simple, empty job in 
 spark shell (in the same cluster) and see if the first task takes 7-8 
 seconds. 
 
 Either way, I didnt see the 30 second gap, but a 10 second gap. And that does 
 not seem to be a persistent problem as after that 10 seconds, the data is 
 being received and processed.
 
 TD
 
 
 On Fri, May 2, 2014 at 2:14 PM, Eduardo Costa Alfaia e.costaalf...@unibs.it 
 wrote:
 Hi TD,
 
 I got the another information today using Spark 1.0 RC3 and the situation 
 remain the same:
 PastedGraphic-1.png
 
 The lines begin after 17 sec:
 
 14/05/02 21:52:25 INFO SparkDeploySchedulerBackend: Granted executor ID 
 app-20140502215225-0005/0 on hostPort computer8.ant-net:57229 with 2 cores, 
 2.0 GB RAM
 14/05/02 21:52:25 INFO AppClient$ClientActor: Executor updated: 
 app-20140502215225-0005/0 is now RUNNING
 14/05/02 21:52:25 INFO ReceiverTracker: ReceiverTracker started
 14/05/02 21:52:26 INFO ForEachDStream: metadataCleanupDelay = -1
 14/05/02 21:52:26 INFO SocketInputDStream: metadataCleanupDelay = -1
 14/05/02 21:52:26 INFO SocketInputDStream: Slide time = 1000 ms
 14/05/02 21:52:26 INFO SocketInputDStream: Storage level = 
 StorageLevel(false, false, false, false, 1)
 14/05/02 21:52:26 INFO SocketInputDStream: Checkpoint interval = null
 14/05/02 21:52:26 INFO SocketInputDStream: Remember duration = 1000 ms
 14/05/02 21:52:26 INFO SocketInputDStream: Initialized and validated 
 org.apache.spark.streaming.dstream.SocketInputDStream@5433868e
 14/05/02 21:52:26 INFO ForEachDStream: Slide time = 1000 ms
 14/05/02 21:52:26 INFO ForEachDStream: Storage level = StorageLevel(false, 
 false, false, false, 1)
 14/05/02 21:52:26 INFO ForEachDStream: Checkpoint interval = null
 14/05/02 21:52:26 INFO ForEachDStream: Remember duration = 1000 ms
 14/05/02 21:52:26 INFO ForEachDStream: Initialized and validated 
 org.apache.spark.streaming.dstream.ForEachDStream@1ebdbc05
 14/05/02 21:52:26 INFO SparkContext: Starting job: collect at 
 ReceiverTracker.scala:270
 14/05/02 21:52:26 INFO RecurringTimer: Started timer for JobGenerator at time 
 1399060346000
 14/05/02 21:52:26 INFO JobGenerator: Started JobGenerator at 1399060346000 ms
 14/05/02 21:52:26 INFO JobScheduler: Started JobScheduler
 14/05/02 21:52:26 INFO DAGScheduler: Registering RDD 3 (reduceByKey at 
 ReceiverTracker.scala:270)
 14/05/02 21:52:26 INFO ReceiverTracker: Stream 0 received 0 blocks
 14/05/02 21:52:26 INFO DAGScheduler: Got job 0 (collect at 
 ReceiverTracker.scala:270) with 20 output partitions (allowLocal=false)
 14/05/02 21:52:26 INFO DAGScheduler: Final stage: Stage 0(collect at 
 ReceiverTracker.scala:270)
 14/05/02 21:52:26 INFO DAGScheduler: Parents of final stage: List(Stage 1)
 14/05/02 21:52:26 INFO JobScheduler: Added jobs for time 1399060346000 ms
 14/05/02 21:52:26 INFO JobScheduler: Starting job streaming job 1399060346000 
 ms.0 from job set of time 1399060346000 ms
 14/05/02 21:52:26 INFO JobGenerator: Checkpointing graph for time 
 1399060346000 ms
  14/05/02 21:52:26 INFO DStreamGraph: Updating checkpoint data for time 1399060346000 ms
  ---
  Time: 1399060346000 ms
  ---
 
 14/05/02 21:52:26 INFO JobScheduler: Finished job streaming job 1399060346000 
 ms.0 from job set of time 1399060346000 ms
 14/05/02 21:52:26 INFO JobScheduler: Total delay: 0.325 s for time 
 1399060346000 ms (execution: 0.024 s)
 
 
 
 14/05/02 21:52:42 INFO JobScheduler: Added jobs for time 1399060362000 ms
 14/05/02 21:52:42 INFO JobGenerator: Checkpointing graph for time 
 1399060362000 ms
 14/05/02 21:52:42 INFO DStreamGraph: Updating checkpoint data for time 
 1399060362000 ms
 14/05/02 21:52:42 INFO DStreamGraph: Updated checkpoint data for time 
 1399060362000 ms
 14/05/02 21:52:42 INFO JobScheduler: Starting job streaming job 1399060362000 
 ms.0 from job set of time 1399060362000 ms
 14/05/02 21:52:42 INFO SparkContext: Starting job: take at DStream.scala:593
 14/05/02 21:52:42 INFO DAGScheduler: Got job 2 (take at DStream.scala:593) 
 with 1 output partitions (allowLocal=true)
 14/05/02 21:52:42 INFO DAGScheduler: Final stage: Stage 3(take at 
 DStream.scala:593)
 14/05/02 21:52:42 INFO DAGScheduler: Parents of final stage: List()
 14/05/02 21:52:42 INFO

Spark's behavior

2014-04-29 Thread Eduardo Costa Alfaia
Hi TD,

In my tests with Spark Streaming, I'm using the (modified) JavaNetworkWordCount code 
and a program that I wrote that sends words to the Spark worker; I use TCP as the 
transport. I verified that after starting Spark, it connects to my source, which 
actually starts sending, but the first word count is reported approximately 
30 seconds after the context creation. So I'm wondering where the 30 seconds of 
data already sent by the source are stored. Is this normal Spark behaviour? I saw 
the same behaviour using the shipped JavaNetworkWordCount application.

Many thanks.
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Spark's behavior

2014-04-29 Thread Eduardo Costa Alfaia
Hi TD,
We are not using the streaming context with master local; we have 1 master, 8 
workers and 1 word source. The command line that we are using is:
spark://192.168.0.13:7077
 
On Apr 30, 2014, at 0:09, Tathagata Das tathagata.das1...@gmail.com wrote:

 Is your batch size 30 seconds by any chance? 
 
 Assuming not, please check whether you are creating the streaming context 
 with master local[n] where n >= 2. With local or local[1], the system 
 only has one processing slot, which is occupied by the receiver, leaving no 
 room for processing the received data. It could be that after 30 seconds, the 
 server disconnects, the receiver terminates, releasing the single slot for 
 the processing to proceed. 
 
 TD
 
 
 On Tue, Apr 29, 2014 at 2:28 PM, Eduardo Costa Alfaia 
 e.costaalf...@unibs.it wrote:
 Hi TD,
 
 In my tests with spark streaming, I'm using JavaNetworkWordCount(modified) 
 code and a program that I wrote that sends words to the Spark worker, I use 
 TCP as transport. I verified that after starting Spark, it connects to my 
 source which actually starts sending, but the first word count is advertised 
 approximately 30 seconds after the context creation. So I'm wondering where 
 is stored the 30 seconds data already sent by the source. Is this a normal 
 spark’s behaviour? I saw the same behaviour using the shipped 
 JavaNetworkWordCount application.
 
 Many thanks.
 --
 Informativa sulla Privacy: http://www.unibs.it/node/8155
 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: reduceByKeyAndWindow Java

2014-04-07 Thread Eduardo Costa Alfaia

Hi TD
Could you explain this part of the code to me?

.reduceByKeyAndWindow(
    // "reduce" function, applied to the new values entering the window
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    // "inverse reduce" function, applied to the old values leaving the window
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),   // window length: 5 minutes
    new Duration(1 * 1000)         // slide interval: 1 second
);

Thanks

On 4/4/14, 22:56, Tathagata Das wrote:
I haven't really compiled the code, but it looks good to me. Why? Is 
there any problem you are facing?


TD


On Fri, Apr 4, 2014 at 8:03 AM, Eduardo Costa Alfaia 
e.costaalf...@unibs.it wrote:



Hi guys,

I would like to know whether this piece of code is right to use with a window.

JavaPairDStream<String, Integer> wordCounts = words.map(
    new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    }).reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
);



Thanks

-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155






--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Driver Out of Memory

2014-04-07 Thread Eduardo Costa Alfaia

Hi Guys,

I would like to understand why the driver's free RAM goes down. Does the 
processing occur only in the workers?

Thanks
# Start Tests
computer1(Worker/Source Stream)
 23:57:18 up 12:03,  1 user,  load average: 0.03, 0.31, 0.44
 total   used   free sharedbuffers cached
Mem:  3945   1084   2860  0 44827
-/+ buffers/cache:212   3732
Swap:0  0  0
computer8 (Driver/Master)
 23:57:18 up 11:53,  5 users,  load average: 0.43, 1.19, 1.31
 total   used   free sharedbuffers cached
Mem:  5897   4430   1466  0 384   2662
-/+ buffers/cache:   1382   4514
Swap:0  0  0
computer10(Worker/Source Stream)
 23:57:18 up 12:02,  1 user,  load average: 0.55, 1.34, 0.98
 total   used   free sharedbuffers cached
Mem:  5897564   5332  0 18358
-/+ buffers/cache:187   5709
Swap:0  0  0
computer11(Worker/Source Stream)
 23:57:18 up 12:02,  1 user,  load average: 0.07, 0.19, 0.29
 total   used   free sharedbuffers cached
Mem:  3945603   3342  0 54355
-/+ buffers/cache:193   3751
Swap:0  0  0

 After 2 Minutes

computer1
 00:06:41 up 12:12,  1 user,  load average: 3.11, 1.32, 0.73
 total   used   free sharedbuffers cached
Mem:  3945   2950994  0 46 1095
-/+ buffers/cache:   1808   2136
Swap:0  0  0
computer8(Driver/Master)
 00:06:41 up 12:02,  5 users,  load average: 1.16, 0.71, 0.96
 total   used   free sharedbuffers cached
Mem:  5897   5191705  0 385   2792
-/+ buffers/cache:   2014   3882
Swap:0  0  0
computer10
 00:06:41 up 12:11,  1 user,  load average: 2.02, 1.07, 0.89
 total   used   free sharedbuffers cached
Mem:  5897   2567   3329  0 21647
-/+ buffers/cache:   1898   3998
Swap:0  0  0
computer11
 00:06:42 up 12:12,  1 user,  load average: 3.96, 1.83, 0.88
 total   used   free sharedbuffers cached
Mem:  3945   3542402  0 57 1099
-/+ buffers/cache:   2385   1559
Swap:0  0  0


--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Explain Add Input

2014-04-04 Thread Eduardo Costa Alfaia

Hi all,

Could anyone explain the lines below to me?

computer1 - worker
computer8 - driver(master)

14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614314800 in memory on computer1.ant-net:60820 (size: 1262.5 
KB, free: 540.3 MB)
14/04/04 14:24:56 INFO MemoryStore: ensureFreeSpace(1292780) called with 
curMem=49555672, maxMem=825439027
14/04/04 14:24:56 INFO MemoryStore: Block input-0-1396614314800 stored 
as bytes to memory (size 1262.5 KB, free 738.7 MB)
14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614314800 in memory on computer8.ant-net:49743 (size: 1262.5 
KB, free: 738.7 MB)


Why does Spark add the same input on computer8, which is the driver (master)?

Thanks guys

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


RAM high consume

2014-04-04 Thread Eduardo Costa Alfaia

 Hi all,

I am doing some tests using JavaNetworkWordCount and I have some 
questions about the machine's performance; my tests run for 
approximately 2 min.


Why does the RAM usage decrease so markedly? I have done tests with 2 and 3 
machines and got the same behavior.


What should I do to get better performance in this case?


# Start Test

computer1
 total   used   free sharedbuffers cached
Mem:  3945711 3233 0  3430
-/+ buffers/cache:276   3668
Swap:0  0  0

 14:42:50 up 73 days,  3:32,  2 users,  load average: 0.00, 0.06, 0.21


14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614314400 in memory on computer1.ant-net:60820 (size: 826.1 
KB, free: 542.9 MB)
14/04/04 14:24:56 INFO MemoryStore: ensureFreeSpace(845956) called with 
curMem=47278100, maxMem=825439027
14/04/04 14:24:56 INFO MemoryStore: Block input-0-1396614314400 stored 
as bytes to memory (size 826.1 KB, free 741.3 MB)
14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614314400 in memory on computer8.ant-net:49743 (size: 826.1 
KB, free: 741.3 MB)
14/04/04 14:24:56 INFO BlockManagerMaster: Updated info of block 
input-0-1396614314400
14/04/04 14:24:56 INFO TaskSetManager: Finished TID 272 in 84 ms on 
computer1.ant-net (progress: 0/1)

14/04/04 14:24:56 INFO TaskSchedulerImpl: Remove TaskSet 43.0 from pool
14/04/04 14:24:56 INFO DAGScheduler: Completed ResultTask(43, 0)
14/04/04 14:24:56 INFO DAGScheduler: Stage 43 (take at 
DStream.scala:594) finished in 0.088 s
14/04/04 14:24:56 INFO SparkContext: Job finished: take at 
DStream.scala:594, took 1.872875734 s

---
Time: 1396614289000 ms
---
(Santiago,1)
(liveliness,1)
(Sun,1)
(reapers,1)
(offer,3)
(BARBER,3)
(shrewdness,1)
(truism,1)
(hits,1)
(merchant,1)



# End Test
computer1
 total   used   free sharedbuffers cached
Mem:  3945   2209 1735 0  5773
-/+ buffers/cache:   1430   2514
Swap:0  0  0

 14:46:05 up 73 days,  3:35,  2 users,  load average: 2.69, 1.07, 0.55


14/04/04 14:26:57 INFO TaskSetManager: Starting task 183.0:0 as TID 696 
on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/04/04 14:26:57 INFO TaskSetManager: Serialized task 183.0:0 as 1981 
bytes in 0 ms
14/04/04 14:26:57 INFO MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 81 to sp...@computer1.ant-net:44817
14/04/04 14:26:57 INFO MapOutputTrackerMaster: Size of output statuses 
for shuffle 81 is 212 bytes
14/04/04 14:26:57 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614336600 on disk on computer1.ant-net:60820 (size: 1441.7 KB)
14/04/04 14:26:57 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-0-1396614435200 in memory on computer1.ant-net:60820 (size: 1295.7 
KB, free: 589.3 KB)
14/04/04 14:26:57 INFO TaskSetManager: Finished TID 696 in 56 ms on 
computer1.ant-net (progress: 0/1)

14/04/04 14:26:57 INFO TaskSchedulerImpl: Remove TaskSet 183.0 from pool
14/04/04 14:26:57 INFO DAGScheduler: Completed ResultTask(183, 0)
14/04/04 14:26:57 INFO DAGScheduler: Stage 183 (take at 
DStream.scala:594) finished in 0.057 s
14/04/04 14:26:57 INFO SparkContext: Job finished: take at 
DStream.scala:594, took 1.575268894 s

---
Time: 1396614359000 ms
---
(hapless,9)
(reapers,8)
(amazed,113)
(feebleness,7)
(offer,148)
(rabble,27)
(exchanging,7)
(merchant,20)
(incentives,2)
(quarrel,48)
...


Thanks Guys

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Driver increase memory utilization

2014-04-04 Thread Eduardo Costa Alfaia

   Hi Guys,

Could anyone help me understand this driver behavior when I start the 
JavaNetworkWordCount?


computer8
 16:24:07 up 121 days, 22:21, 12 users,  load average: 0.66, 1.27, 1.55
total   used   free shared buffers 
cached

Mem:  5897   4341 1555 0227   2798
-/+ buffers/cache:   1315   4581
Swap:0  0  0

in 2 minutes

computer8
 16:23:08 up 121 days, 22:20, 12 users,  load average: 0.80, 1.43, 1.62
 total   used   free shared buffers 
cached

Mem:  5897   5866 30 0230   3255
-/+ buffers/cache:   2380   3516
Swap:0  0  0


Thanks

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Parallelism level

2014-04-04 Thread Eduardo Costa Alfaia

Hi all,

I have put this line in my spark-env.sh:
-Dspark.default.parallelism=20

 Is this parallelism level correct?
 The machine's processor is a dual core.

Thanks

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


RAM Increase

2014-04-04 Thread Eduardo Costa Alfaia

Hi Guys,

Could anyone explain me this behavior? After 2 min of tests

computer1- worker
computer10 - worker
computer8 - driver(master)

computer1
 18:24:31 up 73 days,  7:14,  1 user,  load average: 3.93, 2.45, 1.14
   total   used   free shared buffers 
cached

Mem:  3945   3925 19 0 18   1368
-/+ buffers/cache:   2539   1405
Swap:0  0  0
computer10
 18:22:38 up 44 days, 21:26,  2 users,  load average: 3.05, 2.20, 1.03
 total   used   free shared buffers 
cached

Mem:  5897   5292 604 0 46   2707
-/+ buffers/cache:   2538   3358
Swap:0  0  0
computer8
 18:24:13 up 122 days, 22 min, 13 users,  load average: 1.10, 0.93, 0.82
 total   used   free shared buffers 
cached

Mem:  5897   5841 55 0113   2747
-/+ buffers/cache:   2980   2916
Swap:0  0  0

Thanks Guys

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Parallelism level

2014-04-04 Thread Eduardo Costa Alfaia

What do you advise, Nicholas?

On 4/4/14, 19:05, Nicholas Chammas wrote:
If you're running on one machine with 2 cores, I believe all you can 
get out of it are 2 concurrent tasks at any one time. So setting your 
default parallelism to 20 won't help.



On Fri, Apr 4, 2014 at 11:41 AM, Eduardo Costa Alfaia 
e.costaalf...@unibs.it wrote:


Hi all,

I have put this line in my spark-env.sh:
-Dspark.default.parallelism=20

 this parallelism level, is it correct?
 The machine's processor is a dual core.

Thanks

-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155






--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: reduceByKeyAndWindow Java

2014-04-04 Thread Eduardo Costa Alfaia

Hi Tathagata,

You are right, this code compiles, but I am having problems with high 
memory consumption; I sent some emails about this today, but no response 
so far.


Thanks
On 4/4/14, 22:56, Tathagata Das wrote:
I haven't really compiled the code, but it looks good to me. Why? Is 
there any problem you are facing?


TD


On Fri, Apr 4, 2014 at 8:03 AM, Eduardo Costa Alfaia 
e.costaalf...@unibs.it wrote:



Hi guys,

I would like to know whether this piece of code is right to use with a window.

JavaPairDStream<String, Integer> wordCounts = words.map(
    new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    }).reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
);



Thanks

-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155






--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Print line in JavaNetworkWordCount

2014-04-02 Thread Eduardo Costa Alfaia

Hi Guys

I would like to print the content inside `line` in:

JavaDStream<String> lines = ssc.socketTextStream(args[1],
    Integer.parseInt(args[2]));
JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {

  @Override
  public Iterable<String> call(String x) {
    return Lists.newArrayList(x.split(" "));
  }
});

Is it possible? How can I do it?

Thanks
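
(For reference, a rough sketch of the idea in the Scala NetworkWordCount; the Java
API has equivalent print() and foreachRDD() methods on JavaDStream:)

// Print the first elements of each batch of `lines`
lines.print()

// Or print every line; collect() pulls the whole batch to the driver,
// so this is only sensible for small debugging streams.
lines.foreachRDD { rdd =>
  rdd.collect().foreach(println)
}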

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Change print() in JavaNetworkWordCount

2014-03-27 Thread Eduardo Costa Alfaia

Thank you very much  Sourav

BR

On 3/26/14, 17:29, Sourav Chandra wrote:

def print() {
  def foreachFunc = (rdd: RDD[T], time: Time) => {
    val total = rdd.collect().toList
    println("---")
    println("Time: " + time)
    println("---")
    total.foreach(println)
//  val first11 = rdd.take(11)
//  println("---")
//  println("Time: " + time)
//  println("---")
//  first11.take(10).foreach(println)
//  if (first11.size > 10) println("...")
    println()
  }
  new ForEachDStream(this,
    context.sparkContext.clean(foreachFunc)).register()
}



--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Change print() in JavaNetworkWordCount

2014-03-25 Thread Eduardo Costa Alfaia

Hi Guys,
I think I already asked this question, but I don't remember whether anyone 
has answered me. I would like to change, in the print() function, the 
number of words and counts that are sent to the driver's 
screen. The default value is 10.


Could anyone help me with this?

Best Regards

--
Informativa sulla Privacy: http://www.unibs.it/node/8155


Log Analyze

2014-03-10 Thread Eduardo Costa Alfaia

Hi Guys,
Could anyone help me understand this piece of the log (highlighted in red)? 
Why did this happen?


Thanks

14/03/10 16:55:20 INFO SparkContext: Starting job: first at 
NetworkWordCount.scala:87
14/03/10 16:55:20 INFO JobScheduler: Finished job streaming job 
1394466892000 ms.0 from job set of time 1394466892000 ms
14/03/10 16:55:20 INFO JobScheduler: Total delay: 28.537 s for time 
1394466892000 ms (execution: 4.479 s)
14/03/10 16:55:20 INFO JobScheduler: Starting job streaming job 
1394466893000 ms.0 from job set of time 1394466893000 ms
14/03/10 16:55:20 INFO JobGenerator: Checkpointing graph for time 
1394466892000 ms
14/03/10 16:55:20 INFO DStreamGraph: Updating checkpoint data for time 
1394466892000 ms
14/03/10 16:55:20 INFO DStreamGraph: Updated checkpoint data for time 
1394466892000 ms
14/03/10 16:55:20 INFO CheckpointWriter: Saving checkpoint for time 
1394466892000 ms to file 
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000'
14/03/10 16:55:20 INFO DAGScheduler: Registering RDD 496 (combineByKey 
at ShuffledDStream.scala:42)
14/03/10 16:55:20 INFO DAGScheduler: Got job 39 (first at 
NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true)
14/03/10 16:55:20 INFO DAGScheduler: Final stage: Stage 77 (first at 
NetworkWordCount.scala:87)

14/03/10 16:55:20 INFO DAGScheduler: Parents of final stage: List(Stage 78)
14/03/10 16:55:20 INFO DAGScheduler: Missing parents: List(Stage 78)
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed 
input-1-1394466782400 on computer10.ant-net:34062 in memory (size: 5.9 
MB, free: 502.2 MB)
14/03/10 16:55:20 INFO DAGScheduler: Submitting Stage 78 
(MapPartitionsRDD[496] at combineByKey at ShuffledDStream.scala:42), 
which has no missing parents
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Added 
input-1-1394466816600 in memory on computer10.ant-net:34062 (size: 4.4 
MB, free: 497.8 MB)
14/03/10 16:55:20 INFO DAGScheduler: Submitting 15 missing tasks from 
Stage 78 (MapPartitionsRDD[496] at combineByKey at ShuffledDStream.scala:42)

14/03/10 16:55:20 INFO TaskSchedulerImpl: Adding task set 78.0 with 15 tasks
14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:9 as TID 539 
on executor 2: computer1.ant-net (PROCESS_LOCAL)
14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:9 as 4144 
bytes in 1 ms
14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:10 as TID 540 
on executor 1: computer10.ant-net (PROCESS_LOCAL)
14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:10 as 4144 
bytes in 0 ms
14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:11 as TID 541 
on executor 0: computer11.ant-net (PROCESS_LOCAL)
14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:11 as 4144 
bytes in 0 ms
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed 
input-0-1394466874200 on computer1.ant-net:51406 in memory (size: 2.9 
MB, free: 460.0 MB)
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed 
input-0-1394466874400 on computer1.ant-net:51406 in memory (size: 4.1 
MB, free: 468.2 MB)
14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:12 as TID 542 
on executor 1: computer10.ant-net (PROCESS_LOCAL)
14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:12 as 4144 
bytes in 1 ms

14/03/10 16:55:20 WARN TaskSetManager: Lost TID 540 (task 78.0:10)
14/03/10 16:55:20 INFO CheckpointWriter: Deleting 
hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000
14/03/10 16:55:20 INFO CheckpointWriter: Checkpoint for time 
1394466892000 ms saved to file 
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000', took 
3633 bytes and 93 ms
14/03/10 16:55:20 INFO DStreamGraph: Clearing checkpoint data for time 
1394466892000 ms
14/03/10 16:55:20 INFO DStreamGraph: Cleared checkpoint data for time 
1394466892000 ms
14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed 
input-2-1394466789000 on computer11.ant-net:58332 in memory (size: 3.9 
MB, free: 536.0 MB)

14/03/10 16:55:20 WARN TaskSetManager: Loss was due to java.lang.Exception
java.lang.Exception: Could not compute split, block 
input-2-1394466794200 not found

at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:45)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at 

Re: Explain About Logs NetworkWordcount.scala

2014-03-09 Thread Eduardo Costa Alfaia

Yes TD,
I can use tcpdump to see whether the data are being accepted by the receiver 
and whether they are arriving in the IP packets.


Thanks
On 3/8/14, 4:19, Tathagata Das wrote:
I am not sure how to debug this without any more information about the 
source. Can you monitor on the receiver side that data is being 
accepted by the receiver but not reported?


TD


On Wed, Mar 5, 2014 at 7:23 AM, eduardocalfaia e.costaalf...@unibs.it wrote:


Hi TD,
I have seen in the web UI the stage whose result has been zero, and in 
the GC Time field there is nothing.

http://apache-spark-user-list.1001560.n3.nabble.com/file/n2306/CaptureStage.png



--
View this message in context:

http://apache-spark-user-list.1001560.n3.nabble.com/Explain-About-Logs-NetworkWordcount-scala-tp1835p2306.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com.





--
Informativa sulla Privacy: http://www.unibs.it/node/8155