Re: Can we delete topic in kafka
Hi,

It's better to create a script that deletes the Kafka log directory holding the topic's data and then recreates the topic if needed.

BR
Eduardo Costa Alfaia
Ph.D. Student in Telecommunications Engineering
Università degli Studi di Brescia
Tel: +39 3209333018

On 5/11/16, 09:48, "Snehalata Nagaje" <snehalata.nag...@harbingergroup.com> wrote:
> Hi,
>
> Can we delete a certain topic in Kafka?
>
> I have deleted it using the command:
>
> ./kafka-topics.sh --delete --topic topic_billing --zookeeper localhost:2181
>
> It says the topic is marked for deletion, but it does not actually delete the topic.
>
> Thanks,
> Snehalata

--
Informativa sulla Privacy: http://www.unibs.it/node/8155
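A likely reason the topic stays "marked for deletion": on 0.8.x-era brokers, topic deletion must be explicitly enabled before --delete actually removes anything. A hedged sketch of the relevant broker setting (server.properties on each broker; a broker restart is required):

```properties
# server.properties (each broker) — without this, --delete only marks the topic
delete.topic.enable=true
```

After enabling it and restarting the brokers, rerun kafka-topics.sh --delete. Note that deleting the data directories by hand, as suggested above, typically also requires removing the topic's znode from ZooKeeper, otherwise the topic reappears.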
Re: Installing Spark on Mac
Hi Aida,

The build detected Maven version 3.0.3. Update to 3.3.3 or later and try again.

On 08/Mar/2016 14:06, "Aida" wrote:
> Hi all,
>
> Thanks everyone for your responses; really appreciate it.
>
> Eduardo - I tried your suggestions but ran into some issues, please see below:
>
> ukdrfs01:Spark aidatefera$ cd spark-1.6.0
> ukdrfs01:spark-1.6.0 aidatefera$ build/mvn -DskipTests clean package
> Using `mvn` from path: /usr/bin/mvn
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M;
> support was removed in 8.0
> [INFO] Scanning for projects...
> [INFO] Reactor Build Order:
> [INFO] Spark Project Parent POM
> [INFO] Spark Project Test Tags
> [INFO] Spark Project Launcher
> [INFO] Spark Project Networking
> [INFO] Spark Project Shuffle Streaming Service
> [INFO] Spark Project Unsafe
> [INFO] Spark Project Core
> [INFO] Spark Project Bagel
> [INFO] Spark Project GraphX
> [INFO] Spark Project Streaming
> [INFO] Spark Project Catalyst
> [INFO] Spark Project SQL
> [INFO] Spark Project ML Library
> [INFO] Spark Project Tools
> [INFO] Spark Project Hive
> [INFO] Spark Project Docker Integration Tests
> [INFO] Spark Project REPL
> [INFO] Spark Project Assembly
> [INFO] Spark Project External Twitter
> [INFO] Spark Project External Flume Sink
> [INFO] Spark Project External Flume
> [INFO] Spark Project External Flume Assembly
> [INFO] Spark Project External MQTT
> [INFO] Spark Project External MQTT Assembly
> [INFO] Spark Project External ZeroMQ
> [INFO] Spark Project External Kafka
> [INFO] Spark Project Examples
> [INFO] Spark Project External Kafka Assembly
> [INFO]
> [INFO] Building Spark Project Parent POM 1.6.0
> [INFO]
> [INFO] --- maven-clean-plugin:2.6.1:clean (default-clean) @ spark-parent_2.10 ---
> [INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @ spark-parent_2.10 ---
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message:
> Detected Maven Version: 3.0.3 is not in the allowed range 3.3.3.
> [INFO]
> [INFO] Reactor Summary:
> [INFO] Spark Project Parent POM ....................... FAILURE [0.821s]
> [INFO] Spark Project Test Tags ........................ SKIPPED
> [INFO] Spark Project Launcher ......................... SKIPPED
> [INFO] Spark Project Networking ....................... SKIPPED
> [INFO] Spark Project Shuffle Streaming Service ........ SKIPPED
> [INFO] Spark Project Unsafe ........................... SKIPPED
> [INFO] Spark Project Core ............................. SKIPPED
> [INFO] Spark Project Bagel ............................ SKIPPED
> [INFO] Spark Project GraphX ........................... SKIPPED
> [INFO] Spark Project Streaming ........................ SKIPPED
> [INFO] Spark Project Catalyst ......................... SKIPPED
> [INFO] Spark Project SQL .............................. SKIPPED
> [INFO] Spark Project ML Library ....................... SKIPPED
> [INFO] Spark Project Tools ............................ SKIPPED
> [INFO] Spark Project Hive ............................. SKIPPED
> [INFO] Spark Project Docker Integration Tests ......... SKIPPED
> [INFO] Spark Project REPL ............................. SKIPPED
> [INFO] Spark Project Assembly ......................... SKIPPED
> [INFO] Spark Project External Twitter ................. SKIPPED
> [INFO] Spark Project External Flume Sink .............. SKIPPED
> [INFO] Spark Project External Flume ................... SKIPPED
> [INFO] Spark Project External Flume Assembly .......... SKIPPED
> [INFO] Spark Project External MQTT .................... SKIPPED
> [INFO] Spark Project External MQTT Assembly ........... SKIPPED
> [INFO] Spark Project External ZeroMQ .................. SKIPPED
> [INFO] Spark Project External Kafka ................... SKIPPED
> [INFO] Spark Project Examples ......................... SKIPPED
> [INFO] Spark Project External Kafka Assembly .......... SKIPPED
> [INFO]
> [INFO] BUILD FAILURE
> [INFO]
> [INFO] Total time: 1.745s
> [INFO] Finished at: Tue Mar 08 18:01:48 GMT 2016
> [INFO] Final Memory: 19M/183M
> [INFO]
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-enforcer-plugin:1.4:enforce
> (enforce-versions) on project spark-parent_2.10: Some Enforcer rules have
> failed. Look above for specific messages explaining why
Re: Installing Spark on Mac
Hi Aida,

Run only: build/mvn -DskipTests clean package

BR
Eduardo Costa Alfaia
Ph.D. Student in Telecommunications Engineering
Università degli Studi di Brescia
Tel: +39 3209333018

On 3/4/16, 16:18, "Aida" <aida1.tef...@gmail.com> wrote:
> Hi all,
>
> I am a complete novice and was wondering whether anyone would be willing to
> provide me with a step-by-step guide on how to install Spark on a Mac, in
> standalone mode.
>
> I downloaded a prebuilt version, the second version from the top. However, I
> have not installed Hadoop and am not planning to at this stage.
>
> I also downloaded Scala from the Scala website; do I need to download
> anything else?
>
> I am very eager to learn more about Spark but am unsure about the best way
> to do it.
>
> I would be happy for any suggestions or ideas.
>
> Many thanks,
>
> Aida
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Installing-Spark-on-Mac-tp26397.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Accessing Web UI
Hi,

Try http://OAhtvJ5MCA:8080

BR

On 2/19/16, 07:18, "vasbhat" wrote:
> OAhtvJ5MCA
Re: Using SPARK packages in Spark Cluster
Hi Gourav,

I tried it as you suggested and it works for me. I am using Spark in local mode, with master and worker on the same machine. I ran the example in spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 without errors.

BR

From: Gourav Sengupta
Date: Monday, February 15, 2016 at 10:03
To: Jorge Machado
Cc: Spark Group
Subject: Re: Using SPARK packages in Spark Cluster

Hi Jorge / All,

Please go through this link: http://spark.apache.org/docs/latest/spark-standalone.html. It tells you how to start a SPARK cluster in local mode. If you have not started or worked with a SPARK cluster in local mode, kindly do not attempt to answer this question. My question is how to use packages like https://github.com/databricks/spark-csv when I am using a SPARK cluster in local mode.

Regards,
Gourav Sengupta

On Mon, Feb 15, 2016 at 1:55 PM, Jorge Machado wrote:

Hi Gourav,

I did not understand your problem. The --packages option should not make any difference whether you are running standalone or on YARN, for example. Give us an example of which packages you are trying to load, and what error you are getting. If you want to use the libraries on spark-packages.org without --packages, why do you not use Maven?
Regards

On 12/02/2016, at 13:22, Gourav Sengupta wrote:

Hi,

I am creating a SparkContext in a SPARK standalone cluster, as mentioned here: http://spark.apache.org/docs/latest/spark-standalone.html, using the following code:

sc.stop()
conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
                  .setMaster("spark://hostname:7077") \
                  .set('spark.shuffle.service.enabled', True) \
                  .set('spark.dynamicAllocation.enabled', 'true') \
                  .set('spark.executor.memory', '20g') \
                  .set('spark.driver.memory', '4g') \
                  .set('spark.default.parallelism', (multiprocessing.cpu_count() - 1))
conf.getAll()
sc = SparkContext(conf=conf)

(we should definitely be able to optimise the configuration, but that is not the point here)

I am not able to use packages (a list of which is mentioned here: http://spark-packages.org) with this method, whereas if I use the standard "pyspark --packages" option then the packages load just fine. I would be grateful if someone could kindly let me know how to load packages when starting a cluster as mentioned above.

Regards,
Gourav Sengupta
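One way to get the --packages behaviour for programmatically created contexts, assuming a Spark version that supports the spark.jars.packages property (it mirrors the --packages flag), is to put the Maven coordinates in spark-defaults.conf, or set the same key on the SparkConf before creating the context. An illustrative sketch:

```properties
# conf/spark-defaults.conf (illustrative; coordinates taken from the thread above)
spark.jars.packages  com.databricks:spark-csv_2.10:1.3.0
```

The same effect should be achievable with conf.set('spark.jars.packages', 'com.databricks:spark-csv_2.10:1.3.0') before SparkContext(conf=conf), though whether the property is honoured at that point depends on the Spark version.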
unsubscribe email
Hi Guys,

How could I unsubscribe the address e.costaalf...@studenti.unibs.it, which is an alias of my address e.costaalf...@unibs.it and is registered on the mailing list?

Thanks

Eduardo Costa Alfaia
PhD Student Telecommunication Engineering
Università degli Studi di Brescia-UNIBS
Queue Full
Hi Guys,

How could I solve this problem?

% Failed to produce message: Local: Queue full
% Failed to produce message: Local: Queue full

Thanks
Re: Queue Full
Hi Magnus,

I think the answer is c) producing messages at a higher rate than the network or broker can handle. How could I manage this?

> On 26 Oct 2015, at 17:45, Magnus Edenhill wrote:
>
> c) producing messages at a higher rate than the network or broker can
> handle
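The usual remedy is to treat "Local: Queue full" as backpressure: instead of dropping the message, back off briefly and retry until the client has drained part of its buffer. A self-contained Java sketch of the pattern, with a bounded queue standing in for the producer's internal buffer and a drainer thread standing in for delivery to the broker:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackpressureDemo {
    // Simulates handling a "queue full" produce error: on failure, back off
    // and retry rather than dropping the message.
    static int run(int total) throws InterruptedException {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(16); // stand-in for the producer queue
        final int[] delivered = {0};
        Thread drainer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    buffer.take();   // "broker" consumes at its own pace
                    delivered[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        drainer.start();
        for (int i = 0; i < total; i++) {
            while (!buffer.offer("msg-" + i)) { // queue full...
                Thread.sleep(1);                // ...back off, then retry
            }
        }
        drainer.join();
        return delivered[0]; // every message eventually delivered
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("delivered=" + run(1000));
    }
}
```

With librdkafka itself the analogous approach is to call rd_kafka_poll() and retry the produce call when it fails with the queue-full error, and/or to enlarge the buffer via queue.buffering.max.messages; the right sizing depends on your broker throughput.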
Re: Similar code in Java
Thanks Ted.

On Feb 10, 2015, at 20:06, Ted Yu <yuzhih...@gmail.com> wrote:

Please take a look at:
examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaDirectKafkaWordCount.java
which was checked in yesterday.

On Sat, Feb 7, 2015 at 10:53 AM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:

Hi Ted,

I've seen the code. I am using JavaKafkaWordCount.java, but I would like to reproduce in Java what I've done in Scala. Is it possible to do in Java the same thing the Scala code does? Principally the code below, or something like it:

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY).map(_._2)
}

On Feb 7, 2015, at 19:32, Ted Yu <yuzhih...@gmail.com> wrote:

Can you take a look at:
./examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
./external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaStreamSuite.java

Cheers

On Sat, Feb 7, 2015 at 9:45 AM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:

Hi Guys,

How could I do in Java what the Scala code below does?

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY).map(_._2)
}
val unifiedStream = ssc.union(KafkaDStreams)
val sparkProcessingParallelism = 1
unifiedStream.repartition(sparkProcessingParallelism)

Thanks Guys
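The Scala `(1 to numStreams) map { _ => ... }` idiom maps onto a plain Java loop or stream that collects the created DStreams into a list, after which the streaming context's union can merge them (in real Spark code, a List of JavaPairDStreams passed to jssc.union, as JavaKafkaWordCount does). A self-contained sketch of just that collect-and-union shape, with a hypothetical factory standing in for KafkaUtils.createStream so it runs without Spark on the classpath:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnionSketch {
    // Hypothetical stand-in for KafkaUtils.createStream(...).map(tuple -> tuple._2()).
    static List<String> createStream(int id) {
        return Arrays.asList("msg-" + id + "-a", "msg-" + id + "-b");
    }

    // Scala:  val KafkaDStreams = (1 to numStreams) map { _ => ... }
    //         val unifiedStream = ssc.union(KafkaDStreams)
    static List<String> unionStreams(int numStreams) {
        List<List<String>> streams = IntStream.rangeClosed(1, numStreams)
                .mapToObj(UnionSketch::createStream)
                .collect(Collectors.toList());
        // Union: merge all per-receiver streams into one.
        return streams.stream().flatMap(List::stream).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(unionStreams(3).size()); // 3 streams x 2 messages each
    }
}
```

In actual streaming code the list elements would be JavaPairDStream objects rather than lists, and the flatMap step would be replaced by the context's union call; the range-then-collect structure is the part that answers the Scala-to-Java question.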
Doubts Kafka
Hi Guys,

I have some doubts about Kafka. The first: why do some applications prefer to connect to ZooKeeper instead of the brokers? Connecting to ZooKeeper could create overhead, because we are inserting another element between producer and consumer. Another question is about the data sent by the producer: in my tests the producer sends messages to the brokers and within a few minutes my hard disk (250GB) is full. Is there something in the configuration to minimize this?

Thanks
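On the disk question: Kafka retains log segments until a retention limit is hit, so with the defaults (time-based retention measured in days and no size cap) a fast producer can fill a disk. A hedged sketch of the broker settings that bound retention (server.properties; the values are illustrative, not recommendations):

```properties
# Delete log segments older than 24 hours...
log.retention.hours=24
# ...or once a partition's log exceeds ~10 GB, whichever comes first
# (note this limit applies per partition).
log.retention.bytes=10737418240
```

Either limit alone works; using both bounds disk usage even when traffic is bursty.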
Similar code in Java
Hi Guys,

How could I do in Java what the Scala code below does?

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY).map(_._2)
}
val unifiedStream = ssc.union(KafkaDStreams)
val sparkProcessingParallelism = 1
unifiedStream.repartition(sparkProcessingParallelism)

Thanks Guys
Error KafkaStream
Hi Guys,

I'm getting this error in KafkaWordCount:

TaskSetManager: Lost task 0.0 in stage 4095.0 (TID 1281, 10.20.10.234): java.lang.ClassCastException: [B cannot be cast to java.lang.String
at org.apache.spark.examples.streaming.KafkaWordCount$$anonfun$4$$anonfun$apply$1.apply(KafkaWordCount.scala:7

Any idea what it could be? Below is the piece of code:

val kafkaStream = {
  val kafkaParams = Map[String, String](
    "zookeeper.connect" -> "achab3:2181",
    "group.id" -> "mygroup",
    "zookeeper.connect.timeout.ms" -> "1",
    "kafka.fetch.message.max.bytes" -> "400",
    "auto.offset.reset" -> "largest")
  val topicpMap = topics.split(",").map((_, numThreads.toInt)).toMap
  //val lines = KafkaUtils.createStream[String, String, DefaultDecoder, DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
  val KafkaDStreams = (1 to numStreams).map { _ =>
    KafkaUtils.createStream[String, String, DefaultDecoder, DefaultDecoder](ssc, kafkaParams, topicpMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
  }
  val unifiedStream = ssc.union(KafkaDStreams)
  unifiedStream.repartition(sparkProcessingParallelism)
}

Thanks Guys
Re: Error KafkaStream
I don't think so, Sean.

On Feb 5, 2015, at 16:57, Sean Owen <so...@cloudera.com> wrote:

Is SPARK-4905 / https://github.com/apache/spark/pull/4371/files the same issue?

On Thu, Feb 5, 2015 at 7:03 AM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:

Hi Guys,

I'm getting this error in KafkaWordCount:

TaskSetManager: Lost task 0.0 in stage 4095.0 (TID 1281, 10.20.10.234): java.lang.ClassCastException: [B cannot be cast to java.lang.String
at org.apache.spark.examples.streaming.KafkaWordCount$$anonfun$4$$anonfun$apply$1.apply(KafkaWordCount.scala:7

Any idea what it could be?
Re: Error KafkaStream
Hi Shao,

When I changed to StringDecoder I got these compile errors:

[error] /sata_disk/workspace/spark-1.2.0/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:78: not found: type StringDecoder
[error]   KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY_SER).map(_._2)
[error] /sata_disk/workspace/spark-1.2.0/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:85: value split is not a member of Nothing
[error]   val words = unifiedStream.flatMap(_.split(" "))
[error] /sata_disk/workspace/spark-1.2.0/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:86: value reduceByKeyAndWindow is not a member of org.apache.spark.streaming.dstream.DStream[(Nothing, Long)]
[error]   val wordCounts = words.map(x => (x, 1L)).reduceByKeyAndWindow(_ + _, _ - _, Seconds(20), Seconds(10), 2)
[error] three errors found
[error] (examples/compile:compile) Compilation failed

On Feb 6, 2015, at 02:11, Shao, Saisai <saisai.s...@intel.com> wrote:

Hi,

I think you should change the `DefaultDecoder` type parameters into `StringDecoder`; it seems you want to decode the message into String. `DefaultDecoder` returns Array[Byte], not String, so the class cast fails here.

Thanks
Jerry

-----Original Message-----
From: Eduardo Costa Alfaia [mailto:e.costaalf...@unibs.it]
Sent: Friday, February 6, 2015 12:04 AM
To: Sean Owen
Cc: user@spark.apache.org
Subject: Re: Error KafkaStream

I don't think so, Sean.

On Feb 5, 2015, at 16:57, Sean Owen <so...@cloudera.com> wrote:

Is SPARK-4905 / https://github.com/apache/spark/pull/4371/files the same issue?
KafkaWordCount
Hi Guys,

I would like to pass the Kafka parameter val kafkaParams = Map("fetch.message.max.bytes" -> "400") to the KafkaWordCount Scala code. I've set it like this:

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream(ssc, kafkaParams, zkQuorum, group, topicpMap).map(_._2)
}

However I've got these errors:

(jssc: org.apache.spark.streaming.api.java.JavaStreamingContext, zkQuorum: String, groupId: String, topics: java.util.Map[String,Integer], storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream[String,String]

and

[error] (ssc: org.apache.spark.streaming.StreamingContext, zkQuorum: String, groupId: String, topics: scala.collection.immutable.Map[String,Int], storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)]

Thanks
Error Compiling
Hi Guys,

Any idea how to solve this error?

[error] /sata_disk/workspace/spark-1.1.1/examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:76: missing parameter type for expanded function ((x$6, x$7) => x$6.$plus(x$7))
[error]   val wordCounts = words.map(x => (x, 1L)).reduceByWindow(_ + _, _ - _, Minutes(1), Seconds(2), 2)

Thanks
Issue size message
Hi All,

I am having an issue when using Kafka with librdkafka. I've changed message.max.bytes to 2MB in my server.properties config file, which is the size of my messages. When I run the command line ./rdkafka_performance -C -t test -p 0 -b computer49:9092, after consuming some messages the consumer remains waiting for something that doesn't arrive, while my producer keeps sending messages. Any idea?

% Using random seed 1421685059, verbosity level 1
% 214 messages and 1042835 bytes consumed in 20ms: 10518 msgs/s and 51.26 Mb/s, no compression
% 21788 messages and 106128192 bytes consumed in 1029ms: 21154 msgs/s and 103.04 Mb/s, no compression
% 43151 messages and 210185259 bytes consumed in 2030ms: 21252 msgs/s and 103.52 Mb/s, no compression
% 64512 messages and 314233575 bytes consumed in 3031ms: 21280 msgs/s and 103.66 Mb/s, no compression
% 86088 messages and 419328692 bytes consumed in 4039ms: 21313 msgs/s and 103.82 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 5719ms: 17571 msgs/s and 85.67 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 6720ms: 14955 msgs/s and 72.92 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 7720ms: 13018 msgs/s and 63.47 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 8720ms: 11524 msgs/s and 56.19 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 9720ms: 10339 msgs/s and 50.41 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 10721ms: 9374 msgs/s and 45.71 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 11721ms: 8574 msgs/s and 41.81 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 12721ms: 7900 msgs/s and 38.52 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 13721ms: 7324 msgs/s and 35.71 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 14721ms: 6826 msgs/s and 33.29 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 15722ms: 6392 msgs/s and 31.17 Mb/s, no compression
% 100504 messages and 490022646 bytes consumed in 16722ms: 6010 msgs/s and 29.30 Mb/s, no compression

When the software has consumed all offsets it sends me the message:

% Consumer reached end of unibs.nec [0] message queue at offset 229790
RD_KAFKA_RESP_ERR__PARTITION_EOF: [-191]

However, having changed message.max.bytes to 2MB, I don't receive the code from Kafka. Anyone have some idea?

Thanks guys.
Re: Issue size message
Hi guys,

OK, I've tried this and it worked fine. Thanks.

On Jan 19, 2015, at 19:10, Joe Stein <joe.st...@stealth.ly> wrote:

If you increase the size of the messages for producing then you **MUST** also change replica.fetch.max.bytes in the broker server.properties, otherwise none of your replicas will be able to fetch from the leader and they will all fall out of the ISR. You also then need to change your consumers' fetch.message.max.bytes in your consumer properties (however that is configured for the specific consumer being used) so that they can read that data, otherwise you won't see messages downstream.

/***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
***/

On Mon, Jan 19, 2015 at 1:03 PM, Magnus Edenhill <mag...@edenhill.se> wrote:

(duplicating the github answer for reference)

Hi Eduardo,

The default maximum fetch size is 1 MB, which means your 2 MB messages will not fit the fetch request. Try increasing it by appending -X fetch.message.max.bytes=400 to your command line.

Regards,
Magnus

2015-01-19 17:52 GMT+01:00 Eduardo Costa Alfaia <e.costaalf...@unibs.it>:

Hi All,

I am having an issue when using Kafka with librdkafka. I've changed message.max.bytes to 2MB in my server.properties config file, which is the size of my messages. When I run the command line ./rdkafka_performance -C -t test -p 0 -b computer49:9092, after consuming some messages the consumer remains waiting for something that doesn't arrive, while my producer keeps sending messages. Any idea?
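Summarizing the thread as configuration: the three size limits have to be raised together, or replicas and consumers silently stop fetching messages larger than their own limit. A sketch with illustrative 2 MB values:

```properties
# broker: server.properties
message.max.bytes=2097152
replica.fetch.max.bytes=2097152

# consumer configuration
# (for rdkafka_performance: -X fetch.message.max.bytes=2097152)
fetch.message.max.bytes=2097152
```

In practice the fetch limits are often set somewhat above message.max.bytes to leave headroom for protocol overhead.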
JavaKafkaWordCount
Hi Guys,

I am doing some tests with JavaKafkaWordCount. My cluster is composed of 8 workers and 1 driver with spark-1.1.0, and I am using Kafka too. I have some questions about it.

1 - When I launch the command:

bin/spark-submit --class org.apache.spark.examples.streaming.JavaKafkaWordCount --master spark://computer8:7077 --driver-memory 1g --executor-memory 2g --executor-cores 2 examples/target/scala-2.10/spark-examples-1.1.0-hadoop1.0.4.jar computer49:2181 test-consumer-group test 2

I see in the Spark WebAdmin that only 1 worker works. Why?

2 - In Kafka I can see the same thing:

Group Topic Pid Offset logSize Lag Owner
test-consumer-group test 0 147092 147092 0 test-consumer-group_computer1-1416319543858-817b566f-0
test-consumer-group test 1 232183 232183 0 test-consumer-group_computer1-1416319543858-817b566f-0
test-consumer-group test 2 186805 186805 0 test-consumer-group_computer1-1416319543858-817b566f-0
test-consumer-group test 3 0 0 0 test-consumer-group_computer1-1416319543858-817b566f-1
test-consumer-group test 4 0 0 0 test-consumer-group_computer1-1416319543858-817b566f-1
test-consumer-group test 5 0 0 0 test-consumer-group_computer1-1416319543858-817b566f-1

I would like to understand this behavior. Is it normal? Am I doing something wrong?

Thanks Guys
Kafka examples
Hi guys,

Were the Kafka examples in the master branch removed?

Thanks
Information
Hi Guys,

Could anyone explain this information to me?

208K), 0.0086120 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
2014-11-06T12:20:55.673+0100: 1256.382: [GC2014-11-06T12:20:55.674+0100: 1256.382: [ParNew: 551115K->2816K(613440K), 0.0204130 secs] 560218K->13933K(4126208K), 0.0205130 secs] [Times: user=0.09 sys=0.01, real=0.02 secs]
2014-11-06T12:21:03.372+0100: 1264.080: [GC2014-11-06T12:21:03.372+0100: 1264.080: [ParNew: 547827K->1047K(613440K), 0.0073880 secs] 558944K->12473K(4126208K), 0.0074770 secs] [Times: user=0.06 sys=0.00, real=0.00 secs]
2014-11-06T12:21:10.416+0100: 1271.124: [GC2014-11-06T12:21:10.416+0100: 1271.124: [ParNew: 545782K->2266K(613440K), 0.0069530 secs] 557208K->13836K(4126208K), 0.0070420 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
2014-11-06T12:21:18.307+0100: 1279.015: [GC2014-11-06T12:21:18.307+0100: 1279.015: [ParNew: 546921K->2156K(613440K), 0.0071050 secs] 558491K->13855K(4126208K), 0.0071900 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
2014-11-06T12:21:26.394+0100: 1287.102: [GC2014-11-06T12:21:26.394+0100: 1287.102: [ParNew: 546237K->3125K(613440K), 0.0071260 secs] 557936K->14940K(4126208K), 0.0072170 secs] [Times: user=0.05 sys=0.00, real=0.00 secs]
2014-11-06T12:21:33.913+0100: 1294.621: [GC2014-11-06T12:21:33.913+0100: 1294.621: [ParNew: 547726K->2452K(613440K), 0.0070220 secs] 559541K->14367K(412

Thanks
Consumer and Producer configs
Hi Guys,

How could I use the consumer and producer configs in my Kafka environment?

Thanks
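For the command-line tools these are typically supplied as properties files (depending on the Kafka version, e.g. via --consumer.config on the console consumer), while programmatic clients load the same keys into their ConsumerConfig/ProducerConfig objects. A small illustrative sketch for an 0.8-era deployment (hostnames and values are hypothetical):

```properties
# consumer.properties (old high-level consumer)
group.id=mygroup
zookeeper.connect=localhost:2181

# producer.properties (old producer)
metadata.broker.list=localhost:9092
request.required.acks=1
```

The full set of recognized keys, with defaults, is listed in the broker/producer/consumer configuration sections of the Kafka documentation for your version.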
Spark and Kafka
Hi Guys,

I am doing some tests with Spark Streaming and Kafka, but I have seen something strange. I have modified JavaKafkaWordCount to use reduceByKeyAndWindow and to print on the screen the accumulated counts of the words. In the beginning Spark works very well, in each interval the word counts increase, but after 12 or 13 seconds the results repeat continually. My producer program keeps sending words to Kafka. Does anyone have any idea about this?

--- Time: 1415272266000 ms ---
(accompanied them,6) (merrier,5) (it possessed,5) (the treacherous,5) (Quite,12) (offer,273) (rabble,58) (exchanging,16) (Genoa,18) (merchant,41) ...
--- Time: 1415272267000 ms ---
(accompanied them,12) (merrier,12) (it possessed,12) (the treacherous,11) (Quite,24) (offer,602) (rabble,132) (exchanging,35) (Genoa,36) (merchant,84) ...
--- Time: 1415272268000 ms ---
(accompanied them,17) (merrier,18) (it possessed,17) (the treacherous,17) (Quite,35) (offer,889) (rabble,192) (the bed,1) (exchanging,51) (Genoa,54) ...
--- Time: 1415272269000 ms ---
(accompanied them,17) (merrier,18) (it possessed,17) (the treacherous,17) (Quite,35) (offer,889) (rabble,192) (the bed,1) (exchanging,51) (Genoa,54) ...
--- Time: 141527227 ms ---
(accompanied them,17) (merrier,18) (it possessed,17) (the treacherous,17) (Quite,35) (offer,889) (rabble,192) (the bed,1) (exchanging,51) (Genoa,54) ...
Re: Spark and Kafka
This is my window: reduceByKeyAndWindow( new Function2Integer, Integer, Integer() { @Override public Integer call(Integer i1, Integer i2) { return i1 + i2; } }, new Function2Integer, Integer, Integer() { public Integer call(Integer i1, Integer i2) { return i1 - i2; } }, new Duration(60 * 5 * 1000), new Duration(1 * 1000) ); On Nov 6, 2014, at 18:37, Gwen Shapira gshap...@cloudera.com wrote: What's the window size? If the window is around 10 seconds and you are sending data at very stable rate, this is expected. On Thu, Nov 6, 2014 at 9:32 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi Guys, I am doing some tests with Spark Streaming and Kafka, but I have seen something strange, I have modified the JavaKafkaWordCount to use ReducebyKeyandWindow and to print in the screen the accumulated numbers of the words, in the beginning spark works very well in each interaction the numbers of the words increase but after 12 a 13 sec the results repeats continually. My program producer remain sending the words toward the kafka. Does anyone have any idea about this? --- Time: 1415272266000 ms --- (accompanied them,6) (merrier,5) (it possessed,5) (the treacherous,5) (Quite,12) (offer,273) (rabble,58) (exchanging,16) (Genoa,18) (merchant,41) ... --- Time: 1415272267000 ms --- (accompanied them,12) (merrier,12) (it possessed,12) (the treacherous,11) (Quite,24) (offer,602) (rabble,132) (exchanging,35) (Genoa,36) (merchant,84) ... --- Time: 1415272268000 ms --- (accompanied them,17) (merrier,18) (it possessed,17) (the treacherous,17) (Quite,35) (offer,889) (rabble,192) (the bed,1) (exchanging,51) (Genoa,54) ... --- Time: 1415272269000 ms --- (accompanied them,17) (merrier,18) (it possessed,17) (the treacherous,17) (Quite,35) (offer,889) (rabble,192) (the bed,1) (exchanging,51) (Genoa,54) ... --- Time: 141527227 ms --- (accompanied them,17) (merrier,18) (it possessed,17) (the treacherous,17) (Quite,35) (offer,889) (rabble,192) (the bed,1) (exchanging,51) (Genoa,54) ... 
-- Informativa sulla Privacy: http://www.unibs.it/node/8155
Spark and Kafka
Hi Guys, I am doing some tests with Spark Streaming and Kafka, but I have seen something strange. I have modified the JavaKafkaWordCount to use reduceByKeyAndWindow and to print the accumulated word counts on the screen. In the beginning Spark works very well: in each interval the word counts increase, but after 12 or 13 seconds the results repeat continually, while my producer keeps sending words to Kafka. Does anyone have any idea about this?
--- Time: 1415272266000 ms --- (accompanied them,6) (merrier,5) (it possessed,5) (the treacherous,5) (Quite,12) (offer,273) (rabble,58) (exchanging,16) (Genoa,18) (merchant,41) ...
--- Time: 1415272267000 ms --- (accompanied them,12) (merrier,12) (it possessed,12) (the treacherous,11) (Quite,24) (offer,602) (rabble,132) (exchanging,35) (Genoa,36) (merchant,84) ...
--- Time: 1415272268000 ms --- (accompanied them,17) (merrier,18) (it possessed,17) (the treacherous,17) (Quite,35) (offer,889) (rabble,192) (the bed,1) (exchanging,51) (Genoa,54) ...
--- Time: 1415272269000 ms --- (accompanied them,17) (merrier,18) (it possessed,17) (the treacherous,17) (Quite,35) (offer,889) (rabble,192) (the bed,1) (exchanging,51) (Genoa,54) ...
--- Time: 141527227 ms --- (accompanied them,17) (merrier,18) (it possessed,17) (the treacherous,17) (Quite,35) (offer,889) (rabble,192) (the bed,1) (exchanging,51) (Genoa,54) ...
-- Informativa sulla Privacy: http://www.unibs.it/node/8155
Re: Spark Kafka Performance
Hi Bhavesh, I will collect the dump and send it to you. I am using a program that I found here: https://github.com/edenhill/librdkafka/tree/master/examples and that I have changed to suit my tests. I have attached the files. On Nov 5, 2014, at 04:45, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Eduardo, Can you please take a thread dump and see if there are blocking issues on the producer side? Do you have a single instance of the Producer and multiple threads? Are you using the Scala Producer or the new Java Producer? Also, what are your producer properties? Thanks, Bhavesh On Tue, Nov 4, 2014 at 12:40 AM, Eduardo Alfaia e.costaalf...@unibs.it wrote: Hi Gwen, I have changed the Java KafkaWordCount code to use reduceByKeyAndWindow in Spark. - Original Message - From: Gwen Shapira gshap...@cloudera.com Sent: 03/11/2014 21:08 To: users@kafka.apache.org Cc: u...@spark.incubator.apache.org Subject: Re: Spark Kafka Performance Not sure about the throughput, but: "I mean that the words counted in spark should grow up" - The Spark word-count example doesn't accumulate. It gets an RDD every n seconds and counts the words in that RDD, so we don't expect the count to go up. On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi Guys, could anyone explain to me how Kafka works with Spark? I am using JavaKafkaWordCount.java as a test and the command line is: ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3 and as a producer I am using this command: rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt -l 100 -n 10 rdkafka_cachesender is a program I developed which sends the content of output.txt to Kafka, where -l is the length of each send (upper bound) and -n is the number of lines to send in a row.
Below is the throughput calculated by the program: File is 2235755 bytes throughput (b/s) = 699751388 throughput (b/s) = 723542382 throughput (b/s) = 662989745 throughput (b/s) = 505028200 throughput (b/s) = 471263416 throughput (b/s) = 446837266 throughput (b/s) = 409856716 throughput (b/s) = 373994467 throughput (b/s) = 366343097 throughput (b/s) = 373240017 throughput (b/s) = 386139016 throughput (b/s) = 373802209 throughput (b/s) = 369308515 throughput (b/s) = 366935820 throughput (b/s) = 365175388 throughput (b/s) = 362175419 throughput (b/s) = 358356633 throughput (b/s) = 357219124 throughput (b/s) = 352174125 throughput (b/s) = 348313093 throughput (b/s) = 355099099 throughput (b/s) = 348069777 throughput (b/s) = 348478302 throughput (b/s) = 340404276 throughput (b/s) = 339876031 throughput (b/s) = 339175102 throughput (b/s) = 327555252 throughput (b/s) = 324272374 throughput (b/s) = 322479222 throughput (b/s) = 319544906 throughput (b/s) = 317201853 throughput (b/s) = 317351399 throughput (b/s) = 315027978 throughput (b/s) = 313831014 throughput (b/s) = 310050384 throughput (b/s) = 307654601 throughput (b/s) = 305707061 throughput (b/s) = 307961102 throughput (b/s) = 296898200 throughput (b/s) = 296409904 throughput (b/s) = 294609332 throughput (b/s) = 293397843 throughput (b/s) = 293194876 throughput (b/s) = 291724886 throughput (b/s) = 290031314 throughput (b/s) = 289747022 throughput (b/s) = 289299632 The throughput drops after some seconds and does not maintain the initial values: throughput (b/s) = 699751388 throughput (b/s) = 723542382 throughput (b/s) = 662989745 Another question is about Spark: 15 seconds after I started the Spark command, Spark just keeps repeating the same word counts, but my program continues to send words to Kafka, so I would expect the word counts in Spark to grow. I have attached the log from Spark.
My case is: ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + Zookeeper) -> ComputerC (Spark). If I have not explained it well, send me a reply. Thanks Guys -- Informativa sulla Privacy: http://www.unibs.it/node/8155
Producer and Consumer properties
Hi Dudes, I would like to know if the producer and consumer properties files in the config folder should also be configured; I have configured only server.properties. Is that enough? I am doing some performance tests, for example network throughput. My scenario is: as producer I am using this program in C, and as consumer this one. 1 server (Zookeeper + 3 brokers, 8 partitions and replication factor 3), 24 GB RAM, 5.0 TB hard disk, eth0: Broadcom NetXtreme II BCM5709 1000Base-T. There is a great difference in throughput between the producer and the consumer; does someone have any idea why? Results:

Producer                         Consumer
throughput (b/s) = 301393419     received = 4083875,  throughput (b/s) = 5571423
throughput (b/s) = 424807283     received = 7146741,  throughput (b/s) = 8061556
throughput (b/s) = 445245606     received = 13270522, throughput (b/s) = 12925199
throughput (b/s) = 466454739     received = 16333527, throughput (b/s) = 13890292
throughput (b/s) = 442368081     received = 18375214, throughput (b/s) = 13967440
throughput (b/s) = 436540119     received = 20416859, throughput (b/s) = 14127520
throughput (b/s) = 427105440     received = 24500066, throughput (b/s) = 15594622
throughput (b/s) = 426395933     received = 27563023, throughput (b/s) = 16177493
throughput (b/s) = 409344029     received = 34708625, throughput (b/s) = 18740726
throughput (b/s) = 403371185     received = 37771189, throughput (b/s) = 17961816
throughput (b/s) = 403325568     received = 39813038, throughput (b/s) = 17654058
throughput (b/s) = 397938415     received = 47979107, throughput (b/s) = 19686322
throughput (b/s) = 393364006     received = 53083307, throughput (b/s) = 20623441
throughput (b/s) = 387393832     received = 57166558, throughput (b/s) = 21050531
throughput (b/s) = 380266372     received = 59207558, throughput (b/s) = 20654404
throughput (b/s) = 376436729     received = 62269998, throughput (b/s) = 20740363
throughput (b/s) = 377043675     received = 65332901, throughput (b/s) = 20888135
throughput (b/s) = 368613683     received = 67374558, throughput (b/s) = 20467503
throughput (b/s) = 370020865     received = 71457763, throughput (b/s) = 20727773
throughput (b/s) = 373827848     received = 73499480, throughput (b/s) = 20171583
throughput (b/s) = 369647040     received = 75541289, throughput (b/s) = 19599155
throughput (b/s) = 363395680     received = 80645776, throughput (b/s) = 20033582

Thanks Guys -- Informativa sulla Privacy: http://www.unibs.it/node/8155
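One thing worth checking before comparing the two columns above is the arithmetic and the units behind them. The following is my own sketch (the `Throughput` class and its method names are hypothetical, not code from the thread) of the bytes-per-second calculation such a producer or consumer loop typically prints. Note that if the producer figures are truly bytes per second, ~700,000,000 b/s would far exceed what a 1000Base-T link can carry (about 125 MB/s), which may mean the producer is timing buffered writes rather than wire traffic, or that the unit is bits.

```java
// Hypothetical helper: the throughput arithmetic behind "throughput (b/s) = N".
// Units are explicit, since mixing bytes and bits is a common source of
// producer/consumer mismatches.
public class Throughput {

    // Bytes per second from a byte count and an elapsed time in milliseconds.
    public static long bytesPerSecond(long bytes, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be > 0");
        }
        return bytes * 1000L / elapsedMillis;
    }

    // Bits per second, for comparison against link speed
    // (1000Base-T carries at most 1,000,000,000 bits/s).
    public static long bitsPerSecond(long bytes, long elapsedMillis) {
        return 8L * bytesPerSecond(bytes, elapsedMillis);
    }

    public static void main(String[] args) {
        long fileBytes = 2235755L; // file size reported in the thread
        // Sending the whole file once per second:
        System.out.println("bytes/s = " + bytesPerSecond(fileBytes, 1000));
        System.out.println("bits/s  = " + bitsPerSecond(fileBytes, 1000));
    }
}
```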
Spark Kafka Performance
Hi Guys, could anyone explain to me how Kafka works with Spark? I am using JavaKafkaWordCount.java as a test and the command line is: ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3 and as a producer I am using this command: rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt -l 100 -n 10 rdkafka_cachesender is a program I developed which sends the content of output.txt to Kafka, where -l is the length of each send (upper bound) and -n is the number of lines to send in a row. Below is the throughput calculated by the program: File is 2235755 bytes throughput (b/s) = 699751388 throughput (b/s) = 723542382 throughput (b/s) = 662989745 throughput (b/s) = 505028200 throughput (b/s) = 471263416 throughput (b/s) = 446837266 throughput (b/s) = 409856716 throughput (b/s) = 373994467 throughput (b/s) = 366343097 throughput (b/s) = 373240017 throughput (b/s) = 386139016 throughput (b/s) = 373802209 throughput (b/s) = 369308515 throughput (b/s) = 366935820 throughput (b/s) = 365175388 throughput (b/s) = 362175419 throughput (b/s) = 358356633 throughput (b/s) = 357219124 throughput (b/s) = 352174125 throughput (b/s) = 348313093 throughput (b/s) = 355099099 throughput (b/s) = 348069777 throughput (b/s) = 348478302 throughput (b/s) = 340404276 throughput (b/s) = 339876031 throughput (b/s) = 339175102 throughput (b/s) = 327555252 throughput (b/s) = 324272374 throughput (b/s) = 322479222 throughput (b/s) = 319544906 throughput (b/s) = 317201853 throughput (b/s) = 317351399 throughput (b/s) = 315027978 throughput (b/s) = 313831014 throughput (b/s) = 310050384 throughput (b/s) = 307654601 throughput (b/s) = 305707061 throughput (b/s) = 307961102 throughput (b/s) = 296898200 throughput (b/s) = 296409904 throughput (b/s) = 294609332 throughput (b/s) = 293397843 throughput (b/s) = 293194876 throughput (b/s) = 291724886 throughput (b/s) = 290031314 throughput (b/s) = 289747022 throughput (b/s) = 289299632 The throughput drops after some seconds and does not maintain the initial values: throughput (b/s) = 699751388 throughput (b/s) = 723542382 throughput (b/s) = 662989745 Another question is about Spark: 15 seconds after I started the Spark command, Spark just keeps repeating the same word counts, but my program continues to send words to Kafka, so I would expect the word counts in Spark to grow. I have attached the log from Spark. My case is: ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + Zookeeper) -> ComputerC (Spark). If I have not explained it well, send me a reply. Thanks Guys -- Informativa sulla Privacy: http://www.unibs.it/node/8155
Clean Kafka Queue
Hi Guys, Is there a way to clean a Kafka queue after the consumer has consumed the messages? Thanks -- Informativa sulla Privacy: http://www.unibs.it/node/8155
Re: Clean Kafka Queue
Ok guys, thanks for the help. Regards On Oct 21, 2014, at 18:30, Joe Stein joe.st...@stealth.ly wrote: The concept of truncating a topic comes up a lot. I will add it as an item to https://issues.apache.org/jira/browse/KAFKA-1694 It is a scary feature though; it might be best to wait until authorizations are in place before we release it. With 0.8.2 you can delete topics, so at least you can start fresh more easily. That should work in the meantime. 0.8.2-beta should be out this week :) Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC, http://www.stealth.ly, Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop On Tue, Oct 21, 2014 at 12:03 PM, Harsha ka...@harsha.io wrote: You can use log.retention.hours or log.retention.bytes to prune the log; more info on that config here: https://kafka.apache.org/08/configuration.html If you want to delete a message after the consumer has processed it, there is no API for that. -Harsha On Tue, Oct 21, 2014, at 08:00 AM, Eduardo Costa Alfaia wrote: Hi Guys, Is there a way to clean a Kafka queue after the consumer has consumed the messages? Thanks -- Informativa sulla Privacy: http://www.unibs.it/node/8155
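A configuration sketch of the retention-based pruning Harsha describes above (the values are illustrative assumptions, not recommendations; check them against your broker version's documentation):

```properties
# server.properties -- prune log segments by age and/or size.
# Whichever limit is hit first triggers deletion of old segments.
log.retention.hours=1
log.retention.bytes=1073741824

# For Joe's point about starting fresh via topic deletion (0.8.2+),
# deletion must also be enabled on the broker, or --delete only
# marks the topic for deletion:
delete.topic.enable=true
```

Retention operates on whole log segments, so messages are removed lazily in the background, not at the moment a consumer reads them.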
Spark's Behavior 2
Hi TD, I have sent more information now, using 8 workers. The gap is now 27 seconds. Have you seen it? Thanks BR -- Informativa sulla Privacy: http://www.unibs.it/node/8155
Re: Spark's behavior
Ok Andrew, thanks. I sent information from a test with 8 workers, and the gap has grown. On May 4, 2014, at 2:31, Andrew Ash and...@andrewash.com wrote: From the logs, I see that the print() starts printing stuff 10 seconds after the context is started. And that 10 seconds is taken by the initial empty job (50 map + 20 reduce tasks) that Spark Streaming starts to ensure all the executors have started. Somehow the first empty task takes 7-8 seconds to complete. See if this can be reproduced by running a simple, empty job in spark shell (in the same cluster) and see if the first task takes 7-8 seconds. Either way, I didn't see the 30 second gap, but a 10 second gap. And that does not seem to be a persistent problem, as after that 10 seconds the data is being received and processed. TD -- Informativa sulla Privacy: http://www.unibs.it/node/8155
Re: Spark's behavior
Hi TD, thanks for the reply. This last experiment I did on one computer, in local mode, but I think the time gap grows when I add more computers. I will do it again now with 8 workers and 1 word source and see what's going on. I will measure the time too, as suggested by Andrew. On May 3, 2014, at 1:19, Tathagata Das tathagata.das1...@gmail.com wrote: From the logs, I see that the print() starts printing stuff 10 seconds after the context is started. And that 10 seconds is taken by the initial empty job (50 map + 20 reduce tasks) that Spark Streaming starts to ensure all the executors have started. Somehow the first empty task takes 7-8 seconds to complete. See if this can be reproduced by running a simple, empty job in spark shell (in the same cluster) and see if the first task takes 7-8 seconds. Either way, I didn't see the 30 second gap, but a 10 second gap. And that does not seem to be a persistent problem, as after that 10 seconds the data is being received and processed. TD On Fri, May 2, 2014 at 2:14 PM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi TD, I got more information today using Spark 1.0 RC3 and the situation remains the same: PastedGraphic-1.png The lines begin after 17 sec:
14/05/02 21:52:25 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140502215225-0005/0 on hostPort computer8.ant-net:57229 with 2 cores, 2.0 GB RAM
14/05/02 21:52:25 INFO AppClient$ClientActor: Executor updated: app-20140502215225-0005/0 is now RUNNING
14/05/02 21:52:25 INFO ReceiverTracker: ReceiverTracker started
14/05/02 21:52:26 INFO ForEachDStream: metadataCleanupDelay = -1
14/05/02 21:52:26 INFO SocketInputDStream: metadataCleanupDelay = -1
14/05/02 21:52:26 INFO SocketInputDStream: Slide time = 1000 ms
14/05/02 21:52:26 INFO SocketInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
14/05/02 21:52:26 INFO SocketInputDStream: Checkpoint interval = null
14/05/02 21:52:26 INFO SocketInputDStream: Remember duration = 1000 ms
14/05/02 21:52:26 INFO SocketInputDStream: Initialized and validated org.apache.spark.streaming.dstream.SocketInputDStream@5433868e
14/05/02 21:52:26 INFO ForEachDStream: Slide time = 1000 ms
14/05/02 21:52:26 INFO ForEachDStream: Storage level = StorageLevel(false, false, false, false, 1)
14/05/02 21:52:26 INFO ForEachDStream: Checkpoint interval = null
14/05/02 21:52:26 INFO ForEachDStream: Remember duration = 1000 ms
14/05/02 21:52:26 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@1ebdbc05
14/05/02 21:52:26 INFO SparkContext: Starting job: collect at ReceiverTracker.scala:270
14/05/02 21:52:26 INFO RecurringTimer: Started timer for JobGenerator at time 1399060346000
14/05/02 21:52:26 INFO JobGenerator: Started JobGenerator at 1399060346000 ms
14/05/02 21:52:26 INFO JobScheduler: Started JobScheduler
14/05/02 21:52:26 INFO DAGScheduler: Registering RDD 3 (reduceByKey at ReceiverTracker.scala:270)
14/05/02 21:52:26 INFO ReceiverTracker: Stream 0 received 0 blocks
14/05/02 21:52:26 INFO DAGScheduler: Got job 0 (collect at ReceiverTracker.scala:270) with 20 output partitions (allowLocal=false)
14/05/02 21:52:26 INFO DAGScheduler: Final stage: Stage 0(collect at ReceiverTracker.scala:270)
14/05/02 21:52:26 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/05/02 21:52:26 INFO JobScheduler: Added jobs for time 1399060346000 ms
14/05/02 21:52:26 INFO JobScheduler: Starting job streaming job 1399060346000 ms.0 from job set of time 1399060346000 ms
14/05/02 21:52:26 INFO JobGenerator: Checkpointing graph for time 1399060346000 ms
14/05/02 21:52:26 INFO DStreamGraph: Updating checkpoint data for time 1399060346000 ms
--- Time: 1399060346000 ms ---
14/05/02 21:52:26 INFO JobScheduler: Finished job streaming job 1399060346000 ms.0 from job set of time 1399060346000 ms
14/05/02 21:52:26 INFO JobScheduler: Total delay: 0.325 s for time 1399060346000 ms (execution: 0.024 s)
14/05/02 21:52:42 INFO JobScheduler: Added jobs for time 1399060362000 ms
14/05/02 21:52:42 INFO JobGenerator: Checkpointing graph for time 1399060362000 ms
14/05/02 21:52:42 INFO DStreamGraph: Updating checkpoint data for time 1399060362000 ms
14/05/02 21:52:42 INFO DStreamGraph: Updated checkpoint data for time 1399060362000 ms
14/05/02 21:52:42 INFO JobScheduler: Starting job streaming job 1399060362000 ms.0 from job set of time 1399060362000 ms
14/05/02 21:52:42 INFO SparkContext: Starting job: take at DStream.scala:593
14/05/02 21:52:42 INFO DAGScheduler: Got job 2 (take at DStream.scala:593) with 1 output partitions (allowLocal=true)
14/05/02 21:52:42 INFO DAGScheduler: Final stage: Stage 3(take at DStream.scala:593)
14/05/02 21:52:42 INFO DAGScheduler: Parents of final stage: List()
14/05/02 21:52:42 INFO
Spark's behavior
Hi TD, In my tests with Spark Streaming, I'm using a modified JavaNetworkWordCount and a program that I wrote that sends words to the Spark worker; I use TCP as transport. I verified that after starting, Spark connects to my source, which actually starts sending, but the first word count is printed approximately 30 seconds after the context creation. So I'm wondering where the 30 seconds of data already sent by the source are stored. Is this normal Spark behaviour? I saw the same behaviour using the shipped JavaNetworkWordCount application. Many thanks. -- Informativa sulla Privacy: http://www.unibs.it/node/8155
Re: Spark's behavior
Hi TD, We are not using the stream context with master local; we have 1 master, 8 workers and 1 word source. The command line we are using is: bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount spark://192.168.0.13:7077 On Apr 30, 2014, at 0:09, Tathagata Das tathagata.das1...@gmail.com wrote: Is your batch size 30 seconds by any chance? Assuming not, please check whether you are creating the streaming context with master local[n] where n > 1. With local or local[1], the system only has one processing slot, which is occupied by the receiver, leaving no room for processing the received data. It could be that after 30 seconds the server disconnects, the receiver terminates, releasing the single slot for the processing to proceed. TD On Tue, Apr 29, 2014 at 2:28 PM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi TD, In my tests with Spark Streaming, I'm using a modified JavaNetworkWordCount and a program that I wrote that sends words to the Spark worker; I use TCP as transport. I verified that after starting, Spark connects to my source, which actually starts sending, but the first word count is printed approximately 30 seconds after the context creation. So I'm wondering where the 30 seconds of data already sent by the source are stored. Is this normal Spark behaviour? I saw the same behaviour using the shipped JavaNetworkWordCount application. Many thanks. -- Informativa sulla Privacy: http://www.unibs.it/node/8155
Re: reduceByKeyAndWindow Java
Hi TD, could you explain this code part to me?

.reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
);

Thanks

On 4/4/14, 22:56, Tathagata Das wrote: I haven't really compiled the code, but it looks good to me. Why? Is there any problem you are facing? TD On Fri, Apr 4, 2014 at 8:03 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi guys, I would like to know if this part of the code is right to use with a window:

JavaPairDStream<String, Integer> wordCounts = words.map(
    new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    }).reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
);

Thanks

-- Informativa sulla Privacy: http://www.unibs.it/node/8155
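For what the two `Function2` arguments do: the first adds a new batch's counts into the window total, and the second (the inverse function) subtracts the counts of the batch that just slid out of the window, so Spark never rescans the whole window. Below is a plain-Java sketch of that mechanism for a single key (no Spark required; the `WindowedCount` class and its `advance` method are my own illustration, not Spark API). It also shows why, with a steady input rate, the totals plateau once the window fills, which matches Gwen's remark in the "Spark and Kafka" thread above. Note that Spark Streaming generally requires a checkpoint directory to be set when the inverse-reduce form is used.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Incremental sliding-window count for one key: add the newest batch,
// subtract the batch falling out of the window.
public class WindowedCount {
    private final int windowBatches;               // window length / slide interval
    private final Deque<Integer> batches = new ArrayDeque<>();
    private int total = 0;

    public WindowedCount(int windowBatches) {
        this.windowBatches = windowBatches;
    }

    // Called once per slide interval with the new batch's count.
    public int advance(int newBatchCount) {
        total += newBatchCount;                    // the "i1 + i2" reduce function
        batches.addLast(newBatchCount);
        if (batches.size() > windowBatches) {
            total -= batches.removeFirst();        // the "i1 - i2" inverse function
        }
        return total;
    }

    public static void main(String[] args) {
        // A 5-batch window fed a constant 10 counts per batch: the total
        // climbs until the window is full, then stays flat.
        WindowedCount w = new WindowedCount(5);
        for (int t = 1; t <= 8; t++) {
            System.out.println("t=" + t + " total=" + w.advance(10));
        }
    }
}
```

With the thread's parameters (window 5 minutes, slide 1 second) the totals would grow for a full 5 minutes before flattening, so a plateau after only ~13 seconds suggests the effective window or the input differs from what is expected.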
Driver Out of Memory
Hi Guys, I would like to understand why the driver's free RAM goes down. Does the processing occur only on the workers? Thanks

# Start of tests
computer1 (worker/source stream) 23:57:18 up 12:03, 1 user, load average: 0.03, 0.31, 0.44
             total   used   free  shared  buffers  cached
Mem:          3945   1084   2860       0       44     827
-/+ buffers/cache:    212   3732
Swap:            0      0      0

computer8 (driver/master) 23:57:18 up 11:53, 5 users, load average: 0.43, 1.19, 1.31
             total   used   free  shared  buffers  cached
Mem:          5897   4430   1466       0      384    2662
-/+ buffers/cache:   1382   4514
Swap:            0      0      0

computer10 (worker/source stream) 23:57:18 up 12:02, 1 user, load average: 0.55, 1.34, 0.98
             total   used   free  shared  buffers  cached
Mem:          5897    564   5332       0       18     358
-/+ buffers/cache:    187   5709
Swap:            0      0      0

computer11 (worker/source stream) 23:57:18 up 12:02, 1 user, load average: 0.07, 0.19, 0.29
             total   used   free  shared  buffers  cached
Mem:          3945    603   3342       0       54     355
-/+ buffers/cache:    193   3751
Swap:            0      0      0

After 2 minutes
computer1 00:06:41 up 12:12, 1 user, load average: 3.11, 1.32, 0.73
             total   used   free  shared  buffers  cached
Mem:          3945   2950    994       0       46    1095
-/+ buffers/cache:   1808   2136
Swap:            0      0      0

computer8 (driver/master) 00:06:41 up 12:02, 5 users, load average: 1.16, 0.71, 0.96
             total   used   free  shared  buffers  cached
Mem:          5897   5191    705       0      385    2792
-/+ buffers/cache:   2014   3882
Swap:            0      0      0

computer10 00:06:41 up 12:11, 1 user, load average: 2.02, 1.07, 0.89
             total   used   free  shared  buffers  cached
Mem:          5897   2567   3329       0       21     647
-/+ buffers/cache:   1898   3998
Swap:            0      0      0

computer11 00:06:42 up 12:12, 1 user, load average: 3.96, 1.83, 0.88
             total   used   free  shared  buffers  cached
Mem:          3945   3542    402       0       57    1099
-/+ buffers/cache:   2385   1559
Swap:            0      0      0

-- Informativa sulla Privacy: http://www.unibs.it/node/8155
Explain Add Input
Hi all, could anyone explain the lines below to me? computer1 is a worker; computer8 is the driver (master).
14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1396614314800 in memory on computer1.ant-net:60820 (size: 1262.5 KB, free: 540.3 MB)
14/04/04 14:24:56 INFO MemoryStore: ensureFreeSpace(1292780) called with curMem=49555672, maxMem=825439027
14/04/04 14:24:56 INFO MemoryStore: Block input-0-1396614314800 stored as bytes to memory (size 1262.5 KB, free 738.7 MB)
14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1396614314800 in memory on computer8.ant-net:49743 (size: 1262.5 KB, free: 738.7 MB)
Why does Spark add the same input on computer8, which is the driver (master)? Thanks guys -- Informativa sulla Privacy: http://www.unibs.it/node/8155
RAM high consume
Hi all, I am doing some tests using JavaNetworkWordCount and I have some questions about machine performance; my tests take approximately 2 minutes. Why does the free RAM decrease so much? I have done tests with 2 and 3 machines and got the same behavior. What should I do to get better performance in this case?

# Start of test
computer1
             total   used   free  shared  buffers  cached
Mem:          3945    711   3233       0        3     430
-/+ buffers/cache:    276   3668
Swap:            0      0      0
14:42:50 up 73 days, 3:32, 2 users, load average: 0.00, 0.06, 0.21

14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1396614314400 in memory on computer1.ant-net:60820 (size: 826.1 KB, free: 542.9 MB)
14/04/04 14:24:56 INFO MemoryStore: ensureFreeSpace(845956) called with curMem=47278100, maxMem=825439027
14/04/04 14:24:56 INFO MemoryStore: Block input-0-1396614314400 stored as bytes to memory (size 826.1 KB, free 741.3 MB)
14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1396614314400 in memory on computer8.ant-net:49743 (size: 826.1 KB, free: 741.3 MB)
14/04/04 14:24:56 INFO BlockManagerMaster: Updated info of block input-0-1396614314400
14/04/04 14:24:56 INFO TaskSetManager: Finished TID 272 in 84 ms on computer1.ant-net (progress: 0/1)
14/04/04 14:24:56 INFO TaskSchedulerImpl: Remove TaskSet 43.0 from pool
14/04/04 14:24:56 INFO DAGScheduler: Completed ResultTask(43, 0)
14/04/04 14:24:56 INFO DAGScheduler: Stage 43 (take at DStream.scala:594) finished in 0.088 s
14/04/04 14:24:56 INFO SparkContext: Job finished: take at DStream.scala:594, took 1.872875734 s
--- Time: 1396614289000 ms --- (Santiago,1) (liveliness,1) (Sun,1) (reapers,1) (offer,3) (BARBER,3) (shrewdness,1) (truism,1) (hits,1) (merchant,1)

# End of test
computer1
             total   used   free  shared  buffers  cached
Mem:          3945   2209   1735       0        5     773
-/+ buffers/cache:   1430   2514
Swap:            0      0      0
14:46:05 up 73 days, 3:35, 2 users, load average: 2.69, 1.07, 0.55

14/04/04 14:26:57 INFO TaskSetManager: Starting task 183.0:0 as TID 696 on executor 0: computer1.ant-net (PROCESS_LOCAL)
14/04/04 14:26:57 INFO TaskSetManager: Serialized task 183.0:0 as 1981 bytes in 0 ms
14/04/04 14:26:57 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 81 to sp...@computer1.ant-net:44817
14/04/04 14:26:57 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 81 is 212 bytes
14/04/04 14:26:57 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1396614336600 on disk on computer1.ant-net:60820 (size: 1441.7 KB)
14/04/04 14:26:57 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1396614435200 in memory on computer1.ant-net:60820 (size: 1295.7 KB, free: 589.3 KB)
14/04/04 14:26:57 INFO TaskSetManager: Finished TID 696 in 56 ms on computer1.ant-net (progress: 0/1)
14/04/04 14:26:57 INFO TaskSchedulerImpl: Remove TaskSet 183.0 from pool
14/04/04 14:26:57 INFO DAGScheduler: Completed ResultTask(183, 0)
14/04/04 14:26:57 INFO DAGScheduler: Stage 183 (take at DStream.scala:594) finished in 0.057 s
14/04/04 14:26:57 INFO SparkContext: Job finished: take at DStream.scala:594, took 1.575268894 s
--- Time: 1396614359000 ms --- (hapless,9) (reapers,8) (amazed,113) (feebleness,7) (offer,148) (rabble,27) (exchanging,7) (merchant,20) (incentives,2) (quarrel,48) ...

Thanks Guys -- Informativa sulla Privacy: http://www.unibs.it/node/8155
Driver increase memory utilization
Hi Guys, Could anyone help me understand this driver behavior when I start JavaNetworkWordCount?

computer8
16:24:07 up 121 days, 22:21, 12 users, load average: 0.66, 1.27, 1.55
             total       used       free     shared    buffers     cached
Mem:          5897       4341       1555          0        227       2798
-/+ buffers/cache:        1315       4581
Swap:            0          0          0

In 2 minutes:

computer8
16:23:08 up 121 days, 22:20, 12 users, load average: 0.80, 1.43, 1.62
             total       used       free     shared    buffers     cached
Mem:          5897       5866         30          0        230       3255
-/+ buffers/cache:        2380       3516
Swap:            0          0          0

Thanks
Parallelism level
Hi all, I have put this line in my spark-env.sh: -Dspark.default.parallelism=20. Is this parallelism level correct? The machine's processor is a dual core. Thanks
RAM Increase
Hi Guys, Could anyone explain this behavior to me? After 2 minutes of tests (computer1: worker, computer10: worker, computer8: driver/master):

computer1
18:24:31 up 73 days, 7:14, 1 user, load average: 3.93, 2.45, 1.14
             total       used       free     shared    buffers     cached
Mem:          3945       3925         19          0         18       1368
-/+ buffers/cache:        2539       1405
Swap:            0          0          0

computer10
18:22:38 up 44 days, 21:26, 2 users, load average: 3.05, 2.20, 1.03
             total       used       free     shared    buffers     cached
Mem:          5897       5292        604          0         46       2707
-/+ buffers/cache:        2538       3358
Swap:            0          0          0

computer8
18:24:13 up 122 days, 22 min, 13 users, load average: 1.10, 0.93, 0.82
             total       used       free     shared    buffers     cached
Mem:          5897       5841         55          0        113       2747
-/+ buffers/cache:        2980       2916
Swap:            0          0          0

Thanks Guys
Re: Parallelism level
What do you advise me, Nicholas? On 4/4/14 19:05, Nicholas Chammas wrote: If you're running on one machine with 2 cores, I believe all you can get out of it are 2 concurrent tasks at any one time. So setting your default parallelism to 20 won't help. On Fri, Apr 4, 2014 at 11:41 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi all, I have put this line in my spark-env.sh: -Dspark.default.parallelism=20. Is this parallelism level correct? The machine's processor is a dual core. Thanks
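As a rough sketch of the sizing rule implied by Nicholas's answer (a commonly cited rule of thumb is 2-3 tasks per available core across the cluster; the helper below is hypothetical, not a Spark API):

```java
// Sketch: parallelism should scale with total cores, not an arbitrary number.
// With one dual-core machine, a default parallelism of 20 buys nothing beyond
// what ~4 tasks already cover.
public class ParallelismHint {
    static int suggestedParallelism(int machines, int coresPerMachine, int tasksPerCore) {
        return machines * coresPerMachine * tasksPerCore;
    }

    public static void main(String[] args) {
        // The questioner's setup: one machine, dual core, 2 tasks per core.
        System.out.println(suggestedParallelism(1, 2, 2)); // prints 4
    }
}
```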
Re: reduceByKeyAndWindow Java
Hi Tathagata, You are right, this code compiles, but I am having some problems with high memory consumption; I sent some emails about this today, but no response so far. Thanks

On 4/4/14 22:56, Tathagata Das wrote: I haven't really compiled the code, but it looks good to me. Why? Is there any problem you are facing? TD

On Fri, Apr 4, 2014 at 8:03 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi guys, I would like to know if this piece of code is right to use with a window.

JavaPairDStream<String, Integer> wordCounts = words.map(
    new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    }).reduceByKeyAndWindow(
      new Function2<Integer, Integer, Integer>() {
        public Integer call(Integer i1, Integer i2) { return i1 + i2; }
      },
      new Function2<Integer, Integer, Integer>() {
        public Integer call(Integer i1, Integer i2) { return i1 - i2; }
      },
      new Duration(60 * 5 * 1000),
      new Duration(1 * 1000)
    );

Thanks
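For readers unfamiliar with the two reduce functions in the snippet above: the second (subtracting) function makes the window reduce invertible, so Spark can update the window incrementally instead of recounting it. A minimal plain-Java sketch of that idea using ordinary collections (the class below is illustrative only, not a Spark API):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of the invertible windowed count: when the window slides, add the
// counts of the batch entering the window (i1 + i2) and subtract the counts
// of the batch leaving it (i1 - i2), rather than recounting everything.
public class SlidingWordCount {
    private final int windowBatches;
    private final Deque<Map<String, Integer>> window = new ArrayDeque<>();
    private final Map<String, Integer> totals = new HashMap<>();

    SlidingWordCount(int windowBatches) { this.windowBatches = windowBatches; }

    void addBatch(Map<String, Integer> batch) {
        window.addLast(batch);
        batch.forEach((w, c) -> totals.merge(w, c, Integer::sum));    // add entering batch
        if (window.size() > windowBatches) {
            Map<String, Integer> old = window.removeFirst();
            old.forEach((w, c) -> totals.merge(w, -c, Integer::sum)); // subtract leaving batch
        }
    }

    int count(String word) { return totals.getOrDefault(word, 0); }

    public static void main(String[] args) {
        SlidingWordCount swc = new SlidingWordCount(2);   // window covers 2 batches
        swc.addBatch(Map.of("offer", 3, "merchant", 1));
        swc.addBatch(Map.of("offer", 2));
        swc.addBatch(Map.of("merchant", 5));              // first batch slides out
        System.out.println(swc.count("offer"));    // prints 2
        System.out.println(swc.count("merchant")); // prints 5
    }
}
```

This is also why a windowed stream can use more memory than a plain reduce: the batches inside the window must be retained until they slide out.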
Print line in JavaNetworkWordCount
Hi Guys, I would like to print the content of `line` inside:

JavaDStream<String> lines = ssc.socketTextStream(args[1], Integer.parseInt(args[2]));
JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
  @Override
  public Iterable<String> call(String x) {
    return Lists.newArrayList(x.split(" "));
  }
});

Is it possible? How could I do it? Thanks
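What the flatMap above does to each line can be sketched in plain Java, without Spark: split each line on spaces and flatten all the pieces into one word list (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: the split-and-flatten step of the word count, using plain Java
// streams instead of a DStream.
public class SplitLines {
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words(List.of("to be or", "not to be")));
        // prints [to, be, or, not, to, be]
    }
}
```

Inside the Spark job itself, a print statement placed in the `call(String x)` method would show each incoming line on the executor's stdout.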
Re: Change print() in JavaNetworkWordCount
Thank you very much Sourav. BR

On 3/26/14 17:29, Sourav Chandra wrote:

def print() {
  def foreachFunc = (rdd: RDD[T], time: Time) => {
    val total = rdd.collect().toList
    println("-------------------------------------------")
    println("Time: " + time)
    println("-------------------------------------------")
    total.foreach(println)
    // val first11 = rdd.take(11)
    // println("-------------------------------------------")
    // println("Time: " + time)
    // println("-------------------------------------------")
    // first11.take(10).foreach(println)
    // if (first11.size > 10) println("...")
    println()
  }
  new ForEachDStream(this, context.sparkContext.clean(foreachFunc)).register()
}
Change print() in JavaNetworkWordCount
Hi Guys, I think I have already asked this question, but I don't remember whether anyone answered me. I would like to change, in the print() function, the quantity of words and frequency counts that are sent to the driver's screen. The default value is 10. Could anyone help me with this? Best Regards
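The behavior being asked about, print() showing only the first 10 elements, boils down to a take-first-n over the counted pairs. A plain-Java stand-in for that step (illustrative only, not Spark code; the sample counts are made up):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: "take the first n (word, count) pairs and format them", which is
// what raising the limit in print() would change.
public class PrintFirstN {
    static List<String> firstN(List<Map.Entry<String, Integer>> counts, int n) {
        return counts.stream()
                .limit(n)
                .map(e -> "(" + e.getKey() + "," + e.getValue() + ")")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> counts =
            List.of(Map.entry("offer", 3), Map.entry("merchant", 1), Map.entry("reapers", 8));
        System.out.println(String.join(System.lineSeparator(), firstN(counts, 2)));
        // prints (offer,3) then (merchant,1)
    }
}
```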
Log Analyze
Hi Guys, Could anyone help me understand this piece of the log (highlighted in red in the original message)? Why did this happen? Thanks 14/03/10 16:55:20 INFO SparkContext: Starting job: first at NetworkWordCount.scala:87 14/03/10 16:55:20 INFO JobScheduler: Finished job streaming job 1394466892000 ms.0 from job set of time 1394466892000 ms 14/03/10 16:55:20 INFO JobScheduler: Total delay: 28.537 s for time 1394466892000 ms (execution: 4.479 s) 14/03/10 16:55:20 INFO JobScheduler: Starting job streaming job 1394466893000 ms.0 from job set of time 1394466893000 ms 14/03/10 16:55:20 INFO JobGenerator: Checkpointing graph for time 1394466892000 ms 14/03/10 16:55:20 INFO DStreamGraph: Updating checkpoint data for time 1394466892000 ms 14/03/10 16:55:20 INFO DStreamGraph: Updated checkpoint data for time 1394466892000 ms 14/03/10 16:55:20 INFO CheckpointWriter: Saving checkpoint for time 1394466892000 ms to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000' 14/03/10 16:55:20 INFO DAGScheduler: Registering RDD 496 (combineByKey at ShuffledDStream.scala:42) 14/03/10 16:55:20 INFO DAGScheduler: Got job 39 (first at NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true) 14/03/10 16:55:20 INFO DAGScheduler: Final stage: Stage 77 (first at NetworkWordCount.scala:87) 14/03/10 16:55:20 INFO DAGScheduler: Parents of final stage: List(Stage 78) 14/03/10 16:55:20 INFO DAGScheduler: Missing parents: List(Stage 78) 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-1-1394466782400 on computer10.ant-net:34062 in memory (size: 5.9 MB, free: 502.2 MB) 14/03/10 16:55:20 INFO DAGScheduler: Submitting Stage 78 (MapPartitionsRDD[496] at combineByKey at ShuffledDStream.scala:42), which has no missing parents 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-1-1394466816600 in memory on computer10.ant-net:34062 (size: 4.4 MB, free: 497.8 MB) 14/03/10 16:55:20 INFO DAGScheduler: Submitting 15 missing tasks from Stage 78 
(MapPartitionsRDD[496] at combineByKey at ShuffledDStream.scala:42) 14/03/10 16:55:20 INFO TaskSchedulerImpl: Adding task set 78.0 with 15 tasks 14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:9 as TID 539 on executor 2: computer1.ant-net (PROCESS_LOCAL) 14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:9 as 4144 bytes in 1 ms 14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:10 as TID 540 on executor 1: computer10.ant-net (PROCESS_LOCAL) 14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:10 as 4144 bytes in 0 ms 14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:11 as TID 541 on executor 0: computer11.ant-net (PROCESS_LOCAL) 14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:11 as 4144 bytes in 0 ms 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394466874200 on computer1.ant-net:51406 in memory (size: 2.9 MB, free: 460.0 MB) 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394466874400 on computer1.ant-net:51406 in memory (size: 4.1 MB, free: 468.2 MB) 14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:12 as TID 542 on executor 1: computer10.ant-net (PROCESS_LOCAL) 14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:12 as 4144 bytes in 1 ms 14/03/10 16:55:20 WARN TaskSetManager: Lost TID 540 (task 78.0:10) 14/03/10 16:55:20 INFO CheckpointWriter: Deleting hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000 14/03/10 16:55:20 INFO CheckpointWriter: Checkpoint for time 1394466892000 ms saved to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000', took 3633 bytes and 93 ms 14/03/10 16:55:20 INFO DStreamGraph: Clearing checkpoint data for time 1394466892000 ms 14/03/10 16:55:20 INFO DStreamGraph: Cleared checkpoint data for time 1394466892000 ms 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-2-1394466789000 on computer11.ant-net:58332 in memory (size: 3.9 MB, free: 536.0 
MB) 14/03/10 16:55:20 WARN TaskSetManager: Loss was due to java.lang.Exception java.lang.Exception: Could not compute split, block input-2-1394466794200 not found at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:45) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32) at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at
Re: Explain About Logs NetworkWordcount.scala
Yes TD, I can use tcpdump to check whether the data are being accepted by the receiver and whether they are arriving in the IP packets. Thanks

On 3/8/14 4:19, Tathagata Das wrote: I am not sure how to debug this without any more information about the source. Can you monitor on the receiver side that data is being accepted by the receiver but not reported? TD

On Wed, Mar 5, 2014 at 7:23 AM, eduardocalfaia e.costaalf...@unibs.it wrote: Hi TD, I have seen in the web UI that the stage's result has been zero and that the GC Time field is empty. http://apache-spark-user-list.1001560.n3.nabble.com/file/n2306/CaptureStage.png -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Explain-About-Logs-NetworkWordcount-scala-tp1835p2306.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Explain About Logs NetworkWordcount.scala
Hi Guys, Could anyone help me understand the logs below? Why is the result in the second log 0? Thanks Guys 14/02/20 19:06:00 INFO JobScheduler: Finished job streaming job 1392919557000 ms.0 from job set of time 1392919557000 ms 14/02/20 19:06:00 INFO JobScheduler: Total delay: 3.185 s for time 1392919557000 ms (execution: 3.167 s) 14/02/20 19:06:00 INFO JobGenerator: Checkpointing graph for time 1392919557000 ms 14/02/20 19:06:00 INFO DStreamGraph: Updating checkpoint data for time 1392919557000 ms 14/02/20 19:06:00 INFO DStreamGraph: Updated checkpoint data for time 1392919557000 ms 14/02/20 19:06:00 INFO SparkContext: Starting job: first at NetworkWordCount.scala:87 14/02/20 19:06:00 INFO JobScheduler: Starting job streaming job 1392919558000 ms.0 from job set of time 1392919558000 ms 14/02/20 19:06:00 INFO CheckpointWriter: Saving checkpoint for time 1392919557000 ms to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919557000' 14/02/20 19:06:00 INFO DAGScheduler: Registering RDD 812 (combineByKey at ShuffledDStream.scala:42) 14/02/20 19:06:00 INFO DAGScheduler: Got job 91 (first at NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true) 14/02/20 19:06:00 INFO DAGScheduler: Final stage: Stage 181 (first at NetworkWordCount.scala:87) 14/02/20 19:06:00 INFO DAGScheduler: Parents of final stage: List(Stage 182) 14/02/20 19:06:00 INFO DAGScheduler: Missing parents: List(Stage 182) 14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 182 (MapPartitionsRDD[812] at combineByKey at ShuffledDStream.scala:42), which has no missing parents 14/02/20 19:06:00 INFO DAGScheduler: Submitting 2 missing tasks from Stage 182 (MapPartitionsRDD[812] at combineByKey at ShuffledDStream.scala:42) 14/02/20 19:06:00 INFO TaskSchedulerImpl: Adding task set 182.0 with 2 tasks 14/02/20 19:06:00 INFO TaskSetManager: Starting task 182.0:1 as TID 609 on executor 0: computer1.ant-net (PROCESS_LOCAL) 14/02/20 19:06:00 INFO TaskSetManager: Serialized task 182.0:1 as 
3023 bytes in 0 ms 14/02/20 19:06:00 INFO TaskSetManager: Starting task 182.0:0 as TID 610 on executor 0: computer1.ant-net (NODE_LOCAL) 14/02/20 19:06:00 INFO TaskSetManager: Serialized task 182.0:0 as 3485 bytes in 0 ms 14/02/20 19:06:00 INFO TaskSetManager: Finished TID 609 in 17 ms on computer1.ant-net (progress: 0/2) 14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(182, 1) 14/02/20 19:06:00 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1392919527400 in memory on computer1.ant-net:41142 (size: 2018.6 KB, free: 387.3 MB) 14/02/20 19:06:00 INFO TaskSetManager: Finished TID 610 in 67 ms on computer1.ant-net (progress: 1/2) 14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 182.0 from pool 14/02/20 19:06:00 INFO DAGScheduler: Completed ShuffleMapTask(182, 0) 14/02/20 19:06:00 INFO DAGScheduler: Stage 182 (combineByKey at ShuffledDStream.scala:42) finished in 0.080 s 14/02/20 19:06:00 INFO DAGScheduler: looking for newly runnable stages 14/02/20 19:06:00 INFO DAGScheduler: running: Set(Stage 4) 14/02/20 19:06:00 INFO DAGScheduler: waiting: Set(Stage 181) 14/02/20 19:06:00 INFO DAGScheduler: failed: Set() 14/02/20 19:06:00 INFO CheckpointWriter: Deleting hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919554000.bk 14/02/20 19:06:00 INFO DAGScheduler: Missing parents for Stage 181: List() 14/02/20 19:06:00 INFO DAGScheduler: Submitting Stage 181 (MappedRDD[815] at map at MappedDStream.scala:35), which is now runnable 14/02/20 19:06:00 INFO CheckpointWriter: Checkpoint for time 1392919557000 ms saved to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1392919557000', took 3270 bytes and 102 ms 14/02/20 19:06:00 INFO DStreamGraph: Clearing checkpoint data for time 1392919557000 ms 14/02/20 19:06:00 INFO DStreamGraph: Cleared checkpoint data for time 1392919557000 ms 14/02/20 19:06:00 INFO DAGScheduler: Submitting 1 missing tasks from Stage 181 (MappedRDD[815] at map at MappedDStream.scala:35) 14/02/20 19:06:00 INFO 
TaskSchedulerImpl: Adding task set 181.0 with 1 tasks 14/02/20 19:06:00 INFO TaskSetManager: Starting task 181.0:0 as TID 611 on executor 0: computer1.ant-net (PROCESS_LOCAL) 14/02/20 19:06:00 INFO TaskSetManager: Serialized task 181.0:0 as 2057 bytes in 1 ms 14/02/20 19:06:00 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 90 to sp...@computer1.ant-net:47226 14/02/20 19:06:00 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 90 is 146 bytes 14/02/20 19:06:00 INFO TaskSetManager: Finished TID 611 in 25 ms on computer1.ant-net (progress: 0/1) 14/02/20 19:06:00 INFO TaskSchedulerImpl: Remove TaskSet 181.0 from pool 14/02/20 19:06:00 INFO DAGScheduler: Completed ResultTask(181, 0) 14/02/20 19:06:00 INFO DAGScheduler: Stage 181 (first at NetworkWordCount.scala:87) finished in 0.027 s 14/02/20 19:06:00 INFO SparkContext: Job finished: first at NetworkWordCount.scala:87, took 0.133625862 s 118967
NetworkWordCount Tests
Hi Guys, I am doing some tests with the NetworkWordCount Scala code, in which I count and sum a stream of words received from the network using the foreach action, thanks TD. I began with this scenario: 1 Master + 1 Worker (the worker also acting as the stream source), and I obtained the results below. The worker machine has 4 GB of RAM and 2 cores. I am trying to understand why some results are equal to zero; I have also seen that the RAM goes down very quickly. Could anyone help me with this question? Thanks Guys 308425 964276 628731 801010 711439 808223 507143 981999 853862 852054 581291 1078153 822553 860385 907984 792155 801966 860747 804655 827498 727398 834044 821059 708479 949565 796239 813312 717552 792051 811995 803358 762467 838375 803473 773933 824912 811991 605851 1012426 631953 725137 747702 907284 0 0 0 0 0 0 113534 1591325 861635 857057 815570 287201 0 0
Re: object scala not found
Hi Sai, Have you already tried running with JDK 7? BR On Feb 11, 2014, at 6:00, Sai Prasanna ansaiprasa...@gmail.com wrote: When I ran sbt/sbt assembly after installing Scala 2.9.3 and downloading the Spark 0.8.1 binaries, with JDK 6 installed, for a standalone Spark, I got the following error. I did sbt clean and even then sbt assembly gives this error. Can someone please help!! [info] Loading project definition from /home/sparkslave/spark-0.8.1-incubating/project/project [info] Loading project definition from /home/sparkslave/spark-0.8.1-incubating/project [info] Set current project to root (in build file:/home/sparkslave/spark-0.8.1-incubating/) [info] Compiling 49 Scala sources to /home/sparkslave/spark-0.8.1-incubating/streaming/target/scala-2.9.3/classes... [error] error while loading root, error in opening zip file scala.tools.nsc.MissingRequirementError: object scala not found. at scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655) at scala.tools.nsc.symtab.Definitions$definitions$.getModule(Definitions.scala:605) at scala.tools.nsc.symtab.Definitions$definitions$.ScalaPackage(Definitions.scala:145) at scala.tools.nsc.symtab.Definitions$definitions$.ScalaPackageClass(Definitions.scala:146) at scala.tools.nsc.symtab.Definitions$definitions$.AnyClass(Definitions.scala:176) at scala.tools.nsc.symtab.Definitions$definitions$.init(Definitions.scala:814) at scala.tools.nsc.Global$Run.init(Global.scala:697) at xsbt.CachedCompiler0.run(CompilerInterface.scala:86) at xsbt.CachedCompiler0.liftedTree1$1(CompilerInterface.scala:72) at xsbt.CachedCompiler0.run(CompilerInterface.scala:72) at xsbt.CompilerInterface.run(CompilerInterface.scala:27) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:73) at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:35) at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:29) at sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply$mcV$sp(AggressiveCompile.scala:71) at sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71) at sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71) at sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:101) at sbt.compiler.AggressiveCompile$$anonfun$4.compileScala$1(AggressiveCompile.scala:70) at sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:88) at sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:60) at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:24) at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:22) at sbt.inc.Incremental$.cycle(Incremental.scala:52) at sbt.inc.Incremental$.compile(Incremental.scala:29) at sbt.inc.IncrementalCompile$.apply(Compile.scala:20) at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:96) at sbt.compiler.AggressiveCompile.compile1(AggressiveCompile.scala:44) at sbt.compiler.AggressiveCompile.apply(AggressiveCompile.scala:31) at sbt.Compiler$.apply(Compiler.scala:79) at sbt.Defaults$$anonfun$compileTask$1.apply(Defaults.scala:574) at sbt.Defaults$$anonfun$compileTask$1.apply(Defaults.scala:574) at sbt.Scoped$$anonfun$hf2$1.apply(Structure.scala:578) at sbt.Scoped$$anonfun$hf2$1.apply(Structure.scala:578) at scala.Function1$$anonfun$compose$1.apply(Function1.scala:49) at sbt.Scoped$Reduced$$anonfun$combine$1$$anonfun$apply$12.apply(Structure.scala:311) at sbt.Scoped$Reduced$$anonfun$combine$1$$anonfun$apply$12.apply(Structure.scala:311) at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:41) at 
sbt.std.Transform$$anon$5.work(System.scala:71) at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:232) at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:232) at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18) at sbt.Execute.work(Execute.scala:238) at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:232) at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:232) at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:160) at sbt.CompletionService$$anon$2.call(CompletionService.scala:30) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at
Compiling NetworkWordCount scala code
Hi Guys, I am getting this error when I compile NetworkWordCount.scala:

[info] Compiling 1 Scala source to /opt/unibs_test/incubator-spark-tdas/examples/target/scala-2.10/classes...
[error] /opt/unibs_test/incubator-spark-tdas/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala:63: missing parameter type for expanded function ((x$2, x$3) => x$2.$plus(x$3))
[error] val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
[error] ^
[error] one error found
[error] (examples/compile:compile) Compilation failed
[error] Total time: 16 s, completed 07-Feb-2014 01:13:21

Could anyone help me? Thanks -- INFORMATIVA SUL TRATTAMENTO DEI DATI PERSONALI I dati utilizzati per l'invio del presente messaggio sono trattati dall'Università degli Studi di Brescia esclusivamente per finalità istituzionali. Informazioni più dettagliate anche in ordine ai diritti dell'interessato sono riposte nell'informativa generale e nelle notizie pubblicate sul sito web dell'Ateneo nella sezione Privacy. Il contenuto di questo messaggio è rivolto unicamente alle persona cui è indirizzato e può contenere informazioni la cui riservatezza è tutelata legalmente. Ne sono vietati la riproduzione, la diffusione e l'uso in mancanza di autorizzazione del destinatario. Qualora il messaggio fosse pervenuto per errore, preghiamo di eliminarlo.
Re: Source code JavaNetworkWordcount
Hi Tathagata, I am playing with NetworkWordCount.scala; I made some changes like this (shown in red in the original message):

// Create the context with a 1 second batch size
val ssc = new StreamingContext(args(0), "NetworkWordCount", Seconds(1),
  System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass))
ssc.checkpoint("hdfs://computer8:54310/user/root/INPUT")
// Create a socket text stream on target ip:port and count the
// words in the input stream of \n delimited text (eg. generated by 'nc')
val lines1 = ssc.socketTextStream("localhost", 12345.toInt, StorageLevel.MEMORY_ONLY)
val lines2 = ssc.socketTextStream("localhost", 12345.toInt, StorageLevel.MEMORY_ONLY)
val lines3 = ssc.socketTextStream("localhost", 12345.toInt, StorageLevel.MEMORY_ONLY)
val union2 = lines1.union(lines2)
val union3 = union2.union(lines3)

//val words = lines.flatMap(_.split(" "))
val words = union3.flatMap(_.split(" "))
//val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
val wordCounts = words.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))

However I have gotten the error below:

[error] /opt/unibs_test/incubator-spark-tdas/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala:81: value reduceByKeyAndWindow is not a member of org.apache.spark.streaming.dstream.DStream[String]
[error] val wordCounts = words.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
[error] ^
[error] one error found
[error] (examples/compile:compile) Compilation failed
[error] Total time: 15 s, completed 05-Feb-2014 17:10:38

The classes are imported within the code:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.storage.StorageLevel

Thanks

On Feb 5, 2014, at 5:22, Tathagata Das tathagata.das1...@gmail.com wrote: Seems good to me. BTW, it's fine to use MEMORY_ONLY (i.e. without replication) for testing, but you should turn on replication if you want fault-tolerance. 
TD On Mon, Feb 3, 2014 at 3:19 PM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi Tathagata, You were right when you told me to use Scala instead of Java; Scala is very easy. I have implemented the code you gave me (in bold), and I have also implemented a union function (in red) because I am testing with 2 stream sources; my idea is putting 3 or more stream sources and doing the union.

object NetworkWordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: NetworkWordCount <master> <hostname> <port>\n" +
        "In local mode, <master> should be 'local[n]' with n > 1")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    // Create the context with a 1 second batch size
    val ssc = new StreamingContext(args(0), "NetworkWordCount", Seconds(1),
      System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass))
    ssc.checkpoint("hdfs://computer22:54310/user/root/INPUT")
    // Create a socket text stream on target ip:port and count the
    // words in the input stream of \n delimited text (eg. generated by 'nc')
    val lines1 = ssc.socketTextStream("localhost", 12345.toInt, StorageLevel.MEMORY_ONLY_SER)
    val lines2 = ssc.socketTextStream("localhost", 12345.toInt, StorageLevel.MEMORY_ONLY_SER)
    val union2 = lines1.union(lines2)
    //val words = lines.flatMap(_.split(" "))
    val words = union2.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

    words.count().foreachRDD(rdd => {
      val totalCount = rdd.first()

      // print to screen
      println(totalCount)

      // append count to file
      // ...
    })

    //wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

What do you think? Is my code right? 
I have obtained the follow result: root@computer8:/opt/unibs_test/incubator-spark-tdas# bin/run-example org.apache.spark.streaming.examples.NetworkWordCount spark://192.168.0.13:7077SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/unibs_test/incubator-spark-tdas/examples/target/scala-2.10/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/unibs_test/incubator-spark-tdas/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 14/02/04 00:02:07 INFO StreamingExamples: Using Spark's default log4j profile: org
Source code JavaNetworkWordcount
Hi Guys, I'm not a very good Java programmer, so could anybody help me with this piece of code from JavaNetworkWordCount?

JavaPairDStream<String, Integer> wordCounts = words.map(
    new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) throws Exception {
        return new Tuple2<String, Integer>(s, 1);
      }
    }).reduceByKey(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) throws Exception {
        return i1 + i2;
      }
    });

JavaPairDStream<String, Integer> counts = wordCounts.reduceByKeyAndWindow(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),
    new Duration(1 * 1000)
);

I would like to find a way of counting, summing, and getting the total of the words counted in a single file, for example a book in txt format such as Don Quixote. The counts variable gives me the result for each word found, not the total number of words in the file. Tathagata has sent me a piece of Scala code. Thanks Tathagata for your attention to my posts, I am very thankful:

yourDStream.foreachRDD(rdd => {
  // Get and print first n elements
  val firstN = rdd.take(n)
  println("First N elements = " + firstN)
  // Count the number of elements in each batch
  println("RDD has " + rdd.count() + " elements")
})

yourDStream.count.print()

Could anybody help me? Thanks Guys
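The total being asked for, the overall number of words in a whole text rather than a per-word count, can be sketched in plain Java without Spark (the class is illustrative; the two sample lines are simply the opening words of Don Quixote):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: total word count over all lines of a text, the figure the
// questioner wants in addition to the per-word counts.
public class TotalWords {
    static long totalWords(List<String> lines) {
        return lines.stream()
                .flatMap(l -> Arrays.stream(l.split(" ")))
                .filter(w -> !w.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("In a village of La Mancha", "the name of which");
        System.out.println(totalWords(lines)); // prints 10
    }
}
```

In the streaming job, the same total per batch is what `words.count()` inside `foreachRDD` produces.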
Re: Print in JavaNetworkWordCount
Hi Tathagata, Is this code that you have sent me Scala code?

yourDStream.foreachRDD(rdd => {
  // Get and print first n elements
  val firstN = rdd.take(n)
  println("First N elements = " + firstN)
  // Count the number of elements in each batch
  println("RDD has " + rdd.count() + " elements")
})

Thanks

On 20 January 2014 19:11, Tathagata Das tathagata.das1...@gmail.com wrote: Hi Eduardo, You can do arbitrary stuff with the data in a DStream using the operation foreachRDD.

yourDStream.foreachRDD(rdd => {
  // Get and print first n elements
  val firstN = rdd.take(n)
  println("First N elements = " + firstN)
  // Count the number of elements in each batch
  println("RDD has " + rdd.count() + " elements")
})

Alternatively, just for printing the counts, you can also do yourDStream.count.print(). Hope this helps! TD

2014/1/20 Eduardo Costa Alfaia e.costaalf...@studenti.unibs.it: Hi guys, could somebody help me: where do I change the print() function to print more than 10 lines on screen? Is there a manner to print the total count of all words in a batch? Best Regards
Re: Print in JavaNetworkWordCount
Hi Tathagata,

Don't worry. I am looking for a way, in the source code of JavaNetworkWordCount, to print to the console the total sum of the words in a file, not one word per line.

Thanks

On 28 January 2014 at 22:36, Tathagata Das tathagata.das1...@gmail.com wrote:

Yes, it was my intention to write Scala code, but I may have failed to write correct code that compiles. Apologies.

Also, something to keep in mind: this is the dev mailing list, for Spark developers. Questions related to using Spark should be sent to u...@spark.incubator.apache.org

TD

2014/1/28 Eduardo Costa Alfaia e.costaalf...@unibs.it

Hi Tathagata,

Is this code you sent me Scala code?

yourDStream.foreachRDD(rdd => {
  // Get and print first n elements
  val firstN = rdd.take(n)
  println("First N elements = " + firstN)
  // Count the number of elements in each batch
  println("RDD has " + rdd.count() + " elements")
})

Thanks
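To get the total that this thread is after (one number per batch rather than one word per line), the per-word counts of a batch can be summed, e.g. inside foreachRDD. Stripped of the Spark API so it runs anywhere, the aggregation itself is just the following sketch; the class name and the sample batch are illustrative only:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BatchTotal {
    // Mimics counting words per key (as reduceByKey does) and then
    // summing all the per-word counts to get the batch total.
    static long totalWords(List<String> batch) {
        Map<String, Long> wordCounts = batch.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        return wordCounts.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("don", "quixote", "don", "windmill");
        System.out.println("Total words in batch = " + totalWords(batch)); // 4
    }
}
```

In a real streaming job the equivalent would run on each batch's word-count pairs, with only the single total brought back to the driver for printing.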
Re: Print in JavaNetworkWordCount
Thanks again, Tathagata, for your help.

Best Regards

On Jan 20, 2014, at 19:11, Tathagata Das tathagata.das1...@gmail.com wrote:

Hi Eduardo,

You can do arbitrary stuff with the data in a DStream using the operation foreachRDD.

yourDStream.foreachRDD(rdd => {
  // Get and print first n elements
  val firstN = rdd.take(n)
  println("First N elements = " + firstN)
  // Count the number of elements in each batch
  println("RDD has " + rdd.count() + " elements")
})

Alternatively, just for printing the counts, you can also do yourDStream.count.print()

Hope this helps!

TD

2014/1/20 Eduardo Costa Alfaia e.costaalf...@studenti.unibs.it

Hi guys,

Can somebody help me? Where can I change the print() function to print more than 10 lines on screen? Is there a way to print the total count of all words in a batch?

Best Regards
Print in JavaNetworkWordCount
Hi guys,

Can somebody help me? Where can I change the print() function to print more than 10 lines on screen? Is there a way to print the total count of all words in a batch?

Best Regards
Re: JavaNetworkWordCount Researches
Hi Tathagata,

Thank you very much for the explanation. Another curiosity: I did some tests with this code yesterday, using three machines as workers, and I saw that one of those machines had its RAM usage increase to about 90%, while the others did not change drastically. On that same machine I can also see that parts of the file (in this case I am using the book Don Quixote as a txt) are saved to the hard disk, specifically under /tmp/spark-local<numbers>, increasing the used space.

Sorry for the many questions; I am new to stream processing and am trying to better understand how Spark DStreams work.

Best Regards

On Jan 16, 2014, at 1:48, Tathagata Das tathagata.das1...@gmail.com wrote:

All the computation on the data (that is, union, flatMap, map, reduceByKey, reduceByKeyAndWindow) is executed on the workers in a distributed manner. The data is received by the worker nodes and kept in memory, then the computation is applied on the workers to the in-memory data. After the counts are computed for every batch of data, the first 10 elements of the generated counts are brought to the master to be printed on the screen. This is done by counts.print(), which pulls those 10 word-count pairs and prints them.

On a related note, if you only want counts over a window, you don't need the first reduceByKey. The reduceByKeyAndWindow takes care of doing the reduceByKey per batch and then doing the reduce across the window.
TD

On Wed, Jan 15, 2014 at 6:01 AM, Eduardo Costa Alfaia e.costaalf...@studenti.unibs.it wrote:

Hi guys,

For my research on stream processing I made some changes to JavaNetworkWordCount, adding the following lines to the code:

ssc1.checkpoint("hdfs://computer22:54310/user/root/INPUT");

JavaDStream<String> lines1 = ssc1.socketTextStream("localhost", Integer.parseInt("12345"));
JavaDStream<String> lines2 = ssc1.socketTextStream("localhost", Integer.parseInt("12345"));
JavaDStream<String> union2 = lines1.union(lines2);

JavaDStream<String> words = union2.flatMap(new FlatMapFunction<String, String>() {
  @Override
  public Iterable<String> call(String x) {
    return Lists.newArrayList(SPACE.split(x));
  }
});

JavaPairDStream<String, Integer> wordCounts = words.map(
  new PairFunction<String, String, Integer>() {
    @Override
    public Tuple2<String, Integer> call(String s) {
      return new Tuple2<String, Integer>(s, 1);
    }
  }).reduceByKey(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer i1, Integer i2) {
      return i1 + i2;
    }
  });

JavaPairDStream<String, Integer> counts = wordCounts.reduceByKeyAndWindow(
  new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer i1, Integer i2) { return i1 + i2; }
  },
  new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer i1, Integer i2) { return i1 - i2; }
  },
  new Duration(60 * 5 * 1000), new Duration(1 * 1000));

counts.print();
ssc1.start();

- We wrote a program in C that sends words to the workers.
- Result from the master terminal:

---
Time: 1389794084000 ms
---
(,14294)
(impertinences,2)
(protracted.,3)
(burlesque.,3)
(Dorothea,,85)
(grant,,5)
(temples,,2)
(discord,17)
(conscience,48)
(singed,,2)
...
---
Time: 1389794085000 ms
---
(,38580)
(impertinences,5)
(protracted.,7)
(burlesque.,7)
(Dorothea,,259)
(grant,,12)
(temples,,7)
(discord,47)
(conscience,130)
(singed,,5)
...

My question is: where does the union() happen? On the worker nodes or on the master? I am using three machines (1 master + 2 workers).
How could I get a total count of the words and show it in the terminal?

Thanks all
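The add/subtract pair passed to reduceByKeyAndWindow in the code above can be illustrated without Spark: rather than re-reducing the whole window on every slide, keep a running value, add the count of the batch entering the window, and subtract the count of the batch leaving it. A minimal sketch of that bookkeeping (plain Java; the class is illustrative, not part of the Spark API):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingCount {
    private final int windowBatches;          // window length, in batches
    private final Deque<Integer> window = new ArrayDeque<>();
    private int total = 0;                    // running windowed count

    SlidingCount(int windowBatches) { this.windowBatches = windowBatches; }

    // One slide: the new batch count enters, and once the window is full
    // the oldest batch count leaves.
    int slide(int batchCount) {
        window.addLast(batchCount);
        total += batchCount;                  // reduce function:  i1 + i2
        if (window.size() > windowBatches) {
            total -= window.removeFirst();    // inverse function: i1 - i2
        }
        return total;
    }

    public static void main(String[] args) {
        SlidingCount w = new SlidingCount(3); // 3-batch window
        System.out.println(w.slide(5));       // 5
        System.out.println(w.slide(2));       // 7
        System.out.println(w.slide(4));       // 11
        System.out.println(w.slide(1));       // 7  (batch of 5 left the window)
    }
}
```

This constant work per slide is why the inverse-function variant of reduceByKeyAndWindow scales to long windows, at the cost of requiring an invertible reduce.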
Zookeeper and Spark
Hi all,

Yesterday I cloned the master branch of incubator-spark from GitHub. I am configuring Spark with ZooKeeper: I have read the Spark standalone documentation and done the ZooKeeper installation and configuration, and ZooKeeper itself works. I am using 2 master nodes, but when I run start-master.sh on each one I get this error:

13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Socket connection established to 10.20.10.60/10.20.10.60:2181, initiating session
13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Opening socket connection to server 10.20.10.60/10.20.10.60:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Socket connection established to 10.20.10.60/10.20.10.60:2181, initiating session
13/12/03 15:41:27 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
13/12/03 15:41:27 ERROR master.SparkZooKeeperSession: Could not connect to ZooKeeper: system failure
13/12/03 15:41:27 ERROR master.ZooKeeperLeaderElectionAgent: ZooKeeper down! LeaderElectionAgent shutting down Master.

13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Socket connection established to 10.20.10.60/10.20.10.60:2181, initiating session
13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Opening socket connection to server 10.20.10.61/10.20.10.61:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Socket connection established to 10.20.10.61/10.20.10.61:2181, initiating session
13/12/03 15:30:29 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
13/12/03 15:30:29 ERROR master.SparkZooKeeperSession: Could not connect to ZooKeeper: system failure
13/12/03 15:30:29 ERROR master.ZooKeeperLeaderElectionAgent: ZooKeeper down! LeaderElectionAgent shutting down Master.

... and the process goes down.

Can anybody help me?

Best Regards
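For reference, standalone-mode ZooKeeper recovery is configured through SPARK_DAEMON_JAVA_OPTS on each master. A sketch for this two-ZooKeeper-node setup (addresses taken from the log output above; this is per the standalone HA docs, not a verified fix for the error shown) in conf/spark-env.sh:

```shell
# conf/spark-env.sh on each master: enable ZooKeeper-based recovery.
# The URL lists every ZooKeeper ensemble member seen in the logs above.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=10.20.10.60:2181,10.20.10.61:2181"
```

The repeated "likely server has closed socket" lines with sessionid 0x0 also suggest the ZooKeeper ensemble itself may not have quorum, which is worth checking first (e.g. with zkServer.sh status on each ZooKeeper node).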