Spark Standard Application to Test

2015-02-25 Thread danilopds
Hello, I am preparing some tests to run on Spark in order to vary configuration properties and check how the results change. For this, I need a standard application in my environment, like the well-known Hadoop benchmark apps such as TeraSort

Re: MLlib - Show an element in RDD[(Int, Iterable[Array[Double]])]

2015-02-05 Thread danilopds
I solved the question with this code: import org.apache.spark.mllib.clustering.KMeans import org.apache.spark.mllib.linalg.Vectors val data = sc.textFile("/opt/testAppSpark/data/height-weight.txt").map { line => Vectors.dense(line.split(' ').map(_.toDouble)) }.cache() val cluster =
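For reference, a completed and runnable version of the snippet above, as it would be typed into spark-shell — the cluster count and iteration limit are illustrative assumptions, not values from the original post:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Parse each "height weight" line into a dense vector.
    val data = sc.textFile("/opt/testAppSpark/data/height-weight.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()

    // k = 2 and 20 iterations are assumptions for illustration.
    val cluster = KMeans.train(data, 2, 20)

    // Show which cluster each point is assigned to.
    data.collect().foreach(v => println(s"$v -> ${cluster.predict(v)}"))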

MLlib - Show an element in RDD[(Int, Iterable[Array[Double]])]

2015-02-05 Thread danilopds
Hi, I'm learning Spark and testing the Spark MLlib library with the K-means algorithm. So, I created a file height-weight.txt like this: 65.0 220.0 73.0 160.0 59.0 110.0 61.0 120.0 ... And the code (executed in spark-shell): import org.apache.spark.mllib.clustering.KMeans import

Re: Spark metrics for ganglia

2014-12-15 Thread danilopds
Thanks tsingfu, I used this configuration based on your post (with Ganglia in unicast mode):

    # Enable GangliaSink for all instances
    *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
    *.sink.ganglia.host=10.0.0.7
    *.sink.ganglia.port=8649
    *.sink.ganglia.period=15
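For completeness, the GangliaSink also accepts unit and mode keys; the values below are illustrative additions beyond what the original post showed:

    # Unit of the reporting period above
    *.sink.ganglia.unit=seconds
    # Explicitly select unicast rather than the default multicast
    *.sink.ganglia.mode=unicast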

Re: Can not see any spark metrics on ganglia-web

2014-12-04 Thread danilopds
I used the command below because I'm using Spark 1.0.2 built with SBT, and it worked: SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_GANGLIA_LGPL=true sbt/sbt assembly

Re: Spark metrics for ganglia

2014-12-04 Thread danilopds
Hello Samudrala, Did you solve this issue about viewing metrics in Ganglia? Because I have the same problem. Thanks.

Re: A question about streaming throughput

2014-10-15 Thread danilopds
OK, I understand. But in both cases the data are on the same processing node.

A question about streaming throughput

2014-10-14 Thread danilopds
Hi, I'm learning about Apache Spark Streaming and I'm doing some tests. Now, I have a modified version of the NetworkWordCount app that performs a reduceByKeyAndWindow with a window of 10 seconds sliding every 5 seconds. I'm also using the function to measure the rate of records/second like
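For reference, a minimal sketch of such a modification to NetworkWordCount — the host, port, and batch interval are illustrative assumptions, not details from the original post:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("WindowedNetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))  // 5-second batch interval

    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      // Count over a 10-second window, recomputed every 5 seconds.
      .reduceByKeyAndWindow(_ + _, Seconds(10), Seconds(5))

    counts.print()
    ssc.start()
    ssc.awaitTermination()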

Re: Can not see any spark metrics on ganglia-web

2014-10-02 Thread danilopds
Hi tsingfu, I want to see metrics in Ganglia too. But I don't understand this step: ./make-distribution.sh --tgz --skip-java-test -Phadoop-2.3 -Pyarn -Phive -Pspark-ganglia-lgpl Are you installing Hadoop, YARN, Hive AND Ganglia? What if I want just Ganglia? Can you suggest
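A note for readers: the -P flags are Maven build profiles that compile optional components into the Spark distribution; they do not install Hadoop, YARN, or Hive on the machine. A build enabling only the LGPL Ganglia sink would presumably look like this (untested sketch):

    ./make-distribution.sh --tgz --skip-java-test -Pspark-ganglia-lgpl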

Re: Can not see any spark metrics on ganglia-web

2014-10-02 Thread danilopds
OK Krishna Sankar, in relation to this information on the Spark monitoring webpage: "For sbt users, set the SPARK_GANGLIA_LGPL environment variable before building. For Maven users, enable the -Pspark-ganglia-lgpl profile." Do you know what I need to do to build with sbt? Thanks.

Spark Monitoring with Ganglia

2014-10-01 Thread danilopds
Hi, I need to monitor some aspects of my cluster, like network and resource usage. Ganglia looks like a good option for what I need. Then, I found out that Spark has support for Ganglia. On the Spark monitoring webpage there is this information: To install the GangliaSink you’ll need to perform a custom build of Spark

Re: Question About Submit Application

2014-10-01 Thread danilopds
I'll run this test and reply with the result afterwards. Thank you Marcelo.

Question About Submit Application

2014-09-24 Thread danilopds
Hello, I'm learning about Spark Streaming and I'm really excited. Today I was testing how to package some apps and submit them to a standalone cluster locally on my computer. That worked fine. So, I created a virtual machine with a bridged network and tried to submit the app again to this VM from my local
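For reference, submitting to a standalone master on a remote machine typically takes this shape — the class name, master address, and jar path are placeholders, not details from the original post:

    ./bin/spark-submit \
      --class com.example.MyStreamingApp \
      --master spark://vm-host:7077 \
      path/to/my-app-assembly.jar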

Re: Spark Streaming Twitter Example Error

2014-09-24 Thread danilopds
I solved this question using the SBT plugin sbt-assembly. It's very good! Bye.
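For readers hitting the same error: sbt-assembly builds a single fat jar containing the application and its non-provided dependencies, which avoids missing-class failures when the job runs on the cluster. A minimal setup sketch — the plugin version is an assumption typical of the sbt 0.13 era, not taken from the post:

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

    // build.sbt: mark Spark itself as "provided" so it is not bundled
    libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.2" % "provided"

    // then build the fat jar with: sbt assembly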

Re: Memory/Network Intensive Workload

2014-09-24 Thread danilopds
Thank you for the suggestion! Bye.

Re: Question About Submit Application

2014-09-24 Thread danilopds
One more piece of information: when I submitted the application from my local PC to my VM, the VM was both the master and the worker, and my local PC wasn't part of the cluster. Thanks.

Re: streaming: code to simulate a network socket data source

2014-09-09 Thread danilopds
Hello Diana, How can I include this implementation in my code, in terms of starting this task together with NetworkWordCount? In my case, I have a directory with several files. So I include this line: StreamingDataGenerator.streamingGenerator(NetPort, BytesSecond, DirFiles) But the program

Re: streaming: code to simulate a network socket data source

2014-09-09 Thread danilopds
I ran this code separately, but the program blocks at this line: val socket = listener.accept() Do you have any suggestions? Thanks
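A note for readers: ServerSocket.accept() blocks by design until a client — here, Spark's socket receiver — connects, so it must not run on the same thread that starts the streaming job. A minimal sketch of one workaround, running the generator on a background thread (independent of the original StreamingDataGenerator code):

    import java.io.PrintWriter
    import java.net.ServerSocket

    // Serve the given lines over a socket without blocking the caller.
    def serveAsync(port: Int, lines: Seq[String]): Unit = {
      new Thread("socket-data-generator") {
        override def run(): Unit = {
          val listener = new ServerSocket(port)
          val socket = listener.accept()  // blocks only this thread
          val out = new PrintWriter(socket.getOutputStream, true)
          lines.foreach(line => out.println(line))
          out.close(); socket.close(); listener.close()
        }
      }.start()
    }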

Records - Input Byte

2014-09-08 Thread danilopds
Hi, I was reading the Spark Streaming paper, Discretized Streams: Fault-Tolerant Streaming Computation at Scale. I read that the performance evaluation used 100-byte input records in the Grep and WordCount tests. I don't have much experience and I'd like to know how I can control this value in my
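Worth noting: record size in that evaluation is a property of the generated input, not a Spark configuration setting — the benchmark harness emits lines of a fixed byte length. A hypothetical generator sketch (the names and counts are illustrative):

    import scala.util.Random

    // Build one ASCII line of exactly `size` bytes, e.g. 100-byte records.
    def makeRecord(size: Int): String =
      Random.alphanumeric.take(size).mkString

    val records = Seq.fill(1000)(makeRecord(100))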

Spark Streaming Twitter Example Error

2014-08-21 Thread danilopds
Hi! I'm getting started with Spark Streaming development, and I'm learning from the examples available in the Spark directory. There are several applications and I want to make modifications. I can run TwitterPopularTags normally with the command: ./bin/run-example TwitterPopularTags auth

Memory/Network Intensive Workload

2014-06-29 Thread danilopds
Hello, I'm studying the Spark platform and I'd like to run experiments on its Spark Streaming extension. So, I guess that memory- and network-intensive workloads are a good option. Can anyone suggest a few typical Spark Streaming workloads that are network/memory intensive? If someone

Re: Interconnect benchmarking

2014-06-27 Thread danilopds
Hi, According to the research paper below by Matei Zaharia, Spark's creator, http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf he says on page 10 that "Grep is network-bound due to the cost to replicate the input data to multiple nodes." So, I guess it can be a good