Hi,
In the research paper below by Matei Zaharia, Spark's creator,
http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf
he says on page 10 that:
Grep is network-bound due to the cost to replicate the input data to
multiple nodes.
So,
I guess it can be a good
Hello,
I'm studying the Spark platform and I'd like to run experiments with its
Spark Streaming extension.
So,
I guess that memory- and network-intensive workloads are good options.
Can anyone suggest a few typical Spark Streaming workloads that are
network/memory intensive?
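For instance, I imagine something along the lines of the minimal sketch below: a stateful word count reading from a socket. The host, port, batch interval, and state logic are just assumptions on my part.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object StatefulWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("checkpoint")  // required for stateful operations like updateStateByKey

    // The socket source keeps the network busy; the growing state keeps memory busy.
    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

    // Running count per word, kept in memory across batches.
    val updateCounts = (values: Seq[Int], state: Option[Int]) =>
      Some(values.sum + state.getOrElse(0))

    val runningCounts = words.map(word => (word, 1)).updateStateByKey[Int](updateCounts)
    runningCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}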
If someone
Hi!
I'm getting started with development in Spark Streaming, and I'm learning
from the examples available in the Spark directory. There are several
applications and I want to make modifications to them.
I can execute TwitterPopularTags normally with the command:
./bin/run-example TwitterPopularTags auth
Hi,
I was reading the Spark Streaming paper:
Discretized Streams: Fault-Tolerant Streaming Computation at Scale
So,
I read that the performance evaluation used 100-byte input records in the Grep
and WordCount tests.
I don't have much experience and I'd like to know how I can control this
value in my
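To make the question concrete, what I imagine is padding or trimming each line to a fixed size before sending it, something like the helper below (my own guess, not taken from the paper; it assumes plain ASCII text):

// Pad or trim each line so every record sent to the stream is exactly `recordBytes` characters.
def toFixedSizeRecord(line: String, recordBytes: Int = 100): String =
  if (line.length >= recordBytes) line.take(recordBytes)
  else line + " " * (recordBytes - line.length)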
Hello Diana,
How can I include this implementation in my code, in terms of starting this
task together with the NetworkWordCount?
In my case, I have a directory with several files.
Then,
I included this line:
StreamingDataGenerator.streamingGenerator(NetPort, BytesSecond, DirFiles)
But the program
I used this code separately, but the program blocks at this line:
val socket = listener.accept()
Do you have any suggestions?
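One idea I had is to run the accept loop on its own thread, so that accept() does not block the rest of the program. A rough sketch of what I mean, assuming a plain java.net.ServerSocket as the listener:

import java.net.ServerSocket

val listener = new ServerSocket(9999)  // example port

// accept() blocks until a client connects, so run it on a separate thread.
val serverThread = new Thread(new Runnable {
  override def run(): Unit = {
    val socket = listener.accept()  // blocks here until a connection arrives
    // ... write the generated data to socket.getOutputStream here ...
    socket.close()
  }
})
serverThread.setDaemon(true)
serverThread.start()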
Thanks
Hello,
I'm learning about Spark Streaming and I'm really excited.
Today I was testing packaging some apps and submitting them to a standalone
cluster running locally on my computer.
It worked fine.
So,
I created a virtual machine with a bridged network and tried to submit the
app again to this VM from my local
I solved this issue using the SBT plugin sbt-assembly.
It works very well!
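In case it helps someone else, the setup I used was roughly the following (the plugin version is from memory and may need adjusting). In project/plugins.sbt:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

Then running "sbt assembly" produces a single fat jar that can be passed to spark-submit, with the Spark dependency marked as "provided" so it is not bundled into the jar.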
Bye.
Thank you for the suggestion!
Bye.
One more piece of information:
when I submitted the application from my local PC to my VM,
the VM was both the master and the worker, and my local PC was not part of the
cluster.
Thanks.
Hi,
I need to monitor some aspects of my cluster, such as network and resource usage.
Ganglia looks like a good option for what I need.
Then I found out that Spark has support for Ganglia.
On the Spark monitoring webpage there is this information:
To install the GangliaSink you’ll need to perform a
I'll run this test and reply with the result afterwards.
Thank you Marcelo.
Hi tsingfu,
I want to see metrics in Ganglia too.
But I don't understand this step:
./make-distribution.sh --tgz --skip-java-test -Phadoop-2.3 -Pyarn -Phive
-Pspark-ganglia-lgpl
Are you building with the Hadoop, YARN, Hive AND Ganglia profiles??
What if I only want to enable Ganglia?
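For example, would something like the following be enough (just my guess; I kept the --tgz and --skip-java-test flags from your command)?
./make-distribution.sh --tgz --skip-java-test -Pspark-ganglia-lgpl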
Can you suggest me
Ok Krishna Sankar,
In relation to this information on the Spark monitoring webpage:
For sbt users, set the SPARK_GANGLIA_LGPL environment variable before
building. For Maven users, enable the -Pspark-ganglia-lgpl profile
Do you know what I need to do to install with sbt?
Thanks.
Hi,
I'm learning about Apache Spark Streaming and I'm doing some tests.
Now,
I have a modified version of the NetworkWordCount app that performs a
reduceByKeyAndWindow with a window of 10 seconds sliding in intervals of 5 seconds.
I'm also using the function to measure the rate of records/second like
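Roughly, the windowed part of my code looks like the sketch below (simplified; the host, port, and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

val conf = new SparkConf().setAppName("NetworkWordCountWindowed")
val ssc = new StreamingContext(conf, Seconds(1))

val lines = ssc.socketTextStream("localhost", 9999)
val wordCounts = lines.flatMap(_.split(" "))
  .map(word => (word, 1))
  // 10-second window, recomputed every 5 seconds
  .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(5))

wordCounts.print()
ssc.start()
ssc.awaitTermination()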
Ok,
I understand.
But in both cases the data are on the same processing node.
I used the command below because I'm using Spark 1.0.2 built with SBT and it
worked.
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_GANGLIA_LGPL=true sbt/sbt
assembly
Hello Samudrala,
Did you solve this issue about viewing metrics in Ganglia?
Because I have the same problem.
Thanks.
Thanks tsingfu,
I used this configuration based on your post (with Ganglia in unicast mode):
# Enable GangliaSink for all instances
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=10.0.0.7
*.sink.ganglia.port=8649
*.sink.ganglia.period=15
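(For context, I put these lines in conf/metrics.properties on each node; I'm assuming the default location of the metrics configuration file is used.)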
I solved the question with this code:
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
val data = sc.textFile("/opt/testAppSpark/data/height-weight.txt").map {
  line => Vectors.dense(line.split(' ').map(_.toDouble))
}.cache()
val cluster =
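For reference, the training call was along the following lines; the number of clusters and iterations here are just example values:

val clusters = KMeans.train(data, 2, 20)  // KMeans.train(data, numClusters, numIterations)
clusters.clusterCenters.foreach(println)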
Hi,
I'm learning Spark and testing the Spark MLlib library with the K-means
algorithm.
So,
I created a file height-weight.txt like this:
65.0 220.0
73.0 160.0
59.0 110.0
61.0 120.0
...
And the code (executed in spark-shell):
import org.apache.spark.mllib.clustering.KMeans
import
Hello,
I am preparing some tests to run in Spark in order to vary
properties and check the variations in the results.
For this, I need to use a standard application in my environment, like the
well-known apps for Hadoop: Terasort