Broadcast Variable question

2020-10-04 Thread Eduardo
will work fine if there is only a single access at a time to this object. So, my question is: how many threads in each worker access broadcast variables? Thanks in advance, Eduardo
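For reference, a minimal Scala sketch of the access pattern in question (names and data are illustrative): each executor keeps one deserialized copy of the broadcast value, and every task thread running on that executor reads it concurrently, so the object should be immutable or otherwise thread-safe.

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch"))

    // One copy per executor, shared by all task threads there: treat it as read-only.
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

    val total = sc.parallelize(Seq("a", "b", "a"))
      .map(k => lookup.value.getOrElse(k, 0)) // concurrent reads, no writes
      .sum()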

Re: Crash in Unit Tests

2017-09-29 Thread Eduardo Mello
I had this problem at my work. I solved it by increasing the Unix ulimit, because Spark was trying to open too many files. On 29 Sep 2017, 5:05 PM, "Anthony Thomas" wrote: > Hi Spark Users, > > I recently compiled spark 2.2.0 from source on an EC2 m4.2xlarge instance

Re: [Structured Streaming] Trying to use Spark structured streaming

2017-09-11 Thread Eduardo D'Avila
mps are equal)? Additionally, what is the use of the sliding window? Thanks, Eduardo 2017-09-11 13:11 GMT-03:00 Burak Yavuz <brk...@gmail.com>: > Hi Eduardo, > > What you have written out is to output counts "as fast as possible" for > windows of 5 minute length and with a s

[Structured Streaming] Trying to use Spark structured streaming

2017-09-11 Thread Eduardo D'Avila
seems to change the behavior, but it is still far from what I expected. What is wrong with my assumptions about the way it should work? Given the code, how should the sample output be interpreted or used? Thanks, Eduardo

Re: Spark job profiler results showing high TCP cpu time

2017-06-23 Thread Eduardo Mello
What program do you use to profile Spark? On Fri, Jun 23, 2017 at 3:07 PM, Marcelo Vanzin wrote: > That thread looks like the connection between the Spark process and > jvisualvm. It's expected to show high up when doing sampling if the > app is not doing much else. > > On

Re: JDBC RDD Timestamp Parsing Issue

2017-06-21 Thread Eduardo Mello
You can add "?zeroDateTimeBehavior=convertToNull" to the connection string. On Wed, Jun 21, 2017 at 9:04 AM, Aviral Agarwal wrote: > The exception is happening in JDBC RDD code where getNext() is called to > get the next row. > I do not have access to the result set. I am
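For illustration, a sketch of where that parameter goes when reading over JDBC from Spark (host, database, table, and credentials are placeholders; zeroDateTimeBehavior=convertToNull is a MySQL Connector/J URL option that maps zero timestamps to NULL):

    val df = spark.read
      .format("jdbc")
      // The option rides on the JDBC URL itself, after '?'.
      .option("url", "jdbc:mysql://dbhost:3306/mydb?zeroDateTimeBehavior=convertToNull")
      .option("dbtable", "events")
      .option("user", "reader")
      .option("password", "secret")
      .load()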

Transformation question

2016-04-27 Thread Eduardo
Is there a way to write a transformation that, for each entry of an RDD, uses certain values of another RDD? As an example, imagine you have an RDD of entries for which you want to predict a certain label. In a second RDD, you have historical data. So for each entry in the first RDD, you want to find similar
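One common pattern for this, assuming the second (historical) RDD is small enough to collect, is to broadcast it so every entry of the first RDD can scan it; a rough sketch where entries, historicalRdd, and distance(...) are hypothetical placeholders:

    // historicalRdd: RDD[(Array[Double], String)], collected to the driver once.
    val historyB = sc.broadcast(historicalRdd.collect())

    // For each entry, pick the label of the most similar historical record.
    val labeled = entries.map { features =>
      val (_, label) = historyB.value.minBy { case (h, _) => distance(features, h) }
      (features, label)
    }

If the historical data is too large to collect, a join- or cartesian-based approach is the usual alternative.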

Re: Installing Spark on Mac

2016-03-08 Thread Eduardo Costa Alfaia
Hi Aida, The installation detected Maven version 3.0.3. Update to 3.3.3 and try again. On 08/Mar/2016 14:06, "Aida" <aida1.tef...@gmail.com> wrote: > Hi all, > > Thanks everyone for your responses; really appreciate it. > > Eduardo - I tried your sugge

Re: Installing Spark on Mac

2016-03-04 Thread Eduardo Costa Alfaia
Hi Aida, Run only "build/mvn -DskipTests clean package" BR Eduardo Costa Alfaia Ph.D. Student in Telecommunications Engineering Università degli Studi di Brescia Tel: +39 3209333018 On 3/4/16, 16:18, "Aida" <aida1.tef...@gmail.com> wrote: > Hi all,

Re: Accessing Web UI

2016-02-19 Thread Eduardo Costa Alfaia
Hi, try http://OAhtvJ5MCA:8080 BR On 2/19/16, 07:18, "vasbhat" wrote: > OAhtvJ5MCA

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Eduardo Costa Alfaia
Hi Gourav, I did a test as you said, and for me it's working; I am using Spark in local mode, master and worker on the same machine. I ran the example in spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 without errors. BR From: Gourav Sengupta Date: Monday,
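After launching the shell with the package on the classpath, a typical read with the spark-csv 1.x API looks like this (the file path is a placeholder):

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")      // treat the first line as column names
      .option("inferSchema", "true") // guess column types from the data
      .load("/path/to/data.csv")
    df.printSchema()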

unsubscribe email

2016-02-01 Thread Eduardo Costa Alfaia
Hi Guys, How can I unsubscribe the email e.costaalf...@studenti.unibs.it, which is an alias of my email e.costaalf...@unibs.it and is registered on the mailing list? Thanks Eduardo Costa Alfaia PhD Student Telecommunication Engineering Università degli Studi di Brescia-UNIBS

Input parsing time

2015-09-17 Thread Carlos Eduardo Santos
in the "Executor Computing Time" in History Server. Do you recommend any documentation to understand better the History Server logs and maybe more stats included in the log files? Thanks in advance, Carlos Eduardo M. Santos CS PhD student

Re: Spark Standalone Mode not working in a cluster

2015-07-13 Thread Eduardo
Akhil Das: Thanks for your reply. I am using exactly the same installation everywhere. Actually, the spark directory is shared among all nodes, including the place where I start pyspark. So, I believe this is not the problem. Regards, Eduardo On Mon, Jul 13, 2015 at 3:56 AM, Akhil Das ak

Spark Standalone Mode not working in a cluster

2015-07-12 Thread Eduardo
My installation of Spark is not working correctly in my local cluster. I downloaded spark-1.4.0-bin-hadoop2.6.tgz and untarred it in a directory visible to all nodes (these nodes are all accessible by ssh without a password). In addition, I edited conf/slaves so that it contains the names of the nodes.
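As a quick sanity check, a minimal sketch of an application pointing at the standalone master (the hostname is a placeholder); if this fails while local mode works, the master URL or the networking between nodes is the usual suspect:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("standalone-check")
      .setMaster("spark://master-host:7077") // must match the URL shown on the master's web UI

    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum()) // trivial job to confirm executors registered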

Re: python : Out of memory: Kill process

2015-03-30 Thread Eduardo Cusa
Hi, I changed my process flow. Now I am processing a file per hour, instead of processing at the end of the day. This decreased the memory consumption. Regards Eduardo On Thu, Mar 26, 2015 at 3:16 PM, Davies Liu dav...@databricks.com wrote: Could you narrow down to a step which cause

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
, Mar 26, 2015 at 10:02 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: I'm running on EC2: 1 master (4 CPUs, 15 GB RAM, 2 GB swap) and 2 slaves (4 CPUs, 15 GB RAM each); the uncompressed dataset size is 15 GB. On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa eduardo.c

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
I'm running on EC2: 1 master (4 CPUs, 15 GB RAM, 2 GB swap) and 2 slaves (4 CPUs, 15 GB RAM each); the uncompressed dataset size is 15 GB. On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Hi Davies, I upgraded to 1.3.0 and am still getting Out of Memory. I ran

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
a taste for the new DataFrame API. On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Hi Davies, I'm running 1.1.0. Now I'm following this thread, which recommends using the batchsize parameter = 1 http://apache-spark-user-list.1001560.n3.nabble.com/pySpark

python : Out of memory: Kill process

2015-03-25 Thread Eduardo Cusa
dataset completed successfully. Any ideas for debugging are welcome. Regards Eduardo

Re: python : Out of memory: Kill process

2015-03-25 Thread Eduardo Cusa
Liu dav...@databricks.com wrote: What's the version of Spark you are running? There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3. [1] https://issues.apache.org/jira/browse/SPARK-6055 On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Hi

Re: Similar code in Java

2015-02-11 Thread Eduardo Costa Alfaia
Thanks Ted. On Feb 10, 2015, at 20:06, Ted Yu yuzhih...@gmail.com wrote: Please take a look at: examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaDirectKafkaWordCount.java which was checked in yesterday. On Sat, Feb 7, 2015 at 10:53 AM, Eduardo Costa Alfaia

Similar code in Java

2015-02-07 Thread Eduardo Costa Alfaia
Hi Guys, How could I write the Scala code below in Java? val KafkaDStreams = (1 to numStreams) map { _ => KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY).map(_._2) } val unifiedStream =

Error KafkaStream

2015-02-05 Thread Eduardo Costa Alfaia
Hi Guys, I'm getting this error in KafkaWordCount: TaskSetManager: Lost task 0.0 in stage 4095.0 (TID 1281, 10.20.10.234): java.lang.ClassCastException: [B cannot be cast to java.lang.String

Re: Error KafkaStream

2015-02-05 Thread Eduardo Costa Alfaia
I don’t think so Sean. On Feb 5, 2015, at 16:57, Sean Owen so...@cloudera.com wrote: Is SPARK-4905 / https://github.com/apache/spark/pull/4371/files the same issue? On Thu, Feb 5, 2015 at 7:03 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi Guys, I’m getting this error

Re: Error KafkaStream

2015-02-05 Thread Eduardo Costa Alfaia
. `DefaultDecoder` returns Array[Byte], not String, so the class cast will fail here. Thanks Jerry -----Original Message----- From: Eduardo Costa Alfaia [mailto:e.costaalf...@unibs.it] Sent: Friday, February 6, 2015 12:04 AM To: Sean Owen Cc: user@spark.apache.org Subject: Re: Error
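The usual fix, per the reply above, is to request String decoders explicitly so the stream carries Strings rather than raw bytes; a sketch against the Spark 1.x receiver-based Kafka API, assuming ssc, kafkaParams, and topicMap are defined as in the example:

    import kafka.serializer.StringDecoder
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kafka.KafkaUtils

    // With StringDecoder for key and value, the stream is DStream[(String, String)]
    // and no Array[Byte]-to-String cast is ever attempted.
    val lines = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicMap, StorageLevel.MEMORY_ONLY).map(_._2)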

KafkaWordCount

2015-01-30 Thread Eduardo Costa Alfaia
Hi Guys, I would like to add to the KafkaWordCount Scala code the Kafka parameter val kafkaParams = Map("fetch.message.max.bytes" -> "400"). I've put this variable in like this: val KafkaDStreams = (1 to numStreams) map { _ =>
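A sketch of how such a parameter map plugs into the receiver-based API (the fetch size here is illustrative, and the map must also carry the usual zookeeper.connect and group.id entries; ssc, topicMap, and numStreams are assumed from the example):

    val kafkaParams = Map(
      "zookeeper.connect"       -> "zkhost:2181", // placeholder
      "group.id"                -> "wordcount",
      "fetch.message.max.bytes" -> "4000000")     // illustrative fetch limit

    val KafkaDStreams = (1 to numStreams) map { _ =>
      KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, topicMap, StorageLevel.MEMORY_ONLY).map(_._2)
    }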

Error Compiling

2015-01-30 Thread Eduardo Costa Alfaia
Hi Guys, any idea how to solve this error? [error] /sata_disk/workspace/spark-1.1.1/examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:76: missing parameter type for expanded function ((x$6, x$7) => x$6.$plus(x$7))
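That message usually means the compiler cannot infer the parameter types of the _ + _ shorthand at that call site; annotating the functions explicitly is the standard workaround. A sketch for a reduceByKeyAndWindow call like the one in that example, assuming pairs is a DStream of (String, Int):

    val wordCounts = pairs.reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b, // counts entering the window
      (a: Int, b: Int) => a - b, // counts leaving the window
      Minutes(10), Seconds(2), 2)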

R: Spark Streaming with Kafka

2015-01-18 Thread Eduardo Alfaia
I have the same issue. ----- Original message ----- From: Rasika Pohankar rasikapohan...@gmail.com Sent: 18/01/2015 18:48 To: user@spark.apache.org user@spark.apache.org Subject: Spark Streaming with Kafka I am using Spark Streaming to process data received through Kafka. The Spark

Re: Play Scala Spark Example

2015-01-12 Thread Eduardo Cusa
the build file https://github.com/knoldus/Play-Spark-Scala/blob/master/build.sbt of your play application, it seems that it uses Spark 1.0.1. Thanks Best Regards On Fri, Jan 9, 2015 at 7:17 PM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Hi guys, I'm running the following example

Play Scala Spark Example

2015-01-09 Thread Eduardo Cusa
Hi guys, I'm running the following example: https://github.com/knoldus/Play-Spark-Scala on the same machine as the Spark master, and the Spark cluster was launched with the EC2 script. I'm stuck with these errors; any idea how to fix them? Regards Eduardo Calling the play app prints the following

Re: EC2 VPC script

2014-12-29 Thread Eduardo Cusa
Eduardo On Sat, Dec 20, 2014 at 7:53 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: What version of the script are you running? What did you see in the EC2 web console when this happened? Sometimes instances just don't come up in a reasonable amount of time and you have to kill

EC2 VPC script

2014-12-18 Thread Eduardo Cusa
cluster vpc_spark... Spark AMI: ami-5bb18832 Launching instances... Launched 1 slaves in us-east-1a, regid = r-e9d603c4 Launched master in us-east-1a, regid = r-89d104a4 Waiting for cluster to enter 'ssh-ready' state... Any ideas what happened? Regards Eduardo

undefined

2014-12-18 Thread Eduardo Cusa
Hi guys. I ran the following command to launch a new cluster: ./spark-ec2 -k test -i test.pem -s 1 --vpc-id vpc-X --subnet-id subnet-X launch vpc_spark The instances started OK but the command never finished, with the following output: Setting up security groups... Searching for existing

Kestrel and Spark Stream

2014-11-18 Thread Eduardo Alfaia
Hi guys, Has anyone already tried doing this? Thanks

JavaKafkaWordCount

2014-11-18 Thread Eduardo Costa Alfaia
Hi Guys, I am doing some tests with JavaKafkaWordCount; my cluster is composed of 8 workers and 1 driver with spark-1.1.0. I am using Kafka too, and I have some questions about it. 1 - When I launch the command: bin/spark-submit --class org.apache.spark.examples.streaming.JavaKafkaWordCount

Kafka examples

2014-11-13 Thread Eduardo Costa Alfaia
Hi guys, Were the Kafka examples in the master branch removed? Thanks

Java client connection

2014-11-12 Thread Eduardo Cusa
: Association failed with [akka.tcp://sparkMaster@10.0.2.20:7077] My Spark master runs on 10.0.2.20. From pyspark I can work properly. Regards Eduardo

Spark and Kafka

2014-11-06 Thread Eduardo Costa Alfaia
Hi Guys, I am doing some tests with Spark Streaming and Kafka, but I have seen something strange. I modified JavaKafkaWordCount to use reduceByKeyAndWindow and to print to the screen the accumulated word counts. In the beginning Spark works very well; in each iteration the
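If the goal is a running total over the whole stream rather than a sliding window, updateStateByKey is the usual tool instead; a minimal Scala sketch, assuming words is the DStream of words from the example:

    ssc.checkpoint("/tmp/checkpoint") // stateful operators require a checkpoint directory

    // Running total per word across the entire stream, not just a window.
    val totals = words.map(w => (w, 1)).updateStateByKey[Int] {
      (newCounts: Seq[Int], state: Option[Int]) => Some(state.getOrElse(0) + newCounts.sum)
    }
    totals.print()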

Re: Spark and Kafka

2014-11-06 Thread Eduardo Costa Alfaia
. On Thu, Nov 6, 2014 at 9:32 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi Guys, I am doing some tests with Spark Streaming and Kafka, but I have seen something strange. I modified JavaKafkaWordCount to use reduceByKeyAndWindow and to print to the screen

R: Spark Kafka Performance

2014-11-04 Thread Eduardo Alfaia
. On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi Guys, Could anyone explain to me how Kafka works with Spark? I am using JavaKafkaWordCount.java as a test, and the command line is: ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount

Spark Kafka Performance

2014-11-03 Thread Eduardo Costa Alfaia
Hi Guys, Could anyone explain to me how Kafka works with Spark? I am using JavaKafkaWordCount.java as a test, and the command line is: ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3 and like a

Spark's Behavior 2

2014-05-13 Thread Eduardo Costa Alfaia
Hi TD, I have sent more information, now using 8 workers. The gap is now 27 seconds. Have you seen it? Thanks BR

Re: Spark's behavior

2014-05-06 Thread Eduardo Costa Alfaia
Ok Andrew, thanks. I sent information from a test with 8 workers, and the gap has grown. On May 4, 2014, at 2:31, Andrew Ash and...@andrewash.com wrote: From the logs, I see that the print() starts printing stuff 10 seconds after the context is started. And that 10 seconds is taken by the

Re: Spark's behavior

2014-05-03 Thread Eduardo Costa Alfaia
. And that does not seem to be a persistent problem, as after that 10 seconds the data is being received and processed. TD On Fri, May 2, 2014 at 2:14 PM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi TD, I got more information today using Spark 1.0 RC3, and the situation

Spark's behavior

2014-04-29 Thread Eduardo Costa Alfaia
Hi TD, In my tests with Spark Streaming, I'm using modified JavaNetworkWordCount code and a program that I wrote that sends words to the Spark worker; I use TCP as transport. I verified that after starting Spark, it connects to my source, which actually starts sending, but the first word count

Re: Spark's behavior

2014-04-29 Thread Eduardo Costa Alfaia
no room for processing the received data. It could be that after 30 seconds, the server disconnects, the receiver terminates, releasing the single slot for the processing to proceed. TD On Tue, Apr 29, 2014 at 2:28 PM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi TD

Re: reduceByKeyAndWindow Java

2014-04-07 Thread Eduardo Costa Alfaia
are facing? TD On Fri, Apr 4, 2014 at 8:03 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi guys, I would like to know if this part of the code is right to use with a window. JavaPairDStream<String, Integer> wordCounts = words.map(new

Driver Out of Memory

2014-04-07 Thread Eduardo Costa Alfaia
Hi Guys, I would like to understand why the driver's RAM goes down. Does the processing occur only in the workers? Thanks # Start Tests computer1 (Worker/Source Stream) 23:57:18 up 12:03, 1 user, load average: 0.03, 0.31, 0.44 total used free shared

Explain Add Input

2014-04-04 Thread Eduardo Costa Alfaia
Hi all, Could anyone explain the lines below to me? computer1 - worker, computer8 - driver (master). 14/04/04 14:24:56 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-0-1396614314800 in memory on computer1.ant-net:60820 (size: 1262.5 KB, free: 540.3 MB) 14/04/04 14:24:56 INFO

RAM high consume

2014-04-04 Thread Eduardo Costa Alfaia
Hi all, I am doing some tests using JavaNetworkWordCount and I have some questions about machine performance; my tests take approximately 2 min. Why does the RAM decrease so noticeably? I have done tests with 2 and 3 machines and got the same behavior. What should I

Driver increase memory utilization

2014-04-04 Thread Eduardo Costa Alfaia
Hi Guys, Could anyone help me understand this driver behavior when I start JavaNetworkWordCount? computer8 16:24:07 up 121 days, 22:21, 12 users, load average: 0.66, 1.27, 1.55 total used free shared buffers cached Mem: 5897

Parallelism level

2014-04-04 Thread Eduardo Costa Alfaia
Hi all, I have put this line in my spark-env.sh: -Dspark.default.parallelism=20. Is this parallelism level correct? The machine's processor is a dual core. Thanks
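For reference, the Spark tuning guide suggests roughly 2-3 tasks per CPU core in the cluster, so 20 is likely high for a single dual-core machine. The same setting can also be made in code; a sketch with an illustrative value:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("parallelism-check")
      .set("spark.default.parallelism", "4") // ~2-3 tasks per core on a dual-core box

    val sc = new SparkContext(conf)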

RAM Increase

2014-04-04 Thread Eduardo Costa Alfaia
Hi Guys, Could anyone explain this behavior to me? After 2 min of tests: computer1 - worker, computer10 - worker, computer8 - driver (master). computer1 18:24:31 up 73 days, 7:14, 1 user, load average: 3.93, 2.45, 1.14 total used free shared buffers cached

Re: Parallelism level

2014-04-04 Thread Eduardo Costa Alfaia
, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi all, I have put this line in my spark-env.sh: -Dspark.default.parallelism=20. Is this parallelism level correct? The machine's processor is a dual core. Thanks

Re: reduceByKeyAndWindow Java

2014-04-04 Thread Eduardo Costa Alfaia
problem you are facing? TD On Fri, Apr 4, 2014 at 8:03 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi guys, I would like to know if this part of the code is right to use with a window. JavaPairDStream<String, Integer> wordCounts = words.map

Print line in JavaNetworkWordCount

2014-04-02 Thread Eduardo Costa Alfaia
Hi Guys, I would like to print the contents of lines in: JavaDStream<String> lines = ssc.socketTextStream(args[1], Integer.parseInt(args[2])); JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() { @Override public Iterable<String> call(String x) {

Re: Change print() in JavaNetworkWordCount

2014-03-27 Thread Eduardo Costa Alfaia
Thank you very much Sourav. BR On 3/26/14, 17:29, Sourav Chandra wrote: def print() { def foreachFunc = (rdd: RDD[T], time: Time) => { val total = rdd.collect().toList println("-------------------------------------------") println("Time: " + time) println

Change print() in JavaNetworkWordCount

2014-03-25 Thread Eduardo Costa Alfaia
Hi Guys, I think I have already asked this question, but I don't remember if anyone answered me. I would like to change, in the print() function, the quantity of words and the frequency numbers that are sent to the driver's screen. The default value is 10. Could anyone help me with this? Best
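Rather than patching print() itself, the same effect can be had from user code with foreachRDD and take(n); a Scala sketch (the Java API offers an equivalent foreachRDD), assuming wordCounts is the DStream from the example and 100 is the desired count:

    wordCounts.foreachRDD { (rdd, time) =>
      println("-------------------------------------------")
      println("Time: " + time)
      rdd.take(100).foreach(println) // take(n) pulls only n elements to the driver
    }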

Log Analyze

2014-03-10 Thread Eduardo Costa Alfaia
Hi Guys, Could anyone help me understand this piece of the log? Why did this happen? Thanks 14/03/10 16:55:20 INFO SparkContext: Starting job: first at NetworkWordCount.scala:87 14/03/10 16:55:20 INFO JobScheduler: Finished job streaming job 1394466892000 ms.0 from job set of time

Re: Explain About Logs NetworkWordcount.scala

2014-03-09 Thread Eduardo Costa Alfaia
Yes TD, I can use tcpdump to see if the data is being accepted by the receiver and whether it is arriving in the IP packets. Thanks On 3/8/14, 4:19, Tathagata Das wrote: I am not sure how to debug this without any more information about the source. Can you monitor on the receiver side