Re: Where can I find logs set inside RDD processing functions?

2015-02-06 Thread Petar Zecevic
You can enable YARN log aggregation (set yarn.log-aggregation-enable to true) and execute the command yarn logs -applicationId your_application_id after your application finishes. Or you can look at them directly in HDFS in /tmp/logs/user/logs/applicationid/hostname On 6.2.2015. 19:50, nitinkak001
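The setting and command mentioned above can be sketched as follows (property name per the YARN docs; the application id is a made-up placeholder):

```xml
<!-- yarn-site.xml sketch: enable log aggregation so that container logs
     (including output from RDD processing functions) are collected into
     HDFS once the application finishes. -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```

After the job ends, running yarn logs -applicationId application_1423000000000_0001 (a hypothetical id) prints the aggregated logs for that application.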

Re: Discourse: A proposed alternative to the Spark User list

2015-01-22 Thread Petar Zecevic
Ok, thanks for the clarifications. I didn't know this list has to remain as the only official list. Nabble is really not the best solution in the world, but we're stuck with it, I guess. That's it from me on this subject. Petar On 22.1.2015. 3:55, Nicholas Chammas wrote: I think a few

Re: Discourse: A proposed alternative to the Spark User list

2015-01-22 Thread Petar Zecevic
this mailing list into subproject-specific lists? That might also help tune in/out the subset of conversations of interest. On Jan 22, 2015 10:30 AM, Petar Zecevic petar.zece...@gmail.com wrote: Ok, thanks for the clarifications. I didn't know this list has

Re: LeaseExpiredException while writing schemardd to hdfs

2015-02-05 Thread Petar Zecevic
Why don't you just map the RDD's rows to lines and then call saveAsTextFile()? On 3.2.2015. 11:15, Hafiz Mujadid wrote: I want to write a whole SchemaRDD to a single file in HDFS but am facing the following exception
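A minimal sketch of the suggested approach: format each row as one delimited text line, then hand that function to map() before saveAsTextFile(). The row layout, separator, and output path below are invented for illustration; only the plain formatting function is shown as runnable code.

```python
def row_to_line(row, sep=","):
    """Join one row's fields into a single delimited text line."""
    return sep.join(str(field) for field in row)

# Pure-Python check of the formatter:
print(row_to_line(("Ana", 30, "Zagreb")))  # Ana,30,Zagreb

# In PySpark (hypothetical names, not from the thread) the same function
# would be applied before writing out:
#   schema_rdd.map(row_to_line).saveAsTextFile("hdfs:///out/lines")
```

Coalescing to a single partition before the write would yield one output file, at the cost of funnelling all data through one task.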

Re: Spark-submit and multiple files

2015-03-20 Thread Petar Zecevic
I tried your program in yarn-client mode and it worked with no exception. This is the command I used: spark-submit --master yarn-client --py-files work.py main.py (Spark 1.2.1) On 20.3.2015. 9:47, Guillaume Charhon wrote: Hi Davies, I am already using --py-files. The system does use the
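For reference, a sketch of the command shape from the thread (the file names are the thread's own placeholders, not verified paths):

```shell
# Ship an extra Python module alongside the main script (Spark 1.2-era syntax).
spark-submit \
  --master yarn-client \
  --py-files work.py \
  main.py
```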

Re: How to configure SparkUI to use internal ec2 ip

2015-03-31 Thread Petar Zecevic
Did you try setting the SPARK_MASTER_IP parameter in spark-env.sh? On 31.3.2015. 19:19, Anny Chen wrote: Hi Akhil, I tried editing the /etc/hosts on the master and on the workers, and it seems it is not working for me. I tried adding hostname internal-ip and it didn't work. I then tried
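A spark-env.sh sketch of the suggestion, with a made-up internal EC2 hostname; SPARK_PUBLIC_DNS is an additional variable that may be worth trying for the UI links (an assumption, not something from the thread):

```shell
# spark-env.sh (values are placeholders)
SPARK_MASTER_IP=ip-10-0-0-1.ec2.internal
SPARK_PUBLIC_DNS=ip-10-0-0-1.ec2.internal
```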

Re: Hamburg Apache Spark Meetup

2015-02-25 Thread Petar Zecevic
Please add the Zagreb Meetup group, too. http://www.meetup.com/Apache-Spark-Zagreb-Meetup/ Thanks! On 18.2.2015. 19:46, Johan Beisser wrote: If you could also add the Hamburg Apache Spark Meetup, I'd appreciate it. http://www.meetup.com/Hamburg-Apache-Spark-Meetup/ On Tue, Feb 17, 2015 at

Re: Accumulator in SparkUI for streaming

2015-02-24 Thread Petar Zecevic
Interesting. Accumulators are shown on the Web UI if you are using the ordinary SparkContext (Spark 1.2). It just has to be named (and that's what you did). scala> val acc = sc.accumulator(0, "test accumulator") acc: org.apache.spark.Accumulator[Int] = 0 scala> val rdd = sc.parallelize(1 to 1000)

Re: Posting to the list

2015-02-21 Thread Petar Zecevic
The message went through after all. Sorry for spamming. On 21.2.2015. 21:27, pzecevic wrote: Hi Spark users. Does anybody know what steps are required to be able to post to this list by sending an email to user@spark.apache.org? I just sent a reply to Corey Nolet's mail Missing shuffle

Re: Missing shuffle files

2015-02-21 Thread Petar Zecevic
Could you try to turn on the external shuffle service? spark.shuffle.service.enabled=true On 21.2.2015. 17:50, Corey Nolet wrote: I'm experiencing the same issue. Upon closer inspection I'm noticing that executors are being lost as well. Thing is, I can't figure out how they are dying. I'm
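As a config sketch, the property goes into spark-defaults.conf (key name per the Spark configuration docs; the second line is a common pairing, not something from the thread):

```
spark.shuffle.service.enabled   true
# Often enabled together with the external shuffle service (assumption):
spark.dynamicAllocation.enabled true
```

With the external shuffle service running on each node, shuffle files remain servable even after the executor that wrote them is lost.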

Re: Facing error while extending scala class with Product interface to overcome limit of 22 fields in spark-shell

2015-02-25 Thread Petar Zecevic
I believe your class needs to be defined as a case class (as I answered on SO). On 25.2.2015. 5:15, anamika gupta wrote: Hi Akhil I guess it skipped my attention. I would definitely give it a try. While I would still like to know what the issue is with the way I have created the schema?

Re: Fwd: Model weights of linear regression becomes abnormal values

2015-05-29 Thread Petar Zecevic
You probably need to scale the values in the data set so that they are all of comparable ranges and translate them so that their means become 0. You can use the pyspark.mllib.feature.StandardScaler(True, True) object for that. On 28.5.2015. 6:08, Maheshakya Wijewardena wrote: Hi, I'm trying
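A plain-Python sketch of what scaling a feature column to zero mean and unit standard deviation does; in PySpark you would instead fit StandardScaler on an RDD of feature vectors. No Spark APIs are used below, and the sample values are invented.

```python
def standardize(values):
    """Shift a feature column to mean 0 and scale it to unit std deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

scaled = standardize([10.0, 20.0, 30.0])
print(scaled)  # roughly [-1.22, 0.0, 1.22]
```

Without this step, features with large numeric ranges can dominate the gradient updates and drive linear-regression weights to abnormal values.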

Re: Is spark suitable for real time query

2015-07-28 Thread Petar Zecevic
You can try out a few tricks employed by folks at Lynx Analytics... Daniel Darabos gave some details at Spark Summit: https://www.youtube.com/watch?v=zt1LdVj76LU&index=13&list=PL-x35fyliRwhP52fwDqULJLOnqnrN5nDs On 22.7.2015. 17:00, Louis Hust wrote: My code like below: MapString,

Re: Spark - Eclipse IDE - Maven

2015-07-28 Thread Petar Zecevic
Sorry about self-promotion, but there's a really nice tutorial for setting up Eclipse for Spark in Spark in Action book: http://www.manning.com/bonaci/ On 27.7.2015. 10:22, Akhil Das wrote: You can follow this doc

Re: Spark - Eclipse IDE - Maven

2015-07-28 Thread Petar Zecevic
Sorry about self-promotion, but there's a really nice tutorial for setting up Eclipse for Spark in Spark in Action book: http://www.manning.com/bonaci/ On 24.7.2015. 7:26, Siva Reddy wrote: Hi All, I am trying to setup the Eclipse (LUNA) with Maven so that I create a maven projects
