Re: Implementing a custom Spark shell

2014-03-06 Thread Sampo Niskanen
Hi, I've tried to enable debug logging, but can't figure out what might be going wrong. Can anyone assist in deciphering the log? The log of the startup and run attempts is at http://pastebin.com/XyeY92VF This uses SparkILoop, DEBUG-level logging, and the settings.debug.value = true option. Line 323:
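
For context, a minimal sketch of how such a custom shell is typically embedded, assuming SparkILoop exposes the usual ILoop-style process(Settings) entry point; the object name is made up and SparkILoop was not a stable public API at the time, so treat this only as an illustration:

    import scala.tools.nsc.Settings
    import org.apache.spark.repl.SparkILoop

    object CustomShell {
      def main(args: Array[String]) {
        val settings = new Settings
        settings.usejavacp.value = true   // let the REPL see the JVM classpath
        settings.debug.value = true       // the very verbose option mentioned above

        val repl = new SparkILoop
        repl.process(settings)            // blocks until the shell exits
      }
    }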

Re: Kryo serialization does not compress

2014-03-06 Thread pradeeps8
We are trying to use Kryo serialization, but with Kryo serialization on, the memory consumption does not change. We have tried this on multiple sets of data. We have also checked the Kryo serialization logs and confirmed that Kryo is being used. Can somebody please help us with this? The
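
For anyone comparing configurations, a minimal sketch of the usual Kryo wiring; the record type and registrator name are made up for illustration:

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    case class MyRecord(id: Long, value: String)

    // Register the classes that dominate the cached data; unregistered classes
    // carry their full class name with every serialized object.
    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[MyRecord])
      }
    }

    object KryoSetup {
      def conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "MyRegistrator")
    }

Note that Kryo mainly changes the footprint of data that actually gets serialized: shuffle data, and RDDs persisted with a serialized storage level such as MEMORY_ONLY_SER. An RDD cached with the default MEMORY_ONLY level is stored as deserialized Java objects, so its memory consumption looks the same with or without Kryo.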

Re: disconnected from cluster; reconnecting gives java.net.BindException

2014-03-06 Thread Nicholas Chammas
So this happened again today. As I noted before, the Spark shell starts up fine after I reconnect to the cluster, but this time around I tried opening a file and doing some processing. I get this message over and over (and can't do anything): 14/03/06 15:43:09 WARN scheduler.TaskSchedulerImpl:

Building spark with native library support

2014-03-06 Thread Alan Burlison
Hi, I've successfully built 0.9.0-incubating on Solaris using sbt, following the instructions at http://spark.incubator.apache.org/docs/latest/ and it seems to work OK. However, when I start it up I get an error about missing Hadoop native libraries. I can't find any mention of how to build

Re: PIG to SPARK

2014-03-06 Thread suman bharadwaj
Thanks Mayur. I don't have a clear idea of how pipe works and wanted to understand more about it. When do we use pipe() and how does it work? Can you please share some sample code if you have any (even pseudo-code is fine)? It would really help. Regards, Suman Bharadwaj S On Thu, Mar 6, 2014 at 3:46 AM,
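
Not from the original thread, but a minimal sketch of what pipe() does: each partition's elements are written, one per line, to the stdin of an external command, and every line the command prints becomes an element of the resulting RDD of strings. The awk command here is just an illustration and assumes a Unix-like environment:

    import org.apache.spark.SparkContext

    object PipeExample {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[2]", "pipe-example")

        val nums = sc.parallelize(1 to 10, 2)

        // One external process per partition: elements go in on stdin,
        // lines printed on stdout come back as the new RDD's elements.
        val doubled = nums.pipe(Seq("awk", "{ print $1 * 2 }"))

        doubled.collect().foreach(println)
        sc.stop()
      }
    }

pipe() is mostly useful for reusing an existing external script or binary per partition; for plain transformations, map() and mapPartitions() stay inside the JVM and avoid the process-spawning overhead.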

Re: Building spark with native library support

2014-03-06 Thread Matei Zaharia
Is it an error, or just a warning? In any case, you need to get those libraries from a build of Hadoop for your platform. Then add them to the SPARK_LIBRARY_PATH environment variable in conf/spark-env.sh, or to your -Djava.library.path if launching an application separately. These libraries

RE: Building spark with native library support

2014-03-06 Thread Jeyaraj, Arockia R (Arockia)
Hi, I am trying to set up Spark on Windows for a development environment. I get the following error when I run sbt. Please help me resolve this issue. I work for Verizon, am on my company network, and can't access the internet without a proxy. C:\Userssbt Getting org.fusesource.jansi jansi 1.11 ...

Access SBT with proxy

2014-03-06 Thread Mayur Rustagi
export JAVA_OPTS="$JAVA_OPTS -Dhttp.proxyHost=yourserver -Dhttp.proxyPort=8080 -Dhttp.proxyUser=username -Dhttp.proxyPassword=password" Also, please use a separate thread for different questions. Regards, Mayur Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi

RE: Access SBT with proxy

2014-03-06 Thread Jeyaraj, Arockia R (Arockia)
Thanks Alan. I am very new to Spark. I am trying to set up a Spark development environment on Windows. I added the export mentioned below as a set command in the sbt.bat file and tried it, but it did not work. Where will I find .gitconfig? set JAVA_OPTS=%JAVA_OPTS% -Dhttp.proxyHost=myservername -Dhttp.proxyPort=8080

Re: major Spark performance problem

2014-03-06 Thread Christopher Nguyen
Dana, when you run multiple applications under Spark and each application takes up the entire cluster's resources, it is expected that one will block the other completely, so you're seeing the wall times add together sequentially. In addition, there is some overhead associated with
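
As an illustration of the usual mitigation on a standalone cluster, a sketch that caps what one application asks for so that a second application can be scheduled alongside it; the master URL and the numbers are placeholders and the right split depends on the cluster:

    import org.apache.spark.{SparkConf, SparkContext}

    object CappedApp {
      def main(args: Array[String]) {
        val conf = new SparkConf()
          .setMaster("spark://master-host:7077")  // placeholder master URL
          .setAppName("capped-app")
          .set("spark.cores.max", "8")            // leave cores free for a second application
          .set("spark.executor.memory", "4g")

        val sc = new SparkContext(conf)
        // ... application logic ...
        sc.stop()
      }
    }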

Pig on Spark

2014-03-06 Thread Sameer Tilak
Hi everyone, We are using Pig to build our data pipeline. I came across Spork -- Pig on Spark -- at https://github.com/dvryaboy/pig and am not sure if it is still active. Can someone please let me know the status of Spork or of any other effort that will let us run Pig on Spark? We can

Re: Pig on Spark

2014-03-06 Thread Tom Graves
I had asked a similar question on the dev mailing list a while back (Jan 22nd).  See the archives:  http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser - look for spork. Basically Matei said: Yup, that was it, though I believe people at Twitter picked it up again recently.

Re: Pig on Spark

2014-03-06 Thread Aniket Mokashi
There is some work to make this work on yarn at https://github.com/aniket486/pig. (So, compile pig with ant -Dhadoopversion=23) You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to find out what sort of env variables you need (sorry, I haven't been able to clean this up-

Re: Building spark with native library support

2014-03-06 Thread Alan Burlison
On 06/03/2014 18:55, Matei Zaharia wrote: For the native libraries, you can use an existing Hadoop build and just put them on the path. For linking to Hadoop, Spark grabs it through Maven, but you can do mvn install locally on your version of Hadoop to install it to your local Maven cache, and

RE: Pig on Spark

2014-03-06 Thread Sameer Tilak
Hi Aniket, Many thanks! I will check this out. Date: Thu, 6 Mar 2014 13:46:50 -0800 Subject: Re: Pig on Spark From: aniket...@gmail.com To: user@spark.apache.org; tgraves...@yahoo.com There is some work to make this work on yarn at https://github.com/aniket486/pig. (So, compile pig with ant

Re: Job aborted: Spark cluster looks down

2014-03-06 Thread Mayur Rustagi
Can you see your Spark web UI? Is it running? (It would run on masterurl:8080.) If so, what is the master URL shown there? MASTER=spark://URL:PORT ./bin/spark-shell should work. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On
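
The programmatic equivalent of that shell invocation, for a job submitted from code, is to hand the same master URL to the SparkContext; a small sketch with a placeholder host:

    import org.apache.spark.SparkContext

    object ClusterCheck {
      def main(args: Array[String]) {
        // Use the exact spark://host:port URL shown at the top of the master's web UI.
        val sc = new SparkContext("spark://master-host:7077", "cluster-check")

        // A trivial action: if the cluster is really up, this prints 100.
        println(sc.parallelize(1 to 100).count())
        sc.stop()
      }
    }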

Re: NoSuchMethodError - Akka - Props

2014-03-06 Thread Deepak Nulu
I see the same error. I am trying a standalone example integrated into a Play Framework v2.2.2 application. The error occurs when I try to create a Spark Streaming Context. Compilation succeeds, so I am guessing it has to do with the version of Akka getting picked up at runtime. -- View this

Re: NoSuchMethodError - Akka - Props

2014-03-06 Thread Tathagata Das
Are you launching your application using the scala or the java command? The scala command brings in a version of Akka that we have found to cause conflicts with Spark's version of Akka, so it is best to launch using java. TD On Thu, Mar 6, 2014 at 3:45 PM, Deepak Nulu deepakn...@gmail.com wrote: I see the

Re: NoSuchMethodError - Akka - Props

2014-03-06 Thread Deepak Nulu
I was just able to fix this in my environment. By looking at the repository/cache in my Play Framework installation, I was able to determine that spark-0.9.0-incubating uses Akka version 2.2.3. Similarly, looking at repository/local revealed that Play Framework 2.2.2 ships with Akka version
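
One common way to pin a single Akka version in an sbt build is dependencyOverrides; the fragment below is only a sketch, under the assumption that aligning everything on the Akka 2.2.3 version identified in this thread is what resolves the mismatch:

    // build.sbt (fragment): force one Akka version across transitive dependencies
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"      % "0.9.0-incubating",
      "org.apache.spark" %% "spark-streaming" % "0.9.0-incubating"
    )

    dependencyOverrides ++= Set(
      "com.typesafe.akka" %% "akka-actor"  % "2.2.3",
      "com.typesafe.akka" %% "akka-remote" % "2.2.3",
      "com.typesafe.akka" %% "akka-slf4j"  % "2.2.3"
    )

Depending on how the Spark artifacts were published, their Akka dependency may live under a different organization than com.typesafe.akka, so check which organizations actually appear in the resolved dependency list before deciding what to override.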

Re: Python 2.7 + numpy break sortByKey()

2014-03-06 Thread Patrick Wendell
The difference between your two jobs is that take() is optimized and only runs on the machine where you are using the shell, whereas sortByKey requires using many machines. It seems like maybe Python didn't get upgraded correctly on one of the slaves. I would look in the /root/spark/work/ folder

Re: NoSuchMethodError in KafkaReciever

2014-03-06 Thread Tathagata Das
I don't have an Eclipse setup, so I am not sure what is going on here. I would try using Maven on the command line with a pom to see if this compiles. Also, try cleaning up your system Maven cache; who knows if it pulled in a wrong version of Kafka 0.8 and has been using it all along. Blowing away the

Re: need someone to help clear some questions.

2014-03-06 Thread qingyang li
Many thanks for the guidance. 2014-03-06 23:39 GMT+08:00 Yana Kadiyska yana.kadiy...@gmail.com: Hi qingyang, 1. You do not need to install Shark on every node. 2. Not really sure; it's just a warning, so I'd see if it works despite it. 3. You need to provide the actual hdfs path, e.g.

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-06 Thread polkosity
We're not using Ooyala's job server. We are holding the Spark context for reuse within our own REST server (with a service to run each job). Our low-latency job now reads all its data from a memory-cached RDD instead of from an HDFS sequence file (upstream jobs cache resultant RDDs for downstream jobs
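
Not polkosity's code, but a minimal sketch of the pattern being described: one long-lived SparkContext owned by the server process, with the shared input cached once so low-latency jobs avoid rereading HDFS; the master URL, path, and names are made up:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Owns a single SparkContext for the lifetime of the REST server process;
    // each request reuses it instead of paying per-job startup cost.
    object SharedSparkService {
      val sc = new SparkContext("spark://master-host:7077", "rest-job-server")

      // Hypothetical upstream output, cached once and shared by downstream jobs.
      val events: RDD[String] = sc.textFile("hdfs:///data/events").cache()

      def countMatching(keyword: String): Long =
        events.filter(_.contains(keyword)).count()
    }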

Running actions in loops

2014-03-06 Thread Ognen Duzlevski
Hello, What is the general approach people take when trying to do analysis across multiple large files, where the data to be extracted from a successive file depends on the data extracted from a previous file or set of files? For example, I have the following: a group of HDFS files, each
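
A sketch of the general shape of the loop being described, assuming the per-file result that drives the next step is small enough to collect back to the driver; the file layout and extraction logic are invented for illustration:

    import org.apache.spark.SparkContext

    object LoopedExtraction {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[4]", "looped-extraction")

        // Hypothetical daily files; each day's extraction depends on the previous result.
        val days = Seq("2014-03-01", "2014-03-02", "2014-03-03")

        // Seed: ids of interest before any file has been processed.
        var idsOfInterest: Set[String] = Set("user-1", "user-2")

        for (day <- days) {
          val lines = sc.textFile("hdfs:///events/" + day)
          // Capture the current set in a local val so the filter closure
          // serializes a plain Set and nothing more.
          val currentIds = idsOfInterest
          val matched = lines.filter(line => currentIds.contains(line.split(",")(0)))
          // The action runs here; the collected ids drive the next iteration.
          idsOfInterest = matched.map(_.split(",")(1)).collect().toSet
        }

        println("Final set size: " + idsOfInterest.size)
        sc.stop()
      }
    }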

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-06 Thread Mayur Rustagi
Would you be the best person in the world to share some code? It's a pretty common problem. On Mar 6, 2014 6:36 PM, polkosity polkos...@gmail.com wrote: We're not using Ooyala's job server. We are holding the Spark context for reuse within our own REST server (with a service to run each job).

Re: need someone to help clear some questions.

2014-03-06 Thread qingyang li
Hi Yana, do you know if there is a mailing list for Shark like Spark's? 2014-03-06 23:39 GMT+08:00 Yana Kadiyska yana.kadiy...@gmail.com: Hi qingyang, 1. You do not need to install Shark on every node. 2. Not really sure; it's just a warning, so I'd see if it works despite it. 3. You need to

Re: NoSuchMethodError in KafkaReciever

2014-03-06 Thread Venkatakrishna T
Will give it a shot later. BTW, this forced me to move to Scala! We decided to design our aggregation framework in Scala for now. On 07-Mar-2014, at 6:02 AM, Tathagata Das tathagata.das1...@gmail.com wrote: I don't have an Eclipse setup so I am not sure what is going on here. I would try to use

Re: Running actions in loops

2014-03-06 Thread Ognen Duzlevski
It looks like the problem is in the filter task - is there anything special about filter()? I have removed the filter line from the loops just to see if things will work, and they do. Does anyone have any ideas? Thanks! Ognen On 3/6/14, 9:39 PM, Ognen Duzlevski wrote: Hello, What is the general

Please remove me from the mail list.//Re: NoSuchMethodError - Akka - Props

2014-03-06 Thread Qiuxin (robert)
Please remove me from the mailing list. -----Original Message----- From: Deepak Nulu [mailto:deepakn...@gmail.com] Sent: March 7, 2014 7:45 To: u...@spark.incubator.apache.org Subject: Re: NoSuchMethodError - Akka - Props I see the same error. I am trying a standalone example integrated into a Play Framework v2.2.2