Re: sbt assembly fails

2014-03-18 Thread Chengi Liu
Hi Sean, Yeah.. I am seeing errors across all repos and yepp.. this error is mainly because of a connectivity issue... How do I set up the proxy? I did set up the proxy as suggested by Mayur: export JAVA_OPTS="$JAVA_OPTS -Dhttp.proxyHost=yourserver -Dhttp.proxyPort=8080 -Dhttp.proxyUser=username"

Re: sbt assembly fails

2014-03-18 Thread Chengi Liu
Yeah.. The http_proxy is set up.. and so is https_proxy.. Basically, my Maven projects, git pulls, etc. are all working fine.. except this. Here is another question which might help me bypass this issue: if I create a jar using Eclipse... how do I run that jar in code? Like in Hadoop, I

Re: sbt assembly fails

2014-03-18 Thread Mayur Rustagi
You need to assemble the code to get Spark working (unless you are using Hadoop 1.0.4). To run the code you can follow any of the standalone guides here: https://spark.apache.org/docs/0.9.0/quick-start.html#a-standalone-app-in-scala You would still need sbt though. Mayur Rustagi Ph: +1 (760)
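For reference, a minimal standalone app in the spirit of the linked quick-start might look like the sketch below; the "local" master and the README.md path are placeholders, not details from this thread.

    import org.apache.spark.SparkContext

    // minimal 0.9-era standalone app: count lines mentioning "Spark"
    object SimpleApp {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "Simple App")
        val lines = sc.textFile("README.md").cache()
        println("Lines with Spark: " + lines.filter(_.contains("Spark")).count())
        sc.stop()
      }
    }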

Feed KMeans algorithm with a row major matrix

2014-03-18 Thread Jaonary Rabarisoa
Dear All, I'm trying to cluster data from native library code with Spark KMeans||. In my native library the data are represented as a matrix (rows = number of data points, cols = dimension). For efficiency reasons, they are copied into a one-dimensional Scala Array in row-major order, so after the

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-18 Thread dmpour23
On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia wrote: Is there a reason for spark using the older akka? On Sun, Mar 2, 2014 at 1:53 PM, 1esha alexey.r...@gmail.com wrote: The problem is in akka remote. It contains files compiled with 2.4.*. When you run it with 2.5.* in
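One commonly suggested mitigation for this kind of conflict, offered here only as a sketch and not as this thread's confirmed resolution, is to pin a single protobuf version in the build, e.g. in sbt:

    // hedged sbt sketch: drop the transitive protobuf-java pulled in by the
    // Hadoop client so only one protobuf version reaches the classpath
    libraryDependencies += ("org.apache.hadoop" % "hadoop-client" % "2.2.0")
      .exclude("com.google.protobuf", "protobuf-java")

Whether this helps depends on which protobuf the Spark/Akka build itself was compiled against.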

Connect Exception Error in spark interactive shell...

2014-03-18 Thread Sai Prasanna
Hi ALL !! In the interactive spark shell I get the following error. I just followed the steps of the video "First steps with Spark - Spark screencast #1" by Andy Konwinski... Any thoughts ??? scala> val textfile = sc.textFile("README.md") textfile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at
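sc.textFile is lazy, so a connect exception usually surfaces at the first action, when Spark tries to reach a configured HDFS namenode. A hedged workaround is to name the local filesystem scheme explicitly; the path below is a placeholder:

    // hypothetical: bypass the default HDFS URI and read the local file
    val textfile = sc.textFile("file:///path/to/spark/README.md")
    textfile.count()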

Re: Apache Spark 0.9.0 Build Error

2014-03-18 Thread wapisani
I tried that command on Fedora and I got a lot of random downloads (around 250 downloads), and it appeared that something was trying to get BitTorrent started. That command, ./sbt/sbt assembly, doesn't work on Windows. I installed sbt separately. Is there a way to determine if I'm using the sbt that's

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-18 Thread Ognen Duzlevski
On 3/18/14, 4:49 AM, dmpou...@gmail.com wrote: On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia wrote: Is there a reason for spark using the older akka? On Sun, Mar 2, 2014 at 1:53 PM, 1esha alexey.r...@gmail.com wrote: The problem is in akka remote. It contains files compiled

KryoSerializer return null when deserialize Task obj in Executor

2014-03-18 Thread 林武康
Hi all, I changed spark.closure.serializer to Kryo; when I try a count action in the spark shell, the Task obj deserialized in the Executor returns null. The src line is: override def run() { .. task = ser.deserialize[Task[Any]](...) .. } where task is null. Can anyone help me? Thank you!
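For context, in a 0.9.x standalone app this setting was typically a Java system property applied before the SparkContext is created; for the shell the equivalent -D option would go in SPARK_JAVA_OPTS. Both details are assumptions, sketched below:

    // hedged sketch: set the property before constructing the SparkContext
    System.setProperty("spark.closure.serializer",
      "org.apache.spark.serializer.KryoSerializer")
    val sc = new org.apache.spark.SparkContext("local", "kryo-closure-test")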

Re: example of non-line oriented input data?

2014-03-18 Thread Diana Carroll
Well, if anyone is still following this, I've gotten the following code working, which in theory should allow me to parse whole XML files. (The problem was that I can't return the tree iterator directly; I have to call iter(). Why?) import xml.etree.ElementTree as ET # two source files, format

Re: Apache Spark 0.9.0 Build Error

2014-03-18 Thread Robin Cjc
Hi, if you run that under Windows, you should use \ in place of /. sbt/sbt means the sbt file under the sbt folder. On Mar 18, 2014 8:42 PM, wapisani wapis...@mtu.edu wrote: I tried that command on Fedora and I got a lot of random downloads (around 250 downloads) and it appeared that something

Re: Apache Spark 0.9.0 Build Error

2014-03-18 Thread wapisani
Hi Chen, I tried sbt\sbt assembly and got the error 'sbt\sbt' is not recognized as an internal or external command, operable program or batch file. On Tue, Mar 18, 2014 at 11:18 AM, Chen Jingci [via Apache Spark User List] ml-node+s1001560n2811...@n3.nabble.com wrote: hi, if you run

[spark] New article on spark scalaz-stream (a bit of ML)

2014-03-18 Thread Pascal Voitot Dev
Hi, I wrote this new article after studying more deeply how to adapt scalaz-stream to Spark DStreams. I re-explain a few Spark (and scalaz-stream) concepts in my own words in it, and I went further using the new scalaz-stream NIO API, which is quite interesting IMHO. The result is a long blog triptych

Re: inexplicable exceptions in Spark 0.7.3

2014-03-18 Thread Walrus theCat
Hi Andrew, Thanks for your interest. This is a standalone job. On Mon, Mar 17, 2014 at 4:30 PM, Andrew Ash and...@andrewash.com wrote: Are you running from the spark shell or from a standalone job? On Mon, Mar 17, 2014 at 4:17 PM, Walrus theCat walrusthe...@gmail.com wrote: Hi, I'm

Re: possible bug in Spark's ALS implementation...

2014-03-18 Thread Xiangrui Meng
Sorry, the link was wrong. Should be https://github.com/apache/spark/pull/131 -Xiangrui On Tue, Mar 18, 2014 at 10:20 AM, Michael Allman m...@allman.ms wrote: Hi Xiangrui, I don't see how https://github.com/apache/spark/pull/161 relates to ALS. Can you explain? Also, thanks for addressing

Re: Feed KMeans algorithm with a row major matrix

2014-03-18 Thread Xiangrui Meng
Hi Jaonary, With the current implementation, you need to call Array.slice to make each row an Array[Double] and cache the result RDD. There is a plan to support block-wise input data and I will keep you informed. Best, Xiangrui On Tue, Mar 18, 2014 at 2:46 AM, Jaonary Rabarisoa
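A sketch of the Array.slice approach described above; the names flat, numRows, and dim are mine, and MLlib 0.9's KMeans is assumed to take an RDD[Array[Double]]:

    import org.apache.spark.mllib.clustering.KMeans

    // `flat` is the row-major Array[Double] holding numRows rows of size dim
    val rows = Array.tabulate(numRows)(i => flat.slice(i * dim, (i + 1) * dim))
    val data = sc.parallelize(rows).cache()  // cache the row RDD, as advised
    val model = KMeans.train(data, 10, 20)   // k = 10, maxIterations = 20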

Re: spark-shell fails

2014-03-18 Thread psteckler
Although sbt assembly reports success, I re-ran that step, and see errors like: Error extracting zip entry 'scala/tools/nsc/transform/UnCurry$UnCurryTransformer$$anonfun$14$$anonfun$apply (omitting rest of super-long path) (File name too long) Is this a problem with the 'zip' tool on my

Re: spark-shell fails

2014-03-18 Thread psteckler
OK, the problem was that the directory where I had installed Spark is encrypted. The particular encryption system appears to limit the length of file names. I re-installed on a vanilla partition, and spark-shell runs fine.

Maven repo for Spark pre-built with CDH4?

2014-03-18 Thread Punya Biswal
Hi all, The Maven central repo contains an artifact for spark 0.9.0 built with unmodified Hadoop, and the Cloudera repo contains an artifact for spark 0.9.0 built with CDH 5 beta. Is there a repo that contains spark-core built against a non-beta version of CDH (such as 4.4.0)? Punya
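Absent such an artifact, one possible workaround, offered as an assumption rather than an answer from this thread, is to depend on the stock spark-core and override the Hadoop client with the CDH4 build:

    // hypothetical sbt sketch; repository URL and CDH version are assumptions
    resolvers += "Cloudera repo" at
      "https://repository.cloudera.com/artifactory/cloudera-repos/"
    libraryDependencies ++= Seq(
      "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating",
      "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.4.0"
    )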

Re: possible bug in Spark's ALS implementation...

2014-03-18 Thread Michael Allman
I just ran a runtime performance comparison between 0.9.0-incubating and your als branch. I saw a 1.5x improvement in performance.

Re: possible bug in Spark's ALS implementation...

2014-03-18 Thread Xiangrui Meng
Glad to hear about the speed-up. Hopefully we can improve the implementation further in the future. -Xiangrui On Tue, Mar 18, 2014 at 1:55 PM, Michael Allman m...@allman.ms wrote: I just ran a runtime performance comparison between 0.9.0-incubating and your als branch. I saw a 1.5x improvement in

Regarding Successive operation on elements and recursively

2014-03-18 Thread yh18190
Hi, I am new to the Spark/Scala environment. Currently I am working on discrete wavelet transformation algos on time series data. I have to perform recursive additions on successive elements in RDDs. For example, for a list of elements (RDDs) -- a1 a2 a3 a4 -- the level-1 transformation is a1+a2 a3+a4 a1-a2
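For illustration, one level of this transform on a plain local sequence; a hedged sketch, since distributing it over an RDD would additionally need index-based pairing (e.g. keying elements by index / 2):

    // one level: pairwise sums followed by pairwise differences
    def level(xs: Seq[Double]): Seq[Double] = {
      val pairs = xs.grouped(2).collect { case Seq(a, b) => (a, b) }.toSeq
      pairs.map { case (a, b) => a + b } ++ pairs.map { case (a, b) => a - b }
    }
    // level(Seq(a1, a2, a3, a4)) == Seq(a1+a2, a3+a4, a1-a2, a3-a4)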

Re: Incrementally add/remove vertices in GraphX

2014-03-18 Thread Matei Zaharia
I just meant that you call union() before creating the RDDs that you pass to new Graph(). If you call it after it will produce other RDDs. The Graph() constructor actually shuffles and “indexes” the data to make graph operations efficient, so it’s not too easy to add elements after. You could
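A sketch of that ordering, with placeholder vertex and edge values; oldVerts and oldEdges stand for the RDDs already in hand:

    import org.apache.spark.graphx._

    // union the additions into the input RDDs *before* building the Graph,
    // since the constructor is what shuffles and indexes the data
    val allVerts = oldVerts.union(sc.parallelize(Seq((5L, "new vertex"))))
    val allEdges = oldEdges.union(sc.parallelize(Seq(Edge(5L, 1L, "new edge"))))
    val graph = Graph(allVerts, allEdges)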

Access original filename in a map function

2014-03-18 Thread Uri Laserson
Hi spark-folk, I have a directory full of files that I want to process using PySpark. There is some necessary metadata in the filename that I would love to attach to each record in that file. Using Java MapReduce, I would access ((FileSplit) context.getInputSplit()).getPath().getName() in the
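One workaround available at the time, sketched here in Scala rather than the PySpark the poster is using, and resting on my own assumptions: build one RDD per file, tag each record with its file name, and union the results.

    // list the inputs, tag records with their source path, then union
    val files = new java.io.File("/data/in").listFiles.map(_.getPath)
    val tagged = files.map(path => sc.textFile(path).map(rec => (path, rec)))
                      .reduce(_ union _)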

Re: There is an error in Graphx

2014-03-18 Thread ankurdave
This problem occurs because graph.triplets generates an iterator that reuses the same EdgeTriplet object for every triplet in the partition. The workaround is to force a copy using graph.triplets.map(_.copy()). The solution in the AMPCamp tutorial is mistaken -- I'm not sure if that ever worked.

Re: There is an error in Graphx

2014-03-18 Thread ankurdave
The workaround is to force a copy using graph.triplets.map(_.copy()). Sorry, this actually won't copy the entire triplet, only the attributes defined in Edge. The right workaround is to copy the EdgeTriplet explicitly: graph.triplets.map { et => val et2 = new EdgeTriplet[VD, ED] // Replace
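A hedged completion of that pattern; VD and ED stand for the graph's concrete vertex and edge attribute types, and the exact field list is my reconstruction:

    val safeTriplets = graph.triplets.map { et =>
      val et2 = new EdgeTriplet[VD, ED]  // substitute the concrete types
      et2.srcId = et.srcId
      et2.dstId = et.dstId
      et2.srcAttr = et.srcAttr
      et2.dstAttr = et.dstAttr
      et2.attr = et.attr
      et2
    }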

Re: sample data for pagerank?

2014-03-18 Thread ankurdave
The examples in graphx/data are meant to show the input data format, but if you want to play around with larger and more interesting datasets, we've been using the following ones, among others: - SNAP's web-Google dataset (5M edges): https://snap.stanford.edu/data/web-Google.html - SNAP's
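Once downloaded, a dataset like web-Google can be loaded and ranked along these lines; a sketch in which the file path and tolerance are placeholders:

    import org.apache.spark.graphx._

    // load the SNAP edge list and run PageRank to the given tolerance
    val graph = GraphLoader.edgeListFile(sc, "web-Google.txt")
    val ranks = graph.pageRank(0.0001).vertices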