Help with Spark Streaming

2014-11-15 Thread Bahubali Jain
Hi, I am trying to use Spark Streaming, but I am struggling with word count :( I want to consolidate the output of the word count (not on a per-window basis), so I am using updateStateByKey(), but for some reason this is not working. The function itself is not being invoked (I do not see the sysout output on
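For reference, a minimal sketch of a stateful word count along these lines (the socket source, port, and checkpoint path are placeholders; Spark 1.1-era streaming API assumed):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._ // pair-DStream operations

    val ssc = new StreamingContext(new SparkConf().setAppName("StatefulWordCount"), Seconds(10))
    ssc.checkpoint("checkpoint") // updateStateByKey requires a checkpoint directory

    // Merge this batch's counts into the running total; returning None drops the key.
    val updateCounts = (values: Seq[Int], state: Option[Int]) =>
      Some(values.sum + state.getOrElse(0)): Option[Int]

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    val totals = words.map(w => (w, 1)).updateStateByKey(updateCounts)
    totals.print() // an output operation is required, or nothing runs

    ssc.start()
    ssc.awaitTermination()

A missing checkpoint directory or a missing output operation are two common reasons the update function never fires.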

repartition combined with zipWithIndex get stuck

2014-11-15 Thread lev
Hi, I'm having trouble using both zipWithIndex and repartition. When I use them together, the following action gets stuck and won't return. I'm using Spark 1.1.0. These two lines work as expected:

scala> sc.parallelize(1 to 10).repartition(10).count()
res0: Long = 10

scala> sc.parallelize(1 to
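Based on the workaround Xiangrui posts later in this thread, the combination that hangs appears to be zipWithIndex followed by repartition, roughly:

    val a = sc.parallelize(1 to 10).zipWithIndex()
    a.repartition(10).count() // reportedly hangs on Spark 1.1.0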

Re: SparkSQL exception on cached parquet table

2014-11-15 Thread Cheng Lian
Hi Sadhan, Could you please provide the stack trace of the ArrayIndexOutOfBoundsException (if any)? The reason the first query succeeds is that Spark SQL doesn't bother reading all data from the table to answer COUNT(*). In the second case, however, the whole table is asked to be
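To illustrate the distinction Cheng is drawing, a minimal sketch (the path and table name are hypothetical; Spark 1.1 API assumed):

    // COUNT(*) needs very little of the actual column data, whereas a query
    // that returns rows must decode the table, which is where a bad file
    // or schema mismatch would surface.
    sqlContext.parquetFile("/path/to/table").registerTempTable("t")
    sqlContext.sql("CACHE TABLE t")
    sqlContext.sql("SELECT COUNT(*) FROM t").collect()   // may succeed
    sqlContext.sql("SELECT * FROM t LIMIT 10").collect() // may hit the exception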

Re: saveAsTextFile error

2014-11-15 Thread Prannoy
Hi Niko, Have you tried running it while keeping the wordCounts.print()? Possibly the import of the package *org.apache.spark.streaming._* is missing, so during sbt package it is unable to locate the saveAsTextFile API. Go to
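As a point of reference, on a DStream the save method is actually saveAsTextFiles (taking a prefix and suffix), and the pair-DStream operations come in with the streaming imports. A minimal sketch, assuming wordCounts is a DStream of (word, count) pairs and the output path is a placeholder:

    import org.apache.spark.streaming.StreamingContext._ // pair-DStream operations

    wordCounts.print()
    wordCounts.saveAsTextFiles("hdfs:///tmp/wordcounts", "txt")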

RE: Submitting Python Applications from Remote to Master

2014-11-15 Thread Ashic Mahtab
Hi Ben, I haven't tried it with Python, but the instructions are the same as for compiled Scala (jar) apps. What it's saying is that it's not possible to offload the entire job to the master (a la Hadoop) in a fire-and-forget (or rather submit-and-forget) manner when running standalone.
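A sketch of the client-mode submission this implies (host, port, and file name are placeholders):

    spark-submit --master spark://master-host:7077 app.py

Here the driver stays on the submitting machine for the lifetime of the job, which is why the submission cannot simply be fired and forgotten.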

Re: Submitting Python Applications from Remote to Master

2014-11-15 Thread Ognen Duzlevski
Ashic, Thanks for your email. Two things: 1. I think a whole lot of data scientists and other people would love it if they could just fire off jobs from their laptops; it is, in my opinion, a commonly desired use case. 2. Did anyone actually get the Ooyala job server to work? I asked that

RE: Submitting Python Applications from Remote to Master

2014-11-15 Thread Ashic Mahtab
Hi Ognen, Currently the docs say: "Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or Python applications." So it seems like YARN + Scala is the only option for fire-and-forget. It shouldn't be too hard to create a proxy submitter, but yes, that does involve another
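In other words, the fire-and-forget shape available at this point looks roughly like the following (class and jar names are placeholders; 1.1-era YARN syntax):

    spark-submit --master yarn-cluster --class com.example.Main app.jar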

using zip gets EOFError error

2014-11-15 Thread chocjy
I was trying to zip one RDD with another RDD. I store my matrix in HDFS and load it as

Ab_rdd = sc.textFile('data/Ab.txt', 100)

If I do

idx = sc.parallelize(range(m), 100)  # m is the number of records in Ab.txt
print matrix_Ab.matrix.zip(idx).first()

I got the following error: If I store my
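For what it's worth, zip requires both RDDs to have the same number of partitions and the same number of elements in each partition, which textFile plus a separately parallelized range cannot in general guarantee. A hedged alternative (shown in Scala; the PySpark equivalent is analogous, and the path is a placeholder) is to derive the indices from the data itself:

    // zipWithIndex assigns indices consistent with the RDD's own partitioning,
    // so no second RDD has to line up with it element by element.
    val ab = sc.textFile("data/Ab.txt", 100)
    val indexed = ab.zipWithIndex().map { case (line, i) => (i, line) }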

Re: Using data in RDD to specify HDFS directory to write to

2014-11-15 Thread jschindler
UPDATE: I have systematically removed and added things to the job and have figured out that the construction of the SparkContext object is what is causing it to fail. The last run contained the code below. I keep losing executors, apparently, and I'm not sure why. Some of the
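One common way this particular failure arises is constructing a SparkContext somewhere other than once on the driver, e.g. inside a transformation or a streaming action. A minimal sketch of the safe shape (the app name and stand-in data are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    // Create exactly one SparkContext, on the driver, and reuse it; building
    // another inside code shipped to executors is a common source of failures.
    val sc = new SparkContext(new SparkConf().setAppName("hdfs-writer"))
    val dirs = sc.parallelize(Seq("2014/11/14", "2014/11/15")) // stand-in for the real RDD
    dirs.collect().foreach(println)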

Pagerank implementation

2014-11-15 Thread tom85
Hi, I wonder whether the PageRank implementation is correct. More specifically, I am looking at the following function from PageRank.scala (https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala), which is given to Pregel: def vertexProgram(id:
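For context, the update that vertex program is expected to implement is the standard PageRank iteration. A hedged reconstruction from that era of the file (resetProb defaults to 0.15 and msgSum is the rank mass received from in-neighbors):

    // resetProb is captured from the enclosing PageRank run, 0.15 by default.
    val resetProb = 0.15
    def vertexProgram(id: Long, attr: Double, msgSum: Double): Double =
      resetProb + (1.0 - resetProb) * msgSum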

How to incrementally compile spark examples using mvn

2014-11-15 Thread Yiming (John) Zhang
Hi, I have already successfully compiled and run the Spark examples. My problem is that if I make some modifications (e.g., to SparkPi.scala or LogQuery.scala) I have to use mvn -DskipTests package to rebuild the whole Spark project and wait a relatively long time. I also tried mvn scala:cc as

Re: How to incrementally compile spark examples using mvn

2014-11-15 Thread Marcelo Vanzin
I haven't tried scala:cc, but you can ask Maven to just build a particular sub-project. For example: mvn -pl :spark-examples_2.10 compile On Sat, Nov 15, 2014 at 5:31 PM, Yiming (John) Zhang sdi...@gmail.com wrote: Hi, I have already successfully compiled and run the Spark examples. My problem
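If the examples depend on other modules that have also changed, adding Maven's -am (--also-make) flag, e.g. mvn -pl :spark-examples_2.10 -am compile, rebuilds those upstream modules as well.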

Re: Client application that calls Spark and receives an MLlib model Scala Object and then predicts without Spark installed on hadoop

2014-11-15 Thread Xiangrui Meng
If Spark is not installed on the client side, you won't be able to deserialize the model. Instead of serializing the model object, you may serialize the model weights array and implement predict on the client side. -Xiangrui On Fri, Nov 14, 2014 at 2:54 PM, xiaoyan yu xiaoyan...@gmail.com wrote:
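A sketch of what this suggestion amounts to for a linear model (the file name is a placeholder; the weights array and intercept would come from the trained MLlib model, e.g. model.weights.toArray and model.intercept):

    import java.io.{FileOutputStream, ObjectOutputStream}

    // Spark side: persist just the raw parameters, not the model object,
    // so the client needs no Spark classes to read them back.
    def saveParams(weights: Array[Double], intercept: Double, path: String): Unit = {
      val out = new ObjectOutputStream(new FileOutputStream(path))
      out.writeObject((weights, intercept))
      out.close()
    }

    // Client side, with no Spark dependency: prediction is a plain dot product.
    def predict(weights: Array[Double], intercept: Double, x: Array[Double]): Double =
      weights.zip(x).map { case (w, xi) => w * xi }.sum + intercept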

Re: repartition combined with zipWithIndex get stuck

2014-11-15 Thread Xiangrui Meng
This is a bug. Could you make a JIRA? -Xiangrui On Sat, Nov 15, 2014 at 3:27 AM, lev kat...@gmail.com wrote: Hi, I'm having trouble using both zipWithIndex and repartition. When I use them both, the following action will get stuck and won't return. I'm using spark 1.1.0. Those 2 lines

Re: repartition combined with zipWithIndex get stuck

2014-11-15 Thread Xiangrui Meng
I think I understand where the bug is now. I created a JIRA (https://issues.apache.org/jira/browse/SPARK-4433) and will make a PR soon. -Xiangrui On Sat, Nov 15, 2014 at 7:39 PM, Xiangrui Meng men...@gmail.com wrote: This is a bug. Could you make a JIRA? -Xiangrui On Sat, Nov 15, 2014 at 3:27

Re: repartition combined with zipWithIndex get stuck

2014-11-15 Thread Xiangrui Meng
PR: https://github.com/apache/spark/pull/3291. For now, here is a workaround:

val a = sc.parallelize(1 to 10).zipWithIndex()
a.partitions // call .partitions explicitly
a.repartition(10).count()

Thanks for reporting the bug! -Xiangrui On Sat, Nov 15, 2014 at 8:38 PM, Xiangrui Meng

Re: SparkSQL exception on cached parquet table

2014-11-15 Thread sadhan
Hi Cheng, Thanks for your response. Here is the stack trace from the YARN logs: -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-exception-on-cached-parquet-table-tp18978p19020.html Sent from the Apache Spark User List mailing list archive at

SparkSQL exception on spark.sql.codegen

2014-11-15 Thread Eric Zhen
Hi all, We ran Spark SQL on TPC-DS benchmark query Q19 with spark.sql.codegen=true and got the exceptions below. Has anyone else seen these before?

java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.execution.SparkPlan.newProjection(SparkPlan.scala:92)
    at
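For anyone trying to reproduce, codegen is toggled through the SQL configuration; a minimal sketch (the query is a placeholder; Spark 1.1 API assumed, where codegen was still experimental):

    // spark.sql.codegen compiles expression evaluation to bytecode at runtime.
    sqlContext.setConf("spark.sql.codegen", "true")
    sqlContext.sql("SELECT COUNT(*) FROM some_table").collect()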