[Stream] Checkpointing | chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-00001-attempt-215': No such file or directory

2014-09-01 Thread Aniket Bhatnagar
On my local (Windows) dev environment, I have been trying to get Spark Streaming running to test my real-time(ish) jobs. I have set the checkpoint directory as /tmp/spark and have installed the latest Cygwin. I keep getting the following error: org.apache.hadoop.util.Shell$ExitCodeException: chmod:

Re: [Stream] Checkpointing | chmod: cannot access `/cygdrive/d/tmp/spark/f8e594bf-d940-41cb-ab0e-0fd3710696cb/rdd-57/.part-00001-attempt-215': No such file or directory

2014-09-01 Thread Aniket Bhatnagar
Hi everyone, It turns out that I had Chef installed and its chmod has higher preference than Cygwin's chmod in the PATH. I fixed the environment variable and now it's working fine. On 1 September 2014 11:48, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: On my local (windows) dev
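The PATH shadowing described above is easy to diagnose from a shell; a minimal sketch (the conflicting chmod location will differ per machine):

```shell
# List every chmod on the PATH in lookup order; Hadoop's Shell utility
# invokes whichever comes first.
type -a chmod

# If another tool's chmod (in this case, one bundled with Chef) shadows
# Cygwin's, move Cygwin's bin directory to the front of PATH:
export PATH="/usr/bin:$PATH"
command -v chmod
```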

operations on replicated RDD

2014-09-01 Thread rapelly kartheek
Hi, An RDD replicated by an application is owned by only that application; no other applications can share it. Then what is the motive behind providing the RDD replication feature? What operations can be performed on the replicated RDD? Thank you!!! -karthik

Spark driver application can not connect to Spark-Master

2014-09-01 Thread moon soo Lee
Hi, I'm developing an application with Spark. My Java application tries to create a Spark context like: public SparkContext createSparkContext() { String execUri = System.getenv("SPARK_EXECUTOR_URI"); String[] jars = SparkILoop.getAddedJars();

Can value in spark-defaults.conf support system variables?

2014-09-01 Thread Zhanfeng Huo
Hi,all: Can value in spark-defaults.conf support system variables? Such as mess = ${user.home}/${user.name}. Best Regards Zhanfeng Huo

Has anybody faced SPARK-2604 issue regarding Application hang state

2014-09-01 Thread twinkle sachdeva
Hi, Has anyone else also experienced https://issues.apache.org/jira/browse/SPARK-2604? It is an edge-case scenario of misconfiguration, where the requested executor memory is the same as the maximum memory allowed by YARN. In such a situation, the application stays in a hung state, and the reason is not logged
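The ceiling in question is YARN's per-container maximum; a sketch of the relevant setting (the value is illustrative), against which the executor request plus Spark's memory overhead must leave headroom:

```xml
<!-- yarn-site.xml: the maximum memory YARN will grant a single container -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
```

Requesting spark.executor.memory equal to this value leaves no room for the overhead Spark adds on top, which is the misconfiguration the ticket describes.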

Value of SHUFFLE_PARTITIONS

2014-09-01 Thread Chirag Aggarwal
Hi, Currently the number of shuffle partitions is a config-driven parameter (SHUFFLE_PARTITIONS). This means that anyone running a spark-sql query must first analyze what value of SHUFFLE_PARTITIONS would give the best performance for the query. Shouldn't there be a logic

[Streaming] Triggering an action in absence of data

2014-09-01 Thread Aniket Bhatnagar
Hi all, I am struggling to implement a use case wherein I need to trigger an action in case no data has been received for X amount of time. I haven't been able to figure out an easy way to do this. No state/foreach methods get called when no data has arrived. I thought of generating a 'tick'
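The 'tick' idea can be sketched without Spark: remember when data last arrived and fire once the silent gap exceeds a threshold. In a streaming job this check could run on every batch interval (empty batches included), e.g. from foreachRDD, with the timestamp kept driver-side. All names below are hypothetical:

```python
import time

class InactivityTrigger:
    """Fire an action when no records arrive for timeout_s seconds."""

    def __init__(self, timeout_s, action):
        self.timeout_s = timeout_s
        self.action = action
        self.last_seen = time.time()

    def on_batch(self, record_count, now=None):
        # Call once per batch, even for empty batches.
        now = time.time() if now is None else now
        if record_count > 0:
            self.last_seen = now
        elif now - self.last_seen >= self.timeout_s:
            self.action()
            self.last_seen = now  # don't re-fire on every empty batch

fired = []
trigger = InactivityTrigger(timeout_s=60, action=lambda: fired.append(True))
trigger.on_batch(5, now=1000.0)   # data arrives
trigger.on_batch(0, now=1030.0)   # 30s of silence: below threshold
trigger.on_batch(0, now=1061.0)   # 61s of silence: action fires
# fired == [True]
```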

Re: Problem Accessing Hive Table from hiveContext

2014-09-01 Thread Yin Huai
Hello Igor, Although Decimal is supported, Hive 0.12 does not support user-definable precision and scale (it was introduced in Hive 0.13). Thanks, Yin On Sat, Aug 30, 2014 at 1:50 AM, Zitser, Igor igor.zit...@citi.com wrote: Hi All, New to spark and using Spark 1.0.2 and hive 0.12. If
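The difference in DDL terms, with made-up table names:

```sql
-- Hive 0.12: only the bare decimal type is accepted
CREATE TABLE amounts_012 (amount DECIMAL);

-- Hive 0.13+: precision and scale become user-definable
CREATE TABLE amounts_013 (amount DECIMAL(10, 2));
```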

Re: how to filter value in spark

2014-09-01 Thread Matthew Farrellee
you could join; it'll give you the intersection and a list of the labels where the value was found. a.join(b).collect Array[(String, (String, String))] = Array((4,(a,b)), (3,(a,b))) best, matt On 08/31/2014 09:23 PM, Liu, Raymond wrote: You could use cogroup to combine RDDs in one RDD for
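A plain-Python analogue of the join semantics above, for readers without a REPL handy: join keeps only keys present in both datasets and pairs up their values (the sample data is made up to mirror the REPL output):

```python
a = [("3", "a"), ("4", "a"), ("5", "a")]
b = [("3", "b"), ("4", "b"), ("6", "b")]

# Equivalent of a.join(b): keep keys found in both, pair the values.
b_map = dict(b)
joined = [(k, (v, b_map[k])) for k, v in a if k in b_map]
# joined == [("3", ("a", "b")), ("4", ("a", "b"))]
```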

Re: transforming a Map object to RDD

2014-09-01 Thread Matthew Farrellee
and in python, map = {'a': 1, 'b': 2, 'c': 3} rdd = sc.parallelize(map.items()) rdd.collect() [('a', 1), ('c', 3), ('b', 2)] best, matt On 08/28/2014 07:01 PM, Sean Owen wrote: val map = Map("foo" -> 1, "bar" -> 2, "baz" -> 3) val rdd = sc.parallelize(map.toSeq) rdd is an RDD[(String,Int)] and

Spark and Shark

2014-09-01 Thread arthur.hk.c...@gmail.com
Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source). spark: 1.0.2 shark: 0.9.2 hadoop: 2.4.1 java: java version “1.7.0_67” protobuf: 2.5.0 I have tried the smoke test in shark but got “java.util.NoSuchElementException” error, can you please advise

Re: Spark and Shark

2014-09-01 Thread Michael Armbrust
I don't believe that Shark works with Spark 1.0. Have you considered trying Spark SQL? On Mon, Sep 1, 2014 at 8:21 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source). spark: 1.0.2

RE: Spark and Shark

2014-09-01 Thread Paolo Platter
We tried to connect the old Simba Shark ODBC driver to the Thrift JDBC Server with Spark 1.1 RC2 and it works fine. Best Paolo Paolo Platter Agile Lab CTO From: Michael Armbrust mich...@databricks.com Sent: Monday, 1 September 2014 19:43 To:

Re: Time series forecasting

2014-09-01 Thread filipus
i guess it is not a question of Spark but a question of your dataset: you need to think about what you want to model and how you can shape the data in such a way that Spark can use it. Akima is a technique I know: a_{t+1} = C1 * a_{t} + C2 * a_{t-1} + ... + C6 * a_{t-5}. Spark can find the
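The autoregressive form above implies a concrete data shape: each training row pairs the previous six values (features) with the next value (label). A minimal plain-Python sketch of that reshaping (function name and sample series are made up):

```python
def lagged_rows(series, lags=6):
    """Turn a series into (features, label) rows for
    a_{t+1} = C1*a_t + C2*a_{t-1} + ... + C6*a_{t-5}."""
    return [(series[i - lags:i], series[i]) for i in range(lags, len(series))]

series = [1, 2, 3, 4, 5, 6, 7, 8]
rows = lagged_rows(series)
# rows == [([1, 2, 3, 4, 5, 6], 7), ([2, 3, 4, 5, 6, 7], 8)]
```

Rows in this shape can then be fed to a linear-regression routine to fit the coefficients C1..C6.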

Spark 1.0.2 Can GroupByTest example be run in Eclipse without change

2014-09-01 Thread Shing Hing Man
Hi, I have noticed that the GroupByTest example in https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala has been changed to be run using spark-submit. Previously, I set local as the first command line parameter, and this enabled me to
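One way to keep launching such examples from Eclipse without editing the source is to supply the master as a system property, which SparkConf reads when it loads defaults; a sketch (added as a VM argument in the Eclipse run configuration):

```
-Dspark.master=local
```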

Re: Can value in spark-defaults.conf support system variables?

2014-09-01 Thread Andrew Or
No, not currently. 2014-09-01 2:53 GMT-07:00 Zhanfeng Huo huozhanf...@gmail.com: Hi,all: Can value in spark-defaults.conf support system variables? Such as mess = ${user.home}/${user.name}. Best Regards -- Zhanfeng Huo
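Since spark-defaults.conf is read verbatim, a common workaround is to expand the variable in the shell and pass the result on the command line instead; a sketch (the property and path are illustrative):

```shell
# Expand ${HOME} in the shell, where variable substitution does work...
SPARK_LOCAL_DIR_CONF="spark.local.dir=$HOME/spark-tmp"
echo "$SPARK_LOCAL_DIR_CONF"
# ...then hand it to the launcher: spark-submit --conf "$SPARK_LOCAL_DIR_CONF" ...
```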

zip equal-length but unequally-partition

2014-09-01 Thread Kevin Jung
http://www.adamcrume.com/blog/archive/2014/02/19/fixing-sparks-rdd-zip Please check this URL. I got the same problem in v1.0.1. In some cases, an RDD loses several elements after zip, so the total count of the ZippedRDD is less than
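A Spark-agnostic sketch of the usual workaround: instead of relying on zip's partition alignment, key both sides by element index and pair them up (in Spark terms, zipWithIndex on each RDD followed by a join; the sample data is made up):

```python
a = ["x", "y", "z"]
b = [1, 2, 3]

# Key each element by its position, then pair by key instead of by
# partition layout.
da, db = dict(enumerate(a)), dict(enumerate(b))
pairs = [(da[i], db[i]) for i in sorted(da)]
# pairs == [("x", 1), ("y", 2), ("z", 3)]
```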