How to change the values in Array of Bytes

2014-09-06 Thread Deep Pradhan
Hi, I have an array of bytes and I have filled the array with 0 in all the positions. *var Array = Array.fill[Byte](10)(0)* Now, if certain conditions are satisfied, I want to change some elements of the array to 1 instead of 0. If I run, *if (Array.apply(index)==0) Array.apply(index) = 1*

Re: Support R in Spark

2014-09-06 Thread oppokui
Cool! It is very good news. Can’t wait for it. Kui On Sep 5, 2014, at 1:58 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: Thanks Kui. SparkR is a pretty young project, but there are a bunch of things we are working on. One of the main features is to expose a data frame

Re: error: type mismatch while Union

2014-09-06 Thread Dhimant
I am using Spark version 1.0.2 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-Union-tp13547p13618.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to change the values in Array of Bytes

2014-09-06 Thread Aaron Davidson
More of a Scala question than Spark, but apply here can be written with just parentheses like this: val array = Array.fill[Byte](10)(0) if (array(index) == 0) { array(index) = 1 } The second instance of array(index) = 1 is actually not calling apply, but update. It's a scala-ism that's usually
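
A minimal sketch of the apply/update desugaring described above, runnable in a plain Scala REPL (the index value is illustrative):

    // Fill a 10-element byte array with zeros.
    val array = Array.fill[Byte](10)(0)
    val index = 3

    // Reading array(index) desugars to array.apply(index)...
    if (array(index) == 0) {
      // ...while assigning desugars to array.update(index, 1), not apply.
      array(index) = 1
    }

    // The same thing with the explicit method calls spelled out:
    if (array.apply(index) == 0) {
      array.update(index, 1)
    }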

Re: error: type mismatch while Union

2014-09-06 Thread Aaron Davidson
Are you doing this from the spark-shell? You're probably running into https://issues.apache.org/jira/browse/SPARK-1199 which should be fixed in 1.1. On Sat, Sep 6, 2014 at 3:03 AM, Dhimant dhimant84.jays...@gmail.com wrote: I am using Spark version 1.0.2 -- View this message in context:
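
For reference, a hedged sketch of the shape of code that has been reported to hit this in spark-shell (the case class and data are made up; the same code compiles cleanly in a standalone application or in Spark 1.1+):

    // Defining a case class inside the REPL and unioning RDDs of it is the
    // scenario SPARK-1199 covers; in affected shell versions the union can
    // fail with a spurious "type mismatch" error.
    case class Record(id: Int, value: String)

    val rdd1 = sc.parallelize(Seq(Record(1, "a"), Record(2, "b")))
    val rdd2 = sc.parallelize(Seq(Record(3, "c")))

    val combined = rdd1.union(rdd2)
    combined.count()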

Re: question on replicate() in blockManager.scala

2014-09-06 Thread Aaron Davidson
Looks like that's BlockManagerWorker.syncPutBlock(), which is in an if check, perhaps obscuring its existence. On Fri, Sep 5, 2014 at 2:19 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, var cachedPeers: Seq[BlockManagerId] = null private def replicate(blockId: String, data:

Re: Task not serializable

2014-09-06 Thread Sean Owen
I disagree that the generally right change is to try to make the classes serializable. Usually, classes that are not serializable are not supposed to be serialized. You're using them in a way that's causing them to be serialized, and that's probably not desired. For example, this is wrong: val
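
A hedged sketch of the pattern being described, assuming a spark-shell-style sc (ThirdPartyParser is a hypothetical stand-in for any non-serializable class):

    // A hypothetical class that does not extend Serializable.
    class ThirdPartyParser { def parse(s: String): Int = s.length }

    val lines = sc.parallelize(Seq("spark", "scala"))

    // Problematic: the parser is created on the driver and captured by the
    // map closure, so Spark has to serialize it and the job fails with
    // "Task not serializable".
    val parser = new ThirdPartyParser()
    val bad = lines.map(line => parser.parse(line))

    // One common alternative: construct the object inside the task, e.g. once
    // per partition, so it never has to cross the driver/executor boundary.
    val good = lines.mapPartitions { iter =>
      val localParser = new ThirdPartyParser()
      iter.map(localParser.parse)
    }
    good.collect()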

Re: Getting the type of an RDD in spark AND pyspark

2014-09-06 Thread Aaron Davidson
Pretty easy to do in Scala: rdd.elementClassTag.runtimeClass You can access this method from Python as well by using the internal _jrdd. It would look something like this (warning, I have not tested it): rdd._jrdd.classTag().runtimeClass() (The method name is classTag for JavaRDDLike, and
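
A short sketch of the Scala side of this, assuming a spark-shell-style sc (the sample data is illustrative):

    import org.apache.spark.rdd.RDD

    val ints: RDD[Int] = sc.parallelize(Seq(1, 2, 3))
    val words: RDD[String] = sc.parallelize(Seq("a", "b"))

    // elementClassTag carries the RDD's element type; runtimeClass gives the
    // corresponding JVM class.
    println(ints.elementClassTag.runtimeClass)   // int
    println(words.elementClassTag.runtimeClass)  // class java.lang.String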

unsubscribe

2014-09-06 Thread Murali Raju

Re: unsubscribe

2014-09-06 Thread Derek Schoettle
Unsubscribe On Sep 6, 2014, at 7:48 AM, Murali Raju murali.r...@infrastacks.com wrote:

Re: Support R in Spark

2014-09-06 Thread Christopher Nguyen
Hi Kui, DDF (open sourced) also aims to do something similar, adding RDBMS idioms, and is already implemented on top of Spark. One philosophy is that the DDF API aggressively hides the notion of parallel datasets, exposing only (mutable) tables to users, on which they can apply R and other

Re: Task not serializable

2014-09-06 Thread Sarath Chandra
Thanks Alok, Sean. As suggested by Sean, I tried a sample program. I wrote a function in which I made a reference to a class from a third-party library that is not serializable and passed it to my map function. On executing it I got the same exception. Then I modified the program, removed the function and

Re: How spark parallelize maps Slices to tasks/executors/workers

2014-09-06 Thread Matthew Farrellee
On 09/04/2014 09:55 PM, Mozumder, Monir wrote: I have this 2-node cluster setup, where each node has 4-cores. MASTER (Worker-on-master) (Worker-on-node1) (slaves(master,node1)) SPARK_WORKER_INSTANCES=1 I am trying to understand Spark's parallelize behavior.
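
A small sketch of the part that is easy to check directly, assuming a spark-shell-style sc on a cluster like the one described (the numbers are illustrative):

    val data = 1 to 100

    // With no explicit slice count, parallelize falls back to
    // spark.default.parallelism.
    val defaultSliced = sc.parallelize(data)
    println(defaultSliced.partitions.length)

    // An explicit numSlices fixes the number of partitions; each partition
    // becomes one task when an action such as count() runs.
    val eightSliced = sc.parallelize(data, numSlices = 8)
    println(eightSliced.partitions.length)  // 8
    eightSliced.count()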

Spark SQL check if query is completed (pyspark)

2014-09-06 Thread jamborta
Hi, I am using Spark SQL to run some administrative queries and joins (e.g. create table, insert overwrite, etc.), where the query does not return any data. I noticed that if the query fails it prints an error message on the console, but does not actually throw an exception (this is Spark 1.0.2).

Re: unsubscribe

2014-09-06 Thread Nicholas Chammas
To unsubscribe send an email to user-unsubscr...@spark.apache.org Links to sub/unsub are here: https://spark.apache.org/community.html On Sat, Sep 6, 2014 at 7:52 AM, Derek Schoettle dscho...@us.ibm.com wrote: Unsubscribe On Sep 6, 2014, at 7:48 AM, Murali Raju murali.r...@infrastacks.com

Re: Support R in Spark

2014-09-06 Thread oppokui
Thanks, Christopher. I saw it before; it is amazing. Last time I tried to download it from adatao, but got no response after filling in the form. How can I download it or its source code? What is the license? Kui On Sep 6, 2014, at 8:08 PM, Christopher Nguyen c...@adatao.com wrote: Hi Kui, DDF

Re: Spark SQL check if query is completed (pyspark)

2014-09-06 Thread Davies Liu
SQLContext.sql() will return a SchemaRDD; you need to call collect() to pull the data in. On Sat, Sep 6, 2014 at 6:02 AM, jamborta jambo...@gmail.com wrote: Hi, I am using Spark SQL to run some administrative queries and joins (e.g. create table, insert overwrite, etc), where the query
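
The thread is about PySpark, but the laziness is the same on the Scala side; a hedged Scala sketch, assuming a spark-shell-style sc and a made-up table name:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // In Spark 1.0.x, sql() just builds a SchemaRDD; nothing is executed yet,
    // so a failing query may not surface an error here.
    val result = sqlContext.sql("SELECT * FROM some_table")

    // An action such as collect() (or count()) forces execution and will
    // throw if the query actually fails.
    val rows = result.collect()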

Re: Getting the type of an RDD in spark AND pyspark

2014-09-06 Thread Davies Liu
But you cannot get what you expect in PySpark, because the RDD in Scala is serialized, so it will always be RDD[Array[Byte]], whatever the type of the RDD in Python is. Davies On Sat, Sep 6, 2014 at 4:09 AM, Aaron Davidson ilike...@gmail.com wrote: Pretty easy to do in Scala:

Q: About scenarios where driver execution flow may block...

2014-09-06 Thread didata
Hello friends: I have a theory question about call blocking in a Spark driver. Consider this (admittedly contrived =:)) snippet to illustrate this question... x = rdd01.reduceByKey() # or maybe some other 'shuffle-requiring action'. b = sc.broadcast(x.take(20)) # Or any statement that
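
A Scala rendering of the snippet above, assuming a spark-shell-style sc (the data and key function are illustrative):

    val rdd01 = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

    // reduceByKey is a transformation: it only records lineage and returns
    // immediately without launching any work.
    val x = rdd01.reduceByKey(_ + _)

    // take(20) is an action: the driver blocks on this line until the
    // results come back from the executors.
    val firstTwenty = x.take(20)

    // broadcast() then runs on the driver with the already-local array.
    val b = sc.broadcast(firstTwenty)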

Re: Is there any way to control the parallelism in LogisticRegression

2014-09-06 Thread DB Tsai
Yes. But you need to store the RDD as *serialized* Java objects. See the section on storage levels at http://spark.apache.org/docs/latest/programming-guide.html Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn:
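
A brief sketch of what that looks like, assuming a spark-shell-style sc (the data is only a placeholder for the training set discussed in the thread):

    import org.apache.spark.storage.StorageLevel

    val points = sc.parallelize(1 to 1000).map(i => (i.toDouble, i % 2))

    // MEMORY_ONLY_SER stores each partition as serialized Java objects,
    // trading extra CPU on access for a smaller memory footprint.
    points.persist(StorageLevel.MEMORY_ONLY_SER)
    points.count()  // materializes the cache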

Re: Support R in Spark

2014-09-06 Thread Christopher Nguyen
Hi Kui, sorry about that. That link you mentioned is probably the one for the products. We don't have one pointing from adatao.com to ddf.io; maybe we'll add it. As for access to the code base itself, I think the team has already created a GitHub repo for it, and should open it up within a few

Re: prepending jars to the driver class path for spark-submit on YARN

2014-09-06 Thread Victor Tso-Guillen
I ran into the same issue. What I did was use the Maven Shade plugin to shade my version of the httpcomponents libraries into another package. On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza pesp...@societyconsulting.com wrote: Hey - I’m struggling with some dependency issues with