broken UI in 2.3?

2018-03-05 Thread Nan Zhu
Hi, all. I am experiencing some issues in the UI when using 2.3: when I clicked the executor/storage tab, I got the following exception: java.lang.NullPointerException at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) at

Re: Palantir release under org.apache.spark?

2018-01-09 Thread Nan Zhu
nvm. On Tue, Jan 9, 2018 at 9:42 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > Hi, all > > Out of curiosity, I just found a bunch of Palantir releases under > org.apache.spark in Maven Central (https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11)? > >

Palantir release under org.apache.spark?

2018-01-09 Thread Nan Zhu
Hi, all. Out of curiosity, I just found a bunch of Palantir releases under org.apache.spark in Maven Central (https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11)? Is it on purpose? Best, Nan

Re: --jars does not take remote jar?

2017-05-02 Thread Nan Zhu
I see. Thanks! On Tue, May 2, 2017 at 9:12 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > On Tue, May 2, 2017 at 9:07 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > I have no easy way to pass the jar path to those forked Spark > > applications? (except that I down

Re: --jars does not take remote jar?

2017-05-02 Thread Nan Zhu
May 2, 2017 at 8:43 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote: > > Hi, all > > > > For some reason, I tried to pass an HDFS path to the --jars option in > > spark-submit > > > > According to the document, > > http://spark.apache.org/docs/latest/submi

--jars does not take remote jar?

2017-05-02 Thread Nan Zhu
Hi, all. For some reason, I tried to pass an HDFS path to the --jars option in spark-submit. According to the document, http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management, --jars should accept remote paths. However, in the implementation,
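For context, the kind of invocation being discussed would look roughly like this sketch; the class name, jar names, and paths are placeholders, not taken from the thread:

    # hypothetical invocation; class name, jar names, and paths are placeholders
    ./bin/spark-submit \
      --class com.example.MyApp \
      --master yarn \
      --jars hdfs:///libs/extra-dep.jar \
      my-app.jar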

Re: Azure Event Hub with Pyspark

2017-04-20 Thread Nan Zhu
DocDB does have a Java client? Is anything preventing you from using that? Get Outlook for iOS From: ayan guha Sent: Thursday, April 20, 2017 9:24:03 PM To: Ashish Singh Cc: user Subject: Re: Azure Event Hub with Pyspark Hi yes,

[Package Release] Widely accepted XGBoost now available in Spark

2016-03-16 Thread Nan Zhu
are more than welcome to join us and contribute to the project! For more details of distributed XGBoost, you can refer to the recently published paper: http://arxiv.org/abs/1603.02754 Best, -- Nan Zhu http://codingcat.me

Release Announcement: XGBoost4J - Portable Distributed XGBoost in Spark, Flink and Dataflow

2016-03-15 Thread Nan Zhu
! For more details of distributed XGBoost, you can refer to the recently published paper: http://arxiv.org/abs/1603.02754 Best, -- Nan Zhu http://codingcat.me

Re: Failing MiMa tests

2016-03-14 Thread Nan Zhu
I guess it’s Jenkins’ problem? My PR failed MiMa but still got a message from SparkQA (https://github.com/SparkQA) saying that "This patch passes all tests." I checked Jenkins’ history; there are other PRs with the same issue…. Best, -- Nan Zhu http://codingcat.me

Re: Paper on Spark SQL

2015-08-17 Thread Nan Zhu
An extra “,” is at the end -- Nan Zhu http://codingcat.me On Monday, August 17, 2015 at 9:28 AM, Ted Yu wrote: I got 404 when trying to access the link. On Aug 17, 2015, at 5:31 AM, Todd bit1...@163.com wrote: Hi, I can't access http

Re: [SparkScore]Performance portal for Apache Spark - WW26

2015-06-26 Thread Nan Zhu
Thank you, Jie! Very nice work! -- Nan Zhu http://codingcat.me On Friday, June 26, 2015 at 8:17 AM, Huang, Jie wrote: Correct. Your calculation is right! We have been aware of that kmeans performance drop also. According to our observation, it is caused by some unbalanced

Re: [SparkScore]Performance portal for Apache Spark - WW26

2015-06-26 Thread Nan Zhu
, what happened to k-means in HiBench? Best, -- Nan Zhu http://codingcat.me On Friday, June 26, 2015 at 7:24 AM, Huang, Jie wrote: Intel® Xeon® CPU E5-2697

Re: What happened to the Row class in 1.3.0?

2015-04-06 Thread Nan Zhu
The Row class was mistakenly left undocumented in 1.3.0; you can check the 1.3.1 API doc: http://people.apache.org/~pwendell/spark-1.3.1-rc1-docs/api/scala/index.html#org.apache.spark.sql.Row Best, -- Nan Zhu http://codingcat.me On Monday, April 6, 2015 at 10:23 AM, ARose wrote: I am trying

Re: What happened to the Row class in 1.3.0?

2015-04-06 Thread Nan Zhu
Hi, Ted It’s here: https://github.com/apache/spark/blob/61b427d4b1c4934bd70ed4da844b64f0e9a377aa/sql/catalyst/src/main/java/org/apache/spark/sql/RowFactory.java Best, -- Nan Zhu http://codingcat.me On Monday, April 6, 2015 at 10:44 AM, Ted Yu wrote: I searched code base but didn't
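As a quick illustration of what that class is for, a minimal sketch of building and reading a Row from Scala; the field values are made up:

    import org.apache.spark.sql.Row

    // build a Row from arbitrary values (illustrative values only)
    val row = Row(1, "spark", 3.14)

    // positional, typed accessors
    val id: Int       = row.getInt(0)
    val name: String  = row.getString(1)
    val score: Double = row.getDouble(2)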

Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-03-31 Thread Nan Zhu
The example in https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala might help Best, -- Nan Zhu http://codingcat.me On Tuesday, March 31, 2015 at 3:56 PM, Sean Owen wrote: Yep, it's not serializable: https://hbase.apache.org
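The gist of that example, as a minimal sketch (the table name is a placeholder; the linked file has the full detail):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("HBaseRead"))

    // point the input format at the table to scan (table name is a placeholder)
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

    // each record is a (row key, Result) pair; Result itself is not serializable,
    // so extract plain values from it before shipping data across stages
    val rdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(rdd.count())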

Re: How to use more executors

2015-03-11 Thread Nan Zhu
At least 1.4, I think. For now, using YARN or allowing multiple worker instances is just fine. Best, -- Nan Zhu http://codingcat.me On Wednesday, March 11, 2015 at 8:42 PM, Du Li wrote: Is it being merged in the next release? It's indeed a critical patch! Du On Wednesday

Re: How to use more executors

2015-03-11 Thread Nan Zhu
I think this should go into another PR. Can you create a JIRA for that? Best, -- Nan Zhu http://codingcat.me On Wednesday, March 11, 2015 at 8:50 PM, Du Li wrote: Is it possible to extend this PR further (or create another PR) to allow for per-node configuration of workers

Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Nan Zhu
[Boolean] = new DynamicVariable[Boolean](false) I’m not sure there is enough benefit to make it worth exposing this variable to the user… Best, -- Nan Zhu http://codingcat.me On Friday, March 6, 2015 at 10:22 AM, Ted Yu wrote: Found this thread: http://search-hadoop.com/m

Re: multiple sparkcontexts and streamingcontexts

2015-03-02 Thread Nan Zhu
(in most cases, that’s one of the existing actor receivers). The limitation might be that all receivers are on the same machine... Here is a PR trying to expose the APIs to the user: https://github.com/apache/spark/pull/3984 Best, -- Nan Zhu http://codingcat.me On Monday, March 2, 2015

Re: How to use more executors

2015-01-21 Thread Nan Zhu
…not sure when it will be reviewed… but for now you can work around it by allowing multiple worker instances on a single machine: http://spark.apache.org/docs/latest/spark-standalone.html (search for SPARK_WORKER_INSTANCES) Best, -- Nan Zhu http://codingcat.me On Wednesday, January 21, 2015
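A minimal sketch of that workaround in conf/spark-env.sh; the sizes are illustrative and should be tuned to the machine:

    # conf/spark-env.sh -- run two standalone workers per machine (illustrative sizes)
    export SPARK_WORKER_INSTANCES=2
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_MEMORY=8g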

Re: enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
.* to executor? Hi, Josh, would you mind giving some hints, as you created and closed the JIRA? Best, -- Nan Zhu On Wednesday, January 14, 2015 at 6:19 PM, Nan Zhu wrote: Hi, Ted, Thanks, I know how to set it in Akka’s context; my question is just how to pass this akka.loglevel=DEBUG

Re: enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
Sorry for the mistake. I found that those Akka-related messages are from Spark’s Akka-related component (ActorLogReceive), instead of Akka itself, though it has been enough for debugging purposes (in my case). The question in this thread is still open…. Best, -- Nan Zhu http

Re: enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
For others who have the same question: you can simply set the logging level in log4j.properties to DEBUG to achieve this. Best, -- Nan Zhu http://codingcat.me On Wednesday, January 14, 2015 at 6:28 PM, Nan Zhu wrote: I quickly went through the code, In ExecutorBackend, we build
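A minimal sketch of that change in conf/log4j.properties, starting from the shipped template:

    # conf/log4j.properties (copied from log4j.properties.template and edited)
    # raise root verbosity from INFO to DEBUG
    log4j.rootCategory=DEBUG, console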

Re: enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
Hi, Ted. Thanks, I know how to set it in Akka’s context; my question is just how to pass this akka.loglevel=DEBUG to Spark’s actor system. Best, -- Nan Zhu http://codingcat.me On Wednesday, January 14, 2015 at 6:09 PM, Ted Yu wrote: I assume you have looked at: http://doc.akka.io/docs

enable debug-level log output of akka?

2015-01-14 Thread Nan Zhu
of Spark Streaming (like me) usually needs the detailed log for debugging…. Best, -- Nan Zhu http://codingcat.me

Re: MLUtil.kfold generates overlapped training and validation set?

2014-10-10 Thread Nan Zhu
Thanks, Xiangrui. I found the reason for the overlapped training set and test set…. Another counter-intuitive issue related to https://github.com/apache/spark/pull/2508 Best, -- Nan Zhu On Friday, October 10, 2014 at 2:19 AM, Xiangrui Meng wrote: 1. No. 2. The seed per partition
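For readers of this thread, a minimal sketch of the kFold call under discussion, assuming an existing SparkContext sc; the data and fold count are made up:

    import org.apache.spark.mllib.util.MLUtils

    // split an RDD into k (training, validation) pairs; data, k, and seed are illustrative
    val data  = sc.parallelize(1 to 100)
    val folds = MLUtils.kFold(data, 3, 42)

    folds.foreach { case (training, validation) =>
      println(s"train=${training.count()}, validation=${validation.count()}")
    }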

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Nan Zhu
Great! Congratulations! -- Nan Zhu On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com

Re: Akka disassociation on Java SE Embedded

2014-10-10 Thread Nan Zhu
https://github.com/CodingCat/spark/commit/c5cee24689ac4ad1187244e6a16537452e99e771 -- Nan Zhu On Friday, October 10, 2014 at 4:31 PM, bhusted wrote: How do you increase the spark block manager timeout? -- View this message in context: http://apache-spark-user-list.1001560.n3

MLUtil.kfold generates overlapped training and validation set?

2014-10-09 Thread Nan Zhu
training and validation set? (counter-intuitive to me) 2. Do I have some misunderstanding of the code? 3. Is it a bug? Can anyone explain it to me? Best, -- Nan Zhu

Re: Reading from HBase is too slow

2014-09-29 Thread Nan Zhu
can you look at your HBase UI to check whether your job is just reading from a single region server? Best, -- Nan Zhu On Monday, September 29, 2014 at 10:21 PM, Tao Xiao wrote: I submitted a job in Yarn-Client mode, which simply reads from a HBase table containing tens of millions

Re: executorAdded event to DAGScheduler

2014-09-26 Thread Nan Zhu
such a deployment mode Best, -- Nan Zhu On Friday, September 26, 2014 at 8:02 AM, praveen seluka wrote: Can someone explain the motivation behind passing executorAdded event to DAGScheduler ? DAGScheduler does submitWaitingStages when executorAdded method is called

Re: Distributed dictionary building

2014-09-23 Thread Nan Zhu
Great, thanks. -- Nan Zhu On Tuesday, September 23, 2014 at 9:58 AM, Sean Owen wrote: Yes, Matei made a JIRA last week and I just suggested a PR: https://github.com/apache/spark/pull/2508 On Sep 23, 2014 2:55 PM, Nan Zhu zhunanmcg...@gmail.com wrote

Re: Distributed dictionary building

2014-09-23 Thread Nan Zhu
shall we document this in the API doc? Best, -- Nan Zhu On Sunday, September 21, 2014 at 12:18 PM, Debasish Das wrote: zipWithUniqueId is also affected... I had to persist the dictionaries to make use of the indices lower down in the flow... On Sun, Sep 21, 2014 at 1:15 AM, Sean
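The workaround mentioned above, as a minimal sketch assuming an existing SparkContext sc (the sample data is made up): persist the RDD before assigning ids so a recomputation cannot reshuffle them.

    // assuming an existing SparkContext sc; the words are illustrative
    val words = sc.parallelize(Seq("spark", "flink", "hbase", "hive"))

    // cache first so a recomputation cannot assign different ids
    val dictionary = words.distinct().cache()
    val indexed    = dictionary.zipWithUniqueId()   // RDD[(String, Long)]

    indexed.collect().foreach { case (w, id) => println(s"$w -> $id") }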

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-11 Thread Nan Zhu
Hi, can you attach more logs to see if there is some entry from ContextCleaner? I met a very similar issue before… but it hasn’t been resolved. Best, -- Nan Zhu On Thursday, September 11, 2014 at 10:13 AM, Dibyendu Bhattacharya wrote: Dear All, Not sure if this is a false alarm

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-11 Thread Nan Zhu
) at java.lang.Thread.run(Thread.java:744) -- Nan Zhu On Thursday, September 11, 2014 at 10:42 AM, Nan Zhu wrote: Hi, can you attach more logs to see if there is some entry from ContextCleaner? I met a very similar issue before… but it hasn’t been resolved. Best, -- Nan Zhu

Re: Upgrading 1.0.0 to 1.0.2

2014-08-26 Thread Nan Zhu
Hi, Victor, the issue with having different versions in the driver and the cluster is that the master will shut down your application due to the inconsistent serialVersionUID in ExecutorState. Best, -- Nan Zhu On Tuesday, August 26, 2014 at 10:10 PM, Matei Zaharia wrote: Things

SELECT DISTINCT generates random results?

2014-08-05 Thread Nan Zhu
Hi, all. I use “SELECT DISTINCT” to query the data saved in Hive. It seems that this statement cannot understand the table structure and just outputs the data in other fields. Has anyone met a similar problem before? Best, -- Nan Zhu

Re: SELECT DISTINCT generates random results?

2014-08-05 Thread Nan Zhu
nvm, it was a problem caused by the ill-formatted raw data -- Nan Zhu On Tuesday, August 5, 2014 at 3:42 PM, Nan Zhu wrote: Hi, all. I use “SELECT DISTINCT” to query the data saved in Hive. It seems that this statement cannot understand the table structure and just output

DROP IF EXISTS still throws exception about table does not exist?

2014-07-21 Thread Nan Zhu
. Maybe you could file a ticket on Spark JIRA. BUT, it's not a bug in HIVE IMHO.” My question is: the DDL is executed by Hive itself, isn’t it? Best, -- Nan Zhu

broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
the variable is cleaned, since there is enough memory space? Best, -- Nan Zhu

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
, usually it will succeed 1 out of 10 times). I once suspected that it’s related to some concurrency issue, but even if I disable the parallel test in build.sbt, the problem is still there. --- Best, -- Nan Zhu On Monday, July 21, 2014 at 5:40 PM, Tathagata Das wrote: The ContextCleaner cleans

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
Well, but I do not think it would cause this problem even if spark.cores.max is too large? Best, -- Nan Zhu On Monday, July 21, 2014 at 6:11 PM, Nan Zhu wrote: Hi, TD, Thanks for the reply. I tried to reproduce this in a simpler program, but no luck. However, the program

Re: DROP IF EXISTS still throws exception about table does not exist?

2014-07-21 Thread Nan Zhu
Ah, I see, thanks, Yin -- Nan Zhu On Monday, July 21, 2014 at 5:00 PM, Yin Huai wrote: Hi Nan, It is basically a log entry because your table does not exist. It is not a real exception. Thanks, Yin On Mon, Jul 21, 2014 at 7:10 AM, Nan Zhu zhunanmcg...@gmail.com

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
Ah, sorry, sorry, my brain just glitched….. I sent some wrong information: not “spark.cores.max” but the minPartitions in sc.textFile(). Best, -- Nan Zhu On Monday, July 21, 2014 at 7:17 PM, Tathagata Das wrote: That is definitely weird. spark.cores.max should not affect things when
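For readers following along, minPartitions is the second argument to textFile; a minimal sketch, assuming an existing SparkContext sc, with the path and partition count as placeholders:

    // assuming an existing SparkContext sc; path and partition count are placeholders
    val lines = sc.textFile("hdfs:///data/input.txt", 8)   // 8 = minPartitions
    println(lines.partitions.length)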

try JDBC server

2014-07-11 Thread Nan Zhu
Hi, all. I would like to give the JDBC server (which is supposed to be released in 1.1) a try. Where can I find the documentation for it? Best, -- Nan Zhu

Re: try JDBC server

2014-07-11 Thread Nan Zhu
nvm; for others with the same question: https://github.com/apache/spark/commit/8032fe2fae3ac40a02c6018c52e76584a14b3438 -- Nan Zhu On Friday, July 11, 2014 at 7:02 PM, Nan Zhu wrote: Hi, all. I would like to give the JDBC server (which is supposed to be released in 1.1) a try. Where

Re: master attempted to re-register the worker and then took all workers as unregistered

2014-07-07 Thread Nan Zhu
Hey, Cheney, is the problem still there? Sorry for the delay; I’m starting to look at this issue. Best, -- Nan Zhu On Tuesday, May 6, 2014 at 10:06 PM, Cheney Sun wrote: Hi Nan, In the worker's log, I see the following exception thrown when trying to launch an executor

Re: long GC pause during file.cache()

2014-06-15 Thread Nan Zhu
SPARK_JAVA_OPTS is deprecated in 1.0, though it works fine if you don’t mind the WARNING in the logs. You can set spark.executor.extraJavaOptions in your SparkConf object. Best, -- Nan Zhu On Sunday, June 15, 2014 at 12:13 PM, Hao Wang wrote: Hi, Wei You may try to set JVM opts in spark
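A minimal sketch of that setting; the GC flags are illustrative only, and the key name as it appears in the released docs is spark.executor.extraJavaOptions:

    import org.apache.spark.{SparkConf, SparkContext}

    // illustrative GC/debug flags only -- tune for your own workload
    val conf = new SparkConf()
      .setAppName("cache-heavy-job")
      .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC -XX:+PrintGCDetails")

    val sc = new SparkContext(conf)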

Re: long GC pause during file.cache()

2014-06-15 Thread Nan Zhu
Yes, I think it is listed in the comments in spark-env.sh.template (didn’t check….) Best, -- Nan Zhu On Sunday, June 15, 2014 at 5:21 PM, Surendranauth Hiraman wrote: Is SPARK_DAEMON_JAVA_OPTS valid in 1.0.0? On Sun, Jun 15, 2014 at 4:59 PM, Nan Zhu zhunanmcg

Re: overwriting output directory

2014-06-12 Thread Nan Zhu
Hi, SK. For 1.0.0 you have to delete it manually; in 1.0.1 there will be a parameter to enable overwriting: https://github.com/apache/spark/pull/947/files Best, -- Nan Zhu On Thursday, June 12, 2014 at 1:57 PM, SK wrote: Hi, When we have multiple runs of a program writing to the same
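For later readers: to my understanding, the knob that ended up shipping is the output-spec check. A minimal sketch of disabling it (use with care, since it allows silently overwriting previous results; the path is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // disable the "output directory already exists" check -- use with care
    val conf = new SparkConf()
      .setAppName("overwrite-example")
      .set("spark.hadoop.validateOutputSpecs", "false")

    val sc = new SparkContext(conf)
    sc.parallelize(1 to 10).saveAsTextFile("hdfs:///tmp/output")   // placeholder path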

Re: Writing data to HBase using Spark

2014-06-12 Thread Nan Zhu
Are you using Spark Streaming? Is master = “local[n]” where n > 1? Best, -- Nan Zhu On Wednesday, June 11, 2014 at 4:23 AM, gaurav.dasgupta wrote: Hi Kanwaldeep, I have tried your code but ran into a problem. The code is working fine in local mode. But if I run the same code

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-12 Thread Nan Zhu
Actually this has been merged to the master branch https://github.com/apache/spark/pull/947 -- Nan Zhu On Thursday, June 12, 2014 at 2:39 PM, Daniel Siegmann wrote: The old behavior (A) was dangerous, so it's good that (B) is now the default. But in some cases I really do want

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-12 Thread Nan Zhu
Ah, I see. I think it’s hard to do something like fs.delete() in Spark code (it’s scary, as we discussed in the previous PR), so if you want (C), I guess you have to do some delete work manually. Best, -- Nan Zhu On Thursday, June 12, 2014 at 3:31 PM, Daniel Siegmann wrote: I do
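Option (C)'s manual cleanup would look roughly like this sketch, assuming an existing SparkContext sc and an RDD rdd; the path is a placeholder:

    import org.apache.hadoop.fs.{FileSystem, Path}

    // delete the previous output before writing -- explicit, and outside of Spark itself
    val outputPath = new Path("hdfs:///tmp/output")                  // placeholder path
    val fs = outputPath.getFileSystem(sc.hadoopConfiguration)
    if (fs.exists(outputPath)) {
      fs.delete(outputPath, true)                                    // recursive delete
    }
    rdd.saveAsTextFile(outputPath.toString)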

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nan Zhu
Hi, Patrick, I think https://issues.apache.org/jira/browse/SPARK-1677 is talking about the same thing? How about assigning it to me? I think I missed the configuration part in my previous commit, though I declared that in the PR description…. Best, -- Nan Zhu On Monday, June 2

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nan Zhu
I made the PR. The problem is that… after many rounds of review, the configuration part was missed…. Sorry about that; I will fix it. Best, -- Nan Zhu On Monday, June 2, 2014 at 5:13 PM, Pierre Borckmans wrote: I'm a bit confused because the PR mentioned by Patrick seems to address all

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Nan Zhu
I remember that in an earlier version of that PR, I deleted files by calling the HDFS API. We discussed and concluded that it’s a bit scary to have something directly deleting users’ files in Spark. Best, -- Nan Zhu On Monday, June 2, 2014 at 10:39 PM, Patrick Wendell wrote: (A) Semantics

IllegelAccessError when writing to HBase?

2014-05-18 Thread Nan Zhu
(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Can anyone give some hint to the issue? Best, -- Nan Zhu

Re: IllegelAccessError when writing to HBase?

2014-05-18 Thread Nan Zhu
I tried hbase-0.96.2/0.98.1/0.98.2; the HDFS version is 2.3. -- Nan Zhu On Sunday, May 18, 2014 at 4:18 PM, Nan Zhu wrote: Hi, all I tried to write data to HBase in a Spark-1.0 rc8 application. The application is terminated due to a java.lang.IllegalAccessError; the HBase shell works

Re: Spark unit testing best practices

2014-05-16 Thread Nan Zhu
+1; at least with the current code, just watch the log printed by the DAGScheduler… -- Nan Zhu http://codingcat.me On Wednesday, May 14, 2014 at 1:58 PM, Mark Hamstra wrote: serDe

Re: sbt run with spark.ContextCleaner ERROR

2014-05-15 Thread Nan Zhu
Same problem here, +1, though it does not change the program's result. -- Nan Zhu On Tuesday, May 6, 2014 at 11:58 PM, Tathagata Das wrote: Okay, this needs to be fixed. Thanks for reporting this! On Mon, May 5, 2014 at 11:00 PM, wxhsdp wxh...@gmail.com wrote: Hi

Re: master attempted to re-register the worker and then took all workers as unregistered

2014-05-05 Thread Nan Zhu
Ah, I think this should have been fixed in 0.9.1? Did you see the exception thrown on the worker side? Best, -- Nan Zhu On Sunday, May 4, 2014 at 10:15 PM, Cheney Sun wrote: Hi Nan, Have you found a way to fix the issue? Now I run into the same problem with version 0.9.1. Thanks

Spark-1.0.0-rc3 compiled against Hadoop 2.3.0 cannot read HDFS 2.3.0?

2014-05-03 Thread Nan Zhu
) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Anyone met the same issue before? Best, -- Nan Zhu

spark-0.9.1 compiled with Hadoop 2.3.0 doesn't work with S3?

2014-04-21 Thread Nan Zhu
) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 63 more Anyone else met the similar problem? Best, -- Nan Zhu

Re: spark-0.9.1 compiled with Hadoop 2.3.0 doesn't work with S3?

2014-04-21 Thread Nan Zhu
Yes, I fixed it in the same way, but didn’t get a chance to get back here. I also made a PR: https://github.com/apache/spark/pull/468 Best, -- Nan Zhu On Monday, April 21, 2014 at 8:19 PM, Parviz Deyhim wrote: I ran into the same issue. The problem seems to be with the jets3t library

Re: Only TraversableOnce?

2014-04-09 Thread Nan Zhu
Yeah, that should be right. -- Nan Zhu On Wednesday, April 9, 2014 at 8:54 PM, wxhsdp wrote: thank you, it works after my operation over p returns p.toIterator; because mapPartitions has an iterator return type, is that right? rdd.mapPartitions{D => {val p = D.toArray; ...; p.toIterator

Re: Only TraversableOnce?

2014-04-08 Thread Nan Zhu
So, the data structure looks like: D consists of D1, D2, D3 (DX is a partition) and DX consists of d1, d2, d3 (dx is the part in your context)? What you want to do is transform DX into (d1 + d2, d1 + d3, d2 + d3)? Best, -- Nan Zhu On Tuesday, April 8, 2014 at 8:09 AM, wxhsdp wrote

Re: Only TraversableOnce?

2014-04-08 Thread Nan Zhu
If that’s the case, I think mapPartitions is what you need, but it seems that you have to load the partition into memory as a whole with toArray: rdd.mapPartitions{D => {val p = D.toArray; ...}} -- Nan Zhu On Tuesday, April 8, 2014 at 8:40 AM, wxhsdp wrote: yes, how can i do
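Spelled out a little more, that suggestion reads like the following sketch, assuming an existing SparkContext sc; the per-partition work is a placeholder:

    // assuming an existing SparkContext sc; combine elements within each partition
    val rdd = sc.parallelize(1 to 8, 4)

    val combined = rdd.mapPartitions { iter =>
      val p = iter.toArray          // load the whole partition into memory
      // ... operate on p as a random-access collection ...
      p.toIterator                  // hand back an Iterator, as mapPartitions requires
    }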

Re: Why doesn't the driver node do any work?

2014-04-08 Thread Nan Zhu
This may be unrelated to the question itself, but just FYI: you can run your driver program on a worker node with Spark 0.9. http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster Best, -- Nan Zhu On Tuesday, April 8, 2014 at 5:11 PM, Nicholas Chammas

Re: Status of MLI?

2014-04-01 Thread Nan Zhu
/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel -- Nan Zhu On Tuesday, April 1, 2014 at 10:38 PM, Krakna H wrote: What is the current development status of MLI/MLBase? I see that the github repo is lying dormant (https://github.com/amplab/MLI) and JIRA has

Re: Status of MLI?

2014-04-01 Thread Nan Zhu
Ah, I see. I’m sorry, I didn’t read your email carefully. In that case I have no idea about the progress on MLBase. Best, -- Nan Zhu On Tuesday, April 1, 2014 at 11:05 PM, Krakna H wrote: Hi Nan, I was actually referring to MLI/MLBase (http://www.mlbase.org); is this being actively

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
in the Spark UI -- Nan Zhu On Wednesday, March 26, 2014 at 8:54 AM, Sai Prasanna wrote: Is it possible to run across cluster using Spark Interactive Shell ? To be more explicit, is the procedure similar to running standalone master-slave spark. I want to execute my code in the interactive

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
to the executors, i.e. run in a distributed fashion. Best, -- Nan Zhu On Wednesday, March 26, 2014 at 9:01 AM, Sai Prasanna wrote: Nan Zhu, it's the latter; I want to distribute the tasks to the cluster [machines available]. If I set the SPARK_MASTER_IP at the other machines and set

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
The master does more work than that, actually; I just explained why he should set MASTER_IP correctly. A simplified list: 1. maintain worker status; 2. maintain in-cluster driver status; 3. maintain executor status (the worker tells the master what happened on the executor, -- Nan Zhu

Re: Distributed running in Spark Interactive shell

2014-03-26 Thread Nan Zhu
to the cluster remotely, it’s better to open an RPC to the driver and have it submit operations from nearby than to run a driver far away from the worker nodes. -- Nan Zhu On Wednesday, March 26, 2014 at 9:59 AM, Nan Zhu wrote: master does more work than that actually, I just explained why he should

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Nan Zhu
Hi, Diana, see my inline answers. -- Nan Zhu On Monday, March 24, 2014 at 3:44 PM, Diana Carroll wrote: Has anyone successfully followed the instructions on the Quick Start page of the Spark home page to run a standalone Scala application? I can't, and I figure I must be missing

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Nan Zhu
Hi, Diana, you don’t need to use the Spark-distributed sbt; just download sbt from its official website and set your PATH accordingly. Best, -- Nan Zhu On Monday, March 24, 2014 at 4:30 PM, Diana Carroll wrote: Yeah, that's exactly what I did. Unfortunately it doesn't work

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Nan Zhu
I realize that I never read the document carefully, and I never found the Spark documentation suggesting that you use the Spark-distributed sbt…… Best, -- Nan Zhu On Monday, March 24, 2014 at 5:47 PM, Diana Carroll wrote: Thanks for your help, everyone. Several folks have explained that I can

Re: Splitting RDD and Grouping together to perform computation

2014-03-24 Thread Nan Zhu
Partition your input into an even number of partitions and use mapPartitions to operate on Iterator[Int]. Maybe there are more efficient ways…. Best, -- Nan Zhu On Monday, March 24, 2014 at 7:59 PM, yh18190 wrote: Hi, I have a large data set of numbers, i.e. an RDD, and wanted to perform

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Nan Zhu
Yes, actually even for Spark, I mostly use the sbt I installed….. so I always miss this issue…. If you can reproduce the problem with the Spark-distributed sbt… I suggest proposing a PR to fix the document before 0.9.1 is officially released. Best, -- Nan Zhu On Monday, March 24, 2014

Re: Splitting RDD and Grouping together to perform computation

2014-03-24 Thread Nan Zhu
at <console>:16 scala> res7.collect res10: Array[Int] = Array(3, 7) Best, -- Nan Zhu On Monday, March 24, 2014 at 8:40 PM, Nan Zhu wrote: Partition your input into an even number of partitions and use mapPartitions to operate on Iterator[Int]; maybe there are more efficient ways…. Best
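For completeness, one way to arrive at a result like the Array(3, 7) shown above, assuming an existing SparkContext sc: two partitions of (1, 2) and (3, 4), summed per partition.

    // assuming an existing SparkContext sc
    val rdd  = sc.parallelize(Seq(1, 2, 3, 4), 2)           // two partitions: (1, 2) and (3, 4)
    val sums = rdd.mapPartitions(iter => Iterator(iter.sum)) // one sum per partition
    sums.collect()                                            // Array(3, 7)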