how to send additional configuration to the RDD after it was lazily created

2015-09-17 Thread Gil Vernik
Hi, I have the following case, which I am not sure how to resolve. My code uses HadoopRDD and creates various RDDs on top of it (MapPartitionsRDD, and so on). After all RDDs were lazily created, my code "knows" some new information and I want that "compute" method of the HadoopRDD will be

bug in Worker.scala, ExecutorRunner is not serializable

2015-09-17 Thread Huangguowei
In Worker.scala line 480: case RequestWorkerState => sender ! WorkerStateResponse(host, port, workerId, executors.values.toList, finishedExecutors.values.toList, drivers.values.toList, finishedDrivers.values.toList, activeMasterUrl, cores, memory, coresUsed,

Re: bug in Worker.scala, ExecutorRunner is not serializable

2015-09-17 Thread Sean Owen
Did this cause an error for you? On Thu, Sep 17, 2015, 8:51 AM Huangguowei wrote: > > > In Worker.scala line 480: > > > > case RequestWorkerState => > > sender ! WorkerStateResponse(host, port, workerId, > executors.values.toList, > >

re: bug in Worker.scala, ExecutorRunner is not serializable

2015-09-17 Thread Huangguowei
Is it possible to get executor status when running an application? From: Sean Owen [mailto:so...@cloudera.com] Sent: September 17, 2015, 15:54 To: Huangguowei; Dev Subject: Re: bug in Worker.scala, ExecutorRunner is not serializable Did this cause an error for you? On Thu, Sep 17, 2015, 8:51 AM

Re: QueueStream doesn't support checkpoint makes it difficult to do unit test

2015-09-17 Thread Bin Wang
Never mind. I've found a PR, and it has been merged: https://github.com/apache/spark/pull/8624/commits On Thu, Sep 17, 2015 at 4:50 PM, Bin Wang wrote: > I'm using Spark Streaming with updateStateByKey, which forces the use of > checkpointing. In my unit test, I create a queueStream to test. But in Spark >

Re: bug in Worker.scala, ExecutorRunner is not serializable

2015-09-17 Thread Huangguowei
Thanks for your reply. I just wanted to do some monitoring; never mind! From: Shixiong Zhu [mailto:zsxw...@gmail.com] Sent: September 17, 2015, 17:23 To: Huangguowei; dev@spark.apache.org Subject: Re: bug in Worker.scala, ExecutorRunner is not serializable RequestWorkerState is an internal message between Worker

Re: bug in Worker.scala, ExecutorRunner is not serializable

2015-09-17 Thread Huangguowei
There is no error in the normal case. But if I ask the Worker through its akkaUrl for executor status, it causes an exception. From: Sean Owen [mailto:so...@cloudera.com] Sent: September 17, 2015, 15:54 To: Huangguowei; Dev Subject: Re: bug in Worker.scala, ExecutorRunner is not serializable Did this cause an

Re: bug in Worker.scala, ExecutorRunner is not serializable

2015-09-17 Thread Shixiong Zhu
RequestWorkerState is an internal message between Worker and WorkerWebUI. Since they are in the same process, that's fine. Actually, these are not public APIs. Could you elaborate on your use case? Best Regards, Shixiong Zhu 2015-09-17 16:36 GMT+08:00 Huangguowei : > > > Is

RDD: Execution and Scheduling

2015-09-17 Thread gsvic
After reading some parts of the Spark source code, I would like to ask some questions about RDD execution and scheduling. First, please correct me if I am wrong about the following: 1) The number of partitions equals the number of tasks that will be executed in parallel (e.g., when an RDD is

Re: [MLlib] BinaryLogisticRegressionSummary on test set

2015-09-17 Thread Feynman Liang
We have kept that private because we need to decide on a name for the method which evaluates on a test set (see the TODO comment); perhaps you could push for this to happen by creating a Jira and pinging

Re: New Spark json endpoints

2015-09-17 Thread Imran Rashid
Hi Kevin, I think it would be great if you added this. It never got added in the first place b/c the original PR was already pretty bloated, and we just never got back to this. I agree with Reynold -- you shouldn't need to increase the version for just adding new endpoints (or even adding new

Re: New Spark json endpoints

2015-09-17 Thread Mark Hamstra
While we're at it, adding endpoints that get results by jobGroup (cf. SparkContext#setJobGroup) instead of just for a single Job would also be very useful to some of us. On Thu, Sep 17, 2015 at 7:30 AM, Imran Rashid wrote: > Hi Kevin, > > I think it would be great if you

Re: New Spark json endpoints

2015-09-17 Thread Kevin Chen
Thank you all for the feedback. I’ve created a corresponding JIRA ticket at https://issues.apache.org/jira/browse/SPARK-10565, updated with a summary of this thread. From: Mark Hamstra Date: Thursday, September 17, 2015 at 8:00 AM To: Imran Rashid

Re: RDD: Execution and Scheduling

2015-09-17 Thread Reynold Xin
Your understanding is mostly correct. Replies inline. On Thu, Sep 17, 2015 at 5:23 AM, gsvic wrote: > After reading some parts of the Spark source code, I would like to ask some > questions about RDD execution and scheduling. > > First, please correct me if I am wrong about

Re: RDD API patterns

2015-09-17 Thread Debasish Das
RDD nesting can lead to recursive nesting. I would like to know the use case and why a join can't support it. You can always expose an API over an RDD and access it from another RDD's mapPartitions, using an external data source such as HBase, Cassandra, or Redis to support the API. For your case, group by and

Re: JDBC Dialect tests

2015-09-17 Thread Luciano Resende
Thanks, Reynold. Also, what is the status of the associated PR? Are we planning to merge it soon? This will help me with the Db2 dialect test framework using Docker. Thanks [1] https://github.com/apache/spark/pull/8101 On Mon, Sep 14, 2015 at 1:47 PM, Reynold Xin wrote: >