Re: RE: Spark checkpoint problem

2015-11-26 Thread eric wong
I don't think it is a deliberate design. So you may need to call checkpoint on the RDD before running an action on it, if you want to explicitly checkpoint the RDD. 2015-11-26 13:23 GMT+08:00 wyphao.2007 : > Spark 1.5.2. > > On 2015-11-26 13:19:39, "张志强(旺轩)" wrote: >
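A minimal sketch of the behavior discussed above, assuming a local checkpoint directory: `checkpoint()` only marks the RDD, and the data is actually written when a subsequent action materializes it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))
    sc.setCheckpointDir("/tmp/spark-checkpoints") // assumed, writable path

    val rdd = sc.parallelize(1 to 100).map(_ * 2)
    rdd.checkpoint() // only marks the RDD for checkpointing; nothing is written yet
    rdd.count()      // an action triggers the job that writes the checkpoint
    println(rdd.isCheckpointed) // should report true after the action
    sc.stop()
  }
}
```

Without the `count()` (or another action), `isCheckpointed` stays false, which matches the behavior reported in the thread.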

Re: Unchecked contribution (JIRA and PR)

2015-11-26 Thread Sergio Ramírez
OK, I'll do that. Thanks for the response. On 17/11/15 at 01:36, Joseph Bradley wrote: Hi Sergio, Apart from apologies about limited review bandwidth (from me too!), I wanted to add: It would be interesting to hear what feedback you've gotten from users of your package. Perhaps you

Re: A proposal for Spark 2.0

2015-11-26 Thread Sean Owen
Maintaining both a 1.7 and 2.0 is too much work for the project, which is over-stretched now. This means that after 1.6 it's just small maintenance releases in 1.x and no substantial features or evolution. This means that the "in progress" APIs in 1.x will stay that way, unless one updates to

Re: A proposal for Spark 2.0

2015-11-26 Thread Steve Loughran
> On 25 Nov 2015, at 08:54, Sandy Ryza wrote: > > I see. My concern is / was that cluster operators will be reluctant to > upgrade to 2.0, meaning that developers using those clusters need to stay on > 1.x, and, if they want to move to DataFrames, essentially need to

[no subject]

2015-11-26 Thread Dmitry Tolpeko

question about combining small parquet files

2015-11-26 Thread Nezih Yigitbasi
Hi Spark people, I have a Hive table that has a lot of small parquet files and I am creating a data frame out of it to do some processing, but since I have a large number of splits/files my job creates a lot of tasks, which I don't want. Basically what I want is the same functionality that Hive
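One common workaround for the problem described above, sketched under the assumption of a Spark 1.5-era `SQLContext` and hypothetical paths: coalesce the DataFrame to collapse the many input splits into a smaller, fixed number of partitions (and thus tasks) without a shuffle.

```scala
// Hedged sketch: read a table backed by many small Parquet files,
// then reduce the partition (task) count before processing or rewriting.
val df = sqlContext.read.parquet("/path/to/hive/table") // assumed input path

// coalesce(n) merges existing partitions without a shuffle;
// use repartition(n) instead if an even redistribution is needed.
val fewer = df.coalesce(32)

fewer.write.parquet("/path/to/output") // assumed output path
```

This does not merge the files on disk the way Hive's small-file merging does, but it caps the number of tasks the downstream job creates.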

Re: Using spark MLlib without installing Spark

2015-11-26 Thread Debasish Das
Decoupling mllib and core is difficult... it is not intended to run spark core 1.5 with a spark mllib 1.6 snapshot... core is more stabilized, and with new algorithms getting added to mllib you might sometimes be tempted to do that, but it's not recommended. On Nov 21, 2015 8:04 PM, "Reynold Xin"
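The practical takeaway above can be expressed as a build configuration: pin spark-core and spark-mllib to a single version instead of mixing releases. A hedged sbt sketch (versions are illustrative):

```scala
// build.sbt sketch: keep all Spark modules on one version.
val sparkVersion = "1.5.2" // illustrative; use whatever release your cluster runs

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"
)
```

Declaring one `sparkVersion` value makes it impossible to accidentally bump mllib to a snapshot while core stays on a stable release.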

NettyRpcEnv advertisedPort

2015-11-26 Thread Rad Gruchalski
Dear all, I am currently looking at modifying NettyRpcEnv for this PR: https://github.com/apache/spark/pull/9608 The functionality which I’m trying to achieve is the following: if there is a configuration property spark.driver.advertisedPort, make executors reply to advertisedPort instead of
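The lookup the PR proposes could be sketched roughly as follows. Note that `spark.driver.advertisedPort` is the property proposed in the PR, not a released Spark setting, and `boundPort` stands in for whatever port NettyRpcEnv actually bound:

```scala
import org.apache.spark.SparkConf

// Hedged sketch of the proposed behavior: advertise the configured port
// to executors if one is set, otherwise fall back to the bound port.
def effectivePort(conf: SparkConf, boundPort: Int): Int =
  conf.getOption("spark.driver.advertisedPort") // proposed property, not in a release
    .map(_.toInt)
    .getOrElse(boundPort)
```

This mirrors the usual pattern for advertised addresses behind NAT or container port mappings: bind locally on one port, but tell peers to connect on another.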

SparkR read.df Option type doesn't match

2015-11-26 Thread liushiqi9
I am trying to write a third-party datasource plugin for Spark. It works perfectly fine in Scala, but it fails in R, because I need to pass the options, which are a Map[String, String] in Scala, and nothing I tried in R works. I tried using a named list in R, but it cannot get the value, since I use get in my plugin to get

Grid search with Random Forest

2015-11-26 Thread Ndjido Ardo Bar
Hi folks, Does anyone know whether the Grid Search capability is enabled since issue SPARK-9011 in version 1.4.0? I'm getting the "rawPredictionCol column doesn't exist" error when trying to perform a grid search with Spark 1.4.0. Cheers, Ardo
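For context, a typical grid search over a random forest in the ML pipeline API looks roughly like the sketch below. The error above plausibly arises because `BinaryClassificationEvaluator` reads `rawPredictionCol`, which RandomForestClassifier may not expose in 1.4.0 (SPARK-9011 concerns exactly that column); this is a hedged sketch, not a confirmed diagnosis.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")

// Grid of hyperparameters to search over (values are illustrative).
val grid = new ParamGridBuilder()
  .addGrid(rf.numTrees, Array(20, 50))
  .addGrid(rf.maxDepth, Array(5, 10))
  .build()

val cv = new CrossValidator()
  .setEstimator(rf)
  .setEvaluator(new BinaryClassificationEvaluator()) // reads rawPredictionCol
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)

// val model = cv.fit(trainingData) // trainingData: DataFrame with label/features
```

If the evaluator is the culprit, switching to a `MulticlassClassificationEvaluator` (which reads the prediction column instead) or upgrading Spark may avoid the missing-column error.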

Re: A proposal for Spark 2.0

2015-11-26 Thread Reynold Xin
I don't think there are any plans for Scala 2.12 support yet. We can always add Scala 2.12 support later. On Thu, Nov 26, 2015 at 12:59 PM, Koert Kuipers wrote: > I also thought the idea was to drop 2.10. Do we want to cross build for 3 > scala versions? > On Nov 25, 2015

Re: NettyRpcEnv advertisedPort

2015-11-26 Thread Shixiong Zhu
I think you are right. The executor gets the driver port from "RpcEnv.address". Best Regards, Shixiong Zhu 2015-11-26 11:45 GMT-08:00 Rad Gruchalski : > Dear all, > > I am currently looking at modifying NettyRpcEnv for this PR: > https://github.com/apache/spark/pull/9608 >

Re: A proposal for Spark 2.0

2015-11-26 Thread Koert Kuipers
I also thought the idea was to drop 2.10. Do we want to cross build for 3 scala versions? On Nov 25, 2015 3:54 AM, "Sandy Ryza" wrote: > I see. My concern is / was that cluster operators will be reluctant to > upgrade to 2.0, meaning that developers using those clusters

Re: SparkR read.df Option type doesn't match

2015-11-26 Thread liushiqi9
I found the answer myself. Options should be added like: read.df(sqlContext, path = NULL, source = "***", option1 = "", option2 = "") -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/SparkR-read-df-Option-type-doesn-t-match-tp15365p15370.html Sent from the

Re: NettyRpcEnv advertisedPort

2015-11-26 Thread Rad Gruchalski
I did test my change: https://github.com/radekg/spark/commit/b21aae1468169ee0a388d33ba6cebdb17b895956#diff-0c89b4a60c30a7cd2224bb64d93da942R125 It seems to do what I want it to do, however, I'm not quite sure about the overall impact. I'd appreciate it if someone who knows the NettyRpcEnv details

Re: tests blocked at "don't call ssc.stop in listener"

2015-11-26 Thread Saisai Shao
Might be related to this JIRA ( https://issues.apache.org/jira/browse/SPARK-11761), not very sure about it. On Fri, Nov 27, 2015 at 10:22 AM, Nan Zhu wrote: > Hi, all > > Anyone noticed that some of the tests just blocked at the test case “don't > call ssc.stop in

StreamingContext stop gracefully failed in yarn-cluster mode

2015-11-26 Thread qinggangwa...@gmail.com
Hi all, I am trying to stop the StreamingContext gracefully in yarn-cluster mode, but it seems that the job is stopped and started again when I use ssc.stop(true, true), while the job is stopped when I use ssc.stop(true). Does it mean that the StreamingContext cannot be stopped gracefully
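For reference, the two-argument stop discussed above distinguishes between stopping the underlying SparkContext and draining in-flight data. A hedged sketch of the intended graceful shutdown:

```scala
import org.apache.spark.streaming.StreamingContext

// Sketch: stop the StreamingContext gracefully, letting already-received
// batches finish processing before the SparkContext is torn down.
def shutdown(ssc: StreamingContext): Unit =
  ssc.stop(stopSparkContext = true, stopGracefully = true)
```

In yarn-cluster mode, YARN may also restart an application attempt that exits in a way it considers a failure, which could look like the job "starting again"; that interaction, rather than `stop` itself, may be worth checking.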

tests blocked at "don't call ssc.stop in listener"

2015-11-26 Thread Nan Zhu
Hi, all Anyone noticed that some of the tests just blocked at the test case “don't call ssc.stop in listener” in StreamingListenerSuite? Examples: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46766/console

Re: tests blocked at "don't call ssc.stop in listener"

2015-11-26 Thread Shixiong Zhu
Just found a potential dead-lock in this test. Will send a PR to fix it soon. Best Regards, Shixiong Zhu 2015-11-26 18:55 GMT-08:00 Saisai Shao : > Might be related to this JIRA ( > https://issues.apache.org/jira/browse/SPARK-11761), not very sure about > it. > > On Fri,