Re: use CrossValidatorModel for prediction

2016-10-02 Thread Pengcheng Luo
> On Oct 2, 2016, at 1:04 AM, Pengcheng <pch...@gmail.com> wrote: > Dear Spark Users, I was wondering. I have a trained cross-validator model (model: CrossValidatorModel). I want to predict a score for features: RDD[Features]

use CrossValidatorModel for prediction

2016-10-01 Thread Pengcheng
Dear Spark Users, I was wondering. I have a trained cross-validator model (model: CrossValidatorModel). I want to predict a score for features: RDD[Features]. Right now I have to convert the features to a DataFrame and then perform predictions as follows: val sqlContext = new
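A minimal sketch of the DataFrame-based prediction path this message describes, assuming Spark 2.x with a SparkSession and a fitted pipeline whose features column is named "features"; the helper name predictScores and the column names are illustrative, not from the original message:

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

def predictScores(model: CrossValidatorModel,
                  features: RDD[Vector],
                  spark: SparkSession): RDD[Double] = {
  import spark.implicits._
  // Spark ML models only transform DataFrames, so wrap the RDD in a
  // single-column DataFrame named the way the fitted pipeline expects.
  val df = features.map(Tuple1.apply).toDF("features")
  // transform() appends a "prediction" column; pull it back out as an RDD.
  model.transform(df).select("prediction").rdd.map(_.getDouble(0))
}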

SparkML RandomForest

2016-08-10 Thread Pengcheng
.setInputCol("category") .setOutputCol("label") val pipeline = new Pipeline().setStages(Array(indexer, rf)) val model: PipelineModel = pipeline.fit(trainingData) thanks, pengcheng
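Since the snippet above is cut off at the start, here is a self-contained reconstruction of what such a pipeline typically looks like; the StringIndexer/RandomForestClassifier stage definitions and the "features" column name are assumptions filled in for illustration, as only the setInputCol/setOutputCol calls, the Pipeline, and pipeline.fit appear in the original message:

import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.DataFrame

def trainModel(trainingData: DataFrame): PipelineModel = {
  // Map the string "category" column to a numeric "label" column.
  val indexer = new StringIndexer()
    .setInputCol("category")
    .setOutputCol("label")
  // Random forest reading the indexed label and an assembled "features" column.
  val rf = new RandomForestClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features")
  val pipeline = new Pipeline().setStages(Array(indexer, rf))
  pipeline.fit(trainingData)
}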

Re: spark application was submitted twice unexpectedly

2015-04-19 Thread Pengcheng Liu
Looking into the work folder of the problematic application, it seems that the application keeps creating executors, and the error log of the worker is as below: Exception in thread main java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs at

spark application was submitted twice unexpectedly

2015-04-18 Thread Pengcheng Liu
at the same time). One of the two applications will run to the end, but the other always stays in the running state and never exits or releases resources. Does anyone meet the same issue? The Spark version I am using is Spark 1.1.1. Best Regards, Pengcheng

How to limit the number of concurrent tasks per node?

2015-01-06 Thread Pengcheng YIN
1. But this would take effect globally. My pipeline involves lots of other operations on which I do not want to set a limit. Is there any better solution to fulfill this purpose? Thanks! Pengcheng
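Since the snippet is truncated, one hedged reading is that the global option mentioned is something like spark.task.cpus or the executor core count. A common per-operation alternative is to bound the number of partitions for just the expensive stage, because a stage never runs more concurrent tasks than it has partitions. The helper below is an illustrative sketch of that idea, not the list's recommendation; note it caps cluster-wide (not strictly per-node) concurrency, and the names are made up:

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

def runWithLimitedParallelism[T, U: ClassTag](
    input: RDD[T],
    maxConcurrentTasks: Int)(heavyStep: T => U): RDD[U] = {
  // coalesce() shrinks the partition count without a full shuffle, so the
  // following map stage runs at most maxConcurrentTasks tasks at once,
  // while the rest of the pipeline keeps its original parallelism.
  input.coalesce(maxConcurrentTasks).map(heavyStep)
}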

does calling cache()/persist() on a RDD trigger its immediate evaluation?

2015-01-03 Thread Pengcheng YIN
with the persisted newRdd. My concern is that, if the RDD is not evaluated and persisted when persist() is called, I need to change the position of the persist()/unpersist() calls to make this more efficient. Thanks, Pengcheng
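For reference, persist() is lazy: it only records the storage level, and nothing is materialized until an action runs on the RDD. A minimal sketch of the usual pattern, where the count() used to force materialization before dropping the old copy is an illustrative choice rather than something from the original message:

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

def swapCachedRdd[T](oldRdd: RDD[T], newRdd: RDD[T]): RDD[T] = {
  newRdd.persist(StorageLevel.MEMORY_AND_DISK) // lazy: just marks the RDD for caching
  newRdd.count()                                // an action actually materializes the cache
  oldRdd.unpersist()                            // now safe to release the old copy
  newRdd
}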

merge elements in a Spark RDD under custom condition

2014-12-01 Thread Pengcheng YIN
For example, suppose RDD[Seq[Int]] = [[1,2,3], [2,4,5], [1,2], [7,8,9]]; the result should be [[1,2,3,4,5], [7,8,9]]. Since the RDD[Seq[Int]] is very large, I cannot do it in the driver program. Is it possible to get it done using distributed groupBy/map/reduce, etc.? Thanks in advance, Pengcheng
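The merge condition in the example (sequences that share any element end up in the same group) is equivalent to finding connected components over the elements, so one fully distributed approach is to go through GraphX. The sketch below is an assumption-laden illustration of that idea, not the poster's code; it assumes the sequences are non-empty and the elements fit in Int:

import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

def mergeOverlapping(seqs: RDD[Seq[Int]]): RDD[Seq[Int]] = {
  // Turn each sequence into a small star of edges linking every element
  // to the sequence's first element.
  val edges: RDD[(VertexId, VertexId)] =
    seqs.flatMap(s => s.map(x => (s.head.toLong, x.toLong)))
  val graph = Graph.fromEdgeTuples(edges, defaultValue = 0)
  // Elements that are transitively linked share a component id.
  graph.connectedComponents().vertices           // RDD[(element, componentId)]
    .map { case (element, component) => (component, element.toInt) }
    .groupByKey()
    .map { case (_, members) => members.toSeq.distinct.sorted }
}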