Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

2017-03-19 Thread jinhong lu
Thanks Dhanesh, and how about the features question? > On 19 Mar 2017, at 19:08, Dhanesh Padmanabhan wrote: > > Dhanesh Thanks, lujinhong

Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

2017-03-19 Thread jinhong lu
By the way, I found that in Spark 2.1 I can use setFamily() to choose between binomial and multinomial, but how can I do the same thing in Spark 2.0.2? If it is not supported, which one is used in Spark 2.0.2: binomial or multinomial? > On 19 Mar 2017, at 18:12, jinhong lu <lujinho...@gmail.com> wrote: > >
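For context on what the binomial/multinomial distinction means (this is the underlying math, not the Spark API): a binomial model scores with a sigmoid over a single weight vector, while a multinomial model uses a softmax over one weight vector per class. A minimal sketch:

```python
# Background sketch only (not Spark code): the score functions behind the
# "binomial" vs "multinomial" family setting in logistic regression.
import math

def sigmoid(z):
    # Binomial: probability of the positive class from one margin z.
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Multinomial: one margin per class, normalized into probabilities.
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))          # -> 0.5
print(softmax([0.0, 0.0]))   # -> [0.5, 0.5]
```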

how to retain part of the features in LogisticRegressionModel (spark2.0)

2017-03-19 Thread jinhong lu
I train my LogisticRegressionModel like this, and I want the model to retain only some of the features (e.g. 500 of them), not all of them. What should I do? I use .setElasticNetParam(1.0), but all the features are still in lrModel.coefficients. import
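One likely reason (an assumption, since the full code is truncated above): setElasticNetParam(1.0) selects pure L1 regularization, but with regParam left at 0 there is no penalty at all, so no coefficient is driven to zero. With a nonzero regParam, L1 zeroes out many coefficients, and the "retained" features are simply the indices of the nonzero entries. A plain-Python illustration of that last step, with hypothetical coefficient values:

```python
# Sketch, not the poster's code: after fitting with elasticNetParam=1.0 AND
# a nonzero regParam, many coefficients become exactly zero. The features
# the model retains are the indices of the nonzero coefficients.
coefficients = [0.0, 1.3, 0.0, -0.7, 0.0, 2.1]  # hypothetical lrModel.coefficients

retained = [i for i, w in enumerate(coefficients) if w != 0.0]
print(retained)  # -> [1, 3, 5]
```

Tuning regParam up or down shrinks or grows the retained set; hitting exactly 500 features generally requires searching over regParam values.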

Re: how to construct parameter for model.transform() from datafile

2017-03-13 Thread jinhong lu
Can anyone help? > On 13 Mar 2017, at 19:38, jinhong lu <lujinho...@gmail.com> wrote: > > After training the model, I got a result that looks like this: > > > scala> predictionResult.show() > > +-+++--

Re: how to construct parameter for model.transform() from datafile

2017-03-13 Thread jinhong lu
r of elements of x. A: 144109, x: 804202 at scala.Predef$.require(Predef.scala:224) at org.apache.spark.ml.linalg.BLAS$.gemv(BLAS.scala:521) at org.apache.spark.ml.linalg.Matrix$class.multiply(Matrices.scala:110) at org.apache.spark.ml.linalg.DenseMatrix.multiply(Matrices.scala:176) What should
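The stack trace says the coefficient matrix expects 144109-dimensional vectors but the input vector has 804202 elements: the vectors built from the data file must use the same dimensionality the model was trained with. A hedged sketch of building fixed-size sparse input (the helper and sizes here are illustrative, not Spark API):

```python
# Hypothetical sketch: when constructing vectors for model.transform(),
# force the vector size to the training dimensionality, otherwise the
# gemv requirement (matrix columns == vector length) fails as above.
def to_sparse(pairs, num_features):
    """Parse 'index:value' strings into a fixed-size sparse vector."""
    entries = {}
    for pair in pairs:
        idx, val = pair.split(":")
        idx = int(idx)
        if idx >= num_features:
            continue  # indices outside the training feature space can't be scored
        entries[idx] = float(val)
    return (num_features, sorted(entries.items()))

vec = to_sparse(["3:1.5", "7:2.0", "12:9.0"], num_features=10)
print(vec)  # -> (10, [(3, 1.5), (7, 2.0)])
```

Whether out-of-range indices should be dropped or treated as an error depends on how the training feature space was built; dropping them here is just one choice.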

how to construct parameter for model.transform() from datafile

2017-03-13 Thread jinhong lu
Hi all, I have this training data: 0 31607:17 0 111905:36 0 109:3 506:41 1509:1 2106:4 5309:1 7209:5 8406:1 27108:1 27709:1 30209:8 36109:20 41408:1 42309:1 46509:1 47709:5 57809:1 58009:1 58709:2 112109:4 123305:48 142509:1 0 407:14 2905:2 5209:2 6509:2 6909:2
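The data above is in the usual label-then-`index:value` sparse format. A minimal parser for one such line, as a plain-Python sketch (assuming whitespace-separated fields as shown):

```python
# Minimal parser for lines like "0 109:3 506:41 1509:1": the first token is
# the label, the rest are featureIndex:value pairs.
def parse_line(line):
    parts = line.split()
    label = float(parts[0])
    features = [(int(i), float(v))
                for i, v in (p.split(":") for p in parts[1:])]
    return label, features

label, features = parse_line("0 109:3 506:41 1509:1")
print(label, features)  # -> 0.0 [(109, 3.0), (506, 41.0), (1509, 1.0)]
```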

ml package data types

2017-03-09 Thread jinhong lu
Hi, is there any documentation for the ml package data types, like the one the mllib package has here: https://spark.apache.org/docs/latest/mllib-data-types.html ? Or are the types the same for ml and mllib? Thanks, lujinhong

mllib based on dataset or dataframe

2016-07-10 Thread jinhong lu
Hi, since Dataset will be the major API in Spark 2.0, why will mllib be DataFrame-based, with 'future development will focus on the DataFrame-based API'? Is there any plan to change mllib from DataFrame-based to Dataset-based? Thanks, lujinhong

spark to hbase

2015-10-27 Thread jinhong lu
Hi, I write my result to HDFS, and it works well: val model = lines.map(pairFunction).groupByKey().flatMap(pairFlatMapFunction).aggregateByKey(new TrainFeature())(seqOp, combOp).values model.map(a => (a.toKey() + "\t" + a.totalCount + "\t" + a.positiveCount)).saveAsTextFile(modelDataPath); But
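The core of the snippet above is the aggregateByKey(zero)(seqOp, combOp) pattern: per key, fold records into an accumulator, then merge accumulators across partitions. A pure-Python sketch of that pattern (the (total, positive) accumulator is a guess at what TrainFeature tracks, based on the totalCount/positiveCount fields printed above):

```python
# Pure-Python sketch of aggregateByKey(zero)(seqOp, combOp): this is not
# Spark code, just the fold/merge contract it relies on.
from collections import defaultdict

def seq_op(acc, value):
    # Fold one record into a partition-local accumulator.
    total, positive = acc
    return (total + 1, positive + (1 if value > 0 else 0))

def comb_op(a, b):
    # Merge accumulators coming from different partitions.
    return (a[0] + b[0], a[1] + b[1])

records = [("k1", 1), ("k1", 0), ("k2", 1)]
accs = defaultdict(lambda: (0, 0))          # zero value per key
for key, value in records:
    accs[key] = seq_op(accs[key], value)
print(dict(accs))  # -> {'k1': (2, 1), 'k2': (1, 1)}
```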

Re: spark to hbase

2015-10-27 Thread jinhong lu
You can omit that line. > > Which HBase release are you using? > > As Deng said, don't flush per row. > > Cheers > > On Oct 27, 2015, at 3:21 AM, Deng Ching-Mallete <och...@apache.org> wrote: > >> Hi, >>

Re: spark to hbase

2015-10-27 Thread jinhong lu
com>> wrote: >> >> For #2, have you checked task log(s) to see if there was some clue ? >> >> You may want to use foreachPartition to reduce the number of flushes. >> >> In the future, please remove color coding - it is not easy to read. >> >> Cheers
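The foreachPartition advice above amounts to: open one connection per partition, buffer puts, and flush once per partition instead of once per row. A hedged pure-Python sketch of that batching pattern (FakeTable is a stand-in for an HBase table client, not a real API):

```python
# Sketch of "use foreachPartition to reduce the number of flushes":
# buffer all rows of a partition and flush once, instead of per row.
class FakeTable:
    """Stand-in for an HBase table client; counts flushes for illustration."""
    def __init__(self):
        self.flushes = 0
        self.buffer = []

    def put(self, row):
        self.buffer.append(row)   # buffered locally, no network flush yet

    def flush(self):
        self.flushes += 1         # one round trip for the whole buffer
        self.buffer = []

def write_partition(rows):
    table = FakeTable()           # one connection per partition
    for row in rows:
        table.put(row)
    table.flush()                 # single flush for the partition
    return table.flushes

print(write_partition(["r1", "r2", "r3"]))  # -> 1
```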

NoSuchMethodException : com.google.common.io.ByteStreams.limit

2015-10-23 Thread jinhong lu
Hi, I run Spark to write data to HBase, but it failed with a NoSuchMethodError: 15/10/23 18:45:21 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, dn18-formal.i.nease.net): java.lang.NoSuchMethodError: com.google.common.io.ByteStreams.limit(Ljava/io/InputStream;J)Ljava/io/InputStream; I found