Re: Input validation for LogisticRegressionWithSGD

2015-03-15 Thread Rishi Yadav
Can you share some sample data?

On Sun, Mar 15, 2015 at 8:51 PM, Rohit U rjupadhy...@gmail.com wrote:

 Hi,

 I am trying to run LogisticRegressionWithSGD on an RDD of LabeledPoints
 loaded using loadLibSVMFile:

 val logistic: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc,
   "s3n://logistic-regression/epsilon_normalized")

 val model = LogisticRegressionWithSGD.train(logistic, 100)

 It gives an input validation error after about 10 minutes:

 org.apache.spark.SparkException: Input validation failed.
   at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:162)
   at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:146)
   at org.apache.spark.mllib.classification.LogisticRegressionWithSGD$.train(LogisticRegression.scala:157)
   at org.apache.spark.mllib.classification.LogisticRegressionWithSGD$.train(LogisticRegression.scala:192)

 From reading this bug report
 (https://issues.apache.org/jira/browse/SPARK-2575), since I am loading a
 LibSVM-format file, I expected the dataset to contain only 0/1 labels, so I
 should not be hitting the issue described there. Is there something else
 I'm missing here?

 Thanks!



Re: Input validation for LogisticRegressionWithSGD

2015-03-15 Thread Rohit U
I checked the labels across the entire dataset and it looks like it has -1
and 1 (not the 0 and 1 I originally expected). I will try replacing the -1
with 0 and run it again.
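
For anyone hitting the same error, here is a rough sketch of the check and the relabeling described above. It assumes the labels are strictly -1 and 1, and that the validation only accepts 0 and 1, which is what the error suggests. The object and method names (LabelRemap, toBinaryLabel, distinctLabels, remapLabels) are made up for illustration and are not part of MLlib.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of MLlib.
object LabelRemap {
  // Map a {-1, +1} label to {0, 1}: positive labels become 1.0,
  // everything else becomes 0.0 (assumes no other label values matter).
  def toBinaryLabel(label: Double): Double = if (label > 0) 1.0 else 0.0

  // Collect the distinct label values; this is a full pass over the data.
  def distinctLabels(data: RDD[LabeledPoint]): Array[Double] =
    data.map(_.label).distinct().collect()

  // Return a new RDD with remapped labels; features are left unchanged.
  def remapLabels(data: RDD[LabeledPoint]): RDD[LabeledPoint] =
    data.map(lp => LabeledPoint(toBinaryLabel(lp.label), lp.features))
}
```

With the RDD from the original message, LabelRemap.distinctLabels(logistic) shows which label values are present, and LogisticRegressionWithSGD.train(LabelRemap.remapLabels(logistic), 100) trains on the relabeled data.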
