Re: Input validation for LogisticRegressionWithSGD
I checked the labels across the entire dataset, and it contains -1 and 1 (not the 0 and 1 I originally expected). I will try replacing the -1 labels with 0 and run it again.

On Mon, Mar 16, 2015 at 12:51 AM, Rishi Yadav wrote:
> Can you share some sample data?
> [...]
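The relabeling described above could be sketched roughly as follows. This is only an illustration, assuming the `logistic` RDD from the original message; `LabelRemap`, `toZeroOne`, and `remap` are hypothetical names, not part of MLlib.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper object (not part of MLlib).
object LabelRemap {
  // Map the -1 class to 0 and leave every other label unchanged.
  def toZeroOne(label: Double): Double =
    if (label == -1.0) 0.0 else label

  // Rebuild each LabeledPoint with the remapped label; features are reused as-is.
  def remap(data: RDD[LabeledPoint]): RDD[LabeledPoint] =
    data.map(lp => LabeledPoint(toZeroOne(lp.label), lp.features))
}
```

With that in place, training would presumably become `LogisticRegressionWithSGD.train(LabelRemap.remap(logistic), 100)`.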
Re: Input validation for LogisticRegressionWithSGD
Can you share some sample data?

On Sun, Mar 15, 2015 at 8:51 PM, Rohit U wrote:
> [...]
Input validation for LogisticRegressionWithSGD
Hi,

I am trying to run LogisticRegressionWithSGD on an RDD of LabeledPoints loaded using loadLibSVMFile:

    val logistic: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc,
      "s3n://logistic-regression/epsilon_normalized")

    val model = LogisticRegressionWithSGD.train(logistic, 100)

It gives an input validation error after about 10 minutes:

    org.apache.spark.SparkException: Input validation failed.
      at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:162)
      at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:146)
      at org.apache.spark.mllib.classification.LogisticRegressionWithSGD$.train(LogisticRegression.scala:157)
      at org.apache.spark.mllib.classification.LogisticRegressionWithSGD$.train(LogisticRegression.scala:192)

From reading this bug report (https://issues.apache.org/jira/browse/SPARK-2575), since I am loading a LibSVM format file there should be only 0/1 in the dataset, and I should not be facing the issue in the bug report. Is there something else I'm missing here?

Thanks!
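One way to narrow down a failure like this is to list the distinct labels in the RDD before training. The sketch below assumes the `logistic` RDD from the message above; `LabelCheck`, `isBinary`, and `distinctLabels` are hypothetical names. It also assumes that the validation step for binary logistic regression accepts only the labels 0.0 and 1.0.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper object (not part of MLlib).
object LabelCheck {
  // True when every label is exactly 0.0 or 1.0.
  def isBinary(labels: Seq[Double]): Boolean =
    labels.forall(l => l == 0.0 || l == 1.0)

  // Collect the distinct labels to the driver. The result is small,
  // but computing it still takes a full pass over the data.
  def distinctLabels(data: RDD[LabeledPoint]): Seq[Double] =
    data.map(_.label).distinct().collect().toSeq
}
```

Calling `LabelCheck.isBinary(LabelCheck.distinctLabels(logistic))` before `train` would surface unexpected labels such as -1 without waiting for the training job to fail.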