Mikhail Shiryaev created SPARK-16377:
----------------------------------------

             Summary: Spark MLlib: MultilayerPerceptronClassifier - error while training
                 Key: SPARK-16377
                 URL: https://issues.apache.org/jira/browse/SPARK-16377
             Project: Spark
          Issue Type: Bug
          Components: ML, MLlib
    Affects Versions: 1.5.2
            Reporter: Mikhail Shiryaev


Hi, 

I am trying to train a model with MultilayerPerceptronClassifier. 

It works on sample data from 
data/mllib/sample_multiclass_classification_data.txt with 4 features, 3 classes 
and layers [4, 4, 3]. 
But when I use other input files with a different number of features and 
classes (for example, from 
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html), 
I get errors. 
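
Since the layers above are chosen so that the first entry equals the number of features and the last equals the number of classes, it can help to confirm what an input file actually contains before picking the layers. The following is only a minimal sketch, not part of the original report: it assumes the LIBSVM text format (`label index:value ...` with 1-based indices), the helper name `summarize_libsvm` is hypothetical, and the sample rows are made up for illustration rather than copied from any dataset.

```python
from typing import Iterable, Set, Tuple


def summarize_libsvm(lines: Iterable[str]) -> Tuple[int, Set[float]]:
    """Return (largest feature index seen, set of distinct labels)."""
    max_index = 0
    labels: Set[float] = set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        labels.add(float(parts[0]))  # first token is the label
        for token in parts[1:]:
            # each remaining token is assumed to look like "index:value"
            index, _, _value = token.partition(":")
            max_index = max(max_index, int(index))
    return max_index, labels


# Made-up rows in LIBSVM format (illustrative only).
sample = [
    "1 1:-0.22 2:0.5 3:-0.61 4:-0.45",
    "0 1:0.16 3:0.27 4:0.8",
    "2 2:-0.33 4:0.1",
]
num_features, labels = summarize_libsvm(sample)
print("largest feature index:", num_features)
print("distinct labels:", sorted(labels))
```

To run it on a real file, one could pass an open file handle instead of the list, for example `summarize_libsvm(open(path))`, and compare the two results with the first and last entries of the layers array.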

Example: 
Input file aloi (128 features, 1000 classes, layers [128, 128, 1000]): 


with block size = 1: 
ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to Infinity 
ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed 
ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved? 


with default block size = 128: 
java.lang.ArrayIndexOutOfBoundsException 
  at java.lang.System.arraycopy(Native Method) 
  at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:629) 
  at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:628) 
  at scala.collection.immutable.List.foreach(List.scala:381) 
  at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:628) 
  at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:624) 



Even if I modify the sample_multiclass_classification_data.txt file (renaming 
every feature with index 4 to index 5) and run with layers [5, 5, 3], I get 
the same errors as for the file above. 
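
For anyone who wants to reproduce the modified file, the renaming step described above could be scripted roughly as follows. This is a hedged sketch and not the exact procedure used for the report: it assumes LIBSVM-style rows (`label index:value ...`), the helper name `rename_feature_index` is hypothetical, and the example row is made up.

```python
def rename_feature_index(line: str, old: int, new: int) -> str:
    """Rewrite one LIBSVM row, changing feature index `old` to `new`."""
    parts = line.split()
    if not parts:
        return line  # leave blank lines untouched
    renamed = [parts[0]]  # keep the label as-is
    for token in parts[1:]:
        index, sep, value = token.partition(":")
        # compare as integers so that e.g. index 14 is not mistaken for 4
        if sep and int(index) == old:
            index = str(new)
        renamed.append(f"{index}{sep}{value}")
    return " ".join(renamed)


# Made-up row (illustrative only): feature 4 becomes feature 5.
row = "1 1:-0.2 2:0.5 3:-0.6 4:-0.4"
print(rename_feature_index(row, old=4, new=5))
```

Applying the function to every line of the sample file and writing the results to a new file would give data whose largest feature index is 5, which is the case run with layers [5, 5, 3].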


To summarize: 
I can't run training with the default block size when there are more than 4 
features. 
If I set the block size to 1, training starts but I get errors from LBFGS. 
This is reproducible with Spark 1.5.2 and with the master branch on GitHub (as 
of July 4). 

Has anybody encountered this behavior? 
Is there a bug in MultilayerPerceptronClassifier, or am I using it incorrectly? 

Thanks.


