[ https://issues.apache.org/jira/browse/SPARK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018246#comment-16018246 ]
Nan Zhu commented on SPARK-20811:
---------------------------------

thanks, let me try it

> GBT Classifier failed with mysterious StackOverflowError
> --------------------------------------------------------
>
>                 Key: SPARK-20811
>                 URL: https://issues.apache.org/jira/browse/SPARK-20811
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Nan Zhu
>
> I am running GBT Classifier over airline dataset (combining 2005-2008) and in
> total it's around 22M examples as training data
> code is simple
> {code:title=Bar.scala|borderStyle=solid}
> val gradientBoostedTrees = new GBTClassifier()
> gradientBoostedTrees.setMaxBins(1000)
> gradientBoostedTrees.setMaxIter(500)
> gradientBoostedTrees.setMaxDepth(6)
> gradientBoostedTrees.setStepSize(1.0)
> transformedTrainingSet.cache().foreach(_ => Unit)
> val startTime = System.nanoTime()
> val model = gradientBoostedTrees.fit(transformedTrainingSet)
> println(s"===training time cost: ${(System.nanoTime() - startTime) / 1000.0 / 1000.0} ms")
> val resultDF = model.transform(transformedTestset)
> val binaryClassificationEvaluator = new BinaryClassificationEvaluator()
> binaryClassificationEvaluator.setRawPredictionCol("prediction").setLabelCol("label")
> println(s"=====test AUC: ${binaryClassificationEvaluator.evaluate(resultDF)}======")
> {code}
> my training job always failed with
> {quote}
> 17/05/19 13:41:29 WARN TaskSetManager: Lost task 18.0 in stage 3907.0 (TID 137506, 10.0.0.13, executor 3): java.lang.StackOverflowError
>     at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:3037)
>     at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:3061)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2234)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
>     at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
> {quote}
> the above pattern repeated for many times
> Is it a bug or did I make something wrong when using GBTClassifier in ML?