[ https://issues.apache.org/jira/browse/SPARK-45154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795716#comment-17795716 ]
APeng Zhang commented on SPARK-45154:
-------------------------------------

[~oumarnour] I think you need to set the _seed_ param of CrossValidator.

> Pyspark DecisionTreeClassifier: results and tree structure in spark3 very different from that of the spark2 version on the same data and with the same hyperparameters.
> ------------------------------------------------------------------------
>
>                 Key: SPARK-45154
>                 URL: https://issues.apache.org/jira/browse/SPARK-45154
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib, PySpark, Spark Core
>    Affects Versions: 3.0.0, 3.3.1, 3.2.4, 3.3.3, 3.3.2, 3.4.0, 3.4.1
>            Reporter: Oumar Nour
>            Priority: Critical
>              Labels: decisiontree, pyspark3, spark2, spark3
>
> Hello,
> I have an engine running on spark2 that uses a DecisionTreeClassifier model with the CrossValidator.
>
> {code:java}
> from pyspark.ml.classification import DecisionTreeClassifier
> from pyspark.ml.evaluation import BinaryClassificationEvaluator
> from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
>
> dt = DecisionTreeClassifier(maxBins=10000, seed=0)
> cv_dt_evaluator = BinaryClassificationEvaluator(
>     metricName="",
>     rawPredictionCol="probability")
> # Create param grid and cross validator for model selection
> dt_grid = ParamGridBuilder()\
>     .addGrid(
>         dt.minInstancesPerNode, [100]
>     )\
>     .addGrid(
>         dt.maxDepth, [10]
>     )\
>     .build()
> cv = CrossValidator(
>     estimator=dt, estimatorParamMaps=dt_grid,
>     evaluator=cv_dt_evaluator,
>     parallelism=4,
>     numFolds=4
> ){code}
>
> I want to {*}migrate from spark2 to spark3{*}. I've run *DecisionTreeClassifier* on the same data with the same parameter values. But unfortunately my results are {*}completely different, especially in terms of tree structure{*}. I have trees with less depth and fewer splits on spark3.
> I've tried to read the documentation but I haven't found an answer to my question.
>
> Can you help me find a solution to this problem?
> Thanks in advance for your help

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
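
On the suggestion above to set the _seed_ param of CrossValidator: the quoted snippet seeds the DecisionTreeClassifier (seed=0) but not the CrossValidator, so the random split of rows into folds is left unpinned. The sketch below is a simplified, Spark-free illustration of why that matters. The helper assign_folds is hypothetical and is not how Spark implements its splitting; it only shows that a fixed seed reproduces the same fold assignment while an unseeded run need not.

```python
import random


def assign_folds(n_rows, num_folds, seed=None):
    """Randomly assign each row index to one of num_folds folds.

    Hypothetical helper for illustration only; this is not Spark's code.
    """
    # With seed=None, Python seeds the generator from system entropy,
    # so every call may produce a different assignment.
    rng = random.Random(seed)
    return [rng.randrange(num_folds) for _ in range(n_rows)]


# Same seed: identical fold assignment on every run.
run_a = assign_folds(n_rows=12, num_folds=4, seed=0)
run_b = assign_folds(n_rows=12, num_folds=4, seed=0)
print(run_a == run_b)  # True

# No seed: the assignment is free to change from run to run,
# so the data each fold is trained and evaluated on can change too.
run_c = assign_folds(n_rows=12, num_folds=4)
run_d = assign_folds(n_rows=12, num_folds=4)
print(run_c == run_d)
```

In the reporter's snippet this would correspond to passing a fixed value, for example seed=0 (an arbitrary choice), to CrossValidator(...) alongside numFolds=4. Whether this alone accounts for the spark2 vs spark3 difference in tree structure is not established in this thread.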