[ https://issues.apache.org/jira/browse/SPARK-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan SPM updated SPARK-15092:
-----------------------------
    Description:
The attribute toDebugString is missing from the DecisionTreeClassifier and DecisionTreeClassificationModel in ML. The attribute exists on the MLlib DecisionTree model.

There's no way to check or print the model's tree structure from ML.

The basic code for it is this:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler, StringIndexer
    from pyspark.ml.classification import DecisionTreeClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    # target_index and assembler (presumably a StringIndexer and a
    # VectorAssembler) and df_train / df_test are defined elsewhere.
    cl = DecisionTreeClassifier(labelCol='target_idx', featuresCol='features')
    pipe = Pipeline(stages=[target_index, assembler, cl])
    model = pipe.fit(df_train)

    # Prediction and model evaluation
    predictions = model.transform(df_test)
    mc_evaluator = MulticlassClassificationEvaluator(
        labelCol="target_idx", predictionCol="prediction", metricName="precision")
    accuracy = mc_evaluator.evaluate(predictions)
    print("Test Error = {}".format(1.0 - accuracy))

Now it would be great to be able to do what can be done with the MLlib model:

    print(model.toDebugString())  # the string already ends with a newline

    DecisionTreeModel classifier of depth 1 with 3 nodes
      If (feature 0 <= 0.0)
       Predict: 0.0
      Else (feature 0 > 0.0)
       Predict: 1.0

But there's no toDebugString attribute on either the pipeline model or the DecisionTreeClassifier model:

    cl.toDebugString()
    AttributeError

https://spark.apache.org/docs/1.6.0/api/python/_modules/pyspark/mllib/tree.html

  was:
The attribute toDebugString is missing from the DecisionTreeClassifier and DecisionTreeClassificationModel in ML. The attribute exists on the MLlib DecisionTree model.

There's no way to check or print the model's tree structure from ML.
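As a possible stopgap for the request above, here is a minimal sketch of a helper that looks for the tree's debug string on the last stage of a fitted model. The helper name tree_debug_string is hypothetical. It assumes that the Python model wrapper keeps the underlying JVM model in the private `_java_obj` attribute and that the JVM model has a toDebugString() method; both are assumptions about internals and may not hold for a given Spark version.

```python
def tree_debug_string(model):
    """Return the debug string of the last stage of a fitted model, or None.

    Hypothetical helper, not part of PySpark.
    """
    # A fitted pipeline model exposes its fitted stages as `.stages`;
    # a bare model is treated as a one-stage list.
    stages = getattr(model, "stages", None) or [model]
    last = stages[-1]

    # If the Python model itself exposes toDebugString (as a method or as
    # a plain attribute/property), use it directly.
    attr = getattr(last, "toDebugString", None)
    if attr is not None:
        return attr() if callable(attr) else attr

    # Assumption: the wrapper stores the JVM model in the private
    # `_java_obj` attribute and that object has toDebugString().
    java_obj = getattr(last, "_java_obj", None)
    if java_obj is None:
        return None
    return java_obj.toDebugString()
```

With the pipeline from the description, the call would be print(tree_debug_string(model)). Because the helper only inspects the final stage, the classifier has to be the last entry in the pipeline's stages, as it is in the code above.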
> toDebugString missing from ML DecisionTreeClassifier
> ----------------------------------------------------
>
>                 Key: SPARK-15092
>                 URL: https://issues.apache.org/jira/browse/SPARK-15092
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.6.0
>         Environment: HDP 2.3.4, Red Hat 6.7
>            Reporter: Ivan SPM
>            Priority: Minor
>              Labels: features