[ 
https://issues.apache.org/jira/browse/SPARK-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan SPM updated SPARK-15092:
-----------------------------
    Description: 
The toDebugString method is missing from the ML DecisionTreeClassifier and 
its fitted model, DecisionTreeClassificationModel. The method exists on the 
MLlib DecisionTreeModel. 

As a result, there is no way to inspect or print the tree structure of a model trained with ML.

A basic example that shows the problem:


from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

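# target_index (a StringIndexer), assembler (a VectorAssembler), df_train and
# df_test are assumed to be defined earlier.
cl = DecisionTreeClassifier(labelCol='target_idx', featuresCol='features')
pipe = Pipeline(stages=[target_index, assembler, cl])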
model = pipe.fit(df_train)

# Prediction and model evaluation
predictions = model.transform(df_test)

mc_evaluator = MulticlassClassificationEvaluator(
    labelCol="target_idx", predictionCol="prediction", metricName="precision")

accuracy = mc_evaluator.evaluate(predictions)
print("Test Error = {}".format(1.0 - accuracy))

It would be great to be able to do what the MLlib model already supports:

print model.toDebugString(),  # the string already ends with a newline
        DecisionTreeModel classifier of depth 1 with 3 nodes
          If (feature 0 <= 0.0)
           Predict: 0.0
          Else (feature 0 > 0.0)
           Predict: 1.0
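
To make the requested output concrete, here is a minimal pure-Python sketch that renders a tiny tree in the same layout as the MLlib output above. The Node class and the helper functions are hypothetical and written only for illustration; they are not part of Spark, and the exact indentation Spark uses is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: a leaf carries a prediction, an internal node a split."""
    prediction: Optional[float] = None
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def count_nodes(node: Node) -> int:
    # Every node counts once, leaves included.
    return 1 if node.is_leaf else 1 + count_nodes(node.left) + count_nodes(node.right)


def depth(node: Node) -> int:
    # A single leaf has depth 0; each level of splits adds 1.
    return 0 if node.is_leaf else 1 + max(depth(node.left), depth(node.right))


def render(node: Node, indent: int = 2) -> List[str]:
    # Children are indented one extra space relative to their parent's If/Else line.
    pad = " " * indent
    if node.is_leaf:
        return ["{}Predict: {}".format(pad, node.prediction)]
    lines = ["{}If (feature {} <= {})".format(pad, node.feature, node.threshold)]
    lines += render(node.left, indent + 1)
    lines.append("{}Else (feature {} > {})".format(pad, node.feature, node.threshold))
    lines += render(node.right, indent + 1)
    return lines


def to_debug_string(root: Node) -> str:
    header = "DecisionTreeModel classifier of depth {} with {} nodes".format(
        depth(root), count_nodes(root))
    return "\n".join([header] + render(root)) + "\n"


# The same one-split tree as in the MLlib output above.
tree = Node(feature=0, threshold=0.0,
            left=Node(prediction=0.0), right=Node(prediction=1.0))
debug_text = to_debug_string(tree)
print(debug_text, end="")
```
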

However, neither the pipeline model nor the DecisionTreeClassifier model has a 
toDebugString attribute:

cl.toDebugString()
AttributeError
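
Note that cl is the unfitted estimator, so the tree itself would live on the last stage of the fitted pipeline (model.stages[-1]). A minimal sketch of a defensive lookup follows. The helper name find_debug_string and the Fake* classes are hypothetical stand-ins so the snippet runs without Spark, and whether toDebugString is a method or a plain attribute on a given model is treated as an assumption.

```python
def find_debug_string(fitted):
    """Return the tree debug string from a fitted model or pipeline, or None.

    Stages are searched from last to first, since the classifier is usually
    the final stage of a pipeline.
    """
    # A fitted pipeline keeps its fitted stages in `stages`; anything else
    # is treated as a single stage.
    stages = getattr(fitted, "stages", None) or [fitted]
    for stage in reversed(stages):
        attr = getattr(stage, "toDebugString", None)
        if attr is None:
            continue
        # Assumption: toDebugString may be a method (as on the MLlib model)
        # or a plain attribute/property, so both cases are handled.
        return attr() if callable(attr) else attr
    return None


# Hypothetical stand-ins so the sketch runs without a Spark session.
class FakeIndexerModel(object):
    pass


class FakeTreeModel(object):
    def toDebugString(self):
        return "DecisionTreeModel classifier of depth 1 with 3 nodes\n"


class FakePipelineModel(object):
    def __init__(self, stages):
        self.stages = stages


fitted = FakePipelineModel([FakeIndexerModel(), FakeTreeModel()])
found = find_debug_string(fitted)
missing = find_debug_string(FakePipelineModel([FakeIndexerModel()]))
print(found, end="")
```

With a real fitted pipeline, the call would be find_debug_string(model); it returns None when no stage exposes toDebugString, which is the situation reported here.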

https://spark.apache.org/docs/1.6.0/api/python/_modules/pyspark/mllib/tree.html



  was:
The attribute toDebugString is missing from the DecisionTreeClassifier and 
DecisionTreeClassifierModel from ML. The attribute exists on the MLLib 
DecisionTree model. 

There's no way to check or print the model tree structure from the ML.


> toDebugString missing from ML DecisionTreeClassifier
> ----------------------------------------------------
>
>                 Key: SPARK-15092
>                 URL: https://issues.apache.org/jira/browse/SPARK-15092
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.6.0
>         Environment: HDP 2.3.4, Red Hat 6.7
>            Reporter: Ivan SPM
>            Priority: Minor
>              Labels: features


