[ https://issues.apache.org/jira/browse/SPARK-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935646#comment-15935646 ]
Facai Yan commented on SPARK-3165:
----------------------------------

Do you mean that TreePoint.binnedFeatures is an Array[Int], which does not exploit sparsity in the data? If so, these modifications are needed:

1. Change TreePoint.binnedFeatures to a Vector.
2. Modify the LearningNode.predictImpl method if needed.
3. Modify the methods for bin-wise computation, such as binSeqOp, to accelerate computation.

Please correct me if I have misunderstood. I'd like to work on it.

> DecisionTree does not use sparsity in data
> ------------------------------------------
>
>                 Key: SPARK-3165
>                 URL: https://issues.apache.org/jira/browse/SPARK-3165
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Improvement: computation
> DecisionTree should take advantage of sparse feature vectors. Aggregation
> over training data could handle the empty/zero-valued data elements more
> efficiently.
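
To make the proposal in the comment concrete, here is a rough sketch of how binned features could be stored sparsely and how a bin-wise aggregation step could skip the entries that fall in a default bin. This is not Spark's actual code: SparseTreePoint, SparseBinning, fromDense, fillDefaults, and the defaultBins array are hypothetical names, and the binSeqOp shown here is a simplified stand-in that tracks plain counts per (feature, bin) rather than the label statistics a real tree learner would need. It assumes each feature has one known default bin (for example, the bin containing 0.0).

```scala
// Hypothetical sketch; not Spark's actual TreePoint or binSeqOp.
// Assumption: most features of a point fall into a known per-feature
// "default" bin, so only the exceptions are stored.
final case class SparseTreePoint(
    label: Double,
    numFeatures: Int,
    indices: Array[Int], // feature indices whose bin differs from the default
    bins: Array[Int]) {  // bin index for each entry of `indices`
  require(indices.length == bins.length, "indices and bins must align")
}

object SparseBinning {
  /** Convert a dense binned row into the sparse form. */
  def fromDense(
      label: Double,
      binned: Array[Int],
      defaultBins: Array[Int]): SparseTreePoint = {
    val idx = binned.indices.filter(i => binned(i) != defaultBins(i)).toArray
    SparseTreePoint(label, binned.length, idx, idx.map(binned(_)))
  }

  /** Add one point to the per-(feature, bin) counts.
    * Only non-default entries are touched, so the cost is proportional
    * to the number of stored entries, not to numFeatures. */
  def binSeqOp(counts: Array[Array[Long]], p: SparseTreePoint): Unit = {
    var k = 0
    while (k < p.indices.length) {
      counts(p.indices(k))(p.bins(k)) += 1L
      k += 1
    }
  }

  /** Recover the default-bin counts after aggregation:
    * total points minus the points counted in non-default bins. */
  def fillDefaults(
      counts: Array[Array[Long]],
      defaultBins: Array[Int],
      numPoints: Long): Unit = {
    var f = 0
    while (f < counts.length) {
      val d = defaultBins(f)
      val nonDefault = counts(f).sum - counts(f)(d)
      counts(f)(d) = numPoints - nonDefault
      f += 1
    }
  }
}
```

The design choice here is to defer the default-bin counts to a single subtraction per feature at the end, which is where the saving over a dense Array[Int] scan would come from. Whether that saving matters in practice depends on how sparse the binned data actually is.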