[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947339#comment-14947339 ] Joseph K. Bradley commented on SPARK-9963: -- The PR's waiting for me to take a look---and I will now! > ML RandomForest cleanup: replace predictNodeIndex with predictImpl > -- > > Key: SPARK-9963 > URL: https://issues.apache.org/jira/browse/SPARK-9963 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. > This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943904#comment-14943904 ] Amey Chaugule commented on SPARK-9963: -- Is this still WIP or has the PR been abandoned? I could take a stab at this. > ML RandomForest cleanup: replace predictNodeIndex with predictImpl > -- > > Key: SPARK-9963 > URL: https://issues.apache.org/jira/browse/SPARK-9963 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. > This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746813#comment-14746813 ] Luvsandondov Lkhamsuren commented on SPARK-9963: Implementing {code:title=Node.scala|borderStyle=solid} def predictImpl(binnedFeatures: Array[Int], splits: Array[Array[Split]]) {code} then using it inside: {code:title=RandomForest.scala|borderStyle=solid} def binSeqOp( agg: Array[DTStatsAggregator], baggedPoint: BaggedPoint[TreePoint]): Array[DTStatsAggregator] = { treeToNodeToIndexInfo.foreach { case (treeIndex, nodeIndexToInfo) => val node = topNodes(treeIndex).toNode.predictImpl(baggedPoint.datum.binnedFeatures, splits) nodeBinSeqOp(treeIndex, nodeIndexToInfo.getOrElse(node.id, null), agg, baggedPoint) } agg } {code} We need to extract the nodeIndex to retrieve the relevant NodeIndexInfo from treeToNodeToIndexInfo. Any insights on how I might be able to get around getting nodeIndex in this case? > ML RandomForest cleanup: replace predictNodeIndex with predictImpl > -- > > Key: SPARK-9963 > URL: https://issues.apache.org/jira/browse/SPARK-9963 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. > This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731462#comment-14731462 ] Joseph K. Bradley commented on SPARK-9963: -- Sorry for the slow response! (I've been traveling.) Option 2 sounds best. It can resemble the current predictImpl, but can use the version of shouldGoLeft taking binned feature values. > ML RandomForest cleanup: replace predictNodeIndex with predictImpl > -- > > Key: SPARK-9963 > URL: https://issues.apache.org/jira/browse/SPARK-9963 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. > This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731468#comment-14731468 ] Joseph K. Bradley commented on SPARK-9963: -- Yep, that first case in the if-else is for the right-most bin with range [maxSplitValue, +inf] > ML RandomForest cleanup: replace predictNodeIndex with predictImpl > -- > > Key: SPARK-9963 > URL: https://issues.apache.org/jira/browse/SPARK-9963 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. > This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731702#comment-14731702 ] Apache Spark commented on SPARK-9963: - User 'lkhamsurenl' has created a pull request for this issue: https://github.com/apache/spark/pull/8609 > ML RandomForest cleanup: replace predictNodeIndex with predictImpl > -- > > Key: SPARK-9963 > URL: https://issues.apache.org/jira/browse/SPARK-9963 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. > This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720966#comment-14720966 ] Luvsandondov Lkhamsuren commented on SPARK-9963: Thanks Trevor Cai! I was kind of hoping someone would approve my direction (this is my first time) so that I can start implementing. Once we agree on the direction, I can definitely implement the part. ML RandomForest cleanup: replace predictNodeIndex with predictImpl -- Key: SPARK-9963 URL: https://issues.apache.org/jira/browse/SPARK-9963 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Priority: Trivial Labels: starter Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720824#comment-14720824 ] Trevor Cai commented on SPARK-9963: --- [~lkhamsurenl]: Are you still working on this? I took a quick look at the code, and it looks to me like it's not trivial to convert the binnedFeatures and splits arrays into a single vector mapping feature to threshold. The below section of code seems to indicate to me that some features don't have a threshold and should always move right, and using Double.MAX_VALUE as the threshold in those cases seems like it could potentially cause issues. {code} override private[tree] def shouldGoLeft(binnedFeature: Int, splits: Array[Split]): Boolean = { if (binnedFeature == splits.length) { // last split, so split right false } else { val featureValueUpperBound = splits(binnedFeature).asInstanceOf[ContinuousSplit].threshold featureValueUpperBound = threshold } } {code} As a result, if you're still working on this, your second proposal makes more sense. If not, I can pick up the issue and implement the second option. [~josephkb] Does this seem reasonable to you? ML RandomForest cleanup: replace predictNodeIndex with predictImpl -- Key: SPARK-9963 URL: https://issues.apache.org/jira/browse/SPARK-9963 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Priority: Trivial Labels: starter Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703542#comment-14703542 ] Joseph K. Bradley commented on SPARK-9963: -- Not that I know, so please go ahead. If you haven't read it, please also read: [https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark] It is long but has lots of helpful info. Thanks! ML RandomForest cleanup: replace predictNodeIndex with predictImpl -- Key: SPARK-9963 URL: https://issues.apache.org/jira/browse/SPARK-9963 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Priority: Trivial Labels: starter Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703536#comment-14703536 ] Luvsandondov Lkhamsuren commented on SPARK-9963: Has anyone started working on this? If not, could I (new to this)? ML RandomForest cleanup: replace predictNodeIndex with predictImpl -- Key: SPARK-9963 URL: https://issues.apache.org/jira/browse/SPARK-9963 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Priority: Trivial Labels: starter Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703562#comment-14703562 ] Joseph K. Bradley commented on SPARK-9963: -- {quote}When you say replace predictNodeIndex with the Node.predictImpl, do you mean override the abstract method predictImpl for the first argument in predictNodeIndex?{quote} Both of those method recursively trace down the tree, simulating prediction, in order to find the leaf node which would make the prediction for a given feature vector. Now that we have predictImpl, I believe predictNodeIndex is no longer needed. Callers could instead use predictImpl, and then get the node index from the resulting node. ML RandomForest cleanup: replace predictNodeIndex with predictImpl -- Key: SPARK-9963 URL: https://issues.apache.org/jira/browse/SPARK-9963 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Priority: Trivial Labels: starter Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703824#comment-14703824 ] Luvsandondov Lkhamsuren commented on SPARK-9963: Thanks for the clarification! I see two possibilities, correct me if I'm wrong: As of right now, we have binnedFeatures and splits when calling predictNodeIndex. 1. predictImpl(feature: Vector) takes Vector feature therefore we can create a feature from binnedFeatures and splits, then call predictImpl on the node. 2. create a another method predictImpl(binnedFeatures: Array[Int], splits: Array[Array[Split]]) in Node class. Did you have anything else in mind? ML RandomForest cleanup: replace predictNodeIndex with predictImpl -- Key: SPARK-9963 URL: https://issues.apache.org/jira/browse/SPARK-9963 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Priority: Trivial Labels: starter Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703842#comment-14703842 ] Luvsandondov Lkhamsuren commented on SPARK-9963: Actually, I have a question: override private[tree] def shouldGoLeft(binnedFeature: Int, splits: Array[Split]): Boolean In this method, I do not see splits getting used anywhere, could you explain me the purpose of it? Thanks ML RandomForest cleanup: replace predictNodeIndex with predictImpl -- Key: SPARK-9963 URL: https://issues.apache.org/jira/browse/SPARK-9963 Project: Spark Issue Type: Improvement Components: ML Reporter: Joseph K. Bradley Priority: Trivial Labels: starter Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl. This should be straightforward, but please ping me if anything is unclear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org