[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-10-07 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947339#comment-14947339
 ] 

Joseph K. Bradley commented on SPARK-9963:
--

The PR's waiting for me to take a look---and I will now!

> ML RandomForest cleanup: replace predictNodeIndex with predictImpl
> --
>
> Key: SPARK-9963
> URL: https://issues.apache.org/jira/browse/SPARK-9963
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
> This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-10-05 Thread Amey Chaugule (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943904#comment-14943904
 ] 

Amey Chaugule commented on SPARK-9963:
--

Is this still WIP or has the PR been abandoned? I could take a stab at this.

> ML RandomForest cleanup: replace predictNodeIndex with predictImpl
> --
>
> Key: SPARK-9963
> URL: https://issues.apache.org/jira/browse/SPARK-9963
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
> This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-09-15 Thread Luvsandondov Lkhamsuren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746813#comment-14746813
 ] 

Luvsandondov Lkhamsuren commented on SPARK-9963:


Implementing 
{code:title=Node.scala|borderStyle=solid}
def predictImpl(binnedFeatures: Array[Int], splits: Array[Array[Split]])
{code}
then using it inside:
{code:title=RandomForest.scala|borderStyle=solid}
def binSeqOp(
agg: Array[DTStatsAggregator],
baggedPoint: BaggedPoint[TreePoint]): Array[DTStatsAggregator] = {
  treeToNodeToIndexInfo.foreach { case (treeIndex, nodeIndexToInfo) =>
val node = 
topNodes(treeIndex).toNode.predictImpl(baggedPoint.datum.binnedFeatures, splits)
nodeBinSeqOp(treeIndex, nodeIndexToInfo.getOrElse(node.id, null), agg, 
baggedPoint)
  }
  agg
}
{code}

We need to extract the nodeIndex to retrieve the relevant NodeIndexInfo from  
treeToNodeToIndexInfo. Any insights on how I might be able to get around 
getting nodeIndex in this case?

> ML RandomForest cleanup: replace predictNodeIndex with predictImpl
> --
>
> Key: SPARK-9963
> URL: https://issues.apache.org/jira/browse/SPARK-9963
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
> This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-09-04 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731462#comment-14731462
 ] 

Joseph K. Bradley commented on SPARK-9963:
--

Sorry for the slow response! (I've been traveling.)  Option 2 sounds best.  It 
can resemble the current predictImpl, but can use the version of shouldGoLeft 
taking binned feature values.

> ML RandomForest cleanup: replace predictNodeIndex with predictImpl
> --
>
> Key: SPARK-9963
> URL: https://issues.apache.org/jira/browse/SPARK-9963
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
> This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-09-04 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731468#comment-14731468
 ] 

Joseph K. Bradley commented on SPARK-9963:
--

Yep, that first case in the if-else is for the right-most bin with range 
[maxSplitValue, +inf]

> ML RandomForest cleanup: replace predictNodeIndex with predictImpl
> --
>
> Key: SPARK-9963
> URL: https://issues.apache.org/jira/browse/SPARK-9963
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
> This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-09-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731702#comment-14731702
 ] 

Apache Spark commented on SPARK-9963:
-

User 'lkhamsurenl' has created a pull request for this issue:
https://github.com/apache/spark/pull/8609

> ML RandomForest cleanup: replace predictNodeIndex with predictImpl
> --
>
> Key: SPARK-9963
> URL: https://issues.apache.org/jira/browse/SPARK-9963
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
> This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-08-28 Thread Luvsandondov Lkhamsuren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720966#comment-14720966
 ] 

Luvsandondov Lkhamsuren commented on SPARK-9963:


Thanks Trevor Cai! I was kind of hoping someone would approve my direction 
(this is my first time) so that I can start implementing. Once we agree on the 
direction, I can definitely implement the part.

 ML RandomForest cleanup: replace predictNodeIndex with predictImpl
 --

 Key: SPARK-9963
 URL: https://issues.apache.org/jira/browse/SPARK-9963
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Trivial
  Labels: starter

 Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
 This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-08-28 Thread Trevor Cai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720824#comment-14720824
 ] 

Trevor Cai commented on SPARK-9963:
---

[~lkhamsurenl]: Are you still working on this?
I took a quick look at the code, and it looks to me like it's not trivial to 
convert the binnedFeatures and splits arrays into a single vector mapping 
feature to threshold. The below section of code seems to indicate to me that 
some features don't have a threshold and should always move right, and using 
Double.MAX_VALUE as the threshold in those cases seems like it could 
potentially cause issues.
{code}
override private[tree] def shouldGoLeft(binnedFeature: Int, splits: 
Array[Split]): Boolean = {
if (binnedFeature == splits.length) {
  //  last split, so split right
  false
} else {
  val featureValueUpperBound = 
splits(binnedFeature).asInstanceOf[ContinuousSplit].threshold
  featureValueUpperBound = threshold
}
  }
{code}

As a result, if you're still working on this, your second proposal makes more 
sense. If not, I can pick up the issue and implement the second option.

[~josephkb] Does this seem reasonable to you?


 ML RandomForest cleanup: replace predictNodeIndex with predictImpl
 --

 Key: SPARK-9963
 URL: https://issues.apache.org/jira/browse/SPARK-9963
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Trivial
  Labels: starter

 Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
 This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-08-19 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703542#comment-14703542
 ] 

Joseph K. Bradley commented on SPARK-9963:
--

Not that I know, so please go ahead.  If you haven't read it, please also read: 
[https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark]
It is long but has lots of helpful info.  Thanks!

 ML RandomForest cleanup: replace predictNodeIndex with predictImpl
 --

 Key: SPARK-9963
 URL: https://issues.apache.org/jira/browse/SPARK-9963
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Trivial
  Labels: starter

 Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
 This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-08-19 Thread Luvsandondov Lkhamsuren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703536#comment-14703536
 ] 

Luvsandondov Lkhamsuren commented on SPARK-9963:


Has anyone started working on this? If not, could I (new to this)?

 ML RandomForest cleanup: replace predictNodeIndex with predictImpl
 --

 Key: SPARK-9963
 URL: https://issues.apache.org/jira/browse/SPARK-9963
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Trivial
  Labels: starter

 Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
 This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-08-19 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703562#comment-14703562
 ] 

Joseph K. Bradley commented on SPARK-9963:
--

{quote}When you say replace predictNodeIndex with the Node.predictImpl, do you 
mean override the abstract method predictImpl for the first argument in 
predictNodeIndex?{quote}

Both of those method recursively trace down the tree, simulating prediction, in 
order to find the leaf node which would make the prediction for a given feature 
vector.  Now that we have predictImpl, I believe predictNodeIndex is no longer 
needed.  Callers could instead use predictImpl, and then get the node index 
from the resulting node.

 ML RandomForest cleanup: replace predictNodeIndex with predictImpl
 --

 Key: SPARK-9963
 URL: https://issues.apache.org/jira/browse/SPARK-9963
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Trivial
  Labels: starter

 Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
 This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-08-19 Thread Luvsandondov Lkhamsuren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703824#comment-14703824
 ] 

Luvsandondov Lkhamsuren commented on SPARK-9963:


Thanks for the clarification! I see two possibilities, correct me if I'm wrong:
As of right now, we have binnedFeatures and splits when calling 
predictNodeIndex. 
1. predictImpl(feature: Vector) takes Vector feature therefore we can create a 
feature from binnedFeatures and splits, then call predictImpl on the node.

2. create a another method predictImpl(binnedFeatures: Array[Int], splits: 
Array[Array[Split]]) in Node class.

Did you have anything else in mind?

 ML RandomForest cleanup: replace predictNodeIndex with predictImpl
 --

 Key: SPARK-9963
 URL: https://issues.apache.org/jira/browse/SPARK-9963
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Trivial
  Labels: starter

 Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
 This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: replace predictNodeIndex with predictImpl

2015-08-19 Thread Luvsandondov Lkhamsuren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703842#comment-14703842
 ] 

Luvsandondov Lkhamsuren commented on SPARK-9963:


Actually, I have a question:
override private[tree] def shouldGoLeft(binnedFeature: Int, splits: 
Array[Split]): Boolean

In this method, I do not see splits getting used anywhere, could you explain me 
the purpose of it?

Thanks

 ML RandomForest cleanup: replace predictNodeIndex with predictImpl
 --

 Key: SPARK-9963
 URL: https://issues.apache.org/jira/browse/SPARK-9963
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Trivial
  Labels: starter

 Replace ml.tree.impl.RandomForest.predictNodeIndex with Node.predictImpl.
 This should be straightforward, but please ping me if anything is unclear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org