[jira] [Updated] (SPARK-14610) Remove superfluous split from random forest findSplitsForContinousFeature

2016-10-10 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14610:
--
Assignee: Seth Hendrickson

> Remove superfluous split from random forest findSplitsForContinousFeature
> -
>
> Key: SPARK-14610
> URL: https://issues.apache.org/jira/browse/SPARK-14610
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Seth Hendrickson
> Assignee: Seth Hendrickson
> Priority: Minor
>
> Currently, the method findSplitsForContinuousFeature in random forest 
> produces an unnecessary split. For example, if a continuous feature has 
> unique values: (1, 2, 3), then the possible splits generated by this method 
> are:
> * {1|2,3}
> * {1,2|3} 
> * {1,2,3|}
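> The last split leaves nothing on its right side, so it separates no data.
> The following unit test, which expects three splits for three unique values,
> is therefore incorrect: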
> {code:title=rf.scala|borderStyle=solid}
> val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
> val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
> assert(splits.length === 3)
> {code}
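
To make the arithmetic in the description concrete, here is a minimal Scala sketch. It is not Spark's implementation: `SplitSketch`, `candidateThresholds`, and `partitionBy` are hypothetical names, and the rule that a sample goes left when `value <= threshold` is an assumption made for illustration. Under that rule, n distinct values need at most n - 1 thresholds, because a threshold at the largest value sends every sample to the left.

```scala
// Illustrative sketch only; names and the "<= threshold goes left" rule are assumptions.
object SplitSketch {

  /** Candidate thresholds: every distinct value except the largest one. */
  def candidateThresholds(samples: Array[Double]): Array[Double] = {
    val distinctValues = samples.distinct        // new array, safe to sort in place
    scala.util.Sorting.quickSort(distinctValues) // ascending order
    distinctValues.dropRight(1)                  // drop the max: it cannot separate anything
  }

  /** Splits samples into (left, right), where left holds values <= threshold. */
  def partitionBy(samples: Array[Double], threshold: Double): (Array[Double], Array[Double]) =
    samples.partition(_ <= threshold)
}
```

With the `featureSamples` array from the test above, `candidateThresholds` yields two thresholds (1.0 and 2.0), and partitioning at 3.0 leaves the right side empty, which corresponds to the superfluous third split the issue describes.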



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14610) Remove superfluous split from random forest findSplitsForContinousFeature

2016-04-15 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14610:
--
Priority: Minor  (was: Major)

> Remove superfluous split from random forest findSplitsForContinousFeature
> -
>
> Key: SPARK-14610
> URL: https://issues.apache.org/jira/browse/SPARK-14610
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Seth Hendrickson
> Priority: Minor






[jira] [Updated] (SPARK-14610) Remove superfluous split from random forest findSplitsForContinousFeature

2016-04-13 Thread Seth Hendrickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seth Hendrickson updated SPARK-14610:
-
Description: 
Currently, the method findSplitsForContinuousFeature in random forest produces 
an unnecessary split. For example, if a continuous feature has unique values: 
(1, 2, 3), then the possible splits generated by this method are:
* {1|2,3}
* {1,2|3} 
* {1,2,3|}

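The last split leaves nothing on its right side, so it separates no data. The 
following unit test, which expects three splits for three unique values, is 
therefore incorrect: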

{code:title=rf.scala|borderStyle=solid}
val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
assert(splits.length === 3)
{code}

  was:
Currently, the method findSplitsForContinuousFeature in random forest produces 
an unnecessary split. For example, if a continuous feature has unique values: 
{1, 2, 3}, then the possible splits generated by this method are:
{1|2,3}, {1,2|3} and {1,2,3|}. The following unit test is quite clearly 
incorrect:

{code:title=rf.scala|borderStyle=solid}
val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
assert(splits.length === 3)
{code}


> Remove superfluous split from random forest findSplitsForContinousFeature
> -
>
> Key: SPARK-14610
> URL: https://issues.apache.org/jira/browse/SPARK-14610
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Seth Hendrickson


