[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-19 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16457 I think it is better to discuss this in the JIRA. When we come to an agreement, I will reopen this PR.

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16457 Agreed. Now five algs inherit `HasWeightCol`: GLR/LoR/LiR/NB/IsotonicReg. I found that some algs use `RDD[Instance]` in `train`: GLR/LoR/LiR ``` val instances: RDD[Instance] =

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16457 @srowen OK. This is the list of algs that deal with weights:

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16457 +1 for @sethah's comment: Algorithms should validate input data. Some already do:

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16457 Reasonable, yeah, though I think the usages in the existing algorithms will all have no meaningful interpretation of negative weights. At the least this could be more targeted to code paths where

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16457 I tend to think the actual algorithms should handle invalid weights, instead of adding that check into instance creation. Also, this will add overhead each time an instance is created.
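
The trade-off raised in this comment can be illustrated with a minimal sketch. It is not code from the PR or from Spark: `WeightedPoint` is a hypothetical stand-in for Spark ML's `Instance` (features omitted), and `validateWeights` is a hypothetical helper. The assumption is that an algorithm calls the check once on its own training path, so constructing an instance stays free of validation overhead.

```scala
// Hypothetical stand-in for an ML instance; not Spark's actual Instance class.
final case class WeightedPoint(label: Double, weight: Double)

object WeightValidation {
  // Hypothetical helper: called by an algorithm before training,
  // rather than inside the WeightedPoint constructor.
  def validateWeights(points: Seq[WeightedPoint]): Seq[WeightedPoint] = {
    points.foreach { p =>
      // `weight >= 0.0` is false for negative values and for NaN, so both are rejected.
      require(p.weight >= 0.0, s"Instance weight must be non-negative, but got ${p.weight}")
    }
    points
  }
}
```

With this layout, creating a `WeightedPoint` with a negative weight succeeds, and the error surfaces only when an algorithm that cannot interpret such a weight validates its input.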

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16457 Merged build finished. Test PASSed.

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16457 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70814/

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16457 **[Test build #70814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70814/testReport)** for PR 16457 at commit

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16457 **[Test build #70814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70814/testReport)** for PR 16457 at commit

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16457 The doc fixes look unrelated; let's exclude those? There are many weight fields/args in .ml that should probably also be checked -- would you please skim them and see if there are other obviously