[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2019-01-28 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-458404664
 
 
   @srowen thanks for the quick response, I've created a follow-up PR here:
   https://github.com/apache/spark/pull/23682
   In testing I've found the tolerance:
   val tolerance = Utils.EPSILON * unweightedNumSamples * 100
   to be good enough, not sure if I need to make it larger


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2019-01-28 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-458203376
 
 
   I think I made a mistake and it should actually be:
   ```
   val tolerance = Utils.EPSILON * (unweightedNumSamples + unweightedNumSamples)
   ```
   or perhaps a larger threshold:
   ```
   val tolerance = Utils.EPSILON * unweightedNumSamples * SomeLargeConstant
   ```
   but I will need to verify by adding some debug to ensure that no zero 
features slip through for the sample tests, otherwise that tolerance would 
still be too low and the factor would need to be increased; my worry is that by 
using the square of the samples the tolerance would become too high with a very 
large number of samples and then some values would be included as zero feature 
values which we don't want


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2019-01-27 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-457997552
 
 
   @srowen thank you for the merge and the thorough review.  I have some doubts 
about the tolerance we decided for zero values:
   val tolerance = Utils.EPSILON * unweightedNumSamples * unweightedNumSamples
   
   
https://github.com/apache/spark/pull/21632/files#diff-1fd1bc8d3fc9306c83cd65fbf3ca4bbeR1054
   
   For a large number of unweighted samples I am worried that it might be too 
high.  Note EPSILON=2.2E-16.  I am wondering if I should change the tolerance 
to be:
   val tolerance = Utils.EPSILON * unweightedNumSamples * (some constant)
   What are your thoughts?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2019-01-22 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-456670961
 
 
   "up to your judgment on whether to add a new overload to 
DecisionTreeMetadata to simplify the test code"
   This is a tough decision to make; I would prefer not to modify the source 
code for the sake of tests, but modifying a lot of test code to call the 
DecisionTreeMetadata.buildMetadata with LabeledPoint converted to instances 
instead of LabeledPoint is bad too.
   There are other options as well.  I could make 
DecisionTreeMetadata.buildMetadata accept an RDD[_] and then dynamically figure 
out the type but this doesn't seem like a good choice either.
   I could also create a wrapper around buildMetadata in the test code and then 
call that wrapper from all tests which should make maintaining code easier in 
the future (eg the conversion could have been done in the wrapper) but that 
would only introduce more changes - not less - to the PR, and it also creates 
another level of indirection which may make the test code more confusing.
   The current code seems the slightly better choice of the four options listed 
above (and there may be other options as well), but if there is a strong 
preference toward one of the other choices I would be glad to update the PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2019-01-13 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-453899345
 
 
   @srowen thank you for taking another look at the PR.  I think I have 
addressed most comments except for two of them.  For one of them, I'm not sure 
how I could handle the MIMA exclusion, I assume that I need to define another 
constructor without the new parameter but I am getting duplicate method errors. 
 For the other, I just need some clarification - you mentioned mllib should not 
point to ml but in this case it indeed does not, since there are two 
LabeledPoint classes, one in mllib and the other (which was the one modified) 
in ml.  I've suggested two ways to fix the issue but both seem sub-optimal to 
me.  Would like to know your thoughts on this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2019-01-11 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-453666838
 
 
   it looks like the test failures were unrelated, rebasing to latest to see if 
it will fix them


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2019-01-11 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-453551478
 
 
   jenkins retest this please
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2019-01-09 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-452782044
 
 
   jenkins retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2018-12-20 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-449256537
 
 
   @srowen thank you for the review!  I think I have resolved most of the 
comments except I had a question about the toInstance comment you had.  Would 
you be able to take another look at the PR when you have time?
   Also tagging @sethah, @holdenk and @jkbradley in case they have time to 
review the updated PR


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2018-12-20 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-449082192
 
 
   jenkins retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2018-12-20 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-449081540
 
 
   it looks like the test failure was a random failure in the build system not 
related to the PR, going to try again


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2018-12-19 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-448626125
 
 
   jenkins retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees

2018-12-10 Thread GitBox
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample 
weights to decision trees
URL: https://github.com/apache/spark/pull/21632#issuecomment-446008052
 
 
   jenkins retest this please (updated PR to latest)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org