[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-458404664 @srowen thanks for the quick response, I've created a follow-up PR here: https://github.com/apache/spark/pull/23682 In testing I've found the tolerance: val tolerance = Utils.EPSILON * unweightedNumSamples * 100 to be good enough, not sure if I need to make it larger This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-458203376 I think I made a mistake and it should actually be: ``` val tolerance = Utils.EPSILON * (unweightedNumSamples + unweightedNumSamples) ``` or perhaps a larger threshold: ``` val tolerance = Utils.EPSILON * unweightedNumSamples * SomeLargeConstant ``` but I will need to verify by adding some debug to ensure that no zero features slip through for the sample tests, otherwise that tolerance would still be too low and the factor would need to be increased; my worry is that by using the square of the samples the tolerance would become too high with a very large number of samples and then some values would be included as zero feature values which we don't want This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-457997552 @srowen thank you for the merge and the thorough review. I have some doubts about the tolerance we decided for zero values: val tolerance = Utils.EPSILON * unweightedNumSamples * unweightedNumSamples https://github.com/apache/spark/pull/21632/files#diff-1fd1bc8d3fc9306c83cd65fbf3ca4bbeR1054 For a large number of unweighted samples I am worried that it might be too high. Note EPSILON=2.2E-16. I am wondering if I should change the tolerance to be: val tolerance = Utils.EPSILON * unweightedNumSamples * (some constant) What are your thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-456670961 "up to your judgment on whether to add a new overload to DecisionTreeMetadata to simplify the test code" This is a tough decision to make; I would prefer not to modify the source code for the sake of tests, but modifying a lot of test code to call the DecisionTreeMetadata.buildMetadata with LabeledPoint converted to instances instead of LabeledPoint is bad too. There are other options as well. I could make DecisionTreeMetadata.buildMetadata accept an RDD[_] and then dynamically figure out the type but this doesn't seem like a good choice either. I could also create a wrapper around buildMetadata in the test code and then call that wrapper from all tests which should make maintaining code easier in the future (eg the conversion could have been done in the wrapper) but that would only introduce more changes - not less - to the PR, and it also creates another level of indirection which may make the test code more confusing. The current code seems the slightly better choice of the four options listed above (and there may be other options as well), but if there is a strong preference toward one of the other choices I would be glad to update the PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-453899345 @srowen thank you for taking another look at the PR. I think I have addressed most comments except for two of them. For one of them, I'm not sure how I could handle the MIMA exclusion, I assume that I need to define another constructor without the new parameter but I am getting duplicate method errors. For the other, I just need some clarification - you mentioned mllib should not point to ml but in this case it indeed does not, since there are two LabeledPoint classes, one in mllib and the other (which was the one modified) in ml. I've suggested two ways to fix the issue but both seem sub-optimal to me. Would like to know your thoughts on this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-453666838 it looks like the test failures were unrelated, rebasing to latest to see if it will fix them This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-453551478 jenkins retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-452782044 jenkins retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-449256537 @srowen thank you for the review! I think I have resolved most of the comments except I had a question about the toInstance comment you had. Would you be able to take another look at the PR when you have time? Also tagging @sethah, @holdenk and @jkbradley in case they have time to review the updated PR This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-449082192 jenkins retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-449081540 it looks like the test failure was a random failure in the build system not related to the PR, going to try again This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-448626125 jenkins retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
imatiach-msft commented on issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees URL: https://github.com/apache/spark/pull/21632#issuecomment-446008052 jenkins retest this please (updated PR to latest) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org