[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-03-05 Thread lucio-yz
Github user lucio-yz commented on the issue: https://github.com/apache/spark/pull/20472 @srowen Any other problems? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-28 Thread lucio-yz
Github user lucio-yz commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r171184500 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1001,11 +996,18 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-28 Thread lucio-yz
Github user lucio-yz commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r171182634 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -931,7 +925,8 @@ private[spark] object RandomForest extends Logging

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-28 Thread lucio-yz
Github user lucio-yz commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r171182291 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -931,7 +925,8 @@ private[spark] object RandomForest extends Logging

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-27 Thread lucio-yz
Github user lucio-yz commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r171160692 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1001,11 +996,18 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-06 Thread lucio-yz
Github user lucio-yz commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r166501185 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1001,11 +996,19 @@ private[spark] object RandomForest extends

[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-05 Thread lucio-yz
Github user lucio-yz commented on the issue: https://github.com/apache/spark/pull/20472 I tested on 2 datasets: 1. _rcv1.binary_, which has 47,236 dimensions. Before improvement, the shuffle write size in _findSplitsBySorting_ is 1GB. After improvement, the shuffle size

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-04 Thread lucio-yz
Github user lucio-yz commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r165872418 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1001,11 +1002,22 @@ private[spark] object RandomForest extends

[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-02 Thread lucio-yz
Github user lucio-yz commented on the issue: https://github.com/apache/spark/pull/20472 previous problems have been solved

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-02 Thread lucio-yz
Github user lucio-yz commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r165595281 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1002,10 +1008,14 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-01 Thread lucio-yz
GitHub user lucio-yz opened a pull request: https://github.com/apache/spark/pull/20472 [SPARK-22751][ML]Improve ML RandomForest shuffle performance ## What changes were proposed in this pull request? As I mentioned in [SPARK-22751](https://issues.apache.org/jira/browse