[kudu-CR] [spark] Add prefetching option to kudu-spark
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
..

Patch Set 1:

> (1 comment)
>
> Thanks for finding and fixing this bug. Could we break this into
> two patches for clarity? The first patch should fix exposing
> prefetching in the scan token, and the second should expose
> prefetching to spark.

OK, I will split this patch into two patches. I think it's better to keep the default set to false for the time being, because prefetching means that each Spark task needs more memory, which may hurt the stability of existing Spark jobs, for example by exceeding their memory limits.

--
To view, visit http://gerrit.cloudera.org:8080/14598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Grant Henke
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu
Gerrit-Comment-Date: Fri, 01 Nov 2019 07:59:01 +
Gerrit-HasComments: No
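To make the memory concern in the reply above concrete, here is a minimal, self-contained sketch of why prefetching can raise a task's peak memory. It is a toy model, not Kudu's implementation: `PrefetchSketch`, `FakeServer`, and `peakBufferedRows` are hypothetical names invented for illustration, and the model assumes the worst case in which a prefetched batch arrives before the caller has finished with the current one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only: none of these names come from the Kudu client. */
final class PrefetchSketch {

    /** Hypothetical server stub that hands out a fixed number of equally sized batches. */
    static final class FakeServer {
        private final int totalBatches;
        private int served = 0;

        FakeServer(int totalBatches) {
            this.totalBatches = totalBatches;
        }

        boolean hasMore() {
            return served < totalBatches;
        }

        int[] fetch(int batchSize) {
            served++;
            return new int[batchSize];
        }
    }

    /**
     * Returns the largest number of rows held on the client at any one time.
     * Assumption: a prefetched response arrives instantly, which is the
     * worst case for client memory.
     */
    static int peakBufferedRows(int totalBatches, int batchSize, boolean prefetching) {
        FakeServer server = new FakeServer(totalBatches);
        Deque<int[]> buffered = new ArrayDeque<>();
        int bufferedRows = 0;
        int peakRows = 0;

        // The first request is always issued when the scan is opened.
        if (server.hasMore()) {
            buffered.add(server.fetch(batchSize));
            bufferedRows += batchSize;
        }

        while (!buffered.isEmpty()) {
            // With prefetching, the next request goes out as soon as a
            // response has arrived, before the caller consumes the current batch.
            if (prefetching && server.hasMore()) {
                buffered.add(server.fetch(batchSize));
                bufferedRows += batchSize;
            }
            peakRows = Math.max(peakRows, bufferedRows);

            // The caller consumes the current batch and releases it.
            int[] current = buffered.poll();
            bufferedRows -= current.length;

            // Without prefetching, the next request waits until the caller asks for more rows.
            if (!prefetching && server.hasMore()) {
                buffered.add(server.fetch(batchSize));
                bufferedRows += batchSize;
            }
        }
        return peakRows;
    }

    public static void main(String[] args) {
        System.out.println("no prefetch: " + peakBufferedRows(4, 1000, false)); // prints 1000
        System.out.println("prefetch:    " + peakBufferedRows(4, 1000, true));  // prints 2000
    }
}
```

In this model the prefetching scan holds up to two batches at once instead of one, which is the kind of extra per-task memory the reply is worried about. Real behavior depends on batch size, row width, and network timing, none of which are modeled here.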
[kudu-CR] [spark] Add prefetching option to kudu-spark
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
..

Patch Set 1:

> (1 comment)

Maybe we can use the existing test case to test prefetching; I will take a look.

--
To view, visit http://gerrit.cloudera.org:8080/14598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Grant Henke
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu
Gerrit-Comment-Date: Fri, 01 Nov 2019 08:01:22 +
Gerrit-HasComments: No
[kudu-CR] [spark] Add prefetching option to kudu-spark
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
..

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG@11
PS1, Line 11: can be greatly reduced in some scenarios. Therefore, I added prefetching
> Out of curiosity, have you seen a performance increase when using pre-fetch

I share Grant's curiosity. I am also a little anxious about advertising it more widely given that it has no automated testing at all (see KUDU-1260).

For more context, this client-side prefetching thing was inherited from asynchbase when we first built the Java client; it wasn't explicitly added to Kudu.

--
To view, visit http://gerrit.cloudera.org:8080/14598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Grant Henke
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 31 Oct 2019 19:52:21 +
Gerrit-HasComments: Yes
[kudu-CR] [spark] Add prefetching option to kudu-spark
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
..

Patch Set 1: Code-Review+1

(1 comment)

Thanks for finding and fixing this bug. Could we break this into two patches for clarity? The first patch should fix exposing prefetching in the scan token, and the second should expose prefetching to spark.

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG@11
PS1, Line 11: can be greatly reduced in some scenarios. Therefore, I added prefetching

Out of curiosity, have you seen a performance increase when using pre-fetching? Do you have a quantified example?

Should we consider setting the default to true? Why or why not?

--
To view, visit http://gerrit.cloudera.org:8080/14598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Grant Henke
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 31 Oct 2019 13:27:21 +
Gerrit-HasComments: Yes
[kudu-CR] [spark] Add prefetching option to kudu-spark
Yao Xu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14598

Change subject: [spark] Add prefetching option to kudu-spark
..

[spark] Add prefetching option to kudu-spark

We have already supported the scanner prefetching feature in the previous
patches. With the prefetching, the time for the spark task to read kudu data
can be greatly reduced in some scenarios. Therefore, I added prefetching
option for kudu-spark in this patch.

Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M src/kudu/client/client.proto
9 files changed, 44 insertions(+), 8 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/98/14598/1

--
To view, visit http://gerrit.cloudera.org:8080/14598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu