[kudu-CR] [spark] Add prefetching option to kudu-spark

2019-11-01 Thread Yao Xu (Code Review)
Yao Xu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
..


Patch Set 1:

> (1 comment)
>
> Thanks for finding and fixing this bug. Could we break this into
> two patches for clarity? The first patch should fix exposing
> prefetching in the scan token, and the second should expose
> prefetching to spark.

OK, I will break this patch into two patches.
I think it's better to set the default to false for the time being, because
prefetching means that each Spark task needs more memory, which may hurt the
stability of existing Spark jobs, for example by exceeding memory limits.
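To make the memory trade-off concrete, here is a minimal Java sketch of client-side prefetching, assuming a scan that returns rows in batches. `PrefetchingBatchReader` and its `fetchBatch` function are hypothetical stand-ins for a scanner and its RPCs, not the Kudu client API. With prefetching on, the next batch is requested in the background while the caller processes the current one, so up to two batches can be held in memory at once instead of one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

/** Hypothetical batch reader illustrating prefetching; not the Kudu scanner API. */
final class PrefetchingBatchReader implements AutoCloseable {
  // Stand-in for a scanner RPC: returns batch i, or null once the scan is exhausted.
  private final IntFunction<List<Integer>> fetchBatch;
  private final boolean prefetching;
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private int nextIndex = 0;
  // Batch requested ahead of time; only ever non-null when prefetching is on.
  private CompletableFuture<List<Integer>> inFlight = null;

  PrefetchingBatchReader(IntFunction<List<Integer>> fetchBatch, boolean prefetching) {
    this.fetchBatch = fetchBatch;
    this.prefetching = prefetching;
  }

  /** Returns the next batch, or null when the scan is exhausted. */
  List<Integer> nextBatch() {
    List<Integer> current;
    if (inFlight != null) {
      current = inFlight.join(); // batch requested during the previous call
      inFlight = null;
    } else {
      current = fetchBatch.apply(nextIndex++);
    }
    if (current != null && prefetching) {
      final int idx = nextIndex++;
      // Request the following batch now: while the caller works on `current`,
      // this second batch may already be occupying memory.
      inFlight = CompletableFuture.supplyAsync(() -> fetchBatch.apply(idx), executor);
    }
    return current;
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }

  /** Drains a scan into one list, with or without prefetching. */
  static List<Integer> readAll(IntFunction<List<Integer>> fetchBatch, boolean prefetching) {
    List<Integer> rows = new ArrayList<>();
    try (PrefetchingBatchReader reader = new PrefetchingBatchReader(fetchBatch, prefetching)) {
      List<Integer> batch;
      while ((batch = reader.nextBatch()) != null) {
        rows.addAll(batch);
      }
    }
    return rows;
  }
}
```

Both modes return the same rows; prefetching only changes when each batch is requested, which is why it can reduce scan latency and also why it raises peak memory per task.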


--
To view, visit http://gerrit.cloudera.org:8080/14598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Comment-Date: Fri, 01 Nov 2019 07:59:01 +
Gerrit-HasComments: No


[kudu-CR] [spark] Add prefetching option to kudu-spark

2019-11-01 Thread Yao Xu (Code Review)
Yao Xu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
..


Patch Set 1:

> (1 comment)

Maybe we can use the existing test cases to test prefetching; I
will take a look.
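One way to reuse existing test cases, sketched in Java under the assumption that a test's scan logic can be parameterized by the prefetching flag: run the same scan with prefetching off and on, and require identical results. `PrefetchingTestMatrix.checkBothModes` is a hypothetical helper, not an existing Kudu test utility.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical test helper; not part of the Kudu test suite. */
final class PrefetchingTestMatrix {
  /**
   * Runs `scan` (an existing test's scan logic, parameterized by the
   * prefetching flag) once without and once with prefetching, fails if the
   * rows differ, and returns the rows for further assertions.
   */
  static <T> List<T> checkBothModes(Function<Boolean, List<T>> scan) {
    List<T> baseline = scan.apply(false);
    List<T> prefetched = scan.apply(true);
    if (!baseline.equals(prefetched)) {
      throw new AssertionError(
          "prefetching changed scan results: " + baseline + " vs " + prefetched);
    }
    return baseline;
  }
}
```

If the scan's row order is not deterministic, compare sorted copies instead of the raw lists.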


--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Comment-Date: Fri, 01 Nov 2019 08:01:22 +
Gerrit-HasComments: No


[kudu-CR] [spark] Add prefetching option to kudu-spark

2019-10-31 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG@11
PS1, Line 11: can be greatly reduced in some scenarios. Therefore, I added prefetching
> Out of curiosity, have you seen a performance increase when using pre-fetch
I share Grant's curiosity. I am also a little anxious about advertising it more 
widely given that it has no automated testing at all (see KUDU-1260). For more 
context, this client-side prefetching thing was inherited from asynchbase when 
we first built the Java client; it wasn't explicitly added to Kudu.



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 31 Oct 2019 19:52:21 +
Gerrit-HasComments: Yes


[kudu-CR] [spark] Add prefetching option to kudu-spark

2019-10-31 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14598 )

Change subject: [spark] Add prefetching option to kudu-spark
..


Patch Set 1: Code-Review+1

(1 comment)

Thanks for finding and fixing this bug. Could we break this into two patches 
for clarity? The first patch should fix exposing prefetching in the scan token, 
and the second should expose prefetching to spark.
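The first of the two proposed patches concerns the scan token. In kudu-spark, scan tokens are built on the driver and turned back into scanners inside the tasks, so a token has to carry the prefetching setting; otherwise the rebuilt scanner cannot know it was requested. A minimal Java sketch of that round trip follows. `ScanTokenSketch` is hypothetical, and it uses `DataOutputStream` only to stay self-contained, whereas the actual change edits the protobuf definitions in `client.proto`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical scan-token payload; not Kudu's protobuf token message. */
final class ScanTokenSketch {
  final String tableName;
  final int batchSizeBytes;
  final boolean prefetching;

  ScanTokenSketch(String tableName, int batchSizeBytes, boolean prefetching) {
    this.tableName = tableName;
    this.batchSizeBytes = batchSizeBytes;
    this.prefetching = prefetching;
  }

  /** Serializes every scanner setting the task side needs, including prefetching. */
  byte[] serialize() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(tableName);
      out.writeInt(batchSizeBytes);
      out.writeBoolean(prefetching); // omit this and the flag is lost in transit
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Rebuilds the settings in the same order they were written. */
  static ScanTokenSketch deserialize(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      String table = in.readUTF();
      int batchSize = in.readInt();
      boolean prefetch = in.readBoolean();
      return new ScanTokenSketch(table, batchSize, prefetch);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```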

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14598/1//COMMIT_MSG@11
PS1, Line 11: can be greatly reduced in some scenarios. Therefore, I added prefetching
Out of curiosity, have you seen a performance increase when using pre-fetching? 
Do you have a quantified example?

Should we consider setting the default to true? Why or why not?
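Whatever default is chosen, the Spark side reduces to reading an optional boolean from the data source options. A minimal Java sketch follows, in which the key `kudu.scanPrefetching` and the class `ReadOptionsSketch` are hypothetical (kudu-spark itself is Scala, and the option name used by the patch may differ). It defaults to `false`, the conservative choice argued for in this thread because prefetching raises per-task memory use.

```java
import java.util.Map;

/** Hypothetical option parsing; not the actual KuduReadOptions code. */
final class ReadOptionsSketch {
  // Hypothetical key; the real option name may differ.
  static final String SCAN_PREFETCHING_KEY = "kudu.scanPrefetching";
  // Off by default: prefetching keeps an extra batch in memory per scanner.
  static final boolean DEFAULT_SCAN_PREFETCHING = false;

  /** Returns the prefetching flag, falling back to the default when unset. */
  static boolean scanPrefetching(Map<String, String> options) {
    String value = options.get(SCAN_PREFETCHING_KEY);
    return value == null ? DEFAULT_SCAN_PREFETCHING : Boolean.parseBoolean(value.trim());
  }
}
```

`Boolean.parseBoolean` treats anything other than a case-insensitive `true` as `false`, so a typo silently leaves prefetching off; a stricter parser that rejects unknown values may be preferable.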



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 31 Oct 2019 13:27:21 +
Gerrit-HasComments: Yes


[kudu-CR] [spark] Add prefetching option to kudu-spark

2019-10-30 Thread Yao Xu (Code Review)
Yao Xu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14598


Change subject: [spark] Add prefetching option to kudu-spark
..

[spark] Add prefetching option to kudu-spark

The scanner prefetching feature was already supported in previous
patches. With prefetching, the time for a Spark task to read Kudu data
can be greatly reduced in some scenarios. Therefore, I added prefetching
as an option for kudu-spark in this patch.

Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M src/kudu/client/client.proto
9 files changed, 44 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/98/14598/1
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If48735d693ad560f96e8cd5781eff916c06b8aa8
Gerrit-Change-Number: 14598
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu