Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11571 )

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@125
PS1, Line 125:  * @param scanner the wrapped scanner
             :  * @param kuduContext the kudu context
Update this.


http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@141
PS1, Line 141:         if (!scanner.isClosed) scanner.keepAlive()
I wish I had remembered earlier, but this isn't safe; scanners aren't 
thread-safe as they are only expected to be used by a single thread. That's 
probably why the Impala implementation sends the keep alive requests from 
within the read loop.

In this specific case, I can think of a few different races:
1. TOCTOU on AsyncKuduScanner.closed.
2. Unsafe access to AsyncKuduScanner.closed.
3. Unsafe access to AsyncKuduScanner.tablet.

I think the semantics of the Java memory model guarantee that #2 and #3 are 
safe, but I'm not sure. Regardless, the general point still stands: scanners 
aren't supposed to be accessed by multiple threads, and there's no guarantee 
that an operation that's "safe" today will be safe tomorrow.


http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@161
PS1, Line 161:       keepAliveExecutor.shutdown()
Don't we need to do this if we were interrupted too? So maybe wrap the whole 
loop in a try/finally or something like that?


http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala:

http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@53
PS1, Line 53:   val defaultKeepAlivePeriodMs: Long = 15000
Should doc the choice in value here, probably related to the default value of 
the scanner ttl server-side?



--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Comment-Date: Wed, 03 Oct 2018 15:49:12 +0000
Gerrit-HasComments: Yes

Reply via email to