Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16031 )
Change subject: KUDU-1802: Avoid calls to master when using scan tokens
......................................................................

KUDU-1802: Avoid calls to master when using scan tokens

This patch adds new metadata to the scan token to allow it to contain
all of the metadata required to construct a KuduTable and open a scanner
in the clients. This means the GetTableSchema and GetTableLocations RPC
calls to the master are no longer required when using the scan token.

New TableMetadataPB, TabletMetadataPB, and authorization token fields
were added as optional fields on the token. Additionally, a
`projected_column_idx` field was added that can be used in place of
`projected_columns`. This significantly reduces the size of the scan
token by not duplicating the ColumnSchemaPB that is already in the
TableMetadataPB.

Adding the table metadata to the scan token is enabled by default
because it is more scalable and performant. However, it can be disabled
in rare cases where more resiliency to column renaming is desired. One
example where the table metadata is disabled is the backup job. Future
work, tracked by KUDU-3146, should allow for table metadata to be
leveraged in those cases as well.

This doesn’t avoid the need for a call to the master to get the schema
when writing data to Kudu; that work is tracked by KUDU-3135. I expect
the TableMetadataPB message would be used there as well.

I included the ability to disable this functionality in the kudu-spark
integration via `kudu.useDriverMetadata` just in case there are any
unforeseen issues or regressions with this feature.

I added a test to compare the serialized size of the scan token with and
without the table and tablet metadata.
The size results for a 100 column table are:
  no metadata: 2697 Bytes
  tablet metadata: 2805 Bytes
  tablet, table, and authz metadata: 3258 Bytes

Change-Id: I88c1b8392de37dd5e8b7bd8b78a21603ff8b1d1b
Reviewed-on: http://gerrit.cloudera.org:8080/16031
Reviewed-by: Grant Henke <[email protected]>
Tested-by: Grant Henke <[email protected]>
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M java/kudu-client/src/main/java/org/apache/kudu/client/TableLocationsCache.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M src/kudu/client/CMakeLists.txt
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/client.proto
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/client/scan_token-internal.cc
M src/kudu/client/scan_token-internal.h
M src/kudu/client/scan_token-test.cc
19 files changed, 1,024 insertions(+), 136 deletions(-)

Approvals:
  Grant Henke: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I88c1b8392de37dd5e8b7bd8b78a21603ff8b1d1b
Gerrit-Change-Number: 16031
Gerrit-PatchSet: 21
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Greg Solovyev <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
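
For readers who want a concrete picture of why `projected_column_idx` can shrink a token compared with `projected_columns`, here is a minimal, self-contained Java sketch. It does not use the Kudu client or its protobuf classes: `ProjectionSketch`, `ColumnDef`, the helper methods, and the byte estimates are all hypothetical stand-ins, and real protobuf encoding will produce different numbers. The only point is that once the full schema travels with the token, a projection can be expressed as integer indexes into that schema instead of repeated column definitions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ProjectionSketch {

  // Hypothetical stand-in for a column definition; this is NOT Kudu's ColumnSchemaPB.
  static final class ColumnDef {
    final String name;
    final String type;

    ColumnDef(String name, String type) {
      this.name = name;
      this.type = type;
    }

    // Rough size estimate: UTF-8 bytes of the name plus the type string.
    int estimatedBytes() {
      return name.getBytes(StandardCharsets.UTF_8).length
          + type.getBytes(StandardCharsets.UTF_8).length;
    }
  }

  // Builds an illustrative schema with columns col_0 .. col_(n-1), all "int32".
  static List<ColumnDef> buildSchema(int numColumns) {
    List<ColumnDef> schema = new ArrayList<>();
    for (int i = 0; i < numColumns; i++) {
      schema.add(new ColumnDef("col_" + i, "int32"));
    }
    return schema;
  }

  // Estimated size of a projection that repeats each projected column definition.
  static int sizeWithDuplicatedColumns(List<ColumnDef> schema, int[] projection) {
    int total = 0;
    for (int idx : projection) {
      total += schema.get(idx).estimatedBytes();
    }
    return total;
  }

  // Estimated size of a projection stored as one fixed 4-byte index per column.
  static int sizeWithIndexes(int[] projection) {
    return projection.length * Integer.BYTES;
  }

  // Resolves indexes back to column definitions using the schema carried alongside them.
  static List<ColumnDef> resolve(List<ColumnDef> schema, int[] projection) {
    List<ColumnDef> resolved = new ArrayList<>();
    for (int idx : projection) {
      resolved.add(schema.get(idx));
    }
    return resolved;
  }

  public static void main(String[] args) {
    List<ColumnDef> schema = buildSchema(100);
    int[] projection = {0, 5, 42};
    System.out.println("duplicated columns: "
        + sizeWithDuplicatedColumns(schema, projection) + " bytes (estimate)");
    System.out.println("column indexes:     "
        + sizeWithIndexes(projection) + " bytes (estimate)");
    for (ColumnDef c : resolve(schema, projection)) {
      System.out.println("resolved " + c.name + " (" + c.type + ")");
    }
  }
}
```

The index form only works when the receiver already holds the schema the indexes point into, which is why it pairs naturally with the table metadata described in the commit message.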
