Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )
Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars ...................................................................... Patch Set 4: (13 comments) http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h File src/kudu/hms/hms_catalog.h: http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@131 PS3, Line 131: only the : // characters a-z, A-Z, 0-9, _, and /. Nit: I think you can get by with "only ascii alphanumerics, _, and /" http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@134 PS3, Line 134: Normalizeed Normalized http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@140 PS3, Line 140: table Nit: "name" maybe? You've used 'table' for a hive::Table* in PopulateTable so I don't think it should be used for a std::string here. http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@162 PS3, Line 162: static Status ParseTableName(const std::string& table, I think StringPiece is a more idiomatic choice for "pointer to part of a string". It doesn't have a mutable_data() member (which you'll need for ToLowerCase) but you can add that. Also, please update the function doc to explain what the lifecycles of hms_database and hms_table are now. On second thought, why bother with this optimization? Why not just continue to return std::strings? This doesn't seem performance critical, so the extra lifecycle considerations don't seem worth it. http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc File src/kudu/hms/hms_catalog.cc: http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc@105 PS3, Line 105: '_', and '/' > The HMS appears to always allow underscores, and allows forward slashes by Should we at least warn users about how '/' is configurable in the HMS? Or is that too much detail? http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc File src/kudu/integration-tests/master_hms-itest.cc: http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc@477 PS3, Line 477: vector<string>( Surprised you needed this. Below too. http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.h@766 PS3, Line 766: 'normalized_table_names_map_', 'table_ids_map_', : // and 'tablet_map_' Nit: feel free to make this vaguer so it needn't be updated whenever the maps change names. http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@318 PS3, Line 318: auto* existing = InsertOrReturnExisting(&catalog_manager_->normalized_table_names_map_, I'm a little nervous about this; I think there may have been a good reason for why the code overwrites entries in table_names_map_, beyond consistency with the insertion into table_ids_map_ and tablet_map_. I can't think of any now, though. Anything interesting in the git history? http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@328 PS3, Line 328: Substitute("$0 or $1 [id=$2]", (*existing)->ToString(), l.data().name(), table_id)); Wouldn't it be helpful to show the IDs of both tables? http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@1513 PS3, Line 1513: ); > When HMS integration is enabled, maybe we should log the normalized table If you do that, update the log statement on L1525 and possibly elsewhere here too. http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@2304 PS3, Line 2304: // reserve the new table name, since it's already reserved by way of Nit: normally I wouldn't advocate for a double negative, but I wonder if the condition would be more readable if it matched the comment: if (!(table_name != req.new_table_name() && normalized_table_name == normalized_new_table_name)) http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@4794 PS3, Line 4794: string normalized_table_name = table_name; Why ignore_result()? AFAICT every place calling CatalogManager::NormalizeTableName could return a bad Status, so why not propagate this? http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/mini-cluster/external_mini_cluster.h File src/kudu/mini-cluster/external_mini_cluster.h: http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/mini-cluster/external_mini_cluster.h@322 PS3, Line 322: // Enable Hive Metastore integration. : // Overrides HMS integration options set by ExternalMiniClusterOptions. : void EnableMetastoreIntegration(); : : // Disable Hive Metastore integration. : // Overrides HMS integration options set by ExternalMiniClusterOptions. : void DisableMetastoreIntegration(); These should probably doc that the cluster must be restarted for the change to take effect. Alternatively, perhaps it should be like SetDaemonBinPath and the cluster has to be stopped to even call these? Seems simpler for clients. -- To view, visit http://gerrit.cloudera.org:8080/10817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd Gerrit-Change-Number: 10817 Gerrit-PatchSet: 4 Gerrit-Owner: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Wed, 27 Jun 2018 17:43:23 +0000 Gerrit-HasComments: Yes