Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#2).

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case 
chars
......................................................................

WIP: KUDU-2191: support table-name identifiers with upper case chars

The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

WIP: I'm unsure about these details of the patch:

* Database names and table names are treated the same, including case
  preservation. This may lead to confusion since Kudu tables 'foo.bar'
  and 'FOO.baz' will be in the same Hive database.

* When a client opens a table the name stored in the table object is the
  name the client specifies.  So if table 'default.FOO' is opened with name
  'Default.Foo', the table object will internally hold the name
  'Default.Foo'.

Both of these seem like the right thing to do to me, but may be a source
of confusion.

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
9 files changed, 361 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/2
--
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to