Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/13760 )

Change subject: KUDU-1938 Add support for CHAR/VARCHAR pt 1
......................................................................


Patch Set 22:

(4 comments)

While double-checking the maximum lengths for CHAR and VARCHAR in other RDBMSs, I 
noticed that there's a difference between the 'standard' approach to padding 
CHARs and the approach taken by Apache Impala (and now by us).

Originally I implemented the padding of CHARs *before* persisting, which seems 
to be what other databases (e.g. MySQL[1], Oracle[2] and PostgreSQL[3]) do. 
IIRC the original motivation was to have fixed-width rows, but with UTF-8 they 
still wouldn't be fixed-width, as UTF-8 itself is a variable-length encoding.
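To illustrate the UTF-8 point, here is a minimal C++ sketch, assuming the 
declared CHAR length counts characters (code points) rather than bytes, as 
MySQL does. Utf8Length and PadToChars are hypothetical helpers, not Kudu APIs. 
Padding two values to the same character count still yields different byte 
sizes, so pre-padded CHAR values would not be fixed-width in storage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t Utf8Length(const std::string& s) {
  std::size_t count = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) ++count;
  }
  return count;
}

// Hypothetical helper: right-pads `s` with ASCII spaces until it holds
// `length` code points, i.e. pad-before-persist CHAR(length) semantics
// with the length counted in characters.
std::string PadToChars(const std::string& s, std::size_t length) {
  std::string padded = s;
  std::size_t chars = Utf8Length(s);
  if (chars < length) padded.append(length - chars, ' ');
  return padded;
}
```

For example, with a length of 5, "abc" pads to 5 bytes, while "été" (3 
characters, 5 bytes in UTF-8) pads to 7 bytes.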

In MySQL's case the trailing spaces are even removed by default when scanned:

> The length of a CHAR column is fixed to the length that you declare when you 
> create the table. The length can be any value from 0 to 255. When CHAR values 
> are stored, they are right-padded with spaces to the specified length. When 
> CHAR values are retrieved, trailing spaces are removed unless the 
> PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
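A rough sketch of the MySQL behavior quoted above (pad on write, strip 
trailing spaces on read unless PAD_CHAR_TO_FULL_LENGTH is enabled). The 
function names are hypothetical, and lengths are counted in bytes for 
brevity, so this only matches CHAR semantics for ASCII input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical write path: right-pad the value with spaces to the
// declared length before persisting it (byte-based, ASCII only).
std::string PadOnWrite(const std::string& value, std::size_t length) {
  std::string stored = value;
  if (stored.size() < length) stored.append(length - stored.size(), ' ');
  return stored;
}

// Hypothetical read path: strip trailing spaces, unless the caller asks
// for the full padded value (modeling PAD_CHAR_TO_FULL_LENGTH).
std::string StripOnRead(const std::string& stored, bool pad_to_full_length) {
  if (pad_to_full_length) return stored;
  std::size_t end = stored.find_last_not_of(' ');
  return end == std::string::npos ? std::string() : stored.substr(0, end + 1);
}
```

One consequence worth noting: trailing spaces a user actually inserted (e.g. 
"ab ") are indistinguishable from padding and are stripped on read.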

Impala[4], on the other hand, stores the data without trailing whitespace and 
pads it upon retrieval:

> If you store a CHAR value containing trailing spaces in a table, those 
> trailing spaces are not stored in the data file. When the value is retrieved 
> by a query, the result could have a different number of trailing spaces. That 
> is, the value includes however many spaces are needed to pad it to the 
> specified length of the column.
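For contrast, a sketch of the Impala-style approach quoted above (trim 
trailing spaces before storing, pad on retrieval). Again the names are 
hypothetical and lengths are in bytes (ASCII only); a real implementation 
would need to pad by character count rather than byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical write path: drop trailing spaces so padding is never stored.
std::string TrimOnWrite(const std::string& value) {
  std::size_t end = value.find_last_not_of(' ');
  return end == std::string::npos ? std::string() : value.substr(0, end + 1);
}

// Hypothetical read path: pad the stored value back out to the declared
// column length (byte-based, ASCII only). Longer values are returned as-is.
std::string PadOnRead(const std::string& stored, std::size_t length) {
  std::string result = stored;
  if (result.size() < length) result.append(length - result.size(), ' ');
  return result;
}
```

The stored representation never contains padding, which fits a 
variable-length, columnar encoding; the padded form exists only in query 
results.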

Due to the variable-length nature of UTF-8 and the columnar format, I believe it 
makes the most sense to implement it the same way Impala did; I only wanted to 
bring this discrepancy to your attention.

[1] https://docs.oracle.com/cd/E17952_01/mysql-5.1-en/char.html
[2] https://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#CNCPT1821
[3] https://www.postgresql.org/docs/9.0/datatype-character.html
[4] https://impala.apache.org/docs/build/html/topics/impala_char.html

http://gerrit.cloudera.org:8080/#/c/13760/22//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13760/22//COMMIT_MSG@15
PS22, Line 15: The maximum length for VARCHAR is 65,535 and 255 for CHAR
> Could you add _why_ these maximum lengths make sense?
Done


http://gerrit.cloudera.org:8080/#/c/13760/22/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13760/22/src/kudu/common/partial_row.h@415
PS22, Line 415:   /// Get the string/binary value for a column by its name.
> This sentence should also be updated.
Done


http://gerrit.cloudera.org:8080/#/c/13760/22/src/kudu/common/partial_row.h@438
PS22, Line 438:   /// Get the string/binary value for a column by its index.
> Likewise.
Done


http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/schema.h
File src/kudu/common/schema.h:

http://gerrit.cloudera.org:8080/#/c/13760/20/src/kudu/common/schema.h@124
PS20, Line 124:   // Maximum value of the length is 65,535 for compatibility reasons as it's
              :   // used by VARCHAR type which can be set to a maximum of 65,535 in case of
              :   // MySQL and less for other major RDMBMS implementations.
> Thanks for the clarification. Could you update the comment to reflect that
Done



--
To view, visit http://gerrit.cloudera.org:8080/13760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I998982dba93831db91c43a97ce30d3e68c2a4a54
Gerrit-Change-Number: 13760
Gerrit-PatchSet: 22
Gerrit-Owner: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Sat, 20 Jul 2019 10:37:12 +0000
Gerrit-HasComments: Yes
