[
https://issues.apache.org/jira/browse/HADOOP-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628843#action_12628843
]
Ashish Thusoo commented on HADOOP-4085:
---------------------------------------
Comments are below. The main one is about how we treat the character set name
in the grammar. Ideally this would be an identifier instead of a token (similar
to table name identifiers). With that approach we would be able to support any
character set very easily.
Inline Comments:
cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java:85: nitpick - Can we
follow the convention of having the opening brace on the same line as the code?
ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:781: Instead of having fixed
tokens per character set in the grammar, we should define a character-set
identifier and pass that across to the Java calls. That is much more scalable
and would let us seamlessly support any character set supported by the Java
runtime.
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html
describes the rules for legal character set names, which can be turned into
grammar rules, and how new character sets can be added to the JVM through a
CharsetProvider.
So the rule for the character set could look something like
  charSetStringLiteral : charSetIdentifier StringLiteral
where charSetIdentifier can be defined in terms of the rules mentioned in the
link above.
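To make the suggestion concrete, here is a minimal sketch of what the Java side
could do once the grammar hands over a character-set identifier. It is not the
patch's code: the class CharsetLiteral and the methods resolve and decodeHex are
hypothetical names, and stripping a leading underscore (as in _utf8) is an
assumption based on the MySQL-style syntax in the issue description. The name
pattern follows the legal-name rules in the Charset documentation linked above.

```java
import java.nio.charset.Charset;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class CharsetLiteral {
    // Legal charset names per the java.nio.charset.Charset documentation:
    // a letter or digit first, then letters, digits, '-', '+', '.', ':', '_'.
    private static final Pattern CHARSET_NAME =
        Pattern.compile("[A-Za-z0-9][A-Za-z0-9\\-+.:_]*");

    // Maps an identifier such as "_utf8" to a JVM Charset.
    // Assumption: a leading underscore is an introducer and is dropped.
    static Charset resolve(String identifier) {
        String name = identifier.startsWith("_") ? identifier.substring(1) : identifier;
        if (!CHARSET_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Illegal character set name: " + identifier);
        }
        // Throws UnsupportedCharsetException if the JVM has no such charset.
        return Charset.forName(name);
    }

    // Decodes a hex literal such as 0x68656c6c6f using the named charset.
    static String decodeHex(String identifier, String hexLiteral) {
        String hex = hexLiteral.startsWith("0x") ? hexLiteral.substring(2) : hexLiteral;
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex literal needs an even number of digits");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, resolve(identifier));
    }

    public static void main(String[] args) {
        System.out.println(resolve("_utf8").name());
        System.out.println(decodeHex("_utf8", "0x68656c6c6f"));
    }
}
```

Because the lookup goes through Charset.forName, any charset installed in the
JVM (including ones added by a provider) would be accepted without touching the
grammar.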
ql/src/test/queries/clientpositive/inputddl4.q:0: Let's put a brief comment in
this file describing what it actually tests.
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:157: nitpick - maybe we
should call this PREFIX and not SAME.
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:143: Should this not
check across all sort columns instead of bucket columns? Is this a bug?
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:384: This function
hardcodes the terminating character and the field delimiters, while in the
current code these are parameterized. Parameterizing is better, as later we
want to drive them through session-level properties.
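As an illustration of the parameterized style, a minimal sketch follows. It is
not the code in Utilities.java: the class RowFormat, the property keys, and the
tab/newline defaults are all hypothetical and chosen only for the example.

```java
import java.util.Properties;

// Hypothetical example; not the actual Hive Utilities code.
final class RowFormat {
    // Assumed property keys, invented for this sketch.
    static final String FIELD_DELIM_KEY = "example.field.delim";
    static final String LINE_DELIM_KEY = "example.line.delim";

    final String fieldDelim;
    final String lineDelim;

    RowFormat(String fieldDelim, String lineDelim) {
        this.fieldDelim = fieldDelim;
        this.lineDelim = lineDelim;
    }

    // Reads the delimiters from properties, falling back to illustrative defaults.
    static RowFormat fromProperties(Properties props) {
        return new RowFormat(
            props.getProperty(FIELD_DELIM_KEY, "\t"),
            props.getProperty(LINE_DELIM_KEY, "\n"));
    }

    // Joins the fields with the configured delimiter and appends the terminator.
    String join(String... fields) {
        return String.join(fieldDelim, fields) + lineDelim;
    }

    public static void main(String[] args) {
        Properties session = new Properties();
        session.setProperty(FIELD_DELIM_KEY, ",");
        System.out.print(fromProperties(session).join("a", "b", "c"));
    }
}
```

With the delimiters passed in rather than fixed inside the function, a session
property can change the row format without a code change.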
> internationalization support and sort order (ascending/descending) support in
> create table
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-4085
> URL: https://issues.apache.org/jira/browse/HADOOP-4085
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/hive
> Reporter: Namit Jain
> Assignee: Namit Jain
> Attachments: patch1
>
>
> Users cannot specify utf8 strings in the query, either for selection or for
> filtering. MySQL syntax should be followed:
> select _utf8 'string' from <TableName>
> select <selectExpr> from <TableName> where col = _utf8 0x<HexValue>
> To start with, utf8 strings should be supported. Support for other character
> sets can be added in the future on demand.
> The identifiers (table name/column name etc.) cannot be utf8 strings; this is
> only for the data values.
> Although, in create table, the user has the option of specifying sorted
> columns, he does not have the option of specifying whether they are ascending
> or descending.
> Create Table syntax should be enhanced to support that.