[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2017-01-28 Thread Jim Apple (Code Review)
Jim Apple has abandoned this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-10-13 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Patch Set 8:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/4474/8//COMMIT_MSG
Commit Message:

Line 14: All syntaxes confirm the standard SQL syntax (Core SQL feature ID
Please describe the existing syntax, too, and say that it already works.


PS8, Line 18: trailing
Please capitalize these.


PS8, Line 18: leading
Please try to match our existing formatting choices. This includes breaking 
lines at the 70-character mark, not arbitrarily early as in line 17.


PS8, Line 19:   
Do not indent lines after the first one in a paragraph. Please use git log to 
study what previous commit messages looked like.


PS8, Line 23: option
this word is not needed.


PS8, Line 22: empty
:   argument("" or '')
just say "the empty string"


PS8, Line 24: Syntax
Why is this word capitalized, here and below


PS8, Line 24: target
"target" was not used earlier. Do you mean string_to_be_trimmed?


PS8, Line 31: (standard space)
You don't have to say "standard space" here or below. I was asking about what 
the standard says. Are you sure it just says " "? What do postgres and mysql do?


PS8, Line 43: abcdefg
Make the parameter strings even shorter


PS8, Line 46: Syntax #2:
Please separate these sections by blank lines.


PS8, Line 51: "
What is this doing here? Same above. Please try to be more consciencious about 
these things.


http://gerrit.cloudera.org:8080/#/c/4474/8/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS8, Line 1977:  
Make the test cases even smaller.


http://gerrit.cloudera.org:8080/#/c/4474/8/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS8, Line 799: static
static in an unnamed namespace is redundant.


PS8, Line 817: static
Put this in the unnamed namespace


PS8, Line 817: *
&


PS8, Line 818: begin
Is this always 0? Here and below


PS8, Line 838: int begin = 0;
 :   begin = 
Make this one line


PS8, Line 849: StringVal
Does this do the right thing is is_null is true but ptr and len are set?


PS8, Line 862: option.ptr[i]
incorrect argument type


PS8, Line 865: trim_option
This is still not done at parsing/analysis time. That work happens in the 
front-end.


Line 893:   if (trim_option == TrimOption::LEADING || trim_option == 
TrimOption::BOTH) {
trim_option | TrimOption::LEADING


http://gerrit.cloudera.org:8080/#/c/4474/8/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

Line 429:   [['trim'], 'STRING', ['STRING'], 'impala::StringFunctions::Trim'],
Is this line still needed, or is it covered by the new grammar rules?


Line 430:   [['trim'], 'STRING', ['STRING', 'STRING', 'STRING'], 
'impala::StringFunctions::TrimStringWithOption',
Did you try changing this name and making it invisible? What happens?


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 
Gerrit-HasComments: Yes


Re: [Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-10-09 Thread Jim Apple
There are more than 40 other code reviews pending right now.

https://gerrit.cloudera.org/#/q/status:open+project:Impala-ASF

This is not an urgent bug fix, so it's lower priority, and it's
already on patch set 8, and it's not that close to being +1ed.

Please be patient.

On Sun, Oct 9, 2016 at 6:25 PM, Youwei Wang (Code Review)
 wrote:
> Youwei Wang has posted comments on this change.
>
> Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
> function.
> ..
>
>
> Patch Set 8:
>
> Greetings everyone. The review status of this patch has paused for a few 
> days. If possible, would you please move this a bit forward? Thank you. :)
>
> --
> To view, visit http://gerrit.cloudera.org:8080/4474
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
> Gerrit-PatchSet: 8
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Youwei Wang 
> Gerrit-Reviewer: Alex Behm 
> Gerrit-Reviewer: Jim Apple 
> Gerrit-Reviewer: Mostafa Mokhtar 
> Gerrit-Reviewer: Youwei Wang 
> Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-10-09 Thread Youwei Wang (Code Review)
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Patch Set 8:

Greetings everyone. The review status of this patch has paused for a few days. 
If possible, would you please move this a bit forward? Thank you. :)

-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-10-04 Thread Youwei Wang (Code Review)
Youwei Wang has uploaded a new patch set (#8).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
All syntaxes confirm the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are
  leading, trailing, and both. leading means trimming characters
  from the start; trailing means trimming characters from the end;
  both means trimming characters from both sides. These options
  should be provided without single/double quotation marks since they
  are not string literals but identifiers. NULL or empty
  argument("" or '') implies "both" option. If an invalid option is
  specified, TRIM returns target untouched. For Syntax #2, since no
  "where" is specified, the option both is implied by default.

"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(standard space) by default. Empty argument("" or '') makes TRIM
  return "string_to_be_trimmed" untouched. For Syntax #1, since no
  "trim_char_set" is specified, it trims " "(standard space) by
  default.

"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.

Return type: string

Examples:
Syntax #1:
trim(both 'abfg' from 'abcdefg'); Result: 'cde';
trim(leading FROM ' 123 '); Result: '123 ';
trim(trailing FROM ' 123 '); Result: ' 123';
Syntax #2:
trim('xyz' FROM 'zyxabczyx')"; Result: "abc";
Syntax #3:
trim(leading 'ab' from 'abcdefg'); Result: 'cdefg';
trim(trailing 'bfg%' from 'abcdefg'); Result: 'abcde';
trim(both 'abfg' from 'abcdefg')"; Result "cde";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
9 files changed, 406 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-10-04 Thread Youwei Wang (Code Review)
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Patch Set 6:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   specified direction(s) of a STRING value.
> Then change the name of the new function away from btrim to something else.
Greetings, Jim.
This old function name "btrim" has been discarded. 
And it has been replaced using the new one "trim".
Thank you.


http://gerrit.cloudera.org:8080/#/c/4474/6//COMMIT_MSG
Commit Message:

PS6, Line 14: Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
> All of them are part of the standard, as is simply TRIM(string_to_be_trimme
Done


PS6, Line 18: |
> just use regular prose here: Valid options are "leading", "trailing", and "
Done


PS6, Line 22: NULL
> How is a NULL passed? Is it literally just "trim(NULL from 'foo')"?
Greetings, Jim.
Yes, it is exact as you have said.
But I have to say: "trim(NULL from 'foo')" is invalid.
The correct calling form is like: trim(NULL 'f' from 'foo');


PS6, Line 22: empty argument
> What does "empty argument" mean in this case?
Greetings, Jim. 
"empty argument" means an empty string like: "" or ''.
Extra explaination has been added. 
Thank you for pointing out this. :)


PS6, Line 23: returns
> not just "returns", "TRIM returns". Here and below.
Greetings, Jim.
Thank you for pointing out that. :)


PS6, Line 25: "trim_char_set": Case-sensitive characters to be removed. This
> Please separate paragraphs by a blank line.
Done


PS6, Line 29: " "
> Is the standard space or all whitespace, including tabs, newlines, etc.?
Greetings, Jim.
I mean the standard space here.
This comment has been fixed.
Thank you. :)


PS6, Line 32: Blank
> How is a "blank" argument different than an "empty argument" above?
Greetings, Jim.
No, "Blank" means nothing provided.
For example: select trim(both " " from );
This is the case of "blank" argument.
As you can see here, this definitely causes parsing error.

Thank you.


PS6, Line 33: ote: For Syntax #1, since no "characters" is specified, it trims
: " "(space) by default. For Syntax #2, since no "where" is 
specified,
: the option both is implied by default.
> Please split this up and put it above in the text on each option.
Done


PS6, Line 44: @&@&@&@
> This is hard to read. Can you stick to alphanumeric characters in these exa
Greetings, Jim. 
Thank you for pointing out that. 
Corresponding tests will be modified according to your guideline.


http://gerrit.cloudera.org:8080/#/c/4474/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1977:   TestStringValue("trim(leading FROM ' &@&127+  &@   ')", 
"&@&127+  &@   ");
> Please try to have each test case test one thing different from every other
Done


http://gerrit.cloudera.org:8080/#/c/4474/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 788: static inline bitset<256>* TrimInternalPrepare(FunctionContext* ctx,
> Please use unnamed namespaces; see https://google.github.io/styleguide/cppg
Done


PS6, Line 805: bitset
> const
Done


PS6, Line 806: &
> Out parameters, when used, are not references. https://google.github.io/sty
Done


PS6, Line 806: begin
> prefer to return, not pass out parameters.
Done


PS6, Line 807: static_cast
> What type is str.ptr[begin]? What type does test take? Do you need this cas
Greetings, Jim.
The type of str.ptr[begin] is uint8_t. 
I believe we don't need this case in this case.


PS6, Line 807: test
> Please use [], not test - we don't need the bounds checking.
Done


PS6, Line 813: int &
> begin is not modified.
Done


PS6, Line 822: bitset
> const
Done


PS6, Line 824: int32_t
> Don't use int32_t and int interchangeably without a specific reason
Done


PS6, Line 835: str
> The original functions made a new string val, rather than returning this on
Greetings, Jim.
I guess it is related to the dynamic memory management.
Am I right? :)


Line 848: option.ptr[i] = std::tolower(option.ptr[i]);
> Though awkward, please insert the required casts here.
Done


PS6, Line 851: TrimOption
> This should be done at parsing/analysis time, for efficiency reasons
Done


PS6, Line 855: INVALID
> LEADING = 1, TRAILING = 2, BOTH = LEADING | TRAILING, INVALID = ~BOTH. The 
Done


PS6, Line 858: opt_
> https://google.github.io/styleguide/cppguide.html#Variable_Names
Done


PS6, Line 861: ( 
> Please use clang-format to format your code.
Done


PS6, Line 861: 7
> try to avoid repeating literals, like 7. Maybe set static constexpr char LE
Done


PS6, Line 863: both
> bad copy
Done


PS6, Line 866: both
> bad copy
Done


http://gerrit.cloudera.org:8080/#/c/4474/1/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS1, Line 123: Base6
> And while you're changing the name, no need to call this BTrim.
Greetings, Jim.
I mean no o

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-10-04 Thread Youwei Wang (Code Review)
Youwei Wang has uploaded a new patch set (#7).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
All syntaxes confirm the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are
  leading, trailing, and both. leading means trimming characters
  from the start; trailing means trimming characters from the end;
  both means trimming characters from both sides. These options
  should be provided without single/double quotation marks since they
  are not string literals but identifiers. NULL or empty
  argument("" or '') implies "both" option. If an invalid option is
  specified, TRIM returns target untouched. For Syntax #2, since no
  "where" is specified, the option both is implied by default.

"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(standard space) by default. Empty argument("" or '') makes TRIM
  return "string_to_be_trimmed" untouched. For Syntax #1, since no
  "trim_char_set" is specified, it trims " "(standard space) by
  default.

"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.

Return type: string

Examples:
Syntax #1:
trim(both 'abfg' from 'abcdefg'); Result: 'cde';
trim(leading FROM ' 123 '); Result: '123 ';
trim(trailing FROM ' 123 '); Result: ' 123';
Syntax #2:
trim('xyz' FROM 'zyxabczyx')"; Result: "abc";
Syntax #3:
trim(leading 'ab' from 'abcdefg'); Result: 'cdefg';
trim(trailing 'bfg%' from 'abcdefg'); Result: 'abcde';
trim(both 'abfg' from 'abcdefg')"; Result "cde";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
9 files changed, 406 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-10-03 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Patch Set 6:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   specified direction(s) of a STRING value.
> Greetings, Jim.
Then change the name of the new function away from btrim to something else.


http://gerrit.cloudera.org:8080/#/c/4474/6//COMMIT_MSG
Commit Message:

PS6, Line 14: Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
All of them are part of the standard, as is simply TRIM(string_to_be_trimmed).

http://savage.net.au/SQL/sql-92.bnf.html


PS6, Line 18: |
just use regular prose here: Valid options are "leading", "trailing", and "both"


PS6, Line 22: empty argument
What does "empty argument" mean in this case?


PS6, Line 22: NULL
How is a NULL passed? Is it literally just "trim(NULL from 'foo')"?


PS6, Line 23: returns
not just "returns", "TRIM returns". Here and below.


PS6, Line 25: "trim_char_set": Case-sensitive characters to be removed. This
Please separate paragraphs by a blank line.


PS6, Line 29: " "
Is the standard space or all whitespace, including tabs, newlines, etc.?


PS6, Line 32: Blank
How is a "blank" argument different than an "empty argument" above?


PS6, Line 33: ote: For Syntax #1, since no "characters" is specified, it trims
: " "(space) by default. For Syntax #2, since no "where" is 
specified,
: the option both is implied by default.
Please split this up and put it above in the text on each option.


PS6, Line 44: @&@&@&@
This is hard to read. Can you stick to alphanumeric characters in these 
examples, please? The smaller the input that illustrates the point, the clearer 
it is.

This applies to tests, as well.


http://gerrit.cloudera.org:8080/#/c/4474/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1977:   TestStringValue("trim(leading FROM ' &@&127+  &@   ')", 
"&@&127+  &@   ");
Please try to have each test case test one thing different from every other 
test case. What a case is testing should be clear by looking at it. The smaller 
the test case is, the better.

As an example, would this test case be better as "(trim leading FROM ' a b ')"?


http://gerrit.cloudera.org:8080/#/c/4474/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 788: static inline bitset<256>* TrimInternalPrepare(FunctionContext* ctx,
Please use unnamed namespaces; see 
https://google.github.io/styleguide/cppguide.html


PS6, Line 805: bitset
const


PS6, Line 806: &
Out parameters, when used, are not references. 
https://google.github.io/styleguide/cppguide.html#Reference_Arguments


PS6, Line 806: begin
prefer to return, not pass out parameters.


PS6, Line 807: test
Please use [], not test - we don't need the bounds checking.


PS6, Line 807: static_cast
What type is str.ptr[begin]? What type does test take? Do you need this cast? 
Why or why not?


PS6, Line 813: int &
begin is not modified.


PS6, Line 822: bitset
const


PS6, Line 824: int32_t
Don't use int32_t and int interchangeably without a specific reason


PS6, Line 835: str
The original functions made a new string val, rather than returning this one. 
Do you know why?


Line 848: option.ptr[i] = std::tolower(option.ptr[i]);
Though awkward, please insert the required casts here.


PS6, Line 851: TrimOption
This should be done at parsing/analysis time, for efficiency reasons


PS6, Line 855: INVALID
LEADING = 1, TRAILING = 2, BOTH = LEADING | TRAILING, INVALID = ~BOTH. The 
comparison on line 886 gets simplified.


PS6, Line 858: opt_
https://google.github.io/styleguide/cppguide.html#Variable_Names


PS6, Line 861: ( 
Please use clang-format to format your code.


PS6, Line 861: 7
try to avoid repeating literals, like 7. Maybe set static constexpr char 
LEADING[] = "leading", then check if option.len == sizeof(LEADING)-1? See how 
that looks.


PS6, Line 863: both
bad copy


PS6, Line 866: both
bad copy


http://gerrit.cloudera.org:8080/#/c/4474/1/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS1, Line 123: Base6
And while you're changing the name, no need to call this BTrim.


http://gerrit.cloudera.org:8080/#/c/4474/1/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

PS1, Line 478: ezza 
This is the string you should change. Change it to something else, like 
sql92trim, then make that invisible.


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei 

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-09-30 Thread Youwei Wang (Code Review)
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4474/3//COMMIT_MSG
Commit Message:

Line 9: Purpose: Removes all instances of one or more characters from the
> We should definitely not invent a new function name and stick to trim(). Th
Done


Line 32:   NULL. Blank argument causes parsing error.
> The 'where' and 'trim_char_set' clauses are optional, so the following are 
Done


http://gerrit.cloudera.org:8080/#/c/4474/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 2418:   {: RESULT = c; :}
> Not clear what this rule adds. The copy+paste nature of it means an increas
Done


Line 2449: 
> My understanding is that several of the trim() clauses are optional, e.g., 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-09-30 Thread Youwei Wang (Code Review)
Youwei Wang has uploaded a new patch set (#6).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are:
  leading|trailing|both. leading means trimming characters from the
  start; trailing means trimming characters from the end; both means
  trimming characters from both sides. These options should be
  provided without single/double quotation marks since they are not
  string literals but identifiers. NULL or empty argument implies
  "both" option. If an invalid option is specified, returns target
  untouched.
"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(space) by default. Empty argument return string_to_be_trimmed
  untouched.
"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.
Note: For Syntax #1, since no "characters" is specified, it trims
" "(space) by default. For Syntax #2, since no "where" is specified,
the option both is implied by default.
Return type: string

Examples:
Syntax #1:
trim(both 'abfg%' from 'abc%%defg%'); returns 'c%%de';
trim(leading FROM ' 123 '); returns '123 ';
trim(trailing FROM ' 123 '); returns ' 123';
Syntax #2:
trim('&@' FROM '&@&@&@&127+1-@&@&@&@')"; returns "127+1-";
Syntax #3:
trim(leading 'ab%' from 'abc%%defg%'); returns 'c%%defg%';
trim(trailing 'bfg%' from 'abc%%defg%'); returns 'abc%%de';
trim(both 'abfg%' from 'abc%%defg%')"; returns "c%%de";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
9 files changed, 427 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-09-29 Thread Youwei Wang (Code Review)
Youwei Wang has uploaded a new patch set (#5).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are:
  leading|trailing|both. leading means trimming characters from the
  start; trailing means trimming characters from the end; both means
  trimming characters from both sides. These options should be
  provided without single/double quotation marks since they are not
  string literals but identifiers. NULL or empty argument implies
  "both" option. If an invalid option is specified, returns target
  untouched.
"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(space) by default. Empty argument return string_to_be_trimmed
  untouched.
"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.
Note: For Syntax #1, since no "characters" is specified, it trims
" "(space) by default. For Syntax #2, since no "where" is specified,
the option both is implied by default.
Return type: string

Examples:
Syntax #1:
trim(both 'abfg%' from 'abc%%defg%'); returns 'c%%de';
trim(leading FROM ' 123 '); returns '123 ';
trim(trailing FROM ' 123 '); returns ' 123';
Syntax #2:
trim('&@' FROM '&@&@&@&127+1-@&@&@&@')"; returns "127+1-";
Syntax #3:
trim(leading 'ab%' from 'abc%%defg%'); returns 'c%%defg%';
trim(trailing 'bfg%' from 'abc%%defg%'); returns 'abc%%de';
trim(both 'abfg%' from 'abc%%defg%')"; returns "c%%de";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/com/cloudera/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeExprsTest.java
9 files changed, 429 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-09-29 Thread Youwei Wang (Code Review)
Youwei Wang has uploaded a new patch set (#4).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are:
  leading|trailing|both. leading means trimming characters from the
  start; trailing means trimming characters from the end; both means
  trimming characters from both sides. These options should be
  provided without single/double quotation marks since they are not
  string literals but identifiers. NULL or empty argument implies
  "both" option. If an invalid option is specified, returns target
  untouched.
"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(space) by default. Empty argument return string_to_be_trimmed
  untouched.
"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.
Note: For Syntax #1, since no "characters" is specified, it trims
" "(space) by default. For Syntax #2, since no "where" is specified,
the option both is implied by default.
Return type: string

Examples:
Syntax #1:
trim(both 'abfg%' from 'abc%%defg%'); returns 'c%%de';
trim(leading FROM ' 123 '); returns '123 ';
trim(trailing FROM ' 123 '); returns ' 123';
Syntax #2:
trim('&@' FROM '&@&@&@&127+1-@&@&@&@')"; returns "127+1-";
Syntax #3:
trim(leading 'ab%' from 'abc%%defg%'); returns 'c%%defg%';
trim(trailing 'bfg%' from 'abc%%defg%'); returns 'abc%%de';
trim(both 'abfg%' from 'abc%%defg%')"; returns "c%%de";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
A common/function-registry/string-functions-ir.cc
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/com/cloudera/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeExprsTest.java
10 files changed, 1,439 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-09-28 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Patch Set 3:

(4 comments)

Missing parser tests as well as TrimExpr.java

http://gerrit.cloudera.org:8080/#/c/4474/3//COMMIT_MSG
Commit Message:

Line 9: Syntax #1: BTRIM(string where, string trim_char_set, string target);
We should definitely not invent a new function name and stick to trim(). The 
existing trim() is a special case of the more general ISO-SQL trim(), so we 
should fold the old functionality into the new more general expr.


Line 32:   btrim('left', 'a%', 'abc%%defg%'); returns 'bc%%defg%';
The 'where' and 'trim_char_set' clauses are optional, so the following are also 
legal uses of trim():

trim('  abcde  ');
trim(right from 'abcde  ');
trim('z' from 'zzzabczzz');
trim(left 'xz' from 'abcxz');


http://gerrit.cloudera.org:8080/#/c/4474/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 2418: string_expr ::=
Not clear what this rule adds. The copy+paste nature of it means an increased 
maintenance burden.


Line 2449:   // Below is a special case for BTRIM. Idents are used to avoid 
adding new keywords.
My understanding is that several of the trim() clauses are optional, e.g., the 
location and the characters to trim.

The expr seems complicated enough that we should make TRIM a keyword and add a 
separate production "trim_expr ::=" to recognize it (keep the optional clauses 
in mind).

Most other DBs seem to have TRIM as a keyword also, so we'd not be alone here.


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-09-27 Thread Youwei Wang (Code Review)
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   Purpose: Removes all instances of one or more characters from the
> Please look at invisible_functions, line 672 of that file.
Greetings, Jim.
Thank you so much for providing this great idea. However, I am afraid I must 
say this approach doesn't work as it is intended to be. If I move the btrim 
entry to the invisible_functions part, both two syntaxes are unavailable now 
for these two syntaxes share the same entry. I have verified this by 
experiments. Please feel free to point out my mistake if you think I am wrong. 
Thank you. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-09-26 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   Purpose: Removes all instances of one or more characters from the
> Greetings, Jim.
Please look at invisible_functions, line 672 of that file.


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-09-22 Thread Youwei Wang (Code Review)
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   Purpose: Removes all instances of one or more characters from the
> "Pros" means the reasons for something, while "cons" means the reasons agai
Greetings, Jim.
Please be kind to allow me to give some explanations from the start:
1. In order to support the syntax like: btrim(heading|trailing|both char_set 
from string), it is necessary to register an entry in the 
common/function-registry/impala_functions.py file. Actually, it is also mapped 
to the entry point of the backend logic.
2. Such entry will be encoded in the frontend built-in UDF category for parsing 
usage and it will be called when users input such query:  “select 
btrim(heading|trailing|both char_set from string)”;
3. Once such entry is ready and compiled, the syntax of the form #1 will be 
automatically supported, no matter you like it or not. This is exactly the same 
case as this function “extract(timestamp, string unit) || extract(unit FROM 
timestamp)” shown in this link:
http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_datetime_functions.html
It is easy for me to conceal the description of the form 1. But I can’t stop 
users from using it if they have some hacking skills and dig deeply into the 
impala_functions.py file. So in my opinion, it would be more friendly to point 
out the existence of such syntax - maybe there is someone whoever wants it. 
I am not sure whether my explanation can answer your question that “why there 
should be two different syntaxes for the same function.”If you don’t think my 
explanation is right. Please feel free to point out, thank you. :)


PS1, Line 12:   "where": Case-insensitive trim direction. Valid options are:
> I don't think you mean "an enumerate value" -- "enumerate" is, as far as I 
Done


PS1, Line 13: 
> This line is still 3 characters over the red line in patch set 2. Please do
Done


Line 19:   "trim_char_set": Case-sensitive characters to be removed. This
> I noted "don't just answer those particular questions", but you did exactly
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

2016-09-22 Thread Youwei Wang (Code Review)
Youwei Wang has uploaded a new patch set (#3).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() 
function.
..

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Syntax #1: BTRIM(string where, string trim_char_set, string target);
  Purpose: Removes all instances of one or more characters from the
specified direction(s) of a STRING value.
  "where": Case-insensitive trim direction. Valid options are:
'left|leading|right|trailing|both'. 'left|leading' means trimming
characters from the start; 'right|trailing' means trimming
characters from the end; 'both' means trimming characters from
both sides. If this argument is NULL or non-NULL but none of the
valid options, the function returns the "target" argument
untouched. Blank argument causes parsing error.
  "trim_char_set": Case-sensitive characters to be removed. This
argument is regarded as a character set going to be removed. So
the occurrence order of each character doesn't matter and
duplicated instance of same character will be ignored. If this
argument is NULL, the function returns the target string
untouched or it removes all occurrences of characters in the
"trim_char_set" string according to the "where" option. Blank
argument causes parsing error.
  "target": Case-sensitive target string to trim. Blank argument
causes parsing error.
  Return type: string

  Examples:
  btrim('left', 'a%', 'abc%%defg%'); returns 'bc%%defg%';
  btrim('right', 'fg%', 'abc%%defg%'); returns 'abc%%de';
  btrim('leading', 'ab%', 'abc%%defg%'); returns 'c%%defg%';
  btrim('trailing', 'bfg%', 'abc%%defg%'); returns 'abc%%de';
  btrim('both', 'abfg%', 'abc%%defg%'); returns 'c%%de';

Syntax #2: BTRIM(where string FROM string);
  Purpose: Identical as Form #1 except this form confirms the
standard SQL syntax (Core SQL feature ID E021-09).
  "where": Case-insensitive trim direction, can be one of leading,
trailing, or both. These options should be provided without
single/double quotation marks since they are not string literals
but identifiers. For left and right are Impala keywords, they are
not available in this syntax.
  The second string argument corresponds to the "trim_char_set" in
  Syntax #1 and the third string argument corresponds to the "target"
  in Syntax #1. Same argument meaning and restriction from Syntax #1
  are applied.
  Return type: string

  Examples:
  btrim(leading 'ab%' from 'abc%%defg%'); returns 'c%%defg%';
  btrim(trailing 'bfg%' from 'abc%%defg%'); returns 'abc%%de';
  btrim(both 'abfg%' from 'abc%%defg%'); returns 'c%%de';

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeExprsTest.java
7 files changed, 198 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Youwei Wang