Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10950 )

Change subject: IMPALA-376: add built-in functions for parsing JSON
......................................................................


Patch Set 9:

(12 comments)

Zoltan, thank you for taking a look! Since the RapidJSON has been upgraded to 
1.1.0, I'll update this patch soon.

http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc
File be/src/exprs/string-functions.cc:

http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@67
PS9, Line 67: thread_local FunctionContext* UdfJsonAllocator::ctx_ = nullptr;
> nit: please insert empty line after class definition
Done. Remove the static thread_local variable by refactoring the Malloc & Free 
logics as discussed before.


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@69
PS9, Line 69: RapidJSON required the input strings be end of '\0'.
> I don't think it's grammatically correct. Maybe:
Done


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@70
PS9, Line 70: tailing
> nit: trailing
Done


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@96
PS9, Line 96: src_
> nit: read_position_ would be a better name I think
Done. Good name. I think 'pos_ ' is shorter.


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@166
PS9, Line 166: values.erase
> Maybe it'd be better to have two vectors and juggling the values between th
I think I should make the var name more meaningful. I'd like to rename 'values' 
to 'queue' since it's the queue in BFS. 'push_back' and 'erase' act more like a 
queue. On the other hand, the depth of the json selector is usually small, so 
such kind of functions won't be called too many times in parsing a json string. 
Refactoring these with two vectors might not have significant benifits.


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@194
PS9, Line 194: (path[path_idx] != ' ')
> I am not sure that there is a test for it. Either for the true and false ca
Sure. Will add test coverage for this.


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@195
PS9, Line 195: return -1;
> Maybe taking the function context and set some error would be helpful for t
Good point! I'll add more error messages.


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@199
PS9, Line 199: return NULL
> This function returns -1, the return NULL statement is in GetJsonObjectImpl
Sorry, this comment is stale.


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@217
PS9, Line 217: return -1;
> Error could be set in function context. We could also use 'path_idx' in the
Sure


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@251
PS9, Line 251:   vector<UdfValue*> values;
> The C++ standard doesn't specify the initial capacity of std::vector, but i
OK. But the queue is usually short, the benefits might be limit.


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@267
PS9, Line 267: (end == nullptr)
> Set error in function context?
Sure


http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/util/string-util.h
File be/src/util/string-util.h:

http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/util/string-util.h@53
PS9, Line 53: AdvanceIdentifier
> Maybe 'FindIdentifierEnd' or something like that would be a better name.
Sure. This function is a slightly modified version of 
https://github.com/apache/impala/blob/2e63752858d71cc745534367a686980e060a8180/be/src/gutil/strings/util.cc#L743



--
To view, visit http://gerrit.cloudera.org:8080/10950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f
Gerrit-Change-Number: 10950
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Aug 2018 00:53:47 +0000
Gerrit-HasComments: Yes

Reply via email to