Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 2:

(29 comments)

Dumping another batch of comments :)

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exec/data-source-scan-node.h
File be/src/exec/data-source-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exec/data-source-scan-node.h@100
PS1, Line 100:   Status MaterializeNextRow(RuntimeState* state, MemPool* 
mem_pool, Tuple* tuple);
What do you think about passing cctz::time_zone* instead of RuntimeState*?

Can you add a comment for the  new parameter and it's purpose?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/expr-test.cc@6398
PS1, Line 6398: const char* local_tz_name = "PST8PDT";
              :     ScopedTimeZoneOverride time_zone(local_tz_name);
              :     const cctz::time_zone* local_tz = 
TimezoneDatabase::FindTimezone(local_tz_name);
              :     DCHECK(local_tz != nullptr);
Have you considered moving this to a function or macro or such? As I see you do 
this same thing 3 times.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@a197
PS1, Line 197:
Nice that we can get rid of this magic :)


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@73
PS1, Line 73: from_cs
Might be just me but I don't really find the names from_cs, to_cs and from_tp 
too self-descriptive.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@76
PS1, Line 76: auto from_tp = cctz::convert(from_cs, 
TimezoneDatabase::GetUtcTimezone());
            :   auto to_cs = cctz::convert(from_tp, *timezone);
I think it would worth writing a comment why the two cctz::convert() calls are 
needed.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@79
PS1, Line 79:   // Check if resulting timestamp is within range
In my opinion this comment doesn't add extra value as the name of the function 
states the same.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@113
PS1, Line 113:   context->AddWarning(ss.str().c_str());
             :     return ts_val;
             :   }
             :
this seems duplicate code (TimestampFunctions::FromUtc)


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@140
PS1, Line 140:   context->AddWarning(msg.c_str());
             :     return TimestampVal::null();
             :   }
             :
             :   // Create 'return_date' and 'return_time' from 'to_cs'.
I think this conversion could go to a function as it seems duplicate for me 
with the same in TimestampFunctions::FromUtc.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@21
PS1, Line 21: #include <boost/unordered_map.hpp>
shouldn't we use the one from std?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@35
PS1, Line 35:
string& GetPath() ?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@47
PS1, Line 47:   }
FindTimezone() returns pointer while GetUtcTimezone() returns reference to a 
timezone. Can these be unified?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@54
PS1, Line 54:   static const cctz::time_zone UTC_TIMEZONE_;
Could you add a comment what the string param is used for?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@62
PS1, Line 62:   /// location.
As I see you wrote comments for these function in the .cc file. Could you move 
them to the header?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@70
PS1, Line 70: db.
zone_abbrev


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@10
PS1, Line 10: //
Unrelated to your change but shouldn't we replace this to the Apache header 
other files use?
Same goes for the header.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@103
PS1, Line 103: // Returns 'true' if path 'a' starts with path 'b'. If 
'relative' is not nullptr, it will
FileSystemUtil would be a better place for this, I think.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@105
PS1, Line 105: PathStartsWith
I have the feeling that this function should only decide if path 'b' is the 
prefix of path 'a'. Returning the remainder after 'b' is something that I think 
should be out of this functions scope.
I think this mix of responsibilities is the reason the name of the function is 
not that self-explanatory. It could be e.g. IsPrefixPath() or such.

What do you think?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@105
PS1, Line 105: string *relative
For me the name of this variable doesn't indicate it's purpose. Can you name it 
remainder or rest or something similar? Also reading the function comment 
didn't explain me what a path relative to 'b' means.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@108
PS1, Line 108: b.length() + 1 < a.length()
what if a==b. In theory then a starts with b, still this returns false.

As I see in this case 'a' should be 'b'+"/" to get true. If this is intentional 
then please mention it in the function comment.


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@118
PS1, Line 118: // with an uppercase letter.
Could you mention examples here that contain the mentioned allowed chars?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@119
PS1, Line 119: bool IsTimezoneNameSegmentValid(const string& tz_seg) {
Shouldn't this be part of TimezoneDatabase or some other timezone related 
helper class?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@121
PS1, Line 121:       find_if(
Tricky :)

Just for my information, have you considered using regex?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@132
PS1, Line 132: // time-zone name segments delimited by '/'.
Could you mention one input example in the comment?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@133
PS1, Line 133: bool IsTimezoneNameValid(const string& tz_name) {
Shouldn't this be part of TimezoneDatabase or some other timezone related 
helper class?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@136
PS1, Line 136:   while (end != string::npos) {
Wouldn't it be easier to verify this with a regex and get rid of this while 
loop and IsTimezoneNameSegmentValid() ?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147
PS1, Line 147: bool
can you return int64_t* and return nullptr in case the offset is not valid?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147
PS1, Line 147: bool IsTimezoneOffsetValid(const string& tz_offset, int64_t* 
offset_sec) {
Shouldn't this be part of TimezoneDatabase or some other timezone related 
helper class?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@160
PS1, Line 160: // The implementation here was adapted from
> Maybe this could be moved to class FileSystemUtil.
I again feel here that this function serves 2 purposes instead of a clear one: 
decide if path is a symbolic link and get the 'real_path'. I think these should 
be split to multiple functions.

What do you think?


http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@251
PS1, Line 251:   // mkdtemp operates in place, so we need a mutable array.
Can you move the comment to the header?



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Apr 2018 07:32:54 +0000
Gerrit-HasComments: Yes

Reply via email to