Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9986 )
Change subject: IMPALA-3307: Add support for IANA time-zone db ...................................................................... Patch Set 2: (29 comments) Dumping another batch of comments :) http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exec/data-source-scan-node.h File be/src/exec/data-source-scan-node.h: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exec/data-source-scan-node.h@100 PS1, Line 100: Status MaterializeNextRow(RuntimeState* state, MemPool* mem_pool, Tuple* tuple); What do you think about passing cctz::time_zone* instead of RuntimeState*? Can you add a comment for the new parameter and it's purpose? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/expr-test.cc@6398 PS1, Line 6398: const char* local_tz_name = "PST8PDT"; : ScopedTimeZoneOverride time_zone(local_tz_name); : const cctz::time_zone* local_tz = TimezoneDatabase::FindTimezone(local_tz_name); : DCHECK(local_tz != nullptr); Have you considered moving this to a function or macro or such? As I see you do this same thing 3 times. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@a197 PS1, Line 197: Nice that we can get rid of this magic :) http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@73 PS1, Line 73: from_cs Might be just me but I don't really find the names from_cs, to_cs and from_tp too self-descriptive. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@76 PS1, Line 76: auto from_tp = cctz::convert(from_cs, TimezoneDatabase::GetUtcTimezone()); : auto to_cs = cctz::convert(from_tp, *timezone); I think it would worth writing a comment why the two cctz::convert() calls are needed. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@79 PS1, Line 79: // Check if resulting timestamp is within range In my opinion this comment doesn't add extra value as the name of the function states the same. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@113 PS1, Line 113: context->AddWarning(ss.str().c_str()); : return ts_val; : } : this seems duplicate code (TimestampFunctions::FromUtc) http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timestamp-functions.cc@140 PS1, Line 140: context->AddWarning(msg.c_str()); : return TimestampVal::null(); : } : : // Create 'return_date' and 'return_time' from 'to_cs'. I think this conversion could go to a function as it seems duplicate for me with the same in TimestampFunctions::FromUtc. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h File be/src/exprs/timezone_db.h: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@21 PS1, Line 21: #include <boost/unordered_map.hpp> shouldn't we use the one from std? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@35 PS1, Line 35: string& GetPath() ? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@47 PS1, Line 47: } FindTimezone() returns pointer while GetUtcTimezone() returns reference to a timezone. Can these be unified? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@54 PS1, Line 54: static const cctz::time_zone UTC_TIMEZONE_; Could you add a comment what the string param is used for? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@62 PS1, Line 62: /// location. As I see you wrote comments for these function in the .cc file. Could you move them to the header? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.h@70 PS1, Line 70: db. zone_abbrev http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@10 PS1, Line 10: // Unrelated to your change but shouldn't we replace this to the Apache header other files use? Same goes for the header. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@103 PS1, Line 103: // Returns 'true' if path 'a' starts with path 'b'. If 'relative' is not nullptr, it will FileSystemUtil would be a better place for this, I think. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@105 PS1, Line 105: PathStartsWith I have the feeling that this function should only decide if path 'b' is the prefix of path 'a'. Returning the remainder after 'b' is something that I think should be out of this functions scope. I think this mix of responsibilities is the reason the name of the function is not that self-explanatory. It could be e.g. IsPrefixPath() or such. What do you think? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@105 PS1, Line 105: string *relative For me the name of this variable doesn't indicate it's purpose. Can you name it remainder or rest or something similar? Also reading the function comment didn't explain me what a path relative to 'b' means. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@108 PS1, Line 108: b.length() + 1 < a.length() what if a==b. In theory then a starts with b, still this returns false. As I see in this case 'a' should be 'b'+"/" to get true. If this is intentional then please mention it in the function comment. http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@118 PS1, Line 118: // with an uppercase letter. Could you mention examples here that contain the mentioned allowed chars? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@119 PS1, Line 119: bool IsTimezoneNameSegmentValid(const string& tz_seg) { Shouldn't this be part of TimezoneDatabase or some other timezone related helper class? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@121 PS1, Line 121: find_if( Tricky :) Just for my information, have you considered using regex? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@132 PS1, Line 132: // time-zone name segments delimited by '/'. Could you mention one input example in the comment? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@133 PS1, Line 133: bool IsTimezoneNameValid(const string& tz_name) { Shouldn't this be part of TimezoneDatabase or some other timezone related helper class? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@136 PS1, Line 136: while (end != string::npos) { Wouldn't it be easier to verify this with a regex and get rid of this while loop and IsTimezoneNameSegmentValid() ? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147 PS1, Line 147: bool can you return int64_t* and return nullptr in case the offset is not valid? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@147 PS1, Line 147: bool IsTimezoneOffsetValid(const string& tz_offset, int64_t* offset_sec) { Shouldn't this be part of TimezoneDatabase or some other timezone related helper class? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@160 PS1, Line 160: // The implementation here was adapted from > Maybe this could be moved to class FileSystemUtil. I again feel here that this function serves 2 purposes instead of a clear one: decide if path is a symbolic link and get the 'real_path'. I think these should be split to multiple functions. What do you think? http://gerrit.cloudera.org:8080/#/c/9986/1/be/src/exprs/timezone_db.cc@251 PS1, Line 251: // mkdtemp operates in place, so we need a mutable array. Can you move the comment to the header? -- To view, visit http://gerrit.cloudera.org:8080/9986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77 Gerrit-Change-Number: 9986 Gerrit-PatchSet: 2 Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Fri, 20 Apr 2018 07:32:54 +0000 Gerrit-HasComments: Yes