Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9986 )

Change subject: IMPALA-3307: Add support for IANA time-zone db
......................................................................


Patch Set 10:

(5 comments)

> > > > > > Uploaded patch set 9.
 > > > > >
 > > > > > Patch -set 9 contains the following changes:
 > > > > > - Added a full timezone db to testdata/tzdb.
 > > > > > - End-to-end tests and BE-tests were changed to use this
 > > > timezone
 > > > > > db. This was necessary because some timezone-tests were
 > > failing
 > > > > on
 > > > > > older jenkins workers that had an older tzdata package
 > > > installed.
 > > > >
 > > > > It might be a good idea to store the timezone-db files in one
 > > > .tar
 > > > > file and extract them before running the tests. What do you
 > > > think?
 > > >
 > > > I agree, .taring or compressing the tz db would be much better,
 > > if
 > > > it does not make the code too complicated. Having less file
 > would
 > > > make the review more readable,

>From a review ability perspective, there's absolutely no need for this commit 
>to have 221 timezone files. You can test it with ~3, or do a separate commit. 
>i.e., it's neither here nor there.

 > >
 > > Alternatively we can store timezone files in a JAR archive
 > instead.
 > > The BE can call into the java FE to extract files from it.
 >
 > Tim, Dan, what do you think?

The Yarn equivalent here has this notion of a "distributed cache" which is to 
say it stores the files locally and re-uses them across jobs. I can't tell if 
we should be worried that all impalads, at boot time, will slam HDFS with 
reading the timezonedb. I think reading ~200 files per impalad times 200 
impalad daemons may be a lot of HDFS metadata load, but maybe it's comparable 
to what we do for queries anyway. I certainly think that tar or jar is a better 
way to go. Since this is happening at boot, we can probably still fork to tar, 
which we can assume is available, or use Java, which has tar libraries and 
native support for zip.

http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG@35
PS10, Line 35:   specify an HDFS/S3/ADLS location that contains the shared 
compiled
In what format? Does it add to the host one or override it?

My /usr/share/zoneinfo is full of symlinks which HDFS doesn't support in some 
configurations (and S3 certainly doesn't).


http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG@41
PS10, Line 41: - The name of the coordinator node’s local time-zone is saved to 
the
Is it easy to tell what the local time zone is of an impalad node? (E.g., do we 
log it?)


http://gerrit.cloudera.org:8080/#/c/9986/10//COMMIT_MSG@45
PS10, Line 45: - Introduces a new startup flag (--hdfs_zone_abbrev_conf) to 
impalad
What's the distinction between this and --hdfs_zone_info_dir? Do we need both?


http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan
File testdata/tzdb/2017c/Africa/Abidjan:

http://gerrit.cloudera.org:8080/#/c/9986/10/testdata/tzdb/2017c/Africa/Abidjan@1
PS10, Line 1: ../Atlantic/St_Helena
We're adding a ton of files. Do we need such a big database for our testing 
purposes?


http://gerrit.cloudera.org:8080/#/c/9986/10/tests/custom_cluster/test_shared_tzdb.py
File tests/custom_cluster/test_shared_tzdb.py:

http://gerrit.cloudera.org:8080/#/c/9986/10/tests/custom_cluster/test_shared_tzdb.py@38
PS10, Line 38:     cls.ImpalaTestMatrix.add_constraint(lambda v:
Add a comment about what this is trying to do?



--
To view, visit http://gerrit.cloudera.org:8080/9986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Gerrit-Change-Number: 9986
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 30 May 2018 15:59:54 +0000
Gerrit-HasComments: Yes

Reply via email to