David Knupp has posted comments on this change.

Change subject: Enabling end-to-end tests on a remote cluster
......................................................................

Patch Set 1:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4769/1/bin/remote_data_load.py
File bin/remote_data_load.py:

PS1, Line 142: fe
> Does this mean we're still using the 3rd party client libraries with these
Nope, this is the path where we keep our config files. We're literally just overwriting some of these files on the client with the same files downloaded from the cluster.

  $ ls fe/src/test/resources/*.xml -l
  lrwxrwxrwx 1 dknupp dknupp    78 Oct 25 18:22 fe/src/test/resources/core-site.xml -> /home/dknupp/Impala/testdata/cluster/cdh5/node-1/etc/hadoop/conf/core-site.xml
  -rw-rw-r-- 1 dknupp dknupp  1985 Oct 25 18:22 fe/src/test/resources/hbase-site.xml
  lrwxrwxrwx 1 dknupp dknupp    78 Oct 25 18:22 fe/src/test/resources/hdfs-site.xml -> /home/dknupp/Impala/testdata/cluster/cdh5/node-1/etc/hadoop/conf/hdfs-site.xml
  -rw-rw-r-- 1 dknupp dknupp 67730 Oct 18 18:18 fe/src/test/resources/hive-default.xml
  -rw-rw-r-- 1 dknupp dknupp  4728 Oct 25 18:22 fe/src/test/resources/hive-site.xml
  -rw-rw-r-- 1 dknupp dknupp  1976 Oct 25 18:22 fe/src/test/resources/sentry-site.xml


PS1, Line 149: service
> I believe the Cluster object in comparisons/cluster.py has helper methods f
Going to leave this for a later investigation.


PS1, Line 160: settings required for data loading
> It would be good to document here exactly what is returned, and an explanat
Done


PS1, Line 224: environment
> Is there a reason to update the current environment rather than create an e
My presumption is that we set environment variables here because "that's how it's done" under our current model. That said, I don't think the current environment really gets updated, right? The Python interpreter runs as a child process of the shell, so the environment is only set for the life span of the script. I agree that it seems a bit hacky, but it shouldn't have a persistent effect on one's environment.


PS1, Line 266: load
> Might be good to time this at least overall. Even if we just log the total
I added a decorator that we can use on various functions. It might be handy when/if this script gets refactored, to time various parts or stages of it. For right now, it just logs the time as you requested, but we can change the decorator to do something more intelligent at any time, e.g., record times in a DB for eventual trending.


PS1, Line 278: INFO A
> What does this mean?
You know, I'm not sure. I think Martin may have just been marking when certain phases completed, or testing the logger setup. I'll remove it.


PS1, Line 281: logger
> Two blank lines before this line, probably remove at least one.
Done


PS1, Line 296: INFO B
> This must relate to INFO A above, but what does it mean?
Removed.


PS1, Line 297: chmod
> Are we re-setting these permissions at the end, or do we know that tests do
I'm not sure, but as elsewhere, I've filed a JIRA to investigate at a later time.


PS1, Line 315: Re-load
> Does this mean it was already loaded and now it's being loaded again? Why?
I'm not sure, but I can't actually get this far into the script now, owing to the breakages introduced by the latest Kudu changes. I'll make a note to look into this once we fix IMPALA-4365.


PS1, Line 335: test
> This seems to not belong in this class; it doesn't do any data load.
This may be here because, running as part of the forked child Python process, it can make use of the environment changes from before. I'm going to leave this in place for now, with the idea that we can refactor it out at a later time. A JIRA has been filed.


PS1, Line 365: main
> If we have a parse_options() method a run(parsed_options) method, then you
I'm having a bit of trouble parsing this sentence. Can you clarify?


PS1, Line 393: test
> This seems to belong elsewhere. Why does it go here?
See the reply above.

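To illustrate the reply on Line 224: a minimal sketch of how environment changes made inside a Python script propagate. This is not code from the patch; the variable name REMOTE_LOAD_DEMO and the function child_sees_variable are hypothetical, chosen only for this example. A value written to os.environ is inherited by processes the script spawns, but the shell that launched the script never sees it.

```python
import os
import subprocess
import sys

# Hypothetical variable name, used only for this illustration.
VAR = "REMOTE_LOAD_DEMO"


def child_sees_variable():
    """Set VAR in this process, then ask a child process to report it."""
    os.environ[VAR] = "1"
    # The child interpreter inherits a copy of this process's environment.
    out = subprocess.check_output(
        [sys.executable, "-c",
         "import os; print(os.environ.get('%s'))" % VAR])
    return out.decode().strip()


if __name__ == "__main__":
    print(child_sees_variable())
```

After the script exits, running `echo $REMOTE_LOAD_DEMO` in the launching shell prints an empty line, since a child process cannot modify its parent's environment.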
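To illustrate the reply on Line 266: a timing decorator of the kind described might look roughly like the sketch below. This is an assumption about its general shape, not the decorator from the patch; the names timed, LOG, and load_data are hypothetical.

```python
import functools
import logging
import time

# Hypothetical logger name; the real script configures its own logger.
LOG = logging.getLogger("remote_data_load_demo")


def timed(func):
    """Log the wall-clock time taken by each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether func returns or raises, so failures are timed too.
            elapsed = time.time() - start
            LOG.info("%s took %.2f seconds", func.__name__, elapsed)
    return wrapper


@timed
def load_data():
    # Hypothetical stand-in for the real data load step.
    time.sleep(0.1)
    return "loaded"
```

Because the logging call is isolated in the wrapper, it could later be swapped for something that records the elapsed time elsewhere without touching the decorated functions.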
http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/compute-table-stats.sh
File testdata/bin/compute-table-stats.sh:

PS1, Line 27: IMPALAD
> Can you reference the Jira in a comment?
Yup, a comment was added. I think you may have been looking at an older patch.


http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

PS1, Line 38: HS2_HOST_PORT
> Is it reasonable to add a comment referencing the Jira here?
It's possible you were looking at an older patch. A comment has been added to the code.


http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/setup-hdfs-env.sh
File testdata/bin/setup-hdfs-env.sh:

PS1, Line 53: CACHEADMIN_ARGS
> If the is_kerberized block is executed above, then the CACHEADMIN_ARGS would
I feel like some of these comments might be outside the scope of this review, especially with regard to factoring out the existing is_kerberized block. Since I'm not an expert on either HDFS or Kerberos, I'm going to wrap each block in question in a check on REMOTE_LOAD, and file a new JIRA to look into the issue. See IMPALA-4378.


--
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: Harrison Sheinblatt <h...@hotmail.com>
Gerrit-Reviewer: Martin Grund <grundprin...@gmail.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes