David Knupp has posted comments on this change.

Change subject: Enabling end-to-end tests on a remote cluster
......................................................................


Patch Set 1:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4769/1/bin/remote_data_load.py
File bin/remote_data_load.py:

PS1, Line 142: fe
> Does this mean we're still using the 3rd party client libraries with these 
Nope, this is the path to where we keep our config files. We're just literally 
overwriting some of these files on the client with the same files downloaded 
from the cluster.

  $ ls fe/src/test/resources/*.xml -l
  lrwxrwxrwx 1 dknupp dknupp    78 Oct 25 18:22 
fe/src/test/resources/core-site.xml -> 
/home/dknupp/Impala/testdata/cluster/cdh5/node-1/etc/hadoop/conf/core-site.xml
  -rw-rw-r-- 1 dknupp dknupp  1985 Oct 25 18:22 
fe/src/test/resources/hbase-site.xml
  lrwxrwxrwx 1 dknupp dknupp    78 Oct 25 18:22 
fe/src/test/resources/hdfs-site.xml -> 
/home/dknupp/Impala/testdata/cluster/cdh5/node-1/etc/hadoop/conf/hdfs-site.xml
  -rw-rw-r-- 1 dknupp dknupp 67730 Oct 18 18:18 
fe/src/test/resources/hive-default.xml
  -rw-rw-r-- 1 dknupp dknupp  4728 Oct 25 18:22 
fe/src/test/resources/hive-site.xml
  -rw-rw-r-- 1 dknupp dknupp  1976 Oct 25 18:22 
fe/src/test/resources/sentry-site.xml


PS1, Line 149: service
> I believe the Cluster object in comparisons/cluster.py has helper methods f
Going to leave this for a later investigation.


PS1, Line 160: settings required for data loading
> It would be good to document here exactly what is returned, and an explanat
Done


PS1, Line 224: environment
> Is there a reason to update the current environment rather than create an e
My presumption is that we set environment variables here because "that's how 
it's done" under our current model.

That said, I don't think the current environment really gets updated, right? 
Python gets forked as a child process for the shell, and the environment gets 
set for the life span of the script. I agree that it seems a bit hacky, but it 
shouldn't have a persistent effect on one's environment.


PS1, Line 266: load
> Might be good to time this at least overall.  Even if we just log the total
I added a decorator that we can use on various functions. It might be handy 
when/if this script gets refactors to time various parts or stages of it.

For right now, it just logs the time as you requested, but we can change the 
decorator to do something more intelligent at any time, e.g., record time in a 
DB for eventual trending, etc.


PS1, Line 278: INFO A
> What does this mean?
You know, I'm not sure. I think Martin may have just been marking when certain 
phases completed, or testing the logger setup. I'll remove it.


PS1, Line 281: logger
> Two blank lines before this line, probably remove at least one.
Done


PS1, Line 296: INFO B
> This must relate to INFO A above, but what does it mean?
Removed.


PS1, Line 297: chmod
> Are we re-setting these permissions at the end, or do we know that tests do
I'm not sure, but as elsewhere, I've filed a JIRA to investigate at a later 
time.


PS1, Line 315: Re-load
> Does this mean it was already loaded and now it's being loaded again?  Why?
I'm not sure, but I can't actually get this far into the script now, owing to 
the breakages introduced by the latest Kudu changes. I'll have to make a note 
to look into this once we fix IMPALA-4365.


PS1, Line 335: test
> This seems to not belong in this class; it doesn't do any data load.
This may be here due to the fact that, running as part of the forked child 
python process, it can make use of the environment changes from before. I'm 
going to leave this in place for now, with the idea that we can refactor it out 
at a later time. JIRA has been filed.


PS1, Line 365: main
> If we have a parse_options() method a run(parsed_options) method, then you 
I'm having a bit of trouble parsing this sentence. Can you clarify?


PS1, Line 393: test
> This seems to belong elsewhere.  Why does it go here?
See the reply from above.


http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/compute-table-stats.sh
File testdata/bin/compute-table-stats.sh:

PS1, Line 27: IMPALAD
> Can you reference the Jira in a comment?
Yup, a comment was added. I think you may have been looking at an older patch.


http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

PS1, Line 38: HS2_HOST_PORT
> Is it reasonable to add a comment referencing the Jira here?
Possible you were looking at an older patch. A comment has been added to the 
code.


http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/setup-hdfs-env.sh
File testdata/bin/setup-hdfs-env.sh:

PS1, Line 53: CACHEADMIN_ARGS
> If the is_kerberized block is executed above, then the CACHADMIN_ARGS would
I feel like some of these comments might be outside of the scope of this 
review, esp. with regard to factoring out the existing is_kerberized block. 
Since I'm not an expert on either HDFS or Kerberos, I'm going to throw a check 
around each block in question with regard to REMOTE_LOAD, and file a new JIRA 
to look into the issue. See IMPALA-4378.


-- 
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: Harrison Sheinblatt <h...@hotmail.com>
Gerrit-Reviewer: Martin Grund <grundprin...@gmail.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to