[
https://issues.apache.org/jira/browse/HIVE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966752#comment-13966752
]
Shuaishuai Nie commented on HIVE-5072:
--------------------------------------
Thanks [~ekoifman] for the comments. Please see below for the answers:
0. If I understand this correctly, optionsFile should contain the details of
the Sqoop command to execute. But in the code it seems that the expectation is
that this file is present in DFS. Thus to submit a Sqoop job via WebHCat (and
use optionsFile) the user has to first upload this file to the cluster. This is
an extra call for job submission, and possibly extra configuration on the
cluster side to enable the WebHCat client to upload files. Why not just let the
client upload the file to WebHCat as part of the REST POST request? This seems
a lot more user friendly/usable.
The user scenario for the options file is that a user may want to reuse part of
the Sqoop command arguments across different commands, such as the connection
string, username, or password. In that case, the user expects the file to
already exist on DFS so it can be reused across different jobs. Since Sqoop
only supports options files on the local file system, and Templeton may launch
the Sqoop job on any worker node, Templeton needs to add the options file to
the distributed cache so that it is available to the Sqoop command. You
mentioned "Why not just let the client upload the file to WebHCat as part of
the REST POST request"; where would the file be located originally? If it comes
from the local file system, that would require an extra copy and an extra
command for each Templeton Sqoop job.
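As a concrete sketch of the flow described above (the host, DFS path, and the exact REST endpoint and parameter names are illustrative assumptions based on this patch, not a published API), the request is only printed here rather than sent:

```shell
# One-time setup: put the shared options file on DFS so it can be reused
# across jobs and shipped to whichever worker node runs the job via the
# distributed cache:
#   hadoop fs -put sqoop-options.txt /user/hive/sqoop-options.txt
OPTIONS_FILE=/user/hive/sqoop-options.txt
WEBHCAT=http://localhost:50111
# Print the REST call each job submission would then make (not executed here):
echo curl -s -d optionsFile=$OPTIONS_FILE \
  "$WEBHCAT/templeton/v1/sqoop?user.name=hive"
```

Once the file is on DFS, every subsequent job submission reuses it with no extra copy step on the client side.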
1. -d 'user.name=foo' is deprecated (i.e. user.name as a Form parameter).
user.name has to be part of the query string. The test cases and examples in
the .pdf should be updated.
2. Formatting in SqoopDelegator doesn't follow Hive conventions
3. Server.sqoop() - there is Server.checkEnableLogPrerequisite() to check the
'enableLog' parameter setting.
4. I see that new parameters for Sqoop tests are added in 3 places in
build.xml. Only the 'test' target actually runs jobsubmission.conf.
I will change the patch and documentation accordingly for points 1-4.
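On point 1, the corrected usage keeps user.name on the query string rather than in the POST body; a minimal illustration (the host and user name are placeholders, and the URL is only printed here):

```shell
# Deprecated form: user.name passed as a form parameter in the POST body, e.g.
#   curl -d user.name=ekoifman -d command=... http://host:50111/templeton/v1/sqoop
# Correct form: user.name appended to the URL as a query parameter:
WEBHCAT=http://localhost:50111
URL="$WEBHCAT/templeton/v1/sqoop?user.name=ekoifman"
echo "$URL"
```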
5. For the tests you added, where does the JDBC driver come from for any
particular DB?
The JDBC driver should come from the Sqoop installation, depending on which
database is used. It should be located in the %SQOOP_HOME%\lib folder.
6. Can the Form parameter for optionsFile (Server.sqoop()) be called
"optionsFile" instead of just "file"?
The "file" argument does not work exactly the same as "--options-file" in
Sqoop, since "--options-file" can make up only part of the command while "file"
here must be the entire command. But I think changing the name to "optionsFile"
may be more explanatory for users.
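To illustrate the distinction (the file contents, table name, and database URL below are made up): Sqoop's own options files hold one option per line and may substitute for just a fragment of the command line, whereas the WebHCat parameter discussed here must carry the whole command.

```shell
# A partial options file of the kind Sqoop's --options-file accepts
# (one option or value per line, per the Sqoop user guide):
cat > connect.txt <<'EOF'
--connect
jdbc:mysql://db.example.com/corp
--username
hive
EOF
# Sqoop CLI, partial substitution is allowed:
#   sqoop import --options-file connect.txt --table EMPLOYEES
# WebHCat parameter discussed here: the referenced file must instead contain
# the entire command, e.g. "import --connect ... --table EMPLOYEES".
cat connect.txt
```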
7. it seems from
http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_using_options_files_to_pass_arguments
that in a Sqoop command, either an options file (with command and args) or the
command name and all args inline can be specified. The tests you added seem to
expect only command args to be in the options file. In particular,
Server.sqoop() tests "command == null && optionsFile == null" but not whether
both options are specified. That does not seem like expected usage.
As I mentioned earlier, the optionsFile here in Server.sqoop() does not work
exactly the same as "--options-file" in Sqoop. The use of "--options-file" from
Sqoop is tested in the second e2e test for Sqoop; in that test, the
"--options-file" substitutes for part of the Sqoop command.
The Templeton Sqoop action should not allow both "command" and "optionsFile" to
be defined, since "optionsFile" here is supposed to contain the entire Sqoop
command. I will add a condition check for this scenario.
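The planned condition check could look something like the sketch below (the parameter names come from the discussion; the function name and error messages are illustrative, not from the patch):

```shell
# Exactly one of "command" and "optionsFile" must be supplied.
validate_sqoop_params() {
  cmd=$1; opts=$2
  if [ -n "$cmd" ] && [ -n "$opts" ]; then
    echo "error: specify either command or optionsFile, not both"
    return 1
  fi
  if [ -z "$cmd" ] && [ -z "$opts" ]; then
    echo "error: one of command or optionsFile is required"
    return 1
  fi
  echo "ok"
}
validate_sqoop_params "export --connect jdbc:mysql://db/corp" ""  # prints "ok"
```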
8. Is there anything that can be done to make the test self-contained, so that
the DB table is automatically created, for example in the DB that contains the
metastore data?
There is no efficient way to make the test self-contained, given that any
database may be used for the test, and even the type of database backing the
metastore can differ.
> [WebHCat]Enable directly invoke Sqoop job through Templeton
> -----------------------------------------------------------
>
> Key: HIVE-5072
> URL: https://issues.apache.org/jira/browse/HIVE-5072
> Project: Hive
> Issue Type: Improvement
> Components: WebHCat
> Affects Versions: 0.12.0
> Reporter: Shuaishuai Nie
> Assignee: Shuaishuai Nie
> Attachments: HIVE-5072.1.patch, HIVE-5072.2.patch, HIVE-5072.3.patch,
> Templeton-Sqoop-Action.pdf
>
>
> Now it is hard to invoke a Sqoop job through Templeton. The only way is to
> use the classpath jar generated by a Sqoop job and use the jar delegator in
> Templeton. We should implement a Sqoop delegator to enable directly invoking
> Sqoop jobs through Templeton.
--
This message was sent by Atlassian JIRA
(v6.2#6252)