[ 
https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7288:
-------------------------------
    Tags: hadoop streaming, WebHcat, libjars, archives, CSS  (was: hadoop 
streaming, WebHcat, libjars, archives)

> Enable support for -libjars and -archives in WebHcat for Streaming MapReduce 
> jobs
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-7288
>                 URL: https://issues.apache.org/jira/browse/HIVE-7288
>             Project: Hive
>          Issue Type: New Feature
>          Components: WebHCat
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
>         Environment: HDInsight deploying HDP 2.1;  Also HDP 2.1 on Windows 
>            Reporter: Azim Uddin
>            Assignee: shanyu zhao
>         Attachments: HIVE-7288.1.patch, hive-7288.patch
>
>
> Issue:
> ======
> Because the WebHCat REST API has no parameters equivalent to '-libjars' and
> '-archives', we cannot use external Java JARs or archive files with a
> Streaming MapReduce job when the job is submitted via WebHCat/Templeton.
> Two use cases are described here, but there are many similar scenarios:
> #1 (for -archives):
> To use R with a Hadoop distribution such as HDInsight or HDP on Windows, we
> can package the R directory in a zip file, rename it to r.jar, and put it
> into HDFS or WASB. We can then run something like this from the Hadoop
> command line (ignore the wasb syntax; the same command can be run with
> hdfs):
> hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -archives 
> wasb:///example/jars/r.jar -files 
> "wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r" -mapper 
> "./r.jar/bin/Rscript.exe mapper.r" -reducer "./r.jar/bin/Rscript.exe 
> reducer.r" -input /example/data/gutenberg -output /probe/r/wordcount
> This works from the Hadoop command line, but because WebHCat has no
> equivalent of the '-archives' parameter, we cannot submit the same
> Streaming MR job via WebHCat.
> #2 (for -libjars):
> Consider a user who wants to use a custom InputFormat with a Streaming
> MapReduce job and has written their own InputFormat JAR. From the Hadoop
> command line we can do something like this:
> hadoop jar /path/to/hadoop-streaming.jar \
>         -libjars /path/to/custom-formats.jar \
>         -D map.output.key.field.separator=, \
>         -D mapred.text.key.partitioner.options=-k1,1 \
>         -input my_data/ \
>         -output my_output/ \
>         -outputformat test.example.outputformat.DateFieldMultipleOutputFormat \
>         -mapper my_mapper.py \
>         -reducer my_reducer.py
> But because WebHCat has no equivalent of the '-libjars' parameter for
> Streaming MapReduce jobs, we cannot submit the above Streaming MR job
> (which uses a custom Java JAR) via WebHCat.
> Impact:
> ========
> We think that being able to submit jobs remotely is a vital feature for
> Hadoop to be enterprise-ready, and WebHCat plays an important role there.
> Streaming MapReduce jobs are also very important for interoperability. So
> it would be very useful to keep WebHCat on par with the Hadoop command line
> in terms of Streaming MR job submission capability.
> Ask:
> ====
> Enable parameter support for 'libjars' and 'archives' in WebHCat for Hadoop
> streaming jobs.
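
For illustration only, here is a minimal Python sketch of what a streaming
submission to WebHCat looks like for use case #1 before this feature exists.
It is not part of the attached patches. The host name, port 50111, the
user name, and passing user.name in the query string are assumptions; the
form field names (input, output, mapper, reducer, file) are the ones I
recall from the WebHCat mapreduce/streaming reference and should be checked
against your version. The sketch only builds the request and sends nothing.

```python
from urllib.parse import urlencode

# Assumed endpoint: the host is a placeholder, 50111 is the usual WebHCat port.
WEBHCAT_URL = "http://webhcat-host.example:50111/templeton/v1/mapreduce/streaming"


def build_streaming_request(user, input_path, output_path, mapper, reducer, files=()):
    """Return (url, body) for a streaming job submission; nothing is sent."""
    fields = [
        ("input", input_path),
        ("output", output_path),
        ("mapper", mapper),
        ("reducer", reducer),
    ]
    # 'file' is repeated once per file to ship with the job (like -files).
    fields += [("file", f) for f in files]
    # Note what is missing: there is no field to carry the r.jar archive
    # (-archives) or a custom-format JAR (-libjars).
    url = WEBHCAT_URL + "?" + urlencode({"user.name": user})
    return url, urlencode(fields)


url, body = build_streaming_request(
    user="alice",  # hypothetical user
    input_path="/example/data/gutenberg",
    output_path="/probe/r/wordcount",
    mapper="./r.jar/bin/Rscript.exe mapper.r",
    reducer="./r.jar/bin/Rscript.exe reducer.r",
    files=["wasb:///example/apps/mapper.r", "wasb:///example/apps/reducer.r"],
)
print(url)
print(body)
```

The mapper and reducer commands still refer to ./r.jar/bin/Rscript.exe, but
nothing in the request tells WebHCat to unpack r.jar into the task's working
directory, which is the gap this issue asks to close.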



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
