[jira] [Updated] (HIVE-7288) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs

2015-06-23 Thread Jason Howell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7288:
---
Tags: hadoop streaming, WebHcat, libjars, archives, CSS  (was: hadoop 
streaming, WebHcat, libjars, archives)

 Enable support for -libjars and -archives in WebHcat for Streaming MapReduce 
 jobs
 -

 Key: HIVE-7288
 URL: https://issues.apache.org/jira/browse/HIVE-7288
 Project: Hive
  Issue Type: New Feature
  Components: WebHCat
Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
 Environment: HDInsight deploying HDP 2.1;  Also HDP 2.1 on Windows 
Reporter: Azim Uddin
Assignee: shanyu zhao
 Attachments: HIVE-7288.1.patch, hive-7288.patch


 Issue:
 ==
 The WebHCat REST API has no parameters equivalent to '-libjars' and 
 '-archives', so a Streaming MapReduce job submitted via WebHCat/Templeton 
 cannot use external Java JARs or archive files. 
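
 For illustration, here is a minimal sketch of the form body a client might 
 send today to the WebHCat streaming endpoint (a POST to 
 /templeton/v1/mapreduce/streaming). The field names (user.name, input, 
 output, mapper, reducer, file, define) are the ones the endpoint is 
 understood to accept; the helper name and the example user are hypothetical. 
 None of these fields can carry a -libjars or -archives value.

```python
from urllib.parse import urlencode


def streaming_form_body(user, input_path, output_path, mapper, reducer,
                        files=(), defines=None):
    """Build a URL-encoded form body for a WebHCat streaming job.

    Hypothetical helper: it only assembles the request body and does not
    contact a server. Field names are assumed from the WebHCat reference.
    """
    fields = [
        ("user.name", user),
        ("input", input_path),
        ("output", output_path),
        ("mapper", mapper),
        ("reducer", reducer),
    ]
    # "file" may be repeated, one entry per file shipped with the job.
    fields += [("file", f) for f in files]
    # "define" carries Hadoop configuration as name=value pairs.
    fields += [("define", f"{k}={v}") for k, v in (defines or {}).items()]
    return urlencode(fields)


body = streaming_form_body(
    "hdp",  # hypothetical user name
    "/example/data/gutenberg",
    "/probe/r/wordcount",
    "mapper.r",
    "reducer.r",
    files=["wasb:///example/apps/mapper.r", "wasb:///example/apps/reducer.r"],
)
```

 Scripts can be shipped with "file", but there is no counterpart for an 
 archive such as r.jar or for a JAR that must be on the job classpath.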
 A few use cases follow; there are many more scenarios like these.
 #1 (for -archives):
 To use R with a Hadoop distribution such as HDInsight or HDP on Windows, we 
 can package the R directory as a zip file, rename it to r.jar, and put it 
 into HDFS or WASB. We can then run something like this from the hadoop 
 command line (ignore the wasb syntax; the same command can be run with 
 hdfs) - 
 hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -archives 
 wasb:///example/jars/r.jar -files 
 wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r -mapper 
 ./r.jar/bin/Rscript.exe mapper.r -reducer ./r.jar/bin/Rscript.exe 
 reducer.r -input /example/data/gutenberg -output /probe/r/wordcount
 This works from the hadoop command line, but because WebHCat does not 
 support the '-archives' parameter, we can't submit the same Streaming MR job 
 via WebHCat.
 #2 (for -libjars):
 Consider a user who wants to use a custom InputFormat or OutputFormat with a 
 Streaming MapReduce job and has packaged it in their own JAR. From the 
 hadoop command line we can do something like this - 
 hadoop jar /path/to/hadoop-streaming.jar \
 -libjars /path/to/custom-formats.jar \
 -D map.output.key.field.separator=, \
 -D mapred.text.key.partitioner.options=-k1,1 \
 -input my_data/ \
 -output my_output/ \
 -outputformat test.example.outputformat.DateFieldMultipleOutputFormat \
 -mapper my_mapper.py \
 -reducer my_reducer.py
 But because WebHCat does not support the '-libjars' parameter for streaming 
 MapReduce jobs, we can't submit the above streaming MR job (which uses a 
 custom Java JAR) via WebHCat.
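
 Both commands above rely on Hadoop generic options (-libjars, -archives, 
 -files, -D), which have to appear before the streaming-specific options. 
 The sketch below shows one way a submission layer could assemble such an 
 argument list. The function name and the option-pair representation are 
 hypothetical and are not taken from the attached patches.

```python
# Hadoop generic options; they must precede streaming-specific options.
GENERIC_FLAGS = {"-conf", "-D", "-fs", "-jt", "-files", "-libjars", "-archives"}


def build_streaming_args(streaming_jar, options):
    """Return an argv list for `hadoop jar <streaming_jar> ...`.

    `options` is a list of (flag, value) pairs in any order. Generic
    options are moved to the front; relative order is otherwise kept.
    """
    generic = [(f, v) for f, v in options if f in GENERIC_FLAGS]
    specific = [(f, v) for f, v in options if f not in GENERIC_FLAGS]
    args = ["hadoop", "jar", streaming_jar]
    for flag, value in generic + specific:
        args += [flag, value]
    return args


args = build_streaming_args(
    "/path/to/hadoop-streaming.jar",
    [
        ("-input", "my_data/"),
        ("-output", "my_output/"),
        ("-libjars", "/path/to/custom-formats.jar"),
        ("-mapper", "my_mapper.py"),
        ("-reducer", "my_reducer.py"),
    ],
)
```

 Passing the result as a list (for example to subprocess.run) avoids shell 
 quoting problems with values such as "-k1,1".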
 Impact:
 
 We think that being able to submit jobs remotely is vital for Hadoop to be 
 enterprise-ready, and WebHCat plays an important role there. Streaming 
 MapReduce jobs are also very important for interoperability, so it would be 
 very useful to keep WebHCat on par with the hadoop command line in terms of 
 streaming MR job submission capability.
 Ask:
 
 Enable parameter support for 'libjars' and 'archives' for Hadoop streaming 
 jobs in WebHCat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7288) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs

2015-06-23 Thread Jason Howell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7288:
---
Tags: hadoop streaming, WebHcat, libjars, archives, MicrosoftCSS  (was: 
hadoop streaming, WebHcat, libjars, archives, CSS)



[jira] [Updated] (HIVE-7288) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs

2015-06-23 Thread Jason Howell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Howell updated HIVE-7288:
---
Tags: hadoop streaming, WebHcat, libjars, archives, MicrosoftSupport  (was: 
hadoop streaming, WebHcat, libjars, archives, MicrosoftCSS)
