[jira] [Commented] (PIG-3430) Add xml format for explaining MapReduce Plan.
[ https://issues.apache.org/jira/browse/PIG-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753350#comment-13753350 ] Daniel Dai commented on PIG-3430: - I can get the xml MapReduce plan with the patch. Two questions: 1. Is there any reason we only do it for the MapReduce plan? 2. Why do we need to mark tmpLoader? Is it to support the xml MapReduce plan, or is it a separate thing? Add xml format for explaining MapReduce Plan. - Key: PIG-3430 URL: https://issues.apache.org/jira/browse/PIG-3430 Project: Pig Issue Type: New Feature Reporter: Jeremy Karn Attachments: PIG-3430.patch At Mortar we needed an easy way to store/parse a script's MapReduce plan, so we added an xml output format for the MapReduce plan to make this easier. We also added a flag to keep track of whether each store or load came from the original script (and is associated with an alias) or is a temporary store/load generated by Pig. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2606) union/ join operations are not accepting same alias as multiple inputs
[ https://issues.apache.org/jira/browse/PIG-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated PIG-2606: --- Attachment: PIG-2606.2.patch.txt Adding unit tests. union/ join operations are not accepting same alias as multiple inputs -- Key: PIG-2606 URL: https://issues.apache.org/jira/browse/PIG-2606 Project: Pig Issue Type: Bug Affects Versions: 0.9.2, 0.10.0 Reporter: Thejas M Nair Assignee: Hari Sankar Sivarama Subramaniyan Attachments: PIG-2606.2.patch.txt, PIG-2606.patch.txt grunt> l = load 'x'; grunt> u = union l, l; 2012-03-16 18:48:45,687 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Union with Count(Operand) 2 grunt> a = load 'a0.txt' as (a0, a1); grunt> b = join a by a0, a by a1; 2013-08-27 13:36:21,807 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2225: Projection with nothing to reference!
[jira] [Commented] (PIG-3433) The import sdsu cannot be resolved
[ https://issues.apache.org/jira/browse/PIG-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753523#comment-13753523 ] Ido Hadanny commented on PIG-3433: -- I didn't know that. As you said, I just did a top-level ant clean eclipse-files + ant compile gen. The import sdsu cannot be resolved -- Key: PIG-3433 URL: https://issues.apache.org/jira/browse/PIG-3433 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.11.1 Environment: Eclipse indigo Reporter: Ido Hadanny executed: ➜ trunk svn update At revision 1516115. ant clean eclipse-files ant compile gen getting: https://issues.apache.org/jira/browse/PIG-3399 AND after manually removing the wrong javacc-4.2 dependency, getting: The import sdsu cannot be resolved in DataGenerator.java
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753843#comment-13753843 ] Bikas Saha commented on PIG-3419: - Folks, FYI, based on recent feedback we have changed the names used in some of the Tez APIs. It's a simple refactoring on the Tez side and should be a simple refactoring fix on the Pig side too. Jira for reference: TEZ-410. Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major, as Pig was already relatively abstracted between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried not to change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes were to reinstate an ExecutionEngine interface to tie together the frontend and backend, make Pig delegate to the EE when necessary, and create an MRExecutionEngine that implements this interface.
Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce chooses the framework to use between local and distributed mode). I also tried to make ScriptState, JobStats, and PigStats as abstract as possible in their current state. I think in the future some work will need to be done here to re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
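The ExecType selection described above can be sketched in plain Java. This is an illustration only: real Pig discovers the engines with java.util.ServiceLoader (via META-INF/services provider files), while here a plain List stands in for that registry, and the interface and engine names are made up for the example, not Pig's actual API.

```java
import java.util.Arrays;
import java.util.List;

public class EngineSelector {
    // Stand-in for Pig's ExecutionEngine abstraction (illustrative, not the real interface).
    interface ExecutionEngine {
        boolean accepts(String execType);
        String name();
    }

    // Cycle through the available engines and pick the first one that
    // recognizes the requested exec type, as the ExecType lookup does.
    static ExecutionEngine pick(List<ExecutionEngine> registry, String execType) {
        for (ExecutionEngine e : registry) {
            if (e.accepts(execType)) {
                return e; // first engine claiming the ExecType wins
            }
        }
        throw new IllegalArgumentException("Unknown exec type: " + execType);
    }

    // Small helper to build an engine that accepts a fixed set of exec types.
    static ExecutionEngine engine(String name, String... accepted) {
        final List<String> ok = Arrays.asList(accepted);
        return new ExecutionEngine() {
            public boolean accepts(String t) { return ok.contains(t); }
            public String name() { return name; }
        };
    }

    public static void main(String[] args) {
        List<ExecutionEngine> registry = Arrays.asList(
                engine("mr", "mapreduce", "local"),
                engine("tez", "tez", "tez_local"));
        System.out.println(pick(registry, "tez").name());
    }
}
```

With ServiceLoader the registry is populated from the classpath instead of being built by hand, which is what lets a new backend plug in without changing Pig's core.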
[jira] [Commented] (PIG-3441) Allow Pig to use default resources from Configuration objects
[ https://issues.apache.org/jira/browse/PIG-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753860#comment-13753860 ] Bhooshan Mogal commented on PIG-3441: - Could you please tell me where the code snippet you mentioned is from? A lot of code paths in Pig seem to re-create Configuration objects by loading from Properties in the ConfigurationUtil.toConfiguration() method. In this method, I saw that default resources are ignored, as in: {code} public static Configuration toConfiguration(Properties properties) { assert properties != null; final Configuration config = new Configuration(false); final Enumeration<Object> iter = properties.keys(); ... {code} Due to this, Pig was unable to read from custom resources added statically. The patch addresses this by allowing users to create the Configuration object in this method with loadDefaults set to true, based on a pig property. Allow Pig to use default resources from Configuration objects - Key: PIG-3441 URL: https://issues.apache.org/jira/browse/PIG-3441 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.11.1 Reporter: Bhooshan Mogal Attachments: PIG-3441_1.patch, PIG-3441.patch Pig currently ignores parameters from configuration files added statically to Configuration objects via Configuration.addDefaultResource("filename.xml"). Consider the following scenario: in a Hadoop FileSystem driver for a non-HDFS filesystem, you load properties specific to that FileSystem in a static initializer block of the class that extends org.apache.hadoop.fs.FileSystem, like below: {code} class MyFileSystem extends FileSystem { static { Configuration.addDefaultResource("myfs-default.xml"); Configuration.addDefaultResource("myfs-site.xml"); } } {code} Interfaces like the Hadoop CLI, Hive, and Hadoop M/R can find configuration parameters defined in these configuration files as long as they are on the classpath.
However, Pig cannot find parameters from these files, because it ignores configuration files added statically. Pig should allow users to specify whether they would like Pig to read parameters from resources loaded statically.
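The effect described above can be illustrated with a minimal sketch, assuming plain Maps stand in for Hadoop's Configuration (the real class loads registered default resources when constructed with loadDefaults=true, and `new Configuration(false)` skips that step). The property names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ToConfigurationSketch {
    // Pretend these key/values came from Configuration.addDefaultResource(...)
    // calls in a FileSystem driver's static initializer (illustrative values).
    static final Map<String, String> STATIC_DEFAULTS = new HashMap<>();
    static {
        STATIC_DEFAULTS.put("myfs.endpoint", "myfs://cluster");
    }

    // Sketch of toConfiguration: only when loadDefaults is true do the
    // statically registered resources make it into the resulting config.
    static Map<String, String> toConfiguration(Properties props, boolean loadDefaults) {
        Map<String, String> conf = new HashMap<>();
        if (loadDefaults) {
            conf.putAll(STATIC_DEFAULTS); // what the patch makes optional via a pig property
        }
        for (String name : props.stringPropertyNames()) {
            conf.put(name, props.getProperty(name)); // explicit properties always win
        }
        return conf;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("pig.job.name", "demo");
        System.out.println(toConfiguration(p, false).containsKey("myfs.endpoint")); // defaults dropped
        System.out.println(toConfiguration(p, true).containsKey("myfs.endpoint"));  // defaults kept
    }
}
```

This is why the two code paths disagree: any path that builds the Configuration with loadDefaults=false silently discards everything added through addDefaultResource.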
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753869#comment-13753869 ] Bikas Saha commented on PIG-3419: - Looks like this jira wasn't the appropriate one to comment on. Is there a different umbrella jira for Pig on Tez that I can track and post comments on? Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753878#comment-13753878 ] Achal Soni commented on PIG-3419: - [~bikassaha] Thanks for the heads-up, Bikas! This JIRA is not concerned with the Tez integration for Pig; it is simply the abstraction in Pig to allow for alternate ExecutionEngines. But I will certainly change this on the Tez integration side of things. Thanks a lot [~cheolsoo] for continuing this! I think everything looks good from my end. I can certainly see why we may want to keep this on a different branch until everything is finalized. Certain things may still need more work. For example, OutputStats is not completely abstracted out, as it still has references to POStore, which is an MR implementation construct. ScriptState/PPNL/JobStats may still need more abstraction (especially PPNL) and reworking to incorporate a new ExecutionEngine abstraction. I think what we have done here is the minimum foundation for an abstraction, though, and it would be appropriate to put into trunk, but these are not my decisions to make. With regard to the public methods that were changed, I don't think most of them are a big deal, besides, as Cheolsoo said, PigServer throwing PigException. I never thought IOException was a good exception to throw, but reverting PigServer back to IOException, as it is user-facing code, is not a big deal. The rest of the method signature changes shouldn't be worrisome because most of them are internal to the project. However, the change from JobStats to MRJobStats, while necessary (as each ExecutionEngine would have its own type of JobStats to present to the end user), could be problematic because it is user-facing code and would probably break people who were previously using JobStats. That, I think, is the most important thing to keep in mind.
The task of making the PPNL and JobStats clearly tied to the ExecutionEngine should be thought through as well. Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
[jira] [Commented] (PIG-3441) Allow Pig to use default resources from Configuration objects
[ https://issues.apache.org/jira/browse/PIG-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753880#comment-13753880 ] Daniel Dai commented on PIG-3441: - This is from the test case of PIG-3135. If the custom configuration gets lost along the way, I wonder why PIG-3135 works. It seems they should share the same issue. Allow Pig to use default resources from Configuration objects - Key: PIG-3441 URL: https://issues.apache.org/jira/browse/PIG-3441 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.11.1 Reporter: Bhooshan Mogal Attachments: PIG-3441_1.patch, PIG-3441.patch
[jira] [Commented] (PIG-3441) Allow Pig to use default resources from Configuration objects
[ https://issues.apache.org/jira/browse/PIG-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753899#comment-13753899 ] Bhooshan Mogal commented on PIG-3441: - The test cases of [PIG-3135|https://issues.apache.org/jira/browse/PIG-3135] pass a false to the Configuration constructor. Also, [PIG-3135|https://issues.apache.org/jira/browse/PIG-3135] calls ConfigurationUtil.toConfiguration as well, which creates the Configuration object with loadDefaults set to false, as in my previous comment. I tried using [PIG-3135|https://issues.apache.org/jira/browse/PIG-3135] and setting pig.use.overriden.hadoop.configs to true; however, Pig did not read from the custom configuration files added statically. When I changed it to set loadDefaults to true, it worked fine. Allow Pig to use default resources from Configuration objects - Key: PIG-3441 URL: https://issues.apache.org/jira/browse/PIG-3441 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.11.1 Reporter: Bhooshan Mogal Attachments: PIG-3441_1.patch, PIG-3441.patch
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753912#comment-13753912 ] Julien Le Dem commented on PIG-3419: [~cheolsoo]: thanks a lot for looking into this. Here are my thoughts: 1. Let's change it back. 2, 4, 5, 6, and 7 are either internal to Pig or necessary to add the execution engine abstraction. 3. JobStats still exists, but the MR-specific part is split into MRJobStats, which extends JobStats. The same goes for PigStatsUtil and ScriptState: those classes are not disappearing, but the MR-specific part is abstracted out. HExecutionEngine could be renamed back to what it was, but this again is what is becoming the new abstraction. Unfortunately, tools like Ambrose and Lipstick depend on the MR-specific parts of Pig and look at the internals. This patch is a necessary change so that those tools can work independently of the execution engine in the future. The changes to Ambrose and Lipstick should be minimal with this patch, though. Yes, they would suffer some incompatibility, but again there is no way around it when a tool looks inside the execution engine internals. I think we should revert 1. and commit the patch. Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
[jira] [Commented] (PIG-3048) Add mapreduce workflow information to job configuration
[ https://issues.apache.org/jira/browse/PIG-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753942#comment-13753942 ] Dmitriy V. Ryaboy commented on PIG-3048: No objections; after all, usage of the config info is purely optional. We've run into trouble before with information of this sort becoming very big and triggering "JobConf too large" errors. Might want to look at compression at some point. Add mapreduce workflow information to job configuration --- Key: PIG-3048 URL: https://issues.apache.org/jira/browse/PIG-3048 Project: Pig Issue Type: Improvement Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 0.11.2 Attachments: PIG-3048.patch, PIG-3048.patch, PIG-3048.patch Adding workflow properties to the job configuration would enable logging and analysis of workflows in addition to individual MapReduce jobs. Suggested properties include a workflow ID, workflow name, adjacency list connecting nodes in the workflow, and the name of the current node in the workflow. mapreduce.workflow.id - a unique ID for the workflow, ideally prepended with the application name, e.g. pig_pigScriptId. mapreduce.workflow.name - a name for the workflow, to distinguish this workflow from other workflows and to group different runs of the same workflow, e.g. the pig command line. mapreduce.workflow.adjacency - an adjacency list for the workflow graph, encoded as mapreduce.workflow.adjacency.<source node> = comma-separated list of target nodes. mapreduce.workflow.node.name - the name of the node corresponding to this MapReduce job in the workflow adjacency list.
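The adjacency-list encoding proposed above can be sketched as a small helper that turns a workflow DAG into the suggested configuration entries: one key per source node, mapreduce.workflow.adjacency.<source>, whose value is the comma-separated target list. The node names and the helper itself are illustrative, not part of the patch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WorkflowAdjacency {
    // Encode a DAG (source node -> list of target nodes) using the
    // mapreduce.workflow.adjacency.<source node> naming scheme.
    static Map<String, String> encode(Map<String, List<String>> dag) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : dag.entrySet()) {
            conf.put("mapreduce.workflow.adjacency." + e.getKey(),
                     String.join(",", e.getValue()));
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new LinkedHashMap<>();
        dag.put("scope-1", Arrays.asList("scope-2", "scope-3"));
        dag.put("scope-2", Arrays.asList("scope-3"));
        System.out.println(encode(dag));
    }
}
```

Since each edge list becomes a plain string property, a large script produces one entry per node, which is the growth Dmitriy's "JobConf too large" concern refers to.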
[jira] [Updated] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3419: --- Attachment: updated-8-29-2013-exec-engine.patch I am uploading a new patch that reverts the PigServer constructor change (#1). The diff can be viewed [here|https://github.com/piaozhexiu/apache-pig/commit/a1e46e23ef0842874db6c09769a630ec47f5d259]. (There are two unrelated minor changes.) The new patch is rebased to trunk. Please let me know if anyone has objections. If I don't hear back, I will commit it to trunk tomorrow. Thanks! Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, updated-8-29-2013-exec-engine.patch
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753979#comment-13753979 ] Daniel Dai commented on PIG-3255: - I personally am not aware of anyone using StreamToPig, but we need to check with [~alangates], since he marked it as public stable. The other part of the patch looks good. Avoiding two byte-array copies and reusing the Text object would save memory and improve performance. Avoid extra byte array copy in streaming deserialize Key: PIG-3255 URL: https://issues.apache.org/jira/browse/PIG-3255 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Rohini Palaniswamy Assignee: Rohini Palaniswamy Fix For: 0.12 Attachments: PIG-3255-1.patch, PIG-3255-2.patch PigStreaming.java: {code} public Tuple deserialize(byte[] bytes) throws IOException { Text val = new Text(bytes); return StorageUtil.textToTuple(val, fieldDel); } {code} We should remove the new Text(bytes) copy and construct the tuple directly from the bytes.
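The fix discussed above can be sketched without Hadoop types: build the tuple fields straight from the incoming byte[] instead of first copying it into a Text object. The field splitting below is a simplified stand-in for StorageUtil.textToTuple, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StreamingDeserialize {
    // Split the record into fields on fieldDel, decoding each field directly
    // from the original buffer; no intermediate copy of the whole record
    // (which is what new Text(bytes) would have made).
    static List<String> deserialize(byte[] bytes, byte fieldDel) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= bytes.length; i++) {
            if (i == bytes.length || bytes[i] == fieldDel) {
                fields.add(new String(bytes, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] record = "a\tb\tc".getBytes(StandardCharsets.UTF_8);
        System.out.println(deserialize(record, (byte) '\t'));
    }
}
```

The per-field String decode still copies the field bytes, but the whole-record copy into Text is gone, which is the saving the patch is after.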
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753985#comment-13753985 ] Achal Soni commented on PIG-3419: - I agree with all that has been said, but there is no need to rename HExecutionEngine back. It doesn't semantically make sense, and I don't think anybody was directly interacting with it outside of the test cases. Whatever changes affect Ambrose and Lipstick should be communicated clearly as well. I have noted some issues with the PPNL before with regard to abstraction -- namely, Pig provides the MROperPlan to the listeners, which is not relevant in a different execution engine. Julien suggested this should be fixed in a follow-up patch. This will most certainly affect Ambrose and Lipstick, so we should be cautious in that regard. Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, updated-8-29-2013-exec-engine.patch
[jira] [Updated] (PIG-3048) Add mapreduce workflow information to job configuration
[ https://issues.apache.org/jira/browse/PIG-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3048: Resolution: Fixed Fix Version/s: (was: 0.11.2) 0.12 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks guys! Add mapreduce workflow information to job configuration --- Key: PIG-3048 URL: https://issues.apache.org/jira/browse/PIG-3048 Project: Pig Issue Type: Improvement Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 0.12 Attachments: PIG-3048.patch, PIG-3048.patch, PIG-3048.patch Adding workflow properties to the job configuration would enable logging and analysis of workflows in addition to individual MapReduce jobs. Suggested properties include a workflow ID, workflow name, adjacency list connecting nodes in the workflow, and the name of the current node in the workflow. mapreduce.workflow.id - a unique ID for the workflow, ideally prepended with the application name e.g. pig_pigScriptId mapreduce.workflow.name - a name for the workflow, to distinguish this workflow from other workflows and to group different runs of the same workflow e.g. pig command line mapreduce.workflow.adjacency - an adjacency list for the workflow graph, encoded as mapreduce.workflow.adjacency.source node = comma-separated list of target nodes mapreduce.workflow.node.name - the name of the node corresponding to this MapReduce job in the workflow adjacency list -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
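[Editor's note: the suggested workflow properties could be written into a job configuration roughly as below. A plain Map stands in for Hadoop's Configuration so the sketch stays self-contained; the key names follow the proposal, but the helper itself is hypothetical, not the committed patch.]

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: encode workflow metadata into mapreduce.workflow.* properties.
// A Map stands in for org.apache.hadoop.conf.Configuration.
public class WorkflowConf {

    static Map<String, String> encode(String workflowId, String workflowName,
                                      String nodeName, Map<String, List<String>> dag) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.workflow.id", workflowId);     // e.g. pig_<pigScriptId>
        conf.put("mapreduce.workflow.name", workflowName); // e.g. the pig command line
        conf.put("mapreduce.workflow.node.name", nodeName);
        // One adjacency key per source node; value is a comma-separated
        // list of target nodes, as the description above suggests.
        for (Map.Entry<String, List<String>> e : dag.entrySet()) {
            conf.put("mapreduce.workflow.adjacency." + e.getKey(),
                     String.join(",", e.getValue()));
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new LinkedHashMap<>();
        dag.put("scope-1", List.of("scope-2", "scope-3"));
        dag.put("scope-2", List.of("scope-3"));
        Map<String, String> conf =
            encode("pig_script123", "pig -f wordcount.pig", "scope-1", dag);
        System.out.println(conf.get("mapreduce.workflow.adjacency.scope-1"));
        // prints scope-2,scope-3
    }
}
```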
[jira] [Commented] (PIG-3048) Add mapreduce workflow information to job configuration
[ https://issues.apache.org/jira/browse/PIG-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753997#comment-13753997 ] Bill Graham commented on PIG-3048: -- +1 to commit. Just one style nit re spaces: {noformat} (getFileName() != null)?getFileName():default {noformat} should instead be: {noformat} (getFileName() != null) ? getFileName() : default {noformat} Add mapreduce workflow information to job configuration --- Key: PIG-3048 URL: https://issues.apache.org/jira/browse/PIG-3048 Project: Pig Issue Type: Improvement Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 0.12 Attachments: PIG-3048.patch, PIG-3048.patch, PIG-3048.patch Adding workflow properties to the job configuration would enable logging and analysis of workflows in addition to individual MapReduce jobs. Suggested properties include a workflow ID, workflow name, adjacency list connecting nodes in the workflow, and the name of the current node in the workflow. mapreduce.workflow.id - a unique ID for the workflow, ideally prepended with the application name e.g. pig_pigScriptId mapreduce.workflow.name - a name for the workflow, to distinguish this workflow from other workflows and to group different runs of the same workflow e.g. pig command line mapreduce.workflow.adjacency - an adjacency list for the workflow graph, encoded as mapreduce.workflow.adjacency.source node = comma-separated list of target nodes mapreduce.workflow.node.name - the name of the node corresponding to this MapReduce job in the workflow adjacency list -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3048) Add mapreduce workflow information to job configuration
[ https://issues.apache.org/jira/browse/PIG-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753998#comment-13753998 ] Bill Graham commented on PIG-3048: -- Whoops, I was a minute too late. :) Add mapreduce workflow information to job configuration --- Key: PIG-3048 URL: https://issues.apache.org/jira/browse/PIG-3048 Project: Pig Issue Type: Improvement Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 0.12 Attachments: PIG-3048.patch, PIG-3048.patch, PIG-3048.patch Adding workflow properties to the job configuration would enable logging and analysis of workflows in addition to individual MapReduce jobs. Suggested properties include a workflow ID, workflow name, adjacency list connecting nodes in the workflow, and the name of the current node in the workflow. mapreduce.workflow.id - a unique ID for the workflow, ideally prepended with the application name e.g. pig_pigScriptId mapreduce.workflow.name - a name for the workflow, to distinguish this workflow from other workflows and to group different runs of the same workflow e.g. pig command line mapreduce.workflow.adjacency - an adjacency list for the workflow graph, encoded as mapreduce.workflow.adjacency.source node = comma-separated list of target nodes mapreduce.workflow.node.name - the name of the node corresponding to this MapReduce job in the workflow adjacency list -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3048) Add mapreduce workflow information to job configuration
[ https://issues.apache.org/jira/browse/PIG-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754001#comment-13754001 ] Daniel Dai commented on PIG-3048: - No problem, I just committed the change you suggested. Thanks Bill! Add mapreduce workflow information to job configuration --- Key: PIG-3048 URL: https://issues.apache.org/jira/browse/PIG-3048 Project: Pig Issue Type: Improvement Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 0.12 Attachments: PIG-3048.patch, PIG-3048.patch, PIG-3048.patch Adding workflow properties to the job configuration would enable logging and analysis of workflows in addition to individual MapReduce jobs. Suggested properties include a workflow ID, workflow name, adjacency list connecting nodes in the workflow, and the name of the current node in the workflow. mapreduce.workflow.id - a unique ID for the workflow, ideally prepended with the application name e.g. pig_pigScriptId mapreduce.workflow.name - a name for the workflow, to distinguish this workflow from other workflows and to group different runs of the same workflow e.g. pig command line mapreduce.workflow.adjacency - an adjacency list for the workflow graph, encoded as mapreduce.workflow.adjacency.source node = comma-separated list of target nodes mapreduce.workflow.node.name - the name of the node corresponding to this MapReduce job in the workflow adjacency list -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3441) Allow Pig to use default resources from Configuration objects
[ https://issues.apache.org/jira/browse/PIG-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754017#comment-13754017 ] Daniel Dai commented on PIG-3441: - I am not doubting that your case fails, but I am curious why PIG-3135 works. Both seem to be trying to pass a custom configuration in. In PIG-3135 it passes some handcoded entries, while you want to construct Configuration(true); both then pass the config object to Pig. If Pig does take the config object, then both cases work; if not, both cases fail. I do want to solve both issues in one consistent way if possible. Allow Pig to use default resources from Configuration objects - Key: PIG-3441 URL: https://issues.apache.org/jira/browse/PIG-3441 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.11.1 Reporter: Bhooshan Mogal Attachments: PIG-3441_1.patch, PIG-3441.patch Pig currently ignores parameters from configuration files added statically to Configuration objects via Configuration.addDefaultResource("filename.xml"). Consider the following scenario - In a Hadoop FileSystem driver for a non-HDFS filesystem, you load properties specific to that FileSystem in a static initializer block in the class that extends org.apache.hadoop.fs.FileSystem, like below - {code} class MyFileSystem extends FileSystem { static { Configuration.addDefaultResource("myfs-default.xml"); Configuration.addDefaultResource("myfs-site.xml"); } } {code} Interfaces like the Hadoop CLI, Hive, and Hadoop M/R can find configuration parameters defined in these configuration files as long as they are on the classpath. However, Pig cannot find parameters from these files, because it ignores configuration files added statically. Pig should allow users to specify whether they would like Pig to read parameters from resources loaded statically. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3433) The import sdsu cannot be resolved
[ https://issues.apache.org/jira/browse/PIG-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754061#comment-13754061 ] Daniel Dai commented on PIG-3433: - Does the error happen when you run ant, or inside Eclipse? The import sdsu cannot be resolved -- Key: PIG-3433 URL: https://issues.apache.org/jira/browse/PIG-3433 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.11.1 Environment: Eclipse Indigo Reporter: Ido Hadanny executed: ➜ trunk svn update At revision 1516115. ant clean eclipse-files ant compile gen getting: https://issues.apache.org/jira/browse/PIG-3399 AND after manually removing the wrong javacc-4.2 dependency, getting: "The import sdsu cannot be resolved" in DataGenerator.java -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2606) union/ join operations are not accepting same alias as multiple inputs
[ https://issues.apache.org/jira/browse/PIG-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2606: Resolution: Fixed Fix Version/s: 0.12 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks Hari! union/ join operations are not accepting same alias as multiple inputs -- Key: PIG-2606 URL: https://issues.apache.org/jira/browse/PIG-2606 Project: Pig Issue Type: Bug Affects Versions: 0.9.2, 0.10.0 Reporter: Thejas M Nair Assignee: Hari Sankar Sivarama Subramaniyan Fix For: 0.12 Attachments: PIG-2606.2.patch.txt, PIG-2606.patch.txt grunt l = load 'x'; grunt u = union l, l; 2012-03-16 18:48:45,687 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Union with Count(Operand) 2 grunt a = load 'a0.txt' as (a0, a1); grunt b = join a by a0, a by a1; 2013-08-27 13:36:21,807 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2225: Projection with nothing to reference! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3441) Allow Pig to use default resources from Configuration objects
[ https://issues.apache.org/jira/browse/PIG-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754067#comment-13754067 ] Bhooshan Mogal commented on PIG-3441: - I sort of see your point now. It seems like [PIG-3135|https://issues.apache.org/jira/browse/PIG-3135] would work only if resources are added to Configuration objects via confObject.addResource() and not Configuration.addDefaultResource(), since loadDefaults is set to false in ConfigurationUtil.toConfiguration()? Unless the standard configuration files are added via configObject.addResource() somewhere in the code? Allow Pig to use default resources from Configuration objects - Key: PIG-3441 URL: https://issues.apache.org/jira/browse/PIG-3441 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.11.1 Reporter: Bhooshan Mogal Attachments: PIG-3441_1.patch, PIG-3441.patch Pig currently ignores parameters from configuration files added statically to Configuration objects via Configuration.addDefaultResource("filename.xml"). Consider the following scenario - In a Hadoop FileSystem driver for a non-HDFS filesystem, you load properties specific to that FileSystem in a static initializer block in the class that extends org.apache.hadoop.fs.FileSystem, like below - {code} class MyFileSystem extends FileSystem { static { Configuration.addDefaultResource("myfs-default.xml"); Configuration.addDefaultResource("myfs-site.xml"); } } {code} Interfaces like the Hadoop CLI, Hive, and Hadoop M/R can find configuration parameters defined in these configuration files as long as they are on the classpath. However, Pig cannot find parameters from these files, because it ignores configuration files added statically. Pig should allow users to specify whether they would like Pig to read parameters from resources loaded statically. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
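[Editor's note: the loadDefaults distinction discussed in the comment above can be modeled with plain java.util.Properties. The toy class below only illustrates the semantics of Hadoop Configuration's two resource mechanisms; it is not Hadoop or Pig code.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Toy model of why loadDefaults=false drops statically registered defaults:
// addDefaultResource registers resources globally, but a Configuration
// constructed with loadDefaults=false never consults that global list.
public class ConfModel {
    // Resources registered statically, as Configuration.addDefaultResource does.
    private static final List<Properties> DEFAULT_RESOURCES = new ArrayList<>();

    private final Properties props = new Properties();

    static void addDefaultResource(Properties p) {
        DEFAULT_RESOURCES.add(p);
    }

    // loadDefaults=false skips the static defaults entirely -- analogous to
    // the path ConfigurationUtil.toConfiguration() reportedly takes.
    ConfModel(boolean loadDefaults) {
        if (loadDefaults) {
            for (Properties p : DEFAULT_RESOURCES) props.putAll(p);
        }
    }

    // Instance-level addResource always takes effect, which is why passing
    // handcoded entries (the PIG-3135 style) still works.
    void addResource(Properties p) {
        props.putAll(p);
    }

    String get(String key) {
        return props.getProperty(key);
    }
}
```

Under this model, a `ConfModel(false)` never sees anything registered through `addDefaultResource`, matching the observed behavior for statically loaded FileSystem configs, while entries added per-instance survive either way.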
[jira] [Commented] (PIG-3349) Document ToString(Datetime, String) UDF
[ https://issues.apache.org/jira/browse/PIG-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754150#comment-13754150 ] Daniel Dai commented on PIG-3349: - +1. We also need to complete the type conversion table later. Document ToString(Datetime, String) UDF --- Key: PIG-3349 URL: https://issues.apache.org/jira/browse/PIG-3349 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.11.1 Reporter: pat chan Assignee: Cheolsoo Park Priority: Minor Fix For: 0.12 Attachments: PIG-3349.patch Currently you can't cast a datetime object into a chararray: grunt B = foreach A generate (chararray)a; dump B; 2013-06-05 15:29:01,372 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052: line 8, column 24 Cannot cast datetime to chararray Details at logfile: /Users/patc/projects/pig-0.11.1/pig_1370471270879.log Was this an oversight? The documented casting matrix does not show the datetime object, so I'm not sure if the current behavior is correct or not. My recommendation would be to support casting to and from strings. Casting from a string would behave exactly like loading a datetime. Casting to a string would produce exactly the format you get when you dump a datetime. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
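[Editor's note: a rough analogue of what ToString(datetime, format) does. Pig formats datetimes via Joda-Time; java.time is used here only to keep the sketch self-contained, and the common pattern letters (yyyy, MM, dd, HH, mm, ss) behave the same.]

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// java.time analogue of Pig's ToString(datetime, format) builtin, shown
// only to illustrate the datetime-to-chararray conversion being documented.
public class DatetimeToString {
    static String toStringUdf(LocalDateTime dt, String pattern) {
        return dt.format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        LocalDateTime dt = LocalDateTime.of(2013, 6, 5, 15, 29, 1);
        // roughly what ToString(d, 'yyyy-MM-dd HH:mm:ss') yields in Pig
        System.out.println(toStringUdf(dt, "yyyy-MM-dd HH:mm:ss"));
        // prints 2013-06-05 15:29:01
    }
}
```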
[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars
[ https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754182#comment-13754182 ] Daniel Dai commented on PIG-3285: - That sounds good. We can switch to it for newer version of hbase once HBASE-9165 committed. Jobs using HBaseStorage fail to ship dependency jars Key: PIG-3285 URL: https://issues.apache.org/jira/browse/PIG-3285 Project: Pig Issue Type: Bug Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.11.1 Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig Launching a job consuming {{HBaseStorage}} fails out of the box. The user must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. Exceptions look something like this: {noformat} 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.NoClassDefFoundError: com/google/protobuf/Message at org.apache.hadoop.hbase.io.HbaseObjectWritable.clinit(HbaseObjectWritable.java:266) at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84) at $Proxy7.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
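[Editor's note: the usual trick behind shipping dependency jars is to locate the jar (or class directory) that provides a given class via its ProtectionDomain, then add that path to the job's classpath. The sketch below shows only the lookup step; Pig and HBase use their own helpers (e.g. HBase's TableMapReduceUtil) for the full mechanism.]

```java
import java.security.CodeSource;

// Locate the code source (jar or classes directory) that provides a class.
// This is the building block for "ship my dependency jars with the job".
public class JarFinder {
    static String findContainingJar(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        if (src == null) {
            // Bootstrap-loaded classes carry no code source.
            return null;
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // For this class itself, the location is the compile output
        // directory or the jar it was packaged into.
        System.out.println(findContainingJar(JarFinder.class));
    }
}
```

A job launcher would call this for classes like `com.google.protobuf.Message` and add each resulting jar to the distributed cache, which is exactly the dependency the stack trace above shows missing.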
[jira] [Created] (PIG-3445) Make Parquet format available out of the box in Pig
Julien Le Dem created PIG-3445: -- Summary: Make Parquet format available out of the box in Pig Key: PIG-3445 URL: https://issues.apache.org/jira/browse/PIG-3445 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem We would add the Parquet jar to the Pig packages to make it available out of the box to Pig users. On top of that we could add the parquet.pig package to the list of packages to search for UDFs. (Alternatively, the Parquet jar could contain classes named org.apache.pig.builtin.ParquetLoader and ParquetStorer.) This way users can use Parquet simply by typing: A = LOAD 'foo' USING ParquetLoader(); STORE A INTO 'bar' USING ParquetStorer(); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Parquet support built in Pig
Hello fellow Pig developers, I have opened a JIRA to add Parquet as a built-in format in Pig: https://issues.apache.org/jira/browse/PIG-3445 Please let me know what you think. Julien
Re: Parquet support built in Pig
I think this is awesome. Best thing since diet sliced bread (they cut the slices thin). On Thu, Aug 29, 2013 at 4:36 PM, Julien Le Dem jul...@ledem.net wrote: Hello fellow Pig developers, I have opened a JIRA to add Parquet as a built-in format in Pig: https://issues.apache.org/jira/browse/PIG-3445 Please let me know what you think. Julien -- Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754235#comment-13754235 ] Dmitriy V. Ryaboy commented on PIG-3419: [~billgraham] looping you in for Ambrose. Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, updated-8-29-2013-exec-engine.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major, as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried not to change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes were reinstating an ExecutionEngine interface to tie together the frontend and backend, making changes in Pig to delegate to the EE when necessary, and creating an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using the Java ServiceLoader, exactly how MapReduce chooses between the local and distributed frameworks). I also tried to make ScriptState, JobStats, and PigStats as abstract as possible in their current state.
I think in the future some work will need to be done here to re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed there, perhaps in a separate patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars
[ https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754251#comment-13754251 ] Nick Dimiduk commented on PIG-3285: --- [~daijy] would you mind commenting positively on the HBase ticket as well? Thanks. Jobs using HBaseStorage fail to ship dependency jars Key: PIG-3285 URL: https://issues.apache.org/jira/browse/PIG-3285 Project: Pig Issue Type: Bug Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.11.1 Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig Launching a job consuming {{HBaseStorage}} fails out of the box. The user must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. Exceptions look something like this: {noformat} 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.NoClassDefFoundError: com/google/protobuf/Message at org.apache.hadoop.hbase.io.HbaseObjectWritable.clinit(HbaseObjectWritable.java:266) at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84) at $Proxy7.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Subscription: PIG patch available
Issue Subscription Filter: PIG patch available (19 issues) Subscriber: pigdaily

Key      Summary
PIG-3441 Allow Pig to use default resources from Configuration objects
         https://issues.apache.org/jira/browse/PIG-3441
PIG-3434 Null subexpression in bincond nullifies outer tuple (or bag)
         https://issues.apache.org/jira/browse/PIG-3434
PIG-3431 Return more information for parsing related exceptions.
         https://issues.apache.org/jira/browse/PIG-3431
PIG-3430 Add xml format for explaining MapReduce Plan.
         https://issues.apache.org/jira/browse/PIG-3430
PIG-3426 Add support for removing s3 files
         https://issues.apache.org/jira/browse/PIG-3426
PIG-3419 Pluggable Execution Engine
         https://issues.apache.org/jira/browse/PIG-3419
PIG-3374 CASE and IN fail when expression includes dereferencing operator
         https://issues.apache.org/jira/browse/PIG-3374
PIG-3349 Document ToString(Datetime, String) UDF
         https://issues.apache.org/jira/browse/PIG-3349
PIG-3346 New property that controls the number of combined splits
         https://issues.apache.org/jira/browse/PIG-3346
PIG-     Fix remaining Windows core unit test failures
         https://issues.apache.org/jira/browse/PIG-
PIG-3325 Adding a tuple to a bag is slow
         https://issues.apache.org/jira/browse/PIG-3325
PIG-3295 Casting from bytearray failing after Union (even when each field is from a single Loader)
         https://issues.apache.org/jira/browse/PIG-3295
PIG-3292 Logical plan invalid state: duplicate uid in schema during self-join to get cross product
         https://issues.apache.org/jira/browse/PIG-3292
PIG-3257 Add unique identifier UDF
         https://issues.apache.org/jira/browse/PIG-3257
PIG-3255 Avoid extra byte array copy in streaming deserialize
         https://issues.apache.org/jira/browse/PIG-3255
PIG-3199 Expose LogicalPlan via PigServer API
         https://issues.apache.org/jira/browse/PIG-3199
PIG-3117 A debug mode in which pig does not delete temporary files
         https://issues.apache.org/jira/browse/PIG-3117
PIG-3088 Add a builtin udf which removes prefixes
         https://issues.apache.org/jira/browse/PIG-3088
PIG-3021 Split results missing records when there is null values in the column comparison
         https://issues.apache.org/jira/browse/PIG-3021

You may edit this subscription at: https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384