[jira] [Commented] (PIG-2031) NPE in TOP
[ https://issues.apache.org/jira/browse/PIG-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028810#comment-13028810 ]

Dmitriy V. Ryaboy commented on PIG-2031:

Patch seems fine, thanks for doing that. A new test that tries the null bag would be good.

NPE in TOP
----------
Key: PIG-2031
URL: https://issues.apache.org/jira/browse/PIG-2031
Project: Pig
Issue Type: Bug
Reporter: Jacob Perkins
Attachments: toppatch.txt

If a NULL DataBag is passed to org.apache.pig.builtin.TOP then an NPE is thrown. Consider:

{code}
$: cat foo.tsv
a {(foo,1),(bar,2)}
b
c {(fyha,4),(asdf,9)}
{code}

then:

{code}
data = LOAD 'foo.tsv' AS (key:chararray, a_bag:bag {t:tuple (name:chararray, value:int)});
tpd = FOREACH data {
    top_n = TOP(1, 1, a_bag);
    GENERATE key AS key, top_n AS top_n ;
};
DUMP tpd;
{code}

will throw an NPE when it gets to the row with no bag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
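The kind of null guard under discussion can be sketched in plain Java. This is not the content of toppatch.txt: the class NullSafeTopSketch, its top() method, and the choice to return null for a null bag are all assumptions made for illustration, and ordinary lists stand in for Pig's DataBag and Tuple types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a TOP-like UDF. Plain lists and arrays replace
// Pig's DataBag and Tuple so the sketch compiles without Pig on the classpath.
final class NullSafeTopSketch {

    // Returns up to n rows with the largest numeric value in the given column,
    // or null when the bag itself is null (assumed behaviour, not Pig's).
    static List<Object[]> top(int n, int column, List<Object[]> bag) {
        if (bag == null) {
            return null; // guard: the row had no bag, so there is nothing to rank
        }
        List<Object[]> sorted = new ArrayList<>(bag);
        // Sort descending by the numeric field at index 'column'.
        sorted.sort(Comparator.comparingLong(
                (Object[] row) -> ((Number) row[column]).longValue()).reversed());
        int limit = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Object[]> bag = Arrays.asList(
                new Object[] {"foo", 1}, new Object[] {"bar", 2});
        System.out.println(Arrays.toString(top(1, 1, bag).get(0)));
        System.out.println(top(1, 1, null)); // like row "b" in foo.tsv: no bag
    }
}
```

A unit test along the lines Dmitriy suggests would call the function with a null bag and check that no exception escapes.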
run_hadoop_grid5000
Hi,

I am running Hadoop on Grid 5000. How can I use Pig in Hadoop mode in this setup?

Thank you.
[jira] [Commented] (PIG-2008) Cache outputFormat in HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028841#comment-13028841 ]

Alan Gates commented on PIG-2008:

Committed to trunk. Will commit to 0.9 branch shortly.

Cache outputFormat in HBaseStorage
----------------------------------
Key: PIG-2008
URL: https://issues.apache.org/jira/browse/PIG-2008
Project: Pig
Issue Type: Bug
Components: build
Affects Versions: 0.8.0
Reporter: Jacob Perkins
Priority: Minor
Fix For: 0.8.0
Attachments: patch_file.txt
Original Estimate: 10m
Remaining Estimate: 10m

getOutputFormat gets called more than one time in a StoreFunc. Modify HBaseStorage to only create an instance of TableOutputFormat one time (since it creates a new HTable connection each time) as opposed to multiple times like it does now.
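The caching idea in the description above amounts to lazy initialization: create the expensive object on the first call and hand back the same instance afterwards. The sketch below is a generic illustration, not the code in patch_file.txt. CachedOutputFormatSketch and its Supplier-based factory are hypothetical names; the factory stands in for constructing a TableOutputFormat.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration of create-once caching; not HBaseStorage itself.
final class CachedOutputFormatSketch<T> {
    private final Supplier<T> factory; // stands in for building the output format
    private T cached;                  // created on first use, then reused

    CachedOutputFormatSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    // Mirrors a getOutputFormat() that the framework may call several times.
    T getOutputFormat() {
        if (cached == null) {
            cached = factory.get(); // the expensive step now happens only once
        }
        return cached;
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        CachedOutputFormatSketch<Object> store = new CachedOutputFormatSketch<>(() -> {
            created.incrementAndGet();
            return new Object();
        });
        store.getOutputFormat();
        store.getOutputFormat();
        System.out.println("instances created: " + created.get());
    }
}
```

The sketch is not thread-safe; it assumes, as an unverified simplification, that the calls arrive on a single thread.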
[jira] [Created] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
Macro expansion doesn't handle multiple expansions of same macro inside another macro
-------------------------------------------------------------------------------------
Key: PIG-2035
URL: https://issues.apache.org/jira/browse/PIG-2035
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.9.0

Here is the use case:

{code}
define test ( in, out, x ) returns c {
    a = load '$in' as (name, age, gpa);
    b = group a by gpa;
    $c = foreach b generate group, COUNT(a.$x);
    store $c into '$out';
};

define test2( in, out ) returns x {
    $x = test( '$in', '$out', 'name' );
    $x = test( '$in', '$out.1', 'age' );
    $x = test( '$in', '$out.2', 'gpa' );
};

x = test2('studenttab10k', 'myoutput');
{code}
[jira] [Updated] (PIG-2036) Set header delimiter in PigStorageSchema
[ https://issues.apache.org/jira/browse/PIG-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mads Moeller updated PIG-2036:

Attachment: PIG-2036.patch

Patch

Set header delimiter in PigStorageSchema
----------------------------------------
Key: PIG-2036
URL: https://issues.apache.org/jira/browse/PIG-2036
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.7.0, 0.8.0
Reporter: Mads Moeller
Priority: Minor
Attachments: PIG-2036.patch

Piggybank's PigStorageSchema currently always writes the generated header file (.pig_header) with a tab delimiter. The attached patch sets the header delimiter to the value passed in via the constructor. If none is passed, it defaults to tab '\t'.
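The behaviour described for the patch (use the constructor's delimiter for the header, fall back to tab otherwise) can be sketched as follows. This is an illustration only and not the content of PIG-2036.patch; HeaderDelimiterSketch and headerLine() are hypothetical names, and treating a null delimiter as "use the default" is an added assumption.

```java
// Hypothetical illustration; not Piggybank's PigStorageSchema.
final class HeaderDelimiterSketch {
    private static final String DEFAULT_DELIMITER = "\t";
    private final String delimiter;

    // No-argument form keeps the existing tab default.
    HeaderDelimiterSketch() {
        this(DEFAULT_DELIMITER);
    }

    // The delimiter given here is used for data and header alike.
    HeaderDelimiterSketch(String delimiter) {
        this.delimiter = (delimiter == null) ? DEFAULT_DELIMITER : delimiter;
    }

    // Builds the single line that would be written to the header file.
    String headerLine(String... fieldNames) {
        return String.join(delimiter, fieldNames);
    }

    public static void main(String[] args) {
        System.out.println(new HeaderDelimiterSketch(",").headerLine("key", "value"));
        System.out.println(new HeaderDelimiterSketch().headerLine("key", "value"));
    }
}
```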
[jira] [Created] (PIG-2039) IndexOutOfBounException for a case
IndexOutOfBounException for a case
----------------------------------
Key: PIG-2039
URL: https://issues.apache.org/jira/browse/PIG-2039
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Fix For: 0.9.0

The following query gives an exception:

a = load '1.txt' as (a0:int, a1:int, a2:int);
b = group a by a0;
c = foreach b { c1 = limit a 10; c2 = distinct c1.a1; c3 = distinct c1.a2; generate c2, c3;};
store c into 'output';

2011-05-04 12:36:01,720 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Index: 0, Size: 0

Stack trace:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:279)
    at org.apache.pig.newplan.logical.relational.LOGenerate.getSchema(LOGenerate.java:88)
    at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
    at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:73)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.PigServer$Graph.compile(PigServer.java:1664)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1615)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1586)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:580)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:930)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:176)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:152)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:488)
    at org.apache.pig.Main.main(Main.java:109)
[jira] [Updated] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-2035:

Attachment: PIG-2035_1.patch

Macro expansion doesn't handle multiple expansions of same macro inside another macro
-------------------------------------------------------------------------------------
Key: PIG-2035
URL: https://issues.apache.org/jira/browse/PIG-2035
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.9.0
Attachments: PIG-2035_1.patch

Here is the use case:

{code}
define test ( in, out, x ) returns c {
    a = load '$in' as (name, age, gpa);
    b = group a by gpa;
    $c = foreach b generate group, COUNT(a.$x);
    store $c into '$out';
};

define test2( in, out ) returns x {
    $x = test( '$in', '$out', 'name' );
    $x = test( '$in', '$out.1', 'age' );
    $x = test( '$in', '$out.2', 'gpa' );
};

x = test2('studenttab10k', 'myoutput');
{code}
[jira] [Created] (PIG-2040) Move classloader from QueryParserDriver to PigContext
Move classloader from QueryParserDriver to PigContext
-----------------------------------------------------
Key: PIG-2040
URL: https://issues.apache.org/jira/browse/PIG-2040
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.9.0
Attachments: PIG-2040-1.patch

After PIG-1775, mapreduce mode fails. The reason is that we moved the classloader from LogicalPlanBuilder to QueryParserDriver, which needs antlr.jar; however, we don't ship antlr.jar to the backend. It is better to move the classloader to PigContext.
[jira] [Updated] (PIG-2040) Move classloader from QueryParserDriver to PigContext
[ https://issues.apache.org/jira/browse/PIG-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2040:

Attachment: PIG-2040-1.patch

Move classloader from QueryParserDriver to PigContext
-----------------------------------------------------
Key: PIG-2040
URL: https://issues.apache.org/jira/browse/PIG-2040
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.9.0
Attachments: PIG-2040-1.patch

After PIG-1775, mapreduce mode fails. The reason is that we moved the classloader from LogicalPlanBuilder to QueryParserDriver, which needs antlr.jar; however, we don't ship antlr.jar to the backend. It is better to move the classloader to PigContext.
[jira] [Commented] (PIG-1866) Dereference a bag within a tuple does not work
[ https://issues.apache.org/jira/browse/PIG-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029014#comment-13029014 ]

Daniel Dai commented on PIG-1866:

Yes, it only goes to 0.9. For 0.8, you need to apply PIG-1866-3.patch manually.

Dereference a bag within a tuple does not work
----------------------------------------------
Key: PIG-1866
URL: https://issues.apache.org/jira/browse/PIG-1866
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.9.0
Attachments: PIG-1866-1.patch, PIG-1866-2.patch, PIG-1866-3.patch

The following script does not work (both in new and old logical plan):

{code}
a = load '1.txt' as (t : tuple(i: int, b1: bag { b_tuple : tuple ( b_str: chararray) }));
b = foreach a generate t.b1;
dump b;
{code}

1.txt:
(1,{(one),(two)})

Error from old logical plan:

java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.pig.data.DataBag
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:482)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:480)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:339)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

Error from new logical plan:

java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.consumeInputBag(POProject.java:246)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:200)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:339)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

If we change "b = foreach a generate t.b1;" to "b = foreach a generate t.i;", it works fine; only referring to a bag fails.
[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029045#comment-13029045 ]

Richard Ding commented on PIG-2035:

test-patch result:

{code}
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 release audit. The applied patch generated 585 release audit warnings (more than the trunk's current 584 warnings).
{code}

Macro expansion doesn't handle multiple expansions of same macro inside another macro
-------------------------------------------------------------------------------------
Key: PIG-2035
URL: https://issues.apache.org/jira/browse/PIG-2035
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.9.0
Attachments: PIG-2035_1.patch

Here is the use case:

{code}
define test ( in, out, x ) returns c {
    a = load '$in' as (name, age, gpa);
    b = group a by gpa;
    $c = foreach b generate group, COUNT(a.$x);
    store $c into '$out';
};

define test2( in, out ) returns x {
    $x = test( '$in', '$out', 'name' );
    $x = test( '$in', '$out.1', 'age' );
    $x = test( '$in', '$out.2', 'gpa' );
};

x = test2('studenttab10k', 'myoutput');
{code}
[jira] [Commented] (PIG-2040) Move classloader from QueryParserDriver to PigContext
[ https://issues.apache.org/jira/browse/PIG-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029058#comment-13029058 ]

Xuefu Zhang commented on PIG-2040:

+1

Move classloader from QueryParserDriver to PigContext
-----------------------------------------------------
Key: PIG-2040
URL: https://issues.apache.org/jira/browse/PIG-2040
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.9.0
Attachments: PIG-2040-1.patch

After PIG-1775, mapreduce mode fails. The reason is that we moved the classloader from LogicalPlanBuilder to QueryParserDriver, which needs antlr.jar; however, we don't ship antlr.jar to the backend. It is better to move the classloader to PigContext.
[jira] [Commented] (PIG-2034) Pig client uses fs.default.name as provided from the JobTracker instead of local value
[ https://issues.apache.org/jira/browse/PIG-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029059#comment-13029059 ]

Bill Graham commented on PIG-2034:

I was wondering the same thing after I filed this. I can't tell whether this comes from the MR job submission on the client or from Pig. Can you tell from the stack trace I provided?

Pig client uses fs.default.name as provided from the JobTracker instead of local value
--------------------------------------------------------------------------------------
Key: PIG-2034
URL: https://issues.apache.org/jira/browse/PIG-2034
Project: Pig
Issue Type: Bug
Reporter: Bill Graham
Attachments: pig_1304465896181.log

When submitting a Pig job, the client uses the {{fs.default.name}} supplied to it by the JobTracker (typically via core-site.xml on the master) during the staging phase. After that, the client uses the {{fs.default.name}} from its local configs. This seems like a bug to me. Expected behavior would be to always use the local value.

I found this bug when the server configs were set to not use a FQDN for {{fs.default.name}}. This caused the client to fail because it didn't have the same default DNS domain.
[jira] [Commented] (PIG-2012) Comments at the begining of the file throws off line numbers in errors
[ https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029063#comment-13029063 ]

Xuefu Zhang commented on PIG-2012:

1. Macro invocation stack should be per macro, not per node in a macro. Also, a stack seems to be a more appropriate data structure for this.
2. To get line numbers correctly, we should be able to do it more naturally in the PigParserNode constructor, rather than prepending \n in the text.
3. It might be cleaner and more OO if we have a PigParserMacroNode that inherits from PigParserNode. In that case, we only need to override the toString() method and can avoid if-else clauses in the code.

Comments at the begining of the file throws off line numbers in errors
----------------------------------------------------------------------
Key: PIG-2012
URL: https://issues.apache.org/jira/browse/PIG-2012
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Alan Gates
Assignee: Richard Ding
Fix For: 0.9.0
Attachments: PIG-2012_1.patch, macro.pig

The preprocessor does not appear to be handling leading comments properly when calculating line numbers for error messages. In the attached script, the error is reported to be on line 7. It is actually on line 10.
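Points 1 and 3 of the review comment above can be pictured with a small class hierarchy. The sketch is an assumption-laden illustration rather than Pig's parser code: ParserNodeSketch and MacroNodeSketch are hypothetical stand-ins for PigParserNode and the proposed PigParserMacroNode, and the message format in toString() is invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical base node: carries only its source line.
class ParserNodeSketch {
    protected final int line;

    ParserNodeSketch(int line) {
        this.line = line;
    }

    @Override
    public String toString() {
        return "line " + line;
    }
}

// Hypothetical macro node: same data plus the macro's invocation stack.
final class MacroNodeSketch extends ParserNodeSketch {
    // One stack per macro expansion, shared by its nodes (not copied per node).
    private final Deque<String> invocationStack;

    MacroNodeSketch(int line, Deque<String> invocationStack) {
        super(line);
        this.invocationStack = invocationStack;
    }

    // Only toString() differs, so callers need no if-else on the node kind.
    @Override
    public String toString() {
        return super.toString() + " (expanded from "
                + String.join(" <- ", invocationStack) + ")";
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("test2"); // outer macro
        stack.push("test");  // macro invoked inside it
        System.out.println(new ParserNodeSketch(10));
        System.out.println(new MacroNodeSketch(3, stack));
    }
}
```

Error-reporting code can then print any node the same way and let the subclass decide how much macro context to include.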
[jira] [Updated] (PIG-1775) Removal of old logical plan
[ https://issues.apache.org/jira/browse/PIG-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated PIG-1775:

Attachment: PIG-1775-3.patch

1. Migrated additional test cases
2. Provided new implementation for parsing schema from a string, parsing constant from a string, etc.

Removal of old logical plan
---------------------------
Key: PIG-1775
URL: https://issues.apache.org/jira/browse/PIG-1775
Project: Pig
Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Yan Zhou
Assignee: Xuefu Zhang
Fix For: 0.9.0
Attachments: PIG-1775-2.patch, PIG-1775-3.patch, PIG-1775.patch

Once the new logical plan is stable enough, only the new plan will be used and the old logical plan will be removed. This is scheduled for the 0.9 release.
[jira] [Commented] (PIG-2034) Pig client uses fs.default.name as provided from the JobTracker instead of local value
[ https://issues.apache.org/jira/browse/PIG-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029073#comment-13029073 ]

Daniel Dai commented on PIG-2034:

The stack is all from mapreduce. I wonder if you will see the same thing in a pure mapreduce job.

Pig client uses fs.default.name as provided from the JobTracker instead of local value
--------------------------------------------------------------------------------------
Key: PIG-2034
URL: https://issues.apache.org/jira/browse/PIG-2034
Project: Pig
Issue Type: Bug
Reporter: Bill Graham
Attachments: pig_1304465896181.log

When submitting a Pig job, the client uses the {{fs.default.name}} supplied to it by the JobTracker (typically via core-site.xml on the master) during the staging phase. After that, the client uses the {{fs.default.name}} from its local configs. This seems like a bug to me. Expected behavior would be to always use the local value.

I found this bug when the server configs were set to not use a FQDN for {{fs.default.name}}. This caused the client to fail because it didn't have the same default DNS domain.
[jira] [Commented] (PIG-2040) Move classloader from QueryParserDriver to PigContext
[ https://issues.apache.org/jira/browse/PIG-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029081#comment-13029081 ]

Santhosh Srinivasan commented on PIG-2040:

I remember a user trying to parse schemas using the Util.parseFromString() method in a UDF. This might require shipping the antlr binaries to the back-end. Will this patch address this issue too?

Move classloader from QueryParserDriver to PigContext
-----------------------------------------------------
Key: PIG-2040
URL: https://issues.apache.org/jira/browse/PIG-2040
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.9.0
Attachments: PIG-2040-1.patch

After PIG-1775, mapreduce mode fails. The reason is that we moved the classloader from LogicalPlanBuilder to QueryParserDriver, which needs antlr.jar; however, we don't ship antlr.jar to the backend. It is better to move the classloader to PigContext.
[jira] [Commented] (PIG-2034) Pig client uses fs.default.name as provided from the JobTracker instead of local value
[ https://issues.apache.org/jira/browse/PIG-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029082#comment-13029082 ]

Bill Graham commented on PIG-2034:

Yup. I just tried the same with WordCount and got the same failure, so this doesn't seem to be a Pig bug. Do you think this is a valid MapReduce bug?

{code}
Exception in thread "main" java.net.UnknownHostException: unknown host: colo1-hadoop-nn-r0-n0
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1444)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1432)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapred.JobClient.getFs(JobClient.java:504)
    at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:608)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:802)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:771)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1290)
    at cnwk.hadoop.mapreduce.WordCount.main(WordCount.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}

Pig client uses fs.default.name as provided from the JobTracker instead of local value
--------------------------------------------------------------------------------------
Key: PIG-2034
URL: https://issues.apache.org/jira/browse/PIG-2034
Project: Pig
Issue Type: Bug
Reporter: Bill Graham
Attachments: pig_1304465896181.log

When submitting a Pig job, the client uses the {{fs.default.name}} supplied to it by the JobTracker (typically via core-site.xml on the master) during the staging phase. After that, the client uses the {{fs.default.name}} from its local configs. This seems like a bug to me. Expected behavior would be to always use the local value.

I found this bug when the server configs were set to not use a FQDN for {{fs.default.name}}. This caused the client to fail because it didn't have the same default DNS domain.
[jira] [Created] (PIG-2041) Minicluster should make each run independent
Minicluster should make each run independent
--------------------------------------------
Key: PIG-2041
URL: https://issues.apache.org/jira/browse/PIG-2041
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.9.0

Minicluster reuses ~/pigtest/conf/hadoop-site.xml. If something is wrong in hadoop-site.xml, the next test is also affected. This leads to some mysterious test failures.
[jira] [Updated] (PIG-2041) Minicluster should make each run independent
[ https://issues.apache.org/jira/browse/PIG-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2041:

Attachment: PIG-2041-1.patch

PIG-2041-1.patch also includes a change to fix the TestStoreInstances failure.

Minicluster should make each run independent
--------------------------------------------
Key: PIG-2041
URL: https://issues.apache.org/jira/browse/PIG-2041
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.9.0
Attachments: PIG-2041-1.patch

Minicluster reuses ~/pigtest/conf/hadoop-site.xml. If something is wrong in hadoop-site.xml, the next test is also affected. This leads to some mysterious test failures.
[jira] [Commented] (PIG-2040) Move classloader from QueryParserDriver to PigContext
[ https://issues.apache.org/jira/browse/PIG-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029093#comment-13029093 ]

Daniel Dai commented on PIG-2040:

bq. I remember a user trying to parse schemas using the Util.parseFromString() method in an UDF. This might require shipping the antlr binaries to the back-end. Will this patch address this issue too?

No, it will not. It seems we need to ship antlr.jar as well, but putting the classloader into PigContext is still cleaner. I also wonder how Util.parseFromString() works now, since we don't ship javacc either.

Move classloader from QueryParserDriver to PigContext
-----------------------------------------------------
Key: PIG-2040
URL: https://issues.apache.org/jira/browse/PIG-2040
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.9.0
Attachments: PIG-2040-1.patch

After PIG-1775, mapreduce mode fails. The reason is that we moved the classloader from LogicalPlanBuilder to QueryParserDriver, which needs antlr.jar; however, we don't ship antlr.jar to the backend. It is better to move the classloader to PigContext.
[jira] [Resolved] (PIG-2008) Cache outputFormat in HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-2008.

Resolution: Fixed
Fix Version/s: (was: 0.8.0)
               0.10
               0.9.0
Assignee: Jacob Perkins

Patch checked into trunk and 0.9 branch. Thanks Jacob.

Cache outputFormat in HBaseStorage
----------------------------------
Key: PIG-2008
URL: https://issues.apache.org/jira/browse/PIG-2008
Project: Pig
Issue Type: Bug
Components: build
Affects Versions: 0.8.0
Reporter: Jacob Perkins
Assignee: Jacob Perkins
Priority: Minor
Fix For: 0.9.0, 0.10
Attachments: patch_file.txt
Original Estimate: 10m
Remaining Estimate: 10m

getOutputFormat gets called more than one time in a StoreFunc. Modify HBaseStorage to only create an instance of TableOutputFormat one time (since it creates a new HTable connection each time) as opposed to multiple times like it does now.