[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra
[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-997: - Status: Patch Available (was: Open) > [zebra] Sorted Table Support by Zebra > - > > Key: PIG-997 > URL: https://issues.apache.org/jira/browse/PIG-997 > Project: Pig > Issue Type: New Feature >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.6.0 > > Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch, > SortedTable.patch > > > This new feature is for Zebra to support sorted data in storage. As a storage > library, Zebra will not sort the data by itself. But it will support creation > and use of sorted data either through PIG or through map/reduce tasks that > use Zebra as storage format. > The sorted table keeps the data in a "totally sorted" manner across all > TFiles created by potentially all mappers or reducers. > For sorted data creation through PIG's STORE operator, if the input data is > sorted through "ORDER BY", the new Zebra table will be marked as sorted on > the sorted columns; > For sorted data creation through Map/Reduce tasks, three new static methods > of the BasicTableOutput class will be provided to allow or help the user to > achieve the goal. "setSortInfo" allows the user to specify the sorted columns > of the input tuple to be stored; "getSortKeyGenerator" and "getSortKey" help > the user to generate the key acceptable by Zebra as a sorted key based upon > the schema, sorted columns and the input tuple. > For sorted data read through PIG's LOAD operator, pass the string "sorted" as an > extra argument to the TableLoader constructor to ask for a sorted table to be > loaded; > For sorted data read through Map/Reduce tasks, a new static method of the > TableInputFormat class, requireSortedTable, can be called to ask for a sorted > table to be read. Additionally, an overloaded version of the new method can > be called to ask for a sorted table on specified sort columns and comparator. 
> For this release, the sorted table only supports sorting in ascending order, not > in descending order. In addition, the sort keys must be of simple types, not > complex types such as RECORD, COLLECTION and MAP. > Multiple-key sorting is supported, but the ordering of the multiple sort keys > is significant, with the first sort column being the primary sort key, the > second being the secondary sort key, etc. > In this release, the sort keys are stored along with the sort columns from which > the keys were originally created, resulting in some data storage > redundancy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
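The multi-key ordering described above (first sort column is the primary key, second is the secondary key, ascending only) can be sketched with a plain composite comparator. This is an illustrative stand-in, not Zebra's actual sort-key generator:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MultiKeySortDemo {
    // Sort rows ascending by (primary, secondary) key. This mirrors the
    // multi-key ordering described for Zebra, where the first sort column
    // dominates; it is NOT Zebra's actual key generator.
    static List<int[]> sortRows(List<int[]> rows) {
        List<int[]> out = new ArrayList<>(rows);
        out.sort(Comparator.<int[]>comparingInt(r -> r[0])
                           .thenComparingInt(r -> r[1]));
        return out;
    }

    public static void main(String[] args) {
        List<int[]> sorted = sortRows(Arrays.asList(
                new int[]{2, 1}, new int[]{1, 9}, new int[]{2, 0}));
        for (int[] r : sorted) {
            System.out.println(r[0] + "," + r[1]);
        }
        // Rows come back as (1,9), (2,0), (2,1): the primary key dominates
        // and the secondary key breaks ties, both ascending.
    }
}
```

A totally sorted table extends this idea across files: every TFile holds a contiguous, non-overlapping range of this same key order.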
Problem running Pig 0.6.0
Hi Pig team, I'm testing Zebra v2 and trying to run the Pig 0.6.0 jar that I got from Yan. However, I got the following error: Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) Is there any additional jar file that I need to include with Hadoop or Pig? Thanks~ -- Yiping Han y...@yahoo-inc.com US phone: +1(408)349-4403 Beijing phone: +86(10)8215-9357
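jline.ConsoleReaderInputStream is provided by the jline jar, which Pig's Grunt shell uses for console input; the core Pig jar does not bundle it, while the full pig.jar typically does. A hedged sketch of the usual fix, assuming a conventional install layout and jline version (both paths and the version number here are assumptions and may differ):

```shell
# Assumed locations; adjust PIG_HOME and the jline version to your install.
PIG_HOME=${PIG_HOME:-/opt/pig}
JLINE_JAR="$PIG_HOME/lib/jline-0.9.94.jar"
# Put jline ahead of the Pig jar so the class loader can resolve it.
export PIG_CLASSPATH="$JLINE_JAR:$PIG_CLASSPATH"
echo "$PIG_CLASSPATH"
```

Alternatively, launching through bin/pig, or using the complete pig.jar build (which usually bundles jline), avoids managing the classpath by hand.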
[jira] Commented: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773389#action_12773389 ] Ankur commented on PIG-958: --- > Can you explain this a little bit more - .. In the earlier patch (958.v3.patch), after moving the results from the task's current working directory, I was manually deleting the directory. This is to ensure that empty part files don't get moved to the final output directory. But doing so causes Hadoop to complain that it can no longer write to the task's output dir, and the task fails. > I saw compile errors while trying to run unit test: ... Did you compile pig.jar and run the core tests first? This creates the necessary classes and jar files on the local machine required by the contrib tests. On my local machine gan...@grainflydivide-dr:pig_trunk$ ant ... buildJar: [echo] svnString 830456 [jar] Building jar: /home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev-core.jar [jar] Building jar: /home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev.jar [copy] Copying 1 file to /home/gankur/eclipse/workspace/pig_trunk gan...@grainflydivide-dr:pig_trunk$ ant test ... test-core: [delete] Deleting directory /home/gankur/eclipse/workspace/pig_trunk/build/test/logs [mkdir] Created dir: /home/gankur/eclipse/workspace/pig_trunk/build/test/logs [junit] Running org.apache.pig.test.TestAdd [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.024 sec [junit] Running org.apache.pig.test.TestAlgebraicEval ... gan...@grainflydivide-dr:pig_trunk$ cd contrib/piggybank/java/ gan...@grainflydivide-dr:java$ ant test ... 
test: [echo] *** Running UDF tests *** [delete] Deleting directory /home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs [mkdir] Created dir: /home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs [junit] Running org.apache.pig.piggybank.test.evaluation.TestEvalString [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.15 sec [junit] Running org.apache.pig.piggybank.test.evaluation.TestMathUDF [junit] Tests run: 35, Failures: 0, Errors: 0, Time elapsed: 0.123 sec [junit] Running org.apache.pig.piggybank.test.evaluation.TestStat [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.114 sec [junit] Running org.apache.pig.piggybank.test.evaluation.datetime.TestDiffDate [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.105 sec [junit] Running org.apache.pig.piggybank.test.evaluation.decode.TestDecode [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.089 sec [junit] Running org.apache.pig.piggybank.test.evaluation.string.TestHashFNV [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.094 sec [junit] Running org.apache.pig.piggybank.test.evaluation.string.TestLookupInFiles [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 17.163 sec [junit] Running org.apache.pig.piggybank.test.evaluation.string.TestRegex [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.092 sec [junit] Running org.apache.pig.piggybank.test.evaluation.util.TestSearchQuery [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.093 sec [junit] Running org.apache.pig.piggybank.test.evaluation.util.TestTop [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.099 sec [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestDateExtractor [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.087 sec [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestHostExtractor [junit] Tests run: 2, Failures: 0, Errors: 0, 
Time elapsed: 0.083 sec [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchEngineExtractor [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.091 sec [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchTermExtractor [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.1 sec [junit] Running org.apache.pig.piggybank.test.storage.TestCombinedLogLoader [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.535 sec [junit] Running org.apache.pig.piggybank.test.storage.TestCommonLogLoader [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.54 sec [junit] Running org.apache.pig.piggybank.test.storage.TestHelper [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.014 sec [junit] Running org.apache.pig.piggybank.test.storage.TestMultiStorage [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 16.964 sec [junit] Running org.apache.pig.piggybank.test.storage.TestMyRegExLoader [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.452 sec [junit] Running org.apache
[jira] Updated: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-970: --- Attachment: Pig_HBase_0.20.0.patch Alan, I found the problem. Previously, in Eclipse I had set the output folder to build/classes, which conflicts with the output folder in build.xml, so it hid the problem. Now I have added one line to build.xml: {code} {code} so that the test case code can find hbase-site.xml on the classpath. > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, > zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: two-level access problem?
The twoLevelAccessRequired flag is not quite a long term solution to the problem. The problem is that we treat output of relations to be bags but their schemas do NOT have twoLevelAccessRequired to be true. Only bag constants and bags from input data have this flag set to true. We need to move to either *all* bag schemas having a tuple schema with the real schema which reflects the layout of the bag or think of an alternative. Implementing the solution may have many more details which will need to be looked at. This flag should be removed and should not be needed once we arrive at a solution. Otherwise Resource Schema would also need to have this notion of two level access for bag fields. Pradeep. -Original Message- From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] Sent: Tuesday, November 03, 2009 12:30 PM To: pig-dev@hadoop.apache.org Subject: Re: two-level access problem? Thanks Pradeep, I saw that comment. I guess my question is, given the solution this comment describes, what are you referring to in the Load/Store redesign doc when you say "we must fix the two level access issues with schema of bags in current schema before we make these changes, otherwise that same contagion will afflict us here?" -D On Tue, Nov 3, 2009 at 2:10 PM, Pradeep Kamath wrote: > From comments in Schema.java: > // In bags which have a schema with a tuple which contains > // the fields present in it, if we access the second field (say) > // we are actually trying to access the second field in the > // tuple in the bag. This is currently true for two cases: > // 1) bag constants - the schema of bag constant has a tuple > // which internally has the actual elements > // 2) When bags are loaded from input data, if the user > // specifies a schema with the "bag" type, he has to specify > // the bag as containing a tuple with the actual elements in > // the schema declaration. 
However in both the cases above, > // the user can still say b.i where b is the bag and i is > // an element in the bag's tuple schema. So in these cases, > // the access should translate to a lookup for "i" in the > // tuple schema present in the bag. To indicate this, the > // flag below is used. It is false by default because, > // currently we use bag as the type for relations. However > // the schema of a relation does NOT have a tuple fieldschema > // with items in it. Instead, the schema directly has the > // field schema of the items. So for a relation "b", the > // above b.i access would be a direct single level access > // of i in b's schema. This is treated as the "default" case > private boolean twoLevelAccessRequired = false; > > -Original Message- > From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] > Sent: Monday, November 02, 2009 5:33 PM > To: pig-dev@hadoop.apache.org > Subject: two-level access problem? > > Could someone explain the nature of the "two-level access problem" > referred to in the Load/Store redesign wiki and in the DataType code? > > > Thanks, > -D >
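The Schema.java comment quoted above can be condensed into a toy model. These are NOT Pig's real Schema classes, just an illustration of the lookup rule: for an access like b.i, the field either resolves inside the bag's inner tuple schema (twoLevelAccessRequired = true, as with bag constants and bags declared in load schemas) or directly in the bag's own field list (the default, where the bag stands for a relation):

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the "two-level access" rule; not Pig's actual implementation.
public class TwoLevelDemo {
    // Resolve the position of alias for an access like b.i:
    // - twoLevelAccessRequired == true: descend into the bag's tuple schema
    //   (bag constants, bags declared with a tuple in a load schema);
    // - twoLevelAccessRequired == false: look up alias directly in the bag's
    //   own field list (the relation case).
    static int fieldPosition(List<String> bagFields, boolean twoLevelAccessRequired,
                             List<String> tupleFields, String alias) {
        List<String> fields = twoLevelAccessRequired ? tupleFields : bagFields;
        return fields.indexOf(alias);
    }

    public static void main(String[] args) {
        // Bag constant: the bag's schema is one tuple holding the real fields.
        System.out.println(fieldPosition(Arrays.asList("t"), true,
                Arrays.asList("x", "i"), "i"));
        // Relation: the fields sit directly in the bag's schema.
        System.out.println(fieldPosition(Arrays.asList("x", "i"), false,
                null, "i"));
    }
}
```

Both lookups land on the same field, which is exactly why the flag is needed: the two schema layouts encode the same logical structure in different shapes.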
[jira] Updated: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-970: --- Attachment: (was: Pig_HBase_0.20.0.patch) > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, > zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773149#action_12773149 ] Ankit Modi commented on PIG-1036: - Also, the patch fixes two wrong error codes in {code}LogToPhyTranslationVisitor.updateWithEmptyBagCheck{code} {code} int errCode = 1109; // was 1105 String msg = "Input (" + joinInput.getAlias() + ") " + "on which outer join is desired should have a valid schema"; } catch (FrontendException e) { int errCode = 2104; // was 2014 {code} > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773297#action_12773297 ] Jeff Zhang commented on PIG-970: Yes. Alan, could you attach the whole log, including the task tracker logs? Thank you. > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Status: Open (was: Patch Available) > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773295#action_12773295 ] Pradeep Kamath commented on PIG-966: I have updated http://wiki.apache.org/pig/LoadStoreRedesignProposal with some changes in the interfaces and recorded the reasons in the "Changes" section at the bottom of the page. I have also cleaned up the topic a bit and added a few new sections giving details of the implementation so far on the branch. I have also added a list of remaining task items. Please review and provide comments - it would also be good to keep this topic up-to-date with changes discussed here and implemented on the branch. > Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces > --- > > Key: PIG-966 > URL: https://issues.apache.org/jira/browse/PIG-966 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Alan Gates >Assignee: Alan Gates > > I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces > significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for > full details -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Attachment: (was: LeftOuterFRJoin.patch) > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-997) [zebra] Sorted Table Support by Zebra
[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773192#action_12773192 ] Alan Gates commented on PIG-997: After applying this patch TestColumnSecurity fails. The output of the failed test is: Testsuite: org.apache.hadoop.zebra.types.TestColumnSecurity Tests run: 0, Failures: 0, Errors: 1, Time elapsed: 0.15 sec - Standard Output --- SUPERUSER NAME: gates - --- - Standard Error - log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration). log4j:WARN Please initialize the log4j system properly. - --- Testcase: org.apache.hadoop.zebra.types.TestColumnSecurity took 0 sec Caused an ERROR chmod: cannot access `/user/jing1234': No such file or directory org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access `/user/jing1234': No such file or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:195) at org.apache.hadoop.util.Shell.run(Shell.java:134) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286) at org.apache.hadoop.util.Shell.execCommand(Shell.java:354) at org.apache.hadoop.util.Shell.execCommand(Shell.java:337) at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:481) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:473) at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:280) at org.apache.hadoop.zebra.types.TestColumnSecurity.setUpOnce(TestColumnSecurity.java:105) > [zebra] Sorted Table Support by Zebra > - > > Key: PIG-997 > URL: https://issues.apache.org/jira/browse/PIG-997 > Project: Pig > Issue Type: New Feature >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.6.0 > > Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch > > > This new feature is for Zebra to support sorted data in storage. As a storage > library, Zebra will not sort the data by itself. 
But it will support creation > and use of sorted data either through PIG or through map/reduce tasks that > use Zebra as storage format. > The sorted table keeps the data in a "totally sorted" manner across all > TFiles created by potentially all mappers or reducers. > For sorted data creation through PIG's STORE operator , if the input data is > sorted through "ORDER BY", the new Zebra table will be marked as sorted on > the sorted columns; > For sorted data creation though Map/Reduce tasks, three new static methods > of the BasicTableOutput class will be provided to allow or help the user to > achieve the goal. "setSortInfo" allows the user to specify the sorted columns > of the input tuple to be stored; "getSortKeyGenerator" and "getSortKey" help > the user to generate the key acceptable by Zebra as a sorted key based upon > the schema, sorted columns and the input tuple. > For sorted data read through PIG's LOAD operator, pass string "sorted" as an > extra argument to the TableLoader constructor to ask for sorted table to be > loaded; > For sorted data read through Map/Reduce tasks, a new static method of > TableInputFormat class, requireSortedTable, can be called to ask for a sorted > table to be read. Additionally, an overloaded version of the new method can > be called to ask for a sorted table on specified sort columns and comparator. > For this release, sorted table only supported sorting in ascending order, not > in descending order. In addition, the sort keys must be of simple types not > complex types such as RECORD, COLLECTION and MAP. > Multiple-key sorting is supported. But the ordering of the multiple sort keys > is significant with the first sort column being the primary sort key, the > second being the secondary sort key, etc. > In this release, the sort keys are stored along with the sort columns where > the keys were originally created from, resulting in some data storage > redundancy. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1058: Resolution: Fixed Status: Resolved (was: Patch Available) patch committed. Thanks, Pradeep, for help with resolving one of the findbugs issues! > FINDBUGS: remaining "Correctness Warnings" > -- > > Key: PIG-1058 > URL: https://issues.apache.org/jira/browse/PIG-1058 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Attachments: PIG-1058.patch, PIG-1058_v2.patch > > > BCImpossible cast from java.lang.Object[] to java.lang.String[] in > org.apache.pig.PigServer.listPaths(String) > ECCall to equals() comparing different types in > org.apache.pig.impl.plan.Operator.equals(Object) > GCjava.lang.Byte is incompatible with expected argument type > java.lang.Integer in > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange) > ILThere is an apparent infinite recursive loop in > org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object) > INT Bad comparison of nonnegative value with -1 in > org.apache.tools.bzip2r.CBZip2InputStream.bsR(int) > INT Bad comparison of nonnegative value with -1 in > org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode() > INT Bad comparison of nonnegative value with -1 in > org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode() > MFField ConstantExpression.res masks field in superclass > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator > Nm > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit) > doesn't override method in superclass because parameter type > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit > doesn't match superclass parameter type > 
org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit > Nm > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit) > doesn't override method in superclass because parameter type > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit > doesn't match superclass parameter type > org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit > NPPossible null pointer dereference of ? in > org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List) > NPPossible null pointer dereference of lo in > org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List) > NPPossible null pointer dereference of > Schema$FieldSchema.Schema$FieldSchema.alias in > org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema, > boolean, boolean) > NPPossible null pointer dereference of Schema$FieldSchema.alias in > org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema, > Schema$FieldSchema, boolean, boolean) > NPPossible null pointer dereference of inp in > org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run() > RCN Nullcheck of pigContext at line 123 of value previously dereferenced in > org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext) > RV > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String, > Properties) ignores return value of java.net.InetAddress.getByName(String) > RVBad attempt to compute absolute value of signed 32-bit hashcode in > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.getPartition(PigNullableWritable, > Writable, int) > RVBad attempt to compute absolute value of signed 32-bit hashcode in > org.apache.pig.impl.plan.DotPlanDumper.getID(Operator) > UwF Field only ever set to null: > 
org.apache.pig.impl.builtin.MergeJoinIndexer.dummyTuple -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
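One of the RV warnings above, "bad attempt to compute absolute value of signed 32-bit hashcode", is worth illustrating: Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so abs(hash) % n can yield a negative partition index. A small sketch of the pitfall and the common sign-bit-masking fix (illustrative code, not the actual patch for SkewedPartitioner or DotPlanDumper):

```java
public class AbsHashDemo {
    // BUGGY pattern flagged by FindBugs: Math.abs(Integer.MIN_VALUE) is
    // still negative (two's complement has no positive counterpart for it),
    // so this can return a negative, out-of-range partition index.
    static int unsafePartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // Common fix: mask off the sign bit so the value is always non-negative.
    static int safePartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(unsafePartition(Integer.MIN_VALUE, 10)); // -8: out of range
        System.out.println(safePartition(Integer.MIN_VALUE, 10));   // 0: in range
    }
}
```

For any non-negative hash the two variants agree; they differ only on the one pathological input, which is exactly why the bug survives casual testing.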
[jira] Commented: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773309#action_12773309 ] Jeff Zhang commented on PIG-970: Alan, do you have the file hbase-site.xml in the test folder? (I put it in my patch.) Looking into the logs, I find that the map task is attempting to connect to ZooKeeper at port 2181, but the port of the MiniZooKeeperCluster is 21810. So there should be an hbase-site.xml in the test folder to override the configuration, just as is done in HBase trunk. > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, > zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
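An override of the kind described would be a minimal hbase-site.xml on the test classpath; the file Jeff actually attached is in the patch, so this is only an assumed sketch setting the one property in question:

```xml
<?xml version="1.0"?>
<!-- Hypothetical test/hbase-site.xml: points clients at the
     MiniZooKeeperCluster on port 21810 instead of the default 2181. -->
<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>21810</value>
  </property>
</configuration>
```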
[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra
[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-997: - Status: Open (was: Patch Available) The failure is due to a misplaced test in the nightly suite. I'm going to exclude that in the next patch. > [zebra] Sorted Table Support by Zebra > - > > Key: PIG-997 > URL: https://issues.apache.org/jira/browse/PIG-997 > Project: Pig > Issue Type: New Feature >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.6.0 > > Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch > > > This new feature is for Zebra to support sorted data in storage. As a storage > library, Zebra will not sort the data by itself. But it will support creation > and use of sorted data either through PIG or through map/reduce tasks that > use Zebra as storage format. > The sorted table keeps the data in a "totally sorted" manner across all > TFiles created by potentially all mappers or reducers. > For sorted data creation through PIG's STORE operator , if the input data is > sorted through "ORDER BY", the new Zebra table will be marked as sorted on > the sorted columns; > For sorted data creation though Map/Reduce tasks, three new static methods > of the BasicTableOutput class will be provided to allow or help the user to > achieve the goal. "setSortInfo" allows the user to specify the sorted columns > of the input tuple to be stored; "getSortKeyGenerator" and "getSortKey" help > the user to generate the key acceptable by Zebra as a sorted key based upon > the schema, sorted columns and the input tuple. > For sorted data read through PIG's LOAD operator, pass string "sorted" as an > extra argument to the TableLoader constructor to ask for sorted table to be > loaded; > For sorted data read through Map/Reduce tasks, a new static method of > TableInputFormat class, requireSortedTable, can be called to ask for a sorted > table to be read. 
Additionally, an overloaded version of the new method can > be called to ask for a sorted table on specified sort columns and comparator. > For this release, sorted table only supported sorting in ascending order, not > in descending order. In addition, the sort keys must be of simple types not > complex types such as RECORD, COLLECTION and MAP. > Multiple-key sorting is supported. But the ordering of the multiple sort keys > is significant with the first sort column being the primary sort key, the > second being the secondary sort key, etc. > In this release, the sort keys are stored along with the sort columns where > the keys were originally created from, resulting in some data storage > redundancy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773339#action_12773339 ] Jeff Zhang commented on PIG-970: Well, it's weird. Alan, could you check again that pig-0.6.0-dev-withouthadoop.jar contains the file hbase-site.xml, and that in this file hbase.zookeeper.property.clientPort is set to 21810? > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, > zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1036: Resolution: Fixed Fix Version/s: 0.6.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed, thanks Ankit! > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Fix For: 0.6.0 > > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1002) FINDBUGS: BC: Equals method should not assume anything about the type of its argument
[ https://issues.apache.org/jira/browse/PIG-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-1002. - Resolution: Fixed This has been addressed in other JIRAs. > FINDBUGS: BC: Equals method should not assume anything about the type of its > argument > -- > > Key: PIG-1002 > URL: https://issues.apache.org/jira/browse/PIG-1002 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Olga Natkovich > > BC: Equals method for org.apache.pig.builtin.PigStorage assumes the > argument is of type PigStorage > BC: Equals method for > org.apache.pig.impl.streaming.StreamingCommand$HandleSpec assumes the > argument is of type StreamingCommand$HandleSpec -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773314#action_12773314 ] Alan Gates commented on PIG-970: Yes, it's there. > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, > zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773181#action_12773181 ] Pradeep Kamath commented on PIG-958: bq. 2. Deleting the temporary directory manually in finish() causes the job to fail. Removed the manual deletion. As a side effect, the user-specified PARENT output directory in the UDF will have empty part-* files. These should be deleted manually by the user. Can you explain this a little more? It's been a while since I last looked at the code - there seems to be some mv and this deletion happening - if you can explain that part too, it would be helpful. Otherwise looks good. > Splitting output data on key field > -- > > Key: PIG-958 > URL: https://issues.apache.org/jira/browse/PIG-958 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Ankur > Attachments: 958.v3.patch, 958.v4.patch > > > Pig users often face the need to split the output records into a bunch of > files and directories depending on the type of record. Pig's SPLIT operator > is useful when record types are few and known in advance. In cases where the type > is not directly known but is derived dynamically from values of a key field > in the output tuple, a custom store function is a better solution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
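The dynamic, value-driven splitting the issue describes amounts to grouping tuples by the value of a key field before writing each group to its own file or directory. A toy Java sketch of that grouping step (hypothetical illustration, not the patch's actual MultiStorage code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyFieldSplitter {
    // Group output tuples by the value of a key field; a custom store function
    // would then write each group under a directory named after the key value.
    public static Map<String, List<String[]>> splitByKey(List<String[]> tuples, int keyIndex) {
        Map<String, List<String[]>> groups = new HashMap<>();
        for (String[] tuple : tuples) {
            groups.computeIfAbsent(tuple[keyIndex], k -> new ArrayList<>()).add(tuple);
        }
        return groups;
    }
}
```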
[jira] Updated: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1058: Status: Patch Available (was: Open) > FINDBUGS: remaining "Correctness Warnings" > -- > > Key: PIG-1058 > URL: https://issues.apache.org/jira/browse/PIG-1058 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Attachments: PIG-1058.patch, PIG-1058_v2.patch > > > BC: Impossible cast from java.lang.Object[] to java.lang.String[] in > org.apache.pig.PigServer.listPaths(String) > EC: Call to equals() comparing different types in > org.apache.pig.impl.plan.Operator.equals(Object) > GC: java.lang.Byte is incompatible with expected argument type > java.lang.Integer in > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange) > IL: There is an apparent infinite recursive loop in > org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object) > INT: Bad comparison of nonnegative value with -1 in > org.apache.tools.bzip2r.CBZip2InputStream.bsR(int) > INT: Bad comparison of nonnegative value with -1 in > org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode() > INT: Bad comparison of nonnegative value with -1 in > org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode() > MF: Field ConstantExpression.res masks field in superclass > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator > Nm: > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit) > doesn't override method in superclass because parameter type > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit > doesn't match superclass parameter type > org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit > Nm: > 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit) > doesn't override method in superclass because parameter type > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit > doesn't match superclass parameter type > org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit > NP: Possible null pointer dereference of ? in > org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List) > NP: Possible null pointer dereference of lo in > org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List) > NP: Possible null pointer dereference of > Schema$FieldSchema.Schema$FieldSchema.alias in > org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema, > boolean, boolean) > NP: Possible null pointer dereference of Schema$FieldSchema.alias in > org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema, > Schema$FieldSchema, boolean, boolean) > NP: Possible null pointer dereference of inp in > org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run() > RCN: Nullcheck of pigContext at line 123 of value previously dereferenced in > org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext) > RV: > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String, > Properties) ignores return value of java.net.InetAddress.getByName(String) > RV: Bad attempt to compute absolute value of signed 32-bit hashcode in > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.getPartition(PigNullableWritable, > Writable, int) > RV: Bad attempt to compute absolute value of signed 32-bit hashcode in > org.apache.pig.impl.plan.DotPlanDumper.getID(Operator) > UwF: Field only ever set to null: > org.apache.pig.impl.builtin.MergeJoinIndexer.dummyTuple -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
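One of the recurring warnings above, "RV: Bad attempt to compute absolute value of signed 32-bit hashcode", comes from a well-known overflow: Math.abs(Integer.MIN_VALUE) is still negative, so a partitioner built on it can return a negative index. A minimal standalone illustration (not the actual SkewedPartitioner code):

```java
public class HashcodeAbsDemo {
    // Buggy pattern flagged by FindBugs: when hashCode() returns
    // Integer.MIN_VALUE, Math.abs overflows and stays negative.
    public static int buggyPartition(Object key, int numPartitions) {
        return Math.abs(key.hashCode()) % numPartitions;
    }

    // Common fix: clear the sign bit before taking the modulus,
    // guaranteeing a non-negative partition index.
    public static int safePartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```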
[jira] Updated: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1058: Status: Open (was: Patch Available) > FINDBUGS: remaining "Correctness Warnings" > -- > > Key: PIG-1058 > URL: https://issues.apache.org/jira/browse/PIG-1058 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Attachments: PIG-1058.patch, PIG-1058_v2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1058: Attachment: PIG-1058_v2.patch Addressed unit test failures > FINDBUGS: remaining "Correctness Warnings" > -- > > Key: PIG-1058 > URL: https://issues.apache.org/jira/browse/PIG-1058 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Attachments: PIG-1058.patch, PIG-1058_v2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Pig 0.5.0 is released!
The Pig team is happy to announce the Pig 0.5.0 release! Pig is a Hadoop subproject that provides a high-level data-flow language and an execution framework for parallel computation on a Hadoop cluster. More details about Pig can be found at http://hadoop.apache.org/pig/. This release makes the functionality of Pig 0.4.0 available on Hadoop 0.20 clusters. The details of the release are available at http://hadoop.apache.org/pig/releases.html Olga
[jira] Commented: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773229#action_12773229 ] Pradeep Kamath commented on PIG-1036: - +1, will commit once hudson QA comes back. > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Status: Patch Available (was: Open) > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1071) Support comma separated file/directory names in load statements
[ https://issues.apache.org/jira/browse/PIG-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-1071: - Assignee: Richard Ding > Support comma separated file/directory names in load statements > --- > > Key: PIG-1071 > URL: https://issues.apache.org/jira/browse/PIG-1071 > Project: Pig > Issue Type: New Feature >Reporter: Richard Ding >Assignee: Richard Ding > > Currently Pig Latin supports the following LOAD syntax: > {code} > LOAD 'data' [USING loader function] [AS schema]; > {code} > where data is the name of the file or directory, including files specified > with Hadoop-supported globbing syntax. This name is passed to the loader > function. > This feature is to support loaders that can load multiple files from > different directories and allows users to pass in the file names in a > comma-separated string. > For example, these will be valid load statements: > {code} > LOAD '/usr/pig/test1/a,/usr/pig/test2/b' USING someloader(); > {code} > and > {code} > LOAD '/usr/pig/test1/{a,c},/usr/pig/test2/b' USING someloader(); > {code} > This comma-separated string is passed to the loader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1071) Support comma separated file/directory names in load statements
Support comma separated file/directory names in load statements --- Key: PIG-1071 URL: https://issues.apache.org/jira/browse/PIG-1071 Project: Pig Issue Type: New Feature Reporter: Richard Ding Currently Pig Latin supports the following LOAD syntax: {code} LOAD 'data' [USING loader function] [AS schema]; {code} where data is the name of the file or directory, including files specified with Hadoop-supported globbing syntax. This name is passed to the loader function. This feature is to support loaders that can load multiple files from different directories and allows users to pass in the file names in a comma-separated string. For example, these will be valid load statements: {code} LOAD '/usr/pig/test1/a,/usr/pig/test2/b' USING someloader(); {code} and {code} LOAD '/usr/pig/test1/{a,c},/usr/pig/test2/b' USING someloader(); {code} This comma-separated string is passed to the loader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
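Note that splitting such a location string cannot be a plain split on ',': in the second example, the commas inside the {a,c} glob group must not separate locations. A hedged sketch of a top-level-comma splitter (a hypothetical helper for illustration, not the patch's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class LocationSplitter {
    // Split a comma-separated location string on top-level commas only,
    // leaving commas inside Hadoop-style {a,c} glob groups intact.
    public static List<String> split(String locations) {
        List<String> out = new ArrayList<>();
        int depth = 0, start = 0;
        for (int i = 0; i < locations.length(); i++) {
            char c = locations.charAt(i);
            if (c == '{') depth++;
            else if (c == '}') depth--;
            else if (c == ',' && depth == 0) {  // comma outside any glob group
                out.add(locations.substring(start, i));
                start = i + 1;
            }
        }
        out.add(locations.substring(start));
        return out;
    }
}
```

With this rule, '/usr/pig/test1/{a,c},/usr/pig/test2/b' yields exactly two locations.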
[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra
[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-997: - Attachment: SortedTable.patch > [zebra] Sorted Table Support by Zebra > - > > Key: PIG-997 > URL: https://issues.apache.org/jira/browse/PIG-997 > Project: Pig > Issue Type: New Feature >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.6.0 > > Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch, > SortedTable.patch > > > This new feature is for Zebra to support sorted data in storage. As a storage > library, Zebra will not sort the data by itself, but it will support creation > and use of sorted data either through PIG or through map/reduce tasks that > use Zebra as the storage format. > The sorted table keeps the data in a "totally sorted" manner across all > TFiles created by potentially all mappers or reducers. > For sorted data creation through PIG's STORE operator, if the input data is > sorted through "ORDER BY", the new Zebra table will be marked as sorted on > the sorted columns; > For sorted data creation through Map/Reduce tasks, three new static methods > of the BasicTableOutput class will be provided to allow or help the user to > achieve the goal. "setSortInfo" allows the user to specify the sorted columns > of the input tuple to be stored; "getSortKeyGenerator" and "getSortKey" help > the user to generate a key acceptable to Zebra as a sorted key based upon > the schema, sorted columns and the input tuple. > For sorted data read through PIG's LOAD operator, pass the string "sorted" as an > extra argument to the TableLoader constructor to ask for a sorted table to be > loaded; > For sorted data read through Map/Reduce tasks, a new static method of > TableInputFormat class, requireSortedTable, can be called to ask for a sorted > table to be read. Additionally, an overloaded version of the new method can > be called to ask for a sorted table on specified sort columns and comparator. 
> For this release, sorted tables only support sorting in ascending order, not > in descending order. In addition, the sort keys must be of simple types, not > complex types such as RECORD, COLLECTION and MAP. > Multiple-key sorting is supported, but the ordering of the multiple sort keys > is significant, with the first sort column being the primary sort key, the > second being the secondary sort key, etc. > In this release, the sort keys are stored along with the sort columns from > which the keys were originally created, resulting in some data storage > redundancy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: two-level access problem?
From comments in Schema.java: // In bags which have a schema with a tuple which contains // the fields present in it, if we access the second field (say) // we are actually trying to access the second field in the // tuple in the bag. This is currently true for two cases: // 1) bag constants - the schema of bag constant has a tuple // which internally has the actual elements // 2) When bags are loaded from input data, if the user // specifies a schema with the "bag" type, he has to specify // the bag as containing a tuple with the actual elements in // the schema declaration. However in both the cases above, // the user can still say b.i where b is the bag and i is // an element in the bag's tuple schema. So in these cases, // the access should translate to a lookup for "i" in the // tuple schema present in the bag. To indicate this, the // flag below is used. It is false by default because, // currently we use bag as the type for relations. However // the schema of a relation does NOT have a tuple fieldschema // with items in it. Instead, the schema directly has the // field schema of the items. So for a relation "b", the // above b.i access would be a direct single level access // of i in b's schema. This is treated as the "default" case private boolean twoLevelAccessRequired = false; -----Original Message----- From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] Sent: Monday, November 02, 2009 5:33 PM To: pig-dev@hadoop.apache.org Subject: two-level access problem? Could someone explain the nature of the "two-level access problem" referred to in the Load/Store redesign wiki and in the DataType code? Thanks, -D
Re: two-level access problem?
Thanks Pradeep, I saw that comment. I guess my question is, given the solution this comment describes, what are you referring to in the Load/Store redesign doc when you say "we must fix the two level access issues with schema of bags in current schema before we make these changes, otherwise that same contagion will afflict us here?" -D On Tue, Nov 3, 2009 at 2:10 PM, Pradeep Kamath wrote: > From comments in Schema.java: > // In bags which have a schema with a tuple which contains > // the fields present in it, if we access the second field (say) > // we are actually trying to access the second field in the > // tuple in the bag. This is currently true for two cases: > // 1) bag constants - the schema of bag constant has a tuple > // which internally has the actual elements > // 2) When bags are loaded from input data, if the user > // specifies a schema with the "bag" type, he has to specify > // the bag as containing a tuple with the actual elements in > // the schema declaration. However in both the cases above, > // the user can still say b.i where b is the bag and i is > // an element in the bag's tuple schema. So in these cases, > // the access should translate to a lookup for "i" in the > // tuple schema present in the bag. To indicate this, the > // flag below is used. It is false by default because, > // currently we use bag as the type for relations. However > // the schema of a relation does NOT have a tuple fieldschema > // with items in it. Instead, the schema directly has the > // field schema of the items. So for a relation "b", the > // above b.i access would be a direct single level access > // of i in b's schema. This is treated as the "default" case > private boolean twoLevelAccessRequired = false; > > -Original Message- > From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] > Sent: Monday, November 02, 2009 5:33 PM > To: pig-dev@hadoop.apache.org > Subject: two-level access problem? 
> > Could someone explain the nature of the "two-level access problem" > referred to in the Load/Store redesign wiki and in the DataType code? > > > Thanks, > -D >
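The effect of the twoLevelAccessRequired flag can be modeled outside Pig with a toy lookup (a hypothetical illustration, not the real Schema class): when the flag is set, an alias in a "b.i" access resolves against the fields of the bag's single wrapping tuple; when it is clear (the relation case), it resolves directly against the bag's own field list.

```java
import java.util.List;

public class TwoLevelLookupDemo {
    // Resolve an alias as in the "b.i" access described above. For a bag
    // constant or a user-declared bag schema, the bag wraps one tuple whose
    // fields are the real targets (two-level access); for a relation, the
    // field schemas sit directly in the bag (single-level access).
    public static int resolve(List<String> bagFields, List<String> tupleFields,
                              boolean twoLevelAccessRequired, String alias) {
        List<String> target = twoLevelAccessRequired ? tupleFields : bagFields;
        return target.indexOf(alias);  // -1 if the alias is not found
    }
}
```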
[jira] Commented: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773184#action_12773184 ] Pradeep Kamath commented on PIG-958: I saw compile errors while trying to run unit test: {noformat} [..contrib/piggybank/java]ant test .. [javac] /homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:44: cannot find symbol [javac] symbol : variable MiniCluster [javac] location: class org.apache.pig.piggybank.test.storage.TestMultiStorage [javac] private MiniCluster cluster = MiniCluster.buildCluster(); [javac] ^ [javac] /homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:73: cannot find symbol [javac] symbol : variable Util [javac] location: class org.apache.pig.piggybank.test.storage.TestMultiStorage [javac] Util.deleteFile(cluster, INPUT_FILE); [javac] ^ [javac] /homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:74: cannot find symbol [javac] symbol : variable Util [javac] location: class org.apache.pig.piggybank.test.storage.TestMultiStorage [javac] Util.copyFromLocalToCluster(cluster, INPUT_FILE, INPUT_FILE); [javac] ^ [javac] /homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:96: cannot find symbol [javac] symbol : variable Util [javac] location: class org.apache.pig.piggybank.test.storage.TestMultiStorage [javac] Util.deleteFile(cluster, INPUT_FILE); [javac] ^ .. 
{noformat} > Splitting output data on key field > -- > > Key: PIG-958 > URL: https://issues.apache.org/jira/browse/PIG-958 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Ankur > Attachments: 958.v3.patch, 958.v4.patch > > > Pig users often face the need to split the output records into a bunch of > files and directories depending on the type of record. Pig's SPLIT operator > is useful when record types are few and known in advance. In cases where type > is not directly known but is derived dynamically from values of a key field > in the output tuple, a custom store function is a better solution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773348#action_12773348 ] Alan Gates commented on PIG-970: afterside:~/src/pig/PIG-970-3/trunk> jar tf pig-withouthadoop.jar | grep hbase org/apache/pig/backend/hadoop/hbase/ org/apache/pig/backend/hadoop/hbase/HBaseSlice.class org/apache/pig/backend/hadoop/hbase/HBaseStorage.class > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, > zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: LoadFunc.skipNext() function for faster sampling ?
We definitely want to avoid parsing every tuple when sampling. But do we need to implement a special function for it? Pig will have access to the InputFormat instance, correct? Can it not call InputFormat.getNext the desired number of times (which will not parse the tuple) and then call LoadFunc.getNext to get the next parsed tuple? Alan. On Nov 3, 2009, at 4:28 PM, Thejas Nair wrote: In the new implementation of SampleLoader subclasses (used by order-by, skew-join ..) as part of the loader redesign, we are not only reading all the input records but also parsing them as pig tuples. This is because the SampleLoaders are wrappers around the actual input loaders specified in the query. We can make things much faster by having a skipNext() function (or skipNext(int numSkip)) which will avoid parsing the record into a pig tuple. LoadFunc could optionally implement this (easy to implement) function (which will be part of an interface) for improving speed of queries such as order-by. -Thejas
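The proposed saving can be sketched with a toy loop (hypothetical; the real code would go through Pig's loader/InputFormat, not a String list): raw records are skipped cheaply, and only every k-th record pays the cost of being parsed into a tuple.

```java
import java.util.ArrayList;
import java.util.List;

public class SkippingSampler {
    // Sample every k-th record. Parsing is simulated by String.split; skipped
    // records are never turned into tuples, which is the whole point of
    // skipNext()-style sampling.
    public static List<String[]> sample(List<String> rawRecords, int k) {
        List<String[]> tuples = new ArrayList<>();
        for (int i = 0; i < rawRecords.size(); i++) {
            if (i % k == 0) {
                tuples.add(rawRecords.get(i).split(","));  // parse only this one
            }
            // all other records are advanced past without parsing
        }
        return tuples;
    }
}
```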
[jira] Commented: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773273#action_12773273 ] Hadoop QA commented on PIG-1036: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423944/LeftOuterFRJoin.patch against trunk revision 832086. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/137/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/137/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/137/console This message is automatically generated. > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-970: --- Attachment: test-output.tgz TEST-org.apache.pig.test.TestHBaseStorage.txt Test run results plus logs. > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, > zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1048) inner join using 'skewed' produces multiple rows for keys with single row in both input relations
[ https://issues.apache.org/jira/browse/PIG-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1277#action_1277 ] Alan Gates commented on PIG-1048: - When attempting to apply this patch to the 0.5 branch, I got the following error: Testcase: testSkewedJoinOneValue took 145.739 sec Caused an ERROR Unable to open iterator for alias E org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias E at org.apache.pig.PigServer.openIterator(PigServer.java:475) at org.apache.pig.test.TestSkewedJoin.testSkewedJoinOneValue(TestSkewedJoin.java:340) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: Error in configuring object at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:237) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:181) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773) at org.apache.pig.PigServer.store(PigServer.java:522) at org.apache.pig.PigServer.openIterator(PigServer.java:458) > inner join using 'skewed' produces multiple rows for keys with single row in > both input relations > - > > Key: PIG-1048 > URL: https://issues.apache.org/jira/browse/PIG-1048 > Project: Pig > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Sriranjan Manjunath > Fix For: 0.6.0 > > Attachments: pig_1048.patch > > > ${code} > grunt> cat students.txt > asdfxc M 23 12.44 > qwerF 21 14.44 > uhsdf M 34 12.11 > zxldf M 21 12.56 > qwerF 23 145.5 > oiueM 54 23.33 > l1 = load 'students.txt'; > l2 = load 'students.txt'; > j = join l1 by $0, l2 by $0 ; > store j into 
'tmp.txt' > grunt> cat tmp.txt > oiueM 54 23.33 oiueM 54 23.33 > oiueM 54 23.33 oiueM 54 23.33 > qwerF 21 14.44 qwerF 21 14.44 > qwerF 21 14.44 qwerF 23 145.5 > qwerF 23 145.5 qwerF 21 14.44 > qwerF 23 145.5 qwerF 23 145.5 > uhsdf M 34 12.11 uhsdf M 34 12.11 > uhsdf M 34 12.11 uhsdf M 34 12.11 > zxldf M 21 12.56 zxldf M 21 12.56 > zxldf M 21 12.56 zxldf M 21 12.56 > asdfxc M 23 12.44 asdfxc M 23 12.44 > asdfxc M 23 12.44 asdfxc M 23 12.44$ > ${code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
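For reference, the expected inner-join cardinality is (matches on the left) × (matches on the right) per key, so a key appearing once in each input should yield exactly one output row — not the duplicated rows shown above. A minimal plain-Java hash join illustrating the expected semantics (this is an illustration of join cardinality only, not Pig's skewed-join implementation; the String[] row layout is hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal hash inner join on column 0: each key contributes exactly
// (left rows with key) x (right rows with key) output rows.
public class HashJoin {
    static List<String[]> join(List<String[]> left, List<String[]> right) {
        // Index the right side by key.
        Map<String, List<String[]>> byKey = new HashMap<>();
        for (String[] r : right) {
            byKey.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        // Probe with the left side; concatenate matching rows.
        List<String[]> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : byKey.getOrDefault(l[0], List.of())) {
                String[] row = new String[l.length + r.length];
                System.arraycopy(l, 0, row, 0, l.length);
                System.arraycopy(r, 0, row, l.length, r.length);
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> a = List.of(new String[]{"oiue", "M", "54"});
        List<String[]> b = List.of(new String[]{"oiue", "M", "54"});
        // One row per side for key "oiue" -> exactly one joined row.
        System.out.println(join(a, b).size()); // 1
    }
}
```

Any implementation that emits more than one row here — as the skewed join in the report does — is duplicating work across partitions for the same key.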
[jira] Commented: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773103#action_12773103 ] Alan Gates commented on PIG-970: When I run TestHBaseStorage now I get: Testcase: testLoadFromHBase took 592.908 sec Caused an ERROR Unable to open iterator for alias a org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias a at org.apache.pig.PigServer.openIterator(PigServer.java:481) at org.apache.pig.test.TestHBaseStorage.testLoadFromHBase(TestHBaseStorage.java:170) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error. at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:922) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:573) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:686) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:582) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:686) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:549) at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:125) at org.apache.pig.backend.hadoop.hbase.HBaseSlice.init(HBaseSlice.java:159) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper.makeReader(SliceWrapper.java:129) at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getRecordReader(PigInputFormat.java:258) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region Let me know if you'd like to see the whole log. > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Attachment: LeftOuterFRJoin.patch Attaching a new patch. The join now supports only a two-way left join. The join requires that a schema be present on the right side; it is used to determine the number of null fields/columns in the nullTuple. As it is a two-way join, we use a single nullBag instead of an array of nullBags. A DataBag is used instead of a Tuple to maintain consistency with the result type of ConstantExpression. > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
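The null-padding behavior the patch description refers to — an unmatched left row is emitted joined to a tuple of nulls whose width comes from the right side's schema — can be sketched in plain Java. This is a minimal illustration of the semantics, not the FR-join code itself; the fixed `rightWidth` parameter stands in for the schema-derived null tuple, and the String[] row layout is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of left outer join null-padding: unmatched left rows are joined
// to an all-null row whose width is derived from the right side's schema.
public class LeftOuterJoin {
    static List<String[]> leftJoin(List<String[]> left, List<String[]> right,
                                   int rightWidth) {
        // Index the (replicated) right side by key, as an FR join would.
        Map<String, List<String[]>> byKey = new HashMap<>();
        for (String[] r : right) {
            byKey.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        // The all-null row: this plays the role of the schema-sized nullTuple.
        String[] nullRow = new String[rightWidth];
        List<String[]> out = new ArrayList<>();
        for (String[] l : left) {
            // No match -> substitute the single null row (cf. the one nullBag).
            List<String[]> matches = byKey.getOrDefault(l[0], List.of(nullRow));
            for (String[] r : matches) {
                String[] row = Arrays.copyOf(l, l.length + rightWidth);
                System.arraycopy(r, 0, row, l.length, rightWidth);
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = List.of(new String[]{"k1", "v"}, new String[]{"k2", "w"});
        List<String[]> right = List.of(new String[]{"k1", "x"});
        List<String[]> out = leftJoin(left, right, 2);
        System.out.println(out.size());    // 2: one matched row, one null-padded
        System.out.println(out.get(1)[2]); // null: unmatched key k2
    }
}
```

This is why the schema requirement is mandatory on the right side: without it, there is no way to know how many null columns to emit for a non-matching left row.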
LoadFunc.skipNext() function for faster sampling ?
In the new implementation of SampleLoader subclasses (used by order-by, skew-join ..) as part of the loader redesign, we are not only reading all the records input but also parsing them as pig tuples. This is because the SampleLoaders are wrappers around the actual input loaders specified in the query. We can make things much faster by having a skipNext() function (or skipNext(int numSkip) ) which will avoid parsing the record into a pig tuple. LoadFunc could optionally implement this (easy to implement) function (which will be part of an interface) for improving speed of queries such as order-by. -Thejas
[jira] Commented: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773341#action_12773341 ] Hadoop QA commented on PIG-1058: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423961/PIG-1058_v2.patch against trunk revision 832086. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/39/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/39/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/39/console This message is automatically generated. 
> FINDBUGS: remaining "Correctness Warnings" > -- > > Key: PIG-1058 > URL: https://issues.apache.org/jira/browse/PIG-1058 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Attachments: PIG-1058.patch, PIG-1058_v2.patch > > > BC Impossible cast from java.lang.Object[] to java.lang.String[] in > org.apache.pig.PigServer.listPaths(String) > EC Call to equals() comparing different types in > org.apache.pig.impl.plan.Operator.equals(Object) > GC java.lang.Byte is incompatible with expected argument type > java.lang.Integer in > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange) > IL There is an apparent infinite recursive loop in > org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object) > INT Bad comparison of nonnegative value with -1 in > org.apache.tools.bzip2r.CBZip2InputStream.bsR(int) > INT Bad comparison of nonnegative value with -1 in > org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode() > INT Bad comparison of nonnegative value with -1 in > org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode() > MF Field ConstantExpression.res masks field in superclass > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator > Nm > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit) > doesn't override method in superclass because parameter type > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit > doesn't match superclass parameter type > org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit > Nm > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit) > doesn't override method in superclass because parameter type >
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit > doesn't match superclass parameter type > org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit > NP Possible null pointer dereference of ? in > org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List) > NP Possible null pointer dereference of lo in > org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List) > NP Possible null pointer dereference of > Schema$FieldSchema.Schema$FieldSchema.alias in > org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema, > boolean, boolean) > NP Possible null pointer dereference of Schema$FieldSchema.alias in > org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema, > Schema$FieldSchema, boolean, boolean) > NP Possible null pointer dereference of inp in > org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run() > RCN Nullcheck of pigContext at line 123 of value previously dereferenced in > org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext) > RV > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String, > Properties) ignores return value of java.net.InetAddress.getByName(String) > RV Bad a
[jira] Updated: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-970: --- Status: Open (was: Patch Available) > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT >Assignee: Jeff Zhang > Fix For: 0.5.0 > > Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, > pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, > Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, > TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, > zookeeper-hbase-1329.jar > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: LoadFunc.skipNext() function for faster sampling ?
Yes, that should work. I will use InputFormat.getNext from the SampleLoader to skip the records. Thanks, Thejas On 11/3/09 6:39 PM, "Alan Gates" wrote: > We definitely want to avoid parsing every tuple when sampling. But do > we need to implement a special function for it? Pig will have access > to the InputFormat instance, correct? Can it not call > InputFormat.getNext the desired number of times (which will not parse > the tuple) and then call LoadFunc.getNext to get the next parsed tuple? > > Alan. > > On Nov 3, 2009, at 4:28 PM, Thejas Nair wrote: > >> In the new implementation of SampleLoader subclasses (used by order- >> by, >> skew-join ..) as part of the loader redesign, we are not only >> reading all >> the records input but also parsing them as pig tuples. >> >> This is because the SampleLoaders are wrappers around the actual input >> loaders specified in the query. We can make things much faster by >> having a >> skipNext() function (or skipNext(int numSkip) ) which will avoid >> parsing the >> record into a pig tuple. >> LoadFunc could optionally implement this (easy to implement) >> function (which >> will be part of an interface) for improving speed of queries such as >> order-by. >> >> -Thejas >> >