[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra

2009-11-03 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-997:
-

Status: Patch Available  (was: Open)

> [zebra] Sorted Table Support by Zebra
> -
>
> Key: PIG-997
> URL: https://issues.apache.org/jira/browse/PIG-997
> Project: Pig
>  Issue Type: New Feature
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch, 
> SortedTable.patch
>
>
> This new feature is for Zebra to support sorted data in storage. As a storage 
> library, Zebra will not sort the data by itself, but it will support creation 
> and use of sorted data either through Pig or through map/reduce tasks that 
> use Zebra as the storage format.
> The sorted table keeps the data in a "totally sorted" manner across all 
> TFiles created by potentially all mappers or reducers.
> For sorted data creation through Pig's STORE operator, if the input data is 
> sorted through "ORDER BY", the new Zebra table will be marked as sorted on 
> the sorted columns.
> For sorted data creation through map/reduce tasks, three new static methods 
> of the BasicTableOutput class will be provided to help the user achieve this: 
> "setSortInfo" allows the user to specify the sorted columns of the input 
> tuple to be stored; "getSortKeyGenerator" and "getSortKey" help the user 
> generate a key acceptable to Zebra as a sort key, based upon the schema, 
> sorted columns and the input tuple.
> For sorted data read through Pig's LOAD operator, pass the string "sorted" as 
> an extra argument to the TableLoader constructor to ask for the table to be 
> loaded as sorted.
> For sorted data read through map/reduce tasks, a new static method of the 
> TableInputFormat class, requireSortedTable, can be called to ask for a sorted 
> table to be read. Additionally, an overloaded version of this method can be 
> called to ask for a sorted table on specified sort columns and comparator.
> In this release, sorted tables support sorting only in ascending order, not 
> in descending order. In addition, the sort keys must be of simple types, not 
> complex types such as RECORD, COLLECTION and MAP.
> Multiple-key sorting is supported, but the ordering of the sort keys is 
> significant, with the first sort column being the primary sort key, the 
> second being the secondary sort key, and so on.
> In this release, the sort keys are stored along with the sort columns from 
> which they were originally created, resulting in some data storage 
> redundancy.
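The multiple-key ordering described above (primary sort column first, then secondary, ascending only) has the semantics of a chained comparator. A standalone Java sketch of those semantics, not Zebra's actual key generator:

```java
import java.util.*;

public class MultiKeySortDemo {
    public static void main(String[] args) {
        // Rows of (primary, secondary) sort columns; the first column is the
        // primary sort key, the second the secondary, ascending order only.
        List<int[]> rows = new ArrayList<>(Arrays.asList(
            new int[]{2, 1}, new int[]{1, 9}, new int[]{2, 0}, new int[]{1, 3}));
        rows.sort(Comparator.<int[]>comparingInt(r -> r[0])   // primary key
                            .thenComparingInt(r -> r[1]));    // secondary key
        for (int[] r : rows)
            System.out.println(r[0] + "," + r[1]);
        // Sorted order: (1,3) (1,9) (2,0) (2,1)
    }
}
```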

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Problem running Pig 0.60

2009-11-03 Thread Yiping Han
Hi pig team,

I'm testing Zebra v2 and trying to run the Pig 0.60 jar that I got from Yan.
However, I got the following error:

Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)

Is there any additional jar file that I need to include with Hadoop or pig?


Thanks~
--
Yiping Han
y...@yahoo-inc.com
US phone: +1(408)349-4403
Beijing phone: +86(10)8215-9357 



[jira] Commented: (PIG-958) Splitting output data on key field

2009-11-03 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773389#action_12773389
 ] 

Ankur commented on PIG-958:
---

> Can you explain this a little bit more - ..
In the earlier patch (958.v3.patch), after moving the results from the task's 
current working directory, I was manually deleting the directory. This was to 
ensure that empty part files don't get moved to the final output directory. But 
doing so causes Hadoop to complain that it can no longer write to the task's 
output dir, and the task fails.
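The approach described here (promote only non-empty part files and leave the task's working directory in place) can be sketched as follows. Names and layout are hypothetical, not the actual 958 patch:

```java
import java.io.File;

public class PromoteParts {
    // A part file is promoted only if it actually contains data.
    static boolean shouldPromote(String name, long length) {
        return name.startsWith("part-") && length > 0;
    }

    static int promote(File taskDir, File finalDir) {
        File[] files = taskDir.listFiles();
        if (files == null) return 0;
        finalDir.mkdirs();
        int moved = 0;
        for (File f : files) {
            // Empty part files are left behind, and the task dir itself is
            // never deleted, so the framework can still write to it.
            if (shouldPromote(f.getName(), f.length())
                    && f.renameTo(new File(finalDir, f.getName()))) {
                moved++;
            }
        }
        return moved;
    }
}
```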

> I saw compile errors while trying to run unit test: ...
Did you compile pig.jar and run the core tests first? This creates the 
necessary classes and jar files on the local machine required by the contrib 
tests.

On my local machine
gan...@grainflydivide-dr:pig_trunk$ ant 
...
buildJar:
 [echo] svnString 830456
  [jar] Building jar: 
/home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev-core.jar
  [jar] Building jar: 
/home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev.jar
 [copy] Copying 1 file to /home/gankur/eclipse/workspace/pig_trunk

gan...@grainflydivide-dr:pig_trunk$ ant test
...
test-core:
   [delete] Deleting directory 
/home/gankur/eclipse/workspace/pig_trunk/build/test/logs
[mkdir] Created dir: 
/home/gankur/eclipse/workspace/pig_trunk/build/test/logs
[junit] Running org.apache.pig.test.TestAdd
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.024 sec
[junit] Running org.apache.pig.test.TestAlgebraicEval
...
gan...@grainflydivide-dr:pig_trunk$ cd contrib/piggybank/java/
gan...@grainflydivide-dr:java$ ant test
...
test:
 [echo]  *** Running UDF tests ***
   [delete] Deleting directory 
/home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs
[mkdir] Created dir: 
/home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs
[junit] Running org.apache.pig.piggybank.test.evaluation.TestEvalString
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.15 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.TestMathUDF
[junit] Tests run: 35, Failures: 0, Errors: 0, Time elapsed: 0.123 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.TestStat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.114 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.datetime.TestDiffDate
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.105 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.decode.TestDecode
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.089 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.string.TestHashFNV
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.094 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.string.TestLookupInFiles
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 17.163 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.string.TestRegex
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.092 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.TestSearchQuery
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.093 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.util.TestTop
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.099 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestDateExtractor
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.087 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestHostExtractor
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.083 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchEngineExtractor
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.091 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchTermExtractor
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.1 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestCombinedLogLoader
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.535 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestCommonLogLoader
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.54 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestHelper
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.014 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestMultiStorage
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 16.964 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestMyRegExLoader
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.452 sec
[junit] Running org.apache

[jira] Updated: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-970:
---

Attachment: Pig_HBase_0.20.0.patch

Alan, I found the problem. Previously, in Eclipse, I had set the output folder 
to build/classes, which conflicted with the output folder in build.xml, so it 
hid the problem.

Now I have added one line in build.xml:
 {code} {code}

so that the test case code can find hbase-site.xml on the classpath.



> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, 
> zookeeper-hbase-1329.jar
>
>
> The support of HBase is currently very limited and restricted to HBase 0.18.0.
> Because the next releases of PIG will support Hadoop 0.20.0, they should also 
> support HBase 0.20.0.




RE: two-level access problem?

2009-11-03 Thread Pradeep Kamath
The twoLevelAccessRequired flag is not quite a long-term solution to the 
problem. The problem is that we treat the output of relations as bags, but their 
schemas do NOT have twoLevelAccessRequired set to true. Only bag constants and 
bags from input data have this flag set to true. We need to move to either 
*all* bag schemas having a tuple schema whose schema reflects the real layout 
of the bag, or think of an alternative. Implementing the solution may involve 
many more details that will need to be looked at. This flag should be removed 
and should not be needed once we arrive at a solution; otherwise ResourceSchema 
would also need this notion of two-level access for bag fields.

Pradeep.

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Tuesday, November 03, 2009 12:30 PM
To: pig-dev@hadoop.apache.org
Subject: Re: two-level access problem?

Thanks Pradeep,
I saw that comment. I guess my question is, given the solution this
comment describes, what are you referring to in the Load/Store
redesign doc when you say "we must fix the two level access issues
with schema of bags in current schema before we make these changes,
otherwise that same contagion will afflict us here?"

-D

On Tue, Nov 3, 2009 at 2:10 PM, Pradeep Kamath  wrote:
> From comments in Schema.java:
>    // In bags which have a schema with a tuple which contains
>    // the fields present in it, if we access the second field (say)
>    // we are actually trying to access the second field in the
>    // tuple in the bag. This is currently true for two cases:
>    // 1) bag constants - the schema of bag constant has a tuple
>    // which internally has the actual elements
>    // 2) When bags are loaded from input data, if the user
>    // specifies a schema with the "bag" type, he has to specify
>    // the bag as containing a tuple with the actual elements in
>    // the schema declaration. However in both the cases above,
>    // the user can still say b.i where b is the bag and i is
>    // an element in the bag's tuple schema. So in these cases,
>    // the access should translate to a lookup for "i" in the
>    // tuple schema present in the bag. To indicate this, the
>    // flag below is used. It is false by default because,
>    // currently we use bag as the type for relations. However
>    // the schema of a relation does NOT have a tuple fieldschema
>    // with items in it. Instead, the schema directly has the
>    // field schema of the items. So for a relation "b", the
>    // above b.i access would be a direct single level access
>    // of i in b's schema. This is treated as the "default" case
>    private boolean twoLevelAccessRequired = false;
>
> -Original Message-
> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> Sent: Monday, November 02, 2009 5:33 PM
> To: pig-dev@hadoop.apache.org
> Subject: two-level access problem?
>
> Could someone explain the nature of the "two-level access problem"
> referred to in the Load/Store redesign wiki and in the DataType code?
>
>
> Thanks,
> -D
>
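The lookup behavior the Schema.java comment describes can be sketched with a toy model: when twoLevelAccessRequired is true, a b.i access must descend through the bag's single inner tuple; for relations, the field is found directly in the bag's schema. All names here are illustrative, not Pig's actual Schema API:

```java
import java.util.*;

public class TwoLevelAccessDemo {
    static class FieldSchema {
        final String alias;
        final List<FieldSchema> children;
        FieldSchema(String alias, FieldSchema... children) {
            this.alias = alias;
            this.children = Arrays.asList(children);
        }
    }

    static boolean resolve(List<FieldSchema> bagSchema, String field,
                           boolean twoLevelAccessRequired) {
        List<FieldSchema> searchIn = bagSchema;
        if (twoLevelAccessRequired) {
            // Descend into the single tuple that wraps the real items.
            searchIn = bagSchema.get(0).children;
        }
        for (FieldSchema fs : searchIn)
            if (fs.alias.equals(field)) return true;
        return false;
    }

    public static void main(String[] args) {
        // Declared bag b: bag{t: tuple(i, j)} -> two-level access needed
        List<FieldSchema> declaredBag = Arrays.asList(
            new FieldSchema("t", new FieldSchema("i"), new FieldSchema("j")));
        // Relation "b" with fields i, j listed directly -> single-level access
        List<FieldSchema> relation = Arrays.asList(
            new FieldSchema("i"), new FieldSchema("j"));

        System.out.println(resolve(declaredBag, "i", true));   // true
        System.out.println(resolve(relation, "i", false));     // true
        System.out.println(resolve(declaredBag, "i", false));  // false
    }
}
```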


[jira] Updated: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-970:
---

Attachment: (was: Pig_HBase_0.20.0.patch)

> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, 
> zookeeper-hbase-1329.jar
>
>




[jira] Commented: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773149#action_12773149
 ] 

Ankit Modi commented on PIG-1036:
-

Also, the patch fixes two wrong error codes in 
{code}LogToPhyTranslationVisitor.updateWithEmptyBagCheck{code}:

{code}
int errCode = 1109;  // was 1105
String msg = "Input (" + joinInput.getAlias() + ") " +
    "on which outer join is desired should have a valid schema";

} catch (FrontendException e) {
    int errCode = 2104;  // was 2014
{code}

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773297#action_12773297
 ] 

Jeff Zhang commented on PIG-970:


Yes. Alan, could you attach the whole log, including the task tracker logs?
Thank you.

> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, zookeeper-hbase-1329.jar
>
>




[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Open  (was: Patch Available)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces

2009-11-03 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773295#action_12773295
 ] 

Pradeep Kamath commented on PIG-966:


I have updated http://wiki.apache.org/pig/LoadStoreRedesignProposal with some 
changes to the interfaces and recorded the reasons in the "Changes" section at 
the bottom of the page. I have also cleaned up the topic a bit, added a few new 
sections giving details of the implementation so far on the branch, and added a 
list of remaining task items. Please review and provide comments; it would also 
be good to keep this topic up-to-date with changes discussed here and 
implemented on the branch.

> Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
> ---
>
> Key: PIG-966
> URL: https://issues.apache.org/jira/browse/PIG-966
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces 
> significantly.  See http://wiki.apache.org/pig/LoadStoreRedesignProposal for 
> full details




[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: (was: LeftOuterFRJoin.patch)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
>





[jira] Commented: (PIG-997) [zebra] Sorted Table Support by Zebra

2009-11-03 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773192#action_12773192
 ] 

Alan Gates commented on PIG-997:


After applying this patch TestColumnSecurity fails.  The output of the failed 
test is:

Testsuite: org.apache.hadoop.zebra.types.TestColumnSecurity
Tests run: 0, Failures: 0, Errors: 1, Time elapsed: 0.15 sec
- Standard Output ---
SUPERUSER NAME: gates
-  ---
- Standard Error -
log4j:WARN No appenders could be found for logger 
(org.apache.hadoop.conf.Configuration).
log4j:WARN Please initialize the log4j system properly.
-  ---

Testcase: org.apache.hadoop.zebra.types.TestColumnSecurity took 0 sec
Caused an ERROR
chmod: cannot access `/user/jing1234': No such file or directory

org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access 
`/user/jing1234': No such file or directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:195)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:354)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:337)
at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:481)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:473)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:280)
at org.apache.hadoop.zebra.types.TestColumnSecurity.setUpOnce(TestColumnSecurity.java:105)

> [zebra] Sorted Table Support by Zebra
> -
>
> Key: PIG-997
> URL: https://issues.apache.org/jira/browse/PIG-997
> Project: Pig
>  Issue Type: New Feature
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch
>
>




[jira] Updated: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"

2009-11-03 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1058:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed. Thanks, Pradeep, for help with resolving one of the FindBugs 
issues!

> FINDBUGS: remaining "Correctness Warnings"
> --
>
> Key: PIG-1058
> URL: https://issues.apache.org/jira/browse/PIG-1058
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1058.patch, PIG-1058_v2.patch
>
>
> BC    Impossible cast from java.lang.Object[] to java.lang.String[] in 
> org.apache.pig.PigServer.listPaths(String)
> EC    Call to equals() comparing different types in 
> org.apache.pig.impl.plan.Operator.equals(Object)
> GC    java.lang.Byte is incompatible with expected argument type 
> java.lang.Integer in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange)
> IL    There is an apparent infinite recursive loop in 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.bsR(int)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> MF    Field ConstantExpression.res masks field in superclass 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
> Nm    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit) 
> doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit 
> doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> Nm    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit) 
> doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit 
> doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> NP    Possible null pointer dereference of ? in 
> org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List)
> NP    Possible null pointer dereference of lo in 
> org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List)
> NP    Possible null pointer dereference of 
> Schema$FieldSchema.Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema, 
> boolean, boolean)
> NP    Possible null pointer dereference of Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema, 
> Schema$FieldSchema, boolean, boolean)
> NP    Possible null pointer dereference of inp in 
> org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run()
> RCN   Nullcheck of pigContext at line 123 of value previously dereferenced in 
> org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext)
> RV    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String, 
> Properties) ignores return value of java.net.InetAddress.getByName(String)
> RV    Bad attempt to compute absolute value of signed 32-bit hashcode in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.getPartition(PigNullableWritable, 
> Writable, int)
> RV    Bad attempt to compute absolute value of signed 32-bit hashcode in 
> org.apache.pig.impl.plan.DotPlanDumper.getID(Operator)
> UwF   Field only ever set to null: 
> org.apache.pig.impl.builtin.MergeJoinIndexer.dummyTuple
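The two RV "absolute value of signed 32-bit hashcode" warnings above refer to a well-known overflow: Math.abs(Integer.MIN_VALUE) is still negative, so a partition index computed from it can be out of range. A minimal standalone illustration (not the committed fix):

```java
public class AbsHashDemo {
    // Math.abs(Integer.MIN_VALUE) overflows back to Integer.MIN_VALUE, so the
    // "absolute value of a hashcode" pattern can yield a negative partition.
    static int unsafePartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // Masking off the sign bit always yields a nonnegative value.
    static int safePartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(unsafePartition(Integer.MIN_VALUE, 7)); // negative
        System.out.println(safePartition(Integer.MIN_VALUE, 7));   // in [0, 7)
    }
}
```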




[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773309#action_12773309
 ] 

Jeff Zhang commented on PIG-970:


Alan, do you have the file hbase-site.xml in the test folder? (I put it in my 
patch.)

Because I looked into the logs and found that the map task is attempting to 
connect to ZooKeeper at port 2181, but the port of the MiniZookeeperCluster is 
21810. So there should be an hbase-site.xml in the test folder to override the 
configuration, just as is done in HBase trunk.
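The override described here would be a small hbase-site.xml fragment along these lines (a sketch; hbase.zookeeper.property.clientPort is the standard HBase property name, and the value matches the MiniZookeeperCluster port mentioned above):

```xml
<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>21810</value>
  </property>
</configuration>
```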



> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, 
> zookeeper-hbase-1329.jar
>
>




[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra

2009-11-03 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-997:
-

Status: Open  (was: Patch Available)

The failure is due to a misplaced test in the nightly suite. I'm going to 
exclude it in the next patch.

> [zebra] Sorted Table Support by Zebra
> -
>
> Key: PIG-997
> URL: https://issues.apache.org/jira/browse/PIG-997
> Project: Pig
>  Issue Type: New Feature
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch
>
>
> This new feature is for Zebra to support sorted data in storage. As a storage 
> library, Zebra will not sort the data by itself. But it will support creation 
> and use of sorted data either through PIG  or through map/reduce tasks that 
> use Zebra as storage format.
> The sorted table keeps the data in a "totally sorted" manner across all 
> TFiles created by potentially all mappers or reducers.
> For sorted data creation through PIG's STORE operator, if the input data is 
> sorted through "ORDER BY", the new Zebra table will be marked as sorted on 
> the sorted columns.
> For sorted data creation through Map/Reduce tasks, three new static methods 
> of the BasicTableOutput class will be provided to help the user achieve the 
> goal. "setSortInfo" allows the user to specify the sorted columns of the 
> input tuple to be stored; "getSortKeyGenerator" and "getSortKey" help the 
> user to generate a key acceptable to Zebra as a sort key, based upon the 
> schema, the sorted columns and the input tuple.
> For sorted data read through PIG's LOAD operator, pass the string "sorted" 
> as an extra argument to the TableLoader constructor to ask for the table to 
> be loaded as sorted.
> For sorted data read through Map/Reduce tasks, a new static method of the 
> TableInputFormat class, requireSortedTable, can be called to ask for a sorted 
> table to be read. Additionally, an overloaded version of the new method can 
> be called to ask for a sorted table on specified sort columns and comparator.
> In this release, sorted tables only support sorting in ascending order, not 
> in descending order. In addition, the sort keys must be of simple types, not 
> complex types such as RECORD, COLLECTION and MAP. 
> Multiple-key sorting is supported, but the ordering of the multiple sort keys 
> is significant, with the first sort column being the primary sort key, the 
> second being the secondary sort key, etc.
> In this release, the sort keys are stored along with the sort columns from 
> which they were originally created, resulting in some data storage 
> redundancy.
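For illustration, creating and reading a sorted Zebra table from Pig might look roughly like the sketch below. The table paths are hypothetical, and the exact storer/loader class names and constructor arguments are assumptions to be checked against the attached patch:

{code}
-- store data sorted by ORDER BY; the resulting Zebra table is marked
-- as sorted on the "name" column
A = LOAD '/user/pig/input' AS (name:chararray, cnt:int);
B = ORDER A BY name;
STORE B INTO '/user/pig/sorted_table'
    USING org.apache.hadoop.zebra.pig.TableStorer('');

-- pass the extra "sorted" argument to the loader to ask for the
-- table to be loaded as a sorted table
S = LOAD '/user/pig/sorted_table'
    USING org.apache.hadoop.zebra.pig.TableLoader('name, cnt', 'sorted');
{code}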

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773339#action_12773339
 ] 

Jeff Zhang commented on PIG-970:


Well, it's weird.

Alan, could you check again that pig-0.6.0-dev-withouthadoop.jar contains the 
file hbase-site.xml, and that in this file 
hbase.zookeeper.property.clientPort is set to 21810?



> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, 
> zookeeper-hbase-1329.jar
>
>
> Support for HBase is currently very limited and restricted to HBase 0.18.0.
> Because the next releases of PIG will support Hadoop 0.20.0, they should also 
> support HBase 0.20.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1036:


   Resolution: Fixed
Fix Version/s: 0.6.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Patch committed, thanks Ankit!

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Fix For: 0.6.0
>
> Attachments: LeftOuterFRJoin.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1002) FINDBUGS: BC: Equals method should not assume anything about the type of its argument

2009-11-03 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1002.
-

Resolution: Fixed

This has been addressed in other JIRAs.

> FINDBUGS: BC: Equals method should not assume anything about the type of its 
> argument 
> --
>
> Key: PIG-1002
> URL: https://issues.apache.org/jira/browse/PIG-1002
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
>
> BCEquals method for org.apache.pig.builtin.PigStorage assumes the 
> argument is of type PigStorage
> BCEquals method for 
> org.apache.pig.impl.streaming.StreamingCommand$HandleSpec assumes the 
> argument is of type StreamingCommand$HandleSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773314#action_12773314
 ] 

Alan Gates commented on PIG-970:


Yes, it's there.

> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, 
> zookeeper-hbase-1329.jar
>
>
> Support for HBase is currently very limited and restricted to HBase 0.18.0.
> Because the next releases of PIG will support Hadoop 0.20.0, they should also 
> support HBase 0.20.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-958) Splitting output data on key field

2009-11-03 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773181#action_12773181
 ] 

Pradeep Kamath commented on PIG-958:


bq. 2. Deleting the temporary directory manually in finish(), causes the job to 
fail. Removed the manual deletion. As a side effect, user specified PARENT 
output directory in the UDF will have empty part-* files. These should be 
deleted manually by the user.

Can you explain this a little more? It has been a while since I last looked at 
the code; there seems to be both a mv and this deletion happening, so if you 
can explain that part too it would be helpful.

Otherwise looks good.

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v3.patch, 958.v4.patch
>
>
> Pig users often face the need to split the output records into a bunch of 
> files and directories depending on the type of record. Pig's SPLIT operator 
> is useful when record types are few and known in advance. In cases where type 
> is not directly known but is derived dynamically from values of a key field 
> in the output tuple, a custom store function is a better solution.
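As a sketch of the proposed usage (assuming the store function is piggybank's MultiStorage and that its first two arguments are the parent output directory and the index of the key field; the actual signature should be checked in the attached patch):

{code}
A = LOAD 'data' AS (type:chararray, value:chararray);
-- writes records under /out/<key>/ with one subdirectory per
-- distinct value of field 0 of the output tuple
STORE A INTO '/out'
    USING org.apache.pig.piggybank.storage.MultiStorage('/out', '0');
{code}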

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"

2009-11-03 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1058:


Status: Patch Available  (was: Open)

> FINDBUGS: remaining "Correctness Warnings"
> --
>
> Key: PIG-1058
> URL: https://issues.apache.org/jira/browse/PIG-1058
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1058.patch, PIG-1058_v2.patch
>
>
> BC    Impossible cast from java.lang.Object[] to java.lang.String[] in 
> org.apache.pig.PigServer.listPaths(String)
> EC    Call to equals() comparing different types in 
> org.apache.pig.impl.plan.Operator.equals(Object)
> GC    java.lang.Byte is incompatible with expected argument type 
> java.lang.Integer in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange)
> IL    There is an apparent infinite recursive loop in 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.bsR(int)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> MF    Field ConstantExpression.res masks field in superclass 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
> Nm    
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit)
>  doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
>  doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> Nm    
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit)
>  doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
>  doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> NP    Possible null pointer dereference of ? in 
> org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List)
> NP    Possible null pointer dereference of lo in 
> org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List)
> NP    Possible null pointer dereference of 
> Schema$FieldSchema.Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema, 
> boolean, boolean)
> NP    Possible null pointer dereference of Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema,
>  Schema$FieldSchema, boolean, boolean)
> NP    Possible null pointer dereference of inp in 
> org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run()
> RCN   Nullcheck of pigContext at line 123 of value previously dereferenced in 
> org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext)
> RV    
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String,
>  Properties) ignores return value of java.net.InetAddress.getByName(String)
> RV    Bad attempt to compute absolute value of signed 32-bit hashcode in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.getPartition(PigNullableWritable,
>  Writable, int)
> RV    Bad attempt to compute absolute value of signed 32-bit hashcode in 
> org.apache.pig.impl.plan.DotPlanDumper.getID(Operator)
> UwF   Field only ever set to null: 
> org.apache.pig.impl.builtin.MergeJoinIndexer.dummyTuple

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"

2009-11-03 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1058:


Status: Open  (was: Patch Available)

> FINDBUGS: remaining "Correctness Warnings"
> --
>
> Key: PIG-1058
> URL: https://issues.apache.org/jira/browse/PIG-1058
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1058.patch, PIG-1058_v2.patch
>
>
> BC    Impossible cast from java.lang.Object[] to java.lang.String[] in 
> org.apache.pig.PigServer.listPaths(String)
> EC    Call to equals() comparing different types in 
> org.apache.pig.impl.plan.Operator.equals(Object)
> GC    java.lang.Byte is incompatible with expected argument type 
> java.lang.Integer in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange)
> IL    There is an apparent infinite recursive loop in 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.bsR(int)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> MF    Field ConstantExpression.res masks field in superclass 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
> Nm    
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit)
>  doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
>  doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> Nm    
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit)
>  doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
>  doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> NP    Possible null pointer dereference of ? in 
> org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List)
> NP    Possible null pointer dereference of lo in 
> org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List)
> NP    Possible null pointer dereference of 
> Schema$FieldSchema.Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema, 
> boolean, boolean)
> NP    Possible null pointer dereference of Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema,
>  Schema$FieldSchema, boolean, boolean)
> NP    Possible null pointer dereference of inp in 
> org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run()
> RCN   Nullcheck of pigContext at line 123 of value previously dereferenced in 
> org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext)
> RV    
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String,
>  Properties) ignores return value of java.net.InetAddress.getByName(String)
> RV    Bad attempt to compute absolute value of signed 32-bit hashcode in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.getPartition(PigNullableWritable,
>  Writable, int)
> RV    Bad attempt to compute absolute value of signed 32-bit hashcode in 
> org.apache.pig.impl.plan.DotPlanDumper.getID(Operator)
> UwF   Field only ever set to null: 
> org.apache.pig.impl.builtin.MergeJoinIndexer.dummyTuple

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"

2009-11-03 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1058:


Attachment: PIG-1058_v2.patch

Addressed unit test failures

> FINDBUGS: remaining "Correctness Warnings"
> --
>
> Key: PIG-1058
> URL: https://issues.apache.org/jira/browse/PIG-1058
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1058.patch, PIG-1058_v2.patch
>
>
> BC    Impossible cast from java.lang.Object[] to java.lang.String[] in 
> org.apache.pig.PigServer.listPaths(String)
> EC    Call to equals() comparing different types in 
> org.apache.pig.impl.plan.Operator.equals(Object)
> GC    java.lang.Byte is incompatible with expected argument type 
> java.lang.Integer in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange)
> IL    There is an apparent infinite recursive loop in 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.bsR(int)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> MF    Field ConstantExpression.res masks field in superclass 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
> Nm    
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit)
>  doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
>  doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> Nm    
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit)
>  doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
>  doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> NP    Possible null pointer dereference of ? in 
> org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List)
> NP    Possible null pointer dereference of lo in 
> org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List)
> NP    Possible null pointer dereference of 
> Schema$FieldSchema.Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema, 
> boolean, boolean)
> NP    Possible null pointer dereference of Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema,
>  Schema$FieldSchema, boolean, boolean)
> NP    Possible null pointer dereference of inp in 
> org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run()
> RCN   Nullcheck of pigContext at line 123 of value previously dereferenced in 
> org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext)
> RV    
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String,
>  Properties) ignores return value of java.net.InetAddress.getByName(String)
> RV    Bad attempt to compute absolute value of signed 32-bit hashcode in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.getPartition(PigNullableWritable,
>  Writable, int)
> RV    Bad attempt to compute absolute value of signed 32-bit hashcode in 
> org.apache.pig.impl.plan.DotPlanDumper.getID(Operator)
> UwF   Field only ever set to null: 
> org.apache.pig.impl.builtin.MergeJoinIndexer.dummyTuple

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Pig 0.5.0 is released!

2009-11-03 Thread Olga Natkovich
The Pig team is happy to announce the Pig 0.5.0 release!

 

Pig is a Hadoop subproject that provides a high-level data-flow language
and an execution framework for parallel computation on a Hadoop cluster.

More details about Pig can be found at http://hadoop.apache.org/pig/. 

 

 

This release makes the functionality of Pig 0.4.0 available on Hadoop 0.20
clusters. The details of the release are available at
http://hadoop.apache.org/pig/releases.html

 

Olga



[jira] Commented: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773229#action_12773229
 ] 

Pradeep Kamath commented on PIG-1036:
-

+1, will commit once Hudson QA comes back.

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Patch Available  (was: Open)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1071) Support comma separated file/directory names in load statements

2009-11-03 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-1071:
-

Assignee: Richard Ding

> Support comma separated file/directory names in load statements
> ---
>
> Key: PIG-1071
> URL: https://issues.apache.org/jira/browse/PIG-1071
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
>
> Currently Pig Latin supports the following LOAD syntax:
> {code}
> LOAD 'data' [USING loader function] [AS schema];  
> {code}
> where data is the name of the file or directory, including files specified 
> with Hadoop-supported globbing syntax. This name is passed to the loader 
> function.
> This feature is to support loaders that can load multiple files from 
> different directories, and to allow users to pass in the file names as a 
> comma separated string.
> For example, these will be valid load statements:
> {code}
> LOAD '/usr/pig/test1/a,/usr/pig/test2/b' USING someloader();
> {code}
> and 
> {code}
> LOAD '/usr/pig/test1/{a,c},/usr/pig/test2/b' USING someloader();
> {code}
> This comma separated string is passed to the loader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1071) Support comma separated file/directory names in load statements

2009-11-03 Thread Richard Ding (JIRA)
Support comma separated file/directory names in load statements
---

 Key: PIG-1071
 URL: https://issues.apache.org/jira/browse/PIG-1071
 Project: Pig
  Issue Type: New Feature
Reporter: Richard Ding


Currently Pig Latin supports the following LOAD syntax:

{code}
LOAD 'data' [USING loader function] [AS schema];  
{code}

where data is the name of the file or directory, including files specified with 
Hadoop-supported globbing syntax. This name is passed to the loader function.

This feature is to support loaders that can load multiple files from different 
directories, and to allow users to pass in the file names as a comma separated 
string.

For example, these will be valid load statements:

{code}
LOAD '/usr/pig/test1/a,/usr/pig/test2/b' USING someloader();
{code}

and 

{code}
LOAD '/usr/pig/test1/{a,c},/usr/pig/test2/b' USING someloader();
{code}

This comma separated string is passed to the loader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra

2009-11-03 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-997:
-

Attachment: SortedTable.patch

> [zebra] Sorted Table Support by Zebra
> -
>
> Key: PIG-997
> URL: https://issues.apache.org/jira/browse/PIG-997
> Project: Pig
>  Issue Type: New Feature
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch, 
> SortedTable.patch
>
>
> This new feature is for Zebra to support sorted data in storage. As a storage 
> library, Zebra will not sort the data by itself, but it will support creation 
> and use of sorted data either through PIG or through map/reduce tasks that 
> use Zebra as the storage format.
> The sorted table keeps the data in a "totally sorted" manner across all 
> TFiles created by potentially all mappers or reducers.
> For sorted data creation through PIG's STORE operator, if the input data is 
> sorted through "ORDER BY", the new Zebra table will be marked as sorted on 
> the sorted columns.
> For sorted data creation through Map/Reduce tasks, three new static methods 
> of the BasicTableOutput class will be provided to help the user achieve the 
> goal. "setSortInfo" allows the user to specify the sorted columns of the 
> input tuple to be stored; "getSortKeyGenerator" and "getSortKey" help the 
> user to generate a key acceptable to Zebra as a sort key, based upon the 
> schema, the sorted columns and the input tuple.
> For sorted data read through PIG's LOAD operator, pass the string "sorted" 
> as an extra argument to the TableLoader constructor to ask for the table to 
> be loaded as sorted.
> For sorted data read through Map/Reduce tasks, a new static method of the 
> TableInputFormat class, requireSortedTable, can be called to ask for a sorted 
> table to be read. Additionally, an overloaded version of the new method can 
> be called to ask for a sorted table on specified sort columns and comparator.
> In this release, sorted tables only support sorting in ascending order, not 
> in descending order. In addition, the sort keys must be of simple types, not 
> complex types such as RECORD, COLLECTION and MAP. 
> Multiple-key sorting is supported, but the ordering of the multiple sort keys 
> is significant, with the first sort column being the primary sort key, the 
> second being the secondary sort key, etc.
> In this release, the sort keys are stored along with the sort columns from 
> which they were originally created, resulting in some data storage 
> redundancy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: two-level access problem?

2009-11-03 Thread Pradeep Kamath
>From comments in Schema.java:
// In bags which have a schema with a tuple which contains
// the fields present in it, if we access the second field (say)
// we are actually trying to access the second field in the
// tuple in the bag. This is currently true for two cases:
// 1) bag constants - the schema of bag constant has a tuple
// which internally has the actual elements
// 2) When bags are loaded from input data, if the user 
// specifies a schema with the "bag" type, he has to specify
// the bag as containing a tuple with the actual elements in 
// the schema declaration. However in both the cases above,
// the user can still say b.i where b is the bag and i is 
// an element in the bag's tuple schema. So in these cases,
// the access should translate to a lookup for "i" in the 
// tuple schema present in the bag. To indicate this, the
// flag below is used. It is false by default because, 
// currently we use bag as the type for relations. However 
// the schema of a relation does NOT have a tuple fieldschema
// with items in it. Instead, the schema directly has the 
// field schema of the items. So for a relation "b", the 
// above b.i access would be a direct single level access
// of i in b's schema. This is treated as the "default" case
private boolean twoLevelAccessRequired = false;
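The two cases the comment describes can be illustrated with a small (hypothetical) script:

{code}
-- case 2: the user declares the bag type as containing a tuple
-- with the actual elements
A = LOAD 'data' AS (b: bag{t: tuple(i: int, j: int)});
-- b.i is still allowed: with twoLevelAccessRequired set, the lookup
-- for "i" is translated into a lookup in the tuple schema t inside b
B = FOREACH A GENERATE b.i;
{code}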

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Monday, November 02, 2009 5:33 PM
To: pig-dev@hadoop.apache.org
Subject: two-level access problem?

Could someone explain the nature of the "two-level access problem"
referred to in the Load/Store redesign wiki and in the DataType code?


Thanks,
-D


Re: two-level access problem?

2009-11-03 Thread Dmitriy Ryaboy
Thanks Pradeep,
I saw that comment. I guess my question is, given the solution this
comment describes, what are you referring to in the Load/Store
redesign doc when you say "we must fix the two level access issues
with schema of bags in current schema before we make these changes,
otherwise that same contagion will afflict us here?"

-D

On Tue, Nov 3, 2009 at 2:10 PM, Pradeep Kamath  wrote:
> From comments in Schema.java:
>    // In bags which have a schema with a tuple which contains
>    // the fields present in it, if we access the second field (say)
>    // we are actually trying to access the second field in the
>    // tuple in the bag. This is currently true for two cases:
>    // 1) bag constants - the schema of bag constant has a tuple
>    // which internally has the actual elements
>    // 2) When bags are loaded from input data, if the user
>    // specifies a schema with the "bag" type, he has to specify
>    // the bag as containing a tuple with the actual elements in
>    // the schema declaration. However in both the cases above,
>    // the user can still say b.i where b is the bag and i is
>    // an element in the bag's tuple schema. So in these cases,
>    // the access should translate to a lookup for "i" in the
>    // tuple schema present in the bag. To indicate this, the
>    // flag below is used. It is false by default because,
>    // currently we use bag as the type for relations. However
>    // the schema of a relation does NOT have a tuple fieldschema
>    // with items in it. Instead, the schema directly has the
>    // field schema of the items. So for a relation "b", the
>    // above b.i access would be a direct single level access
>    // of i in b's schema. This is treated as the "default" case
>    private boolean twoLevelAccessRequired = false;
>
> -Original Message-
> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> Sent: Monday, November 02, 2009 5:33 PM
> To: pig-dev@hadoop.apache.org
> Subject: two-level access problem?
>
> Could someone explain the nature of the "two-level access problem"
> referred to in the Load/Store redesign wiki and in the DataType code?
>
>
> Thanks,
> -D
>


[jira] Commented: (PIG-958) Splitting output data on key field

2009-11-03 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773184#action_12773184
 ] 

Pradeep Kamath commented on PIG-958:


I saw compile errors while trying to run unit test:

{noformat}
[..contrib/piggybank/java]ant test
..

[javac] 
/homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:44:
 cannot find symbol
[javac] symbol  : variable MiniCluster
[javac] location: class 
org.apache.pig.piggybank.test.storage.TestMultiStorage
[javac]   private MiniCluster cluster = MiniCluster.buildCluster();
[javac] ^
[javac] 
/homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:73:
 cannot find symbol
[javac] symbol  : variable Util
[javac] location: class 
org.apache.pig.piggybank.test.storage.TestMultiStorage
[javac] Util.deleteFile(cluster, INPUT_FILE);
[javac] ^
[javac] 
/homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:74:
 cannot find symbol
[javac] symbol  : variable Util
[javac] location: class 
org.apache.pig.piggybank.test.storage.TestMultiStorage
[javac] Util.copyFromLocalToCluster(cluster, INPUT_FILE, INPUT_FILE);
[javac] ^
[javac] 
/homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:96:
 cannot find symbol
[javac] symbol  : variable Util
[javac] location: class 
org.apache.pig.piggybank.test.storage.TestMultiStorage
[javac] Util.deleteFile(cluster, INPUT_FILE);
[javac] ^
..
{noformat}

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v3.patch, 958.v4.patch
>
>
> Pig users often face the need to split the output records into a bunch of 
> files and directories depending on the type of record. Pig's SPLIT operator 
> is useful when record types are few and known in advance. In cases where type 
> is not directly known but is derived dynamically from values of a key field 
> in the output tuple, a custom store function is a better solution.
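The splitting idea described above can be sketched in a few lines: derive an output path from the key field's value, so records fan out into per-key directories. This is a minimal, hypothetical illustration; the `KeySplitSketch` class and `pathFor` helper are invented names, not the API of the attached patch.

```java
// Hypothetical sketch: route each output record to a directory named after
// its key field, the way a key-splitting store function would.
public class KeySplitSketch {
    // Build the output path for one record: baseDir/<key value>/part file.
    static String pathFor(String baseDir, int keyFieldIndex, String[] tuple) {
        return baseDir + "/" + tuple[keyFieldIndex] + "/part-00000";
    }

    public static void main(String[] args) {
        String[] record = {"web", "2009-11-03", "42"};
        // Records whose key field is "web" land under output/web/
        System.out.println(pathFor("output", 0, record));
    }
}
```

A real store function would additionally buffer one writer per open bucket; the path derivation above is the part SPLIT cannot express when keys are discovered dynamically.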

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773348#action_12773348
 ] 

Alan Gates commented on PIG-970:


afterside:~/src/pig/PIG-970-3/trunk> jar tf pig-withouthadoop.jar | grep hbase
org/apache/pig/backend/hadoop/hbase/
org/apache/pig/backend/hadoop/hbase/HBaseSlice.class
org/apache/pig/backend/hadoop/hbase/HBaseStorage.class

> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, 
> zookeeper-hbase-1329.jar
>
>
> The support of HBase is currently very limited and restricted to HBase 0.18.0.
> Because the next releases of PIG will support Hadoop 0.20.0, they should also 
> support HBase 0.20.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: LoadFunc.skipNext() function for faster sampling ?

2009-11-03 Thread Alan Gates
We definitely want to avoid parsing every tuple when sampling.  But do  
we need to implement a special function for it?  Pig will have access  
to the InputFormat instance, correct?  Can it not call  
InputFormat.getNext the desired number of times (which will not parse  
the tuple) and then call LoadFunc.getNext to get the next parsed tuple?


Alan.

On Nov 3, 2009, at 4:28 PM, Thejas Nair wrote:

In the new implementation of SampleLoader subclasses (used by order-by,
skew-join ..) as part of the loader redesign, we are not only reading all
the records input but also parsing them as pig tuples.

This is because the SampleLoaders are wrappers around the actual input
loaders specified in the query. We can make things much faster by having a
skipNext() function (or skipNext(int numSkip) ) which will avoid parsing the
record into a pig tuple.
LoadFunc could optionally implement this (easy to implement) function (which
will be part of an interface) for improving speed of queries such as
order-by.

-Thejas





[jira] Commented: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773273#action_12773273
 ] 

Hadoop QA commented on PIG-1036:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423944/LeftOuterFRJoin.patch
  against trunk revision 832086.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/137/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/137/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/137/console

This message is automatically generated.

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-970:
---

Attachment: test-output.tgz
TEST-org.apache.pig.test.TestHBaseStorage.txt

Test run results plus logs.

> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, 
> zookeeper-hbase-1329.jar
>
>
> The support of HBase is currently very limited and restricted to HBase 0.18.0.
> Because the next releases of PIG will support Hadoop 0.20.0, they should also 
> support HBase 0.20.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1048) inner join using 'skewed' produces multiple rows for keys with single row in both input relations

2009-11-03 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1277#action_1277
 ] 

Alan Gates commented on PIG-1048:
-

When attempting to apply this patch to the 0.5 branch, I got the following 
error:

Testcase: testSkewedJoinOneValue took 145.739 sec
Caused an ERROR
Unable to open iterator for alias E
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias E
at org.apache.pig.PigServer.openIterator(PigServer.java:475)
at org.apache.pig.test.TestSkewedJoin.testSkewedJoinOneValue(TestSkewedJoin.java:340)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: Error in configuring object
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:237)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:181)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773)
at org.apache.pig.PigServer.store(PigServer.java:522)
at org.apache.pig.PigServer.openIterator(PigServer.java:458)

> inner join using 'skewed' produces multiple rows for keys with single row in 
> both input relations
> -
>
> Key: PIG-1048
> URL: https://issues.apache.org/jira/browse/PIG-1048
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Sriranjan Manjunath
> Fix For: 0.6.0
>
> Attachments: pig_1048.patch
>
>
> {code}
> grunt> cat students.txt   
> asdfxc  M   23  12.44
> qwerF   21  14.44
> uhsdf   M   34  12.11
> zxldf   M   21  12.56
> qwerF   23  145.5
> oiueM   54  23.33
> grunt> l1 = load 'students.txt';
> grunt> l2 = load 'students.txt';  
> j = join l1 by $0, l2 by $0 ; 
> store j into 'tmp.txt' 
> grunt> cat tmp.txt
> oiueM   54  23.33   oiueM   54  23.33
> oiueM   54  23.33   oiueM   54  23.33
> qwerF   21  14.44   qwerF   21  14.44
> qwerF   21  14.44   qwerF   23  145.5
> qwerF   23  145.5   qwerF   21  14.44
> qwerF   23  145.5   qwerF   23  145.5
> uhsdf   M   34  12.11   uhsdf   M   34  12.11
> uhsdf   M   34  12.11   uhsdf   M   34  12.11
> zxldf   M   21  12.56   zxldf   M   21  12.56
> zxldf   M   21  12.56   zxldf   M   21  12.56
> asdfxc  M   23  12.44   asdfxc  M   23  12.44
> asdfxc  M   23  12.44   asdfxc  M   23  12.44$
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773103#action_12773103
 ] 

Alan Gates commented on PIG-970:


When I run TestHBaseStorage now I get:

Testcase: testLoadFromHBase took 592.908 sec
Caused an ERROR
Unable to open iterator for alias a
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias a
at org.apache.pig.PigServer.openIterator(PigServer.java:481)
at org.apache.pig.test.TestHBaseStorage.testLoadFromHBase(TestHBaseStorage.java:170)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error.
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:922)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:573)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:686)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:582)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:686)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:549)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:125)
at org.apache.pig.backend.hadoop.hbase.HBaseSlice.init(HBaseSlice.java:159)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper.makeReader(SliceWrapper.java:129)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getRecordReader(PigInputFormat.java:258)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region

Let me know if you'd like to see the whole log.

> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, zookeeper-hbase-1329.jar
>
>
> The support of HBase is currently very limited and restricted to HBase 0.18.0.
> Because the next releases of PIG will support Hadoop 0.20.0, they should also 
> support HBase 0.20.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: LeftOuterFRJoin.patch

Attaching a new patch.

The join now supports only a two-way left join.
The join requires a schema to be present on the right side; that schema is
used to determine the number of null fields/columns in nullTuple.

As it is a two-way join, we use a single nullBag instead of an array of
nullBags. A DataBag is used instead of a Tuple to keep the result type of
ConstantExpression consistent.
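The padding logic described above can be sketched as follows, assuming the right-side schema size is known up front. Class and method names here are invented for illustration; this is not the code in LeftOuterFRJoin.patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of left-outer padding: an unmatched left row is joined with a
// tuple of nulls whose width comes from the right side's schema.
public class LeftOuterPadSketch {
    // One null per right-side column, as described in the comment above.
    static List<Object> nullTuple(int rightSchemaSize) {
        return new ArrayList<>(Collections.nCopies(rightSchemaSize, null));
    }

    // Concatenate the left row with either the matched right row or nulls.
    static List<Object> joinRow(List<Object> left, List<Object> rightOrNull, int rightSize) {
        List<Object> out = new ArrayList<>(left);
        out.addAll(rightOrNull != null ? rightOrNull : nullTuple(rightSize));
        return out;
    }

    public static void main(String[] args) {
        // A left row with no match on the replicated right side:
        System.out.println(joinRow(Arrays.asList((Object) "k1", 10), null, 2));
    }
}
```

This also shows why the right-side schema is mandatory: without it, there is no way to know how many nulls to emit for an unmatched left row.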

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



LoadFunc.skipNext() function for faster sampling ?

2009-11-03 Thread Thejas Nair
In the new implementation of SampleLoader subclasses (used by order-by,
skew-join ..) as part of the loader redesign, we are not only reading all
the records input but also parsing them as pig tuples.

This is because the SampleLoaders are wrappers around the actual input
loaders specified in the query. We can make things much faster by having a
skipNext() function (or skipNext(int numSkip) ) which will avoid parsing the
record into a pig tuple.
LoadFunc could optionally implement this (easy to implement) function (which
will be part of an interface) for improving speed of queries such as
order-by.

-Thejas
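A toy sketch of the proposal above, with the raw reader and tuple parser as stand-ins (these are not the real LoadFunc/InputFormat signatures): skipping advances the reader without paying the parse cost, and only sampled records become tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model of skipNext(): advance past raw records without parsing them,
// then parse only the record actually kept by the sampler.
public class SampleSkipSketch {
    static int parseCount = 0;

    // Stand-in for the expensive "parse bytes into a pig tuple" step.
    static String parseTuple(String raw) {
        parseCount++;
        return "(" + raw + ")";
    }

    // Keep every (skip+1)-th record; skipped records are never parsed.
    static List<String> sample(Iterator<String> rawRecords, int skip) {
        List<String> sampled = new ArrayList<>();
        while (rawRecords.hasNext()) {
            for (int i = 0; i < skip && rawRecords.hasNext(); i++) {
                rawRecords.next(); // skipNext(): no tuple parsing here
            }
            if (rawRecords.hasNext()) {
                sampled.add(parseTuple(rawRecords.next()));
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("a", "b", "c", "d", "e", "f");
        System.out.println(sample(raw.iterator(), 2)); // only "c" and "f" are parsed
    }
}
```

The same effect is what the thread converges on below: advance via the InputFormat without constructing tuples, then parse only the sampled record.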



[jira] Commented: (PIG-1058) FINDBUGS: remaining "Correctness Warnings"

2009-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773341#action_12773341
 ] 

Hadoop QA commented on PIG-1058:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423961/PIG-1058_v2.patch
  against trunk revision 832086.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/39/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/39/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/39/console

This message is automatically generated.

> FINDBUGS: remaining "Correctness Warnings"
> --
>
> Key: PIG-1058
> URL: https://issues.apache.org/jira/browse/PIG-1058
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1058.patch, PIG-1058_v2.patch
>
>
> BC    Impossible cast from java.lang.Object[] to java.lang.String[] in 
> org.apache.pig.PigServer.listPaths(String)
> EC    Call to equals() comparing different types in 
> org.apache.pig.impl.plan.Operator.equals(Object)
> GC    java.lang.Byte is incompatible with expected argument type 
> java.lang.Integer in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange)
> IL    There is an apparent infinite recursive loop in 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.bsR(int)
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> INT   Bad comparison of nonnegative value with -1 in 
> org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
> MF    Field ConstantExpression.res masks field in superclass 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
> Nm    
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit)
>  doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
>  doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> Nm    
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit)
>  doesn't override method in superclass because parameter type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
>  doesn't match superclass parameter type 
> org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
> NP    Possible null pointer dereference of ? in 
> org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List)
> NP    Possible null pointer dereference of lo in 
> org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List)
> NP    Possible null pointer dereference of 
> Schema$FieldSchema.Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema, 
> boolean, boolean)
> NP    Possible null pointer dereference of Schema$FieldSchema.alias in 
> org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema,
>  Schema$FieldSchema, boolean, boolean)
> NP    Possible null pointer dereference of inp in 
> org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run()
> RCN   Nullcheck of pigContext at line 123 of value previously dereferenced in 
> org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext)
> RV    
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String,
>  Properties) ignores return value of java.net.InetAddress.getByName(String)
> RV    Bad a

[jira] Updated: (PIG-970) Support of HBase 0.20.0

2009-11-03 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-970:
---

Status: Open  (was: Patch Available)

> Support of HBase 0.20.0
> ---
>
> Key: PIG-970
> URL: https://issues.apache.org/jira/browse/PIG-970
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Vincent BARAT
>Assignee: Jeff Zhang
> Fix For: 0.5.0
>
> Attachments: build.xml.path, hbase-0.20.0-test.jar, hbase-0.20.0.jar, 
> pig-hbase-0.20.0-support.patch, pig-hbase-20-v2.patch, 
> Pig_HBase_0.20.0.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, 
> TEST-org.apache.pig.test.TestHBaseStorage.txt, test-output.tgz, 
> zookeeper-hbase-1329.jar
>
>
> The support of HBase is currently very limited and restricted to HBase 0.18.0.
> Because the next releases of PIG will support Hadoop 0.20.0, they should also 
> support HBase 0.20.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: LoadFunc.skipNext() function for faster sampling ?

2009-11-03 Thread Thejas Nair
Yes, that should work. I will use InputFormat.getNext from the SampleLoader
to skip the records.
Thanks,
Thejas


On 11/3/09 6:39 PM, "Alan Gates"  wrote:

> We definitely want to avoid parsing every tuple when sampling.  But do
> we need to implement a special function for it?  Pig will have access
> to the InputFormat instance, correct?  Can it not call
> InputFormat.getNext the desired number of times (which will not parse
> the tuple) and then call LoadFunc.getNext to get the next parsed tuple?
> 
> Alan.
> 
> On Nov 3, 2009, at 4:28 PM, Thejas Nair wrote:
> 
>> In the new implementation of SampleLoader subclasses (used by order-by,
>> skew-join ..) as part of the loader redesign, we are not only reading all
>> the records input but also parsing them as pig tuples.
>> 
>> This is because the SampleLoaders are wrappers around the actual input
>> loaders specified in the query. We can make things much faster by having a
>> skipNext() function (or skipNext(int numSkip) ) which will avoid parsing the
>> record into a pig tuple.
>> LoadFunc could optionally implement this (easy to implement) function (which
>> will be part of an interface) for improving speed of queries such as
>> order-by.
>> 
>> -Thejas
>> 
>