Re: Want to contribute

2013-04-30 Thread Cheolsoo Park
Please see the following wiki page:
https://cwiki.apache.org/confluence/display/PIG/HowToContribute

Thanks,
Cheolsoo


On Tue, Apr 30, 2013 at 9:10 PM, Naidu MS wrote:

> Hi, how do I get the Pig source?
> I am interested in going through the source code so that I can learn how the
> framework is written.
> I can help fix some minor bugs/JIRA issues.
> Can someone help me get the source code?
>
>
>
> Regards,
> Naidu
>
>
> On Wed, May 1, 2013 at 9:30 AM, Cheolsoo Park wrote:
>
> > Welcome to Pig. There are hundreds of open jiras:
> >
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20PIG%20AND%20status%20%3D%20Open%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC
> >
> > Please feel free to submit patches.
> >
> > Thanks,
> > Cheolsoo
> >
> >
> >
> > On Tue, Apr 30, 2013 at 4:16 PM, Vineet Nair wrote:
> >
> > > Hello all,
> > >
> > > I was just going through the source code of Pig and I would very much like
> > > to contribute to it.
> > > I was just wondering if there are any small Jira requests that I can start
> > > working on.
> > >
> > > Thanks and regards,
> > > Vineet
> > >
> >
>


Re: Want to contribute

2013-04-30 Thread Naidu MS
Hi, how do I get the Pig source?
I am interested in going through the source code so that I can learn how the
framework is written.
I can help fix some minor bugs/JIRA issues.
Can someone help me get the source code?



Regards,
Naidu


On Wed, May 1, 2013 at 9:30 AM, Cheolsoo Park  wrote:

> Welcome to Pig. There are hundreds of open jiras:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20PIG%20AND%20status%20%3D%20Open%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC
>
> Please feel free to submit patches.
>
> Thanks,
> Cheolsoo
>
>
>
> On Tue, Apr 30, 2013 at 4:16 PM, Vineet Nair  wrote:
>
> > Hello all,
> >
> > I was just going through the source code of Pig and I would very much like
> > to contribute to it.
> > I was just wondering if there are any small Jira requests that I can start
> > working on.
> >
> > Thanks and regards,
> > Vineet
> >
>


Re: Want to contribute

2013-04-30 Thread Cheolsoo Park
Welcome to Pig. There are hundreds of open jiras:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20PIG%20AND%20status%20%3D%20Open%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC

Please feel free to submit patches.

Thanks,
Cheolsoo



On Tue, Apr 30, 2013 at 4:16 PM, Vineet Nair  wrote:

> Hello all,
>
> I was just going through the source code of Pig and I would very much like
> to contribute to it.
> I was just wondering if there are any small Jira requests that I can start
> working on.
>
> Thanks and regards,
> Vineet
>


[jira] [Updated] (PIG-3304) XMLLoader in piggybank does not work with inline closed tags

2013-04-30 Thread Ahmed Eldawy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Eldawy updated PIG-3304:
--

Attachment: xmlloader_inline_close_tag_1.patch

An updated patch that handles nested elements with inline close tags
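
To illustrate what handling nested elements with inline close tags involves, here is a minimal sketch in Java. It is not the code from the attached patch: it works on a `String` rather than on the loader's byte stream, and the class and method names (`NestedInlineSketch`, `firstElement`) are hypothetical. It also assumes no comments, CDATA sections, or `>` characters inside attribute values.

```java
// Hypothetical helper, not part of piggybank: depth counting over tags.
final class NestedInlineSketch {

    /**
     * Returns the first complete element named tag, or "" if there is none.
     * A nested inline-closed element (<tag/>) opens and closes at once,
     * so it must leave the nesting depth unchanged.
     */
    static String firstElement(String xml, String tag) {
        int depth = 0;   // number of currently open elements named tag
        int start = -1;  // index of the outermost opening '<'
        int i = 0;
        while (i < xml.length()) {
            int lt = xml.indexOf('<', i);
            if (lt < 0) break;
            int gt = xml.indexOf('>', lt);
            if (gt < 0) break;
            String body = xml.substring(lt + 1, gt); // text between '<' and '>'
            boolean closing = body.startsWith("/");
            boolean inline = body.endsWith("/");
            String name = closing ? body.substring(1) : body;
            // The tag name ends at the first whitespace or '/'.
            int end = 0;
            while (end < name.length()
                    && !Character.isWhitespace(name.charAt(end))
                    && name.charAt(end) != '/') {
                end++;
            }
            name = name.substring(0, end);
            if (name.equals(tag)) {
                if (closing) {
                    if (depth > 0 && --depth == 0) {
                        return xml.substring(start, gt + 1);
                    }
                } else if (inline) {
                    if (depth == 0) {
                        return xml.substring(lt, gt + 1); // whole element is one tag
                    }
                } else {
                    if (depth == 0) start = lt;
                    depth++;
                }
            }
            i = gt + 1;
        }
        return "";
    }
}
```

The point of the sketch is the middle branch: an inline-closed tag seen while another element of the same name is open is skipped without touching the depth counter, so the outer element still ends at its own close tag.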

> XMLLoader in piggybank does not work with inline closed tags
> 
>
> Key: PIG-3304
> URL: https://issues.apache.org/jira/browse/PIG-3304
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11.1
>Reporter: Ahmed Eldawy
>  Labels: patch
> Attachments: xmlloader_inline_close_tag_1.patch, 
> xmlloader_inline_close_tag.patch
>
>
> The XMLLoader fails to return elements when tags are closed inline such as
> 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-30 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646289#comment-13646289
 ] 

Nick Dimiduk commented on PIG-3285:
---

bq. Not too many code paths.

Sure there are. Both Pig and HBase are replicating the behavior of
{{ToolRunner}}'s libjars argument for including jars with a job. They do so in
slightly different ways, and thus we have 3 different code paths. I'd prefer
consolidation on a single code path.

bq. Filters out pig and hadoop classes from the list of classes so that pig and 
hadoop jar are not included.

We can add a method, something like {{addHBaseDependencyJars(Job)}}, which will
add only HBase and its dependency jars (currently: zookeeper, protobuf,
guava), nothing else. That way, we're not including any redundant Pig or Hadoop
jars and HBase is managing its own dependencies (meaning Pig won't have to
change every time we change something). This is effectively the same as doing what
you say above, "Also ensure that you add HTable.class apart from Zookeeper,
inputformat, input/output key/value, partitioner and combiner classes," that
is, omitting inputformat, keys, values, partitioner, combiner. Does that sound
like it'll accomplish what this filter intends?

bq. Find the jars for the other classes and filter out any jars already present 
in PigContext.extrajars and add only the rest to tmpjars.

How do we access the PigContext? Is it in the jobConf or some such? I'd rather 
not put Pig-specific code in the bowels of HBase mapreduce code; my preference 
is to build generic APIs that can be used across the board.

HBase APIs are designed to assist people writing raw MR jobs against HBase (i.e.,
including key/value classes, input/output format classes, etc.). The slightly
different requirements of Pig and Hive need to be addressed as well.
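
As a rough illustration of the idea behind a method like {{addHBaseDependencyJars(Job)}}, the following Java sketch collects the distinct locations from which a fixed list of classes was loaded. It is only a sketch under stated assumptions: it does not touch Hadoop's Job or any HBase API, and the names `DependencyJarSketch`, `locate`, and `dependencyJars` are hypothetical. The locator is passed in as a function so the collection logic can be exercised without real jars.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not an HBase or Pig API.
final class DependencyJarSketch {

    /** Returns where a class was loaded from, or null if the JVM does not report it. */
    static String locate(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null) {
            return null; // typically the case for JDK bootstrap classes
        }
        URL location = source.getLocation();
        return location == null ? null : location.toString();
    }

    /** Collects the distinct locations of the given classes, in first-seen order. */
    static Set<String> dependencyJars(Function<Class<?>, String> locator,
                                      Class<?>... classes) {
        Set<String> jars = new LinkedHashSet<>();
        for (Class<?> c : classes) {
            String jar = locator.apply(c);
            if (jar != null) {
                jars.add(jar); // the set drops repeats of the same jar
            }
        }
        return jars;
    }
}
```

A caller would pass one representative class per dependency, for example `dependencyJars(DependencyJarSketch::locate, A.class, B.class)`, so the list of classes stays with the project that owns the dependencies.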

> Jobs using HBaseStorage fail to ship dependency jars
> 
>
> Key: PIG-3285
> URL: https://issues.apache.org/jira/browse/PIG-3285
> Project: Pig
>  Issue Type: Bug
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 0.11.1
>
> Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 
> 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig
>
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.(HbaseObjectWritable.java:266)
>   at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}



[jira] Subscription: PIG patch available

2013-04-30 Thread jira
Issue Subscription
Filter: PIG patch available (29 issues)

Subscriber: pigdaily

Key         Summary
PIG-3297    Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
            https://issues.apache.org/jira/browse/PIG-3297
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3291    TestExampleGenerator fails on Windows because of lack of file name escaping
            https://issues.apache.org/jira/browse/PIG-3291
PIG-3288    Kill jobs if the number of output files is over a configurable limit
            https://issues.apache.org/jira/browse/PIG-3288
PIG-3286    TestPigContext.testImportList fails in trunk
            https://issues.apache.org/jira/browse/PIG-3286
PIG-3285    Jobs using HBaseStorage fail to ship dependency jars
            https://issues.apache.org/jira/browse/PIG-3285
PIG-3281    Pig version in pig.pom is incorrect in branch-0.11
            https://issues.apache.org/jira/browse/PIG-3281
PIG-3258    Patch to allow MultiStorage to use more than one index to generate output tree
            https://issues.apache.org/jira/browse/PIG-3258
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3247    Piggybank functions to mimic OVER clause in SQL
            https://issues.apache.org/jira/browse/PIG-3247
PIG-3223    AvroStorage does not handle comma separated input paths
            https://issues.apache.org/jira/browse/PIG-3223
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3105    Fix TestJobSubmission unit test failure.
            https://issues.apache.org/jira/browse/PIG-3105
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-2970    Nested foreach getting incorrect schema when having unrelated inner query
            https://issues.apache.org/jira/browse/PIG-2970
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2248    Pig parser does not detect when a macro name masks a UDF name
            https://issues.apache.org/jira/browse/PIG-2248
PIG-2244    Macros cannot be passed relation names
            https://issues.apache.org/jira/browse/PIG-2244
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Reopened] (PIG-3304) XMLLoader in piggybank does not work with inline closed tags

2013-04-30 Thread Ahmed Eldawy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Eldawy reopened PIG-3304:
---


Found a bug with nested entities that are closed inline

> XMLLoader in piggybank does not work with inline closed tags
> 
>
> Key: PIG-3304
> URL: https://issues.apache.org/jira/browse/PIG-3304
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11.1
>Reporter: Ahmed Eldawy
>  Labels: patch
> Attachments: xmlloader_inline_close_tag.patch
>
>
> The XMLLoader fails to return elements when tags are closed inline such as
> 



Re: A contribution to piggybank XMLLoader

2013-04-30 Thread Ahmed Eldawy
Thanks for your help.
I've created an issue and attached my fix. Here's the link
https://issues.apache.org/jira/browse/PIG-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel
What should I do next to get this code merged with the source repo?

Best regards,
Ahmed Eldawy


On Tue, Apr 30, 2013 at 6:08 PM, Prashant Kommireddi wrote:

> Hi Ahmed,
>
> Thanks for the effort. You will need to sign up and open a JIRA issue ("Create
> Issue") at https://issues.apache.org/jira/browse/PIG
>
> The JIRA must state the bug along with a description. You can then attach a
> patch with your fix.
>
>
>
>
> On Tue, Apr 30, 2013 at 4:00 PM, Ahmed Eldawy  wrote:
>
> > Hi all,
> >  I wanted to work with XMLLoader and I found a bug in it. I've already
> > fixed the bug and I want to publish my contribution to the source so that
> > others can use it. The bug is basically handling tags that are closed
> > inline such as
> > 
> > Attaching to this mail the patch I created. I added a new test case in the
> > corresponding unit test and it passes while original tests still pass. I
> > tried to follow the instructions here
> > https://cwiki.apache.org/confluence/display/PIG/HowToContribute
> > to publish my contribution but it says it has to be attached to an
> > existing issue while there is no issue raised for this bug.
> > Can you please help me?
> >
> > Best regards,
> > Ahmed Eldawy
> >
>


[jira] [Resolved] (PIG-3304) XMLLoader in piggybank does not work with inline closed tags

2013-04-30 Thread Ahmed Eldawy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Eldawy resolved PIG-3304.
---

Resolution: Fixed

> XMLLoader in piggybank does not work with inline closed tags
> 
>
> Key: PIG-3304
> URL: https://issues.apache.org/jira/browse/PIG-3304
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11.1
>Reporter: Ahmed Eldawy
>  Labels: patch
> Attachments: xmlloader_inline_close_tag.patch
>
>
> The XMLLoader fails to return elements when tags are closed inline such as
> 



[jira] [Updated] (PIG-3304) XMLLoader in piggybank does not work with inline closed tags

2013-04-30 Thread Ahmed Eldawy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Eldawy updated PIG-3304:
--

Status: Open  (was: Patch Available)

diff --git java/src/main/java/org/apache/pig/piggybank/storage/XMLLoader.java java/src/main/java/org/apache/pig/piggybank/storage/XMLLoader.java
index 589a545..9daa1a4 100644
--- java/src/main/java/org/apache/pig/piggybank/storage/XMLLoader.java
+++ java/src/main/java/org/apache/pig/piggybank/storage/XMLLoader.java
@@ -212,10 +212,16 @@ class XMLLoaderBufferedPositionedInputStream extends BufferedPositionedInputStre
   //startTag[tmp.length+1] = (byte)'>';
   
   
+  // Used to detect tags that are closed inline
+  byte[] inlineCloseTag = {'/', '>'};
 
   ByteArrayOutputStream collectBuf = new ByteArrayOutputStream(1024);
   int idxTagChar = 0;
   int idxStartTagChar = 0;
+  int idxInlineCloseTagChar = 0;
+  // A flag to indicate that we are currently inside the tag to be matched
+  // Initially set to true as skipToTag has been called earlier
+  boolean insideMatchTag = true;
   boolean startTagMatched = false;
   /*
* Read till an end tag is found.It need not check for any condition since it
@@ -247,10 +253,18 @@ class XMLLoaderBufferedPositionedInputStream extends BufferedPositionedInputStre
   
   if (b == startTag[idxStartTagChar]){
  ++idxStartTagChar;
- if(idxStartTagChar == startTag.length)
-startTagMatched = true ; // Set the flag as true if start tag matches
-  }else
- idxStartTagChar = 0;
+ if(idxStartTagChar == startTag.length) {
+   startTagMatched = true ; // Set the flag as true if start tag matches
+   // We are currently inside the tag to be matched
+   insideMatchTag = true;   
+ }
+  } else {
+if (idxStartTagChar > 1) {
+  // Matched only a part of the start tag of some element
+  insideMatchTag = false;
+}
+idxStartTagChar = 0;
+  }
 
   
   
@@ -268,6 +282,23 @@ class XMLLoaderBufferedPositionedInputStream extends BufferedPositionedInputStre
   } else 
 idxTagChar = 0; 
   
+  if (b == inlineCloseTag[idxInlineCloseTagChar]) {
+idxInlineCloseTagChar++;
+if (idxInlineCloseTagChar == inlineCloseTag.length) {
+  idxInlineCloseTagChar = 0;
+  if (insideMatchTag) {
+if(nestedTags==0) // Break the loop if there were no nested tags
+  break;
+   else{
+  --nestedTags; // Else decrement the count
+  idxInlineCloseTagChar = 0; // Reset the index
+   }
+  }
+}
+  } else {
+idxInlineCloseTagChar = 0;
+  }
+  
 }
 catch (IOException e) {
   this.setReadable(false);
@@ -339,7 +370,7 @@ class XMLLoaderBufferedPositionedInputStream extends BufferedPositionedInputStre
   break;
 case S_MATCH_PREFIX:
   // tag match iff next character is whitespaces or close tag mark
-  if (b == ' ' || b == '\t' || b == '>') {
+  if (Character.isWhitespace(b) || b == '/' || b == '>') {
 matchBuf.write((byte)(b));
 state = S_MATCH_TAG;
   } else {
@@ -355,7 +386,7 @@ class XMLLoaderBufferedPositionedInputStream extends BufferedPositionedInputStre
 default:
   throw new IllegalArgumentException("Invalid state: " + state);
   }
-  if (state == S_MATCH_TAG && b == '>') {
+  if (state == S_MATCH_TAG && (b == '>' || Character.isWhitespace(b))) {
 break;
   }
   if (state != S_MATCH_TAG && this.getPosition() > limit) {
@@ -406,6 +437,12 @@ class XMLLoaderBufferedPositionedInputStream extends BufferedPositionedInputStre
 byte[] collectTag(String tagName, long limit) throws IOException {
ByteArrayOutputStream collectBuf = new ByteArrayOutputStream(1024);
byte[] beginTag = skipToTag(tagName, limit);
+   
+   // Check if the tag is closed inline
+   if (beginTag.length > 2 && beginTag[beginTag.length - 2] == '/' &&
+   beginTag[beginTag.length-1] == '>') {
+ return beginTag;
+   }
 
// No need to search for the end tag if the start tag is not found
if(beginTag.length > 0 ){ 
diff --git java/src/test/java/org/apache/pig/piggybank/test/storage/TestXMLLoader.java java/src/test/java/org/apache/pig/piggybank/test/storage/TestXMLLoader.java
index 4adc9cd..f83f0d9 100644
--- java/src/test/java/org/apache/pig/piggybank/test/storage/TestXMLLoader.java
+++ java/src/test/java/org/apache/pig/piggy

[jira] [Updated] (PIG-3304) XMLLoader in piggybank does not work with inline closed tags

2013-04-30 Thread Ahmed Eldawy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Eldawy updated PIG-3304:
--

Attachment: xmlloader_inline_close_tag.patch

A patch that solves the bug
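
For readers without the attachment, the simplest part of the fix can be pictured as a check on the bytes of the matched start tag: if they end in `/>`, the element is already complete and there is no end tag to search for. The sketch below isolates that check in Java. The wrapper class and method names (`InlineCloseCheck`, `isClosedInline`) are hypothetical and chosen for illustration only.

```java
// Hypothetical wrapper around the "closed inline" test on a start tag.
final class InlineCloseCheck {

    /** True when the tag bytes end with "/>" and contain at least one more byte. */
    static boolean isClosedInline(byte[] beginTag) {
        int n = beginTag.length;
        return n > 2 && beginTag[n - 2] == '/' && beginTag[n - 1] == '>';
    }
}
```

The length guard keeps an empty array (no start tag found) from being indexed and treats a bare `/>` as not being a tag at all.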

> XMLLoader in piggybank does not work with inline closed tags
> 
>
> Key: PIG-3304
> URL: https://issues.apache.org/jira/browse/PIG-3304
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11.1
>Reporter: Ahmed Eldawy
>  Labels: patch
> Attachments: xmlloader_inline_close_tag.patch
>
>
> The XMLLoader fails to return elements when tags are closed inline such as
> 



[jira] [Created] (PIG-3304) XMLLoader in piggybank does not work with inline closed tags

2013-04-30 Thread Ahmed Eldawy (JIRA)
Ahmed Eldawy created PIG-3304:
-

 Summary: XMLLoader in piggybank does not work with inline closed 
tags
 Key: PIG-3304
 URL: https://issues.apache.org/jira/browse/PIG-3304
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.11.1
Reporter: Ahmed Eldawy


The XMLLoader fails to return elements when tags are closed inline such as





[jira] [Updated] (PIG-3304) XMLLoader in piggybank does not work with inline closed tags

2013-04-30 Thread Ahmed Eldawy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Eldawy updated PIG-3304:
--

Status: Patch Available  (was: Open)

> XMLLoader in piggybank does not work with inline closed tags
> 
>
> Key: PIG-3304
> URL: https://issues.apache.org/jira/browse/PIG-3304
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11.1
>Reporter: Ahmed Eldawy
>  Labels: patch
>
> The XMLLoader fails to return elements when tags are closed inline such as
> 



[jira] [Updated] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-04-30 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3303:
---

Patch Info: Patch Available

> add hadoop h2 artifact to publications in ivy.xml
> -
>
> Key: PIG-3303
> URL: https://issues.apache.org/jira/browse/PIG-3303
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
> Attachments: PIG-3303.patch
>
>




[jira] [Updated] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-04-30 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3303:
---

Attachment: PIG-3303.patch

> add hadoop h2 artifact to publications in ivy.xml
> -
>
> Key: PIG-3303
> URL: https://issues.apache.org/jira/browse/PIG-3303
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
> Attachments: PIG-3303.patch
>
>




[jira] [Created] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-04-30 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3303:
--

 Summary: add hadoop h2 artifact to publications in ivy.xml
 Key: PIG-3303
 URL: https://issues.apache.org/jira/browse/PIG-3303
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem






Want to contribute

2013-04-30 Thread Vineet Nair
Hello all,

I was just going through the source code of Pig and I would very much like
to contribute to it.
I was just wondering if there are any small Jira requests that I can start
working on.

Thanks and regards,
Vineet


Re: A contribution to piggybank XMLLoader

2013-04-30 Thread Prashant Kommireddi
Hi Ahmed,

Thanks for the effort. You will need to sign up and open a JIRA issue ("Create
Issue") at https://issues.apache.org/jira/browse/PIG

The JIRA must state the bug along with a description. You can then attach a
patch with your fix.




On Tue, Apr 30, 2013 at 4:00 PM, Ahmed Eldawy  wrote:

> Hi all,
>  I wanted to work with XMLLoader and I found a bug in it. I've already
> fixed the bug and I want to publish my contribution to the source so that
> others can use it. The bug is basically handling tags that are closed
> inline such as
> 
> Attaching to this mail the patch I created. I added a new test case in the
> corresponding unit test and it passes while original tests still pass. I
> tried to follow the instructions here
> https://cwiki.apache.org/confluence/display/PIG/HowToContribute
> to publish my contribution but it says it has to be attached to an
> existing issue while there is no issue raised for this bug.
> Can you please help me?
>
> Best regards,
> Ahmed Eldawy
>


A contribution to piggybank XMLLoader

2013-04-30 Thread Ahmed Eldawy
Hi all,
 I wanted to work with XMLLoader and I found a bug in it. I've already
fixed the bug and I want to publish my contribution to the source so that
others can use it. The bug is basically handling tags that are closed
inline such as

Attaching to this mail the patch I created. I added a new test case in the
corresponding unit test and it passes while original tests still pass. I
tried to follow the instructions here
https://cwiki.apache.org/confluence/display/PIG/HowToContribute
to publish my contribution but it says it has to be attached to an existing
issue while there is no issue raised for this bug.
Can you please help me?

Best regards,
Ahmed Eldawy


[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646042#comment-13646042
 ] 

Rohini Palaniswamy commented on PIG-3285:
-

Not too many code paths. For HBase storage, we need to write a method that
   * Filters out Pig and Hadoop classes from the list of classes so that the Pig
and Hadoop jars are not included.
   * Finds the jars for the other classes, filters out any jars already
present in PigContext.extrajars, and adds only the rest to tmpjars.
 
  Also ensure that you add HTable.class apart from Zookeeper, inputformat, 
input/output key/value, partitioner and combiner classes. 
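
The two filtering steps above can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not Pig code: jars are represented as path strings, Pig and Hadoop jars are recognized only by a `pig-` or `hadoop-` file-name prefix, and the class and method names (`TmpJarsFilterSketch`, `buildTmpJars`) and the jar names in the usage note are hypothetical. The result is a comma-separated string of the kind a tmpjars setting would hold.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Pig.
final class TmpJarsFilterSketch {

    /** Last path component, e.g. "/lib/guava.jar" gives "guava.jar". */
    static String fileName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Assumption: Pig and Hadoop jars can be told apart by file-name prefix. */
    static boolean isPigOrHadoopJar(String path) {
        String name = fileName(path);
        return name.startsWith("pig-") || name.startsWith("hadoop-");
    }

    /** Keeps candidates that are neither Pig/Hadoop jars nor already shipped. */
    static String buildTmpJars(List<String> candidates, List<String> extraJars) {
        Set<String> seen = new HashSet<>();
        for (String jar : extraJars) {
            seen.add(fileName(jar));
        }
        List<String> kept = new ArrayList<>();
        for (String jar : candidates) {
            if (isPigOrHadoopJar(jar)) {
                continue;
            }
            // Set.add returns false for names already seen, which also
            // drops repeats within the candidate list itself.
            if (seen.add(fileName(jar))) {
                kept.add(jar);
            }
        }
        return String.join(",", kept);
    }
}
```

For example, with candidates `/opt/pig/pig-core.jar`, `/opt/hbase/hbase.jar`, `/opt/hbase/lib/zookeeper.jar`, and `/opt/hbase/lib/guava.jar`, and `guava.jar` already among the extra jars, only the HBase and ZooKeeper jars remain. Filtering on jar names rather than package names matters here because HBase classes live under `org.apache.hadoop.hbase`, so a package-prefix filter for Hadoop would drop them too.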




[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-30 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645956#comment-13645956
 ] 

Nick Dimiduk commented on PIG-3285:
---

Gross. Too many code-paths. How then to proceed? Can we consolidate on a 
single, approved method of shipping job dependencies?

(cc [~apurtell] since you were interested in the related HIVE-2055)




[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645934#comment-13645934
 ] 

Rohini Palaniswamy commented on PIG-3285:
-

Nick,
 You can raise a jira in Hadoop to handle duplicates. But since we support all
older versions, we can't rely on it. Also, it will not help with the current
problem anyway.

  The problem here is that the HBase code sets some jars in tmpjars, which
copies each jar to HDFS under /user/[username]/.staging and adds that HDFS file
through DistributedCache.addArchiveToClassPath when JobClient.submitJob() runs.
Pig already ships pig.jar as job.jar, and it ships the other registered jars to
a tmp location in HDFS (/tmp/...) and then calls
DistributedCache.addFileToClassPath before submitting the job. In this case,
all three settings are different, and since Pig does not use tmpfiles or
tmpjars and does the work by itself, the HDFS path is also different. So
duplicates have to be resolved at the Pig level.
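
Because the same jar can arrive under different HDFS paths, comparing full paths will not reveal the duplicate. A minimal Java sketch of one way to spot such duplicates is to group classpath entries by file name. This is an assumption about how resolution could work, not Pig's implementation, and the names `DuplicateJarSketch` and `duplicatesByName` as well as the paths in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig or Hadoop.
final class DuplicateJarSketch {

    /** Maps each file name that occurs more than once to all paths carrying it. */
    static Map<String, List<String>> duplicatesByName(List<String> classpathEntries) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String entry : classpathEntries) {
            String name = entry.substring(entry.lastIndexOf('/') + 1);
            byName.computeIfAbsent(name, k -> new ArrayList<>()).add(entry);
        }
        // Keep only names shipped from more than one location.
        byName.values().removeIf(paths -> paths.size() < 2);
        return byName;
    }
}
```

Given entries such as `/user/alice/.staging/zookeeper.jar` and `/tmp/temp1/zookeeper.jar`, the result has a single key, `zookeeper.jar`, listing both paths. Matching on file name alone would treat two different versions with the same name as duplicates, so a real implementation would likely need a stricter comparison.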


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-04-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645928#comment-13645928
 ] 

Rohini Palaniswamy commented on PIG-3015:
-

Thanks :). Would be very nice to have it. Just saw that Jonathan had asked the 
same question earlier in his review comments.

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: bad.avro, good.avro, PIG-3015-10.patch, 
> PIG-3015-11.patch, PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, 
> PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch, PIG-3015-9.patch, 
> PIG-3015-doc-2.patch, PIG-3015-doc.patch, TestInput.java, Test.java, 
> with_dates.pig
>
>
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni (as 
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best 
> way to contribute the changes back to Apache.



Re: Review Request: PIG-3223 AvroStorage does not handle comma separated input paths

2013-04-30 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10351/#review19974
---



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java


Calling globStatus again on a known file (FileStatus) is inefficient. 
Better to move this block into a separate method and use that for the recursion.



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java


Pattern should be a private static variable. This pattern only takes into 
account globs of the form {x,y}; Hadoop globStatus supports a lot more:

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)

Found a method in pig that takes care of this logic: 
LoadFunc.getPathStrings(). Use it for splitting paths. This should simplify 
the whole change.
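
To make the splitting concern concrete: a plain split on commas breaks a glob 
such as dir/{a,b}.avro in the middle. Below is a minimal standalone sketch 
that splits only on commas outside curly braces. It is not Pig's 
implementation of LoadFunc.getPathStrings(); PathSplitter is a made-up name, 
and trimming whitespace around each path is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Pig's implementation.
final class PathSplitter {

    // Splits on commas that are not nested inside {...} glob alternatives.
    static List<String> split(String commaSeparated) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // current nesting level of curly braces
        for (int i = 0; i < commaSeparated.length(); i++) {
            char c = commaSeparated.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            }
            if (c == ',' && depth == 0) {
                addIfNotBlank(paths, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(paths, current);
        return paths;
    }

    // Trims the accumulated text and adds it unless it is empty.
    private static void addIfNotBlank(List<String> paths, StringBuilder sb) {
        String path = sb.toString().trim();
        if (!path.isEmpty()) {
            paths.add(path);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("test_dir1/*, test_dir2/test_glob4.avro"));
        System.out.println(split("dir/{a,b}.avro,other.avro"));
    }
}
```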




- Rohini Palaniswamy


On April 8, 2013, 10:03 p.m., Johnny Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10351/
> ---
> 
> (Updated April 8, 2013, 10:03 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> ---
> 
> We want to support comma-separated input paths in AvroStorage, for example:
> "test_dir1/test_glob1.avro,test_dir1/test_glob2.avro,test_dir1/test_glob3.avro"
> "test_dir1/*, test_dir2/test_glob4.avro, test_dir2/test_glob5.avro"
> 
> 
> This addresses bug PIG-3223.
> https://issues.apache.org/jira/browse/PIG-3223
> 
> 
> Diffs
> -
> 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
>  0ac0225 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>  bd7a6d2 
> 
> Diff: https://reviews.apache.org/r/10351/diff/
> 
> 
> Testing
> ---
> 
> Added two more test cases to TestAvroStorage.java; they all pass.
> 
> 
> Thanks,
> 
> Johnny Zhang
> 
>



[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-30 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645850#comment-13645850
 ] 

Nick Dimiduk commented on PIG-3285:
---

Good point, [~rohini]. Would it be relevant instead for Hadoop to remove 
duplicates in the "tmpjars" list?

Also tangentially relevant: have a look at HBASE-8438 for building a minimal 
set of classpath additions. This will minimize the number of places jars can be 
pulled from.



[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-04-30 Thread Joseph Adler (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645826#comment-13645826
 ] 

Joseph Adler commented on PIG-3015:
---

[~rohini] OK, looks like I implemented the helper functions, and implemented 
the functionality for Trevni, but didn't implement it for AvroStorage. Will 
follow up with a patch.



[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-04-30 Thread Joseph Adler (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645823#comment-13645823
 ] 

Joseph Adler commented on PIG-3015:
---

[~rohini]: Great question. I definitely implemented that interface in an 
earlier version; I'm not sure what happened to the code. Let me go through the 
patches to figure that one out.



[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-04-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645815#comment-13645815
 ] 

Rohini Palaniswamy commented on PIG-3015:
-

Joseph,
  Thanks for the good work. I plan to go over it in more detail this week. At 
a glance, one question: why doesn't AvroStorage (even the old one) implement 
LoadPushDown (for column pruning)? If it was just missed, we can create a 
separate jira for it.
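
For context on the column-pruning question: the idea is that the loader is 
told which fields the script actually uses and returns only those. Below is a 
minimal sketch of that projection step. It does not use Pig's LoadPushDown 
API; ColumnPruningSketch and its method are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of projection; not Pig's LoadPushDown interface.
final class ColumnPruningSketch {

    // Returns only the fields at the requested indexes, in the requested order.
    static List<Object> project(List<Object> record, int[] requiredIndexes) {
        List<Object> pruned = new ArrayList<>(requiredIndexes.length);
        for (int index : requiredIndexes) {
            pruned.add(record.get(index));
        }
        return pruned;
    }

    public static void main(String[] args) {
        List<Object> record = List.of("alice", 30, "NYC", 1.5);
        // Only fields 0 and 2 are needed downstream.
        System.out.println(project(record, new int[] {0, 2}));
    }
}
```

A real loader would ideally skip decoding the unused fields altogether rather 
than dropping them after the full record has been read.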



[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645772#comment-13645772
 ] 

Rohini Palaniswamy commented on PIG-3285:
-

bq. I am not sure if we double ship those jars if we do this. Actually I 
would prefer a TableMapReduceUtil.addDependencyJars version which only adds 
hbase.jar/guava.jar/protobuf.jar and additional dependencies when hbase evolves 
(but no hadoop.jar/pig.jar)

Nick, looking at the code, I am sure we will end up double shipping jars, which 
is very inefficient. It would be good to write a separate function, instead of 
calling TableMapReduceUtil.addDependencyJars(job), that takes the list of 
classes used in TableMapReduceUtil.addDependencyJars(job), filters out pig and 
hadoop jars (classes starting with org.apache.pig and org.apache.hadoop) and 
those in pigContext.extraJars, and then sets the remaining jars on tmpjars. You 
can reuse JarManager.findContainingJar to find the jar for a class file.
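
A minimal sketch of the filtering step suggested above. The jarLocator 
parameter is a hypothetical stand-in for a lookup such as 
JarManager.findContainingJar, and alreadyShipped stands in for 
pigContext.extraJars; neither signature is taken from Pig. One assumption 
worth noting: HBase's own classes live under org.apache.hadoop.hbase (see the 
stack trace in this issue), so this sketch exempts that prefix from the 
org.apache.hadoop exclusion.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Pig or HBase.
final class DependencyJarFilter {

    // True for classes whose jars pig or hadoop already provide.
    static boolean isExcluded(String className) {
        if (className.startsWith("org.apache.hadoop.hbase.")) {
            return false; // HBase itself must still be shipped
        }
        return className.startsWith("org.apache.pig.")
                || className.startsWith("org.apache.hadoop.");
    }

    // Collects the jars that still need shipping, in first-seen order.
    static Set<String> jarsToShip(List<String> classNames,
                                  Set<String> alreadyShipped,
                                  Function<String, String> jarLocator) {
        Set<String> jars = new LinkedHashSet<>();
        for (String className : classNames) {
            if (isExcluded(className)) {
                continue;
            }
            String jar = jarLocator.apply(className);
            if (jar != null && !alreadyShipped.contains(jar)) {
                jars.add(jar);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        // Made-up class-to-jar mapping standing in for a real classpath lookup.
        Map<String, String> lookup = Map.of(
                "org.apache.hadoop.hbase.client.HTable", "hbase.jar",
                "com.google.protobuf.Message", "protobuf.jar",
                "org.apache.pig.PigServer", "pig.jar");
        Set<String> jars = jarsToShip(
                List.of("org.apache.hadoop.hbase.client.HTable",
                        "com.google.protobuf.Message",
                        "org.apache.pig.PigServer"),
                Set.of(), lookup::get);
        // tmpjars holds a comma-separated list of jars.
        System.out.println(String.join(",", jars));
    }
}
```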



[jira] [Updated] (PIG-3169) Remove intermediate data after a job finishes

2013-04-30 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3169:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thank you Mark!

> Remove intermediate data after a job finishes
> -
>
> Key: PIG-3169
> URL: https://issues.apache.org/jira/browse/PIG-3169
> Project: Pig
>  Issue Type: Improvement
>Reporter: Mark Wagner
>Assignee: Mark Wagner
>Priority: Minor
> Fix For: 0.12
>
> Attachments: PIG-3169.1.patch, PIG-3169.2.patch, PIG-3169.3.patch, 
> PIG-3169.4.patch, PIG-3169.5.patch, PIG-3169.6.patch, PIG-3169-hotfix.patch
>
>
> When using Grunt, intermediate data and distributed cache files are left in 
> 'pig.temp.dir' until the session is closed. It would be nice to clean up files 
> once they are no longer needed.



[jira] [Commented] (PIG-3169) Remove intermediate data after a job finishes

2013-04-30 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645730#comment-13645730
 ] 

Cheolsoo Park commented on PIG-3169:


+1. All unit tests pass except TestPigContext.testImportList, which is a known 
issue (PIG-3286).



[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-30 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645712#comment-13645712
 ] 

Nick Dimiduk commented on PIG-3285:
---

bump.
