[jira] [Commented] (PIG-2578) Multiple Store-commands mess up mapred.output.dir.

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440912#comment-13440912
 ] 

Dmitriy V. Ryaboy commented on PIG-2578:


Reverted in PIG-2890. I don't see a way to reopen this JIRA and change it to 
Won't Fix.

> Multiple Store-commands mess up mapred.output.dir.
> --
>
> Key: PIG-2578
> URL: https://issues.apache.org/jira/browse/PIG-2578
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1, 0.9.2
>Reporter: Mithun Radhakrishnan
>Assignee: Daniel Dai
> Fix For: 0.10.0, 0.11
>
> Attachments: PIG-2578-1.patch
>
>
> When one runs a pig-script with multiple storers, one sees the following:
> 1. When run as a script, Pig launches a single job.
> 2. PigOutputCommitter::setupJob() calls the 
> underlyingOutputCommitter::setupJob(), once for each storer. But the 
> mapred.output.dir is the same for both calls, even though the storers write 
> to different locations. 
> This was originally seen in HCATALOG-276, when HCatalog's end-to-end tests 
> are run against Pig.
> (https://issues.apache.org/jira/browse/HCATALOG-276)
> Sample pig-script (near identical to HCatalog's Pig_Checkin_4 test):
> a = load 'keyvals' using org.apache.hcatalog.pig.HCatLoader();
> split a into b if key<200, c if key >=200;
> store b into 'keyvals_lt200' using org.apache.hcatalog.pig.HCatStorer();
> store c into 'keyvals_ge200' using org.apache.hcatalog.pig.HCatStorer();
> I've suggested a workaround in HCat for the time being, but I think this 
> might be something that needs fixing in Pig.
> Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2870) pigServer.openIterator fails for jobs with no input splits

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440911#comment-13440911
 ] 

Dmitriy V. Ryaboy commented on PIG-2870:


Reverted PIG-2578. Is there still a no-input-split problem, or does this solve 
the issue?

> pigServer.openIterator fails for jobs with no input splits
> --
>
> Key: PIG-2870
> URL: https://issues.apache.org/jira/browse/PIG-2870
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Bill Graham
>Assignee: Bill Graham
> Attachments: PIG-2870.1.patch
>
>
> Jobs that have valid input data, but 0 input splits (this is the case where 
> indexing implemented in the {{InputFormat}} might return 0 splits for an 
> aggressive filter) fail when {{pigServer.openIterator}} is called. This is 
> because {{mapred.output.dir}} isn't set, so the job succeeds without creating 
> the empty output directory. The {{ReadToEndLoader}} then fails due to the 
> null input directory.
> It seems PIG-2578 introduced this issue. 





[jira] [Commented] (PIG-2872) StoreFuncInterface.setStoreLocation get's a copy of a Configuration object

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440910#comment-13440910
 ] 

Dmitriy V. Ryaboy commented on PIG-2872:


Reverted PIG-2578. Is this still a problem, or does reverting that patch fix 
this issue?

> StoreFuncInterface.setStoreLocation get's a copy of a Configuration object
> --
>
> Key: PIG-2872
> URL: https://issues.apache.org/jira/browse/PIG-2872
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11
> Environment: Pig trunk, Hadoop 0.20.205 with Kerberos, ElasticSearch 
> trunk, Wonderdog trunk
>Reporter: Evert Lammerts
>
> When an implementation of StoreFuncInterface.setStoreLocation is called from 
> JobControlCompiler.getJob, it is passed a copy of the Configuration that will 
> be used for the Job that will be submitted:
> {code:title=JobControlCompiler.java}
> sFunc.setStoreLocation(st.getSFile().getFileName(), new 
> org.apache.hadoop.mapreduce.Job(nwJob.getConfiguration()));
> {code}
> When a new org.apache.hadoop.mapreduce.Job is created it creates a copy of 
> the Configuration object, as far as I know. Thus anything added to the 
> Configuration object in the implementation of setStoreLocation will not be 
> included in the Configuration of nwJob in JobControlCompiler.getJob.
> I notice this goes wrong in Wonderdog, which needs to include the 
> Elasticsearch configuration file in the DistributedCache. It is added to 
> mapred.cache.files through setStoreLocation, but this setting doesn't make it 
> back into the Job returned by JobControlCompiler.getJob, and is therefore 
> never localized.
> This might be intentional semantics within Pig, but I'm not familiar enough 
> with StoreFuncs to know whether it is.
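
A minimal sketch of the copy semantics described in the report, assuming (as the reporter suspects) that constructing a Job copies the Configuration it is given. It does not use Hadoop at all: Conf and JobLike are hypothetical stand-ins, and the file name elasticsearch.yml is a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigCopySketch {
    /** Hypothetical stand-in for a Hadoop Configuration: a plain key/value map. */
    static class Conf {
        private final Map<String, String> props = new HashMap<>();
        Conf() {}
        Conf(Conf other) { props.putAll(other.props); } // copy constructor
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    /** Hypothetical stand-in for a Job that copies the configuration it is given. */
    static class JobLike {
        private final Conf conf;
        JobLike(Conf original) { this.conf = new Conf(original); }
        Conf getConfiguration() { return conf; }
    }

    /** Mimics a store function that registers a cache file in setStoreLocation. */
    static void setStoreLocation(String location, JobLike job) {
        job.getConfiguration().set("mapred.cache.files", "elasticsearch.yml"); // placeholder
    }

    /** Returns what the job that would be submitted sees after the call. */
    static String cacheFilesSeenBySubmittedJob() {
        JobLike nwJob = new JobLike(new Conf());
        // Same shape as the call quoted above: a fresh wrapper around nwJob's configuration.
        setStoreLocation("out", new JobLike(nwJob.getConfiguration()));
        return nwJob.getConfiguration().get("mapred.cache.files");
    }

    public static void main(String[] args) {
        // The setting went into the copy, so nwJob never sees it.
        System.out.println(cacheFilesSeenBySubmittedJob());
    }
}
```

Under that assumption the write lands in the temporary copy and the lookup on nwJob comes back empty, which matches the symptom that the cache file is never localized.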





[jira] [Created] (PIG-2890) Revert PIG-2578

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)
Dmitriy V. Ryaboy created PIG-2890:
--

 Summary: Revert  PIG-2578
 Key: PIG-2890
 URL: https://issues.apache.org/jira/browse/PIG-2890
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy


PIG-2870 and PIG-2872 contain the discussion on why that patch needs to be 
reverted. 





[jira] [Commented] (PIG-2890) Revert PIG-2578

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440909#comment-13440909
 ] 

Dmitriy V. Ryaboy commented on PIG-2890:


Reverting PIG-2578 given Daniel Dai's +1 in that ticket.

> Revert  PIG-2578
> 
>
> Key: PIG-2890
> URL: https://issues.apache.org/jira/browse/PIG-2890
> Project: Pig
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.11
>
>
> PIG-2870 and PIG-2872 contain the discussion on why that patch needs to be 
> reverted. 





[jira] [Resolved] (PIG-2890) Revert PIG-2578

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy resolved PIG-2890.


   Resolution: Fixed
Fix Version/s: 0.11

> Revert  PIG-2578
> 
>
> Key: PIG-2890
> URL: https://issues.apache.org/jira/browse/PIG-2890
> Project: Pig
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.11
>
>
> PIG-2870 and PIG-2872 contain the discussion on why that patch needs to be 
> reverted. 





[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440905#comment-13440905
 ] 

Koji Noguchi commented on PIG-2662:
---

bq. Koji, What OS, JVM are you using ?

It is failing for me on Linux RHEL 5.6 + JVM 1.6.0_32.  However, I now see that 
it's succeeding on my Mac. I'll take a look tomorrow.  

> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 
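
For illustration, a minimal sketch of what honoring these two properties could look like, using only java.util.Properties. This is not the code in PoissonSampleLoader.computeSamples; the class and method names are hypothetical, and the fallback values passed in main are placeholders rather than Pig's documented defaults.

```java
import java.util.Properties;

public class SkewJoinConfigSketch {
    /** Reads pig.sksampler.samplerate, using the caller's fallback if it is unset. */
    static int sampleRate(Properties props, int fallback) {
        String raw = props.getProperty("pig.sksampler.samplerate");
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    /** Reads pig.skewedjoin.reduce.memusage, using the caller's fallback if it is unset. */
    static float reduceMemUsage(Properties props, float fallback) {
        String raw = props.getProperty("pig.skewedjoin.reduce.memusage");
        return raw == null ? fallback : Float.parseFloat(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("pig.sksampler.samplerate", "50"); // placeholder value
        // pig.skewedjoin.reduce.memusage is left unset, so the fallback applies.
        System.out.println(sampleRate(props, 100));
        System.out.println(reduceMemUsage(props, 0.5f));
    }
}
```

The point of the issue is that a lookup like this has to actually run on the skew-join path; if it is never called, whatever the user sets is silently ignored.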





Re: Number of mappers in MRCompiler

2012-08-23 Thread Prasanth J
Oh yeah.. This question is not related to our cube sampling stuff that we 
discussed.. wanted to know the reason behind that just out of curiosity :) 


Thanks
-- Prasanth

On Aug 23, 2012, at 11:20 PM, Dmitriy Ryaboy  wrote:

> I think we decided to instead stub in a special loader that reads a
> few records from each underlying split, in a single mapper (by using a
> single wrapping split), right?
> 
> On Thu, Aug 23, 2012 at 7:55 PM, Prasanth J  
> wrote:
>> I see. Thanks Alan for your reply.
>> Also one more question that I posted earlier was
>> 
>> I used RandomSampleLoader and specified a sample size of 100. The number of 
>> map tasks that are executed is 110. So I am expecting the total samples 
>> received on the reducer to be 110*100 = 11000, but it's always more than the 
>> expected value. The actual number of received tuples is between 14000 and 
>> 15000. I am not sure if it's a bug or if I am missing something. Is this 
>> expected behavior?
>> 
>> Thanks
>> -- Prasanth
>> 
>> On Aug 23, 2012, at 6:20 PM, Alan Gates  wrote:
>> 
>>> Sorry for the very slow response, but here it is, hopefully better late 
>>> than never.
>>> 
>>> On Jul 25, 2012, at 4:28 PM, Prasanth J wrote:
>>> 
>>>> Thanks Alan.
>>>> The requirement for me is that I want to load N number of samples based on 
>>>> the input file size and perform naive cube computation to determine the 
>>>> large groups that will not fit in reducer's memory. I need to know the 
>>>> exact number of samples for calculating the partition factor for large 
>>>> groups.
>>>> Currently I am using RandomSampleLoader to load 1000 tuples from each 
>>>> mapper. Without knowing the number of mappers I will not be able to find 
>>>> the exact number of samples loaded. Also RandomSampleLoader doesn't attach 
>>>> any special marker (as in PoissonSampleLoader) tuples which tells the 
>>>> number of samples loaded.
>>>> Is there any other way to know the exact number of samples loaded?
>>> Not that I know of.
>>> 
>>>> 
>>>> By analyzing the MR plans of order-by and skewed-join, it seems like the 
>>>> entire dataset is copied to a temp file and then SampleLoaders use the 
>>>> temp file to load samples. Is there any specific reason for this redundant 
>>>> copy? Is it because SampleLoaders can only use pig's internal i/o format?
>>> Partly, but also because it allows any operators that need to run before 
>>> the sample (like project or filter) to be placed in the pipeline.
>>> 
>>> Alan.
>>> 
>> 



Re: Number of mappers in MRCompiler

2012-08-23 Thread Dmitriy Ryaboy
I think we decided to instead stub in a special loader that reads a
few records from each underlying split, in a single mapper (by using a
single wrapping split), right?

On Thu, Aug 23, 2012 at 7:55 PM, Prasanth J  wrote:
> I see. Thanks Alan for your reply.
> Also one more question that I posted earlier was
>
> I used RandomSampleLoader and specified a sample size of 100. The number of 
> map tasks that are executed is 110. So I am expecting the total samples 
> received on the reducer to be 110*100 = 11000, but it's always more than the 
> expected value. The actual number of received tuples is between 14000 and 
> 15000. I am not sure if it's a bug or if I am missing something. Is this 
> expected behavior?
>
> Thanks
> -- Prasanth
>
> On Aug 23, 2012, at 6:20 PM, Alan Gates  wrote:
>
>> Sorry for the very slow response, but here it is, hopefully better late than 
>> never.
>>
>> On Jul 25, 2012, at 4:28 PM, Prasanth J wrote:
>>
>>> Thanks Alan.
>>> The requirement for me is that I want to load N number of samples based on 
>>> the input file size and perform naive cube computation to determine the 
>>> large groups that will not fit in reducer's memory. I need to know the 
>>> exact number of samples for calculating the partition factor for large 
>>> groups.
>>> Currently I am using RandomSampleLoader to load 1000 tuples from each 
>>> mapper. Without knowing the number of mappers I will not be able to find 
>>> the exact number of samples loaded. Also RandomSampleLoader doesn't attach 
>>> any special marker (as in PoissonSampleLoader) tuples which tells the 
>>> number of samples loaded.
>>> Is there any other way to know the exact number of samples loaded?
>> Not that I know of.
>>
>>>
>>> By analyzing the MR plans of order-by and skewed-join, it seems like the 
>>> entire dataset is copied to a temp file and then SampleLoaders use the temp 
>>> file to load samples. Is there any specific reason for this redundant copy? 
>>> Is it because SampleLoaders can only use pig's internal i/o format?
>> Partly, but also because it allows any operators that need to run before the 
>> sample (like project or filter) to be placed in the pipeline.
>>
>> Alan.
>>
>


[jira] [Resolved] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy resolved PIG-2850.


   Resolution: Fixed
Fix Version/s: 0.11

Committed to trunk.

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Assignee: Matthew Hayes
>Priority: Minor
> Fix For: 0.11
>
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar_4.diff, 
> import_macros_from_jar_5.diff, import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.
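
The lookup order proposed above (file path first, then a resource of the same name) can be sketched roughly as follows. This is an illustration only, not the code from the attached patches; the class and method names are hypothetical, and it assumes registered JARs are visible to the class loader that is passed in.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class MacroLookupSketch {
    /**
     * Reports where an imported macro file would be found:
     * "file" if it exists on disk, "resource" if only the class loader
     * can see it (for example inside a registered JAR), otherwise "missing".
     */
    static String locate(String name, ClassLoader loader) {
        // Step 1: a regular file at the given path wins.
        if (Files.isRegularFile(Paths.get(name))) {
            return "file";
        }
        // Step 2: fall back to a classpath resource with the same name.
        if (loader.getResource(name) != null) {
            return "resource";
        }
        return "missing";
    }

    public static void main(String[] args) {
        ClassLoader loader = MacroLookupSketch.class.getClassLoader();
        System.out.println(locate("some_path/my_macros.pig", loader));
    }
}
```

Checking the file system first keeps existing scripts working unchanged; the resource lookup only comes into play when no such file exists.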





Re: Number of mappers in MRCompiler

2012-08-23 Thread Prasanth J
I see. Thanks Alan for your reply. 
Also one more question that I posted earlier was

I used RandomSampleLoader and specified a sample size of 100. The number of map 
tasks that are executed is 110. So I am expecting the total samples received on 
the reducer to be 110*100 = 11000, but it's always more than the expected 
value. The actual number of received tuples is between 14000 and 15000. I am 
not sure if it's a bug or if I am missing something. Is this expected behavior?

Thanks
-- Prasanth

On Aug 23, 2012, at 6:20 PM, Alan Gates  wrote:

> Sorry for the very slow response, but here it is, hopefully better late than 
> never.
> 
> On Jul 25, 2012, at 4:28 PM, Prasanth J wrote:
> 
>> Thanks Alan.
>> The requirement for me is that I want to load N number of samples based on 
>> the input file size and perform naive cube computation to determine the 
>> large groups that will not fit in reducer's memory. I need to know the exact 
>> number of samples for calculating the partition factor for large groups. 
>> Currently I am using RandomSampleLoader to load 1000 tuples from each 
>> mapper. Without knowing the number of mappers I will not be able to find the 
>> exact number of samples loaded. Also RandomSampleLoader doesn't attach any 
>> special marker (as in PoissonSampleLoader) tuples which tells the number of 
>> samples loaded. 
>> Is there any other way to know the exact number of samples loaded? 
> Not that I know of.
> 
>> 
>> By analyzing the MR plans of order-by and skewed-join, it seems like the 
>> entire dataset is copied to a temp file and then SampleLoaders use the temp 
>> file to load samples. Is there any specific reason for this redundant copy? 
>> Is it because SampleLoaders can only use pig's internal i/o format? 
> Partly, but also because it allows any operators that need to run before the 
> sample (like project or filter) to be placed in the pipeline.
> 
> Alan.
> 



[jira] [Assigned] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy reassigned PIG-2850:
--

Assignee: Matthew Hayes

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Assignee: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar_4.diff, 
> import_macros_from_jar_5.diff, import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Updated] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Matthew Hayes (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Hayes updated PIG-2850:
---

Attachment: import_macros_from_jar_5.diff

yep i did, sorry, now fixed

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar_4.diff, 
> import_macros_from_jar_5.diff, import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Commented] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440890#comment-13440890
 ] 

Dmitriy V. Ryaboy commented on PIG-2850:


Fails to compile. Did you forget to add ResourceNotFoundException ?

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar_4.diff, 
> import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Commented] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Matthew Hayes (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440876#comment-13440876
 ] 

Matthew Hayes commented on PIG-2850:


Oops I should have done an svn up.  I resolved the conflicts and submitted 
patch #4.

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar_4.diff, 
> import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Updated] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Matthew Hayes (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Hayes updated PIG-2850:
---

Attachment: import_macros_from_jar_4.diff

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar_4.diff, 
> import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Commented] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440862#comment-13440862
 ] 

Dmitriy V. Ryaboy commented on PIG-2850:


Ok.. it's actually a legit conflict, not just a merge issue. This code was 
changed in PIG-2866. I'll ask Bill to take a look.

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440860#comment-13440860
 ] 

Zhijie Shen commented on PIG-1314:
--

Hi Thejas, let me do that.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
> PIG-1314-7.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440858#comment-13440858
 ] 

Thejas M Nair commented on PIG-1314:


We also need some test cases that set the timezone property. This might not be 
easy to do in the e2e framework, so unit test cases are a better candidate for 
this. Please let me know if you need any help.


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
> PIG-1314-7.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440857#comment-13440857
 ] 

Dmitriy V. Ryaboy commented on PIG-2850:


Patch isn't applying cleanly. I'll remediate and post rebased patch.

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Commented] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440853#comment-13440853
 ] 

Dmitriy V. Ryaboy commented on PIG-2850:


Perfect, thanks for the explanation. I like patch 3, will commit.
Sorry about the slow turnaround, feel free to ping if there are no comments for 
more than a few days next time.. we shouldn't let tickets languish like that.

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440851#comment-13440851
 ] 

Thejas M Nair commented on PIG-1314:


PIG-1314-7.patch committed to trunk! Thanks Zhijie.

We need to update the documentation regarding this change. Can you please 
upload a new patch for that? To see generated docs, run - ant 
-Dforrest.home= docs. The files to be edited are 
under - trunk/src/docs/src/documentation/ .

We should also add a few end to end test cases for datetime. See 
https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-EndtoendTesting
 . We should have a few queries that do some of the basic operations on date 
time, and queries that have order-by, group and join on date fields. 
These can be submitted as multiple patches.  

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
> PIG-1314-7.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Updated] (PIG-2811) Updating .eclipse.templates/.classpath with the Newest Jython Version

2012-08-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2811:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Fixed in the PIG-1314 patch.


> Updating .eclipse.templates/.classpath with the Newest Jython Version
> -
>
> Key: PIG-2811
> URL: https://issues.apache.org/jira/browse/PIG-2811
> Project: Pig
>  Issue Type: Bug
>  Components: tools
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Trivial
> Fix For: 0.11
>
> Attachments: PIG-2811.patch
>
>
> Jython library version has been upgraded to 2.5.2 by the PIG-2665 patch, but 
> the related modification is not made in the Eclipse template file.





[jira] [Commented] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Matthew Hayes (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440824#comment-13440824
 ] 

Matthew Hayes commented on PIG-2850:


The function could legitimately return null if the macro file is not found as a 
file or a resource.  I thought an error about a resource not being found could 
be cryptic if the user isn't expecting this behavior when importing the macro 
file.  So I checked for null in getMacroFile after attempting to fetch as a 
resource and then threw an exception about the file not being found.

Alternatively I could throw a ResourceNotFoundException in fetchResource and 
catch this in getMacroFile.  See diff #3 where I am doing this instead.

Thanks!
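
For readers following the discussion, the lookup order and error handling described in this comment can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in import_macros_from_jar_3.diff: the class name MacroFileLocator, the method bodies, and the temp-file copy are hypothetical, and only the names fetchResource, getMacroFile and ResourceNotFoundException come from the comment itself.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; it mirrors the names used in the comment, not the actual patch.
final class MacroFileLocator {

    /** Thrown when no classpath resource exists under the given name. */
    static final class ResourceNotFoundException extends IOException {
        ResourceNotFoundException(String name) {
            super("Resource not found on classpath: " + name);
        }
    }

    /** Copies a classpath resource to a temp file; throws instead of returning null. */
    static File fetchResource(String name) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = MacroFileLocator.class.getClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new ResourceNotFoundException(name);
            }
            // Assumption: the caller wants a File, so the resource is copied out of the JAR.
            File tmp = File.createTempFile("macro", ".pig");
            tmp.deleteOnExit();
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }

    /** Tries the plain file path first, then falls back to a classpath resource. */
    static File getMacroFile(String path) throws IOException {
        File local = new File(path);
        if (local.isFile()) {
            return local;
        }
        try {
            return fetchResource(path);
        } catch (ResourceNotFoundException e) {
            // Report the failure in terms the user expects: the macro file was not found.
            throw new FileNotFoundException(
                "Macro file not found as a file or a resource: " + path);
        }
    }
}
```

With this shape the caller never sees a null return, and a user who did not know about the resource fallback still gets a "file not found" style message.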

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Updated] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Matthew Hayes (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Hayes updated PIG-2850:
---

Attachment: import_macros_from_jar_3.diff

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar_3.diff, import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440789#comment-13440789
 ] 

Thejas M Nair commented on PIG-2662:


Koji, what OS and JVM are you using?


> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 





[jira] [Commented] (PIG-2682) pig harness does not correctly count the number of stores for multiple invocations of the same macro

2012-08-23 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440774#comment-13440774
 ] 

Andrey Klochkov commented on PIG-2682:
--

How about adding a parameter "expected_outfiles_count" into the hash for tests 
which do that? If the parameter is present then TestDriverPig.countStores 
would just pick this value instead of trying to parse the script. It's not 
going to be used often so an additional inconvenience should be acceptable.
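
A rough sketch of the proposed override follows. The real harness is Perl, so this Java version only illustrates the control flow. StoreCounter, the "pig" key holding the script text, and the regex-based fallback are assumptions made for the example; only "expected_outfiles_count" and countStores come from the comment.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for TestDriverPig.countStores; not the harness code.
final class StoreCounter {

    // Assumption: a store statement is a line starting with the keyword "store".
    private static final Pattern STORE = Pattern.compile("(?im)^\\s*store\\s");

    static int countStores(Map<String, String> testCmd) {
        // An explicit count in the test hash wins over parsing the script.
        String override = testCmd.get("expected_outfiles_count");
        if (override != null) {
            return Integer.parseInt(override.trim());
        }
        // Fallback: count store statements textually. This undercounts when a
        // macro containing a store is invoked more than once.
        Matcher m = STORE.matcher(testCmd.getOrDefault("pig", ""));
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

For a script like the one in the issue description, the textual fallback sees a single store inside the macro body, while a test that sets expected_outfiles_count to 2 gets the right number of output files.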

> pig harness does not correctly count the number of stores for multiple 
> invocations of the same macro
> 
>
> Key: PIG-2682
> URL: https://issues.apache.org/jira/browse/PIG-2682
> Project: Pig
>  Issue Type: Test
>  Components: e2e harness
>Reporter: Araceli Henley
>
> For example, in this macro, TestDriverPig.countStores will only count the 
> number of stores in the "test" macro, not the number of times store is 
> invoked. 
> test (in, out, column, filter_value ) returns b {
>a = load '$in' as (name: chararray, age: int, gpa: float);
>$b = filter a by $column < $filter_value ;
>store $b into '$out';
> }
> x = test( '/user/hadoopqa/pignightly/tests/data/singlefile/studenttab10k', 
> '/user/hadoopqa/pignightly/out/hadoopqa.1336171525/Y_Macro_Misc_7.out.1', 
> 'age', 22 );
> x = test( '/user/hadoopqa/pignightly/tests/data/singlefile/studenttab10k', 
> '/user/hadoopqa/pignightly/out/hadoopqa.1336171525/Y_Macro_Misc_7.out.2', 
> 'gpa', 3.0 );
> There's no easy workaround.





[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-23 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440766#comment-13440766
 ] 

Rajesh Balamohan commented on PIG-2662:
---

With the trunk code, I quickly tried to run only this testcase.

git status
# On branch trunk
nothing to commit (working directory clean)

ant -Dtestcase=TestPoissonSampleLoader test-core

   [junit] Running org.apache.pig.test.TestPoissonSampleLoader
   [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 25.917 sec

Please let me know if I am missing anything here to reproduce the issue you are 
seeing in your environment.

> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 





[jira] [Commented] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440764#comment-13440764
 ] 

Dmitriy V. Ryaboy commented on PIG-2850:


Looks good. In fetchResource, you check if resourceStream is not null; if it is 
null, the whole function will return null. I suspect that would be surprising 
and cause NPEs downstream. Perhaps an exception-throwing assert would be better 
here? Is there a legit reason for this function to ever return null instead of 
throwing?

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





Build failed in Jenkins: Pig-trunk #1304

2012-08-23 Thread Apache Jenkins Server
See 

Changes:

[julien] PIG-2848: TestBuiltInBagToTupleOrString fails now that mock.Storage 
enforces not overwriting output (julien)

--
[...truncated 36576 lines...]
[junit] at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:550)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:87)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68)
[junit] at 
org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:129)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] Shutting down DataNode 2
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
[junit] at 
junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] 12/08/23 22:32:43 WARN datanode.FSDatasetAsyncDiskService: 
AsyncDiskService has already shut down.
[junit] 12/08/23 22:32:43 INFO mortbay.log: Stopped 
SelectChannelConnector@localhost:0
[junit] 12/08/23 22:32:43 INFO ipc.Server: Stopping server on 33025
[junit] 12/08/23 22:32:43 INFO ipc.Server: Stopping IPC Server listener on 
33025
[junit] 12/08/23 22:32:43 INFO ipc.Server: IPC Server handler 0 on 33025: 
exiting
[junit] 12/08/23 22:32:43 INFO ipc.Server: IPC Server handler 1 on 33025: 
exiting
[junit] 12/08/23 22:32:43 INFO metrics.RpcInstrumentation: shut down
[junit] 12/08/23 22:32:43 INFO ipc.Server: Stopping IPC Server Responder
[junit] 12/08/23 22:32:43 INFO datanode.DataNode: Waiting for threadgroup 
to exit, active threads is 1
[junit] 12/08/23 22:32:43 INFO ipc.Server: IPC Server handler 2 on 33025: 
exiting
[junit] 12/08/23 22:32:43 WARN datanode.DataNode: 
DatanodeRegistration(127.0.0.1:60748, 
storageID=DS-1471663223-67.195.138.20-60748-1345760712835, infoPort=38391, 
ipcPort=33025):DataXceiveServer:java.nio.channels.AsynchronousCloseException
[junit] at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
[junit] at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:159)
[junit] at 
sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
[junit] at java.lang.Thread.run(Thread.java:662)
[junit] 
[junit] 12/08/23 22:32:43 INFO datanode.DataNode: Exiting DataXceiveServer
[junit] 12/08/23 22:32:43 INFO hdfs.StateChange: BLOCK* ask 127.0.0.1:57428 
to delete  blk_-4149032668565847939_1078 blk_-1353532049794229177_1073
[junit] 12/08/23 22:32:43 INFO hdfs.StateChange: BLOCK* ask 127.0.0.1:45131 
to delete  blk_-4149032668565847939_1078
[junit] 12/08/23 22:32:43 INFO datanode.DataBlockScanner: Exiting 
DataBlockScanner thread.
[junit] 12/08/23 22:32:44 INFO datanode.DataNode: Waiting for threadgroup 
to exit, active threads is 0
[junit] 12/08/23 22:32:44 INFO datanode.DataNode: 
DatanodeRegistration(127.0.0.1:60748, 
storageID=DS-1471663223-67.195.138.20-60748-1345760712835, infoPort=38391, 
ipcPort=33025):Finishing DataNode in: 
FSDataset{dirpath='
[junit] 12/08/23 22:32:44 INFO ipc.Server: Stopping server on 33025
[junit] 12/08/23 22:32:44 INFO metrics.RpcInstrumentation: shut down
[junit] 12/08/23 22:32:44 INFO datanode.DataNode: Waiting for threadgroup 
to exit, active threads is 0
[junit] 12/08/23 22:32:44 INFO datanode.FSDatasetAsyncDiskService: Shutting 
down all async disk service thre

Re: Number of mappers in MRCompiler

2012-08-23 Thread Alan Gates
Sorry for the very slow response, but here it is, hopefully better late than 
never.

On Jul 25, 2012, at 4:28 PM, Prasanth J wrote:

> Thanks Alan.
> The requirement for me is that I want to load N number of samples based on 
> the input file size and perform naive cube computation to determine the large 
> groups that will not fit in reducer's memory. I need to know the exact number 
> of samples for calculating the partition factor for large groups. 
> Currently I am using RandomSampleLoader to load 1000 tuples from each mapper. 
> Without knowing the number of mappers I will not be able to find the exact 
> number of samples loaded. Also RandomSampleLoader doesn't attach any special 
> marker (as in PoissonSampleLoader) tuples which tells the number of samples 
> loaded. 
> Is there any other way to know the exact number of samples loaded? 
Not that I know of.

> 
> By analyzing the MR plans of order-by and skewed-join, it seems like the 
> entire dataset is copied to a temp file and then SampleLoaders use the temp 
> file to load samples. Is there any specific reason for this redundant copy? 
> Is it because SampleLoaders can only use pig's internal i/o format? 
Partly, but also because it allows any operators that need to run before the 
sample (like project or filter) to be placed in the pipeline.

Alan.



[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440714#comment-13440714
 ] 

Koji Noguchi commented on PIG-2662:
---

bq. @Koji, which version of Pig are you using?
Trunk. 

> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 





[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-23 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440706#comment-13440706
 ] 

Rajesh Balamohan commented on PIG-2662:
---

@Koji, which version of Pig are you using?

> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 





[jira] [Commented] (PIG-2844) ant makepom is misconfigured

2012-08-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440707#comment-13440707
 ] 

Alan Gates commented on PIG-2844:
-

Ok, +1 I guess then.

> ant makepom is misconfigured
> 
>
> Key: PIG-2844
> URL: https://issues.apache.org/jira/browse/PIG-2844
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2844_0.patch
>
>
> Currently we manually maintain a pom. We should use the ant makepom target 
> for this





Re: Project the last field of a tuple

2012-08-23 Thread Russell Jurney
This is neat. This should maybe be a built in: Tuple.first and
Tuple.last. Very convenient in Ruby.

Russell Jurney http://datasyndrome.com

On Aug 23, 2012, at 1:53 PM, Jonathan Coveney  wrote:

> here's a UDF to do it that took me about 10s to write, so may have errors:
>
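> import java.io.IOException;
> import org.apache.pig.EvalFunc;
> import org.apache.pig.data.Tuple;
> import org.apache.pig.impl.logicalLayer.schema.Schema;
>
> // Returns the last field of the input tuple, or null if the tuple is empty.
> public class LastInTuple extends EvalFunc<Object> {
>     @Override
>     public Object exec(Tuple input) throws IOException {
>         int size = input.size();
>         if (size > 0) {
>             return input.get(size - 1);
>         }
>         return null;
>     }
>
>     // The output schema is that of the last input field, when it is known.
>     @Override
>     public Schema outputSchema(Schema input) {
>         try {
>             int size = input.size();
>             if (size > 0) {
>                 return new Schema(input.getField(size - 1));
>             }
>         } catch (Exception e) {
>             // Schema lookup failed; fall through and report an unknown schema.
>         }
>         return null;
>     }
> }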
>
> 2012/8/23 Ruslan Al-Fakikh 
>
>> Hi Fabian,
>>
>> I don't know whether there is a built-in feature for this, but here is the
>> idea:
>> try to load the whole line as one field (ignoring the delimiter at
>> this step) and then try to extract the last part using substring,
>> regex, etc.
>>
>> Ruslan
>>
>> On Thu, Aug 23, 2012 at 12:53 PM, Fabian Alenius
>>  wrote:
>>> Hi,
>>>
>>> is there any way to project the last field of a tuple (when you don't
>>> know how many fields there are) without creating a UDF?
>>>
>>>
>>> Thanks,
>>>
>>> Fabian
>>
>>
>>
>> --
>> Best Regards,
>> Ruslan Al-Fakikh
>>


[jira] [Commented] (PIG-2844) ant makepom is misconfigured

2012-08-23 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440705#comment-13440705
 ] 

Julien Le Dem commented on PIG-2844:


I believe that the previously generated pom was incorrect.
There's no good way of being sure unless we try to build Pig with Maven

> ant makepom is misconfigured
> 
>
> Key: PIG-2844
> URL: https://issues.apache.org/jira/browse/PIG-2844
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2844_0.patch
>
>
> Currently we manually maintain a pom. We should use the ant makepom target 
> for this





[jira] [Resolved] (PIG-2848) TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not overwriting output

2012-08-23 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2848.


   Resolution: Fixed
Fix Version/s: 0.11

> TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not 
> overwriting output
> -
>
> Key: PIG-2848
> URL: https://issues.apache.org/jira/browse/PIG-2848
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2848.patch
>
>






[jira] [Commented] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440628#comment-13440628
 ] 

Dmitriy V. Ryaboy commented on PIG-2850:


Sorry, didn't notice you updated. Will review.

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered, 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.
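
The lookup order described above (file path first, then a resource by the same name) could be sketched roughly as follows. This is only an illustration of the idea, not the code in the attached diffs; the class name MacroResolverSketch and the method open are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Pig.
final class MacroResolverSketch {

    /**
     * Opens the macro file at the given path if it exists on the file system;
     * otherwise tries to load a resource with the same name through the given
     * class loader. Returns null when neither lookup finds anything.
     */
    static InputStream open(String name, ClassLoader loader) {
        Path path = Paths.get(name);
        if (Files.isRegularFile(path)) {
            try {
                return Files.newInputStream(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Fall back to the classpath, which would include registered JARs
        // if the caller passes a class loader that can see them (assumption).
        return loader.getResourceAsStream(name);
    }
}
```

The caller has to supply a class loader that can see the registered JARs; how Pig exposes such a loader is not shown here.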





[jira] [Commented] (PIG-2662) skew join does not honor its config parameters

2012-08-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440550#comment-13440550
 ] 

Koji Noguchi commented on PIG-2662:
---

It seems like org.apache.pig.test.TestPoissonSampleLoader.testNumSamples is 
failing with 
"expected:<47> but was:<42>" after this patch.  Can someone take a look?

> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 
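
For readers unfamiliar with how such settings are usually picked up, here is a minimal sketch of reading a numeric setting from a java.util.Properties object with a caller-supplied fallback. It is not the code in PoissonSampleLoader; the class and method names are hypothetical, and treating both settings as floats is an assumption.

```java
import java.util.Properties;

// Illustrative only; not the code in PoissonSampleLoader.
final class SkewJoinConfigSketch {
    // Property names as given in the issue description.
    static final String SAMPLE_RATE_KEY = "pig.sksampler.samplerate";
    static final String MEM_USAGE_KEY = "pig.skewedjoin.reduce.memusage";

    /**
     * Parses a float property. Returns the fallback when the key is absent
     * or its value is not a valid number.
     */
    static float floatProperty(Properties props, String key, float fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Float.parseFloat(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

If a lookup like this is never invoked, the job silently runs with the fallback values, which matches the symptom reported in this issue.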





[jira] [Commented] (PIG-2848) TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not overwriting output

2012-08-23 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440548#comment-13440548
 ] 

Jonathan Coveney commented on PIG-2848:
---

+1

> TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not 
> overwriting output
> -
>
> Key: PIG-2848
> URL: https://issues.apache.org/jira/browse/PIG-2848
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2848.patch
>
>






[jira] [Commented] (PIG-2848) TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not overwriting output

2012-08-23 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440541#comment-13440541
 ] 

Julien Le Dem commented on PIG-2848:


I just got back from a trip. 
I need a +1 to check this in.

> TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not 
> overwriting output
> -
>
> Key: PIG-2848
> URL: https://issues.apache.org/jira/browse/PIG-2848
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2848.patch
>
>






[jira] [Commented] (PIG-2848) TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not overwriting output

2012-08-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440532#comment-13440532
 ] 

Koji Noguchi commented on PIG-2848:
---

What does it take to get this in? 
I want to see successful unit-test on trunk...

> TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not 
> overwriting output
> -
>
> Key: PIG-2848
> URL: https://issues.apache.org/jira/browse/PIG-2848
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2848.patch
>
>






[jira] [Commented] (PIG-2850) Pig should support loading macro files as resources stored in JAR files

2012-08-23 Thread Matthew Hayes (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440488#comment-13440488
 ] 

Matthew Hayes commented on PIG-2850:


Is there anything else necessary for me to change?

> Pig should support loading macro files as resources stored in JAR files
> ---
>
> Key: PIG-2850
> URL: https://issues.apache.org/jira/browse/PIG-2850
> Project: Pig
>  Issue Type: Improvement
>Reporter: Matthew Hayes
>Priority: Minor
> Attachments: import_macros_from_jar_2.diff, 
> import_macros_from_jar.diff
>
>
> A file containing macros can be imported in pig like so:
> {code}
> IMPORT 'some_path/my_macros.pig';
> {code}
> It would be convenient if a macro file could be imported from a registered 
> JAR as well.  This would make it easier to distribute them.  One could 
> package a set of UDFs and macros in a single JAR.  Once the JAR is registered, 
> any of the UDFs or macros can be used.
> For example, suppose that {{some_path/my_macros.pig}} has been packaged in a 
> JAR named {{my_macros.jar}}.  The above code then becomes 
> {code}
> REGISTER my_macros.jar;
> IMPORT 'some_path/my_macros.pig';
> {code}
> Pig would first check if the file is found at the path 
> {{some_path/my_macros.pig}}, and failing that it would attempt to load a 
> resource by that name.  Since the JAR is registered it will find it.





[jira] [Commented] (PIG-2889) HBaseAvroStorage UDF

2012-08-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440406#comment-13440406
 ] 

Alan Gates commented on PIG-2889:
-

Hive has an AvroSerDe which I believe can read schema on the fly like this.  
This should just work with HCat.  In theory this should work with HBase as 
well, since SerDes are independent of IF/OF and storage handlers in Hive/HCat.  
This would all need to be tested.

All that said, there's nothing to prevent you from doing it as you propose in 
Pig without HCat.

> HBaseAvroStorage UDF
> 
>
> Key: PIG-2889
> URL: https://issues.apache.org/jira/browse/PIG-2889
> Project: Pig
>  Issue Type: New Feature
>  Components: data, piggybank
>Affects Versions: 0.11
>Reporter: Russell Jurney
>Assignee: Russell Jurney
> Fix For: 0.11
>
>
> I want to use HBaseStorage without specifying the schema. Storing data in 
> Avro format in HBase is a very common practice. I would like to create a UDF, 
> HBaseAvroStorage that works just like the internal HBaseStorage UDF, but 
> loads the Avro schema metadata so that specifying a schema is unnecessary.
> I haven't thought through all the particulars, so if you have - please chime 
> in :)
> I am also not sure if this isn't sort of handled some place in HCatalog?





[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

2012-08-23 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440367#comment-13440367
 ] 

Bill Graham commented on PIG-2886:
--

Only a subset of tests runs during test-commit. The test target will run all of 
them (and take a while).  Also, annotations are used to indicate that a class 
contains tests.

You can do this to test just one test:

{noformat}
ant clean test -Dtestcase=TestHBaseStorage
{noformat}


> Add Scan TimeRange to HBaseStorage 
> ---
>
> Key: PIG-2886
> URL: https://issues.apache.org/jira/browse/PIG-2886
> Project: Pig
>  Issue Type: Bug
>Reporter: Ted Malaska
>Priority: Minor
> Attachments: PIG-2886-0.patch, PIG-2886-1.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't 
> use PIG right now because they only want to fetch the last day's worth of 
> data in HBase.  A filter with time range would require reading all the HStore 
> files.  If we hold major compaction until after the fetch and use Scan Time 
> Range we only need to read very little in compression. 
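
As a rough illustration of the "last day's worth of data" window, the sketch below computes a pair of timestamps in milliseconds. It assumes the pair would be handed to HBase's Scan.setTimeRange(minStamp, maxStamp), whose upper bound is exclusive; the class and method names are hypothetical and this is not the code in the attached patches.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of HBaseStorage.
final class LastDayRangeSketch {

    /**
     * Returns {minStamp, maxStamp} covering the 24 hours that end at nowMillis.
     * minStamp is clamped at 0 so that it is never negative.
     */
    static long[] lastDay(long nowMillis) {
        long min = nowMillis - TimeUnit.DAYS.toMillis(1);
        return new long[] { Math.max(min, 0L), nowMillis };
    }
}
```

With HBase on the classpath, the result could be applied as scan.setTimeRange(range[0], range[1]). Because the upper bound is exclusive, cells stamped exactly at nowMillis would be left out.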





[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

2012-08-23 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440270#comment-13440270
 ] 

Ted Malaska commented on PIG-2886:
--

Question.  

I added the test cases and ran the following command, and I noticed that 
TestHBaseStorage doesn't run.

ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean jar test-commit

I'm thinking that's because TestHBaseStorage doesn't extend TestCase, and no 
other classes call TestHBaseStorage.

So the question is: Is there a design reason why TestHBaseStorage is not 
run when running the unit tests?  Is it ok if I make TestHBaseStorage run during 
unit tests?


> Add Scan TimeRange to HBaseStorage 
> ---
>
> Key: PIG-2886
> URL: https://issues.apache.org/jira/browse/PIG-2886
> Project: Pig
>  Issue Type: Bug
>Reporter: Ted Malaska
>Priority: Minor
> Attachments: PIG-2886-0.patch, PIG-2886-1.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't 
> use PIG right now because they only want to fetch the last day's worth of 
> data in HBase.  A filter with time range would require reading all the HStore 
> files.  If we hold major compaction until after the fetch and use Scan Time 
> Range we only need to read very little in compression. 





[jira] [Updated] (PIG-2353) RANK function like in SQL

2012-08-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Avendaño updated PIG-2353:


Attachment: PIG-2353-4.txt

All unit and e2e tests passed. 

> RANK function like in SQL
> -
>
> Key: PIG-2353
> URL: https://issues.apache.org/jira/browse/PIG-2353
> Project: Pig
>  Issue Type: New Feature
>Reporter: Gianmarco De Francisci Morales
>Assignee: Allan Avendaño
>  Labels: gsoc2012, mentor
> Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, PIG2353.patch
>
>
> Implement a function that given a (sorted) bag adds to each tuple a unique, 
> increasing identifier without gaps, like what RANK does for SQL.
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012
> Functionality implemented so far, is available at 
> https://reviews.apache.org/r/5523/diff/#index_header
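
The behaviour requested above (a unique, increasing identifier without gaps for each tuple of an already sorted bag) can be sketched in plain Java as below. This is a single-process illustration only, not the MapReduce implementation under review; the class name RankSketch is hypothetical, and starting the identifiers at 1 is an assumption.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the PORank/POCounter operators from the patch.
final class RankSketch {

    /**
     * Pairs every element of an already sorted list with a consecutive
     * identifier, starting at 1 (assumed) and increasing by one each time.
     */
    static <T> List<Map.Entry<Long, T>> rank(List<T> sorted) {
        List<Map.Entry<Long, T>> out = new ArrayList<>(sorted.size());
        long next = 1L;
        for (T tuple : sorted) {
            out.add(new AbstractMap.SimpleImmutableEntry<>(next, tuple));
            next++;
        }
        return out;
    }
}
```

In a distributed setting the hard part is assigning consecutive identifiers across tasks without a single sequential pass, which this sketch does not attempt.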





Re: Review Request: RANK function like in SQL

2012-08-23 Thread aavendan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5523/
---

(Updated Aug. 23, 2012, 11:07 a.m.)


Review request for pig, aavendan and Gianmarco De Francisci Morales.


Changes
---

all unit and harness tests passed.


Description
---

Review board for https://issues.apache.org/jira/browse/PIG-2353


This addresses bug PIG-2353.
https://issues.apache.org/jira/browse/PIG-2353


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapReduceCounter.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllSameRalationalNodesVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/UidResetter.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/ColumnPruneHelper.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/ColumnPruneVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/LineageFindRelVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/SchemaAliasVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/TypeCheckingRelVisitor.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/pen/IllustratorAttacher.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/pen/LocalMapReduceSimulator.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/ScriptState.java
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/deployers/ExistingClusterDeployer.pm
 1372471 
  
http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/deployers/LocalDeployer.pm
 1372471 
  http://svn.apache.org/repos/a