Re: [DISCUSS] Release Pig 0.9.0 (candidate 0)

2011-06-23 Thread Dmitriy Ryaboy
FWIW I just committed the fix for PIG-2137 to both trunk and the 0.9 branch.
Thanks Thejas for testing & improvements.

D

On Thu, Jun 23, 2011 at 3:21 PM, Dmitriy Ryaboy  wrote:

> Moving discussion to a discuss thread so votes can stay on the vote thread.
>
> Richa, I don't feel those two issues you point out are blockers for a
> release, but a patch for 0.10 would be welcome to fix the hardcoding issue
> (gosh I can't imagine how you found that.. :)).
>
> D
>
> On Thu, Jun 23, 2011 at 3:04 PM, Richa Khandelwal wrote:
>
>> Hi Guys,
>>
>> I would vote +1 to remove the hadoop jar from the pig.jar. That would
>> definitely add to the flexibility of using Pig with any version or variant
>> of hadoop.
>>
>> Also, there is "hdfs://" hardcoded at a few places in the pig source. I
>> can
>> provide a patch which removes the hard-coding and makes it possible to use
>> pig on any distributed filesystem that can be used with hadoop, based on
>> pig.properties configuration file. Please let me know.
>>
>> Thanks,
>> Richa
>>
>> On Thu, Jun 23, 2011 at 2:56 PM, Olga Natkovich 
>> wrote:
>>
>> > My experience with sample command is that people use it for some quick
>> > testing and debugging not for production problems. This issue has been
>> there
>> > since the initial introduction of sample. I still don't see a strong
>> need to
>> > hold the beta release for it.
>> >
>> > Olga
>> >
>> > -----Original Message-----
>> > From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
>> > Sent: Thursday, June 23, 2011 10:14 AM
>> > To: dev@pig.apache.org
>> > Cc: dev@pig.apache.org
>> > Subject: Re: [VOTE] Release Pig 0.9.0 (candidate 0)
>> >
>> > There is a workaround - turning off optimizations, or at least
>> > pushUpFilter. I haven't checked 8 with the new logical plan disabled,
>> that
>> > may work too.
>> > If this failed execution, I'd let it pass since there is a workaround,
>> but
>> > as is it silently returns incorrect data, with no way for an analyst to
>> know
>> > her statistics are now fundamentally wrong. Silent and subtle data
>> > corruption is just about the worst bug we can have.
>> > I can't really block the release, you guys can outvote me. But I'd
>> rather
>> > you didn't; we can patch this problem today.
>> >
>> > On Jun 23, 2011, at 7:47 AM, Alan Gates  wrote:
>> >
>> > > Are you referring to PIG-2137?  I have a few questions on that
>> before
>> > I vote for this release candidate or to reroll.
>> > >
>> > > Is this a new issue introduced in 0.9?
>> > >
>> > > Is there a workaround for this?
>> > >
>> > > We have already discussed that 0.9.0 will be beta quality, and a
>> follow
>> > up release will be needed as users find bugs.  As sample is not a
>> heavily
>> > used feature I am inclined to view this bug as ok.  You feel this is
>> serious
>> > enough to block a beta release?
>> > >
>> > > Alan.
>> > >
>> > > On Jun 22, 2011, at 9:11 PM, Dmitriy Ryaboy wrote:
>> > >
>> > >> -1
>> > >>
>> > >> I discovered a critical bug in how SAMPLE is treated; I don't think
>> we
>> > >> should release until it's fixed (Thejas is testing the fix).
>> > >>
>> > >> D
>> > >>
>> > >> On Wed, Jun 22, 2011 at 5:56 PM, Olga Natkovich wrote:
>> > >>
>> > >>> I have created a candidate build for Pig 0.9.0 release. This release
>> > >>> introduces control structures, changes query parser, and performs
>> > semantic
>> > >>> cleanup.
>> > >>>
>> > >>> The rat report showed no issues in Java files outside of build
>> > directory.
>> > >>>
>> > >>>
>> > >>> Keys used to sign the release are available at
>> > >>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.
>> > >>>
>> > >>>
>> > >>>
>> > >>> Please try it out:
>> > >>>
>> > >>>
>> > >>>
>> > >>> http://people.apache.org/~olga/pig-0.9.0-candidate-0/
>> > >>>
>> > >>>
>> > >>>
>> > >>> Build is also available in maven:
>> > >>>
>> >
>> https://repository.apache.org/content/repositories/orgapachepig-042/org/apache/pig/pig/0.9.0/
>> > >>>
>> > >>>
>> > >>>
>> > >>> Should we release this? Vote closes on Monday, June 27.
>> > >>>
>> > >>>
>> > >>>
>> > >>> Olga
>> > >>>
>> > >>>
>> > >
>> >
>>
>
>


[jira] [Updated] (PIG-2137) SAMPLE should not be pushed above DISTINCT

2011-06-23 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2137:
---

       Resolution: Fixed
    Fix Version/s: 0.10
                   0.9.0
           Status: Resolved  (was: Patch Available)

Committed to 0.9 and 0.10 (trunk)

> SAMPLE should not be pushed above DISTINCT
> --
>
> Key: PIG-2137
> URL: https://issues.apache.org/jira/browse/PIG-2137
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Critical
> Fix For: 0.9.0, 0.10
>
> Attachments: PIG-2137.1.patch, PIG-2137.2.patch, PIG-2137.patch
>
>
> I have an input file that contains 50,000 distinct integers. Each integer is 
> repeated twice, for a total of 100,000 lines.
> Script 1, using GROUP BY to get distinct entries in the data, works:
> {code}
> grunt> f = load 'tmp/dupnumbers.txt';  
> grunt> d = foreach (group f by $0) generate group; 
> grunt> s = sample d 0.01;  
> grunt> n = foreach (group s all) generate COUNT(s);
> grunt> dump n;
> (493)
> {code}
> Script 2, using DISTINCT for the same purpose, allows sampling to be done 
> before DISTINCT:
> {code}
> grunt> f = load 'tmp/dupnumbers.txt';  
> grunt> d = distinct f;
> grunt> s = sample d 0.01;  
> grunt> n = foreach (group s all) generate COUNT(s);
> grunt> dump n;
> (980)
> {code}
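
The two counts in the description can be reproduced outside Pig. What follows is a minimal Python sketch, not Pig's implementation: it assumes SAMPLE keeps each row independently with the given probability, and the helper names (`sample`, `distinct`, `simulate`) are made up for illustration. Sampling after DISTINCT draws about 1% of 50,000 rows (roughly 500), while sampling before DISTINCT draws about 1% of 100,000 rows (roughly 1,000), which matches the 493 vs. 980 reported above.

```python
import random


def sample(rows, fraction, rng):
    """Keep each row independently with probability `fraction`.

    Assumption: this Bernoulli filter only approximates what SAMPLE does.
    """
    return [row for row in rows if rng.random() < fraction]


def distinct(rows):
    """Remove duplicate rows (row order is irrelevant for counting)."""
    return list(set(rows))


def simulate(seed=0):
    rng = random.Random(seed)
    # 50,000 distinct integers, each repeated twice: 100,000 rows in total.
    rows = list(range(50_000)) * 2

    # Intended plan: DISTINCT first, then SAMPLE.
    correct = len(sample(distinct(rows), 0.01, rng))

    # Plan with SAMPLE pushed above DISTINCT: SAMPLE first, then DISTINCT.
    pushed = len(distinct(sample(rows, 0.01, rng)))
    return correct, pushed


if __name__ == "__main__":
    correct, pushed = simulate()
    print("distinct then sample:", correct)
    print("sample then distinct:", pushed)
```

The second number comes out about twice the first because the two copies of a value are rarely both sampled, so the later DISTINCT removes almost nothing.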

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2139) LogicalExpressionSimplifier optimizer rule should check if udf is deterministic while checking if they are equal

2011-06-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054210#comment-13054210
 ] 

Dmitriy V. Ryaboy commented on PIG-2139:


Thejas can you add an example of what this breaks to the ticket? Does the rule 
only affect 0.10 or is this also in 8 and 9?

> LogicalExpressionSimplifier optimizer rule should check if udf is 
> deterministic while checking if they are equal
> 
>
> Key: PIG-2139
> URL: https://issues.apache.org/jira/browse/PIG-2139
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.10
>
>
> LogicalExpressionSimplifier simplifies filter expressions. In the process, it 
> compares udfs to see if they are 'equal' (i.e. expected to produce the same 
> results). But it does not check if the udfs are annotated as 
> @Nondeterministic. If such an annotation exists, then the udfs should not be 
> considered equal. UserFuncExpression.isEqual() is being used to compare the udfs.
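
The check described above can be modelled in a few lines. This is a Python analogy only, not Pig's Java code: the `nondeterministic` decorator stands in for the @Nondeterministic annotation, and `UdfCall` / `is_equal` are hypothetical names invented for this sketch.

```python
import random


def nondeterministic(func):
    """Hypothetical marker standing in for Pig's @Nondeterministic annotation."""
    func.nondeterministic = True
    return func


class UdfCall:
    """A call to a udf with a fixed tuple of argument expressions."""

    def __init__(self, func, args):
        self.func = func
        self.args = tuple(args)

    def is_equal(self, other):
        # Two calls to a nondeterministic udf may return different values,
        # so they must never be treated as interchangeable.
        if getattr(self.func, "nondeterministic", False):
            return False
        if getattr(other.func, "nondeterministic", False):
            return False
        return self.func is other.func and self.args == other.args


@nondeterministic
def rand():
    return random.random()


def upper(value):
    return value.upper()
```

With this guard, a simplifier would still be free to merge two `upper($0)` calls, but it would leave something like `rand() < 0.5 AND rand() < 0.5` alone.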

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (PIG-2142) Allow registering multiple jars from DFS via single statement

2011-06-23 Thread Dmitriy V. Ryaboy (JIRA)
Allow registering multiple jars from DFS via single statement
-

 Key: PIG-2142
 URL: https://issues.apache.org/jira/browse/PIG-2142
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Raghu Angadi
 Fix For: 0.10


Pig currently allows users to register jars from local and remote filesystems, 
but only one jar can be specified at a time. It would be great to be able to 
say something along the lines of "register hdfs://user/me/lib/*lucene*.jar" and 
get all the jars registered in one go.
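
To make the request concrete, here is a rough Python sketch of the expansion step, under the assumption that the pattern is a shell-style glob applied to file names within a single directory. `expand_register` and the sample listing are hypothetical, and no real filesystem or Pig API is involved.

```python
from fnmatch import fnmatchcase
import posixpath


def expand_register(pattern, listing):
    """Return every path in `listing` matched by a glob such as
    '/user/me/lib/*lucene*.jar'.

    Assumption: wildcards appear only in the file-name part of the pattern.
    """
    directory, name_pattern = posixpath.split(pattern)
    return sorted(
        path
        for path in listing
        if posixpath.dirname(path) == directory
        and fnmatchcase(posixpath.basename(path), name_pattern)
    )


if __name__ == "__main__":
    # Hypothetical directory listing, for illustration only.
    jars = [
        "/user/me/lib/lucene-core.jar",
        "/user/me/lib/contrib-lucene-analyzers.jar",
        "/user/me/lib/guava.jar",
        "/user/me/other/lucene-core.jar",
    ]
    for jar in expand_register("/user/me/lib/*lucene*.jar", jars):
        print("register", jar)
```

Each matched path would then be registered exactly as a single-jar register statement is today.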

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2137) SAMPLE should not be pushed above DISTINCT

2011-06-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054195#comment-13054195
 ] 

Dmitriy V. Ryaboy commented on PIG-2137:


For 0.8 I was going to backport PIG-2014 before this one.. we are running both 
in production right now (on top of 8.1), they are fine.
Although I did have trouble backporting the tests, a bunch of the optimizer 
interfaces seem to have changed. I don't think 8 is as important, since it 
doesn't seem likely we'll release 8.2 what with 0.9.0 being almost out the door.

> SAMPLE should not be pushed above DISTINCT
> --
>
> Key: PIG-2137
> URL: https://issues.apache.org/jira/browse/PIG-2137
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Critical
> Attachments: PIG-2137.1.patch, PIG-2137.2.patch, PIG-2137.patch
>
>
> I have an input file that contains 50,000 distinct integers. Each integer is 
> repeated twice, for a total of 100,000 lines.
> Script 1, using GROUP BY to get distinct entries in the data, works:
> {code}
> grunt> f = load 'tmp/dupnumbers.txt';  
> grunt> d = foreach (group f by $0) generate group; 
> grunt> s = sample d 0.01;  
> grunt> n = foreach (group s all) generate COUNT(s);
> grunt> dump n;
> (493)
> {code}
> Script 2, using DISTINCT for the same purpose, allows sampling to be done 
> before DISTINCT:
> {code}
> grunt> f = load 'tmp/dupnumbers.txt';  
> grunt> d = distinct f;
> grunt> s = sample d 0.01;  
> grunt> n = foreach (group s all) generate COUNT(s);
> grunt> dump n;
> (980)
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar

2011-06-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2141:
--

Description: These jars are already available with hadoop installation.   
(was: This jars are already available with hadoop installation. )

> Do not bundle apache commons jars with pig-withouthadoop.jar
> 
>
> Key: PIG-2141
> URL: https://issues.apache.org/jira/browse/PIG-2141
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: site, 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2141.patch
>
>
> These jars are already available with hadoop installation. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar

2011-06-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2141:
--

Attachment: PIG-2141.patch

> Do not bundle apache commons jars with pig-withouthadoop.jar
> 
>
> Key: PIG-2141
> URL: https://issues.apache.org/jira/browse/PIG-2141
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: site, 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2141.patch
>
>
> This jars are already available with hadoop installation. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar

2011-06-23 Thread Richard Ding (JIRA)
Do not bundle apache commons jars with pig-withouthadoop.jar


 Key: PIG-2141
 URL: https://issues.apache.org/jira/browse/PIG-2141
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: site, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0


This jars are already available with hadoop installation. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (PIG-2139) LogicalExpressionSimplifier optimizer rule should check if udf is deterministic while checking if they are equal

2011-06-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned PIG-2139:
--

Assignee: Thejas M Nair

> LogicalExpressionSimplifier optimizer rule should check if udf is 
> deterministic while checking if they are equal
> 
>
> Key: PIG-2139
> URL: https://issues.apache.org/jira/browse/PIG-2139
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.10
>
>
> LogicalExpressionSimplifier simplifies filter expressions. In the process, it 
> compares udfs to see if they are 'equal' (i.e. expected to produce the same 
> results). But it does not check if the udfs are annotated as 
> @Nondeterministic. If such an annotation exists, then the udfs should not be 
> considered equal. UserFuncExpression.isEqual() is being used to compare the udfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (PIG-2100) 'explain -script' does not perform parameter substitution for parameters specified on commandline

2011-06-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned PIG-2100:
--

Assignee: Thejas M Nair

> 'explain -script' does not perform parameter substitution for parameters 
> specified on commandline
> -
>
> Key: PIG-2100
> URL: https://issues.apache.org/jira/browse/PIG-2100
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1, 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>
> {code}
> # the file
> $ cat t.pig
> a = load '$file' as (a0, a1);
> dump a;
> # parameter on commandline gets substituted
> $ java -Xmx500m -classpath pig.jar org.apache.pig.Main -x local -dryrun -p file=x t.pig
> 2011-05-31 14:00:24,999 [main] INFO  org.apache.pig.Main - Logging error messages to: /Users/tejas/pig_lpgen_2083/trunk/pig_1306875624997.log
> 2011-05-31 14:00:25,321 [main] INFO  org.apache.pig.Main - Dry run completed. Substituted pig script is at t.pig.substituted
> $ cat t.pig.substituted
> a = load 'x' as (a0, a1);
> dump a;
> # but param in commandline does not get used for explain command, and it fails
> $ java -Xmx500m -classpath pig.jar org.apache.pig.Main -x local -p file=x -e 'explain -script t.pig;'
> 2011-05-31 14:01:07,217 [main] INFO  org.apache.pig.Main - Logging error messages to: /Users/tejas/pig_lpgen_2083/trunk/pig_1306875667215.log
> 2011-05-31 14:01:07,364 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> 2011-05-31 14:01:07,547 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Undefined parameter : file
> # parameter gets substituted when specified using %declare statement.
> $ cat t2.pig
> %declare file x
> a = load '$file' as (a0, a1);
> dump a;
> $ java -Xmx500m -classpath pig.jar org.apache.pig.Main -x local -p file=x -e 'explain -script t2.pig;'
> ..
> 2011-05-31 14:01:44,059 [main] WARN  org.apache.pig.tools.grunt.GruntParser - 'dump' statement is ignored while processing 'explain -script' or '-check'
> Logical plan is empty.
> Physical plan is empty.
> Execution plan is empty.
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2137) SAMPLE should not be pushed above DISTINCT

2011-06-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054155#comment-13054155
 ] 

Thejas M Nair commented on PIG-2137:


Dmitriy,
Unit tests and test-patch have passed. You can commit the patch.
But this patch can't be committed to 0.8, as the Nondeterministic annotation 
was added only in 0.9.


> SAMPLE should not be pushed above DISTINCT
> --
>
> Key: PIG-2137
> URL: https://issues.apache.org/jira/browse/PIG-2137
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
>Priority: Critical
> Attachments: PIG-2137.1.patch, PIG-2137.2.patch, PIG-2137.patch
>
>
> I have an input file that contains 50,000 distinct integers. Each integer is 
> repeated twice, for a total of 100,000 lines.
> Script 1, using GROUP BY to get distinct entries in the data, works:
> {code}
> grunt> f = load 'tmp/dupnumbers.txt';  
> grunt> d = foreach (group f by $0) generate group; 
> grunt> s = sample d 0.01;  
> grunt> n = foreach (group s all) generate COUNT(s);
> grunt> dump n;
> (493)
> {code}
> Script 2, using DISTINCT for the same purpose, allows sampling to be done 
> before DISTINCT:
> {code}
> grunt> f = load 'tmp/dupnumbers.txt';  
> grunt> d = distinct f;
> grunt> s = sample d 0.01;  
> grunt> n = foreach (group s all) generate COUNT(s);
> grunt> dump n;
> (980)
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Jenkins build is back to normal : Pig-trunk #1032

2011-06-23 Thread Apache Jenkins Server




Re: [DISCUSS] Release Pig 0.9.0 (candidate 0)

2011-06-23 Thread Alan Gates
Agreed, same with including a pig-withouthadoop jar.  We do publish  
such a jar already in maven.  This is a non-critical (by which I mean  
it doesn't cause failure or wrong results, not that it isn't  
important) issue and there is not agreement yet on whether the without  
jar should be included in the tar ball.  So we shouldn't hold a  
release for it.


Alan.

On Jun 23, 2011, at 3:21 PM, Dmitriy Ryaboy wrote:

> Moving discussion to a discuss thread so votes can stay on the vote thread.
>
> Richa, I don't feel those two issues you point out are blockers for a
> release, but a patch for 0.10 would be welcome to fix the hardcoding issue
> (gosh I can't imagine how you found that.. :)).
>
> D
>
> On Thu, Jun 23, 2011 at 3:04 PM, Richa Khandelwal wrote:
>
>> Hi Guys,
>>
>> I would vote +1 to remove the hadoop jar from the pig.jar. That would
>> definitely add to the flexibility of using Pig with any version or variant
>> of hadoop.
>>
>> Also, there is "hdfs://" hardcoded at a few places in the pig source. I can
>> provide a patch which removes the hard-coding and makes it possible to use
>> pig on any distributed filesystem that can be used with hadoop, based on
>> pig.properties configuration file. Please let me know.
>>
>> Thanks,
>> Richa
>>
>> On Thu, Jun 23, 2011 at 2:56 PM, Olga Natkovich wrote:
>>
>>> My experience with sample command is that people use it for some quick
>>> testing and debugging not for production problems. This issue has been
>>> there since the initial introduction of sample. I still don't see a
>>> strong need to hold the beta release for it.
>>>
>>> Olga
>>>
>>> -----Original Message-----
>>> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
>>> Sent: Thursday, June 23, 2011 10:14 AM
>>> To: dev@pig.apache.org
>>> Cc: dev@pig.apache.org
>>> Subject: Re: [VOTE] Release Pig 0.9.0 (candidate 0)
>>>
>>> There is a workaround - turning off optimizations, or at least
>>> pushUpFilter. I haven't checked 8 with the new logical plan disabled,
>>> that may work too.
>>> If this failed execution, I'd let it pass since there is a workaround,
>>> but as is it silently returns incorrect data, with no way for an analyst
>>> to know her statistics are now fundamentally wrong. Silent and subtle
>>> data corruption is just about the worst bug we can have.
>>> I can't really block the release, you guys can outvote me. But I'd
>>> rather you didn't; we can patch this problem today.
>>>
>>> On Jun 23, 2011, at 7:47 AM, Alan Gates wrote:
>>>
>>>> Are you referring to PIG-2137?  I have a few questions on that before
>>>> I vote for this release candidate or to reroll.
>>>>
>>>> Is this a new issue introduced in 0.9?
>>>>
>>>> Is there a workaround for this?
>>>>
>>>> We have already discussed that 0.9.0 will be beta quality, and a follow
>>>> up release will be needed as users find bugs.  As sample is not a
>>>> heavily used feature I am inclined to view this bug as ok.  You feel
>>>> this is serious enough to block a beta release?
>>>>
>>>> Alan.
>>>>
>>>> On Jun 22, 2011, at 9:11 PM, Dmitriy Ryaboy wrote:
>>>>
>>>>> -1
>>>>>
>>>>> I discovered a critical bug in how SAMPLE is treated; I don't think we
>>>>> should release until it's fixed (Thejas is testing the fix).
>>>>>
>>>>> D
>>>>>
>>>>> On Wed, Jun 22, 2011 at 5:56 PM, Olga Natkovich wrote:
>>>>>
>>>>>> I have created a candidate build for Pig 0.9.0 release. This release
>>>>>> introduces control structures, changes query parser, and performs
>>>>>> semantic cleanup.
>>>>>>
>>>>>> The rat report showed no issues in Java files outside of build
>>>>>> directory.
>>>>>>
>>>>>> Keys used to sign the release are available at
>>>>>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.
>>>>>>
>>>>>> Please try it out:
>>>>>>
>>>>>> http://people.apache.org/~olga/pig-0.9.0-candidate-0/
>>>>>>
>>>>>> Build is also available in maven:
>>>>>> https://repository.apache.org/content/repositories/orgapachepig-042/org/apache/pig/pig/0.9.0/
>>>>>>
>>>>>> Should we release this? Vote closes on Monday, June 27.
>>>>>>
>>>>>> Olga


Re: [DISCUSS] Release Pig 0.9.0 (candidate 0)

2011-06-23 Thread Dmitriy Ryaboy
Moving discussion to a discuss thread so votes can stay on the vote thread.

Richa, I don't feel those two issues you point out are blockers for a
release, but a patch for 0.10 would be welcome to fix the hardcoding issue
(gosh I can't imagine how you found that.. :)).

D

On Thu, Jun 23, 2011 at 3:04 PM, Richa Khandelwal wrote:

> Hi Guys,
>
> I would vote +1 to remove the hadoop jar from the pig.jar. That would
> definitely add to the flexibility of using Pig with any version or variant
> of hadoop.
>
> Also, there is "hdfs://" hardcoded at a few places in the pig source. I can
> provide a patch which removes the hard-coding and makes it possible to use
> pig on any distributed filesystem that can be used with hadoop, based on
> pig.properties configuration file. Please let me know.
>
> Thanks,
> Richa
>
> On Thu, Jun 23, 2011 at 2:56 PM, Olga Natkovich 
> wrote:
>
> > My experience with sample command is that people use it for some quick
> > testing and debugging not for production problems. This issue has been
> there
> > since the initial introduction of sample. I still don't see a strong need
> to
> > hold the beta release for it.
> >
> > Olga
> >
> > -----Original Message-----
> > From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> > Sent: Thursday, June 23, 2011 10:14 AM
> > To: dev@pig.apache.org
> > Cc: dev@pig.apache.org
> > Subject: Re: [VOTE] Release Pig 0.9.0 (candidate 0)
> >
> > There is a workaround - turning off optimizations, or at least
> > pushUpFilter. I haven't checked 8 with the new logical plan disabled,
> that
> > may work too.
> > If this failed execution, I'd let it pass since there is a workaround,
> but
> > as is it silently returns incorrect data, with no way for an analyst to
> know
> > her statistics are now fundamentally wrong. Silent and subtle data
> > corruption is just about the worst bug we can have.
> > I can't really block the release, you guys can outvote me. But I'd rather
> > you didn't; we can patch this problem today.
> >
> > On Jun 23, 2011, at 7:47 AM, Alan Gates  wrote:
> >
> > > Are you referring to PIG-2137?  I have a few questions on that
> before
> > I vote for this release candidate or to reroll.
> > >
> > > Is this a new issue introduced in 0.9?
> > >
> > > Is there a workaround for this?
> > >
> > > We have already discussed that 0.9.0 will be beta quality, and a follow
> > up release will be needed as users find bugs.  As sample is not a heavily
> > used feature I am inclined to view this bug as ok.  You feel this is
> serious
> > enough to block a beta release?
> > >
> > > Alan.
> > >
> > > On Jun 22, 2011, at 9:11 PM, Dmitriy Ryaboy wrote:
> > >
> > >> -1
> > >>
> > >> I discovered a critical bug in how SAMPLE is treated; I don't think we
> > >> should release until it's fixed (Thejas is testing the fix).
> > >>
> > >> D
> > >>
> > >> On Wed, Jun 22, 2011 at 5:56 PM, Olga Natkovich 
> > wrote:
> > >>
> > >>> I have created a candidate build for Pig 0.9.0 release. This release
> > >>> introduces control structures, changes query parser, and performs
> > semantic
> > >>> cleanup.
> > >>>
> > >>> The rat report showed no issues in Java files outside of build
> > directory.
> > >>>
> > >>>
> > >>> Keys used to sign the release are available at
> > >>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.
> > >>>
> > >>>
> > >>>
> > >>> Please try it out:
> > >>>
> > >>>
> > >>>
> > >>> http://people.apache.org/~olga/pig-0.9.0-candidate-0/
> > >>>
> > >>>
> > >>>
> > >>> Build is also available in maven:
> > >>>
> >
> https://repository.apache.org/content/repositories/orgapachepig-042/org/apache/pig/pig/0.9.0/
> > >>>
> > >>>
> > >>>
> > >>> Should we release this? Vote closes on Monday, June 27.
> > >>>
> > >>>
> > >>>
> > >>> Olga
> > >>>
> > >>>
> > >
> >
>


Re: [VOTE] Release Pig 0.9.0 (candidate 0)

2011-06-23 Thread Richa Khandelwal
Hi Guys,

I would vote +1 to remove the hadoop jar from the pig.jar. That would
definitely add to the flexibility of using Pig with any version or variant
of hadoop.

Also, there is "hdfs://" hardcoded at a few places in the pig source. I can
provide a patch which removes the hard-coding and makes it possible to use
pig on any distributed filesystem that can be used with hadoop, based on
pig.properties configuration file. Please let me know.

Thanks,
Richa

On Thu, Jun 23, 2011 at 2:56 PM, Olga Natkovich  wrote:

> My experience with sample command is that people use it for some quick
> testing and debugging not for production problems. This issue has been there
> since the initial introduction of sample. I still don't see a strong need to
> hold the beta release for it.
>
> Olga
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> Sent: Thursday, June 23, 2011 10:14 AM
> To: dev@pig.apache.org
> Cc: dev@pig.apache.org
> Subject: Re: [VOTE] Release Pig 0.9.0 (candidate 0)
>
> There is a workaround - turning off optimizations, or at least
> pushUpFilter. I haven't checked 8 with the new logical plan disabled, that
> may work too.
> If this failed execution, I'd let it pass since there is a workaround, but
> as is it silently returns incorrect data, with no way for an analyst to know
> her statistics are now fundamentally wrong. Silent and subtle data
> corruption is just about the worst bug we can have.
> I can't really block the release, you guys can outvote me. But I'd rather
> you didn't; we can patch this problem today.
>
> On Jun 23, 2011, at 7:47 AM, Alan Gates  wrote:
>
> > Are you referring to PIG-2137?  I have a few questions on that before
> I vote for this release candidate or to reroll.
> >
> > Is this a new issue introduced in 0.9?
> >
> > Is there a workaround for this?
> >
> > We have already discussed that 0.9.0 will be beta quality, and a follow
> up release will be needed as users find bugs.  As sample is not a heavily
> used feature I am inclined to view this bug as ok.  You feel this is serious
> enough to block a beta release?
> >
> > Alan.
> >
> > On Jun 22, 2011, at 9:11 PM, Dmitriy Ryaboy wrote:
> >
> >> -1
> >>
> >> I discovered a critical bug in how SAMPLE is treated; I don't think we
> >> should release until it's fixed (Thejas is testing the fix).
> >>
> >> D
> >>
> >> On Wed, Jun 22, 2011 at 5:56 PM, Olga Natkovich 
> wrote:
> >>
> >>> I have created a candidate build for Pig 0.9.0 release. This release
> >>> introduces control structures, changes query parser, and performs
> semantic
> >>> cleanup.
> >>>
> >>> The rat report showed no issues in Java files outside of build
> directory.
> >>>
> >>>
> >>> Keys used to sign the release are available at
> >>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.
> >>>
> >>>
> >>>
> >>> Please try it out:
> >>>
> >>>
> >>>
> >>> http://people.apache.org/~olga/pig-0.9.0-candidate-0/
> >>>
> >>>
> >>>
> >>> Build is also available in maven:
> >>>
> https://repository.apache.org/content/repositories/orgapachepig-042/org/apache/pig/pig/0.9.0/
> >>>
> >>>
> >>>
> >>> Should we release this? Vote closes on Monday, June 27.
> >>>
> >>>
> >>>
> >>> Olga
> >>>
> >>>
> >
>


RE: [VOTE] Release Pig 0.9.0 (candidate 0)

2011-06-23 Thread Olga Natkovich
My experience with sample command is that people use it for some quick testing 
and debugging not for production problems. This issue has been there since the 
initial introduction of sample. I still don't see a strong need to hold the 
beta release for it. 

Olga

-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Thursday, June 23, 2011 10:14 AM
To: dev@pig.apache.org
Cc: dev@pig.apache.org
Subject: Re: [VOTE] Release Pig 0.9.0 (candidate 0)

There is a workaround - turning off optimizations, or at least pushUpFilter. I 
haven't checked 8 with the new logical plan disabled, that may work too. 
If this failed execution, I'd let it pass since there is a workaround, but as 
is it silently returns incorrect data, with no way for an analyst to know her 
statistics are now fundamentally wrong. Silent and subtle data corruption is 
just about the worst bug we can have. 
I can't really block the release, you guys can outvote me. But I'd rather you 
didn't; we can patch this problem today. 

On Jun 23, 2011, at 7:47 AM, Alan Gates  wrote:

> Are you referring to PIG-2137?  I have a few questions on that before I 
> vote for this release candidate or to reroll.
> 
> Is this a new issue introduced in 0.9?
> 
> Is there a workaround for this?
> 
> We have already discussed that 0.9.0 will be beta quality, and a follow up 
> release will be needed as users find bugs.  As sample is not a heavily used 
> feature I am inclined to view this bug as ok.  You feel this is serious 
> enough to block a beta release?
> 
> Alan.
> 
> On Jun 22, 2011, at 9:11 PM, Dmitriy Ryaboy wrote:
> 
>> -1
>> 
>> I discovered a critical bug in how SAMPLE is treated; I don't think we
>> should release until it's fixed (Thejas is testing the fix).
>> 
>> D
>> 
>> On Wed, Jun 22, 2011 at 5:56 PM, Olga Natkovich  wrote:
>> 
>>> I have created a candidate build for Pig 0.9.0 release. This release
>>> introduces control structures, changes query parser, and performs semantic
>>> cleanup.
>>> 
>>> The rat report showed no issues in Java files outside of build directory.
>>> 
>>> 
>>> Keys used to sign the release are available at
>>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.
>>> 
>>> 
>>> 
>>> Please try it out:
>>> 
>>> 
>>> 
>>> http://people.apache.org/~olga/pig-0.9.0-candidate-0/
>>> 
>>> 
>>> 
>>> Build is also available in maven:
>>> https://repository.apache.org/content/repositories/orgapachepig-042/org/apache/pig/pig/0.9.0/
>>> 
>>> 
>>> 
>>> Should we release this? Vote closes on Monday, June 27.
>>> 
>>> 
>>> 
>>> Olga
>>> 
>>> 
> 


RE: [VOTE] Release Pig 0.9.0 (candidate 0)

2011-06-23 Thread Olga Natkovich
I don't believe we have seen any requests for this in the apache mailing lists. 
Moreover they can build one from the source code enclosed. Our package is 
already 40MB and I had to split it 5 ways to upload to the people.apache.org. I 
don't think we want to make it any bigger than necessary.

Olga

From: Thejas M Nair
Sent: Thursday, June 23, 2011 2:35 PM
To: dev@pig.apache.org; Olga Natkovich
Subject: Re: [VOTE] Release Pig 0.9.0 (candidate 0)

I have seen many users have problems when they use the pig.jar which is bundled 
with hadoop, because their cluster is using a different version of hadoop.
I think it will be convenient for such users, if pig ships with a jar-without 
hadoop.

-Thejas



On 6/22/11 5:56 PM, "Olga Natkovich"  wrote:
I have created a candidate build for Pig 0.9.0 release. This release introduces 
control structures, changes query parser, and performs semantic cleanup.

The rat report showed no issues in Java files outside of build directory.


Keys used to sign the release are available at 
http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.



Please try it out:



http://people.apache.org/~olga/pig-0.9.0-candidate-0/



Build is also available in maven: 
https://repository.apache.org/content/repositories/orgapachepig-042/org/apache/pig/pig/0.9.0/



Should we release this? Vote closes on Monday, June 27.



Olga



--


Re: [VOTE] Release Pig 0.9.0 (candidate 0)

2011-06-23 Thread Thejas M Nair
I have seen many users have problems when they use the pig.jar which is bundled 
with hadoop, because their cluster is using a different version of hadoop.
I think it will be convenient for such users, if pig ships with a jar-without 
hadoop.

-Thejas



On 6/22/11 5:56 PM, "Olga Natkovich"  wrote:

I have created a candidate build for Pig 0.9.0 release. This release introduces 
control structures, changes query parser, and performs semantic cleanup.

The rat report showed no issues in Java files outside of build directory.


Keys used to sign the release are available at 
http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.



Please try it out:



http://people.apache.org/~olga/pig-0.9.0-candidate-0/



Build is also available in maven: 
https://repository.apache.org/content/repositories/orgapachepig-042/org/apache/pig/pig/0.9.0/



Should we release this? Vote closes on Monday, June 27.



Olga




--



[jira] [Updated] (PIG-2136) Implementation of Sample should use LessThanExpression instead of LessThanEqualExpression

2011-06-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2136:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1. Patch committed to trunk. Thanks, Gianmarco!


> Implementation of Sample should use LessThanExpression instead of 
> LessThanEqualExpression
> -
>
> Key: PIG-2136
> URL: https://issues.apache.org/jira/browse/PIG-2136
> Project: Pig
> Issue Type: Bug
> Reporter: Gianmarco De Francisci Morales
> Assignee: Gianmarco De Francisci Morales
> Priority: Minor
> Attachments: PIG-2136.patch
>
>
> Currently LogicalPlanBuilder uses a filter with 'Math.random() <= const' to 
> implement Sample.
> To be correct it should use 'Math.random() < const', because 
> Math.random() generates a number x with 0 <= x < 1.
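
The difference between the two comparisons only shows at the boundary. Below is a minimal sketch of that boundary case, not the actual LogicalPlanBuilder code: the class and method names are hypothetical, and a seeded java.util.Random stands in for Math.random() so the run is repeatable.

```java
import java.util.Random;

// Illustrative only: these names are assumptions, not part of Pig.
class SamplePredicateSketch {
    // Inclusive comparison, the form the issue describes as incorrect.
    static boolean keepInclusive(double r, double p) {
        return r <= p;
    }

    // Strict comparison, the form the issue proposes.
    static boolean keepStrict(double r, double p) {
        return r < p;
    }

    // Count how many of n pseudo-random draws in [0, 1) the strict form keeps.
    static int countKept(long seed, int n, double p) {
        Random rng = new Random(seed);
        int kept = 0;
        for (int i = 0; i < n; i++) {
            if (keepStrict(rng.nextDouble(), p)) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A draw of exactly 0.0 is possible; a draw of 1.0 is not.
        double edge = 0.0;
        System.out.println(keepInclusive(edge, 0.0)); // true: a row survives "sample 0"
        System.out.println(keepStrict(edge, 0.0));    // false: nothing survives
        System.out.println(countKept(42L, 100000, 0.1));
    }
}
```

With the strict form, a sampling rate of 0 keeps nothing and a rate of 1 keeps every row, which matches the range 0 <= x < 1.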

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: [VOTE] Release Pig 0.9.0 (candidate 0)

2011-06-23 Thread Dmitriy Ryaboy
There is a workaround: turning off optimizations, or at least pushUpFilter. I 
haven't checked 0.8 with the new logical plan disabled; that may work too.
If this failed execution, I'd let it pass since there is a workaround, but as 
is it silently returns incorrect data, with no way for an analyst to know her 
statistics are now fundamentally wrong. Silent and subtle data corruption is 
just about the worst bug we can have.
I can't really block the release; you guys can outvote me. But I'd rather you 
didn't; we can patch this problem today.

On Jun 23, 2011, at 7:47 AM, Alan Gates  wrote:

> Are you referring to PIG-2137?  I have a few questions on that before I 
> vote for this release candidate or to reroll.
> 
> Is this a new issue introduced in 0.9?
> 
> Is there a workaround for this?
> 
> We have already discussed that 0.9.0 will be beta quality, and a follow up 
> release will be needed as users find bugs.  As sample is not a heavily used 
> feature I am inclined to view this bug as ok.  You feel this is serious 
> enough to block a beta release?
> 
> Alan.
> 
> On Jun 22, 2011, at 9:11 PM, Dmitriy Ryaboy wrote:
> 
>> -1
>> 
>> I discovered a critical bug in how SAMPLE is treated; I don't think we
>> should release until it's fixed (Thejas is testing the fix).
>> 
>> D
>> 
>> On Wed, Jun 22, 2011 at 5:56 PM, Olga Natkovich  wrote:
>> 
>>> I have created a candidate build for Pig 0.9.0 release. This release
>>> introduces control structures, changes query parser, and performs semantic
>>> cleanup.
>>> 
>>> The rat report showed no issues in Java files outside of build directory.
>>> 
>>> 
>>> Keys used to sign the release are available at
>>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.
>>> 
>>> 
>>> 
>>> Please try it out:
>>> 
>>> 
>>> 
>>> http://people.apache.org/~olga/pig-0.9.0-candidate-0/
>>> 
>>> 
>>> 
>>> Build is also available in maven:
>>> https://repository.apache.org/content/repositories/orgapachepig-042/org/apache/pig/pig/0.9.0/
>>> 
>>> 
>>> 
>>> Should we release this? Vote closes on Monday, June 27.
>>> 
>>> 
>>> 
>>> Olga
>>> 
>>> 
> 


[jira] [Commented] (PIG-1874) Make PigServer work in a multithreading environment

2011-06-23 Thread Vincent BARAT (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053923#comment-13053923
 ] 

Vincent BARAT commented on PIG-1874:


Thanks guys! You saved my life with this patch!

> Make PigServer work in a multithreading environment
> ---
>
> Key: PIG-1874
> URL: https://issues.apache.org/jira/browse/PIG-1874
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.8.0
> Reporter: Richard Ding
> Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1874.patch, PIG-1874_1.patch
>
>
> This means that PigServers should work if one creates separate PigServer 
> instances for each thread (PigServers are not synchronized). 
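
The usage pattern described above (one instance per thread, nothing shared) can be sketched as follows. This is only an illustration under stated assumptions: StubServer is a hypothetical stand-in for an unsynchronized, stateful object such as PigServer, and none of the names below are Pig APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: StubServer is hypothetical, not the Pig class.
class PerThreadServerSketch {
    // Stateful and unsynchronized, so one instance must not be shared.
    static class StubServer {
        private final List<String> queries = new ArrayList<>();

        void registerQuery(String q) {
            queries.add(q);
        }

        int queryCount() {
            return queries.size();
        }
    }

    // Each job builds its own server instance and never shares it.
    static int runJob(int statements) {
        StubServer server = new StubServer();
        for (int i = 0; i < statements; i++) {
            server.registerQuery("stmt" + i);
        }
        return server.queryCount();
    }

    // Run one job per thread and add up the per-instance counts.
    static int runAll(int threads, int statements) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Integer> task = () -> runJob(statements);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(4, 1000)); // 4000
    }
}
```

Because every task owns its instance, no locking is needed; sharing a single unsynchronized instance across the pool would not be safe.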





Re: [VOTE] Release Pig 0.9.0 (candidate 0)

2011-06-23 Thread Alan Gates
Are you referring to PIG-2137?  I have a few questions on that
before I vote for this release candidate or to reroll.


Is this a new issue introduced in 0.9?

Is there a workaround for this?

We have already discussed that 0.9.0 will be beta quality, and a  
follow up release will be needed as users find bugs.  As sample is not  
a heavily used feature I am inclined to view this bug as ok.  You feel  
this is serious enough to block a beta release?


Alan.

On Jun 22, 2011, at 9:11 PM, Dmitriy Ryaboy wrote:


-1

I discovered a critical bug in how SAMPLE is treated; I don't think we
should release until it's fixed (Thejas is testing the fix).

D

On Wed, Jun 22, 2011 at 5:56 PM, Olga Natkovich wrote:



I have created a candidate build for Pig 0.9.0 release. This release
introduces control structures, changes query parser, and performs semantic
cleanup.

The rat report showed no issues in Java files outside of build directory.



Keys used to sign the release are available at
http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.



Please try it out:



http://people.apache.org/~olga/pig-0.9.0-candidate-0/



Build is also available in maven:
https://repository.apache.org/content/repositories/orgapachepig-042/org/apache/pig/pig/0.9.0/



Should we release this? Vote closes on Monday, June 27.



Olga






Build failed in Jenkins: Pig-trunk #1031

2011-06-23 Thread Apache Jenkins Server
See 

Changes:

[thejas] fixing javadoc warnings

[thejas] PIG-2140: Usage printed from Main.java gives wrong option for disabling
  LogicalExpressionSimplifier

--
[...truncated 34857 lines...]
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit] at java.security.AccessController.doPrivileged(Native Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit] 
[junit] org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could 
not complete write to file 
/tmp/TestStore-output-6159565120716509592.txt_cleanupOnFailure_succeeded1 by 
DFSClient_-1955661569
[junit] at 
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit] at java.security.AccessController.doPrivileged(Native Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit] 
[junit] at org.apache.hadoop.ipc.Client.call(Client.java:740)
[junit] at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
[junit] at $Proxy0.complete(Unknown Source)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
[junit] at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
[junit] at $Proxy0.complete(Unknown Source)
[junit] at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
[junit] at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
[junit] at 
org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1043)
[junit] at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:237)
[junit] at 
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:269)
[junit] at 
org.apache.pig.test.MiniCluster.shutdownMiniDfsAndMrClusters(MiniCluster.java:111)
[junit] at 
org.apache.pig.test.MiniCluster.shutDown(MiniCluster.java:101)
[junit] at 
org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:127)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
[junit] at 
junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] Shutting down the Mini HDFS Cluster
[junit] Shutting down DataNode 3
[junit] 11/06/23 10:35:32 INFO ipc.Server: Stopping server on 57632
[junit] 11/06/23 10:35:32 INFO ipc.Server: IPC Server handler 1 on 57632: 
exiting
[junit] 11/06/23 10:35:32 INFO ipc.Server: IPC Server handler 2 on 57632: 
exiting
[junit] 11/06/23 10:35:32 INFO datanode.DataNode: Waiting for threadgroup 
to exit, active threads is 1
[junit] 11/06/23 10:35:32 INFO ipc.Server: Stopping IPC Server Responder
[junit] 11/06/23 10:35:32 INFO ipc.Server: IPC Server handler 0 on 57632: 
exiting
[junit] 11/06/23 10:35:32 INFO i

[jira] [Updated] (PIG-1926) Sample/Limit should take scalar

2011-06-23 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-1926:


Status: Open  (was: Patch Available)

> Sample/Limit should take scalar
> ---
>
> Key: PIG-1926
> URL: https://issues.apache.org/jira/browse/PIG-1926
> Project: Pig
> Issue Type: Improvement
> Reporter: Daniel Dai
> Assignee: Gianmarco De Francisci Morales
> Labels: gsoc2011
> Attachments: PIG-1926.10.patch, PIG-1926.7.patch, PIG-1926.8.patch, 
> PIG-1926.9.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, 
> PIG-1926.patch, PIG-1926.patch, PIG-1926.patch
>
>
> Currently, Limit and Sample only take a constant. It would be better if we 
> could use a scalar in place of the constant. E.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011





[jira] [Updated] (PIG-1926) Sample/Limit should take scalar

2011-06-23 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-1926:


Attachment: PIG-1926.10.patch

Attaching PIG-1926.10.patch:
Implemented support and tests for Sample.
Fixed a problem with syntactic predicates in Limit; it no longer breaks on 
inline_op.

All tests pass on my machine.
We can call this a release candidate.
I think it is ready for review.

> Sample/Limit should take scalar
> ---
>
> Key: PIG-1926
> URL: https://issues.apache.org/jira/browse/PIG-1926
> Project: Pig
> Issue Type: Improvement
> Reporter: Daniel Dai
> Assignee: Gianmarco De Francisci Morales
> Labels: gsoc2011
> Attachments: PIG-1926.10.patch, PIG-1926.7.patch, PIG-1926.8.patch, 
> PIG-1926.9.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, 
> PIG-1926.patch, PIG-1926.patch, PIG-1926.patch
>
>
> Currently, Limit and Sample only take a constant. It would be better if we 
> could use a scalar in place of the constant. E.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011
