Re: Our release process

2012-11-30 Thread Santhosh M S
Hi Julien,

You are making most of the points that I did on this thread (CI for e2e, not 
burdening clean e2e prior to every commit for a release branch). The only point 
on which there is no clear agreement is the definition of a bug that can be 
included in a previously released branch. I am fine with a case by case 
inclusion. 

Hi Olga,

Are you fine with Julien's proposal as it stands, where the bugs to include are 
decided at the time of inclusion instead of now?

Santhosh



 From: Julien Le Dem 
To: dev@pig.apache.org; Santhosh M S  
Cc: "billgra...@gmail.com"  
Sent: Friday, November 30, 2012 5:37 PM
Subject: Re: Our release process
 
Proposed criteria:
- it makes the tests fail. targets test-commit + test + e2e tests
- a critical bug is reported in a short time frame (definition of
critical not needed as it is rare and can be decided on a case by case
basis)

That raises another question: what are the existing CI servers running
the tests?
- the Apache CI runs test-commit and test (is it more stable now?)
and not e2e. It would be great if it did.
- we have a Jenkins build at Twitter where we run test-commit and
test, we could not run e2e easily in our environment.
- I understand there's a Yahoo/Hortonworks build (test-commit + test + e2e ???)

Whenever those builds fail we should open or reopen JIRAs and fix them.

The time it takes to run the full test suite makes it impractical to
run on a desktop/laptop.

For the release Pig-0.11.0 we need to get this list of JIRAs down to 0
and publish the jar.
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+PIG+AND+fixVersion+%3D+%220.11%22+AND+resolution+%3D+Unresolved+ORDER+BY+updated+DESC%2C+due+ASC%2C+priority+DESC

Julien

On Thu, Nov 29, 2012 at 11:16 PM, Santhosh M S
 wrote:
> Looks like everyone is interested in having frequent releases - I don't see 
> anyone disagreeing with that.
>
> Regarding "If a patch makes the release branch unstable, we revert it" - what 
> are the criteria? If we can't decide on the criteria on this thread (already 
> pretty long) then let's get the release trains going. We can revisit the 
> criteria for inclusion of bug fixes when that happens.
>
> Santhosh
>
>
> 
>  From: Julien Le Dem 
> To: dev@pig.apache.org; Santhosh M S 
> Cc: "billgra...@gmail.com" 
> Sent: Thursday, November 29, 2012 9:45 AM
> Subject: Re: Our release process
>
> The release branch receives only bug fixes. Patch level releases (3rd
> version number) are issued out of the release branch and introduce
> only bug fixes and no new features.
> Deciding whether a patch is applied to the release branch is based on
> preserving stability (as Bill said). If a patch makes the release
> branch unstable, we revert it.
> New features are added to trunk where new major and minor releases will 
> happen.
> If we need a new feature out then we make a new minor release.
> Doing frequent releases is the industry standard and will resolve
> conflicts around what should go in a release branch.
>
> Making a new release is currently painful *because* we wait so long in
> between two releases. Let's fix that.
>
> Julien
>
> On Wed, Nov 28, 2012 at 10:09 PM, Santhosh M S
>  wrote:
>> Since releasing a major version once a month is aggressive and we have not 
>> released on a quarterly basis, we should allow commits to a released branch 
>> to facilitate dot releases.
>>
>> If we are allowing commits to a released branch, the criteria for inclusion 
>> can be created anew or we use the industry standards for severity (or 
>> priority). It could be painful for a few folks but I don't see better 
>> alternatives.
>>
>> Regarding reverting commits based on e2e tests breaking:
>>         1. Who is running the tests?
>>         2. How often are they run?
>> If we have nightly e2e runs then it's easier to catch these errors early. If 
>> not, the barrier for inclusion is pretty high and time consuming, making it 
>> harder to develop.
>>
>> Santhosh
>>
>>
>> 
>>  From: Bill Graham 
>> To: dev@pig.apache.org
>> Sent: Wednesday, November 28, 2012 11:39 AM
>> Subject: Re: Our release process
>>
>> I agree releasing often is ideal, but releasing major versions once a month
>> would be a bit aggressive.
>>
>> +1 to Olga's initial definition of how Yahoo! determines what goes into a
>> released branch. Basically is something broken without a workaround or is
>> there potential silent data loss. Trying to get a more granular definition
>> than that (i.e. P1, P2, severity, etc) will be painful. The reality in that
>> case is that whoever is blocked by the bug will consider it a P1.
>>
>> Fixes need to be relatively low-risk though to keep stability, but this is
>> also subjective. For this I'm in favor of relying on developer and reviewer
>> judgement to make that call and I'm +1 to Alan's proposal of rolling back
>> patches that break the e2e

[jira] [Updated] (PIG-2907) Publish pig 0.23 jars to maven

2012-11-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2907:


   Resolution: Fixed
Fix Version/s: 0.12
   Status: Resolved  (was: Patch Available)

Thanks Julien. Committed to 0.11 and trunk.

> Publish pig 0.23 jars to maven
> --
>
> Key: PIG-2907
> URL: https://issues.apache.org/jira/browse/PIG-2907
> Project: Pig
>  Issue Type: New Feature
>Reporter: Francis Liu
>Assignee: Rohini Palaniswamy
> Fix For: 0.11, 0.12
>
> Attachments: PIG-2907-1.patch, PIG-2907-2.patch, PIG-2907.patch
>
>
> HCatalog would like our unit tests to be able to run against 0.23. Part of 
> that would require pulling the pig 0.23 dependency from maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507825#comment-13507825
 ] 

Rohini Palaniswamy commented on PIG-3014:
-

Ah. I had forgotten about that question. Agree with Julien.

> CurrentTime() UDF has undesirable characteristics
> -
>
> Key: PIG-3014
> URL: https://issues.apache.org/jira/browse/PIG-3014
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch, 
> PIG-3014-3.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had 
> added a CurrentTime() UDF. The issue with this UDF is that it returns the 
> current time _of every exec invocation_, which can lead to confusing results. 
> In PIG-1431 I proposed a way such that every instance of the same NOW() will 
> return the same time, which I think is better. Would enjoy thoughts.



Re: Our release process

2012-11-30 Thread Julien Le Dem
Proposed criteria:
 - it makes the tests fail. targets test-commit + test + e2e tests
 - a critical bug is reported in a short time frame (definition of
critical not needed as it is rare and can be decided on a case by case
basis)
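
To make the two criteria above concrete, here is a rough sketch of the
decision in Python. It is an illustration only and not part of any Pig
tooling: the names BuildResult and should_revert are hypothetical, and
ignoring targets that a given CI server did not run is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BuildResult:
    """Outcome of one CI run (hypothetical structure, for illustration)."""
    # Maps a test target name ("test-commit", "test", "e2e") to True if it
    # passed. Targets a CI server did not run are simply absent (assumption).
    passed: dict
    # Set by a committer after a case-by-case decision on a reported bug.
    critical_bug_reported: bool = False


def should_revert(result: BuildResult) -> bool:
    """Return True if a patch on the release branch should be reverted."""
    # Criterion 1: the patch makes any of the executed test targets fail.
    tests_fail = any(not ok for ok in result.passed.values())
    # Criterion 2: a critical bug was reported shortly after the commit.
    return tests_fail or result.critical_bug_reported
```

For example, should_revert(BuildResult({"test-commit": True, "test": True,
"e2e": False})) returns True, because the e2e target failed.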

That raises another question: what are the existing CI servers running
the tests?
 - the Apache CI runs test-commit and test (is it more stable now?)
and not e2e. It would be great if it did.
 - we have a Jenkins build at Twitter where we run test-commit and
test, we could not run e2e easily in our environment.
 - I understand there's a Yahoo/Hortonworks build (test-commit + test + e2e ???)

Whenever those builds fail we should open or reopen JIRAs and fix them.

The time it takes to run the full test suite makes it impractical to
run on a desktop/laptop.

For the release Pig-0.11.0 we need to get this list of JIRAs down to 0
and publish the jar.
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+PIG+AND+fixVersion+%3D+%220.11%22+AND+resolution+%3D+Unresolved+ORDER+BY+updated+DESC%2C+due+ASC%2C+priority+DESC
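
For readers who do not want to decode the link by eye, this short Python
sketch extracts the JQL filter from the URL above using only the standard
library. The variable names are arbitrary.

```python
from urllib.parse import parse_qs, urlsplit

# The issue-navigator link from the message above, split for readability.
url = (
    "https://issues.apache.org/jira/secure/IssueNavigator.jspa"
    "?reset=true&jqlQuery=project+%3D+PIG+AND+fixVersion+%3D+%220.11%22"
    "+AND+resolution+%3D+Unresolved+ORDER+BY+updated+DESC%2C+due+ASC"
    "%2C+priority+DESC"
)

# parse_qs percent-decodes the values and turns "+" back into spaces.
jql = parse_qs(urlsplit(url).query)["jqlQuery"][0]
print(jql)
# prints: project = PIG AND fixVersion = "0.11" AND resolution = Unresolved ORDER BY updated DESC, due ASC, priority DESC
```

In other words, the list contains every unresolved PIG issue whose fix
version is 0.11.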

Julien

On Thu, Nov 29, 2012 at 11:16 PM, Santhosh M S
 wrote:
> Looks like everyone is interested in having frequent releases - I don't see 
> anyone disagreeing with that.
>
> Regarding "If a patch makes the release branch unstable, we revert it" - what 
> are the criteria? If we can't decide on the criteria on this thread (already 
> pretty long) then let's get the release trains going. We can revisit the 
> criteria for inclusion of bug fixes when that happens.
>
> Santhosh
>
>
> 
>  From: Julien Le Dem 
> To: dev@pig.apache.org; Santhosh M S 
> Cc: "billgra...@gmail.com" 
> Sent: Thursday, November 29, 2012 9:45 AM
> Subject: Re: Our release process
>
> The release branch receives only bug fixes. Patch level releases (3rd
> version number) are issued out of the release branch and introduce
> only bug fixes and no new features.
> Deciding whether a patch is applied to the release branch is based on
> preserving stability (as Bill said). If a patch makes the release
> branch unstable, we revert it.
> New features are added to trunk where new major and minor releases will 
> happen.
> If we need a new feature out then we make a new minor release.
> Doing frequent releases is the industry standard and will resolve
> conflicts around what should go in a release branch.
>
> Making a new release is currently painful *because* we wait so long in
> between two releases. Let's fix that.
>
> Julien
>
> On Wed, Nov 28, 2012 at 10:09 PM, Santhosh M S
>  wrote:
>> Since releasing a major version once a month is aggressive and we have not 
>> released on a quarterly basis, we should allow commits to a released branch 
>> to facilitate dot releases.
>>
>> If we are allowing commits to a released branch, the criteria for inclusion 
>> can be created anew or we use the industry standards for severity (or 
>> priority). It could be painful for a few folks but I don't see better 
>> alternatives.
>>
>> Regarding reverting commits based on e2e tests breaking:
>> 1. Who is running the tests?
>> 2. How often are they run?
>> If we have nightly e2e runs then it's easier to catch these errors early. If 
>> not, the barrier for inclusion is pretty high and time consuming, making it 
>> harder to develop.
>>
>> Santhosh
>>
>>
>> 
>>  From: Bill Graham 
>> To: dev@pig.apache.org
>> Sent: Wednesday, November 28, 2012 11:39 AM
>> Subject: Re: Our release process
>>
>> I agree releasing often is ideal, but releasing major versions once a month
>> would be a bit aggressive.
>>
>> +1 to Olga's initial definition of how Yahoo! determines what goes into a
>> released branch. Basically is something broken without a workaround or is
>> there potential silent data loss. Trying to get a more granular definition
>> than that (i.e. P1, P2, severity, etc) will be painful. The reality in that
>> case is that whoever is blocked by the bug will consider it a P1.
>>
>> Fixes need to be relatively low-risk though to keep stability, but this is
>> also subjective. For this I'm in favor of relying on developer and reviewer
>> judgement to make that call and I'm +1 to Alan's proposal of rolling back
>> patches that break the e2e tests or anything else.
>>
>> I think our policy should avoid time-based considerations such as how many
>> quarters away we are from the next major release, since that's also
>> impossible to quantify. Plus, if the answer to the question of whether we're
>> more than 1-2 quarters from the next release is "yes", then we should be
>> fixing that release problem.
>>
>>
>> On Wed, Nov 28, 2012 at 10:22 AM, Julien Le Dem  wrote:
>>
>>> I would really like to see us doing frequent releases (at least once
>>> per quarter if not once a month).
>>> I think the whole notion of priority or being a "blocker" is subjective.
>>> Releasing infrequently pressures us to push more changes than we would
>>> 

[jira] Subscription: PIG patch available

2012-11-30 Thread jira
Issue Subscription
Filter: PIG patch available (32 issues)

Subscriber: pigdaily

Key       Summary
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
          https://issues.apache.org/jira/browse/PIG-3069
PIG-3067  HBaseStorage should be split up to become more managable
          https://issues.apache.org/jira/browse/PIG-3067
PIG-3066  Fix TestPigRunner in trunk
          https://issues.apache.org/jira/browse/PIG-3066
PIG-3058  Upgrade junit to at least 4.8
          https://issues.apache.org/jira/browse/PIG-3058
PIG-3057  make readField protected to be able to override it if we extend PigStorage
          https://issues.apache.org/jira/browse/PIG-3057
PIG-3051  java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
          https://issues.apache.org/jira/browse/PIG-3051
PIG-3033  test-patch failed with javadoc warnings
          https://issues.apache.org/jira/browse/PIG-3033
PIG-3029  TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution
          https://issues.apache.org/jira/browse/PIG-3029
PIG-3028  testGrunt dev test needs some command filters to run correctly without cygwin
          https://issues.apache.org/jira/browse/PIG-3028
PIG-3027  pigTest unit test needs a newline filter for comparisons of golden multi-line
          https://issues.apache.org/jira/browse/PIG-3027
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
          https://issues.apache.org/jira/browse/PIG-3026
PIG-3025  TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
          https://issues.apache.org/jira/browse/PIG-3025
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
          https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
          https://issues.apache.org/jira/browse/PIG-3015
PIG-3010  Allow UDF's to flatten themselves
          https://issues.apache.org/jira/browse/PIG-3010
PIG-2959  Add a pig.cmd for Pig to run under Windows
          https://issues.apache.org/jira/browse/PIG-2959
PIG-2957  TetsScriptUDF fail due to volume prefix in jar
          https://issues.apache.org/jira/browse/PIG-2957
PIG-2956  Invalid cache specification for some streaming statement
          https://issues.apache.org/jira/browse/PIG-2956
PIG-2955  Fix bunch of Pig e2e tests on Windows
          https://issues.apache.org/jira/browse/PIG-2955
PIG-2907  Publish pig 0.23 jars to maven
          https://issues.apache.org/jira/browse/PIG-2907
PIG-2873  Converting bin/pig shell script to python
          https://issues.apache.org/jira/browse/PIG-2873
PIG-2834  MultiStorage requires unused constructor argument
          https://issues.apache.org/jira/browse/PIG-2834
PIG-2824  Pushing checking number of fields into LoadFunc
          https://issues.apache.org/jira/browse/PIG-2824
PIG-2661  Pig uses an extra job for loading data in Pigmix L9
          https://issues.apache.org/jira/browse/PIG-2661
PIG-2614  AvroStorage crashes on LOADING a single bad error
          https://issues.apache.org/jira/browse/PIG-2614
PIG-2507  Semicolon in paramenters for UDF results in parsing error
          https://issues.apache.org/jira/browse/PIG-2507
PIG-2433  Jython import module not working if module path is in classpath
          https://issues.apache.org/jira/browse/PIG-2433
PIG-2417  Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
          https://issues.apache.org/jira/browse/PIG-2417
PIG-2362  Rework Ant build.xml to use macrodef instead of antcall
          https://issues.apache.org/jira/browse/PIG-2362
PIG-2312  NPE when relation and column share the same name and used in Nested Foreach
          https://issues.apache.org/jira/browse/PIG-2312
PIG-1942  script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
          https://issues.apache.org/jira/browse/PIG-1942
PIG-1237  Piggybank MutliStorage - specify field to write in output
          https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3014:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk.

> CurrentTime() UDF has undesirable characteristics
> -
>
> Key: PIG-3014
> URL: https://issues.apache.org/jira/browse/PIG-3014
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch, 
> PIG-3014-3.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had 
> added a CurrentTime() UDF. The issue with this UDF is that it returns the 
> current time _of every exec invocation_, which can lead to confusing results. 
> In PIG-1431 I proposed a way such that every instance of the same NOW() will 
> return the same time, which I think is better. Would enjoy thoughts.



[jira] [Commented] (PIG-2907) Publish pig 0.23 jars to maven

2012-11-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507752#comment-13507752
 ] 

Julien Le Dem commented on PIG-2907:


+1

> Publish pig 0.23 jars to maven
> --
>
> Key: PIG-2907
> URL: https://issues.apache.org/jira/browse/PIG-2907
> Project: Pig
>  Issue Type: New Feature
>Reporter: Francis Liu
>Assignee: Rohini Palaniswamy
> Fix For: 0.11
>
> Attachments: PIG-2907-1.patch, PIG-2907-2.patch, PIG-2907.patch
>
>
> HCatalog would like our unit tests to be able to run against 0.23. Part of 
> that would require pulling the pig 0.23 dependency from maven.



[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507750#comment-13507750
 ] 

Julien Le Dem commented on PIG-3014:


I think it's better to have one test class per UDF.
Usually tests are grouped per class or functional group of classes.
All builtin UDFs do not make a functional group as they have various different 
purposes. It just makes a huge Test class which is undesirable.

> CurrentTime() UDF has undesirable characteristics
> -
>
> Key: PIG-3014
> URL: https://issues.apache.org/jira/browse/PIG-3014
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch, 
> PIG-3014-3.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had 
> added a CurrentTime() UDF. The issue with this UDF is that it returns the 
> current time _of every exec invocation_, which can lead to confusing results. 
> In PIG-1431 I proposed a way such that every instance of the same NOW() will 
> return the same time, which I think is better. Would enjoy thoughts.



[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507731#comment-13507731
 ] 

Cheolsoo Park commented on PIG-3014:


Thanks Rohini.

In fact, I asked that question on the dev mailing list a while ago:
http://search-hadoop.com/m/OVyoR1Ktpcy/Adding+new+test+cases+to+TestBuiltin.java&subj=Adding+new+test+cases+to+TestBuiltin+java

Julien said that each built-in UDF should have its own test suite, so I 
followed it in PIG-2881. I guess that the same applies to CurrentTime().

Please anyone correct me if I am wrong.

> CurrentTime() UDF has undesirable characteristics
> -
>
> Key: PIG-3014
> URL: https://issues.apache.org/jira/browse/PIG-3014
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch, 
> PIG-3014-3.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had 
> added a CurrentTime() UDF. The issue with this UDF is that it returns the 
> current time _of every exec invocation_, which can lead to confusing results. 
> In PIG-1431 I proposed a way such that every instance of the same NOW() will 
> return the same time, which I think is better. Would enjoy thoughts.



[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507725#comment-13507725
 ] 

Rohini Palaniswamy commented on PIG-3014:
-

bq. Since the test case is not valid, I simply removed it.

+1. TestCurrentTime covers CurrentTime udf adequately.

Just an observation though. All builtin udf tests are in TestBuiltin, but 
CurrentTime alone has a separate test class with just one test. Should we move 
that to TestBuiltin?

> CurrentTime() UDF has undesirable characteristics
> -
>
> Key: PIG-3014
> URL: https://issues.apache.org/jira/browse/PIG-3014
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch, 
> PIG-3014-3.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had 
> added a CurrentTime() UDF. The issue with this UDF is that it returns the 
> current time _of every exec invocation_, which can lead to confusing results. 
> In PIG-1431 I proposed a way such that every instance of the same NOW() will 
> return the same time, which I think is better. Would enjoy thoughts.



[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3014:
---

Attachment: PIG-3014-3.patch

Attached a patch that fixes {{TestBuiltin}}.

CurrentTime() must be called only in the back-end because it reads the 
value of "pig.job.submitted.timestamp" out of JobConf. But the unit test case 
was calling it in the front-end, resulting in a NullPointerException.

Since the test case is not valid, I simply removed it.
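
For anyone following along, the difference between the two behaviours can be
sketched in a few lines of Python. This is an illustration only, not the Pig
implementation: a plain dict stands in for JobConf, the function names are
hypothetical, and the stored value is assumed to be an integer timestamp
kept as a string.

```python
import time

# Property name taken from the comment above; everything else is a stand-in.
TIMESTAMP_KEY = "pig.job.submitted.timestamp"


def current_time_per_call() -> int:
    """Behaviour described in the issue: each invocation reads the clock,
    so two calls in the same job can return different values."""
    return time.time_ns()


def current_time_per_job(job_conf: dict) -> int:
    """Behaviour after the change: each invocation returns the timestamp
    stored in the job configuration, so all calls in a job agree."""
    value = job_conf.get(TIMESTAMP_KEY)
    if value is None:
        # Analogous to the front-end failure: without a job configuration
        # the timestamp is simply not there to be read.
        raise RuntimeError(TIMESTAMP_KEY + " is not set")
    return int(value)
```

Calling current_time_per_job with the same configuration twice returns the
same value, while calling it with an empty dict fails, which mirrors what the
removed test case ran into.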

Thanks!

> CurrentTime() UDF has undesirable characteristics
> -
>
> Key: PIG-3014
> URL: https://issues.apache.org/jira/browse/PIG-3014
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch, 
> PIG-3014-3.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had 
> added a CurrentTime() UDF. The issue with this UDF is that it returns the 
> current time _of every exec invocation_, which can lead to confusing results. 
> In PIG-1431 I proposed a way such that every instance of the same NOW() will 
> return the same time, which I think is better. Would enjoy thoughts.



[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3014:
---

Status: Patch Available  (was: Reopened)

> CurrentTime() UDF has undesirable characteristics
> -
>
> Key: PIG-3014
> URL: https://issues.apache.org/jira/browse/PIG-3014
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch, 
> PIG-3014-3.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had 
> added a CurrentTime() UDF. The issue with this UDF is that it returns the 
> current time _of every exec invocation_, which can lead to confusing results. 
> In PIG-1431 I proposed a way such that every instance of the same NOW() will 
> return the same time, which I think is better. Would enjoy thoughts.



[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507654#comment-13507654
 ] 

Cheolsoo Park commented on PIG-3014:


Hi Julien,

Sorry for that. It is failing because {{TestBuiltin}} does not set 
{{pig.job.submitted.timestamp}}. I will get it fixed now.

> CurrentTime() UDF has undesirable characteristics
> -
>
> Key: PIG-3014
> URL: https://issues.apache.org/jira/browse/PIG-3014
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had 
> added a CurrentTime() UDF. The issue with this UDF is that it returns the 
> current time _of every exec invocation_, which can lead to confusing results. 
> In PIG-1431 I proposed a way such that every instance of the same NOW() will 
> return the same time, which I think is better. Would enjoy thoughts.



[jira] [Reopened] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reopened PIG-3014:



I see a failing test:
org.apache.pig.test.TestBuiltin.testConversionBetweenDateTimeAndString

java.lang.NullPointerException
at org.apache.pig.builtin.CurrentTime.exec(CurrentTime.java:41)
at 
org.apache.pig.test.TestBuiltin.testConversionBetweenDateTimeAndString(TestBuiltin.java:450)


> CurrentTime() UDF has undesirable characteristics
> -
>
> Key: PIG-3014
> URL: https://issues.apache.org/jira/browse/PIG-3014
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had 
> added a CurrentTime() UDF. The issue with this UDF is that it returns the 
> current time _of every exec invocation_, which can lead to confusing results. 
> In PIG-1431 I proposed a way such that every instance of the same NOW() will 
> return the same time, which I think is better. Would enjoy thoughts.



[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error

2012-11-30 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2614:
---

Status: Patch Available  (was: Open)

> AvroStorage crashes on LOADING a single bad error
> -
>
> Key: PIG-2614
> URL: https://issues.apache.org/jira/browse/PIG-2614
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.10.0, 0.11
>Reporter: Russell Jurney
>Assignee: Jonathan Coveney
>  Labels: avro, avrostorage, bad, book, cutting, doug, for, my, 
> pig, sadism
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, 
> test_avro_files.tar.gz
>
>
> AvroStorage dies when a single bad record exists, such as one with missing 
> fields.  This is very bad on 'big data,' where bad records are inevitable.  
> See discussion at 
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
>  for more theory.



[jira] [Commented] (PIG-2614) AvroStorage crashes on LOADING a single bad error

2012-11-30 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507564#comment-13507564
 ] 

Cheolsoo Park commented on PIG-2614:


In addition to applying the patch, the following commands should also be 
executed to run the unit test cases:
{code}
wget https://issues.apache.org/jira/secure/attachment/1246/test_avro_files.tar.gz
tar -xf test_avro_files.tar.gz
svn rm contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file.avro
svn add contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file
svn rm contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile.avro
svn add contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile2.avro
svn add contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile3.avro
{code}

Thanks!

> AvroStorage crashes on LOADING a single bad error
> -
>
> Key: PIG-2614
> URL: https://issues.apache.org/jira/browse/PIG-2614
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.10.0, 0.11
>Reporter: Russell Jurney
>Assignee: Jonathan Coveney
>  Labels: avro, avrostorage, bad, book, cutting, doug, for, my, 
> pig, sadism
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, 
> test_avro_files.tar.gz
>
>
> AvroStorage dies when a single bad record exists, such as one with missing 
> fields.  This is very bad on 'big data,' where bad records are inevitable.  
> See discussion at 
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
>  for more theory.



[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error

2012-11-30 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2614:
---

Attachment: test_avro_files.tar.gz
PIG-2614_2.patch

Hi all,

I rebased the patch to trunk. Hopefully, this will make things clearer:
- Removed PIG-2551 code since it's already committed to trunk.
- Replaced the {{ignore_bad_file}} option that was committed in PIG-2909 with 
the {{bad.record.threshold}} and {{bad.record.min}} properties.
- Added unit test cases {{testCorruptedFile1,2,3}}.

@Joe,
I am not sure if I fully understand your question. Please correct me if I am 
wrong.

You're right that {{InputErrorTracker}} can be used by any LoadFunc. All a 
storage needs to do is create an {{InputErrorTracker}} and increment its 
counters. Do you have a better suggestion?
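
For readers following the thread, the idea of a shared error tracker can be sketched as below. This is only an illustration and not the code in the patch: the class name {{BadRecordTracker}}, its methods, and the exact failure rule (fail once at least a minimum number of bad records have been seen and their share of all records read exceeds a threshold) are assumptions. The two constructor arguments only loosely mirror the {{bad.record.threshold}} and {{bad.record.min}} properties mentioned above.

```java
/**
 * Hypothetical sketch of a bad-record tracker that any loader could share.
 * NOT Pig's InputErrorTracker: names and the failure rule are assumptions.
 */
final class BadRecordTracker {
    private final double threshold; // assumed meaning: max tolerated fraction of bad records
    private final long minBad;      // assumed meaning: never fail before this many bad records
    private long total;             // records the loader attempted to read
    private long bad;               // records that could not be decoded

    BadRecordTracker(double threshold, long minBad) {
        if (threshold < 0.0 || threshold > 1.0 || minBad < 0) {
            throw new IllegalArgumentException("threshold must be in [0,1], minBad >= 0");
        }
        this.threshold = threshold;
        this.minBad = minBad;
    }

    /** Call once for every record the loader attempts to read. */
    void recordRead() {
        total++;
    }

    /** Call for every record that failed to decode; throws once the limits are exceeded. */
    void recordBad(Throwable cause) {
        bad++;
        if (bad >= minBad && total > 0 && (double) bad / total > threshold) {
            throw new IllegalStateException(
                bad + " bad records out of " + total + " exceed threshold " + threshold, cause);
        }
    }

    long badCount() {
        return bad;
    }

    public static void main(String[] args) {
        // Tolerate up to 25% bad records, and never fail on fewer than 2 of them.
        BadRecordTracker tracker = new BadRecordTracker(0.25, 2);
        String[] input = {"1", "2", "x", "4", "5", "6", "7", "8"};
        long sum = 0;
        for (String field : input) {
            tracker.recordRead();
            try {
                sum += Long.parseLong(field);
            } catch (NumberFormatException e) {
                tracker.recordBad(e); // skipped unless the limits are exceeded
            }
        }
        System.out.println("sum=" + sum + ", bad=" + tracker.badCount());
    }
}
```

A loader built this way would call {{recordRead()}} for each record it attempts to decode and {{recordBad(e)}} in its catch block, so a handful of corrupt records is skipped while a mostly corrupt file still fails the job.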

Thanks!




[jira] [Updated] (PIG-2907) Publish pig 0.23 jars to maven

2012-11-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2907:


Attachment: PIG-2907-2.patch

Updated the patch with a comment explaining the move, based on Julien's 
comments in Review Board.

Julien,
  Need a +1 from you for the h23 to h2 change. 

> Publish pig 0.23 jars to maven
> --
>
> Key: PIG-2907
> URL: https://issues.apache.org/jira/browse/PIG-2907
> Project: Pig
>  Issue Type: New Feature
>Reporter: Francis Liu
>Assignee: Rohini Palaniswamy
> Fix For: 0.11
>
> Attachments: PIG-2907-1.patch, PIG-2907-2.patch, PIG-2907.patch
>
>
> HCatalog would like our unit tests to be able to run against 0.23. Part of 
> that would require pulling the pig 0.23 dependency from maven.



[jira] [Updated] (PIG-2907) Publish pig 0.23 jars to maven

2012-11-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2907:


Attachment: PIG-2907-1.patch

Updated the patch, changing the classifier from h23 to h2 based on Alejandro's 
review comment.

> Publish pig 0.23 jars to maven
> --
>
> Key: PIG-2907
> URL: https://issues.apache.org/jira/browse/PIG-2907
> Project: Pig
>  Issue Type: New Feature
>Reporter: Francis Liu
>Assignee: Rohini Palaniswamy
> Fix For: 0.11
>
> Attachments: PIG-2907-1.patch, PIG-2907.patch
>
>
> HCatalog would like our unit tests to be able to run against 0.23. Part of 
> that would require pulling the pig 0.23 dependency from maven.



Re: Review Request: [PIG-2907] Publish pig 0.23 jars to maven

2012-11-30 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8157/
---

(Updated Nov. 30, 2012, 3:49 p.m.)


Review request for pig.


Changes
---

Changed classifier from h23 to h2 based on Alejandro's review comment


Description
---

Publishing h23 compiled pig jar with classifier h23. 


This addresses bug PIG-2907.
https://issues.apache.org/jira/browse/PIG-2907


Diffs (updated)
-

  http://svn.apache.org/repos/asf/pig/trunk/build.xml 1415689 

Diff: https://reviews.apache.org/r/8157/diff/


Testing
---

Tested with a local nexus repository using the command

ant clean mvn-deploy -Dasfrepo=http://localhost:8089/nexus


Thanks,

Rohini Palaniswamy



[jira] [Commented] (PIG-3058) Upgrade junit to at least 4.8

2012-11-30 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507365#comment-13507365
 ] 

Cheolsoo Park commented on PIG-3058:


+1.

I ran the full unit test suite with hadoop 20/23 and don't see any additional 
test failures.

> Upgrade junit to at least 4.8
> -
>
> Key: PIG-3058
> URL: https://issues.apache.org/jira/browse/PIG-3058
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.11
>Reporter: fang fang chen
>Assignee: fang fang chen
> Fix For: 0.11, 0.12
>
> Attachments: PIG-3058.patch
>
>
> Pig needs to upgrade its junit version to at least 4.8. Otherwise, one gets 
> the following warning:
>   [javadoc] org/apache/hadoop/hbase/mapreduce/TestWALPlayer.class(org/apache/hadoop/hbase/mapreduce:TestWALPlayer.class): warning: Cannot find annotation method 'value()' in type 'org.junit.experimental.categories.Category': class file for org.junit.experimental.categories.Category not found
