[jira] [Commented] (PIG-2684) :: in field name causes AvroStorage to fail

2012-11-28 Thread Angad Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505475#comment-13505475
 ] 

Angad Singh commented on PIG-2684:
--

Has it occurred to anyone that, because of this problem, AvroStorage does not 
support storing the output of Pig JOINs at all? Pig always uses :: in the 
column names it generates for JOINs, so AvroStorage fails completely on them.

 :: in field name causes AvroStorage to fail
 ---

 Key: PIG-2684
 URL: https://issues.apache.org/jira/browse/PIG-2684
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Fabian Alenius

 There appears to be a bug in AvroStorage which causes it to fail when field 
 names contain ::.
 For example, the following will fail:
 data = load 'test.txt' as (one, two);
 grp = GROUP data by (one, two);
 result = foreach grp generate FLATTEN(group);
 store result into 'test.avro' using 
 org.apache.pig.piggybank.storage.avro.AvroStorage();
 ERROR 2999: Unexpected internal error. Illegal character in: group::one
 While the following will succeed:
 data = load 'test.txt' as (one, two);
 grp = GROUP data by (one, two);
 result = foreach grp generate FLATTEN(group) as (one, two);
 store result into 'test.avro' using 
 org.apache.pig.piggybank.storage.avro.AvroStorage();
 Here is a minimal test case:
 data = load 'test.txt' as (one::two, three);
 store data into 'test.avro' using 
 org.apache.pig.piggybank.storage.avro.AvroStorage();
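The root cause is Avro's naming rule: record field names must match [A-Za-z_][A-Za-z0-9_]*, so the :: that Pig inserts into disambiguated column names (as in group::one) can never appear in an Avro schema. A minimal illustrative sketch of the check and an AS-style rename workaround (this is not AvroStorage's actual code):

```python
import re

# Avro field names must start with a letter or underscore and contain
# only letters, digits, and underscores (Avro specification).
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_avro_name(name):
    """True if the field name is legal in an Avro record schema."""
    return bool(AVRO_NAME.match(name))

def sanitize(name):
    """Replace Pig's :: disambiguation separator with an underscore,
    mirroring what an AS (one, two) clause achieves in the script."""
    return name.replace("::", "_")

print(is_valid_avro_name("group::one"))  # the ':' characters are illegal
print(sanitize("group::one"))            # a legal replacement name
```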

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Our release process

2012-11-28 Thread Julien Le Dem
I would really like to see us doing frequent releases (at least once
per quarter if not once a month).
I think the whole notion of priority or being a blocker is subjective.
Releasing infrequently pressures us to push more changes than we would
want to the release branch.
We should focus on keeping TRUNK stable as well so that it is easier
to release and users can do more frequent and smaller upgrades.

There should be a small enough number of patches going in the release
branch so that we can get agreement on whether we check them in or
not.
I like Alan's proposal of reverting quickly when there's a problem.
Again, this becomes less of a problem if we release more often.

Which leads me to my next question: what are the next steps for
releasing Pig 0.11?

Julien

On Tue, Nov 27, 2012 at 10:22 PM, Santhosh M S
santhosh_mut...@yahoo.com wrote:
 Hi Olga,

 For a moment, I will move away from P1 and P2, which are related to priorities, 
 and use the Severity definitions.

 The standard bugzilla definitions for severity are:

 Blocker - Blocks development and/or testing work.
 Critical - Crashes, loss of data, severe memory leak.
 Major - Major loss of function.

 I am skipping the other levels (normal, minor and trivial) for this 
 discussion.

 Coming back to priorities, the proposed definitions map P1 to Blocker and 
 Critical. I am proposing mapping P2 to Major even when there are known 
 workarounds. We are doing this since JIRA does not have severity by default 
 (see: https://confluence.atlassian.com/pages/viewpage.action?pageId=192840)

 I am proposing that P2s be included in the released branch only when trunk or 
 unreleased versions are known to be backward incompatible or if the release 
 is more than a quarter (or two) away.

 Thanks,
 Santhosh

 
  From: Olga Natkovich onatkov...@yahoo.com
 To: dev@pig.apache.org dev@pig.apache.org; Santhosh M S 
 santhosh_mut...@yahoo.com
 Sent: Tuesday, November 27, 2012 10:41 AM
 Subject: Re: Our release process

 Hi Santhosh,

 What is your definition of P2s?

 Olga


 - Original Message -
 From: Santhosh M S santhosh_mut...@yahoo.com
 To: dev@pig.apache.org dev@pig.apache.org; Olga Natkovich 
 onatkov...@yahoo.com
 Cc:
 Sent: Monday, November 26, 2012 11:49 PM
 Subject: Re: Our release process

 Hi Olga,

 I agree that we cannot guarantee backward compatibility upfront. With that 
 knowledge, I am proposing a small modification to your proposal.

 1. If the trunk or unreleased version is known to be backwards compatible 
 then only P1 issues go into the released branch.
 2. If the trunk or unreleased version is known to be backwards 
 incompatible, or the release is a long way off (two quarters?), then we should 
 allow for dot releases on the branch, i.e., P1 and P2 issues.

 I am hoping that should provide an incentive for users to move to a higher 
 release and at the same time allow developers to fix issues of significance 
 without impacting stability.

 Thanks,
 Santhosh


 
 From: Olga Natkovich onatkov...@yahoo.com
 To: dev@pig.apache.org dev@pig.apache.org
 Sent: Monday, November 26, 2012 9:38 AM
 Subject: Re: Our release process

 Hi Santhosh,

 I understand the compatibility issue though I am not sure we can guarantee it 
 for all releases upfront but agree that we should make an effort.

 On the e2e tests, part of the proposal is to make only P1-type changes to 
 the branch after the initial release, so they should be rare.

 Olga


 
 From: Santhosh M S santhosh_mut...@yahoo.com
 To: Olga Natkovich onatkov...@yahoo.com; dev@pig.apache.org 
 dev@pig.apache.org
 Sent: Monday, November 26, 2012 12:00 AM
 Subject: Re: Our release process


 It takes too long to run. If the e2e tests are run every night, or on some 
 other reasonable schedule, it will reduce the barrier for submitting patches. 
 The context for this: the reluctance of folks to move to a higher version 
 when the higher version is not backward compatible.

 Santhosh


 
 From: Olga Natkovich onatkov...@yahoo.com
 To: dev@pig.apache.org dev@pig.apache.org; Santhosh M S 
 santhosh_mut...@yahoo.com
 Sent: Sunday, November 25, 2012 5:56 PM
 Subject: Re: Our release process

 Hi Santhosh,

 Can you clarify why running e2e tests on every check-in is a problem?

 Olga


 
 From: Santhosh M S santhosh_mut...@yahoo.com
 To: dev@pig.apache.org dev@pig.apache.org
 Sent: Monday, November 19, 2012 3:48 PM
 Subject: Re: Our release process

 The push for an upgrade will work only if the higher release is backward 
 compatible with the lower release. If not, folks will tend to use private 
 branches. Having a stable branch on a large deployment is a good indicator of 
 stability. However, please note that there have been instances where some 
 releases were never adopted. I will be extremely careful in applying the rule 
 of
 running e2e 

Re: Our release process

2012-11-28 Thread Bill Graham
I agree releasing often is ideal, but releasing major versions once a month
would be a bit aggressive.

+1 to Olga's initial definition of how Yahoo! determines what goes into a
released branch: basically, is something broken without a workaround, or is
there potential silent data loss? Trying to get a more granular definition
than that (i.e., P1, P2, severity, etc.) will be painful. The reality is
that whoever is blocked by a bug will consider it a P1.

Fixes need to be relatively low-risk, though, to maintain stability, but this
is also subjective. Here I'm in favor of relying on developer and reviewer
judgement to make that call, and I'm +1 to Alan's proposal of rolling back
patches that break the e2e tests or anything else.

I think our policy should avoid time-based considerations such as how many
quarters away we are from the next major release, since that's also
impossible to quantify. Besides, if we're ever more than 1-2 quarters away
from the next release, then that release cadence is the problem we should
be fixing.



[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-28 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3014:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Jonathan and Rohini!

 CurrentTime() UDF has undesirable characteristics
 -

 Key: PIG-3014
 URL: https://issues.apache.org/jira/browse/PIG-3014
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch


 As part of the explanation of the new DateTime datatype, I noticed that we had 
 added a CurrentTime() UDF. The issue with this UDF is that it returns the 
 current time _at every exec invocation_, which can lead to confusing results. 
 In PIG-1431 I proposed an approach in which every instance of the same NOW() 
 returns the same time, which I think is better. I would enjoy thoughts.
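The difference Jonathan describes — a timestamp taken fresh on every call versus one fixed per query — can be sketched as follows (illustrative Python, not the actual Pig UDF code):

```python
import time

def current_time():
    """Like CurrentTime(): returns a fresh timestamp on every exec call,
    so two invocations in the same query can disagree."""
    return time.time()

class Now:
    """Like the proposed NOW(): the timestamp is captured once, when the
    instance is constructed (e.g. on the query front end), so every exec
    call during the query returns the same value."""
    def __init__(self):
        self._t = time.time()

    def exec(self):
        return self._t

now = Now()
# Every invocation of the same NOW() instance agrees.
assert now.exec() == now.exec()
```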



[jira] [Updated] (PIG-1431) Current DateTime UDFs: ISONOW(), UNIXNOW()

2012-11-28 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-1431:
--

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Closing this, as PIG-3014 covers this.

 Current DateTime UDFs: ISONOW(), UNIXNOW()
 --

 Key: PIG-1431
 URL: https://issues.apache.org/jira/browse/PIG-1431
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
Assignee: Jonathan Coveney
  Labels: datetime, now, simple, udf
 Fix For: 0.12

 Attachments: PIG-1431-0.patch


 Need a NOW() for getting datetime diffs between now and a prior or future 
 date.  Will use the system timezone.  Will make one for ISO datetime and one 
 for Unix time.



Build failed in Jenkins: Pig-trunk #1367

2012-11-28 Thread Apache Jenkins Server
See https://builds.apache.org/job/Pig-trunk/1367/changes

Changes:

[cheolsoo] PIG-3014: CurrentTime() UDF has undesirable characteristics (jcoveney via cheolsoo) - adding a new test file

[cheolsoo] PIG-3014: CurrentTime() UDF has undesirable characteristics (jcoveney via cheolsoo)

--
[...truncated 6643 lines...]
 [findbugs]   jline.History
 [findbugs]   org.jruby.embed.internal.LocalContextProvider
 [findbugs]   org.apache.hadoop.io.BooleanWritable
 [findbugs]   org.apache.log4j.Logger
 [findbugs]   org.apache.hadoop.hbase.filter.FamilyFilter
 [findbugs]   org.codehaus.jackson.annotate.JsonPropertyOrder
 [findbugs]   groovy.lang.Tuple
 [findbugs]   org.antlr.runtime.IntStream
 [findbugs]   org.apache.hadoop.util.ReflectionUtils
 [findbugs]   org.apache.hadoop.fs.ContentSummary
 [findbugs]   org.jruby.runtime.builtin.IRubyObject
 [findbugs]   org.jruby.RubyInteger
 [findbugs]   org.python.core.PyTuple
 [findbugs]   org.mortbay.log.Log
 [findbugs]   org.apache.hadoop.conf.Configuration
 [findbugs]   com.google.common.base.Joiner
 [findbugs]   org.apache.hadoop.mapreduce.lib.input.FileSplit
 [findbugs]   org.apache.hadoop.mapred.Counters$Counter
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs]   org.apache.hadoop.mapred.JobPriority
 [findbugs]   org.apache.commons.cli.Options
 [findbugs]   org.apache.hadoop.mapred.JobID
 [findbugs]   org.apache.hadoop.util.bloom.BloomFilter
 [findbugs]   org.python.core.PyFrame
 [findbugs]   org.apache.hadoop.hbase.filter.CompareFilter
 [findbugs]   org.apache.hadoop.util.VersionInfo
 [findbugs]   org.python.core.PyString
 [findbugs]   org.apache.hadoop.io.Text$Comparator
 [findbugs]   org.jruby.runtime.Block
 [findbugs]   org.antlr.runtime.MismatchedSetException
 [findbugs]   org.apache.hadoop.io.BytesWritable
 [findbugs]   org.apache.hadoop.fs.FsShell
 [findbugs]   org.joda.time.Months
 [findbugs]   org.mozilla.javascript.ImporterTopLevel
 [findbugs]   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
 [findbugs]   org.apache.hadoop.mapred.TaskReport
 [findbugs]   org.apache.hadoop.security.UserGroupInformation
 [findbugs]   org.antlr.runtime.tree.RewriteRuleSubtreeStream
 [findbugs]   org.apache.commons.cli.HelpFormatter
 [findbugs]   com.google.common.collect.Maps
 [findbugs]   org.joda.time.ReadableInstant
 [findbugs]   org.mozilla.javascript.NativeObject
 [findbugs]   org.apache.hadoop.hbase.HConstants
 [findbugs]   org.apache.hadoop.io.serializer.Deserializer
 [findbugs]   org.antlr.runtime.FailedPredicateException
 [findbugs]   org.apache.hadoop.io.compress.CompressionCodec
 [findbugs]   org.jruby.RubyNil
 [findbugs]   org.apache.hadoop.fs.FileStatus
 [findbugs]   org.apache.hadoop.hbase.client.Result
 [findbugs]   org.apache.hadoop.mapreduce.JobContext
 [findbugs]   org.codehaus.jackson.JsonGenerator
 [findbugs]   org.apache.hadoop.mapreduce.TaskAttemptContext
 [findbugs]   org.apache.hadoop.io.LongWritable$Comparator
 [findbugs]   org.codehaus.jackson.map.util.LRUMap
 [findbugs]   org.apache.hadoop.hbase.util.Bytes
 [findbugs]   org.antlr.runtime.MismatchedTokenException
 [findbugs]   org.codehaus.jackson.JsonParser
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   org.apache.hadoop.hbase.filter.WhileMatchFilter
 [findbugs]   org.python.core.PyException
 [findbugs]   org.apache.commons.cli.ParseException
 [findbugs]   org.apache.hadoop.io.compress.CompressionOutputStream
 [findbugs]   org.apache.hadoop.hbase.filter.WritableByteArrayComparable
 [findbugs]   org.antlr.runtime.tree.CommonTreeNodeStream
 [findbugs]   org.apache.log4j.Level
 [findbugs]   org.apache.hadoop.hbase.client.Scan
 [findbugs]   org.jruby.anno.JRubyMethod
 [findbugs]   org.apache.hadoop.mapreduce.Job
 [findbugs]   com.google.common.util.concurrent.Futures
 [findbugs]   org.apache.commons.logging.LogFactory
 [findbugs]   org.apache.commons.collections.IteratorUtils
 [findbugs]   org.apache.commons.codec.binary.Base64
 [findbugs]   org.codehaus.jackson.map.ObjectMapper
 [findbugs]   org.apache.hadoop.fs.FileSystem
 [findbugs]   org.jruby.embed.LocalContextScope
 [findbugs]   org.apache.hadoop.hbase.filter.FilterList$Operator
 [findbugs]   org.jruby.RubySymbol
 [findbugs]   org.codehaus.jackson.map.annotate.JacksonStdImpl
 [findbugs]   org.apache.hadoop.hbase.io.ImmutableBytesWritable
 [findbugs]   org.apache.hadoop.io.serializer.SerializationFactory
 [findbugs]   org.antlr.runtime.tree.TreeAdaptor
 [findbugs]   org.apache.hadoop.mapred.RunningJob
 [findbugs]   org.antlr.runtime.CommonTokenStream
 [findbugs]   org.apache.hadoop.io.DataInputBuffer
 [findbugs]   org.apache.hadoop.io.file.tfile.TFile
 [findbugs]   org.apache.commons.cli.GnuParser
 [findbugs]   org.mozilla.javascript.Context
 [findbugs]   org.apache.hadoop.io.FloatWritable
 [findbugs]   org.antlr.runtime.tree.RewriteEarlyExitException
 [findbugs]   org.apache.hadoop.hbase.HBaseConfiguration
 [findbugs]   org.codehaus.jackson.JsonGenerationException
 [findbugs]   

[jira] [Updated] (PIG-2978) TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x

2012-11-28 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2978:
---

Attachment: PIG-2978-2.patch

Incorporated Rohini's comments in the RB.
- Changed Job.class.getName() to getJobName()
- Added comments regarding the difference between hadoop 1.0.x and 2.0.x in 
terms of the number of StoreFunc instances.

 TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
 --

 Key: PIG-2978
 URL: https://issues.apache.org/jira/browse/PIG-2978
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.11

 Attachments: PIG-2978-2.patch, PIG-2978.patch


 To reproduce, please run:
 {code}
 ant clean test -Dtestcase=TestLoadStoreFuncLifeCycle -Dhadoopversion=23
 {code}
 This fails with the following error:
 {code}
 Error during parsing. Job in state DEFINE instead of RUNNING
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Job in state DEFINE instead of RUNNING
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:529)
 at org.apache.pig.TestLoadStoreFuncLifeCycle.testLoadStoreFunc(TestLoadStoreFuncLifeCycle.java:332)
 Caused by: Failed to parse: Job in state DEFINE instead of RUNNING
 at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:193)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
 Caused by: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
 at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:292)
 at org.apache.hadoop.mapreduce.Job.toString(Job.java:456)
 at java.lang.String.valueOf(String.java:2826)
 at org.apache.pig.TestLoadStoreFuncLifeCycle.logCaller(TestLoadStoreFuncLifeCycle.java:270)
 at org.apache.pig.TestLoadStoreFuncLifeCycle.access$000(TestLoadStoreFuncLifeCycle.java:41)
 at org.apache.pig.TestLoadStoreFuncLifeCycle$InstrumentedStorage.logCaller(TestLoadStoreFuncLifeCycle.java:54)
 at org.apache.pig.TestLoadStoreFuncLifeCycle$InstrumentedStorage.getSchema(TestLoadStoreFuncLifeCycle.java:115)
 at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
 at org.apache.pig.newplan.logical.relational.LOLoad.init(LOLoad.java:88)
 at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:839)
 at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3236)
 at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
 at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
 at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
 at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
 at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
 {code}
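The key frames are Job.toString() calling ensureState() from inside String.valueOf(): in Hadoop 2.0.x, merely stringifying a Job that has not been submitted throws. A toy Python analog of that failure mode, with the getJobName()-style fix the patch adopts (hypothetical names, not Hadoop's code):

```python
class IllegalStateError(Exception):
    pass

class Job:
    """Toy stand-in for org.apache.hadoop.mapreduce.Job."""
    def __init__(self):
        self.state = "DEFINE"
        self.name = "test-job"

    def ensure_state(self, required):
        if self.state != required:
            raise IllegalStateError(
                "Job in state %s instead of %s" % (self.state, required))

    def __str__(self):
        # Like Hadoop 2.0.x Job.toString(): checks state first, so just
        # logging the object blows up before the job is running.
        self.ensure_state("RUNNING")
        return "job: " + self.name

    def get_job_name(self):
        # Safe accessor with no state check: logging getJobName()
        # instead of the Job object avoids the exception.
        return self.name

job = Job()
try:
    message = "caller: %s" % job  # implicit toString() raises
except IllegalStateError as e:
    print(e)
print(job.get_job_name())  # safe in any state
```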



[jira] [Updated] (PIG-2978) TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x

2012-11-28 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2978:
---

Status: Patch Available  (was: Open)

 TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
 --

 Key: PIG-2978
 URL: https://issues.apache.org/jira/browse/PIG-2978
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.11

 Attachments: PIG-2978-2.patch, PIG-2978.patch





[jira] [Updated] (PIG-2978) TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x

2012-11-28 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2978:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.11/trunk.

Thanks Rohini for clarifying the difference in Hadoop 2.0.x.

 TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
 --

 Key: PIG-2978
 URL: https://issues.apache.org/jira/browse/PIG-2978
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.11

 Attachments: PIG-2978-2.patch, PIG-2978.patch





[jira] [Updated] (PIG-3034) Remove Penny code from Pig repository

2012-11-28 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3034:
---

Fix Version/s: 0.11

Committed to 0.11.

 Remove Penny code from Pig repository
 -

 Key: PIG-3034
 URL: https://issues.apache.org/jira/browse/PIG-3034
 Project: Pig
  Issue Type: Task
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.11, 0.12

 Attachments: PIG-penniless.patch


 Per the discussion at 
 http://mail-archives.apache.org/mod_mbox/pig-dev/201210.mbox/%3C7C2F4342-E5AE-4FEF-B4C6-8413646D8D37%40hortonworks.com%3E
  and 
 http://mail-archives.apache.org/mod_mbox/pig-dev/201211.mbox/%3CCAO8ATY2WgFf37qBmyzT8B6HNCsGMS-1QQOkY9zp4AL_8Aud_cw%40mail.gmail.com%3E
  we have decided to remove Penny from Pig's source code.



[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2012-11-28 Thread Joseph Adler (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506099#comment-13506099
 ] 

Joseph Adler commented on PIG-3015:
---

Hi Timothy:

I have not tried the patch with Pig 0.10, but I don't know of any reason why it 
would not work. Give it a spin and let us know what happens.

-- Joe

 Rewrite of AvroStorage
 --

 Key: PIG-3015
 URL: https://issues.apache.org/jira/browse/PIG-3015
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Joseph Adler
Assignee: Joseph Adler
 Attachments: PIG-3015.patch


 The current AvroStorage implementation has a lot of issues: it requires old 
 versions of Avro, it copies data much more than needed, and it's verbose and 
 complicated. (One pet peeve of mine is that old versions of Avro don't 
 support Snappy compression.)
 I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
 new implementation is significantly faster, and the code is a lot simpler. 
 Rewriting AvroStorage also enabled me to implement support for Trevni.
 I'm opening this ticket to facilitate discussion while I figure out the best 
 way to contribute the changes back to Apache.



[jira] [Commented] (PIG-2614) AvroStorage crashes on LOADING a single bad error

2012-11-28 Thread Joseph Adler (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506101#comment-13506101
 ] 

Joseph Adler commented on PIG-2614:
---

Repeating an old question: is there any reason that this patch is only for 
Avro? I think this could work for all storage types.
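The generalization suggested here (tolerating a bounded number of bad records during a load, independent of storage format) can be sketched outside of Pig. The function name, error type, and threshold below are illustrative assumptions, not taken from the attached patches:

```python
def load_with_bad_record_tolerance(records, parse, max_bad=10):
    """Yield parsed records, skipping up to max_bad malformed ones.

    `parse` raises ValueError on a malformed record; once more than
    `max_bad` records have failed, the whole load is aborted.
    """
    bad = 0
    for rec in records:
        try:
            yield parse(rec)
        except ValueError:
            bad += 1
            if bad > max_bad:
                raise RuntimeError(
                    "too many bad records (%d); aborting load" % bad)

# Example: parse integer records, tolerating one bad line.
rows = ["1", "2", "oops", "4"]
parsed = list(load_with_bad_record_tolerance(rows, int, max_bad=1))
```

A real LoadFunc would have to track failures per split or per job rather than per iterator; the sketch only illustrates the skip-up-to-a-threshold policy being discussed.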

 AvroStorage crashes on LOADING a single bad error
 -

 Key: PIG-2614
 URL: https://issues.apache.org/jira/browse/PIG-2614
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.10.0, 0.11
Reporter: Russell Jurney
Assignee: Jonathan Coveney
  Labels: avro, avrostorage, bad, book, cutting, doug, for, my, 
 pig, sadism
 Fix For: 0.11, 0.10.1

 Attachments: PIG-2614_0.patch, PIG-2614_1.patch


 AvroStorage dies when a single bad record exists, such as one with missing 
 fields.  This is very bad on 'big data,' where bad records are inevitable.  
 See discussion at 
 http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
  for more theory.



[jira] [Created] (PIG-3071) update hcatalog jar and path to hbase storage handler jar

2012-11-28 Thread Arpit Gupta (JIRA)
Arpit Gupta created PIG-3071:


 Summary: update hcatalog jar and path to hbase storage handler jar
 Key: PIG-3071
 URL: https://issues.apache.org/jira/browse/PIG-3071
 Project: Pig
  Issue Type: Bug
Reporter: Arpit Gupta


Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar 
name and the path to the hbase storage handler jar.

The pig script should be updated to work with either version.
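The version-tolerant lookup being asked for amounts to a search over candidate layouts. The directory structure and glob patterns below are hypothetical stand-ins for the pre-0.5 and 0.5 packaging, not the actual HCatalog paths:

```python
import glob
import os
import tempfile

def find_hcat_jar(hcat_home):
    """Return the first HCatalog jar found among candidate layouts.

    The glob patterns are hypothetical stand-ins for the two packaging
    layouts; a real fix would use the actual jar names and paths.
    """
    patterns = [
        os.path.join(hcat_home, "share", "hcatalog", "hcatalog-core-*.jar"),
        os.path.join(hcat_home, "share", "hcatalog", "hcatalog-*.jar"),
    ]
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None

# Demo against a scratch directory standing in for HCATALOG_HOME.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "share", "hcatalog"))
open(os.path.join(root, "share", "hcatalog", "hcatalog-0.4.0.jar"), "w").close()
jar = find_hcat_jar(root)
```

Probing each candidate in order keeps one script working against either version, which is what the description asks of the pig launcher script.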



[jira] [Updated] (PIG-3071) update hcatalog jar and path to hbase storage handler jar

2012-11-28 Thread Arpit Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated PIG-3071:
-

Attachment: PIG-3071.patch

Attached is a patch that takes a stab at fixing this.

 update hcatalog jar and path to hbase storage handler jar
 -

 Key: PIG-3071
 URL: https://issues.apache.org/jira/browse/PIG-3071
 Project: Pig
  Issue Type: Bug
Reporter: Arpit Gupta
 Attachments: PIG-3071.patch


 Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar 
 name and the path to the hbase storage handler jar.
 pig script should be updated to work with either version.



[jira] [Updated] (PIG-3071) update hcatalog jar and path to hbase storage handler jar

2012-11-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-3071:
---

Labels: hcatalog  (was: )

 update hcatalog jar and path to hbase storage handler jar
 -

 Key: PIG-3071
 URL: https://issues.apache.org/jira/browse/PIG-3071
 Project: Pig
  Issue Type: Bug
Reporter: Arpit Gupta
  Labels: hcatalog
 Attachments: PIG-3071.patch


 Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar 
 name and the path to the hbase storage handler jar.
 pig script should be updated to work with either version.



Re: Our release process

2012-11-28 Thread Santhosh M S
Since releasing a major version once a month is aggressive and we have not 
released on a quarterly basis, we should allow commits to a released branch to 
facilitate dot releases.

If we are allowing commits to a released branch, the criteria for inclusion can 
be created anew, or we can use the industry standards for severity (or 
priority). It could be painful for a few folks, but I don't see better 
alternatives.

Regarding reverting commits based on e2e tests breaking:
1. Who is running the tests?
2. How often are they run?
If we have nightly e2e runs, then it's easier to catch these errors early. If 
not, the barrier for inclusion is pretty high and time-consuming, making it 
harder to develop.

Santhosh



 From: Bill Graham billgra...@gmail.com
To: dev@pig.apache.org 
Sent: Wednesday, November 28, 2012 11:39 AM
Subject: Re: Our release process
 
I agree releasing often is ideal, but releasing major versions once a month
would be a bit aggressive.

+1 to Olga's initial definition of how Yahoo! determines what goes into a
released branch: basically, is something broken without a workaround, or is
there potential silent data loss? Trying to get a more granular definition
than that (i.e. P1, P2, severity, etc.) will be painful. The reality in that
case is that whoever is blocked by the bug will consider it a P1.

Fixes do need to be relatively low-risk to keep stability, but this is also
subjective. For this I'm in favor of relying on developer and reviewer
judgement to make that call, and I'm +1 to Alan's proposal of rolling back
patches that break the e2e tests or anything else.

I think our policy should avoid time-based considerations about how many
quarters away we are from the next major release, since that's also
impossible to quantify. Plus, if the answer to the question of whether we're
more than 1-2 quarters from the next release is yes, then we should be
fixing that release problem.


On Wed, Nov 28, 2012 at 10:22 AM, Julien Le Dem jul...@twitter.com wrote:

 I would really like to see us doing frequent releases (at least once
 per quarter if not once a month).
 I think the whole notion of priority or being a blocker is subjective.
 Releasing infrequently pressures us to push more changes than we would
 want to the release branch.
 We should focus on keeping TRUNK stable as well so that it is easier
 to release and users can do more frequent and smaller upgrades.

 There should be a small enough number of patches going in the release
 branch so that we can get agreement on whether we check them in or
 not.
 I like Alan's proposal of reverting quickly when there's a problem.
 Again, this becomes less of a problem if we release more often.

 Which leads me to my next question: what are the next steps for
 releasing pig 0.11 ?

 Julien

 On Tue, Nov 27, 2012 at 10:22 PM, Santhosh M S
 santhosh_mut...@yahoo.com wrote:
  Hi Olga,
 
  For a moment, I will move away from P1 and P2, which are related to
 priorities, and use the Severity definitions.
 
  The standard bugzilla definitions for severity are:
 
  Blocker - Blocks development and/or testing work.
  Critical - Crashes, loss of data, severe memory leak.
  Major - Major loss of function.
 
  I am skipping the other levels (normal, minor and trivial) for this
 discussion.
 
  Coming back to priorities, the proposed definitions map P1 to Blocker
 and Critical. I am proposing mapping P2 to Major even when there are known
 workarounds. We are doing this since JIRA does not have severity by default
 (see: https://confluence.atlassian.com/pages/viewpage.action?pageId=192840
 )
 
  I am proposing that P2s be included in the released branch only when
 trunk or unreleased versions are known to be backward incompatible or if
 the release is more than a quarter (or two) away.
 
  Thanks,
  Santhosh
 
  
   From: Olga Natkovich onatkov...@yahoo.com
  To: dev@pig.apache.org dev@pig.apache.org; Santhosh M S 
 santhosh_mut...@yahoo.com
  Sent: Tuesday, November 27, 2012 10:41 AM
  Subject: Re: Our release process
 
  Hi Santhosh,
 
  What is your definition of P2s?
 
  Olga
 
 
  - Original Message -
  From: Santhosh M S santhosh_mut...@yahoo.com
  To: dev@pig.apache.org dev@pig.apache.org; Olga Natkovich 
 onatkov...@yahoo.com
  Cc:
  Sent: Monday, November 26, 2012 11:49 PM
  Subject: Re: Our release process
 
  Hi Olga,
 
  I agree that we cannot guarantee backward compatibility upfront. With
 that knowledge, I am proposing a small modification to your proposal.
 
  1. If the trunk or unreleased version is known to be backwards
 compatible then only P1 issues go into the released branch.
  2. If the trunk or unreleased version is known to be backwards
 incompatible or the release is a long ways off (two quarters?) then we
 should allow for dot releases on the branch, i.e., P1 and P2 issues.
 
  I am hoping that should provide an incentive for users to move to a
 higher release and