[jira] [Commented] (PIG-2684) :: in field name causes AvroStorage to fail
[ https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505475#comment-13505475 ]

Angad Singh commented on PIG-2684:
----------------------------------

Has it occurred to anyone that, because of this problem, AvroStorage does not support storing the output of Pig JOINs at all? Pig always uses '::' in the column names of a JOIN, so AvroStorage fails completely on joined relations. Epic fail.

:: in field name causes AvroStorage to fail
-------------------------------------------

Key: PIG-2684
URL: https://issues.apache.org/jira/browse/PIG-2684
Project: Pig
Issue Type: Bug
Components: piggybank
Reporter: Fabian Alenius

There appears to be a bug in AvroStorage which causes it to fail when field names contain '::'.

For example, the following will fail:

data = load 'test.txt' as (one, two);
grp = GROUP data by (one, two);
result = foreach grp generate FLATTEN(group);
store result into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();

ERROR 2999: Unexpected internal error. Illegal character in: group::one

While the following will succeed:

data = load 'test.txt' as (one, two);
grp = GROUP data by (one, two);
result = foreach grp generate FLATTEN(group) as (one, two);
store result into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();

Here is a minimal test case:

data = load 'test.txt' as (one::two, three);
store data into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
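For context, the underlying constraint is the Avro specification's name rule: record and field names must match [A-Za-z_][A-Za-z0-9_]*, which Pig's '::'-disambiguated names violate. A minimal Python sketch of the check and of what the 'AS (one, two)' workaround effectively does (illustrative only, not AvroStorage code; the sanitize helper is a hypothetical name):

```python
import re

# Avro field names must match [A-Za-z_][A-Za-z0-9_]* per the Avro spec,
# so Pig's disambiguated names like "group::one" are illegal characters-wise.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_avro_name(name):
    return bool(AVRO_NAME.match(name))

def sanitize(name):
    """Mimic the 'AS (one, two)' workaround: keep only the last component
    of a Pig-disambiguated name, then replace any remaining illegal chars."""
    last = name.split("::")[-1]
    return re.sub(r"[^A-Za-z0-9_]", "_", last)

print(is_valid_avro_name("group::one"))  # False -> AvroStorage rejects it
print(sanitize("group::one"))            # "one"
```

This is why re-aliasing the flattened fields makes the store succeed: the aliases are plain identifiers that pass the Avro name check.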
Re: Our release process
I would really like to see us doing frequent releases (at least once per quarter, if not once a month). I think the whole notion of priority, or of being a blocker, is subjective. Releasing infrequently pressures us to push more changes than we would want into the release branch. We should focus on keeping trunk stable as well, so that it is easier to release and users can do more frequent and smaller upgrades. There should be a small enough number of patches going into the release branch that we can get agreement on whether to check them in or not. I like Alan's proposal of reverting quickly when there's a problem. Again, this becomes less of a problem if we release more often.

Which leads me to my next question: what are the next steps for releasing Pig 0.11?

Julien

On Tue, Nov 27, 2012 at 10:22 PM, Santhosh M S santhosh_mut...@yahoo.com wrote:

Hi Olga,

For a moment, I will move away from P1 and P2, which are related to priorities, and use the Severity definitions. The standard Bugzilla definitions for severity are:

Blocker - Blocks development and/or testing work.
Critical - Crashes, loss of data, severe memory leak.
Major - Major loss of function.

I am skipping the other levels (normal, minor and trivial) for this discussion. Coming back to priorities, the proposed definitions map P1 to Blocker and Critical. I am proposing mapping P2 to Major, even when there are known workarounds. We are doing this since JIRA does not have severity by default (see https://confluence.atlassian.com/pages/viewpage.action?pageId=192840).

I am proposing that P2s be included in the released branch only when trunk or unreleased versions are known to be backward incompatible, or if the release is more than a quarter (or two) away.

Thanks,
Santhosh

From: Olga Natkovich onatkov...@yahoo.com
To: dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com
Sent: Tuesday, November 27, 2012 10:41 AM
Subject: Re: Our release process

Hi Santhosh,

What is your definition of P2s?

Olga

----- Original Message -----
From: Santhosh M S santhosh_mut...@yahoo.com
To: dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com
Sent: Monday, November 26, 2012 11:49 PM
Subject: Re: Our release process

Hi Olga,

I agree that we cannot guarantee backward compatibility upfront. With that knowledge, I am proposing a small modification to your proposal.

1. If the trunk or unreleased version is known to be backwards compatible, then only P1 issues go into the released branch.
2. If the trunk or unreleased version is known to be backwards incompatible, or the release is a long way off (two quarters?), then we should allow for dot releases on the branch, i.e., P1 and P2 issues.

I am hoping that this will provide an incentive for users to move to a higher release, and at the same time allow developers to fix issues of significance without impacting stability.

Thanks,
Santhosh

From: Olga Natkovich onatkov...@yahoo.com
To: dev@pig.apache.org
Sent: Monday, November 26, 2012 9:38 AM
Subject: Re: Our release process

Hi Santhosh,

I understand the compatibility issue, though I am not sure we can guarantee it for all releases upfront; I agree that we should make an effort. On the e2e tests, part of the proposal is to only make P1-type changes to the branch after the initial release, so they should be rare.

Olga

From: Santhosh M S santhosh_mut...@yahoo.com
To: Olga Natkovich onatkov...@yahoo.com; dev@pig.apache.org
Sent: Monday, November 26, 2012 12:00 AM
Subject: Re: Our release process

It takes too long to run. If the e2e tests are run every night, or on some reasonable timeframe, it will reduce the barrier for submitting patches. The context for this: the reluctance of folks to move to a higher version when the higher version is not backward compatible.

Santhosh

From: Olga Natkovich onatkov...@yahoo.com
To: dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com
Sent: Sunday, November 25, 2012 5:56 PM
Subject: Re: Our release process

Hi Santhosh,

Can you clarify why running e2e tests on every check-in is a problem?

Olga

From: Santhosh M S santhosh_mut...@yahoo.com
To: dev@pig.apache.org
Sent: Monday, November 19, 2012 3:48 PM
Subject: Re: Our release process

The push for an upgrade will work only if the higher release is backward compatible with the lower release. If not, folks will tend to use private branches. Having a stable branch on a large deployment is a good indicator of stability. However, please note that there have been instances where some releases were never adopted. I will be extremely careful in applying the rule of running e2e
Re: Our release process
I agree releasing often is ideal, but releasing major versions once a month would be a bit aggressive.

+1 to Olga's initial definition of how Yahoo! determines what goes into a released branch: basically, is something broken without a workaround, or is there potential silent data loss? Trying to get a more granular definition than that (i.e., P1, P2, severity, etc.) will be painful. The reality in that case is that whoever is blocked by a bug will consider it a P1.

Fixes need to be relatively low-risk, though, to keep stability, but this is also subjective. For this I'm in favor of relying on developer and reviewer judgement to make that call, and I'm +1 to Alan's proposal of rolling back patches that break the e2e tests or anything else.

I think our policy should avoid time-based considerations of how many quarters away we are from the next major release, since that's also impossible to quantify. Plus, if the answer is that we're more than 1-2 quarters from the next release, then we should be fixing that release problem.

On Wed, Nov 28, 2012 at 10:22 AM, Julien Le Dem jul...@twitter.com wrote:

I would really like to see us doing frequent releases (at least once per quarter if not once a month). I think the whole notion of priority or being a blocker is subjective. Releasing infrequently pressures us to push more changes than we would want to the release branch. We should focus on keeping trunk stable as well so that it is easier to release and users can do more frequent and smaller upgrades. There should be a small enough number of patches going in the release branch so that we can get agreement on whether we check them in or not. I like Alan's proposal of reverting quickly when there's a problem. Again, this becomes less of a problem if we release more often. Which leads me to my next question: what are the next steps for releasing Pig 0.11?
Julien
[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics
[ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3014:
-------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Jonathan and Rohini!

CurrentTime() UDF has undesirable characteristics
-------------------------------------------------

Key: PIG-3014
URL: https://issues.apache.org/jira/browse/PIG-3014
Project: Pig
Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Fix For: 0.12
Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch

As part of the explanation of the new DateTime datatype, I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed an approach such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.
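The semantic difference being discussed can be sketched in a few lines of Python (a toy model of the two UDF behaviors, not the actual Java implementations):

```python
import time

class CurrentTime:
    """Per-call semantics (the reported problem): every exec() invocation
    returns a fresh timestamp, so one script run can observe many
    different 'current' times across records."""
    def exec(self):
        return time.time()

class Now:
    """Proposed semantics: the timestamp is fixed the first time this UDF
    instance is evaluated, so every record sees the same value."""
    def __init__(self):
        self._t = None
    def exec(self):
        if self._t is None:
            self._t = time.time()
        return self._t

n = Now()
assert n.exec() == n.exec()  # stable across invocations within one instance
```

With the cached variant, a predicate like `timestamp < NOW()` evaluates against one consistent instant for the whole job rather than drifting record by record.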
[jira] [Updated] (PIG-1431) Current DateTime UDFs: ISONOW(), UNIXNOW()
[ https://issues.apache.org/jira/browse/PIG-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-1431:
----------------------------------
Resolution: Duplicate
Status: Resolved (was: Patch Available)

Closing this, as PIG-3014 covers it.

Current DateTime UDFs: ISONOW(), UNIXNOW()
------------------------------------------

Key: PIG-1431
URL: https://issues.apache.org/jira/browse/PIG-1431
Project: Pig
Issue Type: New Feature
Components: impl
Reporter: Russell Jurney
Assignee: Jonathan Coveney
Labels: datetime, now, simple, udf
Fix For: 0.12
Attachments: PIG-1431-0.patch

Need a NOW() for getting datetime diffs between now and a prior or future date. It will use the system timezone. Will make one for ISO datetime and one for Unix time.
Build failed in Jenkins: Pig-trunk #1367
See https://builds.apache.org/job/Pig-trunk/1367/changes

Changes:

[cheolsoo] PIG-3014: CurrentTime() UDF has undesirable characteristics (jcoveney via cheolsoo) - adding a new test file
[cheolsoo] PIG-3014: CurrentTime() UDF has undesirable characteristics (jcoveney via cheolsoo)

------------------------------------------
[...truncated 6643 lines...]

[findbugs] (long listing of classes referenced during FindBugs analysis -- Hadoop, HBase, ANTLR, JRuby, Jython, Rhino, Joda-Time, Jackson, Guava, commons-cli, jsch, and others -- omitted)
[jira] [Updated] (PIG-2978) TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2978:
-------------------------------
Attachment: PIG-2978-2.patch

Incorporated Rohini's comments in the RB:
- Changed Job.class.getName() to getJobName().
- Added comments regarding the difference between hadoop 1.0.x and 2.0.x in terms of the number of StoreFunc instances.

TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
--------------------------------------------------

Key: PIG-2978
URL: https://issues.apache.org/jira/browse/PIG-2978
Project: Pig
Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: 0.11
Attachments: PIG-2978-2.patch, PIG-2978.patch

To reproduce, please run:
{code}
ant clean test -Dtestcase=TestLoadStoreFuncLifeCycle -Dhadoopversion=23
{code}
This fails with the following error:
{code}
Error during parsing. Job in state DEFINE instead of RUNNING
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Job in state DEFINE instead of RUNNING
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:529)
    at org.apache.pig.TestLoadStoreFuncLifeCycle.testLoadStoreFunc(TestLoadStoreFuncLifeCycle.java:332)
Caused by: Failed to parse: Job in state DEFINE instead of RUNNING
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:193)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
Caused by: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
    at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:292)
    at org.apache.hadoop.mapreduce.Job.toString(Job.java:456)
    at java.lang.String.valueOf(String.java:2826)
    at org.apache.pig.TestLoadStoreFuncLifeCycle.logCaller(TestLoadStoreFuncLifeCycle.java:270)
    at org.apache.pig.TestLoadStoreFuncLifeCycle.access$000(TestLoadStoreFuncLifeCycle.java:41)
    at org.apache.pig.TestLoadStoreFuncLifeCycle$InstrumentedStorage.logCaller(TestLoadStoreFuncLifeCycle.java:54)
    at org.apache.pig.TestLoadStoreFuncLifeCycle$InstrumentedStorage.getSchema(TestLoadStoreFuncLifeCycle.java:115)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
    at org.apache.pig.newplan.logical.relational.LOLoad.init(LOLoad.java:88)
    at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:839)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3236)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
{code}
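The stack trace shows the root cause: org.apache.hadoop.mapreduce.Job.toString() throws IllegalStateException until the job is actually running, and the test's logging helper stringified the Job implicitly via String.valueOf. A toy Python model of the pitfall and the direction of the fix, logging a stable name instead of stringifying the object (class and method names here are hypothetical stand-ins for the Java code):

```python
class Job:
    """Hypothetical stand-in for a Hadoop Job whose string conversion
    throws until the job has actually been submitted."""
    def __init__(self):
        self.state = "DEFINE"

    def __str__(self):
        if self.state != "RUNNING":
            raise RuntimeError("Job in state DEFINE instead of RUNNING")
        return "job_local_0001"

def log_caller(arg):
    # Buggy version: "called with " + str(arg) implicitly invokes
    # __str__ (toString() in Java) and raises while still in DEFINE state.
    # Safer version (the spirit of the PIG-2978 fix): use a name that is
    # always available instead of stringifying the object.
    return "called with " + type(arg).__name__

print(log_caller(Job()))  # no exception, even though str(Job()) would raise
```

The committed patch follows the same idea on the Java side, switching the logging call to a method that is safe in any job state.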
[jira] [Updated] (PIG-2978) TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2978:
-------------------------------
Status: Patch Available (was: Open)

TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
[jira] [Updated] (PIG-2978) TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2978:
-------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed to 0.11/trunk. Thanks Rohini for clarifying the difference in Hadoop 2.0.x.

TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
[jira] [Updated] (PIG-3034) Remove Penny code from Pig repository
[ https://issues.apache.org/jira/browse/PIG-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3034:
-------------------------------
Fix Version/s: 0.11

Committed to 0.11.

Remove Penny code from Pig repository
-------------------------------------

Key: PIG-3034
URL: https://issues.apache.org/jira/browse/PIG-3034
Project: Pig
Issue Type: Task
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
Fix For: 0.11, 0.12
Attachments: PIG-penniless.patch

Per the discussions at http://mail-archives.apache.org/mod_mbox/pig-dev/201210.mbox/%3C7C2F4342-E5AE-4FEF-B4C6-8413646D8D37%40hortonworks.com%3E and http://mail-archives.apache.org/mod_mbox/pig-dev/201211.mbox/%3CCAO8ATY2WgFf37qBmyzT8B6HNCsGMS-1QQOkY9zp4AL_8Aud_cw%40mail.gmail.com%3E, we have decided to remove Penny from Pig's source code.
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506099#comment-13506099 ]

Joseph Adler commented on PIG-3015:
-----------------------------------

Hi Timothy: I have not tried the patch with Pig 0.10, but I don't know of any reason why it would not work. Give it a spin and let us know what happens.

-- Joe

Rewrite of AvroStorage
----------------------

Key: PIG-3015
URL: https://issues.apache.org/jira/browse/PIG-3015
Project: Pig
Issue Type: Improvement
Components: piggybank
Reporter: Joseph Adler
Assignee: Joseph Adler
Attachments: PIG-3015.patch

The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni. I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.
[jira] [Commented] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506101#comment-13506101 ]

Joseph Adler commented on PIG-2614:
-----------------------------------

Repeating an old question: is there any reason that this patch is only for Avro? I think this could work for all storage types.

AvroStorage crashes on LOADING a single bad error
-------------------------------------------------

Key: PIG-2614
URL: https://issues.apache.org/jira/browse/PIG-2614
Project: Pig
Issue Type: Bug
Components: piggybank
Affects Versions: 0.10.0, 0.11
Reporter: Russell Jurney
Assignee: Jonathan Coveney
Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
Fix For: 0.11, 0.10.1
Attachments: PIG-2614_0.patch, PIG-2614_1.patch

AvroStorage dies when a single bad record exists, such as one with missing fields. This is very bad on 'big data,' where bad records are inevitable. See the discussion at http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss for more theory.
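The technique under discussion, skipping undecodable records instead of aborting the job, is indeed storage-agnostic. A minimal Python sketch under stated assumptions (the function name and the error-threshold policy are illustrative, not Pig's actual API; a real loader would increment a Hadoop counter rather than a local variable):

```python
def read_with_tolerance(records, decode, max_bad_fraction=0.01):
    """Hypothetical loader loop: skip records that fail to decode instead
    of crashing, but fail the whole job if too many records are bad."""
    good, bad = [], 0
    for raw in records:
        try:
            good.append(decode(raw))
        except Exception:
            bad += 1  # in a real loader, bump a counter for visibility
    total = len(good) + bad
    if total and bad / total > max_bad_fraction:
        raise RuntimeError("too many bad records: %d of %d" % (bad, total))
    return good

rows = read_with_tolerance(["1", "2", "x", "4"], int, max_bad_fraction=0.5)
print(rows)  # [1, 2, 4]
```

The threshold keeps the tolerance from silently swallowing systematic corruption: a few bad records are skipped, but a mostly-bad input still fails loudly.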
[jira] [Created] (PIG-3071) update hcatalog jar and path to hbase storage handler har
Arpit Gupta created PIG-3071:
-----------------------------

Summary: update hcatalog jar and path to hbase storage handler har
Key: PIG-3071
URL: https://issues.apache.org/jira/browse/PIG-3071
Project: Pig
Issue Type: Bug
Reporter: Arpit Gupta

Due to changes in HCatalog 0.5 packaging, we need to update the hcatalog jar name and the path to the hbase storage handler jar. The pig script should be updated to work with either version.
[jira] [Updated] (PIG-3071) update hcatalog jar and path to hbase storage handler jar
[ https://issues.apache.org/jira/browse/PIG-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Gupta updated PIG-3071:
-----------------------------
    Attachment: PIG-3071.patch

Attached is a patch that takes a stab at fixing this.

> update hcatalog jar and path to hbase storage handler jar
> ----------------------------------------------------------
>
>                 Key: PIG-3071
>                 URL: https://issues.apache.org/jira/browse/PIG-3071
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Arpit Gupta
>         Attachments: PIG-3071.patch
>
> Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar name and the path to the hbase storage handler jar. pig script should be updated to work with either version.
[jira] [Updated] (PIG-3071) update hcatalog jar and path to hbase storage handler jar
[ https://issues.apache.org/jira/browse/PIG-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-3071:
-------------------------------
    Labels: hcatalog  (was: )

> update hcatalog jar and path to hbase storage handler jar
> ----------------------------------------------------------
>
>                 Key: PIG-3071
>                 URL: https://issues.apache.org/jira/browse/PIG-3071
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Arpit Gupta
>              Labels: hcatalog
>         Attachments: PIG-3071.patch
>
> Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar name and the path to the hbase storage handler jar. pig script should be updated to work with either version.
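[Editorial note: the "work with either version" requirement in PIG-3071 amounts to trying both jar-name patterns and taking whichever is present. A minimal sketch follows; the patterns and function name are hypothetical placeholders, not the names used in the actual PIG-3071 patch.]

```python
import glob
import os


def find_hcatalog_jar(lib_dir):
    """Return the first jar matching either packaging layout, or None.

    Tries the (assumed) HCatalog 0.5+ name first, then falls back to
    the (assumed) older name, so one script handles both versions.
    """
    for pattern in ("hcatalog-core-*.jar", "hcatalog-*.jar"):
        matches = sorted(glob.glob(os.path.join(lib_dir, pattern)))
        if matches:
            return matches[0]
    return None
```

The launcher script would then put whichever jar this finds on the classpath, instead of hard-coding one name.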
Re: Our release process
Since releasing a major version once a month is aggressive and we have not released on a quarterly basis, we should allow commits to a released branch to facilitate dot releases. If we are allowing commits to a released branch, the criteria for inclusion can be created anew, or we can use the industry standards for severity (or priority). It could be painful for a few folks, but I don't see better alternatives.

Regarding reverting commits based on e2e tests breaking:
1. Who is running the tests?
2. How often are they run?

If we have nightly e2e runs, then it's easier to catch these errors early. If not, the barrier for inclusion is pretty high and time-consuming, making it harder to develop.

Santhosh

From: Bill Graham billgra...@gmail.com
To: dev@pig.apache.org
Sent: Wednesday, November 28, 2012 11:39 AM
Subject: Re: Our release process

I agree releasing often is ideal, but releasing major versions once a month would be a bit aggressive.

+1 to Olga's initial definition of how Yahoo! determines what goes into a released branch. Basically: is something broken without a workaround, or is there potential silent data loss? Trying to get a more granular definition than that (i.e. P1, P2, severity, etc.) will be painful. The reality in that case is that whoever is blocked by the bug will consider it a P1.

Fixes need to be relatively low-risk though, to keep stability, but this is also subjective. For this I'm in favor of relying on developer and reviewer judgement to make that call, and I'm +1 to Alan's proposal of rolling back patches that break the e2e tests or anything else.

I think our policy should avoid time-based considerations of how many quarters away we are from the next major release, since that's also impossible to quantify. Plus, if the answer to the question "are we more than 1-2 quarters from the next release?" is yes, then we should be fixing that release problem.
On Wed, Nov 28, 2012 at 10:22 AM, Julien Le Dem jul...@twitter.com wrote:

I would really like to see us doing frequent releases (at least once per quarter, if not once a month). I think the whole notion of priority or being a blocker is subjective. Releasing infrequently pressures us to push more changes than we would want to the release branch. We should focus on keeping TRUNK stable as well, so that it is easier to release and users can do more frequent and smaller upgrades. There should be a small enough number of patches going into the release branch that we can get agreement on whether we check them in or not. I like Alan's proposal of reverting quickly when there's a problem. Again, this becomes less of a problem if we release more often.

Which leads me to my next question: what are the next steps for releasing Pig 0.11?

Julien

On Tue, Nov 27, 2012 at 10:22 PM, Santhosh M S santhosh_mut...@yahoo.com wrote:

Hi Olga,

For a moment, I will move away from P1 and P2, which are related to priorities, and use the severity definitions. The standard Bugzilla definitions for severity are:

Blocker - Blocks development and/or testing work.
Critical - Crashes, loss of data, severe memory leak.
Major - Major loss of function.

I am skipping the other levels (normal, minor and trivial) for this discussion. Coming back to priorities, the proposed definitions map P1 to Blocker and Critical. I am proposing mapping P2 to Major even when there are known workarounds. We are doing this since JIRA does not have severity by default (see: https://confluence.atlassian.com/pages/viewpage.action?pageId=192840).

I am proposing that P2s be included in the released branch only when trunk or unreleased versions are known to be backward incompatible, or if the release is more than a quarter (or two) away.
Thanks,
Santhosh

From: Olga Natkovich onatkov...@yahoo.com
To: dev@pig.apache.org dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com
Sent: Tuesday, November 27, 2012 10:41 AM
Subject: Re: Our release process

Hi Santhosh,

What is your definition of P2s?

Olga

----- Original Message -----
From: Santhosh M S santhosh_mut...@yahoo.com
To: dev@pig.apache.org dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com
Cc:
Sent: Monday, November 26, 2012 11:49 PM
Subject: Re: Our release process

Hi Olga,

I agree that we cannot guarantee backward compatibility upfront. With that knowledge, I am proposing a small modification to your proposal.

1. If the trunk or unreleased version is known to be backwards compatible, then only P1 issues go into the released branch.
2. If the trunk or unreleased version is known to be backwards incompatible, or the release is a long ways off (two quarters?), then we should allow for dot releases on the branch, i.e., P1 and P2 issues.

I am hoping that should provide an incentive for users to move to a higher release and