[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (21 issues)
Subscriber: pigdaily

Key       Summary
PIG-4326  AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records
          https://issues.apache.org/jira/browse/PIG-4326
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues.apache.org/jira/browse/PIG-4313
PIG-4264  Port TestAvroStorage to tez local mode
          https://issues.apache.org/jira/browse/PIG-4264
PIG-4251  Pig on Storm
          https://issues.apache.org/jira/browse/PIG-4251
PIG-4239  "pig.output.lazy" not works in spark mode
          https://issues.apache.org/jira/browse/PIG-4239
PIG-4207  Make python udfs work with Spark
          https://issues.apache.org/jira/browse/PIG-4207
PIG-4111  Make Pig compiles with avro-1.7.7
          https://issues.apache.org/jira/browse/PIG-4111
PIG-4103  Fix TestRegisteredJarVisibility(after PIG-4083)
          https://issues.apache.org/jira/browse/PIG-4103
PIG-4066  An optimization for ROLLUP operation in Pig
          https://issues.apache.org/jira/browse/PIG-4066
PIG-4004  Upgrade the Pigmix queries from the (old) mapred API to mapreduce
          https://issues.apache.org/jira/browse/PIG-4004
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
          https://issues.apache.org/jira/browse/PIG-3866
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
          https://issues.apache.org/jira/browse/PIG-3635
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587
PIG-3441  Allow Pig to use default resources from Configuration objects
          https://issues.apache.org/jira/browse/PIG-3441
PIG-2692  Make the Pig unit faciliities more generalizable and update javadocs
          https://issues.apache.org/jira/browse/PIG-2692

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384
[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez
[ https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-4331: Resolution: Fixed Fix Version/s: 0.14.1 Status: Resolved (was: Patch Available) Patch committed to trunk, thanks Thejas! We shall also backport to 0.14 branch once 0.14.0 is released. Keep the ticket open. > update README, '-x' option in usage to include tez > -- > > Key: PIG-4331 > URL: https://issues.apache.org/jira/browse/PIG-4331 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.14.1, 0.15.0 > > Attachments: PIG-4331.1.patch > > > Pig queries can be run using tez, by specifying "pig -x tez". The output of > pig --help needs to be updated to indicate that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (PIG-4331) update README, '-x' option in usage to include tez
[ https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reopened PIG-4331: - > update README, '-x' option in usage to include tez > -- > > Key: PIG-4331 > URL: https://issues.apache.org/jira/browse/PIG-4331 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.14.1, 0.15.0 > > Attachments: PIG-4331.1.patch > > > Pig queries can be run using tez, by specifying "pig -x tez". The output of > pig --help needs to be updated to indicate that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez
[ https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4331: --- Status: Patch Available (was: Open) > update README, '-x' option in usage to include tez > -- > > Key: PIG-4331 > URL: https://issues.apache.org/jira/browse/PIG-4331 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.15.0 > > Attachments: PIG-4331.1.patch > > > Pig queries can be run using tez, by specifying "pig -x tez". The output of > pig --help needs to be updated to indicate that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez
[ https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4331: --- Attachment: PIG-4331.1.patch > update README, '-x' option in usage to include tez > -- > > Key: PIG-4331 > URL: https://issues.apache.org/jira/browse/PIG-4331 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.15.0 > > Attachments: PIG-4331.1.patch > > > Pig queries can be run using tez, by specifying "pig -x tez". The output of > pig --help needs to be updated to indicate that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PIG-4331) update README, '-x' option in usage to include tez
Thejas M Nair created PIG-4331: -- Summary: update README, '-x' option in usage to include tez Key: PIG-4331 URL: https://issues.apache.org/jira/browse/PIG-4331 Project: Pig Issue Type: Bug Affects Versions: 0.14.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.15.0 Pig queries can be run using tez, by specifying "pig -x tez". But usage does not indicate this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4331) update README, '-x' option in usage to include tez
[ https://issues.apache.org/jira/browse/PIG-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4331: --- Description: Pig queries can be run using tez, by specifying "pig -x tez". The output of pig --help needs to be updated to indicate that. (was: Pig queries can be run using tez, by specifying "pig -x tez". But usage does not indicate this. ) > update README, '-x' option in usage to include tez > -- > > Key: PIG-4331 > URL: https://issues.apache.org/jira/browse/PIG-4331 > Project: Pig > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.15.0 > > > Pig queries can be run using tez, by specifying "pig -x tez". The output of > pig --help needs to be updated to indicate that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Release Pig 0.14.0 (candidate 0)
+1

Verified keys
Checked LICENSE, README, RELEASE_NOTES, CHANGES files, rat report.
Built the source
Tried running queries both using local mode and cluster

Two minor issues that don't need to block this RC:
1. I think we should update README to indicate the choice of execution engine.
2. pig --help does not show "tez" as a valid option for the "-x" argument

I will create a jira to track these issues.

On Wed, Nov 12, 2014 at 8:46 PM, Daniel Dai wrote:
> Hi,
>
> I have created a candidate build for Pig 0.14.0.
>
> Keys used to sign the release are available at
> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.
>
> Please download, test, and try it out:
> http://people.apache.org/~daijy/pig-0.14.0-candidate-0/
>
> Release notes and the rat report are available at the same location.
>
> Should we release this? Vote closes on next Monday EOD, Nov 17th 2014.
>
> Thanks,
> Daniel
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
[jira] [Comment Edited] (PIG-3615) Update the way that JsonLoader/JsonStorage deal with BigDecimal
[ https://issues.apache.org/jira/browse/PIG-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212850#comment-14212850 ] Daniel Dai edited comment on PIG-3615 at 11/14/14 9:38 PM: --- I committed your later patch which takes decimal with or without quotes. was (Author: daijy): I committed your later patch with takes decimal with or without quotes. > Update the way that JsonLoader/JsonStorage deal with BigDecimal > --- > > Key: PIG-3615 > URL: https://issues.apache.org/jira/browse/PIG-3615 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.12.0 >Reporter: Erik Selin >Assignee: Erik Selin >Priority: Minor > Fix For: 0.15.0 > > Attachments: PIG-3615.patch, bugPig-3615.patch > > > It's a common (and good) convention to quote fixed point numbers when storing > them as json. The reason being that majority of json libraries will > implicitly load any number value as a floating point number and if you care > about data integrity this will make you very sad. > This update makes JsonLoader able to load BigDecimal values from quoted > values (the old jackson library that we're using doesn't support this through > the current approach) as well as making JsonStorage store BigDecimal values > as quoted strings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
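The precision concern in the PIG-3615 description above can be illustrated with plain Java. This is only a sketch using the standard library: the class and method names (QuotedDecimalSketch, viaDouble, parseDecimal) are made up for illustration, and it does not reproduce the JsonLoader/JsonStorage code or the Jackson calls in the committed patch. It shows why routing a decimal through a double changes the value, and one way to accept a decimal with or without surrounding quotes.

```java
import java.math.BigDecimal;

class QuotedDecimalSketch {

    /** Mimics a reader that treats every JSON number as a double first. */
    static BigDecimal viaDouble(String jsonNumber) {
        // The double cannot hold all the digits, so the BigDecimal built
        // from it differs from the text that was written.
        return new BigDecimal(Double.parseDouble(jsonNumber));
    }

    /** Accepts the value with or without surrounding double quotes. */
    static BigDecimal parseDecimal(String jsonValue) {
        String s = jsonValue.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1); // strip the quotes
        }
        // Parsing the digits as text keeps every digit exactly.
        return new BigDecimal(s);
    }

    public static void main(String[] args) {
        String raw = "12345678901234567890.123456789";
        System.out.println("via double: " + viaDouble(raw));
        System.out.println("quoted:     " + parseDecimal("\"" + raw + "\""));
        System.out.println("unquoted:   " + parseDecimal(raw));
    }
}
```

Both parseDecimal calls return the value exactly as written, while viaDouble returns a nearby but different number. That difference is the data-integrity problem the issue description refers to.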
[jira] [Commented] (PIG-3615) Update the way that JsonLoader/JsonStorage deal with BigDecimal
[ https://issues.apache.org/jira/browse/PIG-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212850#comment-14212850 ] Daniel Dai commented on PIG-3615: - I committed your later patch which takes decimal with or without quotes. > Update the way that JsonLoader/JsonStorage deal with BigDecimal > --- > > Key: PIG-3615 > URL: https://issues.apache.org/jira/browse/PIG-3615 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.12.0 >Reporter: Erik Selin >Assignee: Erik Selin >Priority: Minor > Fix For: 0.15.0 > > Attachments: PIG-3615.patch, bugPig-3615.patch > > > It's a common (and good) convention to quote fixed point numbers when storing > them as json. The reason being that majority of json libraries will > implicitly load any number value as a floating point number and if you care > about data integrity this will make you very sad. > This update makes JsonLoader able to load BigDecimal values from quoted > values (the old jackson library that we're using doesn't support this through > the current approach) as well as making JsonStorage store BigDecimal values > as quoted strings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-2692) Make the Pig unit faciliities more generalizable and update javadocs
[ https://issues.apache.org/jira/browse/PIG-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212848#comment-14212848 ] Daniel Dai commented on PIG-2692: - Can you add some documentation to http://pig.apache.org/docs/r0.13.0/test.html#pigunit? The source code for documentation is in src/docs/src/documentation/content/xdocs/test.xml > Make the Pig unit faciliities more generalizable and update javadocs > > > Key: PIG-2692 > URL: https://issues.apache.org/jira/browse/PIG-2692 > Project: Pig > Issue Type: Improvement >Reporter: Jeremy Hanna >Assignee: Richard So >Priority: Minor > Fix For: 0.15.0 > > Attachments: pig2692.patch > > > This ticket has two goals for Pig unit: > 1) Pig unit has a really nice method assertOutput(String inputAlias, String[] > inputValues, String outputAlias, String[] expectedOutputValues). That method > lets you override an input alias variable with a hardcoded list of values. > That way, the script doesn't actually have to read that input variable from > hdfs or cassandra. Then, it runs the script and checks the specified output > alias variable against the expected set of values. It's a really nice way to > test your entire pig script with a single method call, but only IF your > script has exactly 1 input and 1 output. If you want to test more > complicated scripts, you have to jump through some hoops in order to override > more input variables. But, it would be fairly easy to change PigUnit so that > it can override any number of inputs and check any number of outputs and do > so easily. That's basically the change that I put into the base testing > class I wrote. But, it would be better to push that into PigUnit itself, and > it's something that could easily be done in an afternoon. > 2) Update javadocs for the pig unit test classes to make them more readable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-2692) Make the Pig unit faciliities more generalizable and update javadocs
[ https://issues.apache.org/jira/browse/PIG-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2692: Fix Version/s: 0.15.0 Assignee: Richard So > Make the Pig unit faciliities more generalizable and update javadocs > > > Key: PIG-2692 > URL: https://issues.apache.org/jira/browse/PIG-2692 > Project: Pig > Issue Type: Improvement >Reporter: Jeremy Hanna >Assignee: Richard So >Priority: Minor > Fix For: 0.15.0 > > Attachments: pig2692.patch > > > This ticket has two goals for Pig unit: > 1) Pig unit has a really nice method assertOutput(String inputAlias, String[] > inputValues, String outputAlias, String[] expectedOutputValues). That method > lets you override an input alias variable with a hardcoded list of values. > That way, the script doesn't actually have to read that input variable from > hdfs or cassandra. Then, it runs the script and checks the specified output > alias variable against the expected set of values. It's a really nice way to > test your entire pig script with a single method call, but only IF your > script has exactly 1 input and 1 output. If you want to test more > complicated scripts, you have to jump through some hoops in order to override > more input variables. But, it would be fairly easy to change PigUnit so that > it can override any number of inputs and check any number of outputs and do > so easily. That's basically the change that I put into the base testing > class I wrote. But, it would be better to push that into PigUnit itself, and > it's something that could easily be done in an afternoon. > 2) Update javadocs for the pig unit test classes to make them more readable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4326) AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records
[ https://issues.apache.org/jira/browse/PIG-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-4326: Attachment: PIG-4326-0.patch I tried it, and testLoadRecordsWithMapOfRecords seems to pass. I attached my change to AvroStorageSchemaConversionUtilities. If there is an inconsistency between the Avro schema and Pig, we shall fill this gap. For example, if an Avro map always contains a record, we shall remove it when translating to Pig. Pig always has a tuple inside a bag; when translating to Avro, we shall remove the tuple. If we are not doing that, then it is a bug we shall fix. > AvroStorageSchemaConversionUtilities does not properly convert schema for > maps of arrays of records > --- > > Key: PIG-4326 > URL: https://issues.apache.org/jira/browse/PIG-4326 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0, 0.13.0 >Reporter: Michael Prim >Assignee: Michael Prim > Fix For: 0.15.0 > > Attachments: PIG-4326-0.patch, mapsOfArraysOfRecords.patch > > > I tried to convert the avro schema of a map of arrays of records into the > proper pig schema and got always empty map schemas in pig. > The reason is that the AvroStorageSchemaConversionUtilities does only assume > records or primitive types as content of the map. However, a map of arrays, > or a map of map, could have a schema itself and requires recursive calling to > derive the full schema. > I wrote a unit test to test for maps of arrays of records which fails with > every pig release since the AvroStorage was rewritten (I think this was in > 0.12), and there have been no changes since then in the trunk. > Further the attached patch contains the (rather simple) fix that makes the > schema conversion utils succeed. > Would appreciate further comments and if this can be included upstream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
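The quoted description above says that a map of arrays, or a map of maps, needs a recursive call to derive the full schema. A minimal sketch of that idea in plain Java follows. It uses a toy schema tree instead of the Avro or Pig schema classes; the names Node, Kind, of, and toPigLike, as well as the exact output notation, are assumptions made for illustration and are not taken from either attached patch.

```java
import java.util.Arrays;
import java.util.List;

class RecursiveSchemaSketch {

    enum Kind { LONG, STRING, RECORD, ARRAY, MAP }

    /** Toy schema node: a stand-in, not an Avro or Pig class. */
    static final class Node {
        final Kind kind;
        final String name;         // field alias, may be null
        final List<Node> children; // record fields, or the single element/value type

        Node(Kind kind, String name, List<Node> children) {
            this.kind = kind;
            this.name = name;
            this.children = children;
        }
    }

    static Node of(Kind kind, String name, Node... children) {
        return new Node(kind, name, Arrays.asList(children));
    }

    /** Renders a node in a Pig-like notation, recursing into every nested type. */
    static String toPigLike(Node n) {
        String prefix = (n.name == null) ? "" : n.name + ":";
        switch (n.kind) {
            case LONG:
                return prefix + "long";
            case STRING:
                return prefix + "chararray";
            case RECORD: {
                StringBuilder fields = new StringBuilder();
                for (Node field : n.children) {
                    if (fields.length() > 0) fields.append(',');
                    fields.append(toPigLike(field));
                }
                return prefix + "(" + fields + ")";
            }
            case ARRAY:
                // Recurse into the element type, whatever it is.
                return prefix + "{" + toPigLike(n.children.get(0)) + "}";
            case MAP:
                // Recurse into the value type too: it may be an array or another map.
                return prefix + "map[" + toPigLike(n.children.get(0)) + "]";
            default:
                throw new IllegalArgumentException("unknown kind: " + n.kind);
        }
    }

    /** A map named "m" whose values are arrays of (id, label) records. */
    static Node example() {
        Node record = of(Kind.RECORD, null,
                of(Kind.LONG, "id"), of(Kind.STRING, "label"));
        return of(Kind.MAP, "m", of(Kind.ARRAY, null, record));
    }

    public static void main(String[] args) {
        System.out.println(toPigLike(example()));
        // prints: m:map[{(id:long,label:chararray)}]
    }
}
```

If the MAP case handled only records and primitives instead of recursing, the nested array would be dropped and the map would come out with no value schema, which matches the symptom the reporter describes.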
[jira] [Commented] (PIG-3615) Update the way that JsonLoader/JsonStorage deal with BigDecimal
[ https://issues.apache.org/jira/browse/PIG-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212752#comment-14212752 ] Eyal Allweil commented on PIG-3615: --- Daniel, you committed all of the older patch, the one that might cause backward compatibility problems. Can you switch it to the newer patch with changes to JsonLoader, but not JsonStorage? We can of course go with the original patch, too, but then (at the very least) an appropriate warning should be put in the release notes about the change in big decimal write formats. > Update the way that JsonLoader/JsonStorage deal with BigDecimal > --- > > Key: PIG-3615 > URL: https://issues.apache.org/jira/browse/PIG-3615 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.12.0 >Reporter: Erik Selin >Assignee: Erik Selin >Priority: Minor > Fix For: 0.15.0 > > Attachments: PIG-3615.patch, bugPig-3615.patch > > > It's a common (and good) convention to quote fixed point numbers when storing > them as json. The reason being that majority of json libraries will > implicitly load any number value as a floating point number and if you care > about data integrity this will make you very sad. > This update makes JsonLoader able to load BigDecimal values from quoted > values (the old jackson library that we're using doesn't support this through > the current approach) as well as making JsonStorage store BigDecimal values > as quoted strings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4327) Schema of map with value that has an alias can't be parsed again
[ https://issues.apache.org/jira/browse/PIG-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4327:
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Patch committed to trunk. Thanks Michael!

> Schema of map with value that has an alias can't be parsed again
>
> Key: PIG-4327
> URL: https://issues.apache.org/jira/browse/PIG-4327
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.12.0, 0.13.0
> Reporter: Michael Prim
> Assignee: Michael Prim
> Fix For: 0.15.0
> Attachments: 0001-Extend-map-type-to-allow-for-alias-of-its-values.patch
>
> Tried to create a map of a primitive type, the resulting schema can't be
> parsed again by the parser if there is an alias set for the value.
> I could not set an alias, but the alias gets set by pig itself, e.g. when
> converting avro schemas to pig schemas and there was a map of records in avro.
> See also my other bug report https://issues.apache.org/jira/browse/PIG-4326 ,
> even without that fix, pig produces schemas of maps with values that have an
> alias.
> You can easily reproduce the crash, using those two unit tests. The second
> one should actually succeed but throws a ParserException instead
> {code}
> @Test
> public void testWorksWithoutAlias() throws FrontendException {
>     // Map value has no alias: the schema string parses back fine.
>     List<FieldSchema> innerFields = new ArrayList<>();
>     innerFields.add(new FieldSchema(null, DataType.LONG));
>     List<FieldSchema> fields = new ArrayList<>();
>     fields.add(new FieldSchema("mapAlias", new Schema(innerFields), DataType.MAP));
>     Schema inputSchema = new Schema(fields);
>     Schema fromString = Utils.getSchemaFromBagSchemaString(inputSchema.toString());
>     assertEquals(inputSchema.toString(), fromString.toString());
> }
>
> @Test
> public void testBreaksWithAlias() throws FrontendException {
>     // Map value has an alias: parsing the schema string throws.
>     List<FieldSchema> innerFields = new ArrayList<>();
>     innerFields.add(new FieldSchema("valueAlias", DataType.LONG));
>     List<FieldSchema> fields = new ArrayList<>();
>     fields.add(new FieldSchema("mapAlias", new Schema(innerFields), DataType.MAP));
>     Schema inputSchema = new Schema(fields);
>     Schema fromString = Utils.getSchemaFromBagSchemaString(inputSchema.toString());
>     assertEquals(inputSchema.toString(), fromString.toString());
> }
> {code}
> I suppose that the issue is in the grammar itself and easy to fix for someone
> knowing antlr. I don't think the issue is related to the actual type of the
> value, as I could also provide tests that fail if we don't use a primitive
> but complex type with an alias.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4326) AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records
[ https://issues.apache.org/jira/browse/PIG-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212115#comment-14212115 ] Michael Prim commented on PIG-4326: --- Thanks for the feedback. I realized that during development and was actually also a bit surprised. However, removing this extra layer breaks the already existing testLoadRecordsWithMapOfRecords test and would not be backward compatible. Further, if you create a map<array<InnerRecord>> using avro avdl files, it is just syntactic sugar for actually having some map<WrapperRecord> where WrapperRecord has one field, namely an array of InnerRecord. As neither the WrapperRecord nor the array of InnerRecords itself has an alias, it is a bit confusing that both get the "array" alias. So we could stick to the old behavior for records and drop the wrapping tuple only for maps and arrays, but then I think the resulting output will look different from the input. > AvroStorageSchemaConversionUtilities does not properly convert schema for > maps of arrays of records > --- > > Key: PIG-4326 > URL: https://issues.apache.org/jira/browse/PIG-4326 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0, 0.13.0 >Reporter: Michael Prim >Assignee: Michael Prim > Fix For: 0.15.0 > > Attachments: mapsOfArraysOfRecords.patch > > > I tried to convert the avro schema of a map of arrays of records into the > proper pig schema and got always empty map schemas in pig. > The reason is that the AvroStorageSchemaConversionUtilities does only assume > records or primitive types as content of the map. However, a map of arrays, > or a map of map, could have a schema itself and requires recursive calling to > derive the full schema. > I wrote a unit test to test for maps of arrays of records which fails with > every pig release since the AvroStorage was rewritten (I think this was in > 0.12), and there have been no changes since then in the trunk. 
> Further the attached patch contains the (rather simple) fix that makes the > schema conversion utils succeed. > Would appreciate further comments and if this can be included upstream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)