Jenkins build is back to normal : Pig-trunk #1389
See https://builds.apache.org/job/Pig-trunk/1389/changes
[jira] [Commented] (PIG-3057) Make PigStorage.readField() protected
[ https://issues.apache.org/jira/browse/PIG-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559565#comment-13559565 ] pablo martinez commented on PIG-3057: - Thank you! Make PigStorage.readField() protected - Key: PIG-3057 URL: https://issues.apache.org/jira/browse/PIG-3057 Project: Pig Issue Type: Improvement Components: build, internal-udfs Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.9.2, 0.10.0 Reporter: pablo martinez Assignee: pablo martinez Priority: Trivial Labels: patch Fix For: 0.12 Attachments: PIG-3057_1.patch, PigStorage_readField.patch Original Estimate: 2h Remaining Estimate: 2h This is for the cases when we need to extend PigStorage just to override readField. Currently, we need to copy/paste several private fields and all of getNext. I've changed readField from private to protected and added a new method: protected void addToCurrentTuple(DataByteArray data) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
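The motivation in PIG-3057 can be illustrated with a small stand-alone sketch. It does not use Pig's classes: BaseStorageSketch and TrimmingStorageSketch are hypothetical stand-ins, the readField(byte[], int, int) signature and the "empty field becomes null" behavior are assumptions, and String is used where the ticket's addToCurrentTuple takes a DataByteArray. The point is only that once the per-field hook is protected, a subclass can override it without copying the parsing loop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a storage class; NOT Pig's PigStorage.
class BaseStorageSketch {
    private final List<String> currentTuple = new ArrayList<>();
    private final byte delimiter;

    BaseStorageSketch(char delimiter) {
        this.delimiter = (byte) delimiter;
    }

    // Plays the role of getNext(): the parsing loop subclasses should not have to copy.
    List<String> parseLine(String line) {
        currentTuple.clear();
        byte[] buf = line.getBytes(StandardCharsets.UTF_8);
        int start = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == delimiter) {
                readField(buf, start, i);
                start = i + 1;
            }
        }
        readField(buf, start, buf.length); // last field on the line
        return new ArrayList<>(currentTuple);
    }

    // Protected hook: subclasses may override how one field is converted.
    protected void readField(byte[] buf, int start, int end) {
        // Assumption of this sketch: an empty field is stored as null.
        addToCurrentTuple(start == end
                ? null
                : new String(buf, start, end - start, StandardCharsets.UTF_8));
    }

    // Protected helper so an overriding readField can still append to the tuple.
    protected void addToCurrentTuple(String field) {
        currentTuple.add(field);
    }

    public static void main(String[] args) {
        System.out.println(new BaseStorageSketch(',').parseLine("a,,b"));
        System.out.println(new TrimmingStorageSketch(',').parseLine(" a , b"));
    }
}

// A subclass that only changes field handling; the loop in parseLine is reused.
class TrimmingStorageSketch extends BaseStorageSketch {
    TrimmingStorageSketch(char delimiter) {
        super(delimiter);
    }

    @Override
    protected void readField(byte[] buf, int start, int end) {
        String raw = new String(buf, start, end - start, StandardCharsets.UTF_8);
        addToCurrentTuple(raw.trim());
    }
}
```

With a private readField, the subclass above would have needed its own copy of parseLine and of the tuple field, which is the copy/paste problem the ticket describes.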
[jira] [Created] (PIG-3126) Permission issue in Pig 0.9.2
Vishnu Ganth created PIG-3126: - Summary: Permission issue in Pig 0.9.2 Key: PIG-3126 URL: https://issues.apache.org/jira/browse/PIG-3126 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.2 Environment: CentOS 5.7 Reporter: Vishnu Ganth Attachments: log.txt A = Load 'sample'; store A into '/user/xyz/sample-out'; When this Pig script is run as user abc, who does not have write permission in '/user/xyz', Pig is unable to create the directory sample-out, and the map-reduce job is ultimately killed without any log. Pig should log an error saying permission denied.
[jira] [Updated] (PIG-3126) Permission issue in Pig 0.9.2
[ https://issues.apache.org/jira/browse/PIG-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishnu Ganth updated PIG-3126: -- Attachment: log.txt
[jira] [Updated] (PIG-3126) Problem in STORE
[ https://issues.apache.org/jira/browse/PIG-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishnu Ganth updated PIG-3126: -- Summary: Problem in STORE (was: Permission issue in Pig 0.9.2)
[jira] [Commented] (PIG-3005) TestLargeFile#testOrderBy is failing
[ https://issues.apache.org/jira/browse/PIG-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559848#comment-13559848 ] Jonathan Coveney commented on PIG-3005: --- Why is the test excluded? Is it because it's irrelevant, or b/c it's too big? If it's irrelevant, we should remove it. If it's too big we should either have a special test-extra that has some heavier load tests, or whatever. But I dislike the current state of having these random excluded tests. Any thoughts? TestLargeFile#testOrderBy is failing Key: PIG-3005 URL: https://issues.apache.org/jira/browse/PIG-3005 Project: Pig Issue Type: Sub-task Environment: Mac OSX 10.6.8 Reporter: Jonathan Coveney Fix For: 0.12 When run locally, at least, this test is failing for me. Has anyone else noticed this failing?
[jira] [Commented] (PIG-3090) Introduce a syntax to be able to easily refer to the previously defined relation
[ https://issues.apache.org/jira/browse/PIG-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559851#comment-13559851 ] Jonathan Coveney commented on PIG-3090: --- Daniel, I think that that is a good idea, but should be a separate patch. It would not be hard to create a syntax which implicitly uses some reserved pig-only alias (reserved or something), which = would implicitly become. If you make the ticket and assign it to me I will work on it. That said, I think this patch can and should be evaluated separately. Cheolsoo, I think that's a matter for documentation. Or do you think we should do something special with the @ inside of a nested foreach? Introduce a syntax to be able to easily refer to the previously defined relation Key: PIG-3090 URL: https://issues.apache.org/jira/browse/PIG-3090 Project: Pig Issue Type: New Feature Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3090-0.patch, PIG-3090-1.patch Sometimes I feel like swimming with ANTLRs. This particular feature isn't too hard to add... and supports syntax like this:
{code}
a = load 'thing' as (x:int);
b = foreach @ generate x;
c = foreach @ generate x;
d = foreach @ generate x;
{code}
I have a patch, though I need to make sure it doesn't change anything (it shouldn't) and I need to add tests.
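To make the intended meaning of @ in PIG-3090 concrete, here is a rough stand-alone Java sketch that rewrites each @ to the alias assigned by the preceding statement. This is not how the patch works (the ticket indicates the change is made in the ANTLR grammar); LastAliasSketch and expand are hypothetical names, and the textual substitution is deliberately naive (it would also rewrite an @ inside a quoted string).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; the real feature lives in Pig's parser.
class LastAliasSketch {
    // Matches "alias =" at the start of a statement and captures the alias.
    private static final Pattern ASSIGNMENT =
            Pattern.compile("^\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*=");

    // Replaces '@' in each statement with the alias defined by the previous statement.
    static List<String> expand(List<String> statements) {
        List<String> expanded = new ArrayList<>();
        String lastAlias = null;
        for (String statement : statements) {
            String result = statement;
            if (statement.contains("@")) {
                if (lastAlias == null) {
                    throw new IllegalStateException("'@' used before any relation was defined");
                }
                result = statement.replace("@", lastAlias);
            }
            expanded.add(result);
            // Remember the alias this statement defines, if any, for the next one.
            Matcher m = ASSIGNMENT.matcher(statement);
            if (m.find()) {
                lastAlias = m.group(1);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
                "a = load 'thing' as (x:int);",
                "b = foreach @ generate x;",
                "c = foreach @ generate x;",
                "d = foreach @ generate x;");
        for (String line : expand(script)) {
            System.out.println(line);
        }
    }
}
```

Under this reading, the second statement operates on a, the third on b, and the fourth on c, so the script forms a chain without repeating each alias by hand.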
[jira] [Created] (PIG-3127) Add e2e testing for BigInteger and BigDecimal data type
Jonathan Coveney created PIG-3127: - Summary: Add e2e testing for BigInteger and BigDecimal data type Key: PIG-3127 URL: https://issues.apache.org/jira/browse/PIG-3127 Project: Pig Issue Type: Task Affects Versions: 0.12 Reporter: Jonathan Coveney Assignee: Alan Gates Priority: Blocker Fix For: 0.12 We need e2e test coverage for these new data types.
[jira] [Created] (PIG-3128) Document the BigInteger and BigDecimal data type
Jonathan Coveney created PIG-3128: - Summary: Document the BigInteger and BigDecimal data type Key: PIG-3128 URL: https://issues.apache.org/jira/browse/PIG-3128 Project: Pig Issue Type: Task Affects Versions: 0.12 Reporter: Jonathan Coveney Priority: Blocker Fix For: 0.12 We need to document the use and existence of BigInteger and BigDecimal.
[jira] [Commented] (PIG-3031) Update Pig to use a newer version of joda-time
[ https://issues.apache.org/jira/browse/PIG-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559871#comment-13559871 ] Jonathan Coveney commented on PIG-3031: --- Sounds good, thanks y'all. Update Pig to use a newer version of joda-time -- Key: PIG-3031 URL: https://issues.apache.org/jira/browse/PIG-3031 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Zhijie Shen Fix For: 0.12 Attachments: PIG-3031.patch The current version is 1.6, which is quite old (~4 years at this point). Is there any reason not to bring us up to a newer version? I tried to compile the 1.6 source and it didn't work because dependencies are outdated, and so on. Also, the interfaces have matured.
Re: Review Request: Add BigInteger and BigDecimal to Pig
On Jan. 21, 2013, 6:27 p.m., Alan Gates wrote:
src/org/apache/pig/backend/hadoop/hbase/HBaseBinaryConverter.java, line 192
https://reviews.apache.org/r/9012/diff/2/?file=250198#file250198line192
Is this because HBase doesn't have a standard representation for this?

Yes, that's how I understood it.

On Jan. 21, 2013, 6:27 p.m., Alan Gates wrote:
src/org/apache/pig/builtin/BinStorage.java, line 143
https://reviews.apache.org/r/9012/diff/2/?file=250203#file250203line143
Why not implement these? You have read and write functions in the BytesWritable implementations.

It's specifically a load caster that has nothing implemented.

On Jan. 21, 2013, 6:27 p.m., Alan Gates wrote:
src/org/apache/pig/builtin/TextLoader.java, line 249
https://reviews.apache.org/r/9012/diff/2/?file=250204#file250204line249
Again, why not implement these? There are string-to-bigint and string-to-bigdecimal functions.

In this case it is because of how the TextLoader works. The whole point of the TextLoader is that you don't try to cast anything, and you just load each line as text. Note that the only implemented one is bytesToCharArray.

On Jan. 21, 2013, 6:27 p.m., Alan Gates wrote:
src/org/apache/pig/data/BinInterSedes.java, line 908
https://reviews.apache.org/r/9012/diff/2/?file=250206#file250206line908
I wonder if there's a way to avoid conversions to and from strings here. That can hardly be efficient. I don't think it's something we need to change now but down the road maybe something to think about.

I wholeheartedly agree. I think we should make a JIRA for this. I didn't sweat it since raging efficiency isn't what this is about anyway, but we definitely could do something much smarter. It would not be hard.

On Jan. 21, 2013, 6:27 p.m., Alan Gates wrote:
src/org/apache/pig/data/TypeAwareTuple.java, line 39
https://reviews.apache.org/r/9012/diff/2/?file=250211#file250211line39
Seems like the second argument here should be a BigInteger, not a boolean. Same comment for the next line.

Oh wow lol.
Copy-paste strikes. Good catch. - Jonathan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9012/#review15544 --- On Jan. 18, 2013, 10:11 p.m., Jonathan Coveney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9012/ --- (Updated Jan. 18, 2013, 10:11 p.m.) Review request for pig, Alan Gates and Mathias Herberts. Description --- This patch adds big integer and big decimal support to Pig. It could use more tests, something I'd appreciate feedback on (but I wanted to make sure the core implementation is good) This addresses bug PIG-2764. https://issues.apache.org/jira/browse/PIG-2764 Diffs - .gitignore cc62d7d src/org/apache/pig/LoadCaster.java 574769b src/org/apache/pig/PigWarning.java 5de075f src/org/apache/pig/StoreCaster.java 5fe48de src/org/apache/pig/backend/hadoop/BigDecimalWritable.java PRE-CREATION src/org/apache/pig/backend/hadoop/BigIntegerWritable.java PRE-CREATION src/org/apache/pig/backend/hadoop/HDataType.java 84a56b8 src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 96fba6b src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigDecimalRawComparator.java PRE-CREATION src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigIntegerRawComparator.java PRE-CREATION src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java 9749339 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java f40eb43 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Add.java c84b767 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ConstantExpression.java db3840f src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Divide.java 4656c28 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java 6683beb 
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ExpressionOperator.java 2806336 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java d64a080 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java 704d0b8 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java 9dc929e
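On the BinInterSedes.java comment in the review above (avoiding conversions to and from strings): one possible direction, sketched here purely as an illustration and not taken from the patch, is to write a BigDecimal as its int scale plus the two's-complement bytes of its unscaled value. BigDecimalBinarySketch and its method names are hypothetical; only java.math and java.io calls are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of a string-free BigDecimal encoding; not Pig's format.
class BigDecimalBinarySketch {

    // Layout: int scale, int byte count, then the unscaled value's two's-complement bytes.
    static void write(DataOutput out, BigDecimal value) throws IOException {
        byte[] unscaled = value.unscaledValue().toByteArray();
        out.writeInt(value.scale());
        out.writeInt(unscaled.length);
        out.write(unscaled);
    }

    static BigDecimal read(DataInput in) throws IOException {
        int scale = in.readInt();
        byte[] unscaled = new byte[in.readInt()];
        in.readFully(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    // Convenience helper: serialize to a byte array and read the value back.
    static BigDecimal roundTrip(BigDecimal value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), value);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("-12345.67890");
        System.out.println(original.equals(roundTrip(original)));
    }
}
```

Because the scale is stored explicitly, the value read back is equal under BigDecimal.equals (which compares scale as well as numeric value), and no decimal string is built or parsed along the way.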
Re: Review Request: Add BigInteger and BigDecimal to Pig
On Jan. 19, 2013, 7:29 p.m., Mathias Herberts wrote:
src/org/apache/pig/data/DefaultTuple.java, line 359
https://reviews.apache.org/r/9012/diff/1-2/?file=249804#file249804line359
Since a BigDecimal has a scale which is an int, nothing prevents a BigDecimal from having a scale which won't fit in a short, and thus a string representation whose length might also not fit in a short. If I understand this code correctly (thanks to your explanation), if the type is CHARARRAY, the field length is encoded on a short? This will be troublesome if scale 0x7.

This is a bit annoying/confusing, but there are two internal CHARARRAY types: either a CHARARRAY or a BIGCHARARRAY. So if it's bigger than a short, the length will be an int (i.e. a BIGCHARARRAY) and it should be deserialized correctly. Literally all DataType.BIGINTEGER signals in this case is that what comes after is a string, which we then serialize and deserialize in accordance with the rest of Pig.

- Jonathan

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9012/#review15522 ---
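The scheme Jonathan describes in the reply to Mathias (a short length prefix for ordinary strings, an int length prefix for big ones) can be sketched in plain Java as follows. The tag values, the class name CharArrayLengthSketch, and the 0xFFFF threshold are assumptions made for this illustration; they are not copied from BinInterSedes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a BigDecimal stored as a length-prefixed string.
class CharArrayLengthSketch {
    static final byte SMALL_TAG = 1;              // assumed tag: length fits in an unsigned short
    static final byte BIG_TAG = 2;                // assumed tag: length needs an int
    static final int UNSIGNED_SHORT_MAX = 0xFFFF; // assumed threshold

    static byte[] encode(BigDecimal value) {
        byte[] utf8 = value.toString().getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (utf8.length <= UNSIGNED_SHORT_MAX) {
                out.writeByte(SMALL_TAG);
                out.writeShort(utf8.length); // low 16 bits, read back as unsigned
            } else {
                out.writeByte(BIG_TAG);
                out.writeInt(utf8.length);
            }
            out.write(utf8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static BigDecimal decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte tag = in.readByte();
            int length;
            if (tag == SMALL_TAG) {
                length = in.readUnsignedShort();
            } else if (tag == BIG_TAG) {
                length = in.readInt();
            } else {
                throw new IllegalArgumentException("Unknown tag: " + tag);
            }
            byte[] utf8 = new byte[length];
            in.readFully(utf8);
            return new BigDecimal(new String(utf8, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BigDecimal small = new BigDecimal("123.456");
        System.out.println(decode(encode(small)).equals(small));
    }
}
```

The writer picks the tag from the actual encoded length, so a value whose string form is longer than an unsigned short can hold takes the int-length path and still round-trips.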
[jira] [Commented] (PIG-2764) Add a biginteger and bigdecimal type to pig
[ https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559918#comment-13559918 ] Jonathan Coveney commented on PIG-2764: --- Good comments all around. I just responded and made the tickets, and will update the latest patch shortly once test-commit runs (changes were fairly minor). The comments are mainly discussions of how TextLoader works etc. Add a biginteger and bigdecimal type to pig --- Key: PIG-2764 URL: https://issues.apache.org/jira/browse/PIG-2764 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Jonathan Coveney Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch, PIG-2764-2_nows.patch, PIG-2764-2.patch, PIG-2764-3.patch, PIG-2764-4.patch I think it would be useful for applications where precision is more important than speed to have the option of using java's bigdecimal and biginteger types natively.
[jira] [Commented] (PIG-2764) Add a biginteger and bigdecimal type to pig
[ https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559921#comment-13559921 ] Jonathan Coveney commented on PIG-2764: --- PS thanks to Alan, Mathias, and Cheolsoo for the eyes. Definitely let me know if you have anything you'd like me to look at. This was a big patch.
[jira] [Commented] (PIG-3105) Fix TestJobSubmission unit test failure.
[ https://issues.apache.org/jira/browse/PIG-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559933#comment-13559933 ] Vikram Dixit K commented on PIG-3105: - The related HBase issue was committed earlier this month after this patch was posted. Also, the correct way to use the hbase configuration as expected in hbase 0.94 and beyond is the first line change in the patch: +HBaseTestingUtility util = new HBaseTestingUtility(HBaseConfiguration.create(conf)); I will post the error I was seeing if I can dig it up. Fix TestJobSubmission unit test failure. Key: PIG-3105 URL: https://issues.apache.org/jira/browse/PIG-3105 Project: Pig Issue Type: Bug Components: tools Affects Versions: 0.10.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.11 Attachments: PIG-3105.patch Currently with Hadoop 1.0, the TestJobSubmission unit test fails. This is due to HBASE-7423. This is a work around to that issue.
[jira] [Commented] (PIG-3090) Introduce a syntax to be able to easily refer to the previously defined relation
[ https://issues.apache.org/jira/browse/PIG-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559946#comment-13559946 ] Cheolsoo Park commented on PIG-3090: Hi Jonathan, I am fine as long as it's well-documented. Doing something special will make the implementation complicated.
[jira] [Commented] (PIG-3090) Introduce a syntax to be able to easily refer to the previously defined relation
[ https://issues.apache.org/jira/browse/PIG-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560044#comment-13560044 ] Daniel Dai commented on PIG-3090: - I am fine with doing it progressively. We can create another issue on top of this to go further and get rid of the relation name to the left of the equals sign.
[jira] [Commented] (PIG-3126) Problem in STORE
[ https://issues.apache.org/jira/browse/PIG-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560049#comment-13560049 ] Daniel Dai commented on PIG-3126: - Which version of Hadoop are you using?
Re: Review Request: Add BigInteger and BigDecimal to Pig
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9012/ --- (Updated Jan. 22, 2013, 10:05 p.m.) Review request for pig, Alan Gates and Mathias Herberts. Description --- This patch adds big integer and big decimal support to Pig. It could use more tests, something I'd appreciate feedback on (but I wanted to make sure the core implementation is good) This addresses bug PIG-2764. https://issues.apache.org/jira/browse/PIG-2764 Diffs (updated) - .gitignore cc62d7d src/org/apache/pig/LoadCaster.java 574769b src/org/apache/pig/PigWarning.java 5de075f src/org/apache/pig/StoreCaster.java 5fe48de src/org/apache/pig/backend/hadoop/BigDecimalWritable.java PRE-CREATION src/org/apache/pig/backend/hadoop/BigIntegerWritable.java PRE-CREATION src/org/apache/pig/backend/hadoop/HDataType.java 84a56b8 src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 96fba6b src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigDecimalRawComparator.java PRE-CREATION src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigIntegerRawComparator.java PRE-CREATION src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java 9749339 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java f40eb43 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Add.java c84b767 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ConstantExpression.java db3840f src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Divide.java 4656c28 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java 6683beb src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ExpressionOperator.java 2806336 
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java d64a080 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java 704d0b8 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java 9dc929e src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LessThanExpr.java 0320698 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Mod.java 6819185 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Multiply.java 7b57bed src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/NotEqualToExpr.java 79a4461 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POBinCond.java 08544d5 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java e8c2f2c src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POIsNull.java f20b839 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PONegative.java c076ae7 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 8887133 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserComparisonFunc.java 479eb83 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 3c7e741 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Subtract.java 79d4c73 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java bf2ba08 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java ddb25f1 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java aa11409 
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPreCombinerLocalRearrange.java 52401eb src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java ad33e7b src/org/apache/pig/backend/hadoop/hbase/HBaseBinaryConverter.java 60a5899 src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java a6f4ea6 src/org/apache/pig/builtin/ABS.java 8a7c631 src/org/apache/pig/builtin/BigDecimalAbs.java PRE-CREATION src/org/apache/pig/builtin/BigIntegerAbs.java PRE-CREATION src/org/apache/pig/builtin/BinStorage.java 38b4492 src/org/apache/pig/builtin/TextLoader.java d5bcf02 src/org/apache/pig/builtin/Utf8StorageConverter.java da12ed6 src/org/apache/pig/data/BinInterSedes.java e851d8b
Re: Review Request: Add BigInteger and BigDecimal to Pig
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9012/#review15587 --- Ship it! LGTM - Mathias Herberts
Re: Review Request: Add BigInteger and BigDecimal to Pig
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9012/#review15588 --- Ship it! LGTM - Mathias Herberts On Jan. 22, 2013, 10:05 p.m., Jonathan Coveney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9012/ --- (Updated Jan. 22, 2013, 10:05 p.m.) Review request for pig, Alan Gates and Mathias Herberts. Description --- This patch adds big integer and big decimal support to Pig. It could use more tests, something I'd appreciate feedback on (but I wanted to make sure the core implementation is good) This addresses bug PIG-2764. https://issues.apache.org/jira/browse/PIG-2764 Diffs - .gitignore cc62d7d src/org/apache/pig/LoadCaster.java 574769b src/org/apache/pig/PigWarning.java 5de075f src/org/apache/pig/StoreCaster.java 5fe48de src/org/apache/pig/backend/hadoop/BigDecimalWritable.java PRE-CREATION src/org/apache/pig/backend/hadoop/BigIntegerWritable.java PRE-CREATION src/org/apache/pig/backend/hadoop/HDataType.java 84a56b8 src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 96fba6b src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigDecimalRawComparator.java PRE-CREATION src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigIntegerRawComparator.java PRE-CREATION src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java 9749339 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java f40eb43 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Add.java c84b767 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ConstantExpression.java db3840f src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Divide.java 4656c28 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java 6683beb 
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ExpressionOperator.java 2806336 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java d64a080 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java 704d0b8 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java 9dc929e src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LessThanExpr.java 0320698 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Mod.java 6819185 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Multiply.java 7b57bed src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/NotEqualToExpr.java 79a4461 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POBinCond.java 08544d5 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java e8c2f2c src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POIsNull.java f20b839 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PONegative.java c076ae7 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 8887133 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserComparisonFunc.java 479eb83 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 3c7e741 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Subtract.java 79d4c73 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java bf2ba08 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java ddb25f1 
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java aa11409 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPreCombinerLocalRearrange.java 52401eb src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java ad33e7b src/org/apache/pig/backend/hadoop/hbase/HBaseBinaryConverter.java 60a5899 src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java a6f4ea6
[jira] [Updated] (PIG-2764) Add a biginteger and bigdecimal type to pig
[ https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Coveney updated PIG-2764: -- Attachment: PIG-2764-5.patch Updated RB, and attaching here. Add a biginteger and bigdecimal type to pig --- Key: PIG-2764 URL: https://issues.apache.org/jira/browse/PIG-2764 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Jonathan Coveney Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch, PIG-2764-2_nows.patch, PIG-2764-2.patch, PIG-2764-3.patch, PIG-2764-4.patch, PIG-2764-5.patch I think it would be useful for applications where precision is more important than speed to have the option of using Java's BigDecimal and BigInteger types natively.
[jira] [Updated] (PIG-2764) Add a biginteger and bigdecimal type to pig
[ https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Coveney updated PIG-2764: -- Fix Version/s: 0.12 Status: Patch Available (was: Open) Add a biginteger and bigdecimal type to pig --- Key: PIG-2764 URL: https://issues.apache.org/jira/browse/PIG-2764 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch, PIG-2764-2_nows.patch, PIG-2764-2.patch, PIG-2764-3.patch, PIG-2764-4.patch, PIG-2764-5.patch I think it would be useful for applications where precision is more important than speed to have the option of using Java's BigDecimal and BigInteger types natively.
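The precision argument behind PIG-2764 can be seen with a small standalone Java sketch. It uses only java.math and does not touch the Pig patch; the class and method names below are illustrative assumptions, not code from PIG-2764.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Illustrative only: contrasts double/long arithmetic with java.math types.
class PrecisionSketch {
    // Adds 0.1 ten times with binary floating point; rounding error accumulates.
    static double sumDoubles() {
        double total = 0.0;
        for (int i = 0; i < 10; i++) {
            total += 0.1;
        }
        return total;
    }

    // Same sum with BigDecimal built from a decimal string; no rounding error.
    static BigDecimal sumDecimals() {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal step = new BigDecimal("0.1");
        for (int i = 0; i < 10; i++) {
            total = total.add(step);
        }
        return total;
    }

    // A value one past Long.MAX_VALUE, which a long cannot represent.
    static BigInteger pastLongRange() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println("double sum equals 1.0? " + (sumDoubles() == 1.0));
        System.out.println("BigDecimal sum equals 1? " + (sumDecimals().compareTo(BigDecimal.ONE) == 0));
        System.out.println("past long range: " + pastLongRange());
    }
}
```

The trade-off is the one named in the issue: the java.math types are slower and allocate objects, so they suit workloads where exactness matters more than speed.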
[jira] [Commented] (PIG-3090) Introduce a syntax to be able to easily refer to the previously defined relation
[ https://issues.apache.org/jira/browse/PIG-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560112#comment-13560112 ] Cheolsoo Park commented on PIG-3090: +1. I am comfortable with opening a blocker jira for documentation. Introduce a syntax to be able to easily refer to the previously defined relation Key: PIG-3090 URL: https://issues.apache.org/jira/browse/PIG-3090 Project: Pig Issue Type: New Feature Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3090-0.patch, PIG-3090-1.patch Sometimes I feel like swimming with ANTLRs. This particular feature isn't too hard to add... and supports syntax like this: {code} a = load 'thing' as (x:int); b = foreach @ generate x; c = foreach @ generate x; d = foreach @ generate x; {code} I have a patch, though I need to make sure it doesn't change anything (it shouldn't) and I need to add tests.
[jira] [Created] (PIG-3129) Document syntax to refer to previous relation
Jonathan Coveney created PIG-3129: - Summary: Document syntax to refer to previous relation Key: PIG-3129 URL: https://issues.apache.org/jira/browse/PIG-3129 Project: Pig Issue Type: Task Affects Versions: 0.12 Reporter: Jonathan Coveney Priority: Blocker Fix For: 0.12
[jira] [Created] (PIG-3130) Support for nested projections
Uri Laserson created PIG-3130: - Summary: Support for nested projections Key: PIG-3130 URL: https://issues.apache.org/jira/browse/PIG-3130 Project: Pig Issue Type: Improvement Components: parser Affects Versions: 0.10.0 Reporter: Uri Laserson I have a tuple like so: (a: (b:int, c:int, d:int, e:int)) I would like to call a UDF and pass it a range of the nested tuple's fields. This is what I would expect the command to be: FOREACH alias GENERATE myUDF(a.(c .. e)); but this gives me an error like: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: line 12, column 133 mismatched input '(' expecting SEMI_COLON
[jira] [Commented] (PIG-3078) Make a UDF that, given a string, returns just the columns prefixed by that string
[ https://issues.apache.org/jira/browse/PIG-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560128#comment-13560128 ] Daniel Dai commented on PIG-3078: - +1. It needs some documentation (in addition to javadoc), though. Make a UDF that, given a string, returns just the columns prefixed by that string - Key: PIG-3078 URL: https://issues.apache.org/jira/browse/PIG-3078 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3078-0.patch, PIG-3078-1.patch This comes up fairly often, usually as the result of a join. Given that the resulting schema has the column name prepended, a udf in the following form could give just the columns from the desired relation: Pluck('relation_name', *)
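The idea behind the proposed Pluck UDF can be sketched in plain Java. This is not the code from the PIG-3078 patches; it assumes post-join aliases of the form relation::column, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: select the fields of a row whose alias starts with "prefix::".
class PluckSketch {
    // Returns the positions of all aliases that belong to the given relation prefix.
    static List<Integer> matchingIndexes(String prefix, List<String> aliases) {
        String qualified = prefix + "::"; // assumed disambiguation separator
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < aliases.size(); i++) {
            if (aliases.get(i).startsWith(qualified)) {
                indexes.add(i);
            }
        }
        return indexes;
    }

    // Projects a row down to the fields selected by matchingIndexes.
    static List<Object> pluck(String prefix, List<String> aliases, List<Object> row) {
        List<Object> result = new ArrayList<>();
        for (int index : matchingIndexes(prefix, aliases)) {
            result.add(row.get(index));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> aliases = Arrays.asList("a::x", "b::x", "a::y");
        List<Object> row = Arrays.<Object>asList(1, 2, 3);
        System.out.println(pluck("a", aliases, row));
    }
}
```

Matching on the prefix plus the separator, rather than on the bare prefix, keeps a relation named a from also picking up columns of a relation named ab.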
[jira] [Created] (PIG-3131) Document PluckTuple UDF
Jonathan Coveney created PIG-3131: - Summary: Document PluckTuple UDF Key: PIG-3131 URL: https://issues.apache.org/jira/browse/PIG-3131 Project: Pig Issue Type: Task Affects Versions: 0.12 Reporter: Jonathan Coveney Priority: Blocker Fix For: 0.12
[jira] [Created] (PIG-3132) NPE when illustrating a relation with HCatLoader
Daniel Dai created PIG-3132: --- Summary: NPE when illustrating a relation with HCatLoader Key: PIG-3132 URL: https://issues.apache.org/jira/browse/PIG-3132 Project: Pig Issue Type: Bug Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12 Get NPE exception when illustrate a relation with HCatLoader: {code} A = LOAD 'studenttab10k' USING org.apache.hcatalog.pig.HCatLoader(); illustrate A; {code} Exception: {code} java.lang.NullPointerException at org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:274) at org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:238) at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:61) at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:210) at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:190) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:129) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:194) at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257) at org.apache.pig.pen.ExampleGenerator.readBaseData(ExampleGenerator.java:222) at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:154) at org.apache.pig.PigServer.getExamples(PigServer.java:1245) at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:698) at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:591) at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:306) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67) {code} HCatalog side is tracked with HCATALOG-163.
[jira] [Commented] (PIG-3078) Make a UDF that, given a string, returns just the columns prefixed by that string
[ https://issues.apache.org/jira/browse/PIG-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560135#comment-13560135 ] Jonathan Coveney commented on PIG-3078: --- Made a doc jira. Will commit. Note: I think it's fine to defer documentation with JIRAs because it's possible that future details will change how it looks. For example, PIG-3010 would potentially change how I'd implement this! Make a UDF that, given a string, returns just the columns prefixed by that string - Key: PIG-3078 URL: https://issues.apache.org/jira/browse/PIG-3078 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3078-0.patch, PIG-3078-1.patch This comes up fairly often, usually as the result of a join. Given that the resulting schema has the column name prepended, a udf in the following form could give just the columns from the desired relation: Pluck('relation_name', *)
[jira] [Updated] (PIG-3132) NPE when illustrating a relation with HCatLoader
[ https://issues.apache.org/jira/browse/PIG-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3132: Attachment: PIG-3132-1.patch NPE when illustrating a relation with HCatLoader - Key: PIG-3132 URL: https://issues.apache.org/jira/browse/PIG-3132 Project: Pig Issue Type: Bug Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12 Attachments: PIG-3132-1.patch Get NPE exception when illustrate a relation with HCatLoader: {code} A = LOAD 'studenttab10k' USING org.apache.hcatalog.pig.HCatLoader(); illustrate A; {code} Exception: {code} java.lang.NullPointerException at org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:274) at org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:238) at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:61) at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:210) at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:190) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:129) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:194) at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257) at org.apache.pig.pen.ExampleGenerator.readBaseData(ExampleGenerator.java:222) at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:154) at org.apache.pig.PigServer.getExamples(PigServer.java:1245) at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:698) at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:591) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:306) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67) {code} HCatalog side is tracked with HCATALOG-163.
[jira] [Commented] (PIG-3132) NPE when illustrating a relation with HCatLoader
[ https://issues.apache.org/jira/browse/PIG-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560140#comment-13560140 ] Daniel Dai commented on PIG-3132: - There are multiple root causes: 1. illustrate does not distinguish between backend and frontend. 2. ReadToEndLoader (invoked by POLoad, only used in illustrate) does not invoke setLocation. 3. ReadToEndLoader does not have a signature. There is also a fix on the HCat side, tracked by HCATALOG-163. NPE when illustrating a relation with HCatLoader - Key: PIG-3132 URL: https://issues.apache.org/jira/browse/PIG-3132 Project: Pig Issue Type: Bug Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12 Attachments: PIG-3132-1.patch Getting an NPE when illustrating a relation with HCatLoader: {code} A = LOAD 'studenttab10k' USING org.apache.hcatalog.pig.HCatLoader(); illustrate A; {code} Exception: {code} java.lang.NullPointerException at org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:274) at org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:238) at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:61) at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:210) at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:190) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:129) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:194) at
org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257) at org.apache.pig.pen.ExampleGenerator.readBaseData(ExampleGenerator.java:222) at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:154) at org.apache.pig.PigServer.getExamples(PigServer.java:1245) at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:698) at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:591) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:306) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67) {code} HCatalog side is tracked with HCATALOG-163.
[jira] [Commented] (PIG-2553) Pig shouldn't allow attempts to write multiple relations into same directory
[ https://issues.apache.org/jira/browse/PIG-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560142#comment-13560142 ] Prashant Kommireddi commented on PIG-2553: -- Hey Cheolsoo, been out on vacation all this while. Will be getting back to this soon. Pig shouldn't allow attempts to write multiple relations into same directory Key: PIG-2553 URL: https://issues.apache.org/jira/browse/PIG-2553 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Prashant Kommireddi Attachments: PIG-2553_1.patch, PIG-2553.patch We've seen multiple occasions where users accidentally try to store 2 or more different relations to the same destination directory. Currently, this passes the Pig planner and fails on MR side due to concurrent attempts to create the same part file on the reducer. This is extremely confusing to the user, and hard to debug. We should instead fail their scripts before they are even submitted, since we can identify the erroneous condition from the beginning.
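The fail-fast check described in PIG-2553 could look roughly like the following Java sketch. It is not taken from the attached patches: the class name, the method names, and the simple trailing-slash normalization are assumptions, and a real check would presumably have to resolve paths against the target file system.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical front-end check: find STORE destinations that are used more than once.
class StoreLocationCheck {
    // Strips surrounding whitespace and trailing slashes so "/out/" and "/out" compare equal.
    static String normalize(String location) {
        String result = location.trim();
        while (result.length() > 1 && result.endsWith("/")) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }

    // Returns every normalized location that appears at least twice, in first-seen order.
    static Set<String> findDuplicates(List<String> storeLocations) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String location : storeLocations) {
            String normalized = normalize(location);
            if (!seen.add(normalized)) {
                duplicates.add(normalized);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> stores = Arrays.asList("/user/xyz/out", "/user/xyz/other", "/user/xyz/out/");
        Set<String> duplicates = findDuplicates(stores);
        if (!duplicates.isEmpty()) {
            // A planner-side check would raise an error here instead of submitting the job.
            System.out.println("Multiple relations stored to the same location: " + duplicates);
        }
    }
}
```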
[jira] [Updated] (PIG-3114) Duplicated macro name error when using pigunit
[ https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3114: Assignee: Chetan Nadgire Duplicated macro name error when using pigunit -- Key: PIG-3114 URL: https://issues.apache.org/jira/browse/PIG-3114 Project: Pig Issue Type: Bug Components: parser Affects Versions: 0.11 Reporter: Chetan Nadgire Assignee: Chetan Nadgire Fix For: 0.12 Attachments: PIG-3114.patch, PIG-3114.patch I'm using PigUnit to test a pig script within which a macro is defined. Pig runs fine on cluster but getting parsing error with pigunit. So I tried very basic pig script with macro and getting similar error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 9 null. Reason: Duplicated macro name 'my_macro_1' at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546) at org.apache.pig.PigServer.registerQuery(PigServer.java:516) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988) at org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194) at org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:56) at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160) at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:231) at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:261) at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: Failed to parse: line 9 null. Reason: Duplicated macro name 'my_macro_1' at org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:406) at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:277) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599) ... 30 more Pig script which is failing : {code:title=test.pig|borderStyle=solid} DEFINE my_macro_1 (QUERY, A) RETURNS C { $C = ORDER $QUERY BY total DESC, $A; } ; data = LOAD 'input' AS (query:CHARARRAY); queries_group = GROUP data BY query; queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total; queries_ordered = my_macro_1(queries_count, query); queries_limit = LIMIT queries_ordered 2; STORE queries_limit INTO 'output'; {code} If I remove macro pigunit works fine. 
Even just defining the macro without using it results in a parsing error.
[jira] [Updated] (PIG-3090) Introduce a syntax to be able to easily refer to the previously defined relation
[ https://issues.apache.org/jira/browse/PIG-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Coveney updated PIG-3090: -- Resolution: Fixed Status: Resolved (was: Patch Available) Introduce a syntax to be able to easily refer to the previously defined relation Key: PIG-3090 URL: https://issues.apache.org/jira/browse/PIG-3090 Project: Pig Issue Type: New Feature Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3090-0.patch, PIG-3090-1.patch Sometimes I feel like swimming with ANTLRs. This particular feature isn't too hard to add... and supports syntax like this: {code} a = load 'thing' as (x:int); b = foreach @ generate x; c = foreach @ generate x; d = foreach @ generate x; {code} I have a patch, though I need to make sure it doesn't change anything (it shouldn't) and I need to add tests.
[jira] [Commented] (PIG-3114) Duplicated macro name error when using pigunit
[ https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560148#comment-13560148 ] Daniel Dai commented on PIG-3114: - Looks good. Would you mind adding a unit test case? Duplicated macro name error when using pigunit -- Key: PIG-3114 URL: https://issues.apache.org/jira/browse/PIG-3114 Project: Pig Issue Type: Bug Components: parser Affects Versions: 0.11 Reporter: Chetan Nadgire Assignee: Chetan Nadgire Fix For: 0.12 Attachments: PIG-3114.patch, PIG-3114.patch I'm using PigUnit to test a Pig script in which a macro is defined. The script runs fine on the cluster, but I get a parsing error with PigUnit. I then tried a very basic Pig script with a macro and got a similar error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 9 null. Reason: Duplicated macro name 'my_macro_1' at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546) at org.apache.pig.PigServer.registerQuery(PigServer.java:516) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988) at org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194) at org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:56) at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160) at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:231) at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:261) at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at
java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: Failed to parse: line 9 null. Reason: Duplicated macro name 'my_macro_1' at org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:406) at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:277) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599) ... 
30 more

Pig script which is failing :
{code:title=test.pig|borderStyle=solid}
DEFINE my_macro_1 (QUERY, A) RETURNS C {
    $C = ORDER $QUERY BY total DESC, $A;
} ;

data = LOAD 'input' AS (query:CHARARRAY);
queries_group = GROUP data BY query;
queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;
queries_ordered = my_macro_1(queries_count, query);
queries_limit = LIMIT queries_ordered 2;
STORE queries_limit INTO 'output';
{code}
If I remove macro pigunit works fine. Even just defining macro without using it results in parsing error.
[jira] [Updated] (PIG-3091) Make schema, header and stats file configurable in JsonMetadata
[ https://issues.apache.org/jira/browse/PIG-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-3091:
    Attachment: PIG-3091_2.patch

Jon, ready for your review!

Make schema, header and stats file configurable in JsonMetadata
    Key: PIG-3091
    URL: https://issues.apache.org/jira/browse/PIG-3091
    Project: Pig
    Issue Type: Improvement
    Reporter: Prashant Kommireddi
    Assignee: Prashant Kommireddi
    Fix For: 0.12
    Attachments: PIG-3091_1.patch, PIG-3091_2.patch, PIG-3091.patch

JsonMetadata currently sets schema, header and stats file to the following
{code}
private String schemaFileName = ".pig_schema";
private String headerFileName = ".pig_header";
private String statFileName = ".pig_stats";
{code}
This could be made configurable so users can create custom schema files (used by custom Load/StoreFuncs)
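For illustration only, here is a minimal Java sketch of what "configurable with the current values as defaults" could look like. It is not taken from the PIG-3091 patch: the class name `MetadataFileNames` and the three property keys are hypothetical, and only the default file names come from the issue description.

```java
import java.util.Properties;

// Hypothetical holder for the three metadata file names.
// The property keys below are invented for this sketch; they are NOT
// the keys used by Pig or by the PIG-3091 patch.
final class MetadataFileNames {
    static final String SCHEMA_KEY = "example.pig.schema.file";
    static final String HEADER_KEY = "example.pig.header.file";
    static final String STATS_KEY = "example.pig.stats.file";

    final String schemaFileName;
    final String headerFileName;
    final String statFileName;

    MetadataFileNames(Properties props) {
        // Fall back to the hard-coded names quoted in the issue
        // when a key is not set.
        schemaFileName = props.getProperty(SCHEMA_KEY, ".pig_schema");
        headerFileName = props.getProperty(HEADER_KEY, ".pig_header");
        statFileName = props.getProperty(STATS_KEY, ".pig_stats");
    }
}
```

With an empty `Properties` object the three fields keep today's values, so existing jobs would behave as before; setting one key overrides only that file name.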
[jira] [Commented] (PIG-3078) Make a UDF that, given a string, returns just the columns prefixed by that string
[ https://issues.apache.org/jira/browse/PIG-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560153#comment-13560153 ] Daniel Dai commented on PIG-3078: - +1 Make a UDF that, given a string, returns just the columns prefixed by that string - Key: PIG-3078 URL: https://issues.apache.org/jira/browse/PIG-3078 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3078-0.patch, PIG-3078-1.patch This comes up fairly often, usually as the result of a join. Given that the resulting schema has the column name prepended, a udf in the following form could give just the columns from the desired relation: Pluck('relation_name', *) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
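For illustration only, the column selection described in PIG-3078 can be sketched in plain Java. This is not the Pluck UDF from the attached patches: `PrefixColumns` is a hypothetical helper, and it assumes the joined schema names its fields as `relation_name::field`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the Pluck implementation from the patch.
final class PrefixColumns {
    // Returns the positions of all aliases of the form "<prefix>::<field>".
    static List<Integer> matchingIndexes(List<String> aliases, String prefix) {
        String qualified = prefix + "::";
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < aliases.size(); i++) {
            String alias = aliases.get(i);
            // Match on "prefix::" so that relation "a" does not pick up
            // columns that belong to relation "ab".
            if (alias != null && alias.startsWith(qualified)) {
                indexes.add(i);
            }
        }
        return indexes;
    }
}
```

A UDF built on this idea would resolve the positions once from the input schema and then copy only those fields of each input tuple into its output.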
[jira] [Commented] (PIG-3091) Make schema, header and stats file configurable in JsonMetadata
[ https://issues.apache.org/jira/browse/PIG-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560163#comment-13560163 ]

Jonathan Coveney commented on PIG-3091:

+1. I'll commit shortly.

Make schema, header and stats file configurable in JsonMetadata
    Key: PIG-3091
    URL: https://issues.apache.org/jira/browse/PIG-3091
    Project: Pig
    Issue Type: Improvement
    Reporter: Prashant Kommireddi
    Assignee: Prashant Kommireddi
    Fix For: 0.12
    Attachments: PIG-3091_1.patch, PIG-3091_2.patch, PIG-3091.patch

JsonMetadata currently sets schema, header and stats file to the following
{code}
private String schemaFileName = ".pig_schema";
private String headerFileName = ".pig_header";
private String statFileName = ".pig_stats";
{code}
This could be made configurable so users can create custom schema files (used by custom Load/StoreFuncs)
[jira] [Updated] (PIG-3078) Make a UDF that, given a string, returns just the columns prefixed by that string
[ https://issues.apache.org/jira/browse/PIG-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Coveney updated PIG-3078: -- Resolution: Fixed Status: Resolved (was: Patch Available) Make a UDF that, given a string, returns just the columns prefixed by that string - Key: PIG-3078 URL: https://issues.apache.org/jira/browse/PIG-3078 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3078-0.patch, PIG-3078-1.patch This comes up fairly often, usually as the result of a join. Given that the resulting schema has the column name prepended, a udf in the following form could give just the columns from the desired relation: Pluck('relation_name', *) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3105) Fix TestJobSubmission unit test failure.
[ https://issues.apache.org/jira/browse/PIG-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3105: --- Fix Version/s: (was: 0.11) 0.12 I'm moving this to the next release as it does not seem to be a blocker for Pig 0.11 Fix TestJobSubmission unit test failure. Key: PIG-3105 URL: https://issues.apache.org/jira/browse/PIG-3105 Project: Pig Issue Type: Bug Components: tools Affects Versions: 0.10.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.12 Attachments: PIG-3105.patch Currently with Hadoop 1.0, the TestJobSubmission unit test fails. This is due to HBASE-7423. This is a work around to that issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2846) Can we skip hcat related e2e when hcat is not installed?
[ https://issues.apache.org/jira/browse/PIG-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560184#comment-13560184 ] Julien Le Dem commented on PIG-2846: Hey [~cheolsoo] should we detach this from Pig 0.11 ? Can we skip hcat related e2e when hcat is not installed? Key: PIG-2846 URL: https://issues.apache.org/jira/browse/PIG-2846 Project: Pig Issue Type: Sub-task Reporter: Koji Noguchi Priority: Trivial Attachments: pig-2846-trunk-v1.txt Trying pig e2e for the first time, I see couple of the tests (HCatDDL_1,HCatDDL_2 and Jython_Command_1) failing with bq. java.io.IOException: Cannot run program /usr/local/hcat/bin/hcat: bq. java.io.IOException: error=2, No such file or directory Is it ok to change the test_harness to skip these tests when hcat does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3005) TestLargeFile#testOrderBy is failing
[ https://issues.apache.org/jira/browse/PIG-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3005: --- Issue Type: Bug (was: Sub-task) Parent: (was: PIG-2972) TestLargeFile#testOrderBy is failing Key: PIG-3005 URL: https://issues.apache.org/jira/browse/PIG-3005 Project: Pig Issue Type: Bug Environment: Mac OSX 10.6.8 Reporter: Jonathan Coveney Fix For: 0.12 When run locally, at least, this test is failing for me. Has anyone else noticed this failing? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Custom Scripting Engine
So, something like this is not currently possible, but I think it would be possible to expose a set of interfaces that would make this possible. That said, why is this desirable? Is your goal to override one of the existing SE's, or something?

I could imagine reworking things so that anyone can register an arbitrary SE, and then we can implement the current SE's in terms of that interface. That said, I'm not sure of a compelling reason to do this, and would love a use case. I worked on the JRuby implementation and reviewed the Groovy one and think that we could be doing a lot more with scripting languages, so you have my attention.

2013/1/21 Connor Woodson cwoodson@gmail.com

I want to write a custom scripting engine and I would like to not have to modify the enum in ScriptingEngine.java to get it to work both in the 'register' command for UDFs, but also for embedded scripts. From what I can tell, the former is possible by passing in a FQCN to the register command instead of one of the keywords; however, I can't tell if it is possible to get Pig to run my scripting engine when I pass it a non-pig file (e.g. you pass it a .py file and it runs the jython scripting engine).

So is this second use possible, or (for now) can custom SE's only be used for UDFs?

(I'll admit here that I don't understand what I meant in the end of my previous email; feel free to ignore it).

Thanks,
- Connor

On Mon, Jan 21, 2013 at 5:04 PM, Jonathan Coveney jcove...@gmail.com wrote:

Can you describe at a higher level what you have in mind?

2013/1/21 Connor Woodson cwoodson@gmail.com

Is there a way to get Pig to use your custom scripting engine without having to modify ScriptingEngine.java and placing it in the enum? It looks like it's possible with enums, but what about for embedding pig? (as in how Pig can run python scripts).

- Connor

On Mon, Jan 21, 2013 at 1:59 PM, Daniel Dai da...@hortonworks.com wrote:

Pig currently support jython, jruby, javascript and groovy. If you need to write other scripting engine, extend ScriptEngine. Here are some references:
1. http://www.slideshare.net/daijy/pig-programming-is-more-fun-new-features-in-pig (pp 24, 25)
2. Groovy UDF: https://issues.apache.org/jira/browse/PIG-2763
3. JRuby UDF: https://issues.apache.org/jira/browse/PIG-2317
4. Javascript UDF: https://issues.apache.org/jira/browse/PIG-1794

Thanks,
Daniel

On Fri, Jan 18, 2013 at 6:42 PM, Connor Woodson cwoodson@gmail.com wrote:

Is there any support for a custom scripting engine, to allow UDFs to be written in a different language / embed pig in another language?

- Connor
[jira] [Resolved] (PIG-3091) Make schema, header and stats file configurable in JsonMetadata
[ https://issues.apache.org/jira/browse/PIG-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney resolved PIG-3091.
    Resolution: Fixed

Committed. Thanks, Prashant!

Make schema, header and stats file configurable in JsonMetadata
    Key: PIG-3091
    URL: https://issues.apache.org/jira/browse/PIG-3091
    Project: Pig
    Issue Type: Improvement
    Reporter: Prashant Kommireddi
    Assignee: Prashant Kommireddi
    Fix For: 0.12
    Attachments: PIG-3091_1.patch, PIG-3091_2.patch, PIG-3091.patch

JsonMetadata currently sets schema, header and stats file to the following
{code}
private String schemaFileName = ".pig_schema";
private String headerFileName = ".pig_header";
private String statFileName = ".pig_stats";
{code}
This could be made configurable so users can create custom schema files (used by custom Load/StoreFuncs)
Review Request: Allow UDFs to flatten themselves
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9060/ --- Review request for pig. Description --- see PIG-3010 Diffs - src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 834e932 src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRUtil.java 93de6d5 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java a4abdd8 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java bf2ba08 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POJoinPackage.java 7e357ec src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POOptimizedForEach.java 91b3f00 src/org/apache/pig/builtin/FlattenOutput.java PRE-CREATION src/org/apache/pig/builtin/UdfFlatten.java PRE-CREATION src/org/apache/pig/newplan/logical/Util.java c992550 src/org/apache/pig/newplan/logical/relational/LOGenerate.java 383ba15 src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 5f33b07 src/org/apache/pig/newplan/logical/rules/ColumnPruneHelper.java 19e7628 src/org/apache/pig/newplan/logical/rules/ColumnPruneVisitor.java a458173 src/org/apache/pig/newplan/logical/rules/DuplicateForEachColumnRewrite.java 7091587 src/org/apache/pig/newplan/logical/rules/LimitOptimizer.java c32f9b3 src/org/apache/pig/newplan/logical/rules/MergeForEach.java 8415a5c src/org/apache/pig/newplan/logical/rules/OptimizerUtils.java 4356130 src/org/apache/pig/newplan/logical/rules/PushDownForEachFlatten.java 4db8e35 src/org/apache/pig/newplan/logical/rules/TypeCastInserter.java 35dcfce src/org/apache/pig/newplan/logical/visitor/LineageFindRelVisitor.java 4235d82 src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 135de1c src/org/apache/pig/newplan/logical/visitor/TypeCheckingRelVisitor.java 466ed1a src/org/apache/pig/parser/LogicalPlanBuilder.java 9fc69fc 
src/org/apache/pig/parser/LogicalPlanGenerator.g 394aad0 test/org/apache/pig/test/OptimizeLimitPlanPrinter.java eadc1d5 test/org/apache/pig/test/TestExampleGenerator.java 4246b65 Diff: https://reviews.apache.org/r/9060/diff/ Testing --- Thanks, Jonathan Coveney
Re: Custom Scripting Engine
I'm starting work on an R scripting engine; I'm not entirely sure how it will be used, but I know that there have been attempts to get R working with MapReduce / EMR and I thought it would be cool to do that through Pig. (One fun use case might be to generate plots/graphs during the MR job (then do something with them))

The easy answer for how to get this working with Pig is to just stick new scripting engines with the existing ones and update the ScriptingEngine enum to include those; however, I would like to use this in EMR which doesn't update its software regularly and so I was hoping there was some hook to get this scripting engine called, but it looks like it'll just have to be used for UDFs for now.

If a change is going to be made, I think what would be helpful is a change in how the ScriptingEngine decides which subclass to call; right now (from what I can tell) it will only look at the file suffix or the #! first line of the script and try and match those with its internal list. Maybe allow an annotation like #@ FQCN of a ScriptingEngine as the first line of a script to force Pig to use a specific engine.

- Connor

On Tue, Jan 22, 2013 at 3:56 PM, Jonathan Coveney jcove...@gmail.com wrote:

So, something like this is not currently possible, but I think it would be possible to expose a set of interfaces that would make this possible. That said, why is this desirable? Is your goal to override one of the existing SE's, or something?

I could imagine reworking things so that anyone can register an arbitrary SE, and then we can implement the current SE's in terms of that interface. That said, I'm not sure of a compelling reason to do this, and would love a use case. I worked on the JRuby implementation and reviewed the Groovy one and think that we could be doing a lot more with scripting languages, so you have my attention.
[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves
[ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Coveney updated PIG-3010: -- Attachment: PIG-3010-3.patch PIG-3010-3_nows.patch I went ahead and made a reviewboard here: https://reviews.apache.org/r/9060/ This is not a small patch, but I'd love comments. I think this would be a huge bump in expressivity for Pig. The current system is very annoying and leads to a lot of annoying realiasing. Allow UDF's to flatten themselves - Key: PIG-3010 URL: https://issues.apache.org/jira/browse/PIG-3010 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3010-0.patch, PIG-3010-1.patch, PIG-3010-2_nowhitespace.patch, PIG-3010-2.patch, PIG-3010-3_nows.patch, PIG-3010-3.patch This is something I thought would be cool for a while, so I sat down and did it because I think there are some useful debugging tools it'd help with. The idea is that if you attach an annotation to a UDF, the Tuple or DataBag you output will be flattened. This is quite powerful. A very common pattern is: a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c); This would let you just do: a = foreach data generate MyUdf(thing); With the exact same result! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
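For illustration only, the "attach an annotation to a UDF" idea can be sketched with a runtime annotation and a reflection check. Everything here is hypothetical: `SelfFlattening`, `SplitNameUdf`, `PlainUdf` and `FlattenCheck` are invented names, the two UDF classes are empty stand-ins rather than real EvalFuncs, and the sketch makes no claim about how the PIG-3010 patch detects or applies the flattening.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation; the name is an assumption of this sketch.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SelfFlattening {
}

// Stand-in for a UDF whose Tuple or DataBag output should be flattened.
@SelfFlattening
class SplitNameUdf {
}

// Stand-in for an ordinary UDF.
class PlainUdf {
}

final class FlattenCheck {
    // A planner could ask this while building a FOREACH ... GENERATE to
    // decide whether to wrap the UDF call in an implicit FLATTEN.
    static boolean shouldFlatten(Class<?> udfClass) {
        return udfClass.isAnnotationPresent(SelfFlattening.class);
    }
}
```

The annotation needs `RetentionPolicy.RUNTIME`; with the default retention it would not be visible to `isAnnotationPresent` and the check would always return false.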
[jira] [Commented] (PIG-2846) Can we skip hcat related e2e when hcat is not installed?
[ https://issues.apache.org/jira/browse/PIG-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560219#comment-13560219 ] Cheolsoo Park commented on PIG-2846: Hi Julien, yes, I just did. Can we skip hcat related e2e when hcat is not installed? Key: PIG-2846 URL: https://issues.apache.org/jira/browse/PIG-2846 Project: Pig Issue Type: Improvement Reporter: Koji Noguchi Priority: Trivial Attachments: pig-2846-trunk-v1.txt Trying pig e2e for the first time, I see couple of the tests (HCatDDL_1,HCatDDL_2 and Jython_Command_1) failing with bq. java.io.IOException: Cannot run program /usr/local/hcat/bin/hcat: bq. java.io.IOException: error=2, No such file or directory Is it ok to change the test_harness to skip these tests when hcat does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2972) Umbrella ticket for test failures in 0.11/trunk
[ https://issues.apache.org/jira/browse/PIG-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park resolved PIG-2972. Resolution: Fixed Closing it since all sub-tasks are resolved or moved out of 0.11. Umbrella ticket for test failures in 0.11/trunk --- Key: PIG-2972 URL: https://issues.apache.org/jira/browse/PIG-2972 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Rohini Palaniswamy Fix For: 0.11 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3124) Push FLATTENs After FILTERs If Possible
[ https://issues.apache.org/jira/browse/PIG-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560244#comment-13560244 ] Jonathan Coveney commented on PIG-3124: --- Nick, The new patch does not include a lot of changes from the old one. Can you upload a definitive patch? Push FLATTENs After FILTERs If Possible --- Key: PIG-3124 URL: https://issues.apache.org/jira/browse/PIG-3124 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.10.0 Reporter: Nick White Assignee: Nick White Fix For: 0.12 Attachments: PIG-3124.0.patch, PIG-3124.1.patch When optimizing a logical plan, it's safe to push a FLATTEN after a FILTER if the columns being flattened don't occur in the expression that the filter is being done on. When the FILTER comes first the FLATTEN generates fewer rows (usually), and so is more efficient. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3124) Push FLATTENs After FILTERs If Possible
[ https://issues.apache.org/jira/browse/PIG-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560250#comment-13560250 ] Daniel Dai commented on PIG-3124: - Nick added a test case and some comments, but remove the code fix in the new patch I believe. Push FLATTENs After FILTERs If Possible --- Key: PIG-3124 URL: https://issues.apache.org/jira/browse/PIG-3124 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.10.0 Reporter: Nick White Assignee: Nick White Fix For: 0.12 Attachments: PIG-3124.0.patch, PIG-3124.1.patch When optimizing a logical plan, it's safe to push a FLATTEN after a FILTER if the columns being flattened don't occur in the expression that the filter is being done on. When the FILTER comes first the FLATTEN generates fewer rows (usually), and so is more efficient. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
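For illustration only, the safety condition stated in PIG-3124 (the flattened columns must not occur in the filter expression) reduces to a disjointness test on two column sets. `FlattenPushdownCheck` is a hypothetical helper, not code from either attached patch, and it assumes the two sets of column names have already been extracted from the plan.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical helper, not part of the PIG-3124 patches.
final class FlattenPushdownCheck {
    // True when the FILTER can run before the FLATTEN: none of the columns
    // being flattened is referenced by the filter expression.
    static boolean canFilterFirst(Set<String> flattenedColumns,
                                  Set<String> filterColumns) {
        return Collections.disjoint(flattenedColumns, filterColumns);
    }
}
```

For example, flattening `items` while filtering on `country` passes the check, whereas filtering on `items` itself does not, because the filter would then see different rows before and after the FLATTEN.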
[jira] [Created] (PIG-3133) Revamp algebraic interface to actually return classes
Jonathan Coveney created PIG-3133:
    Summary: Revamp algebraic interface to actually return classes
    Key: PIG-3133
    URL: https://issues.apache.org/jira/browse/PIG-3133
    Project: Pig
    Issue Type: Improvement
    Reporter: Jonathan Coveney
    Fix For: 0.12

The current algebraic interface is a bit weird to work with. It would make a lot more sense to let people return Class<? extends EvalFunc<Tuple>> or what have you, or even a FuncSpec, but the current string based approach circumvents the whole point of using Java and is annoying.

I think we should have an abstract EFInitial, EFIntermediate, EFFinal which implemented the exec function for the user, but in terms of a simpler, clearer interface. This way if people really want the old way they can, but we can present them something less ugly.

This would also be a good time to clarify the contracts of Algebraics and simplify them (the initial function's a tuple which contains a bag which contains 1 tuple is super whack).

If anyone wants to work on this let me know because this is the sort of thing I will probably bang out when procrastinating something else.
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (33 issues)
Subscriber: pigdaily

Key         Summary
PIG-3124    Push FLATTENs After FILTERs If Possible
            https://issues.apache.org/jira/browse/PIG-3124
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3122    Operators should not implicitly become reserved keywords
            https://issues.apache.org/jira/browse/PIG-3122
PIG-3114    Duplicated macro name error when using pigunit
            https://issues.apache.org/jira/browse/PIG-3114
PIG-3109    Missing license headers
            https://issues.apache.org/jira/browse/PIG-3109
PIG-3108    HBaseStorage returns empty maps when mixing wildcard- with other columns
            https://issues.apache.org/jira/browse/PIG-3108
PIG-3105    Fix TestJobSubmission unit test failure.
            https://issues.apache.org/jira/browse/PIG-3105
PIG-3098    Add another test for the self join case
            https://issues.apache.org/jira/browse/PIG-3098
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3086    Allow A Prefix To Be Added To URIs In PigUnit Tests
            https://issues.apache.org/jira/browse/PIG-3086
PIG-3082    outputSchema of a UDF allows two usages when describing a Tuple schema
            https://issues.apache.org/jira/browse/PIG-3082
PIG-3073    POUserFunc creating log spam for large scripts
            https://issues.apache.org/jira/browse/PIG-3073
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3028    testGrunt dev test needs some command filters to run correctly without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2878    Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
            https://issues.apache.org/jira/browse/PIG-2878
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2764    Add a biginteger and bigdecimal type to pig
            https://issues.apache.org/jira/browse/PIG-2764
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-2645    PigSplit does not handle the case where SerializationFactory returns null
            https://issues.apache.org/jira/browse/PIG-2645
PIG-2507    Semicolon in paramenters for UDF results in parsing error
            https://issues.apache.org/jira/browse/PIG-2507
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417
PIG-2312    NPE when relation and column share the same name and used in Nested Foreach
            https://issues.apache.org/jira/browse/PIG-2312
PIG-1942    script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1237    Piggybank MutliStorage - specify field to write in output
            https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
Re: Custom Scripting Engine
There are two ways to go about using R with java (that I've found). Both are a little bit of a hassle depending on your setup.

JRI is a JNI for R, so you don't need R installed on the machine for it to work. But you do need to include a set of DLLs in the classpath; the best way I've found to do this is to bundle the dll's in the .jar and then copy them to the local directory at runtime (as copying them elsewhere and changing java.library.path won't work). There are some features missing from JRI, though, especially the ability for multiple environments/sessions; I don't quite yet have down a plan for the R/Pig integration, but having sessions might be useful.

The other method is through Rserve, which is both a java package and an application; the application sets up an R server that by default allows only a single connection from a local machine (if you wanted, each map-reduce job could connect to the same R server/instance, but I don't think that's useful). To start this up, you would need R installed and then run Rserve. In EMR, this would be possible as it does have R, so you would just need a bootstrap script to start R. Optionally, it is probably possible to tell Rserve to start from within java, but that's much trickier.

I prefer the first method as it eliminates the requirement of having R installed; however, I'm hoping to implement both (for Rserve, I'll require that the server is already started; and maybe include an option for connecting to a specific server).

I don't have a clear vision of how R/Pig will interact; it will have to be something different than Python or JScript, but I don't know how different. I want to just scratch out something basic and then try and evolve it from there.

I'll go ahead and submit that Jira.

Thanks,
- Connor

On Tue, Jan 22, 2013 at 4:44 PM, Jonathan Coveney jcove...@gmail.com wrote:

Ahhh, I see. That makes sense. Sadly, this won't currently be possible in the current version of Pig, but this is a really good reason to want to do this. Can you make a ticket about making it possible to plug in ScriptingEngines without having a make a code change to Pig? I think this would be useful for this reason.

That said, if you dig down into how these implementations work, they are based on EvalFunc's, so manually making UDF's to do it is an annoyance, but functionally quite similar.

Question about R: is there a JVM implementation, or are you shelling out?

2013/1/22 Connor Woodson cwoodson@gmail.com

I'm starting work on an R scripting engine; I'm not entirely sure how it will be used, but I know that there have been attempts to get R working with MapReduce / EMR and I thought it would be cool to do that through Pig. (One fun use case might be to generate plots/graphs during the MR job (then do something with them))

The easy answer for how to get this working with Pig is to just stick new scripting engines with the existing ones and update the ScriptingEngine enum to include those; however, I would like to use this in EMR which doesn't update its software regularly and so I was hoping there was some hook to get this scripting engine called, but it looks like it'll just have to be used for UDFs for now.

If a change is going to be made, I think what would be helpful is a change in how the ScriptingEngine decides which subclass to call; right now (from what I can tell) it will only look at the file suffix or the #! first line of the script and try and match those with its internal list. Maybe allow an annotation like #@ FQCN of a ScriptingEngine as the first line of a script to force Pig to use a specific engine.

- Connor

On Tue, Jan 22, 2013 at 3:56 PM, Jonathan Coveney jcove...@gmail.com wrote:

So, something like this is not currently possible, but I think it would be possible to expose a set of interfaces that would make this possible. That said, why is this desirable?
Is your goal to override one of the existing SE's, or something? I could imagine reworking things so that anyone can register an arbitrary SE, and then we can implement the current SE's in terms of that interface. That said, I'm not sure of a compelling reason to do this, and would love a use case. I worked on the JRuby implementation and reviewed the Groovy one and think that we could be doing a lot more with scripting languages, so you have my attention. 2013/1/21 Connor Woodson cwoodson@gmail.com I want to write a custom scripting engine and I would like to not have to modify the enum in ScriptingEngine.java to get it to work both in the 'register' command for UDFs, but also for embedded scripts. From what I can tell, the former is possible by passing in a FQCN to the register command instead of one of the keywords; however, I can't tell if it is possible to get Pig to run my scripting engine when I pass it a non-pig file
[jira] [Created] (PIG-3134) Allow support for hot-pluggable Script Engines for running embedded Pig
Connor Woodson created PIG-3134:

    Summary: Allow support for hot-pluggable Script Engines for running embedded Pig
    Key: PIG-3134
    URL: https://issues.apache.org/jira/browse/PIG-3134
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Reporter: Connor Woodson
    Priority: Minor

Currently, you can embed Pig in Python, Ruby, Groovy, and JavaScript. However, you are unable to specify a custom Scripting Engine that deals with Pig embedded in something else.

To solve this, Pig can either have a command-line option that specifies which scripting engine to use for running the provided file, or support should be added for something like the following syntax on the first line of a file:

#@ FQCN of Script Engine

that forces Pig to use the specified Scripting Engine to read the file.
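The first-line syntax proposed above could be recognized along these lines. This is only a sketch of the idea, not code from Pig: the class EngineDirective, its parse method, and the example engine name com.example.RScriptEngine are all hypothetical, and the "#@" marker is taken from the proposal rather than from any released Pig version.

```java
import java.util.Optional;

// Hypothetical helper, not part of Pig: extracts the engine class name from a
// proposed "#@ <FQCN>" directive on the first line of a script.
class EngineDirective {
    private static final String MARKER = "#@";

    /** Returns the class name named on the script's first line, if there is one. */
    static Optional<String> parse(String script) {
        if (script == null) {
            return Optional.empty();
        }
        // Only the first line is inspected, mirroring how a #! line is read.
        String firstLine = script.split("\\R", 2)[0].trim();
        if (!firstLine.startsWith(MARKER)) {
            return Optional.empty();
        }
        String fqcn = firstLine.substring(MARKER.length()).trim();
        return isClassName(fqcn) ? Optional.of(fqcn) : Optional.empty();
    }

    // Accepts dot-separated Java identifiers such as "com.example.Engine".
    private static boolean isClassName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (String part : s.split("\\.", -1)) {
            if (part.isEmpty() || !Character.isJavaIdentifierStart(part.charAt(0))) {
                return false;
            }
            for (int i = 1; i < part.length(); i++) {
                if (!Character.isJavaIdentifierPart(part.charAt(i))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A caller would presumably try this directive first and fall back to the existing suffix and #! matching when it returns an empty result; how the named class is then loaded and registered is left open here.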
[jira] [Created] (PIG-3135) HExecutionEngine should look for resources in user passed Properties
Prashant Kommireddi created PIG-3135:

    Summary: HExecutionEngine should look for resources in user passed Properties
    Key: PIG-3135
    URL: https://issues.apache.org/jira/browse/PIG-3135
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.10.0
    Reporter: Prashant Kommireddi

Looking at this snippet:

{code}
private void init(Properties properties) throws ExecException {
    . . .
    // Check existence of hadoop-site.xml or core-site.xml
    Configuration testConf = new Configuration();
    ClassLoader cl = testConf.getClassLoader();
    URL hadoop_site = cl.getResource( HADOOP_SITE );
    URL core_site = cl.getResource( CORE_SITE );

    if( hadoop_site == null && core_site == null ) {
        throw new ExecException("Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath)." +
                " If you plan to use local mode, please put -x local option in command line", 4010);
    }
{code}

This assumes the resources (*-site.xml) are on the classpath, but this will not always be the case when Pig is run through its Java APIs. One could want to set the resources programmatically, and the code here should additionally check whether they are available there.

Example: when a Configuration object is created and resources are added before passing it on to Pig.

{code}
Configuration conf = new Configuration(false);
conf.addResource("foo/core-site.xml");
conf.addResource("bar/hadoop-site.xml");
PigServer pServer = new PigServer(ExecType.MAPREDUCE, conf);
{code}

The above conf is not used right now to obtain resources.
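One way to picture the requested behavior is a check that accepts either source of configuration. The sketch below is an illustration only and does not reflect how Pig actually resolved this issue: the class HadoopConfigCheck and its methods are hypothetical, and treating the Hadoop 1.x keys fs.default.name and mapred.job.tracker as proof that the user supplied a cluster configuration is an assumption made for the example.

```java
import java.util.Properties;

// Hypothetical sketch, not Pig code: succeed when Hadoop settings are found
// either on the classpath or in the Properties handed in by the caller.
class HadoopConfigCheck {
    static final String HADOOP_SITE = "hadoop-site.xml";
    static final String CORE_SITE = "core-site.xml";

    // Assumption: these two keys being set means the user supplied cluster settings.
    static final String[] REQUIRED_KEYS = {"fs.default.name", "mapred.job.tracker"};

    /** The existing check: is either site file visible to the class loader? */
    static boolean onClasspath(ClassLoader cl) {
        return cl.getResource(HADOOP_SITE) != null || cl.getResource(CORE_SITE) != null;
    }

    /** The additional check: did the caller pass the settings in directly? */
    static boolean inProperties(Properties props) {
        if (props == null) {
            return false;
        }
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    /** Only when both checks fail would the "cannot find configuration" error apply. */
    static boolean hasHadoopConfig(ClassLoader cl, Properties props) {
        return onClasspath(cl) || inProperties(props);
    }
}
```

With a check of this shape, init(Properties) would raise error 4010 only when neither source provides a configuration, so the programmatic setup from the example above would no longer be rejected.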
[jira] [Commented] (PIG-3124) Push FLATTENs After FILTERs If Possible
[ https://issues.apache.org/jira/browse/PIG-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560304#comment-13560304 ]

Nick White commented on PIG-3124:

Indeed - Daniel pointed out the code changes weren't necessary as this optimisation is done by a different Rule. I've left a patch with comments to that effect so it's clearer for the next person reading the code :)

Push FLATTENs After FILTERs If Possible

    Key: PIG-3124
    URL: https://issues.apache.org/jira/browse/PIG-3124
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Affects Versions: 0.10.0
    Reporter: Nick White
    Assignee: Nick White
    Fix For: 0.12
    Attachments: PIG-3124.0.patch, PIG-3124.1.patch

When optimizing a logical plan, it's safe to push a FLATTEN after a FILTER if the columns being flattened don't occur in the expression that the filter is being done on. When the FILTER comes first the FLATTEN generates fewer rows (usually), and so is more efficient.
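The safety condition in the description reduces to a disjointness test between two sets of column names. The sketch below shows only that test; the class FlattenFilterRule and its method are hypothetical names, and it is not the optimizer Rule that, per the comment above, already performs this rewrite in Pig.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical illustration of the safety condition, not Pig's optimizer rule.
class FlattenFilterRule {
    /**
     * True when the FILTER expression references none of the columns being
     * flattened, so the FILTER can run before the FLATTEN without changing results.
     */
    static boolean canFilterBeforeFlatten(Set<String> flattenedColumns,
                                          Set<String> filterColumns) {
        return Collections.disjoint(flattenedColumns, filterColumns);
    }
}
```

For example, if a bag column named items is flattened and the filter tests only a column named country, the sets are disjoint and the filter can run first; a filter that tests items itself cannot be moved.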
[jira] [Commented] (PIG-3071) update hcatalog jar and path to hbase storage handler jar in pig script
[ https://issues.apache.org/jira/browse/PIG-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560315#comment-13560315 ]

Roman Shaposhnik commented on PIG-3071:

Guys, this is extremely useful! Question though: what's the recommended way of dealing with hive.metastore.uris for a permanent deployment? I am looking at the docs and it feels like that property should just be picked up from the Hive configuration. Am I missing anything?

update hcatalog jar and path to hbase storage handler jar in pig script

    Key: PIG-3071
    URL: https://issues.apache.org/jira/browse/PIG-3071
    Project: Pig
    Issue Type: Bug
    Reporter: Arpit Gupta
    Assignee: Arpit Gupta
    Labels: hcatalog
    Fix For: 0.12
    Attachments: PIG-3071.patch, PIG-3071.patch, PIG-3071.patch

Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar name and the path to the hbase storage handler jar. We also need to add the pig storage adapter to the class path.

The pig script should be updated to work with either version.
[jira] [Commented] (PIG-3071) update hcatalog jar and path to hbase storage handler jar in pig script
[ https://issues.apache.org/jira/browse/PIG-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560326#comment-13560326 ]

Rohini Palaniswamy commented on PIG-3071:

Yes, it should be picked up from hive-site.xml.

update hcatalog jar and path to hbase storage handler jar in pig script

    Key: PIG-3071
    URL: https://issues.apache.org/jira/browse/PIG-3071
    Project: Pig
    Issue Type: Bug
    Reporter: Arpit Gupta
    Assignee: Arpit Gupta
    Labels: hcatalog
    Fix For: 0.12
    Attachments: PIG-3071.patch, PIG-3071.patch, PIG-3071.patch

Due to changes in hcatalog 0.5 packaging we need to update the hcatalog jar name and the path to the hbase storage handler jar. We also need to add the pig storage adapter to the class path.

The pig script should be updated to work with either version.
[jira] [Commented] (PIG-3005) TestLargeFile#testOrderBy is failing
[ https://issues.apache.org/jira/browse/PIG-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560330#comment-13560330 ]

Rohini Palaniswamy commented on PIG-3005:

bq. I think we can close this as the test is excluded anyway.

We should probably move it out of 0.11 but not close it. We need to find the issue and fix it or, if the test case is not correct, remove it.

TestLargeFile#testOrderBy is failing

    Key: PIG-3005
    URL: https://issues.apache.org/jira/browse/PIG-3005
    Project: Pig
    Issue Type: Bug
    Environment: Mac OSX 10.6.8
    Reporter: Jonathan Coveney
    Fix For: 0.12

When run locally, at least, this test is failing for me. Has anyone else noticed this failing?
[jira] [Commented] (PIG-3126) Problem in STORE
[ https://issues.apache.org/jira/browse/PIG-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560396#comment-13560396 ]

Vishnu Ganth commented on PIG-3126:

Hadoop 2.0.0

Problem in STORE

    Key: PIG-3126
    URL: https://issues.apache.org/jira/browse/PIG-3126
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.9.2
    Environment: CentOS 5.7
    Reporter: Vishnu Ganth
    Attachments: log.txt

A = Load 'sample';
store A into '/user/xyz/sample-out';

When this pig script is run using abc user who does not have write permission in '/user/xyz', PIG is unable to create the directory sample-out and the map-reduce job gets killed ultimately without any log. PIG should throw some error log saying permission denied.