[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor
[ https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907205#action_12907205 ]

Robert Gibbon commented on PIG-366:
---

Still working on this; some holidays and some life got in the way. I have stripped back a lot of functionality and focused on improving the script editor for now. When I ran some simple tests, I noticed that the 0.7 Pig parser fails when it encounters %default instructions; maybe you already have a ticket for that. I'll try to tidy up the code and attach it sometime tomorrow.

PigPen - Eclipse plugin for a graphical PigLatin editor
---
Key: PIG-366
URL: https://issues.apache.org/jira/browse/PIG-366
Project: Pig
Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Daniel Dai
Priority: Minor
Attachments: org.apache.pig.pigpen_0.0.1.jar, org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, pigpen.patch, pigPen.patch, PigPen.tgz

This is an Eclipse plugin that provides a GUI that can help users create PigLatin scripts, see the example generator output on the fly, and submit jobs to Hadoop clusters.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor
[ https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Gibbon updated PIG-366:
--

Attachment: org.apache.pig.pigpen-0.7.0.tar.gz

A simplified editor for Pig scripts with syntax highlighting and validation. More to come very soon.

PigPen - Eclipse plugin for a graphical PigLatin editor
---
Key: PIG-366
URL: https://issues.apache.org/jira/browse/PIG-366
Project: Pig
Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Daniel Dai
Priority: Minor
Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, org.apache.pig.pigpen_0.0.1.jar, org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, pigpen.patch, pigPen.patch, PigPen.tgz
[jira] Commented: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907280#action_12907280 ]

Doug Cutting commented on PIG-794:
--

Jeff, please use either current trunk or the 1.4.0 build that I expect to be released tomorrow (http://people.apache.org/~cutting/avro-1.4.0-rc4/) instead. The snapshot you're using had a bug that caused a similar failure. That bug should only affect multi-threaded applications, which I doubt yours is, but it's better to test against either trunk or a release so we don't chase ghosts.

Further, while debugging a DatumWriter and DatumReader, you might use a ValidatingEncoder and ValidatingDecoder to ensure that what you write and read conforms to your schema. You might also test by reading and printing your data with GenericDatumReader to check that you've written what you meant to write. If you've written data that does not conform to your declared schema, then it cannot be read correctly. If that is the case here, we should try to improve the error message.

Use Avro serialization in Pig
-
Key: PIG-794
URL: https://issues.apache.org/jira/browse/PIG-794
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
Assignee: Dmitriy V. Ryaboy
Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, AvroStorage_2.patch, AvroStorage_3.patch, AvroStorage_4.patch, AvroTest.java, jackson-asl-0.9.4.jar, PIG-794.patch

We would like to use Avro serialization in Pig to pass data between MR jobs instead of the current BinStorage. Attached is an implementation of AvroBinStorage, which performs significantly better than BinStorage on our benchmarks.
[jira] Updated: (PIG-1322) Logical Optimizer: change outer join into regular join
[ https://issues.apache.org/jira/browse/PIG-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1322:

Assignee: Xuefu Zhang  (was: Daniel Dai)
Fix Version/s: 0.9.0

Logical Optimizer: change outer join into regular join
--
Key: PIG-1322
URL: https://issues.apache.org/jira/browse/PIG-1322
Project: Pig
Issue Type: Sub-task
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
Fix For: 0.9.0

In some cases, we can change an outer join into a regular join. The benefit is that a regular join is easier to handle in subsequent optimization passes. Example:

C = join A by a0 LEFT OUTER, B by b0;
D = filter C by b0 > 0;
=>
C = join A by a0, B by b0;
D = filter C by b0 > 0;

The rewrite is valid here because the filter b0 > 0 discards every row in which b0 is null, and those are exactly the unmatched rows that the LEFT OUTER join adds. Once this change is made, the PushUpFilter rule can push the filter in front of the regular join, which it otherwise cannot do.
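The equivalence behind this rewrite can be checked with a small in-memory model. The sketch below is not Pig's optimizer code. The class `OuterJoinRewrite` and its methods are hypothetical, relations are plain Java lists of join keys, and the model assumes Pig's usual null semantics: a null join key matches nothing, and a comparison with null is not true, so the filter drops the row. With those assumptions, `b0 > 0` applied after the LEFT OUTER join returns the same rows as the same filter applied after a regular join, even though the two joins on their own differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the PIG-1322 rewrite; relations are in-memory lists, not Pig's API.
final class OuterJoinRewrite {

    // Regular (inner) join of A and B on a0 == b0; each output row is [a0, b0].
    // A null key matches nothing.
    static List<List<Integer>> innerJoin(List<Integer> a, List<Integer> b) {
        List<List<Integer>> out = new ArrayList<>();
        for (Integer a0 : a) {
            for (Integer b0 : b) {
                if (a0 != null && a0.equals(b0)) {
                    out.add(Arrays.asList(a0, b0));
                }
            }
        }
        return out;
    }

    // LEFT OUTER join: like innerJoin, but an A row with no match is kept with b0 = null.
    static List<List<Integer>> leftOuterJoin(List<Integer> a, List<Integer> b) {
        List<List<Integer>> out = new ArrayList<>();
        for (Integer a0 : a) {
            boolean matched = false;
            for (Integer b0 : b) {
                if (a0 != null && a0.equals(b0)) {
                    out.add(Arrays.asList(a0, b0));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(Arrays.asList(a0, null)); // null-padded row
            }
        }
        return out;
    }

    // Models "filter C by b0 > 0": a null b0 never passes, so padded rows are dropped.
    static final Predicate<List<Integer>> B0_POSITIVE =
            row -> row.get(1) != null && row.get(1) > 0;

    static List<List<Integer>> filter(List<List<Integer>> rows, Predicate<List<Integer>> p) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> row : rows) {
            if (p.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, -1, null);
        List<Integer> b = Arrays.asList(2, 3, 3, -1, 5);
        List<List<Integer>> original = filter(leftOuterJoin(a, b), B0_POSITIVE);
        List<List<Integer>> rewritten = filter(innerJoin(a, b), B0_POSITIVE);
        System.out.println(original.equals(rewritten)); // prints "true"
        // The joins themselves differ; only the null-rejecting filter makes them agree.
        System.out.println(leftOuterJoin(a, b).size() + " vs " + innerJoin(a, b).size()); // prints "6 vs 4"
    }
}
```

The check depends on the filter rejecting a null `b0`. With a predicate that can be true for a null `b0` (for example `b0 is null`), the padded rows would survive and the outer join could not be replaced.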
[jira] Updated: (PIG-1437) [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
[ https://issues.apache.org/jira/browse/PIG-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1437:

Assignee: Xuefu Zhang
Fix Version/s: 0.9.0

[Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
-
Key: PIG-1437
URL: https://issues.apache.org/jira/browse/PIG-1437
Project: Pig
Issue Type: Sub-task
Components: impl
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Xuefu Zhang
Priority: Minor
Fix For: 0.9.0

It's possible to rewrite queries like this
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate group.name, group.age;
dump C;
{code}
or
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate flatten(group);
dump C;
{code}
to
{code}
A = load 'data' as (name,age);
B = distinct A;
dump B;
{code}
This can only be done if no columns within the bags are referenced later in the script. Since in the Pig-Hadoop world DISTINCT is executed more efficiently than GROUP BY, this will be a huge win.
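A similar toy model shows why the rewrite is safe when the bags are never read. This sketch is not Pig's implementation. The class and method names are hypothetical, and rows are modeled as two-element Java lists standing in for `(name, age)` tuples. Grouping on the full tuple and keeping only the group keys yields exactly the distinct tuples, so the bags that GROUP builds are wasted work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical model of the PIG-1437 rewrite; not Pig's implementation.
final class GroupByToDistinct {

    // Builds a (name, age) tuple; Arrays.asList allows nulls and compares by value.
    static List<Object> row(Object name, Object age) {
        return Arrays.asList(name, age);
    }

    // Models: B = group A by (name,age); C = foreach B generate flatten(group);
    // The bags are filled but never read; only the group keys are emitted.
    static List<List<Object>> groupThenFlattenGroup(List<List<Object>> a) {
        // Insertion order keeps the output deterministic for the comparison below;
        // Pig itself makes no output-order guarantee.
        Map<List<Object>, List<List<Object>>> groups = new LinkedHashMap<>();
        for (List<Object> t : a) {
            List<Object> key = row(t.get(0), t.get(1));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        }
        return new ArrayList<>(groups.keySet());
    }

    // Models: B = distinct A; compares whole tuples.
    static List<List<Object>> distinct(List<List<Object>> a) {
        return new ArrayList<>(new LinkedHashSet<>(a));
    }

    public static void main(String[] args) {
        List<List<Object>> a = Arrays.asList(
                row("alice", 30), row("bob", 25), row("alice", 30), row("carol", 41));
        System.out.println(groupThenFlattenGroup(a).equals(distinct(a))); // prints "true"
        System.out.println(distinct(a)); // prints "[[alice, 30], [bob, 25], [carol, 41]]"
    }
}
```

The equivalence relies on the group key covering the whole tuple, as it does when `A` is loaded with only `(name, age)`. If `A` had extra columns, `distinct A` would compare those too, so the relation would first need to be projected down to the grouping columns.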