[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-08 Thread Robert Gibbon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907205#action_12907205
 ] 

Robert Gibbon commented on PIG-366:
---

Still working on this. Some holidays and some life got in the way. 

I have stripped back a lot of functionality and focused on improving the script 
editor for now. When I did some simple tests I noticed that the 0.7 pig parser 
borks when confronted with %default instructions - maybe you already have a 
ticket for that.


I'll try to tidy up the code and attach it sometime tomorrow.




 PigPen - Eclipse plugin for a graphical PigLatin editor
 ---

 Key: PIG-366
 URL: https://issues.apache.org/jira/browse/PIG-366
 Project: Pig
  Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Daniel Dai
Priority: Minor
 Attachments: org.apache.pig.pigpen_0.0.1.jar, 
 org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
 pigpen.patch, pigPen.patch, PigPen.tgz


 This is an Eclipse plugin that provides a GUI to help users create PigLatin 
 scripts, see the example generator output on the fly, and submit jobs to 
 Hadoop clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-08 Thread Robert Gibbon (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Gibbon updated PIG-366:
--

Attachment: org.apache.pig.pigpen-0.7.0.tar.gz

A simplified editor for pig scripts with syntax highlighting and validation. 
More to come very soon.

 PigPen - Eclipse plugin for a graphical PigLatin editor
 ---

 Key: PIG-366
 URL: https://issues.apache.org/jira/browse/PIG-366
 Project: Pig
  Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Daniel Dai
Priority: Minor
 Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
 org.apache.pig.pigpen_0.0.1.jar, org.apache.pig.pigpen_0.0.1.tgz, 
 org.apache.pig.pigpen_0.0.4.jar, pigpen.patch, pigPen.patch, PigPen.tgz


 This is an Eclipse plugin that provides a GUI to help users create PigLatin 
 scripts, see the example generator output on the fly, and submit jobs to 
 Hadoop clusters.




[jira] Commented: (PIG-794) Use Avro serialization in Pig

2010-09-08 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907280#action_12907280
 ] 

Doug Cutting commented on PIG-794:
--

Jeff, please use current trunk instead, or the 1.4.0 build that I expect to be 
released tomorrow (http://people.apache.org/~cutting/avro-1.4.0-rc4/).  There 
was a bug that caused a similar failure in the snapshot you're using. It 
should only occur in multi-threaded applications, which I doubt yours is, but 
it's better to test against either trunk or a release so we don't chase ghosts.

Further, while debugging a DatumWriter and DatumReader, you might use a 
ValidatingEncoder and ValidatingDecoder to ensure that what you write and read 
conforms to your schema.  You might also test by reading and printing your data 
with GenericDatumReader to see that you've written what you meant to write.  If 
you've written data that does not conform to your declared schema then it 
cannot be read correctly.  If this is the case, we should attempt to improve 
the error message here.
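Doug's advice is to check that what you write and read conforms to the declared schema. As an illustration of that idea only, here is a toy Python checker. It is not Avro's ValidatingEncoder or any Avro API; conforms() and PRIMITIVES are hypothetical names. It covers a subset of Avro's schema JSON (primitives, records, arrays, maps, and unions written as JSON lists) and does not handle enums, fixed, or named-type references.

```python
# Hypothetical toy validator, NOT Avro's ValidatingEncoder: it only mirrors the
# idea of checking a datum against its declared schema before writing it.
# Supports a subset of Avro schema JSON: primitives, records, arrays, maps and
# unions (JSON lists). Enums, fixed and named-type references are not handled.


def _is_int_in(d, bits):
    # Avro int is 32-bit and long is 64-bit; bool is excluded explicitly
    # because Python's bool is a subclass of int.
    return (isinstance(d, int) and not isinstance(d, bool)
            and -(2 ** (bits - 1)) <= d < 2 ** (bits - 1))


PRIMITIVES = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "int": lambda d: _is_int_in(d, 32),
    "long": lambda d: _is_int_in(d, 64),
    "float": lambda d: isinstance(d, (int, float)) and not isinstance(d, bool),
    "double": lambda d: isinstance(d, (int, float)) and not isinstance(d, bool),
    "string": lambda d: isinstance(d, str),
    "bytes": lambda d: isinstance(d, bytes),
}


def conforms(schema, datum):
    """Return True if datum matches schema (a parsed Avro schema JSON value)."""
    if isinstance(schema, str):
        return PRIMITIVES[schema](datum)
    if isinstance(schema, list):  # union: datum must match at least one branch
        return any(conforms(branch, datum) for branch in schema)
    kind = schema["type"]
    if kind == "record":
        return isinstance(datum, dict) and all(
            f["name"] in datum and conforms(f["type"], datum[f["name"]])
            for f in schema["fields"])
    if kind == "array":
        return isinstance(datum, list) and all(
            conforms(schema["items"], item) for item in datum)
    if kind == "map":
        return isinstance(datum, dict) and all(
            isinstance(k, str) and conforms(schema["values"], v)
            for k, v in datum.items())
    return conforms(kind, datum)  # e.g. {"type": "string"}


# A Pig-like (name, age) tuple schema where age may be null.
schema = {"type": "record", "name": "Person", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": ["null", "int"]},
]}

print(conforms(schema, {"name": "bob", "age": 42}))    # True
print(conforms(schema, {"name": "bob", "age": None}))  # True
print(conforms(schema, {"name": "bob", "age": "42"}))  # False: a string, not an int
```

Running such a check on every datum before it reaches a DatumWriter, and again on what a generic reader returns, is the same round-trip test Doug describes, only without Avro's real encoder in the loop.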


 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
Assignee: Dmitriy V. Ryaboy
 Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, 
 AvroStorage_2.patch, AvroStorage_3.patch, AvroStorage_4.patch, AvroTest.java, 
 jackson-asl-0.9.4.jar, PIG-794.patch


 We would like to use Avro serialization in Pig to pass data between MR jobs 
 instead of the current BinStorage. Attached is an implementation of 
 AvroBinStorage, which performs significantly better than BinStorage on our 
 benchmarks.




[jira] Updated: (PIG-1322) Logical Optimizer: change outer join into regular join

2010-09-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1322:


 Assignee: Xuefu Zhang  (was: Daniel Dai)
Fix Version/s: 0.9.0

 Logical Optimizer: change outer join into regular join
 --

 Key: PIG-1322
 URL: https://issues.apache.org/jira/browse/PIG-1322
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.9.0


 In some cases, we can change the outer join into a regular join. The benefit 
 is that a regular join is easier to optimize in subsequent optimization passes. 
 Example:
 C = join A by a0 LEFT OUTER, B by b0;
 D = filter C by b0 > 0;
 =>
 C = join A by a0, B by b0;
 D = filter C by b0 > 0;
 The rewrite is safe because rows of A with no match in B carry a null b0, and 
 the filter b0 > 0 discards those rows anyway. After this change, the 
 PushUpFilter rule can push the filter in front of the regular join, which it 
 cannot do for the outer join.
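A toy in-memory model can make the argument concrete. It is a sketch, not Pig's optimizer: inner_join, left_outer_join and filter_positive are hypothetical helpers, relations are lists of tuples, and None stands in for Pig's null. It checks that, for a null-rejecting filter such as b0 > 0, the outer-join plan, the regular-join plan, and the plan with the filter pushed in front of the join all agree, while pushing the same filter in front of the outer join does not.

```python
# Hypothetical in-memory model, not Pig's optimizer: relations are lists of
# tuples and None stands in for Pig's null.


def inner_join(a, b, a_key, b_key):
    """Regular join on a[a_key] == b[b_key]; null keys never match."""
    return [ra + rb for ra in a for rb in b
            if ra[a_key] is not None and ra[a_key] == rb[b_key]]


def left_outer_join(a, b, a_key, b_key, b_width):
    """LEFT OUTER join: unmatched rows of a are padded with b_width nulls."""
    out = []
    for ra in a:
        matches = inner_join([ra], b, a_key, b_key)
        out.extend(matches or [ra + (None,) * b_width])
    return out


def filter_positive(rows, col):
    """Pig-style 'filter by col > 0': a null comparison is not true, so drop."""
    return [r for r in rows if r[col] is not None and r[col] > 0]


A = [(1, "x"), (2, "y"), (3, "z")]    # columns (a0, a1)
B = [(1, "p"), (2, "q"), (-1, "r")]   # columns (b0, b1)
B0 = 2                                 # position of b0 in a joined row

# C = join A by a0 LEFT OUTER, B by b0; D = filter C by b0 > 0;
original = filter_positive(left_outer_join(A, B, 0, 0, 2), B0)
# C = join A by a0, B by b0; D = filter C by b0 > 0;
rewritten = filter_positive(inner_join(A, B, 0, 0), B0)
# After pushing the filter up: filter B first, then do the regular join.
pushed = inner_join(A, filter_positive(B, 0), 0, 0)

print(original == rewritten == pushed)  # True

# Pushing the filter in front of the OUTER join is not equivalent:
# the unmatched row (3, 'z', None, None) survives.
wrong = left_outer_join(A, filter_positive(B, 0), 0, 0, 2)
print(wrong == original)  # False
```

The last comparison is the reason the join type has to change first: only once the join is a regular join does filtering B before the join give the same rows.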




[jira] Updated: (PIG-1437) [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct

2010-09-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1437:


 Assignee: Xuefu Zhang
Fix Version/s: 0.9.0

 [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
 -

 Key: PIG-1437
 URL: https://issues.apache.org/jira/browse/PIG-1437
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


 It's possible to rewrite queries like this
 {code}
 A = load 'data' as (name,age);
 B = group A by (name,age);
 C = foreach B generate group.name, group.age;
 dump C;
 {code}
 or
 {code}
 A = load 'data' as (name,age);
 B = group A by (name,age);
 C = foreach B generate flatten(group);
 dump C;
 {code}
 to
 {code}
 A = load 'data' as (name,age);
 B = distinct A;
 dump B;
 {code}
 This can only be done if no columns within the bags are referenced later in 
 the script. Since in the Pig/Hadoop world DISTINCT is executed more 
 efficiently than group-by, this will be a huge win. 
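The equivalence can be checked on a toy in-memory model. This is a sketch under stated assumptions, not Pig's optimizer code: group_by is a hypothetical helper, a relation is a Python list of (name, age) tuples, and results are compared after sorting because Pig does not guarantee output order. The last lines show why the rewrite needs the bag to go unused.

```python
# Hypothetical in-memory model, not Pig's optimizer: a relation is a list of
# (name, age) tuples, and GROUP returns (group, bag) pairs.


def group_by(rows, key):
    """Pig-style GROUP: map each distinct key to the bag of rows carrying it."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return list(groups.items())


A = [("alice", 30), ("bob", 25), ("alice", 30), ("carol", 25)]

# B = group A by (name,age); C = foreach B generate flatten(group);
# (generate group.name, group.age produces the same tuples here)
B = group_by(A, key=lambda t: (t[0], t[1]))
C = sorted(group for group, _bag in B)

# B = distinct A;
D = sorted(set(A))

print(C == D)  # True: both yield one tuple per distinct (name, age)

# If the bag is read later, e.g. generate flatten(group), COUNT(A),
# DISTINCT discards the needed information and cannot replace GROUP.
counts = sorted(group + (len(bag),) for group, bag in B)
print(counts)  # [('alice', 30, 2), ('bob', 25, 1), ('carol', 25, 1)]
```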
