date:20100827

[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?

2010-08-27 Thread Daniel Dai (JIRA)

[
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903263#action_12903263
]

Daniel Dai commented on PIG-506:

Patch looks good. One minor comment: PlanHelper.LoadStoreFinder might be better
named PlanHelper.LoadStoreNativeFinder.

Does pig need a NATIVE keyword?
---

Key: PIG-506
URL: https://issues.apache.org/jira/browse/PIG-506
Project: Pig
Issue Type: New Feature
Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
Fix For: 0.8.0

Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch,
NativeMapReduceFinale2.patch, NativeMapReduceFinale3.patch, PIG-506.2.patch,
PIG-506.3.patch, PIG-506.patch, TestWordCount.jar

Assume a user had a job that broke easily into three pieces. Further assume
that pieces one and three were easily expressible in pig, but that piece two
needed to be written in map reduce for whatever reason (performance,
something that pig could not easily express, legacy job that was too
important to change, etc.). Today the user would either have to use map
reduce for the entire job or manually handle the stitching together of pig
and map reduce jobs. What if instead pig provided a NATIVE keyword that
would allow the script to pass off the data stream to the underlying system
(in this case map reduce)? The semantics of NATIVE would vary by underlying
system. In the map reduce case, we would assume that this indicated a
collection of one or more fully contained map reduce jobs, so that pig would
store the data, invoke the map reduce jobs, and then read the resulting data
to continue. It might look something like this:
{code}
A = load 'myfile';
X = load 'myotherfile';
B = group A by $0;
C = foreach B generate group, myudf(B);
D = native (jar=mymr.jar, infile=frompig, outfile=topig);
E = join D by $0, X by $0;
...
{code}
This differs from streaming in that it allows the user to insert an arbitrary
amount of native processing, whereas streaming allows the insertion of one
binary. It also differs in that, for streaming, data is piped directly into
and out of the binary as part of the pig pipeline. Here the pipeline would
be broken, data written to disk, and the native block invoked, then data read
back from disk.
Another alternative is to say this is unnecessary because the user can do the
coordination from java, using the PigServer interface to run pig and calling
the map reduce job explicitly. The advantages of the native keyword are that
the user need not worry about coordination between the jobs, since pig will
take care of it, and that the user can make use of existing java applications
without being a java programmer.
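
The store, invoke, load hand-off described above can be modelled in a few
lines of Python. This is only a sketch of the proposed semantics, not Pig or
Hadoop code: run_native, uppercase_job and the sample records are hypothetical
names made up for illustration, and a plain Python function stands in for the
user's map reduce jar.

```python
import os
import tempfile


def run_native(records, native_job):
    """Hand `records` to `native_job` through files, then read its output back.

    Mirrors the three steps in the proposal: store, invoke, load.
    `native_job(infile, outfile)` is a hypothetical stand-in for the
    user's map reduce jar; it is not a Pig or Hadoop API.
    """
    with tempfile.TemporaryDirectory() as workdir:
        infile = os.path.join(workdir, "frompig")
        outfile = os.path.join(workdir, "topig")

        # 1. Store: the pipeline is broken and the data is written to disk.
        with open(infile, "w") as f:
            for fields in records:
                f.write("\t".join(fields) + "\n")

        # 2. Invoke: the native block runs on its own, outside the pipeline.
        native_job(infile, outfile)

        # 3. Load: the result is read back so the script can continue.
        with open(outfile) as f:
            return [tuple(line.rstrip("\n").split("\t")) for line in f]


def uppercase_job(infile, outfile):
    """Toy native job (assumption for illustration): upper-case every line."""
    with open(infile) as src, open(outfile, "w") as dst:
        for line in src:
            dst.write(line.upper())


result = run_native([("a", "1"), ("b", "2")], uppercase_job)
print(result)
```

Unlike streaming, nothing is piped between the two sides here: the native
step sees only the input file and the caller sees only the output file.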

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1574) Optimization rule PushUpFilter causes filter to be pushed up out joins

2010-08-27 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1574:
-

Attachment: (was: jira-1574-1.patch)

 Optimization rule PushUpFilter causes filter to be pushed up out joins
 --

 Key: PIG-1574
 URL: https://issues.apache.org/jira/browse/PIG-1574
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0


 The PushUpFilter optimization rule in the new logical plan moves the filter 
 up to one of the join branches. It does this aggressively by finding an operator 
 that has all the projection UIDs. However, it does not consider that the found 
 operator might be another join. If that join is outer, then we cannot simply 
 move the filter to one of its branches.
 As an example, the following script will be erroneously optimized:
 A = load 'myfile' as (d1:int);
 B = load 'anotherfile' as (d2:int);
 C = join A by d1 full outer, B by d2;
 D = load 'xxx' as (d3:int);
 E = join C by d1, D by d3;
 F = filter E by d1  5;
 store F into 'dummy';
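
 Why a filter cannot simply be moved into one branch of an outer join can be
 seen in a small Python model of the full outer join alone. This is a sketch,
 not Pig code: the predicate d1 > 5 and the sample rows are assumptions chosen
 for illustration, and full_outer_join and keep are hypothetical helpers.

```python
def full_outer_join(left, right):
    """Full outer join of two lists of ints on equality.

    Returns (left_value, right_value) pairs; an unmatched side is None.
    """
    rows = []
    matched_right = set()
    for lv in left:
        hits = [i for i, rv in enumerate(right) if rv == lv]
        if hits:
            for i in hits:
                rows.append((lv, right[i]))
                matched_right.add(i)
        else:
            rows.append((lv, None))
    for i, rv in enumerate(right):
        if i not in matched_right:
            rows.append((None, rv))
    return rows


def keep(d1):
    # Assumed predicate for illustration: d1 > 5, rejecting nulls.
    return d1 is not None and d1 > 5


A = [1, 7]  # d1 values (made up)
B = [1, 7]  # d2 values (made up)

# Correct plan: join first, then filter on d1.
filter_after_join = [row for row in full_outer_join(A, B) if keep(row[0])]

# Pushed-up plan: filter branch A first, then join.
filter_pushed_to_A = full_outer_join([d1 for d1 in A if keep(d1)], B)

print(filter_after_join)   # [(7, 7)]
print(filter_pushed_to_A)  # [(7, 7), (None, 1)]
```

 Filtering A first removes the row that d2 = 1 would have matched, so the
 outer join emits a null-padded row that the original plan never produces.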


[jira] Updated: (PIG-1574) Optimization rule PushUpFilter causes filter to be pushed up out joins

2010-08-27 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1574:
-

Status: Patch Available  (was: Open)

 Optimization rule PushUpFilter causes filter to be pushed up out joins
 --

 Key: PIG-1574
 URL: https://issues.apache.org/jira/browse/PIG-1574
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1574-1.patch




[jira] Created: (PIG-1575) Complete the migration of optimization rule PushUpFilter including missing test cases

2010-08-27 Thread Xuefu Zhang (JIRA)

Complete the migration of optimization rule PushUpFilter including missing test 
cases
-

 Key: PIG-1575
 URL: https://issues.apache.org/jira/browse/PIG-1575
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0


The optimization rule PushUpFilter under the new logical plan covers only a 
subset of the optimization scenarios handled by the same rule under the old 
logical plan. For instance, it only considers a filter after a join, but the old 
optimization also considers other operators such as CoGroup, Union, Cross, etc. 
The migration of the rule should be completed.

Also, the test cases created for testing the old PushUpFilter weren't migrated 
to the new logical plan code base. They should also be migrated. (A few have 
been migrated in PIG-1574.)
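
For two of the operators named above, Union and Cross, the rewrite that a
filter push-up performs can be checked on a small Python model. This is a
sketch of the general equivalence, not the Pig rule itself: the predicate,
the sample data and the helper names are assumptions made for illustration.

```python
def union(a, b):
    """Bag union: concatenate both inputs, keeping duplicates."""
    return a + b


def cross(a, b):
    """Cross product of two inputs as (x, y) pairs."""
    return [(x, y) for x in a for y in b]


def keep(x):
    # Assumed predicate for illustration.
    return x > 5


A = [1, 7, 9]
B = [3, 8]

# Union: filtering the union equals the union of the filtered inputs.
union_late = [x for x in union(A, B) if keep(x)]
union_early = union([x for x in A if keep(x)], [x for x in B if keep(x)])

# Cross: a filter on a field from A can be applied to A before the cross.
cross_late = [(x, y) for (x, y) in cross(A, B) if keep(x)]
cross_early = cross([x for x in A if keep(x)], B)

print(union_late == union_early)
print(cross_late == cross_early)
```

Test cases for the migrated rule could assert exactly this kind of
before/after equality for each operator the rule handles.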
