[jira] Created: (PIG-1578) PigServer.executeBatch does not return status of failed job

2010-08-28 Thread Thejas M Nair (JIRA)
PigServer.executeBatch does not return status of failed job
---

 Key: PIG-1578
 URL: https://issues.apache.org/jira/browse/PIG-1578
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair


For a failed job, PigServer.executeBatch does not return an ExecJob. 
ExecJobs are created from output statistics, and output statistics do not 
seem to exist for jobs that failed.

The query I tried was a native mapreduce job in which the output file of the 
native MR job already exists, causing that job to fail.
{code}
A = load '" + INPUT_FILE + "';
B = mapreduce '" + jarFileName + "' " +
"Store A into 'table_testNativeMRJobSimple_input' "+
"Load 'table_testNativeMRJobSimple_output' "+
"`WordCount table_testNativeMRJobSimple_input " + INPUT_FILE + 
"`;");
Store B into 'table_testNativeMRJobSimpleDir';);
{code}
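
To make the symptom concrete, here is a minimal, self-contained sketch in plain Java. It does not use Pig's classes: ExecBatchSketch, Job, Status, fromOutputStats and anyFailed are hypothetical names, and the assumption that a failed job leaves no output statistics is taken from the description above, not verified against the code.

```java
import java.util.ArrayList;
import java.util.List;

final class ExecBatchSketch {
    /** Hypothetical stand-in for a job's final state. */
    enum Status { COMPLETED, FAILED }

    /** Hypothetical stand-in for one job of a batch. */
    static final class Job {
        final String name;
        final Status status;
        // Assumption from the report: failed jobs have no output statistics.
        final boolean hasOutputStats;

        Job(String name, Status status, boolean hasOutputStats) {
            this.name = name;
            this.status = status;
            this.hasOutputStats = hasOutputStats;
        }
    }

    /** Models the reported behaviour: only jobs with output statistics are returned. */
    static List<Job> fromOutputStats(List<Job> batch) {
        List<Job> returned = new ArrayList<>();
        for (Job job : batch) {
            if (job.hasOutputStats) {
                returned.add(job);
            }
        }
        return returned;
    }

    /** Caller-side check: is any failure visible in the returned list? */
    static boolean anyFailed(List<Job> returned) {
        for (Job job : returned) {
            if (job.status == Status.FAILED) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Job> batch = new ArrayList<>();
        batch.add(new Job("native-mr", Status.FAILED, false));

        List<Job> returned = fromOutputStats(batch);
        System.out.println("jobs returned: " + returned.size());
        System.out.println("failure visible to caller: " + anyFailed(returned));
    }
}
```

In this model the caller gets an empty list and cannot tell a failed batch from one that had nothing to run, which is the problem reported here.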

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1578) PigServer.executeBatch does not return status of failed job

2010-08-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1578:
---

Fix Version/s: 0.8.0




[jira] Commented: (PIG-1458) aggregate files for replicated join

2010-08-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903898#action_12903898
 ] 

Thejas M Nair commented on PIG-1458:


What I described under 'A note about the 2nd case described in first comment -' 
in my previous comment is a change that can be made as part of a separate JIRA.


> aggregate files for replicated join
> ---
>
> Key: PIG-1458
> URL: https://issues.apache.org/jira/browse/PIG-1458
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1458.patch
>
>
> We have noticed that if the smaller data in replicated join has many files, 
> this puts an unneeded burden on the name node. Pre-aggregating the files can 
> improve the situation.




[jira] Commented: (PIG-1458) aggregate files for replicated join

2010-08-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903895#action_12903895
 ] 

Thejas M Nair commented on PIG-1458:


Another comment about the patch -
- The test testUnknownNumMaps2 is the same as testUnknownNumMaps; it should be 
removed.


A note about the 2nd case described in first comment -
> 2. The right input is a map-only job and input files do not exist at 
> compile time.

When the input does not exist for the input map-only job, in most (if not all) 
cases it would be possible to determine the number of files by looking at the 
previous MR operator (or the ones before it).
Also, with the current implementation, since the checks on the number of files 
are done before the MR jobs are merged together, there will be cases where 
the final plan has only one MR job with existing input for the replicated input 
and Pig still treats it as case 2.

The example used in testUnknownNumMaps() has only one input MR job with inputs 
that exist at compile time, but if pig.frjoin.merge.files.optimistic=false, it 
will create an additional MR job that combines the input -
{code}
A = LOAD '" + INPUT_FILE + "' as (x:int,y:int);
B = Filter A by x < 50;
C = join A by $0, B by $0 using 'repl';
{code}





[jira] Commented: (PIG-1458) aggregate files for replicated join

2010-08-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903892#action_12903892
 ] 

Thejas M Nair commented on PIG-1458:


+1
Looks good. Some minor comments - 

- If the preceding op is a native MR job (for the native mapreduce operator), we 
don't know how many reducers will be run, so Pig should use the 
pig.frjoin.merge.files.optimistic property in that case. For a native MR job, the 
map plan will be empty, so currently the check for the number of roots will return 
false.

- If one input has been found to have too many files, we can stop there 
and avoid checking the other inputs. That is, instead of
{code}
  } else if (!frJoinOptimisticFileMerge) {
    // file doesn't exist yet. Treat it as having too many
    // files
    numFiles = frJoinFileMergeThreshold;
  }
{code}
the check can return immediately:
{code}
  } else if (!frJoinOptimisticFileMerge) {
    // file doesn't exist yet. Treat it as having too many
    // files
    return true;
  }
{code}
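
The early-exit idea in the last comment can be sketched in isolation roughly as follows. This is not the code from the patch: FrJoinMergeSketch, needsMerge and the UNKNOWN marker are hypothetical names, and it assumes that file counts are summed across inputs and compared against the threshold with >=.

```java
import java.util.Arrays;
import java.util.List;

final class FrJoinMergeSketch {
    /** Hypothetical marker for an input whose files do not exist yet at compile time. */
    static final int UNKNOWN = -1;

    /**
     * Decides whether the replicated input should be merged first.
     * Assumption: counts are summed across inputs and ">= threshold" means too many.
     */
    static boolean needsMerge(List<Integer> fileCounts, int threshold, boolean optimistic) {
        int numFiles = 0;
        for (int count : fileCounts) {
            if (count == UNKNOWN) {
                if (!optimistic) {
                    // File doesn't exist yet: treat it as having too many files
                    // and stop, without looking at the remaining inputs.
                    return true;
                }
                // Optimistic mode: an unknown input contributes nothing.
            } else {
                numFiles += count;
                if (numFiles >= threshold) {
                    // Threshold already reached: no need to check further inputs.
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(UNKNOWN, 3);
        System.out.println("pessimistic: " + needsMerge(counts, 100, false));
        System.out.println("optimistic:  " + needsMerge(counts, 100, true));
    }
}
```

Returning as soon as the answer is known gives the same result as setting numFiles to the threshold, but skips the file system lookups for the remaining inputs.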






Re: Pig optimizer

2010-08-28 Thread Renato Marroquín Mogrovejo
Hi Daniel,

Yes, that is it, but there are two types of optimizations, right? I mean
physical and logical optimizations. The physical ones concern how
the operators are distributed across mapreduce jobs, and the logical ones are
similar to relational algebra, right?
Do you have any tips on how to get a quick grasp of Pig's logical
optimizations?
Thanks again.


Renato M.


2010/8/26 Daniel Dai 

> Hi, Renato,
> I think you are talking about how we organize different operators into
> map-reduce jobs. Unfortunately, there is currently no documentation on this.
> Basically, we put as many operators into one map-reduce job as possible.
> Co-group/Group, Join, Order, Distinct, Cross, and Stream create a
> map-reduce boundary; most others are put into existing jobs. The main
> logic is inside MRCompiler.java.
>
>
> Daniel
>
> Renato Marroquín Mogrovejo wrote:
>
>> Anyone, please?
>>
>> Renato M.
>>
>> 2010/8/24 Renato Marroquín Mogrovejo 
>>
>>
>>
>>> Hi Daniel,
>>>
>>> Thanks, but that was not what I was actually looking for. What I want to know
>>> is, for example, how the optimizer works when the bags' logical plans are
>>> combined, or, if all commands are reduced in the end to CO-GROUP commands,
>>> how this is handled. I know from Pig's paper that the ORDER and LOAD
>>> commands generate new MapReduce jobs; are there any optimizations for the
>>> physical plans?
>>> Thanks in advance.
>>>
>>>
>>> Renato M.
>>>
>>> 2010/8/23 Daniel Dai 
>>>
>>>> Hi, Renato,
>>>>
>>>> There is a description of the optimization rules in the Pig Latin reference manual:
>>>>
>>>> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref1.html#Optimization+Rules
>>>>
>>>> Is that enough?
>>>>
>>>> Daniel
>>>>
>>>> Renato Marroquín Mogrovejo wrote:
>>>>
>>>>> Hey everyone, I was wondering if anybody has any references or
>>>>> suggestions on how to learn about Pig's optimizer besides the source
>>>>> code or Pig's paper.
>>>>> Thanks in advance.
>>>>>
>>>>> Renato M.


[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-08-28 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903803#action_12903803
 ] 

Daniel Dai commented on PIG-1178:
-

test-patch result for PIG-1178-8:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Patch committed.

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, PIG-1178-4.patch, PIG-1178-5.patch, PIG-1178-6.patch, 
> PIG-1178-7.patch, PIG-1178-8.patch, pig_1178.patch, pig_1178.patch, 
> PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, 
> pig_1178_3.4.patch, pig_1178_3.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.
