[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?

2010-08-27 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903263#action_12903263
 ] 

Daniel Dai commented on PIG-506:


Patch looks good. One minor comment: PlanHelper.LoadStoreFinder might be better 
named PlanHelper.LoadStoreNativeFinder.

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: NativeImplInitial.patch, NativeMapReduceFinale1.patch, 
 NativeMapReduceFinale2.patch, NativeMapReduceFinale3.patch, PIG-506.2.patch, 
 PIG-506.3.patch, PIG-506.patch, TestWordCount.jar


 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce)?  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PigServer interface to run pig and calling 
 the map reduce job explicitly.  The advantage of the native keyword is that 
 the user need not worry about coordination between the jobs; pig will 
 take care of it.  Also, the user can make use of existing java applications 
 without being a java programmer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1574) Optimization rule PushUpFilter causes filter to be pushed up out joins

2010-08-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1574:
-

Attachment: (was: jira-1574-1.patch)

 Optimization rule PushUpFilter causes filter to be pushed up out joins
 --

 Key: PIG-1574
 URL: https://issues.apache.org/jira/browse/PIG-1574
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0


 The PushUpFilter optimization rule in the new logical plan moves the filter 
 up to one of the join branches. It does this aggressively by finding an operator 
 that has all the projection UIDs. However, it does not consider that the found 
 operator might be another join. If that join is outer, then we cannot simply 
 move the filter to one of its branches.
 As an example, the following script will be erroneously optimized:
 A = load 'myfile' as (d1:int);
 B = load 'anotherfile' as (d2:int);
 C = join A by d1 full outer, B by d2;
 D = load 'xxx' as (d3:int);
 E = join C by d1, D by d3;
 F = filter E by d1  5;
 G = store F into 'dummy';
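
 To illustrate why a filter cannot simply be moved into one branch of an outer
 join, here is a minimal Python sketch. It models only the full outer join step,
 not the whole script or Pig's optimizer. The helper name full_outer_join, the
 sample data, and the predicate d1 > 5 are assumptions made for illustration
 (the comparison operator in the script above did not survive in this archive).

```python
def full_outer_join(left, right):
    """Full outer join of two lists of single-column keys (hypothetical helper)."""
    rows = []
    matched_right = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r == l]
        if hits:
            for i in hits:
                rows.append((l, right[i]))
                matched_right.add(i)
        else:
            rows.append((l, None))      # left row with no partner
    for i, r in enumerate(right):
        if i not in matched_right:
            rows.append((None, r))      # right row with no partner
    return rows


def keep(d1):
    # Assumed predicate; a null (None) key never passes, as with null comparisons.
    return d1 is not None and d1 > 5


A = [3, 7]   # made-up sample data
B = [3, 8]

# Filter applied after the outer join (what the script asks for).
filter_after = [row for row in full_outer_join(A, B) if keep(row[0])]

# Filter pushed into the left branch of the outer join (the erroneous rewrite).
filter_pushed = full_outer_join([a for a in A if keep(a)], B)

print(filter_after)
print(filter_pushed)
```

 With this sample data the two plans differ: the pushed version keeps rows from
 B with a null d1 that the original filter would have dropped.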




[jira] Updated: (PIG-1574) Optimization rule PushUpFilter causes filter to be pushed up out joins

2010-08-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1574:
-

Status: Patch Available  (was: Open)

 Optimization rule PushUpFilter causes filter to be pushed up out joins
 --

 Key: PIG-1574
 URL: https://issues.apache.org/jira/browse/PIG-1574
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1574-1.patch





[jira] Created: (PIG-1575) Complete the migration of optimization rule PushUpFilter including missing test cases

2010-08-27 Thread Xuefu Zhang (JIRA)
Complete the migration of optimization rule PushUpFilter including missing test 
cases
-

 Key: PIG-1575
 URL: https://issues.apache.org/jira/browse/PIG-1575
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0


The optimization rule under the new logical plan, PushUpFilter, covers only a 
subset of the optimization scenarios handled by the same rule under the old 
logical plan. For instance, it only considers a filter after a join, but the old 
optimization also considers other operators such as CoGroup, Union, Cross, etc. 
The migration of the rule should be completed.

Also, the test cases created for testing the old PushUpFilter weren't migrated 
to the new logical plan code base. They should also be migrated. (A few have been 
migrated in JIRA-1574.)




[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903578#action_12903578
 ] 

Olga Natkovich commented on PIG-1563:
-

I was able to get it working (without wrapping) for the functions 
that take a fixed number of arguments:

LAST_INDEX_OF
REPLACE
TRIM

I don't believe there is currently a way to make it work with a variable number 
of args (even if the number of combinations is fixed). Moreover, if we add the 
mapping table in this case, it breaks the case of typed data, which is bad. This 
is the case with the remaining functions, INDEXOF and SPLIT.

So my suggestion is to fix only the first set of functions and defer the rest to 
0.9, when we fix the mapping code.

Dmitriy and others, are you OK with this? If so, I can update the patch to 
reflect this.




 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1502) Document and track system limits

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1502:


Fix Version/s: 0.9.0
   (was: 0.8.0)

 Document and track system limits
 

 Key: PIG-1502
 URL: https://issues.apache.org/jira/browse/PIG-1502
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
 Fix For: 0.9.0


 We need to be able to publish what the system limitations are to make sure that 
 Pig is used in the way it was intended and tested. For instance, if you 
 combine 30 joins in a single MR job (via multiquery), this might not work.




[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903581#action_12903581
 ] 

Olga Natkovich commented on PIG-1150:
-

Dmitriy, are you planning to add unit tests? Do we still want this in for 0.8? 
(Since it is going into piggybank, we can do this post branching but then we 
need to test in 2 places.)

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: var.patch


 I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
 variance in a distributed manner, based on the AVG() builtin.  It works by 
 calculating the count, sum and sum of squares, as described here: 
 http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
 Is this a worthwhile contribution?  Taking the square root of this value 
 using the contrib SQRT() function gives Standard Deviation, which is missing 
 from Pig.
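
 The count / sum / sum-of-squares approach described above can be sketched in
 Python as follows. This is an illustration only, not the code in var.patch: the
 function names partial, merge and variance are hypothetical, and the result is
 the population variance. The sum-of-squares formula can lose precision on
 badly scaled data.

```python
import math


def partial(values):
    """Per-chunk state: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def merge(a, b):
    """Combine two partial states; this is what makes the computation algebraic."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def variance(state):
    """Population variance from a merged (count, sum, sum of squares) state."""
    n, s, ss = state
    mean = s / n
    return ss / n - mean * mean


# Two chunks, as if processed by two separate map tasks (made-up data).
state = merge(partial([1, 2, 3, 4]), partial([5, 6, 7, 8]))
var = variance(state)
std_dev = math.sqrt(var)
print(var, std_dev)
```

 Because the partial states merge by plain addition, each chunk can be reduced
 independently and combined in any order.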




[jira] Commented: (PIG-1549) Provide utility to construct CNF form of predicates

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903591#action_12903591
 ] 

Olga Natkovich commented on PIG-1549:
-

I don't think this patch applies. Can you regenerate the patch with svn diff 
from the latest code and also add unit tests? Thanks.

 Provide utility to construct CNF form of predicates
 ---

 Key: PIG-1549
 URL: https://issues.apache.org/jira/browse/PIG-1549
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: 0001-Add-CNF-utility-class.patch


 Provide utility to construct CNF form of predicates




[jira] Commented: (PIG-1494) PIG Logical Optimization: Use CNF in PushUpFilter

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903593#action_12903593
 ] 

Olga Natkovich commented on PIG-1494:
-

Can this be moved from 0.8 to 0.9 release since we are about to branch for 0.9?

 PIG Logical Optimization: Use CNF in PushUpFilter
 -

 Key: PIG-1494
 URL: https://issues.apache.org/jira/browse/PIG-1494
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Swati Jain
Assignee: Swati Jain
Priority: Minor
 Fix For: 0.8.0


 The PushUpFilter rule is not able to handle complicated boolean expressions.
 For example, SplitFilter rule is splitting one LOFilter into two by AND. 
 However it will not be able to split LOFilter if the top level operator is 
 OR. For example:
 *ex script:*
 A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
 B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
 C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
 J1 = JOIN B by b1, C by c1;
 J2 = JOIN J1 by $0, A by a1;
 D = *Filter J2 by ( (c1  10) AND (a3+b3  10) ) OR (c2 == 5);*
 explain D;
 In the above example, the PushUpFilter is not able to push any filter 
 condition across any join as it contains columns from all branches (inputs). 
 But if we convert this expression into Conjunctive Normal Form (CNF) then 
 we would be able to push filter condition c1 10 and c2 == 5 below both join 
 conditions. Here is the CNF expression for highlighted line:
 ( (c1  10) OR (c2 == 5) ) AND ( (a3+b3  10) OR (c2 ==5) )
 *Suggestion:* It would be a good idea to convert LOFilter's boolean 
 expression into CNF; it would then be easy to push parts (conjuncts) of the 
 LOFilter boolean expression selectively. We would also not require rule 
 SplitFilter anymore if we were to add this utility to rule PushUpFilter 
 itself.
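
 The CNF conversion suggested above can be sketched in Python as follows. This
 is a minimal illustration, not the utility proposed for Pig: expressions are
 modelled as nested tuples, NOT is not handled, and the names to_cnf and
 conjuncts are hypothetical. The atoms P, Q and R stand in for the three
 comparisons on c1, a3+b3 and c2 in the example script.

```python
def to_cnf(expr):
    """Convert an AND/OR expression tree to CNF by distributing OR over AND."""
    if isinstance(expr, str):          # an atom such as "P"
        return expr
    op, left, right = expr
    left, right = to_cnf(left), to_cnf(right)
    if op == "and":
        return ("and", left, right)
    # op == "or": (x AND y) OR z  ==>  (x OR z) AND (y OR z)
    if isinstance(left, tuple) and left[0] == "and":
        return ("and",
                to_cnf(("or", left[1], right)),
                to_cnf(("or", left[2], right)))
    if isinstance(right, tuple) and right[0] == "and":
        return ("and",
                to_cnf(("or", left, right[1])),
                to_cnf(("or", left, right[2])))
    return ("or", left, right)


def conjuncts(expr):
    """Flatten the top-level ANDs of a CNF expression into a list."""
    if isinstance(expr, tuple) and expr[0] == "and":
        return conjuncts(expr[1]) + conjuncts(expr[2])
    return [expr]


# (P AND Q) OR R, the shape of the filter in the example script.
original = ("or", ("and", "P", "Q"), "R")
cnf = to_cnf(original)
parts = conjuncts(cnf)
print(parts)
```

 Each element of parts is a conjunct that an optimizer could consider pushing
 independently, depending on which inputs its columns come from.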




[jira] Assigned: (PIG-1542) log level not propogated to MR task loggers

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1542:
---

Assignee: niraj rai

This will be looked at after the branch since this is a regression and we don't 
have time to do it now.

 log level not propogated to MR task loggers
 ---

 Key: PIG-1542
 URL: https://issues.apache.org/jira/browse/PIG-1542
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: niraj rai
 Fix For: 0.8.0


 Specifying -d DEBUG does not affect the logging of the MR tasks.
 This was fixed earlier in PIG-882.




[jira] Assigned: (PIG-1543) IsEmpty returns the wrong value after using LIMIT

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1543:
---

Assignee: Daniel Dai

Daniel, can you check whether this is related to the limit optimizer and whether 
it was addressed with the new optimizer? (This can be done post branch since it 
is a bug split.)

 IsEmpty returns the wrong value after using LIMIT
 -

 Key: PIG-1543
 URL: https://issues.apache.org/jira/browse/PIG-1543
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Hu
Assignee: Daniel Dai
 Fix For: 0.8.0


 1. Two input files:
 1a: limit_empty.input_a
 1
 1
 1
 1b: limit_empty.input_b
 2
 2
 2. The pig script: limit_empty.pig
 -- A contains only 1's; B contains only 2's
 A = load 'limit_empty.input_a' as (a1:int);
 B = load 'limit_empty.input_b' as (b1:int);
 C =COGROUP A by a1, B by b1;
 D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), 
 COUNT(B);
 store D into 'limit_empty.output/d';
 -- After the script is done, we see the right results:
 -- {(1),(1),(1)}   {}  1   0   3   0
 -- {} {(2),(2)}  0   1   0   2
 C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
 D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 
 0:1), COUNT(Alim), COUNT(Blim);
 store D1 into 'limit_empty.output/d1';
 -- After the script is done, we see the unexpected results:
 -- {(1)}   {}1   1   1   0
 -- {}  {(2)} 1   1   0   1
 dump D;
 dump D1;
 3. Run the script and redirect stdout (2 dumps) to a file. There are two issues:
 The major one:
 IsEmpty() returns FALSE for an empty bag in limit_empty.output/d1/*, while 
 IsEmpty() returns the correct value in limit_empty.output/d/*.
 The difference is that LIMIT has been applied before using IsEmpty() in the 
 first case.
 The minor one:
 The redirected output only contains the first dump:
 ({(1),(1),(1)},{},1,0,3L,0L)
 ({},{(2),(2)},0,1,0L,2L)
 We expect two more lines like:
 ({(1)},{},1,1,1L,0L)
 ({},{(2)},1,1,0L,1L)
 In addition, an error is reported:
 [main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - 
 java.lang.ClassCastException: java.lang.Integer cannot be cast to 
 org.apache.pig.data.Tuple




[jira] Assigned: (PIG-1567) Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1567:
---

Assignee: Xuefu Zhang

 Optimization rule FilterAboveForeach is too restrictive and doesn't handle 
 project * correctly
 --

 Key: PIG-1567
 URL: https://issues.apache.org/jira/browse/PIG-1567
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0


 The FilterAboveForeach rule optimizes the plan by pushing a filter above the 
 previous foreach operator. However, during code review, two major problems 
 were found:
 1. The current implementation assumes that if no projection is found in the 
 filter condition then all columns from foreach are projected. This issue 
 prevents the following optimization:
   A = LOAD 'file.txt' AS (a(u,v), b, c);
   B = FOREACH A GENERATE $0, b;
   C = FILTER B BY 8  5;
   STORE C INTO 'empty';
 2. The current implementation doesn't handle the * projection, which means 
 projecting all columns. As a result, it cannot optimize the following:
   A = LOAD 'file.txt' AS (a(u,v), b, c);
   B = FOREACH A GENERATE $0, b;
   C = FILTER B BY Identity.class.getName(*)  5;
   STORE C INTO 'empty';
   




[jira] Assigned: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1570:
---

Assignee: Thejas M Nair

 native mapreduce operator MR job does not follow same failure handling logic 
 as other pig MR jobs
 -

 Key: PIG-1570
 URL: https://issues.apache.org/jira/browse/PIG-1570
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


 The code path for handling failure in MR job corresponding to native MR is 
 different and does not have the same behavior.
 For example, even if the MR job for mapreduce operator fails, the number of 
 jobs that failed is being reported as 0 in PigStats log.




[jira] Assigned: (PIG-1572) change default datatype when relations are used as scalar to bytearray

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1572:
---

Assignee: Thejas M Nair

 change default datatype when relations are used as scalar to bytearray
 --

 Key: PIG-1572
 URL: https://issues.apache.org/jira/browse/PIG-1572
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


 When relations are cast to scalar, the current default type is chararray. 
 This is inconsistent with the behavior in the rest of Pig Latin.




[jira] Resolved: (PIG-1567) Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly

2010-08-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved PIG-1567.
--

Resolution: Duplicate

Duplicate of PIG-1568.

 Optimization rule FilterAboveForeach is too restrictive and doesn't handle 
 project * correctly
 --

 Key: PIG-1567
 URL: https://issues.apache.org/jira/browse/PIG-1567
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0





[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-08-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903634#action_12903634
 ] 

Dmitriy V. Ryaboy commented on PIG-1150:


I won't have time before the 30th. 

BTW one doesn't even need a udf if using the sum of squares approach.. :-) just 
generate the square and the sum in the foreach (it will perform the algebraic 
decomposition automatically)

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: var.patch





[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903636#action_12903636
 ] 

Dmitriy V. Ryaboy commented on PIG-1563:


Sounds good.  Should we just merge in the amazon contrib for some of these?

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch





[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903637#action_12903637
 ] 

Olga Natkovich commented on PIG-1150:
-

So should we unlink this from the release?

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: var.patch





[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903640#action_12903640
 ] 

Olga Natkovich commented on PIG-1563:
-

Which JIRA is that?

I will just get this in. I think that's all I have time for today, but I can look 
at the other one as well next week.

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch





[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-08-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903643#action_12903643
 ] 

Dmitriy V. Ryaboy commented on PIG-1150:


Yeah I think it's not a big deal if we are splitting piggybank out soon anyway.

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: var.patch





[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903644#action_12903644
 ] 

Dmitriy V. Ryaboy commented on PIG-1563:


Olga, the amazon contrib is PIG-1565

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch





[jira] Updated: (PIG-1150) VAR() Variance UDF

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1150:


Fix Version/s: 0.9.0
   (was: 0.8.0)

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.9.0

 Attachments: var.patch





[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1512:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

This is already fixed in the latest code. Thanks Swati!

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.




[jira] Updated: (PIG-1321) Logical Optimizer: Merge cascading foreach

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1321:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed. Thanks Xuefu!

 Logical Optimizer: Merge cascading foreach
 --

 Key: PIG-1321
 URL: https://issues.apache.org/jira/browse/PIG-1321
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1321-2.patch, jira-1321-3.patch, pig-1321.patch


 We can merge consecutive foreach statements.
 Eg:
 b = foreach a generate a0#'key1' as b0, a0#'key2' as b1, a1;
 c = foreach b generate b0#'kk1', b0#'kk2', b1, a1;
 => c = foreach a generate a0#'key1'#'kk1', a0#'key1'#'kk2', a0#'key2', a1;




[jira] Updated: (PIG-1321) Logical Optimizer: Merge cascading foreach

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1321:


Attachment: jira-1321-3.patch

Reposting the pre-conditions:
1. There are two consecutive foreach statements.
2. The second foreach statement is a simple inner plan in which the only 
statement is a GENERATE statement. In other words, the second foreach statement 
must be something like FOREACH A GENERATE 
3. The first foreach statement cannot contain flatten, due to its complexity.
4. No output of the first foreach is referenced more than once in the second 
foreach, eg: B = foreach ; C = foreach B generate $0, $1, $0 will not be merged. 
The reason is that if we merge, $0 will be calculated twice, which defeats the 
benefit of merging.
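
Pre-condition 4 can be illustrated with a minimal Java sketch. It assumes the second foreach's GENERATE list has already been reduced to positional references (so $0, $1, $0 becomes the list 0, 1, 0). The class and method names are hypothetical and are not taken from the patch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ForeachMergeCheck {
    /**
     * Returns true when every column of the first foreach's output is
     * referenced at most once by the second foreach's GENERATE list.
     * The list holds positional references, so $0, $1, $0 becomes [0, 1, 0].
     */
    static boolean eachInputUsedAtMostOnce(List<Integer> projectedColumns) {
        Set<Integer> seen = new HashSet<>();
        for (Integer column : projectedColumns) {
            // Set.add returns false if the column was already referenced.
            if (!seen.add(column)) {
                return false;
            }
        }
        return true;
    }
}
```

With this check, the list for "generate $0, $1, $0" is rejected, while the list for "generate $0, $1" is accepted.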

All tests pass. test-patch result:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

 Logical Optimizer: Merge cascading foreach
 --

 Key: PIG-1321
 URL: https://issues.apache.org/jira/browse/PIG-1321
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1321-2.patch, jira-1321-3.patch, pig-1321.patch


 We can merge consecutive foreach statements.
 Eg:
 b = foreach a generate a0#'key1' as b0, a0#'key2' as b1, a1;
 c = foreach b generate b0#'kk1', b0#'kk2', b1, a1;
 => c = foreach a generate a0#'key1'#'kk1', a0#'key1'#'kk2', a0#'key2', a1;




[jira] Updated: (PIG-1515) Migrate logical optimization rule: PushDownForeachFlatten

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1515:


Attachment: jira-1515-2.patch

All tests pass. 

test-patch result:

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.


 Migrate logical optimization rule: PushDownForeachFlatten
 -

 Key: PIG-1515
 URL: https://issues.apache.org/jira/browse/PIG-1515
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1515-1.patch, jira-1515-2.patch







[jira] Updated: (PIG-1515) Migrate logical optimization rule: PushDownForeachFlatten

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1515:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed. Thanks Xuefu!

 Migrate logical optimization rule: PushDownForeachFlatten
 -

 Key: PIG-1515
 URL: https://issues.apache.org/jira/browse/PIG-1515
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1515-1.patch, jira-1515-2.patch







[jira] Updated: (PIG-1399) Logical Optimizer: Expression optimizor rule

2010-08-27 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1399:
--

Attachment: PIG-1399.patch

Addressed the review comments, except that I did not split this into several 
optimization rules, since the order in which the rules are applied is 
significant.

 Logical Optimizer: Expression optimizor rule
 

 Key: PIG-1399
 URL: https://issues.apache.org/jira/browse/PIG-1399
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, 
 PIG-1399.patch, PIG-1399.patch


 We can optimize expression in several ways:
 1. Constant pre-calculation
 Example:
 B = filter A by a0 > 5+7;
 => B = filter A by a0 > 12;
 2. Boolean expression optimization
 Example:
 B = filter A by not (not(a0>5) or a>10);
 => B = filter A by a0>5 and a<=10;
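
The boolean rewrite in item 2 can be read as pushing "not" down to the comparisons using De Morgan's laws. The following is a minimal Java sketch of that idea for the expression not (not(a0 > 5) or a > 10). The expression classes are hypothetical stand-ins and are not Pig's logical expression operators.

```java
public class BooleanSimplifier {
    /** Base type for the tiny expression tree used in this sketch. */
    interface Expr {}

    /** A comparison such as a0 > 5. */
    static final class Cmp implements Expr {
        final String field; final String op; final int value;
        Cmp(String field, String op, int value) {
            this.field = field; this.op = op; this.value = value;
        }
        @Override public String toString() { return field + " " + op + " " + value; }
    }

    /** Logical negation of a sub-expression. */
    static final class Not implements Expr {
        final Expr child;
        Not(Expr child) { this.child = child; }
        @Override public String toString() { return "not (" + child + ")"; }
    }

    /** Binary "and" / "or". */
    static final class Bin implements Expr {
        final String op; final Expr left; final Expr right;
        Bin(String op, Expr left, Expr right) {
            this.op = op; this.left = left; this.right = right;
        }
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    /** Returns the comparison operator that expresses the negated test. */
    static String negateOp(String op) {
        switch (op) {
            case ">":  return "<=";
            case "<=": return ">";
            case "<":  return ">=";
            case ">=": return "<";
            case "==": return "!=";
            case "!=": return "==";
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    /** Rewrites the expression so that no Not node remains. */
    static Expr simplify(Expr e) { return rewrite(e, false); }

    private static Expr rewrite(Expr e, boolean negated) {
        if (e instanceof Not) {
            // A negation flips the pending flag; double negation cancels out.
            return rewrite(((Not) e).child, !negated);
        }
        if (e instanceof Cmp) {
            Cmp c = (Cmp) e;
            return negated ? new Cmp(c.field, negateOp(c.op), c.value) : c;
        }
        Bin b = (Bin) e;
        // De Morgan: not (x or y) == (not x) and (not y), and vice versa.
        String op = negated ? (b.op.equals("and") ? "or" : "and") : b.op;
        return new Bin(op, rewrite(b.left, negated), rewrite(b.right, negated));
    }
}
```

Tracking a single "negated" flag during the traversal avoids building intermediate Not nodes.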




[jira] Resolved: (PIG-529) Want support for loading CSV files

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-529.


Resolution: Duplicate

This is a duplicate of PIG-1555, which has been resolved for Pig 0.8.

 Want support for loading CSV files
 --

 Key: PIG-529
 URL: https://issues.apache.org/jira/browse/PIG-529
 Project: Pig
  Issue Type: New Feature
  Components: data
Reporter: Tom White

 Want to be able to load CSV data into Pig. This needs to handle quoting 
 correctly.




[jira] Resolved: (PIG-771) PigDump does not properly output Chinese UTF8 characters - they are displayed as question marks ??

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-771.


Fix Version/s: 0.7.0
   Resolution: Fixed

PigDump is no longer supported

 PigDump does not properly output Chinese UTF8 characters - they are displayed 
 as question marks ??
 --

 Key: PIG-771
 URL: https://issues.apache.org/jira/browse/PIG-771
 Project: Pig
  Issue Type: Bug
Reporter: David Ciemiewicz
 Fix For: 0.7.0


 PigDump does not properly output Chinese UTF8 characters.
 The reason for this is that the function Tuple.toString() is called.
 DefaultTuple implements Tuple.toString() and it calls Object.toString() on 
 the opaque object d.
 Instead, I think that the code should be changed to call the new 
 DataType.toString() function.
 {code}
 @Override
 public String toString() {
     StringBuilder sb = new StringBuilder();
     sb.append('(');
     for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
         Object d = it.next();
         if (d != null) {
             if (d instanceof Map) {
                 sb.append(DataType.mapToString((Map<Object, Object>) d));
             } else {
                 sb.append(DataType.toString(d));  // Change this one line
                 if (d instanceof Long) {
                     sb.append("L");
                 } else if (d instanceof Float) {
                     sb.append("F");
                 }
             }
         } else {
             sb.append("");
         }
         if (it.hasNext())
             sb.append(",");
     }
     sb.append(')');
     return sb.toString();
 }
 {code}




[jira] Commented: (PIG-1482) Pig gets confused when more than one loader is involved

2010-08-27 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903714#action_12903714
 ] 

Thejas M Nair commented on PIG-1482:


Patch review comments -
- Schema.java
{code}
public FieldSchema(String a, Schema s, byte t) throws FrontendException {
    alias = a;
    schema = s;
    log.debug("t: " + t + " Bag: " + DataType.BAG + " tuple: " + DataType.TUPLE);

    /*
     * The following check is removed because it may not be always true.
     * As a matter of fact, the condition can be produced using other
     * constructors anyway.
     *
    if ((null != s) && !(DataType.isSchemaType(t))) {
        int errCode = 1020;
        throw new FrontendException("Only a BAG or TUPLE can have schemas. Got "
                + DataType.findTypeName(t), errCode, PigException.INPUT);
    }
    */
{code}
I think some other code paths might be relying on this constructor for error 
checking. It would be safer to create another constructor with a check boolean 
argument 
{code}
public FieldSchema(String a, Schema s, byte t, boolean innerTypeCheck)  
{code}
 and call that from above constructor and from FieldSchema.copyAndLink(..)


- In LOStream.java.getSchema() mIsSchemaComputed is used to keep track of 
whether the fieldschema parents have been set.
I think it will be better to use a different variable for the purpose - it will 
be more readable, and also not likely to break any assumptions people are 
likely to make about this variable that is from the LogicalOperator class.

- TypeCheckingVisitor.java: insertCastForUDF is called on the input of a UDF; it 
seems the same logic should be used for other expressions as well (instead of 
insertCast(..)). Also, insertCastForUDF(..) and insertCast(..) differ in only 
two lines, so we can share the rest of the code.
{code}
private void insertCastForUDF(LOUserFunc udf,
        FieldSchema fromFS, FieldSchema toFs, ExpressionOperator predecessor) {

    toFs.setParent(fromFS.canonicalName, predecessor);
    insertCast(udf, toFs.type, toFs, predecessor);
}
{code}

- TypeCheckingVisitor.java: in visit(LOCast), it seems like we just pick any of 
the matching predecessor load functions. Shouldn't we check whether all the 
FuncSpecs returned are the same?
{code}
for (Map.Entry<String, LogicalOperator> entry : canonicalMap.entrySet()) {
    FuncSpec loadFuncSpec = getLoadFuncSpec(entry.getValue(), entry.getKey());
    cast.setLoadFuncSpec(loadFuncSpec);
}
{code}

- LOProject.java
the commented line can be removed -
{code}
+// mFieldSchema.setParent(fs.canonicalName, expressionOperator);
{code}
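
The constructor suggestion in the first review comment (keep the check by default, and let selected callers opt out through a boolean) could look roughly like the Java sketch below. It is self-contained, so the class, the type constants, and the exception type are hypothetical stand-ins rather than Pig's FieldSchema, DataType, or FrontendException.

```java
public class FieldSchemaSketch {
    // Stand-in type codes for this sketch only (assumed values).
    static final byte INTEGER = 10;
    static final byte TUPLE = 110;
    static final byte BAG = 120;

    final String alias;
    final Object schema; // stand-in for a nested Schema
    final byte type;

    /** Existing three-argument form: keeps the check for current callers. */
    FieldSchemaSketch(String alias, Object schema, byte type) {
        this(alias, schema, type, true);
    }

    /** New form: callers that build schemas of other types pass false. */
    FieldSchemaSketch(String alias, Object schema, byte type, boolean innerTypeCheck) {
        if (innerTypeCheck && schema != null && !isSchemaType(type)) {
            throw new IllegalArgumentException(
                    "Only a BAG or TUPLE can have schemas. Got type code " + type);
        }
        this.alias = alias;
        this.schema = schema;
        this.type = type;
    }

    /** True for the types that are allowed to carry a nested schema here. */
    static boolean isSchemaType(byte t) {
        return t == TUPLE || t == BAG;
    }
}
```

Delegating from the three-argument constructor keeps the error checking for any code path that already relies on it.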






 Pig gets confused when more than one loader is involved
 ---

 Key: PIG-1482
 URL: https://issues.apache.org/jira/browse/PIG-1482
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1482-final.patch, jira-1482-final.patch, 
 jira-1482-final.patch


 In case of two relations being loaded using different loader, joined, grouped 
 and projected, pig gets confused in trying to find appropriate loader for the 
 requested cast. Consider the following script :-
 A = LOAD 'data1' USING PigStorage() AS (s, m, l);
 B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3;
 C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 
 :0) as v3:int;
 D = LOAD 'data2' USING TextLoader() AS (a);
 E = JOIN C BY v1, D BY a USING 'replicated';
 F = GROUP E BY (v1, a);
 G = FOREACH F GENERATE (chararray)group.v1, group.a;
 
 dump G;
 This throws the error, stack trace of which is in the next comment




[jira] Created: (PIG-1576) Difference in Semantics between Load statement in Pig and HDFS client on Command line

2010-08-27 Thread Viraj Bhat (JIRA)
Difference in Semantics between Load statement in Pig and HDFS client on 
Command line
-

 Key: PIG-1576
 URL: https://issues.apache.org/jira/browse/PIG-1576
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0, 0.6.0
Reporter: Viraj Bhat


Here is my directory structure on HDFS which I want to access using Pig. 
This is a sample; in the real use case I have more than 100 of these 
directories.
{code}
$ hadoop fs -ls /user/viraj/recursive/
Found 3 items
drwxr-xr-x   - viraj supergroup  0 2010-08-26 11:25 
/user/viraj/recursive/20080615
drwxr-xr-x   - viraj supergroup  0 2010-08-26 11:25 
/user/viraj/recursive/20080616
drwxr-xr-x   - viraj supergroup  0 2010-08-26 11:25 
/user/viraj/recursive/20080617
{code}
Using the command line, I can access them using a variety of options:
{code}
$ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
-rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 
/user/viraj/recursive/20080615/kv2.txt
-rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 
/user/viraj/recursive/20080616/kv2.txt
-rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 
/user/viraj/recursive/20080617/kv2.txt

$ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/

-rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 
/user/viraj/recursive/20080615/kv2.txt

-rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 
/user/viraj/recursive/20080616/kv2.txt

-rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 
/user/viraj/recursive/20080617/kv2.txt
{code}

I have written a Pig script, but neither of the load statements below 
works:
{code}
--A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
AL = limit A 10;
dump AL;
{code}

I get the following error in Pig 0.8:
{noformat}
2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil 
- 1 map reduce job(s) failed!
2010-08-27 16:34:27,711 [main] INFO  org.apache.pig.tools.pigstats.PigStats - 
Script Statistics: 
HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
0.20.2  0.8.0-SNAPSHOT  viraj   2010-08-27 16:34:24 2010-08-27 16:34:27 
LIMIT
Failed!
Failed Jobs:
JobId   Alias   Feature Message Outputs
N/A A,ALMessage: 
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to 
create input splits for: /user/viraj/recursive/{20080615..20080617}/
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at 
org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at 
org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input 
Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 
0 files
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
... 7 more
hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
{noformat}

The following works:
{code}
A = load '/user/viraj/recursive/{200806}{15,16,17}/' using PigStorage('\u0001') 
as (k:int, v:chararray);
AL = limit A 10;
dump AL;
{code}

Why is there an inconsistency between HDFS client and Pig?
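
One likely explanation, stated here as an assumption and not confirmed in this thread, is that the shell expands the {15..17} sequence before "hadoop fs" runs, whereas Pig hands the pattern unchanged to Hadoop's glob matching, which understands comma alternation such as {15,16,17} but not ".." ranges. Under that assumption, a script generator could rewrite ranges into the comma form that is reported to work. The Java sketch below does this; the class name is hypothetical, and it handles only ascending numeric ranges without zero padding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlobRangeExpander {
    // Matches a numeric sequence such as {15..17}.
    private static final Pattern RANGE = Pattern.compile("\\{(\\d+)\\.\\.(\\d+)\\}");

    /** Rewrites every {lo..hi} in the glob into {lo,lo+1,...,hi}. */
    static String expandRanges(String glob) {
        Matcher m = RANGE.matcher(glob);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            long lo = Long.parseLong(m.group(1));
            long hi = Long.parseLong(m.group(2));
            if (lo > hi) {
                throw new IllegalArgumentException("descending range: " + m.group());
            }
            StringBuilder alternatives = new StringBuilder("{");
            for (long v = lo; v <= hi; v++) {
                if (v > lo) {
                    alternatives.append(',');
                }
                alternatives.append(v);
            }
            alternatives.append('}');
            // quoteReplacement guards against '$' or '\' in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(alternatives.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Applied to the failing path, this yields /user/viraj/recursive/{20080615,20080616,20080617}/, which has the same form as the load statement that works.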

Viraj




[jira] Commented: (PIG-1564) add support for multiple filesystems

2010-08-27 Thread Andrew Hitchcock (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903721#action_12903721
 ] 

Andrew Hitchcock commented on PIG-1564:
---

Hi all,

I think this patch is still useful. With current Pig trunk you can't CD between 
different filesystems. Example:

grunt> pwd
hdfs://ip-10-218-57-248.ec2.internal:9000/user/hadoop
grunt> cd s3://anhi-test-data/
2010-08-27 23:53:10,522 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2999: Unexpected internal error. This file system object 
(hdfs://ip-10-218-57-248.ec2.internal:9000) does not support access to the 
request path 's3://anhi-test-data/' You possibly called FileSystem.get(conf) 
when you should of called FileSystem.get(uri, conf) to obtain a file system 
supporting your path.
Details at logfile: /home/hadoop/pig_1282952081120.log

This patch fixes that issue.
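
The error message points at the underlying idea: a file system object is tied to one URI, so a target on another file system needs its own object, obtained through FileSystem.get(uri, conf) as the message says. The patch itself is not shown in this thread. As a rough, Hadoop-free illustration, the Java sketch below decides whether a cd target lives on a different file system by comparing scheme and authority. The class and method names are hypothetical, and it does not model Hadoop's handling of default authorities.

```java
import java.net.URI;
import java.util.Objects;

public class FsTargetCheck {
    /**
     * Returns true when the target names a file system other than the
     * current one, judged by URI scheme and authority only.
     */
    static boolean needsDifferentFileSystem(URI current, URI target) {
        if (target.getScheme() == null) {
            // A relative or scheme-less path stays on the current file system.
            return false;
        }
        return !target.getScheme().equalsIgnoreCase(current.getScheme())
                || !Objects.equals(target.getAuthority(), current.getAuthority());
    }
}
```

For the session above, the check reports that s3://anhi-test-data/ is on a different file system than the HDFS working directory.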

Andrew

 add support for multiple filesystems
 

 Key: PIG-1564
 URL: https://issues.apache.org/jira/browse/PIG-1564
 Project: Pig
  Issue Type: Improvement
Reporter: Andrew Hitchcock
 Attachments: PIG-1564-1.patch


 Currently you can't run Pig scripts that read data from one file system and 
 write it to another. Also, Grunt doesn't support CDing from one directory to 
 another on different file systems.




[jira] Updated: (PIG-1399) Logical Optimizer: Expression optimizor rule

2010-08-27 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1399:
--

Attachment: PIG-1399.patch

rebased on the latest trunk.

 Logical Optimizer: Expression optimizor rule
 

 Key: PIG-1399
 URL: https://issues.apache.org/jira/browse/PIG-1399
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, 
 PIG-1399.patch, PIG-1399.patch, PIG-1399.patch


 We can optimize expression in several ways:
 1. Constant pre-calculation
 Example:
 B = filter A by a0 > 5+7;
 => B = filter A by a0 > 12;
 2. Boolean expression optimization
 Example:
 B = filter A by not (not(a0>5) or a>10);
 => B = filter A by a0>5 and a<=10;




[jira] Commented: (PIG-1564) add support for multiple filesystems

2010-08-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903728#action_12903728
 ] 

Dmitriy V. Ryaboy commented on PIG-1564:


Andrew, does 'fs -cd s3://anhi-test-data/' work?

The cd command is also deprecated (though not marked as such) :)

 add support for multiple filesystems
 

 Key: PIG-1564
 URL: https://issues.apache.org/jira/browse/PIG-1564
 Project: Pig
  Issue Type: Improvement
Reporter: Andrew Hitchcock
 Attachments: PIG-1564-1.patch


 Currently you can't run Pig scripts that read data from one file system and 
 write it to another. Also, Grunt doesn't support CDing from one directory to 
 another on different file systems.




[jira] Updated: (PIG-1482) Pig gets confused when more than one loader is involved

2010-08-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1482:
-

Status: Open  (was: Patch Available)

 Pig gets confused when more than one loader is involved
 ---

 Key: PIG-1482
 URL: https://issues.apache.org/jira/browse/PIG-1482
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1482-final-1.patch, jira-1482-final.patch, 
 jira-1482-final.patch, jira-1482-final.patch


 In case of two relations being loaded using different loader, joined, grouped 
 and projected, pig gets confused in trying to find appropriate loader for the 
 requested cast. Consider the following script :-
 A = LOAD 'data1' USING PigStorage() AS (s, m, l);
 B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3;
 C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 
 :0) as v3:int;
 D = LOAD 'data2' USING TextLoader() AS (a);
 E = JOIN C BY v1, D BY a USING 'replicated';
 F = GROUP E BY (v1, a);
 G = FOREACH F GENERATE (chararray)group.v1, group.a;
 
 dump G;
 This throws the error, stack trace of which is in the next comment




[jira] Updated: (PIG-1482) Pig gets confused when more than one loader is involved

2010-08-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1482:
-

Attachment: jira-1482-final-1.patch

 Pig gets confused when more than one loader is involved
 ---

 Key: PIG-1482
 URL: https://issues.apache.org/jira/browse/PIG-1482
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1482-final-1.patch, jira-1482-final.patch, 
 jira-1482-final.patch, jira-1482-final.patch


 In case of two relations being loaded using different loader, joined, grouped 
 and projected, pig gets confused in trying to find appropriate loader for the 
 requested cast. Consider the following script :-
 A = LOAD 'data1' USING PigStorage() AS (s, m, l);
 B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3;
 C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 
 :0) as v3:int;
 D = LOAD 'data2' USING TextLoader() AS (a);
 E = JOIN C BY v1, D BY a USING 'replicated';
 F = GROUP E BY (v1, a);
 G = FOREACH F GENERATE (chararray)group.v1, group.a;
 
 dump G;
 This throws the error, stack trace of which is in the next comment




[jira] Updated: (PIG-1482) Pig gets confused when more than one loader is involved

2010-08-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1482:
-

Status: Patch Available  (was: Open)

Updated the patch based on the review comments.

Regarding the second-to-last comment above: the map should contain only one 
entry. Before the result is obtained, an exception is thrown whenever two 
different load FuncSpecs are found. It was done that way before.
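
The behaviour described here (fail as soon as two different load functions are seen, otherwise return the single one) can be sketched in Java as follows. Function specs are represented as plain strings for brevity. The class and method names are hypothetical and are not the code in the patch.

```java
import java.util.Collection;

public class LoadFuncSpecCheck {
    /**
     * Returns the single load function shared by all predecessors, or null
     * when there are none. Throws as soon as two different specs are seen.
     */
    static String uniqueLoadFuncSpec(Collection<String> specs) {
        String chosen = null;
        for (String spec : specs) {
            if (chosen == null) {
                chosen = spec;
            } else if (!chosen.equals(spec)) {
                throw new IllegalStateException(
                        "Conflicting load functions: " + chosen + " vs " + spec);
            }
        }
        return chosen;
    }
}
```

With this shape, a cast fed by both PigStorage and TextLoader fails early instead of silently picking one of them.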


 Pig gets confused when more than one loader is involved
 ---

 Key: PIG-1482
 URL: https://issues.apache.org/jira/browse/PIG-1482
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1482-final-1.patch, jira-1482-final.patch, 
 jira-1482-final.patch, jira-1482-final.patch


 In case of two relations being loaded using different loader, joined, grouped 
 and projected, pig gets confused in trying to find appropriate loader for the 
 requested cast. Consider the following script :-
 A = LOAD 'data1' USING PigStorage() AS (s, m, l);
 B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3;
 C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 
 :0) as v3:int;
 D = LOAD 'data2' USING TextLoader() AS (a);
 E = JOIN C BY v1, D BY a USING 'replicated';
 F = GROUP E BY (v1, a);
 G = FOREACH F GENERATE (chararray)group.v1, group.a;
 
 dump G;
 This throws the error, stack trace of which is in the next comment




[jira] Commented: (PIG-1564) add support for multiple filesystems

2010-08-27 Thread Andrew Hitchcock (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903733#action_12903733
 ] 

Andrew Hitchcock commented on PIG-1564:
---

Nope:

grunt> fs -cd s3://anhi-test-data/  
cd: Unknown command


Does that require a specific version of Hadoop to work (since it appears to be 
sending the call to Hadoop code)?

 add support for multiple filesystems
 

 Key: PIG-1564
 URL: https://issues.apache.org/jira/browse/PIG-1564
 Project: Pig
  Issue Type: Improvement
Reporter: Andrew Hitchcock
 Attachments: PIG-1564-1.patch


 Currently you can't run Pig scripts that read data from one file system and 
 write it to another. Also, Grunt doesn't support CDing from one directory to 
 another on different file systems.




[jira] Created: (PIG-1577) support to variable number of arguments in UDF

2010-08-27 Thread Olga Natkovich (JIRA)
support to variable number of arguments in UDF
--

 Key: PIG-1577
 URL: https://issues.apache.org/jira/browse/PIG-1577
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
 Fix For: 0.9.0


In the current implementation, the functionality that maps arguments to 
classes does not support functions with a variable number of arguments. It also 
does not support functions that accept a varying but bounded number of 
arguments. 

This causes problems for string UDFs such as CONCAT, which can take an arbitrary 
number of arguments, or TRIM, which can take 1, 2, or 3 arguments.
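
One possible shape for such matching, offered only as a sketch and not as a design agreed in this issue, is a signature that may mark its last parameter as repeating. In the Java example below, types are plain strings and all names are hypothetical.

```java
import java.util.List;

public class UdfSignature {
    final List<String> paramTypes; // declared parameter types, in order
    final boolean varArgs;         // true if the last parameter may repeat

    UdfSignature(List<String> paramTypes, boolean varArgs) {
        if (varArgs && paramTypes.isEmpty()) {
            throw new IllegalArgumentException("varargs needs a parameter type");
        }
        this.paramTypes = paramTypes;
        this.varArgs = varArgs;
    }

    /** Returns true if the actual argument types fit this signature. */
    boolean accepts(List<String> argTypes) {
        // With varargs, the last parameter may appear zero or more times.
        int fixed = varArgs ? paramTypes.size() - 1 : paramTypes.size();
        if (argTypes.size() < fixed || (!varArgs && argTypes.size() != fixed)) {
            return false;
        }
        for (int i = 0; i < argTypes.size(); i++) {
            // Past the declared list, arguments are matched against the
            // last (repeating) parameter.
            String expected = paramTypes.get(Math.min(i, paramTypes.size() - 1));
            if (!expected.equals(argTypes.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

In this model, CONCAT would be a single varargs signature, while TRIM with 1, 2, or 3 arguments would be three fixed signatures tried in turn.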




[jira] Updated: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1563:


Attachment: PIG_1563_v2.patch

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1531) Pig gobbles up error messages

2010-08-27 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1531:
---

Attachment: PIG_1531_2.patch

I have tried to accommodate all the recommendations from Ashutosh. I have 
changed the existing test case to validate the error message in the case where 
the store directory exists. Writing a test case for the case where the input 
file does not exist was more effort than the actual fix, so I verified it 
manually and the results looked good.
Thanks
Niraj

 Pig gobbles up error messages
 -

 Key: PIG-1531
 URL: https://issues.apache.org/jira/browse/PIG-1531
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: PIG_1531.patch, PIG_1531_2.patch


 Consider the following. I have my own Storer implementing StoreFunc and I am 
 throwing FrontEndException (and other Exceptions derived from PigException) 
 in its various methods. I expect those error messages to be shown in error 
 scenarios. Instead Pig gobbles up my error messages and shows its own generic 
 error message like: 
 {code}
 010-07-31 14:14:25,414 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2116: Unexpected error. Could not validate the output specification for: 
 default.partitoned
 Details at logfile: /Users/ashutosh/workspace/pig/pig_1280610650690.log
 {code}
 Instead I expect it to display my error messages which it stores away in that 
 log file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903744#action_12903744
 ] 

Olga Natkovich commented on PIG-1563:
-

Uploaded a new patch which does the following:

(1) Adds a mapping function for functions with a fixed number of arguments: 
SUBSTRING, LAST_INDEX_OF, REPLACE, TRIM
(2) Leaves the rest of the functions alone, which means that until 0.9 they 
will only work on typed data. CONCAT is in the same category
(3) Re-uses the applicable tests that Dmitriy created, thanks!
(4) Adds a couple of e2e tests to make sure that we test the mapping function 
as well

Please review. 

We will keep the issue open until we address (2) in 0.9.



 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903753#action_12903753
 ] 

Dmitriy V. Ryaboy commented on PIG-1563:


+1

question/comment -- any reason you discarded the new buildSimpleFuncSpec I 
wrote in the first iteration of this patch? I think it simplifies the code:

{code}
funcList.add(Utils.buildSimpleFuncSpec(
  this.getClass().getName(), DataType.CHARARRAY, DataType.CHARARRAY));
{code}

vs
{code}
Schema s = new Schema();
s.add(new Schema.FieldSchema(null, DataType.CHARARRAY));
s.add(new Schema.FieldSchema(null, DataType.CHARARRAY));
funcList.add(new FuncSpec(this.getClass().getName(), s));
{code}

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


Attachment: PIG-1178-8.patch

PIG-1178-8.patch fixes TestPruneColumn.testMapKey3

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, PIG-1178-5.patch, PIG-1178-6.patch, 
 PIG-1178-7.patch, PIG-1178-8.patch, pig_1178.patch, pig_1178.patch, 
 PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, 
 pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those decisions and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Updated: (PIG-1343) pig_log file missing even though Main tells it is creating one and an M/R job fails

2010-08-27 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1343:
---

Status: Patch Available  (was: Open)

 pig_log file missing even though Main tells it is creating one and an M/R job 
 fails 
 

 Key: PIG-1343
 URL: https://issues.apache.org/jira/browse/PIG-1343
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: 1343.patch, PIG-1343-1.patch, pig_1343_2.patch, 
 pig_1343_4.patch, PIG_1343_5.patch


 There is a particular case where I was running with the latest trunk of Pig.
 {code}
 $java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig
 [main] INFO  org.apache.pig.Main - Logging error messages to: 
 /homes/viraj/pig_1263420012601.log
 $ls -l pig_1263420012601.log
 ls: pig_1263420012601.log: No such file or directory
 {code}
 The job failed and the log file did not contain anything; the only way to 
 debug was to look into the JobTracker logs.
 Here are some possible causes of this behavior:
 1) The underlying filer/NFS had some issues. In that case, should we not 
 report an error on stdout?
 2) There are some errors from the backend which are not being captured.
 Viraj
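
A minimal sketch of the check suggested in point 1, assuming the fix is to verify that the log file can actually be created before telling the user errors are logged there, and to report on stderr otherwise. The class and method names are hypothetical and this is not the code in the attached patches.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LogFileCheckSketch {
    // Tries to create (or open) the log file for appending. Returns true on
    // success; on failure the reason goes to stderr so the user is not
    // pointed at a file that does not exist.
    static boolean ensureWritable(Path logFile) {
        try {
            Files.write(logFile, new byte[0],
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return true;
        } catch (IOException e) {
            System.err.println("Cannot write log file " + logFile + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pig-log-sketch");
        // Writable location: the file is created.
        System.out.println(ensureWritable(dir.resolve("pig_example.log")));
        // Parent directory does not exist: the failure is reported on stderr.
        System.out.println(ensureWritable(dir.resolve("missing-dir").resolve("pig_example.log")));
    }
}
```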




[jira] Updated: (PIG-1562) Fix the version for the dependent packages for the maven

2010-08-27 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1562:
---

Status: Patch Available  (was: Open)

 Fix the version for the dependent packages for the maven 
 -

 Key: PIG-1562
 URL: https://issues.apache.org/jira/browse/PIG-1562
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: PIG_1562_0.patch


 We need to fix the version setting so that the version is properly set for 
 the dependent packages in the Maven repository.




[jira] Updated: (PIG-1531) Pig gobbles up error messages

2010-08-27 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1531:
---

Status: Patch Available  (was: Open)

 Pig gobbles up error messages
 -

 Key: PIG-1531
 URL: https://issues.apache.org/jira/browse/PIG-1531
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: PIG_1531.patch, PIG_1531_2.patch


 Consider the following. I have my own Storer implementing StoreFunc and I am 
 throwing FrontEndException (and other Exceptions derived from PigException) 
 in its various methods. I expect those error messages to be shown in error 
 scenarios. Instead Pig gobbles up my error messages and shows its own generic 
 error message like: 
 {code}
 010-07-31 14:14:25,414 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2116: Unexpected error. Could not validate the output specification for: 
 default.partitoned
 Details at logfile: /Users/ashutosh/workspace/pig/pig_1280610650690.log
 {code}
 Instead I expect it to display my error messages which it stores away in that 
 log file.
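
One way to surface the storer's own message is to walk the exception's cause chain before printing the generic error. The following is a minimal sketch under that assumption; the class name, method name and sample messages are hypothetical, and it uses plain JDK exceptions rather than PigException or FrontEndException.

```java
class RootCauseSketch {
    // Walks the cause chain and returns the innermost non-empty message,
    // falling back to the outermost message when no cause carries one.
    static String innermostMessage(Throwable t) {
        String message = t.getMessage();
        for (Throwable c = t.getCause(); c != null; c = c.getCause()) {
            if (c.getMessage() != null && !c.getMessage().isEmpty()) {
                message = c.getMessage();
            }
        }
        return message;
    }

    public static void main(String[] args) {
        // Hypothetical user-level error wrapped by a generic framework error.
        Throwable userError = new IllegalStateException("partition column missing");
        Throwable generic = new RuntimeException(
                "Unexpected error. Could not validate the output specification", userError);
        System.out.println(innermostMessage(generic));
    }
}
```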
