[jira] Created: (PIG-1641) Incorrect counters in local mode

2010-09-22 Thread Ashutosh Chauhan (JIRA)
Incorrect counters in local mode


 Key: PIG-1641
 URL: https://issues.apache.org/jira/browse/PIG-1641
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan


User report, not verified.

email

HadoopVersion  PigVersion  UserId  StartedAt  FinishedAt  Features
0.20.2  0.8.0-SNAPSHOT  user  2010-09-21 19:25:58  2010-09-21 21:58:42  ORDER_BY

Success!

Job Stats (time in seconds):
JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
job_local_0001  0  0  0  0  0  0  0  0  raw  MAP_ONLY
job_local_0002  0  0  0  0  0  0  0  0  rank_sort  SAMPLER
job_local_0003  0  0  0  0  0  0  0  0  rank_sort  ORDER_BY  Processed/user_visits_table,

Input(s):
Successfully read 0 records from: Data/Raw/UserVisits.dat

Output(s):
Successfully stored 0 records in: Processed/user_visits_table


However, when I look in the output:

$ ls -lh Processed/user_visits_table/CG0/
total 15250760
-rwxrwxrwx  1 user  _lpoperator   7.3G Sep 21 21:58 part-0*

It read a 20G input file and generated some output...

/email

Are counters simply not available in local mode? If so, instead of printing 
zeros we should print "Information Unavailable" or something similar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1636) Scalar fail if the scalar variable is generated by limit

2010-09-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913714#action_12913714
 ] 

Daniel Dai commented on PIG-1636:
-

test-patch result:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

All tests pass.

 Scalar fail if the scalar variable is generated by limit
 

 Key: PIG-1636
 URL: https://issues.apache.org/jira/browse/PIG-1636
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1636-1.patch


 The following script fails:
 {code}
 a = load 'studenttab10k' as (name: chararray, age: int, gpa: float);
 b = group a all;
 c = foreach b generate SUM(a.age) as total;
 c1= limit c 1;
 d = foreach a generate name, age/(double)c1.total as d_sum;
 store d into '111';
 {code}
 The problem is that d holds a reference to c1. In the optimizer, we push the 
 limit before the foreach; d still references the limit, and we get the wrong 
 schema for the scalar.




[jira] Resolved: (PIG-1636) Scalar fail if the scalar variable is generated by limit

2010-09-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1636.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

 Scalar fail if the scalar variable is generated by limit
 

 Key: PIG-1636
 URL: https://issues.apache.org/jira/browse/PIG-1636
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1636-1.patch





[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913733#action_12913733
 ] 

Olga Natkovich commented on PIG-1632:
-

Hi Eli, thanks for the patch.

I don't think this is the approach we want to take. I think we should publish 
just the core Pig jar in Maven, since users have a way to pull the dependencies. 
However, as part of our release package we should include the bundled pig.jar so 
that it works for users out of the box and they get exactly the versions we have 
been testing with. I am fine with additionally including the core jar if 
we do not do so already.

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch


 The core jar in the tarball contains the kitchen sink; it is not the same core 
 jar that ant jar builds. This is problematic because other projects that want to 
 depend on the Pig core jar want only Pig core, but 
 pig-0.8.0-SNAPSHOT-core.jar in the tarball contains a bunch of other stuff 
 (hadoop, com.google, commons, etc.) that may conflict with packages already 
 on a user's classpath.
 {noformat}
 pig1 (trunk)$ jar tvf build/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l
 12
 pig1 (trunk)$ tar xvzf build/pig-0.8.0-SNAPSHOT.tar.gz
 ...
 pig1 (trunk)$ jar tvf pig-0.8.0-SNAPSHOT/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l
 4819
 {noformat}
 How about restricting the core jar to just Pig classes?




[jira] Commented: (PIG-1641) Incorrect counters in local mode

2010-09-22 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913736#action_12913736
 ] 

Richard Ding commented on PIG-1641:
---

Hadoop counters are not available in local mode (PIG-1286).

So for now I propose that, in local mode,  Pig stats output is changed to 
something like the following:

{code} 
Job Stats (time in seconds):
JobId  Alias Feature Outputs
job_local_0001 raw MAP_ONLY
job_local_0002 rank_sort SAMPLER
job_local_0003 rank_sort ORDER_BY Processed/user_visits_table,

Input(s):
Successfully read records from: Data/Raw/UserVisits.dat

Output(s):
Successfully stored records in: Processed/user_visits_table
{code}
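
The proposal above could be sketched roughly as follows. This is only an illustration, not Pig's actual stats code: the class RecordSummary and its methods are hypothetical names, and the sketch assumes the record count is passed in as an optional value that is empty when Hadoop counters are unavailable (as in local mode).

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of Pig: formats the Input(s)/Output(s) lines.
final class RecordSummary {
    private RecordSummary() {}

    // Omits the count entirely when no counter value is available,
    // instead of printing a misleading "0".
    static String readLine(OptionalLong records, String location) {
        return records.isPresent()
                ? "Successfully read " + records.getAsLong() + " records from: " + location
                : "Successfully read records from: " + location;
    }

    static String storedLine(OptionalLong records, String location) {
        return records.isPresent()
                ? "Successfully stored " + records.getAsLong() + " records in: " + location
                : "Successfully stored records in: " + location;
    }

    public static void main(String[] args) {
        // Local mode: counters unavailable, so no number is printed.
        System.out.println(readLine(OptionalLong.empty(), "Data/Raw/UserVisits.dat"));
        System.out.println(storedLine(OptionalLong.empty(), "Processed/user_visits_table"));
    }
}
```

Passing the count as an optional value keeps "unknown" distinct from a genuine zero, which is the confusion the original report ran into.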

 Incorrect counters in local mode
 

 Key: PIG-1641
 URL: https://issues.apache.org/jira/browse/PIG-1641
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan




[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913738#action_12913738
 ] 

Eli Collins commented on PIG-1632:
--

Hey Olga,

Thanks for the feedback. Agreed that we want the out-of-box experience to use 
the same versions of other jars we've been testing with, but shouldn't that 
happen by bundling the necessary jars in, e.g., the lib directory rather than 
embedding all the jars inside the core Pig jar?

If people want all the dependencies bundled into a single jar, how about I 
update the patch so the release has two jars: a pig.jar which is like the 
current one (has all the other jars bundled in) and a pig-core.jar which just 
has pig?

Thanks,
Eli 

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch





[jira] Assigned: (PIG-1641) Incorrect counters in local mode

2010-09-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-1641:
-

Assignee: Richard Ding

 Incorrect counters in local mode
 

 Key: PIG-1641
 URL: https://issues.apache.org/jira/browse/PIG-1641
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Richard Ding




[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913743#action_12913743
 ] 

Olga Natkovich commented on PIG-1632:
-

I am fine with your second proposal, which is what I also suggested in my last 
comment. The first one makes it harder for users to compile their UDFs.

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch





[jira] Updated: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated PIG-1632:
-

Attachment: pig-1632-2.patch

Great. Patch attached. I verified the tarball produced by ant tar includes both 
a core jar that is just pig core and a pig jar that has everything. 

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch, pig-1632-2.patch





[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913759#action_12913759
 ] 

Olga Natkovich commented on PIG-1632:
-

+1, patch looks good. I will commit it to trunk and the 0.8 branch shortly.

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch, pig-1632-2.patch





[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913788#action_12913788
 ] 

Olga Natkovich commented on PIG-1632:
-

Patch committed to both the 0.8 branch and trunk. Thanks, Eli, for contributing!

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch, pig-1632-2.patch





[jira] Assigned: (PIG-1632) The core jar in the tarball contains the kitchen sink

2010-09-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1632:
---

Assignee: Eli Collins

 The core jar in the tarball contains the kitchen sink 
 --

 Key: PIG-1632
 URL: https://issues.apache.org/jira/browse/PIG-1632
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: site, 0.9.0

 Attachments: pig-1632-1.patch, pig-1632-2.patch





[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism

2010-09-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1642:
--

Summary: Order by doesn't use estimation to determine the parallelism  
(was: Order by doesn't use estimation to determine the paralelism)

 Order by doesn't use estimation to determine the parallelism
 

 Key: PIG-1642
 URL: https://issues.apache.org/jira/browse/PIG-1642
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Richard Ding
 Fix For: 0.8.0


 With PIG-1249, a simple heuristic is used to determine the number of reducers 
 if it isn't specified (via PARALLEL or default_parallel). For an ORDER BY 
 statement, however, the number of reducers still defaults to 1.




[jira] Created: (PIG-1642) Order by doesn't use estimation to determine the paralelism

2010-09-22 Thread Richard Ding (JIRA)
Order by doesn't use estimation to determine the paralelism
---

 Key: PIG-1642
 URL: https://issues.apache.org/jira/browse/PIG-1642
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Richard Ding
 Fix For: 0.8.0


With PIG-1249, a simple heuristic is used to determine the number of reducers 
if it isn't specified (via PARALLEL or default_parallel). For an ORDER BY 
statement, however, the number of reducers still defaults to 1.
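
For readers unfamiliar with this kind of estimate, here is a minimal sketch of a size-based reducer heuristic. It is not the PIG-1249 code: the sketch assumes the heuristic divides the total input size by a per-reducer byte budget and caps the result, and the class name, parameter names, and the 1 GB / 999 values used in main are illustrative only.

```java
// Hypothetical sketch of a size-based reducer estimate; not Pig's implementation.
final class ReducerEstimate {
    private ReducerEstimate() {}

    // Returns ceil(totalInputBytes / bytesPerReducer), clamped to [1, maxReducers].
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (totalInputBytes < 0 || bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("sizes and limits must be positive");
        }
        // Ceiling division written without an addition that could overflow.
        long needed = totalInputBytes / bytesPerReducer
                + (totalInputBytes % bytesPerReducer == 0 ? 0 : 1);
        needed = Math.max(1L, needed);
        return (int) Math.min(needed, (long) maxReducers);
    }

    public static void main(String[] args) {
        long oneGb = 1L << 30;
        // A 20 GB input with an assumed 1 GB budget per reducer and a cap of 999.
        System.out.println(estimate(20 * oneGb, oneGb, 999));
    }
}
```

Under these assumptions, the issue described above amounts to the ORDER BY job skipping such an estimate and always using one reducer.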




[jira] Commented: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913842#action_12913842
 ] 

Thejas M Nair commented on PIG-1643:


In case of replicated join, the error was - 
java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.setUpHashMap(POFRJoin.java:343)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNext(POFRJoin.java:212)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)


 join fails for a query with input having 'load using pigstorage without 
 schema' + 'foreach'
 ---

 Key: PIG-1643
 URL: https://issues.apache.org/jira/browse/PIG-1643
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


 {code}
 l1 = load 'std.txt';
 l2 = load 'std.txt'; 
 f1 = foreach l1 generate $0 as abc, $1 as  def;
 -- j =  join f1 by $0, l2 by $0 using 'replicated';
 -- j =  join l2 by $0, f1 by $0 using 'replicated';
 j =  join l2 by $0, f1 by $0 ;
 dump j;
 {code}
 the error -
 {code}
 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type
 {code}
 The MR plan from explain  -
 {code}
 #--
 # Map Reduce Plan  
 #--
 MapReduce node scope-21
 Map Plan
 Union[tuple] - scope-22
 |
 |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
 |   |   |
 |   |   Project[bytearray][0] - scope-12
 |   |
 |   |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0
 |
 |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
 |   |
 |   Project[NULL][0] - scope-14
 |
 |---f1: New For Each(false,false)[bag] - scope-6
 |   |
 |   Project[bytearray][0] - scope-2
 |   |
 |   Project[bytearray][1] - scope-4
 |
 |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1
 Reduce Plan
 j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
 |
 |---POJoinPackage(true,true)[tuple] - scope-23
 Global sort: false
 
 {code}




FW: ASF Board Meeting Summary - 22 September 2010

2010-09-22 Thread Olga Natkovich
Dear Pig Users and Developers,

The ASF board just voted for Pig to become a TLP. Please see the board notes 
below. Over the next several weeks we will be moving our infrastructure out of 
Hadoop. You can keep track of the progress by following this JIRA:

https://issues.apache.org/jira/browse/INFRA-3005.

Please let me know if you have any questions.

Olga

-Original Message-
From: Doug Cutting [mailto:cutt...@apache.org] 
Sent: Wednesday, September 22, 2010 1:34 PM
To: committ...@apache.org
Subject: ASF Board Meeting Summary - 22 September 2010

The board met today, 22 September.

The following directors were present:

   Shane Curcuru
   Doug Cutting
   Bertrand Delacretaz
   Roy T. Fielding
   Jim Jagielski
   Geir Magnusson, Jr.
   Sam Ruby
   Noirin Shirley
   Greg Stein

the following officers were present:

   Philip M. Gollucci
   Craig L Russell

and the following guest was present:

   Les Hazlewood

All of the received reports to the board were approved.

The following reports were not received and are expected next month:

   Status report for the Apache ServiceMix Project
   Status report for the Apache Xalan Project
   Status report for the Apache XMLBeans Project

The following resolutions were passed unanimously:

   A. Establish the Apache Pig project
   B. Establish the Apache Hive project
   C. Establish Apache Shiro Project

The next board meeting is scheduled to occur on the 20 October.

Doug



[jira] Resolved: (PIG-1603) dependency created by 'relation as scalar' not captured in graph

2010-09-22 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-1603.


Resolution: Won't Fix

This bug has been resolved in PIG-1605.

 dependency created by 'relation as scalar' not captured in graph
 

 Key: PIG-1603
 URL: https://issues.apache.org/jira/browse/PIG-1603
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1603.1.patch, PIG-1603.2.patch


 The LogicalOperator that has a ReadScalar UDF has a dependency on the 
 relation that provides the input to scalar variables. But this is not 
 captured in the graph representation, and as a result DependencyOrderWalker 
 does not traverse the graph in the real dependency order.
 The test case TestFRJoin2.testConcatenateJobForScalar3 fails as a result of 
 this issue. (It has been commented out for now.)




[jira] Commented: (PIG-1479) Embed Pig in scripting languages

2010-09-22 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913859#action_12913859
 ] 

Julien Le Dem commented on PIG-1479:


Using the file extension requires a registration mechanism (or a hard-coded 
list), so if it is supported it would be nice to be able to provide the class 
name of the scripting implementation as well.
I would like to use my own implementation of the scripting engine (say, for 
JavaScript) by specifying the class name on the command line.
This would be similar to the mechanism for including UDFs:
http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
{quote}
Register 'test.py' using org.apache.pig.scripting.jython.JythonScriptEngine as myfuncs;
{quote}

 Embed Pig in scripting languages
 

 Key: PIG-1479
 URL: https://issues.apache.org/jira/browse/PIG-1479
 Project: Pig
  Issue Type: New Feature
Reporter: Julien Le Dem
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1479.patch, PIG-1479_2.patch, pig-greek-test.tar, 
 pig-greek-test.tar, pig-greek.tgz


 It should be possible to embed Pig calls in a scripting language and make 
 functions defined in the same script available as UDFs.
 This is a spin-off of https://issues.apache.org/jira/browse/PIG-928, which 
 lets users define UDFs in scripting languages.




[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF

2010-09-22 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1639:
-

Attachment: jira-1639-1.patch

 New logical plan: PushUpFilter should not optimize if filter condition 
 contains UDF
 ---

 Key: PIG-1639
 URL: https://issues.apache.org/jira/browse/PIG-1639
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1639-1.patch


 The following script fails:
 {code}
 a = load 'file' AS (f1, f2, f3);
 b = group a by f1;
 c = filter b by COUNT(a) > 1;
 dump c;
 {code}




[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF

2010-09-22 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1639:
-

Status: Patch Available  (was: Open)

 New logical plan: PushUpFilter should not optimize if filter condition 
 contains UDF
 ---

 Key: PIG-1639
 URL: https://issues.apache.org/jira/browse/PIG-1639
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1639-1.patch





[jira] Updated: (PIG-1628) log this message at debug level : 'Pig Internal storage in use'

2010-09-22 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1628:
---

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to 0.8 branch and trunk.

 log this message at debug level : 'Pig Internal storage in use'
 ---

 Key: PIG-1628
 URL: https://issues.apache.org/jira/browse/PIG-1628
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1628.1.patch


 The temporary storage functions log at the INFO level. This should change 
 to the DEBUG level, because these messages reduce the visibility of more 
 useful INFO messages. They include 'Pig Internal storage in use' from 
 InterStorage and 'TFile storage in use' from TFileStorage.




[jira] Created: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-22 Thread Daniel Dai (JIRA)
New logical plan: Plan.connect with position is misused in some places
--

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


When we replace, remove, or insert a node, we use the disconnect/connect methods 
of OperatorPlan. When we disconnect an edge, we should save the position of the 
edge at both its origin and its destination, and reuse those positions when 
connecting to the new predecessor/successor. Some of the patterns are:

Insert a new node:
{code}
Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
plan.connect(pred, pos.first, newnode, 0);
plan.connect(newnode, 0, succ, pos.second);
{code}

Remove a node:
{code}
Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
plan.connect(pred, pos1.first, succ, pos2.second);
{code}

Replace a node:
{code}
Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
plan.connect(pred, pos1.first, newNode, pos1.second);
plan.connect(newNode, pos2.first, succ, pos2.second);
{code}
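
To illustrate why the saved positions matter, here is a minimal, self-contained 
sketch. It is not Pig's OperatorPlan: PlanSketch, its string-named nodes, its 
Pair holder, and the insertBetween/removeNode helpers are all hypothetical 
stand-ins, assuming only that a plan keeps ordered predecessor and successor 
lists per node. It applies the insert and remove patterns to a node with two 
inputs, where input order is significant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy plan with ordered edge lists. Hypothetical; NOT Pig's OperatorPlan. */
class PlanSketch {
    /** Hypothetical pair holder exposing first/second fields. */
    public static final class Pair<A, B> {
        public final A first;
        public final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
    }

    private final Map<String, List<String>> succs = new HashMap<>();
    private final Map<String, List<String>> preds = new HashMap<>();

    private static List<String> edges(Map<String, List<String>> m, String node) {
        return m.computeIfAbsent(node, k -> new ArrayList<>());
    }

    /** Appends an edge at the end of both ordered lists. */
    public void connect(String from, String to) {
        edges(succs, from).add(to);
        edges(preds, to).add(from);
    }

    /** Inserts an edge at explicit positions in both ordered lists. */
    public void connect(String from, int fromPos, String to, int toPos) {
        edges(succs, from).add(fromPos, to);
        edges(preds, to).add(toPos, from);
    }

    /** Removes an edge and reports the position it held on each side. */
    public Pair<Integer, Integer> disconnect(String from, String to) {
        int fromPos = edges(succs, from).indexOf(to);
        int toPos = edges(preds, to).indexOf(from);
        if (fromPos < 0 || toPos < 0) {
            throw new IllegalArgumentException("no edge " + from + " -> " + to);
        }
        edges(succs, from).remove(fromPos);
        edges(preds, to).remove(toPos);
        return new Pair<>(fromPos, toPos);
    }

    public List<String> successors(String node) { return edges(succs, node); }
    public List<String> predecessors(String node) { return edges(preds, node); }

    /** Insert pattern: newNode takes over the slots the old edge occupied. */
    public void insertBetween(String pred, String newNode, String succ) {
        Pair<Integer, Integer> pos = disconnect(pred, succ);
        connect(pred, pos.first, newNode, 0);
        connect(newNode, 0, succ, pos.second);
    }

    /** Remove pattern: pred and succ are rejoined at the saved positions. */
    public void removeNode(String pred, String node, String succ) {
        Pair<Integer, Integer> pos1 = disconnect(pred, node);
        Pair<Integer, Integer> pos2 = disconnect(node, succ);
        connect(pred, pos1.first, succ, pos2.second);
    }

    public static void main(String[] args) {
        PlanSketch plan = new PlanSketch();
        plan.connect("left", "join");
        plan.connect("right", "join");
        plan.insertBetween("left", "filter", "join");
        System.out.println(plan.predecessors("join")); // [filter, right]
        plan.removeNode("left", "filter", "join");
        System.out.println(plan.predecessors("join")); // [left, right]
    }
}
```

If insertBetween used the appending connect(from, to) instead, the first 
predecessor list in this sketch would come out as [right, filter], silently 
swapping the two inputs of "join".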

There are a couple of places where we do not follow this pattern, which results 
in errors. For example, the following script fails:
{code}
a = load '1.txt' as (a0, a1, a2, a3);
b = foreach a generate a0, a1, a2;
store b into 'aaa';
c = order b by a2;
d = foreach c generate a2;
store d into 'bbb';
{code}




[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs

2010-09-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913869#action_12913869
 ] 

Alan Gates commented on PIG-1565:
-

I'll review this patch.

 additional piggybank datetime and string UDFs
 -

 Key: PIG-1565
 URL: https://issues.apache.org/jira/browse/PIG-1565
 Project: Pig
  Issue Type: Improvement
Reporter: Andrew Hitchcock
Assignee: Andrew Hitchcock
 Fix For: 0.8.0

 Attachments: PIG-1565-1.patch, PIG-1565-2.patch


 Pig is missing a variety of UDFs that might be helpful for users implementing 
 Pig scripts.




[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: (was: PIG-1644-1.patch)

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch






[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: PIG-1644-1.patch

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch






[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: PIG-1644-1.patch

Attached a patch that addresses all such places in the new logical plan, except 
for ExpressionSimplifier. Work already underway on ExpressionSimplifier 
([PIG-1635|https://issues.apache.org/jira/browse/PIG-1635]) includes some of 
these changes, and I don't want to conflict with that patch. After PIG-1635, we 
may also review the connect/disconnect usage in ExpressionSimplifier.

 New logical plan: Plan.connect with position is misused in some places
 --

 Key: PIG-1644
 URL: https://issues.apache.org/jira/browse/PIG-1644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1644-1.patch






[jira] Created: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash

2010-09-22 Thread Yan Zhou (JIRA)
Using both small split combination and temporary file compression on a query of 
ORDER BY may cause crash


 Key: PIG-1645
 URL: https://issues.apache.org/jira/browse/PIG-1645
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.8.0


The stack looks like the following:

java.lang.NullPointerException
    at java.util.Arrays.binarySearch(Arrays.java:2043)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:314)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
