[jira] Created: (PIG-837) docs ant target is broken

2009-06-04 Thread Giridharan Kesavan (JIRA)
docs ant target is broken 
--

 Key: PIG-837
 URL: https://issues.apache.org/jira/browse/PIG-837
 Project: Pig
  Issue Type: Bug
Reporter: Giridharan Kesavan


The docs ant target is broken; this will fail the trunk builds.

 [exec] Java Result: 1
 [exec] 
 [exec]   Copying broken links file to site root.
 [exec]   
 [exec] Copying 1 file to /home/hudson/hudson-slave/workspace/Pig-Patch-minerva.apache.org/trunk/src/docs/build/site
 [exec] 
 [exec] BUILD FAILED
 [exec] /home/nigel/tools/forrest/latest/main/targets/site.xml:180: Error building site.
 [exec] 
 [exec] There appears to be a problem with your site build.
 [exec] 
 [exec] Read the output above:
 [exec] * Cocoon will report the status of each document:
 [exec] - in column 1: *=okay X=brokenLink ^=pageSkipped (see FAQ).
 [exec] * Even if only one link is broken, you will still get "failed".
 [exec] * Your site would still be generated, but some pages would be broken.
 [exec]   - See /home/hudson/hudson-slave/workspace/Pig-Patch-minerva.apache.org/trunk/src/docs/build/site/broken-links.xml
 [exec] 
 [exec] Total time: 28 seconds
BUILD FAILED
/home/hudson/hudson-slave/workspace/Pig-Patch-minerva.apache.org/trunk/build.xml:326: exec returned: 1
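
The Forrest output above says that Cocoon marks each document in column 1 with `*` (okay), `X` (broken link), or `^` (page skipped). A minimal Python sketch for pulling the broken entries out of a saved console log might look like the following. The function name and the sample lines are hypothetical, and the layout of real Cocoon status lines is an assumption; only the column-1 markers are taken from the log.

```python
# Hypothetical helper: classify Cocoon status lines by their column-1 marker.
# Only the marker meanings (*, X, ^) come from the build output; the rest of
# the line layout is assumed for illustration.
MARKERS = {"*": "okay", "X": "brokenLink", "^": "pageSkipped"}


def classify_status_lines(lines):
    """Return a dict mapping status name -> list of line remainders."""
    result = {name: [] for name in MARKERS.values()}
    for raw in lines:
        line = raw.strip()
        # Ant prefixes forwarded output with "[exec]"; drop it if present.
        if line.startswith("[exec]"):
            line = line[len("[exec]"):].strip()
        if line and line[0] in MARKERS:
            result[MARKERS[line[0]]].append(line[1:].strip())
    return result


# Made-up sample lines, not taken from the failing build.
sample = [
    " [exec] * index.html",
    " [exec] X quickstart.html",
    " [exec] ^ api/index.html",
    " [exec] Total time: 28 seconds",
]
status = classify_status_lines(sample)
print(status["brokenLink"])
```

Note that the explanatory bullets in the Forrest output also begin with `*`, so a filter for a real log would need a tighter pattern than the first character alone.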

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-Patch-minerva.apache.org #72

2009-06-04 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/72/changes

Changes:

[olga] PIG-813: documentation updates (chandec via olgan)

[pradeepkth] PIG-796: support conversion from numeric types to chararray (Ashutosh Chauhan via pradeepkth)

--
started
Building remotely on minerva.apache.org (Ubuntu)
Updating http://svn.apache.org/repos/asf/hadoop/pig/trunk
U test/org/apache/pig/test/TestPOCast.java
C CHANGES.txt
U src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java
U src/org/apache/pig/data/DataType.java
D src/docs/src/documentation/content/xdocs/quickstart.xml
U src/docs/src/documentation/content/xdocs/site.xml
U src/docs/src/documentation/content/xdocs/index.xml
U src/docs/src/documentation/content/xdocs/piglatin.xml
Fetching 'http://svn.apache.org/repos/asf/hadoop/core/nightly/test-patch' at -1 into 'http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/trunk/test/bin'
At revision 781914
At revision 781914
no change for http://svn.apache.org/repos/asf/hadoop/core/nightly/test-patch since the previous build
[Pig-Patch-minerva.apache.org] $ /bin/bash /tmp/hudson7154531927977690732.sh
/home/hudson/tools/java/latest1.6/bin/java
Buildfile: build.xml

check-for-findbugs:

findbugs.check:

java5.check:

forrest.check:

hudson-test-patch:
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Testing patch for PIG-765.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 
 [exec] Reverted 'CHANGES.txt'
 [exec] 
 [exec] Fetching external item into 'test/bin'
 [exec] A    test/bin/test-patch.sh
 [exec] Updated external to revision 781914.
 [exec] 
 [exec] Updated to revision 781914.
 [exec] PIG-765 patch is being downloaded at Thu Jun  4 22:48:13 PDT 2009 from
 [exec] http://issues.apache.org/jira/secure/attachment/12409932/pig-765.patch
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Pre-building trunk to determine trunk number
 [exec] of release audit, javac, and Findbugs warnings.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 
 [exec] /home/hudson/tools/ant/latest/bin/ant  -Djava5.home=/home/hudson/tools/java/latest1.5 -Dforrest.home=/home/nigel/tools/forrest/latest -DPigPatchProcess= releaseaudit > http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/patchprocess/trunkReleaseAuditWarnings.txt 2>&1
 [exec] /home/hudson/tools/ant/latest/bin/ant  -Djavac.args=-Xlint -Xmaxwarns 1000 -Declipse.home=/home/nigel/tools/eclipse/latest -Djava5.home=/home/hudson/tools/java/latest1.5 -Dforrest.home=/home/nigel/tools/forrest/latest -DPigPatchProcess= clean tar > http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/patchprocess/trunkJavacWarnings.txt 2>&1
 [exec] Trunk compilation is broken?
 [exec]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Finished build.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 
 [exec]                                  Dload  Upload   Total   Spent    Left  Speed
 [exec]   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 [exec]   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

BUILD FAILED
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/trunk/build.xml:653: exec returned: 1

Total time: 1 minute 45 seconds
Recording test results
Description found: PIG-765



[jira] Created: (PIG-836) Allow setting of end-of-record delimiter in PigStorage

2009-06-04 Thread George Mavromatis (JIRA)
Allow setting of end-of-record delimiter in PigStorage
--

 Key: PIG-836
 URL: https://issues.apache.org/jira/browse/PIG-836
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: George Mavromatis
 Fix For: 0.2.0


PigStorage allows overriding the default field delimiter ('\t'), but does not 
allow overriding the record delimiter ('\n').

It is a valid use case for fields to contain newlines, e.g. because they hold 
the contents of a document or web page. A user can write a custom load/store 
UDF to handle this, but that is extra work, many users would have to repeat it, 
and the UDF would duplicate the PigStorage code exactly except for the 
delimiter.

Thus, PigStorage() should allow configuring both the field and record separators.
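
To make the request concrete, here is a minimal Python sketch of a loader that takes both delimiters as parameters. It is not PigStorage code: the function `split_records`, its parameter names, and the choice of `\x1e` as the record delimiter are all hypothetical and only illustrate the behaviour being asked for.

```python
# Sketch only: this is not PigStorage code. split_records and its parameter
# names are hypothetical and just illustrate configurable delimiters.
def split_records(data, field_delim="\t", record_delim="\n"):
    """Split raw text into records, then split each record into fields."""
    records = data.split(record_delim)
    # A trailing record delimiter leaves one empty string at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    return [record.split(field_delim) for record in records]


# The second field contains a newline (e.g. page contents), so records are
# terminated with "\x1e" instead of "\n" in this made-up input.
raw = "doc1\tline one\nline two\x1edoc2\tsingle line\x1e"
rows = split_records(raw, field_delim="\t", record_delim="\x1e")
print(rows)
```

With the default record delimiter, the embedded newline in the first record would split it in two, which is the problem the issue describes.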




[jira] Updated: (PIG-765) to implement jdiff

2009-06-04 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-765:
---

Attachment: pig-765.patch

This jdiff patch was created after resolving the author tag issue mentioned in 
PIG-806.

> to implement jdiff
> --
>
> Key: PIG-765
> URL: https://issues.apache.org/jira/browse/PIG-765
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Giridharan Kesavan
>Assignee: Giridharan Kesavan
> Attachments: pig-765.patch, pig-765.patch, pig-765.patch
>
>





[jira] Updated: (PIG-765) to implement jdiff

2009-06-04 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-765:
---

Status: Patch Available  (was: In Progress)

> to implement jdiff
> --
>
> Key: PIG-765
> URL: https://issues.apache.org/jira/browse/PIG-765
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Giridharan Kesavan
>Assignee: Giridharan Kesavan
> Attachments: pig-765.patch, pig-765.patch, pig-765.patch
>
>





[jira] Updated: (PIG-765) to implement jdiff

2009-06-04 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-765:
---

Status: In Progress  (was: Patch Available)

> to implement jdiff
> --
>
> Key: PIG-765
> URL: https://issues.apache.org/jira/browse/PIG-765
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Giridharan Kesavan
>Assignee: Giridharan Kesavan
> Attachments: pig-765.patch, pig-765.patch, pig-765.patch
>
>





[jira] Commented: (PIG-833) Storage access layer

2009-06-04 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716471#action_12716471
 ] 

Jeff Hammerbacher commented on PIG-833:
---

Hey Hong,

I never mentioned SQL or an ecosystem in my comment, but thanks for your 
observation. I was simply referring to the existence of a fairly detailed 
discussion in a related subproject that the Pig team may not have been 
following. I'll add an additional one here: 
https://issues.apache.org/jira/browse/HIVE-279 addresses the predicate pushdown 
feature.

Regards,
Jeff 

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-833) Storage access layer

2009-06-04 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716470#action_12716470
 ] 

Hong Tang commented on PIG-833:
---

Jeff, just like the SQL effort, the space of columnar storage is also wide 
open, and I think it is more beneficial to the overall health of the hadoop 
ecosystem.

With that being said, I also looked at the patch attached to HIVE-352. It 
appears that what the patch does is a level below our stated objectives. 
Specifically, the guts of the implementation (RCFile) are very close in spirit 
to TFile as described in HADOOP-3315, which seems to have had its first 
comprehensive patch back in December 2008. 

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-06-04 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716463#action_12716463
 ] 

Jeff Hammerbacher commented on PIG-823:
---

Hey Olga,

Really looking forward to seeing more discussion on this issue. The NameNode 
already contains file metadata like ctime, mtime, the block list, permissions, 
etc. Will the proposed metadata service subsume those attributes as well? 
Curious to see the proposed design.

Thanks,
Jeff

> Hadoop Metadata Service
> ---
>
> Key: PIG-823
> URL: https://issues.apache.org/jira/browse/PIG-823
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>
> This JIRA is created to track development of a metadata system for  Hadoop. 
> The goal of the system is to allow users and applications to register data 
> stored on HDFS, search for the data available on HDFS, and associate metadata 
> such as schema, statistics, etc. with a particular data unit or a data set 
> stored on HDFS. The initial goal is to provide a fairly generic, low level 
> abstraction that any user or application on HDFS can use to store and retrieve 
> metadata. Over time, higher-level abstractions closely tied to particular 
> applications or tools can be developed.
> Over time, it would make sense for the metadata service to become a 
> subproject within Hadoop. For now, the proposal is to make it a contrib to 
> Pig since Pig SQL is likely to be the first user of the system.




[jira] Commented: (PIG-833) Storage access layer

2009-06-04 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716462#action_12716462
 ] 

Jeff Hammerbacher commented on PIG-833:
---

You may want to see the Hive project, where a columnar storage format has been 
developed and benchmarked: https://issues.apache.org/jira/browse/HIVE-352.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Created: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-04 Thread Pradeep Kamath (JIRA)
Multiquery optimization does not handle the case where the map keys in the 
split plans have different key types (tuple and non tuple key type)
--

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0


A query like the following results in an exception on execution:
{noformat}
a = load 'mult.input' as (name, age, gpa);
b = group a ALL;
c = foreach b generate group, COUNT(a);
store c into 'foo';
d = group a by (name, gpa);
e = foreach d generate flatten(group), MIN(a.age);
store e into 'bar';
{noformat}

Exception on execution:
09/06/04 16:56:11 INFO mapred.TaskInProgress: Error from attempt_200906041655_0001_r_00_3: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:312)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:248)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:238)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:320)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:288)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:268)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:142)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
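
One way to picture the failure: after the multiquery merge, a single reduce pipeline receives keys from both GROUP statements, a plain string for `group a ALL` and a tuple for `group a by (name, gpa)`. The Python sketch below mimics that mismatch and one conceivable remedy, wrapping non-tuple keys in a one-field tuple. It is an illustration under those assumptions, not Pig's implementation and not necessarily the eventual fix; all function names and sample keys are hypothetical.

```python
# Illustration only: models two merged split plans whose keys differ in type.
def project_first(key):
    """Stand-in for a projection that assumes every key is a tuple."""
    if not isinstance(key, tuple):
        # Plays the role of the ClassCastException in the trace above.
        raise TypeError(f"{type(key).__name__} cannot be used as a tuple key")
    return key[0]


def wrap_key(key):
    """Hypothetical remedy: give every key the same (tuple) type."""
    return key if isinstance(key, tuple) else (key,)


keys = ["all", ("alice", 3.5)]  # one made-up key per split plan

failed = False
try:
    [project_first(k) for k in keys]
except TypeError:
    failed = True

# After wrapping, the same projection works for both plans.
normalized = [project_first(wrap_key(k)) for k in keys]
print(failed, normalized)
```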






[jira] Updated: (PIG-831) Records and bytes written reported by pig are wrong in a multi-store program

2009-06-04 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-831:
---

Status: Patch Available  (was: Open)

> Records and bytes written reported by pig are wrong in a multi-store program
> 
>
> Key: PIG-831
> URL: https://issues.apache.org/jira/browse/PIG-831
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Minor
> Attachments: PIG-831.patch
>
>
> The stats features checked in as part of PIG-626 (reporting the number of 
> records and bytes written at the end of the query) print wrong values (often 
> but not always 0) when the pig script being run contains more than 1 store.




[jira] Created: (PIG-834) incorrect plan when algebraic functions are nested

2009-06-04 Thread Thejas M Nair (JIRA)
incorrect plan when algebraic functions are nested
--

 Key: PIG-834
 URL: https://issues.apache.org/jira/browse/PIG-834
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
Priority: Critical


a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that the Distinct UDF is missing from the combine and reduce stages. As a 
result, distinct does not function and incorrect results are produced.
Distinct should have been evaluated in all 3 stages, and the output of Distinct 
should be given to COUNT in the reduce stage.
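
To see why dropping Distinct from the later stages changes the answer, consider a Python sketch of a staged (map, then combine/reduce) evaluation. The data, the split into two map tasks, and the function names are hypothetical; the sketch models the general idea of merging partial results, not Pig's operators.

```python
# Sketch: values of one group, spread across two hypothetical map tasks.
map_inputs = [["a", "b", "a"], ["b", "c"]]


def count_with_distinct_in_map_only(splits):
    """Distinct runs per map task; later stages only add partial counts."""
    partial_counts = [len(set(split)) for split in splits]  # map stage
    return sum(partial_counts)                               # combine/reduce


def count_with_distinct_in_all_stages(splits):
    """Distinct is re-applied when the partial results are merged."""
    partial_sets = [set(split) for split in splits]          # map stage
    merged = set().union(*partial_sets)                      # combine/reduce
    return len(merged)                                       # COUNT at the end


print(count_with_distinct_in_map_only(map_inputs))
print(count_with_distinct_in_all_stages(map_inputs))
```

In this made-up input the value `b` appears in both map tasks, so summing per-task distinct counts counts it twice, while merging the sets first does not.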


# Map Reduce Plan  
#--
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
|   |
|   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
|   |
|   |---Project[bag][2] - 1-123
|   |
|   |---Project[bag][1] - 1-124
|   |
|   Project[bytearray][0] - 1-133
|
|---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
|
|---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
|   |
|   |---Project[bag][0] - 1-135
|   |
|   Project[bytearray][1] - 1-134
|
|---POCombinerPackage[tuple]{bytearray} - 1-137
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
|   |
|   |---Project[bag][0] - 1-136
|
|---POCombinerPackage[tuple]{bytearray} - 1-145
Global sort: false




[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2009-06-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-834:
--

Description: 
a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that the Distinct UDF is missing from the combine and reduce stages. As a 
result, distinct does not function and incorrect results are produced.
Distinct should have been evaluated in all 3 stages, and the output of Distinct 
should be given to COUNT in the reduce stage.

{code}
# Map Reduce Plan  
#--
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
|   |
|   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
|   |
|   |---Project[bag][2] - 1-123
|   |
|   |---Project[bag][1] - 1-124
|   |
|   Project[bytearray][0] - 1-133
|
|---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
|
|---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
|   |
|   |---Project[bag][0] - 1-135
|   |
|   Project[bytearray][1] - 1-134
|
|---POCombinerPackage[tuple]{bytearray} - 1-137
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
|   |
|   |---Project[bag][0] - 1-136
|
|---POCombinerPackage[tuple]{bytearray} - 1-145
Global sort: false
{code}

  was:
a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that Distinct udf is missing in Combiner and reduce stage. As a result 
distinct does not function, and incorrect results are produced.
Distinct should have been evaluated in the 3 stages and output of Distinct 
should be given to COUNT in reduce stage.


# Map Reduce Plan  
#--
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
|   |
|   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
|   |
|   |---Project[bag][2] - 1-123
|   |
|   |---Project[bag][1] - 1-124
|   |
|   Project[bytearray][0] - 1-133
|
|---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
|
|---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
|   |
|   |---Project[bag][0] - 1-135
|   |
|   Project[bytearray][1] - 1-134
|
|---POCombinerPackage[tuple]{bytearray} - 1-137
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
|   |
|   |---Project[bag][0] - 1-136
|
|---POCombinerPackage[tuple]{bytearray} - 1-145
Global sort: false


> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Priority: Critical
>
> a = load 'students.txt' as (c1,c2,c3,c4); 
> c = group a by c2;  
> f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));
> Notice that Distinct udf is missing in Combiner and reduce stage. As a result 
> distinct does not function, and incorrect results are produced.
> Distinct should have been evaluated in the 3 stages and output of Distinct 
> should be given to COUNT in reduce stage.
> {code}
> # Map Reduce Plan  
> #--
> MapReduce node 1-122
> Map Plan
> Local Rearrange[t

[jira] Commented: (PIG-831) Records and bytes written reported by pig are wrong in a multi-store program

2009-06-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716392#action_12716392
 ] 

Olga Natkovich commented on PIG-831:


+1 on the patch. Please keep the bug open, since at some point we should 
report the numbers correctly for multiquery.


> Records and bytes written reported by pig are wrong in a multi-store program
> 
>
> Key: PIG-831
> URL: https://issues.apache.org/jira/browse/PIG-831
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Minor
> Attachments: PIG-831.patch
>
>
> The stats features checked in as part of PIG-626 (reporting the number of 
> records and bytes written at the end of the query) print wrong values (often 
> but not always 0) when the pig script being run contains more than 1 store.




[jira] Commented: (PIG-817) Pig Docs for 0.3.0 Release

2009-06-04 Thread Corinne Chandel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716370#action_12716370
 ] 

Corinne Chandel commented on PIG-817:
-

Please delete this file (no longer in use): quickstart.xml

\Trunk\src\docs\src\documentation\content\xdocs\quickstart.xml

> Pig Docs for 0.3.0 Release
> --
>
> Key: PIG-817
> URL: https://issues.apache.org/jira/browse/PIG-817
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.3.0
>Reporter: Corinne Chandel
> Attachments: PIG-817-2.patch
>
>
> Update Pig docs for 0.3.0 release
> > Getting Started 
> > Pig Latin




[jira] Updated: (PIG-817) Pig Docs for 0.3.0 Release

2009-06-04 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-817:


Attachment: PIG-817-2.patch


Updated patch #2.

Apply this patch to: http://svn.apache.org/repos/asf/hadoop/pig/trunk

Note: No new test code; changes to documentation only.

> Pig Docs for 0.3.0 Release
> --
>
> Key: PIG-817
> URL: https://issues.apache.org/jira/browse/PIG-817
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.3.0
>Reporter: Corinne Chandel
> Attachments: PIG-817-2.patch
>
>
> Update Pig docs for 0.3.0 release
> > Getting Started 
> > Pig Latin




[jira] Issue Comment Edited: (PIG-817) Pig Docs for 0.3.0 Release

2009-06-04 Thread Corinne Chandel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712496#action_12712496
 ] 

Corinne Chandel edited comment on PIG-817 at 6/4/09 11:50 AM:
--

(1) PIG-817-2.patch - patch file



  was (Author: chandec):
(1) PIG_817.patch - patch file

(2) Doc-Build.zip - local doc build (for review)

(3) Doc-XML-Files - copies of the updated XML files (in case you need them)
  
> Pig Docs for 0.3.0 Release
> --
>
> Key: PIG-817
> URL: https://issues.apache.org/jira/browse/PIG-817
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.3.0
>Reporter: Corinne Chandel
> Attachments: PIG-817-2.patch
>
>
> Update Pig docs for 0.3.0 release
> > Getting Started 
> > Pig Latin




[jira] Created: (PIG-833) Storage access layer

2009-06-04 Thread Jay Tang (JIRA)
Storage access layer


 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang


A layer is needed to provide a high level data access abstraction and a tabular 
view of data in Hadoop, and could free Pig users from implementing their own 
data storage/retrieval code.  This layer should also include a columnar storage 
format in order to provide fast data projection, CPU/space-efficient data 
serialization, and a schema language to manage physical storage metadata.  
Eventually it could also support predicate pushdown for further performance 
improvement.  Initially, this layer could be a contrib project in Pig and 
become a hadoop subproject later on.





[jira] Updated: (PIG-796) support conversion from numeric types to chararray

2009-06-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-796:
---

   Resolution: Fixed
Fix Version/s: 0.3.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Patch committed - thanks for contributing, Ashutosh!

> support  conversion from numeric types to chararray
> ---
>
> Key: PIG-796
> URL: https://issues.apache.org/jira/browse/PIG-796
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Olga Natkovich
> Fix For: 0.3.0
>
> Attachments: 796.patch, pig-796.patch, pig-796.patch
>
>





[jira] Commented: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716336#action_12716336
 ] 

Hadoop QA commented on PIG-830:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12409730/pig-830-v3.patch
  against trunk revision 781599.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 27 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/71/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/71/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/71/console

This message is automatically generated.

> Port Apache Log parsing piggybank contrib to Pig 0.2
> 
>
> Key: PIG-830
> URL: https://issues.apache.org/jira/browse/PIG-830
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.2.0
>Reporter: Dmitriy V. Ryaboy
>Priority: Minor
> Attachments: pig-830-v2.patch, pig-830-v3.patch, pig-830.patch, 
> TEST-org.apache.pig.piggybank.test.storage.TestMyRegExLoader.txt
>
>
> The piggybank contribs (pig-472, pig-473,  pig-474, pig-476, pig-486, 
> pig-487, pig-488, pig-503, pig-509) got dropped after the types branch was 
> merged in.
> They should be updated to work with the current APIs and added back into 
> trunk.




[jira] Updated: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,' "

2009-06-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-564:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed

> Parameter Substitution using -param option does not seem to work when 
> parameters contain special characters such as +,=,-,?,' "
> ---
>
> Key: PIG-564
> URL: https://issues.apache.org/jira/browse/PIG-564
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Viraj Bhat
>Assignee: Olga Natkovich
> Attachments: PIG-564.patch
>
>
> Consider the following Pig script which uses parameter substitution
> {code}
> %default qual '/user/viraj'
> %default mydir 'mydir_myextraqual'
> VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
> dump VISIT_LOGS;
> {code}
> If you run the script as:
> ==
> java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
> -param mydir=mydir-myextraqual mypigparamsub.pig
> ==
> You get the following error:
> ==
> 2008-12-15 19:49:43,964 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - java.io.IOException: /user/viraj/mydir does not exist
> at 
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
> at 
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> at 
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
> at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
> at 
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> at 
> org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> at java.lang.Thread.run(Thread.java:619)
> java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job 
> terminated with anomalous status FAILED]
> at org.apache.pig.PigServer.openIterator(PigServer.java:389)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
> ... 6 more
> ==
> Also tried using:  -param mydir='mydir\-myextraqual'
> This behavior occurs if the parameter value contains characters such as +, =, 
> or ?.
> A workaround is to use a param_file that contains one name=value pair on 
> each line, with the value enclosed in quotes. For example, put
> mydir='mydir-myextraqual' in the file and then run the pig script as:
> java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main 
> -param_file myparamfile mypigparamsub.pig
> The following issues need to be fixed:
> 1) In the -param option, if the parameter value contains special characters, 
> it is truncated
> 2) In param_file, if param_value contains special characters, it should be 
> enclosed in quotes
> 3) If 2 is a known issue then it should be documented in 
> http://wiki.apache.org/pig/ParameterSubstitution
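
The truncation reported above (the load path becoming /user/viraj/mydir instead of /user/viraj/mydir-myextraqual) can be reproduced with a small sketch. This is not Pig's actual parser: the names parse_param_naive, parse_param_whole and substitute are hypothetical, and the assumption that the value is matched with a word-characters-only pattern is only one possible explanation of the symptom.

```python
import re

# Hypothetical pattern, NOT taken from Pig: the value part accepts only
# word characters, so matching stops at '-', '+', '=', '?', and so on.
NAIVE_PARAM = re.compile(r"^(\w+)=(\w+)")


def parse_param_naive(arg):
    """Parse 'name=value', cutting the value at the first non-word character."""
    match = NAIVE_PARAM.match(arg)
    if match is None:
        raise ValueError("not a name=value pair: %r" % arg)
    return match.group(1), match.group(2)


def parse_param_whole(arg):
    """Parse 'name=value' by splitting on the first '=' and keeping the rest."""
    name, sep, value = arg.partition("=")
    if not sep or not name:
        raise ValueError("not a name=value pair: %r" % arg)
    return name, value


def substitute(template, params):
    """Replace every $name in the template with its value from params."""
    return re.sub(r"\$(\w+)", lambda m: params[m.group(1)], template)


if __name__ == "__main__":
    arg = "mydir=mydir-myextraqual"
    for parse in (parse_param_naive, parse_param_whole):
        params = {"qual": "/user/viraj"}
        name, value = parse(arg)
        params[name] = value
        print(parse.__name__, "->", substitute("$qual/$mydir", params))
```

With the naive pattern the substituted path ends in mydir, which matches the path in the error message; splitting on the first '=' keeps the full value, including values that themselves contain '='.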




[jira] Created: (PIG-832) Make import list configurable

2009-06-04 Thread Olga Natkovich (JIRA)
Make import list configurable
-

 Key: PIG-832
 URL: https://issues.apache.org/jira/browse/PIG-832
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.3.0


Currently, it is hardwired in PigContext.




[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-830:
--

Status: Patch Available  (was: Open)





[jira] Updated: (PIG-830) Port Apache Log parsing piggybank contrib to Pig 0.2

2009-06-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-830:
--

Status: Open  (was: Patch Available)

trying to get Hudson to pick up the third patch.





[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,' "

2009-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716246#action_12716246
 ] 

Hudson commented on PIG-564:


Integrated in Pig-trunk #463 (See 
[http://hudson.zones.apache.org/hudson/job/Pig-trunk/463/])
: problem with parameter substitution and special characters (olgan)


