[jira] Created: (PIG-1199) help includes obsolete options

2010-01-20 Thread Olga Natkovich (JIRA)
help includes obsolete options
--

 Key: PIG-1199
 URL: https://issues.apache.org/jira/browse/PIG-1199
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
 Fix For: 0.7.0


This is confusing to users

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

2010-01-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803083#action_12803083
 ] 

Alan Gates commented on PIG-1197:
-

I'm ok with putting it in 0.6, as it is very localized and it is a significant 
performance boost.  If I don't hear any complaints over the next couple of days 
I'll check it in.

> TextLoader should be updated to match changes to PigStorage
> ---
>
> Key: PIG-1197
> URL: https://issues.apache.org/jira/browse/PIG-1197
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: PIG-1197.patch
>
>
> In 0.6 PigStorage was changed to use LineRecordReader to parse lines out of 
> its stream instead of doing the parsing itself.  This resulted in about a 30% 
> speed up in parsing time.  TextLoader should be changed to use 
> LineRecordReader in the same way to benefit from the same speed up.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: (was: singlereducestore.pig)

> StoreFunc UDF should ship to the backend automatically without "register"
> -
>
> Key: PIG-1189
> URL: https://issues.apache.org/jira/browse/PIG-1189
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: multimapstore.pig, multireducestore.pig, 
> PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig
>
>
> Pig should ship a store UDF to the backend even if the user does not use 
> "register". The prerequisite is that the UDF is on the classpath on the 
> frontend. We made that work for load UDFs in 
> (PIG-881|https://issues.apache.org/jira/browse/PIG-881); 
> we should do the same for store UDFs.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: multireducestore.pig
multimapstore.pig

> StoreFunc UDF should ship to the backend automatically without "register"
> -
>
> Key: PIG-1189
> URL: https://issues.apache.org/jira/browse/PIG-1189
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: multimapstore.pig, multireducestore.pig, 
> PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig
>
>
> Pig should ship a store UDF to the backend even if the user does not use 
> "register". The prerequisite is that the UDF is on the classpath on the 
> frontend. We made that work for load UDFs in 
> (PIG-881|https://issues.apache.org/jira/browse/PIG-881); 
> we should do the same for store UDFs.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: (was: singlemapstore.pig)

> StoreFunc UDF should ship to the backend automatically without "register"
> -
>
> Key: PIG-1189
> URL: https://issues.apache.org/jira/browse/PIG-1189
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1189-1.patch, singlemapstore.pig, 
> singlereducestore.pig, singlereducestore.pig, singlereducestore.pig
>
>
> Pig should ship a store UDF to the backend even if the user does not use 
> "register". The prerequisite is that the UDF is on the classpath on the 
> frontend. We made that work for load UDFs in 
> (PIG-881|https://issues.apache.org/jira/browse/PIG-881); 
> we should do the same for store UDFs.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: (was: PIG-1189-1.patch)

> StoreFunc UDF should ship to the backend automatically without "register"
> -
>
> Key: PIG-1189
> URL: https://issues.apache.org/jira/browse/PIG-1189
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1189-1.patch, singlemapstore.pig, 
> singlemapstore.pig, singlereducestore.pig, singlereducestore.pig, 
> singlereducestore.pig
>
>
> Pig should ship a store UDF to the backend even if the user does not use 
> "register". The prerequisite is that the UDF is on the classpath on the 
> frontend. We made that work for load UDFs in 
> (PIG-881|https://issues.apache.org/jira/browse/PIG-881); 
> we should do the same for store UDFs.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: (was: PIG-1189-1.patch)

> StoreFunc UDF should ship to the backend automatically without "register"
> -
>
> Key: PIG-1189
> URL: https://issues.apache.org/jira/browse/PIG-1189
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1189-1.patch, PIG-1189-1.patch, singlemapstore.pig, 
> singlemapstore.pig, singlemapstore.pig, singlereducestore.pig, 
> singlereducestore.pig, singlereducestore.pig
>
>
> Pig should ship a store UDF to the backend even if the user does not use 
> "register". The prerequisite is that the UDF is on the classpath on the 
> frontend. We made that work for load UDFs in 
> (PIG-881|https://issues.apache.org/jira/browse/PIG-881); 
> we should do the same for store UDFs.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: singlereducestore.pig
singlemapstore.pig
PIG-1189-1.patch

I was unable to write a unit test for this, since the issue only happens on a 
real cluster, so I am attaching the test scripts I use for manual 
verification. Note that there is no "register" command in the test scripts. To 
run a script, include the jar in the classpath:

java -Xmx512m -cp $HADOOP_CONF_DIR:pig.jar:zebra.jar org.apache.pig.Main xxx.pig

> StoreFunc UDF should ship to the backend automatically without "register"
> -
>
> Key: PIG-1189
> URL: https://issues.apache.org/jira/browse/PIG-1189
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1189-1.patch, PIG-1189-1.patch, singlemapstore.pig, 
> singlemapstore.pig, singlemapstore.pig, singlereducestore.pig, 
> singlereducestore.pig, singlereducestore.pig
>
>
> Pig should ship a store UDF to the backend even if the user does not use 
> "register". The prerequisite is that the UDF is on the classpath on the 
> frontend. We made that work for load UDFs in 
> (PIG-881|https://issues.apache.org/jira/browse/PIG-881); 
> we should do the same for store UDFs.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: singlereducestore.pig
singlemapstore.pig
PIG-1189-1.patch

I was unable to write a unit test for this, since the issue only happens on a 
real cluster, so I am attaching the test scripts I use for manual 
verification. Note that there is no "register" command in the test scripts. To 
run a script, include the jar in the classpath:

java -Xmx512m -cp $HADOOP_CONF_DIR:pig.jar:zebra.jar org.apache.pig.Main xxx.pig

> StoreFunc UDF should ship to the backend automatically without "register"
> -
>
> Key: PIG-1189
> URL: https://issues.apache.org/jira/browse/PIG-1189
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1189-1.patch, PIG-1189-1.patch, PIG-1189-1.patch, 
> singlemapstore.pig, singlemapstore.pig, singlemapstore.pig, 
> singlereducestore.pig, singlereducestore.pig, singlereducestore.pig
>
>
> Pig should ship a store UDF to the backend even if the user does not use 
> "register". The prerequisite is that the UDF is on the classpath on the 
> frontend. We made that work for load UDFs in 
> (PIG-881|https://issues.apache.org/jira/browse/PIG-881); 
> we should do the same for store UDFs.




[jira] Updated: (PIG-1194) ERROR 2055: Received Error while processing the map plan

2010-01-20 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1194:
--

Status: Patch Available  (was: Open)

> ERROR 2055: Received Error while processing the map plan
> 
>
> Key: PIG-1194
> URL: https://issues.apache.org/jira/browse/PIG-1194
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: inputdata.txt, PIG-1194.patch
>
>
> I have a simple Pig script which takes 3 columns out of which one is null. 
> {code}
> input = load 'inputdata.txt' using PigStorage() as (col1, col2, col3);
> a = GROUP input BY (((double) col3)/((double) col2) > .001 OR col1 < 11 ? 
> col1 : -1);
> b = FOREACH a GENERATE group as col1, SUM(input.col2) as col2, 
> SUM(input.col3) as  col3;
> store b into 'finalresult';
> {code}
> When I run this script I get the following error:
> ERROR 2055: Received Error while processing the map plan.
> org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received 
> Error while processing the map plan.
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:277)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 
> A more useful error message for the purpose of debugging would be helpful.
> Viraj




[jira] Updated: (PIG-1194) ERROR 2055: Received Error while processing the map plan

2010-01-20 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1194:
--

Attachment: PIG-1194.patch

The POLocalRearrange class is changed so that it can handle nulls returned by 
the conditional operator (POBinCond).
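
To illustrate the idea only (this is not the code in PIG-1194.patch), here is a minimal Python sketch of a conditional that can return null and a grouping step that tolerates a null key. The helper names `bincond`, `group_key` and `group_by_key` are hypothetical, and treating any null operand as a null condition is a simplifying assumption rather than a statement of Pig's exact null semantics.

```python
from collections import defaultdict


def bincond(cond, if_true, if_false):
    """Three-valued conditional: a null (None) condition yields a null result."""
    if cond is None:
        return None
    return if_true if cond else if_false


def group_key(row):
    """Rough analogue of the GROUP BY expression from the issue description."""
    col1, col2, col3 = row
    if col1 is None or col2 is None or col3 is None:
        cond = None  # assumption: any null operand makes the condition null
    else:
        cond = (col3 / col2 > 0.001) or (col1 < 11)
    return bincond(cond, col1, -1)


def group_by_key(rows):
    """Group rows by key; a null key gets its own group instead of failing."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)
    return dict(groups)


# Made-up sample rows; the third one has a null col2.
rows = [(5, 2.0, 1.0), (20, 1000.0, 0.5), (7, None, 3.0)]
groups = group_by_key(rows)
```

In this sketch the row with the null column simply lands in a group keyed by `None`, which is the behaviour the grouping step has to allow for.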

> ERROR 2055: Received Error while processing the map plan
> 
>
> Key: PIG-1194
> URL: https://issues.apache.org/jira/browse/PIG-1194
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: inputdata.txt, PIG-1194.patch
>
>
> I have a simple Pig script which takes 3 columns out of which one is null. 
> {code}
> input = load 'inputdata.txt' using PigStorage() as (col1, col2, col3);
> a = GROUP input BY (((double) col3)/((double) col2) > .001 OR col1 < 11 ? 
> col1 : -1);
> b = FOREACH a GENERATE group as col1, SUM(input.col2) as col2, 
> SUM(input.col3) as  col3;
> store b into 'finalresult';
> {code}
> When I run this script I get the following error:
> ERROR 2055: Received Error while processing the map plan.
> org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received 
> Error while processing the map plan.
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:277)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 
> A more useful error message for the purpose of debugging would be helpful.
> Viraj




[jira] Created: (PIG-1198) [zebra] performance improvements

2010-01-20 Thread Yan Zhou (JIRA)
[zebra] performance improvements


 Key: PIG-1198
 URL: https://issues.apache.org/jira/browse/PIG-1198
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0


Current input split generation is row-based and works on individual TFiles. As 
a result, one split is generated for every TFile, even for TFiles smaller than 
one block. Consequently, many mappers, and many waves, are needed to handle the 
many small TFiles generated by the equally many mappers/reducers that wrote the 
data. This issue can be addressed by generating input splits that can include 
multiple TFiles. 

For sorted tables, the key distribution generated per table, which is used to 
generate proper input splits, includes key distributions from column groups 
even if they are not in the projection. This incurs the extra cost of 
unnecessary computation and, worse, produces unreasonable input splits. 

For unsorted tables, when row splits are generated on a union of tables, the 
FileSplits are generated for each table and then lumped together to form the 
final list of splits passed to Map/Reduce. This has the undesirable effect that 
the number of splits depends on the number of tables in the union and is not 
controlled solely by the number of splits used by the Map/Reduce framework. 

The input split's goal size is calculated over all column groups, even if some 
of them are not in the projection. 

For input splits of multiple files in one column group, all files are opened at 
startup. This is unnecessary and holds resources needlessly from start to 
end. The files should be opened when needed and closed when no longer needed. 
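
As a rough illustration of the first point (splits that span multiple small files), here is a minimal Python sketch of greedy packing. It is not Zebra's implementation: the function `combine_into_splits`, the file names and the sizes are all hypothetical, and sizes are in arbitrary units.

```python
def combine_into_splits(files, goal_size):
    """Greedily pack (name, size) pairs into splits of at most goal_size.

    A file larger than goal_size still gets a split of its own.
    """
    splits = []
    current, current_size = [], 0
    for name, size in files:
        # Start a new split when adding this file would exceed the goal size.
        if current and current_size + size > goal_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


# Five made-up small files packed against a goal size of 64 units.
files = [("part-0", 10), ("part-1", 20), ("part-2", 40),
         ("part-3", 5), ("part-4", 64)]
splits = combine_into_splits(files, goal_size=64)
```

With one split per file these five files would need five mappers; packed this way they need three.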





[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-20 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802967#action_12802967
 ] 

Richard Ding commented on PIG-1090:
---

Committed PIG-1090-13 patch.

> Update sources to reflect recent changes in load-store interfaces
> -
>
> Key: PIG-1090
> URL: https://issues.apache.org/jira/browse/PIG-1090
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
> PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, 
> PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, 
> PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
>
>
> There have been some changes (as recorded in the Changes Section, Nov 2 2009 
> sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
> load/store interfaces - this jira is to track the task of making those 
> changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-20 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1090:
--

Attachment: PIG-1090-13.patch

New patch (13) that addresses the comments by Dmitriy and Pradeep.

> Update sources to reflect recent changes in load-store interfaces
> -
>
> Key: PIG-1090
> URL: https://issues.apache.org/jira/browse/PIG-1090
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
> PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, 
> PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, 
> PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
>
>
> There have been some changes (as recorded in the Changes Section, Nov 2 2009 
> sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
> load/store interfaces - this jira is to track the task of making those 
> changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1184:


Fix Version/s: 0.7.0
   Status: Patch Available  (was: Open)

> PruneColumns optimization does not handle the case of foreach flatten 
> correctly if flattened bag is not used later
> --
>
> Key: PIG-1184
> URL: https://issues.apache.org/jira/browse/PIG-1184
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1184-1.patch
>
>
> The following script :
> {noformat}
> -e "a = load 'input.txt' as (f1:chararray, f2:chararray, 
> f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a 
> generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, 
> \$4; dump b;"
> {noformat}
> gives the following result:
> (oiue,M,10)
> {noformat}
> cat input.txt:
> oiue   M   {(3),(4)}   {(toronto),(montreal)}
> {noformat}
> If the PruneColumns optimization is disabled, we get the right result:
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> The flatten results in 4 records - so the output should contain 4 records.
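
The expected count of 4 follows from flattening two bags in the same generate, which pairs every element of one bag with every element of the other (2 x 2). A minimal Python sketch of that arithmetic, using a hypothetical in-memory stand-in for the single row of input.txt rather than Pig itself:

```python
from itertools import product

# Hypothetical stand-in for the one row of input.txt: f1, f2, bag f3, bag f4.
row = ("oiue", "M", [("3",), ("4",)], [("toronto",), ("montreal",)])


def flatten_two_bags(f1, f2, bag_a, bag_b):
    """Emit one tuple per (a, b) pair, like two flattens in one generate."""
    for a, b in product(bag_a, bag_b):
        yield (f1, f2) + a + b + (10,)


flattened = list(flatten_two_bags(*row))

# Second foreach: keep f1, f2 and the constant at position $4.
projected = [(r[0], r[1], r[4]) for r in flattened]
```

Here `projected` holds four identical `('oiue', 'M', 10)` tuples, so dropping the flattened columns from the projection must not collapse them into one.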




[jira] Assigned: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1184:
---

Assignee: Daniel Dai

> PruneColumns optimization does not handle the case of foreach flatten 
> correctly if flattened bag is not used later
> --
>
> Key: PIG-1184
> URL: https://issues.apache.org/jira/browse/PIG-1184
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Attachments: PIG-1184-1.patch
>
>
> The following script :
> {noformat}
> -e "a = load 'input.txt' as (f1:chararray, f2:chararray, 
> f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a 
> generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, 
> \$4; dump b;"
> {noformat}
> gives the following result:
> (oiue,M,10)
> {noformat}
> cat input.txt:
> oiue   M   {(3),(4)}   {(toronto),(montreal)}
> {noformat}
> If the PruneColumns optimization is disabled, we get the right result:
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> The flatten results in 4 records - so the output should contain 4 records.




[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later

2010-01-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1184:


Attachment: PIG-1184-1.patch

> PruneColumns optimization does not handle the case of foreach flatten 
> correctly if flattened bag is not used later
> --
>
> Key: PIG-1184
> URL: https://issues.apache.org/jira/browse/PIG-1184
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Attachments: PIG-1184-1.patch
>
>
> The following script :
> {noformat}
> -e "a = load 'input.txt' as (f1:chararray, f2:chararray, 
> f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a 
> generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, 
> \$4; dump b;"
> {noformat}
> gives the following result:
> (oiue,M,10)
> {noformat}
> cat input.txt:
> oiue   M   {(3),(4)}   {(toronto),(montreal)}
> {noformat}
> If the PruneColumns optimization is disabled, we get the right result:
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> The flatten results in 4 records - so the output should contain 4 records.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-20 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-1178:
-

Status: Patch Available  (was: Open)

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, PIG_1178.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-20 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-1178:
-

Status: Open  (was: Patch Available)

Attached a new patch.

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, PIG_1178.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-20 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-1178:
-

Attachment: lp.patch

Patch to add relational operators, optimization rules, and a logical plan 
migration visitor.





[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

2010-01-20 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802939#action_12802939
 ] 

Dmitriy V. Ryaboy commented on PIG-1197:


I know you guys feel strongly about not adding anything but bug-fixes into 0.6 
at this point, but I would love for this to make it in. It's a huge performance 
boost, and people use TextLoader a lot.

Agreed that it doesn't really need to go into 0.7 if we are hoping to get 
PIG-966 completed for that release. 





[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FOREACH

2010-01-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802931#action_12802931
 ] 

Alan Gates commented on PIG-1191:
-

Checked into 0.6 branch.

> POCast throws exception for certain sequences of LOAD, FILTER, FOREACH
> -
>
> Key: PIG-1191
> URL: https://issues.apache.org/jira/browse/PIG-1191
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Pradeep Kamath
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: PIG-1191-1.patch, PIG-1191-2.patch
>
>
> When using a custom load/store function that returns complex data (map of 
> maps, list of maps), certain sequences of LOAD, FILTER, and FOREACH cause the 
> Pig script to throw an exception of the form:
>  
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
> bytearray from the UDF. Cannot determine how to convert the bytearray to 
> 
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
> ...
> Looking through the code of POCast, it appears the operator was unable to 
> find the right load function for the conversion and consequently bailed out 
> with the exception, failing the entire Pig script.




[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

2010-01-20 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802924#action_12802924
 ] 

Pradeep Kamath commented on PIG-1197:
-

Alan is right: TextLoader on the load-store-redesign branch already uses 
TextInputFormat (and hence LineReader). Do committers feel this patch is 
important enough that it should be committed to trunk? Otherwise I would vote 
in favor of just keeping it as a patch, as Alan suggested, for people to use, 
since TextLoader is probably not a frequently used loader (I am guessing).





[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-20 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802923#action_12802923
 ] 

Pradeep Kamath commented on PIG-1090:
-

A couple of comments on PIG-1090-13.patch:
 * The call to storeCleanup() should happen after the call to setUpContext(), 
since setUpContext() changes the Configuration inside the Context and 
storeCleanup() should use this updated Configuration.
 * In storeCleanup(), we could get the StoreFunc instance once by calling 
store.getStoreFunc() and then reuse that instance later in the method. That 
instance can also be used for the check:
{code}
if(storeFunc instanceof StoreMetadata) {
}
{code}
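
The two suggestions above can be sketched in isolation roughly as follows. 
This is a minimal illustration, not the code in PIG-1090-13.patch. Only the 
names storeCleanup(), setUpContext(), getStoreFunc(), StoreFunc and 
StoreMetadata come from the comment; the Store and Context classes, the map 
standing in for the Configuration, the describe() hook and the 
"output.location" key are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Pig's real types.
interface StoreFunc {}

interface StoreMetadata {
    // Hypothetical hook; the real interface's methods are not shown in the comment.
    void describe(Map<String, String> conf, List<String> log);
}

class MetadataAwareStore implements StoreFunc, StoreMetadata {
    @Override
    public void describe(Map<String, String> conf, List<String> log) {
        log.add("metadata cleanup at " + conf.get("output.location"));
    }
}

class Store {
    private final StoreFunc func;
    int lookups = 0; // counts getStoreFunc() calls so the single lookup is visible

    Store(StoreFunc func) { this.func = func; }

    StoreFunc getStoreFunc() {
        lookups++;
        return func;
    }
}

class Context {
    // A plain map stands in for the Configuration held by the Context.
    Map<String, String> conf = new HashMap<>();
}

class StoreCleanupSketch {
    // Replaces the configuration inside the context, as the comment describes.
    static void setUpContext(Context context) {
        Map<String, String> updated = new HashMap<>(context.conf);
        updated.put("output.location", "/tmp/out"); // hypothetical key and value
        context.conf = updated;
    }

    static void storeCleanup(Store store, Map<String, String> conf, List<String> log) {
        StoreFunc storeFunc = store.getStoreFunc(); // fetch once, reuse below
        if (storeFunc instanceof StoreMetadata) {
            ((StoreMetadata) storeFunc).describe(conf, log);
        }
    }

    static List<String> run(Store store) {
        List<String> log = new ArrayList<>();
        Context context = new Context();
        setUpContext(context);                  // first: update the configuration
        storeCleanup(store, context.conf, log); // then: clean up with the updated one
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(new Store(new MetadataAwareStore())));
    }
}
```

Calling storeCleanup() before setUpContext() in this sketch would pass the 
stale, empty configuration, which is the ordering problem the first bullet 
points out.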

> Update sources to reflect recent changes in load-store interfaces
> -
>
> Key: PIG-1090
> URL: https://issues.apache.org/jira/browse/PIG-1090
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
> PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
> PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, 
> PIG-1090.patch, PIG-1190-5.patch
>
>
> There have been some changes (as recorded in the Changes Section, Nov 2 2009 
> sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
> load/store interfaces - this jira is to track the task of making those 
> changes under src. Changes under test will be addressed in a different jira.




[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

2010-01-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802920#action_12802920
 ] 

Alan Gates commented on PIG-1197:
-

It's already been rewritten for that branch.  I'll check with Pradeep on 
whether he wants to check this patch in (which will make his merges harder) or 
just leave it here as a patch for anyone who wants to use it, since hopefully 
by 0.7 we'll have PIG-966 checked in and this isn't going into 0.6.





[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

2010-01-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802756#action_12802756
 ] 

Hadoop QA commented on PIG-1197:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430812/PIG-1197.patch
  against trunk revision 901021.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/185/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/185/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/185/console

This message is automatically generated.

