[jira] [Created] (PIG-2779) Refactoring the code for setting number of reducers

2012-06-28 Thread Jie Li (JIRA)
Jie Li created PIG-2779:
---

Summary: Refactoring the code for setting number of reducers
Key: PIG-2779
URL: https://issues.apache.org/jira/browse/PIG-2779
Project: Pig
Issue Type: Bug
Reporter: Jie Li


As PIG-2652 observed, the code for setting the number of reducers is currently a 
little messy. MapReduceOper.requestedParallelism seems to be misused in some 
places, and we now support runtime estimation of #reducer, which further 
complicates the problem.

For example, if we specify parallel 1 for the order-by, the estimated #reducer 
will be used. If we specify parallel 2 while it estimates 4, order-by will fail 
due to "Illegal partition for Null". If we specify parallel 4 while it 
estimates 2, then some reducers will have nothing to do. 
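
The three order-by cases above can be summarized in a small sketch. This is 
only an illustration of the mismatch between the requested and the estimated 
value, not code from Pig: the class, enum and method names are hypothetical, 
and the CONSISTENT case (both values equal) is an assumption that the text 
above does not cover.

```java
// Hypothetical sketch: models only the three order-by cases described above.
// None of these names come from the Pig code base.
class ReducerMismatchSketch {

    enum Outcome {
        ESTIMATE_USED,      // "parallel 1": the estimated #reducer is used instead
        ILLEGAL_PARTITION,  // requested < estimated: "Illegal partition for Null"
        IDLE_REDUCERS,      // requested > estimated: some reducers have nothing to do
        CONSISTENT          // assumption: equal values cause no mismatch
    }

    // Classifies a (requested, estimated) pair according to the cases above.
    static Outcome classify(int requested, int estimated) {
        if (requested == 1) {
            return Outcome.ESTIMATE_USED;
        }
        if (requested < estimated) {
            return Outcome.ILLEGAL_PARTITION;
        }
        if (requested > estimated) {
            return Outcome.IDLE_REDUCERS;
        }
        return Outcome.CONSISTENT;
    }

    public static void main(String[] args) {
        int[][] cases = { {1, 4}, {2, 4}, {4, 2} };
        for (int[] c : cases) {
            System.out.println("parallel " + c[0] + ", estimated " + c[1]
                    + " -> " + classify(c[0], c[1]));
        }
    }
}
```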

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-483) PERFORMANCE: different strategies for large and small order bys

2012-06-28 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403635#comment-13403635
 ] 

Jie Li commented on PIG-483:


Oops, I forgot that the sample job always uses 1 reducer instead of the 
estimated #reducer, so we don't have the information to decide whether to skip 
it.

One option is to add a field in MapReduceOper to store the estimated #reducer.

> PERFORMANCE: different strategies for large and small order bys
> ---
>
> Key: PIG-483
> URL: https://issues.apache.org/jira/browse/PIG-483
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.2.0
> Reporter: Olga Natkovich
> Labels: gsoc2011, performance
> Attachments: PIG-483.0.patch
>
>
> Currently pig always does a multi-pass order by where it first determines a 
> distribution for the keys and then orders in a second pass.  This avoids the 
> necessity of having a single reducer.  However, in cases where the data is 
> small enough to fit into a single reducer, this is inefficient.  For small 
> data sets it would be good to realize the small size of the set and do the 
> order by in a single pass with a single reducer.
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011





[jira] [Updated] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-28 Thread Jie Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Li updated PIG-2661:


Attachment: PIG-2661.3.patch

Attached the latest patch containing two unit tests.

> Pig uses an extra job for loading data in Pigmix L9
> ---
>
> Key: PIG-2661
> URL: https://issues.apache.org/jira/browse/PIG-2661
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.9.0
> Reporter: Jie Li
> Assignee: Jie Li
> Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch, 
> PIG-2661.3.patch, PIG-2661.plan.txt
>
>
> See 
> https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155





[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-28 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403606#comment-13403606
 ] 

Jie Li commented on PIG-2661:
-

Here are some numbers showing why we want to disable merging the pipeline into 
the sample job when a flatten or stream is present:

Query: 
{code}
A = LOAD '$input/group' USING PigStorage('|') AS (a:int, b:{});
B = foreach A generate a, flatten(b);
ret = order B by $1; 
STORE ret INTO '$output/out';
{code}

Note there is a flatten. See attached PIG-2661.plan.txt for the query plan if 
we merge the pipeline.

Test data:
1GB data, grouped into three bags.

Result:
||merge||don't merge||
|sample (17m) + orderby (14m)|pipeline (11m) + sample (1m26s) + orderby (5m)|

We can see that merging the pipeline into the sample job makes it very slow, 
for several reasons:
1) the sample job will sample all three bags, which contain all the 1GB data;
2) the sample job requires a reduce phase to aggregate the sample information;
3) the orderby job will need to re-parse the input data.

With 10GB of data the difference should be even more obvious, as all 10GB 
would go through the single reducer of the sample job.
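
To make the comparison in the result table explicit, the stage timings can be 
summed. The sketch below assumes the stages of each plan run one after another 
(the table does not state this), and the class and method names are 
hypothetical. Under that assumption the merged plan takes about 31 minutes and 
the unmerged plan about 17 and a half minutes.

```java
import java.time.Duration;

// Hypothetical helper for adding up the stage timings from the result table.
// Assumption: stages within a plan run sequentially, so totals are plain sums.
class OrderByTimingSketch {

    static Duration total(Duration... stages) {
        Duration sum = Duration.ZERO;
        for (Duration stage : stages) {
            sum = sum.plus(stage);
        }
        return sum;
    }

    public static void main(String[] args) {
        // merge: sample (17m) + orderby (14m)
        Duration merged = total(Duration.ofMinutes(17), Duration.ofMinutes(14));
        // don't merge: pipeline (11m) + sample (1m26s) + orderby (5m)
        Duration separate = total(
                Duration.ofMinutes(11),
                Duration.ofMinutes(1).plusSeconds(26),
                Duration.ofMinutes(5));

        System.out.println("merge:       " + merged.getSeconds() + " s");
        System.out.println("don't merge: " + separate.getSeconds() + " s");
    }
}
```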





[jira] [Updated] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-28 Thread Jie Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Li updated PIG-2661:


Attachment: PIG-2661.plan.txt





Build failed in Jenkins: Pig-trunk #1269

2012-06-28 Thread Apache Jenkins Server
See 

Changes:

[daijy] PIG-2766: Pig-HCat Usability

[dvryaboy] PIG-2777: Docs are broken due to malformed xml after PIG-2673

--
[...truncated 35882 lines...]
[junit] 12/06/28 22:36:50 WARN util.MBeans: 
Hadoop:service=DataNode,name=FSDatasetState-UndefinedStorageId709507616
[junit] javax.management.InstanceNotFoundException: 
Hadoop:service=DataNode,name=FSDatasetState-UndefinedStorageId709507616
[junit] at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
[junit] at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
[junit] at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
[junit] at 
com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:506)
[junit] at 
org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.FSDataset.shutdown(FSDataset.java:1934)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:788)
[junit] at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:566)
[junit] at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:550)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:87)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68)
[junit] at 
org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:129)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
[junit] at 
junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] 12/06/28 22:36:50 WARN datanode.FSDatasetAsyncDiskService: 
AsyncDiskService has already shut down.
[junit] 12/06/28 22:36:50 INFO mortbay.log: Stopped 
SelectChannelConnector@localhost:0
[junit] Shutting down DataNode 2
[junit] 12/06/28 22:36:50 INFO ipc.Server: Stopping server on 48033
[junit] 12/06/28 22:36:50 INFO ipc.Server: IPC Server handler 0 on 48033: 
exiting
[junit] 12/06/28 22:36:50 INFO ipc.Server: IPC Server handler 2 on 48033: 
exiting
[junit] 12/06/28 22:36:50 INFO ipc.Server: IPC Server handler 1 on 48033: 
exiting
[junit] 12/06/28 22:36:50 INFO ipc.Server: Stopping IPC Server Responder
[junit] 12/06/28 22:36:50 INFO metrics.RpcInstrumentation: shut down
[junit] 12/06/28 22:36:50 INFO ipc.Server: Stopping IPC Server listener on 
48033
[junit] 12/06/28 22:36:50 WARN datanode.DataNode: 
DatanodeRegistration(127.0.0.1:41840, 
storageID=DS-976074663-67.195.138.20-41840-1340922514838, infoPort=35454, 
ipcPort=48033):DataXceiveServer:java.nio.channels.AsynchronousCloseException
[junit] at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
[junit] at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:159)
[junit] at 
sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
[junit] at java.lang.Thread.run(Thread.java:662)
[junit] 
[junit] 12/06/28 22:36:50 INFO datanode.DataNode: Exiting DataXceiveServer
[junit] 12/06/28 22:36:50 INFO datanode.DataNode: Waiting for threadgroup 
to exit, active threads is 1
[junit] 12/06/28 22:36:50 INFO datanode.DataBlockScanner: Exiting 
DataBlockScanner thread.
[junit] 12/06/28 22:36:50 INFO datanode.D

[jira] [Created] (PIG-2778) Add 'matches' operator to predicate pushdown

2012-06-28 Thread Dmitriy V. Ryaboy (JIRA)
Dmitriy V. Ryaboy created PIG-2778:
--

Summary: Add 'matches' operator to predicate pushdown
Key: PIG-2778
URL: https://issues.apache.org/jira/browse/PIG-2778
Project: Pig
Issue Type: Bug
Reporter: Dmitriy V. Ryaboy


Currently the regex match operation does not get pushed down to LoadMetadata 
(and Expression does not have an enum value for it); it would be quite useful 
to enable this for some optimizations.





Re: Review Request: RANK function like in SQL

2012-06-28 Thread aavendan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5523/#review8733
---



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java


changed


- aavendan


On June 25, 2012, 10:46 a.m., aavendan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5523/
> ---
> 
> (Updated June 25, 2012, 10:46 a.m.)
> 
> 
> Review request for pig, aavendan and Gianmarco De Francisci Morales.
> 
> 
> Description
> ---
> 
> Review board for https://issues.apache.org/jira/browse/PIG-2353
> 
> 
> This addresses bug PIG-2353.
> https://issues.apache.org/jira/browse/PIG-2353
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapOnly.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllSameRalationalNodesVisitor.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/UidResetter.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/ColumnPruneHelper.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/ColumnPruneVisitor.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/LineageFindRelVisitor.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/SchemaAliasVisitor.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/TypeCheckingRelVisitor.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g
>  1353202 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g
>  1353202

Re: Review Request: RANK function like in SQL

2012-06-28 Thread aavendan


> On June 28, 2012, 2:07 p.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java,
> >  line 23
> > 
> >
> > Some imports are unused, we can clean it up.

cleaning code


- aavendan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5523/#review8710
---



Re: Review Request: RANK function like in SQL

2012-06-28 Thread aavendan


> On June 28, 2012, 2:07 p.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java,
> >  line 109
> > 
> >
> > We know the size of the final tuple, let's use the optimized 
> > constructor to preallocate the right amount of memory.

changed


- aavendan



Re: Review Request: RANK function like in SQL

2012-06-28 Thread aavendan


> On June 28, 2012, 2:07 p.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java,
> >  line 129
> > 
> >
> > Why a separate method for incrementing a counter?
> > Why is it different from countNext?

countNext counts tuples that share the same value. 
When consecutive tuples have different values, it counts 1. 
If the next tuple has the same value as the previous one, countNext is 
incremented by one. I'll rename it.

When the tuple value changes, the general counter is updated depending on 
whether it is a dense rank or not:
setCount(getCount() + (isDenseRank() ? 1 : countNext));
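
The counter update described above can be sketched as a standalone loop over 
already sorted values. This is a simplified illustration and not the PORank 
code under review: it works on plain ints instead of tuples, the class and 
method names are hypothetical, and starting the counter at 1 is an assumption. 
Only the update rule (add 1 for a dense rank, add countNext otherwise) is taken 
from the reply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the dense / non-dense rank counter on sorted ints.
class RankCounterSketch {

    static List<Long> rank(int[] sorted, boolean dense) {
        List<Long> ranks = new ArrayList<>();
        long count = 1;      // rank assigned to the current group (assumed to start at 1)
        long countNext = 0;  // number of values seen so far in the current group
        for (int i = 0; i < sorted.length; i++) {
            if (i > 0 && sorted[i] != sorted[i - 1]) {
                // Value changed: advance by 1 (dense) or by the size of the
                // previous group (non-dense), then start a new group.
                count += dense ? 1 : countNext;
                countNext = 0;
            }
            countNext++;
            ranks.add(count);
        }
        return ranks;
    }

    public static void main(String[] args) {
        int[] values = {10, 10, 20, 30, 30, 30, 40};
        System.out.println("rank:       " + rank(values, false));
        System.out.println("dense rank: " + rank(values, true));
    }
}
```

For the values 10, 10, 20, 30, 30, 30, 40 the non-dense ranks are 
1, 1, 3, 4, 4, 4, 7 and the dense ranks are 1, 1, 2, 3, 3, 3, 4.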


- aavendan



Re: Review Request: RANK function like in SQL

2012-06-28 Thread aavendan


> On June 28, 2012, 2:07 p.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java,
> >  line 146
> > 
> >
> > The same constant is defined in PigMapOnly.MapRank
> > It should be defined only in one place.

I will define it once in JobControlCompiler and reference its value from MapRank.


> On June 28, 2012, 2:07 p.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java,
> >  line 342
> > 
> >
> > What if we have more than one job doing a rank in the same MROperPlan at 
> > the same time?
> > I think this piece of code would not work, as you would overwrite the 
> > rankGroup.

I renamed the counter group so that its name includes the job ID.
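
To make the idea concrete, here is a minimal sketch in plain Java, not the code from the patch. The class RankCounters, the "pig.rank." prefix and the method names are all hypothetical, and a nested map stands in for Hadoop's counter groups. It only shows why a per-job group name keeps two rank jobs in the same plan from overwriting each other.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not part of Pig or of the patch under review. */
class RankCounters {
    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new HashMap<>();

    /** Assumed naming scheme: one counter group per job ID. */
    static String groupName(String jobId) {
        return "pig.rank." + jobId;
    }

    /** Adds delta to the named counter inside the group owned by jobId. */
    void increment(String jobId, String counter, long delta) {
        groups.computeIfAbsent(groupName(jobId), g -> new HashMap<>())
              .merge(counter, delta, Long::sum);
    }

    /** Returns the counter value for jobId, or 0 if it was never incremented. */
    long get(String jobId, String counter) {
        return groups.getOrDefault(groupName(jobId), new HashMap<>())
                     .getOrDefault(counter, 0L);
    }
}
```

Because the group name is derived from the job ID, two jobs that both use a counter called "task_0" end up with independent values.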


> On June 28, 2012, 2:07 p.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java,
> >  line 719
> > 
> >
> > Logging all this information at INFO level is too much.
> > I would either reduce the amount of logged info or put it at DEBUG 
> > level.

Deleted the logging.


> On June 28, 2012, 2:07 p.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java,
> >  line 127
> > 
> >
> > Why is this commented out?
> > What would be the purpose of this piece of code?

I have to check this visitor and its impact on the rest of the code.


> On June 28, 2012, 2:07 p.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java,
> >  line 44
> > 
> >
> > Is POCounter responsible for sorting the input?
> >

It was an initial approach; now I'm reconsidering what is actually necessary in 
POCounter.


> On June 28, 2012, 2:07 p.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java,
> >  line 120
> > 
> >
> > What role do the rankPlans play in POCounter? It is unclear to me.

That is a variable-naming mistake. I am cleaning up the code.


- aavendan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5523/#review8710
---


On June 25, 2012, 10:46 a.m., aavendan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5523/
> ---
> 
> (Updated June 25, 2012, 10:46 a.m.)
> 
> 
> Review request for pig, aavendan and Gianmarco De Francisci Morales.
> 
> 
> Description
> ---
> 
> Review board for https://issues.apache.org/jira/browse/PIG-2353
> 
> 
> This addresses bug PIG-2353.
> https://issues.apache.org/jira/browse/PIG-2353
> 
> 
> Diffs
> -
> 
>   
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapOnly.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java PRE-CRE

[jira] [Updated] (PIG-2766) Pig-HCat Usability

2012-06-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2766:


   Resolution: Fixed
Fix Version/s: (was: 0.10.0)
   0.10.1
   0.11
   0.9.3
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Patch committed to 0.9/0.10/trunk. Thanks Vikram!

> Pig-HCat Usability
> --
>
> Key: PIG-2766
> URL: https://issues.apache.org/jira/browse/PIG-2766
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, tools
>Affects Versions: 0.10.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.9.3, 0.11, 0.10.1
>
> Attachments: PIG-2766.patch, PIG-2766_2.patch, PIG-2766_3.patch, 
> PIG-2766_4.patch, PIG-2766_5.patch, PIG-2766_6.patch, PIG-2766_7.patch, 
> PIG-2766_Branch0.9.patch
>
>
> Currently, to use HCat from Pig (via HCatLoader/HCatStorer), the user needs to 
> register a bunch of jars and set a couple of configuration parameters. For a novice user, it 
> is non-trivial to find all the relevant jars and config params. We should 
> have better integration between Pig & HCat by pre-configuring Pig to load all 
> these jars and configs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403480#comment-13403480
 ] 

Thejas M Nair commented on PIG-2774:


bq. we might have other operations queued up after the join 

In the 2nd approach, the operations within the map task don't complicate things. But to 
handle a reduce after the merge-join, we would need to introduce another map 
task that does a union of the merge-join results. For example, if the merge-join is 
followed by a group+agg, then the following transformation of the plan would be 
needed:
Map(merge-join + group+agg ops) + Reduce(group+agg ops)
 => Map(merge-join wave 1 + group+agg ops) + Map(merge-join wave 2 + 
group+agg ops) + Map(union of first 2 maps) + Reduce(group+agg ops)

This transformation can't happen dynamically - we can't decide to skip the 
reduce while in the map phase. 


To handle this case dynamically, it looks like the first approach is the one that 
would actually work. The user or a metadata system could possibly identify the skew 
problem and recommend using a 'skew-merge' join the next time a query is run on 
similar data.



> Fix merge join to work with many duplicate left keys
> 
>
> Key: PIG-2774
> URL: https://issues.apache.org/jira/browse/PIG-2774
> Project: Pig
>  Issue Type: Bug
>Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is 
> large as it accumulates all of them in memory. There are two solutions around 
> this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Spit out join output periodically, and re-seek on the right hand side 
> index.





[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-28 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403468#comment-13403468
 ] 

Dmitriy V. Ryaboy commented on PIG-2774:


I like the first paragraph of what you said; the second is more applicable to 
skew join (reduce side) than map join (map side), I think. With a map-side join, 
we might have other operations queued up after the join happening on the same 
mapper, and tracing through separate split files will get unnecessarily 
complicated.

> Fix merge join to work with many duplicate left keys
> 
>
> Key: PIG-2774
> URL: https://issues.apache.org/jira/browse/PIG-2774
> Project: Pig
>  Issue Type: Bug
>Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is 
> large as it accumulates all of them in memory. There are two solutions around 
> this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Spit out join output periodically, and re-seek on the right hand side 
> index.





[jira] [Updated] (PIG-2652) Skew join and order by don't trigger reducer estimation

2012-06-28 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2652:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Skew join and order by don't trigger reducer estimation
> ---
>
> Key: PIG-2652
> URL: https://issues.apache.org/jira/browse/PIG-2652
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.3, 0.11, 0.10.1
>
> Attachments: PIG-2652_1.patch, PIG-2652_2.patch, PIG-2652_3.patch, 
> PIG-2652_3_10.patch, PIG-2652_4.patch, PIG-2652_5.patch, PIG-2652_6.patch, 
> PIG-2652_7.patch
>
>
> If none of PARALLEL, default parallel, or {{mapred.reduce.tasks}} is set, the 
> number of reducers is not estimated based on input size for skew joins or 
> order by. Instead, these jobs get only 1 reducer.
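
For context, the input-size-based estimate that this issue says was being skipped can be sketched roughly as follows. This is plain Java, not Pig's code; ReducerEstimate and its parameter names are hypothetical. The values used in the note below (1,000,000,000 bytes per reducer, a cap of 999) are assumptions about the defaults of pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max and should be checked against the release in use.

```java
/** Hypothetical illustration of an input-size-based reducer estimate. */
class ReducerEstimate {

    /**
     * Returns ceil(totalInputBytes / bytesPerReducer), clamped to the
     * range [1, maxReducers].
     */
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (totalInputBytes < 0 || bytesPerReducer <= 0 || maxReducers < 1) {
            throw new IllegalArgumentException("invalid estimation parameters");
        }
        // Ceiling division without floating point.
        long needed = totalInputBytes / bytesPerReducer
                + (totalInputBytes % bytesPerReducer == 0 ? 0 : 1);
        // Never fewer than one reducer, never more than the configured cap.
        return (int) Math.max(1L, Math.min((long) maxReducers, needed));
    }
}
```

Under those assumed defaults, 2.5 GB of input would yield 3 reducers rather than the single reducer described above.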





[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403461#comment-13403461
 ] 

Thejas M Nair commented on PIG-2774:


bq. I'd like to avoid having the user encode these details in the pig script. 

Floating some more ideas -

A more performant way of doing this would be to stop accumulating tuples for a 
join key value from the left relation in memory when a certain memory threshold 
is exceeded. Once the join of these tuples against the right relation is done, 
discard the accumulated left-relation tuples for the join key, load a new 
set, go back to the start of the records with this join key in the right relation, and 
continue.
To go back more efficiently to the start of the join key in the right relation, we can 
keep track of its record offset. This approach involves no additional writes 
and less IO overall. The right relation block hopefully gets into the OS 
cache.
But this approach can result in some map tasks being much slower than others.

Another option is to write the left-side join key values that didn't fit into 
memory onto HDFS in separate files, one file for each chunk that is expected 
to fit into memory, and have another round of MR jobs do a merge join on these 
files. (I think Hive has a skew join implementation along similar lines.) This would 
involve changing the MR plan at runtime.
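
A minimal sketch of the first idea, in plain Java rather than Pig's actual merge-join code. Everything here is an assumption made for illustration: the names ChunkedMergeJoin, Rec, join and maxBuffered are hypothetical, in-memory lists stand in for the two key-sorted relations, a list index stands in for the right-side record offset, and a record count stands in for the memory threshold.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Pig's merge-join implementation. */
class ChunkedMergeJoin {

    /** A (key, value) record; both inputs are assumed to be sorted by key. */
    static final class Rec {
        final int key;
        final String value;
        Rec(int key, String value) { this.key = key; this.value = value; }
    }

    /** Joins two key-sorted lists, buffering at most maxBuffered left records. */
    static List<String> join(List<Rec> left, List<Rec> right, int maxBuffered) {
        if (maxBuffered < 1) {
            throw new IllegalArgumentException("maxBuffered must be >= 1");
        }
        List<String> out = new ArrayList<>();
        List<Rec> buffer = new ArrayList<>(maxBuffered);
        int l = 0;
        int r = 0;
        while (l < left.size() && r < right.size()) {
            int key = left.get(l).key;
            // Advance the right side to the first record with key >= the left key.
            while (r < right.size() && right.get(r).key < key) r++;
            if (r == right.size()) break;
            if (right.get(r).key > key) {
                // No match on the right: skip every left record with this key.
                while (l < left.size() && left.get(l).key == key) l++;
                continue;
            }
            int keyStart = r;   // remembered offset of this key on the right side
            int keyEnd = r;
            while (l < left.size() && left.get(l).key == key) {
                // Load the next bounded chunk of left records for this key.
                buffer.clear();
                while (l < left.size() && left.get(l).key == key
                        && buffer.size() < maxBuffered) {
                    buffer.add(left.get(l++));
                }
                // Rewind to the remembered offset and rescan the right-side run.
                keyEnd = keyStart;
                while (keyEnd < right.size() && right.get(keyEnd).key == key) {
                    for (Rec b : buffer) {
                        out.add(b.value + "," + right.get(keyEnd).value);
                    }
                    keyEnd++;
                }
            }
            r = keyEnd;   // continue past this key on the right side
        }
        return out;
    }
}
```

A smaller maxBuffered produces the same set of joined rows as an unbounded buffer, only in a different order, and the right-side run for a hot key is rescanned once per chunk. That rescan is the extra work that can make some map tasks much slower than others.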



> Fix merge join to work with many duplicate left keys
> 
>
> Key: PIG-2774
> URL: https://issues.apache.org/jira/browse/PIG-2774
> Project: Pig
>  Issue Type: Bug
>Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is 
> large as it accumulates all of them in memory. There are two solutions around 
> this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Spit out join output periodically, and re-seek on the right hand side 
> index.





[jira] [Updated] (PIG-2766) Pig-HCat Usability

2012-06-28 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated PIG-2766:


Attachment: PIG-2766_Branch0.9.patch

Patch for branch 0.9.

> Pig-HCat Usability
> --
>
> Key: PIG-2766
> URL: https://issues.apache.org/jira/browse/PIG-2766
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, tools
>Affects Versions: 0.10.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.10.0
>
> Attachments: PIG-2766.patch, PIG-2766_2.patch, PIG-2766_3.patch, 
> PIG-2766_4.patch, PIG-2766_5.patch, PIG-2766_6.patch, PIG-2766_7.patch, 
> PIG-2766_Branch0.9.patch
>
>
> Currently, to use HCat from Pig (via HCatLoader/HCatStorer), the user needs to 
> register a bunch of jars and set a couple of configuration parameters. For a novice user, it 
> is non-trivial to find all the relevant jars and config params. We should 
> have better integration between Pig & HCat by pre-configuring Pig to load all 
> these jars and configs.





[jira] [Updated] (PIG-2766) Pig-HCat Usability

2012-06-28 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated PIG-2766:


Attachment: PIG-2766_7.patch

Moved the pig opts jars to the beginning of the additional jars.

> Pig-HCat Usability
> --
>
> Key: PIG-2766
> URL: https://issues.apache.org/jira/browse/PIG-2766
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, tools
>Affects Versions: 0.10.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.10.0
>
> Attachments: PIG-2766.patch, PIG-2766_2.patch, PIG-2766_3.patch, 
> PIG-2766_4.patch, PIG-2766_5.patch, PIG-2766_6.patch, PIG-2766_7.patch
>
>
> Currently, to use HCat from Pig (via HCatLoader/HCatStorer), the user needs to 
> register a bunch of jars and set a couple of configuration parameters. For a novice user, it 
> is non-trivial to find all the relevant jars and config params. We should 
> have better integration between Pig & HCat by pre-configuring Pig to load all 
> these jars and configs.





[jira] [Commented] (PIG-2675) Optimization: Remove unnecessary Limit jobs from plan

2012-06-28 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403351#comment-13403351
 ] 

Jie Li commented on PIG-2675:
-

Limit is now always compiled to two jobs. We can optimize at both compile-time 
and runtime.

{code}
data = LOAD 'queries/1.txt' AS (k, v, x);
selected = LIMIT data 2;
explain selected;
{code}

For this query, LIMIT is compiled at both the map phase and reduce phase in the 
1st job, whose requestedParallelism is already set to 1, thus we don't need to 
compile the 2nd job.

{code}
data = LOAD 'queries/1.txt' AS (k, v, x);
grouped = GROUP data BY k;
selected = LIMIT grouped 2;
explain selected;
{code}

For this query, LIMIT is compiled at the reduce phase of the 1st job, therefore 
we need to compile a 2nd job, which can be skipped at run-time.
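
The two cases can be summarised as a small decision function. This is only a sketch in plain Java: LimitJobPlanner, Decision and decide are hypothetical names, not Pig classes, and the reducer counts are passed in as plain integers rather than read from a MapReduceOper.

```java
/** Hypothetical illustration; the names do not correspond to Pig classes. */
class LimitJobPlanner {

    enum Decision { SKIP_AT_COMPILE_TIME, SKIP_AT_RUNTIME, RUN_SECOND_JOB }

    /**
     * @param requestedParallelism parallelism already fixed for the first job
     *                             at compile time, or -1 if it is not yet known
     * @param runtimeReducers      number of reducers the first job ends up with
     */
    static Decision decide(int requestedParallelism, int runtimeReducers) {
        if (requestedParallelism == 1) {
            // Case 1: the first job is already known to use one reducer.
            return Decision.SKIP_AT_COMPILE_TIME;
        }
        if (runtimeReducers == 1) {
            // Case 2: only discovered once the reducer count is settled at run-time.
            return Decision.SKIP_AT_RUNTIME;
        }
        // More than one reducer: a single-reducer limiting job is still needed.
        return Decision.RUN_SECOND_JOB;
    }
}
```

In this sketch the first query above corresponds to decide(1, 1), and the GROUP query corresponds to a call with -1 as the first argument.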


> Optimization: Remove unnecessary Limit jobs from plan
> -
>
> Key: PIG-2675
> URL: https://issues.apache.org/jira/browse/PIG-2675
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
>Assignee: Daniel Dai
>
> LIMIT operator always inserts a limiting single-reducer job after PIG-2652.
> We can optimize this job away when the preceding job only has 1 reducer at 
> run-time.





[jira] [Resolved] (PIG-2777) Docs are broken due to malformed xml after PIG-2673

2012-06-28 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy resolved PIG-2777.


   Resolution: Fixed
Fix Version/s: 0.11

Committed to trunk.

> Docs are broken due to malformed xml after PIG-2673
> ---
>
> Key: PIG-2777
> URL: https://issues.apache.org/jira/browse/PIG-2777
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.11
>
> Attachments: PIG-2777.patch
>
>






[jira] [Updated] (PIG-2777) Docs are broken due to malformed xml after PIG-2673

2012-06-28 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2777:
---

Attachment: PIG-2777.patch

Tested by running the ant docs target. Will commit.

> Docs are broken due to malformed xml after PIG-2673
> ---
>
> Key: PIG-2777
> URL: https://issues.apache.org/jira/browse/PIG-2777
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Attachments: PIG-2777.patch
>
>






[jira] [Created] (PIG-2777) Docs are broken due to malformed xml after PIG-2673

2012-06-28 Thread Dmitriy V. Ryaboy (JIRA)
Dmitriy V. Ryaboy created PIG-2777:
--

 Summary: Docs are broken due to malformed xml after PIG-2673
 Key: PIG-2777
 URL: https://issues.apache.org/jira/browse/PIG-2777
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy








Re: Build failed in Jenkins: Pig-trunk #1268

2012-06-28 Thread Dmitriy Ryaboy
That's my fault, xml error in the new merge join docs. Will fix.

On Thu, Jun 28, 2012 at 3:09 AM, Apache Jenkins Server
 wrote:
> See 
>
> Changes:
>
> [julien] PIG-2750: add artifacts to the ivy.xml for other jars Pig generates 
> (julien)
>
> [daijy] PIG-2775: Register jar does not goes to classpath in some cases
>
> [jcoveney] fix to PIG-2697 (jcoveney)
>
> --
> [...truncated 3528 lines...]

[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-28 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403114#comment-13403114
 ] 

Zhijie Shen commented on PIG-1314:
--

Hi Thejas, I'll take your suggestions. Thanks!

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





Re: Review Request: RANK function like in SQL

2012-06-28 Thread Gianmarco De Francisci Morales

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5523/#review8710
---


Overall looks good.


http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java


The same constant is defined in PigMapOnly.MapRank
It should be defined only in one place.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java


What if we have more than one job doing a rank in the same MROperPlan at the 
same time?
I think this piece of code would not work, as you would overwrite the 
rankGroup.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java


Logging all this information at INFO level is too much.
I would either reduce the amount of logged info or put it at DEBUG level.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java


How do we enforce that the number of mappers is the same in the Counter and 
Rank jobs?
It is not too relevant given the other optimized implementation we 
discussed, but it is worth pointing out.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java


Why is this commented out?
What would be the purpose of this piece of code?



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java


Is POCounter responsible for sorting the input?




http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java


What role do the rankPlans play in POCounter? It is unclear to me.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java


We know the size of the final tuple, so let's use the optimized constructor to 
preallocate the right amount of memory.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java


Why a separate method for incrementing a counter?
Why is it different from countNext?



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java


No need for getAll(), Tuple is Iterable:

for (Object o : in)
  out.append(o);
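
Putting this comment together with the preallocation one above, a sketch in plain Java could look like the following. It is an assumption-laden illustration, not PORank code: a java.util.List stands in for Pig's Tuple, RankTupleSketch and withRank are hypothetical names, and placing the rank as the first field is assumed here rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; a List stands in for Pig's Tuple. */
class RankTupleSketch {

    /** Builds an output row holding the rank followed by every input field. */
    static List<Object> withRank(long rank, Iterable<?> in, int inSize) {
        // The final size is known up front: one rank field plus the input fields.
        List<Object> out = new ArrayList<>(inSize + 1);
        out.add(rank);
        // Iterate over the input directly instead of copying it out first.
        for (Object o : in) {
            out.add(o);
        }
        return out;
    }
}
```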



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java


Some imports are unused; we can clean them up.


- Gianmarco De Francisci Morales


On June 25, 2012, 10:46 a.m., aavendan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5523/
> ---
> 
> (Updated June 25, 2012, 10:46 a.m.)
> 
> 
> Review request for pig, aavendan and Gianmarco De Francisci Morales.
> 
> 
> Description
> ---
> 
> Review board for https://issues.apache.org/jira/browse/PIG-2353
> 
> 
> This addresses bug PIG-2353.
> https://issues.apache.org/jira/browse/PIG-2353
> 
> 
> Diffs
> -
> 
>   
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapOnly.java 1353202
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java 1353202
>   
> http://svn

Build failed in Jenkins: Pig-trunk #1268

2012-06-28 Thread Apache Jenkins Server
See 

Changes:

[julien] PIG-2750: add artifacts to the ivy.xml for other jars Pig generates 
(julien)

[daijy] PIG-2775: Register jar does not goes to classpath in some cases

[jcoveney] fix to PIG-2697 (jcoveney)

--
[...truncated 3528 lines...]
 [exec] Fetching plugins descriptor: 
http://forrest.apache.org/plugins/whiteboard-plugins.xml
 [exec] Getting: http://forrest.apache.org/plugins/whiteboard-plugins.xml
 [exec] To: 

 [exec] local file date : Tue Feb 01 02:18:42 UTC 2011
 [exec] ..
 [exec] last modified = Fri Jun 10 08:37:02 UTC 2011
 [exec] Plugin list loaded from 
http://forrest.apache.org/plugins/plugins.xml.
 [exec] Plugin list loaded from 
http://forrest.apache.org/plugins/whiteboard-plugins.xml.
 [exec] 
 [exec] init-plugins:
 [exec] Created dir: 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] 
 [exec]   --
 [exec]   Installing plugin: org.apache.forrest.plugin.output.pdf
 [exec]   --
 [exec]
 [exec] 
 [exec] check-plugin:
 [exec] org.apache.forrest.plugin.output.pdf is available in the build dir. 
Trying to update it...
 [exec] 
 [exec] init-props:
 [exec] 
 [exec] echo-settings-condition:
 [exec] 
 [exec] echo-settings:
 [exec] 
 [exec] init-proxy:
 [exec] 
 [exec] fetch-plugins-descriptors:
 [exec] 
 [exec] fetch-plugin:
 [exec] Trying to find the description of 
org.apache.forrest.plugin.output.pdf in the different descriptor files
 [exec] Using the descriptor file 

 [exec] Processing 

 to 

 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginlist2fetch.xsl
 [exec] 
 [exec] fetch-local-unversioned-plugin:
 [exec] 
 [exec] get-local:
 [exec] Trying to locally get org.apache.forrest.plugin.output.pdf
 [exec] Looking in local /home/jenkins/tools/forrest/latest/plugins
 [exec] Found !
 [exec] 
 [exec] init-build-compiler:
 [exec] 
 [exec] echo-init:
 [exec] 
 [exec] init:
 [exec] 
 [exec] compile:
 [exec] 
 [exec] jar:
 [exec] 
 [exec] local-deploy:
 [exec] Locally deploying org.apache.forrest.plugin.output.pdf
 [exec] 
 [exec] build:
 [exec] Plugin org.apache.forrest.plugin.output.pdf deployed ! Ready to 
configure
 [exec] 
 [exec] fetch-remote-unversioned-plugin-version-forrest:
 [exec] 
 [exec] fetch-remote-unversioned-plugin-unversion-forrest:
 [exec] 
 [exec] has-been-downloaded:
 [exec] 
 [exec] downloaded-message:
 [exec] 
 [exec] uptodate-message:
 [exec] 
 [exec] not-found-message:
 [exec] Fetch-plugin Ok, installing !
 [exec] 
 [exec] unpack-plugin:
 [exec] 
 [exec] install-plugin:
 [exec] 
 [exec] configure-plugin:
 [exec] 
 [exec] configure-output-plugin:
 [exec] Mounting output plugin: org.apache.forrest.plugin.output.pdf
 [exec] Processing 

 to 

 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginMountSnippet.xsl
 [exec] Moving 1 file to 

 [exec] 
 [exec] configure-plugin-locationmap:
 [exec] Mounting plugin locationmap for org.apache.forrest.plugin.output.pdf
 [exec] Processing 

 to 

 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginLmMountSnippet.xsl
 [exec] Moving 1 file to 


[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-28 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402923#comment-13402923
 ] 

Dmitriy V. Ryaboy commented on PIG-2774:


Generating non-standard splits can get tricky in the solution Thejas proposed. 
Also, I'd like to avoid having the user encode these details in the Pig script. 

> Fix merge join to work with many duplicate left keys
> 
>
> Key: PIG-2774
> URL: https://issues.apache.org/jira/browse/PIG-2774
> Project: Pig
>  Issue Type: Bug
>Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is 
> large as it accumulates all of them in memory. There are two solutions around 
> this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Spit out join output periodically, and re-seek on the right hand side 
> index.
