[jira] [Commented] (PIG-2888) Improve performance of POPartialAgg

2012-08-27 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442974#comment-13442974
 ] 

Julien Le Dem commented on PIG-2888:


Awesome. Stop now or it will start to be negative soon.
Comments:
* There's a "pig.exec.nocombiner" that was not replaced by a constant.
* It would be nice to have a consistent way of getting booleans (and floats) 
from the conf. Something like:
{noformat}
// e.g. a static helper in PigConfiguration
public static boolean getBoolean(Properties p, String key) {
  return "true".equals(p.getProperty(key, "false"));
}
{noformat}
* some of the class description was still applicable
{noformat}
/**
 * Do partial aggregation in map plan. It uses a hash-map to aggregate. 
 * ...
 */
 public class POPartialAgg extends PhysicalOperator {
{noformat}
* what is the reason for this particular value?
{noformat}
 private static final int MAX_LIST_SIZE = 1 << 13 - 1;
{noformat}
* It looks like this could be a HashSet as the value never gets used (but 
there's no WeakHashSet, so I guess I got my answer). It could be as well 
WeakHashMap. Don't you want a visitor to just list them all 
once and set the count? That way you would not have to worry about keeping a 
reference to them. 
{noformat}
private static final WeakHashMap ALL_POPARTS = new WeakHashMap();
{noformat}
* +0.5 so that it is never 0? Math.max(1, ...) is more readable. 
{noformat}
 firstTierThreshold = (int) (0.5 + totalTuples * (1f - (1f / sizeReduction)));
 secondTierThreshold = (int) (0.5 + totalTuples *  (1f / sizeReduction));
{noformat}
* LOG.info() should be wrapped in if (LOG.isInfoEnabled()) { ... } for perf
* in aggregateSecondLevel() can't the processedInputMap be reused?
* in getMinOutputReductionFromProp(), if minReduction <= 0 it should throw an 
exception.
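A minimal sketch of the consistent accessors suggested in the second bullet, with the last bullet's suggestion (reject a non-positive minimum reduction) folded in. The class name `ConfHelpers`, the method names, the key `example.min.reduction`, and the choice of IllegalArgumentException are all hypothetical, not Pig's actual API.

```java
import java.util.Properties;

// Hypothetical helper class for illustration; not Pig's actual API.
public class ConfHelpers {

    // Only the exact string "true" yields true; a missing key or any other value yields false.
    public static boolean getBoolean(Properties p, String key) {
        return "true".equals(p.getProperty(key, "false"));
    }

    // Returns defaultValue if the key is absent. Float.parseFloat ignores surrounding
    // whitespace and throws NumberFormatException on malformed values.
    public static float getFloat(Properties p, String key, float defaultValue) {
        String value = p.getProperty(key);
        return value == null ? defaultValue : Float.parseFloat(value);
    }

    // Fails fast on values that are not strictly positive (NaN is rejected as well).
    public static float getPositiveFloat(Properties p, String key, float defaultValue) {
        float value = getFloat(p, key, defaultValue);
        if (!(value > 0f)) {
            throw new IllegalArgumentException(key + " must be > 0 but was " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("pig.exec.nocombiner", "true");
        p.setProperty("example.min.reduction", "10"); // hypothetical key
        System.out.println(getBoolean(p, "pig.exec.nocombiner"));             // true
        System.out.println(getPositiveFloat(p, "example.min.reduction", 3f)); // 10.0
    }
}
```

Keeping the parsing rules in one place means every boolean or float setting behaves the same way when a key is missing.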

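On the MAX_LIST_SIZE question: in Java the additive operators bind more tightly than the shift operators, so `1 << 13 - 1` parses as `1 << (13 - 1)`, which is 4096 rather than 8191 (2^13 - 1). The sketch below just evaluates both readings; whichever was intended, parentheses make it explicit. The class and constant names are illustrative.

```java
public class ShiftPrecedence {
    // Parsed as 1 << (13 - 1) because '-' has higher precedence than '<<'.
    static final int AS_WRITTEN = 1 << 13 - 1;
    // The "2^13 - 1" reading needs explicit parentheses.
    static final int PARENTHESIZED = (1 << 13) - 1;

    public static void main(String[] args) {
        System.out.println(AS_WRITTEN);    // 4096
        System.out.println(PARENTHESIZED); // 8191
    }
}
```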

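On the missing WeakHashSet: since Java 6, `Collections.newSetFromMap` can wrap a `WeakHashMap` to produce a set whose elements are weakly held. A minimal sketch, with a hypothetical `Operator` class standing in for POPartialAgg; it does not settle whether the visitor approach suggested above is the better design.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

public class WeakSetExample {
    // Hypothetical stand-in for POPartialAgg; default equals/hashCode give identity semantics.
    public static class Operator {}

    // Elements are weakly referenced: once an Operator is otherwise unreachable,
    // the GC may clear it and it eventually drops out of the set.
    public static final Set<Operator> ALL_OPERATORS =
            Collections.newSetFromMap(new WeakHashMap<Operator, Boolean>());

    public static void main(String[] args) {
        Operator a = new Operator();
        Operator b = new Operator();
        ALL_OPERATORS.add(a);
        ALL_OPERATORS.add(b);
        ALL_OPERATORS.add(a); // adding the same element again is a no-op
        // 2 while both a and b are still strongly reachable
        System.out.println(ALL_OPERATORS.size() + " " + ALL_OPERATORS.contains(a) + " " + ALL_OPERATORS.contains(b));
    }
}
```

Like WeakHashMap itself, the resulting set is not thread-safe; it can be wrapped with `Collections.synchronizedSet` if several threads register operators.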

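On the `0.5 +` question: adding 0.5 before the `(int)` cast rounds a non-negative value to the nearest integer, so it does not by itself keep a threshold above 0. If a floor of 1 is what is wanted, `Math.max(1, ...)` states it directly. The sketch below uses the shape of the second-tier expression with made-up inputs; the class and method names are hypothetical.

```java
public class ThresholdRounding {
    // Same shape as the quoted code: adding 0.5 before truncating rounds to the nearest int.
    static int rounded(long totalTuples, float sizeReduction) {
        return (int) (0.5 + totalTuples * (1f / sizeReduction));
    }

    // Explicit lower bound: truncates, then never returns less than 1.
    static int atLeastOne(long totalTuples, float sizeReduction) {
        return Math.max(1, (int) (totalTuples * (1f / sizeReduction)));
    }

    public static void main(String[] args) {
        // 1 * 0.1 = 0.1, which rounds to 0: the +0.5 does not prevent a zero threshold
        System.out.println(rounded(1, 10f));    // 0
        System.out.println(atLeastOne(1, 10f)); // 1
        // 19 * 0.1 = 1.9, which rounds to 2 (plain truncation would give 1)
        System.out.println(rounded(19, 10f));   // 2
    }
}
```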
> Improve performance of POPartialAgg
> ---
>
> Key: PIG-2888
> URL: https://issues.apache.org/jira/browse/PIG-2888
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch, 
> partialagg_patch_3.patch, partialagg_patch_4.patch
>
>
> During performance testing, we found that POPartialAgg can cause performance 
> degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't 
> well suited to the operator's assumptions. Changing the implementation to a 
> more flexible hash-based model can provide significant performance 
> improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2888) Improve performance of POPartialAgg

2012-08-27 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2888:
---

Attachment: partialagg_patch_4.patch

Significant improvements to transitions from raw to processed map. Better mem 
utilization estimation. Better logging.

While profiling, also noticed an inordinate amount of time being spent in 
Distinct$Initial's bag registration, fixed that.

The task that I cited as taking 57 seconds with this patch earlier? It now 
takes 30 seconds. Also saw a 40% speed improvement vs. an older version of this 
patch on a production job.

Please review :).


[jira] [Commented] (PIG-2893) fix DBStorage compile issue

2012-08-27 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442922#comment-13442922
 ] 

Thejas M Nair commented on PIG-2893:


The compile error was - 
[javac] symbol  : method setDate(int,java.util.Date)
[javac] location: interface java.sql.PreparedStatement
[javac] ps.setDate(sqlPos, ((DateTime) field).toDate());

> fix DBStorage compile issue
> ---
>
> Key: PIG-2893
> URL: https://issues.apache.org/jira/browse/PIG-2893
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
> Attachments: PIG-2893.1.patch
>
>
> DBStorage does not compile after the datetime patch was committed. A 
> java.util.Date (from the Joda DateTime) was passed as the argument to 
> java.sql.PreparedStatement.setDate() instead of a java.sql.Date.


[jira] [Created] (PIG-2893) fix DBStorage compile issue

2012-08-27 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-2893:
--

 Summary: fix DBStorage compile issue
 Key: PIG-2893
 URL: https://issues.apache.org/jira/browse/PIG-2893
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
 Attachments: PIG-2893.1.patch

DBStorage does not compile after the datetime patch was committed. A 
java.util.Date (from the Joda DateTime) was passed as the argument to 
java.sql.PreparedStatement.setDate() instead of a java.sql.Date.


[jira] [Updated] (PIG-2893) fix DBStorage compile issue

2012-08-27 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2893:
---

Attachment: PIG-2893.1.patch

PIG-2893.1.patch - fix for compile issue, updates to test case to use datetime 
type.


[jira] [Updated] (PIG-1283) COUNT on null bag causes failure

2012-08-27 Thread Anand L Ranganathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand L Ranganathan updated PIG-1283:
-

Status: In Progress  (was: Patch Available)

Working.

> COUNT on null bag causes failure
> 
>
> Key: PIG-1283
> URL: https://issues.apache.org/jira/browse/PIG-1283
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Anand L Ranganathan
>  Labels: newbie
> Attachments: PIG-1283-1.patch
>
>
> grunt>  l = load '/tmp/e.bag' as (b : bag{t: (i : int)}, a : int);
> # b is null for the only row
> grunt> c = foreach l generate COUNT(b);   
> grunt> dump c   
> It results in the following exception:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while 
> computing count in COUNT
> at org.apache.pig.builtin.COUNT.exec(COUNT.java:59)
> at org.apache.pig.builtin.COUNT.exec(COUNT.java:39)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:212)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:358)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
> Caused by: java.lang.NullPointerException
> at org.apache.pig.builtin.COUNT.exec(COUNT.java:46)
> ... 12 more
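The trace above ends in a NullPointerException inside COUNT.exec, i.e. the bag itself is null. A minimal, self-contained sketch of a null-safe count follows; plain `List<List<Object>>` stands in for Pig's DataBag and Tuple, and the class and method names are hypothetical. Returning null for a null bag is an assumption (SQL-style null propagation), not necessarily what PIG-1283-1.patch does; skipping tuples whose first field is null follows COUNT's documented behavior of ignoring nulls.

```java
import java.util.Arrays;
import java.util.List;

public class NullSafeCount {
    // Stand-in for COUNT over a bag: each inner List plays the role of a tuple.
    // Returns null for a null bag (assumed behavior) instead of throwing an NPE.
    static Long count(List<List<Object>> bag) {
        if (bag == null) {
            return null;
        }
        long cnt = 0;
        for (List<Object> t : bag) {
            // Skip tuples whose first field is null, as documented for COUNT;
            // null or empty tuples are skipped too (an assumption in this sketch).
            if (t != null && !t.isEmpty() && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList(1),
                Arrays.<Object>asList((Object) null),
                Arrays.<Object>asList(3));
        System.out.println(count(bag));  // 2
        System.out.println(count(null)); // null
    }
}
```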


[jira] [Commented] (PIG-2341) Need better documentation on Pig/HBase integration

2012-08-27 Thread Jayesh Thakrar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442878#comment-13442878
 ] 

Jayesh Thakrar commented on PIG-2341:
-

Hi,

I am using HBaseStorage at work and am very happy with it. I would like 
to volunteer to take up this task. How can I go about doing it? 

I would greatly appreciate any pointers.

Thanks,
Jayesh


> Need better documentation on Pig/HBase integration
> --
>
> Key: PIG-2341
> URL: https://issues.apache.org/jira/browse/PIG-2341
> Project: Pig
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Mikael Sitruk
>
> One of the nice things about Pig and HBase is that they can be integrated, 
> thanks to a recently committed patch (PIG-1250).
> The documentation has not been updated yet (currently it relates almost 
> entirely to the patch itself). It would be nice to document this feature in 
> detail in the Pig documentation (e.g., here: 
> http://pig.apache.org/docs/r0.9.1/func.html#load-store-functions).


[jira] [Updated] (PIG-2892) piggybank build failing on trunk

2012-08-27 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2892:
---

Attachment: PIG-2892.patch

This is a regression from PIG-1314.

The problem is that org.joda.time.DateTime.toDate() returns java.util.Date 
while java.sql.Date is expected for java.sql.PreparedStatement.setDate().

Attached is a patch that converts java.util.Date to java.sql.Date before 
calling java.sql.PreparedStatement.setDate().
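A minimal sketch of the conversion described above, assuming the value arrives as the java.util.Date returned by `((DateTime) field).toDate()`. `java.sql.Date` has a constructor taking epoch milliseconds, so the same instant can be wrapped directly; the helper name `toSqlDate` is hypothetical, and this is not necessarily the code in PIG-2892.patch.

```java
import java.util.Date;

public class SqlDateConversion {
    // PreparedStatement.setDate() expects a java.sql.Date, so wrap the same epoch millis.
    static java.sql.Date toSqlDate(Date utilDate) {
        return utilDate == null ? null : new java.sql.Date(utilDate.getTime());
    }

    public static void main(String[] args) {
        Date now = new Date();
        java.sql.Date sqlDate = toSqlDate(now);
        System.out.println(sqlDate.getTime() == now.getTime()); // true
        // In DBStorage this would become roughly:
        // ps.setDate(sqlPos, toSqlDate(((DateTime) field).toDate()));
    }
}
```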

Thanks!

> piggybank build failing on trunk
> 
>
> Key: PIG-2892
> URL: https://issues.apache.org/jira/browse/PIG-2892
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Reporter: Alan Gates
>Priority: Critical
> Attachments: PIG-2892.patch
>
>
> When I try to build Piggybank I get:
> {code}
>[javac] 
> /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build.xml:92: 
> warning: 'includeantruntime' was not set, defaulting to 
> build.sysclasspath=last; set to false for repeatable builds
> [javac] Compiling 159 source files to 
> /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build/classes
> [javac] 
> /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java:121:
>  cannot find symbol
> [javac] symbol  : method setDate(int,java.util.Date)
> [javac] location: interface java.sql.PreparedStatement
> [javac] ps.setDate(sqlPos, ((DateTime) field).toDate());
> {code}


[jira] [Updated] (PIG-2892) piggybank build failing on trunk

2012-08-27 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2892:
---

Assignee: Cheolsoo Park
  Status: Patch Available  (was: Open)

> piggybank build failing on trunk
> 
>
> Key: PIG-2892
> URL: https://issues.apache.org/jira/browse/PIG-2892
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Reporter: Alan Gates
>Assignee: Cheolsoo Park
>Priority: Critical
> Attachments: PIG-2892.patch
>
>
> When I try to build Piggybank I get:
> {code}
>[javac] 
> /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build.xml:92: 
> warning: 'includeantruntime' was not set, defaulting to 
> build.sysclasspath=last; set to false for repeatable builds
> [javac] Compiling 159 source files to 
> /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build/classes
> [javac] 
> /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java:121:
>  cannot find symbol
> [javac] symbol  : method setDate(int,java.util.Date)
> [javac] location: interface java.sql.PreparedStatement
> [javac] ps.setDate(sqlPos, ((DateTime) field).toDate());
> {code}


Build failed in Jenkins: Pig-trunk #1306

2012-08-27 Thread Apache Jenkins Server
See 

Changes:

[daijy] PIG-2708: split MiniCluster based tests out of 
org.apache.pig.test.TestInputOutputFileValidator

[daijy] PIG-2821:  HBaseStorage should work with secure hbase

--
[...truncated 35898 lines...]
[junit] at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:550)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:87)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68)
[junit] at 
org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:132)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
[junit] at 
junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] 12/08/27 23:42:22 WARN datanode.FSDatasetAsyncDiskService: 
AsyncDiskService has already shut down.
[junit] Shutting down DataNode 2
[junit] 12/08/27 23:42:22 INFO mortbay.log: Stopped 
SelectChannelConnector@localhost:0
[junit] 12/08/27 23:42:22 INFO ipc.Server: Stopping server on 55372
[junit] 12/08/27 23:42:22 INFO ipc.Server: IPC Server handler 2 on 55372: 
exiting
[junit] 12/08/27 23:42:22 INFO metrics.RpcInstrumentation: shut down
[junit] 12/08/27 23:42:22 INFO datanode.DataNode: Waiting for threadgroup 
to exit, active threads is 1
[junit] 12/08/27 23:42:22 INFO ipc.Server: Stopping IPC Server Responder
[junit] 12/08/27 23:42:22 INFO ipc.Server: IPC Server handler 1 on 55372: 
exiting
[junit] 12/08/27 23:42:22 INFO ipc.Server: IPC Server handler 0 on 55372: 
exiting
[junit] 12/08/27 23:42:22 WARN datanode.DataNode: 
DatanodeRegistration(127.0.0.1:34243, 
storageID=DS-1532624747-67.195.138.24-34243-1346110238303, infoPort=44328, 
ipcPort=55372):DataXceiveServer:java.nio.channels.AsynchronousCloseException
[junit] at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
[junit] at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:159)
[junit] at 
sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
[junit] at java.lang.Thread.run(Thread.java:662)
[junit] 
[junit] 12/08/27 23:42:22 INFO datanode.DataNode: Exiting DataXceiveServer
[junit] 12/08/27 23:42:22 INFO ipc.Server: Stopping IPC Server listener on 
55372
[junit] 12/08/27 23:42:22 INFO datanode.DataBlockScanner: Exiting 
DataBlockScanner thread.
[junit] 12/08/27 23:42:22 INFO datanode.DataNode: Scheduling block 
blk_-1622806193096722258_1101 file 
build/test/data/dfs/data/data3/current/blk_-1622806193096722258 for deletion
[junit] 12/08/27 23:42:22 INFO datanode.DataNode: Scheduling block 
blk_3674274575908176342_1102 file 
build/test/data/dfs/data/data4/current/blk_3674274575908176342 for deletion
[junit] 12/08/27 23:42:22 INFO datanode.DataNode: Scheduling block 
blk_6014977156259038012_1095 file 
build/test/data/dfs/data/data4/current/blk_6014977156259038012 for deletion
[junit] 12/08/27 23:42:22 INFO datanode.DataNode: Deleted block 
blk_3674274575908176342_1102 at file 
build/test/data/dfs/data/data4/current/blk_3674274575908176342
[junit] 12/08/27 23:42:22 INFO datanode.DataNode: Deleted block 
blk_6014977156259038012_1095 at file 
build/test/data/dfs/data/data4/current/blk_6014977156259038012
[junit] 12/08/27 23:42:22 INFO datanode.DataNode: Deleted block 
blk_-1622806193096722258_1101 at file 
build/test/data/dfs/data/data3/current/blk_-1622806193096722258
[junit] 12/08/27 23:42:23 INFO datanode.DataN

[jira] [Created] (PIG-2892) piggybank build failing on trunk

2012-08-27 Thread Alan Gates (JIRA)
Alan Gates created PIG-2892:
---

 Summary: piggybank build failing on trunk
 Key: PIG-2892
 URL: https://issues.apache.org/jira/browse/PIG-2892
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Alan Gates
Priority: Critical


When I try to build Piggybank I get:

{code}
   [javac] 
/grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build.xml:92: 
warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 159 source files to 
/grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build/classes
[javac] 
/grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java:121:
 cannot find symbol
[javac] symbol  : method setDate(int,java.util.Date)
[javac] location: interface java.sql.PreparedStatement
[javac] ps.setDate(sqlPos, ((DateTime) field).toDate());
{code}


[jira] [Commented] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2012-08-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442760#comment-13442760
 ] 

Alan Gates commented on PIG-1891:
-

Never mind about TestMacroExpansion. I see that it is failing in trunk as well.

> Enable StoreFunc to make intelligent decision based on job success or failure
> -
>
> Key: PIG-1891
> URL: https://issues.apache.org/jira/browse/PIG-1891
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Alex Rovner
>Priority: Minor
>  Labels: patch
> Attachments: PIG-1891-1.patch, PIG-1891-2.patch
>
>
> We are in the process of using Pig for various data processing and component 
> integration tasks. Here is where we feel Pig storage funcs fall short:
> They are not aware of whether the overall job has succeeded. This creates a 
> problem for storage funcs that need to "upload" results into another system:
> DB, FTP, another file system, etc.
> I looked at the DBStorage in the piggybank 
> (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
>  and what I see is essentially a mechanism which, for each task, does the 
> following:
> 1. Creates a RecordWriter (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Commits or rolls back depending on whether the task was successful.
> While this approach works great at the task level, it does not work at all at 
> the job level. 
> If some tasks succeed but the overall job fails, partial records are 
> going to get uploaded into the DB.
> Any ideas on a workaround? 
> Our current workaround is fairly ugly: we created a Java wrapper that 
> launches Pig jobs and then uploads to the DB once the Pig job succeeds. 
> While the approach works, it's not really integrated into Pig.


[jira] [Assigned] (PIG-1283) COUNT on null bag causes failure

2012-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1283:
---

Assignee: Anand L Ranganathan


[jira] [Commented] (PIG-1283) COUNT on null bag causes failure

2012-08-27 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442714#comment-13442714
 ] 

Daniel Dai commented on PIG-1283:
-

Hi, Anand, can you add a test case? Usually we add a test case for bug fix if 
possible. Thanks.


[jira] [Updated] (PIG-2708) split MiniCluster based tests out of org.apache.pig.test.TestInputOutputFileValidator

2012-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2708:


   Resolution: Fixed
Fix Version/s: 0.11
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Patch committed. Thanks Anand for contributing!

> split MiniCluster based tests out of 
> org.apache.pig.test.TestInputOutputFileValidator
> -
>
> Key: PIG-2708
> URL: https://issues.apache.org/jira/browse/PIG-2708
> Project: Pig
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Anand L Ranganathan
>  Labels: newbie
> Fix For: 0.11
>
> Attachments: PIG-2708-1.patch
>
>


[jira] [Updated] (PIG-2791) Pig does not work with ViewFileSystem

2012-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2791:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed FixMiniCluster-branch10-1.patch to 0.10.

> Pig does not work with ViewFileSystem
> -
>
> Key: PIG-2791
> URL: https://issues.apache.org/jira/browse/PIG-2791
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.10.0
> Environment: Pig QE
>Reporter: patrick white
>Assignee: Rohini Palaniswamy
>Priority: Blocker
> Attachments: asf_test_notes.txt, FixMiniCluster-branch10-1.patch, 
> FixMiniCluster-branch10.patch, PIG-2791-0.patch, PIG-2791-1.patch, 
> PIG-2791-2.patch, PIG-2791-3-branch10.patch, PIG-2791-3-trunk.patch, 
> PIG-2791-4-branch10.patch, PIG-2791-4-trunk.patch, PIG-2791-5-trunk.patch
>
>
> The Yahoo Pig QE team ran into a blocking issue when trying to test 
> Client-Side Mount Tables on a Federated cluster with two NNs; this blocks 
> Pig testing on Federation. 
> Federation relies heavily on the use of CSMT with viewFS. QE found that in 
> this configuration it is not possible to enter the grunt shell because Pig 
> makes a call to getDefaultReplication() on the fs, which is ambiguous over 
> viewFS and causes core to throw an 
> org.apache.hadoop.fs.viewfs.NotInMountpointException: "getDefaultReplication 
> on empty path is invalid".
> This in turn causes Pig to exit with an internal error as follows:
> 2012-07-06 22:20:25,657 [main] INFO  org.apache.pig.Main - Apache Pig version 
> 0.10.1.0.1206081058 (r1348169) compiled Jun 08 2012, 17:58:42
> 2012-07-06 22:20:26,074 [main] WARN  org.apache.hadoop.conf.Configuration - 
> mapred.used.genericoptionsparser is deprecated. Instead, use 
> mapreduce.client.genericoptionsparser.used
> 2012-07-06 22:20:26,076 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: viewfs:///
> 2012-07-06 22:20:26,080 [main] WARN  org.apache.hadoop.conf.Configuration - 
> fs.default.name is deprecated. Instead, use fs.defaultFS
> 2012-07-06 22:20:26,522 [main] ERROR org.apache.pig.Main - ERROR 2999: 
> Unexpected internal error. getDefaultReplication on empty path is invalid
> 2012-07-06 22:20:26,522 [main] WARN  org.apache.pig.Main - There is no log 
> file to write to.
> 2012-07-06 22:20:26,522 [main] ERROR org.apache.pig.Main - 
> org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication 
> on empty path is invalid
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:482)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:77)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:205)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:118)
> at org.apache.pig.impl.PigContext.connect(PigContext.java:208)
> at org.apache.pig.PigServer.<init>(PigServer.java:246)
> at org.apache.pig.PigServer.<init>(PigServer.java:231)
> at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:47)
> at org.apache.pig.Main.run(Main.java:487)
> at org.apache.pig.Main.main(Main.java:111)


[jira] [Assigned] (PIG-2708) split MiniCluster based tests out of org.apache.pig.test.TestInputOutputFileValidator

2012-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-2708:
---

Assignee: Anand L Ranganathan

> split MiniCluster based tests out of 
> org.apache.pig.test.TestInputOutputFileValidator
> -
>
> Key: PIG-2708
> URL: https://issues.apache.org/jira/browse/PIG-2708
> Project: Pig
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Anand L Ranganathan
>  Labels: newbie
> Attachments: PIG-2708-1.patch
>
>


[jira] [Commented] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2012-08-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442699#comment-13442699
 ] 

Alan Gates commented on PIG-1891:
-

This patch causes failures in TestLoadStoreFuncLifeCycle and TestMacroExpansion.

In TestLoadStoreFuncLifeCycle, the failure occurs because the patch instantiates 
the store function one additional time. Julien had put in tests to make sure the 
number of instantiations stays low. When I talked with him, he said he thought this 
patch was fine, so you can bump the instantiation count the test checks for from 3 
to 4.

I'm not clear what's driving the failure in TestMacroExpansion.

I'll run the e2e tests as well and post the results.
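For readers unfamiliar with this kind of check, here is a minimal sketch of how a lifecycle test can bound the number of times a store function is constructed. CountingStoreFunc and its static counter are hypothetical illustrations, not the actual TestLoadStoreFuncLifeCycle code, and the real test may compare an exact count rather than an upper bound.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical stand-in for a StoreFunc under test. It is NOT the code in
 * TestLoadStoreFuncLifeCycle; it only illustrates counting constructor calls.
 */
public class CountingStoreFunc {
    // Shared across all instances so the test can read the total afterwards.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    public CountingStoreFunc() {
        INSTANCES.incrementAndGet();
    }

    /** Throws if more instances were created than the test allows. */
    public static void checkInstantiations(int expectedMax) {
        int actual = INSTANCES.get();
        if (actual > expectedMax) {
            throw new AssertionError(
                "expected at most " + expectedMax + " instantiations, got " + actual);
        }
    }
}
```

If a patch adds one extra instantiation, a check with a bound of 3 starts failing, which is why the expected number has to be raised to 4.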



> Enable StoreFunc to make intelligent decision based on job success or failure
> -
>
> Key: PIG-1891
> URL: https://issues.apache.org/jira/browse/PIG-1891
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Alex Rovner
>Priority: Minor
>  Labels: patch
> Attachments: PIG-1891-1.patch, PIG-1891-2.patch
>
>
> We are in the process of using Pig for various data processing and component 
> integration tasks. Here is where we feel Pig storage funcs fall short:
> they are not aware of whether the overall job has succeeded. This creates a 
> problem for storage funcs that need to "upload" results into another system:
> a DB, FTP, another file system, etc.
> I looked at the DBStorage in the piggybank 
> (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
> and what I see is essentially a mechanism that, for each task, does the 
> following:
> 1. Creates a RecordWriter (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Executes a commit or rollback depending on whether the task was successful.
> While this approach works great at the task level, it does not work at all at 
> the job level: 
> if some tasks succeed but the overall job fails, partial records are 
> going to get uploaded into the DB.
> Any ideas on a workaround? 
> Our current workaround is fairly ugly: we created a Java wrapper that 
> launches Pig jobs and then uploads to the DBs once the Pig job has succeeded. 
> While the approach works, it's not really integrated into Pig.
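The workaround described above (upload only once the whole job is known to have succeeded) can be sketched as a two-phase pattern: tasks only stage their records, and a single job-level step either publishes everything or discards it. This is a minimal, hypothetical illustration in plain Java; JobLevelCommitSketch, its methods, and the List standing in for the external DB are assumptions, not part of Pig's StoreFunc API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of job-level commit: tasks only stage records, and the
 * external system (modeled as a List) is written once, after every task that
 * staged data has reported success. Not a Pig API.
 */
public class JobLevelCommitSketch {
    private final Map<Integer, List<String>> staged = new TreeMap<>();   // taskId -> records
    private final Map<Integer, Boolean> taskSucceeded = new TreeMap<>(); // taskId -> outcome
    private final List<String> externalStore;                            // stands in for the DB

    public JobLevelCommitSketch(List<String> externalStore) {
        this.externalStore = externalStore;
    }

    /** Per-task write: buffer the record instead of uploading it immediately. */
    public void write(int taskId, String record) {
        staged.computeIfAbsent(taskId, id -> new ArrayList<>()).add(record);
    }

    /** Per-task outcome, analogous to the task-level commit/rollback in DBStorage. */
    public void finishTask(int taskId, boolean success) {
        taskSucceeded.put(taskId, success);
    }

    /** Job-level decision: upload everything or nothing. Returns true if uploaded. */
    public boolean finishJob() {
        boolean jobOk = !taskSucceeded.containsValue(false)
                && taskSucceeded.keySet().containsAll(staged.keySet());
        if (jobOk) {
            for (List<String> records : staged.values()) {
                externalStore.addAll(records);
            }
        }
        staged.clear(); // staged data is discarded either way
        return jobOk;
    }
}
```

Unlike a per-task commit, a single failed task here prevents any upload, so the external system never sees partial results. A real implementation would also need to handle retried task attempts and staging data that does not fit in memory.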


[jira] [Updated] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2012-08-27 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1891:


Status: Open  (was: Patch Available)

> Enable StoreFunc to make intelligent decision based on job success or failure
> -
>
> Key: PIG-1891
> URL: https://issues.apache.org/jira/browse/PIG-1891
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Alex Rovner
>Priority: Minor
>  Labels: patch
> Attachments: PIG-1891-1.patch, PIG-1891-2.patch
>
>


[jira] [Updated] (PIG-2821) HBaseStorage should work with secure hbase

2012-08-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2821:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to 0.10/trunk. Thanks Rohini!

> HBaseStorage should work with secure hbase
> --
>
> Key: PIG-2821
> URL: https://issues.apache.org/jira/browse/PIG-2821
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Francis Liu
>Assignee: Rohini Palaniswamy
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2821-1.patch, PIG-2821-branch10.patch, 
> PIG-2821-trunk.patch
>
>
> HBaseStorage needs to add the HBase delegation token to the Job object if HBase 
> security is enabled.


[jira] [Commented] (PIG-2885) TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-08-27 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442691#comment-13442691
 ] 

Cheolsoo Park commented on PIG-2885:


Actually, we have a bigger problem. The hbase jar in the maven repository is 
not binary-compatible with hadoop 0.23 (HBASE-5680). To get hbase working with 
hadoop 0.23, we have to recompile the source code against hadoop 0.23.

Using the hbase jar from the maven repository makes "TestHBaseStorage 
-Dhadoopversion=23" fail with the following error:
{code}
2012-08-25 12:55:47,100 FATAL 
[Master:0;localhost.localdomain,49603,1345924546912] master.HMaster 
(HMaster.java:abort(1388)) - HBase is having a problem with its Hadoop jars.  
You may need to recompile HBase against Hadoop version 0.23.1 or change your 
hadoop jars to start properly
java.lang.NoClassDefFoundError: 
org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction
at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:524)
at 
org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:324)
at 
org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
at 
org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:112)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:480)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:343)
at java.lang.Thread.run(Thread.java:662)
{code}

I am going to table this for now.

> TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3
> -
>
> Key: PIG-2885
> URL: https://issues.apache.org/jira/browse/PIG-2885
> Project: Pig
>  Issue Type: Bug
> Environment: Hadoop 1.0.3, CentOS 6.3 64 bit
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
>Priority: Minor
> Attachments: PIG-2885.patch
>
>
> I ran into two unit test failures (TestJobSubmission and TestHBaseStorage) after 
> bumping the versions of HBase and ZK to 0.94 and 3.4.3, respectively, with Hadoop 
> 1.0.3. I am opening a JIRA to capture what I found for future reference.
> - Two dependencies of HBase 0.94, high-scale-lib and protobuf-java, are missing 
> from ivy.xml.
> - The HTable constructor in HBase 0.94 changed:
> {code}
> -HTable table = new HTable(TESTTABLE_2);
> +HTable table = new HTable(conf, TESTTABLE_2);
> {code}
> - The default client port of MiniZooKeeperCluster in HBase 0.94 is no longer 
> 21818. Since it is chosen randomly at runtime, it has to be set in PigContext.
> {code}
> @@ -541,7 +543,7 @@ public class TestJobSubmission {
>  // use the estimation
>  Configuration conf = cluster.getConfiguration();
>  HBaseTestingUtility util = new HBaseTestingUtility(conf);
> -util.startMiniZKCluster();
> +int clientPort = util.startMiniZKCluster().getClientPort();
>  util.startMiniHBaseCluster(1, 1); 
>  
>  String query = "a = load '/passwd';" + 
> @@ -553,6 +555,7 @@ public class TestJobSubmission {
>  
>  pc.getConf().setProperty("pig.exec.reducers.bytes.per.reducer", "100");
>  pc.getConf().setProperty("pig.exec.reducers.max", "10");
> +pc.getConf().setProperty(HConstants.ZOOKEEPER_CLIENT_PORT, Integer.toString(clientPort));
>  ConfigurationValidator.validatePigProperties(pc.getProperties());
>  conf = ConfigurationUtil.toConfiguration(pc.getProperties());
>  JobControlCompiler jcc = new JobControlCompiler(pc, conf);
> {code}
> With the attached patch, both tests pass with hadoop 1.0.3. Please note that 
> TestHBaseStorage fails in hadoop 0.23, and I haven't investigated that.


[jira] [Commented] (PIG-2888) Improve performance of POPartialAgg

2012-08-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442613#comment-13442613
 ] 

Dmitriy V. Ryaboy commented on PIG-2888:


None of the PigMix queries trigger the particular bad behavior this patch is meant 
to address. I've verified that performance is on par with the previous 
implementation for those "good" use cases.

Here is a script that finishes in 57 seconds with this patch, while without the 
patch it takes 13 min 48 sec:

{code}
rmf tmp/delme
l = load 'data.txt';
x = foreach l generate $0 as l, (int) (RANDOM() * 1) as num; 
g = foreach (group x by num % 100) { d = distinct x.num; generate SUM(d); }
store g into 'tmp/delme';
{code}

The data file contains about 7 million rows of one letter each. 
This is an intentionally skewed example, but we've encountered similar problems 
with real data, particularly when grouping by high-cardinality columns like 
user_id and subsequently performing algebraic operations on nested distincts.

> Improve performance of POPartialAgg
> ---
>
> Key: PIG-2888
> URL: https://issues.apache.org/jira/browse/PIG-2888
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch, 
> partialagg_patch_3.patch
>
>
> During performance testing, we found that POPartialAgg can cause performance 
> degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't 
> well suited to the operator's assumptions. Changing the implementation to a 
> more flexible hash-based model can provide significant performance 
> improvements.
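As a rough illustration of the "hash-based model" mentioned in the description, the sketch below aggregates SUM per key in a HashMap, flushes partial results when the table grows too large, and switches to pass-through if an early sample shows too little reduction (as with high-cardinality keys such as user_id). It is a hypothetical simplification: PartialAggSketch, maxEntries, sampleSize and minReduction are illustrative names, and this is not the code in the attached patches.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of hash-based map-side partial aggregation (SUM per key).
 * Not the POPartialAgg code from the attached patches; names and thresholds
 * (maxEntries, sampleSize, minReduction) are illustrative assumptions.
 */
public class PartialAggSketch {
    private final Map<String, Long> table = new HashMap<>();
    private final List<Map.Entry<String, Long>> output = new ArrayList<>();
    private final int maxEntries;       // flush partial results once the table exceeds this
    private final int sampleSize;       // number of input records to see before judging
    private final double minReduction;  // required ratio of input records to output records
    private long seen = 0;
    private boolean enabled = true;

    public PartialAggSketch(int maxEntries, int sampleSize, double minReduction) {
        if (minReduction <= 0) {
            throw new IllegalArgumentException("minReduction must be positive");
        }
        this.maxEntries = maxEntries;
        this.sampleSize = sampleSize;
        this.minReduction = minReduction;
    }

    public void add(String key, long value) {
        if (!enabled) {
            // Aggregation was judged not worth it: pass records straight through.
            output.add(new AbstractMap.SimpleEntry<>(key, value));
            return;
        }
        table.merge(key, value, Long::sum);
        seen++;
        if (seen == sampleSize) {
            // Records already emitted plus keys still buffered in the table.
            double reduction = (double) seen / (output.size() + table.size());
            if (reduction < minReduction) {
                enabled = false; // too many distinct keys to benefit from hashing
                flush();
                return;
            }
        }
        if (table.size() > maxEntries) {
            flush(); // bound memory by emitting partial aggregates early
        }
    }

    /** Emits all buffered partial aggregates and empties the table. */
    public void flush() {
        for (Map.Entry<String, Long> e : table.entrySet()) {
            output.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
        }
        table.clear();
    }

    public boolean isEnabled() { return enabled; }

    public List<Map.Entry<String, Long>> getOutput() { return output; }
}
```

Turning aggregation off when the sampled reduction is poor avoids paying hashing and memory costs for little benefit. For an algebraic function like SUM, the downstream combine/reduce step still produces correct totals whether it receives partial sums or raw records.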


[jira] [Updated] (PIG-2881) Add SUBTRACT eval function

2012-08-27 Thread Joel Costigliola (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Costigliola updated PIG-2881:
--

Attachment: SubtractTest.java

New version of the test, depending only on JUnit assertions.

> Add SUBTRACT eval function
> --
>
> Key: PIG-2881
> URL: https://issues.apache.org/jira/browse/PIG-2881
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Joel Costigliola
>Priority: Minor
> Attachments: Subtract.java, SubtractTest.java
>
>
> Similar to the DIFF function, but SUBTRACT(bag1, bag2) subtracts the elements 
> of bag2 from bag1.
>   
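To make the difference from DIFF concrete, here is a minimal sketch using plain Java Lists in place of Pig bags. It assumes SUBTRACT removes every element of bag1 that occurs anywhere in bag2; the attached Subtract.java may treat duplicates differently. symmetricDifference is included only for contrast, as a rough approximation of DIFF, which returns the tuples that appear in one bag but not the other.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed SUBTRACT semantics using plain Lists
 * in place of Pig DataBags. Assumption: every element of bag1 that occurs
 * anywhere in bag2 is removed; other duplicates in bag1 are kept.
 */
public final class SubtractSketch {
    private SubtractSketch() {}

    /** bag1 minus bag2 (one-directional), preserving bag1's order. */
    public static <T> List<T> subtract(List<T> bag1, List<T> bag2) {
        Set<T> toRemove = new HashSet<>(bag2);
        List<T> result = new ArrayList<>();
        for (T item : bag1) {
            if (!toRemove.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    /** Symmetric difference, for contrast: elements found in only one of the bags. */
    public static <T> List<T> symmetricDifference(List<T> bag1, List<T> bag2) {
        List<T> result = subtract(bag1, bag2);
        result.addAll(subtract(bag2, bag1));
        return result;
    }
}
```

For example, with bag1 = (x, y, y, z) and bag2 = (y, w), subtract yields (x, z), while the symmetric difference yields (x, z, w).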


[jira] [Updated] (PIG-2881) Add SUBTRACT eval function

2012-08-27 Thread Joel Costigliola (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Costigliola updated PIG-2881:
--

Attachment: (was: SubtractTest.java)

> Add SUBTRACT eval function
> --
>
> Key: PIG-2881
> URL: https://issues.apache.org/jira/browse/PIG-2881
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Joel Costigliola
>Priority: Minor
> Attachments: Subtract.java
>
>
> Similar to the DIFF function, but SUBTRACT(bag1, bag2) subtracts the elements 
> of bag2 from bag1.
>   


[jira] [Commented] (PIG-2881) Add SUBTRACT eval function

2012-08-27 Thread Joel Costigliola (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442327#comment-13442327
 ] 

Joel Costigliola commented on PIG-2881:
---

I agree with you; I should have sent tests with basic JUnit assertions.
The truth is I was just too lazy to rewrite them, but I have more time now, so I 
can send you a version of the test without Fest assertions.
You tell me.

> Add SUBTRACT eval function
> --
>
> Key: PIG-2881
> URL: https://issues.apache.org/jira/browse/PIG-2881
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Joel Costigliola
>Priority: Minor
> Attachments: Subtract.java, SubtractTest.java
>
>
> Similar to the DIFF function, but SUBTRACT(bag1, bag2) subtracts the elements 
> of bag2 from bag1.
>   
