[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436528#comment-13436528
 ] 

Zhijie Shen commented on PIG-1314:
--

{quote}
I have one suggestion - add getWeeks and weeksBetween, if it isn't 
inconvenient. I think Jodatime can do this. It is useful when dealing in weeks.
{quote}

Yes, a week field should be useful. I think it's better to add getWeekYear as 
well, because using the week of year alone can be ambiguous. For example, both 
"2008-12-31" and "2009-01-01" fall in week 1 of weekyear 2009, even though the 
two dates are in different calendar years.
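
As an illustration (not part of the patch; the class name below is made up), 
here is a minimal Joda-Time sketch of the fields involved. getWeekOfWeekyear() 
and getWeekyear() are Joda-Time methods, not the proposed UDF names:

{code}
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class WeekyearExample {
    public static void main(String[] args) {
        DateTime d1 = new DateTime(2008, 12, 31, 0, 0, 0, 0, DateTimeZone.UTC);
        DateTime d2 = new DateTime(2009, 1, 1, 0, 0, 0, 0, DateTimeZone.UTC);
        // Both dates fall in week 1 of weekyear 2009, so returning only the
        // week number is ambiguous; the weekyear disambiguates.
        System.out.println(d1.getWeekOfWeekyear() + "/" + d1.getWeekyear()); // 1/2009
        System.out.println(d2.getWeekOfWeekyear() + "/" + d2.getWeekyear()); // 1/2009
        System.out.println(d1.getYear() + " vs " + d2.getYear());            // 2008 vs 2009
    }
}
{code}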

In addition, do you think it is better to rename some time UDFs as follows?

getMonth -> getMonthOfYear
getDay -> getDayOfMonth (do we need getDayOfWeek and getDayOfYear as well?)
getHour -> getHourOfDay
getMinute -> getMinuteOfHour
getSecond -> getSecondOfMinute
getMilliSecond -> getMilliOfSecond

The changes will make UDFs' names longer but clearer.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-16 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436477#comment-13436477
 ] 

Russell Jurney commented on PIG-1314:
-

I have one suggestion - add getWeeks and weeksBetween, if it isn't 
inconvenient. I think Jodatime can do this. It is useful when dealing in weeks.
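
For reference, here is a minimal sketch of the Joda-Time call such a 
weeksBetween UDF could delegate to (the class name is made up for 
illustration):

{code}
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.Weeks;

public class WeeksBetweenExample {
    public static void main(String[] args) {
        DateTime start = new DateTime(2012, 8, 1, 0, 0, 0, 0, DateTimeZone.UTC);
        DateTime end = new DateTime(2012, 8, 16, 0, 0, 0, 0, DateTimeZone.UTC);
        // Whole weeks between the two instants: 15 days -> 2 complete weeks.
        System.out.println(Weeks.weeksBetween(start, end).getWeeks()); // 2
    }
}
{code}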

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: Review for PIG-1314 - add datetime type in pig

2012-08-16 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5414/
---

(Updated Aug. 17, 2012, 12:09 a.m.)


Review request for pig.


Changes
---

PIG-1314-6.patch


Description
---

Review for PIG-1314


This addresses bug PIG-1314.
https://issues.apache.org/jira/browse/PIG-1314


Diffs (updated)
-

  http://svn.apache.org/repos/asf/pig/trunk/.eclipse.templates/.classpath 
1373741 
  http://svn.apache.org/repos/asf/pig/trunk/conf/pig.properties 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/SequenceFileLoader.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/pig/SchemaConverter.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/pig/comparator/DateTimeExpr.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/pig/comparator/ExprUtils.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/schema/ColumnType.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/zebra/src/java/org/apache/hadoop/zebra/schema/SchemaParser.jjt
 1373741 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/LoadCaster.java 
1373741 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigWarning.java 
1373741 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/StoreCaster.java 
1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/DateTimeWritable.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/HDataType.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigDateTimeRawComparator.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ComparisonOperator.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ConstantExpression.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ExpressionOperator.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LessThanExpr.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/NotEqualToExpr.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POBinCond.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POIsNull.java
 1373741 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POMapLookUp.java
 1373741 
  
h

Re: Sync delay between git and svn

2012-08-16 Thread Jonathan Coveney
I don't know, but generally <1h. I'd also add the official apache git mirror
as a repo, and pull from there. http://git.apache.org/

2012/8/16 Prasanth J 

> Hello everyone
>
> I am using the pig git repository for my development. I forked the
> apache/pig project from github to my account and am working on a cloned
> copy. Occasionally I update my trunk with the latest code from the
> apache/pig/trunk remote. I wanted to patch my code against the latest
> trunk, but the trunk revision in git is different from that of svn. I
> received a commit message about an hour ago, but that commit is not
> reflected in git yet. Does anyone know how long an svn commit takes to get
> reflected in git?
>
> Thanks
> -- Prasanth
>
>


Build failed in Jenkins: Pig-trunk #1299

2012-08-16 Thread Apache Jenkins Server
See 

Changes:

[thejas] PIG-2662: skew join does not honor its config parameters 
(rajesh.balamohan via thejas)

--
[...truncated 37421 lines...]
[junit] at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:550)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:87)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68)
[junit] at 
org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:129)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
[junit] at 
junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] 12/08/16 23:19:47 WARN datanode.FSDatasetAsyncDiskService: 
AsyncDiskService has already shut down.
[junit] Shutting down DataNode 2
[junit] 12/08/16 23:19:47 INFO mortbay.log: Stopped 
SelectChannelConnector@localhost:0
[junit] 12/08/16 23:19:47 INFO ipc.Server: Stopping server on 43860
[junit] 12/08/16 23:19:47 INFO ipc.Server: IPC Server handler 0 on 43860: 
exiting
[junit] 12/08/16 23:19:47 INFO ipc.Server: IPC Server handler 2 on 43860: 
exiting
[junit] 12/08/16 23:19:47 INFO ipc.Server: IPC Server handler 1 on 43860: 
exiting
[junit] 12/08/16 23:19:47 INFO ipc.Server: Stopping IPC Server listener on 
43860
[junit] 12/08/16 23:19:47 INFO ipc.Server: Stopping IPC Server Responder
[junit] 12/08/16 23:19:47 INFO metrics.RpcInstrumentation: shut down
[junit] 12/08/16 23:19:47 INFO datanode.DataNode: Waiting for threadgroup 
to exit, active threads is 1
[junit] 12/08/16 23:19:47 WARN datanode.DataNode: 
DatanodeRegistration(127.0.0.1:47567, 
storageID=DS-1141731053-67.195.138.20-47567-1345158728251, infoPort=56642, 
ipcPort=43860):DataXceiveServer:java.nio.channels.AsynchronousCloseException
[junit] at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
[junit] at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:159)
[junit] at 
sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
[junit] at java.lang.Thread.run(Thread.java:662)
[junit] 
[junit] 12/08/16 23:19:47 INFO datanode.DataNode: Exiting DataXceiveServer
[junit] 12/08/16 23:19:48 INFO hdfs.StateChange: BLOCK* ask 127.0.0.1:60158 
to delete  blk_7472377375372200414_1078 blk_575971279202259882_1073
[junit] 12/08/16 23:19:48 INFO hdfs.StateChange: BLOCK* ask 127.0.0.1:34278 
to delete  blk_7472377375372200414_1078 blk_575971279202259882_1073
[junit] 12/08/16 23:19:48 INFO datanode.DataBlockScanner: Exiting 
DataBlockScanner thread.
[junit] 12/08/16 23:19:48 INFO datanode.DataNode: Waiting for threadgroup 
to exit, active threads is 0
[junit] 12/08/16 23:19:48 INFO datanode.DataNode: 
DatanodeRegistration(127.0.0.1:47567, 
storageID=DS-1141731053-67.195.138.20-47567-1345158728251, infoPort=56642, 
ipcPort=43860):Finishing DataNode in: 
FSDataset{dirpath='
[junit] 12/08/16 23:19:48 INFO ipc.Server: Stopping server on 43860
[junit] 12/08/16 23:19:48 INFO metrics.RpcInstrumentation: shut down
[junit] 12/08/16 23:19:48 INFO datanode.DataNode: Waiting for threadgroup 
to exit, active threads is 0
[junit] 12/08/16 23:19:48 INFO datanode.FSDatasetAsyncDiskService: Shutting 
down all async disk service th

[jira] [Updated] (PIG-2875) Add recursive record support to AvroStorage

2012-08-16 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2875:
---

Attachment: PIG-2875.patch

> Add recursive record support to AvroStorage
> ---
>
> Key: PIG-2875
> URL: https://issues.apache.org/jira/browse/PIG-2875
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Attachments: avro_test_files.tar.gz, PIG-2869.patch, PIG-2875.patch
>
>
> Currently, AvroStorage does not allow recursive records in Avro schema 
> because it is not possible to define Pig schema for recursive records. (i.e. 
> records that have self-referencing fields cause an infinite loop, so they are 
> not supported.)
> Even though there is no natural way of handling recursive records in Pig 
> schema, I'd like to propose the following workaround: mapping recursive 
> records to bytearray.
> Take for example the following Avro schema:
> {code}
> {
>   "type" : "record",
>   "name" : "RECURSIVE_RECORD",
>   "fields" : [ {
> "name" : "value",
> "type" : [ "null", "int" ]
>   }, {
> "name" : "next",
> "type" : [ "null", "RECURSIVE_RECORD" ]
>   } ]
> }
> {code}
> and the following data:
> {code}
> {"value":1,"next":{"RECURSIVE_RECORD":{"value":2,"next":{"RECURSIVE_RECORD":{"value":3,"next":null}
>  
> {"value":2,"next":{"RECURSIVE_RECORD":{"value":3,"next":null}}} 
> {"value":3,"next":null}
> {code}
> Then, we can define Pig schema as follows:
> {code}
> {value: int,next: bytearray}
> {code}
> Even though Pig thinks that the "next" fields are bytearray, they're actually 
> loaded as tuples since AvroStorage uses Avro schema when loading files.
> {code}
> grunt> in = LOAD 'test_recursive_schema.avro' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage ();
> grunt> dump in;
> (1,(2,(3,)))
> (2,(3,))
> (3,)
> {code}
> At this point, we have discrepancy between Avro schema and Pig schema; 
> nevertheless, we can still refer to each field of tuples as follows:
> {code}
> grunt> first = FOREACH in GENERATE $0;
> grunt> dump first;
> (1)
> (2)
> (3)
> or
> grunt> second = FOREACH in GENERATE $1.$0;
> grunt> dump second;
> (2)
> (3)
> ()
> {code}
> Lastly, we can store these tuples as Avro files by specifying schema. Since 
> we can no longer construct Avro schema from Pig schema, it is required for 
> the user to provide Avro schema via the 'schema' parameter in STORE function.
> {code}
> grunt> STORE first INTO 'output' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage ( 'schema', '[ "null", 
> "int" ]' );
> or
> grunt> STORE in INTO 'output' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage ( 'schema', '
> {
>   "type" : "record",
>   "name" : "recursive_schema",
>   "fields" : [ { 
> "name" : "value",
> "type" : [ "null", "int" ]
>   }, {
> "name" : "next",
> "type" : [ "null", "recursive_schema" ]
>   } ] 
> }
> ' );
> {code}
> To implement this workaround, the following work is required:
> - Update the current generic union check so that it can handle recursive 
> records. Currently, AvroStorage checks if the Avro schema contains 1) 
> recursive records and 2) generic unions, and fails if so. But since I am 
> going to remove the 1st check, the 2nd check should be able to handle 
> recursive records without stack overflow.
> - Update AvroSchema2Pig so that recursive records can be detected and mapped 
> to bytearrays in Pig schema.
> - Add the 'no_schema_check' parameter to STORE function so that results can 
> be stored even though there exists discrepancy between Avro schema and Pig 
> schema. Since Avro schema for STORE function cannot be constructed from Pig 
> schema, it has to be specified by the user via the 'schema' parameter, and 
> schema check has to be disabled by 'no_schema_check'.
> - Update AvroStorage wiki.
> - Add unit tests.
> I do not think that any incompatibility issues will be introduced by this.
> P.S. The reason I chose to map recursive records to bytearray instead of an 
> empty tuple is that I cannot refer to any field if I use an empty tuple. For 
> example, if the Pig schema is defined as follows:
> {code}
> {value: int,next: ()}
> {code}
> I get an exception when I attempt to refer to any field in loaded tuples 
> since their schema is not defined (i.e. empty tuple).
> {code}
> ERROR 1127: Index 0 out of range in schema
> {code}
> This is all I found by trial and error, so there might be something that I 
> am missing here. If so, please let me know.
> Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, se

Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

2012-08-16 Thread Cheolsoo Park


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java,
> >  line 386
> > 
> >
> > The no_schema_check can now appear in any position, i.e., it's no longer 
> > required to be the first argument?

In fact, 'no_schema_check' wasn't used with other options at all. But since I 
was making it possible to use it with other options, I didn't want to restrict 
its position. So 'no_schema_check' can be at any position except between a 
parameter and its value:

no_schema_check, key, value => OK
key, value, no_schema_check => OK
key, no_schema_check, value => not OK


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java,
> >  line 389
> > 
> >
> > I am a little concerned with this change. For loop counters, to be 
> > conservative, I would avoid making changes in the body of the loop. It's 
> > hard to detect these changes and harder to maintain.
> > 
> > Is there a better way of implementing this?

I agree that we should be careful about counters. But the problem is that the 
original loop body was written under the assumption that every arg is a 
parameter/value pair, which is no longer true since I am allowing a 
parameter-only arg, 'no_schema_check'. So I think I have to make some changes 
to the body of the loop.

Nevertheless, I made increasing counters more explicit in the new patch, so I 
believe that they are more visible now. Please let me know if you have a better 
suggestion.
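
For illustration only, here is a hypothetical parsing loop in the spirit of 
the change (not the actual AvroStorage code; all names are made up) that 
accepts a flag-style argument anywhere except between a key and its value:

{code}
import java.util.HashMap;
import java.util.Map;

public class ParamParsingSketch {
    static Map<String, String> parse(String... args) {
        Map<String, String> params = new HashMap<String, String>();
        int i = 0;
        while (i < args.length) {
            if ("no_schema_check".equalsIgnoreCase(args[i])) {
                params.put("no_schema_check", "true");
                i += 1; // flag-style arg consumes one slot
            } else {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("missing value for " + args[i]);
                }
                params.put(args[i], args[i + 1]);
                i += 2; // key/value pair consumes two slots
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Flag first or last is fine; placing it between a key and its value
        // would be consumed as the value, which is the disallowed case above.
        System.out.println(parse("no_schema_check", "schema", "[\"null\",\"int\"]"));
        System.out.println(parse("schema", "[\"null\",\"int\"]", "no_schema_check"));
    }
}
{code}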


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java,
> >  line 288
> > 
> >
> > Can this be changed to return containsGenericUnion(fs, visitedRecords) 
> > to be consistent with the other parts of the code?

No, it cannot. It is inside a loop, so we should not return until the loop is 
over.


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java,
> >  line 318
> > 
> >
> > Can this be changed to return containsGenericUnion(fs, visitedRecords) 
> > to be consistent with the other parts of the code?

No, it cannot. It is inside a loop, so we should not return until the loop is 
over.


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java,
> >  line 866
> > 
> >
> > Not sure if this holds true for all cases. This is an either or - i.e., 
> > if a sequence of jobs has one failure then the assertion will kick in for 
> > the first one which is actually a false alarm.

Indeed. I modified the code, so now the number of failed jobs (numOfFailedJobs) 
is counted. If expectedToFail is true, numOfFailedJobs must be greater than 0; 
otherwise, it must be equal to 0.
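
A self-contained sketch of that check (hypothetical names, not the actual 
TestAvroStorage code):

{code}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class FailedJobCountSketch {
    // Count failures across the whole job sequence instead of asserting on
    // the first job alone.
    static void check(boolean[] jobSucceeded, boolean expectedToFail) {
        int numOfFailedJobs = 0;
        for (boolean ok : jobSucceeded) {
            if (!ok) {
                numOfFailedJobs++;
            }
        }
        if (expectedToFail) {
            assertTrue("expected at least one failed job", numOfFailedJobs > 0);
        } else {
            assertEquals("expected no failed jobs", 0, numOfFailedJobs);
        }
    }

    public static void main(String[] args) {
        check(new boolean[] { true, false }, true);  // one failure, expected
        check(new boolean[] { true, true }, false);  // no failures, expected
    }
}
{code}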


- Cheolsoo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/#review10347
---


On Aug. 16, 2012, 10:18 p.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6536/
> ---
> 
> (Updated Aug. 16, 2012, 10:18 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> ---
> 
> Allow recursive records to be loaded/stored by AvroStorage.
> 
> The changes include:
> 
> 1) Remove the recursive record check from AvroSchema2Pig.
> 2) Modify inconvert() in AvroSchema2Pig so that it can map recursive records 
> to bytearrays.
> 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle 
> Avro schema that contains recursive records.
> 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can 
> be passed to both LoadFunc and StoreFunc.
> 5) Add the recursive record check to AvroSchemaManager. This is needed 
> because 'schema_file' and 'data' cannot refer to avro schema that contains 
> recursive records.
> 
> AvroStorage works as follows:
> 
> 1) PigSchema maps recursive records to bytearrays, so there is discrepancy 
> between Avro schema and Pig schema.
> 2) Recursive records are loaded as tuples even though Pig schema defines them 
> as bytearrays and can be referred to by position (e.g. 

Re: Review Request: PIG-2831: MR-Cube implementation (Distributed cubing for holistic measures)

2012-08-16 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6651/
---

(Updated Aug. 16, 2012, 10:19 p.m.)


Review request for pig and Dmitriy Ryaboy.


Changes
---

Updated the number of reducers in the sample job to 1. Since the sample dataset 
is small, it can easily be handled by a single reducer.


Description
---

This is a review board request for 
https://issues.apache.org/jira/browse/PIG-2831


This addresses bug PIG-2831.
https://issues.apache.org/jira/browse/PIG-2831


Diffs (updated)
-

  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
 8029dec 
  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
 1d05a20 
  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java
 b87c209 
  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRPrinter.java
 157caad 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java
 cde340c 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
 ff65146 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCube.java
 PRE-CREATION 
  src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java 
0502917 
  src/org/apache/pig/builtin/CubeDimensions.java 5652029 
  src/org/apache/pig/builtin/PigStorage.java 21e835f 
  src/org/apache/pig/builtin/RollupDimensions.java f6c26e4 
  src/org/apache/pig/impl/builtin/HolisticCube.java PRE-CREATION 
  src/org/apache/pig/impl/builtin/HolisticCubeCompoundKey.java PRE-CREATION 
  src/org/apache/pig/impl/builtin/PartitionMaxGroup.java PRE-CREATION 
  src/org/apache/pig/impl/builtin/PostProcessCube.java PRE-CREATION 
  src/org/apache/pig/impl/io/ReadSingleLoader.java PRE-CREATION 
  src/org/apache/pig/impl/util/Utils.java 270cb6a 
  src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java 13439c6 
  src/org/apache/pig/newplan/logical/relational/LOCube.java b262efb 
  src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 
127ab7a 
  src/org/apache/pig/newplan/logical/rules/ColumnPruneHelper.java 369f5c2 
  src/org/apache/pig/parser/LogicalPlanBuilder.java 289a76f 
  src/org/apache/pig/pen/EquivalenceClasses.java 194f8cb 
  src/org/apache/pig/pen/LineageTrimmingVisitor.java 917073c 
  src/org/apache/pig/pen/util/DisplayExamples.java 265f8f7 
  test/org/apache/pig/impl/builtin/TestHolisticCubeCompundKey.java PRE-CREATION 
  test/org/apache/pig/impl/builtin/TestPartitionMaxGroup.java PRE-CREATION 
  test/org/apache/pig/impl/builtin/TestPostProcessCube.java PRE-CREATION 
  test/org/apache/pig/test/TestCubeOperator.java 65d56a6 

Diff: https://reviews.apache.org/r/6651/diff/


Testing
---

Unit tests: All passed

Pre-commit tests: All passed
ant clean test-commit


Thanks,

Prasanth_J



[jira] [Commented] (PIG-2831) MR-Cube implementation (Distributed cubing for holistic measures)

2012-08-16 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436361#comment-13436361
 ] 

Prasanth J commented on PIG-2831:
-

Updated the patch with the number of reducers for the sample job set to 1. 
Since the sample dataset is small, it can easily be handled by a single reducer.

> MR-Cube implementation (Distributed cubing for holistic measures)
> -
>
> Key: PIG-2831
> URL: https://issues.apache.org/jira/browse/PIG-2831
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: PIG-2831.1.git.patch, PIG-2831.2.git.patch, 
> PIG-2831.3.git.patch
>
>
> Implementing distributed cube materialization on holistic measure based on 
> MR-Cube approach as described in http://arnab.org/files/mrcube.pdf. 
> Primary steps involved:
> 1) Identify if the measure is holistic or not
> 2) Determine algebraic attribute (can be detected automatically for few 
> cases, if automatic detection fails user should hint the algebraic attribute)
> 3) Modify MRPlan to insert a sampling job which executes naive cube algorithm 
> and generates annotated cube lattice (contains large group partitioning 
> information)
> 4) Modify plan to distribute annotated cube lattice to all mappers using 
> distributed cache
> 5) Execute actual cube materialization on full dataset
> 6) Modify MRPlan to insert a post process job for combining the results of 
> actual cube materialization job
> 7) OOM exception handling

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2831) MR-Cube implementation (Distributed cubing for holistic measures)

2012-08-16 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated PIG-2831:


Attachment: PIG-2831.3.git.patch

> MR-Cube implementation (Distributed cubing for holistic measures)
> -
>
> Key: PIG-2831
> URL: https://issues.apache.org/jira/browse/PIG-2831
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: PIG-2831.1.git.patch, PIG-2831.2.git.patch, 
> PIG-2831.3.git.patch
>
>
> Implementing distributed cube materialization on holistic measure based on 
> MR-Cube approach as described in http://arnab.org/files/mrcube.pdf. 
> Primary steps involved:
> 1) Identify if the measure is holistic or not
> 2) Determine algebraic attribute (can be detected automatically for few 
> cases, if automatic detection fails user should hint the algebraic attribute)
> 3) Modify MRPlan to insert a sampling job which executes naive cube algorithm 
> and generates annotated cube lattice (contains large group partitioning 
> information)
> 4) Modify plan to distribute annotated cube lattice to all mappers using 
> distributed cache
> 5) Execute actual cube materialization on full dataset
> 6) Modify MRPlan to insert a post process job for combining the results of 
> actual cube materialization job
> 7) OOM exception handling

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

2012-08-16 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
---

(Updated Aug. 16, 2012, 10:18 p.m.)


Review request for pig.


Changes
---

Incorporate Santhosh's comments.


Description
---

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modify inconvert() in AvroSchema2Pig so that it can map recursive records to 
bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro 
schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be 
passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 
'schema_file' and 'data' cannot refer to avro schema that contains recursive 
records.

AvroStorage works as follows:

1) PigSchema maps recursive records to bytearrays, so there is discrepancy 
between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though Pig schema defines them 
as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
3) To store recursive records, Avro schema must be provided via the 'schema' or 
'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled 
because otherwise schema check will fail due to discrepancy between Avro schema 
and Pig schema.
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. 
This is because AvroSchemaManager cannot handle recursive records for now. The 
recursive record check is added to AvroSchemaManager, so if Avro schema that 
contains recursive records is specified by these parameters, an exception is 
thrown.


This addresses bug PIG-2875.
https://issues.apache.org/jira/browse/PIG-2875


Diffs (updated)
-

  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java
 6b1d2a1 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java
 1939d3e 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
 c9f7d81 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
 e24b495 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
 2fab3f7 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
 040234f 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
---

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, 
and another record.
2) Load Avro files that contain recursive records, generate new relations, 
apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, 
schema_file, and data.


Thanks,

Cheolsoo Park



Sync delay between git and svn

2012-08-16 Thread Prasanth J
Hello everyone

I am using the pig git repository for my development. I forked the apache/pig 
project from github to my account and am working on a cloned copy. Occasionally 
I update my trunk with the latest code from the apache/pig/trunk remote. I 
wanted to patch my code against the latest trunk, but the trunk revision in git 
is different from that of svn. I received a commit message about an hour ago, 
but that commit is not reflected in git yet. Does anyone know how long an svn 
commit takes to get reflected in git?

Thanks
-- Prasanth



[jira] [Updated] (PIG-2662) skew join does not honor its config parameters

2012-08-16 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2662:
---

Fix Version/s: 0.11
 Assignee: Rajesh Balamohan
   Status: Patch Available  (was: Open)

> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.9.2
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2662) skew join does not honor its config parameters

2012-08-16 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2662:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1  Patch committed to trunk.
Thanks Rajesh!


> skew join does not honor its config parameters
> --
>
> Key: PIG-2662
> URL: https://issues.apache.org/jira/browse/PIG-2662
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thejas M Nair
>Assignee: Rajesh Balamohan
> Fix For: 0.11
>
> Attachments: PIG-2662-0.9.2.patch, PIG-2662.2.patch, PIG-2662.3.patch
>
>
> Skew join can be configured using pig.sksampler.samplerate and 
> pig.skewedjoin.reduce.memusage. But the section of code that retrieves the 
> config values from properties (PoissonSampleLoader.computeSamples) is not 
> getting called. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: RANK function like in SQL

2012-08-16 Thread aavendan


> On Aug. 16, 2012, 10:40 a.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java,
> >  line 144
> > 
> >
> > Same here, a more semantically oriented comment would be better.
> > Something like:
> > Indicates that there is a rank operation in the MR job.

Actually, it is never used. I deleted it.


> On Aug. 16, 2012, 10:40 a.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java,
> >  line 25
> > 
> >
> > I would add here the fact that this PO relies on being run in a 
> > specific MR class because it accesses the counters.

Actually, it is POCounter that relies on a specific MR class.


> On Aug. 16, 2012, 10:40 a.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java,
> >  line 130
> > 
> >
> > What do you mean by 'Legal'?

Changed.


> On Aug. 16, 2012, 10:40 a.m., Gianmarco De Francisci Morales wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java,
> >  line 175
> > 
> >
> > Now that I think of it, we should have some validity checks here.
> > The operator can't be at the same time a dense rank and a row number.
> > I would put the other flag off in the set method, i.e.:
> > in setIsDenseRank() you also set this.isRowNumber = false; and log a 
> > warning saying that something is strange because this should not happen.

I will validate it in LogicalPlanGenerator, where these values are set.
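
For illustration, a hypothetical pair of setters in the spirit of the 
suggestion (not the actual LORank code):

{code}
import java.util.logging.Logger;

public class RankModeSketch {
    private static final Logger LOG = Logger.getLogger(RankModeSketch.class.getName());

    private boolean isDenseRank = false;
    private boolean isRowNumber = false;

    // The two modes are mutually exclusive, so enabling one clears the other
    // and logs a warning, since this situation should not normally happen.
    public void setIsDenseRank(boolean denseRank) {
        if (denseRank && isRowNumber) {
            LOG.warning("dense rank requested while row number was set; clearing row number");
            isRowNumber = false;
        }
        isDenseRank = denseRank;
    }

    public void setIsRowNumber(boolean rowNumber) {
        if (rowNumber && isDenseRank) {
            LOG.warning("row number requested while dense rank was set; clearing dense rank");
            isDenseRank = false;
        }
        isRowNumber = rowNumber;
    }
}
{code}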


- aavendan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5523/#review10398
---


On Aug. 14, 2012, 8:19 a.m., aavendan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5523/
> ---
> 
> (Updated Aug. 14, 2012, 8:19 a.m.)
> 
> 
> Review request for pig, aavendan and Gianmarco De Francisci Morales.
> 
> 
> Description
> ---
> 
> Review board for https://issues.apache.org/jira/browse/PIG-2353
> 
> 
> This addresses bug PIG-2353.
> https://issues.apache.org/jira/browse/PIG-2353
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapReduceCounter.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllSameRalationalNodesVisitor.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java
>  1372471 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical

Re: Review Request: RANK function like in SQL

2012-08-16 Thread Gianmarco De Francisci Morales

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5523/#review10398
---



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java


small typo here
'teh'



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java


Fatal level logging of the counter size looks a bit too much.
Maybe debug?



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java


Missing a space after 'counterSize '
Also, I think we need to rethrow the exception as a Pig exception here.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java


Typo: On case -> In case



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java


Typo: On case -> In case



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java


A more semantically oriented comment would be better.
Something like:
Indicates that there is a counter operation in the MR job.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java


Same here, a more semantically oriented comment would be better.
Something like:
Indicates that there is a rank operation in the MR job.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java


Here I would comment:
Indicates that there is a rank operation without sorting (row number) in 
the MR job.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapReduceCounter.java


On this case -> In this case



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java


Can we be a bit more explicit here on what the class does?
Like:
This operator is part of the RANK implementation.
It adds a local counter and a unique task id to each tuple.
There are 2 modes of operations: regular and dense.
The local counter depends on the mode of operation.
With regular rank
With dense rank




http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java


Missing Apache license.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java


I would add here the fact that this PO relies on being run in a specific MR 
class because it accesses the counters.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java


Very good



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java


What do you mean by 'Legal'?



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LORank.java


Now that I think of it, we should have some validity checks here.
The operator can't be at the same time a dense rank and a row number.
I would put the other flag off in the set method, i.e.:
in setIsDenseRank() you also set this.isRowNumber = false; and log a 
warning saying that something is strange because this should not happen.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java


On case -> In case



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java


On case -> In case


- Gianmarco De Francisci Morales


On Aug. 14, 2012, 8:19 a.m., 

[jira] [Updated] (PIG-1314) Add DateTime Support to Pig

2012-08-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated PIG-1314:
-

Attachment: PIG-1314-6.patch

Hi Thejas,

I attached my latest patch. In this version, I fixed the default timezone 
issue. Pig can obtain the timezone string from PigContext, which can be loaded 
from either the default property files or user-supplied sources. Instead of 
calling PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz") 
every time no user-supplied timezone is given for DateTime construction, I 
configure Joda's default timezone during the setup() stage of PigGenericMapBase 
and PigGenericMapReduce. Therefore, when no timezone is specified for DateTime 
construction, the created DateTime object automatically uses the default 
timezone. I think this way users do not need to deal with low-level details 
(calling PigMapReduce.sJobConfInternal) when writing DateTime-related UDFs, and 
it avoids the ambiguity that 
PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz") and 
DateTimeZone.getDefault().getID() may sometimes differ.
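
A minimal sketch of the idea (hypothetical helper, not the exact patch code), 
assuming the timezone string is available from the task's Configuration under 
pig.datetime.default.tz:

{code}
import org.apache.hadoop.conf.Configuration;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class DefaultTimezoneSetup {
    // Read the configured timezone once during task setup and make it
    // Joda-Time's process-wide default.
    static void setupDefaultTimezone(Configuration conf) {
        String tz = conf.get("pig.datetime.default.tz");
        if (tz != null) {
            DateTimeZone.setDefault(DateTimeZone.forID(tz));
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("pig.datetime.default.tz", "America/Los_Angeles");
        setupDefaultTimezone(conf);
        // A DateTime created without an explicit zone now uses the default.
        System.out.println(new DateTime(2012, 8, 16, 0, 0, 0, 0).getZone());
    }
}
{code}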

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2831) MR-Cube implementation (Distributed cubing for holistic measures)

2012-08-16 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated PIG-2831:


Attachment: PIG-2831.2.git.patch

> MR-Cube implementation (Distributed cubing for holistic measures)
> -
>
> Key: PIG-2831
> URL: https://issues.apache.org/jira/browse/PIG-2831
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: PIG-2831.1.git.patch, PIG-2831.2.git.patch
>
>
> Implementing distributed cube materialization on holistic measure based on 
> MR-Cube approach as described in http://arnab.org/files/mrcube.pdf. 
> Primary steps involved:
> 1) Identify if the measure is holistic or not
> 2) Determine algebraic attribute (can be detected automatically for few 
> cases, if automatic detection fails user should hint the algebraic attribute)
> 3) Modify MRPlan to insert a sampling job which executes naive cube algorithm 
> and generates annotated cube lattice (contains large group partitioning 
> information)
> 4) Modify plan to distribute annotated cube lattice to all mappers using 
> distributed cache
> 5) Execute actual cube materialization on full dataset
> 6) Modify MRPlan to insert a post process job for combining the results of 
> actual cube materialization job
> 7) OOM exception handling

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: PIG-2831: MR-Cube implementation (Distributed cubing for holistic measures)

2012-08-16 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6651/
---

(Updated Aug. 16, 2012, 8:43 a.m.)


Review request for pig and Dmitriy Ryaboy.


Changes
---

This new patch contains a small critical fix and a unit test case.


Description
---

This is a review board request for 
https://issues.apache.org/jira/browse/PIG-2831


This addresses bug PIG-2831.
https://issues.apache.org/jira/browse/PIG-2831


Diffs (updated)
-

  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
 8029dec 
  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
 1d05a20 
  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java
 b87c209 
  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRPrinter.java
 157caad 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java
 cde340c 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
 ff65146 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCube.java
 PRE-CREATION 
  src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java 
0502917 
  src/org/apache/pig/builtin/CubeDimensions.java 5652029 
  src/org/apache/pig/builtin/PigStorage.java 21e835f 
  src/org/apache/pig/builtin/RollupDimensions.java f6c26e4 
  src/org/apache/pig/impl/builtin/HolisticCube.java PRE-CREATION 
  src/org/apache/pig/impl/builtin/HolisticCubeCompoundKey.java PRE-CREATION 
  src/org/apache/pig/impl/builtin/PartitionMaxGroup.java PRE-CREATION 
  src/org/apache/pig/impl/builtin/PostProcessCube.java PRE-CREATION 
  src/org/apache/pig/impl/io/ReadSingleLoader.java PRE-CREATION 
  src/org/apache/pig/impl/util/Utils.java 270cb6a 
  src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java 13439c6 
  src/org/apache/pig/newplan/logical/relational/LOCube.java b262efb 
  src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 
127ab7a 
  src/org/apache/pig/newplan/logical/rules/ColumnPruneHelper.java 369f5c2 
  src/org/apache/pig/parser/LogicalPlanBuilder.java 289a76f 
  src/org/apache/pig/pen/EquivalenceClasses.java 194f8cb 
  src/org/apache/pig/pen/LineageTrimmingVisitor.java 917073c 
  src/org/apache/pig/pen/util/DisplayExamples.java 265f8f7 
  test/org/apache/pig/impl/builtin/TestHolisticCubeCompundKey.java PRE-CREATION 
  test/org/apache/pig/impl/builtin/TestPartitionMaxGroup.java PRE-CREATION 
  test/org/apache/pig/impl/builtin/TestPostProcessCube.java PRE-CREATION 
  test/org/apache/pig/test/TestCubeOperator.java 65d56a6 

Diff: https://reviews.apache.org/r/6651/diff/


Testing
---

Unit tests: All passed

Pre-commit tests: All passed
ant clean test-commit


Thanks,

Prasanth_J



[jira] [Commented] (PIG-2831) MR-Cube implementation (Distributed cubing for holistic measures)

2012-08-16 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435846#comment-13435846
 ] 

Prasanth J commented on PIG-2831:
-

Attaching a new patch. Added a small critical fix and a unit test case. Will 
update the same in RB as well. 

> MR-Cube implementation (Distributed cubing for holistic measures)
> -
>
> Key: PIG-2831
> URL: https://issues.apache.org/jira/browse/PIG-2831
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: PIG-2831.1.git.patch
>
>
> Implementing distributed cube materialization on holistic measure based on 
> MR-Cube approach as described in http://arnab.org/files/mrcube.pdf. 
> Primary steps involved:
> 1) Identify if the measure is holistic or not
> 2) Determine algebraic attribute (can be detected automatically for few 
> cases, if automatic detection fails user should hint the algebraic attribute)
> 3) Modify MRPlan to insert a sampling job which executes naive cube algorithm 
> and generates annotated cube lattice (contains large group partitioning 
> information)
> 4) Modify plan to distribute annotated cube lattice to all mappers using 
> distributed cache
> 5) Execute actual cube materialization on full dataset
> 6) Modify MRPlan to insert a post process job for combining the results of 
> actual cube materialization job
> 7) OOM exception handling

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

2012-08-16 Thread Santhosh Srinivasan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/#review10347
---


Have a few comments.


contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java


Can you elaborate on this error message? What is the recommended course of 
action when the user sees this error message?

Minor comment: Please use braces to guard against accidental mistakes in the 
future, i.e., 
if () { ...
}



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java


The no_schema_check can now appear in any position, i.e., it's no longer 
required to be the first argument?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java


I am a little concerned with this change. For loop counters, to be 
conservative, I would avoid making changes in the body of the loop. It's hard 
to detect these changes and harder to maintain.

Is there a better way of implementing this?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java


Can you use parenthesis to make this more readable?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java


Can this be changed to return containsGenericUnion(fs, visitedRecords) to 
be consistent with the other parts of the code?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java


Can this be changed to return containsGenericUnion(fs, visitedRecords) to 
be consistent with the other parts of the code?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java


There should be three rows with the third row being NULL.



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java


Can you include a message here. Something like "Negative test to test an 
exception. Should not be succeeding!" 



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java


Can you include a message here. Something like "Negative test to test an 
exception. Should not be succeeding!" 



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java


Can you include a message here. Something like "Negative test to test an 
exception. Should not be succeeding!" 



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java


Not sure if this holds true for all cases. This is an either or - i.e., if 
a sequence of jobs has one failure then the assertion will kick in for the 
first one which is actually a false alarm.


- Santhosh Srinivasan


On Aug. 14, 2012, 10:23 a.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6536/
> ---
> 
> (Updated Aug. 14, 2012, 10:23 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> ---
> 
> Allow recursive records to be loaded/stored by AvroStorage.
> 
> The changes include:
> 
> 1) Remove the recursive record check from AvroSchema2Pig.
> 2) Modify inconvert() in AvroSchema2Pig so that it can map recursive records 
> to bytearrays.
> 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle 
> Avro schema that contains recursive records.
> 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can 
> be passed to both LoadFunc and StoreFunc.
> 5) Add the recursive record check to AvroSchemaManager. This is needed 
> because 'schema_file' and 'data' cannot refer to avro schema that contains 
> recursive records.
> 
> AvroStorage works as follows:
> 
> 1) PigSchema maps recursive records to bytearrays, so there is discrepancy 
> between Avro schema and Pig schema.
> 2) Recursive records are loaded as tuples even though Pig schema defines them 
> as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
> 3) To store recursive records, Avro schema must be provided via the 'schema' 
> or 'same' parameter in StoreFunc. In addition, 'no_schema_check' mus