[jira] [Updated] (PIG-3317) disable optimizations via pig properties

2013-05-16 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3317:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed, thanks Travis!

> disable optimizations via pig properties
> 
>
> Key: PIG-3317
> URL: https://issues.apache.org/jira/browse/PIG-3317
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.12
> Reporter: Travis Crawford
> Assignee: Travis Crawford
> Attachments: PIG-3317_disable_opts.1.patch, 
> PIG-3317_disable_opts.2.patch, PIG-3317_disable_opts.3.patch, 
> PIG-3317_disable_opts.4.patch
>
>
> Pig provides a number of optimizations which are described at 
> [http://pig.apache.org/docs/r0.11.1/perf.html#optimization-rules]. As is 
> described in the docs, all or specific optimizations can be disabled via the 
> command-line.
> Currently the caller of a pig script must know which optimizations to disable 
> when running because that information cannot be set in the script itself. Nor 
> can optimizations be disabled site-wide through pig.properties.
> Pig should allow disabling optimizations via properties so that pig scripts 
> themselves can disable optimizations as needed, rather than the caller 
> needing to know what optimizations to disable on the command-line.
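
A rough sketch of what the request amounts to, assuming the disabled rules 
arrive as a comma-separated property value. The property key 
"example.optimizer.rules.disabled" and the helper disabled_rules() are 
hypothetical and are not taken from the attached patches; the rule names are 
ones listed in the optimization docs linked above.

```python
def disabled_rules(cli_rules, properties, key="example.optimizer.rules.disabled"):
    """Return the union of rules disabled on the command line and via a property.

    `key` is a made-up property name used only for this illustration.
    """
    raw = properties.get(key, "")
    # Split the comma-separated value and drop empty entries and whitespace.
    from_properties = [name.strip() for name in raw.split(",") if name.strip()]
    return set(cli_rules) | set(from_properties)


# Site-wide or script-level setting (hypothetical key).
site_properties = {"example.optimizer.rules.disabled": "MergeFilter, ColumnMapKeyPrune"}

# The caller additionally disables one rule on the command line.
rules = disabled_rules(["SplitFilter"], site_properties)
print(sorted(rules))
```

With a merge like this, a script or a site-wide properties file can carry 
its own list, and the caller only adds to it when needed.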

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2378) macros don't accept references to items within tuples as arguments

2013-05-16 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated PIG-2378:
--

Attachment: PIG-2378.patch.txt

The latest patch fixes a minor issue (comments in the code).

> macros don't accept references to items within tuples as arguments
> --
>
> Key: PIG-2378
> URL: https://issues.apache.org/jira/browse/PIG-2378
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.9.1
> Reporter: Joseph Adler
> Assignee: Johnny Zhang
> Attachments: PIG-2378.patch.txt, PIG-2378.patch.txt
>
>
> I'd like to be able to pass a reference to an item within a parameter to a 
> Pig macro.
> For example, suppose that I had a relation A with the schema A:{id:long, 
> header:(time:long, type:chararray)}. I'd like to call a macro by typing:
>   B = MY_MACRO(A, header.time);
> but this does not currently work. I could define a new relation as a 
> workaround, for example with Pig code like:
>   AA = FOREACH A GENERATE *, header.time AS time;
>   B = MY_MACRO(AA, time);
> But that's ugly and clunky.



[jira] [Updated] (PIG-2378) macros don't accept references to items within tuples as arguments

2013-05-16 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated PIG-2378:
--

Status: Patch Available  (was: Open)




[jira] [Created] (PIG-3329) RANK operator failed when working with SPLIT

2013-05-16 Thread Redis Liu (JIRA)
Redis Liu created PIG-3329:
--

 Summary: RANK operator failed when working with SPLIT 
 Key: PIG-3329
 URL: https://issues.apache.org/jira/browse/PIG-3329
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Redis Liu
Priority: Critical


input.txt:
1 2 3
4 5 6
7 8 9

script:
a = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
SPLIT a into b if a > 0, c if a > 5;
d = RANK b;
dump d;

job will fail with error message:
java.lang.RuntimeException: Unable to read counter pig.counters.counter_4929375455335572575_-1
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.addRank(PORank.java:161)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.getNext(PORank.java:134)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:157)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:275)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1340)
    at org.apache.hadoop.mapred.Child.main(Child.java:269)



[jira] [Reopened] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-05-16 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat reopened PIG-3322:
-


Hi Egil,
 The issue here is that the field "t" in the original data set 
"studentcomplextab10k" contains nulls:
(fred hernandez,73,1.87)
(fred hernandez,20,2.11)

(calvin allen,60,2.49)
(yuri zipper,76,2.05)


So when this is stored via AvroStorage, nulls are stored for those records.

Reading back the Avro file written by the previous store then fails with a 
null pointer exception.

The following snippet, which filters out the null tuples before storing, 
works without any problems.
{code}
a = load 'studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, 
age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
b = foreach a generate t;
c = filter b by t is not null;
store c into 'singltupleavronotnull' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();
exec;
b = load 'singltupleavronotnull' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();
describe b;
dump b;
{code}

Kindly note: This issue is different from PIG-2330 


> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
>
> I am getting an NPE when using AvroStorage to load a file that has a schema 
> like:
> {code}
> ["null",{"type":"record","name":"TUPLE_0","fields":[{"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},{"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},{"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
> {code}
> E.g. see the e2e style test, which fails on this:
> {code}
> {
> 'num' => 4,
> # storing file with Pig type tuple relying on conversion to record
> # loading using stored schemas
> 'notmq' => 1,
> 'pig' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as 
> (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, 
> age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:.intermediate' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> exec;
> -- Read back what was stored with Avro
> u = load ':OUTPATH:.intermediate' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe u;
> store u into ':OUTPATH:';
> \,
> 'verify_pig_script' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as 
> (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, 
> age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:';
> \,
> },
> {code}



[jira] [Resolved] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-05-16 Thread Egil Sorensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Egil Sorensen resolved PIG-3322.


   Resolution: Duplicate
Fix Version/s: (was: 0.11.2)
   (was: 0.12)

The test was only storing one field, and as such seems to duplicate PIG-2330.




[jira] Subscription: PIG patch available

2013-05-16 Thread jira
Issue Subscription
Filter: PIG patch available (18 issues)

Subscriber: pigdaily

Key       Summary
PIG-3328  DataBags created with an initial list of tuples don't get registered as spillable
          https://issues.apache.org/jira/browse/PIG-3328
PIG-3318  AVRO: 'default value' not honored when merging schemas on load with AvroStorage
          https://issues.apache.org/jira/browse/PIG-3318
PIG-3317  disable optimizations via pig properties
          https://issues.apache.org/jira/browse/PIG-3317
PIG-3295  Casting from bytearray failing after Union (even when each field is from a single Loader)
          https://issues.apache.org/jira/browse/PIG-3295
PIG-3285  Jobs using HBaseStorage fail to ship dependency jars
          https://issues.apache.org/jira/browse/PIG-3285
PIG-3258  Patch to allow MultiStorage to use more than one index to generate output tree
          https://issues.apache.org/jira/browse/PIG-3258
PIG-3257  Add unique identifier UDF
          https://issues.apache.org/jira/browse/PIG-3257
PIG-3247  Piggybank functions to mimic OVER clause in SQL
          https://issues.apache.org/jira/browse/PIG-3247
PIG-3210  Pig fails to start when it cannot write log to log files
          https://issues.apache.org/jira/browse/PIG-3210
PIG-3199  Expose LogicalPlan via PigServer API
          https://issues.apache.org/jira/browse/PIG-3199
PIG-3166  Update eclipse .classpath according to ivy library.properties
          https://issues.apache.org/jira/browse/PIG-3166
PIG-3123  Simplify Logical Plans By Removing Unneccessary Identity Projections
          https://issues.apache.org/jira/browse/PIG-3123
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
          https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
          https://issues.apache.org/jira/browse/PIG-3015
PIG-2248  Pig parser does not detect when a macro name masks a UDF name
          https://issues.apache.org/jira/browse/PIG-2248
PIG-2244  Macros cannot be passed relation names
          https://issues.apache.org/jira/browse/PIG-2244
PIG-1914  Support load/store JSON data in Pig
          https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-16 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660033#comment-13660033
 ] 

Julien Le Dem commented on PIG-3307:


https://reviews.apache.org/r/11203/diff/#index_header
thanks [~cheolsoo] and [~daijy]!

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
> Issue Type: Improvement
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to clean up 
> some unnecessary code.
> In particular, there is an array of getNext(*T* v) methods where the value v 
> does not seem to matter and is only used to pick the correct overload.
> I have started a refactoring toward a more readable getNext*T*().



Review Request: Refactor physical operators to remove methods parameters that are always null

2013-05-16 Thread Julien Le Dem

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11203/
---

Review request for pig, Daniel Dai, Dmitriy Ryaboy, Cheolsoo Park, and Bill 
Graham.


Description
---

Refactor physical operators to remove methods parameters that are always null


This addresses bug PIG-3307.
https://issues.apache.org/jira/browse/PIG-3307


Diffs
-

  
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MergeJoinIndexer.java d5aff3d
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigCombiner.java 6cfc8c0
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 7c499f6
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 6145214
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java fc0112a
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Add.java 5bceca6
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/BinaryComparisonOperator.java 3e434f3
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ComparisonOperator.java 51d9f34
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ConstantExpression.java 7e4cffa
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Divide.java bdcc72b
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java a767c36
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ExpressionOperator.java 9cca2c3
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java b5e3c83
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java f3b5d44
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java 35786c0
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LessThanExpr.java c9b3157
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Mod.java 1108846
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Multiply.java 2795b78
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/NotEqualToExpr.java 294f84a
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POAnd.java f24c2ac
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POBinCond.java 312f3ac
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java 987cc21
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POIsNull.java 9ea89f7
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POMapLookUp.java fd5573f
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PONegative.java 8d3fcb1
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PONot.java 973dfc5
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POOr.java 498eb12
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 8886df7
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORegexp.java 6634915
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java e400a95
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserComparisonFunc.java 1aa1671
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 167cf06
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Subtract.java 495
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCollectedGroup.java a5adaf7
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java 4a58a7e
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java 30dcea2
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCross.java b90b0a2
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PODemux.java e26c611
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PODistinct.java ed2d39e
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java a4abdd8
  
sr

[jira] [Updated] (PIG-2378) macros don't accept references to items within tuples as arguments

2013-05-16 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated PIG-2378:
--

Attachment: PIG-2378.patch.txt

This is a working patch that resolves the issue. You no longer need the 
hacky quoting workaround; you can pass relation.field directly as a macro 
argument. For the test data I described above, it generates these results:
{noformat}
((2,3),{(1,(2,3))})
((2,4),{(4,(2,4))})
((7,8),{(6,(7,8))})
{noformat}

I will run the full unit tests to check for regressions, since the patch 
touches LogicalSchema.java, which is used in many other places. Meanwhile, I 
will improve the code's efficiency as much as possible.




[jira] [Resolved] (PIG-3320) AVRO: no empty field expressed when loading with AvroStorage using reader schema with extra field that has no default

2013-05-16 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat resolved PIG-3320.
-

Resolution: Invalid

> AVRO: no empty field expressed when loading with AvroStorage using reader 
> schema with extra field that has no default
> -
>
> Key: PIG-3320
> URL: https://issues.apache.org/jira/browse/PIG-3320
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12, 0.11.2
>
>
> Somewhat different use case than PIG-3318:
> I loaded with AvroStorage, giving a loader schema that, relative to the 
> schema in the Avro file, had an extra field without a default, and expected 
> to see an extra empty column. Instead, the resulting schema is the one in 
> the Avro file, without the extra column.
> E.g. see the e2e style test, which fails on this:
> {code}
> {
> 'num' => 2,
> # storing using writer schema
> # loading using reader schema with extra field that has no default
> 'notmq' => 1,
> 'pig' => q\
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: 
> int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: 
> float,doublenum: double);
> -- Store Avro file w. schema
> b1 = foreach a generate id, intnum5;
> c1 = filter b1 by 10 <= id and id < 20;
> describe c1;
> dump c1;
> store c1 into ':OUTPATH:.intermediate_1' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
>"schema" : {  
>   "name" : "schema_writing",
>   "type" : "record",
>   "fields" : [
>  {  
> "name" : "id",
> "type" : [
>"null",
>"int"
> ]
>  },
>  {  
> "name" : "intnum5",
> "type" : [
>"null",
>"int"
> ]
>  }
>   ]
>}
> }
> ');
> exec;
> -- Read back what was stored with Avro adding extra field to reader schema
> u = load ':OUTPATH:.intermediate_1' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
>"debug" : 5,
>"schema" : {  
>   "name" : "schema_reading",
>   "type" : "record",
>   "fields" : [
>  {  
> "name" : "id",
> "type" : [
>"null",
>"int"
> ]
>  },
>  {  
> "name" : "intnum5",
> "type" : [
>"null",
>"string"
> ]
>  },
>  {
> "name" : "intnum100",
> "type" : [
>"null",
>"int"
> ]
>  }
>   ]
>}
> }
> ');
> describe u;
> dump u;
> store u into ':OUTPATH:';
> \,
> 'verify_pig_script' => q\
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: 
> int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: 
> float,doublenum: double);
> b = filter a by (10 <= id and id < 20);
> c = foreach b generate id, intnum5, '';
> store c into ':OUTPATH:';
> \,
> },
> {code}



[jira] [Commented] (PIG-3320) AVRO: no empty field expressed when loading with AvroStorage using reader schema with extra field that has no default

2013-05-16 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659803#comment-13659803
 ] 

Viraj Bhat commented on PIG-3320:
-

With PIG-3321 committed, the above script throws an error, which is listed in 
Comment 2 of this Jira.

Suppose we want AvroStorage() to return the extra field "intnum100" as null 
instead of throwing the error in Comment 2. We would have to do the following:
1) Pass a null reader schema to PigAvroDatumReader
2) Construct an mProtoTuple with field size equal to that of readerSchema
3) Reconcile the schemas manually by using the logic in 
getSchemaToMergedSchemaMap() 
4) Populate mProtoTuple using the map, keeping track of new to old positions

By doing all the above we are undoing the changes done in PIG-3321, where the 
readerSchema is not passed to PigAvroDatumReader(). We want Avro to handle the 
schema merges in this case and it does it correctly by throwing an error.
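
For illustration only, the manual reconciliation in steps 2 to 4 could look 
roughly like the sketch below. It is plain Python, not AvroStorage code; 
merged_position_map() and populate() are hypothetical helpers and do not 
reproduce the logic of getSchemaToMergedSchemaMap().

```python
def merged_position_map(writer_fields, reader_fields):
    """Map each reader-schema position to the writer position of the same
    field name, or to None when the writer schema has no such field."""
    writer_pos = {name: i for i, name in enumerate(writer_fields)}
    return {new: writer_pos.get(name) for new, name in enumerate(reader_fields)}


def populate(record, pos_map, size):
    """Build a tuple-like list sized to the reader schema, copying values
    from their old positions and leaving unmatched fields as None (null)."""
    proto = [None] * size
    for new, old in pos_map.items():
        if old is not None:
            proto[new] = record[old]
    return proto


writer = ["id", "intnum5"]               # fields in the stored Avro file
reader = ["id", "intnum5", "intnum100"]  # reader schema with one extra field

pos_map = merged_position_map(writer, reader)
print(populate((10, 3), pos_map, len(reader)))  # [10, 3, None]
```

This is exactly the kind of hand-rolled merge that the comment argues 
against, since Avro's own schema resolution already reports the missing 
default as an error.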

Currently closing this Jira as invalid.




[jira] [Updated] (PIG-3328) DataBags created with an initial list of tuples don't get registered as spillable

2013-05-16 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated PIG-3328:
-

Fix Version/s: 0.11.2
   0.12
Affects Version/s: 0.12
   Status: Patch Available  (was: Open)

> DataBags created with an initial list of tuples don't get registered as 
> spillable
> -
>
> Key: PIG-3328
> URL: https://issues.apache.org/jira/browse/PIG-3328
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1, 0.12, 0.11.2
> Reporter: Mark Wagner
> Assignee: Mark Wagner
> Fix For: 0.12, 0.11.2
>
> Attachments: PIG-3328.1.patch
>
>
> DefaultDataBag has a constructor to take ownership of an existing list of 
> tuples as its own contents, but registration for spilling only occurs when 
> adding elements. If a bag starts out big enough to consider spilling, but no 
> new tuples are added to it, it will never be spilled.
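
A minimal sketch of the behavior described above, in plain Python rather 
than Pig's Java. SpillRegistry, Bag and the register_on_init flag are 
hypothetical names used only for illustration; they are not the 
DefaultDataBag API and not the content of the attached patch.

```python
class SpillRegistry:
    """Stand-in for whatever tracks bags that may be spilled to disk."""

    def __init__(self):
        self.registered = []

    def register(self, bag):
        # Register each bag at most once.
        if not any(b is bag for b in self.registered):
            self.registered.append(bag)


class Bag:
    def __init__(self, registry, tuples=None, register_on_init=False):
        self.registry = registry
        # Take ownership of the caller's list instead of copying it.
        self.tuples = tuples if tuples is not None else []
        if register_on_init:
            registry.register(self)

    def add(self, t):
        self.tuples.append(t)
        # Registration happens as a side effect of adding an element.
        self.registry.register(self)


registry = SpillRegistry()
initial = [(i,) for i in range(100000)]

reported = Bag(registry, initial)                       # behavior in the report
print(any(b is reported for b in registry.registered))  # False

fixed = Bag(registry, initial, register_on_init=True)   # one possible fix
print(any(b is fixed for b in registry.registered))     # True
```

The first bag is large from the start but never calls add(), so the 
registry never learns about it; registering in the constructor closes that 
gap.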



[jira] [Updated] (PIG-3328) DataBags created with an initial list of tuples don't get registered as spillable

2013-05-16 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated PIG-3328:
-

Affects Version/s: 0.11.2
   0.11.1




[jira] [Commented] (PIG-2684) :: in field name causes AvroStorage to fail

2013-05-16 Thread Paul Mazak (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659587#comment-13659587
 ] 

Paul Mazak commented on PIG-2684:
-

Better formatted solution: 
https://issues.apache.org/jira/browse/PIG-3015?focusedCommentId=13659573&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13659573


> :: in field name causes AvroStorage to fail
> ---
>
> Key: PIG-2684
> URL: https://issues.apache.org/jira/browse/PIG-2684
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Reporter: Fabian Alenius
>
> There appears to be a bug in AvroStorage which causes it to fail when there 
> are field names that contain ::
> For example, the following will fail:
>
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group);
> store result into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> ERROR 2999: Unexpected internal error. Illegal character in: group::one
>
> While the following will succeed:
>
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group) as (one,two);
> store result into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> Here is a minimal test case:
>
> data = load 'test.txt' as (one::two, three);
> store data into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
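
The "Illegal character" error is consistent with the naming rule in the Avro specification: a name must start with a letter or underscore and contain only letters, digits, and underscores. The sketch below checks that rule with a regular expression. It is a standalone illustration, not the check AvroStorage or the Avro library performs internally; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Standalone check of the Avro name rule; not the Avro library's own code. */
public class AvroNameCheckSketch {

    // Name rule from the Avro specification: [A-Za-z_][A-Za-z0-9_]*
    private static final Pattern AVRO_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns true if the whole string is a legal Avro name. */
    public static boolean isValidAvroName(String name) {
        return name != null && AVRO_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] fields = {"group::one", "one", "one::two", "three"};
        for (String field : fields) {
            System.out.println(field + " -> " + isValidAvroName(field));
        }
    }
}
```

Under this rule `group::one` and `one::two` are rejected because of the colons, while `one` and `three` pass, which matches the failing and succeeding scripts quoted above.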



[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-05-16 Thread Paul Mazak (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659573#comment-13659573
 ] 

Paul Mazak commented on PIG-3015:
-

One simple workaround for us was to override AvroStorage's checkSchema this way.
{code}
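import java.io.IOException;

import org.apache.avro.SchemaParseException;
import org.apache.pig.ResourceSchema;
import org.apache.pig.ResourceSchema.ResourceFieldSchema;
import org.apache.pig.piggybank.storage.avro.AvroStorage;

/**
 * In Pig script do:
 * REGISTER 'lib/this.jar'
 * DEFINE AvroStorage com.this.JoinableAvroStorage;
 */
public class JoinableAvroStorage extends AvroStorage {

  @Override
  public void checkSchema(ResourceSchema s) throws IOException {
    try {
      super.checkSchema(s);
    } catch (SchemaParseException spe) {
      // Avro rejects names containing "::". Keep only the part after the
      // last "::" (e.g. group::one -> one), then check the schema again.
      ResourceFieldSchema[] pigFields = s.getFields();
      for (int i = 0; i < pigFields.length; i++) {
        String outname = pigFields[i].getName();
        if (outname.contains("::")) {
          // lastIndexOf also handles nested prefixes such as a::b::c.
          String newOutname = outname.substring(outname.lastIndexOf("::") + 2);
          pigFields[i].setName(newOutname);
        }
      }
      super.checkSchema(s);
    }
  }
}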
{code}
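
The renaming step in the workaround can be tried without Pig on the classpath. The sketch below isolates it in a hypothetical helper, `stripPrefix`, which keeps the text after the last `::`. Taking the last segment rather than the second one is an assumption about what is wanted for nested prefixes such as `a::b::c`.

```java
/** Standalone version of the renaming step; the class and method names are hypothetical. */
public class FieldNameSketch {

    /** Returns the text after the last "::", or the name unchanged if there is none. */
    public static String stripPrefix(String fieldName) {
        int idx = fieldName.lastIndexOf("::");
        return idx < 0 ? fieldName : fieldName.substring(idx + 2);
    }

    public static void main(String[] args) {
        String[] names = {"group::one", "a::b::c", "one"};
        for (String name : names) {
            System.out.println(name + " -> " + stripPrefix(name));
        }
    }
}
```

Note that stripping prefixes can produce duplicate field names: after a join, `a::id` and `b::id` would both become `id`, so the approach only suits schemas where the unprefixed names are already unique.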

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: bad.avro, good.avro, PIG-3015-10.patch, 
> PIG-3015-11.patch, PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, 
> PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch, PIG-3015-9.patch, 
> PIG-3015-doc-2.patch, PIG-3015-doc.patch, TestInput.java, Test.java, 
> with_dates.pig
>
>
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni (as 
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best 
> way to contribute the changes back to Apache.



[jira] [Commented] (PIG-2684) :: in field name causes AvroStorage to fail

2013-05-16 Thread Paul Mazak (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659566#comment-13659566
 ] 

Paul Mazak commented on PIG-2684:
-

One simple workaround for us was to override AvroStorage's checkSchema this way.

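import java.io.IOException;

import org.apache.avro.SchemaParseException;
import org.apache.pig.ResourceSchema;
import org.apache.pig.ResourceSchema.ResourceFieldSchema;
import org.apache.pig.piggybank.storage.avro.AvroStorage;

/**
 * In Pig script do:
 * REGISTER 'lib/this.jar'
 * DEFINE AvroStorage com.this.JoinableAvroStorage;
 */
public class JoinableAvroStorage extends AvroStorage {

  @Override
  public void checkSchema(ResourceSchema s) throws IOException {
    try {
      super.checkSchema(s);
    } catch (SchemaParseException spe) {
      // Avro rejects names containing "::". Keep only the part after the
      // last "::" (e.g. group::one -> one), then check the schema again.
      ResourceFieldSchema[] pigFields = s.getFields();
      for (int i = 0; i < pigFields.length; i++) {
        String outname = pigFields[i].getName();
        if (outname.contains("::")) {
          // lastIndexOf also handles nested prefixes such as a::b::c.
          String newOutname = outname.substring(outname.lastIndexOf("::") + 2);
          pigFields[i].setName(newOutname);
        }
      }
      super.checkSchema(s);
    }
  }
}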

> :: in field name causes AvroStorage to fail
> ---
>
> Key: PIG-2684
> URL: https://issues.apache.org/jira/browse/PIG-2684
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Reporter: Fabian Alenius
>
> There appears to be a bug in AvroStorage which causes it to fail when there 
> are field names that contain ::
> For example, the following will fail:
>
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group);
> store result into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> ERROR 2999: Unexpected internal error. Illegal character in: group::one
>
> While the following will succeed:
>
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group) as (one,two);
> store result into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> Here is a minimal test case:
>
> data = load 'test.txt' as (one::two, three);
> store data into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
