Re: Review Request: PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values

2013-05-29 Thread Viraj Bhat

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11135/
---

(Updated May 30, 2013, 2:28 a.m.)


Review request for pig and Rohini Palaniswamy.


Description
---

Default values are not honoured when AvroStorage merges schemas on load.


This addresses bug PIG-3318.
https://issues.apache.org/jira/browse/PIG-3318


Diffs
-----

  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1484564
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1484564
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 1484564
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1484564
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1484564

Diff: https://reviews.apache.org/r/11135/diff/


Testing
---

Yes


Thanks,

Viraj Bhat



Re: Review Request: PIG-3331 Default values not written to Schema when specified in the output schema

2013-05-29 Thread Viraj Bhat

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11355/
---

(Updated May 30, 2013, 2:29 a.m.)


Review request for pig and Rohini Palaniswamy.


Description
---

Patch to write default values to the schema when the writer schema given to 
AvroStorage contains them.


Diffs
-----

  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigSchema2Avro.java 1485826
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1485826
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/numbers.txt PRE-CREATION

Diff: https://reviews.apache.org/r/11355/diff/


Testing
---

Yes, against the Piggybank in Pig trunk/Pig 0.12


Thanks,

Viraj Bhat



[jira] Subscription: PIG patch available

2013-05-29 Thread jira
Issue Subscription
Filter: PIG patch available (18 issues)

Subscriber: pigdaily

Key         Summary
PIG-3334    Fix Windows piggybank unit test failures
            https://issues.apache.org/jira/browse/PIG-3334
PIG-        Fix remaining Windows core unit test failures
            https://issues.apache.org/jira/browse/PIG-
PIG-3318    AVRO: 'default value' not honored when merging schemas on load with AvroStorage
            https://issues.apache.org/jira/browse/PIG-3318
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3285    Jobs using HBaseStorage fail to ship dependency jars
            https://issues.apache.org/jira/browse/PIG-3285
PIG-3258    Patch to allow MultiStorage to use more than one index to generate output tree
            https://issues.apache.org/jira/browse/PIG-3258
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3247    Piggybank functions to mimic OVER clause in SQL
            https://issues.apache.org/jira/browse/PIG-3247
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-2956    Invalid cache specification for some streaming statement
            https://issues.apache.org/jira/browse/PIG-2956
PIG-2248    Pig parser does not detect when a macro name masks a UDF name
            https://issues.apache.org/jira/browse/PIG-2248
PIG-2244    Macros cannot be passed relation names
            https://issues.apache.org/jira/browse/PIG-2244
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-2569) Fix org.apache.pig.test.TestInvoker.testSpeed

2013-05-29 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669915#comment-13669915
 ] 

Aniket Mokashi commented on PIG-2569:
-

With Java version 1.6.0_45+, the ratio is about 4-5, so the test sometimes 
fails. It could be a JVM problem. Do we need this test in Pig?

> Fix org.apache.pig.test.TestInvoker.testSpeed
> -
>
> Key: PIG-2569
> URL: https://issues.apache.org/jira/browse/PIG-2569
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2
>Reporter: Johnny Zhang
>Assignee: Andrey Klochkov
> Fix For: 0.11
>
> Attachments: PIG-2569.patch
>
>
> The Pig unit test org.apache.pig.test.TestInvoker.testSpeed passes sometimes 
> and fails sometimes. I think this test needs further polish; look at the code:
> {noformat}
> @Test
> public void testSpeed() throws IOException, SecurityException,
>         ClassNotFoundException, NoSuchMethodException {
>     EvalFunc log = new Log();
>     Tuple tup = tf_.newTuple(1);
>     long start = System.currentTimeMillis();
>     for (int i=0; i < 100; i++) {
>         tup.set(0, (double) i);
>         log.exec(tup);
>     }
>     long staticSpeed = (System.currentTimeMillis()-start);
>     start = System.currentTimeMillis();
>     log = new InvokeForDouble("java.lang.Math.log", "Double", "static");
>     for (int i=0; i < 100; i++) {
>         tup.set(0, (double) i);
>         log.exec(tup);
>     }
>     long dynamicSpeed = System.currentTimeMillis()-start;
>     System.err.println("Dynamic to static ratio: "+((float) dynamicSpeed)/staticSpeed);
>     assertTrue( ((float) dynamicSpeed)/staticSpeed < 5);
> }
> {noformat}
> I understand this test is trying to make sure the initialization time of 
> InvokeForDouble isn't too long, but the ratio 5 is hardcoded, and there is 
> no solid reasoning for why it is 5. As I understand it, when server 
> resources are low the ratio could be larger than 5, but that doesn't mean the 
> code has a problem. In our case the code never changed, yet the test passed 
> on the first run and failed on the second.
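
One way to make a timing comparison like the one quoted above less sensitive to load is to warm up both code paths first and compare medians over several rounds instead of a single measurement. The following is only a minimal sketch under stated assumptions: TimingRatioSketch and all of its methods are hypothetical names, a direct Math.log call stands in for the Log UDF, and a java.lang.reflect call stands in for InvokeForDouble. It is not the TestInvoker code from the Pig tree.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical sketch; stand-ins only, not Pig's Log UDF or InvokeForDouble. */
final class TimingRatioSketch {
    // Written on every iteration so the JIT cannot discard the timed loops.
    static volatile double sink;

    /** Middle element of the sorted samples (upper middle for even counts). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Looks up Math.log(double) reflectively (the "dynamic" path). */
    static Method logMethod() {
        try {
            return Math.class.getMethod("log", double.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Invokes the static method reflectively and unboxes the result. */
    static double reflectiveLog(Method log, double x) {
        try {
            return (Double) log.invoke(null, x);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Elapsed nanoseconds for the direct ("static") path. */
    static long timeDirect(int iterations) {
        long start = System.nanoTime();
        for (int i = 1; i <= iterations; i++) {
            sink = Math.log(i);
        }
        return System.nanoTime() - start;
    }

    /** Elapsed nanoseconds for the reflective ("dynamic") path. */
    static long timeReflective(Method log, int iterations) {
        long start = System.nanoTime();
        for (int i = 1; i <= iterations; i++) {
            sink = reflectiveLog(log, i);
        }
        return System.nanoTime() - start;
    }

    /** Ratio of median reflective time to median direct time over several rounds. */
    static double medianRatio(int rounds, int iterations) {
        Method log = logMethod();
        // Warm-up pass: results are discarded so JIT compilation is not timed.
        timeDirect(iterations);
        timeReflective(log, iterations);
        long[] direct = new long[rounds];
        long[] reflective = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            direct[r] = timeDirect(iterations);
            reflective[r] = timeReflective(log, iterations);
        }
        // Guard against a zero denominator on coarse clocks.
        return (double) median(reflective) / Math.max(1L, median(direct));
    }
}
```

Even with warm-up and medians, any fixed threshold on the ratio remains a judgment call, so a test built this way may still need a generous bound or may be better reported than asserted.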

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-05-29 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: expected_testLoadAvrowithNulls.txt

Golden test file generated

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11.2
>Reporter: Egil Sorensen
>Assignee: Viraj Bhat
>  Labels: patch
> Fix For: 0.12
>
> Attachments: expected_testLoadAvrowithNulls.txt, PIG-3322_2.patch, 
> test_loadavrowithnulls.avro
>
>
> I am getting an NPE when using AvroStorage to load a file that has a schema 
> like:
> {code}
> ["null",{"type":"record","name":"TUPLE_0","fields":[{"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},{"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},{"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
> {code}
> E.g. see the e2e style test, which fails on this:
> {code}
> {
>     'num' => 4,
>     # storing file with Pig type tuple relying on conversion to record
>     # loading using stored schemas
>     'notmq' => 1,
>     'pig' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
> exec;
> -- Read back what was stored with Avro
> u = load ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe u;
> store u into ':OUTPATH:';
> \,
>     'verify_pig_script' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:';
> \,
> },
> {code}



[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-05-29 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: test_loadavrowithnulls.avro

Test Input Avro file



[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-05-29 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: PIG-3322_2.patch

Patch for PIG-3322



[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-05-29 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: (was: test_loadavrowithnulls.avro)

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11.2
>Reporter: Egil Sorensen
>Assignee: Viraj Bhat
>  Labels: patch
> Fix For: 0.12


[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-05-29 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: (was: expected_testLoadAvrowithNulls.txt)

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11.2
>Reporter: Egil Sorensen
>Assignee: Viraj Bhat
>  Labels: patch
> Fix For: 0.12


Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase

2013-05-29 Thread Viraj Bhat

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/
---

(Updated May 29, 2013, 11:07 p.m.)


Review request for pig and Rohini Palaniswamy.


Changes
---

Smaller input files and output golden files


Description
---

Fixes the null pointer exception thrown when loading a union with null in its 
schema. A test case covering this was also added.


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1485358
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1485358
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1485358
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testLoadAvrowithNulls.txt PRE-CREATION

Diff: https://reviews.apache.org/r/11333/diff/


Testing
---

Yes, all tests pass in the piggybank.


Thanks,

Viraj Bhat



[jira] [Commented] (PIG-3337) Fix remaining Window e2e tests

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669776#comment-13669776
 ] 

Alan Gates commented on PIG-3337:
-

+1

> Fix remaining Window e2e tests
> --
>
> Key: PIG-3337
> URL: https://issues.apache.org/jira/browse/PIG-3337
> Project: Pig
>  Issue Type: Sub-task
>  Components: e2e harness
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3337-1.patch
>
>




[jira] [Commented] (PIG-3334) Fix Windows piggybank unit test failures

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669774#comment-13669774
 ] 

Alan Gates commented on PIG-3334:
-

+1

> Fix Windows piggybank unit test failures
> 
>
> Key: PIG-3334
> URL: https://issues.apache.org/jira/browse/PIG-3334
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3334-1.patch
>
>




[jira] [Commented] (PIG-3333) Fix remaining Windows core unit test failures

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669771#comment-13669771
 ] 

Alan Gates commented on PIG-:
-

StreamingCommand.addPathToCache - This appears to always convert the path from 
/ to \.  Don't we only want to do this in the Windows case?  Alternatively we 
could always convert / and \ to System.getProperty("file.separator").

JavaCompilerHelp.addClassToPath - Rather than branching on Windows/Unix, why not 
just change it to 
{code}
this.classPath = this.classPath + System.getProperty("path.separator") + path;
{code}

It looks like a bunch of \r's slipped into TestSample.java
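
The separator suggestions above can be sketched without any Windows/Unix branching by using the constants on java.io.File, which hold the same values as the "path.separator" and "file.separator" system properties. This is only an illustration: PathSeparatorSketch and its two methods are hypothetical names and are not taken from the PIG patch or from JavaCompilerHelp.

```java
import java.io.File;

/** Hypothetical helper; names are illustrative and not from the Pig source tree. */
final class PathSeparatorSketch {

    /** Appends an entry to a classpath using the platform's entry separator (":" or ";"). */
    static String appendToClassPath(String classPath, String entry) {
        if (classPath == null || classPath.isEmpty()) {
            return entry;
        }
        return classPath + File.pathSeparator + entry;
    }

    /** Rewrites both "/" and "\" to the platform's file separator. */
    static String toPlatformSeparators(String path) {
        return path.replace('/', File.separatorChar)
                   .replace('\\', File.separatorChar);
    }
}
```

One caveat with the second method: on Unix a backslash is a legal character in a file name, so converting it unconditionally is lossy there, which is an argument for doing the conversion only where the input is known to be a Windows-style path.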



> Fix remaining Windows core unit test failures
> -
>
> Key: PIG-
> URL: https://issues.apache.org/jira/browse/PIG-
> Project: Pig
>  Issue Type: Sub-task
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG--1.patch
>
>
> I combined a bunch of Windows unit test fixes into one patch to make things 
> cleaner. They all originate from obvious Windows/Unix inconsistencies, which 
> include:
> 1. Path separator inconsistency: "/" vs "\"
> 2. Path component separator inconsistency: ":" vs ";"
> 3. "volume:" is not acceptable as URI
> 4. Unix tools/commands (e.g., bash, rm) do not exist in Windows
> 5. .sh scripts need a .cmd companion in Windows
> 6. "\r\n" vs "\n" as newline
> 7. Environment variables use different names (USER vs USERNAME)
> 8. File not closed, not an issue in Unix, but an issue in Windows (not able 
> to remove an open file)
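
Items 6 to 8 of the list above lend themselves to small test helpers. The sketch below is an assumption-laden illustration, not code from the patch: PortableTestHelpers and its methods are hypothetical names, and the environment is passed in as a map so the lookup can be exercised without depending on the machine it runs on.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical test helpers; names are illustrative and not from the Pig source tree. */
final class PortableTestHelpers {

    /** Item 6: compare output with "\r\n" folded to "\n" so expected files match on both platforms. */
    static String normalizeNewlines(String text) {
        return text.replace("\r\n", "\n");
    }

    /** Item 7: prefer USER (Unix), fall back to USERNAME (Windows); null if neither is set. */
    static String currentUser(Map<String, String> env) {
        String user = env.get("USER");
        return user != null ? user : env.get("USERNAME");
    }

    /** Item 8: close the writer before deleting, since Windows refuses to remove an open file. */
    static boolean writeCloseDelete(String content) {
        try {
            Path file = Files.createTempFile("portable-sketch", ".txt");
            // try-with-resources closes the writer even if write() throws.
            try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                writer.write(content);
            }
            return Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real test the map would typically be System.getenv(); passing it in keeps the fallback logic testable on either platform.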



[jira] [Commented] (PIG-3257) Add unique identifier UDF

2013-05-29 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669660#comment-13669660
 ] 

Rohini Palaniswamy commented on PIG-3257:
-

Alan,
   Why don't we do it as a sequence instead of generating random numbers? 
Something like mapid- or reduceid-, i.e. the first mapper will do 0-0, 
0-1..0-1 and the 2nd mapper will do 1-0,1-1,...1-1. Just an idea, and we can 
think of a better implementation. It will not be in sequence across the job 
anyway, but it will be in sequence within the map and can be used as a UUID 
across the job, which is repeatable if run with the same number of 
mappers/reducers. This would avoid all the problems of using random numbers and 
avoid the human mistake of writing a script without understanding the internals 
of how the UUID is going to work, which I don't think a user should have to be 
bothered with. 
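
The sequence idea above can be sketched as a per-task counter whose output is prefixed with the task's index. This is a minimal illustration only: TaskSequenceId is a hypothetical class, and how the task index would be obtained from the running map or reduce task is deliberately left out, since the thread does not specify it.

```java
/** Hypothetical sketch of a "taskIndex-sequence" identifier; not the UDF from PIG-3257.patch. */
final class TaskSequenceId {
    private final int taskIndex; // assumed to be unique per map/reduce task within a job
    private long next = 0;       // position of the next record within this task

    TaskSequenceId(int taskIndex) {
        this.taskIndex = taskIndex;
    }

    /** Returns ids such as "0-0", "0-1", ... for task 0 and "1-0", "1-1", ... for task 1. */
    String nextId() {
        return taskIndex + "-" + (next++);
    }
}
```

Ids produced this way are unique across the job only as long as task indexes are unique, and they repeat exactly on a rerun only if the same records reach the same tasks in the same order.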

> Add unique identifier UDF
> -
>
> Key: PIG-3257
> URL: https://issues.apache.org/jira/browse/PIG-3257
> Project: Pig
>  Issue Type: Improvement
>  Components: internal-udfs
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.12
>
> Attachments: PIG-3257.patch
>
>
> It would be good to have a Pig function to generate unique identifiers.



[jira] [Commented] (PIG-3257) Add unique identifier UDF

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669593#comment-13669593
 ] 

Alan Gates commented on PIG-3257:
-

Would it make you happy if we added to the javadoc comments on this function 
not to use it as a key in the same job it's generated in?



[jira] [Updated] (PIG-2956) Invalid cache specification for some streaming statement

2013-05-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2956:


Status: Patch Available  (was: Open)

> Invalid cache specification for some streaming statement
> 
>
> Key: PIG-2956
> URL: https://issues.apache.org/jira/browse/PIG-2956
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch, PIG-2956-2.patch
>
>
> Another category of failure in e2e tests, such as ComputeSpec_1, 
> ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, 
> RaceConditions_4, RaceConditions_7, RaceConditions_8.
> Here is stack:
> ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1318)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303)
>     at org.apache.pig.PigServer.execute(PigServer.java:1293)
>     at org.apache.pig.PigServer.executeBatch(PigServer.java:364)
>     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>     at org.apache.pig.Main.run(Main.java:561)
>     at org.apache.pig.Main.main(Main.java:111)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447)



[jira] [Commented] (PIG-2956) Invalid cache specification for some streaming statement

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669566#comment-13669566
 ] 

Alan Gates commented on PIG-2956:
-

+1



[jira] [Commented] (PIG-3335) TestErrorHandling.tesNegative7 fails on MR2

2013-05-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669547#comment-13669547
 ] 

Xuefu Zhang commented on PIG-3335:
--

CHANGES.txt is updated. Thanks to Rohini for pointing it out.

> TestErrorHandling.tesNegative7 fails on MR2
> ---
>
> Key: PIG-3335
> URL: https://issues.apache.org/jira/browse/PIG-3335
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.12
>
> Attachments: PIG-3335.patch
>
>
> This test case fails when being tested with MR2:
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.parser.TestErrorHandling.tesNegative7(TestErrorHandling.java:138)



[jira] [Updated] (PIG-3335) TestErrorHandling.tesNegative7 fails on MR2

2013-05-29 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-3335:
-

Fix Version/s: (was: 0.11.2)
   0.12

> TestErrorHandling.tesNegative7 fails on MR2
> ---
>
> Key: PIG-3335
> URL: https://issues.apache.org/jira/browse/PIG-3335
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.12
>
> Attachments: PIG-3335.patch
>
>
> This test case fails when being tested with MR2:
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.parser.TestErrorHandling.tesNegative7(TestErrorHandling.java:138)



[jira] [Commented] (PIG-3316) Pig failed to interpret DateTime values in some special cases

2013-05-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669530#comment-13669530
 ] 

Xuefu Zhang commented on PIG-3316:
--

@Rohini Thanks for pointing that out. I was not quite clear about the versions. I 
will update CHANGES.txt as suggested for this (and other JIRAs).

> Pig failed to interpret DateTime values in some special cases
> -
>
> Key: PIG-3316
> URL: https://issues.apache.org/jira/browse/PIG-3316
> Project: Pig
>  Issue Type: Bug
>  Components: data, impl
>Affects Versions: 0.11
> Environment: 1970-01-01
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.12
>
> Attachments: PIG-3316.patch
>
>
> For the query
> A = load 'date.txt' as ( f1:int, f2:datetime );
> dump A;
> with input data
> 1,1970-01-01
> 2,1970-01
> pig generates the following output
> (1,1970-01-01T00:00:00.000-01:00)
> (2,1970-01-01T00:00:00.000-01:00)
> which seemingly incorrectly interprets the day or month part as time zone.
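
One way such a misreading can arise is sketched below in Python, under stated assumptions. If the zone is extracted by searching for a signed two-digit suffix at the end of the string, the trailing "-01" of a bare date also matches. The patterns and function names are hypothetical stand-ins for illustration, not Pig's actual code.

```python
import re

# Hypothetical naive pattern: a trailing "Z" or a signed offset such as
# "-01" or "+05:30", anchored only at the end of the string.
NAIVE_TZ = re.compile(r"(Z|[+-]\d{2}(?::?\d{2})?)$")

# Hypothetical stricter pattern: the offset must follow a time part
# introduced by "T", so a bare date cannot supply it.
STRICT_TZ = re.compile(r"T[0-9.:]*(Z|[+-]\d{2}(?::?\d{2})?)$")


def naive_zone(value):
    """Return the text the naive pattern would treat as a time zone."""
    match = NAIVE_TZ.search(value)
    return match.group(1) if match else None


def strict_zone(value):
    """Return a zone only when it follows an explicit time part."""
    match = STRICT_TZ.search(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    for text in ("1970-01-01", "1970-01", "1970-01-01T00:00:00.000-01:00"):
        print(text, naive_zone(text), strict_zone(text))
```

With the naive pattern, both inputs from the report yield "-01" as the zone, which matches the "-01:00" seen in the output above. The stricter variant returns no zone for them.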



Re: A major addition to Pig. Working with spatial data

2013-05-29 Thread Russell Jurney
Awesome. This would be a great addition to Pig. Please create a JIRA.

Russell Jurney http://datasyndrome.com

On May 29, 2013, at 8:51 AM, Ahmed Eldawy  wrote:

> Hi all,
>
> Nick has pointed out to me an alternative GIS package that can replace JTS.
> ESRI has recently released a GIS package under Apache
> license. I changed Pigeon to work with that new package. I
> think it could be easier now to integrate this work with main branch of
> Apache Pig. I will go on with the current project and add more spatial
> functionality. We can then add a new datatype to Apache and link it to
> those functions.
>
> ESRI package contains a class OGCGeometry which
> can be linked to a new datatype 'Geometry'. Do you think we can rely on the
> new package and integrate the work with Apache Pig?
>
> On May 23, 2013 11:40 PM, "Ahmed Eldawy"  wrote:
>
>> Hi all,
>>  Thanks for your help. I've started the project with a minimal
>> functionality as a start. It's currently hosted in github. It is licensed
>> under the Apache public license to make it easier to merge with Pig.
>> Currently it has only a very few functions. I implemented a function from
>> different types of functions (e.g., Aggregate and create). I'll keep adding
>> functions and any contributions to the project are welcome. As a beginning,
>> I need an ANT build file that runs the tests, compiles and generates a jar
>> file. I'm not familiar with ANT so any help in this is encouraged.
>> Here's the project home page
>> https://github.com/aseldawy/pigeon
>>
>>
>> If you have any comments or suggestion please contact me.
>>
>>
>> Best regards,
>> Ahmed Eldawy
>>
>>
>> On Mon, May 6, 2013 at 3:09 PM, Jonathan Coveney wrote:
>>
>>> Nick: the only issue is that the way types are implemented in Pig don't
>>> allow us to easily "plug-in" types externally. Adding support for that
>>> would be cool, but a fair bit of work.
>>>
>>>
>>> 2013/5/6 Nick Dimiduk 
>>>
>>>> I'm not a lawyer, but I see no reason why this cannot be an external
>>>> extension to Pig. It would behave the same way PostGIS is an external
>>>> extension to Postgres. Any Apache issues would be toward general
>>>> purpose enhancements, not specific to your project.
>>>>
>>>> Good on you!
>>>> -n
>>>>
>>>> On Mon, May 6, 2013 at 10:12 AM, Ahmed Eldawy wrote:
>>>>
>>>>> I contacted solr developers to see how JTS can be included in an Apache
>>>>> project. See
>>>>> http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
>>>>> As far as I understand, they did not include it in the main solr project,
>>>>> rather, they created a separate project (spatial 4j) which is still
>>>>> licensed under Apache license and refers to JTS. Users will have to
>>>>> download JTS libraries separately to make it run. That's pretty much the
>>>>> same plan that Jonathan mentioned. We will still have the overhead of
>>>>> serializing/deserializing the shapes each time a function is called. Also,
>>>>> we will have to use the ugly bytearray data type for spatial data instead
>>>>> of creating its own data type (e.g., Geometry).
>>>>> I think using spatial 4j instead of JTS will not be sufficient for our case
>>>>> as we need to provide an access to all spatial functions of JTS such as
>>>>> Union, Intersection, Difference, ... etc. This way we can claim conformity
>>>>> with OGC standards which gives visibility and appreciations of the spatial
>>>>> community.
>>>>> I think also that this means I will not add any issues to JIRA as it is now
>>>>> a separate project. I'm planning to host it on github and have all the
>>>>> issues there.
>>>>> Let me know if you have any suggestions or comments.
>>>>>
>>>>> Thanks
>>>>> Ahmed
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Ahmed Eldawy
>>>>>
>>>>>
>>>>> On Mon, May 6, 2013 at 9:53 AM, Jonathan Coveney wrote:
>>>>>
>>>>>> You can give them all the same label or tag and filter on that later on.
>>>>>>
>>>>>>
>>>>>> 2013/5/6 Ahmed Eldawy
>>>>>>
>>>>>>> Thanks all for taking the time to respond. Danial, I didn't know that Solr
>>>>>>> uses JTS. This is a good finding and we can definitely ask them to see if
>>>>>>> there is a work around we can do. Jonathan, I thought of the same idea of
>>>>>>> serializing/deserializing a bytearray each time a UDF is called. The
>>>>>>> deserialization part is good for letting Pig auto detect spatial types if
>>>>>>> not set explicitly in the schema. What is the best way to start this? I
>>>>>>> want to add an initial set of JIRA issues and start working on them but I
>>>>>>> also need to keep the work grouped in some sense just for organization.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ahmed
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Ahmed Eld

[jira] [Updated] (PIG-3339) Move pattern compilation in ToDate as a static variable

2013-05-29 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3339:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk (0.12). Thanks Cheolsoo. 

> Move pattern compilation in ToDate as a static variable
> ---
>
> Key: PIG-3339
> URL: https://issues.apache.org/jira/browse/PIG-3339
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3339-1.patch
>
>
> Pattern compilation is costly. It is currently being done for every tuple in 
> ToDate.extractDateTimeZone(). Should be a static variable. 



[jira] [Commented] (PIG-3339) Move pattern compilation in ToDate as a static variable

2013-05-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669384#comment-13669384
 ] 

Cheolsoo Park commented on PIG-3339:


+1

> Move pattern compilation in ToDate as a static variable
> ---
>
> Key: PIG-3339
> URL: https://issues.apache.org/jira/browse/PIG-3339
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3339-1.patch
>
>
> Pattern compilation is costly. It is currently being done for every tuple in 
> ToDate.extractDateTimeZone(). Should be a static variable. 



Re: A major addition to Pig. Working with spatial data

2013-05-29 Thread Ahmed Eldawy
Hi all,

Nick has pointed out to me an alternative GIS package that can replace JTS.
ESRI has recently released a GIS package under Apache
license. I changed Pigeon to work with that new package. I
think it could be easier now to integrate this work with main branch of
Apache Pig. I will go on with the current project and add more spatial
functionality. We can then add a new datatype to Apache and link it to
those functions.

ESRI package contains a class OGCGeometry which
can be linked to a new datatype 'Geometry'. Do you think we can rely on the
new package and integrate the work with Apache Pig?

On May 23, 2013 11:40 PM, "Ahmed Eldawy"  wrote:

> Hi all,
>   Thanks for your help. I've started the project with a minimal
> functionality as a start. It's currently hosted in github. It is licensed
> under the Apache public license to make it easier to merge with Pig.
> Currently it has only a very few functions. I implemented a function from
> different types of functions (e.g., Aggregate and create). I'll keep adding
> functions and any contributions to the project are welcome. As a beginning,
> I need an ANT build file that runs the tests, compiles and generates a jar
> file. I'm not familiar with ANT so any help in this is encouraged.
> Here's the project home page
> https://github.com/aseldawy/pigeon
>
>
> If you have any comments or suggestion please contact me.
>
>
> Best regards,
> Ahmed Eldawy
>
>
> On Mon, May 6, 2013 at 3:09 PM, Jonathan Coveney wrote:
>
>> Nick: the only issue is that the way types are implemented in Pig don't
>> allow us to easily "plug-in" types externally. Adding support for that
>> would be cool, but a fair bit of work.
>>
>>
>> 2013/5/6 Nick Dimiduk 
>>
>> > I'm not a lawyer, but I see no reason why this cannot be an external
>> > extension to Pig. It would behave the same way PostGIS is an external
>> > extension to Postgres. Any Apache issues would be toward general
>> > purpose enhancements, not specific to your project.
>> >
>> > Good on you!
>> > -n
>> >
>> > On Mon, May 6, 2013 at 10:12 AM, Ahmed Eldawy 
>> wrote:
>> >
>> > > I contacted solr developers to see how JTS can be included in an
>> Apache
>> > > project. See
>> > >
>> > >
>> >
>> http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
>> > > As far as I understand, they did not include it in the main solr
>> project,
>> > > rather, they created a separate project (spatial 4j) which is still
>> > > licensed under Apache license and refers to JTS. Users will have to
>> > > download JTS libraries separately to make it run. That's pretty much
>> the
>> > > same plan that Jonathan mentioned. We will still have the overhead of
>> > > serializing/deserializing the shapes each time a function is called.
>> > Also,
>> > > we will have to use the ugly bytearray data type for spatial data
>> instead
>> > > of creating its own data type (e.g., Geometry).
>> > > I think using spatial 4j instead of JTS will not be sufficient for our
>> > case
>> > > as we need to provide an access to all spatial functions of JTS such
>> as
>> > > Union, Intersection, Difference, ... etc. This way we can claim
>> > conformity
>> > > with OGC standards which gives visibility and appreciations of the
>> > spatial
>> > > community.
>> > > I think also that this means I will not add any issues to JIRA as it
>> is
>> > now
>> > > a separate project. I'm planning to host it on github and have all the
>> > > issues there.
>> > > Let me know if you have any suggestions or comments.
>> > >
>> > > Thanks
>> > > Ahmed
>> > >
>> > >
>> > > Best regards,
>> > > Ahmed Eldawy
>> > >
>> > >
>> > > On Mon, May 6, 2013 at 9:53 AM, Jonathan Coveney 
>> > > wrote:
>> > >
>> > > > You can give them all the same label or tag and filter on that later
>> > on.
>> > > >
>> > > >
>> > > > 2013/5/6 Ahmed Eldawy 
>> > > >
>> > > > > Thanks all for taking the time to respond. Danial, I didn't know
>> that
>> > > > Solr
>> > > > > uses JTS. This is a good finding and we can definitely ask them to
>> > see
>> > > if
>> > > > > there is a work around we can do. Jonathan, I thought of the same
>> > idea
>> > > of
>> > > > > serializing/deserializing a bytearray each time a UDF is called.
>> The
>> > > > > deserialization part is good for letting Pig auto detect spatial
>> > types
>> > > if
>> > > > > not set explicitly in the schema. What is the best way to start
>> > this? I
>> > > > > want to add an initial set of JIRA issues and start working on
>> them
>> > > but I
>> > > > > also need to keep the work grouped in some sense just for
>> > organization.
>> > > > >
>> > > > > Thanks
>> > > > > Ahmed
>> > > > >
>> > > > > Best regards,
>> > > > > Ahmed Eldawy
>> > > > >
>> > > > >
>> > > > > On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney <
>> jcove...@gmail.com
>> > >
>> > > >

[jira] [Updated] (PIG-3339) Move pattern compilation in ToDate as a static variable

2013-05-29 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3339:


Attachment: PIG-3339-1.patch

Very small patch. No new unit tests as it is just making a variable static. Ran 
TestDefaultDateTimeZone and it passed. 

> Move pattern compilation in ToDate as a static variable
> ---
>
> Key: PIG-3339
> URL: https://issues.apache.org/jira/browse/PIG-3339
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3339-1.patch
>
>
> Pattern compilation is costly. It is currently being done for every tuple in 
> ToDate.extractDateTimeZone(). Should be a static variable. 



[jira] [Updated] (PIG-3339) Move pattern compilation in ToDate as a static variable

2013-05-29 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3339:


Status: Patch Available  (was: Open)

> Move pattern compilation in ToDate as a static variable
> ---
>
> Key: PIG-3339
> URL: https://issues.apache.org/jira/browse/PIG-3339
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3339-1.patch
>
>
> Pattern compilation is costly. It is currently being done for every tuple in 
> ToDate.extractDateTimeZone(). Should be a static variable. 



[jira] [Created] (PIG-3339) Move pattern compilation in ToDate as a static variable

2013-05-29 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-3339:
---

 Summary: Move pattern compilation in ToDate as a static variable
 Key: PIG-3339
 URL: https://issues.apache.org/jira/browse/PIG-3339
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.11.1
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12


Pattern compilation is costly. It is currently being done for every tuple in 
ToDate.extractDateTimeZone(). Should be a static variable. 
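
A minimal sketch of the change described above, written in Python rather than Pig's Java. The regular expression and the function names are hypothetical. Only the idea of building the pattern once and reusing it for every tuple comes from the issue. In Java the shared pattern would typically be a static final java.util.regex.Pattern field.

```python
import re

# Hypothetical stand-in for the zone pattern; not copied from Pig.
ZONE_REGEX = r"(Z|[+-]\d{2}(?::?\d{2})?)$"


def extract_zone_per_tuple(value):
    """Asks for a compiled pattern on every call.

    Python's re module keeps a small cache of recent patterns, which
    softens the cost; Java's Pattern.compile builds a new Pattern
    object each time it is called.
    """
    pattern = re.compile(ZONE_REGEX)
    match = pattern.search(value)
    return match.group(1) if match else None


# Built once when the module is loaded and shared by all calls,
# playing the role of the static variable proposed in the issue.
_ZONE_PATTERN = re.compile(ZONE_REGEX)


def extract_zone_shared(value):
    """Reuses the single precompiled pattern for every tuple."""
    match = _ZONE_PATTERN.search(value)
    return match.group(1) if match else None
```

Both functions return the same result for any input; the second one simply avoids repeating the setup work per tuple.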



[jira] [Commented] (PIG-3335) TestErrorHandling.tesNegative7 fails on MR2

2013-05-29 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669304#comment-13669304
 ] 

Rohini Palaniswamy commented on PIG-3335:
-

[~xuefuz],
   This patch also seems to have gone into trunk (0.12) but is marked as 0.11.2. 
CHANGES.txt also does not have the information. Can you please take a look at 
all the JIRAs that you have fixed and update the Fix Version and also update 
CHANGES.txt?

> TestErrorHandling.tesNegative7 fails on MR2
> ---
>
> Key: PIG-3335
> URL: https://issues.apache.org/jira/browse/PIG-3335
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.11.2
>
> Attachments: PIG-3335.patch
>
>
> This test case fails when being tested with MR2:
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.parser.TestErrorHandling.tesNegative7(TestErrorHandling.java:138)



[jira] [Commented] (PIG-2537) Output from flatten with a null tuple input generating data inconsistent with the schema

2013-05-29 Thread Peter Connolly (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669256#comment-13669256
 ] 

Peter Connolly commented on PIG-2537:
-

As a workaround, I'm able to move the FLATTEN operator to the rightmost column 
and then run a second generate on all of the fields to fix this problem.  I'm 
only dealing with two columns in the tuple, so I'm not sure it will work with 
more columns.

Using the example above, it might look something like this:
grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
-- B will have a variable number of null columns on the right side, but
-- columns b and c will be correct
grunt> B = foreach A generate b, c, flatten( $0 ) AS (x,y,z);
-- Running another foreach inserts null values for the extra columns
grunt> C = foreach B generate b,c,x,y,z;
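
The effect of the workaround can be modelled with a small Python sketch. This is a simplified model, not Pig's implementation. It assumes, as the comment above reports, that flattening a null tuple yields a single null field and that projecting a position past the end of a short row reads as null. All function names are made up for illustration.

```python
def flatten_first(row):
    """Model of: B = foreach A generate flatten($0), b, c;

    A null tuple contributes one null field, so b and c shift left.
    """
    tup, b, c = row
    head = list(tup) if tup is not None else [None]
    return head + [b, c]


def flatten_last(row):
    """Model of: B = foreach A generate b, c, flatten($0) AS (x,y,z);

    b and c keep their positions; only the tail may come up short.
    """
    tup, b, c = row
    tail = list(tup) if tup is not None else [None]
    return [b, c] + tail


def project(row, width=5):
    """Model of: C = foreach B generate b,c,x,y,z;

    Assumption: positions missing from a short row read as null.
    """
    return tuple(row[i] if i < len(row) else None for i in range(width))


if __name__ == "__main__":
    null_row = (None, "b_value", "c_value")
    print(flatten_first(null_row))          # b_value lands in position 1
    print(project(flatten_last(null_row)))  # b_value stays in position 0
```

In this model the shortfall ends up on the right-hand side of the row, where the second projection can pad it without moving b or c.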



> Output from flatten with a null tuple input generating data inconsistent with 
> the schema
> 
>
> Key: PIG-2537
> URL: https://issues.apache.org/jira/browse/PIG-2537
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-2537-1.patch, PIG-2537-2.patch, PIG-2537-3.patch
>
>
> For the following pig script,
> grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
> grunt> B = foreach A generate flatten( $0 ), b, c;
> grunt> describe B;
> B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}
> Alias B has a clear schema.
> However, on the backend, for a row if $0 happens to be null, then output 
> tuple become something like 
> (null, b_value, c_value), which is obviously inconsistent with the schema. 
> The behaviour is confirmed by pig code inspection. 
> This inconsistency corrupts data because of position shifts. Expected output 
> row should be something like
> (null, null, null, b_value, c_value).



[jira] [Commented] (PIG-3257) Add unique identifier UDF

2013-05-29 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669195#comment-13669195
 ] 

Koji Noguchi commented on PIG-3257:
---

With your first example, say you have _n_ input records. 1 mapper 2 reducers.
{noformat}
A = load ...
B = group A by UUID();
STORE B ...
{noformat}
This job could successfully finish with output ranging from 0 to 2n records.
For example, the sequence of events can be:
   # mapper0_attempt0 finishes with n outputs and, say, all n uuid keys are 
assigned to reducer0.
   # reducer0_attempt0 pulls map outputs and produces _n_ outputs.
   # reducer1_attempt0 tries to pull mapper0_attempt0 output and fails (could 
be fetch failure or node failure).
   # mapper0_attempt1 reruns. This time, all n uuid keys are assigned to 
reducer1.
   # reducer1_attempt0 pulls mapper0_attempt1 output and produces n outputs.
   # job finishes successfully with 2n outputs.

This is certainly unexpected to users.
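
The sequence above can be imitated with a toy Python simulation. It is only a sketch: the partitioning rule, the helper names and the stable_key comparison are assumptions made for illustration, not Hadoop or Pig code, and stable_key is not something proposed in this thread.

```python
import hashlib
import uuid

NUM_REDUCERS = 2


def random_key(record):
    """A fresh random id on every call, like a purely random UUID UDF."""
    return uuid.uuid4().hex


def stable_key(record):
    """Hypothetical contrast: a key derived only from the record content."""
    return hashlib.md5(repr(record).encode("utf-8")).hexdigest()


def map_attempt(records, key_fn):
    """One mapper attempt: tag every record with a group key."""
    return [(key_fn(record), record) for record in records]


def reducer_input(map_output, reducer_id):
    """Keys are partitioned to reducers by their numeric value."""
    return [kv for kv in map_output
            if int(kv[0], 16) % NUM_REDUCERS == reducer_id]


def job_output_size(records, key_fn):
    """reducer0 fetches attempt 0; reducer1 fetches the rerun, attempt 1."""
    attempt0 = map_attempt(records, key_fn)
    attempt1 = map_attempt(records, key_fn)
    return (len(reducer_input(attempt0, 0))
            + len(reducer_input(attempt1, 1)))


if __name__ == "__main__":
    records = list(range(50))
    print(job_output_size(records, random_key))  # anywhere from 0 to 100
    print(job_output_size(records, stable_key))  # always 50
```

With random keys the two attempts partition the same records differently, so the total can land anywhere between 0 and 2n. With keys that depend only on the record, both attempts agree and the total is n.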

Now, with your second example
{noformat}
A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader();
B = foreach A generate *, UUID();
C = group B by s;
D = foreach C generate flatten(B), SUM(B.i) as sum_b;
E = group B by si;
F = foreach E generate flatten(B), SUM(B.f) as sum_f;
G = join D by uuid, F by uuid;
H = foreach G generate D::B::s, sum_b, sum_f;
store H into 'output';
{noformat}

Let's say pig decides to implement the two group by (C and E) with one 
map-reduce job. For simplicity purposes let's use 1 mapper 2 reducers again and 
assume pig decides to partition all group by in _C_ to reducer0 and _E_ to 
reducer1.  Now, using the same story as above, there could be a case where 
reducer0(group-by-C) gets one set of UUID from mapper0_attempt0  and 
reducer1(group-by-E) gets another completely different set of UUID from 
mapper0_attempt1.

When this happens, join _G_ would produce 0 results, which is unexpected to users.
Of course this depends on how pig performs the above query but I hope it 
demonstrates how tricky it gets when introducing a pure random id in hadoop.
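
A toy Python sketch of the join case, under the same assumption that each mapper attempt draws fresh random ids. The function names are hypothetical and the grouping and summing steps of the script are left out. Only the join on the id is modelled.

```python
import uuid


def tag_with_uuid(records):
    """One mapper attempt: attach a fresh random id to every record."""
    return [(record, uuid.uuid4().hex) for record in records]


def join_on_uuid(left, right):
    """Keep the left rows whose id also appears on the right side."""
    right_ids = {uid for _, uid in right}
    return [(record, uid) for record, uid in left if uid in right_ids]


def simulate_join(records, rerun_between_branches):
    """Branch D reads attempt 0; branch F reads attempt 0 or a rerun."""
    attempt0 = tag_with_uuid(records)
    seen_by_f = tag_with_uuid(records) if rerun_between_branches else attempt0
    return join_on_uuid(attempt0, seen_by_f)


if __name__ == "__main__":
    rows = ["r1", "r2", "r3"]
    print(len(simulate_join(rows, rerun_between_branches=False)))
    print(len(simulate_join(rows, rerun_between_branches=True)))
```

When both branches see the same attempt, every row finds its partner. When one branch sees a rerun, the ids no longer line up and the join comes back empty even though the job reports success.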

What's worse about all this is that it is a corner case which won't get 
caught in users' QE phases and would only manifest in the production 
pipeline.  Users would then yell at me for corrupted output from successful 
jobs.  Thus my previous comment on "support nightmare".






> Add unique identifier UDF
> -
>
> Key: PIG-3257
> URL: https://issues.apache.org/jira/browse/PIG-3257
> Project: Pig
>  Issue Type: Improvement
>  Components: internal-udfs
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.12
>
> Attachments: PIG-3257.patch
>
>
> It would be good to have a Pig function to generate unique identifiers.



[jira] [Commented] (PIG-3329) RANK operator failed when working with SPLIT

2013-05-29 Thread Johnny Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669057#comment-13669057
 ] 

Johnny Zhang commented on PIG-3329:
---

[~xalan], thanks for your comments.
"the RANK BY provides a ranking by considering the values of column a. Instead, 
RANK assigns a sequential number to each tuple."
Does this refer to the LogToPhyTranslationVisitor.visit(LORank loRank) function?
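
The distinction in the quoted sentence can be illustrated with a short Python sketch. It is a model of the two behaviours only, not Pig's physical plan. The tie handling in rank_by (equal values share a rank and the next rank skips ahead) is an assumption made for the example, and the function names are invented.

```python
def rank_sequential(rows):
    """Model of plain RANK: number the tuples 1, 2, 3, ... in input order."""
    return [(position, row) for position, row in enumerate(rows, start=1)]


def rank_by(rows, key):
    """Model of RANK ... BY: rank tuples by the value of one column.

    Assumption: rows with equal key values share the same rank.
    """
    ordered = sorted(rows, key=key)
    ranked = []
    for position, row in enumerate(ordered, start=1):
        if ranked and key(ranked[-1][1]) == key(row):
            rank = ranked[-1][0]  # tie: reuse the previous rank
        else:
            rank = position
        ranked.append((rank, row))
    return ranked


if __name__ == "__main__":
    data = [(4, 5, 6), (1, 2, 3), (7, 8, 9), (1, 0, 0)]
    print(rank_sequential(data))
    print(rank_by(data, key=lambda row: row[0]))
```

Plain numbering ignores the column values entirely, while ranking by column a reorders the rows and lets the two rows with a = 1 share rank 1.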

> RANK operator failed when working with SPLIT 
> -
>
> Key: PIG-3329
> URL: https://issues.apache.org/jira/browse/PIG-3329
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Redis Liu
>Assignee: Allan Avendaño
>Priority: Critical
>
> input.txt:
> 1 2 3
> 4 5 6
> 7 8 9
> script:
> a = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
> SPLIT a into b if a > 0, c if a > 5;
> d = RANK b;
> dump d;
> job will fail with error message:
> java.lang.RuntimeException: Unable to read counter 
> pig.counters.counter_4929375455335572575_-1
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.addRank(PORank.java:161)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.getNext(PORank.java:134)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:157)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:275)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1340)
>   at org.apache.hadoop.mapred.Child.main(Child.java:269)
