[jira] [Updated] (PIG-3414) QueryParserDriver.parseSchema(String) silently returns a wrong result when a comma is missing in the schema definition

2013-08-13 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3414:
---

Status: Patch Available  (was: Reopened)

All the unit tests are fixed. I uploaded the new patch to RB:
https://reviews.apache.org/r/13551/

Please review one more time. Thanks!

> QueryParserDriver.parseSchema(String) silently returns a wrong result when a 
> comma is missing in the schema definition
> --
>
> Key: PIG-3414
> URL: https://issues.apache.org/jira/browse/PIG-3414
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3414-2.patch, PIG-3414-3.patch, PIG-3414-4.patch, 
> PIG-3414-5.patch, PIG-3414-6.patch, PIG-3414.patch
>
>
> QueryParserDriver provides a convenient method to parse from string to 
> LogicalSchema. But if a comma is missing between two fields in the schema 
> definition, it silently returns a wrong result. For example,
> {code}
> a:int b:long
> {code}
> This string will be parsed up to "a:int", and "b:long" will be silently 
> discarded. This should rather fail with a parser exception.
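
The fail-fast behavior requested above can be illustrated with a small standalone sketch. It is not Pig's grammar or the QueryParserDriver code; the class StrictSchemaSketch and its parse method are hypothetical names, and the sketch only handles flat "name:type" fields. The point is that leftover input after a field is treated as an error instead of being dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; Pig's real parser is generated from QueryParser.g.
public class StrictSchemaSketch {

    // Parses "name:type, name:type" and rejects anything left over after a field.
    public static List<String[]> parse(String schema) {
        List<String[]> fields = new ArrayList<>();
        for (String part : schema.split(",")) {
            String[] tokens = part.trim().split("\\s+");
            if (tokens.length != 1) {
                // e.g. "a:int b:long": two fields with no comma between them
                throw new IllegalArgumentException(
                        "Unexpected input after field: " + part.trim());
            }
            String[] nameAndType = tokens[0].split(":");
            if (nameAndType.length != 2
                    || nameAndType[0].isEmpty()
                    || nameAndType[1].isEmpty()) {
                throw new IllegalArgumentException("Malformed field: " + tokens[0]);
            }
            fields.add(nameAndType);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("a:int, b:long").size());
        try {
            parse("a:int b:long");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this shape, "a:int b:long" raises an exception rather than yielding a one-field schema.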

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request 13551: PIG-3414 Utils.getSchemaFromString() silently returns a wrong result when a comma is missing in the schema definition

2013-08-13 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13551/
---

Review request for pig.


Bugs: PIG-3414
https://issues.apache.org/jira/browse/PIG-3414


Repository: pig-git


Description
---

I updated the schema parsing grammar so that an invalid schema string throws a 
parser exception rather than silently returning a partial schema.

While running the unit tests, I found and fixed bugs in the following unit 
tests:
TestSchema
TestSchemaTuple
TestPOCast


Diffs
-

  src/org/apache/pig/parser/QueryParser.g 6040389 
  src/org/apache/pig/parser/QueryParserDriver.java bdad431 
  test/org/apache/pig/data/TestSchemaTuple.java 212c00a 
  test/org/apache/pig/test/TestPOCast.java b6c395f 
  test/org/apache/pig/test/TestSchema.java bfe76c4 

Diff: https://reviews.apache.org/r/13551/diff/


Testing
---

Added a new test case to TestSchema.

All the unit tests pass.


Thanks,

Cheolsoo Park



[jira] [Updated] (PIG-3414) QueryParserDriver.parseSchema(String) silently returns a wrong result when a comma is missing in the schema definition

2013-08-13 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3414:
---

Attachment: PIG-3414-6.patch

> QueryParserDriver.parseSchema(String) silently returns a wrong result when a 
> comma is missing in the schema definition
> --
>
> Key: PIG-3414
> URL: https://issues.apache.org/jira/browse/PIG-3414
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3414-2.patch, PIG-3414-3.patch, PIG-3414-4.patch, 
> PIG-3414-5.patch, PIG-3414-6.patch, PIG-3414.patch
>
>
> QueryParserDriver provides a convenient method to parse from string to 
> LogicalSchema. But if a comma is missing between two fields in the schema 
> definition, it silently returns a wrong result. For example,
> {code}
> a:int b:long
> {code}
> This string will be parsed up to "a:int", and "b:long" will be silently 
> discarded. This should rather fail with a parser exception.



[jira] [Updated] (PIG-3412) jsonstorage breaks when tuple does not have as many columns as schema

2013-08-13 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3412:
---

Issue Type: Bug  (was: Improvement)

> jsonstorage breaks when tuple does not have as many columns as schema
> -
>
> Key: PIG-3412
> URL: https://issues.apache.org/jira/browse/PIG-3412
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Adam Silberstein
>Assignee: Adam Silberstein
> Fix For: 0.12
>
> Attachments: jsonStoragePatch.patch
>
>
> Noticed this error when doing something like 
> A = flatten(STRSPLIT($0, ',', 3)) AS (col1:chararray, col2:chararray, 
> col3:chararray);
> STORE A INTO 'foo' USING JsonStorage();
> If the string being split doesn't generate 3 columns, then JsonStorage errors 
> out with an index exception.  This is because it tries to read the fields of 
> the tuple passed to it whether they exist or not.  See JsonStorage, line 148.
> My patch checks the length of the tuple.  If any schema column positions are 
> past the length of the tuple, it fills in null.
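
The length check described above can be sketched as follows. This is a standalone illustration, not the JsonStorage code or the attached patch; NullPaddingSketch and padToSchema are hypothetical names, and a plain List stands in for a Pig tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; a List<Object> stands in for a Pig Tuple.
public class NullPaddingSketch {

    // Returns one value per schema column; positions past the end of the tuple become null.
    public static List<Object> padToSchema(List<Object> tuple, int schemaColumns) {
        List<Object> row = new ArrayList<>(schemaColumns);
        for (int i = 0; i < schemaColumns; i++) {
            row.add(i < tuple.size() ? tuple.get(i) : null);
        }
        return row;
    }

    public static void main(String[] args) {
        // A split that produced only two of the three declared columns.
        List<Object> shortTuple = Arrays.<Object>asList("x", "y");
        System.out.println(padToSchema(shortTuple, 3)); // prints [x, y, null]
    }
}
```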



[jira] [Updated] (PIG-3412) jsonstorage breaks when tuple does not have as many columns as schema

2013-08-13 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3412:
---

   Resolution: Fixed
Fix Version/s: (was: 0.11)
   0.12
 Assignee: Adam Silberstein
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thank you Adam!

> jsonstorage breaks when tuple does not have as many columns as schema
> -
>
> Key: PIG-3412
> URL: https://issues.apache.org/jira/browse/PIG-3412
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11
>Reporter: Adam Silberstein
>Assignee: Adam Silberstein
> Fix For: 0.12
>
> Attachments: jsonStoragePatch.patch
>
>
> Noticed this error when doing something like 
> A = flatten(STRSPLIT($0, ',', 3)) AS (col1:chararray, col2:chararray, 
> col3:chararray);
> STORE A INTO 'foo' USING JsonStorage();
> If the string being split doesn't generate 3 columns, then JsonStorage errors 
> out with an index exception.  This is because it tries to read the fields of 
> the tuple passed to it whether they exist or not.  See JsonStorage, line 148.
> My patch checks the length of the tuple.  If any schema column positions are 
> past the length of the tuple, it fills in null.



[jira] [Commented] (PIG-3412) jsonstorage breaks when tuple does not have as many columns as schema

2013-08-13 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739160#comment-13739160
 ] 

Cheolsoo Park commented on PIG-3412:


+1.

> jsonstorage breaks when tuple does not have as many columns as schema
> -
>
> Key: PIG-3412
> URL: https://issues.apache.org/jira/browse/PIG-3412
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11
>Reporter: Adam Silberstein
> Fix For: 0.11
>
> Attachments: jsonStoragePatch.patch
>
>
> Noticed this error when doing something like 
> A = flatten(STRSPLIT($0, ',', 3)) AS (col1:chararray, col2:chararray, 
> col3:chararray);
> STORE A INTO 'foo' USING JsonStorage();
> If the string being split doesn't generate 3 columns, then JsonStorage errors 
> out with an index exception.  This is because it tries to read the fields of 
> the tuple passed to it whether they exist or not.  See JsonStorage, line 148.
> My patch checks the length of the tuple.  If any schema column positions are 
> past the length of the tuple, it fills in null.



[jira] [Assigned] (PIG-3422) AvroStorage Failed to read paths separated by commas

2013-08-13 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park reassigned PIG-3422:
--

Assignee: Yuanli Dong

> AvroStorage Failed to read paths separated by commas
> 
>
> Key: PIG-3422
> URL: https://issues.apache.org/jira/browse/PIG-3422
> Project: Pig
>  Issue Type: Bug
>Reporter: Yuanli Dong
>Assignee: Yuanli Dong
> Attachments: PIG-3422_08132013.patch
>
>
> Suppose I want to load data using this script:
> a = load 
> './newavro/data/avro/Employee3.ser,./newavro/data/avro/Employee4.ser' USING 
> AvroStorage ();
> It will fail because multiple paths separated by commas are not handled by 
> AvroStorage.
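
A naive way to handle such a location string is to split it on commas before resolving each path. The sketch below shows only that step; CommaPathsSketch and splitLocations are hypothetical names, not the attached patch or AvroStorage's API. A plain split like this would also break glob patterns that contain commas inside braces, so it is an assumption that the input has none.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; assumes no commas inside glob braces such as {a,b}.
public class CommaPathsSketch {

    // Splits a comma-separated location string into trimmed, non-empty paths.
    public static List<String> splitLocations(String location) {
        List<String> paths = new ArrayList<>();
        for (String candidate : location.split(",")) {
            String trimmed = candidate.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String location =
                "./newavro/data/avro/Employee3.ser,./newavro/data/avro/Employee4.ser";
        for (String path : splitLocations(location)) {
            System.out.println(path);
        }
    }
}
```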



[jira] [Assigned] (PIG-3420) Failed to retrieve values from data loaded by AvroStorage

2013-08-13 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park reassigned PIG-3420:
--

Assignee: Yuanli Dong

> Failed to retrieve values from data loaded by AvroStorage
> -
>
> Key: PIG-3420
> URL: https://issues.apache.org/jira/browse/PIG-3420
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12
>Reporter: Yuanli Dong
>Assignee: Yuanli Dong
> Fix For: 0.12
>
> Attachments: PIG-3420_08132013.patch
>
>
> Running the following script:
> a = load './newavro/data/avro/EmployeeMapF.ser' USING AvroStorage();
> dump a;
> c = foreach a generate name, office, 'Toyota', cars#'Toyota' as toyota, 
> 'Mazda', cars#'Mazda', 'Nissan', cars#'Nissan' as nissan;
> Although object a has all the data loaded, c cannot retrieve the map values; 
> columns 4, 6, and 8 are empty in the result.
> The map keys are of class Utf8, but the keys used to retrieve the data are 
> Strings, which is why we cannot retrieve the values. The patch fixes this 
> problem. 
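
The key mismatch described above can be reproduced without Avro. In the sketch below, Utf8Like is a hypothetical stand-in for a non-String CharSequence key type such as Avro's Utf8, and withStringKeys is one possible workaround (copying the map with String keys). It is not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of why a String lookup misses a non-String key.
public class Utf8KeySketch {

    // Stand-in for a CharSequence key type that is only equal to its own kind.
    public static final class Utf8Like implements CharSequence {
        private final String value;

        public Utf8Like(String value) { this.value = value; }

        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }
        @Override public String toString() { return value; }
        @Override public boolean equals(Object other) {
            return other instanceof Utf8Like && ((Utf8Like) other).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    // Copies the map so that every key is a plain String.
    public static Map<String, Object> withStringKeys(Map<? extends CharSequence, ?> in) {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<? extends CharSequence, ?> entry : in.entrySet()) {
            out.put(entry.getKey().toString(), entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<CharSequence, Object> cars = new HashMap<>();
        cars.put(new Utf8Like("Toyota"), "Camry");
        // The String "Toyota" is not equal to the Utf8Like key, so the lookup misses.
        System.out.println(cars.get("Toyota"));
        // After converting keys to String, the same lookup finds the value.
        System.out.println(withStringKeys(cars).get("Toyota"));
    }
}
```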



[jira] [Commented] (PIG-3410) LimitOptimizer is applied before PartitionFilterOptimizer

2013-08-13 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739151#comment-13739151
 ] 

Cheolsoo Park commented on PIG-3410:


[~aniket486], did you run the full unit test suite? I just want to make sure we 
don't break anything.

> LimitOptimizer is applied before PartitionFilterOptimizer
> -
>
> Key: PIG-3410
> URL: https://issues.apache.org/jira/browse/PIG-3410
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Attachments: PIG-3410.patch
>
>
> Consider the following script:
> {code}
> hcat_load = LOAD 'X' using org.apache.hcatalog.pig.HCatLoader();
> hcat_filter = FILTER hcat_load BY (part='Y');
> hcat_limited = limit hcat_filter 5;
> dump hcat_limited; 
> {code}
> This script does not benefit from LimitOptimizer (pushing limit to loadfunc) 
> because LimitOptimizer is applied before PartitionFilterOptimizer. 



Re: Review Request 13535: [PIG-3204] Reduce the number of getSchema calls during script parsing

2013-08-13 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13535/#review25098
---


Overall looks good. I have a few minor comments.


http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java


I believe we shouldn't remove xargs. It was added by PIG-3099 to avoid some 
race condition.



http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java


This is a bit confusing to me. 10 - 4 + 6 = 12, but numTimesInitiated is 
set to 10.

_testSkipParseInRegisterForBatch(false, 10, 4);



http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java


If I understand correctly, this is equivalent to calling the following:

GruntParser grunt = new GruntParser(in);
grunt.setInteractive(false);
grunt.setParams(pigServer);
grunt.parseStopOnError(false); //batch

Can you explicitly call them, so it will be easier to identify the 
difference when skipParseInRegisterForBatch is on and off?


- Cheolsoo Park


On Aug. 13, 2013, 2:34 p.m., Rohini Palaniswamy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13535/
> ---
> 
> (Updated Aug. 13, 2013, 2:34 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3204
> https://issues.apache.org/jira/browse/PIG-3204
> 
> 
> Repository: pig
> 
> 
> Description
> ---
> 
> Change parsing from line by line to whole script at once.
> 
> 
> Diffs
> -
> 
>   http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 
> 1510960 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java
>  1510960 
>   
> http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java
>  1510960 
>   
> http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java
>  1510960 
> 
> Diff: https://reviews.apache.org/r/13535/diff/
> 
> 
> Testing
> ---
> 
> New unit tests added to track the number of times a line is parsed. TestGrunt 
> and TestShortcuts test failures fixed.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>



[jira] Subscription: PIG patch available

2013-08-13 Thread jira
Issue Subscription
Filter: PIG patch available (19 issues)

Subscriber: pigdaily

Key       Summary
PIG-3420  Failed to retrieve values from data loaded by AvroStorage
          https://issues.apache.org/jira/browse/PIG-3420
PIG-3412  jsonstorage breaks when tuple does not have as many columns as schema
          https://issues.apache.org/jira/browse/PIG-3412
PIG-3410  LimitOptimizer is applied before PartitionFilterOptimizer
          https://issues.apache.org/jira/browse/PIG-3410
PIG-3405  Top UDF documentation indicates improper use
          https://issues.apache.org/jira/browse/PIG-3405
PIG-3379  Alias reuse in nested foreach causes PIG script to fail
          https://issues.apache.org/jira/browse/PIG-3379
PIG-3374  CASE and IN fail when expression includes dereferencing operator
          https://issues.apache.org/jira/browse/PIG-3374
PIG-3349  Document ToString(Datetime, String) UDF
          https://issues.apache.org/jira/browse/PIG-3349
PIG-3346  New property that controls the number of combined splits
          https://issues.apache.org/jira/browse/PIG-3346
PIG-      Fix remaining Windows core unit test failures
          https://issues.apache.org/jira/browse/PIG-
PIG-3325  Adding a tuple to a bag is slow
          https://issues.apache.org/jira/browse/PIG-3325
PIG-3295  Casting from bytearray failing after Union (even when each field is from a single Loader)
          https://issues.apache.org/jira/browse/PIG-3295
PIG-3292  Logical plan invalid state: duplicate uid in schema during self-join to get cross product
          https://issues.apache.org/jira/browse/PIG-3292
PIG-3257  Add unique identifier UDF
          https://issues.apache.org/jira/browse/PIG-3257
PIG-3204  Reduce the number of getSchema calls during script parsing
          https://issues.apache.org/jira/browse/PIG-3204
PIG-3199  Expose LogicalPlan via PigServer API
          https://issues.apache.org/jira/browse/PIG-3199
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3048  Add mapreduce workflow information to job configuration
          https://issues.apache.org/jira/browse/PIG-3048
PIG-3021  Split results missing records when there is null values in the column comparison
          https://issues.apache.org/jira/browse/PIG-3021
PIG-1914  Support load/store JSON data in Pig
          https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Updated] (PIG-3422) AvroStorage Failed to read paths separated by commas

2013-08-13 Thread Yuanli Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanli Dong updated PIG-3422:
-

Summary: AvroStorage Failed to read paths separated by commas  (was: Failed 
to read paths separated by commas)

> AvroStorage Failed to read paths separated by commas
> 
>
> Key: PIG-3422
> URL: https://issues.apache.org/jira/browse/PIG-3422
> Project: Pig
>  Issue Type: Bug
>Reporter: Yuanli Dong
> Attachments: PIG-3422_08132013.patch
>
>
> Suppose I want to load data using this script:
> a = load 
> './newavro/data/avro/Employee3.ser,./newavro/data/avro/Employee4.ser' USING 
> AvroStorage ();
> It will fail because multiple paths separated by commas are not handled by 
> AvroStorage.



[jira] [Updated] (PIG-3422) Failed to read paths separated by commas

2013-08-13 Thread Yuanli Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanli Dong updated PIG-3422:
-

Patch Info: Patch Available

> Failed to read paths separated by commas
> 
>
> Key: PIG-3422
> URL: https://issues.apache.org/jira/browse/PIG-3422
> Project: Pig
>  Issue Type: Bug
>Reporter: Yuanli Dong
> Attachments: PIG-3422_08132013.patch
>
>
> Suppose I want to load data using this script:
> a = load 
> './newavro/data/avro/Employee3.ser,./newavro/data/avro/Employee4.ser' USING 
> AvroStorage ();
> It will fail because multiple paths separated by commas are not handled by 
> AvroStorage.



[jira] [Updated] (PIG-3422) Failed to read paths separated by commas

2013-08-13 Thread Yuanli Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanli Dong updated PIG-3422:
-

Attachment: PIG-3422_08132013.patch

> Failed to read paths separated by commas
> 
>
> Key: PIG-3422
> URL: https://issues.apache.org/jira/browse/PIG-3422
> Project: Pig
>  Issue Type: Bug
>Reporter: Yuanli Dong
> Attachments: PIG-3422_08132013.patch
>
>
> Suppose I want to load data using this script:
> a = load 
> './newavro/data/avro/Employee3.ser,./newavro/data/avro/Employee4.ser' USING 
> AvroStorage ();
> It will fail because multiple paths separated by commas are not handled by 
> AvroStorage.



[jira] [Created] (PIG-3422) Failed to read paths separated by commas

2013-08-13 Thread Yuanli Dong (JIRA)
Yuanli Dong created PIG-3422:


 Summary: Failed to read paths separated by commas
 Key: PIG-3422
 URL: https://issues.apache.org/jira/browse/PIG-3422
 Project: Pig
  Issue Type: Bug
Reporter: Yuanli Dong


Suppose I want to load data using this script:
a = load './newavro/data/avro/Employee3.ser,./newavro/data/avro/Employee4.ser' 
USING AvroStorage ();
It will fail because multiple paths separated by commas are not handled by 
AvroStorage.



[jira] [Commented] (PIG-3420) Failed to retrieve values from data loaded by AvroStorage

2013-08-13 Thread Yuanli Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738982#comment-13738982
 ] 

Yuanli Dong commented on PIG-3420:
--

I'll do that later

> Failed to retrieve values from data loaded by AvroStorage
> -
>
> Key: PIG-3420
> URL: https://issues.apache.org/jira/browse/PIG-3420
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12
>Reporter: Yuanli Dong
> Fix For: 0.12
>
> Attachments: PIG-3420_08132013.patch
>
>
> Running the following script:
> a = load './newavro/data/avro/EmployeeMapF.ser' USING AvroStorage();
> dump a;
> c = foreach a generate name, office, 'Toyota', cars#'Toyota' as toyota, 
> 'Mazda', cars#'Mazda', 'Nissan', cars#'Nissan' as nissan;
> Although object a has all the data loaded, c cannot retrieve the map values; 
> columns 4, 6, and 8 are empty in the result.
> The map keys are of class Utf8, but the keys used to retrieve the data are 
> Strings, which is why we cannot retrieve the values. The patch fixes this 
> problem. 



[jira] [Commented] (PIG-3420) Failed to retrieve values from data loaded by AvroStorage

2013-08-13 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738971#comment-13738971
 ] 

Rohini Palaniswamy commented on PIG-3420:
-

And the patch is missing a unit test. Can you please add one?

> Failed to retrieve values from data loaded by AvroStorage
> -
>
> Key: PIG-3420
> URL: https://issues.apache.org/jira/browse/PIG-3420
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12
>Reporter: Yuanli Dong
> Fix For: 0.12
>
> Attachments: PIG-3420_08132013.patch
>
>
> Running the following script:
> a = load './newavro/data/avro/EmployeeMapF.ser' USING AvroStorage();
> dump a;
> c = foreach a generate name, office, 'Toyota', cars#'Toyota' as toyota, 
> 'Mazda', cars#'Mazda', 'Nissan', cars#'Nissan' as nissan;
> Although object a has all the data loaded, c cannot retrieve the map values; 
> columns 4, 6, and 8 are empty in the result.
> The map keys are of class Utf8, but the keys used to retrieve the data are 
> Strings, which is why we cannot retrieve the values. The patch fixes this 
> problem. 



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738881#comment-13738881
 ] 

Achal Soni commented on PIG-3419:
-

Sorry, my bad! I was trying to get it out so quickly that I brushed over some 
things too fast. 

[~dvryaboy] I will make these changes later today and post them up. I did 
actually get around the -y argument, just totally forgot to go back and get rid 
of that. In the meantime, the Review Board is located here for the current 
patch: 

https://reviews.apache.org/r/13541/

Once I update the patch later today I will post it here along with the Review 
Board link. 

 - Achal

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: pluggable_execengine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> The changes include reinstating an ExecutionEngine interface to tie together 
> the frontend and backend, making Pig delegate to the EE when necessary, and 
> creating an MRExecutionEngine that implements this interface. Other work 
> included changing ExecType to cycle through the ExecutionEngines on the 
> classpath and select the appropriate one (this is done using Java 
> ServiceLoader, exactly as MapReduce does when choosing the framework to use 
> between local and distributed mode). I also tried to make ScriptState, 
> JobStats, and PigStats as abstract as possible in their current state. I 
> think some future work will be needed here to re-evaluate the usage of 
> ScriptState and the responsibilities of the different statistics classes. I 
> haven't touched the PPNL, but I think more abstraction is needed there, 
> perhaps in a separate patch. 
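
The ServiceLoader-based selection mentioned above can be sketched as follows. The Engine interface, its accepts and name methods, and the select helper are hypothetical and are not the API in the attached patch; only java.util.ServiceLoader is a real JDK class. Providers would normally be registered in a META-INF/services file named after the interface.

```java
import java.util.ServiceLoader;

// Hypothetical illustration of classpath-based engine discovery.
public class EngineLookupSketch {

    // Assumed shape of an engine provider; not Pig's actual ExecutionEngine interface.
    public interface Engine {
        boolean accepts(String execType);
        String name();
    }

    // Returns the name of the first provider on the classpath that accepts execType,
    // or the given fallback when no provider matches.
    public static String select(String execType, String fallback) {
        for (Engine engine : ServiceLoader.load(Engine.class)) {
            if (engine.accepts(execType)) {
                return engine.name();
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // No providers are registered in this sketch, so the fallback is returned.
        System.out.println(select("mapreduce", "none"));
    }
}
```

Iterating over all providers and asking each one whether it accepts the requested type keeps the caller free of hard-coded engine class names, which seems to be the intent of the change.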



[jira] [Commented] (PIG-3420) Failed to retrieve values from data loaded by AvroStorage

2013-08-13 Thread Yuanli Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738879#comment-13738879
 ] 

Yuanli Dong commented on PIG-3420:
--

For 1) and 2), the changes are reflected in the new patch.
For 3), you can look at the code in class Schema in avro-1.7.4: the MapSchema 
type does not have an implementation of the getField method; only RecordSchema 
overrides it. So what we need is to check whether oldSchema is an instance of 
RecordSchema and, if it is not, return it directly. Unfortunately, RecordSchema 
is a private inner class of Schema, so we simply cannot do this.

> Failed to retrieve values from data loaded by AvroStorage
> -
>
> Key: PIG-3420
> URL: https://issues.apache.org/jira/browse/PIG-3420
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12
>Reporter: Yuanli Dong
> Fix For: 0.12
>
> Attachments: PIG-3420_08132013.patch
>
>
> Running the following script:
> a = load './newavro/data/avro/EmployeeMapF.ser' USING AvroStorage();
> dump a;
> c = foreach a generate name, office, 'Toyota', cars#'Toyota' as toyota, 
> 'Mazda', cars#'Mazda', 'Nissan', cars#'Nissan' as nissan;
> Although object a has all the data loaded, c cannot retrieve the map values; 
> columns 4, 6, and 8 are empty in the result.
> The map keys are of class Utf8, but the keys used to retrieve the data are 
> Strings, which is why we cannot retrieve the values. The patch fixes this 
> problem. 



[jira] [Updated] (PIG-3420) Failed to retrieve values from data loaded by AvroStorage

2013-08-13 Thread Yuanli Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanli Dong updated PIG-3420:
-

Attachment: PIG-3420_08132013.patch

> Failed to retrieve values from data loaded by AvroStorage
> -
>
> Key: PIG-3420
> URL: https://issues.apache.org/jira/browse/PIG-3420
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12
>Reporter: Yuanli Dong
> Fix For: 0.12
>
> Attachments: PIG-3420_08132013.patch
>
>
> Running the following script:
> a = load './newavro/data/avro/EmployeeMapF.ser' USING AvroStorage();
> dump a;
> c = foreach a generate name, office, 'Toyota', cars#'Toyota' as toyota, 
> 'Mazda', cars#'Mazda', 'Nissan', cars#'Nissan' as nissan;
> Although object a has all the data loaded, c cannot retrieve the map values; 
> columns 4, 6, and 8 are empty in the result.
> The map keys are of class Utf8, but the keys used to retrieve the data are 
> Strings, which is why we cannot retrieve the values. The patch fixes this 
> problem. 



[jira] [Updated] (PIG-3420) Failed to retrieve values from data loaded by AvroStorage

2013-08-13 Thread Yuanli Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanli Dong updated PIG-3420:
-

Attachment: (was: PIG-3420_08122013.patch)

> Failed to retrieve values from data loaded by AvroStorage
> -
>
> Key: PIG-3420
> URL: https://issues.apache.org/jira/browse/PIG-3420
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12
>Reporter: Yuanli Dong
> Fix For: 0.12
>
> Attachments: PIG-3420_08132013.patch
>
>
> Running the following script:
> a = load './newavro/data/avro/EmployeeMapF.ser' USING AvroStorage();
> dump a;
> c = foreach a generate name, office, 'Toyota', cars#'Toyota' as toyota, 
> 'Mazda', cars#'Mazda', 'Nissan', cars#'Nissan' as nissan;
> Although object a has all the data loaded, c cannot retrieve the map values; 
> columns 4, 6, and 8 are empty in the result.
> The map keys are of class Utf8, but the keys used to retrieve the data are 
> Strings, which is why we cannot retrieve the values. The patch fixes this 
> problem. 



[jira] [Commented] (PIG-3420) Failed to retrieve values from data loaded by AvroStorage

2013-08-13 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738854#comment-13738854
 ] 

Rohini Palaniswamy commented on PIG-3420:
-

A few comments on the patch:

1) You can skip the type cast in 
v = innerMap.get((String) key);
2) Keep the braces for if else block.
{code}
+if (v instanceof Utf8)
   return v.toString();
-} else {
+else
   return v;
-}
{code}
3) Can't we fix newSchemaFromRequiredFieldList to handle maps and bags instead 
of catching AvroRuntimeException?

> Failed to retrieve values from data loaded by AvroStorage
> -
>
> Key: PIG-3420
> URL: https://issues.apache.org/jira/browse/PIG-3420
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12
>Reporter: Yuanli Dong
> Fix For: 0.12
>
> Attachments: PIG-3420_08122013.patch
>
>
> Running the following script:
> a = load './newavro/data/avro/EmployeeMapF.ser' USING AvroStorage();
> dump a;
> c = foreach a generate name, office, 'Toyota', cars#'Toyota' as toyota, 
> 'Mazda', cars#'Mazda', 'Nissan', cars#'Nissan' as nissan;
> Although object a has all the data loaded, c cannot retrieve the map values; 
> columns 4, 6, and 8 are empty in the result.
> The map keys are of class Utf8, but the keys used to retrieve the data are 
> Strings, which is why we cannot retrieve the values. The patch fixes this 
> problem. 



[jira] [Created] (PIG-3421) Script jars should be added to extra jars instead of pig's job.jar

2013-08-13 Thread Aniket Mokashi (JIRA)
Aniket Mokashi created PIG-3421:
---

 Summary: Script jars should be added to extra jars instead of 
pig's job.jar
 Key: PIG-3421
 URL: https://issues.apache.org/jira/browse/PIG-3421
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi


Currently, for all the script engines, Pig adds script jars to Pig's job jar 
without even consulting the skipJars list. Ideally, we should add these to 
extraJars so that they can benefit from the distributed cache.



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738478#comment-13738478
 ] 

Julien Le Dem commented on PIG-3419:


Hi Achal,
For large patches, please create a review here: https://reviews.apache.org

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: pluggable_execengine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes were to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, to make the changes in Pig to delegate 
> to the EE when necessary, and to create an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly as MapReduce does when choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in their current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 
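
The ServiceLoader-based selection mentioned in the description could look roughly like the sketch below. EngineProvider, MapReduceProvider, and select are hypothetical names invented for illustration; the actual ExecType and ExecutionEngine interfaces in the patch may differ. Only java.util.ServiceLoader is a real API here, and it finds providers only when they are registered under META-INF/services on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.ServiceLoader;

final class EngineSelectionSketch {

    /** Hypothetical provider interface; not Pig's real API. */
    interface EngineProvider {
        boolean accepts(String execTypeName);
        String engineName();
    }

    /** Hypothetical provider standing in for a MapReduce-backed engine. */
    static final class MapReduceProvider implements EngineProvider {
        @Override public boolean accepts(String execTypeName) {
            return "mapreduce".equalsIgnoreCase(execTypeName);
        }
        @Override public String engineName() { return "MRExecutionEngine"; }
    }

    /** Cycles through the providers and returns the first one that accepts the exec type. */
    static EngineProvider select(Iterable<EngineProvider> providers, String execTypeName) {
        for (EngineProvider p : providers) {
            if (p.accepts(execTypeName)) {
                return p;
            }
        }
        throw new IllegalArgumentException("No execution engine for: " + execTypeName);
    }

    public static void main(String[] args) {
        // Discovery from the classpath; yields nothing unless a provider is registered.
        ServiceLoader<EngineProvider> discovered = ServiceLoader.load(EngineProvider.class);
        int count = 0;
        for (EngineProvider ignored : discovered) {
            count++;
        }
        System.out.println("providers discovered on classpath: " + count);

        // The same selection logic works on an explicit list, which is handy in tests.
        List<EngineProvider> explicit = Arrays.<EngineProvider>asList(new MapReduceProvider());
        System.out.println(select(explicit, "mapreduce").engineName());
    }
}
```

Because ServiceLoader implements Iterable, the discovered providers can be passed to select directly, so a new engine would only need a jar with a service registration rather than a change to the selection code.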



[jira] [Updated] (PIG-3048) Add mapreduce workflow information to job configuration

2013-08-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3048:


Assignee: Billie Rinaldi

> Add mapreduce workflow information to job configuration
> ---
>
> Key: PIG-3048
> URL: https://issues.apache.org/jira/browse/PIG-3048
> Project: Pig
>  Issue Type: Improvement
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: 0.11.2
>
> Attachments: PIG-3048.patch, PIG-3048.patch, PIG-3048.patch
>
>
> Adding workflow properties to the job configuration would enable logging and 
> analysis of workflows in addition to individual MapReduce jobs.  Suggested 
> properties include a workflow ID, workflow name, adjacency list connecting 
> nodes in the workflow, and the name of the current node in the workflow.
> mapreduce.workflow.id - a unique ID for the workflow, ideally prepended with 
> the application name
> e.g. pig_
> mapreduce.workflow.name - a name for the workflow, to distinguish this 
> workflow from other workflows and to group different runs of the same workflow
> e.g. pig command line
> mapreduce.workflow.adjacency - an adjacency list for the workflow graph, 
> encoded as mapreduce.workflow.adjacency. =  of target nodes>
> mapreduce.workflow.node.name - the name of the node corresponding to this 
> MapReduce job in the workflow adjacency list
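
As a rough illustration of the properties listed above, the sketch below fills a java.util.Properties object, used here as a stand-in for the job configuration. The four property names come from the description; everything else is an assumption made for the example: the build helper, the sample IDs and node names, and in particular the adjacency encoding (one key per source node under the mapreduce.workflow.adjacency. prefix, with a comma-separated list of target nodes as the value).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

final class WorkflowPropertiesSketch {

    static final String ADJACENCY_PREFIX = "mapreduce.workflow.adjacency.";

    /** Builds the workflow properties for one job; a hypothetical helper, not Pig code. */
    static Properties build(String workflowId, String workflowName, String currentNode,
                            Map<String, List<String>> adjacency) {
        Properties props = new Properties();
        props.setProperty("mapreduce.workflow.id", workflowId);
        props.setProperty("mapreduce.workflow.name", workflowName);
        props.setProperty("mapreduce.workflow.node.name", currentNode);
        for (Map.Entry<String, List<String>> e : adjacency.entrySet()) {
            // Assumed encoding: key suffix is the source node, value is its target nodes.
            props.setProperty(ADJACENCY_PREFIX + e.getKey(), String.join(",", e.getValue()));
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        adjacency.put("job1", Arrays.asList("job2", "job3"));
        adjacency.put("job2", Arrays.asList("job3"));

        // "pig_example" and "example.pig" are made-up sample values.
        Properties props = build("pig_example", "example.pig", "job1", adjacency);
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " = " + props.getProperty(name));
        }
    }
}
```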



[jira] [Updated] (PIG-3048) Add mapreduce workflow information to job configuration

2013-08-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated PIG-3048:


Attachment: PIG-3048.patch

> Add mapreduce workflow information to job configuration
> ---
>
> Key: PIG-3048
> URL: https://issues.apache.org/jira/browse/PIG-3048
> Project: Pig
>  Issue Type: Improvement
>Reporter: Billie Rinaldi
> Fix For: 0.11.2
>
> Attachments: PIG-3048.patch, PIG-3048.patch, PIG-3048.patch
>
>
> Adding workflow properties to the job configuration would enable logging and 
> analysis of workflows in addition to individual MapReduce jobs.  Suggested 
> properties include a workflow ID, workflow name, adjacency list connecting 
> nodes in the workflow, and the name of the current node in the workflow.
> mapreduce.workflow.id - a unique ID for the workflow, ideally prepended with 
> the application name
> e.g. pig_
> mapreduce.workflow.name - a name for the workflow, to distinguish this 
> workflow from other workflows and to group different runs of the same workflow
> e.g. pig command line
> mapreduce.workflow.adjacency - an adjacency list for the workflow graph, 
> encoded as mapreduce.workflow.adjacency. =  of target nodes>
> mapreduce.workflow.node.name - the name of the node corresponding to this 
> MapReduce job in the workflow adjacency list



[jira] [Updated] (PIG-3048) Add mapreduce workflow information to job configuration

2013-08-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated PIG-3048:


Fix Version/s: 0.11.2
   Status: Patch Available  (was: Open)

Updated patch for trunk.

> Add mapreduce workflow information to job configuration
> ---
>
> Key: PIG-3048
> URL: https://issues.apache.org/jira/browse/PIG-3048
> Project: Pig
>  Issue Type: Improvement
>Reporter: Billie Rinaldi
> Fix For: 0.11.2
>
> Attachments: PIG-3048.patch, PIG-3048.patch, PIG-3048.patch
>
>
> Adding workflow properties to the job configuration would enable logging and 
> analysis of workflows in addition to individual MapReduce jobs.  Suggested 
> properties include a workflow ID, workflow name, adjacency list connecting 
> nodes in the workflow, and the name of the current node in the workflow.
> mapreduce.workflow.id - a unique ID for the workflow, ideally prepended with 
> the application name
> e.g. pig_
> mapreduce.workflow.name - a name for the workflow, to distinguish this 
> workflow from other workflows and to group different runs of the same workflow
> e.g. pig command line
> mapreduce.workflow.adjacency - an adjacency list for the workflow graph, 
> encoded as mapreduce.workflow.adjacency. =  of target nodes>
> mapreduce.workflow.node.name - the name of the node corresponding to this 
> MapReduce job in the workflow adjacency list



[jira] [Commented] (PIG-3204) Reduce the number of getSchema calls during script parsing

2013-08-13 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738297#comment-13738297
 ] 

Rohini Palaniswamy commented on PIG-3204:
-

Review board link - https://reviews.apache.org/r/13535/

> Reduce the number of getSchema calls during script parsing
> --
>
> Key: PIG-3204
> URL: https://issues.apache.org/jira/browse/PIG-3204
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3204-1.patch, PIG-3204-2.patch, PIG-3204-3.patch, 
> PIG-3204-4.patch
>
>
>   Currently there are a lot of NameNode (NN) calls made to determine whether 
> there is a schema file for a path in a LOAD statement. When the NN is slow 
> (caused by a whole bunch of other issues), this takes a lot of time, and we 
> found scripts spending anywhere from 5 to 40 minutes on it, depending on the 
> script. This seems to be a good place for optimization. 



[jira] [Updated] (PIG-3204) Reduce the number of getSchema calls during script parsing

2013-08-13 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3204:


Attachment: PIG-3204-4.patch

Patch with whitespace changes. PIG-3204-3.patch does not have whitespace 
changes. 

> Reduce the number of getSchema calls during script parsing
> --
>
> Key: PIG-3204
> URL: https://issues.apache.org/jira/browse/PIG-3204
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3204-1.patch, PIG-3204-2.patch, PIG-3204-3.patch, 
> PIG-3204-4.patch
>
>
>   Currently there are a lot of NameNode (NN) calls made to determine whether 
> there is a schema file for a path in a LOAD statement. When the NN is slow 
> (caused by a whole bunch of other issues), this takes a lot of time, and we 
> found scripts spending anywhere from 5 to 40 minutes on it, depending on the 
> script. This seems to be a good place for optimization. 



Review Request 13535: [PIG-3204] Reduce the number of getSchema calls during script parsing

2013-08-13 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13535/
---

Review request for pig.


Bugs: PIG-3204
https://issues.apache.org/jira/browse/PIG-3204


Repository: pig


Description
---

Change parsing from line by line to whole script at once.


Diffs
-

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 
1510960 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/grunt/GruntParser.java
 1510960 
  
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGrunt.java
 1510960 
  
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java
 1510960 

Diff: https://reviews.apache.org/r/13535/diff/


Testing
---

New unit tests added to track the number of times a line is parsed. TestGrunt 
and TestShortcuts test failures fixed.
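
To make the effect of this change concrete, here is a small counting sketch. It rests on an assumption that is not stated in this request: that in line-by-line mode each newly registered line causes all lines seen so far to be parsed again. Under that assumption the number of line parses grows quadratically with script length, whereas parsing the whole script at once touches each line once. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParseCountSketch {

    /** Line parses when each new line triggers a re-parse of everything seen so far (assumed model). */
    static int parsesLineByLine(List<String> script) {
        int parsed = 0;
        List<String> seen = new ArrayList<>();
        for (String line : script) {
            seen.add(line);
            parsed += seen.size(); // re-parse all lines accumulated up to this point
        }
        return parsed;
    }

    /** Line parses when the whole script is parsed in one pass. */
    static int parsesWholeScript(List<String> script) {
        return script.size();
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
                "a = load 'input';",
                "b = filter a by $0 is not null;",
                "store b into 'output';");
        System.out.println("line by line: " + parsesLineByLine(script));
        System.out.println("whole script: " + parsesWholeScript(script));
    }
}
```

If every parse of a LOAD statement also checks for a schema file, fewer parses translate directly into fewer of those lookups.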


Thanks,

Rohini Palaniswamy



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738288#comment-13738288
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


oh 3 more things :)
I thought you found your way around the -y argument? I still see that in there.
Don't comment out blocks of code, just delete them
Add some documentation about creating new Exec Engines to the xml-based docs, 
or at least post it here. Just having it in javadocs is not sufficient.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: pluggable_execengine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes were to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, to make the changes in Pig to delegate 
> to the EE when necessary, and to create an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly as MapReduce does when choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in their current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 



[jira] [Updated] (PIG-3204) Reduce the number of getSchema calls during script parsing

2013-08-13 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3204:


Attachment: PIG-3204-3.patch

> Reduce the number of getSchema calls during script parsing
> --
>
> Key: PIG-3204
> URL: https://issues.apache.org/jira/browse/PIG-3204
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3204-1.patch, PIG-3204-2.patch, PIG-3204-3.patch
>
>
>   Currently there are a lot of NameNode (NN) calls made to determine whether 
> there is a schema file for a path in a LOAD statement. When the NN is slow 
> (caused by a whole bunch of other issues), this takes a lot of time, and we 
> found scripts spending anywhere from 5 to 40 minutes on it, depending on the 
> script. This seems to be a good place for optimization. 



[jira] [Updated] (PIG-3204) Reduce the number of getSchema calls during script parsing

2013-08-13 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3204:


Status: Patch Available  (was: Open)

> Reduce the number of getSchema calls during script parsing
> --
>
> Key: PIG-3204
> URL: https://issues.apache.org/jira/browse/PIG-3204
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3204-1.patch, PIG-3204-2.patch, PIG-3204-3.patch
>
>
>   Currently there are a lot of NameNode (NN) calls made to determine whether 
> there is a schema file for a path in a LOAD statement. When the NN is slow 
> (caused by a whole bunch of other issues), this takes a lot of time, and we 
> found scripts spending anywhere from 5 to 40 minutes on it, depending on the 
> script. This seems to be a good place for optimization. 



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738285#comment-13738285
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


Hi Achal,
That's a large patch.
Can you give us a roadmap for reading it -- what are the changes, at a high 
level? It looks like you had to change a bunch of stuff that's not (at first 
glance) directly related to exec mode.

Procedurally:
- please generate the patch using 'git diff --no-prefix' since the apache pig 
master is on svn
- please post the complete patch to Review Board, for ease of commenting
- please make sure that all new files have the apache license headers at the top

Thanks
-D

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: pluggable_execengine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes were to reinstate an ExecutionEngine interface to tie 
> together the front end and backend, to make the changes in Pig to delegate 
> to the EE when necessary, and to create an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly as MapReduce does when choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in their current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 
