-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/#review10529
-----------------------------------------------------------

Ship it!


+1. Will run the tests again.

- Santhosh Srinivasan


On Aug. 20, 2012, 6:40 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6536/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2012, 6:40 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Allow recursive records to be loaded/stored by AvroStorage.
> 
> The changes include:
> 
> 1) Remove the recursive record check from AvroSchema2Pig.
> 2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records 
> to bytearrays.
> 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle 
> Avro schema that contains recursive records.
> 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can 
> be passed to both LoadFunc and StoreFunc.
> 5) Add the recursive record check to AvroSchemaManager. This is needed 
> because 'schema_file' and 'data' cannot refer to avro schema that contains 
> recursive records.
> 
> AvroStorage works as follows:
> 
> 1) PigSchema maps recursive records to bytearrays, so there is discrepancy 
> between Avro schema and Pig schema.
> 2) Recursive records are loaded as tuples even though Pig schema defines them 
> as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
> 3) To store recursive records, Avro schema must be provided via the 'schema' 
> or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be 
> enabled because otherwise schema check will fail due to discrepancy between 
> Avro schema and Pig schema.
> 4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. 
> This is because AvroSchemaManager cannot handle recursive records for now. 
> The recursive record check is added to AvroSchemaManager, so if Avro schema 
> that contains recursive records is specified by these parameters, an 
> exception is thrown.
> 
> 
> This addresses bug PIG-2875.
>     https://issues.apache.org/jira/browse/PIG-2875
> 
> 
> Diffs
> -----
> 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java
>  6b1d2a1 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java
>  1939d3e 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
>  c9f7d81 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
>  e24b495 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>  2fab3f7 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
>  040234f 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro
>  4e23e73 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro
>  12a36f8 
> 
> Diff: https://reviews.apache.org/r/6536/diff/
> 
> 
> Testing
> -------
> 
> New test cases are added as follows:
> 
> 1) Load/store Avro files that contain recursive records in array, map, union, 
> and another record.
> 2) Load Avro files that contains recursive records, generate new relations, 
> apply filters, and store them as non-recursive records.
> 3) Tests for the StoreFunc parameters: no_schema_check, schema, same, 
> schema_file, and data.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>

Reply via email to