-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
-----------------------------------------------------------

Review request for pig.


Description
-------

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to 
bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro 
schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be 
passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 
'schema_file' and 'data' cannot refer to avro schema that contains recursive 
records.

AvroStorage works as follows:

1) PigSchema maps recursive records to bytearrays, so there is discrepancy 
between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though Pig schema defines them 
as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
3) To store recursive records, Avro schema must be provided via the 'schema' or 
'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled 
because otherwise schema check will fail due to discrepancy between Avro schema 
and Pig schema.
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. 
This is because AvroSchemaManager cannot handle recursive records for now. The 
recursive record check is added to AvroSchemaManager, so if Avro schema that 
contains recursive records is specified by these parameters, an exception is 
thrown.


This addresses bug PIG-2869.
    https://issues.apache.org/jira/browse/PIG-2869


Diffs
-----

  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java
 6b1d2a1 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java
 1939d3e 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
 c9f7d81 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
 e24b495 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
 2fab3f7 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
 040234f 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro
 4e23e73 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro
 12a36f8 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
-------

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, 
and another record.
2) Load Avro files that contains recursive records, generate new relations, 
apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, 
schema_file, and data.


Thanks,

Cheolsoo Park

Reply via email to