----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6536/ -----------------------------------------------------------
(Updated Aug. 16, 2012, 10:18 p.m.) Review request for pig. Changes ------- Incorporate Santhosh's comments. Description ------- Allow recursive records to be loaded/stored by AvroStorage. The changes include: 1) Remove the recursive record check from AvroSchema2Pig. 2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays. 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records. 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc. 5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records. AvroStorage works as follows: 1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema. 2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc). 3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema. 4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown. This addresses bug PIG-2875. https://issues.apache.org/jira/browse/PIG-2875 Diffs (updated) ----- contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f Diff: https://reviews.apache.org/r/6536/diff/ Testing ------- New test cases are added as follows: 1) Load/store Avro files that contain recursive records in array, map, union, and another record. 2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records. 3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data. Thanks, Cheolsoo Park