> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote:
> > Overall looks great! I haven't gone through the test cases yet, but here
> > are my comments so far.
> >
> >
> > 1) I noticed that I cannot load .avro files that are not record types. For
> > example, I tried to load a .avro file whose schema is "int" as follows:
> >
> > [cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar getschema
> > foo2/test_int.avro
> > "int"
> >
> > [cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar tojson
> > foo2/test_int.avro
> > 1
> >
> > in = LOAD 'foo2/test_int.avro' USING AvroStorage('int');
> > DUMP in;
> >
> > This gives me the following error:
> >
> > Caused by: java.io.IOException: avroSchemaToResourceSchema only processes
> > records
> >
> > Can only Avro record type be loaded? Or am I doing something wrong?
> >
> >
> > 2) TestAvroStorage needs to be more automated. To run it, I had to run the
> > following commands:
> >
> > ant clean compile-test
> > cd ./test/org/apache/pig/builtin/avro
> > python createests.py
> > cd -
> > ant clean test -Dtestcase=TestAvroStorage
> >
> > Ideally, I should be able to run a single command: ant clean
> > -Dtestcase=TestAvroStorage. Please let me know if you need help for this.
> >
> >
> > 3) python createests.py fails with the following errors. I suppose that
> > some files are missing:
> >
> > creating data/avro/uncompressed/testDirectoryCounts.avro
> > Exception in thread "main" java.io.FileNotFoundException:
> > data/json/testDirectoryCounts.json (No such file or directory)
> > ...
> > creating evenFileNameTestDirectoryCounts.avro
> > Exception in thread "main" java.io.FileNotFoundException:
> > data/json/evenFileNameTestDirectoryCounts.json (No such file or directory)
> > ...
> >
> >
> > 4) ant test -Dtestcase=TestAvroStorage fails with the following errors. I
> > suppose that this is due to the missing files:
> >
> > Testcase: testLoadDirectory took 0.005 sec
> > FAILED
> > Testcase: testLoadGlob took 0.004 sec
> > FAILED
> > Testcase: testPartialLoadGlob took 0.005 sec
> > FAILED
> >
> >
> > 5) Typo in the name of createests.py. It should be createtests.py.
> >
> >
> > 6) Is createTests.bash needed at all? If not, can you remove it?
> >
> >
> > I have more comments inline:
Sounds like the python script isn't working completely correctly. I'll debug
that script and make sure it generates all the required files.
Can I take you up on your offer to help automate that build process? I'm not
exactly sure what to modify to automatically run the python script to create
the test files.
> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote:
> > src/org/apache/pig/builtin/AvroStorage.java, lines 296-305
> > <https://reviews.apache.org/r/8104/diff/1/?file=191564#file191564line296>
> >
> > This won't work in the following case. Let's say p matches two dirs,
> > and one dir is empty.
> >
> > p = foo*
> >
> > foo1
> > foo2/bar.avro
> >
> > I would expect the schema of bar.avro is returned, but I get an
> > IOException instead.
Added proper depth first search to find the first file. (I decided to sort by
modification date, most recent first.)
- Joseph
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8104/#review13962
-----------------------------------------------------------
On Nov. 17, 2012, 5:28 a.m., Joseph Adler wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8104/
> -----------------------------------------------------------
>
> (Updated Nov. 17, 2012, 5:28 a.m.)
>
>
> Review request for pig and Cheolsoo Park.
>
>
> Description
> -------
>
> The current AvroStorage implementation has a lot of issues: it requires old
> versions of Avro, it copies data much more than needed, and it's verbose and
> complicated. (One pet peeve of mine is that old versions of Avro don't
> support Snappy compression.)
>
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the
> new implementation is significantly faster, and the code is a lot simpler.
> Rewriting AvroStorage also enabled me to implement support for Trevni.
>
> This is the latest version of the patch, complete with test cases and
> TrevniStorage. (Test cases for TrevniStorage are still missing).
>
>
> This addresses bug PIG-3015.
> https://issues.apache.org/jira/browse/PIG-3015
>
>
> Diffs
> -----
>
> build.xml 7d468a0
> ivy.xml 70e8d50
> ivy/libraries.properties 317564f
> src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION
> src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION
> src/org/apache/pig/impl/util/AvroBagWrapper.java PRE-CREATION
> src/org/apache/pig/impl/util/AvroMapWrapper.java PRE-CREATION
> src/org/apache/pig/impl/util/AvroRecordReader.java PRE-CREATION
> src/org/apache/pig/impl/util/AvroRecordWriter.java PRE-CREATION
> src/org/apache/pig/impl/util/AvroStorageDataConversionUtilities.java
> PRE-CREATION
> src/org/apache/pig/impl/util/AvroStorageSchemaConversionUtilities.java
> PRE-CREATION
> src/org/apache/pig/impl/util/AvroTupleWrapper.java PRE-CREATION
> test/commit-tests 5081fbc
> test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION
> test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION
> test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION
> test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION
> test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION
> test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION
> test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig
> PRE-CREATION
> test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION
> test/org/apache/pig/builtin/avro/createTests.bash PRE-CREATION
> test/org/apache/pig/builtin/avro/createests.py PRE-CREATION
> test/org/apache/pig/builtin/avro/data/avro/deflate/records.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/deflate/recordsAsOutputByPig.avro
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/avro/snappy/records.avro PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordWithRepeatedSubRecords.avro
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/records.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsAsOutputByPig.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsOfArrays.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsOfArraysOfRecords.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsSubSchema.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsSubSchemaNullable.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithEnums.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithFixed.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithMaps.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithMapsOfRecords.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithNullableUnions.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recursiveRecord.avro
> PRE-CREATION
>
> test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json
> PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc
> PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc
> PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc
> PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsSubSchema.avsc PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsSubSchemaNullable.avsc
> PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsWithEnums.avsc PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsWithFixed.avsc PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsWithMaps.avsc PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsWithMapsOfRecords.avsc
> PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recordsWithNullableUnions.avsc
> PRE-CREATION
> test/org/apache/pig/builtin/avro/schema/recursiveRecord.avsc PRE-CREATION
> test/unit-tests 0f18a0e
>
> Diff: https://reviews.apache.org/r/8104/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Joseph Adler
>
>