-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------
(Updated July 19, 2012, 1:23 a.m.)
Review request for pig.
Changes
-------
1) Added more unit tests, including some negative tests.
2) Removed getPathsFromString() because I realized that fs.globStatus()
implicitly expands a comma-separated string into paths, so doing it explicitly
is redundant.
3) Changed the type of the first parameter of getAllSubDirs() from URI to
hadoop.fs.Path. This is needed because '{' and '}' are not allowed in a URI, so
URI.create() throws an IllegalArgumentException (wrapping a URISyntaxException)
on a glob pattern, whereas these characters are automatically escaped when a
Path is constructed. This wasn't an issue in my previous patch because
getPathsFromString() implicitly converted a glob pattern into paths; now that
getPathsFromString() is removed, the conversion has to be done explicitly.
In fact, this reverts some changes made by PIG-2540
(https://issues.apache.org/jira/browse/PIG-2540). However, this does not break
S3 support, because inside getAllSubDirs() the file system is still constructed
from the given path's URI, and globStatus() is called on that file system:
    FileSystem fs = FileSystem.get(path.toUri(), job.getConfiguration());
    FileStatus[] matchedFiles = fs.globStatus(path);
So if path is an S3 URI, the S3 file system is used.
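Both points in item 3 can be checked with the JDK alone. The sketch below uses only java.net.URI: it shows that URI.create() rejects a glob pattern, and that building the URI from separate scheme, authority, and path components (the multi-argument constructor, which, as far as I understand, is also what Hadoop's Path uses internally) percent-encodes '{' and '}' while keeping the scheme that FileSystem.get() dispatches on. The class GlobPathUris and its toUri() helper are hypothetical illustrations, not Hadoop or Pig code; the real Path parsing handles more cases than this simplified split.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration only; not Hadoop's Path and not AvroStorage code.
class GlobPathUris {

    // True if URI.create() accepts the string unchanged. URI.create() reports
    // RFC 2396 violations as an IllegalArgumentException that wraps the
    // underlying URISyntaxException.
    static boolean acceptedByUriCreate(String s) {
        try {
            URI.create(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Simplified stand-in for the parsing a Path performs: split off an
    // optional scheme and authority, then pass the pieces to the
    // multi-argument URI constructor, which quotes '{' and '}' in the path.
    static URI toUri(String pathString) {
        String scheme = null;
        String authority = null;
        int start = 0;

        // A colon before the first slash marks a scheme such as "s3" or "hdfs".
        int colon = pathString.indexOf(':');
        int slash = pathString.indexOf('/');
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = pathString.substring(0, colon);
            start = colon + 1;
        }

        // "//" right after the scheme introduces an authority (bucket, host:port).
        if (pathString.startsWith("//", start) && pathString.length() - start > 2) {
            int next = pathString.indexOf('/', start + 2);
            int end = (next == -1) ? pathString.length() : next;
            authority = pathString.substring(start + 2, end);
            start = end;
        }

        try {
            return new URI(scheme, authority, pathString.substring(start), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String local = "/data/{a,b}/*.avro";
        System.out.println(acceptedByUriCreate(local)); // false
        URI u = toUri(local);
        System.out.println(u.getRawPath());              // /data/%7Ba,b%7D/*.avro
        System.out.println(u.getPath());                 // /data/{a,b}/*.avro

        URI s3 = toUri("s3://bucket/logs/{2012,2013}/*.avro");
        System.out.println(s3.getScheme());              // s3
        System.out.println(s3.getAuthority());           // bucket
    }
}
```

Because the scheme survives the conversion, handing such a URI to FileSystem.get() would still resolve the S3 file system, which is the argument made above for why S3 support is unaffected.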
Description
-------
Add glob support to AvroStorage:
https://issues.apache.org/jira/browse/PIG-2492
This addresses bug PIG-2492.
Diffs (updated)
-----
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38
Diff: https://reviews.apache.org/r/5936/diff/
Testing
-------
1. Added new unit tests as follows:
- testDir verifies that AvroStorage recursively loads files in a directory and
its sub-directories.
- testGlob1 to testGlob3 verify that glob patterns are expanded properly.
To run the tests, please do the following:
    wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
    tar -xf avro_test_files.tar.gz
    ant clean compile-test piggybank -Dhadoopversion=20
    cd contrib/piggybank/java
    ant test -Dtestcase=TestAvroStorage
2. Both TestAvroStorage and TestAvroStorageUtils pass.
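The behaviour that testDir and testGlob1 to testGlob3 check can be illustrated on the local file system with java.nio. This is only an analogue, not the patch: it walks a directory recursively and filters the files with a glob, whereas AvroStorage works on Hadoop file systems through getAllSubDirs() and globStatus(). java.nio's glob dialect is not Hadoop's; for example, its '**' matches across directory boundaries while '*' does not. The class LocalGlobWalk and its collect() method are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-file-system analogue; not AvroStorage code.
class LocalGlobWalk {

    // Returns every regular file under root (sub-directories included) whose
    // path relative to root matches the glob, as sorted relative path strings.
    static List<String> collect(Path root, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .map(root::relativize)       // match against the relative path
                       .filter(matcher::matches)
                       .map(Path::toString)
                       .sorted()
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // "**.avro" selects .avro files at any depth, like loading a directory recursively.
        collect(root, "**.avro").forEach(System.out::println);
    }
}
```

With a tree containing top.avro, a/x.avro, a/sub/y.avro and b/z.avro, the pattern "{a,b}/*.avro" selects only a/x.avro and b/z.avro, since '*' does not descend into a/sub.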
Thanks,
Cheolsoo Park