It looks like a bug when using Parquet with MultipleOutputs: the OutputCommitter reads all the footers under the output directory to generate the _metadata summary, and it fails when it hits the partition subdirectories (year=2014/...) that MultipleOutputs creates, because it tries to read a footer from a directory rather than a file.
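Depending on your parquet-mr version, one workaround may be to skip the summary file entirely. Newer releases of ParquetOutputCommitter honor a parquet.enable.summary-metadata flag; treat the property name as an assumption for older versions and check that your build actually reads it. A minimal sketch of the driver-side setting:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  // Hedged workaround: when this flag is off, newer parquet-mr versions skip
  // writing the _metadata summary, and with it the footer scan that fails here.
  // Verify that your ParquetOutputCommitter checks this property before relying on it.
  Configuration conf = job.getConfiguration();
  conf.setBoolean("parquet.enable.summary-metadata", false);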
On Mon, Jul 21, 2014 at 3:17 PM, <[email protected]> wrote:
> Hi,
>
> I am getting a similar error. Can you suggest how I can resolve this issue?
>
> I am writing map-reduce output into multiple partition directories based
> on date (year=2014/month=6/day=13) in Parquet format using
> ExampleOutputFormat and MultipleOutputs. I get the error below when
> running from MRUnit.
>
> WARNING: could not write summary file for file /src/test/resources/parquet
> java.io.IOException: Could not read footer: java.io.IOException: Could not
> read footer for file RawLocalFileStatus{path=file:/src/test/resources/parquet/year=2014;
> isDirectory=true; modification_time=1405979408000; access_time=0; owner=;
> group=; permission=rwxrwxrwx; isSymlink=false}
>     at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:159)
>     at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:177)
>     at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:49)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:456)
> Caused by: java.io.IOException: Could not read footer for file
> RawLocalFileStatus{path=file:/src/test/resources/parquet/year=2014;
> isDirectory=true; modification_time=1405979408000; access_time=0; owner=;
> group=; permission=rwxrwxrwx; isSymlink=false}
>     at parquet.hadoop.ParquetFileReader$1.call(ParquetFileReader.java:140)
>     at parquet.hadoop.ParquetFileReader$1.call(ParquetFileReader.java:133)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException: /home/cloudera/git/gbd-hadoop/mr/src/test/resources/parquet/year=2014 (Is a directory)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:120)
>     at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:85)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:121)
>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:197)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:335)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:746)
>     at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:246)
>     at parquet.hadoop.ParquetFileReader$1.readFooter(ParquetFileReader.java:146)
>     at parquet.hadoop.ParquetFileReader$1.call(ParquetFileReader.java:138)
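For reference, the kind of setup described above looks roughly like this. This is a minimal sketch against the old parquet.* (pre-org.apache.parquet) API; the named output, schema, class names, and partition path are illustrative, not taken from the original code.

  import java.io.IOException;

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
  import parquet.example.data.Group;
  import parquet.hadoop.example.ExampleOutputFormat;
  import parquet.hadoop.example.GroupWriteSupport;
  import parquet.schema.MessageTypeParser;

  public class PartitionedParquetJob {

    // Driver side: register a named output backed by ExampleOutputFormat
    // (Void key, Group value) and set the Group schema it will write.
    public static void configureOutputs(Job job) {
      GroupWriteSupport.setSchema(
          MessageTypeParser.parseMessageType(
              "message event { required binary id (UTF8); required int64 ts; }"),
          job.getConfiguration());
      MultipleOutputs.addNamedOutput(job, "parquet",
          ExampleOutputFormat.class, Void.class, Group.class);
      FileOutputFormat.setOutputPath(job, new Path("src/test/resources/parquet"));
    }

    // Reduce side: the date-based baseOutputPath is what creates the
    // year=/month=/day= subdirectories that the summary step later trips over.
    public static class PartitionReducer extends Reducer<Text, Group, Void, Group> {
      private MultipleOutputs<Void, Group> mos;

      @Override
      protected void setup(Context context) {
        mos = new MultipleOutputs<Void, Group>(context);
      }

      @Override
      protected void reduce(Text date, Iterable<Group> groups, Context context)
          throws IOException, InterruptedException {
        for (Group g : groups) {
          // e.g. date.toString() == "year=2014/month=6/day=13"
          mos.write("parquet", null, g, date.toString() + "/part");
        }
      }

      @Override
      protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
      }
    }
  }

With this layout the job output directory contains only partition subdirectories, no data files at the top level, which is exactly the situation the footer scan in commitJob does not handle.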
