[ https://issues.apache.org/jira/browse/PARQUET-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370053#comment-16370053 ]
e.birukov commented on PARQUET-860:
-----------------------------------

I get the same error. It happens during a temporary unavailability of S3:

{noformat}
...FileSystemException...
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
	at org.apache.hadoop.fs.s3a.S3AOutputStream.close(S3AOutputStream.java:121)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:643)
{noformat}

When I call the close() method again:

{noformat}
Caused by: java.lang.NullPointerException
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:162)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
	at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:301)
{noformat}

I read data from a stream and write it using ParquetWriter, so this problem is critical: it causes data loss!

> ParquetWriter.getDataSize NullPointerException after closed
> -----------------------------------------------------------
>
>                 Key: PARQUET-860
>                 URL: https://issues.apache.org/jira/browse/PARQUET-860
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.9.0
>        Environment: Linux prim 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux
>             openjdk version "1.8.0_112"
>             OpenJDK Runtime Environment (build 1.8.0_112-b15)
>             OpenJDK 64-Bit Server VM (build 25.112-b15, mixed mode)
>            Reporter: Mike Mintz
>            Priority: Major
>
> When I run {{ParquetWriter.getDataSize()}}, it works normally. But after I
> call {{ParquetWriter.close()}}, subsequent calls to {{ParquetWriter.getDataSize()}}
> result in a NullPointerException.
> {noformat}
> java.lang.NullPointerException
>         at org.apache.parquet.hadoop.InternalParquetRecordWriter.getDataSize(InternalParquetRecordWriter.java:132)
>         at org.apache.parquet.hadoop.ParquetWriter.getDataSize(ParquetWriter.java:314)
>         at FileBufferState.getFileSizeInBytes(FileBufferState.scala:83)
> {noformat}
> The reason for the NPE appears to be in
> {{InternalParquetRecordWriter.getDataSize}}, where it assumes that
> {{columnStore}} is not null.
> But the {{close()}} method calls {{flushRowGroupToStore()}}, which sets
> {{columnStore = null}}.
> I'm guessing that once the file is closed, we can just return
> {{lastRowGroupEndPos}}, since there should be no more buffered data, but I
> don't fully understand how this class works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
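To illustrate the reporter's guess, here is a minimal self-contained sketch of the pattern (this is NOT parquet-mr code; the fields and byte counts are hypothetical stand-ins for {{columnStore}}, {{bufferedSize}}, and {{lastRowGroupEndPos}}): once {{close()}} flushes the last row group and nulls the store, {{getDataSize()}} can fall back to {{lastRowGroupEndPos}} instead of dereferencing the null field.

```java
// Sketch of the suggested null-guard fix; stand-in class, not parquet-mr.
class RecordWriterSketch {
    private Object columnStore = new Object(); // stands in for the real column write store
    private long bufferedSize = 128;           // hypothetical bytes buffered in the store
    private long lastRowGroupEndPos = 0;       // file position after the last flushed row group

    void close() {
        // flushRowGroupToStore() advances the end position and nulls the store,
        // which is what makes the unguarded getDataSize() throw after close().
        lastRowGroupEndPos += bufferedSize;
        bufferedSize = 0;
        columnStore = null;
    }

    // Guarded version: no NullPointerException after close().
    long getDataSize() {
        if (columnStore == null) {
            return lastRowGroupEndPos; // file is closed, nothing is buffered any more
        }
        return lastRowGroupEndPos + bufferedSize;
    }
}

public class Main {
    public static void main(String[] args) {
        RecordWriterSketch w = new RecordWriterSketch();
        System.out.println(w.getDataSize()); // 128 before close
        w.close();
        System.out.println(w.getDataSize()); // still 128, no NPE
    }
}
```

The guard reflects the invariant the reporter describes: after {{close()}}, all data has been flushed, so the last row-group end position is the full size of the written data.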