[GitHub] [spark] HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-524139791

Addressed. The HADOOP JIRA issue is now referenced in both the code comment and the PR description. Please take another look. Thanks in advance!

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-522379521

Since the issue only occurs with the temp file, another possible approach would be to remove the crc file for the temp file when FSOutputSummer is used as the OutputStream:

```
underlyingStream match {
  case e: FSDataOutputStream if e.getWrappedStream.isInstanceOf[FSOutputSummer] =>
    val checksumFile = new Path(tempPath.getParent, s".${tempPath.getName}.crc")
    fm.delete(checksumFile)
  case _ =>
}
```

Adding the above block after `underlyingStream.close()` would also work. This change would touch RenameBasedFSDataOutputStream. As the class annotations indicate, `FSOutputSummer` is Unstable and LimitedPrivate, so I'm not sure we want to rely on it.
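To make the sidecar naming in the snippet above concrete, here is a minimal, self-contained sketch of the same cleanup idea. It is only an illustration under stated assumptions: it uses `java.nio.file` in place of Hadoop's `Path` and the checkpoint file manager, and the object name `CrcSidecarCleanup`, its two helpers, and the file names in `main` are hypothetical, not part of Spark or Hadoop. The only detail taken from the comment is the `.<name>.crc` naming pattern in the parent directory.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not Spark or Hadoop code: it mirrors only the
// ".<name>.crc" sidecar naming used in the snippet above.
object CrcSidecarCleanup {

  // Computes the sidecar path for a file: same directory, ".<name>.crc".
  def checksumFileFor(file: Path): Path = {
    val name = s".${file.getFileName}.crc"
    // A bare file name has no parent, so fall back to a relative path.
    Option(file.getParent).map(_.resolve(name)).getOrElse(Paths.get(name))
  }

  // Deletes the sidecar if it exists; returns true only if a file was removed.
  def deleteChecksumFile(file: Path): Boolean =
    Files.deleteIfExists(checksumFileFor(file))

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("crc-demo")
    val temp = dir.resolve("0.tmp")   // hypothetical temp file name
    val sidecar = checksumFileFor(temp)

    Files.write(temp, "data".getBytes("UTF-8"))
    Files.write(sidecar, Array[Byte](0))   // stand-in for a real crc file

    // Renaming only the temp file leaves its sidecar behind.
    Files.move(temp, dir.resolve("0"))
    println(s"sidecar left after rename: ${Files.exists(sidecar)}")

    // Explicit cleanup keyed on the temp path, as the comment proposes.
    println(s"sidecar removed: ${deleteChecksumFile(temp)}")
  }
}
```

The sketch keys the cleanup on the temp path rather than the final path, which matches the observation that only the temp file's crc is leaked. It does not need to inspect the stream type, so it does not address the concern about depending on `FSOutputSummer`.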
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-522379521

Another possible approach would be to remove the crc file for the temp file when FSOutputSummer is used as the OutputStream:

```
underlyingStream match {
  case e: FSDataOutputStream if e.getWrappedStream.isInstanceOf[FSOutputSummer] =>
    val checksumFile = new Path(tempPath.getParent, s".${tempPath.getName}.crc")
    fm.delete(checksumFile)
  case _ =>
}
```

Adding the above block after `underlyingStream.close()` would also work. This change would touch RenameBasedFSDataOutputStream.
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-522379521

Since the issue only occurs with the temp file, another possible approach would be to remove the crc file for the temp file when FSOutputSummer is used as the OutputStream:

```
underlyingStream match {
  case e: FSDataOutputStream if e.getWrappedStream.isInstanceOf[FSOutputSummer] =>
    val checksumFile = new Path(tempPath.getParent, s".${tempPath.getName}.crc")
    fm.delete(checksumFile)
  case _ =>
}
```

Adding the above block after `underlyingStream.close()` would also work. This change would touch RenameBasedFSDataOutputStream.
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-522374487

Also, I'm not 100% sure of the intention, but if we want to disable crc with the line below, it doesn't seem to work:

```
fc.create(path, EnumSet.of(CREATE, OVERWRITE), CreateOpts.checksumParam(ChecksumOpt.createDisabled()))
```

@skonto already analyzed the relevant part but didn't find any related Hadoop JIRA issue: https://issues.apache.org/jira/browse/SPARK-28025?focusedCommentId=16862339&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16862339
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-522374487

Also, I'm not 100% sure of the intention, but if we intended to disable crc via the line below, it doesn't seem to work:

```
fc.create(path, EnumSet.of(CREATE, OVERWRITE), CreateOpts.checksumParam(ChecksumOpt.createDisabled()))
```

@skonto already analyzed the relevant part but didn't find any related Hadoop JIRA issue: https://issues.apache.org/jira/browse/SPARK-28025?focusedCommentId=16862339&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16862339