[GitHub] [spark] HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files

2019-08-22 Thread GitBox
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-524139791
 
 
   Addressed. The HADOOP JIRA issue is now referenced in both the code comment and the PR description. Please take another look. Thanks in advance!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files

2019-08-18 Thread GitBox
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-522379521
 
 
   Since the issue only occurs with the temp file, another possible approach would be to remove the temp file's crc file when an FSOutputSummer is used as the underlying OutputStream:
   
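   ```
   import org.apache.hadoop.fs.{FSDataOutputStream, FSOutputSummer, Path}

   underlyingStream match {
     // A checksummed stream (FSOutputSummer) writes a sibling ".<name>.crc" file for the temp file
     case e: FSDataOutputStream if e.getWrappedStream.isInstanceOf[FSOutputSummer] =>
       val checksumFile = new Path(tempPath.getParent, s".${tempPath.getName}.crc")
       fm.delete(checksumFile)
     case _ => // no checksum file is expected, nothing to clean up
   }
   ```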
   
   Adding the above block after `underlyingStream.close()` would also work. This change would touch RenameBasedFSDataOutputStream.
   
   However, as its class annotations indicate, `FSOutputSummer` is Unstable and LimitedPrivate, so I'm not sure we want to rely on it.
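For illustration only, here is a Hadoop-free sketch of the same cleanup idea on a local filesystem. It assumes the naming the snippet reconstructs: the checksum file for `dir/name` is `dir/.name.crc`. The object `CrcCleanup` and its helpers are hypothetical, and `java.nio.file` stands in for Spark's `CheckpointFileManager`; no Hadoop API is called.

```scala
import java.nio.file.{Files, Path}

object CrcCleanup {
  // Hypothetical helper: maps "dir/name" to its sibling checksum file "dir/.name.crc",
  // mirroring new Path(tempPath.getParent, s".${tempPath.getName}.crc").
  def checksumPathFor(file: Path): Path =
    file.resolveSibling(s".${file.getFileName}.crc")

  // Hypothetical helper: removes the checksum file left behind for a temp file.
  // Returns true only if a checksum file existed and was deleted.
  def deleteChecksumFor(tempFile: Path): Boolean =
    Files.deleteIfExists(checksumPathFor(tempFile))
}
```

Using `deleteIfExists` makes the cleanup a no-op when no checksum file was written, which plays the same role as the `case _ =>` branch.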





[GitHub] [spark] HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files

2019-08-18 Thread GitBox
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-522379521
 
 
   Another possible approach would be to remove the temp file's crc file when an FSOutputSummer is used as the underlying OutputStream:
   
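   ```
   import org.apache.hadoop.fs.{FSDataOutputStream, FSOutputSummer, Path}

   underlyingStream match {
     // A checksummed stream (FSOutputSummer) writes a sibling ".<name>.crc" file for the temp file
     case e: FSDataOutputStream if e.getWrappedStream.isInstanceOf[FSOutputSummer] =>
       val checksumFile = new Path(tempPath.getParent, s".${tempPath.getName}.crc")
       fm.delete(checksumFile)
     case _ => // no checksum file is expected, nothing to clean up
   }
   ```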
   
   Adding the above block after `underlyingStream.close()` would also work. This change would touch RenameBasedFSDataOutputStream.








[GitHub] [spark] HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files

2019-08-18 Thread GitBox
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-522374487
 
 
   Also, I'm not 100% sure of the intention, but if the line below is meant to disable crc, it doesn't seem to work:
   
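   ```
   import java.util.EnumSet
   import org.apache.hadoop.fs.CreateFlag._
   import org.apache.hadoop.fs.Options.{ChecksumOpt, CreateOpts}

   fc.create(path, EnumSet.of(CREATE, OVERWRITE), CreateOpts.checksumParam(ChecksumOpt.createDisabled()))
   ```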
   
   
   @skonto already analyzed the relevant part but didn't find any related Hadoop JIRA issue:
   
   
https://issues.apache.org/jira/browse/SPARK-28025?focusedCommentId=16862339&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16862339
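To check whether checksum files are still produced despite `ChecksumOpt.createDisabled()`, one option is to scan the directory for checksum files that outlive their data file. The sketch below is a hypothetical, Hadoop-free diagnostic using `java.io.File`; the name `CrcAudit` is illustrative, and it assumes the same `.<name>.crc` naming without calling any Hadoop API.

```scala
import java.io.File

object CrcAudit {
  // Matches ".<name>.crc" and captures "<name>"; Regex extractors match the whole string.
  private val CrcPattern = """^\.(.+)\.crc$""".r

  // Hypothetical diagnostic: returns the names of ".<name>.crc" files in `dir`
  // whose data file "<name>" no longer exists in the same directory.
  def orphanedChecksums(dir: File): Seq[String] = {
    val names = Option(dir.listFiles()).getOrElse(Array.empty[File]).map(_.getName).toSet
    names.toSeq.sorted.collect {
      case n @ CrcPattern(dataName) if !names.contains(dataName) => n
    }
  }
}
```

A non-empty result after a few renames would suggest that checksum files for temp files are being left behind.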
   





[GitHub] [spark] HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files

2019-08-18 Thread GitBox
HeartSaVioR edited a comment on issue #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488#issuecomment-522374487
 
 
   Also, I'm not 100% sure of the intention, but if we intended to disable crc via the line below, it doesn't seem to work:
   
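   ```
   import java.util.EnumSet
   import org.apache.hadoop.fs.CreateFlag._
   import org.apache.hadoop.fs.Options.{ChecksumOpt, CreateOpts}

   fc.create(path, EnumSet.of(CREATE, OVERWRITE), CreateOpts.checksumParam(ChecksumOpt.createDisabled()))
   ```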
   
   
   @skonto already analyzed the relevant part but didn't find any related Hadoop JIRA issue:
   
   
https://issues.apache.org/jira/browse/SPARK-28025?focusedCommentId=16862339&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16862339
   

