[ https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163438#comment-17163438 ]
Anoop Sam John commented on HBASE-23634:
----------------------------------------

bq. Then one issue is how the system can know whether it is a partially written file or a real corruption?

This applies when we split WALs to HFiles and a WAL file split fails partway and is reattempted. At the end of the day there might be some HFiles which are duplicates, and some HFiles may be incomplete. Now while reading back a file (verification while loading it to the cf) we don't know whether it is a partial HFile from a failed attempt or a file that really got corrupted after it was written to the FS. So on a WAL file split failure, a cleanup before the next attempt is important.

Now one problem is that there might be N WAL files, and all of their splits create the HFiles for a region:cf under the same dir, region/cf/recovered.edits. If we want this cleanup, these files should be generated in a way that identifies which WAL file's split produced them. Say the HFiles were placed under a region/cf/recovered.edits/<split wal file name> dir; then we could clean that dir up before doing the next attempt. Thoughts?

> Enable "Split WAL to HFile" by default
> --------------------------------------
>
>                 Key: HBASE-23634
>                 URL: https://issues.apache.org/jira/browse/HBASE-23634
>             Project: HBase
>          Issue Type: Task
>    Affects Versions: 3.0.0-alpha-1, 2.3.0
>            Reporter: Guanghao Zhang
>            Priority: Blocker
>             Fix For: 3.0.0-alpha-1
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)