[ https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530267#comment-13530267 ]

Jonathan Hsieh edited comment on HBASE-7339 at 12/12/12 8:06 PM:
-----------------------------------------------------------------

This was encountered when testing online snapshots, but will affect offline 
snapshots as well.

Suggested solutions:
1) Make opening the hfile-link daughter reference more robust by treating the
file as a reference if treating it as a link fails.  Hacky, but "should" work.
2) Make the regexes used to differentiate references and hfilelinks stricter so
that the two can be told apart.  Hacky, and not sure if it will work.
3) Change the daughter reference file name format to be more robust.  It is
currently '<hfile>.<parentregion>'; maybe change it to '<hfile>@<parentregion>'.
This would then allow '<hfile>-<region>-<table>@<parentregion>' to be
interpreted correctly.  This is the "right way", but it breaks compatibility.
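
To illustrate why option 3 removes the ambiguity, here is a minimal Python sketch, not HBase code. It assumes a simplified, hypothetical link pattern (`LINK_RE`, not HBase's actual regex) in which table names may contain '.', and it assumes '@' can never appear in an hfile, region or table name. The helper names are made up for illustration.

```python
import re

# Hypothetical, simplified link-name pattern for illustration; NOT HBase's regex.
# Matches '<hfile>-<region>-<table>', where the table part is assumed to allow '.'.
LINK_RE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)-([A-Za-z0-9_.\-]+)$")

# Proposed daughter-reference separator (option 3). Assumption: '@' never
# occurs inside an hfile, region or table name.
REF_SEP = "@"


def daughter_reference_name(file_name, parent_region):
    """Hypothetical helper: build '<file>@<parentregion>'."""
    return f"{file_name}{REF_SEP}{parent_region}"


def parse_store_file_name(name):
    """Classify a store file name.

    Returns a dict with the referenced file name, the parent region (None
    unless the name is a daughter reference) and the (hfile, region, table)
    triple when the referenced file is an hfile link (else None).
    """
    referenced, sep, parent = name.rpartition(REF_SEP)
    if not sep:
        # No separator: the whole name is the file itself, not a reference.
        referenced, parent = name, None
    m = LINK_RE.match(referenced)
    return {
        "referenced": referenced,
        "parent_region": parent,
        "link": m.groups() if m else None,
    }


ref = daughter_reference_name("abc123-d41d8cd9-mytable", "9f8e7d6c")
print(ref)                                 # -> abc123-d41d8cd9-mytable@9f8e7d6c
print(parse_store_file_name(ref)["link"])  # -> ('abc123', 'd41d8cd9', 'mytable')
```

Splitting off the parent region at the '@' before applying the link pattern keeps the table name intact. Existing references written with '.' would no longer be recognized as references, which is the compatibility break noted in option 3.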

Other follow-ons: ideally we would be more robust, quarantining a bad region or
its hfiles/link files once it has killed a few nodes in the cluster.
                
                  
> Splitting a hfilelink causes region servers to go down.
> -------------------------------------------------------
>
>                 Key: HBASE-7339
>                 URL: https://issues.apache.org/jira/browse/HBASE-7339
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>            Priority: Blocker
>             Fix For: hbase-6055
>
>
> Steps:
> - Have a single-region table with 15 hfiles in it.
> - Snapshot it (this was done using the online snapshot from HBASE-7321).
> - Clone the snapshot.
> - The region's post-open task attempts to compact the region.  The compaction
> policy does not compact all the files (the default limit seems to be 10).
> - After compaction, the region contains a mix of hfile links and real hfiles.
> - The region starts splitting.
> - Split references are created, but opening the daughters fails:
> - the hfile links are "split", creating hfile link daughter references named
> {{<<hfile>-<region>-<table>>.<parentregion>}}
> - these "split" hfile links are then interpreted as hfile links whose table is
> {{<table>.<parentregion>}} ->
> {{<<hfile>-<region>>-<<table>.<parentregion>>}}  (the groupings are
> interpreted incorrectly)
> - Since this happens after the split's point of no return (PONR), the region
> server aborts.  It then spreads to the next server.
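
The misparse in the last steps can be reproduced with a toy parser. This is a minimal Python sketch, not HBase code: `LINK_RE`, `parse_link` and `daughter_reference_name` are hypothetical, and table names are assumed to allow '.', which is what lets the link pattern swallow the '.<parentregion>' suffix.

```python
import re

# Hypothetical, simplified link-name pattern for illustration; NOT HBase's regex.
# Matches '<hfile>-<region>-<table>', where the table part is assumed to allow '.'.
LINK_RE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)-([A-Za-z0-9_.\-]+)$")


def parse_link(name):
    """Return (hfile, region, table) if name looks like an hfile link, else None."""
    m = LINK_RE.match(name)
    return m.groups() if m else None


def daughter_reference_name(file_name, parent_region):
    """Hypothetical helper mimicking the '<file>.<parentregion>' reference naming."""
    return f"{file_name}.{parent_region}"


link = "abc123-d41d8cd9-mytable"
ref = daughter_reference_name(link, "9f8e7d6c")

# The link itself parses as intended.
print(parse_link(link))  # -> ('abc123', 'd41d8cd9', 'mytable')
# The daughter reference is mistaken for a link to table 'mytable.9f8e7d6c'.
print(parse_link(ref))   # -> ('abc123', 'd41d8cd9', 'mytable.9f8e7d6c')
```

Because the hfile and region groups cannot contain '-', the only way for the pattern to match is to fold '.<parentregion>' into the table group, so the reference is silently misread as a link rather than rejected.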

