[ https://issues.apache.org/jira/browse/HBASE-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544006#comment-13544006 ]
Jonathan Hsieh commented on HBASE-7419:
---------------------------------------

Next rev in review board? (easier to get context)

The new regexes don't handle the {{_SeqId_[0-9]+}} stuff generated by bulk loads. I think this means links to bulk loaded files could fail to be recognized? (Add some in the regex tests?)

{code}
-  public static final String REF_NAME_REGEX = "^([0-9a-f]+(?:_SeqId_[0-9]+_)?)(?:\\.(.+))?$";
-  private static final Pattern REF_NAME_PARSER = Pattern.compile(REF_NAME_REGEX);
-
-  /**
-   * Regex strictly for references to hfilelinks. (<hfile>-<region>-<table>.<parentEncRegion>).
-   * Group 1 is this file's hfilelink name. Group 2 the referenced parent region name. The '.'
-   * char is valid in table names but group 2's regex is greedy and interprets the table names
-   * correctly. The _SeqId_ portion comes from bulk loaded files.
-   */
-  public static final String REF_TO_LINK_REGEX = "^([0-9a-f]+(?:_SeqId_[0-9]+_)?-[0-9a-f]+-"
-      + HTableDescriptor.VALID_USER_TABLE_REGEX + "+)\\.([^.]+)$";
{code}

> revisit hfilelink file name format.
> -----------------------------------
>
>                 Key: HBASE-7419
>                 URL: https://issues.apache.org/jira/browse/HBASE-7419
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jonathan Hsieh
>            Assignee: Matteo Bertozzi
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: HBASE-7419-v0.patch
>
>
> A valid table name concatenated with a '.' and a valid region name is also a
> valid table name, which leads to incorrect interpretation.
> {code}
> true hfile name constraints: [0-9]+(?:_SeqID_[0-9]+)?
> region name constraints    : [a-f0-9]{16}  (but we currently just use [a-f0-9]+.)
> table name constraints     : [a-zA-Z0-9_][a-zA-Z0-9_.-]*
> {code}
> Notice that the table name constraints completely cover all region name
> constraints and true hfile name constraints.
> (A valid hfile name is a valid part of a table name, and a valid enc region
> name is a valid part of a table name.)
> Currently the hfilelink filename convention is <hfile>-<region>-<table>.
> Unfortunately, making a ref to this uses the name
> <hfile>-<region>-<table>.<parentregion> -- the concatenation
> <table>.<parentregion> is itself a valid table name and used to get
> interpreted as such.
> The fix in HBASE-7339 requires a FileNotFoundException before going down the
> hfile link resolution path.
> Regardless of what we do, we need to add some char invalid for table names to
> the hfilelink or reference filename convention.
> Suggestion: if we changed the order of the hfile-link name we could avoid
> some of the confusion -- <table>@<region>-<hfile>.<parentregion> (or some
> other separator char than '@') could be used to avoid handling on the initial
> FileNotFoundException, but I think we'd still need a good chunk of the logic
> to handle opening a half-storefile reader through an hfilelink.
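
To make the two concerns above concrete, here is a minimal standalone sketch (not the patch's code). It assumes the table-name character class from the description and the {{_SeqId_[0-9]+_}} suffix from the quoted regexes; the class, constant, and method names are hypothetical, and it does not use {{HTableDescriptor.VALID_USER_TABLE_REGEX}}. It shows that a link pattern without the SeqId group rejects a bulk loaded name, and that a reference name also parses as a plain link whose table name contains the parent region.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: these names are hypothetical and not part of HBase. */
final class HFileLinkNameSketch {
    // Table-name constraint as written in the issue description.
    static final String TABLE = "[a-zA-Z0-9_][a-zA-Z0-9_.-]*";
    // Hfile name with the optional bulk-load suffix, as in the quoted REF_NAME_REGEX.
    static final String HFILE = "[0-9a-f]+(?:_SeqId_[0-9]+_)?";
    // Hfile name without the bulk-load suffix (the case the comment warns about).
    static final String HFILE_NO_SEQID = "[0-9a-f]+";

    // Current link convention: <hfile>-<region>-<table>
    static final Pattern LINK =
        Pattern.compile("^" + HFILE + "-[0-9a-f]+-" + TABLE + "$");
    static final Pattern LINK_NO_SEQID =
        Pattern.compile("^" + HFILE_NO_SEQID + "-[0-9a-f]+-" + TABLE + "$");
    // Reference to a link: <hfile>-<region>-<table>.<parentregion>
    static final Pattern REF_TO_LINK =
        Pattern.compile("^(" + HFILE + "-[0-9a-f]+-" + TABLE + ")\\.([^.]+)$");

    static boolean isLink(String name) {
        return LINK.matcher(name).matches();
    }

    static boolean isLinkIgnoringSeqId(String name) {
        return LINK_NO_SEQID.matcher(name).matches();
    }

    /** Returns the parent region if the name parses as a reference to a link, else null. */
    static String parentRegionOf(String name) {
        Matcher m = REF_TO_LINK.matcher(name);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String bulkLoaded = "abc123_SeqId_4_-0123456789abcdef-mytable";
        String reference = "abc123-0123456789abcdef-mytable.fedcba9876543210";

        // Bulk loaded link: only the SeqId-aware pattern recognizes it.
        System.out.println("bulk, SeqId-aware:   " + isLink(bulkLoaded));
        System.out.println("bulk, SeqId-unaware: " + isLinkIgnoringSeqId(bulkLoaded));

        // Ambiguity: the reference name is also accepted as a plain link.
        System.out.println("reference as link:   " + isLink(reference));
        System.out.println("parent region:       " + parentRegionOf(reference));
    }
}
```

Names like the two in {{main}} could be added to the regex tests. The second pair of checks is why the description argues for a separator character that is invalid in table names.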