[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913610#comment-13913610 ]

Matteo Bertozzi commented on HBASE-10615:
-----------------------------------------

I'm not convinced by this approach...

Let's say you copy hbase.rootdir: what exactly are you going to bulk load?
If you bulk load only /hbase/table and skip the references or links, you lose
data.

An example for links is clone_snapshot: the whole table is made of link files,
so if you try to bulk load it you end up skipping every file...

An example for references: you upload the parent region's data but not the
daughter regions' reference files. The CatalogJanitor then kicks in and removes
the parent, since nothing references it anymore, and your data is lost...
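
To make this concrete, here is a minimal sketch (not from the attached patch) that
walks a family dir and classifies each file, assuming the 0.96-era helpers
HFileLink.isHFileLink(...) and StoreFileInfo.isReference(...); the directory path
is a placeholder. Neither a link nor a reference carries cell data itself, which
is why silently skipping them drops rows:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.HFileLink;
import org.apache.hadoop.hbase.regionserver.StoreFileInfo;

public class FamilyDirAudit {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    // Placeholder: a family dir from a copied-out region or a cloned table.
    Path familyDir = new Path(args[0]);
    for (FileStatus st : fs.listStatus(familyDir)) {
      Path p = st.getPath();
      if (HFileLink.isHFileLink(p)) {
        // Link file: the real hfile lives under the source table or the archive dir.
        System.out.println("link      " + p.getName());
      } else if (StoreFileInfo.isReference(p)) {
        // Reference file: a small marker pointing at half of a parent-region hfile.
        System.out.println("reference " + p.getName());
      } else {
        // Plain hfile containing actual cell data.
        System.out.println("hfile     " + p.getName());
      }
    }
  }
}
{code}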

Can you describe the use case in more detail, and give examples of how you tested it?

> Make LoadIncrementalHFiles skip reference files
> -----------------------------------------------
>
>                 Key: HBASE-10615
>                 URL: https://issues.apache.org/jira/browse/HBASE-10615
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.96.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>         Attachments: HBASE-10615-trunk.patch
>
>
> There is a use case where the source of hfiles for LoadIncrementalHFiles is a 
> FileSystem copy-out/backup of an HBase table or of archived hfiles.  For example:
> 1. A copy-out of hbase.rootdir, a table dir, a region dir (after disable), or 
> the archive dir.
> 2. ExportSnapshot
> In these cases it is possible that there are reference files in the family 
> dirs.
> We have such use cases; when trying to load the files back into HBase, we get
> {code}
> Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://HDFS-AMR/tmp/restoreTemp/117182adfe861c5d2b607da91d60aa8a/info/aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd
>         at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:570)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:594)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:636)
>         at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:472)
>         at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:393)
>         at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:391)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>         at java.lang.Thread.run(Thread.java:738)
> Caused by: java.lang.IllegalArgumentException: Invalid HFile version: 16715777 (expected to be between 2 and 2)
>         at org.apache.hadoop.hbase.io.hfile.HFile.checkFormatVersion(HFile.java:927)
>         at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:426)
>         at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:568)
> {code}
> It is desirable and safe to skip these reference files, since they don't 
> contain any real data for bulk load purposes.
>   
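
For the copy-out scenario described above, one workaround that works today,
independent of the attached patch, is to stage only the plain hfiles and bulk load
the staged directory. The sketch below is illustrative only, assuming 0.96-era
APIs (StoreFileInfo.isReference, HFileLink.isHFileLink,
LoadIncrementalHFiles.doBulkLoad); the paths and table name are placeholders. As
noted in the comment above, any rows that exist only behind reference or link
files will not be loaded this way.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.HFileLink;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.regionserver.StoreFileInfo;

public class FilteredBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path copiedRegionDir = new Path(args[0]); // copied-out region dir (placeholder)
    Path stagingDir = new Path(args[1]);      // scratch dir laid out as <family>/<hfile>
    String tableName = args[2];

    for (FileStatus family : fs.listStatus(copiedRegionDir)) {
      // Skip non-family entries such as .regioninfo or .tmp.
      if (!family.isDir() || family.getPath().getName().startsWith(".")) continue;
      Path outFamily = new Path(stagingDir, family.getPath().getName());
      fs.mkdirs(outFamily);
      for (FileStatus f : fs.listStatus(family.getPath())) {
        Path p = f.getPath();
        // Reference and link files hold no HFile data and trigger the
        // "Invalid HFile version" error in the loader, so leave them out.
        if (StoreFileInfo.isReference(p) || HFileLink.isHFileLink(p)) continue;
        FileUtil.copy(fs, p, fs, new Path(outFamily, p.getName()), false, conf);
      }
    }

    // Standard bulk load against the staged copy.
    new LoadIncrementalHFiles(conf).doBulkLoad(stagingDir, new HTable(conf, tableName));
  }
}
{code}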


