[ https://issues.apache.org/jira/browse/HBASE-21077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586393#comment-16586393 ]

Vladimir Rodionov commented on HBASE-21077:
-------------------------------------------

The issue was in the code that checks (filters) WAL files eligible for 
incremental backup against the list of WAL files that have already been backed 
up. This check (filter) always failed to find any WAL file in the 
already-backed-up list, so ALL WAL files were always considered eligible for 
incremental backup, even those that had already been deleted by 
BackupLogCleaner.
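For context, a minimal sketch of this failure mode (illustrative class and 
method names only, not the actual HBase code or the patch): if the 
already-backed-up list stores WAL names in one form (say, bare file names) 
while the candidates are compared in another (say, full hdfs:// paths), the 
membership test never matches and nothing is ever filtered out:

{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only -- not the actual HBase code. It demonstrates
// the class of bug described above: the two sides of the comparison use
// different representations of the same WAL, so contains() never matches.
public class WalFilterSketch {

  // Buggy pattern: backedUpWals holds bare file names such as
  // "host%2C16020%2C1528776085205.1528776160915", while candidateWals
  // holds full paths such as "hdfs://ns1/apps/hbase/data/oldWALs/host%2C...".
  static List<String> filterEligibleBuggy(List<String> candidateWals,
                                          List<String> backedUpWals) {
    Set<String> backedUp = new HashSet<>(backedUpWals);
    return candidateWals.stream()
        .filter(wal -> !backedUp.contains(wal)) // never matches -> nothing filtered
        .collect(Collectors.toList());
  }

  // Fixed pattern: normalize both sides to the bare file name before comparing.
  static List<String> filterEligibleFixed(List<String> candidateWals,
                                          List<String> backedUpWals) {
    Set<String> backedUp = backedUpWals.stream()
        .map(WalFilterSketch::baseName)
        .collect(Collectors.toSet());
    return candidateWals.stream()
        .filter(wal -> !backedUp.contains(baseName(wal)))
        .collect(Collectors.toList());
  }

  private static String baseName(String path) {
    int slash = path.lastIndexOf('/');
    return slash < 0 ? path : path.substring(slash + 1);
  }
}
{code}

With the buggy filter, WALs that were already backed up stay in the MR job's 
input even after BackupLogCleaner has removed them from oldWALs, which is 
exactly the FileNotFoundException seen below. One way to fix it, as in 
filterEligibleFixed, is to make both sides of the comparison use the same 
representation of the WAL name.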

> MR job launched by hbase incremental backup command failed with 
> FileNotFoundException
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-21077
>                 URL: https://issues.apache.org/jira/browse/HBASE-21077
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Major
>         Attachments: HBASE-21077-v1.patch
>
>
> Discovered during internal testing by Romil Choksi.
> The MR job launched by the hbase incremental backup command failed with a
> FileNotFoundException. From the test console log:
> {code}
> 2018-06-12 04:27:31,160|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:31,160 INFO 
>  [main] mapreduce.JobSubmitter: Submitting tokens for job: 
> job_1528766389356_0044
> 2018-06-12 04:27:31,186|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:31,184 INFO 
>  [main] mapreduce.JobSubmitter: Executing with tokens: [Kind: 
> HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns1, Ident: (token for hbase: 
> HDFS_DELEGATION_TOKEN owner=hb...@example.com, renewer=yarn, realUser=, 
> issueDate=1528777648605, maxDate=1529382448605, sequenceNumber=175, 
> masterKeyId=2), Kind: kms-dt, Service: 172.27.68.203:9393, Ident: (kms-dt 
> owner=hbase, renewer=yarn, realUser=, issueDate=1528777649149, 
> maxDate=1529382449149, sequenceNumber=49, masterKeyId=2), Kind: 
> HBASE_AUTH_TOKEN, Service: bc71e347-78ff-4f95-af44-006f9b549a84, Ident: 
> (org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier@5), 
> Kind: kms-dt, Service: 172.27.52.14:9393, Ident: (kms-dt owner=hbase, 
> renewer=yarn, realUser=, issueDate=1528777648918, maxDate=1529382448918, 
> sequenceNumber=50, masterKeyId=2)]
> 2018-06-12 04:27:31,477|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:31,477 INFO 
>  [main] conf.Configuration: found resource resource-types.xml at 
> file:/etc/hadoop/3.0.0.0-1476/0/resource-types.xml
> 2018-06-12 04:27:31,527|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:31,527 INFO 
>  [main] impl.TimelineClientImpl: Timeline service address: 
> ctr-e138-1518143905142-359429-01-000004.hwx.site:8190
> 2018-06-12 04:27:32,563|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:32,562 INFO 
>  [main] impl.YarnClientImpl: Submitted application 
> application_1528766389356_0044
> 2018-06-12 04:27:32,634|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:32,634 INFO 
>  [main] mapreduce.Job: The url to track the job: 
> https://ctr-e138-1518143905142-359429-01-000003.hwx.site:8090/proxy/application_1528766389356_0044/
> 2018-06-12 04:27:32,635|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:32,635 INFO 
>  [main] mapreduce.Job: Running job: job_1528766389356_0044
> 2018-06-12 04:27:44,807|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:44,806 INFO 
>  [main] mapreduce.Job: Job job_1528766389356_0044 running in uber mode : false
> 2018-06-12 04:27:44,809|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:44,809 INFO 
>  [main] mapreduce.Job:  map 0% reduce 0%
> 2018-06-12 04:27:54,926|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:54,925 INFO 
>  [main] mapreduce.Job:  map 5% reduce 0%
> 2018-06-12 04:27:56,950|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:56,950 INFO 
>  [main] mapreduce.Job: Task Id : attempt_1528766389356_0044_m_000002_0, 
> Status : FAILED
> 2018-06-12 04:27:56,979|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|Error: 
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/apps/hbase/data/oldWALs/ctr-e138-1518143905142-359429-01-000004.hwx.site%2C16020%2C1528776085205.1528776160915
> 2018-06-12 04:27:56,979|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
> 2018-06-12 04:27:56,980|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
> 2018-06-12 04:27:56,980|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 2018-06-12 04:27:56,980|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
> 2018-06-12 04:27:56,980|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:64)
> 2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:165)
> 2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:289)
> 2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:271)
> 2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:259)
> 2018-06-12 04:27:56,981|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:395)
> 2018-06-12 04:27:56,982|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.openReader(AbstractFSWALProvider.java:449)
> 2018-06-12 04:27:56,982|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hbase.mapreduce.WALInputFormat$WALRecordReader.openReader(WALInputFormat.java:166)
> 2018-06-12 04:27:56,982|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.hbase.mapreduce.WALInputFormat$WALRecordReader.initialize(WALInputFormat.java:158)
> 2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:560)
> 2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:798)
> 2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
> 2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> 2018-06-12 04:27:56,983|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> java.security.AccessController.doPrivileged(Native Method)
> 2018-06-12 04:27:56,984|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> javax.security.auth.Subject.doAs(Subject.java:422)
> 2018-06-12 04:27:56,984|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
> 2018-06-12 04:27:56,984|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|at 
> org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 2018-06-12 04:27:56,984|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|
> {code}
> But it looks like the job did find the file on HDFS when scanning for input; 
> the test script runs the incremental backup command as the HBase user.
> {code}
> 2018-06-12 04:27:30,756|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:30,755 
> DEBUG [main] mapreduce.WALInputFormat: Scanning 
> hdfs://ns1/apps/hbase/data/oldWALs/ctr-e138-1518143905142-359429-01-000004.hwx.site%2C16020%2C1528776085205.1528776160915
>  for WAL files
> 2018-06-12 04:27:30,758|INFO|MainThread|machine.py:167 - 
> run()||GUID=cb1d85c9-023c-4bc5-bf87-9c231c917eab|2018-06-12 04:27:30,758 INFO 
>  [main] mapreduce.WALInputFormat: Found: 
> HdfsLocatedFileStatus{path=hdfs://ns1/apps/hbase/data/oldWALs/ctr-e138-1518143905142-359429-01-000004.hwx.site%2C16020%2C1528776085205.1528776160915;
>  isDirectory=false; length=18031; replication=3; blocksize=268435456; 
> modification_time=1528776689363; access_time=1528776160921; owner=hbase; 
> group=hdfs; permission=rwx--x--x; isSymlink=false; hasAcl=false; 
> isEncrypted=false; isErasureCoded=false}
> {code}


