[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696227#comment-16696227
 ] 

Ted Yu commented on HBASE-21387:


It looks like catching FileNotFoundException is not enough to make the new test pass.

Let's go with v17.
{code}
+LOG.debug("toDeleteFiles[{}] is: " + deletableFiles.get(i));
{code}
Minor: it looks like you intended to log both the index and the FileStatus, but only one argument is passed above.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> In a recent customer report, ExportSnapshot failed with:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in the log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when RefreshCacheTask runs refreshCache, an in-progress snapshot 
> is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check the 
> in-progress snapshot(s). However, the snapshot has completed by that time, so 
> file(s) it references are deemed unreferenced.
> Here is a timeline from Josh illustrating the scenario:
> * T0: we are checking whether F1 is referenced.
> * T1: snapshot S1, which references file F1, is in progress. refreshCache() is called, but no completed snapshot references F1.
> * T2: S1 completes.
> * T3: we check the in-progress snapshots, and S1 is no longer among them.
> Thus F1 is marked as unreferenced even though S1 references it.
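A minimal sketch of one way to close the T1 to T3 window, assuming snapshot-taking and the cleaner share a read/write lock: each in-progress snapshot holds the read lock, and the cleaner only computes unreferenced files while holding the write lock, i.e. when no snapshot is in progress, so the completed-snapshot view cannot change underneath it. The class and method names are hypothetical, and this is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch, not HBase code: snapshot-taking and the cleaner share a
 * read/write lock so a snapshot cannot go from "in progress" to "completed"
 * while the cleaner is deciding which files are unreferenced.
 */
class GuardedSnapshotCacheSketch {
  // Each in-progress snapshot holds the read lock (several may run at once);
  // the cleaner needs the write lock, which is free only when none is in progress.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Files referenced by completed snapshots.
  private final Set<String> completedRefs = Collections.synchronizedSet(new HashSet<>());

  /** Marks a snapshot as in progress; pair with completeSnapshot on the same thread. */
  void beginSnapshot() {
    lock.readLock().lock();
  }

  /** Publishes the snapshot's references, then releases its read lock. */
  void completeSnapshot(Collection<String> referencedFiles) {
    try {
      completedRefs.addAll(referencedFiles);
    } finally {
      lock.readLock().unlock();
    }
  }

  /**
   * Returns the candidates that no completed snapshot references. If any snapshot
   * is in progress, nothing is reported as deletable; the cleaner retries next run.
   */
  List<String> getUnreferencedFiles(Collection<String> candidates) {
    if (!lock.writeLock().tryLock()) {
      return Collections.emptyList();
    }
    try {
      List<String> unreferenced = new ArrayList<>();
      for (String file : candidates) {
        if (!completedRefs.contains(file)) {
          unreferenced.add(file);
        }
      }
      return unreferenced;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

The trade-off of this design is that the cleaner deletes nothing on a run that overlaps any snapshot, so with snapshots running back to back, archived files may stay around longer than usual.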



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-22 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Attachment: 21387-suggest.txt

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-20 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693142#comment-16693142
 ] 

Ted Yu commented on HBASE-21387:


I was aware of the above JIRA.

Thanks for the unit test, Zheng.



> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 0001-UT.patch, 21387.dbg.txt, 21387.v10.txt, 
> 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, HBASE-21387.v13.patch, 
> HBASE-21387.v14.patch, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>





[jira] [Commented] (HBASE-21478) Make table sorted when displaying rsgroup info in shell and master web UI

2018-11-18 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690975#comment-16690975
 ] 

Ted Yu commented on HBASE-21478:


bq. what about adding another private member as a copy of tables into 
RSGroupInfo

Have you considered the memory consumed by the extra SortedSet?
You can try this approach.
Please make it obvious from the private field's name that it is for display only.

> Make table sorted when displaying rsgroup info in shell and master web UI
> -
>
> Key: HBASE-21478
> URL: https://issues.apache.org/jira/browse/HBASE-21478
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
>
> In the output of the "get_rsgroup" command in the HBase shell, and in the 
> "Server Group" section of the HMaster web UI, the tables are not sorted, which 
> makes them hard to read, for example:
> {code}
> hbase(main):003:0> get_rsgroup 'default'
> GROUP INFORMATION
> ...
> Tables:
> table3
> ns2:table22
> table1
> ns1:table11
> ...
> {code}
> They could be sorted by namespace, then by table name:
> {code}
> table1
> table3
> ns1:table11
> ns2:table22
> {code}
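A rough sketch of the requested ordering, assuming table names arrive as "ns:table" strings where a name without a prefix belongs to the default namespace (the helper class is hypothetical). Sorting into a fresh list at display time is one way to avoid keeping a second SortedSet in RSGroupInfo, which is what the memory question in the comment is about.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: sorts "ns:table" names by namespace, then by table name. */
final class RsGroupTableSortSketch {
  // Assumption: a name without ':' belongs to the "default" namespace.
  static final String DEFAULT_NAMESPACE = "default";

  static String namespaceOf(String table) {
    int sep = table.indexOf(':');
    return sep < 0 ? DEFAULT_NAMESPACE : table.substring(0, sep);
  }

  static String qualifierOf(String table) {
    int sep = table.indexOf(':');
    return sep < 0 ? table : table.substring(sep + 1);
  }

  static final Comparator<String> BY_NAMESPACE_THEN_NAME =
      Comparator.comparing(RsGroupTableSortSketch::namespaceOf)
          .thenComparing(RsGroupTableSortSketch::qualifierOf);

  /** Builds a sorted copy on demand instead of storing a second, sorted collection. */
  static List<String> sortedForDisplay(Collection<String> tables) {
    List<String> copy = new ArrayList<>(tables);
    copy.sort(BY_NAMESPACE_THEN_NAME);
    return copy;
  }
}
```

Because namespaces are compared as plain strings, "default" would land after a namespace such as "abc"; if default-namespace tables must always come first, the comparator would need a special case.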





[jira] [Updated] (HBASE-21141) Enable MOB in backup / restore test involving incremental backup

2018-11-16 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21141:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, Artem.

> Enable MOB in backup / restore test involving incremental backup
> 
>
> Key: HBASE-21141
> URL: https://issues.apache.org/jira/browse/HBASE-21141
> Project: HBase
>  Issue Type: Test
>  Components: backuprestore
>    Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
>  Labels: mob
> Fix For: 3.0.0
>
> Attachments: HBASE-21141.v01.patch, HBASE-21141.v02.patch, 
> HBASE-21141.v03.patch, HBASE-21141.v04.patch
>
>
> Currently only one test (TestRemoteBackup) enables the MOB feature, and that 
> test only performs a full backup.
> This issue is to enable MOB in backup / restore test(s) involving incremental 
> backup.





[jira] [Commented] (HBASE-21482) TestHRegion fails due to 'Too many open files'

2018-11-16 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689903#comment-16689903
 ] 

Ted Yu commented on HBASE-21482:


Toward the end of hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.TestHRegion-output.txt:

branch-2:
{code}
2018-11-15 19:05:42,036 INFO  [Time-limited test] hbase.ResourceChecker(172): 
after: regionserver.TestHRegion#testCheckAndDelete_ThatDeleteWasWritten 
Thread=85 (was 85), OpenFileDescriptor=1276 (was 1273) - OpenFileDescriptor 
LEAK? -, MaxFileDescriptor=32000 (was 32000), SystemLoadAverage=149 (was 149), 
ProcessCount=361 (was 361), AvailableMemoryMB=36487 (was 36488)
{code}
master branch:
{code}
2018-11-16 19:06:59,290 INFO  [Time-limited test] hbase.ResourceChecker(172): 
after: regionserver.TestHRegion#testCheckAndDelete_ThatDeleteWasWritten 
Thread=79 (was 78) - Thread LEAK? -, OpenFileDescriptor=31932 (was 31934), 
MaxFileDescriptor=32000 (was 32000), SystemLoadAverage=82 (was 82), 
ProcessCount=363 (was 363), AvailableMemoryMB=36785 (was 36784) - 
AvailableMemoryMB LEAK? -
2018-11-16 19:06:59,290 WARN  [Time-limited test] hbase.ResourceChecker(135): 
OpenFileDescriptor=31932 is superior to 1024
{code}
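For reference, the master-branch run above is at 31932 of 32000 descriptors (about 99.8%), versus 1276 on branch-2. A small sketch for reading the same two numbers from inside a JVM, assuming a HotSpot/OpenJDK JVM on a Unix-like OS where com.sun.management.UnixOperatingSystemMXBean is available; the helper class is hypothetical and is not taken from ResourceChecker.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.Locale;

/** Hypothetical sketch: reads the JVM's open/max file-descriptor counts on Unix. */
final class FdUsageSketch {
  /** Returns {open, max}, or {-1, -1} if the platform bean does not expose them. */
  static long[] fileDescriptorCounts() {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
      com.sun.management.UnixOperatingSystemMXBean unix =
          (com.sun.management.UnixOperatingSystemMXBean) os;
      return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
    }
    return new long[] {-1L, -1L};
  }

  /** Fraction of the descriptor limit in use, or NaN if the limit is unknown. */
  static double usageRatio(long open, long max) {
    return max <= 0 ? Double.NaN : (double) open / max;
  }

  /** One-line summary, e.g. for a before/after comparison around a test method. */
  static String describe(long open, long max) {
    return String.format(Locale.ROOT, "OpenFileDescriptor=%d, MaxFileDescriptor=%d (%.1f%% used)",
        open, max, 100 * usageRatio(open, max));
  }
}
```

For example, describe(31932, 32000) yields "OpenFileDescriptor=31932, MaxFileDescriptor=32000 (99.8% used)".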

> TestHRegion fails due to 'Too many open files'
> --
>
> Key: HBASE-21482
> URL: https://issues.apache.org/jira/browse/HBASE-21482
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.regionserver.TestHRegion-output.txt, 
> org.apache.hadoop.hbase.regionserver.TestHRegion.txt
>
>
> TestHRegion fails due to 'Too many open files' in the master branch.
> Here is one failed subtest:
> {code}
> testCheckAndDelete_ThatDeleteWasWritten(org.apache.hadoop.hbase.regionserver.TestHRegion)  Time elapsed: 2.373 sec  <<< ERROR!
> java.lang.IllegalStateException: failed to create a child event loop
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
> Caused by: org.apache.hbase.thirdparty.io.netty.channel.ChannelException: failed to open a new selector
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
> Caused by: java.io.IOException: Too many open files
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
>   at org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
> {code}





[jira] [Commented] (HBASE-21141) Enable MOB in backup / restore test involving incremental backup

2018-11-16 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689834#comment-16689834
 ] 

Ted Yu commented on HBASE-21141:


You can shrink the following into a one-line comment:
{code}
+//although split fail, this may not affect following check
+//In old split without AM2, if region's best split key is not found,
+//there are not exception thrown. But in current API, exception
+//will be thrown.
{code}
That would be 3 fewer lines.

> Enable MOB in backup / restore test involving incremental backup
> 
>
> Key: HBASE-21141
> URL: https://issues.apache.org/jira/browse/HBASE-21141
> Project: HBase
>  Issue Type: Test
>  Components: backuprestore
>    Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
>  Labels: mob
> Attachments: HBASE-21141.v01.patch, HBASE-21141.v02.patch, 
> HBASE-21141.v03.patch
>
>





[jira] [Commented] (HBASE-21141) Enable MOB in backup / restore test involving incremental backup

2018-11-16 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689825#comment-16689825
 ] 

Ted Yu commented on HBASE-21141:


W.r.t. the long method body, can you reduce the method length to <= 150 lines by:

* dropping some debug logs
* removing some empty lines

Thanks

> Enable MOB in backup / restore test involving incremental backup
> 
>
> Key: HBASE-21141
> URL: https://issues.apache.org/jira/browse/HBASE-21141
> Project: HBase
>  Issue Type: Test
>  Components: backuprestore
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
>  Labels: mob
> Attachments: HBASE-21141.v01.patch, HBASE-21141.v02.patch, 
> HBASE-21141.v03.patch
>
>





[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-11-16 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689816#comment-16689816
 ] 

Ted Yu commented on HBASE-21246:


Patch v43 is mostly formatting changes on top of v41.

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.37.txt, 21246.39.txt, 21246.41.txt, 21246.43.txt, 
> 21246.HBASE-20952.001.patch, 21246.HBASE-20952.002.patch, 
> 21246.HBASE-20952.004.patch, 21246.HBASE-20952.005.patch, 
> 21246.HBASE-20952.007.patch, 21246.HBASE-20952.008.patch, 
> replication-src-creates-wal-reader.jpg, wal-factory-providers.png, 
> wal-providers.png, wal-splitter-reader.jpg, wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation can 
> be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent the 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.
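A hypothetical sketch of the idea, not the interface being committed here: callers depend only on getName(), whether the WAL is a file in a distributed filesystem or a log stream. The type names and the sample path in the usage check are made up for illustration.

```java
import java.util.Objects;

/** Hypothetical stand-in for the proposed interface. */
interface WALIdentitySketch {
  /** A file name for filesystem-backed WALs, or a stream name for stream-backed WALs. */
  String getName();
}

/** Filesystem-backed WAL: the name is the last component of the path. */
final class FsWALIdentitySketch implements WALIdentitySketch {
  private final String path;

  FsWALIdentitySketch(String path) {
    this.path = Objects.requireNonNull(path);
  }

  @Override
  public String getName() {
    return path.substring(path.lastIndexOf('/') + 1);
  }
}

/** Stream-backed WAL: the name is the log stream's name. */
final class StreamWALIdentitySketch implements WALIdentitySketch {
  private final String streamName;

  StreamWALIdentitySketch(String streamName) {
    this.streamName = Objects.requireNonNull(streamName);
  }

  @Override
  public String getName() {
    return streamName;
  }
}
```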





[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface

2018-11-16 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21246:
---
Attachment: 21246.43.txt

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.37.txt, 21246.39.txt, 21246.41.txt, 21246.43.txt, 
> 21246.HBASE-20952.001.patch, 21246.HBASE-20952.002.patch, 
> 21246.HBASE-20952.004.patch, 21246.HBASE-20952.005.patch, 
> 21246.HBASE-20952.007.patch, 21246.HBASE-20952.008.patch, 
> replication-src-creates-wal-reader.jpg, wal-factory-providers.png, 
> wal-providers.png, wal-splitter-reader.jpg, wal-splitter-writer.jpg
>
>





[jira] [Commented] (HBASE-21141) Enable MOB in backup / restore test involving incremental backup

2018-11-16 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689701#comment-16689701
 ] 

Ted Yu commented on HBASE-21141:


We're close.
{code}
+  // #3 - incremental backup for multiple tables
{code}
#3 is repeated. Do you mind renumbering the steps so that they are easier to 
follow?

Please leave a blank line before each step for readability.
{code}
+  LOG.debug("mob has " + TEST_UTIL.countRows(hTable, mobName) + " rows");
+  Assert.assertEquals(TEST_UTIL.countRows(hTable, mobName), NB_ROWS_MOB);
{code}
countRows(hTable, mobName) is called twice: once for the log message and once 
for the assertion. Can you store the count in a variable so that the rows are 
counted only once?

The same applies to countRows(hTable, famName) and countRows(hTable, fam2Name).
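A minimal sketch of the suggested change. A hypothetical countRows stand-in (a map lookup) replaces TEST_UTIL.countRows(hTable, family) so the snippet runs on its own; the point is that the count is computed once and reused for both the log message and the assertion.

```java
import java.util.Map;
import java.util.logging.Logger;

/** Sketch of the "count once, reuse" pattern; countRows is a hypothetical stand-in. */
final class CountOnceSketch {
  private static final Logger LOG = Logger.getLogger(CountOnceSketch.class.getName());
  static int countRowsCalls = 0;

  // Hypothetical stand-in for TEST_UTIL.countRows(hTable, family), which scans the table.
  static int countRows(Map<String, Integer> table, String family) {
    countRowsCalls++;
    return table.getOrDefault(family, 0);
  }

  static void assertFamilyRowCount(Map<String, Integer> table, String family, int expected) {
    int rows = countRows(table, family); // one scan, reused below
    LOG.fine(family + " has " + rows + " rows");
    // JUnit form: Assert.assertEquals(expected, rows); the expected value comes first.
    if (rows != expected) {
      throw new AssertionError(family + ": expected " + expected + " rows but found " + rows);
    }
  }
}
```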


> Enable MOB in backup / restore test involving incremental backup
> 
>
> Key: HBASE-21141
> URL: https://issues.apache.org/jira/browse/HBASE-21141
> Project: HBase
>  Issue Type: Test
>  Components: backuprestore
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
>  Labels: mob
> Attachments: HBASE-21141.v01.patch, HBASE-21141.v02.patch
>
>





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-16 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689656#comment-16689656
 ] 

Ted Yu commented on HBASE-21387:


I haven't gotten around to adding a new unit test (one that doesn't introduce an 
extra synchronization primitive in the snapshot classes).

Zheng:
If you have bandwidth, you can give it a try.

Thanks

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v11.txt, 
> 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 
> 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-15 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688926#comment-16688926
 ] 

Ted Yu commented on HBASE-21387:


Looking at Zheng's suggestion for a new unit test:

bq. another thread to invoke deleteFiles 
=SnapshotHFileCleaner#getDeletableFiles;

Since the in-progress snapshot runs for a long time, 
getUnreferencedFiles(Iterable, SnapshotManager) may detect the in-progress 
snapshot and thus miss the race condition described in this issue.

Also, I have never seen a unit test that creates 10K hfiles.
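One way to replay the T0 to T3 timeline deterministically, without relying on a long-running snapshot or 10K hfiles, is to model the two reads separately and run the "snapshot completes" step between them. This is a hypothetical single-threaded model, not HBase code; whether a comparable test-only hook is acceptable in the real snapshot classes is a separate question.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical single-threaded model of the racy check order; not HBase code. */
final class RacyCleanerModel {
  final Set<String> completedRefs = new HashSet<>();  // refs of completed snapshots
  final Set<String> inProgressRefs = new HashSet<>(); // refs of in-progress snapshots

  /**
   * Mirrors the order in the description: completed refs are read first
   * (refreshCache) and in-progress refs later. The hook runs between the two
   * reads so a test can complete a snapshot exactly at T2.
   */
  List<String> getUnreferencedFiles(Collection<String> candidates, Runnable betweenPhases) {
    Set<String> cachedCompleted = new HashSet<>(completedRefs); // T1: refreshCache()
    betweenPhases.run();                                        // T2: S1 may complete here
    Set<String> inProgressNow = new HashSet<>(inProgressRefs);  // T3: check in-progress
    List<String> unreferenced = new ArrayList<>();
    for (String file : candidates) {
      if (!cachedCompleted.contains(file) && !inProgressNow.contains(file)) {
        unreferenced.add(file);
      }
    }
    return unreferenced;
  }

  /** Completes a snapshot (simplified: its refs move from in-progress to completed). */
  void completeSnapshot(Collection<String> refs) {
    inProgressRefs.removeAll(refs);
    completedRefs.addAll(refs);
  }
}
```

With S1 referencing F1 and completing inside the hook, the model reports F1 as unreferenced, which is the bug; without the hook firing, F1 is correctly kept.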

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v11.txt, 
> 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 
> 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>





[jira] [Commented] (HBASE-21141) Enable MOB in backup / restore test involving incremental backup

2018-11-15 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688648#comment-16688648
 ] 

Ted Yu commented on HBASE-21141:


{code}
+mobHcd.setMobThreshold(0L);
{code}
Please increase the threshold.

Please add assertions on the restored table(s).

> Enable MOB in backup / restore test involving incremental backup
> 
>
> Key: HBASE-21141
> URL: https://issues.apache.org/jira/browse/HBASE-21141
> Project: HBase
>  Issue Type: Test
>  Components: backuprestore
>    Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Major
>  Labels: mob
> Attachments: HBASE-21141.v01.patch
>
>





[jira] [Commented] (HBASE-21482) TestHRegion fails due to 'Too many open files'

2018-11-15 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688550#comment-16688550
 ] 

Ted Yu commented on HBASE-21482:


I couldn't reproduce the test failure in branch-2.

Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z)
Maven home: /apache-maven-3.5.4
Java version: 1.8.0_161, vendor: Oracle Corporation, runtime: /jdk1.8.0_161/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-327.28.3.el7.x86_64", arch: "amd64", family: "unix"

> TestHRegion fails due to 'Too many open files'
> --
>
> Key: HBASE-21482
> URL: https://issues.apache.org/jira/browse/HBASE-21482
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.regionserver.TestHRegion-output.txt, 
> org.apache.hadoop.hbase.regionserver.TestHRegion.txt
>
>





[jira] [Updated] (HBASE-21482) TestHRegion fails due to 'Too many open files'

2018-11-15 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21482:
---
Attachment: org.apache.hadoop.hbase.regionserver.TestHRegion.txt

> TestHRegion fails due to 'Too many open files'
> --
>
> Key: HBASE-21482
> URL: https://issues.apache.org/jira/browse/HBASE-21482
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.regionserver.TestHRegion-output.txt, 
> org.apache.hadoop.hbase.regionserver.TestHRegion.txt
>
>
> TestHRegion fails due to 'Too many open files' in the master branch.
> Here is one failed subtest:
> {code}
> testCheckAndDelete_ThatDeleteWasWritten(org.apache.hadoop.hbase.regionserver.TestHRegion)
>   Time elapsed: 2.373 sec  <<< ERROR!
> java.lang.IllegalStateException: failed to create a child event loop
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
> Caused by: org.apache.hbase.thirdparty.io.netty.channel.ChannelException: 
> failed to open a new selector
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
> Caused by: java.io.IOException: Too many open files
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
> {code}





[jira] [Updated] (HBASE-21482) TestHRegion fails due to 'Too many open files'

2018-11-15 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21482:
---
Attachment: org.apache.hadoop.hbase.regionserver.TestHRegion-output.txt

> TestHRegion fails due to 'Too many open files'
> --
>
> Key: HBASE-21482
> URL: https://issues.apache.org/jira/browse/HBASE-21482
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.regionserver.TestHRegion-output.txt, 
> org.apache.hadoop.hbase.regionserver.TestHRegion.txt
>
>
> TestHRegion fails due to 'Too many open files' in the master branch.
> Here is one failed subtest:
> {code}
> testCheckAndDelete_ThatDeleteWasWritten(org.apache.hadoop.hbase.regionserver.TestHRegion)
>   Time elapsed: 2.373 sec  <<< ERROR!
> java.lang.IllegalStateException: failed to create a child event loop
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
> Caused by: org.apache.hbase.thirdparty.io.netty.channel.ChannelException: 
> failed to open a new selector
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
> Caused by: java.io.IOException: Too many open files
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
> {code}





[jira] [Created] (HBASE-21482) TestHRegion fails due to 'Too many open files'

2018-11-15 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21482:
--

 Summary: TestHRegion fails due to 'Too many open files'
 Key: HBASE-21482
 URL: https://issues.apache.org/jira/browse/HBASE-21482
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


TestHRegion fails due to 'Too many open files' in the master branch.
Here is one failed subtest:
{code}
testCheckAndDelete_ThatDeleteWasWritten(org.apache.hadoop.hbase.regionserver.TestHRegion)
  Time elapsed: 2.373 sec  <<< ERROR!
java.lang.IllegalStateException: failed to create a child event loop
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
Caused by: org.apache.hbase.thirdparty.io.netty.channel.ChannelException: 
failed to open a new selector
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
Caused by: java.io.IOException: Too many open files
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
{code}






[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-11-15 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688506#comment-16688506
 ] 

Ted Yu commented on HBASE-21246:


Patch v41 fixes TestDLSAsyncFSWAL.

TestDLSAsyncFSWAL#countWAL was creating a WALProvider inside a loop, which led to 
high resource consumption.
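The general shape of this kind of fix can be illustrated with a small, self-contained Java sketch. `CountingProvider` is a hypothetical stand-in for an expensive, closeable object such as a WALProvider; it only counts how many instances are currently open. This is not the code from patch v41. It assumes, for illustration, that the per-iteration instances are not closed, and shows that hoisting creation out of the loop (and closing the single instance) keeps the number of open resources constant instead of growing with the loop count.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HoistProviderSketch {

  // Tracks how many hypothetical providers are currently open.
  static final AtomicInteger OPEN = new AtomicInteger();

  // Hypothetical stand-in for an expensive resource (threads, selectors, files).
  static final class CountingProvider implements AutoCloseable {
    CountingProvider() {
      OPEN.incrementAndGet();
    }

    // Placeholder for real work, e.g. counting entries in one WAL.
    int countEntries(int walIndex) {
      return walIndex;
    }

    @Override
    public void close() {
      OPEN.decrementAndGet();
    }
  }

  // Anti-pattern: a new provider per iteration, never closed.
  static int leakyCount(int iterations) {
    int total = 0;
    for (int i = 0; i < iterations; i++) {
      CountingProvider p = new CountingProvider();
      total += p.countEntries(i);
    }
    return total;
  }

  // Fix: create the provider once, reuse it, and close it when done.
  static int hoistedCount(int iterations) {
    int total = 0;
    try (CountingProvider p = new CountingProvider()) {
      for (int i = 0; i < iterations; i++) {
        total += p.countEntries(i);
      }
    }
    return total;
  }

  public static void main(String[] args) {
    OPEN.set(0);
    leakyCount(100);
    System.out.println("open after leaky loop: " + OPEN.get());
    OPEN.set(0);
    hoistedCount(100);
    System.out.println("open after hoisted loop: " + OPEN.get());
  }
}
```

Running `main` prints 100 open providers for the leaky loop and 0 for the hoisted one; both variants compute the same total.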

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.37.txt, 21246.39.txt, 21246.41.txt, 21246.HBASE-20952.001.patch, 
> 21246.HBASE-20952.002.patch, 21246.HBASE-20952.004.patch, 
> 21246.HBASE-20952.005.patch, 21246.HBASE-20952.007.patch, 
> 21246.HBASE-20952.008.patch, replication-src-creates-wal-reader.jpg, 
> wal-factory-providers.png, wal-providers.png, wal-splitter-reader.jpg, 
> wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation 
> can be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.
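To make the decoupling concrete, here is a minimal Java sketch of the idea, assuming an interface with a single `getName()` method as described above. The implementing classes `FsWalIdentity` and `StreamWalIdentity`, the `describe` helper, and the example names are hypothetical illustrations, not the classes in the HBASE-21246 patches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class WalIdentitySketch {

  // Hypothetical interface: callers identify a WAL only by its name,
  // without depending on a filesystem Path.
  interface WALIdentity {
    String getName();
  }

  // Hypothetical filesystem-backed identity: the name is the WAL file name.
  static final class FsWalIdentity implements WALIdentity {
    private final String fileName;

    FsWalIdentity(String fileName) {
      this.fileName = fileName;
    }

    @Override
    public String getName() {
      return fileName;
    }
  }

  // Hypothetical stream-backed identity: the name is the log stream's name.
  static final class StreamWalIdentity implements WALIdentity {
    private final String streamName;

    StreamWalIdentity(String streamName) {
      this.streamName = streamName;
    }

    @Override
    public String getName() {
      return streamName;
    }
  }

  // Code written against the interface works for either backing store.
  static String describe(List<WALIdentity> wals) {
    return wals.stream().map(WALIdentity::getName).collect(Collectors.joining(", "));
  }

  public static void main(String[] args) {
    List<WALIdentity> wals = Arrays.<WALIdentity>asList(
        new FsWalIdentity("example-wal.0001"),
        new StreamWalIdentity("example-log-stream"));
    System.out.println(describe(wals)); // example-wal.0001, example-log-stream
  }
}
```

The design point is that `describe` never needs to know which backing store produced a given WAL.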





[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface

2018-11-15 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21246:
---
Attachment: 21246.41.txt

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.37.txt, 21246.39.txt, 21246.41.txt, 21246.HBASE-20952.001.patch, 
> 21246.HBASE-20952.002.patch, 21246.HBASE-20952.004.patch, 
> 21246.HBASE-20952.005.patch, 21246.HBASE-20952.007.patch, 
> 21246.HBASE-20952.008.patch, replication-src-creates-wal-reader.jpg, 
> wal-factory-providers.png, wal-providers.png, wal-splitter-reader.jpg, 
> wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation 
> can be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Commented] (HBASE-21479) TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent fails with IndexOutOfBoundsException

2018-11-15 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688285#comment-16688285
 ] 

Ted Yu commented on HBASE-21479:


The single test used to pass, e.g. at commit
243e6cc5293dc1e2a4dfd3af4ee29087c84184c8

> TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent 
> fails with IndexOutOfBoundsException
> --
>
> Key: HBASE-21479
> URL: https://issues.apache.org/jira/browse/HBASE-21479
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Priority: Major
> Attachments: testHRegionReplayEvents-output.txt
>
>
> The test fails in both the master branch and branch-2:
> {code}
> testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents)
>   Time elapsed: 3.74 sec  <<< ERROR!
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(TestHRegionReplayEvents.java:1042)
> {code}





[jira] [Updated] (HBASE-21460) correct Document Configurable Bucket Sizes in bucketCache

2018-11-15 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21460:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, Yechao.

> correct Document Configurable Bucket Sizes in bucketCache
> -
>
> Key: HBASE-21460
> URL: https://issues.apache.org/jira/browse/HBASE-21460
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21460.v1.patch, HBASE-21460.v2.patch
>
>
> We use the bucket cache (offheap) and found an error in the doc:
> the bucket sizes property should be "hbase.bucketcache.bucket.sizes" instead 
> of "hfile.block.cache.sizes".
> CacheConfig.java
>  /**
>  * A comma-delimited array of values for use as bucket sizes.
>  */
>  public static final String BUCKET_CACHE_BUCKETS_KEY = 
> "hbase.bucketcache.bucket.sizes";
> the doc was:
> 
>  
>  HBASE-10641 introduced the ability to configure multiple sizes for the 
> buckets of the BucketCache, in HBase 0.98 and newer. To configure multiple 
> bucket sizes, configure the new property 
> {color:#ff}{{hfile.block.cache.sizes}}{color} (instead of{color:#ff} 
> {{hfile.block.cache.size}}{color}) to a comma-separated list of block sizes, 
> ordered from smallest to largest, with no spaces. The goal is to optimize the 
> bucket sizes based on your data access patterns. The following example 
> configures buckets of size 4096 and 8192.
>   {color:#ff}hfile.block.cache.sizes{color} 
> 4096,8192 
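The documented rules for the bucket sizes value (a comma-separated list, ordered from smallest to largest, with no spaces) can be checked with a short standalone Java sketch. `parseBucketSizes` is a hypothetical helper written for illustration; it is not how HBase itself reads `hbase.bucketcache.bucket.sizes`, and HBase may apply additional constraints that are not modeled here.

```java
import java.util.Arrays;

public class BucketSizesSketch {

  // Hypothetical helper: parse a value such as "4096,8192" and enforce the
  // documented rules (comma-separated, no spaces, smallest to largest).
  static int[] parseBucketSizes(String value) {
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException("empty bucket sizes");
    }
    if (value.indexOf(' ') >= 0) {
      throw new IllegalArgumentException("spaces are not allowed: " + value);
    }
    String[] parts = value.split(",");
    int[] sizes = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      // NumberFormatException (an IllegalArgumentException) for non-numbers.
      sizes[i] = Integer.parseInt(parts[i]);
      if (sizes[i] <= 0) {
        throw new IllegalArgumentException("size must be positive: " + parts[i]);
      }
      if (i > 0 && sizes[i] <= sizes[i - 1]) {
        throw new IllegalArgumentException("sizes must be ascending: " + value);
      }
    }
    return sizes;
  }

  public static void main(String[] args) {
    // The example from the documentation: buckets of size 4096 and 8192.
    System.out.println(Arrays.toString(parseBucketSizes("4096,8192"))); // [4096, 8192]
  }
}
```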





[jira] [Commented] (HBASE-21479) TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent fails with IndexOutOfBoundsException

2018-11-15 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688224#comment-16688224
 ] 

Ted Yu commented on HBASE-21479:


{code}
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 
2018-06-17T18:33:14Z)
Maven home: /apache-maven-3.5.4
Java version: 1.8.0_161, vendor: Oracle Corporation, runtime: /jdk1.8.0_161/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-327.28.3.el7.x86_64", arch: "amd64", family: 
"unix"
{code}
Here is the command I used to reproduce the failure:

mvn clean test -Dtest=TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent

> TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent 
> fails with IndexOutOfBoundsException
> --
>
> Key: HBASE-21479
> URL: https://issues.apache.org/jira/browse/HBASE-21479
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
> Attachments: testHRegionReplayEvents-output.txt
>
>
> The test fails in both the master branch and branch-2:
> {code}
> testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents)
>   Time elapsed: 3.74 sec  <<< ERROR!
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(TestHRegionReplayEvents.java:1042)
> {code}





[jira] [Commented] (HBASE-21479) TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent fails with IndexOutOfBoundsException

2018-11-14 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687158#comment-16687158
 ] 

Ted Yu commented on HBASE-21479:


Progressively stepping back through earlier commits.
At 9012a0b123b3eea8b08c8687cef812e83e9b491d the test is still failing the same way.

> TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent 
> fails with IndexOutOfBoundsException
> --
>
> Key: HBASE-21479
> URL: https://issues.apache.org/jira/browse/HBASE-21479
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Priority: Major
> Attachments: testHRegionReplayEvents-output.txt
>
>
> The test fails in both the master branch and branch-2:
> {code}
> testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents)
>   Time elapsed: 3.74 sec  <<< ERROR!
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(TestHRegionReplayEvents.java:1042)
> {code}





[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-14 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687051#comment-16687051
 ] 

Ted Yu commented on HBASE-21457:


See my comment above: HBASE-21466 cleared the way.

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt, 21457.v3.txt, 
> 21457.v4.txt
>
>
> Janos reported a backup test failure when using a local HDFS for WALs 
> and WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses the 
> HBase root dir to retrieve WAL files.
> We should use the helper methods from CommonFSUtils instead.
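The idea behind the fix is that WAL files should be listed under the WAL root directory, which may live on a different filesystem from the HBase root directory. Below is a minimal sketch of that resolution logic, assuming the `hbase.wal.dir` setting with a fallback to `hbase.rootdir`. It uses a plain `Map` instead of Hadoop's `Configuration`, and `walRootDir` and the example URIs are hypothetical; this is not the CommonFSUtils implementation or the committed patch.

```java
import java.util.HashMap;
import java.util.Map;

public class WalRootDirSketch {

  static final String ROOT_DIR_KEY = "hbase.rootdir";
  static final String WAL_DIR_KEY = "hbase.wal.dir";

  // Hypothetical helper: prefer the dedicated WAL directory, otherwise fall
  // back to the HBase root directory.
  static String walRootDir(Map<String, String> conf) {
    String walDir = conf.get(WAL_DIR_KEY);
    if (walDir != null && !walDir.isEmpty()) {
      return walDir;
    }
    String rootDir = conf.get(ROOT_DIR_KEY);
    if (rootDir == null) {
      throw new IllegalStateException(ROOT_DIR_KEY + " is not set");
    }
    return rootDir;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Made-up example of the reported setup: store files on cloud storage,
    // WALs on a local HDFS.
    conf.put(ROOT_DIR_KEY, "wasb://container@account.blob.core.windows.net/hbase");
    conf.put(WAL_DIR_KEY, "hdfs://namenode:8020/hbase-wal");
    // Listing WALs under hbase.rootdir would search the wrong filesystem.
    System.out.println("WAL files live under: " + walRootDir(conf));
  }
}
```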





[jira] [Created] (HBASE-21479) TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent fails with IndexOutOfBoundsException

2018-11-14 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21479:
--

 Summary: 
TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent 
fails with IndexOutOfBoundsException
 Key: HBASE-21479
 URL: https://issues.apache.org/jira/browse/HBASE-21479
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


The test fails in both the master branch and branch-2:
{code}
testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents)
  Time elapsed: 3.74 sec  <<< ERROR!
java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
at 
org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(TestHRegionReplayEvents.java:1042)
{code}





[jira] [Updated] (HBASE-21479) TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent fails with IndexOutOfBoundsException

2018-11-14 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21479:
---
Attachment: testHRegionReplayEvents-output.txt

> TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent 
> fails with IndexOutOfBoundsException
> --
>
> Key: HBASE-21479
> URL: https://issues.apache.org/jira/browse/HBASE-21479
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Priority: Major
> Attachments: testHRegionReplayEvents-output.txt
>
>
> The test fails in both the master branch and branch-2:
> {code}
> testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents)
>   Time elapsed: 3.74 sec  <<< ERROR!
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(TestHRegionReplayEvents.java:1042)
> {code}






[jira] [Commented] (HBASE-21460) correct Document Configurable Bucket Sizes in bucketCache

2018-11-14 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686827#comment-16686827
 ] 

Ted Yu commented on HBASE-21460:


bq. (instead of `hbase.bucketcache.bucket.size`)

I don't see the above property referenced in any other part of the online reference.
I think you can remove the above snippet; referencing the actual config name 
should be good enough.

> correct Document Configurable Bucket Sizes in bucketCache
> -
>
> Key: HBASE-21460
> URL: https://issues.apache.org/jira/browse/HBASE-21460
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Major
> Attachments: HBASE-21460.v1.patch
>
>
> We use the bucket cache (offheap) and found an error in the doc:
> the bucket sizes property should be "hbase.bucketcache.bucket.sizes" instead 
> of "hfile.block.cache.sizes".
> CacheConfig.java
>  /**
>  * A comma-delimited array of values for use as bucket sizes.
>  */
>  public static final String BUCKET_CACHE_BUCKETS_KEY = 
> "hbase.bucketcache.bucket.sizes";
> the doc was:
> 
>  
>  HBASE-10641 introduced the ability to configure multiple sizes for the 
> buckets of the BucketCache, in HBase 0.98 and newer. To configure multiple 
> bucket sizes, configure the new property 
> {color:#ff}{{hfile.block.cache.sizes}}{color} (instead of{color:#ff} 
> {{hfile.block.cache.size}}{color}) to a comma-separated list of block sizes, 
> ordered from smallest to largest, with no spaces. The goal is to optimize the 
> bucket sizes based on your data access patterns. The following example 
> configures buckets of size 4096 and 8192.
>   {color:#ff}hfile.block.cache.sizes{color} 
> 4096,8192 





[jira] [Commented] (HBASE-21478) Make table sorted when displaying rsgroup info in shell and master web UI

2018-11-14 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686798#comment-16686798
 ] 

Ted Yu commented on HBASE-21478:


RSGroupInfo#getTables() references a SortedSet.
Do you plan to create another Set which is sorted lexicographically?

> Make table sorted when displaying rsgroup info in shell and master web UI
> -
>
> Key: HBASE-21478
> URL: https://issues.apache.org/jira/browse/HBASE-21478
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Major
>
> In the output of the "get_rsgroup" command in the HBase shell, and in the 
> "Server Group" section of the HMaster web UI, the tables are not sorted, which 
> makes them hard to read, e.g.:
> {code}
> hbase(main):003:0> get_rsgroup 'default'
> GROUP INFORMATION
> ...
> Tables:
> table3
> ns2:table22
> table1
> ns1:table11
> ...
> {code}
> They could be sorted by namespace, then by table name:
> {code}
> table1
> table3
> ns1:table11
> ns2:table22
> {code}
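The requested ordering (default-namespace tables first, then the other namespaces in order, and table name within each namespace) can be expressed as a `Comparator`. This is a hypothetical sketch over plain `"namespace:qualifier"` strings, assuming a name without a colon belongs to the default namespace; it does not use HBase's `TableName` class or whatever ordering that class defines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RsGroupTableOrderSketch {

  // Namespace part of "ns:table"; a bare name is treated as the default namespace ("").
  static String namespaceOf(String name) {
    int idx = name.indexOf(':');
    return idx < 0 ? "" : name.substring(0, idx);
  }

  // Qualifier part of "ns:table", or the whole name if there is no namespace.
  static String qualifierOf(String name) {
    int idx = name.indexOf(':');
    return idx < 0 ? name : name.substring(idx + 1);
  }

  // The empty string sorts before any other namespace, so the default namespace comes first.
  static final Comparator<String> BY_NAMESPACE_THEN_TABLE =
      Comparator.comparing(RsGroupTableOrderSketch::namespaceOf)
          .thenComparing(RsGroupTableOrderSketch::qualifierOf);

  static List<String> sortTables(List<String> tables) {
    List<String> sorted = new ArrayList<>(tables);
    sorted.sort(BY_NAMESPACE_THEN_TABLE);
    return sorted;
  }

  public static void main(String[] args) {
    List<String> tables = Arrays.asList("table3", "ns2:table22", "table1", "ns1:table11");
    // Prints table1, table3, ns1:table11, ns2:table22 (one per line).
    sortTables(tables).forEach(System.out::println);
  }
}
```

Note that a plain lexicographic sort of the full names would put `ns1:table11` before `table1`, which is why the comparator splits out the namespace first.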





[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-11-14 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686778#comment-16686778
 ] 

Ted Yu commented on HBASE-21246:


There are fewer test failures with patch v39.

The failure in TestDLSAsyncFSWAL still needs to be handled by reducing resource consumption.

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.37.txt, 21246.39.txt, 21246.HBASE-20952.001.patch, 
> 21246.HBASE-20952.002.patch, 21246.HBASE-20952.004.patch, 
> 21246.HBASE-20952.005.patch, 21246.HBASE-20952.007.patch, 
> 21246.HBASE-20952.008.patch, replication-src-creates-wal-reader.jpg, 
> wal-factory-providers.png, wal-providers.png, wal-splitter-reader.jpg, 
> wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation 
> can be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface

2018-11-14 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21246:
---
Attachment: 21246.39.txt

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.37.txt, 21246.39.txt, 21246.HBASE-20952.001.patch, 
> 21246.HBASE-20952.002.patch, 21246.HBASE-20952.004.patch, 
> 21246.HBASE-20952.005.patch, 21246.HBASE-20952.007.patch, 
> 21246.HBASE-20952.008.patch, replication-src-creates-wal-reader.jpg, 
> wal-factory-providers.png, wal-providers.png, wal-splitter-reader.jpg, 
> wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation 
> can be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-11-13 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685515#comment-16685515
 ] 

Ted Yu commented on HBASE-21246:


Patch v37 fixes an infinite recursive call in RegionGroupingProvider#createWALIdentity.

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.37.txt, 21246.HBASE-20952.001.patch, 21246.HBASE-20952.002.patch, 
> 21246.HBASE-20952.004.patch, 21246.HBASE-20952.005.patch, 
> 21246.HBASE-20952.007.patch, 21246.HBASE-20952.008.patch, 
> replication-src-creates-wal-reader.jpg, wal-factory-providers.png, 
> wal-providers.png, wal-splitter-reader.jpg, wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation 
> can be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface

2018-11-13 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21246:
---
Attachment: 21246.37.txt

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.37.txt, 21246.HBASE-20952.001.patch, 21246.HBASE-20952.002.patch, 
> 21246.HBASE-20952.004.patch, 21246.HBASE-20952.005.patch, 
> 21246.HBASE-20952.007.patch, 21246.HBASE-20952.008.patch, 
> replication-src-creates-wal-reader.jpg, wal-factory-providers.png, 
> wal-providers.png, wal-splitter-reader.jpg, wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation can 
> be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Comment Edited] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-13 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685384#comment-16685384
 ] 

Ted Yu edited comment on HBASE-21457 at 11/13/18 3:50 PM:
--

Thanks for the review, Stephen and Vlad.


was (Author: yuzhih...@gmail.com):
Thanks for the review, Stephen.

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt, 21457.v3.txt, 
> 21457.v4.txt
>
>
> Janos reported a backup test failure when testing with a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan, which uses the HBase 
> root dir to retrieve WAL files.
> We should use the helper methods from CommonFSUtils.
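A minimal, hypothetical model of the difference, using plain java.net.URI rather than Hadoop classes. Roughly, Hadoop picks a FileSystem from a path's scheme, so a WAL location derived from a WASB root dir resolves to WASB even when the WALs actually live on HDFS. The fallback in `walDirFromWalRoot` (use `hbase.wal.dir` when set, otherwise `hbase.rootdir`) is our understanding of what the CommonFSUtils WAL helpers do, not a copy of them. The `/WALs` suffix, the URIs and the class name are illustrative assumptions.

```java
import java.net.URI;
import java.util.Map;

/**
 * Hypothetical model (not HBase code) of locating the WAL directory.
 * Assumption: hbase.wal.dir wins when set, otherwise hbase.rootdir is used.
 */
public class WalRootModel {
  // Problematic pattern: the WAL location is derived from the HBase root dir.
  static URI walDirFromRootDir(Map<String, String> conf) {
    return URI.create(conf.get("hbase.rootdir") + "/WALs");
  }

  // WAL-aware pattern: prefer hbase.wal.dir, fall back to hbase.rootdir.
  static URI walDirFromWalRoot(Map<String, String> conf) {
    String walRoot = conf.getOrDefault("hbase.wal.dir", conf.get("hbase.rootdir"));
    return URI.create(walRoot + "/WALs");
  }

  public static void main(String[] args) {
    // Store files on WASB, WALs on a local HDFS, as in the reported setup.
    Map<String, String> conf = Map.of(
        "hbase.rootdir", "wasb://data@example.blob.core.windows.net/hbase",
        "hbase.wal.dir", "hdfs://localhost:8020/hbase-wal");
    System.out.println(walDirFromRootDir(conf).getScheme()); // prints "wasb"
    System.out.println(walDirFromWalRoot(conf).getScheme()); // prints "hdfs"
  }
}
```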





[jira] [Updated] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-13 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21457:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the review, Stephen.

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt, 21457.v3.txt, 
> 21457.v4.txt
>
>
> Janos reported a backup test failure when testing with a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan, which uses the HBase 
> root dir to retrieve WAL files.
> We should use the helper methods from CommonFSUtils.





Re: [ANNOUNCE] New HBase committer Jingyun Tian

2018-11-13 Thread Ted Yu
Congratulations, Jingyun!
-------- Original message --------
From: Srinivas Reddy
Date: 11/13/18 12:46 AM (GMT-08:00)
To: dev@hbase.apache.org
Cc: Hbase-User
Subject: Re: [ANNOUNCE] New HBase committer Jingyun Tian

Congratulations Jingyun

-Srinivas-
Typed on tiny keys. pls ignore typos.{mobile app}

On Tue 13 Nov, 2018, 15:54 张铎(Duo Zhang) wrote:

> On behalf of the Apache HBase PMC, I am pleased to announce that Jingyun
> Tian has accepted the PMC's invitation to become a committer on the
> project. We appreciate all of Jingyun's generous contributions thus far and
> look forward to his continued involvement.
>
> Congratulations and welcome, Jingyun!


[jira] [Comment Edited] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684625#comment-16684625
 ] 

Ted Yu edited comment on HBASE-21387 at 11/13/18 2:18 AM:
--

The two minor comments have been addressed in latest patch.
The only remaining comment was about a new test, right?

Hopefully I can get to the test later this week.


was (Author: yuzhih...@gmail.com):
The only remaining comment was about a new test, right?

Not sure when I can get to it this week.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v11.txt, 
> 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 
> 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>
> In a recent customer report where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when the RefreshCacheTask runs refreshCache, there is an 
> in-progress snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) being deemed unreferenced.
> Here is the timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 
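A minimal, hypothetical Java model of Josh's timeline. It is not HBase code: snapshots are reduced to sets of file names, completing a snapshot atomically moves its files from the in-progress set to the completed set (standing in for the move out of the snapshot tmp dir), and the `between` callback injects S1's completion into the window between the two reads. With the order described above (completed snapshots first, in-progress second) F1 is missed. Reading in-progress snapshots first closes this particular window in the model; that ordering is only an illustration, not necessarily what the attached patches do.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of the HBASE-21387 race (not HBase code).
 * completeSnapshot() stands in for moving a snapshot out of the tmp dir.
 */
public class SnapshotRaceModel {
  final Set<String> inProgressFiles = new HashSet<>();
  final Set<String> completedFiles = new HashSet<>();

  // S1 finishes: its files leave the in-progress set and join the completed set.
  void completeSnapshot(Set<String> files) {
    inProgressFiles.removeAll(files);
    completedFiles.addAll(files);
  }

  // Order from the description: completed snapshots first (like refreshCache()),
  // then in-progress snapshots. 'between' runs in the window between the reads.
  boolean isReferencedCompletedFirst(String file, Runnable between) {
    Set<String> completed = new HashSet<>(completedFiles);
    between.run();
    Set<String> inProgress = new HashSet<>(inProgressFiles);
    return completed.contains(file) || inProgress.contains(file);
  }

  // Alternative order used only for illustration: in-progress first, then completed.
  boolean isReferencedInProgressFirst(String file, Runnable between) {
    Set<String> inProgress = new HashSet<>(inProgressFiles);
    between.run();
    Set<String> completed = new HashSet<>(completedFiles);
    return completed.contains(file) || inProgress.contains(file);
  }

  public static void main(String[] args) {
    Set<String> s1Files = Set.of("F1");

    SnapshotRaceModel first = new SnapshotRaceModel();
    first.inProgressFiles.addAll(s1Files);                 // T1: S1 in progress, references F1
    boolean r1 = first.isReferencedCompletedFirst("F1",
        () -> first.completeSnapshot(s1Files));            // T2: S1 completes between the reads
    System.out.println("completed first:   F1 referenced = " + r1); // false

    SnapshotRaceModel second = new SnapshotRaceModel();
    second.inProgressFiles.addAll(s1Files);
    boolean r2 = second.isReferencedInProgressFirst("F1",
        () -> second.completeSnapshot(s1Files));
    System.out.println("in-progress first: F1 referenced = " + r2); // true
  }
}
```

The model assumes a snapshot only ever moves from in-progress to completed; it does not cover snapshots that start after both reads.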





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684625#comment-16684625
 ] 

Ted Yu commented on HBASE-21387:


The only remaining comment was about a new test, right?

Not sure when I can get to it this week.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v11.txt, 
> 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 
> 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>
> In a recent customer report where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when the RefreshCacheTask runs refreshCache, there is an 
> in-progress snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) being deemed unreferenced.
> Here is the timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 





[jira] [Updated] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-12 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21457:
---
Attachment: 21457.v4.txt

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt, 21457.v3.txt, 
> 21457.v4.txt
>
>
> Janos reported a backup test failure when testing with a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan, which uses the HBase 
> root dir to retrieve WAL files.
> We should use the helper methods from CommonFSUtils.





[jira] [Updated] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-12 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21466:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21466.v2.txt, 21466.v3.txt, 21466.v3.txt
>
>
> In the WALProcedureStore ctor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is not under rootdir, the above would return the wrong 
> FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master wouldn't 
> initialize.





[jira] [Commented] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684378#comment-16684378
 ] 

Ted Yu commented on HBASE-21466:


Without the leading slash (ahead of 'tmp'), the test would fail with:
{code}
[ERROR] 
testWalAbortOnLowReplicationWithQueuedWriters(org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS)
  Time elapsed: 1.4 s  <<< ERROR!
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path 
in absolute URI: hdfs://localhost:37261tmp/wal
at 
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS.setupDFS(TestWALProcedureStoreOnHDFS.java:88)
{code}
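The same message can be reproduced with plain java.net.URI. This is a minimal sketch, assuming the test assembles the WAL dir from a scheme, an authority and a path string; the trace shows an IllegalArgumentException wrapping exactly this URISyntaxException. `walDirUri` is a hypothetical helper, and "localhost:37261" is simply the value from the log. When a scheme is present, java.net.URI requires the path to be empty or to start with '/', so "tmp/wal" is rejected while "/tmp/wal" is accepted.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical reproduction of the "Relative path in absolute URI" failure. */
public class WalDirUriDemo {
  // Builds an hdfs URI from its parts; throws if the path is relative.
  static URI walDirUri(String authority, String path) throws URISyntaxException {
    return new URI("hdfs", authority, path, null, null);
  }

  public static void main(String[] args) {
    try {
      walDirUri("localhost:37261", "tmp/wal");
    } catch (URISyntaxException e) {
      // prints "Relative path in absolute URI: hdfs://localhost:37261tmp/wal"
      System.out.println(e.getMessage());
    }
    try {
      // prints "hdfs://localhost:37261/tmp/wal"
      System.out.println(walDirUri("localhost:37261", "/tmp/wal"));
    } catch (URISyntaxException e) {
      throw new IllegalStateException(e);
    }
  }
}
```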

> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v2.txt, 21466.v3.txt, 21466.v3.txt
>
>
> In the WALProcedureStore ctor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is not under rootdir, the above would return the wrong 
> FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master wouldn't 
> initialize.





[jira] [Updated] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-12 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21466:
---
Attachment: 21466.v3.txt

> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v2.txt, 21466.v3.txt, 21466.v3.txt
>
>
> In the WALProcedureStore ctor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is not under rootdir, the above would return the wrong 
> FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master wouldn't 
> initialize.





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684373#comment-16684373
 ] 

Ted Yu commented on HBASE-21387:


The TestBlockEvictionFromClient failure was unrelated to the patch.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v11.txt, 
> 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 
> 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>
> In a recent customer report where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when the RefreshCacheTask runs refreshCache, there is an 
> in-progress snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) being deemed unreferenced.
> Here is the timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684197#comment-16684197
 ] 

Ted Yu commented on HBASE-21387:


TestSnapshotFileCache failed due to an NPE, as pointed out by FindBugs.

The TestSaslFanOutOneBlockAsyncDFSOutput failure was due to a port already in 
use and is not related to the patch.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v11.txt, 
> 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 
> 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>
> In a recent customer report where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when the RefreshCacheTask runs refreshCache, there is an 
> in-progress snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) being deemed unreferenced.
> Here is the timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 





[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-12 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Attachment: 21387.v12.txt

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v11.txt, 
> 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 
> 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>
> In a recent customer report where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when the RefreshCacheTask runs refreshCache, there is an 
> in-progress snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) being deemed unreferenced.
> Here is the timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 





[jira] [Updated] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-12 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21466:
---
Attachment: 21466.v3.txt

> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v2.txt, 21466.v3.txt
>
>
> In the WALProcedureStore ctor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is not under rootdir, the above would return the wrong 
> FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master wouldn't 
> initialize.





[jira] [Commented] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684169#comment-16684169
 ] 

Ted Yu commented on HBASE-21466:


Patch v3 addresses review comments above.

> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v2.txt, 21466.v3.txt
>
>
> In the WALProcedureStore ctor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is not under rootdir, the above would return the wrong 
> FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master wouldn't 
> initialize.





[jira] [Updated] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-12 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21466:
---
Attachment: 21466.v2.txt

> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v2.txt
>
>
> In the WALProcedureStore ctor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is not under rootdir, the above would return the wrong 
> FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master wouldn't 
> initialize.





[jira] [Updated] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-12 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21466:
---
Attachment: (was: 21466.v1.txt)

> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v2.txt
>
>
> In the WALProcedureStore ctor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is not under rootdir, the above would return the wrong 
> FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master wouldn't 
> initialize.





[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-12 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683916#comment-16683916
 ] 

Ted Yu commented on HBASE-21457:


HBASE-21466 needs to be committed first.
Without HBASE-21466, the master wouldn't initialize when wal.dir is set to a 
directory not under rootdir.

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt, 21457.v3.txt
>
>
> Janos reported a backup test failure when testing with a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan, which uses the HBase 
> root dir to retrieve WAL files.
> We should use the helper methods from CommonFSUtils.





[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-12 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Attachment: 21387.v11.txt

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v11.txt, 
> 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 
> 21387.v9.txt, two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, 
> two-pass-cleaner.v9.txt
>
>
> In a recent customer report where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when the RefreshCacheTask runs refreshCache, there is an 
> in-progress snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) being deemed unreferenced.
> Here is the timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 
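To make the T0..T3 timeline concrete, here is a minimal single-threaded Java model of it. It is a sketch, not HBase code: snapshots are reduced to sets of file names, and the class and method names (SnapshotRaceTimeline, complete, isReferencedRacy) are hypothetical. The completed-snapshot view is captured first, the snapshot completes in between, and the in-progress set is read last, which is the ordering described above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the T0..T3 timeline; snapshots are reduced to sets of file names.
class SnapshotRaceTimeline {
  // Files referenced by completed snapshots (what refreshCache() would load).
  final Set<String> completed = new HashSet<>();
  // Files referenced by in-progress snapshots (the tmp directory that refreshCache() skips).
  final Set<String> inProgress = new HashSet<>();

  // T2: a snapshot completes, so its reference moves from in-progress to completed.
  void complete(String file) {
    inProgress.remove(file);
    completed.add(file);
  }

  // Racy check: the completed view is captured first (T1), then `between` runs (T2),
  // and only then is the in-progress set read (T3).
  boolean isReferencedRacy(String file, Runnable between) {
    Set<String> cache = new HashSet<>(completed);             // T1: refreshCache()
    between.run();                                            // T2: S1 may complete here
    return cache.contains(file) || inProgress.contains(file); // T3
  }

  public static void main(String[] args) {
    SnapshotRaceTimeline t = new SnapshotRaceTimeline();
    t.inProgress.add("F1"); // S1 is in progress and references F1
    boolean referenced = t.isReferencedRacy("F1", () -> t.complete("F1"));
    System.out.println("F1 referenced? " + referenced); // prints "F1 referenced? false"
  }
}
```

F1 is missed because it sits in neither view at the moment each view is read, even though a completed snapshot references it afterwards.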



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-11 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683206#comment-16683206
 ] 

Ted Yu edited comment on HBASE-21387 at 11/12/18 5:22 AM:
--

https://reviews.apache.org/r/69316/

Adding a test may take some time. More than one countdown latch would be needed 
to control the timing of when the snapshot is moved into place. Introducing 
countdown latches solely for test purposes does not seem ideal.
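For reference, a latch-driven reproduction could look roughly like the following, assuming the cleaner step and the snapshot-completion step can run on separate threads. This is a hypothetical, self-contained sketch that uses java.util.concurrent.CountDownLatch and plain sets instead of real HBase classes. One latch holds the snapshot back until the cleaner has refreshed its cache. A second latch holds the cleaner back until the snapshot has been moved into place.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical harness: two latches force the T1 -> T2 -> T3 interleaving between
// a "cleaner" thread and a "snapshot" thread. Plain sets stand in for HBase state.
class LatchedRaceRepro {

  // Returns true if F1 was wrongly deemed unreferenced.
  static boolean reproduce() {
    Set<String> completed = ConcurrentHashMap.newKeySet();
    Set<String> inProgress = ConcurrentHashMap.newKeySet();
    inProgress.add("F1"); // S1 is in progress and references F1

    CountDownLatch cacheRefreshed = new CountDownLatch(1);    // cleaner has done T1
    CountDownLatch snapshotCompleted = new CountDownLatch(1); // snapshot has done T2
    boolean[] referenced = {true};

    Thread cleaner = new Thread(() -> {
      Set<String> cache = new HashSet<>(completed); // T1: refresh completed-snapshot cache
      cacheRefreshed.countDown();
      try {
        snapshotCompleted.await();                  // hold until S1 is in place
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      referenced[0] = cache.contains("F1") || inProgress.contains("F1"); // T3
    });

    Thread snapshot = new Thread(() -> {
      try {
        cacheRefreshed.await();                     // complete only after T1
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      inProgress.remove("F1");                      // T2: move S1 into place
      completed.add("F1");
      snapshotCompleted.countDown();
    });

    cleaner.start();
    snapshot.start();
    try {
      cleaner.join();
      snapshot.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for threads", e);
    }
    return !referenced[0]; // join() makes the cleaner's write visible here
  }

  public static void main(String[] args) {
    System.out.println("race reproduced: " + reproduce()); // prints "race reproduced: true"
  }
}
```

Production code would need equivalent hooks at both points. That is the test-only coupling the comment above is reluctant to introduce.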




was (Author: yuzhih...@gmail.com):
https://reviews.apache.org/r/69316/

Adding a test may take some time. More than one countdown latch would be needed 
to control the timing of when the snapshot is moved into place. Introducing 
countdown latches solely for test purposes does not seem ideal.

BTW I also have HBASE-21246 and HBASE-21466 going in parallel.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v2.txt, 
> 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> A customer recently reported that ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in the log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when RefreshCacheTask runs refreshCache, there is an in-progress 
> snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, so some 
> file(s) are deemed unreferenced.
> Here is a timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it.





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-11 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683206#comment-16683206
 ] 

Ted Yu commented on HBASE-21387:


https://reviews.apache.org/r/69316/

Adding a test may take some time. More than one countdown latch would be needed 
to control the timing of when the snapshot is moved into place. Introducing 
countdown latches solely for test purposes does not seem ideal.

BTW I also have HBASE-21246 and HBASE-21466 going in parallel.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v2.txt, 
> 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> A customer recently reported that ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in the log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when RefreshCacheTask runs refreshCache, there is an in-progress 
> snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, so some 
> file(s) are deemed unreferenced.
> Here is a timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it.





[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-11 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Attachment: 21387.v10.txt

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v10.txt, 21387.v2.txt, 
> 21387.v3.txt, 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> A customer recently reported that ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in the log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when RefreshCacheTask runs refreshCache, there is an in-progress 
> snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, so some 
> file(s) are deemed unreferenced.
> Here is a timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it.





[jira] [Updated] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-11 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21466:
---
Summary: WALProcedureStore uses wrong FileSystem if wal.dir is not under 
rootdir  (was: WALProcedureStore uses wrong FileSystem if wal.dir is on 
different FileSystem as rootdir)

> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v1.txt
>
>
> In the WALProcedureStore constructor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is on a different FileSystem than rootdir, the above 
> would return the wrong FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master 
> wouldn't initialize.





[jira] [Commented] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-11 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682946#comment-16682946
 ] 

Ted Yu commented on HBASE-21466:


Here is a snippet of the test output without the fix:
{code}
2018-11-11 09:12:23,731 DEBUG [WALProcedureStoreSyncThread] 
wal.WALProcedureStore(1229): Removed 
log=file:/tmp/wal/MasterProcWALs/pv2-0005.log, 
activeLogs=[file:/tmp/wal/MasterProcWALs/pv2-0006.log, 
file:/tmp/wal/MasterProcWALs/pv2-0007.log]
2018-11-11 09:12:23,731 INFO  [WALProcedureStoreSyncThread] 
wal.ProcedureWALFile(160): Archiving 
file:/tmp/wal/MasterProcWALs/pv2-0006.log to 
file:/tmp/wal/oldWALs/pv2-0006.log
2018-11-11 09:12:23,732 DEBUG [WALProcedureStoreSyncThread] 
wal.WALProcedureStore(1229): Removed 
log=file:/tmp/wal/MasterProcWALs/pv2-0006.log, 
activeLogs=[file:/tmp/wal/MasterProcWALs/pv2-0007.log]
Process Thread Dump: Thread dump because: Master not initialized after 20ms
{code}

> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v1.txt
>
>
> In the WALProcedureStore constructor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is not under rootdir, the above would return the wrong 
> FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master 
> wouldn't initialize.





[jira] [Created] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is on different FileSystem as rootdir

2018-11-11 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21466:
--

 Summary: WALProcedureStore uses wrong FileSystem if wal.dir is on 
different FileSystem as rootdir
 Key: HBASE-21466
 URL: https://issues.apache.org/jira/browse/HBASE-21466
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


In the WALProcedureStore constructor, the fs field is initialized this way:
{code}
this.fs = walDir.getFileSystem(conf);
{code}
However, when wal.dir is on a different FileSystem than rootdir, the above 
would return the wrong FileSystem.
In the modified TestMasterProcedureEvents, without the fix, the master 
wouldn't initialize.





[jira] [Updated] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir

2018-11-11 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21466:
---
Description: 
In the WALProcedureStore constructor, the fs field is initialized this way:
{code}
this.fs = walDir.getFileSystem(conf);
{code}
However, when wal.dir is not under rootdir, the above would return the wrong 
FileSystem.
In the modified TestMasterProcedureEvents, without the fix, the master 
wouldn't initialize.

  was:
In the WALProcedureStore constructor, the fs field is initialized this way:
{code}
this.fs = walDir.getFileSystem(conf);
{code}
However, when wal.dir is on a different FileSystem than rootdir, the above 
would return the wrong FileSystem.
In the modified TestMasterProcedureEvents, without the fix, the master 
wouldn't initialize.


> WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir
> ---
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v1.txt
>
>
> In the WALProcedureStore constructor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is not under rootdir, the above would return the wrong 
> FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master 
> wouldn't initialize.





[jira] [Updated] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is on different FileSystem as rootdir

2018-11-11 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21466:
---
Attachment: 21466.v1.txt

> WALProcedureStore uses wrong FileSystem if wal.dir is on different FileSystem 
> as rootdir
> 
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v1.txt
>
>
> In the WALProcedureStore constructor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is on a different FileSystem than rootdir, the above 
> would return the wrong FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master 
> wouldn't initialize.





[jira] [Updated] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is on different FileSystem as rootdir

2018-11-11 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21466:
---
Status: Patch Available  (was: Open)

> WALProcedureStore uses wrong FileSystem if wal.dir is on different FileSystem 
> as rootdir
> 
>
> Key: HBASE-21466
> URL: https://issues.apache.org/jira/browse/HBASE-21466
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21466.v1.txt
>
>
> In the WALProcedureStore constructor, the fs field is initialized this way:
> {code}
> this.fs = walDir.getFileSystem(conf);
> {code}
> However, when wal.dir is on a different FileSystem than rootdir, the above 
> would return the wrong FileSystem.
> In the modified TestMasterProcedureEvents, without the fix, the master 
> wouldn't initialize.





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-10 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682644#comment-16682644
 ] 

Ted Yu commented on HBASE-21387:


One more note about why I chose 21387.v9.txt as the version for review:

Priority is given to taking snapshots over cleaning snapshot files, which may be 
delayed as a result.
This is because a failed snapshot is more visible than delayed snapshot cleaning.
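One way to encode that priority can be sketched in Java under stated assumptions. Snapshot takers hold the read side of a ReentrantReadWriteLock. The cleaner proceeds only if writeLock().tryLock() succeeds, and otherwise returns nothing to delete for that round. The class and method names are hypothetical, and this is not necessarily what 21387.v9.txt does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch: snapshots take the read lock; the cleaner needs the write
// lock and skips a round (rather than waiting) when a snapshot is in progress.
class SnapshotFirstCleaner {
  private final ReentrantReadWriteLock takingSnapshotLock = new ReentrantReadWriteLock();

  // Snapshot takers share the read lock, so concurrent snapshots do not block each other.
  void runSnapshot(Runnable takeSnapshot) {
    takingSnapshotLock.readLock().lock();
    try {
      takeSnapshot.run();
    } finally {
      takingSnapshotLock.readLock().unlock();
    }
  }

  // Returns candidates that are not referenced, or an empty list if a snapshot holds the lock.
  List<String> deletable(List<String> candidates, Supplier<Set<String>> loadReferences) {
    if (!takingSnapshotLock.writeLock().tryLock()) {
      return Collections.emptyList(); // snapshot running: delay cleaning to a later round
    }
    try {
      // Loading references and checking candidates both happen under the write lock,
      // so no snapshot can start or complete in between.
      Set<String> referenced = loadReferences.get();
      List<String> result = new ArrayList<>();
      for (String file : candidates) {
        if (!referenced.contains(file)) {
          result.add(file);
        }
      }
      return result;
    } finally {
      takingSnapshotLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    SnapshotFirstCleaner cleaner = new SnapshotFirstCleaner();
    List<String> candidates = Arrays.asList("F1", "F2");
    Set<String> refs = Collections.singleton("F1");
    System.out.println(cleaner.deletable(candidates, () -> refs)); // prints "[F2]"
    // While a snapshot holds the read lock, tryLock() fails and nothing is deleted.
    cleaner.runSnapshot(() -> System.out.println(cleaner.deletable(candidates, () -> refs))); // prints "[]"
  }
}
```

A snapshot in progress makes tryLock() fail, so cleaning is postponed instead of racing with the snapshot. While the cleaner holds the write lock, snapshot takers wait briefly in lock() rather than failing.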



> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, 
> two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> A customer recently reported that ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in the log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when RefreshCacheTask runs refreshCache, there is an in-progress 
> snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, so some 
> file(s) are deemed unreferenced.
> Here is a timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it.





[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-11-10 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682638#comment-16682638
 ] 

Ted Yu commented on HBASE-21246:


Currently there are about 69 failing test classes.

Working through these failed tests.

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.HBASE-20952.001.patch, 21246.HBASE-20952.002.patch, 
> 21246.HBASE-20952.004.patch, 21246.HBASE-20952.005.patch, 
> 21246.HBASE-20952.007.patch, 21246.HBASE-20952.008.patch, 
> replication-src-creates-wal-reader.jpg, wal-factory-providers.png, 
> wal-providers.png, wal-splitter-reader.jpg, wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation 
> can be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.
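Here is a minimal Java sketch of the idea. It assumes only what the description states: an interface with a getName method. The two implementations and their names are illustrative assumptions, not the classes in the attached patches. So is the choice to return just the last path component for filesystem-backed WALs.

```java
import java.util.Objects;

// Hypothetical sketch: callers see only a name, whatever backs the WAL.
interface WALIdentity {
  // A file name for filesystem-backed WALs, or a stream name for stream-backed WALs.
  String getName();
}

// Illustrative filesystem-backed identity; here the name is the last path component.
final class FileWALIdentity implements WALIdentity {
  private final String path;

  FileWALIdentity(String path) {
    this.path = Objects.requireNonNull(path, "path");
  }

  @Override
  public String getName() {
    int slash = path.lastIndexOf('/');
    return slash >= 0 ? path.substring(slash + 1) : path;
  }
}

// Illustrative stream-backed identity; the name is the log stream's name.
final class StreamWALIdentity implements WALIdentity {
  private final String streamName;

  StreamWALIdentity(String streamName) {
    this.streamName = Objects.requireNonNull(streamName, "streamName");
  }

  @Override
  public String getName() {
    return streamName;
  }
}

class WALIdentityDemo {
  public static void main(String[] args) {
    WALIdentity[] wals = {
      new FileWALIdentity("hdfs://nn:8020/hbase/WALs/rs1/rs1.1541812345678"),
      new StreamWALIdentity("rs1-wal-stream")
    };
    for (WALIdentity wal : wals) {
      System.out.println(wal.getName()); // "rs1.1541812345678", then "rs1-wal-stream"
    }
  }
}
```

Code that only calls getName() (for example, logging or bookkeeping) no longer needs to know whether it is dealing with a file path or a stream.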





[jira] [Updated] (HBASE-13468) hbase.zookeeper.quorum supports ipv6 address

2018-11-10 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13468:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, maoling

Thanks for the review, Mike

> hbase.zookeeper.quorum supports ipv6 address
> 
>
> Key: HBASE-13468
> URL: https://issues.apache.org/jira/browse/HBASE-13468
> Project: HBase
>  Issue Type: Bug
>Reporter: Mingtao Zhang
>Assignee: maoling
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-13468.master.001.patch, 
> HBASE-13468.master.002.patch, HBASE-13468.master.003.patch, 
> HBASE-13468.master.004.patch
>
>
> I put an IPv6 address in hbase.zookeeper.quorum; by the time this string 
> reached the ZooKeeper code, the address was mangled, i.e. only '[1234' was left. 
> I was running in pseudo-distributed mode with embedded zk = true.
> I downloaded 1.0.0; I'm not sure which affected version should be listed here.
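Here is one possible explanation for the '[1234' fragment, stated as an assumption: if each quorum entry is split on ':' to separate host from port, a bracketed IPv6 literal is cut at its first colon. The hypothetical Java sketch below contrasts that naive split with a bracket-aware parse of "host[:port]" entries. It is not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parse a comma-separated quorum of "host[:port]" entries,
// accepting bracketed IPv6 literals such as "[2001:db8::1]:2181".
class QuorumParser {
  // A parsed entry; port is -1 when the entry has no explicit port.
  static final class HostPort {
    final String host;
    final int port;

    HostPort(String host, int port) {
      this.host = host;
      this.port = port;
    }

    @Override
    public String toString() {
      return host + (port >= 0 ? " port " + port : "");
    }
  }

  // Naive split on ':' keeps only the text before the first colon.
  static String naiveHost(String entry) {
    return entry.split(":")[0];
  }

  static List<HostPort> parse(String quorum) {
    List<HostPort> result = new ArrayList<>();
    for (String raw : quorum.split(",")) {
      String entry = raw.trim();
      String host;
      String portPart = null;
      if (entry.startsWith("[")) {
        // Bracketed IPv6 literal: the host is everything up to the closing bracket.
        int close = entry.indexOf(']');
        if (close < 0) {
          throw new IllegalArgumentException("Unterminated IPv6 literal: " + entry);
        }
        host = entry.substring(1, close);
        String rest = entry.substring(close + 1);
        if (rest.startsWith(":")) {
          portPart = rest.substring(1);
        } else if (!rest.isEmpty()) {
          throw new IllegalArgumentException("Unexpected text after ']': " + entry);
        }
      } else {
        // Hostname or IPv4: an optional port follows the last colon.
        int colon = entry.lastIndexOf(':');
        host = colon >= 0 ? entry.substring(0, colon) : entry;
        portPart = colon >= 0 ? entry.substring(colon + 1) : null;
      }
      int port = portPart == null ? -1 : Integer.parseInt(portPart);
      result.add(new HostPort(host, port));
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(naiveHost("[1234::1]:2181")); // prints "[1234"
    for (HostPort hp : parse("[1234::1]:2181,zk2.example.com:2182")) {
      System.out.println(hp); // "1234::1 port 2181", then "zk2.example.com port 2182"
    }
  }
}
```

Unbracketed IPv6 literals remain ambiguous under this scheme and would still need brackets.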





[jira] [Updated] (HBASE-21341) DeadServer shouldn't import unshaded Preconditions

2018-11-09 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21341:
---
Fix Version/s: 3.0.0

> DeadServer shouldn't import unshaded Preconditions
> --
>
> Key: HBASE-21341
> URL: https://issues.apache.org/jira/browse/HBASE-21341
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21341.v1.txt
>
>
> DeadServer currently imports the unshaded Preconditions:
> {code}
> import com.google.common.base.Preconditions;
> {code}
> We should import the shaded version of Preconditions.
> This is the only place where an unshaded class from com.google.common is imported.
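One quick way to confirm that claim is to scan sources for direct com.google.common imports. The Java sketch below is a hypothetical helper, not part of HBase's build tooling. The shaded package used in the sample (org.apache.hbase.thirdparty.com.google.common) is assumed to be the relocated Guava that HBase depends on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check: report 1-based line numbers of imports that pull Guava
// directly from com.google.common instead of a shaded (relocated) package.
class UnshadedImportCheck {
  static List<Integer> unshadedImportLines(String source) {
    List<Integer> hits = new ArrayList<>();
    String[] lines = source.split("\n", -1);
    for (int i = 0; i < lines.length; i++) {
      String line = lines[i].trim();
      // Covers both regular and static imports; shaded imports start with another package.
      if (line.startsWith("import com.google.common.")
          || line.startsWith("import static com.google.common.")) {
        hits.add(i + 1);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    // Sample text only (not a compilable file): line 2 is unshaded, line 3 is shaded.
    String source = String.join("\n",
        "package org.example;",
        "import com.google.common.base.Preconditions;",
        "import org.apache.hbase.thirdparty.com.google.common.base.Preconditions;");
    System.out.println(unshadedImportLines(source)); // prints "[2]"
  }
}
```

Run over every source file, an empty result would confirm that no unshaded import remains.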





[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-11-09 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681979#comment-16681979
 ] 

Ted Yu commented on HBASE-21246:


Patch v34 is based on current master.

Running test suite locally to see which tests fail.

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.HBASE-20952.001.patch, 21246.HBASE-20952.002.patch, 
> 21246.HBASE-20952.004.patch, 21246.HBASE-20952.005.patch, 
> 21246.HBASE-20952.007.patch, 21246.HBASE-20952.008.patch, 
> replication-src-creates-wal-reader.jpg, wal-factory-providers.png, 
> wal-providers.png, wal-splitter-reader.jpg, wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation 
> can be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface

2018-11-09 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21246:
---
Attachment: 21246.34.txt

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 21246.34.txt, 
> 21246.HBASE-20952.001.patch, 21246.HBASE-20952.002.patch, 
> 21246.HBASE-20952.004.patch, 21246.HBASE-20952.005.patch, 
> 21246.HBASE-20952.007.patch, 21246.HBASE-20952.008.patch, 
> replication-src-creates-wal-reader.jpg, wal-factory-providers.png, 
> wal-providers.png, wal-splitter-reader.jpg, wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation 
> can be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.





[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-09 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681855#comment-16681855
 ] 

Ted Yu commented on HBASE-21457:


The master startup delay is probably related to the procedure store WAL: the 
store WAL would be using the designated HDFS.

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt, 21457.v3.txt
>
>
> Janos reported seeing a backup test failure when testing with a local HDFS for 
> WALs while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan that uses the HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.
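The difference can be illustrated with a simplified Java model. It assumes that hbase.wal.dir, when set, takes precedence over hbase.rootdir for WALs, and otherwise falls back to it. The Map-based configuration, the method name, and the example URIs are hypothetical. Real code would go through the CommonFSUtils helpers mentioned above rather than this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model: assumes hbase.wal.dir, when set, overrides hbase.rootdir for WALs.
class WalRootDir {
  static final String ROOT_DIR = "hbase.rootdir";
  static final String WAL_DIR = "hbase.wal.dir";

  // Where WAL files live: hbase.wal.dir if configured, otherwise hbase.rootdir.
  static String walRootDir(Map<String, String> conf) {
    return conf.getOrDefault(WAL_DIR, conf.get(ROOT_DIR));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(ROOT_DIR, "wasb://container@account.blob.core.windows.net/hbase"); // store files
    conf.put(WAL_DIR, "hdfs://localhost:8020/hbase-wal");                     // WALs
    // Looking for WAL files under hbase.rootdir would search the wrong filesystem here.
    System.out.println("root dir: " + conf.get(ROOT_DIR));
    System.out.println("WAL dir:  " + walRootDir(conf));
  }
}
```

When hbase.wal.dir is unset, both lookups agree. That may be why the discrepancy only shows up in a split-storage setup like the one Janos tested.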





[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-09 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681852#comment-16681852
 ] 

Ted Yu commented on HBASE-21457:


The failed tests are already marked as large tests.

Looking for where the timeout should be increased.
200 seconds is really long, though I don't see any meaningful exception in the 
test output related to master initialization.


> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt, 21457.v3.txt
>
>
> Janos reported seeing a backup test failure when testing with a local HDFS for 
> WALs while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan that uses the HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.





[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-09 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681788#comment-16681788
 ] 

Ted Yu commented on HBASE-21457:


Shall we leave the test change to another JIRA?

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt, 21457.v3.txt
>
>
> Janos reported seeing a backup test failure when testing with a local HDFS for 
> WALs while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan that uses the HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.





Re: [VOTE] Apache Bahir 2.2.2 (RC1)

2018-11-08 Thread Ted Yu
+1

Ran the unit test suite, which passed.

On Thu, Nov 8, 2018 at 1:34 PM Luciano Resende  wrote:

> Dear community member,
>
> Please vote to approve the release of Apache Bahir 2.2.2 (RC1) based on
> Apache Spark 2.2.2.
>
> Tag: v2.2.2-rc1 (821a8c67c21f4f4ab4a7caa8e2f85a2c396683d4)
>
> https://github.com/apache/bahir/tree/v2.2.2-rc1
>
> Release files:
>
> https://repository.apache.org/content/repositories/orgapachebahir-1024
>
> Source distribution:
>
> https://dist.apache.org/repos/dist/dev/bahir/bahir-spark/2.2.2-rc1/
>
>
> The vote is open for at least 72 hours and passes if a majority of at least
> 3 +1 PMC votes are cast.
>
>   [ ] +1 Release this package as Apache Bahir 2.2.2
>   [ ] -1 Do not release this package because ...
>
>
> Thanks for your vote!
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-08 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680721#comment-16680721
 ] 

Ted Yu commented on HBASE-21387:


[~openinx][~Apache9][~elserj] :
Gentle ping.

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, 
> two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 
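To make Josh's timeline concrete, here is a minimal single-threaded replay of it. This is a hypothetical model, not HBase code. SnapshotRaceReplay and its methods are illustrative names, and the two maps stand in for the completed-snapshot directory and the tmp (in-progress) directory. refreshCache() here only mirrors the idea of skipping SNAPSHOT_TMP_DIR_NAME.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, single-threaded replay of the T0..T3 timeline. It models only
 * the order in which state is read; names are illustrative, not HBase classes.
 */
class SnapshotRaceReplay {
    final Map<String, Set<String>> completedSnapshots = new HashMap<>();
    final Map<String, Set<String>> inProgressSnapshots = new HashMap<>(); // the tmp dir
    final Set<String> cache = new HashSet<>(); // files referenced by completed snapshots

    /** Like refreshCache(): reads completed snapshots only and skips the tmp dir. */
    void refreshCache() {
        cache.clear();
        completedSnapshots.values().forEach(cache::addAll);
    }

    /** A snapshot finishes: it moves from the tmp dir to the completed set. */
    void complete(String snapshot) {
        completedSnapshots.put(snapshot, inProgressSnapshots.remove(snapshot));
    }

    /** Second half of the check: only snapshots still in progress are consulted. */
    boolean referencedByInProgress(String file) {
        for (Set<String> files : inProgressSnapshots.values()) {
            if (files.contains(file)) {
                return true;
            }
        }
        return false;
    }

    /** Returns true if the racy check ends up treating F1 as unreferenced. */
    static boolean replay() {
        SnapshotRaceReplay r = new SnapshotRaceReplay();
        // T0: we start checking whether F1 is referenced.
        // T1: S1 is in progress and references F1; refreshCache() sees no completed snapshot with F1.
        r.inProgressSnapshots.put("S1", new HashSet<>(Set.of("F1")));
        r.refreshCache();
        // T2: S1 completes after the cache was built.
        r.complete("S1");
        // T3: the in-progress check no longer sees S1.
        boolean referenced = r.cache.contains("F1") || r.referencedByInProgress("F1");
        return !referenced;
    }
}
```

SnapshotRaceReplay.replay() returns true: neither read observes S1, because each read happens while S1 is in the "other" state.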



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21457:
---
Attachment: 21457.v3.txt

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt, 21457.v3.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21246) Introduce WALIdentity interface

2018-11-08 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680549#comment-16680549
 ] 

Ted Yu commented on HBASE-21246:


Looking at the compilation errors for patch v22: there are 321 lines in the 
compilation output.

It would be non-trivial to make v22 or v23 pass both compilation and the unit 
test suite.
Also, v26 is the closest to what we want WALFactory and WALProvider to be.
So it would be nice if we could build upon v26, possibly using HBASE-21456.

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 
> 21246.HBASE-20952.001.patch, 21246.HBASE-20952.002.patch, 
> 21246.HBASE-20952.004.patch, 21246.HBASE-20952.005.patch, 
> 21246.HBASE-20952.007.patch, 21246.HBASE-20952.008.patch, 
> replication-src-creates-wal-reader.jpg, wal-factory-providers.png, 
> wal-providers.png, wal-splitter-reader.jpg, wal-splitter-writer.jpg
>
>
> We are introducing WALIdentity interface so that the WAL representation can 
> be decoupled from distributed filesystem.
> The interface provides getName method whose return value can represent 
> filename in distributed filesystem environment or, the name of the stream 
> when the WAL is backed by log stream.
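A minimal Java sketch of the decoupling described above. It assumes only what the description states: a WALIdentity interface with a getName method. FileWALIdentity and StreamWALIdentity are hypothetical implementations for illustration; they are not classes from the patch.

```java
import java.util.Objects;

/** Sketch of the interface from the description; only getName() comes from the issue. */
interface WALIdentity {
    String getName();
}

/** Hypothetical filesystem-backed WAL: the name is the WAL file name. */
final class FileWALIdentity implements WALIdentity {
    private final String path;

    FileWALIdentity(String path) {
        this.path = Objects.requireNonNull(path);
    }

    @Override
    public String getName() {
        // Return the last path component, i.e. the file name.
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }
}

/** Hypothetical stream-backed WAL: the name is the log stream's name. */
final class StreamWALIdentity implements WALIdentity {
    private final String streamName;

    StreamWALIdentity(String streamName) {
        this.streamName = Objects.requireNonNull(streamName);
    }

    @Override
    public String getName() {
        return streamName;
    }
}
```

Callers such as a WAL reader would then depend only on WALIdentity, not on a filesystem Path.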



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680531#comment-16680531
 ] 

Ted Yu commented on HBASE-21457:


The test passed in a local run:

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 102.933 sec - 
in org.apache.hadoop.hbase.backup.TestRemoteRestore


> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680500#comment-16680500
 ] 

Ted Yu commented on HBASE-21457:


I tried to retrieve the output of the failed test, but it looks like archiving 
wasn't successful.

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680435#comment-16680435
 ] 

Ted Yu commented on HBASE-21457:


I searched the hbase-backup module for places where we retrieve a FileSystem.

The call fixed in the patch is the only one I found that retrieves the 
FileSystem for WALs.

If you know of any other call(s) that should use the WAL FileSystem, please let 
me know.

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21457:
---
Attachment: 21457.v3.txt

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HADOOP-15910) Javadoc for LdapAuthenticationHandler#ENABLE_START_TLS is wrong

2018-11-08 Thread Ted Yu (JIRA)
Ted Yu created HADOOP-15910:
---

 Summary: Javadoc for LdapAuthenticationHandler#ENABLE_START_TLS is 
wrong
 Key: HADOOP-15910
 URL: https://issues.apache.org/jira/browse/HADOOP-15910
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ted Yu


In LdapAuthenticationHandler, the javadoc for ENABLE_START_TLS has the same 
contents as the javadoc for BASE_DN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Updated] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21457:
---
Attachment: (was: 21457.v3.txt)

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21457:
---
Attachment: 21457.v3.txt

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt, 21457.v3.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680380#comment-16680380
 ] 

Ted Yu commented on HBASE-21457:


Patch v3 modifies TestBackupBase to use a second HDFS cluster for the WAL dir.

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21439) StochasticLoadBalancer RegionLoads aren’t being used in RegionLoad cost functions

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21439:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, Ben.

> StochasticLoadBalancer RegionLoads aren’t being used in RegionLoad cost 
> functions
> -
>
> Key: HBASE-21439
> URL: https://issues.apache.org/jira/browse/HBASE-21439
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.3.2.1, 2.0.2
>Reporter: Ben Lau
>Assignee: Ben Lau
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21439-master.patch
>
>
> In StochasticLoadBalancer.updateRegionLoad() the region loads are being put 
> into the map with Bytes.toString(regionName).
> First, this is a problem because Bytes.toString() assumes that the byte array 
> is a UTF8 encoded String but there is no guarantee that regionName bytes are 
> legal UTF8.
> Secondly, in BaseLoadBalancer.registerRegion, we are reading the region loads 
> out of the load map not using Bytes.toString() but using 
> region.getRegionNameAsString() and region.getEncodedName().  So the load 
> balancer will not see or use any of the cluster's RegionLoad history.
> There are 2 primary ways to solve this issue, assuming we want to stay with 
> String keys for the load map (seems reasonable to aid debugging).  We can 
> either fix updateRegionLoad to store the regionName as a string properly or 
> we can update both the reader & writer to use a new common valid String 
> representation.
> Will post a patch assuming we want to pursue the original intention, i.e. 
> store regionNameAsAString for the loadmap key, but I'm open to fixing this a 
> different way.
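The first problem above (Bytes.toString() assuming UTF-8) can be shown with plain JDK calls. This is a sketch: Utf8KeyDemo is a hypothetical helper that decodes bytes the same way a UTF-8 assumption would. Invalid sequences become U+FFFD, so distinct region names can collide and the original bytes cannot be recovered from the key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper showing why a UTF-8 decode is not a safe map key for arbitrary bytes. */
class Utf8KeyDemo {
    /** Decodes as UTF-8; malformed sequences are replaced with U+FFFD by the JDK. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** True if decoding and re-encoding gives back exactly the same bytes. */
    static boolean roundTrips(byte[] bytes) {
        return Arrays.equals(bytes, decodeUtf8(bytes).getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, the byte arrays {'r', '1', 0xFF, ','} and {'r', '1', 0xFE, ','} both decode to "r1\uFFFD,", while a plain ASCII name round-trips unchanged.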



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21246:
---
Attachment: 21246.26.txt

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.26.txt, 
> 21246.HBASE-20952.001.patch, 21246.HBASE-20952.002.patch, 
> 21246.HBASE-20952.004.patch, 21246.HBASE-20952.005.patch, 
> 21246.HBASE-20952.007.patch, 21246.HBASE-20952.008.patch, 
> replication-src-creates-wal-reader.jpg, wal-factory-providers.png, 
> wal-providers.png, wal-splitter-reader.jpg, wal-splitter-writer.jpg
>
>
> We are introducing WALIdentity interface so that the WAL representation can 
> be decoupled from distributed filesystem.
> The interface provides getName method whose return value can represent 
> filename in distributed filesystem environment or, the name of the stream 
> when the WAL is backed by log stream.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21457:
---
Attachment: 21457.v2.txt

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21457:
---
Status: Patch Available  (was: Open)

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Reporter: Janos Gub
>Priority: Major
> Attachments: 21457.v1.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-21457:
--

Assignee: Ted Yu

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21457:
--

 Summary: BackupUtils#getWALFilesOlderThan refers to wrong 
FileSystem
 Key: HBASE-21457
 URL: https://issues.apache.org/jira/browse/HBASE-21457
 Project: HBase
  Issue Type: Bug
Reporter: Janos Gub


Janos reported seeing a backup test failure when testing with a local HDFS for 
WALs while using WASB/ADLS only for store files.

Janos spotted the code in BackupUtils#getWALFilesOlderThan, which uses the HBase 
root dir for retrieving WAL files.

We should use the helper methods from CommonFSUtils.
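A minimal sketch of why the root dir is the wrong place to look, assuming a plain Map in place of the Hadoop Configuration. The keys hbase.rootdir and hbase.wal.dir are real HBase settings. The fallback from hbase.wal.dir to hbase.rootdir reflects our understanding of what the CommonFSUtils helpers do. WalDirResolver itself is hypothetical, not HBase code.

```java
import java.net.URI;
import java.util.Map;
import java.util.Objects;

/** Hypothetical resolver: which directory (and so which FileSystem) holds the WALs. */
class WalDirResolver {
    static final String ROOT_DIR_KEY = "hbase.rootdir";
    static final String WAL_DIR_KEY = "hbase.wal.dir";

    /** WAL root: hbase.wal.dir if set, otherwise fall back to hbase.rootdir. */
    static URI walRootDir(Map<String, String> conf) {
        String wal = conf.get(WAL_DIR_KEY);
        String dir = (wal != null) ? wal : conf.get(ROOT_DIR_KEY);
        return URI.create(Objects.requireNonNull(dir, "neither hbase.wal.dir nor hbase.rootdir is set"));
    }

    /** Scheme plus authority: identifies the FileSystem that owns the directory. */
    static String fileSystemFor(URI dir) {
        return dir.getScheme() + "://" + dir.getAuthority();
    }
}
```

With store files on WASB and WALs on a local HDFS, listing WAL files through fileSystemFor(rootDir) would query the WASB store instead of HDFS. The fix resolves the WAL directory first and uses its FileSystem.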



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21457:
---
Attachment: 21457.v1.txt

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Reporter: Janos Gub
>Priority: Major
> Attachments: 21457.v1.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680048#comment-16680048
 ] 

Ted Yu commented on HBASE-21457:


bq. can you replace filesystem parameter fs with walFs in the lines above?

Done in patch v2.

bq. confirm child classes of TestBackupBase passed ?

Verified locally - all backup tests passed.

I will see if I can add a unit test.
Janos will run tests with a local HDFS for WALs, which should verify the fix.

> BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
> ---
>
> Key: HBASE-21457
> URL: https://issues.apache.org/jira/browse/HBASE-21457
> Project: HBase
>  Issue Type: Bug
>Reporter: Janos Gub
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21457.v1.txt, 21457.v2.txt
>
>
> Janos reported seeing backup test failure when testing a local HDFS for WALs 
> while using WASB/ADLS only for store files.
> Janos spotted the code in BackupUtils#getWALFilesOlderThan which uses HBase 
> root dir for retrieving WAL files.
> We should use the helper methods from CommonFSUtils.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21439) StochasticLoadBalancer RegionLoads aren’t being used in RegionLoad cost functions

2018-11-07 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678766#comment-16678766
 ] 

Ted Yu commented on HBASE-21439:


StochasticLoadBalancer tests passed.

lgtm, pending QA.

> StochasticLoadBalancer RegionLoads aren’t being used in RegionLoad cost 
> functions
> -
>
> Key: HBASE-21439
> URL: https://issues.apache.org/jira/browse/HBASE-21439
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.3.2.1, 2.0.2
>Reporter: Ben Lau
>Assignee: Ben Lau
>Priority: Major
> Attachments: HBASE-21439-master.patch
>
>
> In StochasticLoadBalancer.updateRegionLoad() the region loads are being put 
> into the map with Bytes.toString(regionName).
> First, this is a problem because Bytes.toString() assumes that the byte array 
> is a UTF8 encoded String but there is no guarantee that regionName bytes are 
> legal UTF8.
> Secondly, in BaseLoadBalancer.registerRegion, we are reading the region loads 
> out of the load map not using Bytes.toString() but using 
> region.getRegionNameAsString() and region.getEncodedName().  So the load 
> balancer will not see or use any of the cluster's RegionLoad history.
> There are 2 primary ways to solve this issue, assuming we want to stay with 
> String keys for the load map (seems reasonable to aid debugging).  We can 
> either fix updateRegionLoad to store the regionName as a string properly or 
> we can update both the reader & writer to use a new common valid String 
> representation.
> Will post a patch assuming we want to pursue the original intention, i.e. 
> store regionNameAsAString for the loadmap key, but I'm open to fixing this a 
> different way.
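The second problem is a writer/reader key mismatch. Here is a sketch of it and of the "common String representation" option from the description. utf8Key stands in for the writer's Bytes.toString() decode. escapedKey is a hypothetical binary-safe form (printable ASCII kept, other bytes written as \xNN); it is not HBase's exact implementation. The sketch also ignores the encoded-name lookup.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical demo: a load map written with one key function and read with another. */
class RegionLoadKeyDemo {
    /** Writer-side key, as described for updateRegionLoad(): a UTF-8 decode. */
    static String utf8Key(byte[] regionName) {
        return new String(regionName, StandardCharsets.UTF_8);
    }

    /** Hypothetical binary-safe key: printable ASCII kept, everything else escaped as \xNN. */
    static String escapedKey(byte[] regionName) {
        StringBuilder sb = new StringBuilder();
        for (byte x : regionName) {
            int v = x & 0xFF;
            if (v >= 0x20 && v < 0x7F && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    /** Stores a load under writerKey and looks it up under readerKey; null means "not found". */
    static Double lookup(byte[] regionName, Function<byte[], String> writerKey,
                         Function<byte[], String> readerKey) {
        Map<String, Double> loads = new HashMap<>();
        loads.put(writerKey.apply(regionName), 42.0); // updateRegionLoad-like write
        return loads.get(readerKey.apply(regionName)); // registerRegion-like read
    }
}
```

With a region name containing a non-ASCII byte, the mismatched pair returns null, so the balancer sees no history. When both sides use escapedKey, the load is found. For pure ASCII names the two key functions happen to agree, which can hide the bug.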



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-07 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678531#comment-16678531
 ] 

Ted Yu commented on HBASE-21387:


Here is a brief summary of the approaches I tried, most recent first (the most 
recent one is the one I expect to be reviewed):

21387.v9.txt : 

At the beginning of getUnreferencedFiles, snapshots are temporarily disabled.
We check whether there is an in-flight snapshot. If there is, no file is listed 
as unreferenced.
Otherwise, we fill out the unreferenced files. During this time, snapshot 
attempts are declined.
At the end of getUnreferencedFiles, snapshots are enabled again.
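One way to express the v9 idea is a read/write lock. This is a sketch under our own assumptions, not the code in 21387.v9.txt. Snapshots share the read side, so several can run at once. The cleaner tries the write side: if any snapshot is in flight, the attempt fails and nothing is listed as unreferenced. While the cleaner holds the lock, new snapshot attempts fail and are declined. SnapshotCleanerGate is hypothetical. Snapshots and the cleaner are assumed to run on different threads, because a thread that already holds the write lock may reentrantly acquire the read lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical gate between snapshot attempts and the unreferenced-file computation. */
class SnapshotCleanerGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Runs the snapshot body; returns false (declined) while the cleaner holds the gate. */
    boolean tryTakeSnapshot(Runnable snapshotBody) {
        if (!lock.readLock().tryLock()) {
            return false;
        }
        try {
            snapshotBody.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Returns the candidates that are not referenced. If any snapshot is in
     * flight (the write lock is unavailable), returns an empty list so that
     * no file is deleted in this round.
     */
    List<String> getUnreferencedFiles(List<String> candidates, Supplier<Set<String>> referencedFiles) {
        if (!lock.writeLock().tryLock()) {
            return Collections.emptyList();
        }
        try {
            Set<String> referenced = referencedFiles.get();
            List<String> unreferenced = new ArrayList<>();
            for (String file : candidates) {
                if (!referenced.contains(file)) {
                    unreferenced.add(file);
                }
            }
            return unreferenced;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Using tryLock on both sides means neither party blocks. The cleaner simply skips a round, and a declined snapshot can be retried by its caller.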

two-pass-cleaner.v9.txt :

The cleaner chore stores the candidates from its previous invocation and 
calculates the intersection of the previous and current candidates.
The downside of this approach is that keeping the previous iteration's 
candidates consumes (potentially large amounts of) memory.
21387.v8.txt :

SnapshotFileCache would try to obtain the in-progress snapshot(s) under the lock. 
However, since the timing of when an in-progress snapshot completes is not under 
the control of SnapshotFileCache, it is hard to avoid the race condition.


> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, 
> two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> During recent report from customer where ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is race condition surrounding in progress snapshot(s) handling 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose when the RefreshCacheTask runs refreshCache, there is some in 
> progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date. So cleaner proceeds to check in progress 
> snapshot(s). However, the snapshot has completed by that time, resulting in 
> some file(s) deemed unreferenced.
> Here is timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-07 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Description: 
During a recent customer report in which ExportSnapshot failed:
{code}
2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
snapshot.SnapshotReferenceUtil: Can't find hfile: 
44f6c3c646e84de6a63fe30da4fcb3aa in the real 
(hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 or archive 
(hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 directory for the primary table. 
{code}
We found the following in the log:
{code}
2018-10-09 18:54:23,675 DEBUG 
[00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
cleaner.HFileCleaner: Removing: 
hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
from archive
{code}
The root cause is race condition surrounding in progress snapshot(s) handling 
between refreshCache() and getUnreferencedFiles().
There are two callers of refreshCache: one from RefreshCacheTask#run and the 
other from SnapshotHFileCleaner.

Let's look at the code of refreshCache:
{code}
  if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
{code}
This check is intended to exclude in-progress snapshot(s).
Suppose that when RefreshCacheTask runs refreshCache(), there is an in-progress 
snapshot that is about to finish.
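
To make the filter concrete, here is a minimal, hypothetical sketch of what that check does. It assumes completed snapshots sit directly under the snapshot root and in-progress ones sit under a ".tmp" child; the ".tmp" value of SNAPSHOT_TMP_DIR_NAME is an assumption here, and RefreshCacheFilterSketch / completedSnapshotDirs are illustrative names, not HBase APIs. Anything under ".tmp" is invisible to the cache, so a file referenced only by an in-progress snapshot has to be caught by the separate in-progress check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the refreshCache() filter; not HBase code.
public class RefreshCacheFilterSketch {
  // Assumed value of SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME.
  static final String SNAPSHOT_TMP_DIR_NAME = ".tmp";

  // Returns the names of completed snapshot directories, skipping the
  // directory that holds in-progress snapshots.
  static List<String> completedSnapshotDirs(List<String> childrenOfSnapshotRoot) {
    List<String> completed = new ArrayList<>();
    for (String name : childrenOfSnapshotRoot) {
      if (!name.equals(SNAPSHOT_TMP_DIR_NAME)) {
        completed.add(name);
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    // ".tmp" would contain S1 while it is still in progress.
    System.out.println(completedSnapshotDirs(List.of("snap-a", ".tmp", "snap-b")));
    // prints [snap-a, snap-b]
  }
}
```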

When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
lastModifiedTime is up to date, so it treats the cache as current and proceeds 
to check the in-progress snapshot(s). However, the snapshot has completed by 
that time and is no longer in progress, so some file(s) are deemed unreferenced.

Here is a timeline given by Josh illustrating the scenario:

At time T0, we begin checking whether F1 is referenced. At T1, snapshot S1, 
which references file F1, is in progress; refreshCache() is called, but no 
completed snapshot references F1. At T2, S1 completes. At T3, we check 
in-progress snapshots, and S1 is no longer among them. Thus, F1 is marked as 
unreferenced even though S1 references it.
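
The timeline can be replayed deterministically with a toy, single-threaded model. This is a sketch under simplifying assumptions: snapshots are just maps from snapshot name to referenced file names, and SnapshotRaceTimeline, startSnapshot, completeSnapshot and referencedByInProgress are hypothetical names, not the real SnapshotFileCache code. It shows that the cached view (read at T1) and the in-progress view (read at T3) are taken at different times, so S1 falls into neither.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy, single-threaded model of the T0..T3 timeline; names are hypothetical.
public class SnapshotRaceTimeline {
  // Completed snapshots: name -> referenced files.
  private final Map<String, Set<String>> completed = new HashMap<>();
  // In-progress snapshots (the tmp directory): name -> referenced files.
  private final Map<String, Set<String>> inProgress = new HashMap<>();
  // Cleaner's cached view; like refreshCache(), it covers completed snapshots only.
  private Set<String> cache = new HashSet<>();

  void startSnapshot(String name, Set<String> files) {
    inProgress.put(name, files);
  }

  // Completion moves the snapshot out of the in-progress set.
  void completeSnapshot(String name) {
    completed.put(name, inProgress.remove(name));
  }

  void refreshCache() {
    Set<String> refs = new HashSet<>();
    for (Set<String> files : completed.values()) {
      refs.addAll(files);
    }
    cache = refs;
  }

  boolean referencedByInProgress(String file) {
    for (Set<String> files : inProgress.values()) {
      if (files.contains(file)) {
        return true;
      }
    }
    return false;
  }

  // Replays the timeline; returns true if F1 ends up deemed unreferenced.
  static boolean replayTimeline() {
    SnapshotRaceTimeline m = new SnapshotRaceTimeline();
    // T0: the cleaner starts checking F1.
    m.startSnapshot("S1", Set.of("F1")); // T1: S1 in progress, references F1
    m.refreshCache();                    // T1: no completed snapshot references F1
    m.completeSnapshot("S1");            // T2: S1 completes
    // T3: the stale cache is not refreshed, and S1 is no longer in progress.
    boolean referenced = m.cache.contains("F1") || m.referencedByInProgress("F1");
    return !referenced;
  }

  public static void main(String[] args) {
    System.out.println("F1 deemed unreferenced: " + replayTimeline());
  }
}
```

Because the two reads are not atomic with respect to snapshot completion, a fix has to either make them atomic or re-validate the completed-snapshot view after the in-progress check.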


> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, 
> two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt

[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-07 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678473#comment-16678473
 ] 

Ted Yu commented on HBASE-21387:


[~openinx] [~Apache9] [~elserj]:
Please take a look at 21387.v9.txt, which solves the race condition between 
in-progress snapshots and the HFile cleaner chore.

Your feedback is welcome.
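
For readers following along, here is one generic way to close that window; it is not necessarily what 21387.v9.txt does. The sketch assumes a java.util.concurrent.locks.ReentrantReadWriteLock shared between snapshot taking (read lock, so snapshots can still run concurrently with each other) and the cleaner (write lock via tryLock(), skipping the round if any snapshot is running). LockedCleanerSketch, takingSnapshotLock, takeSnapshot and unreferenced are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: mutual exclusion between snapshot taking and the cleaner.
public class LockedCleanerSketch {
  private final ReentrantReadWriteLock takingSnapshotLock = new ReentrantReadWriteLock();
  private final Map<String, Set<String>> completed = new HashMap<>();
  private final Map<String, Set<String>> inProgress = new HashMap<>();

  // A snapshot holds the read lock from start to completion, so several
  // snapshots can run at once, but none can complete mid-cleaner-pass.
  void takeSnapshot(String name, Set<String> files, Runnable whileInProgress) {
    takingSnapshotLock.readLock().lock();
    try {
      synchronized (this) {
        inProgress.put(name, files);
      }
      whileInProgress.run();
      synchronized (this) {
        completed.put(name, inProgress.remove(name));
      }
    } finally {
      takingSnapshotLock.readLock().unlock();
    }
  }

  // Returns the candidates that no snapshot references. If any snapshot is
  // running, tryLock() fails and nothing is returned; the chore retries later.
  List<String> unreferenced(List<String> candidates) {
    if (!takingSnapshotLock.writeLock().tryLock()) {
      return List.of();
    }
    try {
      Set<String> refs = new HashSet<>();
      synchronized (this) {
        completed.values().forEach(refs::addAll);
        inProgress.values().forEach(refs::addAll);
      }
      List<String> result = new ArrayList<>();
      for (String file : candidates) {
        if (!refs.contains(file)) {
          result.add(file);
        }
      }
      return result;
    } finally {
      takingSnapshotLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    LockedCleanerSketch s = new LockedCleanerSketch();
    List<List<String>> seenDuringSnapshot = new ArrayList<>();
    // The write lock's tryLock() fails while any read lock is held, even by
    // the calling thread, so this single-threaded demo shows the skipped pass.
    s.takeSnapshot("S1", Set.of("F1"),
        () -> seenDuringSnapshot.add(s.unreferenced(List.of("F1", "F2"))));
    System.out.println("during snapshot: " + seenDuringSnapshot.get(0)); // []
    System.out.println("after snapshot: " + s.unreferenced(List.of("F1", "F2"))); // [F2]
  }
}
```

The trade-off in this design is that the cleaner deletes nothing while any snapshot is running; it simply retries on its next chore run instead of blocking snapshot creation.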

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>    Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, two-pass-cleaner.v4.txt, 
> two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt





[jira] [Updated] (HBASE-21246) Introduce WALIdentity interface

2018-11-06 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21246:
---
Attachment: 21246.25.txt

> Introduce WALIdentity interface
> ---
>
> Key: HBASE-21246
> URL: https://issues.apache.org/jira/browse/HBASE-21246
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Ted Yu
>    Assignee: Ted Yu
>Priority: Major
> Fix For: HBASE-20952
>
> Attachments: 21246.003.patch, 21246.20.txt, 21246.21.txt, 
> 21246.23.txt, 21246.24.txt, 21246.25.txt, 21246.HBASE-20952.001.patch, 
> 21246.HBASE-20952.002.patch, 21246.HBASE-20952.004.patch, 
> 21246.HBASE-20952.005.patch, 21246.HBASE-20952.007.patch, 
> 21246.HBASE-20952.008.patch, replication-src-creates-wal-reader.jpg, 
> wal-factory-providers.png, wal-providers.png, wal-splitter-reader.jpg, 
> wal-splitter-writer.jpg
>
>
> We are introducing the WALIdentity interface so that the WAL representation 
> can be decoupled from the distributed filesystem.
> The interface provides a getName method whose return value can represent a 
> filename in a distributed filesystem environment, or the name of the stream 
> when the WAL is backed by a log stream.
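
A minimal sketch of the interface as described, assuming only the getName() method named above. FsBackedWALIdentity, StreamBackedWALIdentity and WALIdentityDemo are hypothetical classes added for illustration, the paths and stream names are made up, and the real interface on the HBASE-20952 branch may declare more methods.

```java
import java.util.List;

// Only getName() comes from the issue description; everything else is hypothetical.
interface WALIdentity {
  // A filename in a distributed filesystem, or a stream name for a log-stream-backed WAL.
  String getName();
}

// Hypothetical filesystem-backed identity: the name is the last path component.
final class FsBackedWALIdentity implements WALIdentity {
  private final String path;

  FsBackedWALIdentity(String path) {
    this.path = path;
  }

  @Override
  public String getName() {
    return path.substring(path.lastIndexOf('/') + 1);
  }
}

// Hypothetical stream-backed identity: the name is the stream's name.
final class StreamBackedWALIdentity implements WALIdentity {
  private final String streamName;

  StreamBackedWALIdentity(String streamName) {
    this.streamName = streamName;
  }

  @Override
  public String getName() {
    return streamName;
  }
}

public class WALIdentityDemo {
  public static void main(String[] args) {
    // Callers see only WALIdentity, not whether the WAL is a file or a stream.
    List<WALIdentity> wals = List.of(
        new FsBackedWALIdentity("/hbase/WALs/rs1/wal.1"),
        new StreamBackedWALIdentity("rs1-wal-stream"));
    for (WALIdentity wal : wals) {
      System.out.println(wal.getName());
    }
  }
}
```

Running the demo prints "wal.1" and then "rs1-wal-stream"; code that only depends on WALIdentity does not need to know which backend produced the name.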




