[jira] [Commented] (HBASE-20952) Re-visit the WAL API
    [ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558661#comment-16558661 ]

Zach York commented on HBASE-20952:
-----------------------------------

I definitely have some thoughts on this. I'll try to summarize them and put them here, but in general, making the interface as basic as possible would be the easiest to work with, IMO.

> Re-visit the WAL API
> --------------------
>
>                 Key: HBASE-20952
>                 URL: https://issues.apache.org/jira/browse/HBASE-20952
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: Josh Elser
>            Priority: Major
>
> Take a step back from the current WAL implementations and think about what an
> HBase WAL API should look like. What are the primitive calls that we require
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We
> should also keep in mind what is happening in the Ratis LogService (but the
> LogService should not dictate what HBase's WAL API looks like; see RATIS-272).
> The API may be "OK" (or OK in part). We also need to consider other methods
> which were "bolted on", such as {{AbstractFSWAL}} and
> {{WALFileLengthProvider}}. Other corners of WAL use (like {{WALSplitter}})
> should also be examined so that they use WAL APIs only.
> We also need to make sure that adequate interface audience and stability
> annotations are chosen.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
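To make the "as basic as possible" argument concrete, here is a hypothetical sketch of what such a minimal interface could look like. This is NOT the actual HBase WAL API; the names (`WriteAheadLog`, `append`, `sync`) and the in-memory implementation are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: not the HBase WAL API, just an illustration of
// the minimal append/sync/close shape argued for in the comment above.
interface WriteAheadLog {
  long append(byte[] edit);   // stage an edit, return its transaction id
  void sync(long txid);       // block until everything up to txid is durable
  void close();               // release resources
}

// In-memory stand-in so the interface can be exercised; a real provider
// would write to a filesystem or a log service instead.
class InMemoryWal implements WriteAheadLog {
  private final List<byte[]> edits = new ArrayList<>();
  private long lastSynced = -1;

  @Override
  public long append(byte[] edit) {
    edits.add(edit);
    return edits.size() - 1; // txid is just the append index here
  }

  @Override
  public void sync(long txid) {
    lastSynced = Math.max(lastSynced, txid); // pretend durability
  }

  @Override
  public void close() { /* nothing to release in memory */ }

  long lastSyncedTxid() { return lastSynced; }
}

public class WalSketch {
  public static void main(String[] args) {
    InMemoryWal wal = new InMemoryWal();
    long txid = wal.append("edit-1".getBytes());
    wal.sync(txid);
    System.out.println("durable through txid " + wal.lastSyncedTxid());
    wal.close();
  }
}
```

A small surface like this would keep providers (filesystem-based or otherwise) interchangeable behind the durability primitives.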
[jira] [Commented] (HBASE-20856) PITA having to set WAL provider in two places
    [ https://issues.apache.org/jira/browse/HBASE-20856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553326#comment-16553326 ]

Zach York commented on HBASE-20856:
-----------------------------------

[~stack], [~busbey], or [~elserj], do you have any further comments? If not, I'll commit it tomorrow.

> PITA having to set WAL provider in two places
> ---------------------------------------------
>
>                 Key: HBASE-20856
>                 URL: https://issues.apache.org/jira/browse/HBASE-20856
>             Project: HBase
>          Issue Type: Improvement
>          Components: Operability, wal
>    Affects Versions: 3.0.0
>            Reporter: stack
>            Assignee: Tak Lon (Stephen) Wu
>            Priority: Minor
>             Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
>         Attachments: HBASE-20856.master.001.patch,
>                      HBASE-20856.master.002.patch, HBASE-20856.master.003.patch
>
> Courtesy of [~elserj], I learned that to change the WAL we need to set two
> places: both hbase.wal.meta_provider and hbase.wal.provider. An operator
> should only have to set it in one place; hbase.wal.meta_provider should pick
> up the general setting unless it is explicitly set.
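The fallback the issue asks for can be sketched in a few lines. This is an illustration of the lookup semantics only, not the actual patch: plain `java.util.Properties` stands in for HBase's `Configuration`, and the provider names `"filesystem"` and `"asyncfs"` are examples.

```java
import java.util.Properties;

// Sketch of the proposed behavior: hbase.wal.meta_provider inherits from
// hbase.wal.provider unless it is explicitly set. Properties is a stand-in
// for HBase's Configuration; provider names are illustrative.
public class WalProviderFallback {

  static String resolveMetaProvider(Properties conf, String defaultProvider) {
    // General WAL provider, falling back to the default when unset.
    String general = conf.getProperty("hbase.wal.provider", defaultProvider);
    // Meta provider picks up the general setting unless explicitly set.
    return conf.getProperty("hbase.wal.meta_provider", general);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("hbase.wal.provider", "asyncfs");
    // The operator set only one property; the meta provider follows it.
    System.out.println(resolveMetaProvider(conf, "filesystem")); // prints "asyncfs"
  }
}
```

With this lookup order, setting `hbase.wal.meta_provider` still wins when an operator really does want a different provider for meta.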
[jira] [Updated] (HBASE-20558) Backport HBASE-17854 to branch-1
    [ https://issues.apache.org/jira/browse/HBASE-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20558:
------------------------------
        Resolution: Fixed
     Fix Version/s: 1.4.6
                    1.5.0
            Status: Resolved  (was: Patch Available)

> Backport HBASE-17854 to branch-1
> --------------------------------
>
>                 Key: HBASE-20558
>                 URL: https://issues.apache.org/jira/browse/HBASE-20558
>             Project: HBase
>          Issue Type: Sub-task
>          Components: HFile
>    Affects Versions: 1.4.4, 1.4.5
>            Reporter: Tak Lon (Stephen) Wu
>            Assignee: Tak Lon (Stephen) Wu
>            Priority: Major
>             Fix For: 1.5.0, 1.4.6
>
>         Attachments: HBASE-20558.branch-1.001.patch, report.html
>
> As part of HBASE-20555, HBASE-17854 is the third patch that is needed for
> backporting HBASE-18083.
[jira] [Commented] (HBASE-20558) Backport HBASE-17854 to branch-1
    [ https://issues.apache.org/jira/browse/HBASE-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551421#comment-16551421 ]

Zach York commented on HBASE-20558:
-----------------------------------

Ah! Perfect, thanks!
[jira] [Commented] (HBASE-20558) Backport HBASE-17854 to branch-1
    [ https://issues.apache.org/jira/browse/HBASE-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551385#comment-16551385 ]

Zach York commented on HBASE-20558:
-----------------------------------

Note, this was done on branch-1. I was planning on doing the same for branch-1.4.
[jira] [Updated] (HBASE-20558) Backport HBASE-17854 to branch-1
    [ https://issues.apache.org/jira/browse/HBASE-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20558:
------------------------------
    Attachment: report.html
[jira] [Commented] (HBASE-20558) Backport HBASE-17854 to branch-1
    [ https://issues.apache.org/jira/browse/HBASE-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551383#comment-16551383 ]

Zach York commented on HBASE-20558:
-----------------------------------

[~apurtell] I ran the compat-checker and it generated errors. Most of them are renamed methods from the previous patches, but this patch also added a constructor, which seems okay to add within a version. How do you judge whether something is okay? I'll attach the report.html.
[jira] [Commented] (HBASE-20555) Backport HBASE-18083 and related changes in branch-1
    [ https://issues.apache.org/jira/browse/HBASE-20555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551268#comment-16551268 ]

Zach York commented on HBASE-20555:
-----------------------------------

[~apurtell] FYI, I'd like to get the remaining two backports in for 1.4.6 if possible, since the first two are there. I plan to push #3 soon and will review #4 soon after that. We should be able to get this in today, or Monday at the latest.

> Backport HBASE-18083 and related changes in branch-1
> ----------------------------------------------------
>
>                 Key: HBASE-20555
>                 URL: https://issues.apache.org/jira/browse/HBASE-20555
>             Project: HBase
>          Issue Type: Umbrella
>          Components: HFile, snapshots
>    Affects Versions: 1.4.4, 1.4.5
>            Reporter: Tak Lon (Stephen) Wu
>            Assignee: Tak Lon (Stephen) Wu
>            Priority: Major
>
> This will be the umbrella JIRA for backporting HBASE-18083, `Make
> large/small file clean thread number configurable in HFileCleaner`, from
> HBase's branch-2 to HBase's branch-1. It needs a total of 4 sub-tasks
> that backport HBASE-16490, HBASE-17215, HBASE-17854, and then HBASE-18083.
> The goal is to bring the HFile cleaning performance improvements that were
> introduced in branch-2 to branch-1.
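For context, once the HBASE-18083 backport lands, the large/small HFileCleaner thread pools become tunable. The property names below are my reading of what that change introduced and should be verified against the committed patch; the values are illustrative only:

```xml
<!-- hbase-site.xml: sketch of HFileCleaner thread tuning after HBASE-18083.
     Property names assumed from the HBASE-18083 change; verify before use. -->
<property>
  <name>hbase.regionserver.hfilecleaner.large.thread.count</name>
  <value>2</value>
</property>
<property>
  <name>hbase.regionserver.hfilecleaner.small.thread.count</name>
  <value>2</value>
</property>
```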
[jira] [Commented] (HBASE-20558) Backport HBASE-17854 to branch-1
    [ https://issues.apache.org/jira/browse/HBASE-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551260#comment-16551260 ]

Zach York commented on HBASE-20558:
-----------------------------------

+1 I will commit in an hour if no objections.
[jira] [Commented] (HBASE-20856) PITA having to set WAL provider in two places
    [ https://issues.apache.org/jira/browse/HBASE-20856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549684#comment-16549684 ]

Zach York commented on HBASE-20856:
-----------------------------------

+1 Although that wrapped-provider stuff is kinda ugly (not your fault though :) ). Can you reattach the patch to try to get a clean testing run? The only failure was a timed-out test.
[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
    [ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542259#comment-16542259 ]

Zach York commented on HBASE-20734:
-----------------------------------

Thanks for reviewing, [~yuzhih...@gmail.com]. I am trying to rebase on master, but there are a ton of conflicts. I'll hopefully get a new patch up for that early next week, as I will likely have to redo a lot of the changes on top of the master branch. I'll also toss it on Review Board.

> Colocate recovered edits directory with hbase.wal.dir
> -----------------------------------------------------
>
>                 Key: HBASE-20734
>                 URL: https://issues.apache.org/jira/browse/HBASE-20734
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR, Recovery, wal
>            Reporter: Ted Yu
>            Assignee: Zach York
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: HBASE-20734.branch-1.001.patch
>
> During investigation of HBASE-20723, I realized that we wouldn't get the best
> performance when hbase.wal.dir is configured to be on different (fast) media
> than the hbase rootdir w.r.t. recovered edits, since the recovered edits
> directory is currently under rootdir.
> Such a setup may not result in fast recovery when there is region server
> failover.
> This issue is to find a proper (hopefully backward-compatible) way to
> colocate the recovered edits directory with hbase.wal.dir.
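The split-media setup the description refers to looks roughly like the fragment below: the root dir on one storage system and WALs on faster media. The paths and hostnames are hypothetical placeholders, not recommendations:

```xml
<!-- hbase-site.xml: sketch of a split-media layout where WALs (and, per this
     issue, ideally recovered edits too) live on faster storage than rootdir.
     Paths/hosts are illustrative only. -->
<property>
  <name>hbase.rootdir</name>
  <value>s3://my-bucket/hbase</value>
</property>
<property>
  <name>hbase.wal.dir</name>
  <value>hdfs://namenode:8020/hbase-wal</value>
</property>
```

Because recovered edits are written under `hbase.rootdir` today, the slow medium ends up on the recovery path even with this configuration, which is what the issue aims to fix.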
[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
    [ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540796#comment-16540796 ]

Zach York commented on HBASE-20734:
-----------------------------------

Yep, I'll work on getting a patch for the master branch. It was just easier for me to test on a cluster with branch-1.
[jira] [Updated] (HBASE-18840) Add functionality to refresh meta table at master startup
    [ https://issues.apache.org/jira/browse/HBASE-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-18840:
------------------------------
    Attachment: HBASE-18840.HBASE-18477.007.patch

> Add functionality to refresh meta table at master startup
> ---------------------------------------------------------
>
>                 Key: HBASE-18840
>                 URL: https://issues.apache.org/jira/browse/HBASE-18840
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: HBASE-18477
>            Reporter: Zach York
>            Assignee: Zach York
>            Priority: Major
>
>         Attachments: HBASE-18840.HBASE-18477.001.patch,
>                      HBASE-18840.HBASE-18477.002.patch, HBASE-18840.HBASE-18477.003 (2) (1).patch,
>                      HBASE-18840.HBASE-18477.003 (2).patch, HBASE-18840.HBASE-18477.003.patch,
>                      HBASE-18840.HBASE-18477.004.patch, HBASE-18840.HBASE-18477.005.patch,
>                      HBASE-18840.HBASE-18477.006.patch, HBASE-18840.HBASE-18477.007.patch
>
> If an HBase cluster's hbase:meta table is deleted, or a cluster is started
> with a new meta table, HBase needs the functionality to synchronize its
> metadata from storage.
[jira] [Updated] (HBASE-20868) Fix TestCheckTestClasses on HBASE-18477
    [ https://issues.apache.org/jira/browse/HBASE-20868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20868:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Pushed to HBASE-18477.

> Fix TestCheckTestClasses on HBASE-18477
> ---------------------------------------
>
>                 Key: HBASE-20868
>                 URL: https://issues.apache.org/jira/browse/HBASE-20868
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: HBASE-18477
>            Reporter: Zach York
>            Assignee: Zach York
>            Priority: Minor
>             Fix For: HBASE-18477
>
>         Attachments: HBASE-20868.HBASE-18477.001.patch,
>                      HBASE-20868.HBASE-18477.002.patch
[jira] [Updated] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
    [ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20734:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
    [ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20734:
------------------------------
    Attachment: HBASE-20734.branch-1.001.patch
[jira] [Commented] (HBASE-20649) Validate HFiles do not have PREFIX_TREE DataBlockEncoding
    [ https://issues.apache.org/jira/browse/HBASE-20649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539329#comment-16539329 ]

Zach York commented on HBASE-20649:
-----------------------------------

Trying to get up to speed on all of this. Overall it looks like a handy upgrade tool! [~busbey], are your steps what we want to document for an operator? It would be awesome if we could provide more info when running the specific tool (if it fails in the root dir, suggest trying a major compaction if the data block encoding for the table is correct; if it fails in the archive dir, check whether any snapshots reference these files). Could we have a tool/script to help automate determining which snapshot is 'dirty' and help clean it up automatically? It just seems like a lot of manual steps to get your cluster upgrade-ready (imagine if you had a number of incremental snapshots).

> Validate HFiles do not have PREFIX_TREE DataBlockEncoding
> ---------------------------------------------------------
>
>                 Key: HBASE-20649
>                 URL: https://issues.apache.org/jira/browse/HBASE-20649
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Peter Somogyi
>            Assignee: Balazs Meszaros
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: HBASE-20649.master.001.patch,
>                      HBASE-20649.master.002.patch, HBASE-20649.master.003.patch,
>                      HBASE-20649.master.004.patch, HBASE-20649.master.005.patch
>
> HBASE-20592 adds a tool to check that column families on the cluster do not
> have PREFIX_TREE encoding.
> Since it is possible that the DataBlockEncoding was already changed but
> HFiles are not yet rewritten, we need a tool that can verify the content of
> hfiles in the cluster.
[jira] [Commented] (HBASE-20868) Fix TestCheckTestClasses on HBASE-18477
    [ https://issues.apache.org/jira/browse/HBASE-20868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539278#comment-16539278 ]

Zach York commented on HBASE-20868:
-----------------------------------

[~yuzhih...@gmail.com] Can you take a look when you get a chance? It's a simple annotation fix.
[jira] [Updated] (HBASE-20868) Fix TestCheckTestClasses on HBASE-18477
    [ https://issues.apache.org/jira/browse/HBASE-20868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20868:
------------------------------
    Attachment: HBASE-20868.HBASE-18477.002.patch
[jira] [Updated] (HBASE-20868) Fix TestCheckTestClasses on HBASE-18477
    [ https://issues.apache.org/jira/browse/HBASE-20868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20868:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HBASE-20868) Fix TestCheckTestClasses on HBASE-18477
    [ https://issues.apache.org/jira/browse/HBASE-20868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20868:
------------------------------
    Attachment: HBASE-20868.HBASE-18477.001.patch
[jira] [Created] (HBASE-20868) Fix TestCheckTestClasses on HBASE-18477
Zach York created HBASE-20868:
---------------------------------

             Summary: Fix TestCheckTestClasses on HBASE-18477
                 Key: HBASE-20868
                 URL: https://issues.apache.org/jira/browse/HBASE-20868
             Project: HBase
          Issue Type: Sub-task
    Affects Versions: HBASE-18477
            Reporter: Zach York
            Assignee: Zach York
             Fix For: HBASE-18477
[jira] [Updated] (HBASE-18840) Add functionality to refresh meta table at master startup
    [ https://issues.apache.org/jira/browse/HBASE-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-18840:
------------------------------
    Attachment: HBASE-18840.HBASE-18477.006.patch
[jira] [Resolved] (HBASE-20787) Rebase the HBASE-18477 onto the current master to continue dev
    [ https://issues.apache.org/jira/browse/HBASE-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York resolved HBASE-20787.
-------------------------------
    Resolution: Fixed

> Rebase the HBASE-18477 onto the current master to continue dev
> --------------------------------------------------------------
>
>                 Key: HBASE-20787
>                 URL: https://issues.apache.org/jira/browse/HBASE-20787
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: HBASE-18477
>            Reporter: Zach York
>            Assignee: Zach York
>            Priority: Minor
>             Fix For: HBASE-18477
[jira] [Reopened] (HBASE-20787) Rebase the HBASE-18477 onto the current master to continue dev
    [ https://issues.apache.org/jira/browse/HBASE-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York reopened HBASE-20787:
-------------------------------

Rebasing again to pull in fixes for unit tests.
[jira] [Commented] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
    [ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537664#comment-16537664 ]

Zach York commented on HBASE-20836:
-----------------------------------

Pushed, thanks [~yuzhih...@gmail.com].

> Add Yetus annotation for ReadReplicaClustersTableNameUtil
> ---------------------------------------------------------
>
>                 Key: HBASE-20836
>                 URL: https://issues.apache.org/jira/browse/HBASE-20836
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: HBASE-18477
>            Reporter: Zach York
>            Assignee: Zach York
>            Priority: Major
>             Fix For: HBASE-18477
>
>         Attachments: HBASE-20836.HBASE-18477.001.patch,
>                      HBASE-20836.HBASE-18477.002.patch, HBASE-20836.HBASE-18477.003.patch
>
> Found via nightly builds.
[jira] [Updated] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
    [ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20836:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
[jira] [Updated] (HBASE-18840) Add functionality to refresh meta table at master startup
    [ https://issues.apache.org/jira/browse/HBASE-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-18840:
------------------------------
    Attachment: HBASE-18840.HBASE-18477.005.patch
[jira] [Commented] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
    [ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537540#comment-16537540 ]

Zach York commented on HBASE-20836:
-----------------------------------

It looks like that did the trick. Can you commit when you get the chance, [~yuzhih...@gmail.com]?
[jira] [Updated] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
    [ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20836:
------------------------------
    Attachment: HBASE-20836.HBASE-18477.003.patch
[jira] [Updated] (HBASE-20557) Backport HBASE-17215 to branch-1
    [ https://issues.apache.org/jira/browse/HBASE-20557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20557:
------------------------------
        Resolution: Fixed
     Fix Version/s: 1.4.6
                    1.5.0
            Status: Resolved  (was: Patch Available)

> Backport HBASE-17215 to branch-1
> --------------------------------
>
>                 Key: HBASE-20557
>                 URL: https://issues.apache.org/jira/browse/HBASE-20557
>             Project: HBase
>          Issue Type: Sub-task
>          Components: HFile, master
>    Affects Versions: 1.4.4, 1.4.5
>            Reporter: Tak Lon (Stephen) Wu
>            Assignee: Tak Lon (Stephen) Wu
>            Priority: Major
>             Fix For: 1.5.0, 1.4.6
>
>         Attachments: HBASE-20557.branch-1.001.patch,
>                      HBASE-20557.branch-1.002.patch, HBASE-20557.branch-1.003.patch
>
> As part of HBASE-20555, HBASE-17215 is the second patch that is needed for
> backporting HBASE-18083.
[jira] [Commented] (HBASE-20557) Backport HBASE-17215 to branch-1
    [ https://issues.apache.org/jira/browse/HBASE-20557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537442#comment-16537442 ]

Zach York commented on HBASE-20557:
-----------------------------------

Pushed to branch-1 and branch-1.4. The compatibility check didn't unearth any issues (a few pre-existing ones, it seems).
[jira] [Commented] (HBASE-20837) Make IDE configuration for import order match that in our checkstyle module
    [ https://issues.apache.org/jira/browse/HBASE-20837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537342#comment-16537342 ]

Zach York commented on HBASE-20837:
-----------------------------------

[~taklwu] As mentioned in the email thread, this is a best-effort solution, so let's add an .xml for IntelliJ to make it easier for IntelliJ users and then sync between the checkstyle config and the formatters.

> Make IDE configuration for import order match that in our checkstyle module
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-20837
>                 URL: https://issues.apache.org/jira/browse/HBASE-20837
>             Project: HBase
>          Issue Type: Improvement
>          Components: community
>    Affects Versions: 3.0.0, 2.0.1, 1.4.5
>            Reporter: Tak Lon (Stephen) Wu
>            Assignee: Tak Lon (Stephen) Wu
>            Priority: Minor
>             Fix For: 3.0.0, 1.5.0, 2.2.0
>
>         Attachments: HBASE-20837.branch-1.001.patch,
>                      HBASE-20837.branch-2.001.patch, HBASE-20837.master.001.patch,
>                      IDEA import layout.png, hbase-intellij-formatter.xml
>
> While working on the HBASE-20557 contribution, we figured out that the
> checkstyle build target (ImportOrder's `groups`,
> http://checkstyle.sourceforge.net/config_imports.html) was different from the
> formatters of the supported development IDEs (e.g. IntelliJ and Eclipse). We
> will provide a fix here to sync
> [dev-support/hbase_eclipse_formatter.xml|https://github.com/apache/hbase/blob/master/dev-support/hbase_eclipse_formatter.xml]
> with
> [hbase/checkstyle.xml|https://github.com/apache/hbase/blob/master/hbase-checkstyle/src/main/resources/hbase/checkstyle.xml].
> The changes on master might need to be backported to branch-1 and branch-2
> as well.
> Before this change, this is the import order checkstyle expects:
> {code:java}
> import com.google.common.annotations.VisibleForTesting;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.classification.InterfaceAudience;
> import org.apache.hadoop.hbase.conf.ConfigurationObserver;{code}
> The proposed import order, with respect to HBASE-19262 and HBASE-19552,
> should be:
> !IDEA import layout.png!
[jira] [Commented] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
[ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530569#comment-16530569 ] Zach York commented on HBASE-20836: --- [~yuzhih...@gmail.com] I'm not sure why the mvninstall and shadedjars checks are still failing... perhaps the patch isn't being applied first because they fail on the Yetus interface audience error. > Add Yetus annotation for ReadReplicaClustersTableNameUtil > - > > Key: HBASE-20836 > URL: https://issues.apache.org/jira/browse/HBASE-20836 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: HBASE-18477 > > Attachments: HBASE-20836.HBASE-18477.001.patch, > HBASE-20836.HBASE-18477.002.patch > > > Found via nightly builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
[ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530565#comment-16530565 ] Zach York commented on HBASE-20836: --- Added a new patch to add a private constructor. > Add Yetus annotation for ReadReplicaClustersTableNameUtil > - > > Key: HBASE-20836 > URL: https://issues.apache.org/jira/browse/HBASE-20836 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: HBASE-18477 > > Attachments: HBASE-20836.HBASE-18477.001.patch, > HBASE-20836.HBASE-18477.002.patch > > > Found via nightly builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
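The private-constructor convention for a static utility class like this can be sketched as follows; the class body and helper method below are hypothetical illustrations of the pattern, not the contents of the actual patch:

```java
// Sketch of the utility-class convention discussed above: a final
// class with a private constructor so it can never be instantiated.
// In the real patch this would also carry a Yetus annotation such as
// @InterfaceAudience.Private from org.apache.yetus.audience (omitted
// here to keep the sketch self-contained).
public final class TableNameUtilSketch {

  private TableNameUtilSketch() {
    // Private constructor: static-only utility class.
  }

  // Hypothetical helper standing in for the real static methods.
  public static String qualify(String namespace, String table) {
    return namespace + ":" + table;
  }
}
```

The private constructor is what satisfies checkstyle's HideUtilityClassConstructor-style checks: callers can only ever use the static methods.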
[jira] [Updated] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
[ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20836: -- Attachment: HBASE-20836.HBASE-18477.002.patch > Add Yetus annotation for ReadReplicaClustersTableNameUtil > - > > Key: HBASE-20836 > URL: https://issues.apache.org/jira/browse/HBASE-20836 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: HBASE-18477 > > Attachments: HBASE-20836.HBASE-18477.001.patch, > HBASE-20836.HBASE-18477.002.patch > > > Found via nightly builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
[ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530488#comment-16530488 ] Zach York commented on HBASE-20836: --- FYI [~te...@apache.org] > Add Yetus annotation for ReadReplicaClustersTableNameUtil > - > > Key: HBASE-20836 > URL: https://issues.apache.org/jira/browse/HBASE-20836 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: HBASE-18477 > > Attachments: HBASE-20836.HBASE-18477.001.patch > > > Found via nightly builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
[ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20836: -- Status: Patch Available (was: Open) > Add Yetus annotation for ReadReplicaClustersTableNameUtil > - > > Key: HBASE-20836 > URL: https://issues.apache.org/jira/browse/HBASE-20836 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: HBASE-18477 > > Attachments: HBASE-20836.HBASE-18477.001.patch > > > Found via nightly builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
[ https://issues.apache.org/jira/browse/HBASE-20836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20836: -- Attachment: HBASE-20836.HBASE-18477.001.patch > Add Yetus annotation for ReadReplicaClustersTableNameUtil > - > > Key: HBASE-20836 > URL: https://issues.apache.org/jira/browse/HBASE-20836 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: HBASE-18477 > > Attachments: HBASE-20836.HBASE-18477.001.patch > > > Found via nightly builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20836) Add Yetus annotation for ReadReplicaClustersTableNameUtil
Zach York created HBASE-20836: - Summary: Add Yetus annotation for ReadReplicaClustersTableNameUtil Key: HBASE-20836 URL: https://issues.apache.org/jira/browse/HBASE-20836 Project: HBase Issue Type: Sub-task Affects Versions: HBASE-18477 Reporter: Zach York Assignee: Zach York Fix For: HBASE-18477 Found via nightly builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-18477) Umbrella JIRA for HBase Read Replica clusters
[ https://issues.apache.org/jira/browse/HBASE-18477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530432#comment-16530432 ] Zach York commented on HBASE-18477: --- [~yuzhih...@gmail.com] Thanks for pointing out. I will address that. > Umbrella JIRA for HBase Read Replica clusters > - > > Key: HBASE-18477 > URL: https://issues.apache.org/jira/browse/HBASE-18477 > Project: HBase > Issue Type: New Feature >Reporter: Zach York >Assignee: Zach York >Priority: Major > Attachments: HBase Read-Replica Clusters Scope doc.docx, HBase > Read-Replica Clusters Scope doc.pdf, HBase Read-Replica Clusters Scope > doc_v2.docx, HBase Read-Replica Clusters Scope doc_v2.pdf > > > Recently, changes (such as HBASE-17437) have unblocked HBase to run with a > root directory external to the cluster (such as in Amazon S3). This means > that the data is stored outside of the cluster and can be accessible after > the cluster has been terminated. One use case that is often asked about is > pointing multiple clusters to one root directory (sharing the data) to have > read resiliency in the case of a cluster failure. > > This JIRA is an umbrella JIRA to contain all the tasks necessary to create a > read-replica HBase cluster that is pointed at the same root directory. > > This requires making the Read-Replica cluster Read-Only (no metadata > operation or data operations). > Separating the hbase:meta table for each cluster (Otherwise HBase gets > confused with multiple clusters trying to update the meta table with their ip > addresses) > Adding refresh functionality for the meta table to ensure new metadata is > picked up on the read replica cluster. > Adding refresh functionality for HFiles for a given table to ensure new data > is picked up on the read replica cluster. > > This can be used with any existing cluster that is backed by an external > filesystem. > > Please note that this feature is still quite manual (with the potential for > automation later). 
> > More information on this particular feature can be found here: > https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20557) Backport HBASE-17215 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526983#comment-16526983 ] Zach York commented on HBASE-20557: --- Ah you probably mean this: [https://github.com/apache/hbase/blob/master/dev-support/checkcompatibility.py] I'll try that. > Backport HBASE-17215 to branch-1 > > > Key: HBASE-20557 > URL: https://issues.apache.org/jira/browse/HBASE-20557 > Project: HBase > Issue Type: Sub-task > Components: HFile, master >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > Attachments: HBASE-20557.branch-1.001.patch, > HBASE-20557.branch-1.002.patch, HBASE-20557.branch-1.003.patch > > > As part of HBASE-20555, HBASE-17215 is the second patch that is needed for > backporting HBASE-18083 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20557) Backport HBASE-17215 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526976#comment-16526976 ] Zach York commented on HBASE-20557: --- [~apurtell] I'm leaning towards not including this on branch-1.4 since this by default creates two threads for deletion (one for large and one for small HFiles) so there is no way to only have a single thread deleting anymore. What are your thoughts? Also regarding your comment on api checker is this a tool I can run? I'm not familiar with it. (I looked through the plugins but didn't see one that jumped out immediately). > Backport HBASE-17215 to branch-1 > > > Key: HBASE-20557 > URL: https://issues.apache.org/jira/browse/HBASE-20557 > Project: HBase > Issue Type: Sub-task > Components: HFile, master >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > Attachments: HBASE-20557.branch-1.001.patch, > HBASE-20557.branch-1.002.patch, HBASE-20557.branch-1.003.patch > > > As part of HBASE-20555, HBASE-17215 is the second patch that is needed for > backporting HBASE-18083 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
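For context on the two-thread concern above, HBASE-17215 routes HFile deletions into two queues selected by file size, each drained by its own thread, which is why a single deleting thread is no longer possible. A rough stand-in for that routing (the structure and the 64 MB threshold are illustrative assumptions, not the actual HFileCleaner code or defaults):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Rough sketch of size-based routing between a "large" and a "small"
// delete queue; in the real cleaner each queue is consumed by its own
// deletion thread. The threshold value here is illustrative only.
public class FileDeleteRouterSketch {
  static final long LARGE_FILE_THRESHOLD = 64L * 1024 * 1024;

  final Queue<String> largeFileQueue = new ArrayDeque<>();
  final Queue<String> smallFileQueue = new ArrayDeque<>();

  // Route a file to the queue matching its size; returns the chosen
  // queue name so the routing decision is easy to observe.
  public String route(String path, long sizeBytes) {
    if (sizeBytes >= LARGE_FILE_THRESHOLD) {
      largeFileQueue.add(path);
      return "large";
    }
    smallFileQueue.add(path);
    return "small";
  }
}
```

With this design, collapsing back to one thread would require draining both queues from a single consumer, which is the behavioral change the comment is weighing for branch-1.4.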
[jira] [Commented] (HBASE-20557) Backport HBASE-17215 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526889#comment-16526889 ] Zach York commented on HBASE-20557: --- +1, reviewed on PR. > Backport HBASE-17215 to branch-1 > > > Key: HBASE-20557 > URL: https://issues.apache.org/jira/browse/HBASE-20557 > Project: HBase > Issue Type: Sub-task > Components: HFile, master >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > Attachments: HBASE-20557.branch-1.001.patch, > HBASE-20557.branch-1.002.patch, HBASE-20557.branch-1.003.patch > > > As part of HBASE-20555, HBASE-17215 is the second patch that is needed for > backporting HBASE-18083 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
[ https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526844#comment-16526844 ] Zach York commented on HBASE-20789: --- [~openinx] {quote}bq. If the existingBlock has nextBlockOnDiskSize set , while cachedItem has nextBlockOnDiskSize(default = -1) unset, the comparison should be positive number ? So there is a typo ? {quote} No, cachedItem will be smaller in that case and so the comparison will be -1. I think this is why you were having difficulty getting the tests to pass. Please flip the '>' back to a '<' > TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky > --- > > Key: HBASE-20789 > URL: https://issues.apache.org/jira/browse/HBASE-20789 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.6, 2.0.2 > > Attachments: > 0001-HBASE-20789-TestBucketCache-testCacheBlockNextBlockM.patch, > HBASE-20789.v1.patch, HBASE-20789.v2.patch, bucket-33718.out > > > The UT failed frequently in our internal branch-2... Will dig into the UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-18840) Add functionality to refresh meta table at master startup
[ https://issues.apache.org/jira/browse/HBASE-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525571#comment-16525571 ] Zach York edited comment on HBASE-18840 at 6/27/18 8:38 PM: Trying to rebase this patch on the latest master and it looks like this commit [1] removed the (quite helpful) mutateRegions method, but doesn't seem to give a reason for removing it. [~stack] or [~appy] do you have context on why it was removed and what the replacement is? Do I need to call add and delete separately now? [1] https://github.com/apache/hbase/commit/8ec0aa0d709ced78331dd61d28c79f3433198227#diff-081750e39413c3b1930fc9952ed0d920L2081 was (Author: zyork): Trying to rebase this patch on the latest master and it looks like this commit [1] removed the (quite helpful) mutateRegions method, but doesn't seem to give a reason for removing it. [~stack] or [~appy] do you have context on why it was removed and what the replacement is? Do I need to call add and delete separately now? > Add functionality to refresh meta table at master startup > - > > Key: HBASE-18840 > URL: https://issues.apache.org/jira/browse/HBASE-18840 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Attachments: HBASE-18840.HBASE-18477.001.patch, > HBASE-18840.HBASE-18477.002.patch, HBASE-18840.HBASE-18477.003 (2) (1).patch, > HBASE-18840.HBASE-18477.003 (2).patch, HBASE-18840.HBASE-18477.003.patch, > HBASE-18840.HBASE-18477.004.patch > > > If a HBase cluster’s hbase:meta table is deleted or a cluster is started with > a new meta table, HBase needs the functionality to synchronize its metadata > from Storage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-18840) Add functionality to refresh meta table at master startup
[ https://issues.apache.org/jira/browse/HBASE-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525571#comment-16525571 ] Zach York commented on HBASE-18840: --- Trying to rebase this patch on the latest master and it looks like this commit [1] removed the (quite helpful) mutateRegions method, but doesn't seem to give a reason for removing it. [~stack] or [~appy] do you have context on why it was removed and what the replacement is? Do I need to call add and delete separately now? > Add functionality to refresh meta table at master startup > - > > Key: HBASE-18840 > URL: https://issues.apache.org/jira/browse/HBASE-18840 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Attachments: HBASE-18840.HBASE-18477.001.patch, > HBASE-18840.HBASE-18477.002.patch, HBASE-18840.HBASE-18477.003 (2) (1).patch, > HBASE-18840.HBASE-18477.003 (2).patch, HBASE-18840.HBASE-18477.003.patch, > HBASE-18840.HBASE-18477.004.patch > > > If a HBase cluster’s hbase:meta table is deleted or a cluster is started with > a new meta table, HBase needs the functionality to synchronize its metadata > from Storage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
[ https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525381#comment-16525381 ] Zach York commented on HBASE-20789: --- [~openinx] Ah... I see that putIfAbsent now. I wonder why my testing didn't uncover that (I ran it many times. I must have just gotten lucky :) ) [~Apache9] caching an already cached block was present long before HBASE-20447 (see [2]). It seems that to fix your memory leak case we need to add locking on the key. Does this need to be a putIfAbsent? What is the harm in replacing the key if it is in the ramCache and hasn't been persisted yet? [2] [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java#L440] > TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky > --- > > Key: HBASE-20789 > URL: https://issues.apache.org/jira/browse/HBASE-20789 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: bucket-33718.out > > > The UT failed frequently in our internal branch-2... Will dig into the UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
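The putIfAbsent question above hinges on the difference between replacing and keeping an existing ramCache entry. A minimal illustration with ConcurrentHashMap (the key and value types are placeholders for the real cache key and block, not the BucketCache code itself):

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrates the put vs putIfAbsent distinction raised above: put
// unconditionally replaces an existing entry, while putIfAbsent keeps
// the existing mapping and returns the prior value, which callers can
// use to detect "already cached".
public class RamCachePutSketch {

  // Returns the value left in the map after a putIfAbsent on an
  // already-present key.
  public static String demo() {
    ConcurrentHashMap<String, String> ramCache = new ConcurrentHashMap<>();
    ramCache.put("block-1", "cached-v1");

    // putIfAbsent does not replace; the prior mapping survives and is
    // returned (non-null signals the key was already present).
    String prior = ramCache.putIfAbsent("block-1", "cached-v2");
    assert "cached-v1".equals(prior);

    return ramCache.get("block-1");
  }
}
```

Replacing (plain put) risks losing track of a block that is queued for persistence, which is why the locking-vs-putIfAbsent trade-off matters in the leak scenario being discussed.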
[jira] [Commented] (HBASE-20799) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
[ https://issues.apache.org/jira/browse/HBASE-20799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525321#comment-16525321 ] Zach York commented on HBASE-20799: --- [~apurtell] see HBASE-20789. This is already being tracked there. I'll take a look into that soon. > TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky > --- > > Key: HBASE-20799 > URL: https://issues.apache.org/jira/browse/HBASE-20799 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0, 1.4.5 >Reporter: Andrew Purtell >Priority: Major > > {noformat} > [ERROR] testCacheBlockNextBlockMetadataMissing[1: blockSize=16,384, > bucketSizes=[I@29ee9faa](org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache) > Time elapsed: 0.066 s <<< FAILURE! > java.lang.AssertionError: expected: > java.nio.HeapByteBuffer but > was: java.nio.HeapByteBuffer > at > org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache.testCacheBlockNextBlockMetadataMissing(TestBucketCache.java:424) > {noformat} > [~zyork] any idea what is going on here? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-18840) Add functionality to refresh meta table at master startup
[ https://issues.apache.org/jira/browse/HBASE-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-18840: -- Attachment: HBASE-18840.HBASE-18477.004.patch > Add functionality to refresh meta table at master startup > - > > Key: HBASE-18840 > URL: https://issues.apache.org/jira/browse/HBASE-18840 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Attachments: HBASE-18840.HBASE-18477.001.patch, > HBASE-18840.HBASE-18477.002.patch, HBASE-18840.HBASE-18477.003 (2) (1).patch, > HBASE-18840.HBASE-18477.003 (2).patch, HBASE-18840.HBASE-18477.003.patch, > HBASE-18840.HBASE-18477.004.patch > > > If a HBase cluster’s hbase:meta table is deleted or a cluster is started with > a new meta table, HBase needs the functionality to synchronize it’s metadata > from Storage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-18477) Umbrella JIRA for HBase Read Replica clusters
[ https://issues.apache.org/jira/browse/HBASE-18477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524199#comment-16524199 ] Zach York commented on HBASE-18477: --- [~busbey] I'm going to pick up this work again as I'd like to avoid long term code maintenance. What are the remaining functionality/conceptual issues to be addressed? Also I'm starting to think that it doesn't make sense for these features to be in a feature branch as none of them are being turned on by default and keeping them in a feature branch increases the code maintenance aspect of the feature (I'd like to spend more time actually improving it rather than rebasing :) ). Thanks for everyone's reviews so far! > Umbrella JIRA for HBase Read Replica clusters > - > > Key: HBASE-18477 > URL: https://issues.apache.org/jira/browse/HBASE-18477 > Project: HBase > Issue Type: New Feature >Reporter: Zach York >Assignee: Zach York >Priority: Major > Attachments: HBase Read-Replica Clusters Scope doc.docx, HBase > Read-Replica Clusters Scope doc.pdf, HBase Read-Replica Clusters Scope > doc_v2.docx, HBase Read-Replica Clusters Scope doc_v2.pdf > > > Recently, changes (such as HBASE-17437) have unblocked HBase to run with a > root directory external to the cluster (such as in Amazon S3). This means > that the data is stored outside of the cluster and can be accessible after > the cluster has been terminated. One use case that is often asked about is > pointing multiple clusters to one root directory (sharing the data) to have > read resiliency in the case of a cluster failure. > > This JIRA is an umbrella JIRA to contain all the tasks necessary to create a > read-replica HBase cluster that is pointed at the same root directory. > > This requires making the Read-Replica cluster Read-Only (no metadata > operation or data operations). 
> Separating the hbase:meta table for each cluster (Otherwise HBase gets > confused with multiple clusters trying to update the meta table with their ip > addresses) > Adding refresh functionality for the meta table to ensure new metadata is > picked up on the read replica cluster. > Adding refresh functionality for HFiles for a given table to ensure new data > is picked up on the read replica cluster. > > This can be used with any existing cluster that is backed by an external > filesystem. > > Please note that this feature is still quite manual (with the potential for > automation later). > > More information on this particular feature can be found here: > https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
[ https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524195#comment-16524195 ] Zach York commented on HBASE-20789: --- [~yuzhih...@gmail.com] None of those build links actually load for me... > TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky > --- > > Key: HBASE-20789 > URL: https://issues.apache.org/jira/browse/HBASE-20789 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > > The UT failed frequently in our internal branch-2... Will dig into the UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20787) Rebase the HBASE-18477 onto the current master to continue dev
[ https://issues.apache.org/jira/browse/HBASE-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York resolved HBASE-20787. --- Resolution: Fixed > Rebase the HBASE-18477 onto the current master to continue dev > -- > > Key: HBASE-20787 > URL: https://issues.apache.org/jira/browse/HBASE-20787 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Minor > Fix For: HBASE-18477 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20787) Rebase the HBASE-18477 onto the current master to continue dev
[ https://issues.apache.org/jira/browse/HBASE-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524187#comment-16524187 ] Zach York commented on HBASE-20787: --- Did a force push to clean the branch up. > Rebase the HBASE-18477 onto the current master to continue dev > -- > > Key: HBASE-20787 > URL: https://issues.apache.org/jira/browse/HBASE-20787 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Minor > Fix For: HBASE-18477 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
[ https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524184#comment-16524184 ] Zach York commented on HBASE-20789: --- Sorry the comment wasn't updated; I think I had updated the comment locally, but it must not have been pushed out. Basically there are 3 cases here: equality (0) -> these blocks are exactly the same, no issue. (-1) -> The existing block has nextBlockOnDiskSize set, so we get performance gains by keeping that version. (1) -> The new block has nextBlockOnDiskSize set, so it makes sense to cache the new version. Please let me know if anything is unclear; I can try to clear it up and improve this logging. Where is the test failing? AFAIK there shouldn't be much flakiness in this test, but let's fix it if there is. Thanks for digging in! > TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky > --- > > Key: HBASE-20789 > URL: https://issues.apache.org/jira/browse/HBASE-20789 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > > The UT failed frequently in our internal branch-2... Will dig into the UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
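The three cases enumerated in that comment reduce to a sign comparison on the nextBlockOnDiskSize metadata, where -1 marks the field as unset. This is a simplified stand-in for the decision, not the actual BucketCache code:

```java
// Simplified model of the caching decision described above: compare
// which of two blocks carries the nextBlockOnDiskSize metadata, with
// -1 meaning unset. Negative result => keep the existing block,
// positive => cache the new block, zero => identical, nothing to do.
public class BlockMetadataCompareSketch {

  public static int compare(long newNextBlockOnDiskSize,
                            long existingNextBlockOnDiskSize) {
    return Long.compare(newNextBlockOnDiskSize, existingNextBlockOnDiskSize);
  }
}
```

This also shows why flipping the comparison operator inverts the behavior: with '>' instead of '<', the cache would evict the block that still carries the metadata, which is the bug the comment asks to revert.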
[jira] [Commented] (HBASE-20787) Rebase the HBASE-18477 onto the current master to continue dev
[ https://issues.apache.org/jira/browse/HBASE-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522979#comment-16522979 ] Zach York commented on HBASE-20787: --- I will also remove the various commits/reverts of the initial patch to simplify things. > Rebase the HBASE-18477 onto the current master to continue dev > -- > > Key: HBASE-20787 > URL: https://issues.apache.org/jira/browse/HBASE-20787 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Minor > Fix For: HBASE-18477 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20787) Rebase the HBASE-18477 onto the current master to continue dev
[ https://issues.apache.org/jira/browse/HBASE-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20787: -- Issue Type: Sub-task (was: Task) Parent: HBASE-18477 > Rebase the HBASE-18477 onto the current master to continue dev > -- > > Key: HBASE-20787 > URL: https://issues.apache.org/jira/browse/HBASE-20787 > Project: HBase > Issue Type: Sub-task >Affects Versions: HBASE-18477 >Reporter: Zach York >Assignee: Zach York >Priority: Minor > Fix For: HBASE-18477 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20787) Rebase the HBASE-18477 onto the current master to continue dev
Zach York created HBASE-20787: - Summary: Rebase the HBASE-18477 onto the current master to continue dev Key: HBASE-20787 URL: https://issues.apache.org/jira/browse/HBASE-20787 Project: HBase Issue Type: Task Affects Versions: HBASE-18477 Reporter: Zach York Assignee: Zach York Fix For: HBASE-18477 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York reassigned HBASE-20734: - Assignee: Zach York > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Assignee: Zach York >Priority: Major > Fix For: 3.0.0 > > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find proper (hopefully backward compatible) way in > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514190#comment-16514190 ] Zach York commented on HBASE-20734: --- I looked into the code for this and the challenge is that Region has no concept of walFS and the regionDir is determined from the HRegionFileSystem... I'll continue to look into how we can do this, hopefully without changing the Region contract. > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Priority: Major > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find a proper (hopefully backward compatible) way of > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20723) Custom hbase.wal.dir results in dataloss because we write recovered edits into a different place than where the recovering region server looks for them.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514187#comment-16514187 ] Zach York commented on HBASE-20723: --- +1 on the patch once the checkstyle is fixed. I'll push tonight if nobody objects. > Custom hbase.wal.dir results in dataloss because we write recovered edits > into a different place than where the recovering region server looks for them. > > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: Recovery, wal >Affects Versions: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 2.0.0 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Critical > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, 20723.v8.txt, > 20723.v9.txt, logs.zip > > > Description: > When custom hbase.wal.dir is configured the recovery system uses it in place > of the HBase root dir and thus constructs an incorrect path for recovered > edits when splitting WALs. This causes the recovery code in Region Servers to > believe there are no recovered edits to replay, which causes a loss of writes > that had not flushed prior to loss of a server. > > Reproduction: > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. 
scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. > Looking at the source code, > > [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java] > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437, waldir and hbase rootdir are in different path or > even in different filesystem, then the #5 uses walDir as tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20723) Custom hbase.wal.dir results in dataloss because we write recovered edits into a different place than where the recovering region server looks for them.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20723: -- Description: Description: When custom hbase.wal.dir is configured the recovery system uses it in place of the HBase root dir and thus constructs an incorrect path for recovered edits when splitting WALs. This causes the recovery code in Region Servers to believe there are no recovered edits to replay, which causes a loss of writes that had not flushed prior to loss of a server. Reproduction: This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase 1.1.2.2.6.3.2-14 By default the underlying data is going to wasb://x@y/hbase I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at /mnt. hbase.wal.dir= hdfs://mycluster/walontest hbase.wal.dir.perms=700 hbase.rootdir.perms=700 hbase.rootdir= wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase Procedure to reproduce this issue: 1. create a table in hbase shell 2. insert a row in hbase shell 3. reboot the VM which hosts that region 4. scan the table in hbase shell and it is empty Looking at the region server logs: {code:java} 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] wal.WALSplitter: This region's directory doesn't exist: hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. It is very likely that it was already split so it's safe to discard those edits. {code} The log split/replay ignored actual WAL due to WALSplitter is looking for the region directory in the hbase.wal.dir we specified rather than the hbase.rootdir. Looking at the source code, [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java] it uses the rootDir, which is walDir, as the tableDir root path. 
So if we use HBASE-17437, waldir and hbase rootdir are in different path or even in different filesystem, then the #5 uses walDir as tableDir is apparently wrong. CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. was: This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase 1.1.2.2.6.3.2-14 By default the underlying data is going to wasb://x@y/hbase I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at /mnt. hbase.wal.dir= hdfs://mycluster/walontest hbase.wal.dir.perms=700 hbase.rootdir.perms=700 hbase.rootdir= wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase Procedure to reproduce this issue: 1. create a table in hbase shell 2. insert a row in hbase shell 3. reboot the VM which hosts that region 4. scan the table in hbase shell and it is empty Looking at the region server logs: {code:java} 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] wal.WALSplitter: This region's directory doesn't exist: hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. It is very likely that it was already split so it's safe to discard those edits. {code} The log split/replay ignored actual WAL due to WALSplitter is looking for the region directory in the hbase.wal.dir we specified rather than the hbase.rootdir. Looking at the source code, https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java it uses the rootDir, which is walDir, as the tableDir root path. So if we use HBASE-17437, waldir and hbase rootdir are in different path or even in different filesystem, then the #5 uses walDir as tableDir is apparently wrong. CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. > Custom hbase.wal.dir results in dataloss because we write recovered edits > into a different place than where the recovering region server looks for them. 
> > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: Recovery, wal >Affects Versions: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 2.0.0 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Critical > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, 20723.v8.txt, > 20723.v9.txt, logs.zip > > > Description: > When custom hbase.wal.dir is configured the recovery system uses it in place > of the HBase root dir and thus constructs an incorrect path for recovered > edits when splitting WALs. This causes the recovery code in Region Servers to > believe there are no recovered edits to replay, which causes a loss of writes > that had not flushed prior to loss of a server.

[jira] [Updated] (HBASE-20723) Custom hbase.wal.dir results in dataloss because we write recovered edits into a different place than where the recovering region server looks for them.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20723: -- Summary: Custom hbase.wal.dir results in dataloss because we write recovered edits into a different place than where the recovering region server looks for them. (was: WALSplitter uses the rootDir, which is walDir, as the recovered edits root path) > Custom hbase.wal.dir results in dataloss because we write recovered edits > into a different place than where the recovering region server looks for them. > > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: Recovery, wal >Affects Versions: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 2.0.0 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Critical > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, 20723.v8.txt, > 20723.v9.txt, logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. 
> {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. > Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437, waldir and hbase rootdir are in different path or > even in different filesystem, then the #5 uses walDir as tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
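The settings quoted in the description map onto an hbase-site.xml fragment along these lines (a sketch: the property names are real HBase settings and the HDFS value is taken from the report, but the WASB container and account are placeholders, since the report elides them):

```
<!-- WALs on fast HDFS storage; table data stays on WASB -->
<property>
  <name>hbase.wal.dir</name>
  <value>hdfs://mycluster/walontest</value>
</property>
<property>
  <name>hbase.wal.dir.perms</name>
  <value>700</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>wasb://CONTAINER@ACCOUNT.blob.core.windows.net/hbase</value>
</property>
<property>
  <name>hbase.rootdir.perms</name>
  <value>700</value>
</property>
```

With this split configuration, any code that resolves data paths against the WAL directory rather than hbase.rootdir looks not just in the wrong directory but on the wrong filesystem entirely.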
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the recovered edits root path
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514172#comment-16514172 ] Zach York commented on HBASE-20723: --- Thanks for the summary [~busbey], one minor tweak: {quote}which causes a loss writes that had not flushed prior to loss of a server. {quote} which causes a loss of writes that had not flushed prior to loss of a server. [~elserj] I'll make a comment on the vote thread, but I do agree with your sentiment. Andrew has been doing good work with keeping the releases regular :) > WALSplitter uses the rootDir, which is walDir, as the recovered edits root > path > --- > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: Recovery, wal >Affects Versions: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 2.0.0 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Critical > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, 20723.v8.txt, > 20723.v9.txt, logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. 
scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. > Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437, waldir and hbase rootdir are in different path or > even in different filesystem, then the #5 uses walDir as tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513169#comment-16513169 ] Zach York commented on HBASE-20734: --- Before HBASE-20723 goes in, there is no chance of this happening, right? Currently it will fail if hbase.wal.dir is set to anything but the default. We could remove the headache of BC if we fixed this right the first time. > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Priority: Major > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find proper (hopefully backward compatible) way in > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513113#comment-16513113 ] Zach York commented on HBASE-20723: --- Removed 1.1.2 from affects since the backport isn't in the public repo. This affects 1.4.0+ > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 2.0.0 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Major > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, 20723.v8.txt, logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. 
> Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437, waldir and hbase rootdir are in different path or > even in different filesystem, then the #5 uses walDir as tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20723: -- Affects Version/s: (was: 1.1.2) 1.4.0 1.4.1 1.4.2 1.4.3 1.4.4 2.0.0 > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 2.0.0 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Major > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, 20723.v8.txt, logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. 
> Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437, waldir and hbase rootdir are in different path or > even in different filesystem, then the #5 uses walDir as tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513107#comment-16513107 ] Zach York commented on HBASE-20723: --- [~yuzhih...@gmail.com] The patch looks good to me. Let's see what QA says. [~elserj] Do you think this is enough to reroll a 1.4.5 RC? It isn't a default config, but still quite serious for those that set this config. > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.1.2 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Major > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, 20723.v8.txt, logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. 
> {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. > Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437, waldir and hbase rootdir are in different path or > even in different filesystem, then the #5 uses walDir as tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513097#comment-16513097 ] Zach York commented on HBASE-20734: --- Doesn't split log only run before region opening, so shouldn't rolling upgrade work? Or is there a case where the log is split, not applied, and then tries to split/recover again? > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Priority: Major > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find proper (hopefully backward compatible) way in > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513067#comment-16513067 ] Zach York commented on HBASE-20723: --- Okay, let's do the quick fix here and look to HBASE-20734 for the longer-term solution. With regard to your existing code, can you change all other occurrences of rootDir (where it means walDir) to walDir to avoid confusion in the meantime? Let me start working on seeing if moving recovered edits to the WAL dir fixes HBASE-20734. > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.1.2 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Major > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. 
> {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. > Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437, waldir and hbase rootdir are in different path or > even in different filesystem, then the #5 uses walDir as tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513048#comment-16513048 ] Zach York commented on HBASE-20723: --- It's great that we have these JMX values, but unless an admin happens to look at these, there isn't anything calling out a potential issue. If we log the replay_num_ops for each recovery attempt, then maybe it would be useful. {quote}Once this logic error is fixed, I am not aware of other scenario where the message should be WARN. {quote} If only software development were that simple :)... There is no guarantee that this can't break again (obviously we will do our best) or a different edge case will break something like this. Unless there is a way to check with 100% certainty that this is expected behavior, this log line is still useful. Though I would feel better about leaving it at info if it were possible to see from some other log line that things might not be operating correctly. It seems too many assumptions are being made about the recovered edits directory. Related to the actual change... I've been thinking about this a bit and why does it make sense for WALs to be under hbase.wal.dir, but for recovered.edits (basically the split log) to be under the root directory. It seems to me that both should be under the hbase.wal.dir. > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.1.2 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Major > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. 
and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. > Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437, waldir and hbase rootdir are in different path or > even in different filesystem, then the #5 uses walDir as tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513019#comment-16513019 ] Zach York commented on HBASE-20723: --- [~yuzhih...@gmail.com] I did reproduce the issue on my side as well. Let me also review your patch. I think going forward we need two things to prevent something like this from happening again: # Tests that utilize hbase.wal.dir (on a different FS and path) to validate that edits are able to be replayed and logs are split from a user level (put, kill RS, restart RS, check to ensure edit is present in table). # Improve the log messaging around here. There should be some indication of the number of records replayed or something, as the current logging is easy to miss... Considering this log means that edits won't be applied for that region, this should at the very least be a WARN to indicate something potentially wrong happened. > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.1.2 >Reporter: Rohan Pednekar >Assignee: Ted Yu >Priority: Major > Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, > 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. 
scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. > Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437, waldir and hbase rootdir are in different path or > even in different filesystem, then the #5 uses walDir as tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
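The two suggestions above — surface the discarded-edits case at WARN and report a count — could look roughly like this (a hypothetical helper using java.util.logging for the sketch; HBase's actual WALSplitter uses a different logging facade, and the method name is invented here):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SplitterLogging {
    private static final Logger LOG = Logger.getLogger("wal.WALSplitter");

    // Hypothetical sketch of the suggested messaging: a missing region
    // directory means the edits will NOT be applied, so warn loudly and
    // say how many edits are being discarded rather than logging at INFO.
    static void warnMissingRegionDir(String regionDir, long discardedEdits) {
        LOG.log(Level.WARNING,
            "Region directory {0} does not exist; discarding {1} recovered "
            + "edit(s). Verify hbase.wal.dir and hbase.rootdir if this is "
            + "unexpected.",
            new Object[] {regionDir, discardedEdits});
    }
}
```

The point of including the count is exactly the one made in the comment: an operator scanning logs after a failover can spot "discarding 5000 recovered edits" far more easily than a bare INFO line saying the directory was probably already split.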
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511807#comment-16511807 ] Zach York commented on HBASE-20723: --- [~yuzhih...@gmail.com] I think I'm starting to see your point. Let me do a few tests tomorrow to confirm. > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.1.2 >Reporter: Rohan Pednekar >Priority: Major > Attachments: 20723.v1.txt, logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored actual WAL due to WALSplitter is looking for the > region directory in the hbase.wal.dir we specified rather than the > hbase.rootdir. 
> Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437 and the WAL dir and hbase.rootdir are different paths, > or even on different filesystems, then having #5 use walDir as the tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511758#comment-16511758 ] Zach York commented on HBASE-20723: --- [~taklwu] Yes hbase.wal.dir is set in your experiments. > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.1.2 >Reporter: Rohan Pednekar >Priority: Major > Attachments: logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored the actual WAL because WALSplitter looks for the > region directory under the hbase.wal.dir we specified rather than under > hbase.rootdir. > Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. 
> So if we use HBASE-17437 and the WAL dir and hbase.rootdir are different paths, > or even on different filesystems, then having #5 use walDir as the tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511647#comment-16511647 ] Zach York commented on HBASE-20723: --- [~busbey] That's what I expected, but Stephen's experiment seems to prove otherwise. In both reproduction steps, the HDFS datanodes aren't actually going away (just the RS), so I think we can rule out the replication factor. > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.1.2 >Reporter: Rohan Pednekar >Priority: Major > Attachments: logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code:java} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored the actual WAL because WALSplitter looks for the > region directory under the hbase.wal.dir we specified rather than under > hbase.rootdir. 
> Looking at the source code, > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437 and the WAL dir and hbase.rootdir are different paths, > or even on different filesystems, then having #5 use walDir as the tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20723) WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510654#comment-16510654 ] Zach York commented on HBASE-20723: --- [~yuzhih...@gmail.com] you're welcome to try that change, but as you can see from the log, it is already looking in the walDir. (rootdir == walDir here). [~rpednekar] The WALSplitter is tasked with splitting logs (WALs). Why wouldn't it be looking in the hbase.wal.dir? From my understanding, the recovered edits should be in: hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648/recovered.edits However, that directory doesn't exist... One thing a colleague of mine figured out recently is that edits aren't actually persisted to the WAL until they reach a certain size or a time limit elapses, which triggers the hsync() or hflush(). Since the VM didn't exit correctly, I'm assuming this is what happened. Can you try loading more data in (still under the flush size/interval), but enough to cause an hsync to the WAL file, and see if you have the same issue? [~stack] You mentioned you also ran into this issue... Can you provide any more info on your reproduction? As [~apurtell] mentioned on the original JIRA, we tested this thoroughly when making the original change and have had many customers run with this setting without issue. It's possible that the patch was backported incorrectly to the Azure version, but it seems like this might be expected behavior when the number of writes is below the threshold required to sync/flush to the WAL file stream. > WALSplitter uses the rootDir, which is walDir, as the tableDir root path. > - > > Key: HBASE-20723 > URL: https://issues.apache.org/jira/browse/HBASE-20723 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 1.1.2 >Reporter: Rohan Pednekar >Priority: Major > Attachments: logs.zip > > > This is an Azure HDInsight HBase cluster with HDP 2.6. 
and HBase > 1.1.2.2.6.3.2-14 > By default the underlying data is going to wasb://x@y/hbase > I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at > /mnt. > hbase.wal.dir= hdfs://mycluster/walontest > hbase.wal.dir.perms=700 > hbase.rootdir.perms=700 > hbase.rootdir= > wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase > Procedure to reproduce this issue: > 1. create a table in hbase shell > 2. insert a row in hbase shell > 3. reboot the VM which hosts that region > 4. scan the table in hbase shell and it is empty > Looking at the region server logs: > {code} > 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] > wal.WALSplitter: This region's directory doesn't exist: > hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. > It is very likely that it was already split so it's safe to discard those > edits. > {code} > The log split/replay ignored the actual WAL because WALSplitter looks for the > region directory under the hbase.wal.dir we specified rather than under > hbase.rootdir. > Looking at the source code, > > [https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.20-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java#L519] > it uses the rootDir, which is walDir, as the tableDir root path. > So if we use HBASE-17437 and the WAL dir and hbase.rootdir are different paths, > or even on different filesystems, then having #5 use walDir as the tableDir is > apparently wrong. > CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
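Zach's point above about edits staying in memory until an hsync()/hflush() can be illustrated with a plain buffered stream. This is a deliberate simplification of the real WAL writer (the file name and the "edit" payload are arbitrary), but it shows the same failure mode: a small write that has not been flushed is invisible on disk, so an unclean shutdown loses it.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Simplified model of WAL buffering: small writes sit in an in-memory
// buffer and reach the file only on flush (the analogue of hflush/hsync).
public class BufferedWalSketch {
    // Returns {file length before flush, file length after flush}.
    static long[] lengths() throws IOException {
        File wal = File.createTempFile("wal-sketch", ".log");
        wal.deleteOnExit();
        try (BufferedOutputStream out =
                 new BufferedOutputStream(new FileOutputStream(wal))) {
            out.write("put tb1,row1".getBytes()); // one small "edit"
            long before = wal.length(); // still 0: the edit is only in memory,
                                        // and is lost if the process dies here
            out.flush();                // push the buffer to the filesystem
            long after = wal.length();  // now the edit is durable in the file
            return new long[]{before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        long[] r = lengths();
        System.out.println("before flush: " + r[0] + ", after flush: " + r[1]);
    }
}
```

This is why the suggested repro tweak is to load enough data to trigger a sync before killing the VM: once flushed, the edits survive and split/replay has something to recover.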
[jira] [Commented] (HBASE-17437) Support specifying a WAL directory outside of the root directory
[ https://issues.apache.org/jira/browse/HBASE-17437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510471#comment-16510471 ] Zach York commented on HBASE-17437: --- Looking briefly at the code, it seems that rootdir should be renamed to walDir (to more accurately describe what it does), but the code looks correct to me. WalDir is being passed in as rootDir every place WALSplitter is initialized. > Support specifying a WAL directory outside of the root directory > > > Key: HBASE-17437 > URL: https://issues.apache.org/jira/browse/HBASE-17437 > Project: HBase > Issue Type: Improvement > Components: Filesystem Integration, wal >Affects Versions: 1.2.4 >Reporter: Yishan Yang >Assignee: Zach York >Priority: Major > Labels: patch > Fix For: 1.4.0, 2.0.0 > > Attachments: HBASE-17437.branch-1.001.patch, > HBASE-17437.branch-1.002.patch, HBASE-17437.branch-1.003.patch, > HBASE-17437.branch-1.004.patch, HBASE-17437.master.001.patch, > HBASE-17437.master.002.patch, HBASE-17437.master.003.patch, > HBASE-17437.master.004.patch, HBASE-17437.master.005.patch, > HBASE-17437.master.006.patch, HBASE-17437.master.007.patch, > HBASE-17437.master.008.patch, HBASE-17437.master.009.patch, > HBASE-17437.master.010.patch, HBASE-17437.master.011.patch, > HBASE-17437.master.012.patch, hbase-17437-branch-1.2.patch, > hbase-17437-master.patch > > > Currently, the WAL and the StoreFiles need to be on the same FileSystem. Some > FileSystems (such as Amazon S3) don’t support append or consistent writes. > These two properties are imperative for the WAL in order to avoid loss of > writes. However, StoreFiles don’t necessarily need the same consistency > guarantees (since writes are cached locally and if writes fail, they can > always be replayed from the WAL). > > This JIRA aims to allow users to configure a log directory (for WALs) that is > outside of the root directory or even in a different FileSystem. 
The default > value will still put the log directory under the root directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
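For operators, the split configuration this JIRA enables looks like the hbase-site.xml fragment below. The filesystem URIs are illustrative only; the property names (hbase.rootdir, hbase.wal.dir) are the ones discussed in this thread.

```xml
<!-- hbase-site.xml: keep StoreFiles on an object store while WALs go to a
     filesystem that supports append/hflush. URIs below are examples only. -->
<property>
  <name>hbase.rootdir</name>
  <value>s3://my-bucket/hbase</value>
</property>
<property>
  <name>hbase.wal.dir</name>
  <value>hdfs://mycluster/hbase-wal</value>
</property>
```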
[jira] [Commented] (HBASE-17437) Support specifying a WAL directory outside of the root directory
[ https://issues.apache.org/jira/browse/HBASE-17437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510454#comment-16510454 ] Zach York commented on HBASE-17437: --- Yeah please open a new issue. Feel free to assign it to me and I can take a look. It's hard to judge what is happening just with the single log line. Please also specify which branch you were running with. > Support specifying a WAL directory outside of the root directory > > > Key: HBASE-17437 > URL: https://issues.apache.org/jira/browse/HBASE-17437 > Project: HBase > Issue Type: Improvement > Components: Filesystem Integration, wal >Affects Versions: 1.2.4 >Reporter: Yishan Yang >Assignee: Zach York >Priority: Major > Labels: patch > Fix For: 1.4.0, 2.0.0 > > Attachments: HBASE-17437.branch-1.001.patch, > HBASE-17437.branch-1.002.patch, HBASE-17437.branch-1.003.patch, > HBASE-17437.branch-1.004.patch, HBASE-17437.master.001.patch, > HBASE-17437.master.002.patch, HBASE-17437.master.003.patch, > HBASE-17437.master.004.patch, HBASE-17437.master.005.patch, > HBASE-17437.master.006.patch, HBASE-17437.master.007.patch, > HBASE-17437.master.008.patch, HBASE-17437.master.009.patch, > HBASE-17437.master.010.patch, HBASE-17437.master.011.patch, > HBASE-17437.master.012.patch, hbase-17437-branch-1.2.patch, > hbase-17437-master.patch > > > Currently, the WAL and the StoreFiles need to be on the same FileSystem. Some > FileSystems (such as Amazon S3) don’t support append or consistent writes. > These two properties are imperative for the WAL in order to avoid loss of > writes. However, StoreFiles don’t necessarily need the same consistency > guarantees (since writes are cached locally and if writes fail, they can > always be replayed from the WAL). > > This JIRA aims to allow users to configure a log directory (for WALs) that is > outside of the root directory or even in a different FileSystem. The default > value will still put the log directory under the root directory. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20556) Backport HBASE-16490 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510084#comment-16510084 ] Zach York commented on HBASE-20556: --- Pushed to branch-1 and branch-1.4. I reran the modified tests to ensure the patch was working as expected. > Backport HBASE-16490 to branch-1 > > > Key: HBASE-20556 > URL: https://issues.apache.org/jira/browse/HBASE-20556 > Project: HBase > Issue Type: Sub-task > Components: HFile, snapshots >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > Fix For: 1.5.0, 1.4.6 > > Attachments: HBASE-20556.branch-1.001.patch, > HBASE-20556.branch-1.002.patch, HBASE-20556.branch-1.003.patch, > HBASE-20556.branch-1.004.patch > > > As part of HBASE-20555, HBASE-16490 is the first patch that is needed for > backporting HBASE-18083 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20556) Backport HBASE-16490 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20556: -- Resolution: Fixed Fix Version/s: 1.4.6 1.5.0 Status: Resolved (was: Patch Available) > Backport HBASE-16490 to branch-1 > > > Key: HBASE-20556 > URL: https://issues.apache.org/jira/browse/HBASE-20556 > Project: HBase > Issue Type: Sub-task > Components: HFile, snapshots >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > Fix For: 1.5.0, 1.4.6 > > Attachments: HBASE-20556.branch-1.001.patch, > HBASE-20556.branch-1.002.patch, HBASE-20556.branch-1.003.patch, > HBASE-20556.branch-1.004.patch > > > As part of HBASE-20555, HBASE-16490 is the first patch that is needed for > backporting HBASE-18083 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20555) Backport HBASE-18083 and related changes in branch-1
[ https://issues.apache.org/jira/browse/HBASE-20555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497235#comment-16497235 ] Zach York commented on HBASE-20555: --- Thanks! I'll make sure to keep the default comment in mind while reviewing. > Backport HBASE-18083 and related changes in branch-1 > > > Key: HBASE-20555 > URL: https://issues.apache.org/jira/browse/HBASE-20555 > Project: HBase > Issue Type: Umbrella > Components: HFile, snapshots >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > > This will be the umbrella JIRA for backporting HBASE-18083 (`Make > large/small file clean thread number configurable in HFileCleaner`) from > HBase's branch-2 to HBase's branch-1, which will need a total of 4 sub-tasks > to backport HBASE-16490, HBASE-17215, HBASE-17854 and then HBASE-18083 > The goal is to bring to branch-1 the HFile cleaning performance improvements > introduced in branch-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20555) Backport HBASE-18083 and related changes in branch-1
[ https://issues.apache.org/jira/browse/HBASE-20555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497189#comment-16497189 ] Zach York commented on HBASE-20555: --- [~apurtell] Any concerns with any of these backports being applied to branch-1.4? I will hold off on committing to branch-1.4 until I hear from you. > Backport HBASE-18083 and related changes in branch-1 > > > Key: HBASE-20555 > URL: https://issues.apache.org/jira/browse/HBASE-20555 > Project: HBase > Issue Type: Umbrella > Components: HFile, snapshots >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > > This will be the umbrella JIRA for backporting HBASE-18083 (`Make > large/small file clean thread number configurable in HFileCleaner`) from > HBase's branch-2 to HBase's branch-1, which will need a total of 4 sub-tasks > to backport HBASE-16490, HBASE-17215, HBASE-17854 and then HBASE-18083 > The goal is to bring to branch-1 the HFile cleaning performance improvements > introduced in branch-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20665) "Already cached block XXX" message should be DEBUG
[ https://issues.apache.org/jira/browse/HBASE-20665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496949#comment-16496949 ] Zach York commented on HBASE-20665: --- I swore I fixed this in a recent commit, but I guess I must have missed it in the latest version. Thanks for calling this out. > "Already cached block XXX" message should be DEBUG > -- > > Key: HBASE-20665 > URL: https://issues.apache.org/jira/browse/HBASE-20665 > Project: HBase > Issue Type: Task > Components: BlockCache >Affects Versions: 1.2.0, 2.0.0 >Reporter: Sean Busbey >Priority: Minor > Labels: beginner > Fix For: 3.0.0, 2.1.0, 1.5.0 > > > Testing a local cluster that relies on the LruBlockCache for a scan-heavy > workload and I'm getting a bunch of log entries at WARN > {code} > 2018-05-30 12:28:47,192 WARN org.apache.hadoop.hbase.io.hfile.LruBlockCache: > Cached an already cached block: df01f5bf6a244f6bb1a626b927377fff_54780812 > cb:df01f5bf6a244f6bb1a626b927377fff_54780812. This is harmless and can happen > in rare cases (see HBASE-8547) > {code} > As the log message notes (and the code confirms) this is a harmless result of > contention for getting a block into the CHM. the message should be at DEBUG. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
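The benign race behind the "Cached an already cached block" message reduces to a put-if-absent on a concurrent map. The sketch below is illustrative, not the LruBlockCache implementation: two readers race to insert the same block, exactly one wins, and the loser's only job is to log (at DEBUG, per this JIRA) and move on.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch (not LruBlockCache): why a duplicate cache insert
// is harmless. putIfAbsent makes the second insert a no-op; nothing is
// corrupted, so the event is not worth a WARN.
public class CacheRaceSketch {
    static final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();

    static String cacheBlock(String key, byte[] block) {
        byte[] prev = cache.putIfAbsent(key, block);
        if (prev != null) {
            // Another thread inserted this block first; benign contention.
            return "already cached";
        }
        return "cached";
    }

    public static void main(String[] args) {
        byte[] block = {1, 2, 3};
        System.out.println(cacheBlock("blk_1", block)); // first insert wins
        System.out.println(cacheBlock("blk_1", block)); // duplicate, harmless
    }
}
```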
[jira] [Commented] (HBASE-20556) Backport HBASE-16490 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491358#comment-16491358 ] Zach York commented on HBASE-20556: --- +1 I will commit if nobody has any more comments. > Backport HBASE-16490 to branch-1 > > > Key: HBASE-20556 > URL: https://issues.apache.org/jira/browse/HBASE-20556 > Project: HBase > Issue Type: Sub-task > Components: HFile, snapshots >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > Attachments: HBASE-20556.branch-1.001.patch, > HBASE-20556.branch-1.002.patch, HBASE-20556.branch-1.003.patch, > HBASE-20556.branch-1.004.patch > > > As part of HBASE-20555, HBASE-16490 is the first patch that is needed for > backporting HBASE-18083 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20608) Remove build option of error prone profile for branch-1 after HBASE-12350
[ https://issues.apache.org/jira/browse/HBASE-20608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484722#comment-16484722 ] Zach York commented on HBASE-20608: --- LGTM Andrew, and thanks for pointing out where these configs are kept! Noted on your comments about automated testing. In the future I will run the tests myself. > Remove build option of error prone profile for branch-1 after HBASE-12350 > - > > Key: HBASE-20608 > URL: https://issues.apache.org/jira/browse/HBASE-20608 > Project: HBase > Issue Type: Task > Components: build >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Mike Drob >Priority: Major > > After HBASE-12350, error prone profile was introduced/backported to branch-1 > and branch-2. However, branch-1 is still building with JDK 7 and is > incompatible with this error prone profile such that `mvn test-compile` > failed since then. > Open this issue to track the removal of `-PerrorProne` in the build command > (in Jenkins) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20608) Remove build option of error prone profile for branch-1 after HBASE-12350
[ https://issues.apache.org/jira/browse/HBASE-20608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484678#comment-16484678 ] Zach York commented on HBASE-20608: --- Can we remove/revert this change in branch-1 until we come up with the long term solution? This is blocking precommit in branch-1 and I'm uncomfortable committing with precommit in this state. Also please let me know if I can be of any help in fixing this. Unfortunately, I don't have much knowledge of HBase's Jenkins/Yetus setup, but could potentially learn with some help. > Remove build option of error prone profile for branch-1 after HBASE-12350 > - > > Key: HBASE-20608 > URL: https://issues.apache.org/jira/browse/HBASE-20608 > Project: HBase > Issue Type: Task > Components: build >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Mike Drob >Priority: Major > > After HBASE-12350, error prone profile was introduced/backported to branch-1 > and branch-2. However, branch-1 is still building with JDK 7 and is > incompatible with this error prone profile such that `mvn test-compile` > failed since then. > Open this issue to track the removal of `-PerrorProne` in the build command > (in Jenkins) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20556) Backport HBASE-16490 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484340#comment-16484340 ] Zach York commented on HBASE-20556: --- [~mdrob] and [~busbey] do we want *HBASE-20608* to go in before this so we can ensure a clean run or are we okay with ignoring the failures? It appears that this has been failing for some time. Also, I'm not sure if either of you have context on the unit test 'failures', but it looks like the tests hit a hard limit on memory (since I doubt we are calling system.exit() in our test code). Is there something we can do to fix that (also likely not related to this change)? Is memory something we control at the test level or overall test execution level? > Backport HBASE-16490 to branch-1 > > > Key: HBASE-20556 > URL: https://issues.apache.org/jira/browse/HBASE-20556 > Project: HBase > Issue Type: Sub-task > Components: HFile, snapshots >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > Attachments: HBASE-20556.branch-1.001.patch, > HBASE-20556.branch-1.002.patch, HBASE-20556.branch-1.003.patch > > > As part of HBASE-20555, HBASE-16490 is the first patch that is needed for > backporting HBASE-18083 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20556) Backport HBASE-16490 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477728#comment-16477728 ] Zach York commented on HBASE-20556: --- [~taklwu] can you retrigger the unit tests (reattach the patch)? It looks like there was a surefire error that caused the failure. > Backport HBASE-16490 to branch-1 > > > Key: HBASE-20556 > URL: https://issues.apache.org/jira/browse/HBASE-20556 > Project: HBase > Issue Type: Sub-task > Components: HFile, snapshots >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > Attachments: HBASE-20556.branch-1.002.patch > > > As part of HBASE-20555, HBASE-16490 is the first patch that is needed for > backporting HBASE-18083 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20556) Backport HBASE-16490 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20556: -- Status: Patch Available (was: Open) > Backport HBASE-16490 to branch-1 > > > Key: HBASE-20556 > URL: https://issues.apache.org/jira/browse/HBASE-20556 > Project: HBase > Issue Type: Sub-task > Components: HFile, snapshots >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Major > Attachments: HBASE-20556.branch-1.002.patch > > > As part of HBASE-20555, HBASE-16490 is the first patch that is needed for > backporting HBASE-18083 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20447) Only fail cacheBlock if block collisions aren't related to next block metadata
[ https://issues.apache.org/jira/browse/HBASE-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20447: -- Attachment: HBASE-20447.master.004.patch > Only fail cacheBlock if block collisions aren't related to next block metadata > -- > > Key: HBASE-20447 > URL: https://issues.apache.org/jira/browse/HBASE-20447 > Project: HBase > Issue Type: Bug > Components: BlockCache, BucketCache >Affects Versions: 1.4.3, 2.0.0 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20447.branch-1.001.patch, > HBASE-20447.branch-1.002.patch, HBASE-20447.branch-1.003.patch, > HBASE-20447.branch-1.004.patch, HBASE-20447.branch-1.005.patch, > HBASE-20447.branch-1.006.patch, HBASE-20447.master.001.patch, > HBASE-20447.master.002.patch, HBASE-20447.master.003.patch, > HBASE-20447.master.004.patch > > > This is the issue I was originally having here: > [http://mail-archives.apache.org/mod_mbox/hbase-dev/201802.mbox/%3CCAN+qs_Pav=md_aoj4xji+kcnetubg2xou2ntxv1g6m8-5vn...@mail.gmail.com%3E] > > When we pread, we don't force the read to read all of the next block header. > However, when we get into a race condition where two opener threads try to > cache the same block and one thread read all of the next block header and the > other one didn't, it will fail the open process. This is especially important > in a splitting case where it will potentially fail the split process. > Instead, in the caches, we should only fail if the required blocks are > different. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
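A toy version of the comparison this fix calls for (the helper below is hypothetical, not the actual cache code): two reads of the same block may differ only in whether the next block's header happened to come along with the pread, so the cache should treat them as the same block instead of failing the open.

```java
import java.util.Arrays;

// Illustrative sketch of HBASE-20447's idea: cached copies of one block
// that differ only by trailing next-block metadata should compare equal.
public class BlockCompareSketch {
    // Hypothetical helper: blocks match if they agree over their common
    // prefix, i.e. any difference is confined to the optional tail.
    static boolean sameBlockIgnoringNextHeader(byte[] a, byte[] b) {
        int min = Math.min(a.length, b.length);
        return Arrays.equals(Arrays.copyOf(a, min), Arrays.copyOf(b, min));
    }

    public static void main(String[] args) {
        byte[] withHeader    = {10, 20, 30, 99, 99}; // block + next header
        byte[] withoutHeader = {10, 20, 30};         // block only
        byte[] different     = {11, 20, 30};         // genuinely different

        System.out.println(sameBlockIgnoringNextHeader(withHeader, withoutHeader));
        System.out.println(sameBlockIgnoringNextHeader(withHeader, different));
    }
}
```

The real fix has to distinguish these two cases precisely so that a genuine content mismatch still fails the cacheBlock call; only the metadata-tail case is forgiven.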
[jira] [Updated] (HBASE-20447) Only fail cacheBlock if block collisions aren't related to next block metadata
[ https://issues.apache.org/jira/browse/HBASE-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20447: -- Attachment: HBASE-20447.master.003.patch > Only fail cacheBlock if block collisions aren't related to next block metadata > -- > > Key: HBASE-20447 > URL: https://issues.apache.org/jira/browse/HBASE-20447 > Project: HBase > Issue Type: Bug > Components: BlockCache, BucketCache >Affects Versions: 1.4.3, 2.0.0 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20447.branch-1.001.patch, > HBASE-20447.branch-1.002.patch, HBASE-20447.branch-1.003.patch, > HBASE-20447.branch-1.004.patch, HBASE-20447.branch-1.005.patch, > HBASE-20447.branch-1.006.patch, HBASE-20447.master.001.patch, > HBASE-20447.master.002.patch, HBASE-20447.master.003.patch > > > This is the issue I was originally having here: > [http://mail-archives.apache.org/mod_mbox/hbase-dev/201802.mbox/%3CCAN+qs_Pav=md_aoj4xji+kcnetubg2xou2ntxv1g6m8-5vn...@mail.gmail.com%3E] > > When we pread, we don't force the read to read all of the next block header. > However, when we get into a race condition where two opener threads try to > cache the same block and one thread read all of the next block header and the > other one didn't, it will fail the open process. This is especially important > in a splitting case where it will potentially fail the split process. > Instead, in the caches, we should only fail if the required blocks are > different. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20447) Only fail cacheBlock if block collisions aren't related to next block metadata
[ https://issues.apache.org/jira/browse/HBASE-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20447: -- Attachment: HBASE-20447.branch-1.006.patch > Only fail cacheBlock if block collisions aren't related to next block metadata > -- > > Key: HBASE-20447 > URL: https://issues.apache.org/jira/browse/HBASE-20447 > Project: HBase > Issue Type: Bug > Components: BlockCache, BucketCache >Affects Versions: 1.4.3, 2.0.0 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20447.branch-1.001.patch, > HBASE-20447.branch-1.002.patch, HBASE-20447.branch-1.003.patch, > HBASE-20447.branch-1.004.patch, HBASE-20447.branch-1.005.patch, > HBASE-20447.branch-1.006.patch, HBASE-20447.master.001.patch, > HBASE-20447.master.002.patch > > > This is the issue I was originally having here: > [http://mail-archives.apache.org/mod_mbox/hbase-dev/201802.mbox/%3CCAN+qs_Pav=md_aoj4xji+kcnetubg2xou2ntxv1g6m8-5vn...@mail.gmail.com%3E] > > When we pread, we don't force the read to read all of the next block header. > However, when we get into a race condition where two opener threads try to > cache the same block and one thread read all of the next block header and the > other one didn't, it will fail the open process. This is especially important > in a splitting case where it will potentially fail the split process. > Instead, in the caches, we should only fail if the required blocks are > different. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20204) Add locking to RefreshFileConnections in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20204: -- Fix Version/s: 1.4.5 1.5.0 2.1.0 3.0.0 > Add locking to RefreshFileConnections in BucketCache > > > Key: HBASE-20204 > URL: https://issues.apache.org/jira/browse/HBASE-20204 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 1.4.3, 2.0.0 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.5 > > Attachments: HBASE-20204.master.001.patch, > HBASE-20204.master.002.patch, HBASE-20204.master.003.patch, > HBASE-20204.master.004.patch > > > This is a follow-up to HBASE-20141 where [~anoop.hbase] suggested adding > locking for refreshing channels. > I have also seen this become an issue when a RS has to abort and it locks on > trying to flush out the remaining data to the cache (since cache on write was > turned on). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20204) Add locking to RefreshFileConnections in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-20204: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Add locking to RefreshFileConnections in BucketCache > > > Key: HBASE-20204 > URL: https://issues.apache.org/jira/browse/HBASE-20204 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 1.4.3, 2.0.0 >Reporter: Zach York >Assignee: Zach York >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.5 > > Attachments: HBASE-20204.master.001.patch, > HBASE-20204.master.002.patch, HBASE-20204.master.003.patch, > HBASE-20204.master.004.patch > > > This is a follow-up to HBASE-20141 where [~anoop.hbase] suggested adding > locking for refreshing channels. > I have also seen this become an issue when a RS has to abort and it locks on > trying to flush out the remaining data to the cache (since cache on write was > turned on). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20204) Add locking to RefreshFileConnections in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469542#comment-16469542 ]

Zach York commented on HBASE-20204:
-----------------------------------

Pushed to master, branch-2, branch-1, and branch-1.4. I didn't push to branch-2.0 because of [~stack]'s email saying to refrain from pushing to 2.0 (though this is a fairly small bug fix).
[jira] [Updated] (HBASE-20204) Add locking to RefreshFileConnections in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20204:
------------------------------
    Attachment: HBASE-20204.master.004.patch
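The channel-refresh locking suggested in the issue description can be sketched roughly as follows. This is a hedged illustration, not the actual HBASE-20204 patch: the names (`FileChannelPool`, `refreshLocks`, `getChannel`) are hypothetical, and a plain `Object` stands in for a real `FileChannel`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of per-file locking around channel refresh. Only one thread
// performs the (expensive) reopen; concurrent readers reuse its result.
class FileChannelPool {
  private final Object[] channels;            // stand-ins for open FileChannels
  private final ReentrantLock[] refreshLocks; // one lock per backing file
  final AtomicInteger refreshCount = new AtomicInteger();

  FileChannelPool(int files) {
    channels = new Object[files];
    refreshLocks = new ReentrantLock[files];
    for (int i = 0; i < files; i++) {
      refreshLocks[i] = new ReentrantLock();
    }
  }

  // Fast path returns the cached channel; the slow path refreshes it under
  // the file's lock so concurrent readers don't all reopen the same file.
  Object getChannel(int file) {
    Object ch = channels[file];
    if (ch != null) {
      return ch;
    }
    refreshLocks[file].lock();
    try {
      if (channels[file] == null) {     // re-check: another thread may have won
        channels[file] = new Object();  // "reopen" the channel
        refreshCount.incrementAndGet();
      }
      return channels[file];
    } finally {
      refreshLocks[file].unlock();
    }
  }
}
```

Without the lock, every reader that observed a closed channel would trigger its own reopen; with it, exactly one thread refreshes and the rest reuse the result, which is the double-checked pattern the suggestion amounts to.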
[jira] [Updated] (HBASE-20447) Only fail cacheBlock if block collisions aren't related to next block metadata
[ https://issues.apache.org/jira/browse/HBASE-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20447:
------------------------------
    Attachment: HBASE-20447.branch-1.005.patch

> Only fail cacheBlock if block collisions aren't related to next block metadata
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-20447
>                 URL: https://issues.apache.org/jira/browse/HBASE-20447
>             Project: HBase
>          Issue Type: Bug
>          Components: BlockCache, BucketCache
>    Affects Versions: 1.4.3, 2.0.0
>            Reporter: Zach York
>            Assignee: Zach York
>            Priority: Major
>             Fix For: 3.0.0, 2.1.0, 1.5.0, 2.0.1, 1.4.5
>
>         Attachments: HBASE-20447.branch-1.001.patch, HBASE-20447.branch-1.002.patch, HBASE-20447.branch-1.003.patch, HBASE-20447.branch-1.004.patch, HBASE-20447.branch-1.005.patch, HBASE-20447.master.001.patch, HBASE-20447.master.002.patch
>
> This is the issue I was originally having here:
> [http://mail-archives.apache.org/mod_mbox/hbase-dev/201802.mbox/%3CCAN+qs_Pav=md_aoj4xji+kcnetubg2xou2ntxv1g6m8-5vn...@mail.gmail.com%3E]
>
> When we pread, we don't force the read to read all of the next block header. However, when we get into a race condition where two opener threads try to cache the same block and one thread read all of the next block header and the other one didn't, it will fail the open process. This is especially important in a splitting case, where it will potentially fail the split process.
> Instead, in the caches, we should only fail if the required blocks are different.
[jira] [Commented] (HBASE-20447) Only fail cacheBlock if block collisions aren't related to next block metadata
[ https://issues.apache.org/jira/browse/HBASE-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453171#comment-16453171 ]

Zach York commented on HBASE-20447:
-----------------------------------

I have attached a new patch that fixes the master patch. I figured out that it was an incorrect forward port w.r.t. returnBlock; nothing is broken with shared memory, [~anoop.hbase] (sorry for pulling you in on that).
[jira] [Updated] (HBASE-20447) Only fail cacheBlock if block collisions aren't related to next block metadata
[ https://issues.apache.org/jira/browse/HBASE-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20447:
------------------------------
    Attachment: HBASE-20447.master.002.patch
[jira] [Commented] (HBASE-20447) Only fail cacheBlock if block collisions aren't related to next block metadata
[ https://issues.apache.org/jira/browse/HBASE-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451373#comment-16451373 ]

Zach York commented on HBASE-20447:
-----------------------------------

TestBucketCache passes locally. Reattaching to retry.
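The fix described in the issue — tolerating a cache-key collision when the two blocks differ only in whether one of them also read the next block's header — can be made concrete with a small sketch. The class, method, and byte-array representation here are hypothetical (the real change lives in HBase's block and cache classes); this only illustrates the comparison idea.

```java
import java.util.Arrays;

// Hypothetical sketch of the HBASE-20447 comparison: treat two serialized
// blocks as equivalent when the only difference is that the longer one
// carries an extra trailing next-block header.
class BlockCompare {
  // True when the byte ranges are identical except that the longer one has
  // an extra next-block header of headerLen bytes at the end.
  static boolean differOnlyInNextBlockMetadata(byte[] a, byte[] b, int headerLen) {
    byte[] shorter = a.length <= b.length ? a : b;
    byte[] longer = a.length <= b.length ? b : a;
    int extra = longer.length - shorter.length;
    if (extra != 0 && extra != headerLen) {
      return false; // lengths differ by something other than the next-block header
    }
    // the shared prefix (the block itself) must match byte for byte
    return Arrays.equals(shorter, Arrays.copyOf(longer, shorter.length));
  }

  // Cache policy sketch: only treat a collision as fatal when the blocks
  // genuinely differ, instead of failing the open (and thus the split).
  static boolean shouldFailCaching(byte[] cached, byte[] incoming, int headerLen) {
    return !differOnlyInNextBlockMetadata(cached, incoming, headerLen);
  }
}
```

Under this policy, the race between two opener threads (one of which read the next block's header and one of which didn't) becomes a benign collision rather than a failure.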