[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3845:
--
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

data loss because lastSeqWritten can miss memstore edits

Key: HBASE-3845
URL: https://issues.apache.org/jira/browse/HBASE-3845
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.3
Reporter: Prakash Khemani
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.90.5
Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

(I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.)

In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably.

After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore.

HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore.

Every time the memstore is flushed we remove the region's entry in lastSequenceWritten and wait for the next append to populate this entry again. This is where the problem happens.

step 1: flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock().
step 2: as soon as the updatesLock.writeLock() is released new entries will be added into the memstore.
step 3: wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten.
step 4: the next append will create a new entry for the region in lastSeqWritten(). But this will be the log seq id of the current append. All the edits that were added in step 2 are missing.

==

as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
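The four steps described above can be modelled with a small, self-contained Java sketch. This is not HBase code: the class LastSeqWrittenSketch, the region name, and the single-threaded replay of the interleaving are all assumptions made for illustration. In particular, the sketch assumes that the flush event takes its log sequence id at snapshot time (step 1), which the description does not state.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative model only; this is not the HLog implementation.
class LastSeqWrittenSketch {
    static final String REGION = "region-a"; // hypothetical region name

    // region -> earliest log sequence id believed to be unflushed
    final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    long nextSeqId = 1;

    // Models HLog.append(): only the first append after a reset records its id.
    long append() {
        long seqId = nextSeqId++;
        lastSeqWritten.putIfAbsent(REGION, seqId);
        return seqId;
    }

    // Replays steps 1-4 and returns {earliest unflushed id, id recorded in the map}.
    static long[] run(boolean replaceInsteadOfRemove) {
        LastSeqWrittenSketch wal = new LastSeqWrittenSketch();
        wal.append();                          // seq 1: this edit ends up in the snapshot
        long flushSeqId = wal.nextSeqId++;     // step 1: snapshot; assumed id of the flush event (2)
        long earliestUnflushed = wal.append(); // step 2: seq 3, added after the lock is released
        wal.append();                          // step 2: seq 4
        if (replaceInsteadOfRemove) {
            wal.lastSeqWritten.put(REGION, flushSeqId); // the proposed temporary measure
        } else {
            wal.lastSeqWritten.remove(REGION);          // step 3 as described
        }
        wal.append();                          // step 4: seq 5
        return new long[] { earliestUnflushed, wal.lastSeqWritten.get(REGION) };
    }

    public static void main(String[] args) {
        long[] removed = run(false);
        long[] replaced = run(true);
        System.out.println("remove:  earliest unflushed=" + removed[0] + ", recorded=" + removed[1]);
        System.out.println("replace: earliest unflushed=" + replaced[0] + ", recorded=" + replaced[1]);
    }
}
```

With removal, the recorded id is later than the earliest unflushed edit, so the step 2 edits are not covered. With replacement, the recorded id is no later than any unflushed edit, which is conservative but safe under the assumption stated above.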
[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073159#comment-13073159 ]

Ted Yu commented on HBASE-3845:
---
Applied to TRUNK. TestResettingCounters passes now.
Thanks for the patch Anirudh.
[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073168#comment-13073168 ]

Hudson commented on HBASE-3845:
---
Integrated in HBase-TRUNK #2064 (See [https://builds.apache.org/job/HBase-TRUNK/2064/])
HBASE-3845 Addendum: relax lastSeqWritten check in case write to WAL is skipped

tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
[jira] [Updated] (HBASE-4003) Cleanup Calls Conservatively On Timeout
[ https://issues.apache.org/jira/browse/HBASE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthick Sankarachary updated HBASE-4003:
-
Attachment: (was: HBASE-4003-V2.patch)

Cleanup Calls Conservatively On Timeout
---
Key: HBASE-4003
URL: https://issues.apache.org/jira/browse/HBASE-4003
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.90.3
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Fix For: 0.92.0
Attachments: HBASE-4003.patch

In the event of a socket timeout, the {{HBaseClient}} iterates over the outstanding calls (on that socket), and notifies them that a {{SocketTimeoutException}} has occurred. Ideally, we should be cleaning up just those calls that have been outstanding for longer than the specified socket timeout.
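The conservative cleanup described above could look roughly like the following sketch. It is not taken from the attached patch: the Call class, its startTimeMs field, and the cleanupTimedOut method are hypothetical names, and the real {{HBaseClient}} would also have to notify each expired call of the {{SocketTimeoutException}}.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only; Call and its fields are hypothetical, not HBaseClient's.
class ConservativeCleanupSketch {
    static final class Call {
        final int id;
        final long startTimeMs; // when the call was sent on the socket
        Call(int id, long startTimeMs) {
            this.id = id;
            this.startTimeMs = startTimeMs;
        }
    }

    // Calls still waiting for a response on one connection.
    final Map<Integer, Call> outstanding = new ConcurrentHashMap<>();

    void add(Call call) {
        outstanding.put(call.id, call);
    }

    // Removes only the calls that have waited longer than timeoutMs and returns their ids.
    // Calls that were sent more recently stay registered.
    List<Integer> cleanupTimedOut(long nowMs, long timeoutMs) {
        List<Integer> expired = new ArrayList<>();
        Iterator<Call> it = outstanding.values().iterator();
        while (it.hasNext()) {
            Call call = it.next();
            if (nowMs - call.startTimeMs > timeoutMs) {
                expired.add(call.id);
                it.remove();
            }
        }
        return expired;
    }
}
```

Comparing each call's own age against the timeout, rather than failing every call on the socket, is the design choice the issue asks for.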
[jira] [Updated] (HBASE-4148) HFileOutputFormat doesn't fill in TIMERANGE_KEY metadata
[ https://issues.apache.org/jira/browse/HBASE-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-4148:
--
Attachment: 0001-HBASE-4148.-HFileOutputFormat-doesn-t-fill-in-TIMERA.patch

HFileOutputFormat doesn't fill in TIMERANGE_KEY metadata

Key: HBASE-4148
URL: https://issues.apache.org/jira/browse/HBASE-4148
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.90.3
Reporter: Todd Lipcon
Assignee: Jonathan Hsieh
Fix For: 0.90.5
Attachments: 0001-HBASE-4148.-HFileOutputFormat-doesn-t-fill-in-TIMERA.patch

When HFiles are flushed through the normal path, they include an attribute TIMERANGE_KEY which can be used to cull HFiles when performing a time-restricted scan. Files produced by HFileOutputFormat are currently missing this metadata.
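To illustrate why the time-range metadata matters, here is a minimal sketch of tracking the minimum and maximum timestamp while a file is written and using them to skip the file during a time-restricted scan. The class and method names are hypothetical and do not come from the patch, and the half-open scan interval [scanMin, scanMax) is an assumption of this sketch.

```java
// Illustrative model only; TimeRangeSketch is a hypothetical name, not an HBase class.
class TimeRangeSketch {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    // Called once for every cell timestamp written to the file.
    void include(long timestamp) {
        min = Math.min(min, timestamp);
        max = Math.max(max, timestamp);
    }

    // True if the file may hold cells with scanMin <= timestamp < scanMax.
    // A false result means the scan can skip the file without opening it.
    boolean overlaps(long scanMin, long scanMax) {
        return min < scanMax && max >= scanMin;
    }
}
```

A file written without this information gives the scanner nothing to test against, so presumably it has to be read for every time-restricted scan.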
[jira] [Updated] (HBASE-4148) HFileOutputFormat doesn't fill in TIMERANGE_KEY metadata
[ https://issues.apache.org/jira/browse/HBASE-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-4148:
--
Status: Patch Available (was: Open)

Up for review here: https://reviews.apache.org/r/1229/
[jira] [Commented] (HBASE-4148) HFileOutputFormat doesn't fill in TIMERANGE_KEY metadata
[ https://issues.apache.org/jira/browse/HBASE-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073230#comment-13073230 ]

jirapos...@reviews.apache.org commented on HBASE-4148:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1229/
---

Review request for hbase and Todd Lipcon.

Summary
---
When HFiles are flushed through the normal path, they include an attribute TIMERANGE_KEY which can be used to cull HFiles when performing a time-restricted scan. Files produced by HFileOutputFormat are currently missing this metadata.

This addresses bug HBASE-4148.
https://issues.apache.org/jira/browse/HBASE-4148

Diffs
-
src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java 8ccdf4d
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 40efdda
src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java 89241eb

Diff: https://reviews.apache.org/r/1229/diff

Testing
---
Added unit test. I don't quite understand why the KeyValue with the larger timestamp value (2000) must be written before the one with the smaller timestamp (1000). I can see the code that enforces this (HFile.checkKey) but not why keys are ordered from larger to smaller timestamp. Is this an HFile data precondition?

I cannot get the full test suite to pass, with or without this patch. The suite seems to time out on tests unrelated to this. I would appreciate some hints or pointers on which tests are flaky or take a long time to run.

Thanks,
jmhsieh
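On the ordering question raised in the Testing notes: HBase's KeyValue sort order compares timestamps in descending order within the same row and column, so the newest version of a cell sorts first. The sketch below models that rule with hypothetical names (KeyOrderSketch, Cell, inWriteOrder). It is a simplification that ignores column family and key type, and it is not the actual HFile.checkKey logic.

```java
// Illustrative model only; simplified cells, not HBase's KeyValue class.
class KeyOrderSketch {
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        Cell(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    // Row and column ascending, then timestamp descending (newest first).
    static int compare(Cell a, Cell b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    }

    // Hypothetical stand-in for a writer-side check: keys must not go backwards.
    static boolean inWriteOrder(Cell previous, Cell next) {
        return compare(previous, next) <= 0;
    }
}
```

Under this ordering, a cell with timestamp 2000 has to be appended before a cell with timestamp 1000 for the same row and column, which matches the behaviour observed in the unit test.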
[jira] [Commented] (HBASE-451) Remove HTableDescriptor from HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073279#comment-13073279 ]

Hudson commented on HBASE-451:
--
Integrated in HBase-TRUNK #2065 (See [https://builds.apache.org/job/HBase-TRUNK/2065/])
HBASE-4032 HBASE-451 improperly breaks public API HRegionInfo#getTableDesc

tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java

Remove HTableDescriptor from HRegionInfo

Key: HBASE-451
URL: https://issues.apache.org/jira/browse/HBASE-451
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.2.0
Reporter: Jim Kellerman
Assignee: Subbu M Iyer
Priority: Critical
Fix For: 0.92.0
Attachments: 451-addendum-v2.txt, 451_support_for_removing_HTD_from_HRI_trunk.txt, HBASE-451-Fixed_broken_TestAdmin.patch, HBASE-451-Fixed_broken_TestAdmin1.patch, HBASE-451_-_First_draft_support_for_removing_HTD_from_HRI1.patch, HBASE-451_-_Fourth_draft_support_for_removing_HTD_from_HRI.patch, HBASE-451_-_Second_draft_-_Remove_HTD_from_HRI.patch, descriptors.txt, fixtestadmin.txt, pass_htd_on_region_construction.txt

There is an HRegionInfo for every region in HBase. Currently HRegionInfo also contains the HTableDescriptor (the schema). That means we store the schema n times where n is the number of regions in the table.

Additionally, for every region of the same table that the region server has open, there is a copy of the schema. Thus it is stored in memory once for each open region.

If HRegionInfo merely contained the table name the HTableDescriptor could be stored in a separate file and easily found.
[jira] [Commented] (HBASE-4032) HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
[ https://issues.apache.org/jira/browse/HBASE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073280#comment-13073280 ]

Hudson commented on HBASE-4032:
---
Integrated in HBase-TRUNK #2065 (See [https://builds.apache.org/job/HBase-TRUNK/2065/])
HBASE-4032 HBASE-451 improperly breaks public API HRegionInfo#getTableDesc

tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java

HBASE-451 improperly breaks public API HRegionInfo#getTableDesc
---
Key: HBASE-4032
URL: https://issues.apache.org/jira/browse/HBASE-4032
Project: HBase
Issue Type: Bug
Reporter: Andrew Purtell
Assignee: stack
Priority: Blocker
Fix For: 0.92.0
Attachments: 4032-v2.txt, 4032-v3.txt, 4032.txt

After HBASE-451, HRegionInfo#getTableDesc has been modified to always return {{null}}. One immediate effect is broken unit tests. That aside, it is not in the spirit of deprecation to actually break the method before the deprecation cycle has ended; it's a bug.
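As a rough illustration of the point about deprecation, the sketch below keeps a deprecated accessor working by delegating to a shared lookup instead of returning null. Every name here (DeprecationSketch, TableDescriptor, RegionInfo, DESCRIPTORS) is a hypothetical stand-in. It does not reproduce the committed fix, whose approach is not described in this message.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only; these are not the HBase classes.
class DeprecationSketch {
    // Hypothetical stand-in for a table schema.
    static final class TableDescriptor {
        final String tableName;
        TableDescriptor(String tableName) {
            this.tableName = tableName;
        }
    }

    // One descriptor per table, stored once and shared by all regions of the table.
    static final Map<String, TableDescriptor> DESCRIPTORS = new ConcurrentHashMap<>();

    static final class RegionInfo {
        final String tableName; // the region keeps only the table name
        RegionInfo(String tableName) {
            this.tableName = tableName;
        }

        /** @deprecated callers should look the descriptor up by table name instead. */
        @Deprecated
        TableDescriptor getTableDesc() {
            // Still returns a usable value during the deprecation cycle.
            return DESCRIPTORS.get(tableName);
        }
    }
}
```

The schema is stored once per table, and existing callers of the deprecated method keep working until it is actually removed.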
[jira] [Commented] (HBASE-4148) HFileOutputFormat doesn't fill in TIMERANGE_KEY metadata
[ https://issues.apache.org/jira/browse/HBASE-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073328#comment-13073328 ]

jirapos...@reviews.apache.org commented on HBASE-4148:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1229/
---

(Updated 2011-07-31 05:52:30.608713)

Review request for hbase and Todd Lipcon.

Changes
---
Updated to address nit.

Attachments: 0001-HBASE-4148-HFileOutputFormat-doesn-t-fill-in-TIMERAN.patch, 0001-HBASE-4148.-HFileOutputFormat-doesn-t-fill-in-TIMERA.patch