[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122498#comment-14122498 ]

Hudson commented on HBASE-11882:

FAILURE: Integrated in HBase-TRUNK #5467 (See [https://builds.apache.org/job/HBase-TRUNK/5467/])
HBASE-11882 Row level consistency may not be maintained with bulk load and (ramkrishna: rev 8de30d32d4d5c86650effadbda72f7ef32a4f15f)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java

Row level consistency may not be maintained with bulk load and compaction

Key: HBASE-11882
URL: https://issues.apache.org/jira/browse/HBASE-11882
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.99.0, 2.0.0
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
Fix For: 0.99.0, 2.0.0
Attachments: HBASE-11882-master-v1.patch, HBASE-11882-master-v2.patch, HBASE-11882-master-v3.patch, TestHRegionServerBulkLoad.java.patch

While looking into the TestHRegionServerBulkLoad failure for HBASE-11772, I found that the root cause is that row level atomicity may not be maintained when bulk load is combined with compaction.

TestHRegionServerBulkLoad is used to test bulk load atomicity. The test uses multiple threads to bulk load and scan continuously, and it runs compactions periodically. It verifies that row level data is always consistent across column families.

After HBASE-11591, we added readpoint checks for bulk-loaded data using the seqId assigned at the time of the bulk load. Now a scanner will not see the data from a bulk load if the scanner's readpoint is earlier than the bulk load seqId. Previously, the atomic bulk load result was visible immediately to all scanners.

The problem is with compaction after bulk load. Compaction does not lock the region, and it is done one store (column family) at a time. It also compacts away the seqId marker of the bulk load. Here is an event sequence where row level consistency is broken:

1. A scanner is started to scan a region with cf1 and cf2. The readpoint is 10.
2. There is a bulk load that loads into cf1 and cf2. The bulk load seqId is 11. The bulk load is guarded by the region write lock, so it is atomic.
3. There is a compaction that compacts cf1. It compacts away the seqId marker of the bulk load.
4. The scanner calls next() and reaches row-1001. It gets the bulk load data for cf1, since there is no seqId preventing it. It does not get the bulk load data for cf2, since the scanner's readpoint (10) is less than the bulk load seqId (11).

Row level consistency is now broken.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
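The four-step event sequence above can be replayed in a toy model. This is a minimal sketch under simplifying assumptions, not HBase code: each store holds one row with a pre-load value, a bulk-loaded value, and an optional bulk load seqId marker (`null` once compaction drops it), and a scanner sees bulk-loaded data only when the marker is absent or not newer than its readpoint. All names (`BulkLoadCompactionRace`, `Store`, `compact`, `runScenario`, `isRowConsistent`) are hypothetical. The `keepMarker = true` variant only illustrates that carrying the seqId through compaction keeps the row consistent; it is not the Compactor.java change that was committed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkLoadCompactionRace {

    // Hypothetical model of one store (column family) holding a single row.
    static final class Store {
        final String oldValue;   // value written before the bulk load
        String bulkValue;        // value added by the bulk load, or null
        Long bulkSeqId;          // bulk load seqId marker; null once compacted away

        Store(String oldValue) {
            this.oldValue = oldValue;
        }

        // Value a scanner with the given readpoint sees for the row.
        String visible(long readPoint) {
            if (bulkValue == null) {
                return oldValue;
            }
            // With no marker left there is nothing to filter on, so the data shows up.
            if (bulkSeqId == null || bulkSeqId <= readPoint) {
                return bulkValue;
            }
            return oldValue;
        }
    }

    // keepMarker=false models the bug: the compacted output loses the seqId marker.
    static void compact(Store store, boolean keepMarker) {
        if (!keepMarker) {
            store.bulkSeqId = null;
        }
    }

    // A row is consistent if every column family shows the same generation of data.
    static boolean isRowConsistent(Map<String, String> row) {
        return row.values().stream().distinct().count() <= 1;
    }

    static Map<String, String> scan(Map<String, Store> region, long readPoint) {
        Map<String, String> row = new LinkedHashMap<>();
        region.forEach((cf, store) -> row.put(cf, store.visible(readPoint)));
        return row;
    }

    static Map<String, String> runScenario(boolean keepMarker) {
        Map<String, Store> region = new LinkedHashMap<>();
        region.put("cf1", new Store("v0"));
        region.put("cf2", new Store("v0"));

        long scannerReadPoint = 10;              // step 1: scanner opened at readpoint 10

        for (Store s : region.values()) {        // step 2: atomic bulk load with seqId 11
            s.bulkValue = "v1";
            s.bulkSeqId = 11L;
        }

        compact(region.get("cf1"), keepMarker);  // step 3: only cf1 is compacted

        return scan(region, scannerReadPoint);   // step 4: scanner reads the row
    }

    public static void main(String[] args) {
        Map<String, String> dropped = runScenario(false);
        Map<String, String> kept = runScenario(true);
        System.out.println("marker dropped: " + dropped + " consistent=" + isRowConsistent(dropped));
        System.out.println("marker kept: " + kept + " consistent=" + isRowConsistent(kept));
    }
}
```

Running `main` prints `{cf1=v1, cf2=v0} consistent=false` for the dropped-marker case, which is exactly step 4: cf1 exposes the bulk load to a scanner at readpoint 10 while cf2 still hides it. With the marker kept, both families hide the load and the row prints as `{cf1=v0, cf2=v0} consistent=true`.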
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121622#comment-14121622 ]

Ted Yu commented on HBASE-11882:

@Ram: Go ahead.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122359#comment-14122359 ]

ramkrishna.s.vasudevan commented on HBASE-11882:

The change that was made to identify a bulk-loaded file by the presence of '_seqid_' in the file name: should that be backported to 0.98 also? In 0.98 I think we still go with the metadata BUL_LOAD_CONF_KEY?
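The file-name approach mentioned above can be sketched as a small parser. This assumes bulk-loaded store files are renamed to the form `<name>_SeqId_<n>_`; the exact marker spelling is an assumption (the comment writes it as '_seqid_'), and the class and method names `BulkLoadFileNames` and `bulkLoadSeqId` are hypothetical. The last marker in the name is used so that the most recent seqId wins, and anything malformed is treated as "not a bulk-loaded file".

```java
import java.util.OptionalLong;

public final class BulkLoadFileNames {
    // Assumed marker embedded in bulk-loaded store file names.
    private static final String MARKER = "_SeqId_";

    private BulkLoadFileNames() {}

    /**
     * Returns the bulk load seqId encoded in a store file name, or empty if the
     * name carries no well-formed marker (i.e. it is not treated as bulk loaded).
     */
    static OptionalLong bulkLoadSeqId(String fileName) {
        int start = fileName.lastIndexOf(MARKER);    // last marker = most recent seqId
        if (start < 0) {
            return OptionalLong.empty();
        }
        int digitsStart = start + MARKER.length();
        int end = fileName.indexOf('_', digitsStart);
        if (end < 0) {
            return OptionalLong.empty();             // malformed: no closing underscore
        }
        try {
            return OptionalLong.of(Long.parseLong(fileName.substring(digitsStart, end)));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();             // malformed: seqId is not a number
        }
    }

    public static void main(String[] args) {
        System.out.println(bulkLoadSeqId("4f2c9a_SeqId_11_"));  // OptionalLong[11]
        System.out.println(bulkLoadSeqId("4f2c9a"));            // OptionalLong.empty
    }
}
```

Encoding the seqId in the name means it can be recovered without reading file metadata, which is the trade-off against the metadata-key approach the comment attributes to 0.98.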
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122364#comment-14122364 ]

Anoop Sam John commented on HBASE-11882:

That is part of the fix for HBASE-11772. I can see 0.98 in the fix versions for that Jira too, so we can get it done when we commit that Jira (?)
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122445#comment-14122445 ]

Jerry He commented on HBASE-11882:

Thanks, [~ram_krish], [~anoop.hbase], [~tedyu]. Will go back to work on HBASE-11772 now.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122477#comment-14122477 ]

Hudson commented on HBASE-11882:

SUCCESS: Integrated in HBase-1.0 #152 (See [https://builds.apache.org/job/HBase-1.0/152/])
HBASE-11882 Row level consistency may not be maintained with bulk load and (ramkrishna: rev 5fa07efd700ad40a5c6a6616b16b9189faba2949)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119417#comment-14119417 ]

ramkrishna.s.vasudevan commented on HBASE-11882:

I hope you got my concern. Previously, when a bulk load completed just after a scanner was created but before the scanner started its seek, the KVs in the bulk-loaded file were also taken into consideration. After HBASE-11591, that bulk-loaded file is no longer taken into consideration. So if the test case expects some value from the bulk-loaded file, it may fail. Maybe it did not happen this time, but it may happen. Anyway, I will take a closer look at the test case. +1 on patch.
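The behavior change behind this concern can be reduced to a single visibility predicate. This is a simplified sketch, not HBase's scanner code: the class `BulkLoadVisibility`, the method `isVisible`, and the `readPointChecksEnabled` flag are hypothetical, and the model covers only a bulk load with seqId `bulkSeqId` that completes after a scanner was opened at readpoint `readPoint` but before its first seek.

```java
public final class BulkLoadVisibility {
    private BulkLoadVisibility() {}

    /**
     * Simplified visibility rule for a bulk-loaded file that landed after the
     * scanner was opened but before its first seek.
     *
     * @param bulkSeqId              seqId assigned to the bulk load
     * @param readPoint              readpoint captured when the scanner was opened
     * @param readPointChecksEnabled false models the pre-HBASE-11591 behavior,
     *                               true models the behavior after it
     */
    static boolean isVisible(long bulkSeqId, long readPoint, boolean readPointChecksEnabled) {
        if (!readPointChecksEnabled) {
            return true;                 // atomic bulk load visible to all scanners at once
        }
        return bulkSeqId <= readPoint;   // hidden from scanners whose readpoint predates it
    }

    public static void main(String[] args) {
        long readPoint = 10;   // scanner opened first
        long bulkSeqId = 11;   // bulk load completes before the scanner's first seek
        System.out.println("before HBASE-11591: " + isVisible(bulkSeqId, readPoint, false)); // true
        System.out.println("after HBASE-11591: " + isVisible(bulkSeqId, readPoint, true));   // false
    }
}
```

A test that asserts on values from a file loaded in this window would pass under the first rule and fail under the second, which is the risk described in the comment.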
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119439#comment-14119439 ]

Jerry He commented on HBASE-11882:

Yes, the behavior change from the added readpoint checks for bulk-loaded data using the seqId is understood. I did run 'mvn test' with the v2 patch earlier today, and the entire run passed cleanly, but I lost the result. I've just kicked off another run and will paste the result here. I'll also try to trigger a Hadoop QA run here.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120039#comment-14120039 ]

Jerry He commented on HBASE-11882:

There is still no Hadoop QA run. Here is my local 'mvn test' run result with patch v2:

{code}
[INFO] Reactor Summary:
[INFO]
[INFO] HBase . SUCCESS [  1.793 s]
[INFO] HBase - Common SUCCESS [ 50.103 s]
[INFO] HBase - Protocol .. SUCCESS [  0.073 s]
[INFO] HBase - Client SUCCESS [01:02 min]
[INFO] HBase - Hadoop Compatibility .. SUCCESS [  8.157 s]
[INFO] HBase - Hadoop Two Compatibility .. SUCCESS [  5.973 s]
[INFO] HBase - Prefix Tree ... SUCCESS [  8.950 s]
[INFO] HBase - Server SUCCESS [57:32 min]
[INFO] HBase - Testing Util .. SUCCESS [  1.101 s]
[INFO] HBase - Thrift SUCCESS [02:08 min]
[INFO] HBase - Shell . SUCCESS [  1.586 s]
[INFO] HBase - Integration Tests . SUCCESS [  0.516 s]
[INFO] HBase - Examples .. SUCCESS [  1.916 s]
[INFO] HBase - Assembly .. SUCCESS [  1.045 s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 01:02 h
[INFO] Finished at: 2014-09-03T00:15:35-08:00
[INFO] Final Memory: 46M/273M
{code}
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120216#comment-14120216 ]

Hadoop QA commented on HBASE-11882:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666066/HBASE-11882-master-v2.patch
against trunk revision .
ATTACHMENT ID: 12666066

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 2 zombie test(s):
at org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:250)
at org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:3813)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10695//console

This message is automatically generated.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120523#comment-14120523 ]

Jerry He commented on HBASE-11882:

The 'zombie tests' don't seem to be related to the patch. To be safe, I pulled from master again and applied the v2 patch. The full 'mvn test' ran cleanly again. I also ran TestHRegion separately:

{code}
Running org.apache.hadoop.hbase.regionserver.TestHRegion
Tests run: 80, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 107.129 sec - in org.apache.hadoop.hbase.regionserver.TestHRegion

Results :

Tests run: 80, Failures: 0, Errors: 0, Skipped: 0
{code}
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120644#comment-14120644 ] Ted Yu commented on HBASE-11882: nit:
{code}
+ writer.appendFileInfo(StoreFile.BULKLOAD_TIME_KEY,
+     Bytes.toBytes(System.currentTimeMillis()));
{code}
You can use the variable 'now', which is assigned on line 111.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120732#comment-14120732 ] Jerry He commented on HBASE-11882: -- Hi, [~yuzhih...@gmail.com] Updated with v3 patch.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120819#comment-14120819 ] Hadoop QA commented on HBASE-11882: ---
{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666379/HBASE-11882-master-v3.patch
against trunk revision .
ATTACHMENT ID: 12666379
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10707//console
This message is automatically generated.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120869#comment-14120869 ] ramkrishna.s.vasudevan commented on HBASE-11882: Will commit this patch unless there are objections.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118560#comment-14118560 ] Jerry He commented on HBASE-11882: -- Attached a patch for TestHRegionServerBulkLoad. Currently TestHRegionServerBulkLoad does not add BULKLOAD_TIME_KEY to the bulk load hfiles, but we use BULKLOAD_TIME_KEY to determine isBulkLoadResult(). Without it, isBulkLoadResult() is always false. This hides the problem, and that is why the test has been passing in normal QA runs. With BULKLOAD_TIME_KEY added, the test fails most of the time.
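Why a missing BULKLOAD_TIME_KEY hides the problem can be sketched with a toy model. This is not HBase code: ToyStoreFile, visibleTo and the string key are hypothetical stand-ins (the real StoreFile.BULKLOAD_TIME_KEY is a byte[] file-info key), and the cells inside a bulk-loaded file are assumed to carry seqId 0, so only a file recognized as a bulk load result gets its bulk load seqId applied to readpoint checks.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not HBase code) of how the bulk-load marker gates the readpoint check. */
public class BulkLoadFlagDemo {
  // Hypothetical stand-in for StoreFile.BULKLOAD_TIME_KEY, which in HBase is a byte[] key.
  static final String BULKLOAD_TIME_KEY = "BULKLOAD_TIME_KEY";

  /** Toy store file: file-info metadata plus the seqId assigned when it was bulk loaded. */
  static final class ToyStoreFile {
    final Map<String, Long> fileInfo = new HashMap<>();
    final long bulkLoadSeqId;

    ToyStoreFile(long bulkLoadSeqId) {
      this.bulkLoadSeqId = bulkLoadSeqId;
    }

    /** A file counts as a bulk load result only if the time key is present. */
    boolean isBulkLoadResult() {
      return fileInfo.containsKey(BULKLOAD_TIME_KEY);
    }
  }

  /**
   * Readpoint check for the file's cells, assuming the cells were written with seqId 0:
   * only a recognized bulk-load file has its bulk-load seqId applied.
   */
  static boolean visibleTo(ToyStoreFile f, long readPoint) {
    long effectiveSeqId = f.isBulkLoadResult() ? f.bulkLoadSeqId : 0L;
    return effectiveSeqId <= readPoint;
  }

  public static void main(String[] args) {
    ToyStoreFile withoutKey = new ToyStoreFile(11);
    ToyStoreFile withKey = new ToyStoreFile(11);
    withKey.fileInfo.put(BULKLOAD_TIME_KEY, System.currentTimeMillis());
    // A scanner opened at readpoint 10 should not see a bulk load with seqId 11.
    System.out.println("without key: " + visibleTo(withoutKey, 10)); // true: check skipped
    System.out.println("with key:    " + visibleTo(withKey, 10));    // false: check applied
  }
}
```

Without the key, every bulk-loaded file is visible to every scanner in every family, so row-level consistency holds trivially and the test cannot catch the compaction problem.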
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118567#comment-14118567 ] Jerry He commented on HBASE-11882: -- There are two options to fix the problem:
Option1. Revert to the previous behavior: no readpoint/MVCC check for bulk-loaded data, which becomes visible immediately to all scanners.
Option2. Transfer the bulk load seqId into the cells during compaction.
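The four-step event sequence from this issue, and the effect of Option2, can be replayed in a small toy model. This is a sketch under stated assumptions, not HBase code: Cell, compact, visible, replay and the Mode enum are hypothetical names; a cell is visible when its seqId is at most the scanner's readpoint; smallestReadPoint is taken to be the lowest readpoint of any open scanner; and compaction is assumed to write seqId 0 for any cell it does not explicitly preserve.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy replay (not HBase code) of the cf1/cf2 event sequence described in this issue. */
public class BulkLoadVisibilityDemo {

  /** Hypothetical cell: row key, column family and the seqId used for readpoint checks. */
  static final class Cell {
    final String row;
    final String family;
    final long seqId;

    Cell(String row, String family, long seqId) {
      this.row = row;
      this.family = family;
      this.seqId = seqId;
    }
  }

  /** How compaction treats the seqId of cells that came from a bulk-loaded file. */
  enum Mode { BUGGY_DROP_SEQID, OPTION2_KEEP_SEQID }

  /**
   * Rewrites one store. Zeroing a seqId is only safe once every open scanner can see the
   * cell (seqId <= smallestReadPoint); the buggy mode zeroes newer bulk-loaded cells too.
   */
  static List<Cell> compact(List<Cell> store, long smallestReadPoint, Mode mode) {
    List<Cell> out = new ArrayList<>();
    for (Cell c : store) {
      boolean keep = mode == Mode.OPTION2_KEEP_SEQID && c.seqId > smallestReadPoint;
      out.add(new Cell(c.row, c.family, keep ? c.seqId : 0L));
    }
    return out;
  }

  /** Number of cells of the given row that a scanner at readPoint sees in one store. */
  static int visible(List<Cell> store, String row, long readPoint) {
    int n = 0;
    for (Cell c : store) {
      if (c.row.equals(row) && c.seqId <= readPoint) {
        n++;
      }
    }
    return n;
  }

  /** Replays steps 1-4; returns {cells visible in cf1, cells visible in cf2}. */
  static int[] replay(Mode mode) {
    long scannerReadPoint = 10; // step 1: scanner opened at readpoint 10
    long bulkLoadSeqId = 11;    // step 2: atomic bulk load into cf1 and cf2
    List<Cell> cf1 = new ArrayList<>();
    List<Cell> cf2 = new ArrayList<>();
    cf1.add(new Cell("row-1001", "cf1", bulkLoadSeqId));
    cf2.add(new Cell("row-1001", "cf2", bulkLoadSeqId));
    // step 3: only cf1 is compacted; the open scanner keeps smallestReadPoint at 10
    cf1 = compact(cf1, scannerReadPoint, mode);
    // step 4: the scanner reads row-1001 from both families
    return new int[] {
      visible(cf1, "row-1001", scannerReadPoint),
      visible(cf2, "row-1001", scannerReadPoint)
    };
  }

  public static void main(String[] args) {
    for (Mode m : Mode.values()) {
      int[] v = replay(m);
      System.out.println(m + ": cf1=" + v[0] + " cf2=" + v[1]);
    }
  }
}
```

Running main prints "BUGGY_DROP_SEQID: cf1=1 cf2=0" (the torn row) and "OPTION2_KEEP_SEQID: cf1=0 cf2=0". Option1 would instead skip the readpoint check for bulk-loaded cells, so both families would report 1; either way the row stays consistent, and the choice is about whether bulk loads follow the same MVCC rules as puts.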
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118573#comment-14118573 ] ramkrishna.s.vasudevan commented on HBASE-11882: Great debugging. Option 1 would be better, I think. Then HBASE-11591 would only need to fix the same-KV case, as done in the earlier patches. Excuse typos.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118576#comment-14118576 ] Jerry He commented on HBASE-11882: -- My two cents: go for Option2. Bulk-loaded data would then be treated consistently with puts and other writes, which fits the integration story we have been following for bulk load for a while.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118586#comment-14118586 ] Jerry He commented on HBASE-11882: -- Hi, [~ram_krish] I didn't refresh to see your comment before I clicked submit. Still, we should consider Option2. :-)
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118620#comment-14118620 ] Anoop Sam John commented on HBASE-11882:
bq. Then I think the above change is not necessary and only handle the case as per the initial patch where we handle same KVs case
This is what your last comment in HBASE-11591 said regarding the read point check for bulk loaded files, so we were continuing with the existing way(?). I am +1 for Option2.
bq. Option2. Transfer the bulk load seqId into the cells during compaction.
On read, we set each Cell's seqId to the seqId of the file, right? Compaction reads cells from the store files and writes them to a single file. So is this setting of the seqId not happening?
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118661#comment-14118661 ] Jerry He commented on HBASE-11882: -- Hi, [~anoop.hbase]
bq. In read we set each Cell seqId with the seqId of the file right? Compaction read cells from store files and write to single file. So this set of seqId not happening?
Good question! I attached a patch from the testing I was doing, and it fixed the problem. But based on your comment, let me confirm whether we really need the part that sets the seqId:
{code}
+ if (current != null current.isBulkLoaded()
+ current.getSequenceID() = smallestReadPoint) {
+ kv.setSequenceId(current.getSequenceID());
+ }
{code}
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118923#comment-14118923 ] Jerry He commented on HBASE-11882: -- Hi [~anoop.hbase], you are right. We don't need to set the seqId again in Compactor.performCompaction(), because the scanners already set it on the fly. The fix is even simpler. I have updated the patch to v2, which also includes the change to TestHRegionServerBulkLoad.

Row level consistency may not be maintained with bulk load and compaction
Key: HBASE-11882
URL: https://issues.apache.org/jira/browse/HBASE-11882
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 1.0.0, 2.0.0
Reporter: Jerry He
Priority: Critical
Fix For: 1.0.0, 2.0.0
Attachments: HBASE-11882-master-v1.patch, TestHRegionServerBulkLoad.java.patch

While looking into the TestHRegionServerBulkLoad failure for HBASE-11772, I found that the root cause is that row level atomicity may not be maintained when bulk load is combined with compaction. TestHRegionServerBulkLoad tests bulk load atomicity: it uses multiple threads to bulk load and scan continuously while running compactions periodically, and it verifies that row level data is always consistent across column families.

After HBASE-11591, we added readpoint checks for bulk-loaded data using the seqId assigned at the time of the bulk load. A scanner now does not see the data from a bulk load if the scanner's readpoint is earlier than the bulk load seqId. Previously, the atomic bulk load result was visible immediately to all scanners.

The problem is with compaction after bulk load. Compaction does not lock the region, and it is done one store (column family) at a time. It also compacts away the seqId marker of the bulk load. Here is an event sequence where row level consistency is broken:
1. A scanner is started to scan a region with cf1 and cf2. Its readpoint is 10.
2. A bulk load loads into cf1 and cf2 with bulk load seqId 11. Bulk load is guarded by the region write lock, so it is atomic.
3. A compaction compacts cf1 and compacts away the seqId marker of the bulk load.
4. The scanner calls next() to advance to row-1001. It gets the bulk load data for cf1, since there is no longer a seqId preventing it. It does not get the bulk load data for cf2, since the scanner's readpoint (10) is less than the bulk load seqId (11).
Row level consistency is now broken.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
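The event sequence in the issue description can be replayed with a toy model. This is a hypothetical sketch, not HBase code: `Cell`, `visible`, `compactDroppingSeqIds` and `compactKeepingNewSeqIds` are illustrative names, and a seqId of 0 is assumed to mean "visible to every scanner". The second compaction variant only clears a seqId once it is at or below the smallest readpoint of any open scanner; it is shown as a general MVCC rule, not as the contents of the v2 patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not HBase code) of the race: a scanner with readpoint 10,
 * an atomic bulk load with seqId 11 into cf1 and cf2, and a compaction of cf1 only.
 */
public class BulkLoadCompactionRace {

    /** A cell in one store; seqId 0 is assumed to mean "visible to all scanners". */
    record Cell(String row, String value, long seqId) {}

    /** Returns the cells a scanner can see: seqId must not exceed its readpoint. */
    static List<Cell> visible(List<Cell> store, long readPoint) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : store) {
            if (c.seqId() <= readPoint) {
                out.add(c);
            }
        }
        return out;
    }

    /** Naive compaction: drops every seqId marker, so every cell becomes visible. */
    static List<Cell> compactDroppingSeqIds(List<Cell> store) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : store) {
            out.add(new Cell(c.row(), c.value(), 0L));
        }
        return out;
    }

    /** Readpoint-aware compaction: clears a seqId only if every open scanner can already see the cell. */
    static List<Cell> compactKeepingNewSeqIds(List<Cell> store, long smallestReadPoint) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : store) {
            long seqId = c.seqId() <= smallestReadPoint ? 0L : c.seqId();
            out.add(new Cell(c.row(), c.value(), seqId));
        }
        return out;
    }

    public static void main(String[] args) {
        long scannerReadPoint = 10; // step 1: scanner opened
        long bulkLoadSeqId = 11;    // step 2: atomic bulk load into both families
        List<Cell> cf1 = List.of(new Cell("row-1001", "v1", bulkLoadSeqId));
        List<Cell> cf2 = List.of(new Cell("row-1001", "v1", bulkLoadSeqId));

        // Step 3: only cf1 is compacted.
        List<Cell> naiveCf1 = compactDroppingSeqIds(cf1);
        List<Cell> safeCf1 = compactKeepingNewSeqIds(cf1, scannerReadPoint);

        // Step 4: the old scanner reads row-1001 from both families.
        System.out.println("naive: cf1=" + visible(naiveCf1, scannerReadPoint).size()
                + " cf2=" + visible(cf2, scannerReadPoint).size());
        System.out.println("safe:  cf1=" + visible(safeCf1, scannerReadPoint).size()
                + " cf2=" + visible(cf2, scannerReadPoint).size());
    }
}
```

Running `main` prints `naive: cf1=1 cf2=0` and then `safe:  cf1=0 cf2=0`. With the naive compaction the old scanner sees the bulk-loaded row in cf1 but not in cf2, which is the inconsistency described above; when seqIds newer than the smallest readpoint survive compaction, both families stay hidden from that scanner.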
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119310#comment-14119310 ] Anoop Sam John commented on HBASE-11882: V2 patch looks good to me. Good debugging.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119319#comment-14119319 ] ramkrishna.s.vasudevan commented on HBASE-11882: The reason I was suggesting option 1 is that there is a behaviour change in bulk load. Previously, all the KVs from a bulk-loaded file became visible immediately, even when the bulk load completed while a scan was in progress. HBASE-11591 changed that behaviour: bulk-loaded KVs now go through the MVCC sequence during a scan, just like other normal KVs. Patch v2 looks fine to me and it solves the problem described in this JIRA, but I doubt whether it solves the test case issue. Suppose that in the test case a scanner is started and a bulk load completes just after that: the KVs in the bulk-loaded file are then not visible to that scanner. Does this behaviour happen in the test case now? If we all agree with the new behaviour, then I think the test case may need some tweaking. That was my concern.
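The behaviour change raised here can be stated as two visibility rules. The sketch below is a hypothetical model, not HBase code: `visibleBefore` stands for the pre-HBASE-11591 behaviour (bulk-loaded cells visible to every scanner) and `visibleAfter` for the readpoint check. It assumes a scanner opened after the bulk load has a readpoint at least equal to the bulk load seqId.

```java
/**
 * Hypothetical model of bulk-load visibility before and after the
 * readpoint check introduced by HBASE-11591. Not HBase code.
 */
public class BulkLoadVisibility {

    /** Old behaviour: bulk-loaded cells were visible to every scanner (arguments unused). */
    static boolean visibleBefore(long scannerReadPoint, long bulkLoadSeqId) {
        return true;
    }

    /** New behaviour: bulk-loaded cells pass the same MVCC readpoint check as other KVs. */
    static boolean visibleAfter(long scannerReadPoint, long bulkLoadSeqId) {
        return bulkLoadSeqId <= scannerReadPoint;
    }

    public static void main(String[] args) {
        long bulkLoadSeqId = 11;
        // Readpoint 10: scanner opened before the bulk load; 11: opened after it.
        long[] readPoints = {10, 11};
        for (long rp : readPoints) {
            System.out.printf("readpoint=%d before=%b after=%b%n",
                    rp, visibleBefore(rp, bulkLoadSeqId), visibleAfter(rp, bulkLoadSeqId));
        }
    }
}
```

This prints `readpoint=10 before=true after=false` and then `readpoint=11 before=true after=true`. The first line is the case in question: a scanner opened just before the bulk load completes no longer sees the bulk-loaded KVs.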
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119322#comment-14119322 ] ramkrishna.s.vasudevan commented on HBASE-11882: bq. Does this behaviour happen in the test case now? If so, the bulk-loaded KVs would have been seen previously, but now they won't be. I have not looked at the test case closely enough to know whether any such case could actually happen.
[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction
[ https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119408#comment-14119408 ] Jerry He commented on HBASE-11882: -- Hi [~ram_krish], TestHRegionServerBulkLoad spawns multiple threads that continuously bulk load, scan, and compact the same region. Scanners get started before a bulk load, after a bulk load, or in between. But the test's only purpose is to check and ensure row level atomicity.
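The row level atomicity property described here can be written as a small predicate. This is a hypothetical sketch, not the test's actual code: it assumes each bulk load writes one identical value into every column family of a row, so a row returned by any scanner is consistent only if all expected families are present and carry the same value. `isAtomic` and the `load-N` values are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical row-level atomicity check in the spirit of TestHRegionServerBulkLoad. */
public class RowAtomicityCheck {

    /**
     * Returns true if the row reflects exactly one bulk load.
     *
     * @param familyToValue    column family name -> value read for this row
     * @param expectedFamilies number of column families every bulk load writes
     */
    static boolean isAtomic(Map<String, String> familyToValue, int expectedFamilies) {
        if (familyToValue.size() != expectedFamilies) {
            return false; // some family is missing from the row
        }
        String expected = null;
        for (String value : familyToValue.values()) {
            if (expected == null) {
                expected = value;
            } else if (!expected.equals(value)) {
                return false; // families come from different bulk loads
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> consistent = new LinkedHashMap<>();
        consistent.put("cf1", "load-11");
        consistent.put("cf2", "load-11");

        // A torn read like the one in the event sequence: cf1 shows the
        // new bulk load, cf2 still shows an older (hypothetical) one.
        Map<String, String> torn = new LinkedHashMap<>();
        torn.put("cf1", "load-11");
        torn.put("cf2", "load-7");

        System.out.println(isAtomic(consistent, 2)); // true
        System.out.println(isAtomic(torn, 2));       // false
    }
}
```

Because scanners in the test may open before, after, or during a bulk load, a check of this shape only asserts that each row is internally consistent, not which bulk load it reflects. That is why the new visibility behaviour should not by itself make such a test fail.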