[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122498#comment-14122498
 ] 

Hudson commented on HBASE-11882:


FAILURE: Integrated in HBase-TRUNK #5467 (See 
[https://builds.apache.org/job/HBase-TRUNK/5467/])
HBASE-11882 Row level consistency may not be maintained with bulk load and 
(ramkrishna: rev 8de30d32d4d5c86650effadbda72f7ef32a4f15f)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java


 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.99.0, 2.0.0
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11882-master-v1.patch, 
 HBASE-11882-master-v2.patch, HBASE-11882-master-v3.patch, 
 TestHRegionServerBulkLoad.java.patch


 While looking into the TestHRegionServerBulkLoad failure for HBASE-11772, I 
 found the root cause is that row level atomicity may not be maintained with 
 bulk load together with compaction.
 TestHRegionServerBulkLoad is used to test bulk load atomicity. The test uses 
 multiple threads to do bulk load and scan continuously and do compactions 
 periodically. 
 It verifies row level data is always consistent across column families.
 After HBASE-11591, we added readpoint checks for bulkloaded data using the 
 seqId at the time of bulk load. Now a scanner will not see the data from a 
 bulk load if the scanner's readpoint is earlier than the bulk load seqId.
 Previously, the atomic bulk load result was visible immediately to all 
 scanners.
 The problem is with compaction after bulk load. Compaction does not lock the 
 region and it is done one store (column family) at a time. It also compacts 
 away the seqId marker of the bulk load.
 Here is an event sequence where the row level consistency is broken.
 1. A scanner is started to scan a region with cf1 and cf2. The readpoint is 
 10.
 2. There is a bulk load that loads into cf1 and cf2. The bulk load seqId is 
 11. Bulk load is guarded by region write lock. So it is atomic.
 3. There is a compaction that compacts cf1. It compacts away the seqId marker 
 of the bulk load.
 4. The scanner calls next and advances to row-1001. It gets the bulk load 
 data for cf1 since there is no longer a seqId marker preventing it.  It does 
 not get the bulk load data for cf2 since the scanner's readpoint (10) is less 
 than the bulk load seqId (11).
 Row level consistency is broken in this case.
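
 The four steps above can be modelled in a few lines. This is only a sketch of 
 the visibility rule as the description states it, not HBase code: the class 
 BulkLoadVisibilitySketch, the FileModel type, the NO_MARKER constant and the 
 visible/compact helpers are all hypothetical names introduced for 
 illustration.

```java
public class BulkLoadVisibilitySketch {

    // Hypothetical sentinel: the file carries no bulk load seqId marker.
    static final long NO_MARKER = -1L;

    // Hypothetical stand-in for a store file in one column family.
    static final class FileModel {
        final String name;
        final long bulkLoadSeqId;

        FileModel(String name, long bulkLoadSeqId) {
            this.name = name;
            this.bulkLoadSeqId = bulkLoadSeqId;
        }
    }

    // Readpoint check as described above: bulk loaded data is hidden from a
    // scanner whose readpoint is earlier than the bulk load seqId.
    static boolean visible(FileModel file, long readPoint) {
        return file.bulkLoadSeqId == NO_MARKER || file.bulkLoadSeqId <= readPoint;
    }

    // Models the problematic compaction: the output file loses the marker.
    static FileModel compact(FileModel input) {
        return new FileModel(input.name + "-compacted", NO_MARKER);
    }

    public static void main(String[] args) {
        long scannerReadPoint = 10;                        // step 1
        FileModel cf1 = new FileModel("cf1-bulk", 11);     // step 2
        FileModel cf2 = new FileModel("cf2-bulk", 11);
        cf1 = compact(cf1);                                // step 3
        boolean seesCf1 = visible(cf1, scannerReadPoint);  // step 4
        boolean seesCf2 = visible(cf2, scannerReadPoint);
        System.out.println("cf1 visible: " + seesCf1);
        System.out.println("cf2 visible: " + seesCf2);
        System.out.println("row consistent: " + (seesCf1 == seesCf2));
    }
}
```

 In this model the scanner sees the bulk loaded data in cf1 but not in cf2, 
 which is the cross-family inconsistency that TestHRegionServerBulkLoad 
 detects.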



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121622#comment-14121622
 ] 

Ted Yu commented on HBASE-11882:


@Ram:
Go ahead.



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122359#comment-14122359
 ] 

ramkrishna.s.vasudevan commented on HBASE-11882:


Should the change that identifies a bulk loaded file by the presence of 
'_seqid_' in the file name also be backported to 0.98?  In 0.98 I think we 
still rely on the metadata BUL_LOAD_CONF_KEY.
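
For illustration, detecting a bulk loaded file from its name could look roughly 
like the sketch below. The class SeqIdFileNameSketch, the bulkLoadSeqId helper 
and the exact "_seqid_<number>_" layout in the pattern are assumptions made for 
this example; the real naming convention should be checked against the patch.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeqIdFileNameSketch {

    // Assumed layout for illustration only: "..._seqid_<number>_...".
    private static final Pattern SEQ_ID =
        Pattern.compile("_seqid_(\\d+)_", Pattern.CASE_INSENSITIVE);

    // Returns the seqId embedded in the file name, or empty if the name
    // carries no such marker (i.e. it is not treated as bulk loaded here).
    static OptionalLong bulkLoadSeqId(String fileName) {
        Matcher m = SEQ_ID.matcher(fileName);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(1)))
            : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Both file names are made up for the example.
        System.out.println(bulkLoadSeqId("abc123_seqid_11_"));
        System.out.println(bulkLoadSeqId("abc123"));
    }
}
```

A name-based check like this needs no file metadata read, which is presumably 
why it differs from the metadata-key approach mentioned for 0.98.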



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-04 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122364#comment-14122364
 ] 

Anoop Sam John commented on HBASE-11882:


That is part of a fix for HBASE-11772. I can see 0.98 also in fix versions for 
that Jira. So we can get it done when we commit that Jira (?)



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-04 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122445#comment-14122445
 ] 

Jerry He commented on HBASE-11882:
--

Thanks, [~ram_krish], [~anoop.hbase], [~tedyu].

Will go back to work on HBASE-11772 now.



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122477#comment-14122477
 ] 

Hudson commented on HBASE-11882:


SUCCESS: Integrated in HBase-1.0 #152 (See 
[https://builds.apache.org/job/HBase-1.0/152/])
HBASE-11882 Row level consistency may not be maintained with bulk load and 
(ramkrishna: rev 5fa07efd700ad40a5c6a6616b16b9189faba2949)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java




[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119417#comment-14119417
 ] 

ramkrishna.s.vasudevan commented on HBASE-11882:


I hope you got my concern.  Previously, when a bulk load completed just after 
a scanner was created and before the scan started a seek, the KVs in the bulk 
loaded file would also be taken into consideration. After HBASE-11591, the 
bulk loaded file is not taken into consideration.  So if the test case expects 
some value from the bulk loaded file, it may fail.  Maybe it did not happen 
this time, but it may happen.  Anyway, I will check the test case closely.  
+1 on patch. 



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-03 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119439#comment-14119439
 ] 

Jerry He commented on HBASE-11882:
--

Yes, the behavior change from the added readpoint checks for bulkloaded data 
using the seqId is understood.

I ran 'mvn test' with the v2 patch earlier today.  The entire run passed 
cleanly, but I lost the result.  I've just kicked off another run and will 
paste the result here.
I will also try to trigger a Hadoop QA run here.



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-03 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120039#comment-14120039
 ] 

Jerry He commented on HBASE-11882:
--

Still there is no Hadoop QA run. 
Here is my local 'mvn test' run result with patch v2:
{code}
[INFO] Reactor Summary:
[INFO]
[INFO] HBase . SUCCESS [  1.793 s]
[INFO] HBase - Common  SUCCESS [ 50.103 s]
[INFO] HBase - Protocol .. SUCCESS [  0.073 s]
[INFO] HBase - Client  SUCCESS [01:02 min]
[INFO] HBase - Hadoop Compatibility .. SUCCESS [  8.157 s]
[INFO] HBase - Hadoop Two Compatibility .. SUCCESS [  5.973 s]
[INFO] HBase - Prefix Tree ... SUCCESS [  8.950 s]
[INFO] HBase - Server  SUCCESS [57:32 min]
[INFO] HBase - Testing Util .. SUCCESS [  1.101 s]
[INFO] HBase - Thrift  SUCCESS [02:08 min]
[INFO] HBase - Shell . SUCCESS [  1.586 s]
[INFO] HBase - Integration Tests . SUCCESS [  0.516 s]
[INFO] HBase - Examples .. SUCCESS [  1.916 s]
[INFO] HBase - Assembly .. SUCCESS [  1.045 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 01:02 h
[INFO] Finished at: 2014-09-03T00:15:35-08:00
[INFO] Final Memory: 46M/273M
{code}



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120216#comment-14120216
 ] 

Hadoop QA commented on HBASE-11882:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12666066/HBASE-11882-master-v2.patch
  against trunk revision .
  ATTACHMENT ID: 12666066

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 2 zombie test(s):   
at org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:250)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:3813)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10695//console

This message is automatically generated.


[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-03 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120523#comment-14120523
 ] 

Jerry He commented on HBASE-11882:
--

The 'zombie tests' don't seem to be related to the patch.
To be safe, I pulled from master again and applied the v2 patch.  The full 
'mvn test' ran cleanly again.

Separately ran TestHRegion:
{code}
Running org.apache.hadoop.hbase.regionserver.TestHRegion
Tests run: 80, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 107.129 sec - 
in org.apache.hadoop.hbase.regionserver.TestHRegion

Results :

Tests run: 80, Failures: 0, Errors: 0, Skipped: 0
{code}

 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.99.0, 2.0.0
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11882-master-v1.patch, 
 HBASE-11882-master-v2.patch, TestHRegionServerBulkLoad.java.patch


 While looking into the TestHRegionServerBulkLoad failure for HBASE-11772, I 
 found that the root cause is that row-level atomicity may not be maintained 
 when bulk load is combined with compaction.
 TestHRegionServerBulkLoad is used to test bulk load atomicity. The test uses 
 multiple threads to bulk load and scan continuously, and runs compactions 
 periodically. 
 It verifies that row-level data is always consistent across column families.
 After HBASE-11591, we added readpoint checks for bulk-loaded data using the 
 seqId at the time of bulk load. Now a scanner will not see the data from a 
 bulk load if the scanner's readpoint is earlier than the bulk load seqId.
 Previously, the atomic bulk load result was visible immediately to all 
 scanners.
 The problem is with compaction after bulk load. Compaction does not lock the 
 region, and it is done one store (column family) at a time. It also compacts 
 away the seqId marker of the bulk load.
 Here is an event sequence in which row-level consistency is broken.
 1. A scanner is started to scan a region with cf1 and cf2. The readpoint is 
 10.
 2. A bulk load loads into cf1 and cf2. The bulk load seqId is 
 11. Bulk load is guarded by the region write lock, so it is atomic.
 3. A compaction compacts cf1. It compacts away the seqId marker 
 of the bulk load.
 4. The scanner tries to advance to row-1001. It gets the bulk load data for 
 cf1 since there is no seqId preventing it.  It does not get the bulk load data 
 for cf2 since the scanner's readpoint (10) is less than the bulk load seqId 
 (11).
 Row-level consistency is now broken.
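
 The four steps above can be replayed in a small toy model. This is only a 
 sketch, not HBase code: the class and method names are hypothetical, a seqId 
 of 0 is assumed to stand for "marker compacted away", and visibility is 
 assumed to be a plain comparison of seqId against the readpoint.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy replay of the event sequence; all names here are hypothetical. */
final class BulkLoadRaceSketch {

    /** A store file with bulk-loaded data for one column family. */
    static final class ToyStoreFile {
        final String family;
        final long seqId; // 0 models a file whose bulk load marker was compacted away

        ToyStoreFile(String family, long seqId) {
            this.family = family;
            this.seqId = seqId;
        }
    }

    /** Assumed rule: a scanner sees a file only if its seqId is at or below the readpoint. */
    static boolean visible(ToyStoreFile file, long readPoint) {
        return file.seqId <= readPoint;
    }

    /** Compaction as described in the report: the rewritten file loses the bulk load seqId. */
    static ToyStoreFile compactDroppingSeqId(ToyStoreFile file) {
        return new ToyStoreFile(file.family, 0L);
    }

    /** Families whose bulk-loaded data a scanner with this readpoint would return. */
    static List<String> familiesSeen(long readPoint, ToyStoreFile... files) {
        List<String> seen = new ArrayList<>();
        for (ToyStoreFile f : files) {
            if (visible(f, readPoint)) {
                seen.add(f.family);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        long readPoint = 10L;                                   // step 1
        ToyStoreFile cf1 = new ToyStoreFile("cf1", 11L);        // step 2
        ToyStoreFile cf2 = new ToyStoreFile("cf2", 11L);
        System.out.println(familiesSeen(readPoint, cf1, cf2));  // nothing visible yet
        cf1 = compactDroppingSeqId(cf1);                        // step 3
        System.out.println(familiesSeen(readPoint, cf1, cf2));  // step 4: only cf1 leaks through
    }
}
```

 In this model the scanner sees neither family before the compaction and only 
 cf1 afterwards, which is the half-applied row described in step 4.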



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120644#comment-14120644
 ] 

Ted Yu commented on HBASE-11882:


nit:
{code}
+  writer.appendFileInfo(StoreFile.BULKLOAD_TIME_KEY,
+Bytes.toBytes(System.currentTimeMillis()));
{code}
You can use the variable 'now', which is assigned on line 111.



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-03 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120732#comment-14120732
 ] 

Jerry He commented on HBASE-11882:
--

Hi, [~yuzhih...@gmail.com]

Updated with v3 patch.



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120819#comment-14120819
 ] 

Hadoop QA commented on HBASE-11882:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12666379/HBASE-11882-master-v3.patch
  against trunk revision .
  ATTACHMENT ID: 12666379

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10707//console

This message is automatically generated.

 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.99.0, 2.0.0
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11882-master-v1.patch, 
 HBASE-11882-master-v2.patch, HBASE-11882-master-v3.patch, 
 TestHRegionServerBulkLoad.java.patch



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120869#comment-14120869
 ] 

ramkrishna.s.vasudevan commented on HBASE-11882:


Will commit this patch unless there are objections.



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118560#comment-14118560
 ] 

Jerry He commented on HBASE-11882:
--

Attached a patch for TestHRegionServerBulkLoad.
Currently TestHRegionServerBulkLoad does not add BULKLOAD_TIME_KEY to the bulk 
load hfiles, but we use BULKLOAD_TIME_KEY to determine 
isBulkLoadResult().
Without it, isBulkLoadResult() is always false.  This hides the problem, and 
that is why the test has been passing in normal QA runs.

With the added BULKLOAD_TIME_KEY, the test fails most of the time.
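
To illustrate why the missing key hides the problem, here is a minimal sketch. 
It assumes that isBulkLoadResult() amounts to a presence check for 
BULKLOAD_TIME_KEY in the file's metadata. The class, the map-based file info, 
and the key's string value are stand-ins for illustration, not the actual 
StoreFile code.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Toy model of the metadata check; names and the key's value are assumptions. */
final class BulkLoadMarkerSketch {

    // Stand-in for StoreFile.BULKLOAD_TIME_KEY; the string value is assumed.
    static final String BULKLOAD_TIME_KEY = "BULKLOAD_TIMESTAMP";

    /** Assumed behavior: a file counts as a bulk load result only if the key is present. */
    static boolean isBulkLoadResult(Map<String, byte[]> fileInfo) {
        return fileInfo.containsKey(BULKLOAD_TIME_KEY);
    }

    /** Builds the file info a test would write, with or without the bulk load timestamp. */
    static Map<String, byte[]> fileInfo(boolean stampBulkLoadTime, long now) {
        Map<String, byte[]> info = new HashMap<>();
        if (stampBulkLoadTime) {
            // Encode the timestamp as 8 big-endian bytes.
            info.put(BULKLOAD_TIME_KEY, ByteBuffer.allocate(Long.BYTES).putLong(now).array());
        }
        return info;
    }
}
```

A test that writes its hfiles without the key never takes the bulk load code 
path in this model, so any readpoint check tied to that path is never 
exercised.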

 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.0.0, 2.0.0
Reporter: Jerry He
Priority: Critical
 Fix For: 1.0.0, 2.0.0

 Attachments: TestHRegionServerBulkLoad.java.patch




[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118567#comment-14118567
 ] 

Jerry He commented on HBASE-11882:
--

There are two options to fix the problem:

Option1.  Revert to the previous behavior.  No readpoint/MVCC check for bulk 
loaded data. It becomes visible immediately to all scanners.
Option2.  Transfer the bulk load seqId into the cells during compaction.
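
The two options can be contrasted as visibility rules. The sketch below is a 
toy model with hypothetical names, not the HBase implementation. It assumes 
that a seqId of 0 means "visible to everyone" and that compaction may only 
drop a seqId that is at or below the smallest readpoint of any open scanner.

```java
/** Toy comparison of the two options; every name here is hypothetical. */
final class VisibilityOptionsSketch {

    /** Option1: bulk-loaded data skips the readpoint check entirely. */
    static boolean visibleOption1(boolean bulkLoaded, long seqId, long readPoint) {
        return bulkLoaded || seqId <= readPoint;
    }

    /** Option2: bulk-loaded data is checked like any other write. */
    static boolean visibleOption2(long seqId, long readPoint) {
        return seqId <= readPoint;
    }

    /**
     * Option2 relies on compaction keeping any seqId that an open scanner could
     * still be filtering on (assumed rule: keep it if above the smallest readpoint).
     */
    static long seqIdAfterCompaction(long seqId, long smallestReadPoint) {
        return seqId > smallestReadPoint ? seqId : 0L;
    }
}
```

With a scanner at readpoint 10 and a bulk load at seqId 11, Option1 shows the 
data in both families immediately, while Option2 hides it in both families 
even after cf1 is compacted, because the seqId 11 survives the compaction. 
Either way the row stays consistent.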



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118573#comment-14118573
 ] 

ramkrishna.s.vasudevan commented on HBASE-11882:


Great debugging. 
Option 1 would be better, I think. Then HBASE-11591 would only need to fix the 
same-KV case (as done in the earlier patches).

Excuse typos.




[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118576#comment-14118576
 ] 

Jerry He commented on HBASE-11882:
--

My two cents: go for Option2. 
Bulk-loaded data would then be treated the same way as puts and other writes. 
That fits the consistent integration story we have been following for bulk 
load for a while.



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118586#comment-14118586
 ] 

Jerry He commented on HBASE-11882:
--

Hi, [~ram_krish]

Didn't refresh to see your comment before I clicked submit.
Still we should consider Option2. :-)



[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118620#comment-14118620
 ] 

Anoop Sam John commented on HBASE-11882:


bq. Then I think the above change is not necessary and only handle the case as 
per the initial patch where we handle same KVs case
This was your last comment in HBASE-11591 regarding the readpoint check 
for bulk-loaded files.  So we were going to continue with the existing way(?)

I am +1 for Option2.
bq. Option2. Transfer the bulk load seqId into the cells during compaction.
On read we set each Cell's seqId to the seqId of the file, right? Compaction 
reads cells from the store files and writes them to a single file. So is this 
setting of the seqId not happening?




[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118661#comment-14118661
 ] 

Jerry He commented on HBASE-11882:
--

Hi, [~anoop.hbase]

bq. On read we set each Cell's seqId to the seqId of the file, right? Compaction 
reads cells from the store files and writes them to a single file. So is this 
setting of the seqId not happening?

Good question!

I attached a patch from the testing I was doing.  It fixed the problem.
But based on your comment, let me confirm if we really need to do the part to 
add the seqId: 
{code}
+  if (current != null && current.isBulkLoaded()
+      && current.getSequenceID() >= smallestReadPoint) {
+    kv.setSequenceId(current.getSequenceID());
+  }
{code}

 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.0.0, 2.0.0
Reporter: Jerry He
Priority: Critical
 Fix For: 1.0.0, 2.0.0

 Attachments: HBASE-11882-master-v1.patch, 
 TestHRegionServerBulkLoad.java.patch


 While looking into the TestHRegionServerBulkLoad failure for HBASE-11772, I 
 found the root cause is that row level atomicity may not be maintained when 
 bulk load is combined with compaction.
 TestHRegionServerBulkLoad is used to test bulk load atomicity. The test uses 
 multiple threads to bulk load and scan continuously and to run compactions 
 periodically.
 It verifies that row level data is always consistent across column families.
 After HBASE-11591, we added readpoint checks for bulk loaded data using the 
 seqId at the time of bulk load. Now a scanner will not see the data from a 
 bulk load if the scanner's readpoint is earlier than the bulk load seqId.
 Previously, the atomic bulk load result was visible immediately to all 
 scanners.
 The problem is with compaction after bulk load. Compaction does not lock the 
 region and it is done one store (column family) at a time. It also compacts 
 away the seqId marker of the bulk load.
 Here is an event sequence where the row level consistency is broken.
 1. A scanner is started to scan a region with cf1 and cf2. The readpoint is 
 10.
 2. There is a bulk load that loads into cf1 and cf2. The bulk load seqId is 
 11. Bulk load is guarded by the region write lock, so it is atomic.
 3. There is a compaction that compacts cf1. It compacts away the seqId marker 
 of the bulk load.
 4. The scanner advances to row-1001. It gets the bulk load data for cf1 
 since no seqId marker remains to hide it.  It does not get the bulk load data 
 for cf2 since the scanner's readpoint (10) is less than the bulk load seqId 
 (11).
 Row level consistency is now broken.
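
The four steps above can be replayed with a toy model. This is only a sketch, not HBase code: the classes StoreFile and Store, the method compactDroppingSeqId, and the values "old" and "new" are all hypothetical names made up for illustration. The only things taken from the description are the readpoint (10), the bulk load seqId (11), and the rule that a scanner skips data whose seqId is greater than its readpoint.

```java
import java.util.ArrayList;
import java.util.List;

class BulkLoadRaceSketch {
    /** Toy store file: one value for the row plus a seqId marker (0 = no marker). */
    static final class StoreFile {
        final String value;
        final long seqId;
        StoreFile(String value, long seqId) {
            this.value = value;
            this.seqId = seqId;
        }
    }

    /** Toy store (one column family) holding store files, newest last. */
    static final class Store {
        final List<StoreFile> files = new ArrayList<>();

        /** Returns the newest value whose seqId is visible at the given readpoint. */
        String read(long readPoint) {
            for (int i = files.size() - 1; i >= 0; i--) {
                StoreFile f = files.get(i);
                if (f.seqId <= readPoint) {
                    return f.value;
                }
            }
            return null;
        }

        /** Models the bug: merge into one file and lose the bulk load seqId marker. */
        void compactDroppingSeqId() {
            String newest = files.get(files.size() - 1).value;
            files.clear();
            files.add(new StoreFile(newest, 0));
        }
    }

    /** Replays steps 1 to 4 and returns what the scanner sees for cf1 and cf2. */
    static String[] replay() {
        Store cf1 = new Store();
        Store cf2 = new Store();
        cf1.files.add(new StoreFile("old", 0));
        cf2.files.add(new StoreFile("old", 0));

        long readPoint = 10;                      // step 1: scanner starts
        cf1.files.add(new StoreFile("new", 11));  // step 2: bulk load, seqId 11
        cf2.files.add(new StoreFile("new", 11));
        cf1.compactDroppingSeqId();               // step 3: only cf1 is compacted

        // step 4: the scanner reads the same row from both families
        return new String[] { cf1.read(readPoint), cf2.read(readPoint) };
    }

    public static void main(String[] args) {
        String[] row = replay();
        System.out.println("cf1=" + row[0] + " cf2=" + row[1]);
    }
}
```

In this model the scanner sees the bulk loaded value in cf1 and the older value in cf2 for the same row, which is the inconsistency described in step 4.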





[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118923#comment-14118923
 ] 

Jerry He commented on HBASE-11882:
--

Hi, [~anoop.hbase]

You are right. We don't need to set the seqId again in 
Compactor.performCompaction().  The scanners already set it on the fly.
The fix is even simpler.
I have uploaded patch v2, which also includes the change to 
TestHRegionServerBulkLoad. 
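
To illustrate the idea (this is not the v2 patch, whose contents are not shown in this thread), here is a toy compaction loop. It assumes each cell already carries the seqId that the store file scanner set on read, and it assumes the compaction clears a seqId only when the cell is at or below the smallest readpoint of all open scanners. Cell, compact and visible are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class CompactionSeqIdSketch {
    /** Toy cell; seqId is assumed to have been set by the scanner while reading. */
    static final class Cell {
        final String row;
        final String value;
        final long seqId;
        Cell(String row, String value, long seqId) {
            this.row = row;
            this.value = value;
            this.seqId = seqId;
        }
    }

    /**
     * Copies cells into the compacted output. A seqId is reset to 0 only when
     * every open scanner can already see the cell (seqId <= smallestReadPoint);
     * otherwise it is kept so that older scanners still skip the cell.
     */
    static List<Cell> compact(List<Cell> input, long smallestReadPoint) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : input) {
            long kept = c.seqId <= smallestReadPoint ? 0 : c.seqId;
            out.add(new Cell(c.row, c.value, kept));
        }
        return out;
    }

    /** Readpoint check: a cell is visible when its seqId is not newer than the readpoint. */
    static boolean visible(Cell c, long readPoint) {
        return c.seqId <= readPoint;
    }
}
```

With a scanner open at readpoint 10, a bulk loaded cell with seqId 11 keeps its seqId through this compaction and stays hidden from that scanner, so both column families give the same answer.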

 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.0.0, 2.0.0
Reporter: Jerry He
Priority: Critical
 Fix For: 1.0.0, 2.0.0

 Attachments: HBASE-11882-master-v1.patch, 
 TestHRegionServerBulkLoad.java.patch




[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119310#comment-14119310
 ] 

Anoop Sam John commented on HBASE-11882:


V2 patch looks good to me.  Good debugging.

 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.0.0, 2.0.0
Reporter: Jerry He
Priority: Critical
 Fix For: 1.0.0, 2.0.0

 Attachments: HBASE-11882-master-v1.patch, 
 HBASE-11882-master-v2.patch, TestHRegionServerBulkLoad.java.patch




[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119319#comment-14119319
 ] 

ramkrishna.s.vasudevan commented on HBASE-11882:


The reason I suggested option 1 is that this is a behaviour change in bulk 
load.  Previously, all the KVs from a bulk loaded file became visible 
immediately, even when the bulk load completed while a scan was in progress.
HBASE-11591 changed that behaviour too: bulk loaded data now goes through the 
mvcc sequence like other, normal KVs (during scan).
Patch v2 looks fine to me and solves the problem described in the JIRA.  
But I doubt whether it solves the test case issue.  Suppose that in the test 
case a scanner is started and a bulk load completes just after that; the KVs 
in the bulk loaded file are then not visible to that scanner. Does this 
happen in the test case now? If we all agree with the new behaviour, then I 
think the test case may need some tweaking. That was my concern.

 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.99.0, 2.0.0
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11882-master-v1.patch, 
 HBASE-11882-master-v2.patch, TestHRegionServerBulkLoad.java.patch




[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119322#comment-14119322
 ] 

ramkrishna.s.vasudevan commented on HBASE-11882:


bq. Does this behaviour happen in the test case now?
If so, the bulk loaded KVs would previously have been seen, but now they 
won't be.  I have not looked at the test case closely enough to know whether 
such cases can actually occur.

 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.99.0, 2.0.0
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11882-master-v1.patch, 
 HBASE-11882-master-v2.patch, TestHRegionServerBulkLoad.java.patch




[jira] [Commented] (HBASE-11882) Row level consistency may not be maintained with bulk load and compaction

2014-09-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119408#comment-14119408
 ] 

Jerry He commented on HBASE-11882:
--

Hi, [~ram_krish]

TestHRegionServerBulkLoad spawns multiple threads that continuously bulkload, 
scan, and compact the same region. Scanners get started before bulk load, after 
bulk load or in between.
But its only purpose is to test and ensure row level atomicity.
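
As a rough illustration of what "row level atomicity" means for such a check (this is not the code in TestHRegionServerBulkLoad; the class and method names here are hypothetical), a scanner thread could compare the values it read for one row across all column families:

```java
import java.util.Map;
import java.util.Objects;

class RowAtomicityCheckSketch {
    /**
     * Returns true when every column family reports the same value for the row.
     * The map key is the column family name, the map value is what the scan returned.
     */
    static boolean isConsistent(Map<String, String> valueByFamily) {
        String expected = null;
        boolean first = true;
        for (String v : valueByFamily.values()) {
            if (first) {
                expected = v;
                first = false;
            } else if (!Objects.equals(expected, v)) {
                return false;
            }
        }
        return true;
    }
}
```

A row that shows the bulk loaded value in cf1 but the older value in cf2 fails this check, regardless of whether the scanner started before or after the bulk load.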

 Row level consistency may not be maintained with bulk load and compaction
 -

 Key: HBASE-11882
 URL: https://issues.apache.org/jira/browse/HBASE-11882
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.99.0, 2.0.0
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11882-master-v1.patch, 
 HBASE-11882-master-v2.patch, TestHRegionServerBulkLoad.java.patch

