[ 
https://issues.apache.org/jira/browse/HBASE-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240549#comment-15240549
 ] 

Hadoop QA commented on HBASE-15236:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
29s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} master passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} master passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 5m 
21s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} master passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} master passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 5m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
29m 51s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 104m 6s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 50s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12798623/HBASE-15236-master-v8.patch
 |
| JIRA Issue | HBASE-15236 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux pietas.apache.org 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT 
Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5a7c8dc |
| Default Java | 1.7.0_79 |
| Multi-JDK versions |  /home/jenkins/tools/java/jdk1.8.0:1.8.0 
/usr/local/jenkins/java/jdk1.7.0_79:1.7.0_79 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/1406/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/1406/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/1406/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/1406/console |
| Powered by | Apache Yetus 0.2.1   http://yetus.apache.org |


This message was automatically generated.



> Inconsistent cell reads over multiple bulk-loaded HFiles
> --------------------------------------------------------
>
>                 Key: HBASE-15236
>                 URL: https://issues.apache.org/jira/browse/HBASE-15236
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Appy
>            Assignee: Appy
>         Attachments: HBASE-15236-master-v2.patch, 
> HBASE-15236-master-v3.patch, HBASE-15236-master-v4.patch, 
> HBASE-15236-master-v5.patch, HBASE-15236-master-v6.patch, 
> HBASE-15236-master-v7.patch, HBASE-15236-master-v8.patch, HBASE-15236.patch, 
> TestWithSingleHRegion.java, tables_data.zip
>
>
>  If there are two bulkloaded hfiles in a region with same seqID, same 
> timestamps and duplicate keys*, get and scan may return different values for 
> a key. Not sure how this would happen, but one of our customer uploaded a 
> dataset with 2 files in a single region and both having same bulk load 
> timestamp. These files are small ~50M (I couldn't find any setting for max 
> file size that could lead to 2 files). The range of keys in two hfiles are 
> overlapping to some extent, but not fully (so the two files are because of 
> region merge).
> In such a case, depending on file sizes (used in 
> StoreFile.Comparators.SEQ_ID), we may get different values for the same cell 
> (say "r", "cf:50") depending on what we call: get "r" "cf:50" or get "r" 
> "cf:".
> To replicate this, i ran ImportTsv twice with 2 different csv files but same 
> timestamp (-Dimporttsv.timestamp=1). Collected two resulting hfiles in a 
> single directory and loaded with LoadIncrementalHFiles. Here are the commands.
> {noformat}
> //CSV files should be in hdfs:///user/hbase/tmp
> sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.ImportTsv  
> -Dimporttsv.columns=HBASE_ROW_KEY,c:1,c:2,c:3,c:4,c:5,c:6 
> -Dimporttsv.separator='|' -Dimporttsv.timestamp=1 
> -Dimporttsv.bulk.output=hdfs:///user/hbase/1 t tmp
> {noformat}
> tables_data.zip contains three tables: t, t2, and t3.
> To load these tables and test with them, use the attached 
> TestWithSingleHRegion.java. (Change ROOT_DIR and TABLE_NAME variables 
> appropriately)
> Hfiles for table 't':
> 1) 07922ac90208445883b2ddc89c424c89 : length = 1094: KVs = (c:5, 55) (c:6, 66)
> 2) 143137fa4fa84625b4e8d1d84a494f08 : length = 1192: KVs = (c:1, 1) (c:2, 2) 
> (c:3, 3) (c:4, 4) (c:5, 5) (c:6, 6)
> Get returns 5  where as Scan returns 55.
> On the other hand if I make the first hfile, the one with values 55 and 66 
> bigger in size than second hfile, then:
> Get returns 55  where as Scan returns 55.
> This can be seen in table 't2'.
> Weird things:
> 1. Get consistently returns values from larger hfile.
> 2. Scan consistently returns 55.
> Reason:
> Explanation 1: We add StoreFileScanners to heap by iterating over list of 
> store files which is kept sorted in DefaultStoreFileManger using 
> StoreFile.Comparators.SEQ_ID. It sorts the file first by increasing seq id, 
> then by decreasing filesize. So our larger files sizes are always in the 
> start.
> Explanation 2: In KeyValueHeap.next(), if both current and this.heap.peek() 
> scanners (type is StoreFileScanner in this case) point to KVs which compare 
> as equal, we put back the current scanner into heap, and call pollRealKV() to 
> get new one. While inserting, KVScannerComparator is used which compares the 
> two (one already in heap and the one being inserted) as equal and inserts it 
> after the existing one. \[1\]
> Say s1 is scanner over first hfile and s2 over second.
> 1. We scan with s2 to get values 1, 2, 3, 4
> 2. see that both scanners have c:5 as next key, put current scanner s2 back 
> in heap, take s1 out
> 4. return 55, advance s1 to key c:6
> 5. compare c:6 to c:5, put back s2 in heap since it has larger key, take out 
> s1
> 6. ignore it's value (there is code for that too), advance s2 to c:6
> Go back to step 2 with both scanners having c:6 as next key
> This is wrong because if we add a key between c:4 and c:5 to first hfile, say 
> (c:44, foo)  which makes s1 as the 'current' scanner when we hit step 2, then 
> we'll get the values - 1, 2, 3, 4, foo, 5, 6.
> (try table t3)
> Fix:
> Assign priority ids to StoreFileScanners when inserting them into heap, 
> latest hfiles get higher priority, and use them in KVComparator instead of 
> just seq id.
> \[1\] 
> [PriorityQueue.siftUp()|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/PriorityQueue.java#586]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to