[ https://issues.apache.org/jira/browse/HADOOP-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559181#action_12559181 ]
Hadoop QA commented on HADOOP-2493:
-----------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373141/2493.patch
against trunk revision r612161.

    @author +1. The patch does not contain any @author tags.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new compiler warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1598/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1598/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1598/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1598/console

This message is automatically generated.

> hbase will split on row when the start and end row is the same cause data loss
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-2493
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2493
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Assignee: Bryan Duxbury
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: 2493.patch, regions_shot.JPG
>
>
> While testing hbase splits with my code, I was loading a table to build an
> inverted index on some links. I was using the anchor text as the row key and
> the column parent:child as url:(siteurl); the data is the count of the links
> pointing to the siteurl with the anchor text as the row key.
> A lot of sites have image links, and my testing code uses "image" as the
> anchor text for them, so there are a lot of "image" links.
> I changed the max file size of hbase to 16 MB for testing and have been able
> to recreate the same error.
> When the table gets big, it splits on "image" as the end key for one region
> and the start key of the next. Later it splits to where the start key and
> end key were both "image" for one of the splits. After that it keeps
> splitting the region whose start key is "image" and whose end key is the
> same, so I have multiple splits with start key and end key both "image".
> Unless the master keeps track of the row key and parent:child data on the
> splits, I do not think all the data will be returned when querying it.
> I have attached a screenshot of my regions. I think there should be some
> logic so that if the start and end row key are the same the region does not
> split, or we need to start keeping track of the start key and column data on
> the master for each split so we can know where each row is in the database.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
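
The guard proposed in the report (do not split a region whose start and end row keys are the same) can be sketched roughly as below. This is only an illustration, not the code in 2493.patch or any HBase API: the function `choose_split_key`, its arguments, and the convention that an empty string means an unbounded key are all assumptions made for the example. It also assumes the split candidate is the middle entry of the stored keys, so a row with many columns (such as "image") can end up being both the region's start key and the candidate split key.

```python
from typing import List, Optional


def choose_split_key(sorted_row_keys: List[str],
                     start_key: str,
                     end_key: str) -> Optional[str]:
    """Return a row key to split on, or None if the region should not split.

    Hypothetical helper for illustration only.
    sorted_row_keys holds one entry per stored cell, so a row with many
    columns appears many times. An empty start_key or end_key is assumed
    to mean "unbounded" on that side.
    """
    if not sorted_row_keys:
        return None

    # Guard from the report: a region whose start and end keys are the
    # same (and not both unbounded) must not be split any further.
    if start_key and start_key == end_key:
        return None

    # Assumed candidate: the middle entry of the stored keys.
    mid_key = sorted_row_keys[len(sorted_row_keys) // 2]

    # Extra assumption: splitting at the region's own start key would
    # create a daughter region with identical start and end keys, which
    # is the situation described in the report, so refuse that too.
    if mid_key == start_key:
        return None

    return mid_key


if __name__ == "__main__":
    # One very wide row ("image") dominates the region that starts at "image".
    keys = ["image"] * 5 + ["jpeg"]
    print(choose_split_key(keys, "image", ""))
    # A region with no bounds and varied keys can still split.
    print(choose_split_key(["apple", "banana", "image", "image"], "", ""))
```

With a check like this, a region dominated by a single wide row simply stays larger than the configured maximum file size instead of producing regions with identical start and end keys.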