[ https://issues.apache.org/jira/browse/HADOOP-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555133 ]
Jim Kellerman commented on HADOOP-2493:
---------------------------------------

Even simpler: during the split, when the midkey is computed, if it is the same as either the start key of the lower half or the end key of the upper half, don't split.

> hbase will split on row when the start and end row is the same, causing data loss
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-2493
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2493
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Priority: Critical
>         Attachments: regions_shot.JPG
>
>
> While testing hbase splits with my code, I was loading a table to become an
> inverted index on some links.
> I was using the anchor text as the row key, the column parent:child as
> url:(siteurl), and the data is the count of the links pointing to the siteurl
> with the anchor text as row key.
> But a lot of sites have image links, and I use "image" as the anchor text for
> those in my testing code, so there are a lot of "image" links.
> I changed the max file size of hbase to 16mb for testing and have been able
> to recreate the same error.
> When the table gets big, it splits on the row "image" as the end key for one
> region and the start key of the next region. Later it splits to where the
> start key and end key were both "image" for one of the splits. After that it
> keeps splitting the region with start key "image" and the same end key. So I
> have multiple splits with start key and end key as "image". Unless the master
> keeps track of the row key and parent:child data on the splits, I do not
> think all the data will get returned when querying it.
> I have attached a screenshot of my regions. I think there should be some
> logic so that if the start and end row key are the same the region does not
> split, or we need to start keeping track of the start key and column data on
> the master for each split so we can know where each row is in the database.
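
For illustration, the check suggested in the comment above could look roughly like the following. This is only a sketch, not the actual HBase code: the class SplitGuard and the methods shouldSplit and key are hypothetical names, and keys are assumed to be plain byte arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: decides whether a computed
// midkey is usable as a split point for a region.
class SplitGuard {

    // Refuse to split when the midkey equals either boundary key of the
    // region, because one daughter region would then have identical
    // start and end keys.
    static boolean shouldSplit(byte[] startKey, byte[] midKey, byte[] endKey) {
        if (midKey == null) {
            return false; // no midkey could be computed, nothing to split on
        }
        if (Arrays.equals(midKey, startKey) || Arrays.equals(midKey, endKey)) {
            return false; // midkey coincides with a region boundary
        }
        return true;
    }

    // Convenience for the example: turn a row key string into bytes.
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Region whose rows are all "image": the midkey matches a boundary.
        System.out.println(shouldSplit(key("image"), key("image"), key("zebra")));
        // Ordinary region: the midkey lies strictly between the boundaries.
        System.out.println(shouldSplit(key("home"), key("image"), key("logo")));
    }
}
```

With a guard like this, the region from the report (start key and midkey both "image") would simply stay unsplit and grow past the max file size, rather than producing daughter regions with the same start and end key.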
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.