I'm speaking off the cuff here, but the major compaction algorithm tries to keep the number of major compactions to a minimum by checking the timestamps of the store files. So it's possible that the other regions just didn't 'come due' yet.
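In rough form (this is a sketch of the idea, not the actual HBase source; the
class, method, and parameter names below are invented for illustration), the
due-check looks something like:

    import java.util.List;

    class MajorCompactionCheck {
        // A region only "comes due" for a major compaction once its
        // oldest store file has aged past the configured interval
        // (hbase.hregion.majorcompaction in HBase), so regions whose
        // files are all recent enough are skipped.
        static boolean isMajorCompactionDue(List<Long> fileTimestamps,
                                            long majorCompactionIntervalMs,
                                            long nowMs) {
            long oldest = Long.MAX_VALUE;
            for (long ts : fileTimestamps) {
                oldest = Math.min(oldest, ts);
            }
            return nowMs - oldest > majorCompactionIntervalMs;
        }
    }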
-ryan

On Thu, Feb 10, 2011 at 10:42 PM, James Kennedy <[email protected]> wrote:
> I've tested HBase 0.90 + HBase-trx 0.90.0 and I've run it over old
> data from 0.89x using a variety of seeded unit test/QA data and
> cluster configurations.
>
> But when it came time to upgrade some production data I got snagged
> on HBASE-3524. The gist of it is in Ryan's last points:
>
> * Compaction is "optional", meaning if it fails no data is lost, so
>   you should probably be fine.
>
> * Older versions of the code did not write out time tracker data, and
>   that is why your older files were giving you NPEs.
>
> That makes sense. But why did I not encounter this with my initial
> data upgrades on very similar data packages?
>
> So I applied Ryan's patch, which simply assigns a default value
> (Long.MIN_VALUE) when a StoreFile lacks a timeRangeTracker, and I
> "fixed" the data by forcing major compactions on the affected
> regions. Preliminary poking has not shown any instability in the
> data since.
>
> But I confess that I just don't have the time right now to really dig
> into the code and validate that there are no more gotchas or data
> corruption that could have resulted.
>
> I guess the questions I have for the team are:
>
> * What state would 9 out of 50 tables be in to miss the new 0.90.0
>   timeRangeTracker injection before the first major compaction check?
> * Where else is the new TimeRangeTracker used? Could a StoreFile with
>   a null timeRangeTracker have corrupted the data in other, subtler
>   ways?
> * What other upgrade-related data changes might not have completed
>   elsewhere?
>
> Thanks,
>
> James Kennedy
> Project Manager
> Troove Inc.
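For reference, the shape of the fix James describes is roughly the following
(a sketch assuming the defaulting happens in the store file's timestamp
accessor; the class names here are invented, not the exact HBase source):

    // Pre-0.90 store files never persisted TimeRangeTracker metadata,
    // so the tracker comes back null when such a file is opened.
    class StoreFileSketch {
        private TimeRangeTrackerSketch timeRangeTracker; // null for old files

        long getMinimumTimestamp() {
            // Long.MIN_VALUE makes the old file match every time-range
            // check instead of throwing an NPE.
            return timeRangeTracker == null
                ? Long.MIN_VALUE
                : timeRangeTracker.getMinimumTimestamp();
        }
    }

    class TimeRangeTrackerSketch {
        private long minimumTimestamp;
        long getMinimumTimestamp() { return minimumTimestamp; }
    }

Once such a patch is in place, forcing a major compaction rewrites the
affected store files, which stamps them with fresh TimeRangeTracker
metadata; from the HBase shell that is, e.g.:

    major_compact 'table_name'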
