Yes, I am planning on filing a JIRA for that shortly. -Aleks S.
On Mon, Jan 14, 2013 at 11:01 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Thanks for the write up.
>
> Would the new tests be sub-tasks of HBASE-7290?
>
> Cheers
>
> On Mon, Jan 14, 2013 at 10:32 AM, Aleksandr Shulman <al...@cloudera.com>
> wrote:
>
> > Hi everyone,
> >
> > I'd like to start a thread about Cloudera's testing efforts on the
> > upcoming snapshots feature. This is a new feature and it's important
> > that we explain our testing efforts and get the community's opinion on
> > what we'd all like to see tested. My hope is that from this discussion,
> > we can get more ideas about what needs to be tested and gain confidence
> > in the testing we have in place.
> >
> > Before I begin, I'd like to introduce myself. I'm Aleks Shulman. I'm a
> > software engineer at Cloudera, working primarily on HBase. Within HBase,
> > I am focusing on the quality side of things. What this means to me is a
> > conversation unto itself, but in brief, I will be writing tests and test
> > frameworks. I will also be an advocate for the user experience, with
> > particular focus on API compatibility and ease-of-use.
> >
> > So let's discuss snapshots:
> > There are two main areas that should be tested, and they correspond
> > nicely to what can be done as unit tests and what is better left as a
> > Jenkins job or some other automation: unit testing and non-unit testing.
> > We've been working on this for a bit, so there is already some progress
> > in these areas:
> >
> > Unit testing - In progress or completed:
> >
> > 1. HBase Snapshots Repeatability and Idempotency Test:
> > This test class verifies proper behavior with regard to performing
> > restore/clone operations on tables that themselves were created as a
> > clone or restored from a snapshot. This is an interesting set of cases
> > because of the way snapshots work. They work by pointing to the original
> > HFiles.
> > We can use these tests to verify correctness in the file system and test
> > closure under deletion of the original table.
> >
> > 2. HBase Snapshots HTable Descriptor Test
> > This test class verifies proper behavior with regard to changes to the
> > information about the table itself before and after snapshotting in the
> > 'before' table and the 'after' table.
> >
> > 3. HBase Snapshots HFileLink Test
> > This test class inspects the correctness of the HFileLink files. It looks
> > into their permissioning, the naming convention, and how they respond to
> > events. Events may include an HFile being deleted or moved.
> >
> > 4. HBase Snapshots Table Dimensions Test
> > This test class inspects operations on tables that are empty, have only
> > one row, have one or two CFs, etc. Basically, any edge scenario in what
> > the table looks like that may affect the way it is snapshotted or
> > restored/cloned.
> >
> > 5. HBase Snapshots Independence Test
> > This test should verify that all aspects of table independence are
> > guaranteed between the original table and the restored snapshot/clone.
> > This includes things like data mutations, compactions, splits, etc. It
> > also includes metadata changes.
> >
> > 6. HBase Snapshots Aborted or Failed Snapshot Cleanup
> > Verifies that no cruft is left over after an attempt to snapshot a table
> > fails or is aborted. We should be able to account for every file in the
> > file system before and after.
> >
> > 7. HBase Snapshots HFile Archive Test
> > This test task is to fill in any gaps in testing of archiving as it
> > relates to snapshots. Snapshots rely on the HFileArchiver/LogArchiver
> > with two new cleaners (SnapshotHFile/SnapshotLog Cleaners), so we'd need
> > to go through and find out what needs to be tested between them.
> >
> > 8. HBase Snapshots Export Test
> > This test should verify that export of a snapshot to another cluster
> > works properly.
> > Implemented as: mvn clean test -PlocalTests -Dtest=org.apache.hadoop.hbase.snapshot.TestExportSnapshot
> > However, we need to add more tests around chmod, chown and checksums.
> >
> > 9. HBase Snapshots Concurrent Snapshots Test
> > This test class will enforce proper behavior in situations where race
> > conditions can occur. For example, if one process attempts to restore a
> > table and another one tries to do so simultaneously, what happens? We
> > need to know how dangerous this could be and whether it is possible for
> > data to be lost.
> > Covered in HBASE-7536.
> >
> > Unit testing - Lightly tested so far, or tests we are hoping to write
> > soon:
> >
> > 1. HBase Snapshots File System Correctness Tests
> >
> > This test class verifies proper behavior with regard to what the file
> > system looks like. What the file system contains should be predictable
> > after certain events, both snapshot-specific and environment-specific.
> > For example, after a snapshot, we should expect there to be files in the
> > /hbase/.snapshot/ folder. Also, after a split occurs on the base table
> > and the underlying HFiles go through flux, we should be able to know
> > beforehand where files move. In particular, this is important to test
> > after repeated deletions and modifications. Also, we want to make sure
> > no cruft remains after various operations occur.
> >
> > 2. HBase Snapshots (Re)Naming Test [Note: Renaming snapshots is not
> > supported yet!]
> >
> > These tests should verify valid/invalid names for snapshots. In
> > particular, they should use the rename_snapshot command to attempt to
> > rename to a table that already exists, or to a snapshot that already
> > exists (or had existed but was deleted).
> > Things like special characters or semantically-meaningful characters are
> > important as well.
> > Other things that need to be tested are what happens if a snapshot is
> > created, deleted, the underlying table is modified, and then another
> > snapshot is taken. The snapshot should contain the most recent data.
> >
> > 3. Snapshots logline test:
> > Verifies that the proper loglines are generated for events.
> > Manual testing for this might include making sure that spurious,
> > misleading, or unnecessary log lines are not present.
> >
> > 4. HBase Snapshots Aborted or Failed Clone or Restore
> >
> > Verifies that no cruft is left over after an attempt to restore or clone
> > a snapshotted table fails or is aborted and that further snapshots can
> > take place. This may be tricky and could require writing some additional
> > utilities.
> >
> > Non-unit testing:
> >
> > This area of testing is less straightforward and more exploratory in
> > nature. It's open-ended but with some direction. Particularly, we want to
> > test a lot of "what if this happens when we do something related to
> > snapshots". By "this happens", I mean compactions, splits, processes
> > dying, master failing over to backup master, etc. By "something related
> > to snapshots", that could mean taking a snapshot, restoring a snapshot,
> > or cloning a snapshot, among other things. In addition, we can see what
> > happens as scaling factors (e.g. the number of regions, amount of data
> > per node, duration of test, and frequency of compactions/splits)
> > increase. Finally, we should benchmark the time it takes to
> > take/restore/clone a snapshot and see how it changes with scale factors.
> >
> > We are testing some of these combinations internally. When we see
> > something go awry, we fix and rerun the trial, with the expectation that
> > the feature becomes more stable and reliable.
> >
> > Some of the things we have tried:
> > -Long running tests: Run repeated snapshots while verifying that all is
> > well.
> >
> > -Meanness tests:
> > 1. Killing the master
> > 2. Performing a compaction
> > 3. Table enable/disable
> >
> > Feel free to follow up with questions.
> >
> > --
> > Best Regards,
> >
> > Aleks Shulman
> > 847.814.5804
> > Cloudera
>

--
Best Regards,

Aleks Shulman
847.814.5804
Cloudera
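
For the cleanup tests discussed in this thread (no cruft after a failed or aborted snapshot, clone, or restore; every file accounted for before and after), the file-accounting idea could be sketched roughly as follows. This is only an illustration in Python against a local temporary directory, not HDFS and not the HBase test code. All names in it (list_files, leftover_files, leaky_snapshot, clean_snapshot, the region-manifest file) are hypothetical, and it assumes that comparing file listings is enough; empty directories are not counted.

```python
import os
import tempfile


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def leftover_files(root, operation):
    """Run operation(root), ignore its failure, and return the files it left behind."""
    before = list_files(root)
    try:
        operation(root)
    except Exception:
        pass  # the operation is expected to fail; only its cleanup is under test
    return list_files(root) - before


def leaky_snapshot(root):
    """Hypothetical failing operation that forgets to remove its working file."""
    tmp_dir = os.path.join(root, ".snapshot", ".tmp")
    os.makedirs(tmp_dir)
    with open(os.path.join(tmp_dir, "region-manifest"), "w") as handle:
        handle.write("partial")
    raise RuntimeError("snapshot aborted")


def clean_snapshot(root):
    """Hypothetical failing operation that removes its working file before failing."""
    tmp_dir = os.path.join(root, ".snapshot", ".tmp")
    os.makedirs(tmp_dir)
    path = os.path.join(tmp_dir, "region-manifest")
    with open(path, "w") as handle:
        handle.write("partial")
    os.remove(path)
    raise RuntimeError("snapshot aborted")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Prints the relative paths of any files the failed operation left behind.
        print(sorted(leftover_files(root, leaky_snapshot)))
```

A real test would take the listing from the cluster's file system and trigger the failure through the snapshot code path itself; the before/after set difference is the part that carries over.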