Re: Let's discuss Snapshots Feature Testing

Ted Yu Mon, 14 Jan 2013 11:01:48 -0800

Thanks for the write up.

Would the new tests be sub-tasks of HBASE-7290?


Cheers

On Mon, Jan 14, 2013 at 10:32 AM, Aleksandr Shulman <[email protected]> wrote:

> Hi everyone,
>
> I'd like to start a thread about Cloudera's testing efforts on the upcoming
> snapshots feature. This is a new feature and it's important that we explain
> our testing efforts and get the community's opinion on what we'd all like
> to see tested. My hope is that from this discussion, we can get more ideas
> about what needs to be tested and gain confidence in the testing we have in
> place.
>
> Before I begin, I'd like to introduce myself. I'm Aleks Shulman. I'm a
> software engineer at Cloudera, working primarily on HBase. Within HBase, I
> am focusing on the quality side of things. What this means to me is a
> conversation unto itself, but in brief, I will be writing tests and test
> frameworks. I will also be an advocate for the user experience, with
> particular focus on API compatibility and ease-of-use.
>
> So let's discuss snapshots:
> There are two main areas that should be tested, and they correspond nicely
> to what can be done as unit tests and what is better left to a Jenkins job
> or some other automation: unit testing and non-unit testing. We've been
> working on this for a bit, so there is already some progress in these
> areas:
>
> Unit testing - In progress or completed:
>
> 1. HBase Snapshots Repeatability and Idempotency Test:
> This test class verifies proper behavior with regard to performing
> restore/clone operations on tables that were themselves created as a clone
> or restored from a snapshot. This is an interesting set of cases because of
> the way snapshots work. They work by pointing to the original HFiles.
> We can use these tests to verify correctness in the file system and test
> closure under deletion of the original table.
>
> 2. HBase Snapshots HTable Descriptor Test
> This test class verifies proper behavior with regard to changes to the
> information about the table itself before and after snapshotting in the
> 'before' table and the 'after' table.
>
> 3. HBase Snapshots HFileLink Test
> This test class inspects the correctness of the HFileLink files. It checks
> their permissions, their naming convention, and how they respond to
> events. Events may include an HFile being deleted or moved.
>
> 4. HBase Snapshots Table Dimensions Test
> This test class inspects operations on tables that are empty, have only one
> row, have one or two CFs, etc. Basically, it covers any edge case in the
> shape of the table that may affect the way it is snapshotted or
> restored/cloned.
>
> 5. HBase Snapshots Independence Test
> This test should verify that all aspects of table independence are
> guaranteed between the original table and the restored snapshot/clone.
> This includes things like data mutations, compactions, splits, etc. It also
> includes metadata changes.
>
> 6. HBase Snapshots Aborted or Failed Snapshot Cleanup
> Verifies that no cruft is left over after an attempt to snapshot a table
> fails or is aborted. We should be able to account for every file in the
> file system before and after.
>
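
The before/after file accounting described in item 6 could be sketched roughly as below. This is only an illustration under stated assumptions: it walks a local directory with java.nio.file, whereas a real HBase test would go through the Hadoop FileSystem API, and the class name FileAccounting, its methods, and the simulated "failed snapshot" are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of HBase: compares directory listings taken
// before and after an operation to find files the operation left behind.
final class FileAccounting {

    // Returns the paths (relative to root) of all regular files under root.
    static Set<String> listFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .map(p -> root.relativize(p).toString())
                        .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    // Files present after the operation that were not there before.
    static Set<String> leftovers(Set<String> before, Set<String> after) {
        Set<String> extra = new TreeSet<>(after);
        extra.removeAll(before);
        return extra;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("accounting-demo");
        Files.createFile(root.resolve("existing-file"));
        Set<String> before = listFiles(root);

        // Stand-in for a failed snapshot attempt that leaves a file behind.
        Files.createFile(root.resolve("leftover.tmp"));

        Set<String> after = listFiles(root);
        System.out.println("Leftover files: " + leftovers(before, after));
    }
}
```

A cleanup test along these lines would assert that the leftover set is empty once the failed or aborted snapshot has been handled.
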
> 7. HBase Snapshots HFile Archive Test
> This test task is to fill in any gaps in testing of archiving as it relates
> to snapshots. The snapshots feature relies on the HFileArchiver/LogArchiver
> with two new cleaners (SnapshotHFile/SnapshotLog Cleaners), so we'd need to
> go through and find out what needs to be tested between them.
>
> 8. HBase Snapshots Export Test
> This test should verify that export of a snapshot to another cluster works
> properly.
> Implemented as: mvn clean test -PlocalTests
> -Dtest=org.apache.hadoop.hbase.snapshot.TestExportSnapshot
> However, we need to add more tests around chmod, chown, and checksums.
>
> 9. HBase Snapshots Concurrent Snapshots Test
> This test class will enforce proper behavior in situations where race
> conditions can occur. For example, if one process attempts to restore a
> table and another one tries to do so simultaneously, what happens? We need
> to know how dangerous this could be and whether it is possible for data to
> be lost.
> Covered in HBASE-7536.
>
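
One way to provoke the race described in item 9 is to hold several attempts at a gate and release them together, then record what each one reported. The sketch below uses only java.util.concurrent; the class ConcurrentAttempts, the race method, and the fake restore operation are hypothetical stand-ins, and the "first caller wins" behavior of the fake is an assumption made for the demo, not a statement about how HBase behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical harness, not part of HBase.
final class ConcurrentAttempts {

    // Submits `attempts` copies of op, releases them together, and returns
    // each outcome: the op's result, or "FAILED: <message>" if it threw.
    static List<String> race(int attempts, Callable<String> op) {
        ExecutorService pool = Executors.newFixedThreadPool(attempts);
        CountDownLatch gate = new CountDownLatch(1);
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            futures.add(pool.submit(() -> {
                gate.await();           // wait until the gate opens
                return op.call();
            }));
        }
        gate.countDown();               // open the gate for all tasks
        List<String> outcomes = new ArrayList<>();
        try {
            for (Future<String> f : futures) {
                try {
                    outcomes.add(f.get());
                } catch (ExecutionException e) {
                    outcomes.add("FAILED: " + e.getCause().getMessage());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // Assumed behavior for the demo only: the first caller proceeds and
        // any later caller is rejected.
        AtomicBoolean restoreInProgress = new AtomicBoolean(false);
        Callable<String> fakeRestore = () -> {
            if (!restoreInProgress.compareAndSet(false, true)) {
                throw new IllegalStateException("restore already in progress");
            }
            return "restored";
        };
        System.out.println(race(2, fakeRestore));
    }
}
```

Which attempt succeeds is not deterministic, so a real test would check the combined outcome (and the resulting table data) rather than the order.
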
> Unit testing - Lightly tested so far, or tests we are hoping to write soon:
>
> 1. HBase Snapshots File System Correctness Tests -
>
> This test class verifies proper behavior with regards to what the file
> system looks like. What the file system contains should be predictable
> after certain events, both snapshot-specific and environment-specific.
> For example, after a snapshot, we should expect there to be files in the
> /hbase/.snapshot/ folder. Also, after a split occurs on the base table and
> the underlying HFiles go through flux, we should be able to know beforehand
> where files move. In particular, this is important to test after repeated
> deletions and modifications. Also -- we want to make sure no cruft remains
> after various operations occur.
>
>
> 2. HBase Snapshots (Re)Naming Test [Note: Renaming snapshots is not
> supported yet!]
>
> These tests should verify valid/invalid names for snapshots. In particular,
> they should use the rename_snapshot command to attempt to rename a snapshot
> to the name of a table that already exists, or of a snapshot that already
> exists (or had existed but was deleted).
> Things like special characters or semantically-meaningful characters are
> important as well. Other things that need to be tested are what happens if
> a snapshot is created, deleted, the underlying table is modified, and then
> another snapshot is taken. The snapshot should contain the most recent
> data.
>
>
> 3. Snapshots logline test:
> Verifies that the proper loglines are generated for events.
> Manual testing for this might include making sure that spurious,
> misleading, or unnecessary log lines are not present.
>
> 4. HBase Snapshots Aborted or Failed Clone or Restore
>
> Verifies that no cruft is left over after an attempt to restore or clone a
> snapshotted table fails or is aborted and that further snapshots can take
> place. This may be tricky and could require writing some additional
> utilities.
>
> Non-unit testing:
>
> This area of testing is less straightforward and more exploratory in
> nature. It's open-ended but with some direction. Particularly, we want to
> test a lot of "what if this happens when we do something related to
> snapshots". By "this happens", I mean compactions, splits, processes dying,
> master failing over to backup master, etc. By "something related to
> snapshots", I mean taking a snapshot, restoring a snapshot, or
> cloning a snapshot, among other things. In addition, we can see what
> happens as scaling factors (e.g. the number of regions, amount of data per
> node, duration of test, and frequency of compactions/splits) increase.
> Finally, we should benchmark the time it takes to take/restore/clone a
> snapshot and see how it changes with scale factors.
>
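
The "what if this happens when we do something related to snapshots" combinations can be enumerated mechanically. A minimal sketch, using only the events and operations named in the quoted paragraph; the class ScenarioMatrix and the "X during Y" labels are hypothetical, and nothing here drives a real cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: lists event/operation pairings.
final class ScenarioMatrix {

    // Disruptive events and snapshot operations named in the quoted message.
    static final List<String> EVENTS = Arrays.asList(
        "compaction", "split", "process death", "master failover");
    static final List<String> OPERATIONS = Arrays.asList(
        "take snapshot", "restore snapshot", "clone snapshot");

    // Pairs every event with every operation.
    static List<String> scenarios() {
        List<String> result = new ArrayList<>();
        for (String event : EVENTS) {
            for (String op : OPERATIONS) {
                result.add(event + " during " + op);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        scenarios().forEach(System.out::println);
    }
}
```

With four events and three operations this yields twelve scenarios; scale factors such as region count could be added as a third dimension in the same way.
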
> We are testing some of these combinations internally. When we see something
> go awry, we fix it and rerun the trial, with the expectation that the feature
> becomes more stable and reliable.
>
> Some of the things we have tried:
> -Long running tests: Run repeated snapshots while verifying that all is
> well.
>
> -Meanness tests:
> 1. Killing the master
> 2. Performing a compaction
> 3. Table enable/disable
>
> Feel free to follow up with questions.
>
> --
> Best Regards,
>
> Aleks Shulman
> 847.814.5804
>  Cloudera
>
