A different top-level command would be a better approach (even though
the implementation can share much of the scan spec parsing code). OTOH,
DUMP would be a better name, as "BACKUP" without a file (just
dump to stdout) would sound strange. Plus, it's shorter :)
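
To make the restore problem from the original proposal (quoted below) concrete, here is a rough Python sketch. It is only a toy model, not Hypertable code: the equal-width `range_of` mapping, the key space size, the range count, and the batch size are all assumptions chosen for illustration. It counts how many distinct ranges each batch of a restore would write to when the dump is in key order versus shuffled.

```python
import random

def range_of(key, num_ranges, key_space):
    """Map a key to the index of the range that owns it (assumes equal-width ranges)."""
    return key * num_ranges // key_space

def ranges_touched_per_batch(keys, batch_size, num_ranges, key_space):
    """For each consecutive batch of keys, count the distinct ranges it writes to."""
    counts = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        counts.append(len({range_of(k, num_ranges, key_space) for k in batch}))
    return counts

# Hypothetical sizes, picked only to keep the example small.
KEY_SPACE = 10_000
NUM_RANGES = 10
BATCH_SIZE = 500

in_order = list(range(KEY_SPACE))      # what an in-order dump looks like
shuffled = in_order[:]                 # what a randomized dump might look like
random.Random(0).shuffle(shuffled)     # fixed seed so runs are repeatable

sorted_counts = ranges_touched_per_batch(in_order, BATCH_SIZE, NUM_RANGES, KEY_SPACE)
shuffled_counts = ranges_touched_per_batch(shuffled, BATCH_SIZE, NUM_RANGES, KEY_SPACE)

print("in-order dump, max ranges per batch:", max(sorted_counts))
print("shuffled dump, min ranges per batch:", min(shuffled_counts))
```

In this model every in-order batch lands in a single range, while shuffled batches spread across many ranges, which is the parallelism the randomized dump is meant to provide on restore.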

On Mon, Jan 18, 2010 at 10:42 PM, Doug Judd <[email protected]> wrote:
> The BACKUP feature is really to allow for the generation of efficient backup
> files.  Certain WHERE clauses and options such as ROW, CELL, and LIMIT would
> be incompatible with the BACKUP option since BACKUP would be a completely
> separate code path and those other options don't really jibe with the
> concept of backing up a table.  The reason that I suggest folding it in with
> SELECT is because some of the other options, such as TIMESTAMP, column
> selection, and REVS, could be useful features of table backup.
>
> The other approach would be to add a top-level BACKUP TABLE command that
> would support a subset of SELECT options that would be appropriate for table
> backups.
>
> BACKUP TABLE <table> [WHERE <where-clause>] [OPTIONS]
>
> Supported where-clause options:
>   TIMESTAMP
>
> Other supported options:
>   REVS revision_count
>   INTO FILE filename[.gz]
>
> - Doug
>
> On Mon, Jan 18, 2010 at 10:04 PM, Sanjit Jhala <[email protected]> wrote:
>>
>> I assume it will also allow SELECT (list, of, cfs) FROM foo BACKUP INTO
>> FILE "foo-backup.tgz".
>> Also I'm wondering if the word BACKUP ought to be replaced by something
>> like RANDOM or SHUFFLED to decouple this change from backups (although I
>> agree that fast restores are the main use case for this feature). So,
>> "SELECT * FROM foo SHUFFLED LIMIT=N;" returns N samples across all ranges
>> and one can additionally choose to store the output of the SELECT into the
>> tgz file for fast restores.
>>
>> -Sanjit
>>
>>
>> On Mon, Jan 18, 2010 at 8:49 PM, Doug Judd <[email protected]> wrote:
>>>
>>> The current method of using SELECT to take table backups causes
>>> efficiency problems during restore.  Because the cells are dumped in-order,
>>> when it comes time to restore from backup, the data ends up getting loaded
>>> into one range at a time.  I propose adding a BACKUP option to SELECT that
>>> would cause the data to get dumped in random order (uniformly distributed
>>> across key space).  This will cause restores to be parallelized, since
>>> ranges distributed across the cluster will receive updates simultaneously.
>>> Here's example syntax:
>>>
>>> SELECT * FROM foo BACKUP INTO FILE "foo-backup.gz";
>>>
>>> I also propose having the BACKUP option force timestamps to be dumped as
>>> well, since this will preserve the table state exactly.  Thoughts?
>>>
>>> - Doug
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Hypertable Development" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/hypertable-dev?hl=en.
>>>