On Feb 3, 12:03 am, Jung Gun Lim <[email protected]> wrote:
> Consider this case: there are 100 processes, each of which mutates a
> single tiny cell every 50 milliseconds, all on a single range. The
> RangeServer receives 2000 update requests per second and issues 2000
> small write requests to the DFS, because every flush appends to the
> user commit log, and writing many small blocks of data to the DFS
> separately is inefficient. As I measured, a single tiny block write on
> KFS takes about 0.6 milliseconds.
>
> To make Hypertable more scalable with a huge number of mutator clients,
> it would be good to add an update request queue: when many update
> requests are pending, bundle them, write their commit log entries
> together in one write, and then apply the updates to the corresponding
> ranges.
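To put numbers on the proposal: 100 clients each sending an update every 50 ms produce 2000 requests/s, and at about 0.6 ms per small KFS write, one commit log append per request would need roughly 1.2 s of write time every second if those writes were serialized. Bundling whatever is pending into one append is the usual group-commit technique. Below is a minimal single-threaded C++ sketch of that idea, assuming hypothetical CommitLog, UpdateRequest and UpdateQueue types (these are not Hypertable classes); a real RangeServer would also need locking and a writer thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DFS-backed commit log. It only records what
// each append would write, so the number of DFS writes can be counted.
struct CommitLog {
  std::vector<std::string> blocks;       // one entry per DFS append
  std::vector<std::size_t> batch_sizes;  // update requests per append

  void append(const std::string &block, std::size_t n_requests) {
    assert(n_requests > 0);  // never issue an empty write
    blocks.push_back(block);
    batch_sizes.push_back(n_requests);
  }
};

// Hypothetical update request aimed at one range.
struct UpdateRequest {
  std::string range;
  std::string payload;
};

// Group-commit queue: requests that pile up while the log is busy are
// bundled and written with a single commit log append.
class UpdateQueue {
 public:
  explicit UpdateQueue(CommitLog &log) : log_(log) {}

  void enqueue(UpdateRequest req) { pending_.push_back(std::move(req)); }

  // Takes every pending request, appends them to the commit log as one
  // block, and returns the batch. The caller should apply the batch to its
  // ranges and acknowledge clients only after this returns, so no client
  // is told its update is durable before the append has finished.
  std::vector<UpdateRequest> commit_pending() {
    std::vector<UpdateRequest> batch(pending_.begin(), pending_.end());
    pending_.clear();
    if (batch.empty()) return batch;
    std::string block;
    for (const auto &req : batch) block += req.payload;
    log_.append(block, batch.size());  // one DFS write for the whole batch
    return batch;
  }

 private:
  CommitLog &log_;
  std::deque<UpdateRequest> pending_;
};
```

Draining 2000 queued requests this way issues one append instead of 2000; the hard part, as the reply below notes, is keeping per-client flush guarantees intact once writes are shared.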

Hypertable doesn't write to the DFS for every cell, but once per update
request. The client only contacts the range servers when you call flush
or when its buffers are full. By default, the client mutator currently
auto-flushes only when it has buffered 1MB for a single range server or
20MB in total. The design guarantees that once flush returns, the data
is persisted. Coalescing DFS updates is a minor optimization that is
hard to get right (correct flush behavior must be preserved), and in
common cases we're already appending to the DFS in fairly large blocks.
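The client-side buffering described above can be sketched roughly as follows. This is a hypothetical C++ illustration, not Hypertable's actual client mutator: it assumes that reaching the 1MB per-server threshold sends only that server's buffer while reaching the 20MB total flushes every buffer, and it simulates each send by recording it instead of making an RPC.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Default autoflush thresholds mentioned above.
constexpr std::size_t kPerServerLimit = std::size_t{1} << 20;  // 1MB
constexpr std::size_t kTotalLimit = std::size_t{20} << 20;     // 20MB

// Hypothetical client mutator: buffers cells per range server and only
// contacts a server when a buffer fills up or flush() is called.
class BufferedMutator {
 public:
  // (server, bytes) for each simulated send.
  std::vector<std::pair<std::string, std::size_t>> sends;

  void set(const std::string &server, const std::string &cell) {
    std::string &buf = buffers_[server];
    buf += cell;
    total_ += cell.size();
    if (buf.size() >= kPerServerLimit) {
      send(server);  // autoflush just this server's buffer
    } else if (total_ >= kTotalLimit) {
      flush();  // autoflush every buffer
    }
  }

  // Sends every non-empty buffer. In the real design flush returns only
  // after the data is persisted; here a send is merely recorded.
  void flush() {
    for (auto &entry : buffers_)
      if (!entry.second.empty()) send(entry.first);
  }

 private:
  void send(const std::string &server) {
    std::string &buf = buffers_[server];
    assert(!buf.empty());  // never send an empty update
    sends.emplace_back(server, buf.size());
    total_ -= buf.size();
    buf.clear();
  }

  std::map<std::string, std::string> buffers_;
  std::size_t total_ = 0;
};
```

With these defaults, a client that sets one tiny cell every 50 ms reaches the range server on every set only if it calls flush after each one, which is what the quoted scenario implies.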
__Luke
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Hypertable Development" group.
For more options, visit this group at
http://groups.google.com/group/hypertable-dev?hl=en
-~----------~----~----~----~------~----~------~--~---