The upper limit of 4 TB is for data on-disk (post-encoding,
post-compression, and post-replication); it does not include in-memory
data from memrowsets or deltamemstores.
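If you're curious how much on-disk data a tserver is currently hosting,
the per-tablet sizes should show up in the tserver's /metrics endpoint.
Here's a rough Python sketch that adds them up; the host/port and the
exact metric name ("on_disk_size") are assumptions, so check them
against your deployment and Kudu version:

    import json
    import urllib.request

    # Hypothetical tserver address; substitute one of your own.
    TSERVER = "http://tserver-host:8050"

    with urllib.request.urlopen(TSERVER + "/metrics") as resp:
        entities = json.load(resp)

    # Sum the assumed "on_disk_size" metric across all tablet entities.
    total = 0
    for entity in entities:
        if entity.get("type") != "tablet":
            continue
        for metric in entity.get("metrics", []):
            if metric.get("name") == "on_disk_size":
                total += metric.get("value", 0)

    print("approx. on-disk footprint: %.2f TB" % (total / 1e12))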
The value of the limit is based on the kinds of workloads tested by
the Kudu development community. As a group we feel comfortable
supporting users up to 4 TB because we've run such workloads
ourselves. Beyond 4 TB, however, we're not exactly sure what becomes
slow, what breaks, etc.
Speaking from experience, as the amount of on-disk data grows, tservers
will take longer to start up. You might also become vulnerable to
KUDU-2050, though we're not sure. To reach that amount of data you'll
probably also raise the number of tablets hosted by the tserver, which
can increase the tserver's thread and file descriptor counts and may
cause slowdowns in other areas.
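If you do push past 4 TB, it's worth keeping an eye on those counts.
Here's a quick (untested) Python sketch that reads them out of /proc;
the process name "kudu-tserver" and read access to /proc are
assumptions about your setup:

    import os

    def pid_of(name):
        # Scan /proc for a process whose comm matches `name`.
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                with open("/proc/%s/comm" % pid) as f:
                    if f.read().strip() == name:
                        return pid
            except OSError:
                continue
        return None

    pid = pid_of("kudu-tserver")
    if pid:
        fds = len(os.listdir("/proc/%s/fd" % pid))        # open file descriptors
        threads = len(os.listdir("/proc/%s/task" % pid))  # kernel threads
        print("fds=%d threads=%d" % (fds, threads))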
In short, nothing will "happen" the moment you cross 4 TB; it's just
that you'll be entering relatively uncharted waters and might
encounter unusual or unexpected behavior. If that doesn't deter you,
by all means give it a shot (and report back with your findings)!
On Wed, Aug 30, 2017 at 5:53 PM, 李津 wrote:
> why does each tserver have an upper limit of 4 TB, and does it include the
> memrowset data? we also haven't tested more than 4 TB. what will happen if
> we reach the upper limit?