Hey folks,

I've started working on a few patches to add "guard rails" to various
user-specified dimensions in Kudu. In particular, I'm planning to add
limits to the following:

- max number of columns in a table (proposal: 300)
- max replication factor (proposal: 7)
- max table name or column name length (proposal: 256)
- max size of a binary/string column cell value (proposal: 64KB)

The reasoning is that, even though in some cases we don't know a specific
issue that will happen outside these limits, we've done very little testing
(and have no automated testing) outside of these ranges. In some cases, we
do know that there is a certain threshold that will cause a big problem (e.g.,
large cell sizes can cause tablet servers to crash). In other cases, it's
just "unknown territory".

In all cases, I'm planning on making the limits overridable via an "unsafe"
configuration flag. That means that a user can run with
"--unlock_unsafe_flags --max_identifier_length=1000" if they want to, but
they're explicitly accepting some risk that they're entering untested
territory.
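To make the proposal concrete, here is a minimal Python sketch of how such checks might be enforced. It assumes one set of default limits (the values proposed above) and a single all-or-nothing unlock switch standing in for "--unlock_unsafe_flags". The names `GuardRails`, `effective_limits`, `validate_create_table` and `validate_cell` are hypothetical and used only for illustration. This is not Kudu's implementation. Measuring identifier length in UTF-8 bytes is also an assumption.

```python
from dataclasses import dataclass, fields


# Hypothetical defaults mirroring the limits proposed in this note.
@dataclass(frozen=True)
class GuardRails:
    max_columns: int = 300
    max_replication_factor: int = 7
    max_identifier_length: int = 256
    max_cell_size_bytes: int = 64 * 1024


def effective_limits(overrides: dict, unlock_unsafe: bool) -> GuardRails:
    """Apply limit overrides, but only if the unsafe switch was set explicitly."""
    if overrides and not unlock_unsafe:
        raise ValueError("overriding guard rails requires --unlock_unsafe_flags")
    known = {f.name for f in fields(GuardRails)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown limit(s): {sorted(unknown)}")
    return GuardRails(**overrides)


def validate_create_table(rails: GuardRails, table_name: str,
                          column_names: list, replication_factor: int) -> list:
    """Return a list of human-readable violations (empty if the table is OK)."""
    errors = []
    if len(column_names) > rails.max_columns:
        errors.append(f"too many columns: {len(column_names)} > {rails.max_columns}")
    if replication_factor > rails.max_replication_factor:
        errors.append(f"replication factor {replication_factor} > "
                      f"{rails.max_replication_factor}")
    for ident in [table_name, *column_names]:
        # Assumption: length is counted in UTF-8 bytes, not characters.
        if len(ident.encode("utf-8")) > rails.max_identifier_length:
            errors.append(f"identifier too long: {ident[:16]}...")
    return errors


def validate_cell(rails: GuardRails, value: bytes) -> None:
    """Reject a binary/string cell value that exceeds the size limit."""
    if len(value) > rails.max_cell_size_bytes:
        raise ValueError(f"cell of {len(value)} bytes exceeds "
                         f"{rails.max_cell_size_bytes} bytes")
```

With no overrides, `effective_limits({}, unlock_unsafe=False)` returns the conservative defaults, while asking for `{"max_identifier_length": 1000}` only succeeds when `unlock_unsafe=True`, which mirrors the explicit opt-in described above.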

Of course, in all cases, if we hear that there are people who are bumping
the maxes higher than the defaults and having good results, we can consider
raising the maximum, but I think it's smarter to start conservatively low
and raise later as we increase test coverage. Also, I'm sure down the road
we'll add features such as BLOB support or sparse column support, and at
that time we can remove the corresponding guard rails.

I'm sending this note to both user@ and dev@ to solicit feedback. Are there
any other dimensions people can think of where we should probably add
guard rails? Is anyone out there already operating outside the above ranges
who can make a case that we're being too conservative?

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
