Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/180
> For example, we would want to avoid storing 1M CVs if a user had that
> many in a table (for some reason).
I think we should address this issue in some way while considering the
following.
* Fetching summaries should be relatively fast. Gigantic summaries will
stymie this goal.
* When a user's summarizer does produce a gigantic summary, it would be
nice if we helped them debug it.
I am thinking one way to accomplish these goals is to store gigantic
summaries, but only read summaries under a certain size. The size of the
serialized summary could be written first, so that when a summary is read
this size is the first bit of info available. If the summary is over a
certain size, an error could be logged and that file would be treated as if
it had no summary. We could also add an enum that indicates gigantic
summaries were present. Since the summary is still stored, the user would
have a chance to use rfile-info to look at what's in the summary for
debugging.
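A minimal sketch of the length-prefix idea described above (the names `MAX_SUMMARY_SIZE`, `writeSummary`, and `readSummary` are hypothetical illustrations, not Accumulo's actual API): the writer records the serialized size first, so the reader can cheaply skip a gigantic summary and treat the file as having none.

```java
import java.io.*;
import java.util.Optional;

public class SummaryIo {
    // Assumed threshold for illustration; a real limit would be configurable.
    static final int MAX_SUMMARY_SIZE = 1 << 20; // 1 MiB

    // Write the summary's serialized size first, then its bytes.
    static void writeSummary(DataOutputStream out, byte[] summary) throws IOException {
        out.writeInt(summary.length);
        out.write(summary);
    }

    // Read the size prefix; if it exceeds the limit, log an error and skip
    // the bytes, so the file is treated as if it had no summary.
    static Optional<byte[]> readSummary(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len > MAX_SUMMARY_SIZE) {
            System.err.println("Summary too large (" + len + " bytes); ignoring.");
            in.skipBytes(len); // position the stream past the oversized summary
            return Optional.empty();
        }
        byte[] data = new byte[len];
        in.readFully(data);
        return Optional.of(data);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeSummary(out, new byte[] {1, 2, 3});           // small: readable
        writeSummary(out, new byte[MAX_SUMMARY_SIZE + 1]); // gigantic: skipped

        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(readSummary(in).isPresent()); // small summary read
        System.out.println(readSummary(in).isPresent()); // gigantic one ignored
    }
}
```

Since the size is written before the summary, the reader never has to deserialize the gigantic payload to decide to skip it, which keeps summary fetches fast even when a misbehaving summarizer is present.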
We also need to stress in the javadoc that summaries are intended to be
small.