Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/180

    > For example, we would want to avoid storing 1M CVs if a user had that many in a table (for some reason).

    I think we should address this issue in some way while considering the following.

     * Fetching summaries should be relatively fast. Gigantic summaries will stymie this goal.
     * When a user's summarizer does produce a gigantic summary, it would be nice if we helped them debug it.

    I am thinking one way to accomplish these goals is to store gigantic summaries, but only read summaries under a certain size. The size of a serialized summary could be written first, so that it is the first piece of information read. If the summary is over a certain size, an error could be logged and that file would be treated as if it had no summary. We could also add an enum value that indicates gigantic summaries were present. Since the summary is still stored, the user would have a chance to use rfile-info to look at what's in the summary for debugging.

    We also need to stress in the javadoc that summaries are intended to be small.
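    A rough sketch of what I mean by the length-prefix approach (class and method names here like `SummarySerializer` and `MAX_READABLE_SUMMARY_SIZE` are hypothetical, not existing Accumulo APIs; the real limit would presumably be configurable):

    ```java
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.Optional;

    public class SummarySerializer {

      // Assumed threshold for illustration; in practice this would be configurable.
      private static final int MAX_READABLE_SUMMARY_SIZE = 10 * 1024 * 1024;

      // Write the serialized summary, prefixed by its length.
      public static void write(DataOutputStream out, byte[] serializedSummary) throws IOException {
        out.writeInt(serializedSummary.length);
        out.write(serializedSummary);
      }

      // Read a summary. If it exceeds the limit, log an error and skip it, treating
      // the file as if it had no summary. The bytes remain stored, so a tool like
      // rfile-info can still be used to inspect them for debugging.
      public static Optional<byte[]> read(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len > MAX_READABLE_SUMMARY_SIZE) {
          System.err.println("Summary of size " + len + " exceeds limit "
              + MAX_READABLE_SUMMARY_SIZE + "; treating file as having no summary");
          // Simplified: a real implementation would handle partial skips.
          in.skipBytes(len);
          return Optional.empty();
        }
        byte[] data = new byte[len];
        in.readFully(data);
        return Optional.of(data);
      }
    }
    ```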