Hi all, I'm building an application in which users can add arbitrary documents, and every field is also indexed as a facet. This lets users easily browse their documents by the facets they have defined themselves.
However, when the number of documents gets very large, I switch to randomly sampled facets to keep the application responsive. By the nature of sampling, some documents (and thus some facet values) will be missed.

I let the user choose how many facet values to show for each facet; the default is 10. If a facet contains the values 1 to 20 and the search returns all documents without sampling, the user always sees 10 values. If sampling is done and the values are non-uniformly distributed, the user might end up with only 5 values instead of 10. I want to fill the 5 empty facet-value slots with existing facet values, shown with an unknown facet count (?). The reason is that such a value might still occur in the result set, and for interaction purposes it is very convenient if it can be selected and added to the query, to quickly find out whether there are documents that also contain this facet value. It would be even more useful if these facet values were sorted by label rather than by count. The user can then quickly see that there are documents that contain a certain value.

I can iterate over the ordinals via the TaxonomyReader and TaxonomyFacets (by leveraging the 'children'), but these ordinals might no longer be used by any document. What would be a good approach to tackle this issue?
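
To make the slot filling concrete, here is a minimal sketch of what I have in mind, in plain Java with no Lucene calls. The names FacetSlotFiller, Slot, fill and UNKNOWN_COUNT are hypothetical, and I assume the sampled counts and the list of known labels (for example collected from the taxonomy children mentioned above) have already been gathered elsewhere. It does not solve the stale-ordinal problem: a filled label may still match no live document, which is why it is shown with "?".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene.
final class FacetSlotFiller {

    /** Marker for "label is known, but the sample produced no count for it". */
    static final int UNKNOWN_COUNT = -1;

    /** One displayed facet value: a label plus a sampled count or UNKNOWN_COUNT. */
    static final class Slot {
        final String label;
        final int count;

        Slot(String label, int count) {
            this.label = label;
            this.count = count;
        }

        @Override
        public String toString() {
            return label + " (" + (count == UNKNOWN_COUNT ? "?" : String.valueOf(count)) + ")";
        }
    }

    /**
     * Returns at most topN slots: the highest-counted sampled values first,
     * then known labels with an unknown count, all finally sorted by label.
     * Assumes sampledCounts contains no null values.
     */
    static List<Slot> fill(Map<String, Integer> sampledCounts,
                           Collection<String> knownLabels,
                           int topN) {
        // 1. Order the sampled values by count (descending), ties by label.
        List<Map.Entry<String, Integer>> sampled = new ArrayList<>(sampledCounts.entrySet());
        sampled.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<Slot> slots = new ArrayList<>();
        Set<String> used = new HashSet<>();
        for (Map.Entry<String, Integer> e : sampled) {
            if (slots.size() >= topN) break;
            slots.add(new Slot(e.getKey(), e.getValue()));
            used.add(e.getKey());
        }

        // 2. Fill the remaining slots with known labels, in label order.
        for (String label : new TreeSet<>(knownLabels)) {
            if (slots.size() >= topN) break;
            if (used.add(label)) {
                slots.add(new Slot(label, UNKNOWN_COUNT));
            }
        }

        // 3. Present everything sorted by label instead of by count.
        slots.sort((a, b) -> a.label.compareTo(b.label));
        return slots;
    }

    public static void main(String[] args) {
        Map<String, Integer> sampled = new HashMap<>();
        sampled.put("red", 7);
        sampled.put("blue", 3);
        List<String> known = Arrays.asList("green", "blue", "amber", "red", "violet");
        System.out.println(fill(sampled, known, 4));
    }
}
```

With two sampled values and four requested slots, the example in main prints [amber (?), blue (3), green (?), red (7)]. Which known labels get picked as fillers is arbitrary here (simply the first ones in label order); a real implementation might prefer a different choice.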