[ https://issues.apache.org/jira/browse/SOLR-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated SOLR-10286:
--------------------------------
Attachment: SOLR-10286_large_fields.patch
Here's a patch. The whole test suite has passed twice now.
Nocommits:
* SchemaField.isLarge currently defaults to true for the purposes of testing
during development of this feature. This was extremely useful. It should of
course default to false before this is committed.
* SolrIndexSearcher: I want to refactor/move all the Lucene Document related
code and docValuesAsStored type code out of here into a new companion class
named {{SolrDocumentFetcher}}. I didn't do that yet as I want the patch to
show what's changed more clearly.
* BaseEditorialTransformer (related to query elevation): This made bad
instanceof assumptions that I fixed, but I found the code too loosey-goosey
for my liking in toString'ing whatever it didn't understand (I hate that in
general; it leads to hard-to-find bugs). I don't think that case can happen,
so I added an "assert false". Now that I see all tests pass, I'm inclined to
make it fail hard.
Schema package:
* FieldProperties: converted the bit masks from hex to instead use Java 7's
binary literals. Much clearer!
* Question: [[email protected]] what is {{BINARY}} for? This isn't used
anywhere and the line of code dates back to Solr's initial Apache contribution.
For a moment I thought I could use it as the same as a BinaryField check but
apparently not.
* FieldType.checkSchemaField only used to test for docValues compatibility and
subclasses would override this to add a no-op. I think that design was poor as
it's too all-encompassing, so I made it call a new checkSupportsDocValues() and
had the applicable subclasses override _that_ instead.
* FieldType.checkSchemaField now checks for "large" compatibility --
single-valued, stored, not-a-number. BinaryField overrides this to throw as
well, since "large" support hasn't been implemented for it yet.
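
To make the schema-package changes above concrete, here is a minimal sketch, not the code in the patch. Every class, flag name, and bit position in it (Props, SketchField, SketchFieldType and its subclasses) is hypothetical and only stands in for FieldProperties, SchemaField, and FieldType. It shows bit masks written as binary literals, a checkSchemaField that delegates the docValues test to an overridable checkSupportsDocValues(), and a "large" check requiring a stored, single-valued, non-numeric field.

```java
// Hypothetical, simplified stand-ins for Solr's schema classes.
// Flag names and bit positions are illustrative, not Solr's actual values.
final class Props {
    // Binary literals (Java 7+) show each flag's bit position directly.
    static final int STORED      = 0b0001;
    static final int MULTIVALUED = 0b0010;
    static final int DOC_VALUES  = 0b0100;
    static final int LARGE       = 0b1000;

    private Props() {}
}

class SketchField {
    final String name;
    final int props;

    SketchField(String name, int props) {
        this.name = name;
        this.props = props;
    }

    boolean has(int flag) {
        return (props & flag) != 0;
    }
}

class SketchFieldType {
    /** Runs all compatibility checks; each concern lives in its own hook. */
    void checkSchemaField(SketchField f) {
        if (f.has(Props.DOC_VALUES)) {
            checkSupportsDocValues();
        }
        if (f.has(Props.LARGE)
                && (!f.has(Props.STORED) || f.has(Props.MULTIVALUED) || isNumeric())) {
            throw new IllegalArgumentException("Field " + f.name
                    + " cannot be large: it must be stored, single-valued and non-numeric");
        }
    }

    /** Assumed default: reject docValues; supporting types override with a no-op. */
    void checkSupportsDocValues() {
        throw new IllegalArgumentException("docValues not supported by this field type");
    }

    boolean isNumeric() {
        return false;
    }
}

class SketchStrType extends SketchFieldType {
    @Override
    void checkSupportsDocValues() {
        // supported, nothing to check
    }
}

class SketchIntType extends SketchFieldType {
    @Override
    void checkSupportsDocValues() {
        // supported, nothing to check
    }

    @Override
    boolean isNumeric() {
        return true;
    }
}
```

With these stand-ins, new SketchStrType().checkSchemaField(new SketchField("body", Props.STORED | Props.LARGE)) passes, while adding Props.MULTIVALUED, dropping Props.STORED, or using SketchIntType makes the same call throw.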
SolrIndexSearcher:
* I refactored the doc() handling to always use a custom StoredFieldVisitor,
which I think makes it clearer. This may also make it easier to add a
Status.STOP optimization for single-valued fields but I didn't get to that.
* When the Unified/Postings highlighters supply their custom StoredFieldVisitor
and match an already cached document's large field, this code will avoid a
double-string conversion, reducing heap memory pressure.
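
For readers unfamiliar with the Status.STOP idea mentioned above, the following is a rough sketch of how such an optimization could look. It deliberately does not use Lucene's classes: VisitStatus, SketchVisitor, SingleFieldVisitor, and SketchStoredFields are hypothetical stand-ins loosely modeled on StoredFieldVisitor and its Status enum, and the patch does not implement this.

```java
import java.util.List;

// Simplified stand-in modeled on Lucene's StoredFieldVisitor.Status; NOT the real API.
enum VisitStatus { YES, NO, STOP }

interface SketchVisitor {
    VisitStatus needsField(String fieldName);

    void stringField(String fieldName, String value);
}

/** Collects one single-valued field, then tells the reader to stop. */
class SingleFieldVisitor implements SketchVisitor {
    private final String wanted;
    private String value;

    SingleFieldVisitor(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public VisitStatus needsField(String fieldName) {
        if (value != null) {
            return VisitStatus.STOP; // single-valued: nothing more to find
        }
        return wanted.equals(fieldName) ? VisitStatus.YES : VisitStatus.NO;
    }

    @Override
    public void stringField(String fieldName, String value) {
        this.value = value;
    }

    String value() {
        return value;
    }
}

class SketchStoredFields {
    /**
     * Walks {name, value} pairs in stored order and returns how many
     * fields were offered to the visitor before it finished or stopped.
     */
    static int visit(List<String[]> storedFields, SketchVisitor visitor) {
        int offered = 0;
        for (String[] field : storedFields) {
            offered++;
            VisitStatus status = visitor.needsField(field[0]);
            if (status == VisitStatus.STOP) {
                break; // skip decoding the remaining fields
            }
            if (status == VisitStatus.YES) {
                visitor.stringField(field[0], field[1]);
            }
        }
        return offered;
    }
}
```

The saving depends on field order: the visitor can only stop once the wanted field has been seen, so fields stored after it are skipped and fields stored before it are not.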
Tests:
* The test is pretty basic; good enough? It'd be nice to add a test to the
Solr UnifiedHighlighter related stuff to randomly use this field. It's at
least an opt-in feature so I'm not too worried... not to mention I ran this
with a default large'ness to tease out bugs. I wonder if the default large'ness
could/should be flipped randomly by Solr's test infrastructure?
Bugs found/fixed:
* In a couple of places in Solr, there was an assumption that the Lucene
{{IndexableField}} was actually an instance of {{Field}}. Two such cases are
fixed in this patch:
** {{DocumentBuilder.addField}}. It appears in-place updates might not have
worked in some cases involving lazy fields, depending on the usage pattern.
** {{BaseEditorialTransformer}} (query elevation).
* RealTimeGetComponent: RTG can internally grab a ref-counted realtime
searcher, look up a document, then dec-ref the searcher. If the searcher is
subsequently closed, the lazy field can't get the value anymore. Theoretically
this problem could happen with Solr's standard lazy fields too, but a "large"
field is better at provoking it. I fixed this by essentially copying the
IndexableField. It'd be nice if Lucene {{Field}} had a copy constructor that
takes an IndexableField; I was forced to subclass to accomplish the same.
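
The RealTimeGetComponent problem above can be reproduced in miniature. This is only an illustrative sketch: SketchSearcher, LazyValue, and MaterializedValue are hypothetical names, not Solr or Lucene classes, and the real fix copies a Lucene IndexableField rather than a plain string.

```java
/** Hypothetical stand-in for a ref-counted searcher that can be closed. */
class SketchSearcher implements AutoCloseable {
    private boolean closed;

    String load(String field) {
        if (closed) {
            throw new IllegalStateException("searcher is closed");
        }
        return "value-of-" + field;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Lazy field: every read goes back to the searcher it came from. */
class LazyValue {
    private final SketchSearcher searcher;
    final String name;

    LazyValue(SketchSearcher searcher, String name) {
        this.searcher = searcher;
        this.name = name;
    }

    String stringValue() {
        return searcher.load(name);
    }
}

/** Eager copy: reads the value once, so it outlives the searcher. */
class MaterializedValue {
    final String name;
    final String value;

    MaterializedValue(LazyValue lazy) {
        this.name = lazy.name;
        this.value = lazy.stringValue(); // must run while the searcher is still open
    }
}
```

Copying while the searcher is still referenced trades a little memory for safety: the MaterializedValue stays readable after close(), whereas calling stringValue() on the LazyValue at that point throws.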
Although not a strict requirement, ideally SOLR-10273 (largest field last) is
also done.
> Declare a field as "large", don't keep value in the document cache
> ------------------------------------------------------------------
>
> Key: SOLR-10286
> URL: https://issues.apache.org/jira/browse/SOLR-10286
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: David Smiley
> Assignee: David Smiley
> Attachments: SOLR-10286_large_fields.patch
>
>
> (part of umbrella issue SOLR-10117)
> This allows a field to be declared as "large" in the schema. In the
> {{SolrIndexSearcher.doc(...)}} handling, these fields are lazily fetched from
> Lucene. Unlike {{LazyDocument.LazyField}}, the value is not cached after
> first use unless it is "small" (< 512KB by default). "large" can only be used
> when the field is stored="true" and multiValued="false" and is otherwise
> compatible (basically not a numeric field) -- you'll get a helpful exception
> if it's unsupported. BinaryField is not yet supported; it could be in the
> future.