[jira] [Updated] (SOLR-10286) Declare a field as "large", don't keep value in the document cache

David Smiley (JIRA) Wed, 15 Mar 2017 00:24:53 -0700

     [ https://issues.apache.org/jira/browse/SOLR-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


David Smiley updated SOLR-10286:
--------------------------------
    Attachment: SOLR-10286_large_fields.patch

Here's a patch.  The whole test suite has passed twice now.

Nocommits:
* SchemaField.isLarge defaults to true for the purposes of testing during 
development of this feature.  This was extremely useful.  It should of course 
default to false before this is committed.
* SolrIndexSearcher: I want to refactor/move all the Lucene Document related 
code and docValuesAsStored type code out of here into a new companion class 
named {{SolrDocumentFetcher}}.  I didn't do that yet as I want the patch to 
show what's changed more clearly.
* BaseEditorialTransformer (related to query elevation): This made bad 
instanceof assumptions that I fixed, but I found the code to be too 
loosey-goosey for my liking in toString'ing whatever it didn't understand (I 
hate that in general; it leads to hard-to-find bugs).  I don't think that case 
can happen, so I added an "assert false".  Now that I see all tests pass, I'm 
inclined to make it fail hard.

Schema package:
* FieldProperties: converted the bit masks from hex to instead use Java 7's 
binary literals.  Much clearer!
* Question: [[email protected]] what is {{BINARY}} for?  This isn't used 
anywhere and the line of code dates back to Solr's initial Apache contribution. 
For a moment I thought I could use it as the same as a BinaryField check but 
apparently not.
* FieldType.checkSchemaField used to test only for docValues compatibility, and 
subclasses would override it with a no-op.  I think that design was poor as 
it's too all-encompassing, so I made it call a new checkSupportsDocValues() and 
had the applicable subclasses override _that_ instead.
* FieldType.checkSchemaField now checks for "large" compatibility -- 
not multiValued, stored, not-a-number.  BinaryField overrides it to throw as 
well, since that hasn't been implemented yet.
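
To illustrate the schema changes described above, here is a minimal sketch and not the patch itself. Every class, method, and flag value in it (SketchProps, SketchFieldType, isNumeric, and so on) is a hypothetical stand-in rather than Solr's actual code. It shows bit masks written as binary literals, and a general checkSchemaField that delegates to a narrower checkSupportsDocValues hook and rejects an incompatible "large" declaration.

```java
// Simplified stand-ins for illustration; these are NOT Solr's actual classes.
final class SketchProps {
    // Binary literals (Java 7+) show each flag's bit position at a glance.
    // The values are made up for this sketch.
    static final int STORED      = 0b0001; // same as 0x1
    static final int MULTIVALUED = 0b0010; // same as 0x2
    static final int DOC_VALUES  = 0b0100; // same as 0x4
    static final int LARGE       = 0b1000; // same as 0x8

    static boolean on(int props, int flag) {
        return (props & flag) != 0;
    }
}

class SketchFieldType {
    // General entry point; subclasses normally leave this alone.
    void checkSchemaField(String field, int props) {
        if (SketchProps.on(props, SketchProps.DOC_VALUES)) {
            checkSupportsDocValues(field);
        }
        if (SketchProps.on(props, SketchProps.LARGE)) {
            boolean compatible = SketchProps.on(props, SketchProps.STORED)
                    && !SketchProps.on(props, SketchProps.MULTIVALUED)
                    && !isNumeric();
            if (!compatible) {
                throw new IllegalArgumentException("Field " + field
                        + ": large requires stored, single-valued, non-numeric");
            }
        }
    }

    // Narrow hook (assumed default: reject). Types that support docValues
    // override just this method with a no-op.
    void checkSupportsDocValues(String field) {
        throw new IllegalArgumentException("Field " + field + ": docValues not supported");
    }

    // Hypothetical helper for the "not-a-number" condition.
    boolean isNumeric() {
        return false;
    }
}

final class SketchStrType extends SketchFieldType {
    @Override void checkSupportsDocValues(String field) { /* supported */ }
}

final class SketchIntType extends SketchFieldType {
    @Override void checkSupportsDocValues(String field) { /* supported */ }
    @Override boolean isNumeric() { return true; }
}

final class SchemaCheckDemo {
    // Returns true when the type accepts the given property combination.
    static boolean accepts(SketchFieldType type, int props) {
        try {
            type.checkSchemaField("f", props);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int large = SketchProps.STORED | SketchProps.LARGE;
        System.out.println(accepts(new SketchStrType(), large));
        System.out.println(accepts(new SketchStrType(), large | SketchProps.MULTIVALUED));
        System.out.println(accepts(new SketchIntType(), large));
    }
}
```

Keeping the docValues test in its own hook means a subclass can opt in without having to silence every other check that checkSchemaField performs.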

SolrIndexSearcher:
* I refactored the doc() handling to always use a custom StoredFieldVisitor, 
which I think makes it clearer.  This may also make it easier to add a 
Status.STOP optimization for single-valued fields but I didn't get to that.
* When the Unified/Postings highlighters supply their custom StoredFieldVisitor 
and match an already cached document's large field, this code will avoid a 
double-string conversion, reducing heap memory pressure.
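
The STOP idea mentioned above can be sketched roughly as follows. This is a self-contained stand-in that only mimics the YES/NO/STOP shape of a stored-field visitor; VisitStatus, SketchVisitor, SingleValuedVisitor, and SketchStoredFields are hypothetical names and not Lucene's or Solr's API. It assumes every requested field is single-valued, so once each one has been seen the visitor can end the scan early.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in for a YES/NO/STOP visitor status; not Lucene's actual type.
enum VisitStatus { YES, NO, STOP }

interface SketchVisitor {
    VisitStatus needsField(String fieldName);
    void stringField(String fieldName, String value);
}

// Collects the wanted fields and asks to stop once all of them were seen.
// Only valid when every wanted field is single-valued.
final class SingleValuedVisitor implements SketchVisitor {
    private final Set<String> wanted;
    final Map<String, String> values = new LinkedHashMap<>();

    SingleValuedVisitor(Set<String> wanted) {
        this.wanted = wanted;
    }

    @Override
    public VisitStatus needsField(String fieldName) {
        if (values.size() == wanted.size()) {
            return VisitStatus.STOP; // nothing left to collect
        }
        return wanted.contains(fieldName) ? VisitStatus.YES : VisitStatus.NO;
    }

    @Override
    public void stringField(String fieldName, String value) {
        values.put(fieldName, value);
    }
}

final class SketchStoredFields {
    // Walks {name, value} pairs in stored order and returns how many
    // fields were examined before the visitor asked to stop.
    static int visit(List<String[]> storedFields, SketchVisitor visitor) {
        int examined = 0;
        for (String[] field : storedFields) {
            examined++;
            VisitStatus status = visitor.needsField(field[0]);
            if (status == VisitStatus.STOP) {
                break;
            }
            if (status == VisitStatus.YES) {
                visitor.stringField(field[0], field[1]);
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        List<String[]> doc = Arrays.asList(
                new String[] {"id", "1"},
                new String[] {"title", "a title"},
                new String[] {"body", "a very long body"});
        SingleValuedVisitor visitor = new SingleValuedVisitor(Collections.singleton("id"));
        int examined = visit(doc, visitor);
        System.out.println(examined + " fields examined, got " + visitor.values);
    }
}
```

In this sketch the early exit pays off most when the large field is stored last, since the scan can end before reaching it.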

Tests:
* The test is pretty basic; good enough?  It'd be nice to add a test to the 
Solr UnifiedHighlighter related stuff to randomly use this field.  It's at 
least an opt-in feature so I'm not too worried... not to mention I ran this 
with a default large'ness to tease out bugs. I wonder if the default large'ness 
could/should be flipped randomly by Solr's test infrastructure?

Bugs found/fixed:
* In a couple of places in Solr, there was an assumption that the Lucene 
{{IndexableField}} was actually an instance of {{Field}}.  Two cases are 
fixed in this patch:
** {{DocumentBuilder.addField}}. It appears in-place updates might not have 
worked in some cases involving lazy fields, depending on the usage pattern.
** {{BaseEditorialTransformer}} (query elevation).
* RealTimeGetComponent: RTG can internally grab a ref-counted realtime 
searcher, look up a document, then dec-ref the searcher.  If the searcher is 
subsequently closed, the lazy field can't get the value anymore.  Theoretically 
this problem could happen with Solr's standard lazy fields too but a "large" 
field is better at provoking it. I fixed this by essentially copying the 
IndexableField.  It'd be nice if Lucene {{Field}} had a copy constructor taking 
an IndexableField; I was forced to subclass to accomplish the same.
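
The RealTimeGetComponent hazard described above can be reproduced in miniature. The sketch below uses hypothetical stand-ins (SketchSearcher, LazySketchField, CopiedSketchField), not Solr or Lucene classes. A lazy field reads through to its searcher on every access, so it breaks once the last reference is released, while a copy made beforehand keeps working.

```java
// Hypothetical stand-ins for illustration; not Solr or Lucene classes.
final class SketchSearcher {
    private int refCount = 1; // the owner's reference

    void incref() {
        refCount++;
    }

    void decref() {
        refCount--;
    }

    boolean isClosed() {
        return refCount <= 0;
    }

    String loadStoredValue(String field) {
        if (isClosed()) {
            throw new IllegalStateException("searcher is closed");
        }
        return "stored:" + field;
    }
}

// Reads through to the searcher on every access.
final class LazySketchField {
    final String name;
    private final SketchSearcher searcher;

    LazySketchField(String name, SketchSearcher searcher) {
        this.name = name;
        this.searcher = searcher;
    }

    String stringValue() {
        return searcher.loadStoredValue(name);
    }
}

// Eagerly copies the value, so it no longer depends on the searcher.
final class CopiedSketchField {
    final String name;
    private final String value;

    CopiedSketchField(LazySketchField source) {
        this.name = source.name;
        this.value = source.stringValue(); // must run while the searcher is open
    }

    String stringValue() {
        return value;
    }
}

final class RtgSketchDemo {
    public static void main(String[] args) {
        SketchSearcher searcher = new SketchSearcher();
        searcher.incref();                       // the lookup takes a reference
        LazySketchField lazy = new LazySketchField("body", searcher);
        CopiedSketchField copy = new CopiedSketchField(lazy);
        searcher.decref();                       // the lookup releases it
        searcher.decref();                       // the owner closes the searcher

        System.out.println(copy.stringValue());  // still available
        try {
            lazy.stringValue();
        } catch (IllegalStateException e) {
            System.out.println("lazy field failed: " + e.getMessage());
        }
    }
}
```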

Although not a strict requirement, ideally SOLR-10273 (largest field last) is 
also done.

> Declare a field as "large", don't keep value in the document cache
> ------------------------------------------------------------------
>
>                 Key: SOLR-10286
>                 URL: https://issues.apache.org/jira/browse/SOLR-10286
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR-10286_large_fields.patch
>
>
> (part of umbrella issue SOLR-10117)
> This allows a field to be declared as "large" in the schema.  In the 
> {{SolrIndexSearcher.doc(...)}} handling, these fields are lazily fetched from 
> Lucene.  Unlike {{LazyDocument.LazyField}}, the value is not cached after 
> first use unless it is "small" (< 512KB by default).  "large" can only be 
> used when it's stored="true" and multiValued="false" and the field is 
> otherwise compatible (basically not a numeric field) -- you'll get a helpful 
> exception if it's unsupported. BinaryField is not yet supported; it could be 
> in the future.
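
The caching rule in the description above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch's implementation: SketchLargeValue, its loader, and the choice to measure size in UTF-8 bytes are all hypothetical. Only the 512KB default comes from the description.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical sketch of "fetch lazily, cache only if small".
final class SketchLargeValue {
    // 512KB default from the issue description; measuring in UTF-8 bytes
    // is an assumption made for this sketch.
    static final int DEFAULT_CACHE_THRESHOLD_BYTES = 512 * 1024;

    private final Supplier<String> loader; // stands in for the stored-field read
    private final int thresholdBytes;
    private String cached;
    private int loadCount;

    SketchLargeValue(Supplier<String> loader, int thresholdBytes) {
        this.loader = loader;
        this.thresholdBytes = thresholdBytes;
    }

    synchronized String get() {
        if (cached != null) {
            return cached;
        }
        String value = loader.get();
        loadCount++;
        // Keep a reference only when the value is small; a large value is
        // handed back and left for the garbage collector.
        if (value.getBytes(StandardCharsets.UTF_8).length < thresholdBytes) {
            cached = value;
        }
        return value;
    }

    synchronized int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        // A tiny threshold keeps the demo short.
        SketchLargeValue small = new SketchLargeValue(() -> "tiny", 8);
        small.get();
        small.get();
        SketchLargeValue big = new SketchLargeValue(() -> "well over eight bytes", 8);
        big.get();
        big.get();
        System.out.println("small loads: " + small.loadCount()
                + ", big loads: " + big.loadCount());
    }
}
```

The trade-off is that a large value is re-read on every access, in exchange for not pinning it in the document cache.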



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
