[ https://issues.apache.org/jira/browse/SOLR-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957552#comment-15957552 ]

David Smiley commented on SOLR-10375:
-------------------------------------

bq.  at what size/length should Solr be expected to support for stored string 
values? I'd imagine making that call instead does come at some cost overall.

So we pick a threshold, just as {{GrowableByteArrayDataOutput.writeString}} 
does.  Below the threshold we take the simplest path, albeit one that might 
allocate larger arrays than necessary.  Above the threshold we scan the text 
first to compute exactly how big the byte[] needs to be.
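A minimal sketch of that two-path strategy, loosely modeled on what {{GrowableByteArrayDataOutput.writeString}} does. The threshold value, class name, and method names here are illustrative assumptions, not Solr's or Lucene's actual code:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class Utf8Encode {
    // Assumed threshold: below this char count, a worst-case
    // (3 bytes/char) allocation is considered cheap enough.
    static final int THRESHOLD = 1 << 16;

    // Exact UTF-8 byte length, computed by scanning the chars once.
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // 2-byte sequence
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                       && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                       // supplementary code point
                i++;                              // consume the low surrogate
            } else {
                bytes += 3;                       // 3-byte sequence
            }
        }
        return bytes;
    }

    static byte[] encode(String s) {
        if (s.length() < THRESHOLD) {
            // Simple path: the JDK may size its internal buffer at
            // length * 3 before trimming, which is fine for small strings.
            return s.getBytes(StandardCharsets.UTF_8);
        }
        // Two-pass path: size the array exactly, avoiding the 3x
        // worst-case allocation that overflows for very large strings.
        byte[] out = new byte[utf8Length(s)];
        StandardCharsets.UTF_8.newEncoder()
                .encode(CharBuffer.wrap(s), ByteBuffer.wrap(out), true);
        return out;
    }
}
```

The second pass trades one extra scan of the chars for an exactly-sized allocation.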

Another route is to override 
{{org.apache.lucene.document.DocumentStoredFieldVisitor#stringField}} to 
conditionally use a Field/IndexableField subclass that holds the byte[] instead 
of immediately converting it to a String, leaving the String conversion to 
happen on demand.  The ultimate char length could be pre-computed and cached as 
well.  This path is more work, of course.
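The hold-bytes-convert-lazily idea could look roughly like this. {{LazyStringValue}} is a hypothetical name; a real version would be an IndexableField subclass wired into the visitor, which this sketch does not attempt:

```java
import java.nio.charset.StandardCharsets;

// Holds the raw stored UTF-8 bytes and materializes the String only
// when (and if) a caller actually asks for it.
public class LazyStringValue {
    private final byte[] utf8;  // raw stored bytes, kept as-is
    private String cached;      // String built only on first use

    public LazyStringValue(byte[] utf8) {
        this.utf8 = utf8;
    }

    // Conversion happens on demand and the result is cached.
    public synchronized String stringValue() {
        if (cached == null) {
            cached = new String(utf8, StandardCharsets.UTF_8);
        }
        return cached;
    }

    // The byte form is available with no conversion cost at all.
    public byte[] binaryValue() {
        return utf8;
    }
}
```

A consumer that only needs the bytes (e.g. to stream the field out) never pays for the char[] allocation at all.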

> Stored text > 716MB retrieval with StoredFieldVisitor causes out of memory 
> error with document cache
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10375
>                 URL: https://issues.apache.org/jira/browse/SOLR-10375
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.2.1
>         Environment: Java 1.8.121, Linux x64
>            Reporter: Michael Braun
>
> Using SolrIndexSearcher.doc(int n, StoredFieldVisitor visitor): if the 
> document cache already holds the document, visitFromCached is called and an 
> out of memory error occurs at line 752 of SolrIndexSearcher - 
> visitor.stringField(info, f.stringValue().getBytes(StandardCharsets.UTF_8));
> {code}
>  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
>  at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
>  at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
>  at org.apache.solr.search.SolrIndexSearcher.visitFromCached(Lorg/apache/lucene/document/Document;Lorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:685)
>  at org.apache.solr.search.SolrIndexSearcher.doc(ILorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:652)
> {code}
> This is due to the current String.getBytes(Charset) implementation, which 
> allocates the underlying byte array as charArrayLength * maxBytesPerCharacter, 
> where maxBytesPerCharacter is 3 for UTF-8.  3 * 716MB exceeds 
> Integer.MAX_VALUE, and since the JVM cannot allocate a single array larger 
> than that, an OutOfMemoryError is thrown.
> The problem is not present when the document cache is disabled.
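The overflow arithmetic in the quoted report can be checked directly. This standalone sketch (the class and method names are illustrative) tests whether the worst-case 3-bytes-per-char allocation fits in a single Java array:

```java
public class OverflowMath {
    // String.getBytes(UTF_8) sizes its internal buffer at
    // charLength * maxBytesPerChar, which is 3 for UTF-8. Arrays are
    // indexed by int, so the product must not exceed Integer.MAX_VALUE.
    static boolean worstCaseFits(long charLength) {
        return charLength * 3 <= Integer.MAX_VALUE;
    }
}
```

For 716MB of chars, 716 * 1024 * 1024 * 3 = 2,252,341,248, which is past the 2,147,483,647 limit, so the allocation fails regardless of available heap.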



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
