[ 
https://issues.apache.org/jira/browse/SOLR-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-16589:
--------------------------------
    Description: 
h3. Summary

Fields declared with large="true" can have their values truncated in Solr 9 
and later, even though large values are exactly what the option is intended 
for.

Example fieldtype definition:
{code:java}
<fieldtype name="string_large"  class="solr.TextField" multiValued="false" 
indexed="false" stored="true" omitNorms="true" large="true" />{code}
h3. Cause

This looks like a bug introduced along with 
https://issues.apache.org/jira/browse/LUCENE-8805 / 
https://github.com/apache/lucene/issues/9849

The current code is here:
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L511
 
{code:java}
public void stringField(FieldInfo fieldInfo, String value) throws IOException {
    Objects.requireNonNull(value, "String value should not be null");
    bytesRef.bytes = value.getBytes(StandardCharsets.UTF_8);
    bytesRef.length = value.length(); // bug: UTF-16 code unit count, not the UTF-8 byte count
{code}
 
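The problem is specific to the handling of "large" fields. 
`value.length()` returns the number of UTF-16 code units in the string, while 
`bytesRef.bytes` holds the UTF-8 encoding, which is longer whenever the value 
contains non-ASCII characters. Setting `bytesRef.length` to the smaller count 
cuts off the end of the value, hence the truncation.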
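
To illustrate the mismatch, here is a minimal sketch that uses only the JDK. The class name `Utf8LengthDemo` and its helper methods are made up for this example and are not part of Solr or Lucene; `decodePrefix` merely imitates a reader that trusts a too-short length.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int charLength(String value) {
        return value.length();
    }

    /** Number of bytes in the UTF-8 encoding of the value. */
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Decodes only the first `length` bytes, like a reader trusting a wrong length. */
    static String decodePrefix(String value, int length) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "naive cafe" with a diaeresis on the i and an acute accent on the e.
        String value = "na\u00efve caf\u00e9";
        System.out.println("chars = " + charLength(value));
        System.out.println("utf8 bytes = " + utf8Length(value));
        // Using the char count as the byte length drops the tail of the value.
        System.out.println("decoded = " + decodePrefix(value, charLength(value)));
    }
}
```

For pure ASCII values the two counts agree, which may explain why the truncation only shows up on some documents.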
h3. Fix

The fix would be:
{code:java}
bytesRef.length = bytesRef.bytes.length;
{code}
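
The effect of that one-line change can be checked in isolation. The sketch below is an assumption-laden stand-in, not the Solr code: `SimpleBytesRef` is a hypothetical minimal substitute for Lucene's `BytesRef` (same `bytes`, `offset`, `length` fields), and `encodeBuggy` / `encodeFixed` are illustrative names for the two length assignments.

```java
import java.nio.charset.StandardCharsets;

public class LargeFieldLengthFix {
    /** Hypothetical minimal stand-in for Lucene's BytesRef; not the real class. */
    static final class SimpleBytesRef {
        byte[] bytes = new byte[0];
        int offset = 0;
        int length = 0;

        /** Decodes the referenced slice as UTF-8. */
        String utf8ToString() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Mirrors the reported code: length is taken from the char count. */
    static SimpleBytesRef encodeBuggy(String value) {
        SimpleBytesRef ref = new SimpleBytesRef();
        ref.bytes = value.getBytes(StandardCharsets.UTF_8);
        ref.length = value.length();
        return ref;
    }

    /** Mirrors the proposed fix: length is taken from the encoded byte array. */
    static SimpleBytesRef encodeFixed(String value) {
        SimpleBytesRef ref = new SimpleBytesRef();
        ref.bytes = value.getBytes(StandardCharsets.UTF_8);
        ref.length = ref.bytes.length;
        return ref;
    }

    public static void main(String[] args) {
        // Three CJK characters, each three bytes in UTF-8.
        String value = "\u65e5\u672c\u8a9e";
        System.out.println("buggy round trip ok: " + encodeBuggy(value).utf8ToString().equals(value));
        System.out.println("fixed round trip ok: " + encodeFixed(value).utf8ToString().equals(value));
    }
}
```

A round-trip check of this kind, run against a value with multi-byte characters, might be a reasonable shape for a regression test.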
 

  was:
h3. Summary

For fields using large="true", large fields (which is what they are intended 
for) can be truncated in v9+ of Solr.

Example fieldtype definition:
{code:java}
<fieldtype name="string_large"  class="solr.TextField" multiValued="false" 
indexed="false" stored="true" omitNorms="true" large="true" />{code}
Cause

Looks like this is a bug introduced along with 
https://issues.apache.org/jira/browse/LUCENE-8805) / 
[https://github.com/apache/lucene/issues/9849|https://github.com/apache/lucene/issues/9849:]

[https://github.com/apache/lucene/blob/5a694ea26ff862ecc874ca798135073d300c2234/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L462-L465|https://github.com/apache/solr/blob/bc2d9623f7960f83636eb8416b11dd4e91ab4b22/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L508-L511]

 
{code:java}
                  public void stringField(FieldInfo fieldInfo, String value) 
throws IOException {
                    Objects.requireNonNull(value, "String value should not be 
null");
                    bytesRef.bytes = value.getBytes(StandardCharsets.UTF_8);
                    bytesRef.length = value.length();{code}
 

 

Specifically with respect to "large" fields handling.

The length in utf8 bytes will often be longer than the string length 
`value.length()`, hence the truncation.
h3. Fix

The Fix would be:
{code:java}
bytesRef.length = bytesRef.bytes.length {code}
 


> Large fields with large="true" can be truncated in v9+
> ------------------------------------------------------
>
>                 Key: SOLR-16589
>                 URL: https://issues.apache.org/jira/browse/SOLR-16589
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search
>    Affects Versions: 9.0, 9.1, 9.2
>            Reporter: Nikolas Osvalds
>            Assignee: Michael Gibney
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
