[ https://issues.apache.org/jira/browse/OAK-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabrizio Fortino updated OAK-9123:
----------------------------------
    Fix Version/s: 1.32.0

> Error: Document contains at least one immense term
> --------------------------------------------------
>
>                 Key: OAK-9123
>                 URL: https://issues.apache.org/jira/browse/OAK-9123
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: elastic-search, indexing, search
>            Reporter: Fabrizio Fortino
>            Assignee: Fabrizio Fortino
>            Priority: Major
>             Fix For: 1.32.0
>
>
> {code:java}
> 11:35:09.400 [I/O dispatcher 1] ERROR o.a.j.o.p.i.e.i.ElasticIndexWriter - 
> Bulk item with id /wikipedia/76/84/National Palace (Mexico) failed
> org.elasticsearch.ElasticsearchException: Elasticsearch exception 
> [type=illegal_argument_exception, reason=Document contains at least one 
> immense term in field="text.keyword" (whose UTF8 encoding is longer than the 
> max length 32766), all of which were skipped. Please correct the analyzer to 
> not produce such terms. The prefix of the first immense term is: '[123, 123, 
> 73, 110, 102, 111, 98, 111, 120, 32, 104, 105, 115, 116, 111, 114, 105, 99, 
> 32, 98, 117, 105, 108, 100, 105, 110, 103, 10, 124, 110]...', original 
> message: bytes can be at most 32766 in length; got 33409]
> at 
> org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
> at 
> org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
> at 
> org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:138)
> at 
> org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:196)
> at 
> org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1888)
> at 
> org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1676)
> at 
> org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1758)
> at 
> org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:590)
> at org.elasticsearch.client.RestClient$1.completed(RestClient.java:333)
> at org.elasticsearch.client.RestClient$1.completed(RestClient.java:327)
> at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
> at 
> org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181)
> at 
> org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448)
> at 
> org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338)
> at 
> org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
> at 
> org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
> at 
> org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
> at 
> org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
> at 
> org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
> at 
> org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
> at 
> org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
> at 
> org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
> at 
> org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
> at 
> org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception 
> [type=max_bytes_length_exceeded_exception, reason=bytes can be at most 32766 
> in length; got 33409]
> at 
> org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
> at 
> org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
> at 
> org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
> ... 24 common frames omitted{code}
> This happens with huge keyword fields, since Lucene does not allow terms longer 
> than 32766 bytes.
> See 
> [https://discuss.elastic.co/t/error-document-contains-at-least-one-immense-term-in-field/66486]
> We decided to always create keyword fields so that there is no need to specify 
> properties like ordered or facet: every field can then be sorted on or used as 
> a facet.
> In this specific case the keyword field is not needed at all, but it would be 
> hard to decide when to include it and when not to. To solve this we are going 
> to set `ignore_above=256`, so huge keyword values will be ignored instead of 
> failing the whole document.
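> The resulting mapping can be sketched as follows (the `text` property name is 
> taken from the log above; the surrounding structure is a minimal illustration, 
> not the exact mapping Oak generates):
> {code:json}
> {
>   "mappings": {
>     "properties": {
>       "text": {
>         "type": "text",
>         "fields": {
>           "keyword": {
>             "type": "keyword",
>             "ignore_above": 256
>           }
>         }
>       }
>     }
>   }
> }
> {code}
> With `ignore_above: 256`, values longer than 256 characters are still indexed 
> in the `text` field for full-text search, but are skipped for the `keyword` 
> sub-field, so sorting and faceting simply ignore them rather than triggering 
> the {{max_bytes_length_exceeded_exception}} above.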



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
