GitHub user IntraCherche opened a pull request:
https://github.com/apache/lucene-solr/pull/530
Branch 7 4 : IndexOutOfBoundsException randomly appearing while indexing
Hi,
First of all, sorry if this is not the right place to report the issue I am
encountering. I saw that other people were having the same problem on StackOverflow
(https://stackoverflow.com/questions/52783491/solr-indexing-error-possible-analysis-error),
although no solution has been provided so far. Anyway, feel free to reroute my
pull request.
While indexing documents with Solr 7.4, the following exception appears
randomly. It is very annoying because I cannot reproduce it on my machine and I
don't have access to the customer's data. All I know is that it can appear on pdf,
xls, or eml documents. It looks like the issue stems from the FlattenGraphFilter
class, specifically _restoreState(inputNode.tokens.get(inputNode.nextOut))_.
The stack trace shows:
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:204)
at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:738)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1602)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:982)
at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:971)
at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:348)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:284)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:234)
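To make the "Index: 1, Size: 1" condition concrete, here is a minimal plain-Java sketch of the access pattern named above. It is not the FlattenGraphFilter source: the class BufferedTokenSketch, the Node type, and both release methods are hypothetical stand-ins, and the only assumption is that nextOut can reach the size of the buffered token list before get() is called. The guarded variant only shows where a bounds check would sit; it is not a proposed fix, since it would hide whatever leaves the buffer in that state.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferedTokenSketch {

    /** Hypothetical stand-in for a buffered input node; not the Lucene class. */
    static final class Node {
        final List<String> tokens = new ArrayList<>();
        int nextOut = 0; // index of the next buffered token to release
    }

    /** Unguarded release: same shape as tokens.get(nextOut) in the report. */
    static String releaseUnguarded(Node node) {
        // Throws IndexOutOfBoundsException when nextOut == tokens.size().
        String token = node.tokens.get(node.nextOut);
        node.nextOut++;
        return token;
    }

    /** Guarded release: returns null instead of throwing when nothing is left. */
    static String releaseGuarded(Node node) {
        if (node.nextOut >= node.tokens.size()) {
            return null;
        }
        return node.tokens.get(node.nextOut++);
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.tokens.add("foo");

        // First release succeeds and advances nextOut to 1.
        System.out.println(releaseUnguarded(node));

        try {
            // nextOut is 1 and the list size is 1, so get(1) is out of range.
            releaseUnguarded(node);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }

        // The guarded variant reports "nothing to release" without throwing.
        System.out.println(releaseGuarded(node));
    }
}
```

The exact exception message depends on the Java version; the trace above has the Java 8 wording.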
If more info is needed, I would be delighted to provide it, as long as I have
it (e.g. the documents are customer-confidential, so I don't have access to them).
Kind regards
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/lucene-solr branch_7_4
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/lucene-solr/pull/530.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #530
----
commit 1ed95c097b82ee5f175e93f3fe62572abe064da6
Author: Jim Ferenczi <jim.ferenczi@...>
Date: 2018-05-03T14:11:52Z
LUCENE-8231: Add missing part of speech filter in the SPI META-INF file
commit 0e45b39476f287fe56ec4734c5c8695a99f8cb5d
Author: Christine Poerschke <cpoerschke@...>
Date: 2018-05-03T17:42:56Z
json-facet-api.adoc typo fix.
commit dad48603aec715063fdcb71e11fe73599d63c3a2
Author: Simon Willnauer <simonw@...>
Date: 2018-05-03T07:29:12Z
LUCENE-8293: Ensure only hard deletes are carried over in a merge
Today we carry over hard deletes based on the SegmentReaders liveDocs.
This is not correct if soft-deletes are used, especially with retention
policies. If a soft delete is added while a segment is merged the document
might end up hard deleted in the target segment. This isn't necessarily a
correctness issue but causes unnecessary writes of hard-deletes. The biggest
issue here is that we assert that previously deleted documents are still
deleted in the live-docs we apply and that might be violated by the retention
policy.
commit 27992e3386f7ba9521329ec8a1957103d73da2e4
Author: Adrien Grand <jpountz@...>
Date: 2018-05-04T12:38:42Z
LUCENE-8295: Remove useless liveDocsSharedPending flag.
commit 6ec1198a5144e73b47bc88cd79f534878cbcbef4
Author: Anshum Gupta <anshum@...>
Date: 2018-05-03T22:00:47Z
SOLR-11277: Add auto hard commit setting based on tlog size (this closes
#358)
commit 2558a06f3c8040d47de6606da2539532e4c67854
Author: Erick Erickson <erick@...>
Date: 2018-05-05T05:30:18Z
SOLR-12028: BadApple and AwaitsFix annotations usage
(cherry picked from commit 89fc02a)
commit 96f079b4b47eaadff65c7aaf0e5bafe68e30ec3b
Author: Uwe Schindler <uschindler@...>
Date: 2018-05-06T12:21:34Z
SOLR-12316: Do not allow to use absolute URIs for including other files in
solrconfig.xml and schema parsing
# Conflicts:
# solr/CHANGES.txt
commit 709782ac9d50e20da5745aa6fa2351b6b6757b20
Author: Mikhail Khludnev <mkhl@...>
Date: 2018-05-06T13:46:56Z
SOLR-8998: documentation fix.
commit 7f761977171d86323848151f41be82de0b6e3f65
Author: Uwe Schindler <uschindler@...>
Date: 2018-05-06T13:53:07Z
SOLR-12316: Fix test to work on linux and test also windows in a better way
commit c43c15409ab204e53622dd6ddb89a78d0cda0389
Author: Ishan Chattopadhyaya <ishan@...>
Date: 2018-05-06T17:09:10Z
Synchronizing the Solr CHANGES.txt
commit 7db633910424244356652944d4aca6184807faf6
Author: Ishan Chattopadhyaya <ishan@...>
Date: 2018-05-06T17:30:21Z
SOLR-12316: Moving the changelog entry to 7.3.1 from 7.4.0
commit b72af046c5bd04eec4e84700a2ee20ab5a833e39
Author: Mark Miller <markrmiller@...>
Date: 2018-05-05T01:02:56Z
SOLR-12293: Updates need to use their own connection pool to maintain
connection reuse and prevent spurious recoveries.
commit 5d6e47eaed05d4c305560c88f2e3393a45d1dbd8
Author: Simon Willnauer <simonw@...>
Date: 2018-05-07T10:01:32Z
LUCENE-8275: Suppress WindowsFS TestDirectoryTaxonomyWriter
TestDirectoryTaxonomyWriter#testRecreateAndRefresh can't deal with pending
files since it creates multiple IW instances on the same directory.
commit 30175b6410579b6d21a59ccad2a03dd03f89d7c5
Author: Dawid Weiss <dweiss@...>
Date: 2018-05-07T11:22:11Z
LUCENE-8261: InterpolatedProperties.interpolate and recursive property
references.
commit 0600a58f005ade7559a765f058f2b91f898925af
Author: Jason Gerlowski <gerlowskija@...>
Date: 2018-05-07T11:01:01Z
SOLR-12279: Reject invalid 'blockUnknown' values for 'bin/solr auth'
commit 48a2138e899bca0b8a8485fb7e490a9dc943d997
Author: Simon Willnauer <simonw@...>
Date: 2018-05-05T07:55:58Z
LUCENE-8297: Add IW#tryUpdateDocValues(Reader, int, Fields...)
IndexWriter can update doc values for a specific term but this might
affect all documents containing the term. With tryUpdateDocValues
users can update doc-values fields for individual documents. This allows
for instance to soft-delete individual documents.
The new method shares most of its code with tryDeleteDocuments.
commit 93f9cc71b127b84c432bf40e115fd4afe689bd8a
Author: David Smiley <dsmiley@...>
Date: 2018-05-07T18:54:11Z
SOLR-12312: Replication's IndexFetcher buf size should be initialized
to an amount no greater than the size of the file being transferred.
(cherry picked from commit 81f6112)
commit 572490088149e6fc498b1a1ca739f277c7364e00
Author: Dawid Weiss <dawid.weiss@...>
Date: 2018-05-07T19:34:59Z
LUCENE-8261: non-recursive->recursive (javadoc update).
commit 6b1a64e1e6041e1205c3180c08365bbab18096a1
Author: David Smiley <dsmiley@...>
Date: 2018-05-08T02:17:30Z
SOLR-12308: LISTALIASES is now assured to return an up-to-date response
* MiniSolrCloudCluster.deleteAllCollections will now first delete aliases
* Minor refactorings to AliasesManager, AliasIntegrationTest,
CreateRoutedAliasTest
(cherry picked from commit 08ee037)
commit 4cab3eba9c751b6364bf89f6d4dcd604985edba2
Author: Erick Erickson <erick@...>
Date: 2018-05-08T16:54:40Z
SOLR-12192: Error when ulimit is unlimited
(cherry picked from commit abb57c5)
commit cfa4a82b3b0462a3d921bddfbaaf97f7cabb24a4
Author: David Smiley <dsmiley@...>
Date: 2018-05-08T19:10:07Z
SOLR-12258: A V2 request referencing a collection or alias may fail to
resolve it if it was just recently created.
Now we sync with ZooKeeper and try one more time. V1 partially did this
but only for aliases; now it does both.
(cherry picked from commit c3d28a5)
commit 586fb0272f8df6dc5835061e599e5728ae918ca9
Author: Varun Thacker <varun@...>
Date: 2018-05-09T03:50:50Z
SOLR-12265: Upgrade to Jetty 9.4.10
(cherry picked from commit 1705e4f)
commit 7e19c6c32d68f5db65536d865e857a1e3b009729
Author: Simon Willnauer <simonw@...>
Date: 2018-05-09T08:04:18Z
[TEST] Fix TestNorms to ensure that max token length is at least 3 to have
predictable norms
commit 5021fb185fdad94b36cbdddf7d5e4f2d056213a0
Author: Mikhail Khludnev <mkhl@...>
Date: 2018-05-09T09:47:46Z
SOLR-12303: documenting default [subquery].qt trap.
commit dd98a98bdcaa9ba7f97a88646b0f2973a04c8581
Author: Dawid Weiss <dweiss@...>
Date: 2018-05-09T11:28:03Z
LUCENE-8301: Update randomizedtesting to 2.6.0
commit c5767d350e7f5b514609cac9db6f07893a5d29a0
Author: Adrien Grand <jpountz@...>
Date: 2018-05-09T12:31:23Z
LUCENE-8303: Make LiveDocsFormat only responsible for
serialization/deserialization of live docs.
commit 52e72467bb2176c1e7bc9e0d75dd71a8582dc67e
Author: Adrien Grand <jpountz@...>
Date: 2018-05-09T13:12:23Z
LUCENE-8296: PendingDeletes may no longer write to live docs after they are
shared.
commit 0c65af048b3a497ea2e95e48c886b3a653412a0c
Author: Simon Willnauer <simonw@...>
Date: 2018-05-07T09:52:51Z
LUCENE-8298: Allow DocValues updates to reset a value
Today once a document has a value in a certain DV field this value
can only be changed but not removed. While resetting / removing a value
from a field is certainly a corner case it can be used to undelete a
soft-deleted document unless it's merged away.
This allows to rollback changes without rolling back to another commitpoint
or trashing all uncommitted changes. In certain scenarios it can be used to
"repair" history of documents in distributed systems.
commit 72ae8db24c66dd6e5b5f892fa1a0e60bdbd362ab
Author: Simon Willnauer <simonw@...>
Date: 2018-05-09T17:10:36Z
[TEST] Never oversize bitset
commit e260f923ba3d9d93772291b0994b00011af978b0
Author: yonik <yonik@...>
Date: 2018-05-09T19:42:58Z
SOLR-12170: fix date format exceptions for terms facet on date field
----