GitHub user IntraCherche opened a pull request:

    https://github.com/apache/lucene-solr/pull/530

    Branch 7 4 : IndexOutOfBoundsException randomly appearing while indexing

    Hi,
    
    First of all, sorry if this is not the right place to report the issue I am
    encountering. I saw that other people were having the same problem on Stack Overflow
    (https://stackoverflow.com/questions/52783491/solr-indexing-error-possible-analysis-error),
    although no solution has been provided so far. In any case, feel free to reroute my
    pull request.
    
    While indexing documents with Solr 7.4, the following exception appears
    randomly. It is very annoying because I cannot reproduce it on my machine and I
    don't have access to the customer's data. All I know is that it can appear on PDF,
    XLS, or EML documents. It looks like the issue stems from the FlattenGraphFilter
    class, specifically the call _restoreState(inputNode.tokens.get(inputNode.nextOut))_.
    The stack trace shows:
    
     Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:204)
        at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:738)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1602)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
        at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:982)
        at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:971)
        at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:348)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:284)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:234)
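
    For illustration only: the message "Index: 1, Size: 1" means a list holding
    one element was asked for the element at index 1. The sketch below is a
    standalone example of that failure and of a bounds check around it. It is
    not Lucene code and not a proposed fix; the class name, the method names and
    the String token type are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration; NOT Lucene code. All names here are hypothetical.
final class NextOutSketch {

    // True when nextOut still points at a buffered token.
    static boolean hasPending(List<String> tokens, int nextOut) {
        return nextOut >= 0 && nextOut < tokens.size();
    }

    // Returns the token at nextOut, or null instead of throwing.
    static String pendingOrNull(List<String> tokens, int nextOut) {
        return hasPending(tokens, nextOut) ? tokens.get(nextOut) : null;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        tokens.add("only-token");
        int nextOut = 1; // same index and size as in the trace above

        try {
            // Unguarded access: index == size, so this throws.
            tokens.get(nextOut);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded get failed: " + e.getMessage());
        }

        // Guarded access returns null rather than throwing.
        System.out.println("guarded lookup: " + pendingOrNull(tokens, nextOut));
    }
}
```

    Whether a guard like this would be appropriate inside the filter, or would
    only hide an inconsistent internal state, is something I cannot judge
    without a reproducing document.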
    
    If more information is needed, I would be delighted to provide it, as long as
    I have it (e.g. the documents are customer-confidential, so I don't have access
    to them).
    
    Kind regards
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/lucene-solr branch_7_4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/530.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #530
    
----
commit 1ed95c097b82ee5f175e93f3fe62572abe064da6
Author: Jim Ferenczi <jim.ferenczi@...>
Date:   2018-05-03T14:11:52Z

    LUCENE-8231: Add missing part of speech filter in the SPI META-INF file

commit 0e45b39476f287fe56ec4734c5c8695a99f8cb5d
Author: Christine Poerschke <cpoerschke@...>
Date:   2018-05-03T17:42:56Z

    json-facet-api.adoc typo fix.

commit dad48603aec715063fdcb71e11fe73599d63c3a2
Author: Simon Willnauer <simonw@...>
Date:   2018-05-03T07:29:12Z

    LUCENE-8293: Ensure only hard deletes are carried over in a merge
    
    Today we carry over hard deletes based on the SegmentReaders liveDocs.
    This is not correct if soft-deletes are used especially with retention
    policies. If a soft delete is added while a segment is merged the document
    might end up hard deleted in the target segment. This isn't necessarily a
    correctness issue but causes unnecessary writes of hard-deletes. The biggest
    issue here is that we assert that previously deleted documents are still deleted
    in the live-docs we apply and that might be violated by the retention policy.

commit 27992e3386f7ba9521329ec8a1957103d73da2e4
Author: Adrien Grand <jpountz@...>
Date:   2018-05-04T12:38:42Z

    LUCENE-8295: Remove useless liveDocsSharedPending flag.

commit 6ec1198a5144e73b47bc88cd79f534878cbcbef4
Author: Anshum Gupta <anshum@...>
Date:   2018-05-03T22:00:47Z

    SOLR-11277: Add auto hard commit setting based on tlog size (this closes #358)

commit 2558a06f3c8040d47de6606da2539532e4c67854
Author: Erick Erickson <erick@...>
Date:   2018-05-05T05:30:18Z

    SOLR-12028: BadApple and AwaitsFix annotations usage
    
    (cherry picked from commit 89fc02a)

commit 96f079b4b47eaadff65c7aaf0e5bafe68e30ec3b
Author: Uwe Schindler <uschindler@...>
Date:   2018-05-06T12:21:34Z

    SOLR-12316: Do not allow to use absolute URIs for including other files in solrconfig.xml and schema parsing
    
    # Conflicts:
    #   solr/CHANGES.txt

commit 709782ac9d50e20da5745aa6fa2351b6b6757b20
Author: Mikhail Khludnev <mkhl@...>
Date:   2018-05-06T13:46:56Z

    SOLR-8998: documentation fix.

commit 7f761977171d86323848151f41be82de0b6e3f65
Author: Uwe Schindler <uschindler@...>
Date:   2018-05-06T13:53:07Z

    SOLR-12316: Fix test to work on linux and test also windows in a better way

commit c43c15409ab204e53622dd6ddb89a78d0cda0389
Author: Ishan Chattopadhyaya <ishan@...>
Date:   2018-05-06T17:09:10Z

    Synchronizing the Solr CHANGES.txt

commit 7db633910424244356652944d4aca6184807faf6
Author: Ishan Chattopadhyaya <ishan@...>
Date:   2018-05-06T17:30:21Z

    SOLR-12316: Moving the changelog entry to 7.3.1 from 7.4.0

commit b72af046c5bd04eec4e84700a2ee20ab5a833e39
Author: Mark Miller <markrmiller@...>
Date:   2018-05-05T01:02:56Z

    SOLR-12293: Updates need to use their own connection pool to maintain connection reuse and prevent spurious recoveries.

commit 5d6e47eaed05d4c305560c88f2e3393a45d1dbd8
Author: Simon Willnauer <simonw@...>
Date:   2018-05-07T10:01:32Z

    LUCENE-8275: Suppress WindowsFS TestDirectoryTaxonomyWriter
    
    TestDirectoryTaxonomyWriter#testRecreateAndRefresh can't deal with pending
    files since it creates multiple IW instances on the same directory.

commit 30175b6410579b6d21a59ccad2a03dd03f89d7c5
Author: Dawid Weiss <dweiss@...>
Date:   2018-05-07T11:22:11Z

    LUCENE-8261: InterpolatedProperties.interpolate and recursive property references.

commit 0600a58f005ade7559a765f058f2b91f898925af
Author: Jason Gerlowski <gerlowskija@...>
Date:   2018-05-07T11:01:01Z

    SOLR-12279: Reject invalid 'blockUnknown' values for 'bin/solr auth'

commit 48a2138e899bca0b8a8485fb7e490a9dc943d997
Author: Simon Willnauer <simonw@...>
Date:   2018-05-05T07:55:58Z

    LUCENE-8297: Add IW#tryUpdateDocValues(Reader, int, Fields...)
    
    IndexWriter can update doc values for a specific term but this might
    affect all documents containing the term. With tryUpdateDocValues
    users can update doc-values fields for individual documents. This allows
    for instance to soft-delete individual documents.
    The new method shares most of its code with tryDeleteDocuments.

commit 93f9cc71b127b84c432bf40e115fd4afe689bd8a
Author: David Smiley <dsmiley@...>
Date:   2018-05-07T18:54:11Z

    SOLR-12312: Replication's IndexFetcher buf size should be initialized
    to an amount no greater than the size of the file being transferred.
    
    (cherry picked from commit 81f6112)

commit 572490088149e6fc498b1a1ca739f277c7364e00
Author: Dawid Weiss <dawid.weiss@...>
Date:   2018-05-07T19:34:59Z

    LUCENE-8261: non-recursive->recursive (javadoc update).

commit 6b1a64e1e6041e1205c3180c08365bbab18096a1
Author: David Smiley <dsmiley@...>
Date:   2018-05-08T02:17:30Z

    SOLR-12308: LISTALIASES is now assured to return an up-to-date response
    * MiniSolrCloudCluster.deleteAllCollections will now first delete aliases
    * Minor refactorings to AliasesManager, AliasIntegrationTest, CreateRoutedAliasTest
    
    (cherry picked from commit 08ee037)

commit 4cab3eba9c751b6364bf89f6d4dcd604985edba2
Author: Erick Erickson <erick@...>
Date:   2018-05-08T16:54:40Z

    SOLR-12192: Error when ulimit is unlimited
    
    (cherry picked from commit abb57c5)

commit cfa4a82b3b0462a3d921bddfbaaf97f7cabb24a4
Author: David Smiley <dsmiley@...>
Date:   2018-05-08T19:10:07Z

    SOLR-12258: A V2 request referencing a collection or alias may fail to resolve it if it was just recently created.
    Now we sync with ZooKeeper and try one more time.  V1 partially did this but only for aliases; now it does both.
    
    (cherry picked from commit c3d28a5)

commit 586fb0272f8df6dc5835061e599e5728ae918ca9
Author: Varun Thacker <varun@...>
Date:   2018-05-09T03:50:50Z

    SOLR-12265: Upgrade to Jetty 9.4.10
    
    (cherry picked from commit 1705e4f)

commit 7e19c6c32d68f5db65536d865e857a1e3b009729
Author: Simon Willnauer <simonw@...>
Date:   2018-05-09T08:04:18Z

    [TEST] Fix TestNorms to ensure that max token length is at least 3 to have predictable norms

commit 5021fb185fdad94b36cbdddf7d5e4f2d056213a0
Author: Mikhail Khludnev <mkhl@...>
Date:   2018-05-09T09:47:46Z

    SOLR-12303: documenting default \[subquery].qt trap.

commit dd98a98bdcaa9ba7f97a88646b0f2973a04c8581
Author: Dawid Weiss <dweiss@...>
Date:   2018-05-09T11:28:03Z

    LUCENE-8301: Update randomizedtesting to 2.6.0

commit c5767d350e7f5b514609cac9db6f07893a5d29a0
Author: Adrien Grand <jpountz@...>
Date:   2018-05-09T12:31:23Z

    LUCENE-8303: Make LiveDocsFormat only responsible for serialization/deserialization of live docs.

commit 52e72467bb2176c1e7bc9e0d75dd71a8582dc67e
Author: Adrien Grand <jpountz@...>
Date:   2018-05-09T13:12:23Z

    LUCENE-8296: PendingDeletes may no longer write to live docs after they are shared.

commit 0c65af048b3a497ea2e95e48c886b3a653412a0c
Author: Simon Willnauer <simonw@...>
Date:   2018-05-07T09:52:51Z

    LUCENE-8298: Allow DocValues updates to reset a value
    
    Today once a document has a value in a certain DV field this value
    can only be changed but not removed. While resetting / removing a value
    from a field is certainly a corner case it can be used to undelete a
    soft-deleted document unless it's merged away.
    This allows rolling back changes without rolling back to another commitpoint
    or trashing all uncommitted changes. In certain scenarios it can be used to
    "repair" history of documents in distributed systems.

commit 72ae8db24c66dd6e5b5f892fa1a0e60bdbd362ab
Author: Simon Willnauer <simonw@...>
Date:   2018-05-09T17:10:36Z

    [TEST] Never oversize bitset

commit e260f923ba3d9d93772291b0994b00011af978b0
Author: yonik <yonik@...>
Date:   2018-05-09T19:42:58Z

    SOLR-12170: fix date format exceptions for terms facet on date field

----

