[ https://issues.apache.org/jira/browse/JCR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886683#comment-17886683 ]

Julian Reschke commented on JCR-2576:
-------------------------------------
trunk: (2.9.0)
[8e68391a5|https://github.com/apache/jackrabbit/commit/8e68391a5b00a20cd3acd7f42a20c9d38930257b]
...in retired branches:
2.10: (2.9.0)
[8e68391a5|https://github.com/apache/jackrabbit/commit/8e68391a5b00a20cd3acd7f42a20c9d38930257b]
2.8: (2.8.2)
[8e68391a5|https://github.com/apache/jackrabbit/commit/8e68391a5b00a20cd3acd7f42a20c9d38930257b]
2.6: (2.6.6)
[8e68391a5|https://github.com/apache/jackrabbit/commit/8e68391a5b00a20cd3acd7f42a20c9d38930257b]
2.4: (2.4.6)
[8e68391a5|https://github.com/apache/jackrabbit/commit/8e68391a5b00a20cd3acd7f42a20c9d38930257b]
2.2:
[8e68391a5|https://github.com/apache/jackrabbit/commit/8e68391a5b00a20cd3acd7f42a20c9d38930257b]
> DbInputStream does not support mark()/reset() when exhausted.
> -------------------------------------------------------------
>
> Key: JCR-2576
> URL: https://issues.apache.org/jira/browse/JCR-2576
> Project: Jackrabbit Content Repository
> Issue Type: Bug
> Components: jackrabbit-core
> Affects Versions: 2.0
> Reporter: Julian Sedding
> Assignee: Thomas Mueller
> Priority: Major
> Fix For: 2.1
>
> Attachments: DbInputStream.patch
>
>
> The DbDataStore implementation uses a DbInputStream to read binary properties
> from the database. When a new binary property is created, Jackrabbit attempts
> to index it. Tika's CharsetDetector is used in the process; it marks the
> input stream, reads the first 8000 bytes, and then resets the stream.
> This results in the stack trace shown at the end of this issue if both of
> the following conditions hold:
> * the property is larger than the minRecordLength configuration of the
>   data store, and
> * the property is smaller than 8000 bytes, so the stream is already
>   exhausted when reset() is called.
>
> The DbInputStream needs to have the following properties:
> 1. lazy instantiation of the underlying stream
> 2. auto-close of the underlying stream when EOF is reached
> 3. full support for mark()/reset(), even if the underlying stream has
>    already been auto-closed due to 2.
>
> 12.03.2010 15:53:28 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 165)
> java.io.EOFException
>     at org.apache.jackrabbit.core.data.db.DbInputStream.reset(DbInputStream.java:180)
>     at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:156)
>     at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:156)
>     at org.apache.tika.parser.txt.CharsetDetector.setText(CharsetDetector.java:131)
>     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:77)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114)
>     at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
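
For illustration only, the three properties requested in the issue description (lazy instantiation, auto-close on EOF, and mark()/reset() that keeps working after the auto-close) could be sketched roughly as follows. This is not the code from DbInputStream.patch or from the commit referenced in this comment. The class name LazyMarkableStream and the Opener callback are hypothetical, the sketch is not thread-safe, and it ignores the readlimit argument by buffering everything read since the last mark().

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch; NOT the actual DbInputStream implementation.
class LazyMarkableStream extends InputStream {

    /** Hypothetical callback that opens the real stream on demand. */
    interface Opener {
        InputStream open() throws IOException;
    }

    private final Opener opener;
    private InputStream in;   // underlying stream, opened on first live read
    private boolean eof;      // true once the underlying stream hit EOF or was closed
    private byte[] buf;       // bytes read since mark(); null while no mark is set
    private int count;        // number of valid bytes in buf
    private int pos;          // next replay position in buf

    LazyMarkableStream(Opener opener) {
        this.opener = opener;
    }

    @Override
    public int read() throws IOException {
        if (buf != null && pos < count) {
            return buf[pos++] & 0xff;      // replay buffered bytes after reset()
        }
        if (eof) {
            return -1;
        }
        if (in == null) {
            in = opener.open();            // property 1: lazy instantiation
        }
        int b = in.read();
        if (b < 0) {
            eof = true;
            in.close();                    // property 2: auto-close at EOF
            in = null;
            return -1;
        }
        if (buf != null) {                 // remember the byte for a later reset()
            if (count == buf.length) {
                buf = Arrays.copyOf(buf, count * 2);
            }
            buf[count++] = (byte) b;
            pos = count;
        }
        return b;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public void mark(int readlimit) {
        // Keep bytes that were buffered but not yet replayed; readlimit is ignored.
        byte[] rest = (buf == null) ? new byte[0] : Arrays.copyOfRange(buf, pos, count);
        buf = Arrays.copyOf(rest, Math.max(64, rest.length));
        count = rest.length;
        pos = 0;
    }

    @Override
    public void reset() throws IOException {
        if (buf == null) {
            throw new IOException("mark() has not been called");
        }
        pos = 0;   // property 3: works even though 'in' may already be closed
    }

    @Override
    public void close() throws IOException {
        eof = true;
        buf = null;
        if (in != null) {
            in.close();
            in = null;
        }
    }
}
```

With a stream shorter than 8000 bytes, a caller that does mark(8000), reads to EOF, and then calls reset() gets the same bytes again from the buffer instead of an EOFException, which is the access pattern described for CharsetDetector above. An unbounded buffer is a simplification; a real implementation would probably want to honor readlimit.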