[
https://issues.apache.org/jira/browse/CONNECTORS-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Massiera resolved CONNECTORS-1681.
-----------------------------------------
Fix Version/s: ManifoldCF 2.21
Resolution: Fixed
r1895299
> TikaServiceRmeta: recordActivity can cause Database exception
> -------------------------------------------------------------
>
> Key: CONNECTORS-1681
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
> Project: ManifoldCF
> Issue Type: Bug
> Components: Tika service connector
> Affects Versions: ManifoldCF 2.20
> Reporter: Julien Massiera
> Assignee: Julien Massiera
> Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> Some files containing non-ASCII characters can cause Tika to throw an
> exception describing the parsing problem.
> The TikaServiceRmeta connector creates an activity record for every Tika
> exception and includes the exception description in it. When that description
> contains the offending characters, MCF gets an SQL exception on inserting
> the activity record into Postgres:
> {code:java}
> ERROR 2021-11-24T13:37:00,121 (Worker thread '41') -
> MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and
> restarting due to database connection reset: Database exception: SQLException
> doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database
> exception: SQLException doing query (22021): ERROR: invalid byte sequence for
> encoding "UTF8": 0x00 {code}
> To avoid this, we need to remove any non-ASCII characters from the exception
> description before recording the activity.
>
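The cleanup described above could look roughly like the following. This is a minimal sketch, not the change committed in r1895299: the class name ActivityDescriptionSanitizer and the method toPrintableAscii are hypothetical, and the choice to keep tab, LF and CR is an assumption. Note that the byte reported in the log is 0x00 (NUL), which PostgreSQL does not accept in text values, so the sketch drops control characters as well as everything above 0x7E.

```java
/**
 * Hypothetical helper (not part of ManifoldCF): reduces an exception
 * description to characters that are safe to store in an activity record.
 */
final class ActivityDescriptionSanitizer {

    private ActivityDescriptionSanitizer() {
        // Static utility, no instances.
    }

    /**
     * Keeps printable ASCII (0x20-0x7E) plus tab, LF and CR.
     * Everything else, including NUL (0x00), is dropped.
     * Returns null when the input is null.
     */
    static String toPrintableAscii(String description) {
        if (description == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(description.length());
        for (int i = 0; i < description.length(); i++) {
            char c = description.charAt(i);
            boolean printable = c >= 0x20 && c <= 0x7E;
            boolean whitespace = c == '\t' || c == '\n' || c == '\r';
            if (printable || whitespace) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The connector would then pass the sanitized string, rather than the raw exception message, as the description when it records the activity. Dropping characters loses some detail from the message; replacing them with a placeholder such as '?' would be an equally reasonable choice.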