Julien Massiera created CONNECTORS-1681:
-------------------------------------------
Summary: TikaServiceRmeta: recordActivity can cause Database
exception
Key: CONNECTORS-1681
URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
Project: ManifoldCF
Issue Type: Bug
Components: Tika service connector
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera
Assignee: Julien Massiera
Some files containing non-ASCII characters can cause Tika to throw an
exception describing the parsing problem.
Because the TikaServiceRmeta connector creates an activity record for every
Tika exception and includes the exception description in that record, the
description can carry the offending characters with it (a 0x00 byte in the
log below). When MCF then tries to insert the activity record into Postgres,
the insert fails with an SQL exception:
{code:java}
ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
{code}
To avoid this, we need to remove any non-ASCII characters from the exception
description before recording the activity.
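
A minimal sketch of what such a cleanup could look like, assuming the
description is available as a plain Java String before it is handed to the
activity record. The class and method names (ActivityMessageSanitizer,
sanitize) are hypothetical and are not part of ManifoldCF; this is not the
actual fix. Because the log above complains about a 0x00 byte, the sketch
drops control characters as well as non-ASCII ones, keeping only printable
ASCII plus tab and newline:

```java
public class ActivityMessageSanitizer {

  // Hypothetical helper (not a ManifoldCF API): keeps printable ASCII
  // (0x20..0x7E) plus tab and newline, and drops everything else,
  // including NUL bytes and non-ASCII characters.
  public static String sanitize(String description) {
    if (description == null) {
      return "";
    }
    StringBuilder cleaned = new StringBuilder(description.length());
    for (int i = 0; i < description.length(); i++) {
      char c = description.charAt(i);
      boolean printableAscii = c >= 0x20 && c <= 0x7E;
      if (printableAscii || c == '\n' || c == '\t') {
        cleaned.append(c);
      }
    }
    return cleaned.toString();
  }

  public static void main(String[] args) {
    // Made-up exception text containing a NUL and a non-ASCII character.
    String raw = "Parse error\u0000 in caf\u00e9.pdf";
    System.out.println(sanitize(raw)); // prints "Parse error in caf.pdf"
  }
}
```

The cleaned string would then be used as the description when recording the
activity. Dropping characters loses some detail from the original message;
replacing them with a placeholder such as '?' would be an alternative if the
position of the removed characters matters.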