Julien Massiera created CONNECTORS-1681:
-------------------------------------------

Summary: TikaServiceRmeta: recordActivity can cause Database exception
Key: CONNECTORS-1681
URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
Project: ManifoldCF
Issue Type: Bug
Components: Tika service connector
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera
Assignee: Julien Massiera

Some files containing non-ASCII characters can cause Tika to throw an exception describing the parsing problem. The TikaServiceRmeta connector creates an activity record for every Tika exception and includes the exception description in that record, so in those cases the record contains the offending characters. This causes an SQL exception when MCF tries to insert the activity record into Postgres:

{code:java}
ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
{code}

To avoid this, we need to remove any non-ASCII characters from the exception description before recording the activity.
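
The proposed cleanup could look roughly like the sketch below. It is only an illustration, not the committed fix: the class name `ActivityDescriptionSanitizer` and the method `sanitize` are hypothetical and are not part of the ManifoldCF API. Note that the byte reported in the log, 0x00, is the NUL control character, which is itself within the ASCII range, so this sketch assumes that control characters should be dropped as well and keeps only printable ASCII (0x20 to 0x7E).

```java
// Hypothetical helper for illustration only; not part of the ManifoldCF API.
final class ActivityDescriptionSanitizer {

    private ActivityDescriptionSanitizer() {
        // Static utility class, no instances.
    }

    /**
     * Returns the description with every character outside the printable
     * ASCII range (0x20 to 0x7E) removed. This drops non-ASCII characters
     * and control characters, including NUL (0x00).
     * A null description yields an empty string.
     */
    static String sanitize(String description) {
        if (description == null) {
            return "";
        }
        StringBuilder cleaned = new StringBuilder(description.length());
        for (int i = 0; i < description.length(); i++) {
            char c = description.charAt(i);
            // Keep printable ASCII only; both halves of a surrogate pair
            // fall outside this range and are therefore removed too.
            if (c >= 0x20 && c <= 0x7E) {
                cleaned.append(c);
            }
        }
        return cleaned.toString();
    }
}
```

Under this assumption, the connector would pass `ActivityDescriptionSanitizer.sanitize(e.getMessage())` as the description when recording the activity, instead of the raw exception message. Dropping characters rather than replacing them keeps the sketch short; replacing them with a placeholder such as `?` would preserve the position of the removed characters if that matters for diagnosis.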