https://bz.apache.org/bugzilla/show_bug.cgi?id=69172
Bug ID: 69172
Summary: PDF parse incorrect one character a line
Product: POI
Version: unspecified
Hardware: PC
Status: NEW
Severity: critical
Priority: P2
Component: POI Overall
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Created attachment 39793
--> https://bz.apache.org/bugzilla/attachment.cgi?id=39793&action=edit
the incorrect result parsed by Tika and Tika Server 2.9.2 and 3.0beta
The attached PDF is not parsed correctly by Tika 2.9.2 or 3.0beta, in either
the server version or the standalone version.
A line break is inserted after every character. This happens to symbols,
English letters, and CJK characters.
In the server version:
  curl -g -T "sample.pdf" http://localhost:889/tika --header "Accept: text/plain"
In the standalone version:
  java.exe -jar "C:\TikaSearch\tika-app-2.9.2.jar" --text
Both of the above produce the incorrect result for the attached PDF.
The bug appears only with documents from some scanner models; other scanned
documents are parsed correctly.
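To triage other scanned documents for the same symptom, the extracted text can be checked for lines that hold only a single character. The following is a minimal sketch: the class name OneCharLineCheck and its methods are hypothetical and not part of Tika or POI, and the 0.8 threshold is an arbitrary assumption.

```java
/**
 * Hypothetical triage helper (not part of Tika or POI). It estimates how much
 * of an extracted text consists of lines holding a single character, which is
 * the symptom reported here (a line break after every character).
 */
public final class OneCharLineCheck {

    private OneCharLineCheck() {}

    /** Fraction of non-blank lines whose trimmed content is exactly one code point. */
    public static double singleCharLineRatio(String text) {
        // \R matches any line break sequence (\n, \r\n, \r, ...)
        String[] lines = text.split("\\R");
        long nonBlank = 0;
        long singleChar = 0;
        for (String line : lines) {
            String trimmed = line.strip();
            if (trimmed.isEmpty()) {
                continue;
            }
            nonBlank++;
            // Count code points so a supplementary CJK character counts as one
            if (trimmed.codePointCount(0, trimmed.length()) == 1) {
                singleChar++;
            }
        }
        return nonBlank == 0 ? 0.0 : (double) singleChar / nonBlank;
    }

    /** Assumption: text is "broken" when at least 80% of non-blank lines hold one character. */
    public static boolean looksBroken(String text) {
        return singleCharLineRatio(text) >= 0.8;
    }

    public static void main(String[] args) {
        String broken = "H\ne\nl\nl\no\n日\n本\n";
        String normal = "Hello world\n日本語のテキスト\n";
        System.out.println(looksBroken(broken)); // true
        System.out.println(looksBroken(normal)); // false
    }
}
```

The text returned by either command above could be saved and passed to looksBroken; a true result matches the reported symptom.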