#589: bibclassify: text needs preprocessing
------------------------------------+----------------------
Reporter: jpcorral | Owner: jpcorral
Type: defect | Status: new
Priority: major | Milestone:
Component: BibClassify | Version:
Keywords: ligature textification |
------------------------------------+----------------------
When text is extracted from a PDF using pdftotext, ligature characters
can appear in the output, like this: [...]role of medium eﬀects was
emphasized.[...]
After the fix is applied: [...]role of medium effects was
emphasized.[...]
The text can be normalized in the function normalize_fulltext of
bibclassify_text_normalizer.py
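
A minimal sketch of such a normalization step, assuming the extracted
text is available as a Unicode string (Python 3). The helper name
replace_ligatures and the LIGATURES table are hypothetical and not part
of BibClassify; something along these lines could be called from
normalize_fulltext:

```python
# Hypothetical helper, not existing Invenio code.
# Maps the Latin ligature code points U+FB00..U+FB06 to plain letters.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature
    "\ufb06": "st",
}


def replace_ligatures(text):
    """Return text with each known ligature replaced by its letters."""
    for ligature, expansion in LIGATURES.items():
        text = text.replace(ligature, expansion)
    return text


if __name__ == "__main__":
    sample = "role of medium e\ufb00ects was emphasized."
    print(replace_ligatures(sample))
```

An explicit table is used here instead of unicodedata.normalize("NFKC",
text) because NFKC also rewrites other compatibility characters, which
may or may not be wanted for keyword extraction.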
Reference: ticket #317 shows the same ligature problem in the
bibdocfile module.
--
Ticket URL: <https://invenio-software.org/ticket/589>
Invenio <http://invenio-software.org>