#589: bibclassify: text needs preprocessing
------------------------------------+----------------------
 Reporter:  jpcorral                |      Owner:  jpcorral
     Type:  defect                  |     Status:  new
 Priority:  major                   |  Milestone:
Component:  BibClassify             |    Version:
 Keywords:  ligature textification  |
------------------------------------+----------------------
 When text is extracted from a PDF with pdftotext, some ligatures can
 appear, as in: [...]role of medium eﬀects was emphasized.[...]
 After the solution is applied: [...]role of medium effects was
 emphasized.[...]

 The text can be normalized in the function normalize_fulltext of
 bibclassify_text_normalizer.py.
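
 A minimal sketch of such a normalization step follows. It assumes the fulltext is a Python 3 str and that an explicit table of the Unicode ligature code points U+FB00 to U+FB06 is sufficient. The helper name replace_ligatures is hypothetical, not an existing BibClassify function. It would be called from normalize_fulltext.

```python
# Hypothetical helper (not part of BibClassify): expands the Latin
# typographic ligatures from the Alphabetic Presentation Forms block
# (U+FB00-U+FB06) into their plain letter sequences.
_LIGATURES = {
    0xFB00: "ff",   # LATIN SMALL LIGATURE FF
    0xFB01: "fi",   # LATIN SMALL LIGATURE FI
    0xFB02: "fl",   # LATIN SMALL LIGATURE FL
    0xFB03: "ffi",  # LATIN SMALL LIGATURE FFI
    0xFB04: "ffl",  # LATIN SMALL LIGATURE FFL
    0xFB05: "st",   # LATIN SMALL LIGATURE LONG S T
    0xFB06: "st",   # LATIN SMALL LIGATURE ST
}


def replace_ligatures(text):
    """Return text with ligature code points replaced by plain letters."""
    # str.translate maps each code point found in the table to its string.
    return text.translate(_LIGATURES)


if __name__ == "__main__":
    sample = "role of medium e\ufb00ects was emphasized."
    print(replace_ligatures(sample))
```

 unicodedata.normalize("NFKC", text) also expands these ligatures, but it rewrites every compatibility character (for example superscript digits), so an explicit table keeps the change narrower.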


 Reference: ticket #317 shows the same ligature problem in the
 bibdocfile module.

-- 
Ticket URL: <https://invenio-software.org/ticket/589>
Invenio <http://invenio-software.org>
