[ https://issues.apache.org/jira/browse/TIKA-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gross updated TIKA-574:
-----------------------

    Attachment: tika-0.8-cp866.patch

I've reused the ngrams from cp1251 and written a custom byteMap. All Russian
letters used in cp1251 are also present in cp866, so no changes to the ngrams
are needed.
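
To illustrate the idea, here is a minimal sketch of how such a byteMap could
be built. It is not the code from the attached patch: the class name
Cp866ByteMapSketch and the method buildByteMap are hypothetical, and the
sketch assumes that the byteMap folds every cp866 letter to the byte of the
corresponding lower-case letter in cp1251 (so the cp1251 ngrams still match)
and maps everything else to a space. Only Russian and ASCII letters are
handled.

```java
import java.util.Arrays;

// Hypothetical sketch, not the attached patch: builds a lookup table that
// translates CP866 bytes into the lower-case CP1251 bytes that the existing
// cp1251 ngrams are assumed to be expressed in.
final class Cp866ByteMapSketch {

    /** Returns a 256-entry table: CP866 byte -> lower-case CP1251 byte, or 0x20. */
    static byte[] buildByteMap() {
        byte[] map = new byte[256];
        // Assumption: anything that is not a letter is treated as a space.
        Arrays.fill(map, (byte) 0x20);

        // ASCII letters fold to lower case.
        for (int b = 'A'; b <= 'Z'; b++) {
            map[b] = (byte) (b + 0x20);
        }
        for (int b = 'a'; b <= 'z'; b++) {
            map[b] = (byte) b;
        }

        // CP866 0x80-0x9F (upper-case A..YA) -> CP1251 0xE0-0xFF (lower case).
        for (int i = 0; i < 32; i++) {
            map[0x80 + i] = (byte) (0xE0 + i);
        }
        // CP866 0xA0-0xAF (first 16 lower-case letters) -> CP1251 0xE0-0xEF.
        for (int i = 0; i < 16; i++) {
            map[0xA0 + i] = (byte) (0xE0 + i);
        }
        // CP866 0xE0-0xEF (last 16 lower-case letters) -> CP1251 0xF0-0xFF.
        for (int i = 0; i < 16; i++) {
            map[0xE0 + i] = (byte) (0xF0 + i);
        }
        // CP866 0xF0 / 0xF1 (upper/lower IO) -> CP1251 0xB8 (lower-case IO).
        map[0xF0] = (byte) 0xB8;
        map[0xF1] = (byte) 0xB8;
        return map;
    }

    public static void main(String[] args) {
        byte[] map = buildByteMap();
        // Print the mapping of a few sample CP866 bytes.
        int[] samples = {0x80, 0xA0, 0xE0, 0xF1, 'Q', '1'};
        for (int s : samples) {
            System.out.printf("0x%02X -> 0x%02X%n", s, map[s] & 0xFF);
        }
    }
}
```

With a table like this, upper- and lower-case cp866 letters both land on the
same cp1251 lower-case byte, which is why the cp1251 ngrams can be reused
unchanged.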

I added an inner static class to CharsetRecog_sbcs and modified
CharsetDetector#createRecognizers to register it.


> Support for IBM866 (CP866) encoding in TXTParser
> ------------------------------------------------
>
>                 Key: TIKA-574
>                 URL: https://issues.apache.org/jira/browse/TIKA-574
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.8
>         Environment: GNU/Linux 2.6.35-23, openjdk6
>            Reporter: gross
>            Priority: Minor
>             Fix For: 0.8, 0.9, 1.0
>
>         Attachments: tika-0.8-cp866.patch
>
>
> There's no recognizer for CP866 (the DOS Russian encoding) in Tika yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
