Re: Mozilla Charset Detectors

Joshua Cranmer 🐧 Tue, 23 May 2017 09:50:36 -0700

On 5/23/17 2:58 AM, Gabriel Sandor wrote:

Hello Henri,


I was afraid this might be the case, so the library really is deprecated.

The project I'm working on involves a multilingual environment, users, and
files, so yes, having a good encoding detector is important. Thanks for the
alternative recommendations. I see that they are C/C++ libraries, but in
theory they can be wrapped in a managed C++ .NET assembly and consumed by
a C# project. I haven't yet seen any existing C# ports that also handle
charset detection.

You only need charset detection if you can't get reliable charsets
passed around. Most word processing formats embed the charset they use
in the document (or just use UTF-8 unconditionally), so you only need
charset detection if you're getting lots of multilingual plain text (or
plain text-ish formats like markdown or HTML), and even then, only if
you expect the charset information to be unreliable. It's also worth
pointing out that letting users override the charset information on a
per-file basis goes a very long way to avoiding the need for charset
detection.
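
To make that ordering concrete, here is a minimal sketch in Python rather than C#, purely for illustration. The function name decode_text, its parameters, and the windows-1252 fallback are all assumptions made for this example and not part of any existing library. It tries a per-file user override first, then the declared charset, then strict UTF-8, and only at the end reaches the point where a real detector would have to be plugged in.

```python
import codecs
from typing import Optional, Tuple


def _is_known(label: Optional[str]) -> bool:
    """Return True if Python has a codec registered under this label."""
    if not label:
        return False
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def decode_text(
    data: bytes,
    user_override: Optional[str] = None,   # hypothetical per-file setting
    declared: Optional[str] = None,        # e.g. from a header or meta tag
    fallback: str = "windows-1252",        # assumed last-resort charset
) -> Tuple[str, str]:
    """Decode bytes and return (text, charset_used)."""
    # 1. An explicit user override always wins, even if some bytes are bad.
    if _is_known(user_override):
        return data.decode(user_override, errors="replace"), user_override

    # 2. Trust the declared charset, then strict UTF-8, but only if the
    #    bytes actually decode without errors.
    for label in (declared, "utf-8"):
        if not _is_known(label):
            continue
        try:
            return data.decode(label), label
        except UnicodeDecodeError:
            continue

    # 3. This is where a charset detector would go. The sketch just falls
    #    back to a single-byte charset and replaces undecodable bytes.
    return data.decode(fallback, errors="replace"), fallback
```

Note that step 2 cannot catch a declared charset that is wrong but still decodes cleanly (most single-byte charsets accept almost any input), which is exactly the unreliable case where the override or a detector earns its keep.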


--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
