On 5/23/17 2:58 AM, Gabriel Sandor wrote:
Hello Henri,

I was afraid this might be the case, so the library really is deprecated.

The project I'm working on involves a multilingual environment, users, and
files, so yes, having a good encoding detector is important. Thanks for the
alternative recommendations. I see that they are C/C++ libraries, but in
theory they can be wrapped in a managed C++.NET assembly and consumed by
a C# project. I haven't yet seen any existing C# ports that also handle
charset detection.

You only need charset detection if you can't get reliable charset information passed along with the data. Most word processing formats embed the charset they use in the document (or just use UTF-8 unconditionally), so detection only matters if you're getting lots of multilingual plain text (or plain-text-ish formats like Markdown or HTML), and even then only if you expect the declared charset to be unreliable. It's also worth pointing out that letting users override the charset on a per-file basis goes a very long way toward avoiding the need for charset detection.
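
To make that order of preference concrete, here is a rough sketch in Python (your project is C#, but the logic carries over directly). Everything in it is an assumption for illustration: the function name pick_charset, its parameters, and the windows-1252 fallback are made up, and the final "is it valid UTF-8?" check is only a cheap stand-in for a real detector, not a replacement for one.

```python
import codecs
from typing import Optional


def pick_charset(
    data: bytes,
    declared: Optional[str] = None,       # e.g. from a Content-Type header or file metadata
    user_override: Optional[str] = None,  # per-file choice made by the user
    fallback: str = "windows-1252",       # hypothetical legacy default
) -> str:
    """Return the normalized Python codec name to decode `data` with."""
    # A per-file user override wins, then the declared charset.
    for label in (user_override, declared):
        if label:
            try:
                return codecs.lookup(label).name
            except LookupError:
                pass  # unknown label: ignore it and try the next source

    # No usable label. A UTF-8 byte order mark is a strong signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    # Text that decodes cleanly as UTF-8 is almost certainly UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # This is the only point where a real detector would be needed.
        return codecs.lookup(fallback).name
```

Only the last branch is where a detection library would slot in; if labels and overrides cover most of your files, that branch should be hit rarely.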

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
