Re: Mozilla Charset Detectors

Gabriel Sandor Tue, 30 May 2017 06:42:46 -0700

The fragments can come from arbitrary sources that are out of my control.
I therefore may not get the charset of the original document, so all I'm
left with is heuristic detection for those fragments. The application must
be able to deal with any XML it receives; it doesn't impose any particular
structure or content (think of XML editors like Notepad++).


Besides XML, there are also plain text files, which have no standard way of
declaring their encoding (a BOM only covers the Unicode encodings).

No matter how much I'd like to avoid it, there are cases when heuristic
encoding detection is the only option.
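
To make the fallback order concrete, here is a rough sketch in Python of
what I mean. It is not Mozilla's detector and not what my application
actually ships; the function name guess_encoding and the windows-1252
fallback are assumptions made up for illustration. It checks for a BOM,
then for an XML declaration (which only works when the bytes are
ASCII-compatible), then tries strict UTF-8, and only then falls back to a
legacy guess. A real heuristic detector would replace that last step.

```python
import codecs
import re

# BOMs, with UTF-32 LE checked before UTF-16 LE because the former
# starts with the same two bytes as the latter.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Lenient match for an XML declaration; leading whitespace is tolerated
# because the fragments are not guaranteed to be spec compliant.
_XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Return a best-effort encoding label for a fragment (hypothetical helper)."""
    # 1. A BOM is the strongest signal.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Trust the XML declaration if Python knows the declared label.
    match = _XML_DECL.match(data)
    if match:
        declared = match.group(1).decode("ascii")
        try:
            codecs.lookup(declared)
            return declared.lower()
        except LookupError:
            pass  # unknown label, keep going

    # 3. Bytes that decode as strict UTF-8 are very likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Assumed legacy fallback; a statistical detector belongs here.
        return fallback
```

For example, guess_encoding(b"<a>\xe9</a>") falls through to the last step,
because a lone 0xE9 byte is not valid UTF-8.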

On Fri, May 26, 2017 at 9:45 PM, Daniel Veditz <dved...@mozilla.com> wrote:

> On Fri, May 26, 2017 at 4:12 AM, <gabi.t.san...@gmail.com> wrote:
>
>> Still, sometimes XML fragments come up, and even if they are not 100% XML
>> spec compliant I still have to process them. This includes encoding
>> detection as well, when the XML declaration is missing from the fragments.
>>
>
> Where do the fragments come from? If you pulled them out of a document
> then you should have a charset (even if we have to guess at the document
> level). If you only get the fragments through an API the charset should be
> passed along as an argument to the API, otherwise treat them as Henri
> described above.
>
> -Dan Veditz
>
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
