Hi Terry,

On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry
<[email protected]> wrote:
> This is one of the reasons you really can't trust the information found in
> leader position 9, and it is why, when I wrote MarcEdit, I used a mixed
> process for working with data and determining the character set: a process
> that reads this byte and takes the information under advisement, but in the
> end treats it as a suggestion, one part of a larger heuristic analysis of
> the record data to determine whether it is in UTF-8 or not.  Fortunately,
> determining whether a set of data is in UTF-8 or something else is a fairly
> easy process.  Determining what the something else is, is much more
> difficult, but generally not necessary.
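For anyone following along, here is a minimal Python sketch of the general idea described above: treat Leader/09 as a hint and let a strict UTF-8 decode of the record bytes make the call. This is not MarcEdit's code and not a pymarc API. The function names are hypothetical, the sketch works on one raw ISO 2709 record as bytes, and it assumes (as a simplification) that data failing UTF-8 validation is "something else" without trying to identify it further.

```python
from typing import Optional

# Hypothetical helper names; this is not MarcEdit's or pymarc's code.

LEADER_LENGTH = 24  # a MARC 21 leader is always 24 bytes


def leader_says_utf8(record: bytes) -> bool:
    """Read the hint in Leader/09: 'a' means UCS/Unicode, blank means MARC-8."""
    return len(record) >= LEADER_LENGTH and record[9:10] == b"a"


def content_is_utf8(data: bytes) -> Optional[bool]:
    """Judge the bytes themselves.

    Returns True if the data contains non-ASCII bytes and they all form
    valid UTF-8 sequences, False if strict UTF-8 decoding fails, and None
    if the data is pure 7-bit ASCII (valid UTF-8 whatever was intended,
    so it proves nothing either way).
    """
    if data.isascii():
        return None
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def guess_encoding(record: bytes) -> str:
    """Return 'utf-8' or 'other' for one raw record.

    The record content decides whenever it can; Leader/09 is consulted
    only as a tie-breaker when the content is pure ASCII.
    """
    verdict = content_is_utf8(record[LEADER_LENGTH:])
    if verdict is not None:
        return "utf-8" if verdict else "other"
    return "utf-8" if leader_says_utf8(record) else "other"
```

For example, a record whose leader has a blank in position 9 (claiming MARC-8) but whose body contains "café" encoded as UTF-8 comes back as "utf-8", while a record whose leader claims Unicode but whose body contains a MARC-8 style combining acute (byte 0xE2 followed by "e") comes back as "other". A strict decode works well as the deciding test because bytes from single-byte encodings rarely happen to form valid multi-byte UTF-8 sequences.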

Can you describe in a bit more detail how MarcEdit sniffs the record
to determine the encoding? This has come up enough times with pymarc
to make it worth implementing.

//Ed
