Hi Terry,

On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry <[email protected]> wrote:
> This is one of the reasons you really can't trust the information found in
> position 9. This is one of the reasons why when I wrote MarcEdit, I utilize
> a mixed process when working with data and determining character set -- a
> process that reads this byte and takes the information under advisement, but
> in the end treats it more as a suggestion and one part of a larger heuristic
> analysis of the record data to determine whether the information is in UTF8
> or not. Fortunately, determining if a set of data is in UTF8 or something
> else is a fairly easy process. Determining the something else is much more
> difficult, but generally not necessary.
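For reference, here is a minimal Python sketch of one way such a check could work, assuming raw MARC 21 record bytes. It reads leader position 9 (where 'a' declares UCS/Unicode and a blank declares MARC-8) only as a hint, then confirms with a strict UTF-8 decode. This is not MarcEdit's actual algorithm, and the function names (leader_says_unicode, looks_like_utf8, guess_is_utf8) are hypothetical, not part of pymarc.

```python
from typing import Optional


def leader_says_unicode(record: bytes) -> Optional[bool]:
    """Return what leader position 9 claims, or None if it is missing or invalid."""
    if len(record) < 24:          # a MARC 21 leader is 24 bytes long
        return None
    flag = record[9:10]
    if flag == b"a":
        return True               # 'a' = UCS/Unicode
    if flag == b" ":
        return False              # blank = MARC-8
    return None                   # any other value: treat the byte as noise


def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 with no errors (strict mode)."""
    try:
        data.decode("utf-8")      # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def guess_is_utf8(record: bytes) -> bool:
    """Heuristic: trust the bytes first and the leader byte only as a tiebreaker."""
    if not looks_like_utf8(record):
        return False              # invalid UTF-8, whatever position 9 says
    if any(b >= 0x80 for b in record):
        # Non-ASCII bytes that still decode cleanly are very likely UTF-8;
        # MARC-8 diacritic sequences rarely form valid UTF-8 by accident.
        return True
    # Pure ASCII decodes identically either way, so defer to the leader.
    claim = leader_says_unicode(record)
    return claim if claim is not None else True
```

Because pure-ASCII records read the same under either encoding, the leader byte only decides the outcome when the bytes themselves carry no evidence.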
Can you describe in a bit more detail how MarcEdit sniffs the record to determine the encoding? This has come up enough times with pymarc to make it worth implementing.

//Ed
