Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

Ed Summers Thu, 08 Mar 2012 12:23:05 -0800

Hi Terry,

On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry
<[email protected]> wrote:
> This is one of the reasons you really can't trust the information found in
> leader position 9.  It is also why, when I wrote MarcEdit, I used a mixed
> process for working with data and determining the character set: a process
> that reads this byte and takes the information under advisement, but in the
> end treats it more as a suggestion and one part of a larger heuristic
> analysis of the record data to determine whether the information is in UTF-8
> or not.  Fortunately, determining whether a set of data is in UTF-8 or
> something else is a fairly easy process.  Determining the something else is
> much more difficult, but generally not necessary.


Can you describe in a bit more detail how MarcEdit sniffs the record
to determine the encoding? This has come up enough times w/ pymarc to
make it worth implementing.
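
To make the question concrete, here is a minimal sketch of the kind of
check I have in mind. It is not MarcEdit's actual method, which I don't
know; the function name sniff_encoding is hypothetical, and it assumes
that anything that fails strict UTF-8 validation can be treated as MARC-8.
It reads leader position 9 ('a' means Unicode, blank means MARC-8) but
only uses it as a tie-breaker when the record is pure ASCII.

```python
def sniff_encoding(record: bytes) -> str:
    """Guess 'utf-8' or 'marc-8' for one raw MARC record (hypothetical helper)."""
    # Leader/09: 'a' = UCS/Unicode, blank = MARC-8. Treated as a hint only.
    leader_says_utf8 = record[9:10] == b"a"

    # Pure ASCII is valid under both encodings, so the bytes cannot decide;
    # fall back to whatever the leader claims.
    if all(byte < 0x80 for byte in record):
        return "utf-8" if leader_says_utf8 else "marc-8"

    # Non-ASCII bytes are present: strict UTF-8 validation is the real test.
    try:
        record.decode("utf-8")
    except UnicodeDecodeError:
        # "Something else"; assumed here to be MARC-8.
        return "marc-8"
    return "utf-8"
```

The leader is deliberately ignored once non-ASCII bytes are present,
since the bytes themselves are better evidence than a flag that may have
been set wrong on export.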

//Ed
