Sorry for not being able to respond sooner. My colleague managed to parse the UTF-8 XML records using Java over the weekend.
BTW, being new to XML, this is a great help, Ed. I'm going to deal with a lot of OAI and XML in my next project. It's time to get my hands dirty with these modules...

Best regards,
Saiful

On 11/13/05, Edward Summers <[EMAIL PROTECTED]> wrote:
>
> > Please find attached the file I'm trying to parse. It is extracted
> > from an OAI Data Provider in oai_dc format. The challenge is to
> > preserve the Thai characters encoded in UTF-8.
>
> I see these are the result of oai-pmh GetRequests. If you like you
> can use the SAX handler in Net::OAI::Harvester directly to extract
> record objects like so:
>
> #!/usr/bin/perl
>
> use strict;
> use XML::SAX::ParserFactory;
> use Net::OAI::Record::OAI_DC;
>
> my $file = shift;
> my $factory = XML::SAX::ParserFactory->new();
> my $record = Net::OAI::Record::OAI_DC->new();
> my $parser = $factory->parser(Handler => $record);
>
> # parse the file
> $parser->parse_uri($file);
>
> # print out the title
> print $record->title();
>
> That is a script that takes the filename as an argument and prints
> out the title. For info about utf8 and perl your best bet is to read
> about it in the Camel book (imho). As for a utf8 safe MARC::Record I
> believe it's not on CPAN yet, although you can get it out of
> SourceForge. Andy Lester manages the CPANification of MARC::Record.
>
> XSL is the logical choice for transforming one version of XML to
> another. However if you need to parse XML to stuff rows into a
> database it isn't that logical...at least for me.
>
> //Ed