Hi,

You have to specify the Charset when creating the Writer. If you specify
no charset, Java uses the platform default (MacRoman in your case). This
question has nothing to do with Lucene; it is better suited to a general
XML or Java forum.
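
A minimal sketch of what I mean, assuming Java 7 or later for
StandardCharsets (on older versions, pass the charset name "UTF-8" to
the OutputStreamWriter constructor instead). The class name, method name
and <doc> element are made up for illustration; they are not taken from
your code, and the text is written without any XML escaping:

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8WriterExample {

    // Hypothetical helper: writes a tiny XML document to the given file.
    // The charset is passed explicitly, so the platform default
    // (e.g. MacRoman) is never used.
    static void writeXml(Path target, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            // The declared encoding must match the bytes actually written.
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            // No escaping of &, < or > is done here; this is only a sketch.
            out.write("<doc>" + text + "</doc>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("sample", ".xml");
        // \u00e9 is the character "é"; in UTF-8 it becomes two bytes.
        writeXml(target, "caf\u00e9");
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

By contrast, new FileWriter(file) without a charset argument encodes with
the platform default, which would explain a MacRoman file on your machine.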

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Patrick Diviacco [mailto:[email protected]]
> Sent: Monday, March 28, 2011 9:04 AM
> To: [email protected]
> Subject: file formats: MacRoman and UTF-8...
> 
> When I run my Lucene app and parse an XML file I get the following error
> due to some characters such as "é" written in the text file.
> 
> If I save the text file as UTF-8 with my text editor I don't have this
> issue, but when I create it with a Java app, it is saved as MacRoman.
> 
> How can I specify a different format with Java instead?
> 
> thanks
> 
> [CODE]Exception in thread "main" com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
> at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
> at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
> at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
> at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2792)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
> at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
> at org.apache.commons.digester.Digester.parse(Digester.java:1871)
> at CollectionIndexer.main(CollectionIndexer.java:111)[/CODE]

