Jarrett Billingsley wrote:
> On Tue, Jan 6, 2009 at 8:04 PM, james <jame...@gmail.com> wrote:
> > I'm writing an indexer, but I'm having a problem: when I read some
> > files, I get this error:
> >
> > Error 4: invalid UTF-8 sequence
> >
> > Is there a way to fix it?
>
> You're probably reading a file that's encoded in some non-Unicode
> encoding, like Latin-1. You could read in the file data as byte[]
> instead of as char[], but that still doesn't deal with the problem
> that you have characters in your file that are outside the ASCII
> range. If you know what encoding your file uses, you could do some
> transformations on it to turn it into valid Unicode, or you could just
> ignore characters outside the ASCII range :P
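Both options from the reply can be sketched as below. This is Python rather than the language used in the thread, and it assumes that any file which is not valid UTF-8 is Latin-1 (ISO-8859-1). The function names `latin1_to_utf8`, `strip_non_ascii` and `read_as_utf8` are hypothetical, chosen for illustration. Latin-1 makes the "transformation" easy because each byte value 0x00-0xFF is the Unicode code point with the same number: bytes below 0x80 are copied unchanged, and the rest become two-byte UTF-8 sequences.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode Latin-1 (ISO-8859-1) bytes to UTF-8 by hand.

    Assumption: the input really is Latin-1, so byte b is code point U+00bb.
    """
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII: identical in UTF-8
        else:
            out.append(0xC0 | (b >> 6))    # lead byte: 110xxxxx
            out.append(0x80 | (b & 0x3F))  # continuation byte: 10xxxxxx
    return bytes(out)


def strip_non_ascii(data: bytes) -> bytes:
    """The 'just ignore them' option: drop every byte outside 0x00-0x7F."""
    return bytes(b for b in data if b < 0x80)


def read_as_utf8(path: str) -> str:
    """Hypothetical helper: read raw bytes (the byte[] approach), keep them
    if they already decode as UTF-8, otherwise treat them as Latin-1."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Latin-1.
        # This branch cannot fail, since every byte is valid Latin-1.
        return latin1_to_utf8(raw).decode("utf-8")


if __name__ == "__main__":
    sample = b"caf\xe9"              # "café" in Latin-1; invalid as UTF-8
    print(latin1_to_utf8(sample))    # b'caf\xc3\xa9'
    print(strip_non_ascii(sample))   # b'caf'
```

In Python itself the hand-written loop is equivalent to `data.decode("latin-1").encode("utf-8")`. Note that the Latin-1 fallback accepts any byte sequence, so it will silently mis-decode files that are really in some other encoding; for example, Windows-1252 differs from Latin-1 in the 0x80-0x9F range.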
Is there any library or function that can automatically convert HTML in these unknown charsets into UTF-8?