Hi Gerd
Here is an updated patch that closes the file. I find many files in
mkgmap that are opened without an explicit close(), but I presume
finalize() will close them eventually.
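
For what it's worth, Object.finalize() has been deprecated since Java 9 and is not guaranteed to run promptly, so try-with-resources is probably the safer pattern for these. A minimal sketch, not taken from the patch; FirstLine and read() are made-up names for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of mkgmap.
final class FirstLine {
    // Returns the first line from the reader, or null if there is none.
    // try-with-resources closes the BufferedReader (and so the wrapped
    // reader) on every exit path, including when readLine() throws.
    static String read(Reader source) {
        try (BufferedReader br = new BufferedReader(source)) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Closing the outermost reader is enough, because BufferedReader.close() closes the reader it wraps.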
I'll do another patch for the other text file handling, using
StandardCharsets where possible, fixing the TokenScanner message for bad
characters when the charset is not utf-8 and, if reasonable, allowing a
BOM even if the file is opened as utf-8 anyway.
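
On the BOM point: Java's UTF-8 decoder does not drop a leading BOM, it hands it back as the character U+FEFF, so the reading code has to skip it itself. A rough sketch of what I have in mind, assuming the text is already decoded; BomSkipper and its methods are hypothetical names, not existing mkgmap code:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of mkgmap.
final class BomSkipper {
    // Removes a single leading U+FEFF (the decoded BOM) if present.
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF')
            return text.substring(1);
        return text;
    }

    // Decodes bytes as UTF-8 and drops the BOM that the decoder leaves in.
    static String decodeUtf8(byte[] bytes) {
        return stripBom(new String(bytes, StandardCharsets.UTF_8));
    }
}
```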
Ticker
On Tue, 2020-01-14 at 08:21 +0000, Gerd Petermann wrote:
> Hi Ticker,
>
> thanks for the patch.
>
> Please review TypCompiler.CharsetProbe. BufferedReader br is not
> closed. Is that intended?
>
> I see that we have a mix of "utf-8" and "UTF-8" in the mkgmap
> sources. I think it would be good to use StandardCharsets.UTF_8 where
> possible and unify the rest.
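
Agreed. Both spellings resolve to the same Charset, and the overloads that take a Charset rather than a name also avoid the checked UnsupportedEncodingException. A small sketch to illustrate; Utf8Names and its methods are made-up names, not mkgmap code:

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of mkgmap.
final class Utf8Names {
    // Charset names are case-insensitive, so "utf-8" and "UTF-8"
    // both resolve to the same Charset as StandardCharsets.UTF_8.
    static boolean isUtf8(String name) {
        return Charset.forName(name).equals(StandardCharsets.UTF_8);
    }

    // The Charset overload cannot fail with UnsupportedEncodingException,
    // unlike new InputStreamReader(in, "utf-8").
    static Reader open(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }
}
```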
>
> Gerd
>
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf
> of Ticker Berkin <rwb-mkg...@jagit.co.uk>
> Sent: Monday, 13 January 2020 11:34
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] TYP files and character encoding
>
> Hi Gerd
>
> I've updated this patch with changes to TypCompiler CharsetProbe:
>
> 1/ looks for unicode BOM in various encodings near start of file.
> 2/ looks for line containing "-*- coding: charset -*-" near start of
> the file.
> 3/ retains the check for "CodePage=" coding for compatibility.
> 4/ in the absence of the above, sets the reading charset to utf-8 if
> the file is valid utf-8, otherwise to Cp1252.
> 5/ fixes the bad character message from the scanner to say what the
> charset really is rather than saying "utf-8" regardless.
> 6/ removes the logic that checks whether String... lines, read in the
> charset it is currently trying, can be encoded in the presumed output
> CodePage.
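
The detection order in 1/ and 4/ can be sketched roughly as follows on a byte array. This is a simplified illustration, not the code in the patch: it only looks for a BOM at offset 0 (the patch searches the first couple of lines), it leaves out the coding: and CodePage= checks, and CharsetGuess and its methods are made-up names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch for illustration only; not the patch's CharsetProbe.
final class CharsetGuess {
    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length)
            return false;
        for (int i = 0; i < prefix.length; i++)
            if ((data[i] & 0xFF) != prefix[i])
                return false;
        return true;
    }

    // Returns the charset implied by a BOM at the start, or null if none.
    static String fromBom(byte[] d) {
        if (startsWith(d, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // UTF-32LE starts with the UTF-16LE BOM, so test 32 before 16
        if (startsWith(d, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(d, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(d, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(d, 0xFE, 0xFF)) return "UTF-16BE";
        return null;
    }

    // Strict decode: any malformed sequence means "not UTF-8".
    static boolean isValidUtf8(byte[] d) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            dec.decode(ByteBuffer.wrap(d));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // BOM first, then valid UTF-8, otherwise fall back to cp1252.
    static String guess(byte[] d) {
        String bom = fromBom(d);
        if (bom != null)
            return bom;
        return isValidUtf8(d) ? "UTF-8" : "cp1252";
    }
}
```

The fallback works because arbitrary sequences of bytes above 127 are rarely valid UTF-8, so a cp1252 file with accented characters normally fails the strict decode.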
>
> The final result of this patch should be that:
>
> a/ No existing usage is broken
> b/ 2 methods to indicate the charset/encoding of the file that are
> commonly used by text editors can be used and are taken notice of.
> Previously, just the UTF-8 BOM was detected.
> c/ Typ files can, and should from now on, be written in utf-8
> d/ labels for languages not supported in the --code-page of the
> output img just generate a warning in mkgmap.log.x
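
For the editor-style directive in b/, pulling the charset name out of a line could look something like this. It is only a sketch using a regular expression (the patch itself uses indexOf/substring), and CodingLine and charsetFrom() are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the patch's code.
final class CodingLine {
    // Lax about what surrounds the directive: finds "-*-", then
    // "coding:", then a charset name made of letters, digits, '.', '_'
    // and internal hyphens.
    private static final Pattern CODING = Pattern.compile(
            "-\\*-.*?coding:\\s*([A-Za-z0-9._]+(?:-[A-Za-z0-9._]+)*)");

    // Returns the charset named in the line, or null if there is none.
    static String charsetFrom(String line) {
        Matcher m = CODING.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

The returned name still needs checking with Charset.isSupported() before it is used to open the file.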
>
> Ticker
>
>
> On Sat, 2019-12-21 at 16:11 +0000, Ticker Berkin wrote:
> > Hi Gerd
> >
> > Attached is a patch that:
> >
> > Doesn't use the 'CodePage=' command in the typ-file to determine the
> > output character encoding of the typ-file; rather, it uses the main
> > map encoding from the --code-page argument.
> >
> > log.warn's any typ labels that can't be encoded in the --code-page,
> > rather than just giving up with a message like:
> > > TYP file cannot be written in code page 1252
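
The idea of skipping only the labels that don't fit, instead of abandoning the whole file, can be sketched like this. It is a standalone illustration, not the patch code: LabelBlockSketch, the parallel arrays and the System.err message are assumptions (the patch works on TypLabel objects and uses log.warn).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch for illustration only; not mkgmap's TypElement.
final class LabelBlockSketch {
    // Writes "lang byte, encoded text, 0" for each label that can be
    // encoded in the charset; labels that cannot are reported and skipped.
    static byte[] encodeLabels(int[] langs, String[] texts, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < langs.length; i++) {
            try {
                // Encode first, so nothing is written for a bad label
                ByteBuffer encoded = encoder.encode(CharBuffer.wrap(texts[i]));
                byte[] bytes = new byte[encoded.remaining()];
                encoded.get(bytes);
                out.write(langs[i]);
                out.write(bytes, 0, bytes.length);
                out.write(0);
            } catch (CharacterCodingException e) {
                System.err.println("Cannot represent String " + texts[i]
                        + " for language " + langs[i]
                        + " in CodePage " + charset.name());
            }
        }
        return out.toByteArray();
    }
}
```

The language byte has to be written after the encode succeeds, otherwise a skipped label leaves a stray byte in the block.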
> >
> > The message:
> > > WARNING: SortCode in TYP txt file different from command line
> > > setting
> > that was written directly to System.out is changed to a log.warn,
> > and it shouldn't happen anyway now.
> >
> > For the moment, the 'CodePage=' command in the typ-file is, under
> > some circumstances, used to determine the encoding of the typ-file
> > itself, and I've left this alone for compatibility with existing
> > usage. Sometime in January I'll provide a better method for this.
> >
> > Ticker
> >
> >
> > On Wed, 2019-12-18 at 19:54 +0000, Ticker Berkin wrote:
> > > Hi Gerd
> > >
> > > I think it is best to continue with the ideas for typ-files that:
> > >
> > > 1/ they can be in any character set and we just need a better way
> > > of working out the correct one - see my posting earlier today.
> > >
> > > 2/ it can include as many languages as anyone can be bothered to
> > > add, and so has to be in a character set that allows the languages
> > > to be added, implying unicode for a common one (more particularly,
> > > UTF-8)
> > >
> > > 3/ the codepage= statement should be redundant and ignored for
> > > controlling the output character set, which should be taken from
> > > the map, but its use for determining the input coding might need
> > > to be kept for a while for compatibility.
> > >
> > > 4/ the messages my hack generates should be turned into 1 warning
> > > or information message per language, or maybe suppressed
> > > altogether. If someone is generating a map with a character set
> > > that doesn't support a particular language, they really won't care
> > > that the data for other languages that have an incompatible
> > > representation with their language won't be there.
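
The "1 message per language" idea in 4/ could be done by counting the skipped labels and reporting once at the end. A minimal sketch under that assumption; LabelWarnings and the message wording are invented for illustration and are not in any patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of mkgmap.
final class LabelWarnings {
    // language code -> number of labels that could not be encoded
    private final Map<Integer, Integer> skippedByLang = new TreeMap<>();

    // Call once for every label that fails to encode.
    void recordSkipped(int lang) {
        skippedByLang.merge(lang, 1, Integer::sum);
    }

    // One line per affected language, in ascending language-code order.
    List<String> summary(String charsetName) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : skippedByLang.entrySet()) {
            lines.add("Language " + e.getKey() + ": " + e.getValue()
                    + " label(s) cannot be represented in " + charsetName);
        }
        return lines;
    }
}
```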
> > >
> > > Ticker
> > >
> > > On Wed, 2019-12-18 at 19:08 +0000, Gerd Petermann wrote:
> > > > Hi Ticker,
> > > >
> > > > I think I understand now why we didn't have a default typ file ;)
> > > > If I got that right, I should revert the changes in r4395, and
> > > > mkgmap should either not allow, or warn loudly, when a typ file
> > > > with a different codepage is merged?
> > > > Or should we force the usage of the unicode codepage?
> > > > Or is it possible to compile mapnik.txt with cp 1252 (or any
> > > > other) in a way that only those lines which contain non-matching
> > > > characters are ignored?
> > > >
> > > > Gerd
> > > >
> > > >
> > > > ________________________________________
> > > > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on
> > > > behalf of Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > > > Sent: Wednesday, 18 December 2019 19:46
> > > > To: mkgmap development
> > > > Subject: [mkgmap-dev] TYP files and character encoding
> > > >
> > > > Hi
> > > >
> > > > A couple of problems with typ-files and unicode.
> > > >
> > > > With 'Codepage=65001' the final contents of the labels in
> > > > mapnik.typ that is included with the composite map is unicode,
> > > > but if the map is codepage 1252, the unicode characters with the
> > > > top bit set are simply displayed as if in 1252.
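
This is the usual mojibake effect and is easy to reproduce outside mkgmap. A small sketch (Mojibake is a made-up class name): an e-acute is the two bytes C3 A9 in UTF-8, and reading those bytes as windows-1252 gives A-tilde followed by the copyright sign.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration for illustration only; not part of mkgmap.
final class Mojibake {
    // Encodes text as UTF-8, then decodes the same bytes as if they
    // were windows-1252, which is what the device effectively does.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }
}
```

Plain ASCII text survives unchanged, which is why the problem only shows up on accented labels.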
> > > >
> > > > Removing the codepage statement from mapnik.txt, making fixes
> > > > elsewhere to ensure that the file is read correctly as utf-8, and
> > > > then generating a map with --code-page=1252 gives the error:
> > > >
> > > > SEVE: uk.me.parabola.imgfmt.MapFailedException
> > > > ../svn/trunk/resources/typ-files/mapnik.txt:
> > > > (thrown in TypCompiler.makeMap())
> > > > TYP file cannot be written in code page 1252
> > > >
> > > > Changing the exception handling in
> > > > imgfmt/app/typ/TypElement.java, so that makeLabelBlock() reads as
> > > > ...
> > > >     CharBuffer cb = CharBuffer.wrap(tl.getText());
> > > >     try {
> > > >         ByteBuffer buffer = encoder.encode(cb);
> > > >         out.put((byte) tl.getLang());
> > > >         out.put(buffer);
> > > >         out.put((byte) 0);
> > > >     } catch (CharacterCodingException ignore) {
> > > >         // ignore.printStackTrace();
> > > >         String name = encoder.charset().name();
> > > >         System.out.println("Cannot represent String=" +
> > > >                 tl.getLang() + "," + tl.getText() +
> > > >                 " in CodePage=" + name);
> > > >         // throw newTypLabelException(name);
> > > >     }
> > > > ...
> > > >
> > > > It gives output like:
> > > > Cannot represent String=21,Gara|e in CodePage=windows-1252
> > > > Cannot represent String=21,Obszar przemysBowy in CodePage=windows-1252
> > > > Cannot represent String=21,ZieleD in CodePage=windows-1252
> > > > Cannot represent String=21,Zaro[la in CodePage=windows-1252
> > > > Cannot represent String=21,MokradBa in CodePage=windows-1252
> > > > Cannot represent String=21,Droga wojew\363dzka (B^Ecznik) in CodePage=windows-1252
> > > > Cannot represent String=21,Droga szybkiego ruchu (B^Ecznik) in CodePage=windows-1252
> > > > Cannot represent String=21,Droga szybkiego ruchu (B^Ecznik) in CodePage=windows-1252
> > > > Cannot represent String=21,Zcie|ka rowerowa in CodePage=windows-1252
> > > > Cannot represent String=21,Wybrze|e in CodePage=windows-1252
> > > > Cannot represent String=21,Zcie|ka in CodePage=windows-1252
> > > > Cannot represent String=21,StrumieD in CodePage=windows-1252
> > > > Cannot represent String=21,Granica paDstwa in CodePage=windows-1252
> > > > Cannot represent String=21,Rzeka, KanaB in CodePage=windows-1252
> > > > Cannot represent String=21,StrumieD in CodePage=windows-1252
> > > > Cannot represent String=21,Ruroci^Eg in CodePage=windows-1252
> > > > Cannot represent String=21,Kabel wysokiego napi^Ycia in CodePage=windows-1252
> > > > Cannot represent String=21,Tor wy[cigowy in CodePage=windows-1252
> > > > Cannot represent String=21,Droga szybkiego ruchu (B^Ecznik) in CodePage=windows-1252
> > > > Cannot represent String=21,Droga krajowa (B^Ecznik) in CodePage=windows-1252
> > > > Cannot represent String=21,Droga wojew\363dzka (B^Ecznik) in CodePage=windows-1252
> > > > Cannot represent String=21,Wie[ (>5 tys.) in CodePage=windows-1252
> > > > Cannot represent String=21,Wie[ (>5 tys.) in CodePage=windows-1252
> > > > Cannot represent String=21,Restauracja (AmerykaDska) in CodePage=windows-1252
> > > > Cannot represent String=21,Restauracja (ChiDska) in CodePage=windows-1252
> > > > Cannot represent String=21,Restauracja (Mi^Ydzynarodowa) in CodePage=windows-1252
> > > > Cannot represent String=21,Restauracja (WBoska) in CodePage=windows-1252
> > > > Cannot represent String=21,Restauracja (MeksykaDska) in CodePage=windows-1252
> > > > Cannot represent String=21,Restauracja (P^Eczki) in CodePage=windows-1252
> > > > Cannot represent String=21,Restauracja (WegetariaDska) in CodePage=windows-1252
> > > > Cannot represent String=21,Kr^Ygle in CodePage=windows-1252
> > > > Cannot represent String=21,Sklep odzie|owy in CodePage=windows-1252
> > > > Cannot represent String=21,Wypo|yczalnia samochod\363w in CodePage=windows-1252
> > > > Cannot represent String=21,Gara| in CodePage=windows-1252
> > > > Cannot represent String=21,Sprzeda| samochod\363w in CodePage=windows-1252
> > > > Cannot represent String=21,Sklep |eglarski in CodePage=windows-1252
> > > > Cannot represent String=21,S^Ed in CodePage=windows-1252
> > > > Cannot represent String=21,O[rodek kultury in CodePage=windows-1252
> > > > Cannot represent String=21,Wi^Yzienie in CodePage=windows-1252
> > > > Cannot represent String=21,Stra| po|arna in CodePage=windows-1252
> > > > Cannot represent String=21,SBupek in CodePage=windows-1252
> > > > Cannot represent String=21,PrzystaD in CodePage=windows-1252
> > > > Cannot represent String=21,L^Edowisko helikopterowe in CodePage=windows-1252
> > > > Cannot represent String=21,Wie|a in CodePage=windows-1252
> > > > Cannot represent String=21,yr\363dBo in CodePage=windows-1252
> > > > Cannot represent String=21,Pla|a in CodePage=windows-1252
> > > > Cannot represent String=21,Przyl^Edek in CodePage=windows-1252
> > > > Cannot represent String=21,SkaBa in CodePage=windows-1252
> > > >
> > > > Which makes sense if codepage 1252 doesn't handle Polish
> > > > (language code hex 0x15, decimal 21).
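
That can be confirmed directly with the JDK: windows-1252 has the French accented letters but not Polish ones such as z-with-dot (U+017C) or l-with-stroke (U+0142). A small check, where Cp1252Check and fits() are made-up names for illustration:

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of mkgmap.
final class Cp1252Check {
    // True if every character of the label has a mapping in the charset.
    static boolean fits(String label, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(label);
    }
}
```

So fits("Caf\u00e9", "windows-1252") is true, while fits("Gara\u017c", "windows-1252") is false.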
> > > >
> > > > NB the non-ASCII characters in the above are messed up by my
> > > > cutting and pasting.
> > > >
> > > > Checking the French on my Garmin device, the type descriptions
> > > > now display accents correctly.
> > > >
> > > > Ticker
> > > >
> > > > _______________________________________________
> > > > mkgmap-dev mailing list
> > > > mkgmap-dev@lists.mkgmap.org.uk
> > > > http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Index: src/uk/me/parabola/imgfmt/app/typ/TYPFile.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/typ/TYPFile.java (revision 4413)
+++ src/uk/me/parabola/imgfmt/app/typ/TYPFile.java (working copy)
@@ -121,12 +121,13 @@
// If we succeeded then note offsets for indexes
strToType.put(off, type);
typeToStr.put(type, off);
-
+ writer.put1u(0);
} catch (CharacterCodingException ignore) {
+ //ignore.printStackTrace();
String name = encoder.charset().name();
- throw new TypLabelException(name);
+ log.warn("Cannot represent icon String", label, "in CodePage", name);
+ //throw new TypLabelException(name);
}
- writer.put1u(0);
}
}
Utils.closeFile(writer);
Index: src/uk/me/parabola/imgfmt/app/typ/TypData.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/typ/TypData.java (revision 4413)
+++ src/uk/me/parabola/imgfmt/app/typ/TypData.java (working copy)
@@ -17,6 +17,7 @@
import java.util.List;
import uk.me.parabola.imgfmt.app.srt.Sort;
+import uk.me.parabola.log.Logger;
/**
* Holds all the data for a typ file.
@@ -24,6 +25,8 @@
* @author Steve Ratcliffe
*/
public class TypData {
+ private static final Logger log = Logger.getLogger(TypData.class);
+
private final ShapeStacking stacking = new ShapeStacking();
private final TypParam param = new TypParam();
private final List<TypPolygon> polygons = new ArrayList<TypPolygon>();
@@ -51,10 +54,11 @@
if (origCodepage != 0) {
if (origCodepage != sort.getCodepage()) {
// This is just a warning, not a definite problem
- System.out.println("WARNING: SortCode in TYP txt file different from" +
- " command line setting");
+ // and is to be expected if have general UTF-8 TYP.txt
+ log.warn("CodePage in TYP txt file:", sort.getCodepage(), "different from --code-page:", origCodepage);
}
}
+ return; // want to use the command line one
}
this.sort = sort;
encoder = sort.getCharset().newEncoder();
Index: src/uk/me/parabola/imgfmt/app/typ/TypElement.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/typ/TypElement.java (revision 4413)
+++ src/uk/me/parabola/imgfmt/app/typ/TypElement.java (working copy)
@@ -20,6 +20,7 @@
import java.util.List;
import uk.me.parabola.imgfmt.app.ImgFileWriter;
+import uk.me.parabola.log.Logger;
/**
* Base routines and data used by points, lines and polygons.
@@ -30,6 +31,8 @@
* @author Steve Ratcliffe
*/
public abstract class TypElement implements Comparable<TypElement> {
+ private static final Logger log = Logger.getLogger(TypElement.class);
+
private int type;
private int subType;
@@ -124,17 +127,19 @@
protected ByteBuffer makeLabelBlock(CharsetEncoder encoder) {
ByteBuffer out = ByteBuffer.allocate(256 * labels.size());
for (TypLabel tl : labels) {
- out.put((byte) tl.getLang());
CharBuffer cb = CharBuffer.wrap(tl.getText());
try {
ByteBuffer buffer = encoder.encode(cb);
+ out.put((byte) tl.getLang());
out.put(buffer);
+ out.put((byte) 0);
} catch (CharacterCodingException ignore) {
+ //ignore.printStackTrace();
String name = encoder.charset().name();
//System.out.println("cs " + name);
- throw new TypLabelException(name);
+ log.warn("Cannot represent String", tl.getText(), "for language", tl.getLang(), "in CodePage", name);
+ //throw new TypLabelException(name);
}
- out.put((byte) 0);
}
return out;
Index: src/uk/me/parabola/mkgmap/main/TypCompiler.java
===================================================================
--- src/uk/me/parabola/mkgmap/main/TypCompiler.java (revision 4413)
+++ src/uk/me/parabola/mkgmap/main/TypCompiler.java (working copy)
@@ -21,11 +21,13 @@
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
+import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
+import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.StandardOpenOption;
@@ -85,7 +87,7 @@
param.setFamilyId(family);
if (product != -1)
param.setProductId(product);
- if (cp != -1 && param.getCodePage() == 0)
+ if (cp != -1)
param.setCodePage(cp);
File outFile = new File(filename);
@@ -134,7 +136,7 @@
try {
Reader r = new BufferedReader(new InputStreamReader(new FileInputStream(filename), charset));
try {
- tr.read(filename, r);
+ tr.read(filename, r, charset);
} finally {
Utils.closeFile(r);
}
@@ -204,79 +206,98 @@
class CharsetProbe {
- private String codePage;
- private CharsetEncoder encoder;
+ // TODO: this could be moved to somewhere like util and used on other text files
+ // except looking for Codepage is particular to Typ files
+ // and want to have ability to return default environment decoder
+ // (ie inputStream without 2nd parameter)
- public CharsetProbe() {
- setCodePage("latin1");
- }
+ private String probeCharset(String file) {
- private void setCodePage(String codePage) {
- if ("cp65001".equalsIgnoreCase(codePage)) {
- this.codePage = "utf-8";
- this.encoder = StandardCharsets.UTF_8.newEncoder();
- } else {
- this.codePage = codePage;
- this.encoder = Charset.forName(codePage).newEncoder();
- }
- }
+ final String BOM_UTF_8 = "\u00EF\u00BB\u00BF";
+ final String BOM_UTF_16LE = "\u00FF\u00FE";
+ final String BOM_UTF_16BE = "\u00FE\u00FF";
+ final String BOM_UTF_32LE = "\u00FF\u00FE\u0000\u0000";
+ final String BOM_UTF_32BE = "\u0000\u0000\u00FE\u00FF";
- private String probeCharset(String file) {
- String readingCharset = "utf-8";
+ final Charset byteCharNoMap = StandardCharsets.ISO_8859_1; // byteVal == charVal
+ final CharsetDecoder utf8Decoder = StandardCharsets.UTF_8.newDecoder();
+ String charset = null;
+ InputStream is = null;
try {
- tryCharset(file, readingCharset);
- return readingCharset;
- } catch (TypLabelException e) {
+ is = new FileInputStream(file);
+ } catch (FileNotFoundException e) {
+ throw new ExitException("File not found " + file);
+ }
+ BufferedReader br = new BufferedReader(new InputStreamReader(is, byteCharNoMap));
+ String line;
+ int lineNo = 0;
+ boolean validUTF8 = true;
+ do {
try {
- readingCharset = e.getCharsetName();
- tryCharset(file, readingCharset);
- } catch (Exception e1) {
- return "utf-8";
+ line = br.readLine();
+ } catch (IOException e) {
+ throw new ExitException("Unable to read file " + file);
}
- }
+ if (line == null)
+ break;
+ ++lineNo;
+ if (line.isEmpty())
+ continue;
+ if (lineNo <= 2) { // only check the first few lines for these
+ if (line.contains(BOM_UTF_8))
+ charset = "UTF-8";
+ else if (line.contains(BOM_UTF_32LE)) // must test _32 before _16
+ charset = "UTF-32LE";
+ else if (line.contains(BOM_UTF_32BE))
+ charset = "UTF-32BE";
+ else if (line.contains(BOM_UTF_16LE))
+ charset = "UTF-16LE";
+ else if (line.contains(BOM_UTF_16BE))
+ charset = "UTF-16BE";
+ if (charset != null)
+ break;
- return readingCharset;
- }
+ int strInx = line.indexOf("-*- coding:"); // be lax about start/end
+ if (strInx >= 0) {
+ charset = line.substring(strInx+11).trim();
+ strInx = charset.indexOf(' ');
+ if (strInx >= 0)
+ charset = charset.substring(0, strInx);
+ break;
+ }
+ }
- private void tryCharset(String file, String readingCharset) {
-
- try (InputStream is = new FileInputStream(file); BufferedReader br = new BufferedReader(new InputStreamReader(is, readingCharset))) {
-
- String line;
- while ((line = br.readLine()) != null) {
- if (line.isEmpty())
- continue;
-
- // This is a giveaway the file is in utf-something, so ignore anything else
- if (line.charAt(0) == 0xfeff)
- return;
-
- if (line.startsWith("CodePage=")) {
- String[] split = line.split("=");
- try {
- if (split.length > 1)
- setCodePage("cp" + Integer.decode(split[1].trim()));
- } catch (NumberFormatException e) {
- setCodePage("cp1252");
- }
+ // special for TypFile; to be compatible with possible old usage
+ if (line.startsWith("CodePage=")) {
+ charset = line.substring(9).trim();
+ try {
+ int codePage = Integer.decode(charset);
+ if (codePage == 65001)
+ charset = "UTF-8";
+ else
+ charset = "cp" + codePage;
+ } catch (NumberFormatException e) {
}
+ break;
+ }
- if (line.startsWith("String")) {
- CharBuffer cb = CharBuffer.wrap(line);
- if (encoder != null)
- encoder.encode(cb);
+ if (validUTF8) { // test the line for being valid UTF-8
+ ByteBuffer asBytes = byteCharNoMap.encode(line);
+ try { // arbitrary sequences of bytes > 127 tend not to be UTF8
+ /*CharBuffer asChars =*/ utf8Decoder.decode(asBytes);
+ } catch (CharacterCodingException e) {
+ validUTF8 = false;
+ // don't stop as might still get coding directive
}
}
- } catch (UnsupportedEncodingException | CharacterCodingException e) {
- throw new TypLabelException(codePage);
- } catch (FileNotFoundException e) {
- throw new ExitException("File not found " + file);
-
+ } while (true);
+ try {
+ is.close();
} catch (IOException e) {
- throw new ExitException("Could not read file " + file);
}
+ return charset != null ? charset : (validUTF8 ? "UTF-8" : "cp1252");
}
}
}
Index: src/uk/me/parabola/mkgmap/scan/TokenScanner.java
===================================================================
--- src/uk/me/parabola/mkgmap/scan/TokenScanner.java (revision 4413)
+++ src/uk/me/parabola/mkgmap/scan/TokenScanner.java (working copy)
@@ -28,6 +28,7 @@
*/
public class TokenScanner {
private static final int NO_PUSHBACK = 0;
+ private String charset = "utf-8";
// Reading state
private final Reader reader;
@@ -53,6 +54,10 @@
fileName = filename;
}
+ public void setCharset(String charset) {
+ this.charset = charset;
+ }
+
/**
* Peek and return the first token. It is not consumed.
*/
@@ -236,7 +241,7 @@
try {
c = reader.read();
if (c == 0xfffd)
- throw new SyntaxException(this, "Bad character in input, file probably not in utf-8");
+ throw new SyntaxException(this, "Bad character in input, file probably not in " + charset);
} catch (IOException e) {
isEOF = true;
c = -1;
Index: src/uk/me/parabola/mkgmap/typ/IdSection.java
===================================================================
--- src/uk/me/parabola/mkgmap/typ/IdSection.java (revision 4413)
+++ src/uk/me/parabola/mkgmap/typ/IdSection.java (working copy)
@@ -42,7 +42,8 @@
} else if (name.equalsIgnoreCase("ProductCode")) {
data.setProductId(ival);
} else if (name.equalsIgnoreCase("CodePage")) {
- data.setSort(SrtTextReader.sortForCodepage(ival));
+ if (data.getSort() == null) // ignore if --code-page
+ data.setSort(SrtTextReader.sortForCodepage(ival));
} else {
throw new SyntaxException(scanner, "Unrecognised keyword in id section: " + name);
}
Index: src/uk/me/parabola/mkgmap/typ/TypTextReader.java
===================================================================
--- src/uk/me/parabola/mkgmap/typ/TypTextReader.java (revision 4413)
+++ src/uk/me/parabola/mkgmap/typ/TypTextReader.java (working copy)
@@ -32,9 +32,10 @@
// As the file is read in, the information is saved into this data structure.
private final TypData data = new TypData();
- public void read(String filename, Reader r) {
+ public void read(String filename, Reader r, String charset) {
TokenScanner scanner = new TokenScanner(filename, r);
scanner.setCommentChar(null); // the '#' comment character is not appropriate for this file
+ scanner.setCharset(charset);
ProcessSection currentSection = null;