Hi Ticker,

Remember that cs932 is a double-byte character set.
With your code only a few unmappable UTF-16 characters are replaced; for the
rest, some cs932 character is used (whatever the low byte happens to map to)
without any good reason. The result is typically garbage.

I've modified the patch to replace any unmappable character that was not
transliterated with '?'.
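
A minimal sketch of that rule, assuming the text has already been through
transliteration and using Java's windows-31j charset (its name for
cp932/MS932) as the target. The class and method names are made up for
illustration; this is not the code in cs932-v2.patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch, not the code in cs932-v2.patch: every code point the
// target charset cannot encode is replaced by '?'. Assumes transliteration
// has already been applied to the text.
class ReplaceUnmappable {

    static String replaceUnmappable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder sb = new StringBuilder(text.length());
        // iterate by code point so a surrogate pair is tested as one character
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            // canEncode() may be called repeatedly while no encode() is in progress
            sb.append(encoder.canEncode(ch) ? ch : "?");
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "windows-31j" is Java's name for cp932 (MS932)
        Charset cp932 = Charset.forName("windows-31j");
        // the Korean name from node 5692472121; Hangul is not in cp932
        String name = "\uD0A4\uD0C0\uAC00\uD0A4 \uACE0\uB85C\uCF00";
        System.out.println(replaceUnmappable(name, cp932)); // prints ???? ???
    }
}
```

Working on code points rather than chars means a surrogate pair is tested,
and replaced, as a single character.
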
I've also attached a debug version that shows what goes on.
A possible change in SparseTransliterator would be to add a mapping for the
MINUS SIGN (U+2212); the other characters, the FULLWIDTH digits, are already
supported in cs932.
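
For illustration only, a sparse mapping of that kind could look like the
sketch below. The Map-based table and the class name are assumptions;
mkgmap's SparseTransliterator may be organised differently. The example takes
the housenumber "１２３７−１" (FULLWIDTH digits plus U+2212) and checks that it
fits into windows-31j (Java's name for cp932) once the MINUS SIGN is mapped
to '-'.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical sketch: a sparse char -> replacement table. Names are
// illustrative and do not reflect mkgmap's SparseTransliterator API.
class SparseMinus {
    // Only the characters listed here are changed; all others pass through.
    static final Map<Character, String> TABLE = Map.of(
            '\u2212', "-");   // MINUS SIGN -> HYPHEN-MINUS

    static String transliterate(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append(TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // housenumber from the thread: FULLWIDTH digits plus U+2212
        String hn = "\uFF11\uFF12\uFF13\uFF17\u2212\uFF11";
        String out = transliterate(hn);
        boolean fits = Charset.forName("windows-31j").newEncoder().canEncode(out);
        System.out.println(fits); // prints true
    }
}
```
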

Gerd

________________________________________
From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkg...@jagit.co.uk>
Sent: Tuesday, 16 November 2021 17:33
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and Japan tile

Hi

wouldn't
        if ((c & 0xff) == 0)
                c = '?';
be safer?
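
To compare the two guards, here is a sketch of the fallback loop from
AnyCharsetEncoder.encodeText (quoted further down in this thread) with the
low-byte check above, next to a stricter check that replaces every char that
doesn't fit into a byte, as the first patch is described. Both method names
are hypothetical. The low-byte check only catches chars such as 케 (U+CF00)
and 가 (U+AC00); 키 (U+D0A4) and 타 (U+D0C0) still come out as 0xA4 and 0xC0,
which cp932 reads as the halfwidth characters ､ and ﾀ.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the byte-cast fallback loop with two possible guards.
// Method names are illustrative; this is not mkgmap's actual code.
class ByteFallback {

    // Guard suggested above: only chars whose low byte is 0 become '?'.
    static byte[] lowByteGuard(String s) {
        ByteBuffer outBuf = ByteBuffer.allocate(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c & 0xff) == 0)
                c = '?';
            outBuf.put((byte) c);   // the cast keeps only the low 8 bits
        }
        return outBuf.array();
    }

    // Stricter guard: every char that does not fit into one byte becomes '?'.
    static byte[] singleByteGuard(String s) {
        ByteBuffer outBuf = ByteBuffer.allocate(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xff)
                c = '?';
            outBuf.put((byte) c);
        }
        return outBuf.array();
    }

    public static void main(String[] args) {
        String s = "\uD0A4\uCF00";   // Hangul KI and KE
        byte[] a = lowByteGuard(s);
        byte[] b = singleByteGuard(s);
        System.out.printf("%02x %02x%n", a[0], a[1]); // prints a4 3f
        System.out.printf("%02x %02x%n", b[0], b[1]); // prints 3f 3f
    }
}
```
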

I don't understand the point of SparseTransliterator and why it is only
used for cp932 (Japanese), unless this charset includes quite a few
European accented characters.

If this is the case, wouldn't it be much better to do as I described?
The essence of that is not to transliterate the complete string into the
small ascii/latin1 set just because some chars can't be mapped. The
TableTransliterators (ascii & latin1) map these FULLWIDTH digits (and
letters). The maths minus sign isn't defined, but it is easy to add.

Handling a char at a time might allow removal of the 'ascii' table: if
transliteration changes a char into another (or into a string of others),
each of those that can't be represented can itself be transliterated.
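
A rough sketch of the char-at-a-time approach described in the quoted
message below, including the refinement of transliterating the replacement
chars again. The small transliteration table, the depth limit that stops
transliteration cycles, and all names are assumptions for illustration;
mkgmap's TableTransliterator and SparseTransliterator are not used. Chars
that the target charset can represent are kept unchanged, so only the
unmappable ones get transliterated.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch of the char-at-a-time fallback discussed in this thread.
// TRANSLIT is a tiny stand-in table, not mkgmap's TableTransliterator or
// SparseTransliterator; all names here are illustrative.
class CharAtATime {
    static final Map<Integer, String> TRANSLIT = Map.of(
            0x2212, "-",     // MINUS SIGN
            0xFF11, "1",     // FULLWIDTH DIGIT ONE
            0x00E9, "e",     // LATIN SMALL LETTER E WITH ACUTE
            0x00C6, "AE");   // LATIN CAPITAL LETTER AE

    // Limits re-transliteration of replacement chars (guards against cycles).
    static final int MAX_DEPTH = 3;

    static String encodeable(String text, Charset charset) {
        CharsetEncoder enc = charset.newEncoder();
        if (enc.canEncode(text))
            return text;                      // whole string fits: keep it as is
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> append(sb, cp, enc, 0));
        return sb.toString();
    }

    private static void append(StringBuilder sb, int cp, CharsetEncoder enc, int depth) {
        String ch = new String(Character.toChars(cp));
        if (enc.canEncode(ch)) {
            sb.append(ch);                    // representable: keep unchanged
            return;
        }
        if (depth >= MAX_DEPTH)
            return;                           // still not representable: drop it
        // transliterate; if no mapping is defined, fall back to "?"
        String repl = TRANSLIT.getOrDefault(cp, "?");
        // each char of the replacement is checked (and transliterated) in turn
        repl.codePoints().forEach(r -> append(sb, r, enc, depth + 1));
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 \u2212 \u00c6";
        System.out.println(encodeable(s, StandardCharsets.US_ASCII)); // prints cafe - AE
    }
}
```

With windows-31j as the target, for example, a FULLWIDTH digit is kept as it
is, while a Hangul syllable without a mapping becomes '?'.
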

Ticker


On Tue, 2021-11-16 at 15:48 +0000, Gerd Petermann wrote:
> Hi all,
>
> this small patch would be my approach. It replaces the characters
> that don't fit into a byte with '?'.
> This fixes the problems with the Japanese code page 932.
>
> Gerd
> BTW: SparseTransliterator is very sparse. We could add a few more
> character mappings; for example, there is a housenumber that contains
> "１２３７−１" (FULLWIDTH digits and a MINUS SIGN, U+2212) instead of
> "1237-1".
> https://www.fontspace.com/unicode/analyzer#e=77yR77yS77yT77yX4oiS77yR
>
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf
> of Ticker Berkin <rwb-mkg...@jagit.co.uk>
> Sent: Monday, 15 November 2021 15:59
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> Japan tile
>
> Hi
>
> How about something like:
>
> If the full string fails to encode in the target charset, process it
> one char at a time.
>
> If a char can't be represented, try transliteration on it and, if none
> is defined, use "?". Then go through the resulting string a char at a
> time, and drop any char that can't be represented.
>
> Maybe issue a final warning at the end if there was no transliteration
> for a char or the transliteration couldn't be represented.
>
> Ticker
>
> On Mon, 2021-11-15 at 13:04 +0000, Gerd Petermann wrote:
> > Hi all,
> >
> > > Maybe we should simply stop transliteration when this happens and
> > > return an empty string for the label?
> >
> > any thoughts on this?
> >
> > Gerd
> >
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf
> > of Gerd Petermann <gpetermann_muenc...@hotmail.com>
> > Sent: Wednesday, 10 November 2021 11:17
> > To: Development list for mkgmap
> > Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> > Japan tile
> >
> > Hi devs,
> >
> > the problem occurs with node https://www.osm.org/node/5692472121
> > name=키타가키 고로케
> > Google Translate says the name is Korean. The (UTF-8) name cannot be
> > encoded in code page 932 (Japanese), so mkgmap converts the internal
> > UTF-16 representation of the name to bytes by keeping only the low
> > byte of each char. This happens in method
> > AnyCharsetEncoder.encodeText(String text) in this loop:
> >         for (int i = 0; i < s.length(); i++)
> >                 outBuf.put((byte) s.charAt(i));
> > The name 키타가키 고로케 ends with 케, whose char value is \ucf00, so it
> > is converted to 0x00.
> > Maybe we should simply stop transliteration when this happens and
> > return an empty string for the label?
> >
> > If mkgmap is run without the -ea runtime option, the map shows the
> > name 、タ for the restaurant, which is just wrong.
> > Gerd
> >
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf
> > of Gerd Petermann <gpetermann_muenc...@hotmail.com>
> > Sent: Wednesday, 10 November 2021 09:43
> > To: Development list for mkgmap
> > Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> > Japan tile
> >
> > Hi Carlos,
> >
> > I'll try to debug this.
> >
> > BTW: I see you use *.o5m for the tiles (output from splitter). I
> > think this is no longer a good choice: pbf is a lot smaller and
> > almost as fast, especially when the goal is to reduce disk I/O (as
> > with --gmapi-minimal).
> >
> > Gerd
> >
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf
> > of Carlos Dávila <car...@alternativaslibres.org>
> > Sent: Tuesday, 9 November 2021 22:54
> > To: mkgmap-dev@lists.mkgmap.org.uk
> > Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> > Japan tile
> >
> > Hi Ticker
> >
> > Not sure if it's relevant, but note that in this case the assertion
> > occurs while compiling the tile, not the index. In fact, --index is
> > not included in the command.
> >
> > On 9/11/21 at 21:55, Ticker Berkin wrote:
> > > Hi
> > >
> > > I think this assertion could be removed from the code.
> > >
> > > Looking through the definition of Shift-JIS, I read it as saying
> > > the second byte shouldn't be zero, so I don't know why this happens.
> > >
> > > As with the Chinese code pages, mkgmap has places where multi-byte
> > > encodings are not handled correctly in the --index generation, and
> > > there are flags whose meaning to the Garmin software is unknown.
> > >
> > > Ticker
> > >
> > >
> > >
> > > On 09/11/2021 19:43, Carlos Dávila wrote:
> > > > code-page=932, sorry for the typo.
> > > >
> > > > On 9/11/21 at 20:36, Carlos Dávila wrote:
> > > > > The command below triggers an assertion while compiling this
> > > > > tile <https://files.mkgmap.org.uk/download/526/31191025.o5m> from
> > > > > Japan. The process continues with the remaining tiles and finishes
> > > > > without the expected "Number of MapFailedExceptions: 1". This is
> > > > > with r4813, but I also tried an old version of mkgmap with the
> > > > > same result.
> > > > >
> > > > > java -Xmx27G -ea -jar mkgmap.jar --code-page=632 31191025.o5m
> > > > > Mkgmap version 4813
> > > > > Time started: Tue Nov 09 20:18:16 CET 2021
> > > > > WARNING (global): Setting max-jobs to 8
> > > > > Exception in thread "main" java.lang.AssertionError: found trailing 0 in chars
> > > > >         at uk.me.parabola.imgfmt.app.labelenc.EncodedText.<init>(EncodedText.java:39)
> > > > >         at uk.me.parabola.imgfmt.app.labelenc.AnyCharsetEncoder.encodeText(AnyCharsetEncoder.java:112)
> > > > >         at uk.me.parabola.imgfmt.app.lbl.LBLFile.newLabel(LBLFile.java:132)
> > > > >         at uk.me.parabola.imgfmt.app.lbl.PlacesFile.createPOI(PlacesFile.java:253)
> > > > >         at uk.me.parabola.imgfmt.app.lbl.LBLFile.createPOI(LBLFile.java:172)
> > > > >         at uk.me.parabola.mkgmap.build.MapBuilder.processPOIs(MapBuilder.java:670)
> > > > >         at uk.me.parabola.mkgmap.build.MapBuilder.makeMap(MapBuilder.java:325)
> > > > >         at uk.me.parabola.mkgmap.main.MapMaker.makeMap(MapMaker.java:114)
> > > > >         at uk.me.parabola.mkgmap.main.MapMaker.makeMap(MapMaker.java:62)
> > > > >         at uk.me.parabola.mkgmap.main.Main.lambda$processFilename$1(Main.java:291)
> > > > >         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > > >         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > > >         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > > >         at java.base/java.lang.Thread.run(Thread.java:829)
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > mkgmap-dev mailing list
> > > > > mkgmap-dev@lists.mkgmap.org.uk
> > > > > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Attachment: cs932-v2.patch
Description: cs932-v2.patch

Attachment: cs932-v2-debug.patch
Description: cs932-v2-debug.patch
