Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-22 Thread Steve Ratcliffe

Hi Ticker

> Problem is that resources/sort/cp65001.txt doesn't give ordering to
> lots of characters; it looks like it covers only about 10,500 of the
> 1,112,064 possible code-points. Many of these non-ordered characters
> are being used by the names in the tile in question.

I used the program in extra/src/uk/me/parabola/util/CollationRules.java
to generate some of the tables.

This uses the file "allkeys.txt" which can be obtained
from https://www.unicode.org/Public/UCA/latest/allkeys.txt

The document explaining the Unicode collation rules that references
that file is http://www.unicode.org/reports/tr10/. It includes a
section on programmatically deriving the weights for characters that
do not have explicit entries in the table.

> Assuming the actual ordering of unspecified code-points doesn't really
> matter, I propose to change the logic slightly so undefined Unicode is
> sorted on its 16-bit value after the range of known sorts.

I think that is a good initial approach to get things working.
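
As a rough illustration of that proposal (a sketch only, not the actual
Sort.java change): the class FallbackOrder, its define() helper and the
sample weights below are invented for the example, and every char is
treated as one 16-bit unit. Characters with a table entry keep their
weight; anything else is ordered after all known weights by its 16-bit
value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of mkgmap.
final class FallbackOrder {
    // Stand-in for the entries a file such as cp65001.txt would provide:
    // 16-bit char value -> primary weight.
    private final Map<Integer, Integer> known = new HashMap<>();
    private int maxKnown;

    void define(char c, int weight) {
        known.put((int) c, weight);
        maxKnown = Math.max(maxKnown, weight);
    }

    /** Table weight if present, otherwise a value after every known weight. */
    int weightOf(char c) {
        Integer w = known.get((int) c);
        return (w != null) ? w : maxKnown + 1 + c;
    }

    /** Compares two strings char by char using weightOf(). */
    int compare(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int d = Integer.compare(weightOf(a.charAt(i)), weightOf(b.charAt(i)));
            if (d != 0) {
                return d;
            }
        }
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        FallbackOrder order = new FallbackOrder();
        order.define('a', 1);
        order.define('b', 2);
        // An undefined character sorts after every defined one.
        System.out.println(order.compare("b", "\u4e2d") < 0);
    }
}
```

With this scheme two undefined characters still get distinct, stable
weights, which is all the index building needs.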

Steve

___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-22 Thread Ticker Berkin
Hi Gerd

I was just starting to reply to your previous mail about which parts
were necessary - what I had was:

Mdr5 and Mdr25 need to use the same sort/unique algorithm to ensure
Mdr25 isn't bigger than Mdr5. Regardless of the character set and the
logic changes to Sort.java, it is possible, but very unlikely, to come
across a set of city names that cause this problem while the algorithms
are different. Given Mdr5 is the most significant, Mdr25 is changed to
use a Collator and Mdr29.java now includes an assert for the relative
sizes.
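
To make the size problem concrete, here is a small sketch (the class
UniqueCounts and the sample city names are made up, and Locale.ENGLISH
is only an assumption for the demo; mkgmap gets its collator from its
own Sort class). It counts the unique names once with String.equals()
semantics and once with a SECONDARY collator.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical demo class, not part of mkgmap.
final class UniqueCounts {
    /** Number of distinct names when duplicates are found with equals(). */
    static int byEquals(List<String> names) {
        return new HashSet<>(names).size();
    }

    /** Number of distinct names when duplicates are found with a SECONDARY collator. */
    static int byCollator(List<String> names) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.SECONDARY);
        TreeSet<String> set = new TreeSet<>(collator); // Collator is a Comparator
        set.addAll(names);
        return set.size();
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("Paris", "PARIS", "Lyon");
        System.out.println(byEquals(cities) + " vs " + byCollator(cities));
    }
}
```

If one table were built with the first method and the other with the
second, the two could end up with different record counts for the same
input, which is the situation the assert is meant to catch.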

Mdr23, Mdr24 and Mdr28 use the same logic (a sort followed by detecting
a change) to get a unique list, so I think they should be fixed in the
same way as Mdr25. This also brings them into line with Mdr7, Mdr20 and
Mdr2x by using the same collator strength.

...

Given the way that Mdr29Record just takes the first reference per
country to mdr17/22/24/25/26, maybe the region/country complexity
doesn't matter in the mdr25 logic.
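
A sketch of what "first reference per country" could look like, under
loud assumptions: the class FirstRefPerCountry, the use of plain country
strings and the 1-based record numbers are all hypothetical and are not
taken from Mdr29Record.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the Mdr29Record code.
final class FirstRefPerCountry {
    /**
     * For a list of records already sorted into table order, returns the
     * 1-based number of the first record seen for each country.
     */
    static Map<String, Integer> firstRefs(List<String> countryOfRecord) {
        Map<String, Integer> first = new LinkedHashMap<>();
        for (int i = 0; i < countryOfRecord.size(); i++) {
            // putIfAbsent keeps the earliest (lowest) record number.
            first.putIfAbsent(countryOfRecord.get(i), i + 1);
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(firstRefs(Arrays.asList("DE", "DE", "FR", "NL", "NL")));
    }
}
```

Only the first record per country matters in such a scheme, so how the
remaining records are ordered within a country would not change the
stored reference.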

Ticker


On Fri, 2021-10-22 at 08:01 +, Gerd Petermann wrote:
> Hi Ticker,
> 
> I've committed the patch as is. Reg. Mdr25: The current code doesn't
> make much sense, but maybe there is no Garmin software that uses this
> index?
> I have only two maps (AdriaTOPO 2.40 and a Topomap Benelux from 2009)
> where this section is filled. Both maps are locked, so I cannot analyse
> the content further.
> 
> Gerd

Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-22 Thread Gerd Petermann
Hi Ticker,

I've committed the patch as is. Reg. Mdr25: The current code doesn't make much 
sense, but maybe there is no Garmin software that uses this index?
I have only two maps (AdriaTOPO 2.40 and a Topomap Benelux from 2009) where 
this section is filled. Both maps are locked, so I cannot analyse the content 
further.

Gerd


From: mkgmap-dev on behalf of Gerd Petermann
Sent: Thursday, 21 October 2021 15:48
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from
unicode tiles

Hi Ticker,

I agree that the original code isn't clear, what I don't understand is this: Do 
we need the changes reg. the collator to fix the problem regarding unicode or 
are these two separate problems? The changes in Sort seem to be needed (and I 
have no clue if your approach is good or not), the others seem to be OK, but 
not needed to avoid the crash.

I don't mind committing the change to class Sort soon, as long as only unicode
maps are affected. For all other changes I'd prefer to have a new branch and
maybe find a way to verify whether they are improvements or not.

Gerd



Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-21 Thread Gerd Petermann
Hi Ticker,

I agree that the original code isn't clear, what I don't understand is this: Do 
we need the changes reg. the collator to fix the problem regarding unicode or 
are these two separate problems? The changes in Sort seem to be needed (and I 
have no clue if your approach is good or not), the others seem to be OK, but 
not needed to avoid the crash.

I don't mind committing the change to class Sort soon, as long as only unicode
maps are affected. For all other changes I'd prefer to have a new branch and
maybe find a way to verify whether they are improvements or not.

Gerd



From: Gerd Petermann
Sent: Thursday, 21 October 2021 11:21
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from
unicode tiles

Hi Ticker,

so far I don't understand most of the changes in mdrUnicode_v2.patch

Setting strength to SECONDARY (instead of the default TERTIARY) means that e.g. 
a and ä (German umlaut) are treated the same, right? Why do you describe it 
"generally case-insensitive"?

This doesn't seem to be related to unicode maps only, so I wonder what side 
effects this has.

Gerd


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-21 Thread Ticker Berkin
Hi Gerd

In the existing code, Mdr20, Mdr2x, and Mdr7 set the strength to
SECONDARY, PrefixIndex set it to PRIMARY and Mdr5 didn't set it.

The Java manual doesn't say what the default strength is for a new
Collator:

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/Collator.html

but I've seen references to Collator.getInstance() being locale
dependent and/or TERTIARY. Generally, SECONDARY distinguishes between
accents and TERTIARY between case.

Case-insensitive seems to be the correct option for mkgmap / map indexing.
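
The difference between the strengths can be checked with a few lines of
Java. This is only a sketch: the class StrengthDemo is made up, and it
uses the JDK's collator for Locale.ENGLISH as an assumption, whereas
mkgmap builds its collator from its own sort tables, so results for
other characters may differ.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of mkgmap.
final class StrengthDemo {
    /** True if the two strings compare equal at the given collator strength. */
    static boolean same(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String umlaut = "\u00e4"; // a with diaeresis
        System.out.println("PRIMARY   a/A: " + same(Collator.PRIMARY, "a", "A")
                + "  a/a-umlaut: " + same(Collator.PRIMARY, "a", umlaut));
        System.out.println("SECONDARY a/A: " + same(Collator.SECONDARY, "a", "A")
                + "  a/a-umlaut: " + same(Collator.SECONDARY, "a", umlaut));
        System.out.println("TERTIARY  a/A: " + same(Collator.TERTIARY, "a", "A")
                + "  a/a-umlaut: " + same(Collator.TERTIARY, "a", umlaut));
    }
}
```

At SECONDARY strength "a" and "A" compare equal while "a" and the umlaut
do not, which is why it behaves as case-insensitive but still
accent-sensitive.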

Ticker

On Thu, 2021-10-21 at 09:21 +, Gerd Petermann wrote:
> Hi Ticker,
> 
> so far I don't understand most of the changes in mdrUnicode_v2.patch
> 
> Setting strength to SECONDARY (instead of the default TERTIARY) means
> that e.g. a and ä (German umlaut) are treated the same, right? Why do
> you describe it "generally case-insensitive"?
> 
> This doesn't seem to be related to unicode maps only, so I wonder what
> side effects this has.
> 
> Gerd

Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-21 Thread Gerd Petermann
Hi Ticker,

so far I don't understand most of the changes in mdrUnicode_v2.patch

Setting strength to SECONDARY (instead of the default TERTIARY) means that e.g. 
a and ä (German umlaut) are treated the same, right? Why do you describe it 
"generally case-insensitive"?

This doesn't seem to be related to unicode maps only, so I wonder what side 
effects this has.

Gerd


From: mkgmap-dev on behalf of Ticker Berkin
Sent: Wednesday, 20 October 2021 12:32
To: Development list for mkgmap; Steve Ratcliffe
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from
unicode tiles

Hi

In the changes I've just made, I hope I've been consistent and fixed
all instances to use collator.compare() where scanning the results of a
sort on the same table for a change. Also consistently setting strength
to SECONDARY (generally case-insensitive).

There may be places where an indirect test should also use
collator.compare(). Maybe this should be tackled next.

I didn't look at MdrCheck.

Ticker


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-20 Thread Ticker Berkin
Hi

In the changes I've just made, I hope I've been consistent and fixed
all instances to use collator.compare() where scanning the results of a
sort on the same table for a change. Also consistently setting strength
to SECONDARY (generally case-insensitive).
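
The pattern described above, sorting and then scanning for a change with
the same comparison, can be sketched with the JDK's CollationKey. This
is an illustration under assumptions: the class SortedUnique is
hypothetical, and mkgmap uses its own SortKey and Sort classes rather
than java.text.CollationKey.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of mkgmap.
final class SortedUnique {
    /** Sorts with the collator, then keeps one name per run of collator-equal names. */
    static List<String> uniqueNames(List<String> names, Collator collator) {
        CollationKey[] keys = new CollationKey[names.size()];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = collator.getCollationKey(names.get(i));
        }
        Arrays.sort(keys); // stable; keys order the same way the collator does

        List<String> unique = new ArrayList<>();
        CollationKey last = null;
        for (CollationKey key : keys) {
            // Same comparison for the change detection as for the sort,
            // instead of String.equals().
            if (last == null || key.compareTo(last) != 0) {
                unique.add(key.getSourceString());
            }
            last = key;
        }
        return unique;
    }
}
```

Because both steps use the same keys, two names that sort as equal can
never be counted as two separate records.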

There may be places where an indirect test should also use
collator.compare(). Maybe this should be tackled next.

I didn't look at MdrCheck.

Ticker

On Wed, 2021-10-20 at 08:24 +, Gerd Petermann wrote:
> Hi Ticker & Steve,
> 
> I don't understand the mixed use of collator.compare() and
> String.equals() in the Mdr classes.
> When we use the collator to sort the data we probably also have to
> use it to compare for equality while grouping?
> 
> I also see differences between the code in MdrCheck and the classes
> in mkgmap.
> 
> Gerd



Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-20 Thread Ticker Berkin
Hi Gerd

I didn't understand this either. Mdr29, with its lowest refs to Mdr17,
Mdr22, Mdr24, Mdr25 and Mdr26, is beyond me, so I thought it best to
leave that part untouched.

Ticker

On Wed, 2021-10-20 at 07:59 +, Gerd Petermann wrote:
> Hi Ticker,
> 
> please double check Mdr25:
> I just wonder why we compare the region name when we sort by the
> country name.
> 
> Looks wrong (also in the unpatched code)
> 
> Gerd



Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-20 Thread Gerd Petermann
Hi Ticker & Steve,

I don't understand the mixed use of collator.compare() and String.equals() in 
the Mdr classes.
When we use the collator to sort the data we probably also have to use it to 
compare for equality while grouping?

I also see differences between the code in MdrCheck and the classes in mkgmap.

Gerd


From: mkgmap-dev on behalf of Gerd Petermann
Sent: Wednesday, 20 October 2021 09:59
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from
unicode tiles

Hi Ticker,

please double check Mdr25:
I just wonder why we compare the region name when we sort by the country name.

Looks wrong (also in the unpatched code)

Gerd


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-20 Thread Gerd Petermann
Hi Ticker,

please double check Mdr25:
I just wonder why we compare the region name when we sort by the country name.

Looks wrong (also in the unpatched code)

Gerd


From: mkgmap-dev on behalf of Ticker Berkin
Sent: Tuesday, 19 October 2021 12:10
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from
unicode tiles

Hi Gerd

Here it is

Ticker


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-19 Thread Ticker Berkin
Hi Gerd

Here it is

Ticker

On Tue, 2021-10-19 at 09:22 +, Gerd Petermann wrote:
> Hi Ticker,
> 
> yes, please remove all unrelated optimizations.
> 
> Gerd

Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr23.java
===
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr23.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr23.java	(working copy)
@@ -12,6 +12,7 @@
  */
 package uk.me.parabola.imgfmt.app.mdr;
 
+import java.text.Collator;
 import java.util.ArrayList;
 import java.util.List;
 
@@ -37,6 +38,8 @@
 	 */
 	public void sortRegions(List list) {
 		Sort sort = getConfig().getSort();
+		Collator collator = sort.getCollator();
+		collator.setStrength(Collator.SECONDARY);
 		List> keys = MdrUtils.sortList(sort, list);
 
 		String lastName = null;
@@ -47,7 +50,7 @@
 
 			// Only add if different name or map
 			String name = reg.getName();
-			if (reg.getMapIndex() != lastMapIndex || !name.equals(lastName)) {
+			if (lastName == null || reg.getMapIndex() != lastMapIndex || collator.compare(name, lastName) != 0) {
 				record++;
 				reg.getMdr28().setMdr23(record);
 				regions.add(reg);
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr24.java
===
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr24.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr24.java	(working copy)
@@ -12,6 +12,7 @@
  */
 package uk.me.parabola.imgfmt

Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-19 Thread Gerd Petermann
Hi Ticker,

yes, please remove all unrelated optimizations.

Gerd


From: mkgmap-dev on behalf of Ticker Berkin
Sent: Tuesday, 19 October 2021 11:03
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from
unicode tiles

Hi Gerd

I'd removed the change relating to clearing the reference to the Sort
object to allow garbage collection; as you said, this won't
happen because Sort is shared. I do notice, however, that on a typical
mkgmap run, Sort is created/read 3 times - it isn't shared as fully as
possible.

The other changes (LargeListSorter) are slight improvements to memory
usage and/or processing time - I can remove them if you want.

Ticker


On Tue, 2021-10-19 at 08:13 +, Gerd Petermann wrote:
> Hi Ticker,
>
> please remove the unrelated changes. I think we discussed them with
> patch mdrSort.patch in May, subject "MDR building out-of-memory".
>
> Gerd
>
> 
> From: mkgmap-dev on behalf of Ticker Berkin
> Sent: Monday, 18 October 2021 16:36
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] java.lang.AssertionError while building
> index from unicode tiles
>
> Hi Gerd
>
> Here is the first version of the changes to improve MDR unicode and
> stop the crash.
>
> It always provides a PRIMARY strength sort value, both in the key for
> sorting and direct comparison when using the collator. Previously
> neither of these would have anything for a unicode character not
> mentioned in the sort/cp65001.txt file
>
> In an attempt to stop ordering clashes between the specified sort and
> the ones fudged from the actual unicode value, it orders anything
> unknown after the known values. Unfortunately these can then become
> larger than 2 bytes - and, as this is all the space available without
> re-structuring, they have to wrap onto the known sort region. I only
> found 1 character that did this and I don't know if it conflicted with
> an existing sort.
>
> Regardless of the character set used, in all the places where sorting
> is used for de-dupe, I've used the SECONDARY strength collator to
> detect similar records instead of name.equals(lastName).
>
> I also noticed that my source base included optimisation for
> LargeListSorter, its use of a key cache and some tidy-up of this in
> mdr7 & mdr11 so these are here as well.
>
> Ticker
>
> ___
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-19 Thread Ticker Berkin
Hi Gerd

I'd removed the change relating to clearing the reference to the Sort
object to allow garbage collection; as you said, this won't
happen because Sort is shared. I do notice, however, that on a typical
mkgmap run, Sort is created/read 3 times - it isn't shared as fully as
possible.

The other changes (LargeListSorter) are slight improvements to memory
usage and/or processing time - I can remove them if you want.

Ticker


On Tue, 2021-10-19 at 08:13 +, Gerd Petermann wrote:
> Hi Ticker,
> 
> please remove the unrelated changes. I think we discussed them with
> patch mdrSort.patch in May, subject "MDR building out-of-memory".
> 
> Gerd
> 
> 
> From: mkgmap-dev on behalf of
> Ticker Berkin
> Sent: Monday, 18 October 2021 16:36
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] java.lang.AssertionError while building
> index from unicode tiles
> 
> Hi Gerd
> 
> Here is first version of the changes to improve MDR unicode and stop
> the crash.
> 
> It always provides a PRIMARY strength sort value, both in the key for
> sorting and direct comparison when using the collator. Previously
> neither of these would have anything for a unicode character not
> mentioned in the sort/cp65001.txt file
> 
> In an attempt to stop ordering clashes between the specified sort and
> the ones fudged from the actual unicode value, it orders anything
> unknown after the known values. Unfortunately these can then become
> larger than 2 bytes - and, as this is all the space available without
> re-structuring, they have to wrap onto the known sort region. I only
> found 1 character that did this and I don't know if it conflicted with
> an existing sort.
> 
> Regardless of the character set used, in all the places where sorting
> is used for de-dupe, I've used the SECONDARY strength collator to
> detect similar record instead of name.equals(lastName)
> 
> I also noticed that my source base included optimisation for
> LargeListSorter, its use of a key cache and some tidy-up of this in
> mdr7 & mdr11 so these are here as well.
> 
> Ticker
> 
> ___
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-19 Thread Gerd Petermann
Hi Ticker,

please remove the unrelated changes. I think we discussed them with patch 
mdrSort.patch in May, subject "MDR building out-of-memory".

Gerd


From: mkgmap-dev on behalf of Ticker Berkin
Sent: Monday, 18 October 2021 16:36
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from 
unicode tiles

Hi Gerd

Here is first version of the changes to improve MDR unicode and stop
the crash.

It always provides a PRIMARY strength sort value, both in the key for
sorting and direct comparison when using the collator. Previously
neither of these would have anything for a unicode character not
mentioned in the sort/cp65001.txt file

In an attempt to stop ordering clashes between the specified sort and
the ones fudged from the actual unicode value, it orders anything
unknown after the known values. Unfortunately these can then become
larger than 2 bytes - and, as this is all the space available without
re-structuring, they have to wrap onto the known sort region. I only
found 1 character that did this and I don't know if it conflicted with
an existing sort.

Regardless of the character set used, in all the places where sorting
is used for de-dupe, I've used the SECONDARY strength collator to
detect similar record instead of name.equals(lastName)

I also noticed that my source base included optimisation for
LargeListSorter, its use of a key cache and some tidy-up of this in
mdr7 & mdr11 so these are here as well.

Ticker

___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-18 Thread Ticker Berkin
Hi Gerd

Here is the first version of the changes to improve MDR unicode and stop
the crash.

It always provides a PRIMARY strength sort value, both in the key for
sorting and in direct comparison when using the collator. Previously
neither of these would have anything for a unicode character not
mentioned in the sort/cp65001.txt file.

In an attempt to stop ordering clashes between the specified sort and
the ones fudged from the actual unicode value, it orders anything
unknown after the known values. Unfortunately these can then become
larger than 2 bytes - and, as this is all the space available without
re-structuring, they have to wrap onto the known sort region. I only 
found 1 character that did this and I don't know if it conflicted with
an existing sort.
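
To illustrate the wrap-around described above, here is a minimal sketch. It
is not the mkgmap Sort class: the class name FallbackPrimaryWeight, the
weight table and the "maxKnown + 1 + char value" rule are assumptions made
up for this illustration. It only shows how a fudged weight placed after the
known range can overflow 16 bits and land back on a known weight.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; names and the fallback rule are assumptions, not mkgmap code. */
final class FallbackPrimaryWeight {
    private final Map<Character, Integer> known = new HashMap<>();
    private int maxKnown = 0;

    /** Register an explicit primary weight, as a sort description file would. */
    void define(char c, int weight) {
        known.put(c, weight);
        maxKnown = Math.max(maxKnown, weight);
    }

    /** Primary weight for one UTF-16 code unit, limited to two bytes. */
    int primary(char c) {
        Integer w = known.get(c);
        if (w != null)
            return w;
        // Unknown character: order it after every known weight, by its 16-bit value.
        int fudged = maxKnown + 1 + c;
        // Only two bytes are available, so values above 0xffff wrap around.
        return fudged & 0xffff;
    }

    public static void main(String[] args) {
        FallbackPrimaryWeight s = new FallbackPrimaryWeight();
        s.define('a', 1);
        s.define('b', 2);
        System.out.println(s.primary('\u4e2d')); // 20016: after the known range
        System.out.println(s.primary('\uffff')); // 2: wrapped, same weight as 'b'
    }
}
```

With only two known weights the wrap needs a very high code unit; the larger
the known range, the more of the top end of the 16-bit space is affected.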

Regardless of the character set used, in all the places where sorting
is used for de-dupe, I've used the SECONDARY strength collator to
detect similar records instead of name.equals(lastName).
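
A minimal sketch of the sort-then-detect-change de-dupe using one collator
for both steps. It uses the JDK's own java.text.Collator for Locale.ENGLISH
rather than the collator returned by mkgmap's Sort, and the class and method
names are made up for the example, so treat it as an illustration of the
idea only.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch; uses the JDK collator, not mkgmap's Sort.getCollator(). */
final class CollatorDedupe {
    /** Sort with the collator, then keep a name only when it differs from the previous one. */
    static List<String> unique(List<String> names, Collator collator) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator); // Collator is a Comparator<Object>
        List<String> out = new ArrayList<>();
        String last = null;
        for (String name : sorted) {
            // Same comparison as the sort, so "equal for sorting" and
            // "duplicate" can never disagree.
            if (last == null || collator.compare(name, last) != 0)
                out.add(name);
            last = name;
        }
        return out;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.SECONDARY); // case differences are ignored
        List<String> names = Arrays.asList("Paris", "PARIS", "Berlin");
        System.out.println(unique(names, collator).size()); // 2
    }
}
```

Using name.equals(lastName) in the loop instead would keep both "Paris" and
"PARIS", which is the kind of mismatch between two tables that the change
is meant to remove.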

I also noticed that my source base included optimisation for
LargeListSorter, its use of a key cache and some tidy-up of this in
mdr7 & mdr11 so these are here as well.

Ticker

Index: src/uk/me/parabola/imgfmt/app/mdr/LargeListSorter.java
===
--- src/uk/me/parabola/imgfmt/app/mdr/LargeListSorter.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/LargeListSorter.java	(working copy)
@@ -29,9 +29,11 @@
  */
 public abstract class LargeListSorter {
 	private final Sort sort;
+	private final boolean useCache;
 	
-	public LargeListSorter(Sort sort) {
+	public LargeListSorter(Sort sort, boolean useCache) {
 		this.sort = sort;
+		this.useCache = useCache;
 	}
 
 	/**
@@ -39,6 +41,8 @@
 	 * @param list list of records.
 	 */
 	public void sort(List list) {
+		if (list.size() <= 1)  // Mdr7: can have no streets or single element in partial list
+			return;
 		mergeSort(0, list, 0, list.size());
 	}
 	
@@ -51,13 +55,15 @@
 	 */
 	private void mergeSort(int depth, List list, int start, int len) {
 		// we split if the number is very high and recursion is not too deep
-		if (len > 1_000_000 && depth < 3) {
+		if (len > 500_000 && depth < 4) {
 			mergeSort(depth+1,list, start, len / 2); // left
 			mergeSort(depth+1,list, start + len / 2, len - len / 2); // right
 			merge(list,start,len);
 		} else {
 			// sort one chunk
-			Map cache = new HashMap<>();
+			Map cache = null;
+			if (useCache)
+				cache = new HashMap<>();
 			List> keys = new ArrayList<>(len);
 
 			for (int i = start; i < start + len; i++) {
@@ -82,7 +88,7 @@
 		int stop2 = start + len;
 		boolean fetch1 = true;
 		boolean fetch2 = true;
-		List merged = new ArrayList<>();
+		List merged = new ArrayList<>(len);
 		SortKey sk1 = null;
 		SortKey sk2 = null;
 		while (pos1 < stop1 &&  pos2 < stop2) {
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr11.java
===
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr11.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr11.java	(working copy)
@@ -61,8 +61,8 @@
 		pois.trimToSize();
 		Sort sort = getConfig().getSort();
 
-		LargeListSorter sorter = new LargeListSorter(sort) {
-			
+		LargeListSorter sorter = new LargeListSorter(sort, false) {
+			// typical 15% cache hit-rate so cache probably not worth-while
 			@Override
 			protected SortKey makeKey(Mdr11Record r, Sort sort, Map cache) {
 				return sort.createSortKey(r, r.getName(), r.getMapIndex(), cache);
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr23.java
===
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr23.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr23.java	(working copy)
@@ -12,6 +12,7 @@
  */
 package uk.me.parabola.imgfmt.app.mdr;
 
+import java.text.Collator;
 import java.util.ArrayList;
 import java.util.List;
 
@@ -37,6 +38,8 @@
 	 */
 	public void sortRegions(List list) {
 		Sort sort = getConfig().getSort();
+		Collator collator = sort.getCollator();
+		collator.setStrength(Collator.SECONDARY);
 		List> keys = MdrUtils.sortList(sort, list);
 
 		String lastName = null;
@@ -47,7 +50,7 @@
 
 			// Only add if different name or map
 			String name = reg.getName();
-			if (reg.getMapIndex() != lastMapIndex || !name.equals(lastName)) {
+			if (lastName == null || reg.getMapIndex() != lastMapIndex || collator.compare(name, lastName) != 0) {
 				record++;
 				reg.getMdr28().setMdr23(record);
 				regions.add(reg);
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr24.java
===
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr24.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr24.java	(working copy)
@@ -12,6 +12,7 @@
  */
 package uk.me.parabola.imgfmt.app.mdr;
 
+import java.text.Collator;
 import java.util.ArrayList;
 import java.util.List;
 
@@ -37,6 +38,8 @@
 	 */
 	public void sortCountries(List list) {
 		Sort sort = 

Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-18 Thread Gerd Petermann
Hi Ticker,

I've never tried to understand that code, but yes, masking a position looks 
wrong.

Gerd


From: mkgmap-dev on behalf of Ticker Berkin
Sent: Monday, 18 October 2021 10:52
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from 
unicode tiles

Hi Gerd

In imgfmt/app/srt/Sort.java around line 853:

// Get the first non-ignorable at this level
int c = chars[pos++ & 0xff];
if (!hasPage(c >>> 8)) {

I'm at a loss to understand the 0xff mask! Am I missing something?

Ticker




___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-18 Thread Ticker Berkin
Hi Gerd

In imgfmt/app/srt/Sort.java around line 853:

// Get the first non-ignorable at this level
int c = chars[pos++ & 0xff];
if (!hasPage(c >>> 8)) {

I'm at a loss to understand the 0xff mask! Am I missing something?
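
For reference, a small stand-alone sketch of what an expression of the form
chars[pos++ & 0xff] does. Postfix ++ binds tighter than &, so the mask is
applied to the index and not to the character that is read. The class and
method names are invented for this example, and it makes no claim about why
the mask is in Sort.java.

```java
/** Hypothetical demo of masking an array index; not taken from Sort.java. */
final class MaskedIndexDemo {
    /** Reads chars[pos & 0xff]: the mask limits the index to 0..255. */
    static int readMasked(char[] chars, int pos) {
        return chars[pos & 0xff];
    }

    /** Reads chars[pos] with no mask. */
    static int readPlain(char[] chars, int pos) {
        return chars[pos];
    }

    public static void main(String[] args) {
        char[] chars = new char[300];
        for (int i = 0; i < chars.length; i++)
            chars[i] = (char) i; // each element holds its own index
        System.out.println(readPlain(chars, 260));  // 260
        System.out.println(readMasked(chars, 260)); // 4, because 260 & 0xff == 4
    }
}
```

So for any string position of 256 or more the masked form reads a character
from earlier in the array; shorter inputs are unaffected.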

Ticker




___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-18 Thread Ticker Berkin
Hi Gerd

Yes - I don't know how we could test Garmin device/software use of
these indexes. Does the mkgmap ordering have to agree with something
Garmin is going to presume? Maybe it doesn't matter as long as there is
consistency where one ordered mdr structure points into another ordered
mdr. 

So, I propose to not worry about the actual ordering, but just make it
use all available information so that sort/unique dedupe works
correctly and do this consistently wherever necessary. This also side-
steps the issue of surrogate-pairs, which would need more significant
changes in code structure to deal with.

It's interesting that, with --unicode, the existing code would have
generated a more-or-less unsorted mdr5 and rubbish mdr25/mdr29 for chars
without sort entries, and no one has complained.

Ticker

On Mon, 2021-10-18 at 08:12 +, Gerd Petermann wrote:
> Hi Ticker,
> 
> thanks for looking into this. I have no clue how to test if the index
> really works with those characters as I don't know how to type them. 
> If I got you right mkgmap isn't able to sort the city names so I
> wonder how the index can be of any use? I assume we have the same
> problem for other names like those for highways, POI etc?
> 
> Gerd
> 
> 
> From: mkgmap-dev on behalf of
> Ticker Berkin
> Sent: Monday, 18 October 2021 09:58
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] java.lang.AssertionError while building
> index from unicode tiles
> 
> Hi
> 
> Although 2 16-bit items (surrogate pairs in UTF-16 speak) are required
> to represent many Chinese characters, this isn't the significant
> problem in this case.
> 
> Problem is that resources/sort/cp65001.txt doesn't give ordering to
> lots of characters; it looks like it covers only about 10,500 of the
> 1,112,064 possible code-points. Many of these non-ordered characters
> are being used by the names in the tile in question.
> 
> The basic handling for other codings (eg cp125*) uses a missing sort as
> the basis for ignoring the character; it won't be represented in the
> output so no point in considering it in the sorting.
> 
> This isn't the case with Unicode as all characters should show, but,
> more importantly relating to this crash, stable sorting is required for
> de-duplication of some of the index structures; this isn't happening
> because of characters being ignored.
> 
> Assuming the actual ordering of unspecified code-points doesn't really
> matter, I propose to change the logic slightly so undefined Unicode is
> sorted on its 16-bit value after the range of known sorts.
> 
> I also need to make SortKey generation consistent in a similar way, fix
> some of the uniqueness tests to be consistent with the sort and verify
> that the size of mdr5 is >= mdr25 so this type of problem is detected
> before it is exposed when mdr25 indexes can't be represented in the same
> number of bytes as mdr5 indexes.
> 
> Ticker
> 
> 
> On Sun, 2021-10-17 at 11:16 +0100, Ticker Berkin wrote:
> > Hi
> > 
> > It is most likely that this problem is because Chinese requires 2
> > UTF16 chars to encode many of its characters - see
> > 
> > https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful
> > 
> > I think it is only --index processing where this is a problem in
> > mkgmap.
> > 
> > I'll investigate more
> > 
> > Ticker
> > 
> > 
> > ___
> > mkgmap-dev mailing list
> > mkgmap-dev@lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> 
> 
> ___
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-18 Thread Gerd Petermann
Hi Ticker,

thanks for looking into this. I have no clue how to test if the index really 
works with those characters as I don't know how to type them.  If I got you 
right mkgmap isn't able to sort the city names so I wonder how the index can be 
of any use? I assume we have the same problem for other names like those for 
highways, POI etc?

Gerd


From: mkgmap-dev on behalf of Ticker Berkin
Sent: Monday, 18 October 2021 09:58
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from 
unicode tiles

Hi

Although 2 16-bit items (surrogate pairs in UTF-16 speak) are required
to represent many Chinese characters, this isn't the significant
problem in this case.

Problem is that resources/sort/cp65001.txt doesn't give ordering to
lots of characters; it looks like it covers only about 10,500 of the
1,112,064 possible code-points. Many of these non-ordered characters
are being used by the names in the tile in question.

The basic handling for other codings (eg cp125*) uses a missing sort as
the basis for ignoring the character; it won't be represented in the
output so no point in considering it in the sorting.

This isn't the case with Unicode as all characters should show, but,
more importantly relating to this crash, stable sorting is required for
de-duplication of some of the index structures; this isn't happening
because of characters being ignored.

Assuming the actual ordering of unspecified code-points doesn't really
matter, I propose to change the logic slightly so undefined Unicode is
sorted on its 16-bit value after the range of known sorts.

I also need to make SortKey generation consistent in a similar way, fix
some of the uniqueness tests to be consistent with the sort and verify
that the size of mdr5 is >= mdr25 so this type of problem is detected
before it is exposed when mdr25 indexes can't be represented in the same
number of bytes as mdr5 indexes.

Ticker


On Sun, 2021-10-17 at 11:16 +0100, Ticker Berkin wrote:
> Hi
>
> It is most likely that this problem is because Chinese requires 2
> UTF16 chars to encode many of its characters - see
>
> https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful
>
> I think it is only --index processing where this is a problem in
> mkgmap.
>
> I'll investigate more
>
> Ticker
>
>
> ___
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-18 Thread Ticker Berkin
Hi

Although 2 16-bit items (surrogate pairs in UTF-16 speak) are required
to represent many Chinese characters, this isn't the significant
problem in this case.
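
As a quick check of the surrogate-pair point, the sketch below (class and
method names are made up for the example) counts UTF-16 code units for two
code points. U+4E2D is a common CJK ideograph in the Basic Multilingual
Plane; U+20000 is the first code point of CJK Extension B and needs a
surrogate pair.

```java
/** Hypothetical demo: how many Java chars a code point occupies in UTF-16. */
final class SurrogateDemo {
    /** Number of 16-bit units needed for one code point (1 or 2). */
    static int utf16Units(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        String common = "\u4e2d";                              // U+4E2D, in the BMP
        String rare = new String(Character.toChars(0x20000));  // U+20000, supplementary
        System.out.println(common.length());                       // 1
        System.out.println(rare.length());                         // 2
        System.out.println(rare.codePointCount(0, rare.length())); // 1
    }
}
```

Code that walks a String one char at a time sees the second case as two
separate 16-bit values, which is where surrogate pairs would need extra
handling.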

Problem is that resources/sort/cp65001.txt doesn't give ordering to
lots of characters; it looks like it covers only about 10,500 of the
1,112,064 possible code-points. Many of these non-ordered characters
are being used by the names in the tile in question. 
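
The 1,112,064 figure is the full code-point range 0..0x10FFFF minus the
2,048 surrogate values. A small sketch of that arithmetic (the class name is
invented; the 10,500 figure is the estimate given above):

```java
/** Hypothetical helper reproducing the code-point arithmetic. */
final class CodePointCount {
    /** All code points minus the surrogate range U+D800..U+DFFF. */
    static int scalarValues() {
        int all = Character.MAX_CODE_POINT + 1;                                  // 0x110000 = 1,114,112
        int surrogates = Character.MAX_SURROGATE - Character.MIN_SURROGATE + 1;  // 2,048
        return all - surrogates;
    }

    public static void main(String[] args) {
        int total = scalarValues();
        System.out.println(total);                  // 1112064
        System.out.println(100.0 * 10_500 / total); // a little under 1 percent
    }
}
```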

The basic handling for other codings (eg cp125*) uses a missing sort as
the basis for ignoring the character; it won't be represented in the
output so no point in considering it in the sorting.

This isn't the case with Unicode as all characters should show, but,
more importantly relating to this crash, stable sorting is required for
de-duplication of some of the index structures; this isn't happening
because of characters being ignored.

Assuming the actual ordering of unspecified code-points doesn't really
matter, I propose to change the logic slightly so undefined Unicode is
sorted on its 16-bit value after the range of known sorts.

I also need to make SortKey generation consistent in a similar way, fix
some of the uniqueness tests to be consistent with the sort and verify
that the size of mdr5 is >= mdr25 so this type of problem is detected
before it is exposed when mdr25 indexes can't be represented in the same
number of bytes as mdr5 indexes.

Ticker


On Sun, 2021-10-17 at 11:16 +0100, Ticker Berkin wrote:
> Hi
> 
> It is most likely that this problem is because Chinese requires 2
> UTF16 chars to encode many of its characters - see
> 
> https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful
> 
> I think it is only --index processing where this is a problem in
> mkgmap.
> 
> I'll investigate more
> 
> Ticker
> 
> 
> ___
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-17 Thread Gerd Petermann
Hi Carlos,

no, the index is probably wrong for the other tiles as well. Just the special 
case that causes the exception doesn't occur when e.g. the list of Mdr5 entries 
has more than 256 items.

Gerd


From: mkgmap-dev on behalf of Carlos Dávila
Sent: Sunday, 17 October 2021 13:48
To: mkgmap-dev@lists.mkgmap.org.uk
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from 
unicode tiles

In that case, it seems strange that only 2 of 67 tiles of the China map
cause problems, doesn't it?

On 17/10/21 at 12:16, Ticker Berkin wrote:
> Hi
>
> It is most likely that this problem is because Chinese requires 2
> UTF16 chars to encode many of its characters - see
>
> https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful
>
> I think it is only --index processing where this is a problem in mkgmap.
>
> I'll investigate more
>
> Ticker
>
>
>
> ___
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-17 Thread Carlos Dávila
In that case, it seems strange that only 2 of 67 tiles of the China map 
cause problems, doesn't it?


On 17/10/21 at 12:16, Ticker Berkin wrote:
> Hi
>
> It is most likely that this problem is because Chinese requires 2
> UTF16 chars to encode many of its characters - see
>
> https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful
>
> I think it is only --index processing where this is a problem in mkgmap.
>
> I'll investigate more
>
> Ticker

___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-17 Thread Ticker Berkin
Hi

It is most likely that this problem is because Chinese requires 2 UTF16
chars to encode many of its characters - see

https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful

I think it is only --index processing where this is a problem in mkgmap.

I'll investigate more

Ticker


___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-15 Thread Ticker Berkin
Hi

I can also reproduce this. I'll investigate, but am no expert on java
sort/collation.

Ticker


___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-15 Thread Gerd Petermann
Hi Carlos,

I think there are at least two problems:
1) something is wrong with the unicode String comparison, but I have no clue 
how it should work.
The tile contains > 1 city POI, but mkgmap detects only 145 different names 
with unicode. Method Mdr5Record.isSameByName(Collator collator, Mdr5Record 
other) returns true for names which look very different to me.
2) We don't use the method Mdr5Record.isSameByName() when section Mdr25 is 
written (Cities are sorted by country and then by the mdr5 city record number). 
Instead normal java String.equals() is used and thus the list contains the 
expected > 1 entries. This list requires a two-byte value in the index.

The crash happens because we try to write the position in the mdr25 list with 
only one byte, because of this code in Mdr29.java:
    int size25 = sizes.getSize(5);  // NB appears to be size of 5 (cities), not 25 (cities with country).
The comment already suggests that this is probably only correct when both lists 
have the same number of entries.
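
To make the size mismatch concrete, here is a rough sketch. The helper names
are invented and this is not the code behind sizes.getSize() or putNu(); it
only shows why a width derived from 145 entries cannot hold an index such as
10586, the value reported in the AssertionError.

```java
/** Hypothetical sketch of choosing a byte width for an index; not mkgmap code. */
final class IndexWidth {
    /** Smallest number of bytes that can hold values up to maxValue. */
    static int bytesNeeded(int maxValue) {
        if (maxValue < 0x100) return 1;
        if (maxValue < 0x10000) return 2;
        if (maxValue < 0x1000000) return 3;
        return 4;
    }

    /** True if a non-negative value fits in nBytes bytes. */
    static boolean fits(int value, int nBytes) {
        return value >= 0 && (nBytes >= 4 || value < (1 << (8 * nBytes)));
    }

    public static void main(String[] args) {
        int width = bytesNeeded(145);           // width taken from the shorter list
        System.out.println(width);              // 1
        System.out.println(fits(10586, width)); // false: the longer list needs 2 bytes
        System.out.println(bytesNeeded(10586)); // 2
    }
}
```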

I hope Steve or Ticker have an idea what's wrong.
Gerd


From: mkgmap-dev on behalf of Gerd Petermann
Sent: Friday, 15 October 2021 10:09
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from
unicode tiles

Hi Carlos,

I can reproduce the crash. Not sure where to fix this yet...

Gerd


From: mkgmap-dev on behalf of Carlos Dávila
Sent: Thursday, 14 October 2021 18:33
To: Development list for mkgmap
Subject: [mkgmap-dev] java.lang.AssertionError while building index from
unicode tiles

Hi all

I'm getting the error below while building the index from this tile:
https://files.mkgmap.org.uk/download/523/31177029.o5m. The minimum mkgmap
options triggering the error are: java -jar mkgmap-trunk.jar
--bounds=bounds.zip --index --unicode 31177029.o5m

Another tile from the same splitter run also fails, but the other 65 of
67 build fine. With --code-page=936 they all build fine.

Command output:

Exception in thread "main" java.lang.AssertionError: 10586
 at uk.me.parabola.imgfmt.app.FileBackedImgFileWriter.putNu(FileBackedImgFileWriter.java:215)
 at uk.me.parabola.imgfmt.app.mdr.Mdr29.writeSectData(Mdr29.java:94)
 at uk.me.parabola.imgfmt.app.mdr.MDRFile.writeSection(MDRFile.java:424)
 at uk.me.parabola.imgfmt.app.mdr.MDRFile.writeSections(MDRFile.java:388)
 at uk.me.parabola.imgfmt.app.mdr.MDRFile.write(MDRFile.java:270)
 at uk.me.parabola.mkgmap.combiners.MdrBuilder.onFinish(MdrBuilder.java:331)
 at uk.me.parabola.mkgmap.main.Main.endOptions(Main.java:690)
 at uk.me.parabola.mkgmap.CommandArgsReader.readArgs(CommandArgsReader.java:126)
 at uk.me.parabola.mkgmap.main.Main.mainStart(Main.java:147)
 at uk.me.parabola.mkgmap.main.Main.main(Main.java:118)


___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles

2021-10-15 Thread Gerd Petermann
Hi Carlos,

I can reproduce the crash. Not sure where to fix this yet...

Gerd


From: mkgmap-dev on behalf of Carlos Dávila
Sent: Thursday, 14 October 2021 18:33
To: Development list for mkgmap
Subject: [mkgmap-dev] java.lang.AssertionError while building index from
unicode tiles

Hi all

I'm getting the error below while building the index from this tile:
https://files.mkgmap.org.uk/download/523/31177029.o5m. The minimum mkgmap
options triggering the error are: java -jar mkgmap-trunk.jar
--bounds=bounds.zip --index --unicode 31177029.o5m

Another tile from the same splitter run also fails, but the other 65 of
67 build fine. With --code-page=936 they all build fine.

Command output:

Exception in thread "main" java.lang.AssertionError: 10586
 at uk.me.parabola.imgfmt.app.FileBackedImgFileWriter.putNu(FileBackedImgFileWriter.java:215)
 at uk.me.parabola.imgfmt.app.mdr.Mdr29.writeSectData(Mdr29.java:94)
 at uk.me.parabola.imgfmt.app.mdr.MDRFile.writeSection(MDRFile.java:424)
 at uk.me.parabola.imgfmt.app.mdr.MDRFile.writeSections(MDRFile.java:388)
 at uk.me.parabola.imgfmt.app.mdr.MDRFile.write(MDRFile.java:270)
 at uk.me.parabola.mkgmap.combiners.MdrBuilder.onFinish(MdrBuilder.java:331)
 at uk.me.parabola.mkgmap.main.Main.endOptions(Main.java:690)
 at uk.me.parabola.mkgmap.CommandArgsReader.readArgs(CommandArgsReader.java:126)
 at uk.me.parabola.mkgmap.main.Main.mainStart(Main.java:147)
 at uk.me.parabola.mkgmap.main.Main.main(Main.java:118)


___
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev