Hi Gerd
I did a bit more experimenting with LargeListSorter and Mdr7/11 and
have some changes - patch attached.
1/ explicit param to LargeListSorted to useCache
2/ do nothing if 0 or 1 records to sort
3/ reduce chunkSize to reduce memory usage, increase maxDepth so same
overall limit
4/ allocate temp "merged" with correct length
5/ useCache parameters and remarks on usage in Mdr7 & 11
6/ prevent a copy of allStreets
7/ share the collator and allow it and "sort" to be garbage collected
8/ A comments that sort of partial streets must use sortkey, even if
only sorting a few record. LargeListSorter is a handy way of doing this
for a simple/single sortKey per record.
Sortkey simply makes a byte array of the string converted into the
target charset and sort does a byte comparison on this.
collator.compare(str1, str2) ignores the shield chars (and possibly the
other prefix/suffix markers) during the comparison and has extra
dynamic processing looking to handle char aliases (PRIMARY strength
etc).
The comment at the top of Sort.java:
* found that sorting with the sort keys and the collator gave
different results in some
* cases. This implementation does not.
is not really correct.
An alternative collator string comparison could be provided that
behaves in the same way as sort.createSortKey. This would speed up 'on
-the-fly' sorts, maybe becoming feasible.
Ticker
On Wed, 2021-05-12 at 09:08 +0000, Gerd Petermann wrote:
> Hi Ticker,
>
> I think the MultiSortKeys were introduced because the on-the-fly solution was
> far too slow, at least with the normal Java sort. Could well be that the
> problem was solved with Java 8 or newer releases.
>
> I think there was a special case with maps containing huge numbers of equally
> named roads causing extreme run times.
> This depends on the style. Some styles add a name like "tr2" for each unnamed
> track with tracktype=grade2.
>
> Gerd
>
> ________________________________________
> Von: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> im Auftrag von
> Ticker Berkin <rwb-mkg...@jagit.co.uk>
> Gesendet: Mittwoch, 12. Mai 2021 10:48
> An: Development list for mkgmap
> Betreff: Re: [mkgmap-dev] MDR building out-of-memory
>
> Hi Gerd
>
> Certainly no cache. Maybe reduce the chunk size, but this might
> increase copying.
>
> It could be improved by doing a linear chunk split/sort then a multi
> -way merge. This would avoid lots of copying assuming the following:
>
> Using the original array to store sorted chunks demands that another
> array of the same full size is needed for the final merge. If each
> sorted key chunk is converted to a object chunk and these merged, then
> although the same total size is needed, it is made of number of smaller
> arrays.
>
> The most space efficient solution might be have Mdr11 "implements
> Comparable" and generate pairs of sortkeys on the fly and let the java
> sort take care of all the details.
>
> The other use of LargeListSorter is Mdr7. I get a higher hit-rate
> (~50%) for the first/partialSorter (1046096 allstreets).
>
> However, for the repeated fullNameSorter on the partial results, most
> of the lists are just 1 long, with very few more than 10. I guess this
> depends on --road-name-config/split-name and use of shields etc, but
> LargeListSorted seems overkill.
>
> Ticker
>
>
> On Tue, 2021-05-11 at 19:15 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> >
> > I think the cache is not meant to improve run time, it is used to
> > deduplicate and thus reduce memory. Maybe it would be better to use a
> > smaller chunk size and no cache.
> > No idea why I didn't use
> > merged = new ArrayList<>(len);
> >
> > Gerd
> >
> > ________________________________________
> > Von: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> im Auftrag
> > von Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > Gesendet: Dienstag, 11. Mai 2021 19:19
> > An: Development list for mkgmap
> > Betreff: Re: [mkgmap-dev] MDR building out-of-memory
> >
> > Hi Gerd
> >
> > I've looked at this, and if Mdr5 space becomes a problem again, I'll
> > consider converting to it.
> >
> > A couple of comments:
> >
> > My map had 2909735 poi so the sort chunk size was ~727433. The cache
> > sizes after each chunk were 563095, 603595, 597239 & 605718. Is the
> > cache worth-while for this low hit-rate? Just running the Gmapsupp
> > combiner on existing tiles (without --route, so no streets), I got a
> > run time of 1 min 44 secs with the cache and 1 min 30 without!
> > However
> > most of this time is copying the tiles into gmapsupp.img so not an
> > accurate statistic.
> >
> > You could pre-allocate List<> "merged" with the correct size.
> >
> > Ticker
> >
> > On Tue, 2021-05-11 at 14:40 +0000, Gerd Petermann wrote:
> > > Hi Ticker,
> > >
> > > I've committed the patch as is. I've not seen big changes in
> > > performance, but I've used a different (already existing) set of
> > > files which was created with my own style. For me,
> > > Mdr11.preWriteImpl() is the most problematic part reg. OOM errors.
> > >
> > > Maybe look at the code which uses LargeListSorter.
> > >
> > > Gerd
> > >
> > > ________________________________________
> > > Von: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> im Auftrag
> > > von Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > > Gesendet: Dienstag, 11. Mai 2021 13:27
> > > An: Development list for mkgmap
> > > Betreff: Re: [mkgmap-dev] MDR building out-of-memory
> > >
> > > Hi Gerd
> > >
> > > Here is updated version of patch.
> > >
> > > Changes from last:
> > >
> > > Uses your cache code for region and country (in 2 places). For
> > > British
> > > Isles, there are 190 regions and 7 countries, so I don't think the
> > > extra memory will be a problem and there should be some performance
> > > benefit.
> > >
> > > Delays allocating cities until it can use sortKeys.size() for
> > > initial
> > > allocation. For above map this is 0.07% too big, so I don't think
> > > trimToSize() is worthwhile.
> > >
> > > Shares the Sort object between the 4 methods.
> > >
> > > Ticker
> > > _______________________________________________
> > > mkgmap-dev mailing list
> > > mkgmap-dev@lists.mkgmap.org.uk
> > > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev@lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev@lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Index: src/uk/me/parabola/imgfmt/app/mdr/LargeListSorter.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/LargeListSorter.java (revision 4710)
+++ src/uk/me/parabola/imgfmt/app/mdr/LargeListSorter.java (working copy)
@@ -29,9 +29,11 @@
*/
public abstract class LargeListSorter<T extends NamedRecord> {
private final Sort sort;
+ private final boolean useCache;
- public LargeListSorter(Sort sort) {
+ public LargeListSorter(Sort sort, boolean useCache) {
this.sort = sort;
+ this.useCache = useCache;
}
/**
@@ -39,6 +41,8 @@
* @param list list of records.
*/
public void sort(List<T> list) {
+ if (list.size() <= 1) // Mdr7: can have no streets or single element in partial list
+ return;
mergeSort(0, list, 0, list.size());
}
@@ -51,13 +55,15 @@
*/
private void mergeSort(int depth, List<T> list, int start, int len) {
// we split if the number is very high and recursion is not too deep
- if (len > 1_000_000 && depth < 3) {
+ if (len > 500_000 && depth < 4) {
mergeSort(depth+1,list, start, len / 2); // left
mergeSort(depth+1,list, start + len / 2, len - len / 2); // right
merge(list,start,len);
} else {
// sort one chunk
- Map<String, byte[]> cache = new HashMap<>();
+ Map<String, byte[]> cache = null;
+ if (useCache)
+ cache = new HashMap<>();
List<SortKey<T>> keys = new ArrayList<>(len);
for (int i = start; i < start + len; i++) {
@@ -82,7 +88,7 @@
int stop2 = start + len;
boolean fetch1 = true;
boolean fetch2 = true;
- List<T> merged = new ArrayList<>();
+ List<T> merged = new ArrayList<>(len);
SortKey<T> sk1 = null;
SortKey<T> sk2 = null;
while (pos1 < stop1 && pos2 < stop2) {
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr11.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr11.java (revision 4710)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr11.java (working copy)
@@ -61,8 +61,8 @@
pois.trimToSize();
Sort sort = getConfig().getSort();
- LargeListSorter<Mdr11Record> sorter = new LargeListSorter<Mdr11Record>(sort) {
-
+ LargeListSorter<Mdr11Record> sorter = new LargeListSorter<Mdr11Record>(sort, false) {
+ // typical 15% cache hit-rate so cache probably not worth-while
@Override
protected SortKey<Mdr11Record> makeKey(Mdr11Record r, Sort sort, Map<String, byte[]> cache) {
return sort.createSortKey(r, r.getName(), r.getMapIndex(), cache);
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr7.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr7.java (revision 4710)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr7.java (working copy)
@@ -51,11 +51,12 @@
private int partialInfoSize;
private Set<String> exclNames;
- private final Sort sort;
+ private Sort sort;
private int maxPrefixCount;
private int maxSuffixCount;
private boolean magicIsValid;
private int magic;
+ private Collator collator;
public Mdr7(MdrConfig config) {
setConfig(config);
@@ -64,6 +65,8 @@
exclNames = config.getMdr7Excl();
codepage = sort.getCodepage();
isMulti = sort.isMulti();
+ collator = sort.getCollator();
+ collator.setStrength(Collator.SECONDARY);
}
public void addStreet(int mapId, String name, int lblOffset, int strOff, Mdr5Record mdrCity) {
@@ -204,7 +207,8 @@
@Override
protected void preWriteImpl() {
- LargeListSorter<Mdr7Record> partialSorter = new LargeListSorter<Mdr7Record>(sort) {
+ LargeListSorter<Mdr7Record> partialSorter = new LargeListSorter<Mdr7Record>(sort, true) {
+ // typical 50% cache hit-rate so cache probably worth-while
@Override
protected SortKey<Mdr7Record> makeKey(Mdr7Record r, Sort sort, Map<String, byte[]> cache) {
return sort.createSortKey(r, r.getPartialName(), 0, cache); // first sort by partial name only
@@ -211,14 +215,12 @@
}
};
- ArrayList<Mdr7Record> sorted = new ArrayList<>(allStreets);
- allStreets.clear();
+ ArrayList<Mdr7Record> sorted = allStreets;
+ allStreets = new ArrayList<>(sorted.size());
partialSorter.sort(sorted);
// list is now sorted by partial name only, we have to group by name and map index now
String lastPartial = null;
List<Mdr7Record> samePartial = new ArrayList<>();
- Collator collator = sort.getCollator();
- collator.setStrength(Collator.SECONDARY);
for (int i = 0; i < sorted.size(); i++) {
Mdr7Record r = sorted.get(i);
String partial = r.getPartialName();
@@ -246,16 +248,20 @@
// Basecamp needs the records grouped by partial name, full name, and map index.
// This sometimes presents search results in the wrong order. The partial sort fields allow to
// tell the right order.
+ if (samePartial.size() > 1) {
+ LargeListSorter<Mdr7Record> fullNameSorter = new LargeListSorter<Mdr7Record>(sort, true) {
+ // Very high hit-rate so use cache.
+ // NB: Mostly there are only a few records to sort here (the common case is 1),
+ // but must use SortKeys rather than a Comparator with Sort.collator.compare(names)
+ // because the special char handling is different.
+ @Override
+ protected SortKey<Mdr7Record> makeKey(Mdr7Record r, Sort sort, Map<String, byte[]> cache) {
+ return sort.createSortKey(r, r.getName(), r.getMapIndex(), cache);
+ }
+ };
+ fullNameSorter.sort(samePartial);
+ }
- LargeListSorter<Mdr7Record> fullNameSorter = new LargeListSorter<Mdr7Record>(sort) {
- @Override
- protected SortKey<Mdr7Record> makeKey(Mdr7Record r, Sort sort, Map<String, byte[]> cache) {
- return sort.createSortKey(r, r.getName(), r.getMapIndex(), cache);
- }
- };
-
-
- fullNameSorter.sort(samePartial);
Mdr7Record last = null;
int recordNumber = streets.size();
@@ -283,8 +289,6 @@
public void writeSectData(ImgFileWriter writer) {
boolean hasStrings = hasFlag(MDR7_HAS_STRING);
boolean hasNameOffset = hasFlag(MDR7_HAS_NAME_OFFSET);
- Collator collator = sort.getCollator();
- collator.setStrength(Collator.SECONDARY);
Mdr7Record last = null;
for (Mdr7Record s : streets) {
addIndexPointer(s.getMapIndex(), s.getIndex());
@@ -359,6 +363,8 @@
protected void releaseMemory() {
allStreets = null;
streets = null;
+ sort = null;
+ collator = null;
}
_______________________________________________
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev