Hi Gerd

I did a bit more experimenting with LargeListSorter and Mdr7/11 and
have some changes - patch attached.

1/ explicit param to LargeListSorted to useCache
2/ do nothing if 0 or 1 records to sort
3/ reduce chunkSize to reduce memory usage, increase maxDepth so same
overall limit
4/ allocate temp "merged" with correct length
5/ useCache parameters and remarks on usage in Mdr7 & 11
6/ prevent a copy of allStreets
7/ share the collator and allow it and "sort" to be garbage collected
8/ A comments that sort of partial streets must use sortkey, even if
only sorting a few record. LargeListSorter is a handy way of doing this
for a simple/single sortKey per record.

Sortkey simply makes a byte array of the string converted into the
target charset and sort does a byte comparison on this.

collator.compare(str1, str2) ignores the shield chars (and possibly the
other prefix/suffix markers) during the comparison and has extra
dynamic processing looking to handle char aliases (PRIMARY strength
etc).

The comment at the top of Sort.java:

 * found that sorting with the sort keys and the collator gave
different results in some
 * cases. This implementation does not.

is not really correct.

An alternative collator string comparison could be provided that
behaves in the same way as sort.createSortKey. This would speed up 'on
-the-fly' sorts, maybe becoming feasible.

Ticker

On Wed, 2021-05-12 at 09:08 +0000, Gerd Petermann wrote:
> Hi Ticker,
> 
> I think the MultiSortKeys were introduced because the on-the-fly solution was 
> far too slow, at least with the normal Java sort. Could well be that the 
> problem was solved with Java 8 or newer releases.
> 
> I think there was a special case with maps containing huge numbers of equally 
> named roads causing extreme run times.
> This depends on the style. Some styles add a name like "tr2" for each unnamed 
> track with tracktype=grade2.
> 
> Gerd
> 
> ________________________________________
> Von: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> im Auftrag von 
> Ticker Berkin <rwb-mkg...@jagit.co.uk>
> Gesendet: Mittwoch, 12. Mai 2021 10:48
> An: Development list for mkgmap
> Betreff: Re: [mkgmap-dev] MDR building out-of-memory
> 
> Hi Gerd
> 
> Certainly no cache. Maybe reduce the chunk size, but this might
> increase copying.
> 
> It could be improved by doing a linear chunk split/sort then a multi
> -way merge. This would avoid lots of copying assuming the following:
> 
> Using the original array to store sorted chunks demands that another
> array of the same full size is needed for the final merge. If each
> sorted key chunk is converted to a object chunk and these merged, then
> although the same total size is needed, it is made of number of smaller
> arrays.
> 
> The most space efficient solution might be have Mdr11 "implements
> Comparable" and generate pairs of sortkeys on the fly and let the java
> sort take care of all the details.
> 
> The other use of LargeListSorter is Mdr7. I get a higher hit-rate
> (~50%) for the first/partialSorter (1046096 allstreets).
> 
> However, for the repeated fullNameSorter on the partial results, most
> of the lists are just 1 long, with very few more than 10. I guess this
> depends on --road-name-config/split-name and use of shields etc, but
> LargeListSorted seems overkill.
> 
> Ticker
> 
> 
> On Tue, 2021-05-11 at 19:15 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> > 
> > I think the cache is not meant to improve run time, it is used to
> > deduplicate and thus reduce memory. Maybe it would be better to use a
> > smaller chunk size and no cache.
> > No idea why I didn't use
> > merged = new ArrayList<>(len);
> > 
> > Gerd
> > 
> > ________________________________________
> > Von: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> im Auftrag
> > von Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > Gesendet: Dienstag, 11. Mai 2021 19:19
> > An: Development list for mkgmap
> > Betreff: Re: [mkgmap-dev] MDR building out-of-memory
> > 
> > Hi Gerd
> > 
> > I've looked at this, and if Mdr5 space becomes a problem again, I'll
> > consider converting to it.
> > 
> > A couple of comments:
> > 
> > My map had 2909735 poi so the sort chunk size was ~727433. The cache
> > sizes after each chunk were 563095, 603595, 597239 & 605718. Is the
> > cache worth-while for this low hit-rate? Just running the Gmapsupp
> > combiner on existing tiles (without --route, so no streets), I got a
> > run time of 1 min 44 secs with the cache and 1 min 30 without!
> > However
> > most of this time is copying the tiles into gmapsupp.img so not an
> > accurate statistic.
> > 
> > You could pre-allocate List<> "merged" with the correct size.
> > 
> > Ticker
> > 
> > On Tue, 2021-05-11 at 14:40 +0000, Gerd Petermann wrote:
> > > Hi Ticker,
> > > 
> > > I've committed the patch as is. I've not seen big changes in
> > > performance, but I've used a different (already existing) set of
> > > files which was created with my own style. For me,
> > > Mdr11.preWriteImpl() is the most problematic part reg. OOM errors.
> > > 
> > > Maybe look at the code which uses LargeListSorter.
> > > 
> > > Gerd
> > > 
> > > ________________________________________
> > > Von: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> im Auftrag
> > > von Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > > Gesendet: Dienstag, 11. Mai 2021 13:27
> > > An: Development list for mkgmap
> > > Betreff: Re: [mkgmap-dev] MDR building out-of-memory
> > > 
> > > Hi Gerd
> > > 
> > > Here is updated version of patch.
> > > 
> > > Changes from last:
> > > 
> > > Uses your cache code for region and country (in 2 places). For
> > > British
> > > Isles, there are 190 regions and 7 countries, so I don't think the
> > > extra memory will be a problem and there should be some performance
> > > benefit.
> > > 
> > > Delays allocating cities until it can use sortKeys.size() for
> > > initial
> > > allocation. For above map this is 0.07% too big, so I don't think
> > > trimToSize() is worthwhile.
> > > 
> > > Shares the Sort object between the 4 methods.
> > > 
> > > Ticker
> > > _______________________________________________
> > > mkgmap-dev mailing list
> > > mkgmap-dev@lists.mkgmap.org.uk
> > > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev@lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev@lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Index: src/uk/me/parabola/imgfmt/app/mdr/LargeListSorter.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/LargeListSorter.java	(revision 4710)
+++ src/uk/me/parabola/imgfmt/app/mdr/LargeListSorter.java	(working copy)
@@ -29,9 +29,11 @@
  */
 public abstract class LargeListSorter<T extends NamedRecord> {
 	private final Sort sort;
+	private final boolean useCache;
 	
-	public LargeListSorter(Sort sort) {
+	public LargeListSorter(Sort sort, boolean useCache) {
 		this.sort = sort;
+		this.useCache = useCache;
 	}
 
 	/**
@@ -39,6 +41,8 @@
 	 * @param list list of records.
 	 */
 	public void sort(List<T> list) {
+		if (list.size() <= 1)  // Mdr7: can have no streets or single element in partial list
+			return;
 		mergeSort(0, list, 0, list.size());
 	}
 	
@@ -51,13 +55,15 @@
 	 */
 	private void mergeSort(int depth, List<T> list, int start, int len) {
 		// we split if the number is very high and recursion is not too deep
-		if (len > 1_000_000 && depth < 3) {
+		if (len > 500_000 && depth < 4) {
 			mergeSort(depth+1,list, start, len / 2); // left
 			mergeSort(depth+1,list, start + len / 2, len - len / 2); // right
 			merge(list,start,len);
 		} else {
 			// sort one chunk
-			Map<String, byte[]> cache = new HashMap<>();
+			Map<String, byte[]> cache = null;
+			if (useCache)
+				cache = new HashMap<>();
 			List<SortKey<T>> keys = new ArrayList<>(len);
 
 			for (int i = start; i < start + len; i++) {
@@ -82,7 +88,7 @@
 		int stop2 = start + len;
 		boolean fetch1 = true;
 		boolean fetch2 = true;
-		List<T> merged = new ArrayList<>();
+		List<T> merged = new ArrayList<>(len);
 		SortKey<T> sk1 = null;
 		SortKey<T> sk2 = null;
 		while (pos1 < stop1 &&  pos2 < stop2) {
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr11.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr11.java	(revision 4710)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr11.java	(working copy)
@@ -61,8 +61,8 @@
 		pois.trimToSize();
 		Sort sort = getConfig().getSort();
 
-		LargeListSorter<Mdr11Record> sorter = new LargeListSorter<Mdr11Record>(sort) {
-			
+		LargeListSorter<Mdr11Record> sorter = new LargeListSorter<Mdr11Record>(sort, false) {
+			// typical 15% cache hit-rate so cache probably not worth-while
 			@Override
 			protected SortKey<Mdr11Record> makeKey(Mdr11Record r, Sort sort, Map<String, byte[]> cache) {
 				return sort.createSortKey(r, r.getName(), r.getMapIndex(), cache);
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr7.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr7.java	(revision 4710)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr7.java	(working copy)
@@ -51,11 +51,12 @@
 
 	private int partialInfoSize;
 	private Set<String> exclNames;
-	private final Sort sort;
+	private Sort sort;
 	private int maxPrefixCount;
 	private int maxSuffixCount;
 	private boolean magicIsValid;
 	private int magic;
+	private Collator collator;
 
 	public Mdr7(MdrConfig config) {
 		setConfig(config);
@@ -64,6 +65,8 @@
 		exclNames = config.getMdr7Excl();
 		codepage = sort.getCodepage();
 		isMulti = sort.isMulti();
+		collator = sort.getCollator();
+		collator.setStrength(Collator.SECONDARY);
 	}
 
 	public void addStreet(int mapId, String name, int lblOffset, int strOff, Mdr5Record mdrCity) {
@@ -204,7 +207,8 @@
 	@Override
 	protected void preWriteImpl() {
 		
-		LargeListSorter<Mdr7Record> partialSorter = new LargeListSorter<Mdr7Record>(sort) {
+		LargeListSorter<Mdr7Record> partialSorter = new LargeListSorter<Mdr7Record>(sort, true) {
+			// typical 50% cache hit-rate so cache probably worth-while
 			@Override
 			protected SortKey<Mdr7Record> makeKey(Mdr7Record r, Sort sort, Map<String, byte[]> cache) {
 				return sort.createSortKey(r, r.getPartialName(), 0, cache); // first sort by partial name only
@@ -211,14 +215,12 @@
 			}
 		};
 		
-		ArrayList<Mdr7Record> sorted = new ArrayList<>(allStreets);
-		allStreets.clear();
+		ArrayList<Mdr7Record> sorted = allStreets;
+		allStreets = new ArrayList<>(sorted.size());
 		partialSorter.sort(sorted);
 		// list is now sorted by partial name only, we have to group by name and map index now
 		String lastPartial = null;
 		List<Mdr7Record> samePartial = new ArrayList<>();
-		Collator collator = sort.getCollator();
-		collator.setStrength(Collator.SECONDARY);
 		for (int i = 0; i < sorted.size(); i++) {
 			Mdr7Record r = sorted.get(i);
 			String partial = r.getPartialName();
@@ -246,16 +248,20 @@
 		// Basecamp needs the records grouped by partial name, full name, and map index.
 		// This sometimes presents search results in the wrong order. The partial sort fields allow to
 		// tell the right order.
+		if (samePartial.size() > 1) {
+			LargeListSorter<Mdr7Record> fullNameSorter = new LargeListSorter<Mdr7Record>(sort, true) {
+				// Very high hit-rate so use cache.
+				// NB: Mostly there are only a few records to sort here (the common case is 1),
+				// but must use SortKeys rather than a Comparator with Sort.collator.compare(names)
+				// because the special char handling is different.
+				@Override
+				protected SortKey<Mdr7Record> makeKey(Mdr7Record r, Sort sort, Map<String, byte[]> cache) {
+					return sort.createSortKey(r, r.getName(), r.getMapIndex(), cache);
+				}
+			};
+			fullNameSorter.sort(samePartial);
+		}
 		
-		LargeListSorter<Mdr7Record> fullNameSorter = new LargeListSorter<Mdr7Record>(sort) {
-			@Override
-			protected SortKey<Mdr7Record> makeKey(Mdr7Record r, Sort sort, Map<String, byte[]> cache) {
-				return sort.createSortKey(r, r.getName(), r.getMapIndex(), cache);
-			}
-		};
-		
-		
-		fullNameSorter.sort(samePartial);
 		Mdr7Record last = null;
 		int recordNumber = streets.size();
 		
@@ -283,8 +289,6 @@
 	public void writeSectData(ImgFileWriter writer) {
 		boolean hasStrings = hasFlag(MDR7_HAS_STRING);
 		boolean hasNameOffset = hasFlag(MDR7_HAS_NAME_OFFSET);
-		Collator collator = sort.getCollator();
-		collator.setStrength(Collator.SECONDARY); 
 		Mdr7Record last = null;
 		for (Mdr7Record s : streets) {
 			addIndexPointer(s.getMapIndex(), s.getIndex());
@@ -359,6 +363,8 @@
 	protected void releaseMemory() {
 		allStreets = null;
 		streets = null;
+		sort = null;
+		collator = null;
 	}
 
 
_______________________________________________
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Reply via email to