Hi Gerd

Here it is

Ticker

On Tue, 2021-10-19 at 09:22 +0000, Gerd Petermann wrote:
> Hi Ticker,
> 
> yes, please remove all unrelated optimizations.
> 
> Gerd
> 
> ________________________________________
> Von: mkgmap-dev <[email protected]> im Auftrag
> von Ticker Berkin <[email protected]>
> Gesendet: Dienstag, 19. Oktober 2021 11:03
> An: Development list for mkgmap
> Betreff: Re: [mkgmap-dev] java.lang.AssertionError while building
> index from unicode tiles
> 
> Hi Gerd
> 
> I'd removed the change relating to clearing the reference to the Sort
> object to allow garbage garbage collection; as you said, this won't
> happen because Sort is shared. I do notice, however, that on a
> typical
> mkgmap run, Sort is created/read 3 times - it isn't shared as fully
> as
> possible.
> 
> The other changes (LargeListSorter) are slight improvements to memory
> usage and/or processing time - I can remove them if you want.
> 
> Ticker
> 
> 
> On Tue, 2021-10-19 at 08:13 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> > 
> > please remove the unrelated changes. I think we discussed them with
> > patch mdrSort.patch in May, subject "MDR building out-of-memory".
> > 
> > Gerd
> > 
> > ________________________________________
> > Von: mkgmap-dev <[email protected]> im Auftrag
> > von Ticker Berkin <[email protected]>
> > Gesendet: Montag, 18. Oktober 2021 16:36
> > An: Development list for mkgmap
> > Betreff: Re: [mkgmap-dev] java.lang.AssertionError while building
> > index from unicode tiles
> > 
> > Hi Gerd
> > 
> > Here is first version of the changes to improve MDR unicode and
> > stop
> > the crash.
> > 
> > It always provides a PRIMARY strength sort value, both in the key
> > for
> > sorting and direct comparison when using the collator. Previously
> > neither of these would have anything for a unicode character not
> > mentioned in the sort/cp65001.txt file
> > 
> > In an attempt to stop ordering clashes between the specified sort
> > and
> > the ones fudged from the actual unicode value, it orders anything
> > unknown after the known values. Unfortunately these can then become
> > larger than 2 bytes - and, as this is all the space available
> > without
> > re-structuring, they have to wrap onto the known sort region. I
> > only
> > found 1 character that did this and I don't know if it conflicted
> > with
> > an existing sort.
> > 
> > Regardless of the character set used, in all the places where
> > sorting
> > is used for de-dupe, I've used the SECONDARY strength collator to
> > detect similar record instead of name.equals(lastName)
> > 
> > I also noticed that my source base included optimisation for
> > LargeListSorter, its use of a key cache and some tidy-up of this in
> > mdr7 & mdr11 so these are here as well.
> > 
> > Ticker
> > 
> > _______________________________________________
> > mkgmap-dev mailing list
> > [email protected]
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> 
> 
> _______________________________________________
> mkgmap-dev mailing list
> [email protected]
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> _______________________________________________
> mkgmap-dev mailing list
> [email protected]
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr23.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr23.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr23.java	(working copy)
@@ -12,6 +12,7 @@
  */
 package uk.me.parabola.imgfmt.app.mdr;
 
+import java.text.Collator;
 import java.util.ArrayList;
 import java.util.List;
 
@@ -37,6 +38,8 @@
 	 */
 	public void sortRegions(List<Mdr13Record> list) {
 		Sort sort = getConfig().getSort();
+		Collator collator = sort.getCollator();
+		collator.setStrength(Collator.SECONDARY);
 		List<SortKey<Mdr13Record>> keys = MdrUtils.sortList(sort, list);
 
 		String lastName = null;
@@ -47,7 +50,7 @@
 
 			// Only add if different name or map
 			String name = reg.getName();
-			if (reg.getMapIndex() != lastMapIndex || !name.equals(lastName)) {
+			if (lastName == null || reg.getMapIndex() != lastMapIndex || collator.compare(name, lastName) != 0) {
 				record++;
 				reg.getMdr28().setMdr23(record);
 				regions.add(reg);
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr24.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr24.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr24.java	(working copy)
@@ -12,6 +12,7 @@
  */
 package uk.me.parabola.imgfmt.app.mdr;
 
+import java.text.Collator;
 import java.util.ArrayList;
 import java.util.List;
 
@@ -37,6 +38,8 @@
 	 */
 	public void sortCountries(List<Mdr14Record> list) {
 		Sort sort = getConfig().getSort();
+		Collator collator = sort.getCollator();
+		collator.setStrength(Collator.SECONDARY);
 		List<SortKey<Mdr14Record>> keys = MdrUtils.sortList(sort, list);
 
 		String lastName = null;
@@ -48,7 +51,7 @@
 			// If this is a new name, then we prepare a mdr29 record for it.
 			String name = c.getName();
 
-			if (lastMapIndex != c.getMapIndex() || !name.equals(lastName)) {
+			if (lastName == null || lastMapIndex != c.getMapIndex() || collator.compare(name, lastName) != 0) {
 				record++;
 				c.getMdr29().setMdr24(record);
 				countries.add(c);
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr25.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr25.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr25.java	(working copy)
@@ -12,6 +12,7 @@
  */
 package uk.me.parabola.imgfmt.app.mdr;
 
+import java.text.Collator;
 import java.util.ArrayList;
 import java.util.List;
 
@@ -37,6 +38,8 @@
 	 */
 	public void sortCities(List<Mdr5Record> list) {
 		Sort sort = getConfig().getSort();
+		Collator collator = sort.getCollator();
+		collator.setStrength(Collator.SECONDARY);
 
 		List<SortKey<Mdr5Record>> keys = new ArrayList<>();
 		for (Mdr5Record c : list) {
@@ -51,9 +54,8 @@
 		int record = 0;
 		for (SortKey<Mdr5Record> key : keys) {
 			Mdr5Record city = key.getObject();
-
 			if (lastCity == null ||
-					(!city.getName().equals(lastCity.getName()) || !(city.getRegionName().equals(lastCity.getRegionName()))))
+				collator.compare(city.getName(), lastCity.getName()) != 0 || !city.getRegionName().equals(lastCity.getRegionName()))
 			{
 				record++;
 
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr28.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr28.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr28.java	(working copy)
@@ -12,6 +12,7 @@
  */
 package uk.me.parabola.imgfmt.app.mdr;
 
+import java.text.Collator;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
@@ -35,6 +36,8 @@
 
 	public void buildFromRegions(List<Mdr13Record> regions) {
 		Sort sort = getConfig().getSort();
+		Collator collator = sort.getCollator();
+		collator.setStrength(Collator.SECONDARY);
 		List<SortKey<Mdr13Record>> keys = MdrUtils.sortList(sort, regions);
 
 		int record = 0;
@@ -44,7 +47,7 @@
 			Mdr13Record region = key.getObject();
 
 			String name = region.getName();
-			if (!name.equals(lastName)) {
+			if (lastName == null || collator.compare(name, lastName) != 0) {
 				record++;
 				mdr28 = new Mdr28Record();
 				mdr28.setIndex(record);
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr29.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr29.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr29.java	(working copy)
@@ -12,6 +12,7 @@
  */
 package uk.me.parabola.imgfmt.app.mdr;
 
+import java.text.Collator;
 import java.util.ArrayList;
 import java.util.List;
 
@@ -36,6 +37,8 @@
 
 	public void buildFromCountries(List<Mdr14Record> countries) {
 		Sort sort = getConfig().getSort();
+		Collator collator = sort.getCollator();
+		collator.setStrength(Collator.SECONDARY);
 		List<SortKey<Mdr14Record>> keys = MdrUtils.sortList(sort, countries);
 
 		// Sorted by name, for every new name we allocate a new 29 record and set the same one in every
@@ -46,7 +49,7 @@
 			Mdr14Record country = key.getObject();
 
 			String name = country.getName();
-			if (!name.equals(lastName)) {
+			if (lastName == null || collator.compare(name, lastName) != 0) {
 				mdr29 = new Mdr29Record();
 				mdr29.setName(name);
 				mdr29.setStrOffset(country.getStrOff());
@@ -83,6 +86,8 @@
 		PointerSizes sizes = getSizes();
 		int size24 = sizes.getSize(24);
 		int size22 = sizes.getSize(22);
+		assert sizes.getNumberOfItems(5) >= sizes.getNumberOfItems(25)
+			: "5:" + sizes.getNumberOfItems(5) + " 25:" + sizes.getNumberOfItems(25);
 		int size25 = sizes.getSize(5);  // NB appears to be size of 5 (cities), not 25 (cities with country).
 		int size26 = has26? sizes.getSize(26): 0;
 		int size17 = Utils.numberToPointerSize(max17);
Index: src/uk/me/parabola/imgfmt/app/mdr/Mdr5.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/mdr/Mdr5.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/mdr/Mdr5.java	(working copy)
@@ -39,6 +39,8 @@
 	private int maxCityIndex;
 	private int localCitySize;
 	private int mdr20PointerSize = 0; // bytes for mdr20 pointer, or 0 if no mdr20
+	private Sort sort;
+	private Collator collator;
 
 	// We need a common area to save the mdr20 values, since there can be multiple
 	// city records with the same global city index
@@ -46,6 +48,9 @@
 
 	public Mdr5(MdrConfig config) {
 		setConfig(config);
+		sort = config.getSort();
+		collator = sort.getCollator();
+		collator.setStrength(Collator.SECONDARY);
 	}
 
 	public void addCity(Mdr5Record record) {
@@ -62,15 +67,14 @@
 	public void preWriteImpl() {
 		allCities.trimToSize();
 		localCitySize = Utils.numberToPointerSize(maxCityIndex + 1);
-		Sort sort = getConfig().getSort();
-		genCitiesAndMdr20s(sort);
+		genCitiesAndMdr20s();
 		// calculate positions for the different road indexes
-		calcMdr20SortPos(sort);
-		calcMdr21SortPos(sort);
-		calcMdr22SortPos(sort);
+		calcMdr20SortPos();
+		calcMdr21SortPos();
+		calcMdr22SortPos();
 	}
 
-	private void genCitiesAndMdr20s(Sort sort) {
+	private void genCitiesAndMdr20s() {
 		List<SortKey<Mdr5Record>> sortKeys = new ArrayList<>(allCities.size());
 		Map<String, byte[]> regionCache = new HashMap<>();
 		Map<String, byte[]> countryCache = new HashMap<>();
@@ -90,7 +94,6 @@
 		sortKeys.sort(null);
 
 		cities = new ArrayList<>(sortKeys.size());
-		Collator collator = sort.getCollator();
 
 		int count = 0;
 		Mdr5Record lastCity = null;
@@ -122,7 +125,7 @@
 	/**
 	 * Calculate a position when sorting by name, region, and country- This is used for MDR20. 
 	 */
-	private void calcMdr20SortPos(Sort sort) {
+	private void calcMdr20SortPos() {
 		List<SortKey<Mdr5Record>> sortKeys = new ArrayList<>(allCities.size());
 		Map<String, byte[]> regionCache = new HashMap<>();
 		Map<String, byte[]> countryCache = new HashMap<>();
@@ -155,7 +158,7 @@
 	/**
 	 * Calculate a position when sorting by region- This is used for MDR21. 
 	 */
-	private void calcMdr21SortPos(Sort sort) {
+	private void calcMdr21SortPos() {
 		List<SortKey<Mdr5Record>> sortKeys = new ArrayList<>(allCities.size());
 		for (Mdr5Record m : allCities) {
 			if (m.getRegionName() == null) 
@@ -181,7 +184,7 @@
 	 * Calculate a position when sorting by country- This is used for MDR22. 
 	 */
 
-	private void calcMdr22SortPos(Sort sort) {
+	private void calcMdr22SortPos() {
 		List<SortKey<Mdr5Record>> sortKeys = new ArrayList<>(allCities.size());
 		for (Mdr5Record m : allCities) {
 			if (m.getCountryName() == null)
@@ -208,7 +211,6 @@
 		Mdr5Record lastCity = null;
 		boolean hasString = hasFlag(0x8);
 		boolean hasRegion = hasFlag(0x4);
-		Collator collator = getConfig().getSort().getCollator();
 		for (Mdr5Record city : cities) {
 			int gci = city.getGlobalCityIndex();
 			addIndexPointer(city.getMapIndex(), gci);
@@ -284,7 +286,7 @@
 		int val = (localCitySize - 1);
 		// String offset is only included for a mapsource index.
 		if (isForDevice() ) {
-			if (!getConfig().getSort().isMulti())
+			if (!sort.isMulti())
 				val |= 0x40; // mdr17 sub section present (not with unicode)
 		} else {
 			val |= 0x04;  // region
Index: src/uk/me/parabola/imgfmt/app/srt/Sort.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/srt/Sort.java	(revision 4808)
+++ src/uk/me/parabola/imgfmt/app/srt/Sort.java	(working copy)
@@ -72,6 +72,8 @@
 	private int headerLen = SRTHeader.HEADER_LEN; 
 	private int header3Len = -1;
 
+	private int maxPrimary = 0;  // max seen while loading resource/sort/cp*.txt file. == 10690 for cp65001.txt on 18Oct2021
+
 	public Sort() {
 		pages[0] = new Page();
 	}
@@ -80,9 +82,11 @@
 		ensurePage(ch >>> 8);
 		if (getPrimary(ch) != 0)
 			throw new ExitException(String.format("Repeated primary index 0x%x", ch & 0xff));
-		setPrimary (ch, primary);
+		if (primary > maxPrimary)
+			maxPrimary = primary;
+		setPrimary(ch, primary);
 		setSecondary(ch, secondary);
-		setTertiary( ch, tertiary);
+		setTertiary(ch, tertiary);
 
 		setFlags(ch, flags);
 		int numExp = (flags >> 4) & 0xf;
@@ -393,6 +397,13 @@
 		return fillKey(Collator.TERTIARY, bVal, key, start);
 	}
 
+	private static int writeSort(int strength, int pos, byte[] outKey, int start) {
+		if (strength == Collator.PRIMARY)
+			outKey[start++] = (byte) ((pos >> 8) & 0xff); // for 2 byte charsets
+		outKey[start++] = (byte) (pos & 0xff);
+		return start;
+	}
+
 	/**
 	 * Fill in the output key for a given strength.
 	 *
@@ -405,22 +416,25 @@
 		int index = start;
 		for (char c : input) {
 
-			if (!hasPage(c >>> 8))
+			if (!hasPage(c >>> 8)) {
+				if (isMulti() && type == Collator.PRIMARY) // attempt to avoid conflict with defined sorts. Be consistent with SrtCollator
+					index = writeSort(type, c + maxPrimary, outKey, index);
 				continue;
+			}
 
 			int exp = (getFlags(c) >> 4) & 0xf;
 			if (exp == 0) {
+				int startIndex = index;
 				index = writePos(type, c, outKey, index);
+				if (index == startIndex && isMulti() && type == Collator.PRIMARY) { // nothing got written but is needed
+					index = writeSort(type, c + maxPrimary, outKey, index);
+				}
 			} else {
 				// now have to redirect to a list of input chars, get the list via the primary value always.
 				int idx = getPrimary(c);
 				for (int i = idx - 1; i < idx + exp; i++) {
 					int pos = expansions.get(i).getPosition(type);
-					if (pos != 0) {
-						if (type == Collator.PRIMARY)
-							outKey[index++] = (byte) ((pos >>> 8) & 0xff);
-						outKey[index++] = (byte) pos;
-					}
+					index = writeSort(type, pos, outKey, index);
 				}
 			}
 		}
@@ -681,11 +695,8 @@
 		 */
 		public int writePos(int strength, int ch, byte[] outKey, int start) {
 			int pos = getPos(strength, ch);
-			if (pos != 0) {
-				if (strength == Collator.PRIMARY)
-					outKey[start++] = (byte) ((pos >> 8) & 0xff); // for 2 byte charsets
-				outKey[start++] = (byte) (pos & 0xff);
-			}
+			if (pos != 0)
+				start = writeSort(strength, pos, outKey, start);
 			return start;
 		}
 	}
@@ -851,8 +862,10 @@
 						}
 
 						// Get the first non-ignorable at this level
-						int c = chars[pos++ & 0xff];
+						int c = chars[pos++];
 						if (!hasPage(c >>> 8)) {
+							if (isMulti() && type == Collator.PRIMARY) // order by char itself if no sortpos
+								return (c + maxPrimary) & 0xffff; // attempt to avoid conflict with defined sorts. Be consistent with fillKey
 							next = 0;
 							continue;
 						}
@@ -869,6 +882,8 @@
 								expPos = 0;
 						} else {
 							next = getPos(type, c);
+							if (next == 0 && isMulti() && type == Collator.PRIMARY) // like earlier
+								return (c + maxPrimary) & 0xffff;
 						}
 
 					} while (next == 0);
_______________________________________________
mkgmap-dev mailing list
[email protected]
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Reply via email to