Ian,
I have used GEDCOM Validator by Chronoplex. I forget exactly where I found it, but I did not get it via the Windows Store. I use it with Windows 7.

The "Best Practice" mode is going to show issues with most GEDCOM files. Best practices are not requirements, and there is not 100% agreement on which non-standard practices to follow when writing GEDCOM files. The "Standards only" mode is probably more useful for the task you described.

There will probably be a lot of warnings. Most of those will not be an issue because most software that reads GEDCOM files can handle common issues like lines that exceed the maximum GEDCOM line length. The only way to know whether a warning is serious or not is to review the data after importing it to see if data has been lost or corrupted. Typically, the importing program will produce a log and that will help, too. I don't have much experience with Legacy's GEDCOM import so I don't know how extensive its log is.

If there are errors, you should investigate. You can use the same approach as with warnings: review the error message from the validator, then review the import log from Legacy (or whichever program you use to read the GEDCOM file), and then review the actual record in Legacy to see if it looks correct.

For both warnings and errors, you do not have to review every instance of every issue. You will usually find that a single type of error in the GEDCOM file will be repeated many times, and if any one of those errors does not corrupt the data, then none of them will. That's a generalization, of course, but it's usually true.

The character encoding issue is serious. In my opinion, that is the single biggest issue with non-standard GEDCOM files. In 2019, all software should write GEDCOM files using the UTF-8 encoding, and all programs should read that encoding correctly.
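
If you want to see for yourself what encoding a GEDCOM file claims to use, here is a minimal Python sketch. It is only an illustration, not how GEDCOM Validator or Legacy works. The function names (declared_encoding, is_valid_utf8, summarize) are my own, and the sketch assumes the file is in an ASCII-compatible encoding (so not UTF-16) and that the header contains a "1 CHAR" line.

```python
import re
from typing import Optional


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the value of the header's CHAR line, or None if there is none."""
    for raw in data.splitlines():
        # latin-1 maps every byte to one character, so this decode cannot fail;
        # it is used only to scan the (ASCII) tags, not to interpret the text.
        line = raw.decode("latin-1").strip()
        match = re.match(r"1\s+CHAR\s+(\S+)", line)
        if match:
            return match.group(1)
        # A level-0 line other than HEAD means the header has ended.
        if line.startswith("0 ") and not line.startswith("0 HEAD"):
            break
    return None


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def summarize(data: bytes) -> dict:
    """Collect the facts worth comparing against the import log."""
    return {
        "declared": declared_encoding(data),
        "has_non_ascii": any(byte > 0x7F for byte in data),
        "valid_utf8": is_valid_utf8(data),
    }
```

To use it, read the file in binary mode, for example summarize(open("family.ged", "rb").read()), where family.ged is a placeholder name. If the header declares UTF-8 but valid_utf8 is False, the file was probably written in some other encoding, and that is the kind of mismatch that produces garbled names after an import.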
If a program cannot handle Unicode characters, then it should still read UTF-8 and it should report when characters cannot be loaded because of the limitations of the reading program. I believe that applies to Legacy because Legacy does not use Unicode internally.

The problem with encoding issues is that characters can be misinterpreted during the import process. That results in hard-to-find garbled text: the import log usually won't include where those issues occurred because a program that doesn't use the proper encoding when reading the GEDCOM file does not know it has corrupted the text. So, the first thing to check in the import log is whether the importing program has recognized the character encoding of the GEDCOM file. In this case, you said the GEDCOM file uses Windows-1252. That's a common non-standard value and I assume that Legacy and most other modern programs will handle it properly.

In GEDCOM 5.5, UTF-8 is not a valid encoding choice. Many programs will still write it anyway, a rare case where ignoring the standard is a good thing. UTF-8 is valid in GEDCOM 5.5.1. Many programs will export to GEDCOM 5.5 or 5.5.1, and if UTF-8 is only a choice when writing to GEDCOM 5.5.1, then use that option.

It's sad and ridiculous that users must be aware of character encoding issues when it's an easy problem to solve.

John
--
LegacyUserGroup mailing list
LegacyUserGroup@legacyusers.com
To manage your subscription and unsubscribe: http://legacyusers.com/mailman/listinfo/legacyusergroup_legacyusers.com
Archives at: http://www.mail-archive.com/legacyusergroup@legacyusers.com/