Ian,

 

I have used GEDCOM Validator by Chronoplex. I forget exactly where I found
it, but I did not get it via the Windows Store. I use it with Windows 7.

 

The "Best Practice" mode is going to show issues with most GEDCOM files.
Best practices are not requirements, and there is not 100% agreement on
which non-standard practices to follow when writing GEDCOM files.

 

The "Standards only" mode is probably more useful for the task you
described.

 

There will probably be a lot of warnings. Most of them will not matter,
because most software that reads GEDCOM files tolerates common problems such
as lines that exceed the maximum GEDCOM line length. The only way to know
whether a warning is serious is to review the data after importing it and
check whether anything has been lost or corrupted. The importing program
typically produces a log, and that will help, too. I don't have much
experience with Legacy's GEDCOM import, so I don't know how extensive its
log is.
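
To see how many lines would trigger the line-length warning, a rough check
like the following may help. It is only a sketch: it assumes the 255-character
line limit from GEDCOM 5.5/5.5.1, the function name and sample data are made
up for illustration, and it is not how GEDCOM Validator or Legacy work
internally.

```python
# Sketch only: flag lines longer than the GEDCOM 5.5/5.5.1 limit of 255
# characters. The limit officially includes the line terminator; this check
# counts the line without it, so it is very slightly lenient.
MAX_GEDCOM_LINE = 255


def find_long_lines(text, limit=MAX_GEDCOM_LINE):
    """Return (line_number, length) for every line longer than the limit."""
    long_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > limit:
            long_lines.append((number, len(line)))
    return long_lines


# Made-up sample: the NOTE line is 7 + 300 = 307 characters long.
sample = "0 HEAD\n1 NOTE " + "x" * 300 + "\n0 TRLR\n"
print(find_long_lines(sample))  # [(2, 307)]
```

The line numbers it reports can then be compared with the same records after
import to see whether the long values arrived intact.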

 

If there are errors, you should investigate. You can use the same approach
as with warnings: review the error message from the validator, then review
the import log from Legacy (or whichever program you use to read the GEDCOM
file), and then review the actual record in Legacy to see if it looks
correct.

 

For both warnings and errors, you do not have to review every instance of
every issue. You will usually find that a single type of error in the GEDCOM
file will be repeated many times, and if any one of those errors does not
corrupt the data, then none of them will. That's a generalization, of
course, but it's usually true.

 

The character encoding issue is serious. In my opinion, that is the single
biggest issue with non-standard GEDCOM files. In 2019, all software should
write GEDCOM files using the UTF-8 encoding, and all programs should read
that encoding correctly. If a program cannot handle Unicode characters, it
should still read UTF-8 and report any characters it cannot load because of
its own limitations. I believe that is the situation with Legacy, because
Legacy does not use Unicode internally.

 

The problem with encoding issues is that characters can be misinterpreted
during the import process. That results in hard-to-find garbled text: the
import log usually won't include where those issues occurred because a
program that doesn't use the proper encoding when reading the GEDCOM file
does not know it has corrupted the text.
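
A small illustration of that silent corruption, as a sketch using a made-up
name rather than data from any real file: the same bytes decode without any
error under the wrong encoding, so nothing is reported, yet the text changes.

```python
# Sketch only: the name below is invented for illustration.
name = "José Müller"

# A file written as UTF-8 but read as Windows-1252: no error is raised,
# the accented letters simply turn into two wrong characters each.
utf8_bytes = name.encode("utf-8")
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # JosÃ© MÃ¼ller

# The reverse case: a Windows-1252 file read as UTF-8. The accented bytes are
# not valid UTF-8, so a lenient reader substitutes replacement characters.
cp1252_bytes = name.encode("cp1252")
lossy = cp1252_bytes.decode("utf-8", errors="replace")
print("\ufffd" in lossy)  # True
```

Searching the imported data for sequences such as "Ã" is one quick way to
spot the first kind of damage.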

 

So, the first thing to check in the import log is whether the importing
program has recognized the character encoding of the GEDCOM file. In this
case, you said the GEDCOM file uses Windows-1252. That's a common
non-standard value and I assume that Legacy and most other modern programs
will handle it properly.
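
The encoding a GEDCOM file claims to use is recorded in the header on the
`1 CHAR` line, so it can be checked before importing. The following is a
minimal sketch: the function name and the sample header are hypothetical, the
exact value a given program writes may differ, and it does not handle files
saved as UTF-16.

```python
import re


def declared_encoding(gedcom_bytes):
    """Return the value of the header's `1 CHAR` line, or None if absent."""
    for raw in gedcom_bytes.splitlines():
        # Latin-1 maps every byte to a character, so decoding never fails;
        # the level numbers and tags we look for are plain ASCII anyway.
        line = raw.decode("latin-1").strip()
        # Stop at the first level-0 record after the header.
        if line.startswith("0 ") and not line.startswith("0 HEAD"):
            break
        match = re.match(r"1\s+CHAR\s+(\S.*)", line)
        if match:
            return match.group(1).strip()
    return None


# Made-up header for illustration only.
sample = (
    b"0 HEAD\r\n"
    b"1 SOUR EXAMPLE\r\n"
    b"1 CHAR WINDOWS-1252\r\n"
    b"0 @I1@ INDI\r\n"
    b"0 TRLR\r\n"
)
print(declared_encoding(sample))  # WINDOWS-1252
```

If the value printed here does not match what the import log says the program
detected, that is a sign the text may have been misread.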

 

In GEDCOM 5.5, UTF-8 is not a valid encoding choice. Many programs write it
anyway, a rare case where ignoring the standard is a good thing. UTF-8 is
valid in GEDCOM 5.5.1. Many programs can export to either GEDCOM 5.5 or
5.5.1; if your program offers UTF-8 only when writing GEDCOM 5.5.1, then
export as 5.5.1 with UTF-8.

 

It's sad and ridiculous that users must be aware of character encoding
issues when it's an easy problem to solve.

 

John
