Hi,
I just noticed your email. Apologies if this has already been answered
and I didn't notice _that_.
I haven't used the spellcheck yet, but I have been considering it, so I
am looking at it for the first time. You didn't say whether you are using
Windows; that information might help. I am using Linux at the moment.
I downloaded the latest SSP file, ratings0426.zip, from SourceForge:
https://sourceforge.net/projects/scid/files/Player%20Data/
On 2/8/26 11:12, Ulrich Dirr wrote:
> Hello,
> I want to replace player names in a database with their correct spellings*.
> Do I need to create a special SSP file for that? Do I understand correctly
> that lines with "=" denote alternative spellings?
This is documented in the SSP file; I quote a snippet below. There is
more information in the SSP file, and it is well worth reading. From the
way it handles first and last names, it seems clear that it will not fix
some misspellings, and might sometimes unexpectedly change some correct
names to incorrect ones.
(quote)
# PLAYER SECTION
#
# The format is: the correct spelling is unindented, and
# alternative spellings are indented on the following lines starting
# with a "=". Lines starting with a "%Bio" are biography notes.
(/quote)
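To make that format concrete, here is a minimal Python sketch of a reader that turns the player section into a map from each alternative spelling to its correct spelling. It is an illustration, not Scid's own parser: the function name parse_ssp_players is made up, it assumes you have already cut out just the PLAYER section, and it assumes (unverified) that anything after a "#" on a correct-spelling line is extra player data rather than part of the name.

```python
def parse_ssp_players(lines):
    """Map each alternative spelling to its correct spelling.

    Sketch only: assumes `lines` holds just the PLAYER section of an SSP
    file, in the format described by the SSP file's own header comment.
    """
    corrections = {}
    correct = None  # most recent unindented (correct) spelling
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        # Skip blank lines, "#" comment lines and "%Bio" (or any other "%") notes
        if not stripped or stripped.startswith(("#", "%")):
            continue
        if line[0].isspace():
            # Indented line: an alternative spelling if it starts with "="
            if stripped.startswith("=") and correct is not None:
                corrections[stripped[1:].strip()] = correct
        else:
            # Unindented line: the correct spelling. Assumption: text after
            # a "#" on this line is extra player data, not part of the name.
            correct = line.split("#", 1)[0].strip()
    return corrections


if __name__ == "__main__":
    # Made-up sample lines, for illustration only
    sample = [
        "Smith, John",
        "   = Smith, J",
        "   = Smyth, John",
        "%Bio A made-up player.",
    ]
    print(parse_ssp_players(sample))
    # {'Smith, J': 'Smith, John', 'Smyth, John': 'Smith, John'}
```

Inverting such a map is also a quick way to see, for each correct name, every spelling the SSP file would rewrite into it.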
> What would be the best way to proceed with a database of about 100k entries?
> Maybe it is better to make the changes in a PGN file first?
I can say from bitter experience with bulk database updates that this
sort of operation should _never_ be done without careful checking of the
data both before and after. Make backups first. Check everything. Check
everything again, carefully! And I would make the changes _twice_: once
in PGN, then restore from backup and make the changes again in Scid, and
finally compare the two runs. It will help you understand where you are
smarter than Scid, and where you are not.
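Comparing the two runs can be as simple as counting player names in each result and listing the names whose counts differ. A rough Python sketch, assuming you export the Scid-edited database back to PGN so that both runs are PGN files; the function names and file names are hypothetical, and the regular expression assumes tag values contain no escaped quotes.

```python
import re
import sys
from collections import Counter

# A White or Black tag pair such as [White "Smith, John"].
# Assumption: tag values contain no escaped quotes.
TAG_RE = re.compile(r'^\[(?:White|Black) "([^"]*)"\]')


def player_counts(pgn_text):
    """Count how often each name appears in White/Black tags."""
    counts = Counter()
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def compare_runs(pgn_a, pgn_b):
    """Return {name: (count_a, count_b)} for every name whose counts differ."""
    a, b = player_counts(pgn_a), player_counts(pgn_b)
    return {
        name: (a[name], b[name])
        for name in sorted(set(a) | set(b))
        if a[name] != b[name]
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python compare_runs.py pgn_run.pgn scid_run.pgn (hypothetical names)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f_a, \
         open(sys.argv[2], encoding="utf-8", errors="replace") as f_b:
        for name, (n_a, n_b) in compare_runs(f_a.read(), f_b.read()).items():
            print(f"{name};{n_a};{n_b}")
```

Each output line is a name that the two runs treated differently, which is where to look closely.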
I strongly recommend writing a helper script that reports on the names,
and using it to make sure your update made only the changes you wanted.
If you have some experience with AWK, the script at the bottom of this
email can report on the values of a given tag, for example White or
Black, in a PGN file, with a count for each value. Or you could do
something similar in Python, PowerShell, or another scripting language.
You can use such a report to check before, check after, and perhaps even
create your new SSP file if you want to go that route.
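As a starting point for a new SSP file, a name report with counts can be grouped by a crude key to surface likely variant spellings. This Python sketch is a heuristic, not anything Scid does: it treats names that differ only in case, spacing or punctuation as variants of one player, proposes the most frequent as the correct spelling, and writes the rest as indented "=" lines. The function names are hypothetical and the three-space indent is an assumption (check it against the existing SSP file). It will miss variants such as initials versus full first names, so every group needs checking by hand.

```python
import re
from collections import defaultdict


def name_key(name):
    """Crude grouping key: case-folded, with spaces and punctuation removed."""
    return re.sub(r"[\W_]+", "", name.casefold())


def draft_ssp_entries(counts):
    """Draft SSP player entries from a {name: count} report.

    Heuristic only: names sharing a key are assumed to be one player. The
    most frequent spelling is proposed as correct, the rest as "=" lines.
    """
    groups = defaultdict(list)
    for name in counts:
        groups[name_key(name)].append(name)
    lines = []
    for key in sorted(groups):
        # Most frequent first; ties broken alphabetically
        names = sorted(groups[key], key=lambda n: (-counts[n], n))
        if len(names) < 2:
            continue  # only one spelling seen, nothing to correct
        lines.append(names[0])
        # Assumption: three spaces of indentation before "="
        lines.extend(f"   = {alt}" for alt in names[1:])
    return lines


if __name__ == "__main__":
    # Made-up report, e.g. parsed from "tagvalue;count" lines
    report = {"Smith, John": 5, "SMITH, John": 2, "Smith,John": 1, "Doe, Jane": 3}
    print("\n".join(draft_ssp_entries(report)))
```

A report in the AWK script's tagvalue;count form can be loaded by splitting each line with rsplit(";", 1) and converting the count to an integer.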
> Thanks in advance.
> Best regards,
> Ulrich
> *For my own needs I will use correctly transliterated Cyrillic names, and
> other names with accents, diaereses, etc. Therefore they will be UTF-8 encoded.
Hmm, I suspect the Scid spellcheck predates widespread use of UTF-8. The
Unix `file` utility reports that the SSP file is plain ASCII text, and
the PGN Standard specifies Latin-1. But most chess software I have seen
uses either UTF-8 or Windows-1252. So UTF-8 might work, but be sure to
test everything.
$ file ratings0426.ssp
ratings0426.ssp: ASCII text
$ file --version
file-5.44
magic file from /etc/magic:/usr/share/misc/magic
$
(quote)
4.1: Character codes
PGN data is represented using a subset of the eight bit ISO 8859/1 (Latin 1)
character set.
(/quote)
https://github.com/fsmosca/PGN-Standard/blob/master/PGN-Standard.txt
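Before changing anything, it is worth checking which encoding your PGN files actually use. A rough Python sketch with hypothetical function names: it reports whether the raw bytes are pure ASCII, valid UTF-8, or neither (in which case a single-byte encoding such as Latin-1 or Windows-1252 is likely, though it cannot tell those two apart), and lists the lines that are not valid UTF-8.

```python
def classify_encoding(data: bytes) -> str:
    """Rough guess at the encoding of a file's raw bytes."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so probably a single-byte encoding such as
        # Latin-1 or Windows-1252; this check cannot tell those apart.
        return "single-byte (Latin-1 or Windows-1252?)"


def non_utf8_lines(data: bytes):
    """1-based numbers of the lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad
```

To use it, read a file in binary mode, for example open("games.pgn", "rb").read() with a hypothetical file name, and pass the bytes to both functions. A mix of results across your files (or a few bad lines inside an otherwise UTF-8 file) is a sign that names will not match as expected.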
One final point: once you have your database names the way you like
them, you may want to keep a separate "additional_games" database. Run
the spellcheck only on "additional_games"; this avoids introducing errors
into your main database, which is especially important while you are
still making changes to your SSP file. The other benefit is that spelling
problems are much easier to notice in the smaller "additional_games"
database. Only once the new games are correct should you import them into
your main database. And of course, add any new names to your custom SSP
file, script, or whatever method you used.
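With a separate "additional_games" database, a useful check before importing is to list every player name in the new games that does not already appear in the main database: each one is either a new player or a spelling to fix. A minimal Python sketch, assuming both databases are exported to PGN; the function names are hypothetical and the regular expression assumes tag values contain no escaped quotes.

```python
import re

# One White or Black tag pair per line, e.g. [Black "Doe, Jane"].
# Assumption: tag values contain no escaped quotes.
PLAYER_TAG = re.compile(r'^\s*\[(?:White|Black) "([^"]*)"\]', re.MULTILINE)


def player_names(pgn_text):
    """Set of all names found in White/Black tags."""
    return set(PLAYER_TAG.findall(pgn_text))


def unknown_names(new_games_pgn, main_pgn):
    """Sorted names in the new games that never appear in the main database.

    Each is either a genuinely new player or a spelling that needs fixing.
    """
    return sorted(player_names(new_games_pgn) - player_names(main_pgn))
```

Review every name it returns before importing; the genuinely new ones are also the names to add to your custom SSP file.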
--
Alan
(script)
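# $ awk -v vtag='Tag1' -f _this.awk input.pgn
# output tagvalue;count for sorting by tagvalue
# v2024.12.20 tag => vtag
# t => tt
# tagvalue => tagvaluecn
# v2024.11.13 add usage message
# v2020.09.27 1st
BEGIN {
    # vtag (the tag name to report on) must be given with -v
    if ( vtag == "" ) {
        print "usage$ awk -v vtag=\"Tag1\" -f pgn-report-tags-param-vtag-counts.awk some.pgn"
        exit 1
    }
    # Split fields on double quotes: for the line [White "Smith, John"]
    # $1 is '[White ' and $2 is 'Smith, John'
    FS = "\"" ;
}
{
    # Count each value of the requested tag
    if ( ("[" vtag " ") == $1 )
        tagvaluecn[$2]++ ;
}
END {
    # One "value;count" line per distinct value, in no particular order;
    # pipe through sort to order by tag value
    for ( tt in tagvaluecn )
        print tt ";" tagvaluecn[tt] ;
}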
(/script)
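For anyone more comfortable with Python than AWK, here is a rough equivalent of the script above. It uses the same matching rule (split each line on double quotes and compare the first field with "[TAG "), and prints tagvalue;count lines. Unlike the AWK version it sorts the output itself, and it assumes UTF-8 input, replacing undecodable bytes rather than failing. The file name pgn_tag_counts.py and the function names are hypothetical.

```python
import sys
from collections import Counter


def count_tag_values(lines, vtag):
    """Count values of tag `vtag`, mirroring the AWK script's matching rule."""
    counts = Counter()
    for line in lines:
        # Like AWK with FS="\"": '[White "Doe, Jane"]' -> ['[White ', 'Doe, Jane', ']']
        fields = line.rstrip("\r\n").split('"')
        if len(fields) > 1 and fields[0] == "[" + vtag + " ":
            counts[fields[1]] += 1
    return counts


def main(argv):
    if len(argv) != 3:
        print("usage: python pgn_tag_counts.py TAG some.pgn", file=sys.stderr)
        return 1
    vtag, path = argv[1], argv[2]
    # Assumption: UTF-8 input; undecodable bytes are replaced, not fatal
    with open(path, encoding="utf-8", errors="replace") as pgn:
        counts = count_tag_values(pgn, vtag)
    for value, n in sorted(counts.items()):
        print(f"{value};{n}")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

For example, python pgn_tag_counts.py White some.pgn prints one line per distinct White player, sorted by name.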
_______________________________________________
Scid-users mailing list
https://lists.sourceforge.net/lists/listinfo/scid-users