Hi,
I just noticed your email. Apologies if this has already been answered and I didn't notice _that_.

I haven't used the spellcheck yet, but I have been considering it, so I am looking at it for the first time. You didn't say whether you are using Windows; that information might help. I am using Linux at the moment.

I downloaded the latest SSP file from SourceForge:
https://sourceforge.net/projects/scid/files/Player%20Data/
ratings0426.zip

On 2/8/26 11:12, Ulrich Dirr wrote:
> Hello,
>
> I want to replace in a database player names with correct spellings*. Do I need to create
> a special SSP file for that? Do I understand it correctly that lines with "="
> denote alternate writings?

This is documented in the SSP file itself; I quote a snippet below. There is more information in the file, and it is well worth reading. Judging from how first and last names are handled, it seems clear that the spellcheck will not fix some misspellings, and might occasionally change a correct name to an incorrect one.

(quote)
# PLAYER SECTION
#
# The format is: the correct spelling is unindented, and
# alternative spellings are indented on the following lines starting
# with a "=".  Lines starting with a "%Bio" are biography notes.
(/quote)
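To make the layout concrete, here is a minimal Python sketch that reads lines in the format the snippet describes and builds a lookup from each alternative spelling to the correct one. It covers only the three line types quoted above (comments, name lines, indented "=" lines, plus "%" note lines). The sample names, the function name `parse_player_section`, and the idea that anything after a "#" on a name line is not part of the name are my own assumptions; a real SSP file may contain other line types that this does not handle.

```python
def parse_player_section(lines):
    """Return {correct_spelling: [alternative spellings]}.

    Assumes the layout from the quoted snippet: unindented name lines,
    followed by indented lines that start with "=".
    """
    players = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        text = line.strip()
        # Skip blank lines, "#" comments, and "%Bio"-style note lines.
        if not text or text.startswith("#") or text.startswith("%"):
            continue
        if line[0].isspace():
            # Indented line: an alternative spelling for the current name.
            if text.startswith("=") and current is not None:
                players[current].append(text[1:].strip())
            continue
        # Unindented line: a correct spelling.
        # ASSUMPTION: anything after "#" is extra data, not part of the name.
        current = text.split("#", 1)[0].strip()
        players.setdefault(current, [])
    return players


# Hypothetical sample data, not copied from the real SSP file.
sample = [
    "# PLAYER SECTION",
    "Kasparov, Garry",
    "  = Kasparov, G",
    "  = Kasparow, Garri",
    "%Bio hypothetical biography note",
    "Karpov, Anatoly  # hypothetical trailing data",
    "  = Karpov, A",
]

players = parse_player_section(sample)

# Reverse lookup: alternative spelling -> correct spelling.
to_correct = {alt: name for name, alts in players.items() for alt in alts}

for alt, name in sorted(to_correct.items()):
    print(alt, "->", name)
```

A lookup like `to_correct` is also a convenient way to see in advance which names an SSP file could rewrite.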

> What would be the best way to proceed with a database of about 100k entries? Maybe
> it is better to make the changes in a PGN file first?

I can say from bitter experience with bulk database updates that this sort of operation should _never_ be done without careful checking of the data both before and after. Make backups first. Check everything. Check everything again, carefully! And I would make the changes _twice_: once in PGN, then restore from backup and make the changes again in Scid, and finally compare the two runs. It will help you understand where you are smarter than Scid, and where you are not.
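
As a rough illustration of the before/after check, the Python sketch below compares two name reports in the `tagvalue;count` format that the AWK script at the bottom of this email prints. The function names and the sample counts are made up for the example; treat it as a starting point, not a finished tool.

```python
def load_report(lines):
    """Parse 'tagvalue;count' lines into a {tagvalue: count} dict."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the LAST ';' in case a name itself contains one.
        name, _, count = line.rpartition(";")
        counts[name] = int(count)
    return counts


def diff_reports(before, after):
    """List (name, count_before, count_after) for every name whose count changed."""
    changes = []
    for name in sorted(set(before) | set(after)):
        b = before.get(name, 0)
        a = after.get(name, 0)
        if b != a:
            changes.append((name, b, a))
    return changes


# Hypothetical reports taken before and after a spellcheck run.
before = load_report(["Kasparov, G;3", "Kasparov, Garry;10", "Karpov, Anatoly;7"])
after = load_report(["Kasparov, Garry;13", "Karpov, Anatoly;7"])

changes = diff_reports(before, after)
for name, b, a in changes:
    print(f"{name}: {b} -> {a}")

# The total number of tag values should not change, only their spelling.
same_total = sum(before.values()) == sum(after.values())
print("totals match:", same_total)
```

The same comparison works for the PGN run against the Scid run: export a report from each and diff them.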

I strongly recommend writing a helper script to report on the names, and using it to make sure your update made only desirable changes. If you have some experience with AWK, the script at the bottom of this email can report on, for example, the White or Black tags in a PGN database. Or you could do something similar in Python, PowerShell, or another scripting language. You can use such a report to check before, check after, and perhaps even create your new SSP file if you want to go that route.
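
For anyone who prefers Python, here is a small sketch of the same kind of report: it counts the distinct values of one PGN tag and prints them as `tagvalue;count`. The regular expression is a simplification that expects one tag pair per line, and the function name and sample games are invented for the example.

```python
import re
from collections import Counter

# One tag pair per line, e.g. [White "Kasparov, Garry"]
TAG_LINE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')


def count_tag_values(lines, tag):
    """Count how often each value of the given PGN tag occurs."""
    counts = Counter()
    for line in lines:
        match = TAG_LINE.match(line.strip())
        if match and match.group(1) == tag:
            counts[match.group(2)] += 1
    return counts


# Hypothetical PGN fragment; move text is ignored because it is not a tag line.
sample_pgn = [
    '[Event "Example"]',
    '[White "Kasparov, Garry"]',
    '[Black "Karpov, Anatoly"]',
    "",
    "1. e4 e5 *",
    "",
    '[Event "Example"]',
    '[White "Kasparov, G"]',
    '[Black "Kasparov, Garry"]',
    "",
    "1. d4 d5 *",
]

white_counts = count_tag_values(sample_pgn, "White")
for name in sorted(white_counts):
    print(f"{name};{white_counts[name]}")
```

When reading a real file, open it with an explicit `encoding=` argument so that accented names are decoded the way you expect.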

> Thanks in advance.
>
> Best regards,
> Ulrich
>
> *For my own needs I will use correctly transliterated cyrillic names. And other
> names with accents, dieresis etc. Therefore they will be UTF-8 encoded.

Hmm, I think the Scid spellcheck predates the widespread adoption of UTF-8. The Unix `file` utility says the SSP file is ASCII text, and the PGN Standard specifies Latin-1, but most chess software I have seen uses either UTF-8 or Windows-1252. So UTF-8 might work, but be sure to test everything.

$ file ratings0426.ssp
ratings0426.ssp: ASCII text
$ file --version
file-5.44
magic file from /etc/magic:/usr/share/misc/magic
$

(quote)
4.1: Character codes

PGN data is represented using a subset of the eight bit ISO 8859/1 (Latin 1)
character set.
(/quote)
https://github.com/fsmosca/PGN-Standard/blob/master/PGN-Standard.txt
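
If `file` is not available (on Windows, say), a few lines of Python can make a similar first check. This sketch only distinguishes pure ASCII, valid UTF-8, and everything else; it cannot tell Latin-1 from Windows-1252, and the function name is my own.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for a block of bytes."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: possibly Latin-1 or Windows-1252.
        return "other"


# The same name, encoded three ways.
print(classify_encoding(b"Mueller, Hans"))
print(classify_encoding("Müller, Hans".encode("utf-8")))
print(classify_encoding("Müller, Hans".encode("latin-1")))
```

To check a real file, read it in binary mode, for example `classify_encoding(open("some.pgn", "rb").read())`, where `some.pgn` stands for your own file.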

One final point: once you have your database names the way you like them, you may want to keep a separate "additional_games" database. Run the spellcheck only on "additional_games"; this avoids introducing errors into your main database, which is especially important while you are still making changes to your SSP file. The other benefit is that it will be much easier to notice spelling problems in the smaller "additional_games" database. Only once the new games are correct should you import them into your main database. And of course add any new names to your custom SSP file or script, or however you did it.
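
One way to review the smaller database is to list the names in it that your main database has never seen, together with the closest known spellings. The sketch below uses `difflib.get_close_matches` from the Python standard library; the function name, the 0.8 cutoff, and the sample names are arbitrary choices for the example, and the suggestions are only hints for a human to check.

```python
import difflib


def review_new_names(known_names, new_names, cutoff=0.8):
    """Map each unfamiliar name to a list of similar known names (possibly empty)."""
    known = sorted(set(known_names))
    report = {}
    for name in sorted(set(new_names) - set(known)):
        report[name] = difflib.get_close_matches(name, known, n=3, cutoff=cutoff)
    return report


# Hypothetical name lists, e.g. taken from two tag reports.
main_db_names = ["Kasparov, Garry", "Karpov, Anatoly"]
additional_names = ["Kasparov, Garry", "Kasparow, Garri", "Doe, John"]

report = review_new_names(main_db_names, additional_names)
for name, suggestions in report.items():
    if suggestions:
        print(f"{name}: similar to {suggestions}")
    else:
        print(f"{name}: no similar known name (new player?)")
```

A name with a close match is a likely misspelling; a name with no match is probably a new player who belongs in your custom SSP file.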

--
Alan


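(script)
# $ awk -v vtag='Tag1' -f _this.awk input.pgn
# output tagvalue;count for sorting by tagvalue
# v2024.12.20  tag      => vtag
#              t        => tt
#              tagvalue => tagvaluecn
# v2024.11.13  add usage message
# v2020.09.27  1st

BEGIN {
  # vtag is the PGN tag name to report on, e.g. White or Black
  if ( vtag == "" ) {
    print "usage$ awk -v vtag=\"Tag1\" -f pgn-report-tags-param-vtag-counts.awk some.pgn"
    exit 1
  }
  # split each line on double quotes: $1 is '[Tag ' and $2 is the tag value
  FS = "\"" ;
}

{
  # count each distinct value of the requested tag
  if ( ("[" vtag " ") == $1 )
    tagvaluecn[$2]++ ;
}

END {
  # one line per distinct value, in no particular order: tagvalue;count
  for ( tt in tagvaluecn )
    print tt ";" tagvaluecn[tt] ;
}
(/script)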



_______________________________________________
Scid-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scid-users
