Hello Debian!
TL;DR: How can I generate my own dictionary and spell-checker files?
I am from Malaysia, where Malay (Bahasa Melayu) is our primary language,
but we don't mind using English for software user interfaces and
communication. I think most of us are comfortable with English because
direct Malay translations can be a bit awkward, though of course I
encourage Malay (ms) translation (I do translation too, btw).
That is fine for user interfaces, but when it comes to documents such as
dissertations, paperwork, and reports, which we write in word processors
such as LibreOffice, we really want a dictionary and spell checker to
validate what we type and fix typos right away.
Looking around, I found a few contributions from people who distribute
dictionary and affix files, but they have not been updated for quite a
long time.
I wonder how they build and test the files. They must be using some
tools. I contacted them to ask, but got no response. Perhaps they no
longer use those email addresses, or have already left this world.
From my research, most people used MySpell a long time ago, most now use
Hunspell, and there is a newer tool called Nuspell too, but I don't
understand how to use it. Where do I put my new words? Can anyone guide
me? Correct me if this is not the right tool for what I am looking for.
Looking at the English Hunspell package in Debian, the files seem to
follow some pattern. At first I did not understand it, but thanks to the
Nuspell wiki manual I am now able to follow it:
$ head /usr/share/hunspell/en_US.aff
SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'
ICONV 1
ICONV ’ '
NOSUGGEST !
# ordinal numbers
COMPOUNDMIN 1
# only in compounds: 1th, 2th, 3th
ONLYINCOMPOUND c
$ head /usr/share/hunspell/en_US.dic
78975
0/nm
0th/pt
1/n1
1st/p
1th/tc
2/nm
2nd/p
2th/tc
3/nm
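The pattern in the .dic listing above can be read mechanically: the
first line is an (approximate) entry count, and each following line is a
word, optionally followed by `/` and flag characters whose meaning is
defined in the .aff file. Here is a minimal Python sketch of that
reading. It assumes single-character flags, as en_US appears to use, and
ignores escaped slashes and morphological fields. `parse_dic` is a
hypothetical helper name, not part of Hunspell.

```python
def parse_dic(text):
    """Split Hunspell-style .dic text into (declared_count, entries)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    declared_count = int(lines[0])  # first line: approximate number of entries
    entries = []
    for line in lines[1:]:
        # "word/flags" -> the word plus its flag characters; flags are optional
        word, _, flags = line.partition("/")
        entries.append((word, list(flags)))
    return declared_count, entries


# Sample taken from the head of en_US.dic shown above.
sample = """78975
0/nm
0th/pt
1/n1
1st/p
"""

count, entries = parse_dic(sample)
for word, flags in entries:
    print(word, flags)
```

Each flag letter (for example `n` or `m`) refers to a rule group in the
.aff file, which is why the two files always travel together.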
As an end user, nobody cares much about this, but now I care because I
want to add Malay words and generate the dictionary and affix files.
I know it takes time to build a file large enough to be useful, but I
can spend a few hours when I am free, and maybe someone can continue my
work.
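As a rough illustration of what the affix file is for, the sketch below
shows how a single suffix rule could expand dictionary stems. The rule
(`SFX N 0 nya .`: flag `N`, strip nothing, append `nya`, apply to any
word) and the three sample entries are made up for this example; real
rules for Malay would still need to be worked out, and Hunspell's own
matching is more involved than this.

```python
import re


def apply_suffix(word, strip, add, condition):
    """Apply one SFX-style rule; return None when the rule does not apply."""
    # The condition is a small pattern that must match the end of the word.
    if not re.search(f"(?:{condition})$", word):
        return None
    if strip != "0":  # "0" means nothing is stripped
        if not word.endswith(strip):
            return None
        word = word[: -len(strip)]
    return word + ("" if add == "0" else add)


def expand(entries, rules):
    """Return every surface form generated from (word, flags) entries."""
    forms = []
    for word, flags in entries:
        forms.append(word)  # the stem itself is accepted in this sketch
        for flag in flags:
            for strip, add, condition in rules.get(flag, []):
                derived = apply_suffix(word, strip, add, condition)
                if derived is not None:
                    forms.append(derived)
    return forms


# Hypothetical rule table, equivalent to the .aff lines:
#   SFX N Y 1
#   SFX N 0 nya .
rules = {"N": [("0", "nya", ".")]}
entries = [("buku", "N"), ("rumah", "N"), ("makan", "")]

print(expand(entries, rules))
```

The point of the design is that one rule covers many stems, so the .dic
file only needs the base words plus a flag.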
I plan to package it for Debian too, since I have experience with Debian
packaging. So I took a look at the Hunspell dictionary packages in Debian.
$ ls /usr/share/hunspell/ -la
total 860
drwxr-xr-x 2 root root 4096 Feb 26 00:00 .
drwxr-xr-x 381 root root 12288 Jun 14 22:09 ..
-rw-r--r-- 1 root root 3090 Mar 1 2020 en_US.aff
-rw-r--r-- 1 root root 859956 Mar 1 2020 en_US.dic
I don't have hunspell installed, but I do have hunspell-en-us:
$ apt-cache policy hunspell hunspell-en-us
hunspell:
  Installed: (none)
  Candidate: 1.7.0-3
  Version table:
     1.7.0-3 500
        500 http://ftp.jp.debian.org/debian bullseye/main amd64 Packages
hunspell-en-us:
  Installed: 1:2019.10.06-1
  Candidate: 1:2019.10.06-1
  Version table:
 *** 1:2019.10.06-1 500
        500 http://ftp.jp.debian.org/debian bullseye/main amd64 Packages
        500 http://ftp.jp.debian.org/debian bullseye/main i386 Packages
        100 /var/lib/dpkg/status
That means I can just look at the hunspell-en-us package, but
https://tracker.debian.org/pkg/hunspell-en-us and
https://packages.debian.org/search?searchon=sourcenames&keywords=hunspell-en-us
report version 20070829-*, while I have 1:2019.10.06-1 installed. I am
not sure why the versions differ.
Anyway, I can still see the source dump at
https://sources.debian.org/src/hunspell-en-us/20070829-7/ (it would be
nice if I could see it on Salsa), and I was right: it is quite simple to
package, and the upstream source only needs the aff and dic files. I see
some light for the packaging part.
The only thing I don't see is how I should generate these files. Or are
they really just plain text, with no tool needed to generate them?
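Both files are plain UTF-8 text, so a text editor is enough in
principle, but a small script helps keep the count on the first line of
the .dic file in sync with the word list. Below is a minimal sketch. It
assumes a hypothetical `ms_MY` base name, a tiny made-up word list, and
a deliberately bare .aff file containing only `SET UTF-8`; none of it
comes from an existing Malay dictionary package.

```python
from pathlib import Path
import tempfile


def write_dictionary(words, out_dir, base="ms_MY"):
    """Write a bare-bones <base>.dic and <base>.aff pair into out_dir."""
    out_dir = Path(out_dir)
    # De-duplicate and sort so the file is stable between runs.
    unique = sorted({w.strip() for w in words if w.strip()})
    dic_path = out_dir / f"{base}.dic"
    aff_path = out_dir / f"{base}.aff"
    # First line of the .dic file is the number of entries that follow.
    dic_path.write_text("\n".join([str(len(unique)), *unique]) + "\n",
                        encoding="utf-8")
    # Smallest useful .aff: declare the encoding, no affix rules yet.
    aff_path.write_text("SET UTF-8\n", encoding="utf-8")
    return dic_path, aff_path


words = ["rumah", "buku", "makan", "buku"]  # made-up sample, one duplicate
out_dir = tempfile.mkdtemp()
dic_path, aff_path = write_dictionary(words, out_dir)
print(dic_path.read_text(encoding="utf-8"))
```

If the hunspell command-line tool is installed, something like
`hunspell -d ./ms_MY -l < sample.txt` should then list the words it does
not recognise, which is one way to test the files while they grow.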
To be honest, I might lose interest if I stay clueless for too long, but
I am posting here hoping to get some information, which may also be
useful for someone else with the same goal.
--
Robbi Nespu <robbinespu AT SPAMFREE gmail DOT com>
D311 B5FF EEE6 0BE8 9C91 FA9E 0C81 FA30 3B3A 80BA
https://robbinespu.gitlab.io | https://mstdn.social/@robbinespu