Hello Debian!

TL;DR: How can I generate my own dictionary and spell checker files?

I am from Malaysia. We use Malay (Bahasa Melayu) as our primary language, but we don't mind using English for software user interfaces and communication. I think most of us are comfortable with English because a direct Malay translation can be a bit awkward, but of course I encourage Malay (ms) translation (I do translation too, btw).

That is OK for user interfaces, but when it comes to documents such as dissertations, paperwork, and reports, which we write in word processors such as LibreOffice, we really want a dictionary and spell checker to validate what we type and fix typos right away.

Looking around, I found a few contributions from people who distribute dictionary and affix files, but they have not been updated for quite a long time.

I wonder how they built and tested the files? They must have used some tools. I contacted them to ask but got no response. Maybe they no longer use that email address, or have already left this world.

From my research, most people used MySpell a long time ago, most now use Hunspell, and there is a newer tool called Nuspell too, but I don't understand how to use it. Where do I put my new words? Can anyone guide me? Correct me if this is not the right tool for what I am looking for.

Looking at the English hunspell package in Debian, the files seem to follow some pattern. At first I did not understand it, but thanks to the Nuspell manual wiki I can now make some sense of it:

$ head /usr/share/hunspell/en_US.aff
SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'
ICONV 1
ICONV ’ '
NOSUGGEST !

# ordinal numbers
COMPOUNDMIN 1
# only in compounds: 1th, 2th, 3th
ONLYINCOMPOUND c

$ head /usr/share/hunspell/en_US.dic
78975
0/nm
0th/pt
1/n1
1st/p
1th/tc
2/nm
2nd/p
2th/tc
3/nm

As an end user, nobody cares much about this, but now I care because I want to add Malay words and generate the dictionary and affix files.

I know it takes time to build files large enough to be useful, but I can spend a few hours when I am free, and maybe someone can continue my work.

I also plan to package it for Debian, since I have experience with Debian packaging. So I took a look at the hunspell packages in Debian.

$ ls /usr/share/hunspell/ -la
total 860
drwxr-xr-x   2 root root   4096 Feb 26 00:00 .
drwxr-xr-x 381 root root  12288 Jun 14 22:09 ..
-rw-r--r--   1 root root   3090 Mar  1  2020 en_US.aff
-rw-r--r--   1 root root 859956 Mar  1  2020 en_US.dic

I don't have hunspell installed, but I have hunspell-en-us:

$ apt-cache policy hunspell hunspell-en-us
hunspell:
  Installed: (none)
  Candidate: 1.7.0-3
  Version table:
     1.7.0-3 500
        500 http://ftp.jp.debian.org/debian bullseye/main amd64 Packages
hunspell-en-us:
  Installed: 1:2019.10.06-1
  Candidate: 1:2019.10.06-1
  Version table:
 *** 1:2019.10.06-1 500
        500 http://ftp.jp.debian.org/debian bullseye/main amd64 Packages
        500 http://ftp.jp.debian.org/debian bullseye/main i386 Packages
        100 /var/lib/dpkg/status

which means I can just check the hunspell-en-us package, but https://tracker.debian.org/pkg/hunspell-en-us and https://packages.debian.org/search?searchon=sourcenames&keywords=hunspell-en-us say version 20070829-*, while I have 1:2019.10.06-1 installed. I am not sure why they differ.

Anyway, I can still see the source dump at https://sources.debian.org/src/hunspell-en-us/20070829-7/ (it would be nice if I could see it on Salsa), and I was right: it is quite simple to package, and the upstream source only needs the aff and dic files. I see some light for the packaging part.

The only thing is, I don't see how I should generate these files. Or are they really just plain text, with no tool needed to generate them?
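
If they really are plain text, my guess is that a first version could be produced with ordinary shell tools. The sketch below is only my assumption of how that might look, not how the existing packages are built: the file names (words.txt, ms_MY.dic, ms_MY.aff) and the sample words are made up for illustration. It writes the number of entries on the first line of the .dic file, as the en_US.dic above seems to do, followed by one word per line, and an .aff file that only declares the encoding.

```shell
#!/bin/sh
# Sketch only: build a Hunspell-style .dic/.aff pair from a plain word list.
# All file names here (words.txt, ms_MY.dic, ms_MY.aff) are hypothetical.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A few sample Malay words, one per line, with a duplicate on purpose.
printf '%s\n' makan minum buku makan > words.txt

# Sort and drop duplicates so every word appears once.
sort -u words.txt > words.sorted

# First line: number of entries; then the words themselves.
{
    wc -l < words.sorted | tr -d ' '
    cat words.sorted
} > ms_MY.dic

# Minimal affix file: only declares the encoding, no affix rules yet.
printf 'SET UTF-8\n' > ms_MY.aff

# Show the generated dictionary.
cat ms_MY.dic
```

With hunspell installed, I believe something like `hunspell -d ./ms_MY -l somefile.txt` should then list the words it does not know, but I have not tried that yet. Entries in this sketch carry no /flags, so every word form would have to be listed explicitly until real affix rules are added to the .aff file.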

To be honest, I might lose interest if I stay clueless for too long, but I am posting here hoping to get some information, which may also be useful for someone else with the same purpose.

--
Robbi Nespu <robbinespu AT SPAMFREE gmail DOT com>
D311 B5FF EEE6 0BE8 9C91 FA9E 0C81 FA30 3B3A 80BA
https://robbinespu.gitlab.io | https://mstdn.social/@robbinespu
