Re: [9fans] mkindex of dict(7)

Akshat Kumar Mon, 05 Jan 2009 22:04:04 -0800

Regarding the dict index files, what I understand is that dict(7)
receives a pattern (it may also be a byte offset or something else,
but suppose a pattern), looks it up in the first field of each line of
the dict index, and uses the corresponding byte offset in the index to
find the full line in the dict file.  Well, I've been trying to make the EDICT
dictionary[1] usable with dict(7), using just the "simple" dict scheme
as described in /sys/src/cmd/dict/simple.c, and have made (for now) a
        <kanji> <byte offset>
index file from the output of mkindex (piping it through sed to
switch the order of the kanji and byte offset).  I've tried quite a
few ways of making that index file, but have not yet succeeded in
getting dict(7) to actually find a corresponding line in the dict file
(`pattern not found'), given any kanji from the first field of the
index file as a pattern.
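
To make the lookup I have in mind concrete, here is a minimal sketch in
Python.  It is not how dict(7) or mkindex actually work; it only
illustrates the headword-to-byte-offset idea described above.  The
names build_index and lookup and the SAMPLE data are hypothetical, and
the tab-separated layout is an assumption based on my converted file.
Offsets are counted in bytes of the UTF-8 file, not in characters.

```python
import io

# Hypothetical two-line sample in the converted layout:
# headword, a tab, then the rest of the entry (UTF-8).
SAMPLE = "猫\t[ねこ] /cat/\n犬\t[いぬ] /dog/\n".encode("utf-8")


def build_index(dict_bytes: bytes) -> list:
    """Return (headword, byte offset) pairs, one per non-empty line."""
    entries = []
    offset = 0
    for raw in io.BytesIO(dict_bytes):
        line = raw.decode("utf-8").rstrip("\n")
        headword = line.split("\t", 1)[0]
        if headword:
            entries.append((headword, offset))
        # Advance by the encoded length: bytes, not characters.
        offset += len(raw)
    return entries


def lookup(dict_bytes: bytes, index: list, pattern: str):
    """Return the full dictionary line for an exact headword match, or None."""
    for headword, offset in index:
        if headword == pattern:
            end = dict_bytes.find(b"\n", offset)
            if end == -1:
                end = len(dict_bytes)
            return dict_bytes[offset:end].decode("utf-8")
    return None
```

Calling lookup(SAMPLE, build_index(SAMPLE), "犬") should return the
second line of the sample.  If the offsets were instead computed
against the original EUC-JP file, or counted in characters, they would
point at the wrong place in the UTF-8 file.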


I cannot attach either the index file or the dictionary file to this
e-mail, since both are too big, though I've put them online[2].  Note
that the dictionary file available at [1] differs slightly from my
converted copy at [2]: in my copy I inserted a tab after each
kanji/kana and converted the charset (EUC-JP/JIS X 0208 → UTF-8).  If
anyone is willing to help figure this out, I'd be very grateful.


[1] http://www.csse.monash.edu.au/~jwb/edict_doc.html
        (see FORMAT for default formatting, and CURRENT VERSION & DOWNLOAD
         to grab edict.gz)
[2] http://sounine.nanosouffle.net/magic/webls?dir=/comp/dict


Please alert me if the information here is insufficient --
I also don't mind if you go ahead and make the dict files yourself...
just let me in on it --
ak
