Hi,

You need to sort your index file. It looks like dict(7) does a binary
search on it. Once the index is sorted, lookups work fine.
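
To illustrate why an unsorted index produces `pattern not found', here
is a minimal Python sketch. It is not dict(7)'s actual code: the
lookup() helper, the tab-separated "key, offset" layout, the sample
entries, and the plain code-point ordering are all assumptions made for
the example. The point is only that a binary search can miss keys that
are present when the lines are not in key order.

```python
import bisect


def key_of(line):
    """Return the first (key) field of a hypothetical 'key<TAB>offset' line."""
    return line.split("\t", 1)[0]


def lookup(index_lines, key):
    """Binary-search index_lines for key; return its byte offset or None.

    The search is only correct if index_lines is sorted by key.
    """
    keys = [key_of(line) for line in index_lines]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return int(index_lines[i].split("\t", 1)[1])
    return None


# Made-up index in dictionary-file order (by offset), not key order.
unsorted_index = ["犬\t0", "鳥\t64", "猫\t120", "魚\t200", "馬\t310"]
unsorted_index = [unsorted_index[2], unsorted_index[0], unsorted_index[1],
                  unsorted_index[3], unsorted_index[4]]

# Same entries, ordered by key (Python compares strings by code point).
sorted_index = sorted(unsorted_index, key=key_of)

if __name__ == "__main__":
    for name, index in (("unsorted", unsorted_index), ("sorted", sorted_index)):
        for line in unsorted_index:
            k = key_of(line)
            print(name, k, lookup(index, k))
```

With the sorted list every key is found; with the unsorted one, at least
one key that is present in the index is reported as missing. Whatever
tool sorts the real index has to use the same ordering that dict(7)
compares with, which this sketch does not verify.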

fhs

On Tue, Jan 6, 2009 at 12:34 AM, Akshat Kumar
<aku...@sounine.nanosouffle.net> wrote:
> Regarding the dict index files, what I understand is that dict(7)
> receives a pattern (may also be a byte offset or whatever, but suppose
> pattern), looks it up in the first fields of the lines in the dict
> index, and uses the corresponding byte offset in the index to find the
> full line in the dict file.  Well, I've been trying to make the EDICT
> dictionary[1] usable with dict(7), using just the "simple" dict scheme
> as described in /sys/src/cmd/dict/simple.c, and have made (for now) a
>        <kanji> <byte offset>
> index file from the output of mkindex (piping through to sed and
> switching the order of the kanji and byte offset).  I've tried quite a
> few ways of making that index file, but have not yet succeeded in
> getting dict(7) to actually find a corresponding line in the dict file
> (`pattern not found'), given any kanji in the first fields in the
> index file as a pattern.
>
> I can attach neither the index file nor the dictionary file to this
> e-mail, since both are too big -- though I've put them online[2] --
> but the dictionary file made available at [1] is in a slightly
> different format (inserted tab after each kanji/kana) and charset
> (EUC-JP/JIS X 0208 → UTF-8) than the version I converted at [2].  If anyone
> is willing to help figure this out, I'd be very grateful.
>
>
> [1] http://www.csse.monash.edu.au/~jwb/edict_doc.html
>        (see FORMAT for default formatting, and CURRENT VERSION & DOWNLOAD
>         to grab edict.gz)
> [2] http://sounine.nanosouffle.net/magic/webls?dir=/comp/dict
>
>
> Please alert me if the information here is insufficient --
> I also don't mind if you go ahead and make the dict files yourself...
> just let me in on it --
> ak
