Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Quoting Sergey Poznyakoff:

> database {
>     name "fd-tur-eng";
>     handler "dictorg sort database=/usr/share/dictd/freedict-tur-eng";
> }

OK, I had not seen the ordering problem, thank you. I had deactivated the global sort because loading so many databases with this option is absolutely impracticable. I reactivated it for this one only and it is now working well.

So I guess the best course is to open a bug on dict-freedict-tur-eng to have them sort their index, and perhaps have dicodconfig maintain some kind of greylist for databases needing such care until it's fixed. This issue can then be closed in the next upload.

-- 
Marc Dequènes (Duck)
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Marc Dequènes (Duck) wrote:

> Looking at the fd-tur-eng index, I can find these entries, but it
> seems they are then missing from the dict.dz, if I understand the format
> well. I made a few other searches, and was not able to reproduce this
> with another search pattern on several other databases. I'll have a
> better look tomorrow before perhaps bugging the dict-freedict-tur-eng
> package.

The problem with that database is that its index uses a strange collating order. Take a look at lines 2102 and below. You'll see:

ahfat DkphC6
ahitDEKZCA
ahitJ7du/
ahize LlZ6Eo
ahkâmı diniye B5fqGK

Then, look at line 2289. There we have:

ahır/Ih D/
ahırDI15Dp
ahırNmBhCT

Now, the collating order in Turkish is: abcçdefgğhıij...

As you see, the index file is ordered improperly. The proper ordering would be

ahfat
ahır
...
ahit

That's why you cannot get the definition. To fix this, add sort[1] to the loading sequence in the fd-tur-eng database:

database {
    name "fd-tur-eng";
    handler "dictorg sort database=/usr/share/dictd/freedict-tur-eng";
}

Regards,
Sergey

[1] http://dico.prog.gnu.org.ua/manual/html_node/Dictorg.html#IDX148
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Quoting Sergey Poznyakoff:

And now getting the definition for each one:
koruma -> No match
kurama -> OK
kurma -> OK
kurum -> No match
kuruma -> OK
kurumak -> No match
kurutma -> No match

In fact, when I tested, I forgot the "-d fd-tur-eng", and should have had "No match" everywhere.

Looking at the fd-tur-eng index, I can find these entries, but it seems they are then missing from the dict.dz, if I understand the format well. I made a few other searches, and was not able to reproduce this with another search pattern on several other databases. I'll have a better look tomorrow before perhaps bugging the dict-freedict-tur-eng package.

-- 
Marc Dequènes (Duck)
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Marc Dequènes (Duck) wrote:

> And now getting the definition for each one:
> koruma -> No match
> kurama -> OK
> kurma -> OK
> kurum -> No match
> kuruma -> OK
> kurumak -> No match
> kurutma -> No match

Very strange. I cannot reproduce this. With 2.0.90 I get definitions for each one of them.

Regards,
Sergey
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
You're right, it is working in a few cases; I should have tried more.

Let's say we start with:

# dico --host=dico.duckcorp.org --noauth -m kuruma
fd-tur-eng "koruma"
fd-tur-eng "kurama"
fd-tur-eng "kurma"
fd-tur-eng "kurum"
fd-tur-eng "kuruma"
fd-tur-eng "kurumak"
fd-tur-eng "kurutma"

And now getting the definition for each one:
koruma -> No match
kurama -> OK
kurma -> OK
kurum -> No match
kuruma -> OK
kurumak -> No match
kurutma -> No match

-- 
Marc Dequènes (Duck)
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Marc Dequènes (Duck) wrote:

> Installed on dico.duckcorp.org, but does not fix this bug (tested both
> web and cli).

Here's what I get when talking to dico.duckcorp.org:

$ telnet dico.duckcorp.org dict
Trying 193.200.42.177...
Connected to dico.duckcorp.org (193.200.42.177).
Escape character is '^]'.
220 Toushirou.duckcorp.org dicod (GNU dico 2.0.90) <24097.1274635...@toushirou.duckcorp.org>
define fd-deu-eng "kurkuma"
define fd-deu-eng "kurkuma"
150 1 definitions found: list follows
151 "kurkuma" fd-deu-eng ""
Kurkuma [kurkuːmaː] (n) , s.(n ) turmeric
.
250 Command complete [d/m/c = 1/0/15 0.001r 0.000u 0.000s]

Regards,
Sergey
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Quoting Sergey Poznyakoff:

> ftp://download.gnu.org.ua/pub/alpha/dico/dico-2.0.90.tar.gz

Installed on dico.duckcorp.org, but does not fix this bug (tested both web and cli).

-- 
Marc Dequènes (Duck)
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
أحمد المحمودي wrote:

> Do you consider it to be stable enough for release ? Or is it just for
> testing ?

It is definitely more stable than 2.0 after applying patches, because it ensures that no changes would get lost in between.

Regards,
Sergey
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
On Sun, May 23, 2010 at 06:55:21PM +0300, Sergey Poznyakoff wrote:
> That's because of that 15 patches in between. Ahmed, perhaps you can
> use 2.0.90 for the debian package?
---end quoted text---

Do you consider it to be stable enough for release? Or is it just for testing?

-- 
أحمد المحمودي (Ahmed El-Mahmoudy)
Digital design engineer
GPG KeyID: 0xEDDDA1B7
GPG Fingerprint: 8206 A196 2084 7E6D 0DF8 B176 BC19 6A94 EDDD A1B7
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
أحمد المحمودي wrote:

> Actually the original version of this patch did not apply cleanly, there
> were two or three hunks, and then I refreshed the patch.

That's because of those 15 patches in between. Ahmed, perhaps you can use 2.0.90 for the Debian package?

Regards,
Sergey
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
On Sun, May 23, 2010 at 05:35:53PM +0200, Marc Dequènes (Duck) wrote:
> Quoting Sergey Poznyakoff :
>
> >The attached patch coalesces multiple matches into one entry.
> >Please give it a try and let me know if it works for you.
>
> It works like a charm, thank you :-).
---end quoted text---

Sergey: please note that it is the "coalesces multiple matches" patch that fixed the issue, not dico 2.0.90 (I don't think that Marc tried it).

I actually forgot which bug the "coalesces multiple matches" patch was sent for!

-- 
أحمد المحمودي (Ahmed El-Mahmoudy)
Digital design engineer
GPG KeyID: 0xEDDDA1B7
GPG Fingerprint: 8206 A196 2084 7E6D 0DF8 B176 BC19 6A94 EDDD A1B7
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
On Sun, May 23, 2010 at 05:19:51PM +0200, Marc Dequènes (Duck) wrote:
> 0002-Fix-improper-handling-of-conversion-errors-in-levens.patch
---end quoted text---

Actually the original version of this patch did not apply cleanly, there were two or three hunks, and then I refreshed the patch.

-- 
أحمد المحمودي (Ahmed El-Mahmoudy)
Digital design engineer
GPG KeyID: 0xEDDDA1B7
GPG Fingerprint: 8206 A196 2084 7E6D 0DF8 B176 BC19 6A94 EDDD A1B7
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Quoting Sergey Poznyakoff:

> The attached patch coalesces multiple matches into one entry.
> Please give it a try and let me know if it works for you.

It works like a charm, thank you :-).

-- 
Marc Dequènes (Duck)
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Marc Dequènes (Duck) wrote:

> Perhaps another buffer problem. I'm using a lot of databases on my
> config (74 to be precise), so perhaps this is a problem.

Strange, I have installed the same set of databases on my box (except for english-german and german-english, which I was unable to find) and still get a result.

There were some 15 commits after 2.0 (apart from today's), which may well affect the case. To make sure we are using the same codebase, please try this version:

ftp://download.gnu.org.ua/pub/alpha/dico/dico-2.0.90.tar.gz

Regards,
Sergey
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Quoting Sergey Poznyakoff:

> Yes, when querying your server I do get a "No match" answer. However,
> trying on my box I get a definition. Are you sure the patch has
> applied cleanly?

I proofread the build log, and yes, I am:

dpkg-source: info: using source format `3.0 (quilt)'
dpkg-source: warning: patches have not been applied, applying them now (use --no-preparation to override)
dpkg-source: info: applying dico+kbsd.diff
dpkg-source: info: applying 0002-Fix-improper-handling-of-conversion-errors-in-levens.patch
dpkg-source: info: applying 0003-Avoid-using-fixed-size-buffer-in-dictorg.c.patch
dpkg-source: info: building dico using existing ./dico_2.0.orig.tar.gz
dpkg-source: info: building dico in dico_2.0-8~1.gbpa16fbc.debian.tar.gz
dpkg-source: info: building dico in dico_2.0-8~1.gbpa16fbc.dsc

Perhaps another buffer problem. I'm using a lot of databases in my config (74 to be precise), so perhaps this is a problem.

-- 
Marc Dequènes (Duck)
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Marc Dequènes (Duck) wrote:

> Using 0003-Avoid-using-fixed-size-buffer-in-dictorg.c.patch, i still
> get a no match. Try this to see by yourself:
> dico --host=dico.duckcorp.org -a kurutma
> ("kurutma" being one of the found entries for the same "kuruma" search)

Yes, when querying your server I do get a "No match" answer. However, trying on my box I get a definition. Are you sure the patch has applied cleanly?

Regards,
Sergey
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Sergey Poznyakoff wrote:

> أحمد المحمودي wrote:
>
> > This patch does not fix that one gets duplicates, is it intended to
> > be this way ?
> [...]
> This will require a bit more work, though.

The attached patch coalesces multiple matches into one entry. Please give it a try and let me know if it works for you.

Regards,
Sergey

diff --git a/include/dico/list.h b/include/dico/list.h
index f816362..8f5419d 100644
--- a/include/dico/list.h
+++ b/include/dico/list.h
@@ -22,12 +22,18 @@
 
 /* Lists */
 
+#define DICO_LIST_COMPARE_HEAD 0x01
+#define DICO_LIST_COMPARE_TAIL 0x02
+
 typedef int (*dico_list_iterator_t)(void *item, void *data);
 typedef int (*dico_list_comp_t)(const void *, void *);
 
 dico_list_t dico_list_create(void);
 void dico_list_destroy(dico_list_t *list);
 int dico_list_clear(struct dico_list *list);
+int dico_list_set_flags(struct dico_list *list, int flags);
+int dico_list_get_flags(struct dico_list *list);
+
 int dico_list_set_free_item(struct dico_list *list,
                             dico_list_iterator_t free_item, void *data);
 dico_list_comp_t dico_list_set_comparator(dico_list_t list,
diff --git a/lib/list.c b/lib/list.c
index cae8f83..d15de44 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -30,6 +30,7 @@ struct list_entry {
 struct dico_list {
     size_t count;
     struct list_entry *head, *tail;
+    int flags;
     struct iterator *itr;
     dico_list_comp_t comp;
     dico_list_iterator_t free_item;
@@ -56,6 +57,7 @@ dico_list_create()
     if (p) {
         p->count = 0;
         p->head = p->tail = NULL;
+        p->flags = 0;
         p->itr = NULL;
         p->comp = cmp_ptr;
         p->free_item = NULL;
@@ -266,6 +268,26 @@ dico_list_set_comparator(struct dico_list *list, dico_list_comp_t comp)
     return prev;
 }
 
+int
+dico_list_set_flags(struct dico_list *list, int flags)
+{
+    if (!list) {
+        errno = EINVAL;
+        return 1;
+    }
+    list->flags = flags;
+    return 0;
+}
+
+int
+dico_list_get_flags(struct dico_list *list)
+{
+    if (list)
+        return list->flags;
+    return 0;
+}
+
+
 dico_list_comp_t
 dico_list_get_comparator(struct dico_list *list)
 {
@@ -285,6 +307,9 @@ dico_list_append(struct dico_list *list, void *data)
         errno = EINVAL;
         return 1;
     }
+    if ((list->flags & DICO_LIST_COMPARE_TAIL) && list->comp
+        && list->tail && list->comp(list->tail->data, data) == 0)
+        return EEXIST;
     ep = malloc(sizeof(*ep));
     if (!ep)
         return 1;
@@ -308,6 +333,9 @@ dico_list_prepend(struct dico_list *list, void *data)
         errno = EINVAL;
         return 1;
     }
+    if ((list->flags & DICO_LIST_COMPARE_HEAD) && list->comp
+        && list->head && list->comp(list->head->data, data) == 0)
+        return EEXIST;
     ep = malloc(sizeof(*ep));
     if (!ep)
         return 1;
diff --git a/modules/dict.org/dictorg.c b/modules/dict.org/dictorg.c
index 909a15d..4d638db 100644
--- a/modules/dict.org/dictorg.c
+++ b/modules/dict.org/dictorg.c
@@ -536,8 +536,26 @@ register_strategies(void)
 }
 
 static int
+compare_entry(const void *a, const void *b)
+{
+    const struct index_entry *epa = a;
+    const struct index_entry *epb = b;
+    compare_count++;
+    return utf8_strcasecmp(epa->word, epb->word);
+}
+
+static int
+compare_entry_ptr(const void *a, const void *b)
+{
+    const struct index_entry *epa = *(const struct index_entry **)a;
+    const struct index_entry *epb = *(const struct index_entry **)b;
+    return compare_entry(epa, epb);
+}
+
+static int
 common_match(struct dictdb *db, const char *word,
-             int (*compare) (const void *, const void *), struct result *res)
+             int (*compare) (const void *, const void *),
+             int unique, struct result *res)
 {
     struct index_entry x, *ep;
 
@@ -561,6 +579,12 @@ common_match(struct dictdb *db, const char *word,
             memerr("common_match");
             return 0;
         }
+        if (unique) {
+            dico_list_set_comparator(res->list,
+                                     (int (*)(const void *, void *))
+                                     compare_entry);
+            dico_list_set_flags(res->list, DICO_LIST_COMPARE_TAIL);
+        }
         for (p++; p < ep; p++)
             if (!RESERVED_WORD(db, p->word))
                 dico_list_append(res->list, p);
@@ -572,26 +596,9 @@ common_match(struct dictdb *db, const char *word,
 
 
 static int
-compare_entry(const void *a, const void *b)
-{
-    const struct index_entry *epa = a;
-    const struct index_entry *epb = b;
-    compare_count++;
-    return utf8_strcasecmp(epa->word, epb->word);
-}
-
-static int
-compare_entry_ptr(const void *a, const void *b)
-{
-    const struct index_entry *epa = *(const struct index_entry **)a;
-    const struct index_entry *epb = *(const struct index_entry **)b;
-    return compare_entry(epa, epb);
-}
-
-static int
 exact_match(struct dictdb *db, const char *word, struct result *res)
 {
-    return common_match(db, word, compare_entry, res);
+    return common_match(db, word, compare_entry, 1, res);
 }
 
 static int
@@ -609,7 +616,7 @@ compare_prefix(const void *a, const void *b)
 static int
 prefix_match(struct dictdb *db, const char *word, struct result *res)
 {
-    retur
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Coin,

Quoting Sergey Poznyakoff:

dico --host=localhost -m kuruma

I got:

fd-deu-eng "kurkuma"

Which does have a definition indeed.

Using 0003-Avoid-using-fixed-size-buffer-in-dictorg.c.patch, I still get a no match. Try this to see for yourself:

dico --host=dico.duckcorp.org -a kurutma

("kurutma" being one of the entries found for the same "kuruma" search)

Regards.

-- 
Marc Dequènes (Duck)
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
أحمد المحمودي wrote:

> This patch does not fix that one gets duplicates, is it intended to
> be this way ?

The patch was not intended to change this particular behavior. Since the database does contain several entries with the same key, displaying them all is correct, at least according to the standard. However, I agree that it would be logical and more practical to compress them all into one. This will require a bit more work, though.

Regards,
Sergey
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
Hello,

On Sun, May 23, 2010 at 02:52:39PM +0300, Sergey Poznyakoff wrote:
> That's a bug. A fix is attached. Thank you.
---end quoted text---

This patch does not fix that one gets duplicates, is it intended to be this way?

-- 
أحمد المحمودي (Ahmed El-Mahmoudy)
Digital design engineer
GPG KeyID: 0xEDDDA1B7
GPG Fingerprint: 8206 A196 2084 7E6D 0DF8 B176 BC19 6A94 EDDD A1B7
Bug#582708: [Bug-dico] Bug#582708: dico: cannot fetch a definition
أحمد المحمودي wrote:

> 1. I confirm that I get duplicates too, the reason as I see, is that
> the matching word has several definitions in that dictionary.

Yes, that's right.

> dico --host=localhost -m kuruma
>
> I got:
>
> fd-deu-eng "kurkuma"
>
> Which does have a definition indeed.

That's a bug. A fix is attached. Thank you.

Regards,
Sergey

From 956846d3d1b5e35d9012be97b33066e480669dc1 Mon Sep 17 00:00:00 2001
From: Sergey Poznyakoff
Date: Sun, 23 May 2010 14:52:13 +0300
Subject: [PATCH] Avoid using fixed-size buffer in dictorg.c.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Fixes bug reported by Marc Dequènes (debian #582708).

* modules/dict.org/dictorg.c (read_index): Use dico stream
instead of FILE to avoid using fixed-size buffer.
---
 modules/dict.org/dictorg.c | 28 +---
 1 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/modules/dict.org/dictorg.c b/modules/dict.org/dictorg.c
index 7b040cd..17a310f 100644
--- a/modules/dict.org/dictorg.c
+++ b/modules/dict.org/dictorg.c
@@ -232,10 +232,9 @@ static int
 read_index(struct dictdb *db, const char *idxname, int tws)
 {
     struct stat st;
-    FILE *fp;
-    char buf[512]; /* FIXME: fixed size */
     int rc;
     dico_list_t list;
+    dico_stream_t stream;
 
     if (stat(idxname, &st)) {
         dico_log(L_ERR, errno, _("open_index: cannot stat `%s'"), idxname);
@@ -246,11 +245,21 @@ read_index(struct dictdb *db, const char *idxname, int tws)
                  idxname);
         return 1;
     }
-    fp = fopen(idxname, "r");
-    if (!fp) {
-        dico_log(L_ERR, errno, _("open_index: cannot open `%s'"), idxname);
+
+
+    stream = dico_mapfile_stream_create(idxname, DICO_STREAM_READ);
+    if (!stream) {
+        dico_log(L_ERR, errno,
+                 _("cannot create stream `%s'"), idxname);
         return 1;
     }
+    rc = dico_stream_open(stream);
+    if (rc) {
+        dico_log(L_ERR, 0,
+                 _("cannot open stream `%s': %s"),
+                 idxname, dico_stream_strerror(stream, rc));
+        dico_stream_destroy(&stream);
+    }
 
     list = dico_list_create();
     if (!list) {
@@ -260,17 +269,21 @@ read_index(struct dictdb *db, const char *idxname, int tws)
         dico_iterator_t itr;
         size_t i;
         struct index_entry *ep;
+        char *buf = NULL;
+        size_t bufsize = 0;
+        size_t rdsize;
 
         rc = 0;
 
         i = 0;
-        while (fgets(buf, sizeof(buf), fp)) {
+        while (!dico_stream_getline(stream, &buf, &bufsize, &rdsize)) {
             i++;
             dico_trim_nl(buf);
             rc = parse_index_entry(idxname, i, list, buf, tws);
             if (rc)
                 break;
         }
+        free(buf);
         if (rc) {
             dico_list_set_free_item(list, free_index_entry, NULL);
         } else {
@@ -288,7 +301,8 @@ read_index(struct dictdb *db, const char *idxname, int tws)
         dico_list_destroy(&list);
     }
 
-    fclose(fp);
+    dico_stream_close(stream);
+    dico_stream_destroy(&stream);
     return rc;
 }
 
-- 
1.6.0.3