Hello Teijo,

On 03/24/2017 01:24 AM, Teijo wrote:
> Hello,
> If I search for a given word with search.cgi, I get the correct number of
> occurrences. But if I do it with SQL (no matter whether in mysql or sqlite3),
> I get an extra occurrence. For example, if a given word is in a given original
> file twice, SQL reports three occurrences. The SQL query is almost
> the same one found in Mnogosearch's manual, except that I am using only
> one word:
> SELECT url.url, count(*) AS RANK FROM dict, url WHERE
> url.rec_id=dict.url_id AND dict.word IN ('word') GROUP BY url.url ORDER
> BY rank DESC;
> I'd like to know (by SQL query) position of word in the original file
> (to use filepos function). There is at least coord column in dict table.
> Coord contains section id and word's position in relationship to
> section, if I have understood correctly. How to extract the relative
> position from coord, or is the position information elsewhere in
> database? If I disabled all sections, would coord actually contain the
> absolute position?
> I'm using "single mode" as to database.

Coord is a 32-bit number.

- The highest 8 bits are the section ID (e.g. title or body,
   according to the Section commands in indexer.conf)

- The lowest 24 bits are position inside this section.

- The last hit inside each combination (url_id,word,secno) is the
section length (i.e. the total number of words in this section
of this document), not an actual word hit.
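
The bit layout above can be sketched in Python. This is only an illustration of the 8/24-bit split described in the list; the names decode_coord and encode_coord are hypothetical helpers, not part of mnoGoSearch.

```python
SECTION_SHIFT = 24        # the section ID occupies the highest 8 bits
POSITION_MASK = 0xFFFFFF  # the position occupies the lowest 24 bits


def decode_coord(coord):
    """Split a 32-bit coord value into (secno, pos)."""
    secno = (coord >> SECTION_SHIFT) & 0xFF
    pos = coord & POSITION_MASK
    return secno, pos


def encode_coord(secno, pos):
    """Inverse of decode_coord, handy for building test values."""
    return ((secno & 0xFF) << SECTION_SHIFT) | (pos & POSITION_MASK)


if __name__ == "__main__":
    # Section 1, position 14, as in the second row of the sample output.
    print(decode_coord(encode_coord(1, 14)))
```

The masks do the same job as coord>>24 and coord&0xFFFFFF in the SQL query.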

This MySQL query returns the information in a readable form:

SELECT url_id,word,coord>>24 AS secno,coord&0xFFFFFF AS pos FROM dict
WHERE word='mnogosearch' ORDER BY secno,pos;

| url_id | word        | secno | pos |
|      1 | mnogosearch |     1 |   1 |
|      1 | mnogosearch |     1 |  14 |
|      1 | mnogosearch |     1 |  28 |
|      1 | mnogosearch |     1 |  42 |
|      1 | mnogosearch |     1 |  76 |
|      1 | mnogosearch |     1 |  77 |
|      1 | mnogosearch |     1 |  85 |
|      1 | mnogosearch |     1 | 105 | <- section 1 length
|      1 | mnogosearch |     2 |   1 |
|      1 | mnogosearch |     2 |   6 | <- section 2 length
|      1 | mnogosearch |     3 |  54 |
|      1 | mnogosearch |     3 |  69 | <- section 3 length
|      1 | mnogosearch |     4 |   1 |
|      1 | mnogosearch |     4 |  11 | <- section 4 length
|      1 | mnogosearch |     8 |   2 |
|      1 | mnogosearch |     8 |   4 | <- section 8 length

Lines that are not marked as "section X length" are actual word hits.
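
This is also the likely reason a plain count(*) reports one occurrence too many per section. As a minimal sketch, the section-length rows can be separated from the actual hits in Python. The function name split_hits is hypothetical, and the code assumes that the section-length row is the one with the largest position within each (url_id,word,secno) group, as in the output above.

```python
from itertools import groupby


def split_hits(rows):
    """Separate actual word hits from section-length markers.

    rows: iterable of (url_id, word, coord) tuples from the dict table.
    Returns (hits, lengths), where hits is a list of
    (url_id, word, secno, pos) and lengths maps
    (url_id, word, secno) to the section length.
    """
    # Decode coord and sort so that each group ends with its largest position.
    decoded = sorted((u, w, c >> 24, c & 0xFFFFFF) for u, w, c in rows)
    hits, lengths = [], {}
    for key, group in groupby(decoded, key=lambda r: r[:3]):
        group = list(group)
        # Assumption: the last row of each group is the section length.
        lengths[key] = group[-1][3]
        hits.extend(group[:-1])
    return hits, lengths


if __name__ == "__main__":
    # Sections 2 and 8 from the sample output, re-encoded as coord values.
    sample = [
        (1, "mnogosearch", (2 << 24) | 1),
        (1, "mnogosearch", (2 << 24) | 6),
        (1, "mnogosearch", (8 << 24) | 2),
        (1, "mnogosearch", (8 << 24) | 4),
    ]
    hits, lengths = split_hits(sample)
    print(len(hits), lengths)
```

Under the same assumption, the number of actual hits for a word in a document is the row count minus the number of distinct sections in which the word appears.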

> Best regards,
> Teijo
> _______________________________________________
> General mailing list
> General@mnogosearch.org
> http://lists.mnogosearch.org/listinfo/general