NGramSpeller for N fields

Nicolas Maisonneuve Fri, 08 Oct 2004 08:08:49 -0700

Hi,
I would like to use David Spencer's NGramSpeller for N fields in an index.
With this algorithm, 1 field = 1 NGramSpeller index.
So if I have N fields, I must create N NGramSpeller indexes. OK, why not... but in fact
the structure of a document for a 5-gram speller (for example) is:
"word"
"transposition"
"3gram"
"4gram"
"5gram"
+ the field "freq": the popularity of the word in the field being processed
+ the document is boosted at indexing time (the boost depends on freq)
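To make the per-word layout concrete, here is a minimal sketch in plain Java (no Lucene classes) of the fields one document would carry in a single-field 5-gram spell index. It is an illustration under assumptions, not Spencer's actual code: the helper names (`grams`, `transpositions`, `buildDoc`, `boost`) are hypothetical, "transposition" is assumed to hold the word with each pair of adjacent letters swapped, and the log-based boost formula is invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpellDocSketch {

    // All n-grams of length n in the word (empty if the word is shorter than n).
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // Assumption: "transposition" = the word with each adjacent pair of letters swapped.
    static List<String> transpositions(String word) {
        List<String> out = new ArrayList<>();
        char[] chars = word.toCharArray();
        for (int i = 0; i + 1 < chars.length; i++) {
            char[] t = chars.clone();
            char tmp = t[i];
            t[i] = t[i + 1];
            t[i + 1] = tmp;
            out.add(new String(t));
        }
        return out;
    }

    // Field layout of one document (one word) in a single-field spell index.
    // Everything except "freq" depends only on the word itself.
    static Map<String, List<String>> buildDoc(String word, int freq, int maxGram) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("word", List.of(word));
        doc.put("transposition", transpositions(word));
        for (int n = 3; n <= maxGram; n++) {
            doc.put(n + "gram", grams(word, n));
        }
        doc.put("freq", List.of(Integer.toString(freq)));
        return doc;
    }

    // Hypothetical document boost: grows with the log of the word's frequency.
    static float boost(int freq) {
        return (float) (1.0 + Math.log(1 + freq));
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = buildDoc("lucene", 42, 5);
        doc.forEach((field, values) -> System.out.println(field + " = " + values));
        System.out.println("boost = " + boost(42));
    }
}
```

The point the sketch makes visible: only "freq" (and the boost derived from it) is tied to the source field; the other five fields are a pure function of the word.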


As we can see, the fields from "word" to "5gram" (5 of the 6 fields) depend only on the word,
not on the data of the index being processed. So, for N fields, the same information from
the "word" field to the "5gram" field is stored N times across N indexes. This is not really
optimal for N fields.

--- First method ---
In fact, I would like to rename the field "freq" to "freq_nameofField". The
structure of a document for the field "field1" could be:
"word"
..
"5gram"
"freq_field1": freq for the field "field1"

So I have:
- N documents per word (each document has a freq field for one specific field)
- but only 1 index. The structure of my index will be:
"word"
..
"5gram"
"freq_field1" 
"freq_field2" 
...
"freq_fieldn"
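A minimal sketch of this first method in plain Java, assuming a hypothetical in-memory input that maps each source field to its word frequencies. One document is produced per (field, word) pair, all destined for a single index, and each carries only its own "freq_<field>" entry. The names `buildDocs` and `sample` and the sample data are illustrative only, and the "transposition" field is left out for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerFieldDocsSketch {

    // All n-grams of length n in the word.
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // First method: one document per (field, word) pair, all in one index.
    // The word/n-gram fields are repeated in every document for the same word;
    // only the "freq_<field>" entry differs between them.
    static List<Map<String, List<String>>> buildDocs(
            Map<String, Map<String, Integer>> freqByField, int maxGram) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> field : freqByField.entrySet()) {
            for (Map.Entry<String, Integer> w : field.getValue().entrySet()) {
                Map<String, List<String>> doc = new LinkedHashMap<>();
                doc.put("word", List.of(w.getKey()));
                for (int n = 3; n <= maxGram; n++) {
                    doc.put(n + "gram", grams(w.getKey(), n));
                }
                doc.put("freq_" + field.getKey(), List.of(String.valueOf(w.getValue())));
                docs.add(doc);
            }
        }
        return docs;
    }

    // Hypothetical sample data: field name -> (word -> frequency in that field).
    static Map<String, Map<String, Integer>> sample() {
        Map<String, Map<String, Integer>> freqByField = new LinkedHashMap<>();
        freqByField.put("field1", Map.of("lucene", 42, "index", 7));
        freqByField.put("field2", Map.of("lucene", 3));
        return freqByField;
    }

    public static void main(String[] args) {
        for (Map<String, List<String>> doc : buildDocs(sample(), 5)) {
            System.out.println(doc);
        }
    }
}
```

With the sample data, "lucene" yields two documents whose n-gram fields are identical, which is exactly the redundancy the post goes on to question.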


--- Second method ---
But in the first method, 5/6 of the information in each document (the fields from "word"
to "5gram") is redundant and useless, so I would like to create only 1 document per word,
with this structure:
"word"
..
"5gram"
"freq_field1": freq for the field "field1"
"freq_field2": freq for the field "field2"
"freq_field3": freq for the field "field3"
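The second method can be sketched the same way, again in plain Java with hypothetical names and data: the word/n-gram fields are built once per word, and every source field that contains the word adds its own "freq_<field>" entry to that single document. As before, "transposition" is omitted for brevity; this is an illustration, not the NGramSpeller implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergedDocSketch {

    // All n-grams of length n in the word.
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // Second method: exactly one document per word. The word/n-gram fields are
    // written once; each field that contains the word adds "freq_<field>".
    static Map<String, Map<String, List<String>>> buildMerged(
            Map<String, Map<String, Integer>> freqByField, int maxGram) {
        Map<String, Map<String, List<String>>> docsByWord = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> field : freqByField.entrySet()) {
            for (Map.Entry<String, Integer> w : field.getValue().entrySet()) {
                Map<String, List<String>> doc = docsByWord.computeIfAbsent(w.getKey(), word -> {
                    Map<String, List<String>> d = new LinkedHashMap<>();
                    d.put("word", List.of(word));
                    for (int n = 3; n <= maxGram; n++) {
                        d.put(n + "gram", grams(word, n));
                    }
                    return d;
                });
                doc.put("freq_" + field.getKey(), List.of(String.valueOf(w.getValue())));
            }
        }
        return docsByWord;
    }

    // Hypothetical sample data: field name -> (word -> frequency in that field).
    static Map<String, Map<String, Integer>> sample() {
        Map<String, Map<String, Integer>> freqByField = new LinkedHashMap<>();
        freqByField.put("field1", Map.of("lucene", 42, "index", 7));
        freqByField.put("field2", Map.of("lucene", 3));
        return freqByField;
    }

    public static void main(String[] args) {
        buildMerged(sample(), 5).forEach((word, doc) -> System.out.println(word + " -> " + doc));
    }
}
```

Here "lucene" ends up as one document carrying both "freq_field1" and "freq_field2", while "index" carries only "freq_field1".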

But the problem is the boosting of the document: the boost value depends on freq,
and I have a different freq for each field to be processed.
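A small illustration of the conflict: a document boost is a single number per document, but with the merged layout the ordering it should encode depends on which field the suggestion is for. The sketch below is one possible workaround, offered only as an illustration and not something proposed in the original post: skip the index-time boost and instead rank the candidate words returned by the n-gram search using the stored "freq_<field>" value for the target field at query time. It is plain Java; `rankForField`, `sample`, and the data are hypothetical, and it assumes the freq values are stored so they can be read back for each candidate.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class QueryTimeFreqSketch {

    // Rank candidate words for one target field by their stored "freq_<field>",
    // dropping words that never occur in that field (no freq entry).
    static List<String> rankForField(Map<String, Map<String, Integer>> candidates, String field) {
        String key = "freq_" + field;
        return candidates.entrySet().stream()
                .filter(e -> e.getValue().getOrDefault(key, 0) > 0)
                .sorted((a, b) -> Integer.compare(b.getValue().get(key), a.getValue().get(key)))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Hypothetical candidates from an n-gram search: word -> its freq_<field> values.
    static Map<String, Map<String, Integer>> sample() {
        Map<String, Map<String, Integer>> c = new LinkedHashMap<>();
        c.put("lucene", Map.of("freq_field1", 42, "freq_field2", 3));
        c.put("lucent", Map.of("freq_field1", 1, "freq_field2", 10));
        c.put("lucerne", Map.of("freq_field2", 5));
        return c;
    }

    public static void main(String[] args) {
        System.out.println("field1: " + rankForField(sample(), "field1"));
        System.out.println("field2: " + rankForField(sample(), "field2"));
    }
}
```

With this sample data, ranking for field1 gives [lucene, lucent] while field2 gives [lucent, lucerne, lucene], so no single per-document boost could produce the right order for both fields. The trade-off is an extra sort over the candidate list at query time instead of relying on the index-time boost.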

Does anyone have an idea how to avoid redundant information in the NGramSpeller index for N fields?
