Hi,

I would like to use David Spencer's NGramSpeller for N fields of an index. With this algorithm, one field means one NGramSpeller index, so if I have N fields I must create N NGramSpeller indexes. OK, why not... But in fact the structure of a document for a 5-gram speller (for example) is:

"word" "transposition" "3gram" "4gram" "5gram"
+ the field "freq", for the popularity of the word in the field being processed
+ the document is boosted at indexing time
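To make the structure above concrete, here is a minimal sketch in plain Java (no Lucene calls) of the fields that can be computed from the word alone. The class and method names are hypothetical, and treating "transposition" as the list of adjacent-character swaps is an assumption on my side, not something taken from the NGramSpeller source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only: not part of NGramSpeller or Lucene.
final class NGramFieldsSketch {

    /** Returns every contiguous substring of length n, left to right. */
    static List<String> ngrams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    /** Assumption: "transposition" holds the word with each adjacent pair of characters swapped. */
    static List<String> transpositions(String word) {
        List<String> out = new ArrayList<>();
        char[] chars = word.toCharArray();
        for (int i = 0; i + 1 < chars.length; i++) {
            char[] swapped = chars.clone();
            char tmp = swapped[i];
            swapped[i] = swapped[i + 1];
            swapped[i + 1] = tmp;
            out.add(new String(swapped));
        }
        return out;
    }

    /** Builds the fields that depend only on the word ("word" through "<maxGram>gram"). */
    static Map<String, List<String>> wordOnlyFields(String word, int minGram, int maxGram) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("word", List.of(word));
        fields.put("transposition", transpositions(word));
        for (int n = minGram; n <= maxGram; n++) {
            fields.put(n + "gram", ngrams(word, n));
        }
        return fields;
    }
}
```

Calling `NGramFieldsSketch.wordOnlyFields("lucene", 3, 5)` gives the same result whichever source field the word came from. Only "freq" and the boost change from one field to another.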
As we can see, from "word" to "5gram" (5 of the 6 fields) the data depend only on the word, not on the data of the index being processed. So for N fields, I have the same information from "word" to "5gram" N times, in N indexes. This is not really optimized for N fields.

--- First method ---

I would like to change the field "freq" to a field named "freq_nameofField". The structure of a document for the field "field1" could be:

"word" .. "5gram"
"freq_field1" (freq for the field "field1")

So I have:
- N documents for one word (each document has a freq field for one specific field)
- but only one index.

The structure of my index would be:

"word" .. "5gram" "freq_field1" "freq_field2" ... "freq_fieldn"

--- Second method ---

But in the first method, 5/6 of the information in a document is redundant and not useful (from "word" to "5gram"), so I would like to create only one document per word, with this structure:

"word" .. "5gram"
"freq_field1" (freq for the field "field1")
"freq_field2" (freq for the field "field2")
"freq_field3" (freq for the field "field3")

But the problem is the boosting of the document: the boost value depends on the freq, and I have several different freq values to process for the same document.

Does anyone have an idea for avoiding redundant information in the NGramSpeller index for N fields?
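To illustrate the second method, here is a rough sketch in plain Java (again no Lucene calls) that merges per-field frequencies into one record per word, with one "freq_<field>" entry per source field. The class name, the `merge` method and the input shape (field name, then word, then frequency) are hypothetical choices for the example. It only shows the document layout and leaves the boost question open.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only: not part of NGramSpeller or Lucene.
final class MergedSpellDocSketch {

    /**
     * Input:  field name -> (word -> frequency of the word in that field).
     * Output: word -> ("freq_" + field name -> frequency), i.e. one record per word.
     * A word that does not occur in a field simply gets no entry for that field.
     */
    static Map<String, Map<String, Integer>> merge(Map<String, Map<String, Integer>> freqByField) {
        Map<String, Map<String, Integer>> docs = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> field : freqByField.entrySet()) {
            for (Map.Entry<String, Integer> wordFreq : field.getValue().entrySet()) {
                docs.computeIfAbsent(wordFreq.getKey(), w -> new TreeMap<>())
                    .put("freq_" + field.getKey(), wordFreq.getValue());
            }
        }
        return docs;
    }
}
```

With "lucene" occurring 12 times in field1 and 4 times in field2, the merged record for "lucene" carries both `freq_field1=12` and `freq_field2=4`, so the "word" to "5gram" part would only need to be stored once. A single document-level boost still cannot reflect both values at the same time.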