Re: controlled vocabulary

Rupinder Singh Mazara Fri, 25 Aug 2006 08:27:43 -0700

Hi Xin

Then perhaps you can change it to Field.Index.TOKENIZED, but I was not aware that PubMed boosts MeSH terms. They broadly classify terms as major and minor. If you plan to use this simple system of classification, consider adding the major terms twice to the document?
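
The "add the major terms twice" idea can be sketched in plain Java, without any Lucene dependency. This is only an illustration: the class and method names are hypothetical, and it assumes each returned value would afterwards be added as its own "mesh" field, so that a major term occurs twice in the document and gets a higher term frequency than a minor one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): builds the list of values to add
// under the "mesh" field so that major terms appear twice and minor terms once.
class MeshFieldValues {

    static List<String> build(List<String> majorTerms, List<String> minorTerms) {
        List<String> values = new ArrayList<>();
        for (String term : majorTerms) {
            values.add(term); // first occurrence
            values.add(term); // second occurrence raises the term's frequency
        }
        values.addAll(minorTerms); // minor terms are added once
        return values;
    }
}
```

This gives only two weight levels (major and minor), so it does not preserve a separate numeric score per term.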


Zhao, Xin wrote:

Hi, Rupinder,
My understanding is that Field.Index.NO_NORMS disables index-time boosting and field length normalization at the same time. But I do need index-time boosting to store the score of each MeSH term. Have I missed anything?
Thank you very much for your help,
Xin
----- Original Message -----
From: "Rupinder Singh Mazara" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, August 25, 2006 10:49 AM
Subject: Re: controlled vocabulary
hi Xin
Take a look at this: you can add multiple fields with the name "mesh".
for (int i = 0; i < meshList.size(); i++) {
    meshTerm = meshList.get(i);
    // one "mesh" field per term: stored, indexed without analysis and without norms
    document.add(new Field("mesh", meshTerm.semanticWebConceptId, Field.Store.YES, Field.Index.NO_NORMS));
}
When querying this index, create an analyzer that interprets the text string and generates IDs that correspond to the MeSH terms in the semantic web.
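
One way to picture the query-side step is a dictionary lookup from phrases to concept IDs. The sketch below is plain Java and deliberately not a Lucene Analyzer; the class name, the phrase dictionary, and the crude substring matching are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch (not a Lucene Analyzer): maps free query text to the
// concept IDs of every known phrase it contains.
class ConceptIdLookup {

    // phraseToId holds lower-case phrases mapped to concept IDs;
    // both the phrases and the IDs are supplied by the caller.
    static List<String> conceptIds(String queryText, Map<String, String> phraseToId) {
        String normalized = queryText.toLowerCase(Locale.ROOT);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : phraseToId.entrySet()) {
            // Naive substring match; a real system would tokenize first.
            if (normalized.contains(entry.getKey())) {
                ids.add(entry.getValue());
            }
        }
        return ids;
    }
}
```

The returned IDs would then be searched against the "mesh" field, which holds the same IDs at index time.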
Zhao, Xin wrote:
Hi,
Thank you for your reply. I had thought about the first two solutions before. If we apply one doc for each MeSH term, it would be 26 docs for each item digested (we actually need the top 25 MeSH terms generated). Would it be a problem if there are too many documents? If we apply field names like "mesh_1", "mesh_2"..., then when it comes to search we will have to loop over the fields for every single one of the query terms (there will be more than 20-30 terms on average, since we are using the semantic web to implement concept search). Do you think it would affect performance very badly?
Regards,
Xin
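
To put a rough number on the looping concern, here is a small plain-Java sketch that enumerates one clause per (field, term) pair. The class name is hypothetical and no Lucene API is called; the field:"term" text merely imitates query-parser syntax. With 25 numbered fields and 30 query terms it produces 25 * 30 = 750 clauses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerates the field:term clauses a query would need
// if each MeSH term were indexed under its own numbered field.
class NumberedFieldClauses {

    static List<String> build(int fieldCount, List<String> queryTerms) {
        List<String> clauses = new ArrayList<>();
        for (String term : queryTerms) {
            for (int i = 1; i <= fieldCount; i++) {
                // One clause per (field, term) pair, e.g. mesh_3:"asthma"
                clauses.add("mesh_" + i + ":\"" + term + "\"");
            }
        }
        return clauses;
    }
}
```

The clause count grows as fields times terms, whereas a single shared "mesh" field needs only one clause per query term.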


----- Original Message -----
From: "Dedian Guo" <[EMAIL PROTECTED]>
To: <[email protected]>; "Zhao, Xin" <[EMAIL PROTECTED]>
Sent: Thursday, August 24, 2006 4:22 PM
Subject: Re: controlled library
In my solution, you can apply one doc for each MeSH term, or apply different keywords such as "mesh_1" ... "mesh_10" for your top 10 terms, or you can group your MeSH terms into one string and add it to a field, which requires a simple string parser for the grouped string when you want to read the terms...

not sure if that works or answers your question...
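
A minimal sketch of the third option (one grouped string plus a simple parser), in plain Java with no Lucene dependency. The class name, the ';' and '=' delimiters, and the idea of storing the score next to each term are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: packs MeSH terms and their scores into one string
// suitable for a single stored field, and parses that string back.
// Assumes terms never contain ';' or '='.
class MeshGroupCodec {

    // Join entries as "term=score;term=score;..." in insertion order.
    static String encode(Map<String, Float> scoredTerms) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : scoredTerms.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Split the grouped string back into term -> score pairs.
    static Map<String, Float> decode(String grouped) {
        Map<String, Float> result = new LinkedHashMap<>();
        if (grouped.isEmpty()) {
            return result;
        }
        for (String pair : grouped.split(";")) {
            int eq = pair.lastIndexOf('=');
            result.put(pair.substring(0, eq), Float.parseFloat(pair.substring(eq + 1)));
        }
        return result;
    }
}
```

This only covers storing the scores and reading them back. Making the individual terms searchable would still require indexing them in some other way.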

On 8/24/06, Zhao, Xin <[EMAIL PROTECTED]> wrote:
Hi,
I have a design question. Here is what we try to do for indexing:
We designed an indexing tool to generate standard MeSH terms from medical citations, and then use Lucene to save the terms and citations for future search. The information we need to save is:
a) the exact mesh terms (top 10)
b) the score for each term
so the code looks like
-----------------------------------
for the top 10 MeSH Terms
myField=Field.Keyword("mesh", mesh.toLowerCase());
myField.setBoost(score);
doc.add(myField);
end for
------------------------------------
As you can see, we generate all the terms under one field named "mesh". If I understand correctly, all fields with the same name are eventually saved into one field, with all the scores normalized into a single field boost. In this case we wouldn't be able to save a separate score for each term, so the information is lost. Am I correct? Is there any way we could change this? I understand Lucene is for keyword search, and what we are trying to do is controlled vocabulary search. Is there any other tool we could use?

Thank you,
Xin
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
