Re: controlled vocabulary

Zhao, Xin Fri, 25 Aug 2006 12:55:28 -0700

Now I have a second thought about one MeSH term per document. The scoring formula (hits too) is based on the document, right? Does it mean that we shouldn't have more than one document for each object indexed? For example, I try to index a publication; for some of the information, like title and abstract, I would like to store and index them using the default similarity, while for the other information I would like to use a customized similarity. I probably should use a different indexing directory and writer instead of two documents in the same index, right? Thank you for helping me. You can see that I am in the early learning stage now.

xin

----- Original Message -----
From: "Zhao, Xin" <[EMAIL PROTECTED]>

To: <[email protected]>
Sent: Friday, August 25, 2006 10:21 AM
Subject: Re: controlled vocabulary

Hi,
Thank you for your reply. I had thought about the first two solutions before. If we apply one doc for each MeSH term, it would be 26 docs for each item digested (we actually need the top 25 MeSH terms generated). Would it be a problem if there are too many documents? If we apply field names like "mesh_1", "mesh_2"..., then when it comes to search, we will have to loop over every one of those fields for each of the query terms (there will be more than 20-30 terms on average, since we are using the semantic web to implement concept search). Do you think it would hurt performance badly?
Regards,
Xin
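
To make the size of that loop concrete, here is a minimal sketch in plain Java that expands each query term across the numbered fields. The class and method names are hypothetical, and the quoted `field:"term"` form is only assumed to follow Lucene's query parser syntax; whether it matches untokenized keyword fields depends on the analyzer in use, and terms containing quotes are not escaped here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: expands query terms over mesh_1..mesh_N.
final class NumberedFieldQuery {

    /** Builds one clause per (term, field) pair, e.g. mesh_1:"heart diseases". */
    static List<String> clauses(int fieldCount, List<String> queryTerms) {
        List<String> out = new ArrayList<>();
        for (String term : queryTerms) {
            for (int i = 1; i <= fieldCount; i++) {
                out.add("mesh_" + i + ":\"" + term.toLowerCase(Locale.ROOT) + "\"");
            }
        }
        return out;
    }

    /** Joins the clauses into a single OR query string. */
    static String toQueryString(List<String> clauses) {
        return String.join(" OR ", clauses);
    }
}
```

With 25 fields and 30 query terms this produces 25 x 30 = 750 clauses per search. That is probably workable, but it is worth comparing against the clause limit of Lucene's BooleanQuery (1024 by default, as far as I know) before relying on it.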
----- Original Message -----
From: "Dedian Guo" <[EMAIL PROTECTED]>
To: <[email protected]>; "Zhao, Xin" <[EMAIL PROTECTED]>
Sent: Thursday, August 24, 2006 4:22 PM
Subject: Re: controlled library
In my solution, you can apply one doc for each MeSH term, or apply different keywords such as "mesh_1" ... "mesh_10" for your top 10 terms... or you can group your MeSH terms as one string and then add it into a field, which requires a simple string parser for the group string when you want to read the terms...

not sure if that works or answers your question...
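
A rough sketch of the third option (grouping the terms into one string) in plain Java. Keeping each term's score next to it is an assumption added here so the per-term scores survive; the class name and the "|" and "^" separators are hypothetical and must be characters that never occur in a MeSH term.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical codec, not part of Lucene: packs term/score pairs into one stored string.
final class MeshFieldCodec {
    private static final String TERM_SEP = "|";   // assumed never to appear in a term
    private static final String SCORE_SEP = "^";  // assumed never to appear in a term

    /** Joins the pairs in map order, e.g. "neoplasms^0.9|heart diseases^0.4". */
    static String pack(Map<String, Float> scoredTerms) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : scoredTerms.entrySet()) {
            if (sb.length() > 0) {
                sb.append(TERM_SEP);
            }
            sb.append(e.getKey().toLowerCase(Locale.ROOT))
              .append(SCORE_SEP)
              .append(e.getValue());
        }
        return sb.toString();
    }

    /** Parses the packed string back into term/score pairs, keeping the original order. */
    static Map<String, Float> unpack(String packed) {
        Map<String, Float> result = new LinkedHashMap<>();
        if (packed.isEmpty()) {
            return result;
        }
        for (String part : packed.split(Pattern.quote(TERM_SEP))) {
            int idx = part.lastIndexOf(SCORE_SEP);
            result.put(part.substring(0, idx), Float.parseFloat(part.substring(idx + 1)));
        }
        return result;
    }
}
```

The packed string would go into a stored field that is only read back, not searched; the terms themselves would still need to be indexed separately to be searchable.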

On 8/24/06, Zhao, Xin <[EMAIL PROTECTED]> wrote:
Hi,
I have a design question. Here is what we try to do for indexing:
We designed an indexing tool to generate standard MeSH terms from medical citations, and then use Lucene to save the terms and citations for future search. The information we need to save is:
a) the exact mesh terms (top 10)
b) the score for each term
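so the code looks like
-----------------------------------
for the top 10 MeSH terms
    // one Keyword (stored, untokenized) field per term, all named "mesh"
    Field myField = Field.Keyword("mesh", mesh.toLowerCase());
    myField.setBoost(score);  // per-term score used as the field boost
    doc.add(myField);
end for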
------------------------------------
As you can see, we generate all the terms under the field named "mesh". If I understand correctly, all the fields under the same name would eventually be saved into one field, with all the scores normalized into the field boost. In this case, we wouldn't be able to save separate scores, so the information is lost. Am I correct? Is there any way we could change it? I understand Lucene is for keyword search, and what we are trying to do is controlled vocabulary search. Is there any other tool we could use?

Thank you,
Xin
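
The concern about losing the separate scores can be illustrated with a small plain-Java sketch. It assumes, based on my reading of the Lucene javadoc for Field.setBoost, that the boosts of all same-named fields in a document are multiplied together, then multiplied by the document boost and a length norm (1/sqrt(number of terms) in DefaultSimilarity) into a single stored value. The class and method names are hypothetical, and the lossy one-byte encoding Lucene applies afterwards is left out.

```java
// Hypothetical illustration, not Lucene code: how per-field boosts collapse into one norm.
final class FieldNormSketch {

    /** Length norm assumed to match DefaultSimilarity: 1 / sqrt(number of terms in the field). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * Combined norm for one field name: document boost times the product of
     * every same-named field boost times the length norm. Each keyword field
     * is assumed to contribute exactly one term.
     */
    static float combinedNorm(float docBoost, float[] fieldBoosts) {
        float product = docBoost;
        for (float boost : fieldBoosts) {
            product *= boost;
        }
        return product * lengthNorm(fieldBoosts.length);
    }
}
```

Under these assumptions, the boost pairs {2.0, 0.5} and {1.0, 1.0} yield the same combined value, so the individual term scores cannot be recovered from the index afterwards.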
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


