On Wednesday, 07.01.2004, at 20:10, Dror Matalon wrote:
> On Wed, Jan 07, 2004 at 07:58:52PM +0100, Thomas Scheffler wrote:
> > On Wednesday, 07.01.2004, at 19:00, Dror Matalon wrote:
> > > The solution is simple, but you need to think of it conceptually in a
> > > different way. Instead of "all documents with the same DocID are the
> > > same document" think "fetch all the documents where DocID is XYZ."
> > >
> > > Assuming the contents are in a field called contents, you do
> > > +(DocID:XYZ) (contents:foo) (contents:bar)
> >
> > I was already on that path, but think of a search like (foo -bar). With
> > your solution it will result in a hit, because page 345 (to keep my
> > example) contains the word "foo" and no "bar". Of course, with my model,
> > I want the book not to get a hit for that query. You see how hard this
> > is to handle, don't you?
>
> I think I'm starting to understand. So you want to treat several
> documents as one, and if the hit fails for one of the documents, it
> should fail for all the documents with the same id. OK. This begs the
> question: why don't you make all these documents with the same id one
> document, and index them together?
This would be a functional but not nice solution. The "pages" are sent to
my Java class one by one, and I cannot change that because of an
API-related restriction. To index a 1000-page book this way, I would have
to index the first page; when I get the second one, I would need to
re-fetch the first page, bind both together and send them to the
IndexWriter again. I would have to keep track of every single page the
"book" contains, and this procedure would be repeated for every page,
getting uglier as the number of pages increases. Furthermore, my "book"
allows single pages to be deleted or updated. Every time such an atomic
task (adding/deleting) is performed, the index entry for the whole "book"
would have to be rebuilt. The mechanism that turns a "page" into a Lucene
Document is very time consuming, so I want to do that as rarely as
possible. As you see, it would be great if Lucene could somehow treat a
"logical document" (consisting of several Lucene Documents) like a normal
Lucene Document. (A rough sketch of the kind of per-book grouping I have
in mind follows after the quoted text below.)

> > > For that matter, you can use a standard analyzer on the query and
> > > use a boolean to tie it to the specific document set.
> > >
> > > This is how we do searching on a specific channel at fastbuzz.com.
> > >
> > > Dror
> > >
> > > On Wed, Jan 07, 2004 at 05:21:43PM +0100, Thomas Scheffler wrote:
> > > >
> > > > Jamie Stallwood wrote:
> > > > > +(DocID:XYZ DocID:ABC) +(foo bar)
> > > > >
> > > > > will find a document that (MUST have (xyz OR abc)) AND (MUST have
> > > > > (foo OR bar)).
> > > >
> > > > This is just the solution for the example; in the real world I
> > > > don't really have documents containing "foo" or "bar". What I meant
> > > > was: make Lucene think that all Documents with the same DocID are
> > > > ONE Document. Imagine you have a big book, say 1000 pages. Instead
> > > > of putting the whole book into the index, you split it up into
> > > > single pages and index those. Now, if a page changes or is deleted,
> > > > it is faster to update your index than to do it over and over again
> > > > for all 1000 pages. The problem starts when you search the book.
> > > > You search for (foo bar); foo is on page 345 while bar is on page
> > > > 435. You want to get a hit for the book. So I need a solution
> > > > matching this more generic example.
> > > >
> > > > > -----Original Message-----
> > > > > From: Thomas Scheffler [mailto:[EMAIL PROTECTED]
> > > > > Sent: 07 January 2004 11:23
> > > > > To: [EMAIL PROTECTED]
> > > > > Subject: merged search of document
> > > > >
> > > > > Hi,
> > > > >
> > > > > I need a tip for an implementation. I have several documents, all
> > > > > of them with a field named DocID. DocID identifies not a single
> > > > > Lucene Document but a collection of them. When I start a search,
> > > > > it should be handled as if these Lucene Documents were one.
> > > > >
> > > > > Example:
> > > > >
> > > > > Document 1: DocID:XYZ
> > > > > containing: foo
> > > > >
> > > > > Document 2: DocID:XYZ
> > > > > containing: bar
> > > > >
> > > > > Document 3: DocID:ABC
> > > > > containing: foo bar
> > > > >
> > > > > Document 4: DocID:GHJ
> > > > > containing: foo
> > > > >
> > > > > As you already guessed, when I search for "+foo +bar" I want the
> > > > > hits to contain Document 1, Document 2 and Document 3, but not
> > > > > Document 4. Is it clear what I want? How do I implement such a
> > > > > monster? Is that possible with Lucene? The content is not stored
> > > > > within Lucene; it is just tokenized and indexed.
> > > > >
> > > > > Any help?
> > > > > Thanks in advance!
> > > > >
> > > > > Thomas Scheffler
> >
> > --
> > Computer science terms, simply explained
> > =========================================
> > No. 37 -- Fault-tolerant:
> > The program does not allow any user input.
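
To make the "logical document" idea a little more concrete, here is a
minimal sketch of the kind of per-book grouping I mean, done entirely on
the application side with the plain Lucene search API. The field names
"DocID" and "contents" are the ones from the example above; the class name
and the index path are just placeholders, and this is an illustration of
the idea, not the solution I am after. It evaluates (foo -bar) at the book
level by collecting the DocIDs of the pages matching each term and
combining the sets afterwards:

import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

/**
 * Sketch only: evaluate "(foo -bar)" on the level of logical documents
 * ("books") that are stored as many Lucene Documents (one per page),
 * all sharing the same DocID keyword field.
 */
public class LogicalDocumentSearch {   // placeholder class name

    /** Collect the DocIDs of all pages whose contents contain the term. */
    private static Set collectDocIds(IndexSearcher searcher, String term)
            throws IOException {
        Set docIds = new HashSet();
        Hits hits = searcher.search(new TermQuery(new Term("contents", term)));
        for (int i = 0; i < hits.length(); i++) {
            Document page = hits.doc(i);
            docIds.add(page.get("DocID"));
        }
        return docIds;
    }

    /**
     * Book-level meaning of (foo -bar): books that contain "foo" on some
     * page and "bar" on no page at all.
     */
    public static Set searchFooWithoutBar(IndexSearcher searcher)
            throws IOException {
        Set withFoo = collectDocIds(searcher, "foo");
        Set withBar = collectDocIds(searcher, "bar");
        withFoo.removeAll(withBar); // a book is excluded if ANY page has "bar"
        return withFoo;
    }

    public static void main(String[] args) throws IOException {
        // "/path/to/index" is a placeholder for the real index directory.
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Set books = searchFooWithoutBar(searcher);
        for (Iterator it = books.iterator(); it.hasNext();) {
            System.out.println("matching book: " + it.next());
        }
        searcher.close();
    }
}

This works for the toy query, but it ignores scoring and would have to be
reinvented for every boolean combination. Doing that for arbitrary queries
is exactly the part I would like Lucene to handle for me, instead of
rebuilding the whole "book" as one document after every small change.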
--
Computer science terms, simply explained
=========================================
No. 385 -- Integrates into existing structures:
Requires a Microsoft Passport account.
(Henryk Plötz)