Question about proximity searching and wildcards

Mariella Di Giacomo Tue, 18 Jan 2005 14:02:16 -0800

Hello,

We are using Lucene to index scientific articles.
We are also using Luke to verify the fields and values we index.

One of the fields we index is the author field, which lists the authors who wrote the scientific article (examples of such data are shown at the bottom of this email).

The most common search on the author field is the following:

"find all the authors whose last name starts with Cole and the first name starts with S"

We thought of using a proximity search (we want to make sure we match the first name and not the middle name/initial), similar to the following:

"Author:cole* S*"~1
"Author:cole* AND Author:S*"~1

What we were expecting was all the documents that contain authors whose last name starts with Cole and whose first name starts with S, where those two words are next to each other.
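To make the expected behaviour concrete, here is a minimal sketch in plain Java of the match we have in mind. It does not use Lucene at all: the class name AuthorProximitySketch, the tokenize and matches helpers, and the simple split-on-punctuation tokenizer are assumptions made purely for illustration, and the value "Coleman, T.S." is a hypothetical counter-example, not data from our index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: describes the match we expect, not how Lucene evaluates a query.
final class AuthorProximitySketch {

    // Split an Authors value into lower-cased word tokens,
    // e.g. "Coleman, S.S." becomes [coleman, s, s].
    // Hypothetical tokenizer; a real Lucene analyzer may behave differently.
    static List<String> tokenize(String authors) {
        List<String> tokens = new ArrayList<>();
        for (String t : authors.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True when a token starting with lastPrefix is immediately followed
    // by a token starting with firstPrefix (ordered, adjacent positions).
    static boolean matches(String authors, String lastPrefix, String firstPrefix) {
        List<String> tokens = tokenize(authors);
        String last = lastPrefix.toLowerCase(Locale.ROOT);
        String first = firstPrefix.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).startsWith(last) && tokens.get(i + 1).startsWith(first)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Value taken from example 3 at the bottom of this email.
        System.out.println(matches("Coleman, S.S.a", "cole", "s")); // true
        // Hypothetical value: S is only the middle initial, so no match is wanted.
        System.out.println(matches("Coleman, T.S.", "cole", "s"));  // false
    }
}
```

The second call shows why we want adjacency and not just both terms in the same field: an S that appears only as the middle initial should not produce a hit.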

Unfortunately, when we type that search into the Luke "search interface", the wildcard terms are not expanded when the proximity operator (~1) is used at the same time.

So my questions are:

1) Is Luke unable to deal with that?
2) Is the query not structured properly to get what we expect? If so, which would be the correct one?

If Luke cannot deal with that, which query should we provide when writing it through the Java application to get what we expect? Do we need to use a query filter?


Thanks a lot in advance for your help,


Mariella


_____________________________________________________________________________________________
E.g.

Below are three examples of the scientific article records we index, with particular attention to the information in the Authors field.

1)
The Authors field consists of two authors

Title: Using Document Dimensions for Enhanced Information Retrieval
Authors: Jayasooriya, Thimala([EMAIL PROTECTED]); Manandhar, Suresha([EMAIL PROTECTED])
Affiliations: a. Department of Computer Science, University of York
Abstract (English): Conventional document search techniques are constrained by attempting to match individual keywords or phrases to source documents. Thus, these techniques miss out documents that contain semantically similar terms, thereby achieving a relatively low degree of recall. At the same time, processing capabilities and tools for syntactic and semantic analysis of language have advanced to the point where an index-time linguistic analysis of source documents is both feasible and realistic. In this paper, we introduce document dimensions, a means of classifying or grouping terms discovered in documents. Using an enhanced version of Jakarta Lucene[1], we demonstrate that supplementing keyword analysis with some syntactic and semantic information can indeed enhance the quality of information retrieval results.
Publisher: Springer-Verlag
Publication Type: Original Paper
ISSN: 0302-9743
ISBN: 3-540-23659-7
Book DOI: 10.1007/b101591

2)
The Authors field consists of six authors

Title: Multilingual Retrieval Experiments with MIMOR at the University of Hildesheim
Authors: Hackl, Renéa; Kölle, Ralpha; Mandl, Thomasa([EMAIL PROTECTED]); Ploedt, Alexandraa; Scheufen, Jan-Hendrika; Womser-Hacker, Christaa
Affiliations: a. University of Hildesheim, Information Science, Marienburger Platz 22, D-31141 Hildesheim
Abstract (English): Fusion and optimization based relevance judgements have proven to be successful strategies in information retrieval. In this years CLEF campaign we applied these strategies to multilingual retrieval with four languages. Our fusion experiments were carried out using freely available software. We used the snowball stemmers, internet translation services and the text retrieval tools in Lucene and the new MySQL.
Publisher: Springer-Verlag
Publication Type: Original Paper
ISSN: 0302-9743
ISBN: 3-540-24017-9
Book DOI: 10.1007/b102261

3)

The Authors field consists of one author, and only the first and middle initials are provided

Title: Letter to the editor
Author: Coleman, S.S.a
Affiliations: a. Department of Orthopaedics, The University of Utah School of Medicine, 50 North Medical Drive, Salt Lake City, UT 84132, USA US
Abstract: No Abstract
Publisher: Springer-Verlag
Item Identifier: 10.1007/s002640000113
Publication Type: Article
ISSN: 0341-2695
