Hi Donal,

A few days ago we released SIREn [1], a Lucene plugin for indexing and querying semi-structured data. Your use case seems to be a very good match for what SIREn can do.

SIREn enables the indexing of semi-structured data into a Lucene field, and offers additional query components for building semi-structured queries programmatically. SIREn currently indexes tabular data, i.e. data composed of rows and columns.

For example, for your use case, you can create a SIREn field that contains the following table:

   Course.name   Attendance.mandatory
   ----------------------------------
   cooking       N
   art           Y

Course.name is the first column of the SIREn table, and Attendance.mandatory is the second. SIREn places no limit on the number of columns, so you can index information beyond course.name and attendance.mandatory, for example course.location or professor.name. Each row (a SIREn tuple) is one of your database entries. For example, the first row is {cooking, N}. Student.name is indexed in a normal Lucene field so that it can be retrieved. To summarize, your Lucene document schema will look like:

Doc {
- name: Bob
- content: {cooking, N}, {art, Y}
}

The 'content' field is created using SIREn, and indexes two tuples: {cooking, N} and {art, Y}. You can then use SIREn's query components to retrieve all documents that match certain tuples, such as {cooking, Y}. In this example, the query returns nothing, since no single tuple contains both cooking and Y.
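
To make the difference concrete, here is a minimal plain-Java sketch of the two matching behaviours. It does not use any SIREn or Lucene classes; the names Row, flatMatch and tupleMatch are made up for illustration only. flatMatch models what happens when the values are flattened into independent fields, and tupleMatch models the row-level matching described above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two matching semantics; this is NOT the SIREn or Lucene API.
final class TupleMatchSketch {

    // One row of the table: {course name, mandatory flag}.
    static final class Row {
        final String courseName;
        final String mandatory;

        Row(String courseName, String mandatory) {
            this.courseName = courseName;
            this.mandatory = mandatory;
        }
    }

    // Flattened fields: each condition may be satisfied by a different row.
    static boolean flatMatch(List<Row> rows, String course, String mandatory) {
        boolean hasCourse = rows.stream().anyMatch(r -> r.courseName.equals(course));
        boolean hasFlag = rows.stream().anyMatch(r -> r.mandatory.equals(mandatory));
        return hasCourse && hasFlag;
    }

    // Tuple matching: both conditions must hold within the same row.
    static boolean tupleMatch(List<Row> rows, String course, String mandatory) {
        return rows.stream().anyMatch(
                r -> r.courseName.equals(course) && r.mandatory.equals(mandatory));
    }

    public static void main(String[] args) {
        // Bob's attendances, as in the table above.
        List<Row> bob = Arrays.asList(new Row("cooking", "N"), new Row("art", "Y"));

        System.out.println("flat:  " + flatMatch(bob, "cooking", "Y"));  // prints "flat:  true"
        System.out.println("tuple: " + tupleMatch(bob, "cooking", "Y")); // prints "tuple: false"
    }
}
```

The flattened check matches Bob because "cooking" and "Y" both occur somewhere in his document, while the tuple check does not, because they never occur in the same row.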

You can have a look at the IMDB indexing and querying example [2], which shows how to index and query tabular data of this kind. If you need help, feel free to ask on our mailing list.

[1] http://siren.sindice.com
[2] https://dev.deri.ie/confluence/display/SIREn/Indexing+and+Searching+Tabular+Data

Best Regards,
--
Renaud Delbru

Donal Murtagh wrote:
Hi,

I'm trying to use Lucene to query a domain that has the following structure

    Student 1-------* Attendance *---------1 Course

The data in the domain is summarised below

    Course.name   Attendance.mandatory   Student.name
    -------------------------------------------------
    cooking       N                      Bob
    art           Y                      Bob

If I execute the query "+courseName:cooking AND +mandatory:Y", it
returns Bob, because Bob is attending the cooking course, and Bob is
also attending a mandatory course. However, what I *really* want to
query for is "students attending a mandatory cooking course", which in
this case would return nobody. Is it possible to formulate this as a
Lucene query?

For the sake of completeness, the domain classes
themselves are shown below. These classes are Grails domain classes,
but I'm using the standard Compass annotations and Lucene query syntax.

Thanks!
- Don

    @Searchable
    class Student {

        @SearchableProperty(accessor = 'property')
        String name

        static hasMany = [attendances: Attendance]

        @SearchableId(accessor = 'property')
        Long id

        @SearchableComponent
        Set<Attendance> getAttendances() {
            return attendances
        }
    }

    @Searchable(root = false)
    class Attendance {

        static belongsTo = [student: Student, course: Course]

        @SearchableProperty(accessor = 'property')
        String mandatory = "Y"

        @SearchableId(accessor = 'property')
        Long id

        @SearchableComponent
        Course getCourse() {
            return course
        }
    }

    @Searchable(root = false)
    class Course {

        @SearchableProperty(accessor = 'property', name = "courseName")
        String name

        @SearchableId(accessor = 'property')
        Long id
    }



