Hi Mac,

Wow, when you dive in, you dive in deep! By the way, I have
not read the book, but I have looked at the FAQ.

1) It seems as though the Lucene search engine (LSE) deals with
indexes, rather than the email messages. Is that correct?

Ahead of time, Lucene reads email messages to produce an index. For
example, it took a few minutes to read the 12 thousand or so messages
in the Sundial list and create an index. After that, you are totally
correct. When someone searches for "Mac Oglesby" Lucene just consults
the index. Essentially all search software uses this technique because
it is fast.
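
If it helps to picture it, here is a toy sketch in Python of the
index-then-search idea. This is not Lucene's actual code or data
structure, just a minimal illustration. The function names and the
two sample messages are made up for the example.

```python
from collections import defaultdict

def build_index(messages):
    """Read every message once and map each word to the messages containing it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for word in text.lower().split():
            index[word].add(msg_id)
    return index

def search(index, query):
    """Answer a query from the index alone, without rereading any message."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical messages, keyed by a made-up filename.
messages = {
    "msg00001.html": "Mac Oglesby asked about sundial gnomons",
    "msg00002.html": "A reply about gnomons and shadows",
}
index = build_index(messages)          # the slow step, done ahead of time
hits = search(index, "Mac Oglesby")    # the fast step, done per search
```

Building the index touches every message, which is why it takes a
few minutes for a large list. Each search afterwards is only a few
set lookups.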

2) LSE searches fields, and the default field is text. Correct?

Yes, LSE searches fields. We've set things up so the fields are
"subject", "date", "from", and "message". (We put the message body in
"message".) Additionally, there is the "all" field, which is a
concatenation of all of the above. We've programmed the "all" field to
be searched by default.
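
A rough sketch of that arrangement, again in plain Python rather than
real Lucene code. The helper names, the sample message, and the date
format are assumptions made for the illustration; only the field names
come from our setup.

```python
def make_document(subject, date, sender, body):
    """Build one record with the four named fields plus the combined "all" field."""
    doc = {"subject": subject, "date": date, "from": sender, "message": body}
    # "all" is simply the other fields joined together.
    doc["all"] = " ".join([subject, date, sender, body])
    return doc

def matches(doc, term, field="all"):
    """Check one word against one field; "all" is used when no field is given."""
    return term.lower() in doc[field].lower().split()

# Hypothetical message; the date format here is an assumption.
doc = make_document("Noon mark", "20050312", "Mac Oglesby", "Has anyone built one")

found_by_default = matches(doc, "oglesby")                 # searches "all"
found_in_subject = matches(doc, "oglesby", field="subject")
```

A plain search hits the sender's name because "all" contains every
field, while the same word restricted to "subject" finds nothing.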

Finally, there is another field where we store the message filename
(e.g. "msg00243.html"). I don't want to tell people the name of this
field since it is not useful for searching and will likely change.
We do need the filename, though, to create links on the results
page. If this is confusing, just pretend I never wrote this
paragraph. :)

3) Thinking of the Sundial archive, how do I find out what fields are
available? "date" seems to be a field, as does "title" and "text."
What other fields are there?

The fields used by The Mail Archive are listed in the FAQ. My
impression is only a few people will ever want to get this
advanced with their queries.

http://www.mail-archive.com/faq.html#search

Is it possible for me to actually look at some typical indexes?

Sure. Here's the current sundial index, and a tool called "luke" to
inspect it. The tool is oriented towards programmers, but is still
kind of neat. Shows some of what is going on behind the scenes.

http://www.mail-archive.com/sundial_index.zip
http://www.getopt.org/luke/

4) One FAQ dealt with searching within results:
[...]
Can you give me a couple of simple examples of how [BooleanQuery]
might work?

Well, the FAQ is basically saying "hey, programmer! Do you want to
make a user interface like Eudora's search-within-a-search? If so,
here's the Lucene command you should be using." However, Jeff and I
haven't made such a user interface and we aren't using that command.
So this is basically out of your hands. I suppose you can lobby us to
change the user interface, which may or may not be successful.

However! You can get a functional equivalent to "search within a
search" just by making longer and longer query strings. For example:

Mac
Mac AND ebay
Mac AND ebay AND date:2005*
Mac AND ebay AND date:2005* AND sun

By the way, just like the global search engines, we've programmed AND
to be implicit. So the following two queries are equivalent.

Mac AND ebay AND date:2005* AND sun
Mac ebay date:2005* sun
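
Here is a small Python sketch of why longer queries act like a search
within a search, and why the explicit AND is optional. It is a toy
matcher, not Lucene's query parser. The three sample records, their
date format, and all function names are invented for the example.

```python
def parse_terms(query):
    """Split a query into terms; an explicit AND adds nothing, so drop it."""
    return [tok for tok in query.split() if tok != "AND"]

def term_matches(doc, term):
    """Match one term, with optional field: prefix and trailing * wildcard."""
    field, sep, value = term.partition(":")
    if not sep:                      # no field given, so search "all"
        field, value = "all", term
    words = doc.get(field, "").lower().split()
    value = value.lower()
    if value.endswith("*"):
        return any(w.startswith(value[:-1]) for w in words)
    return value in words

def search(docs, query):
    """Keep only the records that match every term."""
    terms = parse_terms(query)
    return [name for name, doc in docs.items()
            if all(term_matches(doc, t) for t in terms)]

# Hypothetical records; the date format is an assumption.
docs = {
    "a": {"all": "mac ebay sun 20050312", "date": "20050312"},
    "b": {"all": "mac ebay 20040101", "date": "20040101"},
    "c": {"all": "mac sundial 20050601", "date": "20050601"},
}

step1 = search(docs, "Mac")
step2 = search(docs, "Mac AND ebay")
step3 = search(docs, "Mac AND ebay AND date:2005*")
explicit = search(docs, "Mac AND ebay AND date:2005* AND sun")
implicit = search(docs, "Mac ebay date:2005* sun")
```

Every added term can only remove records from the previous result,
which is the same effect as searching within the earlier results.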

_______________________________________________
Discussion list for The Mail Archive
Gossip@jab.org
http://jab.org/cgi-bin/mailman/listinfo/gossip
