On 31 Jan 2007, at 12:25, Christoph Pächter wrote:

I was wondering if there is a table anywhere (similar to Table 1.2, "An overview of different field types, their characteristics, and their usage", in Lucene in Action) listing the possible methods and their usage.

Implementations will differ, for example:


Store |TermVector |Index |reasonable |Usage
YES   |NO         |NO    |1          |URLs, telephone numbers

You never have to store anything in the index; perhaps that information is already persisted somewhere else?

Whether you use a term vector or not depends very little on what kind of information you put in the field; it depends on what kind of analysis you plan to run on the documents. Highlighting? More like this? Neural networks?
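To make the "what analysis do you plan" point concrete, here is a rough plain-Java sketch of what a term vector amounts to (the terms of one field with their frequencies) and one kind of analysis it can feed: a cosine comparison of the sort a "more like this" feature might build on. This is an illustration under my own assumptions, not Lucene's term vector storage or its MoreLikeThis implementation; the class and method names are hypothetical, and the naive tokenizer simply splits on non-letter/digit characters.

```java
import java.util.HashMap;
import java.util.Map;

public class TermVectorSketch {

    // HYPOTHETICAL helper: builds a term -> frequency map for one field value.
    // Lower-cases and splits on runs of non-letter/non-digit characters,
    // a rough stand-in for an analyzer plus a stored term vector.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> tv = new HashMap<>();
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                tv.merge(term, 1, Integer::sum);
            }
        }
        return tv;
    }

    // Cosine similarity between two term vectors; returns 0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int v = e.getValue();
            na += (double) v * v;
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) v * other;
            }
        }
        for (int v : b.values()) {
            nb += (double) v * v;
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termVector("Lucene in Action covers Lucene fields");
        Map<String, Integer> d2 = termVector("Lucene fields and term vectors");
        System.out.println(d1);
        System.out.printf("%.3f%n", cosine(d1, d2));
    }
}
```

If nobody ever runs this kind of per-document comparison (or highlighting), there is little reason to pay for term vectors, whatever the field contains.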

Some people are perfectly happy with one large token. Others might want to tokenize the exact same information.

A URL into [protocol://host:port/path], a phone number into country, area, and district parts.
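A small plain-Java sketch of the choice above: the same value kept as one large token, or split into parts. The URL split uses java.net.URI; the phone split assumes, purely for illustration, numbers written as "country-area-district" with hyphens. The class and method names are hypothetical. In Lucene itself this logic would live in an analyzer; that wiring is left out here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizeExamples {

    // Option 1: keep the whole value as one large token.
    static List<String> singleToken(String value) {
        return List.of(value);
    }

    // Option 2: split a URL into protocol, host, port and path tokens.
    // URI.getPort() returns -1 when no explicit port is given, so the
    // port token is skipped in that case.
    static List<String> urlTokens(String url) {
        URI uri = URI.create(url);
        List<String> tokens = new ArrayList<>();
        tokens.add(uri.getScheme());
        tokens.add(uri.getHost());
        if (uri.getPort() != -1) {
            tokens.add(Integer.toString(uri.getPort()));
        }
        tokens.add(uri.getPath());
        return tokens;
    }

    // Option 3: split a phone number into country, area and district parts.
    // ASSUMPTION: the number is formatted as "country-area-district".
    static List<String> phoneTokens(String phone) {
        return Arrays.asList(phone.split("-"));
    }

    public static void main(String[] args) {
        System.out.println(singleToken("http://example.com:8080/path/to/page"));
        System.out.println(urlTokens("http://example.com:8080/path/to/page"));
        System.out.println(phoneTokens("12-345-6789"));
    }
}
```

Which option is right depends on the queries you expect: a single token only matches the exact value, while the split version lets someone search for, say, everything on one host or in one area code.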

It is really up to each and every implementer to decide which settings are best for them.

Also, a Lucene index is not made up of static rows and columns the way a relational database is. The spoon does not exist. You can bend it any way you want. Documents in a corpus can have fields that share names but not settings. Perhaps you only want to index phone numbers in a specific area code.
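A minimal sketch of that last idea: every document gets a field with the same name, but whether it is indexed is decided per document. FieldSpec below is a hypothetical stand-in, not a Lucene class; in the Lucene 2.x API this would correspond roughly to picking Field.Index.NO or Field.Index.TOKENIZED when constructing each document's Field. The "country-area-district" phone format and the area code "345" are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PerDocumentFieldSettings {

    // HYPOTHETICAL stand-in for a field's settings; not a Lucene class.
    record FieldSpec(String name, String value, boolean stored, boolean indexed) {}

    // ASSUMPTION: only numbers in this area code should be searchable.
    static final String INDEXED_AREA_CODE = "345";

    // Same field name for every document; only the settings vary.
    // ASSUMPTION: numbers are formatted as "country-area-district".
    static FieldSpec phoneField(String phone) {
        String[] parts = phone.split("-");
        boolean inArea = parts.length >= 2 && parts[1].equals(INDEXED_AREA_CODE);
        return new FieldSpec("phone", phone, true, inArea);
    }

    public static void main(String[] args) {
        List<FieldSpec> docs = new ArrayList<>();
        docs.add(phoneField("12-345-6789"));
        docs.add(phoneField("12-999-6789"));
        docs.forEach(System.out::println);
    }
}
```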

--
karl