Hi agatone,
I agree with markharw00 that highlighting is the main reason to store fields
in lucene.
I want to remind Sascha Fahl that the stored fields in Lucene are not inside
the inverted index structure.
The implementation of stored fields is very simple:
A (.fdt) file with the pairs "field-name"/"field-value" in order of the
documents, plus a map "documentID" --> "first pair in file" (the .fdx file).
("Stored fields" in
http://lucene.apache.org/java/2_3_2/fileformats.html#Fields )
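To illustrate the layout, here is a toy model in plain Java. It is only a sketch of the idea (pairs appended in document order, plus a map from document id to the first pair), not Lucene's real byte-level encoding. The class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of stored fields; hypothetical names, NOT Lucene's actual format.
class StoredFieldsSketch {
    // Plays the role of the data file: name/value pairs in document order.
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // Plays the role of the map: document id -> offset of its first pair.
    private final List<Integer> firstPairOffset = new ArrayList<>();

    // Appends one document and returns its document id.
    int addDocument(Map<String, String> fields) {
        firstPairOffset.add(data.size());
        try {
            DataOutputStream out = new DataOutputStream(data);
            out.writeInt(fields.size());
            for (Map.Entry<String, String> e : fields.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return firstPairOffset.size() - 1;
    }

    // Reads one document: one lookup in the map, then a sequential read.
    Map<String, String> document(int docId) {
        byte[] bytes = data.toByteArray();
        int offset = firstPairOffset.get(docId);
        try {
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes, offset, bytes.length - offset));
            int pairs = in.readInt();
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < pairs; i++) {
                String name = in.readUTF();
                fields.put(name, in.readUTF());
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing in this structure is consulted while matching a query, which is why stored fields can be dropped without affecting search.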
You can search with no stored fields at all.
I agree with chrislusf that you should store as little data in Lucene as
possible.
If you store large byte arrays for a "full view", you will possibly have a lot
more IO even for hit lists which do not use these byte arrays. (You would
have to use FieldSelector, but even with FieldSelector a hard drive does not
like to skip over this field data, because skipping means seeking.)
So if you have no highlighting at all, you could store a map "lucene
document id"(int) --> "database id"(hopefully also type int) in main memory,
and convert each lucene search result-list to a small select statement.
This is completely ok.
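A minimal sketch of that idea in plain Java follows. The table name "articles", the columns "id" and "title", and the class itself are assumptions made up for this example. It also assumes the map is rebuilt whenever the index is reopened, because Lucene document ids can change after deletions and merges.

```java
import java.util.StringJoiner;

// Hypothetical helper: turns one page of Lucene hits into a small SELECT.
class HitListToSelect {
    // Array index = Lucene document id, value = database primary key.
    private final int[] dbIdByDocId;

    HitListToSelect(int[] dbIdByDocId) {
        this.dbIdByDocId = dbIdByDocId.clone();
    }

    // Builds a parameterised statement with one placeholder per hit.
    String selectFor(int[] hitDocIds) {
        if (hitDocIds.length == 0) {
            throw new IllegalArgumentException("empty hit list, nothing to select");
        }
        StringJoiner placeholders = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < hitDocIds.length; i++) {
            placeholders.add("?");
        }
        // Table and column names are assumptions for this sketch.
        return "SELECT id, title FROM articles WHERE id IN " + placeholders;
    }

    // Database ids to bind to the placeholders, in hit (ranking) order.
    int[] parametersFor(int[] hitDocIds) {
        int[] dbIds = new int[hitDocIds.length];
        for (int i = 0; i < hitDocIds.length; i++) {
            dbIds[i] = dbIdByDocId[hitDocIds[i]];
        }
        return dbIds;
    }
}
```

The database will not return the rows in ranking order, so the caller has to sort them back into the order of parametersFor(...) before displaying the hit list.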
Lucene is very good at searching, not at storing data.
Take a look at the thread
http://www.nabble.com/Using-lucene-as-a-database...-good-idea-or-bad-idea--to18703473.html
In my company we decided to use Lucene as storage. But we now have two
index directories: one for searching and showing hit lists, the other as
storage with only two fields: "key" & "data".
Performance tests show that reading the storage is between 2 and 5 times
slower than a solution with a database (this was OK for our use case).
Best regards
Karsten
agatone wrote:
>
> Hi,
> I asked this question already on "lucene-general" list but also got
> advised to ask here too.
>
> I'm working on a project that has a big database in the background (some
> tables have about 1500000 rows). We decided to use Lucene for "faster"
> search. Our search works like most searches: you write a search string and
> get a list of hits with a detail link. But there is a dilemma about whether
> we should store more data in the index than is needed.
>
> One side of the development team insists that we should use the Lucene
> index as some kind of storage for data, so when you get a hit, you go to the
> details and then again use Lucene to find the document that matches the
> selected ID and take the data from the Lucene index. So in the end you end
> up copying complete database tables into the Lucene index.
>
> The other side insists on storing in the index only the data that is
> displayed directly to the user when showing the search results list and
> needed for search criteria. When you go to the details, you have the
> matching ID, so you can pick up that row from the database by that ID rather
> than search for it inside the Lucene index.
>
> Can someone please describe the drawbacks and advantages of both approaches?
> Actually, can someone write down what the actual benefit of Lucene itself
> is, where and when, in a real production environment?
>
> It would be great if anyone could share their experience with
> indexing and searching large amounts of data.
>
>
> Thank you
>
--
View this message in context:
http://www.nabble.com/Lucene-vs.-Database-tp19755932p19757274.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.