Caching and paging search results

2004-03-08 Thread Clandes Tino
Hi all,
could someone describe their experience with
implementing caching, sorting and paging of search
results?
Is a Stateful Session Bean appropriate for this?
I would like to obtain all search hits only on the
first call, and after that iterate through the cached
hit collection and display the cached results.
I have checked the SearchBean in the contributions
section, but it does not provide real caching and
paging.
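A minimal, container-independent sketch of the cache-then-page idea above. It assumes the first search copies a stable identifier for each hit (a document number or a stored key field) into a plain list. As far as I know, Lucene's `Hits` object fetches documents lazily through its searcher, so holding it across requests also keeps that searcher open. `HitPageCache` is a hypothetical name; an instance could live in a Stateful Session Bean or in the HTTP session.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-user result cache (not part of Lucene): it keeps the ranked
// hit list from the first search so later page requests never re-run the query.
class HitPageCache<T> {
    private final List<T> hits;

    HitPageCache(List<T> rankedHits) {
        // Defensive copy so the cache does not change if the caller's list does.
        this.hits = new ArrayList<>(rankedHits);
    }

    // Re-sorts the cached hits in place (e.g. by title or date); no new search.
    void sort(Comparator<? super T> order) {
        hits.sort(order);
    }

    int totalHits() {
        return hits.size();
    }

    // Number of pages of the given size; the last page may be partial.
    int pageCount(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (int) ((hits.size() + (long) pageSize - 1) / pageSize);
    }

    // Zero-based page; returns an empty list for pages past the end.
    List<T> page(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("invalid page request");
        }
        long from = (long) pageIndex * pageSize;
        if (from >= hits.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, hits.size());
        return Collections.unmodifiableList(new ArrayList<>(hits.subList((int) from, to)));
    }
}
```

One caveat of this design: the cached list goes stale if the index changes between page requests, which is usually acceptable for a search UI.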
 
Regards and thanx in advance!
Milan






___
Yahoo! Messenger - Communicate instantly..."Ping" 
your friends today! Download Messenger Now 
http://uk.messenger.yahoo.com/download/index.html

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Highlighting problem

2004-03-02 Thread Clandes Tino
Hi all,
I have incorporated the highlighting package
(http://home.clara.net/markharwood/lucene/highlight.htm),
but I am worried about the following issue.

If I want to display the best segments of the "body"
field's content, with the query terms highlighted, I
have to define the "body" Field as Stored.

So, the complete process looks like this:
Indexing:
1. parse the uploaded document into a temporary ASCII file
2. read the ASCII file and append its content to a String
3. create the field with Field.Text(String name, String value)
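Step 2 can be done in one read instead of appending line after line to a String (with `+=`, every append copies the text accumulated so far). A small sketch; `ParsedTextLoader` is a hypothetical helper, and the charset is left to the caller because the parser's output encoding is an assumption here. The returned String is what step 3 passes to `Field.Text("body", text)`.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class ParsedTextLoader {
    // Reads the whole temporary file written by the document parser in one call
    // and decodes it with the given charset (e.g. US-ASCII for plain ASCII output).
    static String readParsedText(Path tempFile, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(tempFile);
        return new String(bytes, charset);
    }
}
```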

Searching:
1. retrieve the String value of the "body" field from
the hit (again, as I understand it, the only way to do
this is to declare the "body" field as Stored)
2. pass the String value to the Highlighter methods.
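To make step 2 concrete, here is a self-contained sketch of what a best-segment highlighter does with the stored text: it picks the run of consecutive words with the most query-term matches and wraps those terms in `<B>` tags. This is not the API of the highlighting package linked above; `SimpleFragmentHighlighter`, the fixed word window and the ASCII-only `\w+` word pattern are all assumptions, and the query terms are expected in lower case. It also shows why the text must be available at search time: the fragment is cut from the original string, not from the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleFragmentHighlighter {
    // ASCII-only word pattern; an assumption, not what a real analyzer does.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns the window of `windowWords` consecutive words that contains the most
    // query-term occurrences, with each matching word wrapped in <B>...</B>.
    static String bestFragment(String text, Set<String> lowerCaseTerms, int windowWords) {
        if (windowWords <= 0) {
            throw new IllegalArgumentException("windowWords must be positive");
        }
        List<int[]> spans = new ArrayList<>();   // [start, end) offsets of each word
        List<Boolean> isHit = new ArrayList<>(); // does that word match a query term?
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
            isHit.add(lowerCaseTerms.contains(m.group().toLowerCase(Locale.ROOT)));
        }
        if (spans.isEmpty()) {
            return "";
        }
        int n = Math.min(windowWords, spans.size());

        // Sliding window: score the first window, then move it one word at a time.
        int score = 0;
        for (int i = 0; i < n; i++) {
            if (isHit.get(i)) score++;
        }
        int bestStart = 0;
        int bestScore = score;
        for (int start = 1; start + n <= spans.size(); start++) {
            if (isHit.get(start - 1)) score--;      // word leaving the window
            if (isHit.get(start + n - 1)) score++;  // word entering the window
            if (score > bestScore) {
                bestScore = score;
                bestStart = start;
            }
        }

        // Rebuild the fragment from the original text, keeping the punctuation and
        // spacing between words and marking the matched words.
        StringBuilder out = new StringBuilder();
        int cursor = spans.get(bestStart)[0];
        for (int i = bestStart; i < bestStart + n; i++) {
            int[] s = spans.get(i);
            out.append(text, cursor, s[0]);
            String word = text.substring(s[0], s[1]);
            out.append(isHit.get(i) ? "<B>" + word + "</B>" : word);
            cursor = s[1];
        }
        return out.toString();
    }
}
```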

Besides that, I have read in the Lucene FAQ that "body"
fields are not good candidates for Stored fields. Index
size is one obvious reason, but I wonder: how does
storing them affect Lucene search performance in
general?

Does anybody have an idea how to provide highlighting
for an Unstored field?

Regards and thanks in advance
Milan 









Multilanguage and wildcard support

2004-02-23 Thread Clandes Tino
Hi, all.
I would like to describe my dilemma about text
analysis.

2. Multilanguage and wildcard support
In Lucene 1.3 Final I found the very useful class
PerFieldAnalyzerWrapper, which lets me specify a
separate analyzer for each field.
However, the full-text content (obtained by parsing
Word, Excel, XML or other kinds of documents) should be
searchable with stemming and should also support
wildcard queries.
I implemented this solution:
- the full content is indexed in two separate fields,
because wildcard queries do not pass through the
analyzer, as I read in this mailing list's archive:
  Field1 ("stemmingbody"): the Snowball analyzer for
  the matching language is used.
  Field2 ("plainbody"): the WhitespaceAnalyzer is used.
- when a user searches for a term in an item's content,
I parse the query; if it contains a wildcard character,
the search runs against "plainbody", otherwise against
"stemmingbody", which I expect to give better results.
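The routing step can be sketched as below. It treats `*` and `?` as the wildcard characters of Lucene's query syntax and skips any character escaped with a backslash; it gives quoted phrases no special treatment, which a real implementation would probably need. `BodyFieldRouter` is a hypothetical name; the field names are the ones used above.

```java
class BodyFieldRouter {
    static final String STEMMED_FIELD = "stemmingbody";
    static final String PLAIN_FIELD = "plainbody";

    // True if the query contains an unescaped '*' or '?'.
    // A backslash escapes the character that follows it.
    static boolean hasWildcard(String query) {
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\') {
                i++;            // skip the escaped character
                continue;
            }
            if (c == '*' || c == '?') {
                return true;
            }
        }
        return false;
    }

    // Wildcard terms are not analyzed, so they go to the field whose terms are
    // only whitespace-split; everything else goes to the stemmed field.
    static String chooseField(String query) {
        return hasWildcard(query) ? PLAIN_FIELD : STEMMED_FIELD;
    }
}
```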
Is there a better way to do this, e.g. indexing the
full content in only one field (tokenized and indexed,
but not stored) instead of two?

Thanks in advance for any opinions or suggestions!
Best regards
Milan Agatonovic 









Lucene and Message Driven Bean

2004-02-23 Thread Clandes Tino
Hi all, 
I am new to this mailing list, although I have been
using Lucene for quite a long time.
I have used the Lucene API in a fairly large
multi-language groupware application, but I still have
some problems and dilemmas.
I cannot run Lucene indexing as a scheduled procedure
(which seems to be the common way to use Lucene),
because each item (document, meeting, forum article,
etc.) must be searchable as soon as it is uploaded.
So I came up with the solution described below, and I
would like to hear from experts in this field whether
it is good or bad in general, along with any
suggestions and opinions.
1. Indexing process:
After an upload (stored in parallel in the DB and the
file system), I call my Stateless Session Bean, which
puts the uploaded item (wrapped in a JMS Message) in a
Queue. A Message Driven Bean (configured as a single
instance in the pool, under JBoss) receives the message
and calls the Lucene methods that perform the indexing.
Dilemma: Is there a better way to do this that provides
the same functionality?
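For comparison, the same serialization can be sketched without JMS: a single worker thread processes indexing requests in submission order, so only one writer touches the index at a time, just as the single-instance MDB pool does. `SerialIndexQueue` and the `indexer` callback are hypothetical. The sketch lacks what JMS adds in the original design, namely a (typically persistent) queue and redelivery when the transaction rolls back, so an item whose indexing fails here is simply lost.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class SerialIndexQueue<T> {
    // One worker thread runs tasks in submission order, like a queue consumed
    // by a single MDB instance: never two index writers at once.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<T> indexer;

    SerialIndexQueue(Consumer<T> indexer) {
        this.indexer = indexer;
    }

    // Returns immediately; the upload request does not wait for indexing.
    void submit(T item) {
        worker.execute(() -> indexer.accept(item));
    }

    // Finishes everything already queued, then stops the worker.
    boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        worker.shutdown();
        return worker.awaitTermination(timeout, unit);
    }
}
```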
Problem: the IndexWriter constructor
IndexWriter(Directory d, Analyzer a, final boolean create)
sometimes throws an IOException, with different
messages:
- Index locked for write
- Lock obtain timed out
- other messages if the index is corrupted (e.g. a
missing segments file, which I deleted on purpose)
What I would like to do is:
- if the index is locked for any reason, roll back the
transaction, bringing the message back into the queue;
- if the index is corrupted, discard the messages in
the queue and send mail to the administrator.
Do you think it is appropriate to subclass IOException
so that a locked index can be handled differently from
a corrupted one?
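A sketch of the subclassing idea. As far as I know, the Lucene 1.3 constructor throws plain `IOException`s, so a wrapper has to classify them itself, and the only clue is the message text; matching on messages is brittle and may break with another Lucene version. `IndexLockedException`, `IndexCorruptedException` and `IndexFailureClassifier` are hypothetical names, and treating every unrecognised `IOException` as corruption is an assumption (a full disk, for example, would also land there).

```java
import java.io.IOException;

// Hypothetical subclasses; they wrap the original exception as the cause.
class IndexLockedException extends IOException {
    IndexLockedException(IOException cause) {
        super(cause.getMessage(), cause);
    }
}

class IndexCorruptedException extends IOException {
    IndexCorruptedException(IOException cause) {
        super(cause.getMessage(), cause);
    }
}

class IndexFailureClassifier {
    enum Action { ROLLBACK_AND_RETRY, DISCARD_AND_ALERT_ADMIN }

    // Lock problems are transient, so the message should go back to the queue.
    // Anything else is treated as corruption, which retrying will not fix.
    static IOException classify(IOException e) {
        String msg = e.getMessage() == null ? "" : e.getMessage();
        if (msg.contains("Index locked for write") || msg.contains("Lock obtain timed out")) {
            return new IndexLockedException(e);
        }
        return new IndexCorruptedException(e);
    }

    static Action actionFor(IOException e) {
        return classify(e) instanceof IndexLockedException
                ? Action.ROLLBACK_AND_RETRY
                : Action.DISCARD_AND_ALERT_ADMIN;
    }
}
```

In the MDB, `ROLLBACK_AND_RETRY` would correspond to something like calling `setRollbackOnly()` on the `MessageDrivenContext` (with container-managed transactions), so that the message is redelivered.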

Thanks a lot in advance.
My next problem (dilemma) concerns analyzing content
and will follow in a separate post.
Best regards
Milan Agatonovic





