Progress bar for Lucene

2004-07-28 Thread Hannah c
Hi,
Is there anything in Lucene that would help with implementing a 
progress bar? Somewhere I could throw an event that says the search is 10%, 
20% complete, etc.? Or is there already an implementation of a progress bar 
available for Lucene?

Thanks for your help,
Hannah

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: AW: Problem indexing Spanish Characters

2004-05-19 Thread Hannah c
Hi,
I had a quick look at the sandbox, but I don't need a Spanish stemmer. 
There must, however, be a replacement tokenizer that supports foreign 
characters to go along with the foreign-language Snowball stemmers. 
Does anyone know where I could find one?

In answer to Peter's question: yes, I'm also using UTF-8 encoded XML 
documents as the source.
Below is an example of what happens when I tokenize the text using the 
StandardTokenizer.

Thanks Hannah

--text I'm trying to index
century palace known as la “Fundación Hospital de Na. Señora del Pilar”
--tokens output by StandardTokenizer
century
palace
known
as
la
â
FundaciÃ*
n   *
Hospital
de
Na
Seà *
ora   *
del
Pilar
â
---
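
The split points in the output above are consistent with UTF-8 bytes being decoded as ISO-8859-1 somewhere before the tokenizer sees the text, although nothing in the thread confirms this. Under that assumption, "ó" (bytes C3 B3) turns into "Ã" followed by "³", and "ñ" (bytes C3 B1) turns into "Ã" followed by "±". Neither "³" nor "±" is a letter, so a letter-based tokenizer would break the word at that point. The Java sketch below reproduces the effect without Lucene. The class and method names are made up for illustration, and it uses `StandardCharsets`, which needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only; not part of Lucene.
public class EncodingDemo {

    // Encode the text as UTF-8, then decode those bytes with the given charset.
    public static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    // Index of the first character that is not a letter, or -1 if all are letters.
    public static int firstNonLetter(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetter(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "Fundacion" with an accented o, written as an escape so the
        // encoding of this source file does not matter.
        String word = "Fundaci\u00f3n";

        // Assumed failure mode: UTF-8 bytes read with the wrong charset.
        String wrong = roundTrip(word, StandardCharsets.ISO_8859_1);
        // Correct handling: UTF-8 bytes read as UTF-8.
        String right = roundTrip(word, StandardCharsets.UTF_8);

        System.out.println("wrong charset: first non-letter at " + firstNonLetter(wrong));
        System.out.println("right charset: first non-letter at " + firstNonLetter(right));
    }
}
```

If this is indeed the cause, the fix would lie in how the documents are read (for example, `new InputStreamReader(stream, "UTF-8")` rather than a reader that uses the platform default encoding) and not in the tokenizer itself.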

From: "Peter M Cipollone" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Subject: Re: Problem indexing Spanish Characters
Date: Wed, 19 May 2004 11:41:28 -0400
Could you send some sample text that causes this to happen?
- Original Message -
From: "Hannah c" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, May 19, 2004 11:30 AM
Subject: Problem indexing Spanish Characters
> Hi,
>
> I am indexing a number of English articles on Spanish resorts. As such
> there are a number of Spanish characters throughout the text, most of
> these are in the place names, which are the type of words I would like
> to use as queries. My problem is with the StandardTokenizer class, which
> cuts the word into two when it comes across any of the Spanish
> characters. I had a look at the source, but the code was generated by
> JavaCC and so is not very readable. I was wondering if there was a way
> around this problem, or which area of the code I would need to change to
> avoid this.
>
> Thanks
> Hannah Cumming



From: PEP AD Server Administrator 
<[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Subject: AW: Problem indexing Spanish Characters
Date: Wed, 19 May 2004 18:08:56 +0200

Hi Hannah, Otis
I cannot help, but I have exactly the same problem with special German
characters. I used the Snowball analyzer, but this does not help because the
problem (tokenizing) appears before the analyzer comes into action.
I just posted the question "Problem tokenizing UTF-8 with geman umlauts"
a few minutes ago, which describes my problem; Hannah's seems to be similar.
Do you also have UTF-8 encoded pages?
Peter MH
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 19 May 2004 17:42
To: Lucene Users List
Subject: Re: Problem indexing Spanish Characters
It looks like the Snowball project supports Spanish:
http://www.google.com/search?q=snowball spanish
If it does, take a look at Lucene Sandbox.  There is a project that
allows you to use Snowball analyzers with Lucene.
Otis


Problem indexing Spanish Characters

2004-05-19 Thread Hannah c
Hi,
I am indexing a number of English articles on Spanish resorts. As such 
there are a number of Spanish characters throughout the text, most of these 
are in the place names, which are the type of words I would like to use as 
queries. My problem is with the StandardTokenizer class, which cuts the word 
into two when it comes across any of the Spanish characters. I had a look at 
the source, but the code was generated by JavaCC and so is not very readable. 
I was wondering if there was a way around this problem, or which area of the 
code I would need to change to avoid this.

Thanks
Hannah Cumming



Re: Dimitry's Term Vector Patch for lucene 1.2

2004-04-06 Thread Hannah c
I am working with Java 1.1, so the earlier version of Lucene is more compatible.

Hannah


From: Otis Gospodnetic <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
To: Lucene Users List <[EMAIL PROTECTED]>
Subject: Re: Dimitry's Term Vector Patch for lucene 1.2
Date: Tue, 6 Apr 2004 04:03:11 -0700 (PDT)
Get the latest Lucene version, it includes that patch.

Otis

--- Hannah c <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am working with Lucene 1.2 and want to try the Term Vector Patch. I
> have not been able to install the diff patch I downloaded from the mail
> archives as I do not have CVS. Is there anywhere I could get the full
> source of all the files changed, or would someone be able to send me a
> zip of the files I need? It would be greatly appreciated.
>
> Thanks Hannah


Dimitry's Term Vector Patch for lucene 1.2

2004-04-06 Thread Hannah c
Hi,

I am working with Lucene 1.2 and want to try the Term Vector Patch. I have 
not been able to install the diff patch I downloaded from the mail archives 
as I do not have CVS. Is there anywhere I could get the full source of all 
the files changed, or would someone be able to send me a zip of the files I 
need? It would be greatly appreciated.

Thanks Hannah
