Hi Michael,
I just found my problem. Because Lucene indexed “abcd.” as a single word, we had
added a regex to our code that inserts a space between “abcd” and “.” (or any
other punctuation character).
That regex was also splitting decimal amounts, so I updated it and it now works fine.
The code before:
// Add a space between a word and a following punctuation character
String pattern = "(\\w)([\\.,;\\?!:])";
contents = contents.replaceAll(pattern, "$1 $2");
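A minimal stand-alone sketch (plain Java, no Lucene) of what the original pattern does to amounts. The class name `OldPunctuationSpacing` and the sample sentence are hypothetical. The decimal separator is treated like sentence punctuation, so “404.50” becomes “404 .50” before indexing, which would explain why 404 and 50 were each found but 404.50 was not.

```java
// Sketch only: class name and sample text are hypothetical,
// the pattern is copied from the "before" code above.
final class OldPunctuationSpacing {
    // A word character followed by one of . , ; ? ! :
    static final String PATTERN = "(\\w)([\\.,;\\?!:])";

    static String addSpaces(String contents) {
        // Insert a space between the word character ($1) and the punctuation ($2)
        return contents.replaceAll(PATTERN, "$1 $2");
    }

    public static void main(String[] args) {
        // prints: Invoice abcd . Amount 404 .50 , paid 315 ,86 .
        System.out.println(addSpaces("Invoice abcd. Amount 404.50, paid 315,86."));
    }
}
```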
The code after:
// Leave numbers alone, otherwise decimal amounts would be split
// REGEX: a word character ([a-zA-Z0-9_]) followed by . , ; ? ! or :
// but not when it is followed by a digit (optionally after whitespace)
String pattern = "(\\w)([\\.,;\\?!:])(?!(\\s*[0-9]))";
contents = contents.replaceAll(pattern, "$1 $2");
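The same kind of sketch for the updated pattern (again plain Java; the class name `NewPunctuationSpacing` and the inputs are hypothetical). The negative lookahead `(?!(\s*[0-9]))` rejects any punctuation mark that is followed by a digit, optionally after whitespace, so “404.50” and “315,86” stay intact while “abcd.” is still split.

```java
// Sketch only: class name and inputs are hypothetical,
// the pattern is copied from the "after" code above.
final class NewPunctuationSpacing {
    // Same as before, plus a negative lookahead: no match when the
    // punctuation is followed by optional whitespace and then a digit
    static final String PATTERN = "(\\w)([\\.,;\\?!:])(?!(\\s*[0-9]))";

    static String addSpaces(String contents) {
        return contents.replaceAll(PATTERN, "$1 $2");
    }

    public static void main(String[] args) {
        // prints: Invoice abcd . Amount 404.50 , paid 315,86 .
        System.out.println(addSpaces("Invoice abcd. Amount 404.50, paid 315,86."));
        // prints: Total: 12  (the \s* in the lookahead also keeps "Total:" attached)
        System.out.println(addSpaces("Total: 12"));
    }
}
```

One side effect of the `\s*`: a colon or period followed by a space and a number, as in “Total: 12”, is no longer separated either. Dropping `\s*` would limit the exception to separators sitting directly before a digit.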
Thanks a lot for your time.
Best regards,
Jérémy GUYENOT | R&D Manager
[email protected]
———————————
49, av. de la République 69200 Vénissieux | Tel: 04 72 51 77 55 | Fax: 04 72 50 43 13
WWW.EFALIA.COM
To ensure technical follow-up of your requests, please use Mantis
(http://feqa.communauteged-multigest.fr/), our online tool.
Eco-responsibility: please print this email only if necessary.
From: Michael McCandless [mailto:[email protected]]
Sent: Tuesday, 27 September 2016 16:19
To: Lucene Users <[email protected]>; Jérémy GUYENOT <[email protected]>
Cc: Jan Høydahl <[email protected]>
Subject: Re: Research problems on numeric values into text (with. or,)
Possibly you are using an analyzer that does not preserve decimal numbers as a
single token? Or, you are using a different analyzer at indexing time vs
search time?
Can you make a small test case showing the issue?
Mike McCandless
http://blog.mikemccandless.com
On Tue, Sep 27, 2016 at 3:06 PM, Jérémy GUYENOT <[email protected]> wrote:
Hello,
Sorry for cross-posting, but my first post received no answers, so I am trying
another way.
What are you indexing?
I want to index files such as the one in the "ZIP \ file" folder, which contain
decimal values (with "." or "," as the decimal separator).
How are you searching, and what did you expect to find?
I want to be able to search for decimal values, because our tools store large
quantities of such documents (e.g. invoices, quotes, orders).
What do you actually see and why is that a problem?
The search for the number 404 returns files containing 404.
The search for the number 50 returns files containing 50.
The search for the number 404.50 returns no results.
The text content is stored in a TextField with Field.Store.NO.
I tried several analyzers, but the result is the same. I also tried Lucene
4.3.1 and 6.2.0, with the same result.
I hope you can give me some details on how to search for decimal values in
text files.
In the zip you can find:
- File
  - The example file containing decimal values
- Index
  - The Lucene index files
- Indexationlucene
  - The code we use to index files from our app
- RechercheLucene
  - The code we use to search in our app
Best regards,
From: Jan Høydahl [mailto:[email protected]]
Sent: Tuesday, 27 September 2016 10:20
To: [email protected]
Cc: Jérémy GUYENOT <[email protected]>
Subject: Re: Research problems on numeric values into text (with. or,)
Please do not cross-post to multiple mailing lists.
This belongs to java-user only.
It is also generally better to describe the problem in detail in the mail than
to attach a zip.
- What are you indexing
- How are you searching, and what did you expect to find
- What do you actually see and why is that a problem?
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
On 27 Sep 2016 at 10:15, Jérémy GUYENOT <[email protected]> wrote:
Hello,
we are having problems searching for numeric values in text (with "." or "," as
the separator). We are unable to find 315.86 or 315,86.
We also tried custom analyzers, without success.
I enclose the code we use for indexing and the code we use for searching.
I do not know whether this is a bug on your side or a problem with our analysis.
The problem is the same in versions 4.3.1 and 6.2.0.
Thank you in advance for your quick reply.
Best regards,
<LUCENE.zip>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]