Re: Zip Files

2005-03-01 Thread Ernesto De Santis
Hello,
First, you need a parser for each file type: PDF, txt, Word, etc.
Then use the Java API to iterate over the zip content, see:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html
Use the getNextEntry() method.
A little example:
ZipInputStream zis = new ZipInputStream(fileInputStream);
ZipEntry zipEntry;
while ((zipEntry = zis.getNextEntry()) != null) {
   //use zipEntry to get the entry name, etc.
   //pick the proper parser for the current entry
   //use that parser with zis (the ZipInputStream)
}
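A slightly fuller sketch of the same loop, assuming a hypothetical TextParser interface and chooseParser() helper (neither is part of Lucene or the JDK; the .txt branch is only an illustration). Each entry whose text can be extracted becomes one Lucene Document:

import java.io.*;
import java.util.zip.*;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Hypothetical parser abstraction: one implementation per file type.
interface TextParser {
    String extract(InputStream in) throws IOException;
}

public class ZipIndexer {

    // Pick a parser from the entry name; real PDF/Word parsers would plug in here.
    static TextParser chooseParser(String name) {
        if (name.toLowerCase().endsWith(".txt")) {
            return new TextParser() {
                public String extract(InputStream in) throws IOException {
                    StringBuffer sb = new StringBuffer();
                    BufferedReader r = new BufferedReader(new InputStreamReader(in));
                    for (String line; (line = r.readLine()) != null; ) {
                        sb.append(line).append('\n');
                    }
                    return sb.toString();
                }
            };
        }
        return null; // no parser known for this file type
    }

    public static void indexZip(IndexWriter writer, InputStream fileInputStream)
            throws IOException {
        ZipInputStream zis = new ZipInputStream(fileInputStream);
        ZipEntry zipEntry;
        while ((zipEntry = zis.getNextEntry()) != null) {
            TextParser parser = chooseParser(zipEntry.getName());
            if (parser == null) {
                continue; // skip entries we cannot parse
            }
            Document doc = new Document();
            doc.add(Field.Keyword("filename", zipEntry.getName()));
            doc.add(Field.Text("contents", parser.extract(zis)));
            writer.addDocument(doc);
        }
    }
}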
good luck
Ernesto
Luke Shannon escribió:
Hello;
Anyone have any ideas on how to index the contents of zip files?
Thanks,
Luke

--
Ernesto De Santis - Colaborativa.net
Córdoba 1147 Piso 6 Oficinas 3 y 4
(S2000AWO) Rosario, SF, Argentina.



Re: Optimize not deleting all files

2005-02-04 Thread Ernesto De Santis
Hi all,
We have the same problem.
We suspect the cause is that Windows keeps the old index files locked.
Our environment:
Windows 2000
Tomcat 5.5.4
Ernesto.
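In case it helps, here is a minimal sketch of the workaround we are trying (this is only an assumption about the cause, not a confirmed fix): close every IndexSearcher/IndexReader on the index before calling optimize(), since Windows cannot delete files that are still open. The path is a placeholder:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;

public class OptimizeExample {
    public static void main(String[] args) throws Exception {
        String indexDir = "c:/index";          // hypothetical index location

        IndexSearcher searcher = new IndexSearcher(indexDir);
        // ... run searches ...
        searcher.close();                      // release file handles before optimizing

        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        writer.optimize();                     // merges segments; old files can now be deleted
        writer.close();
    }
}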
[EMAIL PROTECTED] wrote:
Hi,
When I run an optimize in our production environment, old index files are
left in the directory and are not deleted.

My understanding is that an
optimize will create new index files and all existing index files should be
deleted.  Is this correct?
We are running Lucene 1.4.2 on Windows.  

Any help is appreciated.  Thanks!

 




Re: Disk space used by optimize - running out of disk space corrupts index.

2005-02-04 Thread Ernesto De Santis
Hi all,
We have a big index and little free disk space.
When we optimize and all the free space is consumed, our index gets corrupted:
the segments file points to nonexistent files.
Environment:
Java 1.4.2_04
Windows 2000 SP4
Tomcat 5.5.4
Bye,
Ernesto.
Yura Smolsky wrote:
Hello, Otis.
There is a big difference when you use compound index format or
multiple files. I have tested it on the big index (45 Gb). When I used
compound file then optimize takes 3 times more space, b/c *.cfs needs
to be unpacked.
Now I do use non compound file format. It needs like twice as much
disk space.
OG Have you tried using the multifile index format?  Now I wonder if there
OG is actually a difference in disk space consumed by optimize() when you
OG use multifile and compound index format...
OG Otis
OG --- Kauler, Leto S [EMAIL PROTECTED] wrote:
 

Our copy of LIA is in the mail ;)
Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).
--Leto

 

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 

Hello,
Yes, that is how optimize works - copies all existing index 
segments into one unified index segment, thus optimizing it.

see hit #1:

http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I made
a mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?
Otis
--- Kauler, Leto S [EMAIL PROTECTED] wrote:
   

Just a quick question:  after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?

In our case the optimise grinds the disk, expanding the index into
many files of about 145MB total, before compressing down to three
files of about 47MB total.  That must be a lot of disk activity for
the people with multi-gigabyte indexes!

Regards,
Leto
 



Yura Smolsky,





Re: Lucene and multiple languages

2005-01-20 Thread Ernesto De Santis
Hi Aurora,
I developed a tool that deals with this multiple-languages issue. I found the
Nutch language-identifier library very useful. The jar has Nutch dependencies,
but I deleted all the code that was unnecessary for me.
The language-identifier I use works fine and is very simple.
For example:
LanguageIdentifier languageIdentifier = LanguageIdentifier.getInstance();
String userInputText = "free text";
String language = languageIdentifier.identify(userInputText);
It works for 11 languages: English, Spanish, Portuguese, Dutch, German,
French, Italian, and others.
I can send you this modified jar, but remember that it comes from
Nutch, for copyright (or left :).
http://www.nutch.org/LICENSE.txt
More comments below...
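As a rough sketch of how the identified language could pick the analyzer (the ISO codes and the analyzer choices are only assumptions for illustration; of the analyzers shown, only StandardAnalyzer and GermanAnalyzer ship with Lucene itself):

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.de.GermanAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class AnalyzerChooser {

    private Map analyzers = new HashMap();
    private Analyzer fallback = new StandardAnalyzer();

    public AnalyzerChooser() {
        // map language codes to analyzers; extend as needed
        analyzers.put("de", new GermanAnalyzer());
        analyzers.put("en", new StandardAnalyzer());
        // "es" could map to a SnowballAnalyzer("Spanish", ...) wrapper, etc.
    }

    public Analyzer choose(String languageCode) {
        Analyzer a = (Analyzer) analyzers.get(languageCode);
        return a != null ? a : fallback;
    }
}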
aurora wrote:
I'm trying to build some web search tool that could work for multiple  
languages. I understand that Lucene is shipped with StandardAnalyzer 
plus  a German and Russian analyzers and some more in the sandbox. And 
that  indexing and searching should use the same analyzer.

Now let's said I have an index with documents in multiple languages 
and  analyzed by an assortment of analyzers. When user enter a query, 
what  analyzer should be used? Should the user be asked for the 
language  upfront? What to expect when the analyzer and the document 
doesn't match?  Let's said the query is parsed using StandardAnalyzer. 
Would it match any  documents done in German analyzer at all. Or would 
it end up in poor  result?

When this happens, in most cases you do not get matches.
Also is there a good way to find out the languages used in a web 
page?  There is a 'content-langage' header in http and a 'lang' 
attribute in  HTML. Looks like people don't really use them. How can 
we recognize the  language?

With language identifier. :)
Even more interesting is multiple languages used in one document, 
let's  say half English and half French. Is there a good way to deal 
with those  cases?

The language identifier only returns one language. I looked into
language-identifier: it works with a score for each language and returns
the language with the highest value.
Maybe you can modify the language-identifier to take the top few
values.
Bye,
Ernesto.
Thanks for any guidance.


Re: where is the SnowBallAnalyzer?

2004-09-08 Thread Ernesto De Santis
It is in snowball-1.0.jar.

I sent it to you in a private email.

Bye,
Ernesto.

- Original Message - 
From: Wermus Fernando [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 1:12 PM
Subject: where is the SnowBallAnalyzer?


I have to look more closely, but why isn't the SnowballAnalyzer in the
org.apache.lucene.analysis.snowball package?
 
I have Lucene 1.4.
 
I'm doing my own Spanish stemmer.
 
 
 






spanish stemmer

2004-08-23 Thread Ernesto De Santis
Hello,

I use the Snowball jar to implement my SpanishAnalyzer. I found that
words ending in 'bol' are not stripped.
For example:

In Spanish, for basketball you can say basquet or basquetbol. But for
the SpanishStemmer they are different words.
The same happens with voley and voleybol.

Not so with futbol (football): we do not say fut for futbol. But 'fut' doesn't
exist in Spanish anyway.

Do you think I am correct?

Can this be changed?

Ernesto.





Re: spanish stemmer

2004-08-23 Thread Ernesto De Santis
Yes, it is quite easy.

You need to write a wrapper that initializes Snowball for Spanish:

analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);

Below is the complete code.

Bye, Ernesto.


--
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;

// Wraps the Snowball Spanish stemmer together with a Spanish stop-word list.
public class SpanishAnalyzer extends Analyzer {

    private static SnowballAnalyzer analyzer;

    private String SPANISH_STOP_WORDS[] = {
        "un", "una", "unas", "unos", "uno", "sobre", "todo", "también", "tras",
        "otro", "algún", "alguno", "alguna", "algunos", "algunas", "ser", "es",
        "soy", "eres", "somos", "sois", "estoy", "esta", "estamos", "estais",
        "estan", "en", "para", "atras", "porque", "por qué", "estado", "estaba",
        "ante", "antes", "siendo", "ambos", "pero", "por", "poder", "puede",
        "puedo", "podemos", "podeis", "pueden", "fui", "fue", "fuimos",
        "fueron", "hacer", "hago", "hace", "hacemos", "haceis", "hacen", "cada",
        "fin", "incluso", "primero", "desde", "conseguir", "consigo", "consigue",
        "consigues", "conseguimos", "consiguen", "ir", "voy", "va",
        "vamos", "vais", "van", "vaya", "bueno", "ha", "tener", "tengo", "tiene",
        "tenemos", "teneis", "tienen", "el", "la", "lo", "las", "los", "su",
        "aqui", "mio", "tuyo", "ellos", "ellas", "nos", "nosotros", "vosotros",
        "vosotras", "si", "dentro", "solo", "solamente", "saber", "sabes", "sabe",
        "sabemos", "sabeis", "saben", "ultimo", "largo", "bastante", "haces",
        "muchos", "aquellos", "aquellas", "sus", "entonces", "tiempo",
        "verdad", "verdadero", "verdadera", "cierto", "ciertos", "cierta",
        "ciertas", "intentar", "intento", "intenta", "intentas", "intentamos",
        "intentais", "intentan", "dos", "bajo", "arriba", "encima", "usar",
        "uso", "usas", "usa", "usamos", "usais", "usan", "emplear", "empleo",
        "empleas", "emplean", "ampleamos", "empleais", "valor", "muy", "era",
        "eras", "eramos", "eran", "modo", "bien", "cual", "cuando", "donde",
        "mientras", "quien", "con", "entre", "sin", "trabajo", "trabajar",
        "trabajas", "trabaja", "trabajamos", "trabajais", "trabajan", "podria",
        "podrias", "podriamos", "podrian", "podriais", "yo", "aquel", "mi",
        "de", "a", "e", "i", "o", "u"};

    public SpanishAnalyzer() {
        analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);
    }

    public SpanishAnalyzer(String stopWords[]) {
        analyzer = new SnowballAnalyzer("Spanish", stopWords);
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        return analyzer.tokenStream(fieldName, reader);
    }
}
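A short usage sketch (the index path, field name and sample text are placeholders): the same analyzer is used at index time and at query time, so "caminar" should reduce to the same stem as the indexed conjugations:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SpanishAnalyzerExample {
    public static void main(String[] args) throws Exception {
        SpanishAnalyzer analyzer = new SpanishAnalyzer();

        // index a small document with the Spanish analyzer
        IndexWriter writer = new IndexWriter("/tmp/spanish-index", analyzer, true);
        Document doc = new Document();
        doc.add(Field.Text("contents", "los caminantes caminaron por el camino"));
        writer.addDocument(doc);
        writer.close();

        // query with the same analyzer
        Query query = QueryParser.parse("caminar", "contents", analyzer);
        IndexSearcher searcher = new IndexSearcher("/tmp/spanish-index");
        Hits hits = searcher.search(query);
        System.out.println("hits: " + hits.length());
        searcher.close();
    }
}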



- Original Message - 
From: Chad Small [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 3:49 PM
Subject: RE: spanish stemmer


Do you mind sharing how you implemented your SpanishAnalyzer using Snowball?

Sorry I can't help with your question.  I am trying to implement Snowball
Spanish or a Spanish Analyzer in Lucene.

thanks,
chad.

-Original Message-
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 8:30 AM
To: Lucene Users List
Subject: spanish stemmer


Hello

I use the Snowball jar for implement my SpanishAnalyzer. I found that the
words finished in 'bol' are not stripped.
For example:

In spanish for say basketball, you can say basquet or basquetbol. But for
SpanishStemmer are different words.
Idem with voley and voleybol.

Not idem with futbol (football), we not say fut for futbol. But 'fut' don´t
exist in spanish.

you think that I are correct?

you can change this?

Ernesto.





Re: spanish stemmer

2004-08-23 Thread Ernesto De Santis
Because the SnowballAnalyzer and the SpanishStemmer don't have a default
stop-word set.

SnowballAnalyzer constructor:

  /** Builds the named analyzer with no stop words. */
  public SnowballAnalyzer(String name) {
    this.name = name;
  }

Note the comment.

Bye,
Ernesto.

- Original Message - 
From: Chad Small [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 4:57 PM
Subject: RE: spanish stemmer


Excellent Ernesto.

Was there a reason you used your own stop word list and not just the default
constructor SnowballAnalyzer("Spanish")?

thanks,
chad.

-Original Message-
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 2:03 PM
To: Lucene Users List
Subject: Re: spanish stemmer


Yes, is too easy.

You need do a wrapper for spanish Snowball initilization.

analyzer = new SnowballAnalyzer(Spanish, SPANISH_STOP_WORDS);

above the complete code.

Bye, Ernesto.


--
public class SpanishAnalyzer extends Analyzer {

private static SnowballAnalyzer analyzer;


private String SPANISH_STOP_WORDS[] = {

un, una, unas, unos, uno, sobre, todo, también, tras,
otro, algún, alguno, alguna,

algunos, algunas, ser, es, soy, eres, somos, sois, estoy,
esta, estamos, estais,

estan, en, para, atras, porque, por qué, estado, estaba,
ante, antes, siendo,

ambos, pero, por, poder, puede, puedo, podemos, podeis,
pueden, fui, fue, fuimos,

fueron, hacer, hago, hace, hacemos, haceis, hacen, cada,
fin, incluso, primero,

desde, conseguir, consigo, consigue, consigues, conseguimos,
consiguen, ir, voy, va,

vamos, vais, van, vaya, bueno, ha, tener, tengo, tiene,
tenemos, teneis, tienen,

el, la, lo, las, los, su, aqui, mio, tuyo, ellos,
ellas, nos, nosotros, vosotros,

vosotras, si, dentro, solo, solamente, saber, sabes, sabe,
sabemos, sabeis, saben,

ultimo, largo, bastante, haces, muchos, aquellos, aquellas,
sus, entonces, tiempo,

verdad, verdadero, verdadera, cierto, ciertos, cierta,
ciertas, intentar, intento,

intenta, intentas, intentamos, intentais, intentan, dos, bajo,
arriba, encima, usar,

uso, usas, usa, usamos, usais, usan, emplear, empleo,
empleas, emplean, ampleamos,

empleais, valor, muy, era, eras, eramos, eran, modo, bien,
cual, cuando, donde,

mientras, quien, con, entre, sin, trabajo, trabajar,
trabajas, trabaja, trabajamos,

trabajais, trabajan, podria, podrias, podriamos, podrian,
podriais, yo, aquel, mi,

de, a, e, i, o, u};

public SpanishAnalyzer() {

analyzer = new SnowballAnalyzer(Spanish, SPANISH_STOP_WORDS);

}

public SpanishAnalyzer(String stopWords[]) {

analyzer = new SnowballAnalyzer(Spanish, stopWords);

}

public TokenStream tokenStream(String fieldName, Reader reader) {

return analyzer.tokenStream(fieldName, reader);

}

}



- Original Message - 
From: Chad Small [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 3:49 PM
Subject: RE: spanish stemmer


Do you mind sharing how you implemented your SpanishAnalyzer using Snowball?

Sorry I can't help with your question.  I am trying to implement Snowball
Spanish or a Spanish Analyzer in Lucene.

thanks,
chad.

-Original Message-
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 8:30 AM
To: Lucene Users List
Subject: spanish stemmer


Hello

I use the Snowball jar for implement my SpanishAnalyzer. I found that the
words finished in 'bol' are not stripped.
For example:

In spanish for say basketball, you can say basquet or basquetbol. But for
SpanishStemmer are different words.
Idem with voley and voleybol.

Not idem with futbol (football), we not say fut for futbol. But 'fut' don´t
exist in spanish.

you think that I are correct?

you can change this?

Ernesto.





Re: spanish stemmer

2004-08-23 Thread Ernesto De Santis
Hi Chad,

 One more question to the group.  From what I have gathered, my choices for
indexing and querying Spanish content are:

 1.  StandardAnalyzer (I read that this analyzer could be used for
European languages)

The StandardAnalyzer is not specifically for European languages; it is more
like a generic analyzer.

 2.  SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);  --custom stop words
from Ernesto's class below

 Can I assume that choice 2 would be the better for Spanish content?

Yes, it is much better.

For example:
With the StandardAnalyzer, caminar, caminantes, camino, etc. are different
words; you only get a hit if the match is exact.
With the SpanishAnalyzer, they are the same word. These three words are forms
of caminar. If one document in your index has the word caminante, you can
get a hit with the different conjugations of this verb.

What a stemmer does is strip words according to the rules of the
language (Spanish for us):
caminar, caminantes and camino are all stored as camin (camin does not exist
in Spanish).

This improves the quality of the hits.

 thanks,
 chad.

Bye, Ernesto.
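A small sketch that shows the difference directly (it assumes the SpanishAnalyzer class posted earlier in this thread is on the classpath); it just prints the tokens each analyzer produces for the same text:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class StemmingDemo {

    static void printTokens(Analyzer analyzer, String text) throws Exception {
        TokenStream stream = analyzer.tokenStream("contents", new StringReader(text));
        for (Token token = stream.next(); token != null; token = stream.next()) {
            System.out.print(token.termText() + " ");
        }
        System.out.println();
    }

    public static void main(String[] args) throws Exception {
        String text = "caminar caminantes camino";
        printTokens(new StandardAnalyzer(), text);   // three distinct terms
        printTokens(new SpanishAnalyzer(), text);    // expected to collapse to a common stem
    }
}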


-Original Message-
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 3:31 PM
To: Lucene Users List
Subject: Re: spanish stemmer


Because the SnowballAnalyzer, and SpanishStemmer don´t have a default
stopword set.

SnowballAnalyzer constructor:

  /** Builds the named analyzer with no stop words. */
  public SnowballAnalyzer(String name) {
this.name = name;
  }

Note the comment.

Bye,
Ernesto.

- Original Message - 
From: Chad Small [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 4:57 PM
Subject: RE: spanish stemmer


Excellent Ernesto.

Was there a reason you used your own stop word list and not just the default
constructor SnowballAnalyzer(Spanish)?

thanks,
chad.

-Original Message-
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 2:03 PM
To: Lucene Users List
Subject: Re: spanish stemmer


Yes, is too easy.

You need do a wrapper for spanish Snowball initilization.

analyzer = new SnowballAnalyzer(Spanish, SPANISH_STOP_WORDS);

above the complete code.

Bye, Ernesto.


--
public class SpanishAnalyzer extends Analyzer {

private static SnowballAnalyzer analyzer;


private String SPANISH_STOP_WORDS[] = {

un, una, unas, unos, uno, sobre, todo, también, tras,
otro, algún, alguno, alguna,

algunos, algunas, ser, es, soy, eres, somos, sois, estoy,
esta, estamos, estais,

estan, en, para, atras, porque, por qué, estado, estaba,
ante, antes, siendo,

ambos, pero, por, poder, puede, puedo, podemos, podeis,
pueden, fui, fue, fuimos,

fueron, hacer, hago, hace, hacemos, haceis, hacen, cada,
fin, incluso, primero,

desde, conseguir, consigo, consigue, consigues, conseguimos,
consiguen, ir, voy, va,

vamos, vais, van, vaya, bueno, ha, tener, tengo, tiene,
tenemos, teneis, tienen,

el, la, lo, las, los, su, aqui, mio, tuyo, ellos,
ellas, nos, nosotros, vosotros,

vosotras, si, dentro, solo, solamente, saber, sabes, sabe,
sabemos, sabeis, saben,

ultimo, largo, bastante, haces, muchos, aquellos, aquellas,
sus, entonces, tiempo,

verdad, verdadero, verdadera, cierto, ciertos, cierta,
ciertas, intentar, intento,

intenta, intentas, intentamos, intentais, intentan, dos, bajo,
arriba, encima, usar,

uso, usas, usa, usamos, usais, usan, emplear, empleo,
empleas, emplean, ampleamos,

empleais, valor, muy, era, eras, eramos, eran, modo, bien,
cual, cuando, donde,

mientras, quien, con, entre, sin, trabajo, trabajar,
trabajas, trabaja, trabajamos,

trabajais, trabajan, podria, podrias, podriamos, podrian,
podriais, yo, aquel, mi,

de, a, e, i, o, u};

public SpanishAnalyzer() {

analyzer = new SnowballAnalyzer(Spanish, SPANISH_STOP_WORDS);

}

public SpanishAnalyzer(String stopWords[]) {

analyzer = new SnowballAnalyzer(Spanish, stopWords);

}

public TokenStream tokenStream(String fieldName, Reader reader) {

return analyzer.tokenStream(fieldName, reader);

}

}








Re: Index and Search question in Lucene.

2004-08-21 Thread Ernesto De Santis
Hi Dmitrii,

What analyzer do you use?

You need to be careful with Keyword fields and analyzers. When you
index a Document, the fields that have tokenized = false, like
Keyword fields, are not analyzed.
At search time you need to parse the query with your analyzer, but you must
not analyze the untokenized fields, like your filename.

 I can do a search as this
 +contents:SomeWord  +filename:SomePath
 

The syntax is right, but if you search +filename:somepath, you will find only
that file.

For example:
+contents:version +filename:/my/path/myfile.ext

can only find myfile.ext, and if this file does not contain "version", it will
not find anything. This is because you use +; + makes the term
required.

You can see the query syntax on the Lucene site:

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.searchtoc=faq#q5

good luck.

Bye,
Ernesto.
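A sketch of that idea (field names follow the example below; the path value is a placeholder): parse only the free-text part with the analyzer, and add the filename as a raw TermQuery so the Keyword field is never analyzed:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class KeywordFieldQuery {
    public static Query build(String userText, String path) throws Exception {
        // analyzed part: whatever the user typed, against the "contents" field
        Query contents = QueryParser.parse(userText, "contents", new StandardAnalyzer());

        // untokenized part: exact match on the "filename" Keyword field
        Query filename = new TermQuery(new Term("filename", path));

        BooleanQuery query = new BooleanQuery();
        query.add(contents, true, false);   // required
        query.add(filename, true, false);   // required
        return query;
    }
}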


On Sun, 15 Aug 2004 at 17:13, Dmitrii PapaGeorgio wrote:
 Ok so when I index a file such as below
 
 Document doc = new Document();
 doc.Add(Field.Text(contents, new StreamReader(dataDir)));
 doc.Add(Field.Keyword(filename, dataDir));
 
 I can do a search as this
 +contents:SomeWord  +filename:SomePath
 
 Correct?
 



javadoc api

2004-08-17 Thread Ernesto De Santis
Hello Lucene developers,

A little issue about the Field documentation.

In the Field class, the getBoost() method says:

"Returns the boost factor for hits on any field of this document."

I think this comment was copied from the Document class and nobody remembered
to change it.

Bye,
Ernesto.





parse Query

2004-08-05 Thread Ernesto De Santis
Hello

What is the best practice to parce a Query object.?

QueryParcer only work with String, but if I have a Query?

I want that anothers applications build yours lucene Query´s, and I want
parse this when this applications do search with my server application. In
my server application I store the configuration, languages, analyzers,
IndexSearchers, how are indexed each field (Keyword or not), etc.

then, I need parce Query to Query with the appropriate analyzer over
appropriate terms (fields).

Thanks for your attention.
Ernesto.





Re: Weighting database fields

2004-07-21 Thread Ernesto De Santis
Hi Erik

 On Jul 21, 2004, at 11:40 AM, Anson Lau wrote:
  Is there any benefit to set the boost during indexing rather than set
  it
  during query?

 It allows setting each document differently.  For example,
 TheServerSide is using field-level boosts at index time to control
 ordering by date, such that newer articles come up first.  This could
 not be done at query time since each document gets a different field
 boost.

If a field has a boost value set at index time, and at search time
the query has another boost value for that field, what happens?
Which value is used for the boost?

Bye,
Ernesto.
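For illustration (paths, field names and values are placeholders), this is where each boost is set; as far as I understand, neither value replaces the other, both end up multiplied into the score (the index-time boost is folded into the field norm):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class BoostExample {
    public static void main(String[] args) throws Exception {
        // index-time boost: per document/field
        IndexWriter writer = new IndexWriter("/tmp/boost-index", new StandardAnalyzer(), true);
        Document doc = new Document();
        Field title = Field.Text("title", "lucene in action");
        title.setBoost(2.0f);                       // stored with the field norm
        doc.add(title);
        doc.add(Field.Text("body", "a book about lucene"));
        writer.addDocument(doc);
        writer.close();

        // query-time boost: per query clause
        Query q = QueryParser.parse("title:lucene^5 OR body:lucene", "body", new StandardAnalyzer());
        System.out.println(q.toString("body"));
    }
}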





Re: languages lucene can support

2004-07-01 Thread Ernesto De Santis



Hi Praveen,

You can develop your SpanishAnalyzer (or one for
another language) easily with the SnowballAnalyzer.

I am sending you my SpanishAnalyzer.

Bye, Ernesto.


- Original Message - 
From: "Praveen Peddi" [EMAIL PROTECTED]
To: "lucenelist" [EMAIL PROTECTED]
Sent: Thursday, July 01, 2004 6:13 PM
Subject: languages lucene can support

I have read many emails in the lucene mailing list regarding analyzers.
Following is the list of languages lucene supports out of the box. So they
will be supported with no change in our code but just a configuration change:
English
German
Russian
Following is the list of languages that are available as external downloads
on lucene's site:
Chinese
Japanese
Korean (all of the above come as a single download)
Brazilian
Czech
French
Dutch
I also read that lucene's StandardAnalyzer supports most of the european
languages. Does it mean it supports spanish also? or is there a separate
analyzer for that? I didn't see any spanish analyzer in the sandbox or the
lucene release.
Another question regarding FrenchAnalyzer. I downloaded FrenchAnalyzer and
some methods do not throw IOException where it is supposed to throw. For
example, the constructor. I am using 1.4 final (I know its released only
today :)). Whats the fix for it?

Praveen

**
Praveen Peddi
Sr Software Engg, Context Media, Inc.
email: [EMAIL PROTECTED]
Tel: 401.854.3475
Fax: 401.861.3596
web: http://www.contextmedia.com
**
Context Media - "The Leader in Enterprise Content Integration"

Re: syntax of queries.

2003-12-19 Thread Ernesto De Santis
Erik, Thanks!

The article is very good, thanks.

I have new questions:

 - apiQuery.add(new TermQuery(new Term("contents", "dot")), false, true);

new Term("contents", "dot")

Does the Term class work for only one word?
Is this right?
new Term("contents", "dot java")
to search for dot OR java in contents.

My problem is that the user enters a phrase, and I search for any word in the
phrase, not the entire phrase.
Do I need to parse the string, take it word by word and add a TermQuery for
each word?

Bye, Ernesto.
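A sketch of that word-by-word approach (the field name and the plain whitespace split are only for illustration; QueryParser with OR as the default operator would give you much the same thing):

import java.util.StringTokenizer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class AnyWordQuery {
    // Builds "word1 OR word2 OR ..." over the given field.
    public static Query build(String field, String phrase) {
        BooleanQuery query = new BooleanQuery();
        StringTokenizer words = new StringTokenizer(phrase);
        while (words.hasMoreTokens()) {
            // not required, not prohibited => plain OR-style clause
            query.add(new TermQuery(new Term(field, words.nextToken())), false, false);
        }
        return query;
    }
}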




- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Saturday, December 13, 2003 4:07 AM
Subject: Re: syntax of queries.


Try out the toString(fieldName) trick on your Query instances and
pair them up with what you have below - this will be quite insightful
for the issue - i promise!  :)

Look at my QueryParser article and search for toString on that page:
http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

On Friday, December 12, 2003, at 10:38  PM, Ernesto De Santis wrote:

 Thanks Otis, I don´t resolve my problem.

 I see the Query sintaxis page, and the FAQ´s search section.
 I proof too many alternatives:

 body:(imprimir teclado) title:base = 451 hits

 body:(imprimir teclado)^5.1 title:base = 248 hits (* under 451)

 body:(imprimir teclado^5.1) title:base = 451 hits - first document:
 3287.html

 body:(imprimir^5.1 teclado) title:base = 451 hits - first document:
 1545.html

 conclusion:
 I think that the boost is only applicable for one word. not to
 parenthesys,
 and not to field.

 I wanna make the boost applicable to field.
 For me, is more important a hit in title that in body, for example.

 In the FAQ´s search secction:

 Clause  ::=  [ Modifier ] [ FieldName ':' ] BasicClause  [ Boost ]
 BasicClause ::= ( Term | Phrase | | PrefixQuery '(' Query ')'

 then, in my example BasicClause=(imprimir teclado) and Boost ^5.1.
 but not work.

 Regards, Ernesto.

 - Original Message -
 From: Otis Gospodnetic [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]; Ernesto De
 Santis [EMAIL PROTECTED]
 Sent: Friday, December 12, 2003 7:18 PM
 Subject: Re: syntax of queries.


 Maybe it's the spaces after title:?
 Try title:importar ... instead.

 Maybe it's the spaces before ^5.0?
 Try title:importar^5 instead

 You shouldn't need the parentheses in this case either, I believe.

 See the Query Syntax page on Lucene's site.

 Otis


 --- Ernesto De Santis [EMAIL PROTECTED] wrote:
 Hello

 I not undertanding the syntax of queries.
 I search with this string:

 title: (importar) ^5.0 OR title: (arquivos)

 return 6 hits.

 and with this:

 title: (arquivos) OR title: (importar) ^5.0

 27 hits.

 why?
 in the first, I think that work like AND, but, why? :-(

 Regards, Ernesto.











Re: syntax of queries.

2003-12-12 Thread Ernesto De Santis
Thanks Otis, but I haven't resolved my problem yet.

I looked at the Query Syntax page and the FAQ's search section.
I tried many alternatives:

body:(imprimir teclado) title:base = 451 hits

body:(imprimir teclado)^5.1 title:base = 248 hits (* under 451)

body:(imprimir teclado^5.1) title:base = 451 hits - first document:
3287.html

body:(imprimir^5.1 teclado) title:base = 451 hits - first document:
1545.html

Conclusion:
I think the boost is only applicable to one word, not to a parenthesized
group, and not to a field.

I want to make the boost apply to a field.
For me, a hit in title is more important than one in body, for example.

In the FAQ's search section:

Clause  ::=  [ Modifier ] [ FieldName ':' ] BasicClause  [ Boost ]
BasicClause ::= ( Term | Phrase | | PrefixQuery '(' Query ')'

So in my example BasicClause = (imprimir teclado) and Boost = ^5.1,
but it does not work.

Regards, Ernesto.
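Following the toString(field) trick Erik suggests in his reply, a small sketch that prints how QueryParser actually distributes the boost for the queries above (the analyzer choice is just a placeholder):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class BoostDebug {
    public static void main(String[] args) throws Exception {
        String[] queries = {
            "body:(imprimir teclado)^5.1 title:base",
            "body:(imprimir^5.1 teclado) title:base"
        };
        for (int i = 0; i < queries.length; i++) {
            Query q = QueryParser.parse(queries[i], "body", new StandardAnalyzer());
            // toString(field) shows the parsed structure and where the boost ended up
            System.out.println(queries[i] + "  =>  " + q.toString("body"));
        }
    }
}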

- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]; Ernesto De
Santis [EMAIL PROTECTED]
Sent: Friday, December 12, 2003 7:18 PM
Subject: Re: syntax of queries.


 Maybe it's the spaces after title:?
 Try title:importar ... instead.

 Maybe it's the spaces before ^5.0?
 Try title:importar^5 instead

 You shouldn't need the parentheses in this case either, I believe.

 See the Query Syntax page on Lucene's site.

 Otis


 --- Ernesto De Santis [EMAIL PROTECTED] wrote:
  Hello
 
  I not undertanding the syntax of queries.
  I search with this string:
 
  title: (importar) ^5.0 OR title: (arquivos)
 
  return 6 hits.
 
  and with this:
 
  title: (arquivos) OR title: (importar) ^5.0
 
  27 hits.
 
  why?
  in the first, I think that work like AND, but, why? :-(
 
  Regards, Ernesto.
 





Re: Index pdf files with your content in lucene.

2003-11-12 Thread Ernesto De Santis
Hello,

Well, zipping the files did not work.

I can send the files to a personal email address if somebody wants them.

And if somebody can post them on a web site, very cool.
I cannot post them on a web site myself.

Ernesto.





Index pdf files with your content in lucene.

2003-11-11 Thread Ernesto De Santis
Classes to index PDF and Word files in Lucene.
Ernesto.

- Original Message -
From: Ernesto De Santis [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, October 29, 2003 12:04 PM
Subject: Re: [opencms-dev] Index pdf files with your content in lucene.


Hello all,

Thanks very much Stephan for your valuable help.
Attached you will find the PDFDocument and WordDocument class source code.

Ernesto.


- Original Message -
From: Hartmann, Waehrisch  Feykes GmbH [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 11:10 AM
Subject: Re: [opencms-dev] Index pdf files with your content in lucene.


 Hi Ernesto,

 the IndexManager retrieves a list of files of a folder by calling the
method
 getFilesInFolder of CmsObject. This method returns only empty files, i.e.
 with empty content. To get the content of a pdf file you have to reread
the
 file:
 f = cms.readFile(f.getAbsolutePath());

 Bye,
 Stephan

 Am Montag, 27. Oktober 2003 19:18 schrieben Sie:

   Hello
 
  Thanks for the previous reply.
 
  Now, i use
  - version 1.4 of lucene searche module. (the version attached in this
list)
  - new version of registry.xml format for module. (like you write me)
  - the pdf files are stored with the binary type.
 
  But i have the next problem:
  i can´t make a InputStream for the cmsfile content.
  For this i write this code in de Document method of my class
PDFDocument:
 
  -
 
  InputStream in = new ByteArrayInputStream(f.getContents()); //f is the
  parameter CmsFile of the Document method
 
  PDFExtractor extractor = new PDFExtractor(); //PDFExtractor is lib i
use.
  in file system work fine.
 
 
  bodyText = extractor.extractText(in);
 
  
 
  Is correct use ByteArrayInputStream for make a InputStream for a
CmsFile?
 
  The error ocurr in the third line.
  In the PDFParcer.
  the error menssage in tomcat is:
 
  java.io.IOException: Error: Header is corrupt ''
  at PDFParcer.parse
  at PDFExtractor.extractText
  at PDFDocument.Document (my class)
  at.
 
  By, and thanks.
  Ernesto.
 
 
  - Original Message -
From: Hartmann, Waehrisch  Feykes GmbH
To: [EMAIL PROTECTED]
Sent: Friday, October 24, 2003 4:45 AM
Subject: Re: [opencms-dev] Index pdf files with your content in
lucene.
 
 
Hello Ernesto,
 
i assume you are using the unpatched version 1.3 of the search module.
As i mentioned yesterday, the plainDocFactory does only index cmsFiles
of
  type plain but not of type binary. PDF files are stored as binary. I
  suggest to use the version i posted yesterday. Then your registry.xml
would
  have to look like this: ...
 <docFactories>
 ...
    <docFactory type="plain" enabled="true">
 ...
    </docFactory>
    <docFactory type="binary" enabled="true">
        <fileType name="pdftext">
            <extension>.pdf</extension>
            <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
        </fileType>
    </docFactory>
 ...
 </docFactories>
 
Important: The type attribute must match the file types of OpenCms
(also
  defined in the registry.xml).
 
Bye,
Stephan
 
  - Original Message -
  From: Ernesto De Santis
  To: Lucene Users List
  Cc: [EMAIL PROTECTED]
  Sent: Thursday, October 23, 2003 4:16 PM
  Subject: [opencms-dev] Index pdf files with your content in lucene.
 
 
  Hello
 
  I am new in opencms and lucene tecnology.
 
  I won index pdf files, and index de content of this files.
 
  I work in this way:
 
  Make a PDFDocument class like JspDocument class.
  use org.textmining.text.extraction.PDFExtractor class, this class
work
  fine out of vfs.
 
  and write my registry.xml for pdf document, in plainDocFactory tag.
 
  <fileType name="pdftext">
      <extension>.pdf</extension>
      <!-- This will strip tags before processing -->
      <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
  </fileType>
 
  my PDFDocument content this code:
  I think that the probrem is how take the content from CmsFile?, what
  InputStream use? PDFExtractor work with extractText(InputStream) method.
 
  public class PDFDocument implements I_DocumentConstants,
  I_DocumentFactory {
 
  public PDFDocument(){
 
  }
 
 
  public Document Document(CmsObject cmsobject, CmsFile cmsfile)
 
  throws CmsException
 
  {
 
  return Document(cmsobject, cmsfile, null);
 
  }
 
  public Document Document(CmsObject cmsobject, CmsFile cmsfile,
HashMap
  hashmap)
 
  throws CmsException
 
  {
 
  Document document=(new BodylessDocument()).Document(cmsobject,
  cmsfile);
 
 
  //put de content in the pdf file.
 
  String contenido = new String(cmsfile.getContents());
 
  StringBufferInputStream in = new StringBufferInputStream(contenido);
 
  // ByteArrayInputStream in = new
  ByteArrayInputStream(contenido.getBytes

Re: Index pdf files with your content in lucene.

2003-11-11 Thread Ernesto De Santis
I will try again, zipping the files.

Afterwards I will post the files on the web site.

 Could you also tell us a bit about this code?  Is it better than
 existing PDF/Word parsing solutions?  Pure Java?  Uses POI?

This code uses existing parsing solutions.
The intent is to build a Lucene Document to index PDF and Word files, with
their content.
It is pure Java.
It uses the text-extraction library
tm-extractors-0.2.jar,
which uses POI and PDFBox.

Ernesto
Sorry for my bad English.


 Thanks,
 Otis


 --- Ernesto De Santis [EMAIL PROTECTED] wrote:
  Classes for index Pdf and word files in lucene.
  Ernesto.
 
  - Original Message -
  From: Ernesto De Santis [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  Sent: Wednesday, October 29, 2003 12:04 PM
  Subject: Re: [opencms-dev] Index pdf files with your content in
  lucene.
 
 
  Hello all,
 
  Thans very much Stephan for your valuable help.
  Attached you will find the PDFDocument, and WordDocument class source
  code
 
  Ernesto.
 
 
  - Original Message -
  From: Hartmann, Waehrisch  Feykes GmbH
  [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  Sent: Tuesday, October 28, 2003 11:10 AM
  Subject: Re: [opencms-dev] Index pdf files with your content in
  lucene.
 
 
   Hi Ernesto,
  
   the IndexManager retrieves a list of files of a folder by calling
  the
  method
   getFilesInFolder of CmsObject. This method returns only empty
  files, i.e.
   with empty content. To get the content of a pdf file you have to
  reread
  the
   file:
   f = cms.readFile(f.getAbsolutePath());
  
   Bye,
   Stephan
  
   Am Montag, 27. Oktober 2003 19:18 schrieben Sie:
  
 Hello
   
Thanks for the previous reply.
   
Now, i use
- version 1.4 of lucene searche module. (the version attached in
  this
  list)
- new version of registry.xml format for module. (like you write
  me)
- the pdf files are stored with the binary type.
   
But i have the next problem:
i can´t make a InputStream for the cmsfile content.
For this i write this code in de Document method of my class
  PDFDocument:
   
-
   
InputStream in = new ByteArrayInputStream(f.getContents()); //f
  is the
parameter CmsFile of the Document method
   
PDFExtractor extractor = new PDFExtractor(); //PDFExtractor is
  lib i
  use.
in file system work fine.
   
   
bodyText = extractor.extractText(in);
   

   
Is correct use ByteArrayInputStream for make a InputStream for a
  CmsFile?
   
The error ocurr in the third line.
In the PDFParcer.
the error menssage in tomcat is:
   
java.io.IOException: Error: Header is corrupt ''
at PDFParcer.parse
at PDFExtractor.extractText
at PDFDocument.Document (my class)
at.
   
By, and thanks.
Ernesto.
   
   
- Original Message -
  From: Hartmann, Waehrisch  Feykes GmbH
  To: [EMAIL PROTECTED]
  Sent: Friday, October 24, 2003 4:45 AM
  Subject: Re: [opencms-dev] Index pdf files with your content in
  lucene.
   
   
  Hello Ernesto,
   
  i assume you are using the unpatched version 1.3 of the search
  module.
  As i mentioned yesterday, the plainDocFactory does only index
  cmsFiles
  of
type plain but not of type binary. PDF files are stored as
  binary. I
suggest to use the version i posted yesterday. Then your
  registry.xml
  would
have to look like this: ...
  docFactories
  ...
 docFactory type=plain enabled=true
  ...
 /docFactory
 docFactory type=binary enabled=true
fileType name=pdftext
   extension.pdf/extension
   
  classnet.grcomputing.opencms.search.lucene.PDFDocument/class
/fileType
 /docFactory
  ...
  /docFactories
   
  Important: The type attribute must match the file types of
  OpenCms
  (also
defined in the registry.xml).
   
  Bye,
  Stephan
   
- Original Message -
From: Ernesto De Santis
To: Lucene Users List
Cc: [EMAIL PROTECTED]
Sent: Thursday, October 23, 2003 4:16 PM
Subject: [opencms-dev] Index pdf files with your content in
  lucene.
   
   
Hello
   
I am new in opencms and lucene tecnology.
   
I won index pdf files, and index de content of this files.
   
I work in this way:
   
Make a PDFDocument class like JspDocument class.
use org.textmining.text.extraction.PDFExtractor class, this
  class
  work
fine out of vfs.
   
and write my registry.xml for pdf document, in
  plainDocFactory tag.
   
fileType name=pdftext
extension.pdf/extension
!-- This will strip tags before
  processing --
   
classnet.grcomputing.opencms.search.lucene.PDFDocument/class
/fileType
   
my PDFDocument content this code:
I think that the probrem is how take the content from
  CmsFile?, what

Index pdf files with your content in lucene.

2003-10-23 Thread Ernesto De Santis
Hello,

I am new to OpenCms and Lucene technology.

I want to index PDF files, and index the content of these files.

I work this way:

I make a PDFDocument class like the JspDocument class, and use the
org.textmining.text.extraction.PDFExtractor class; this class works fine
outside the VFS.

And I write my registry.xml for the pdf document, in the plainDocFactory tag:

<fileType name="pdftext">
    <extension>.pdf</extension>
    <!-- This will strip tags before processing -->
    <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
</fileType>

My PDFDocument contains the code below.
I think the problem is how to take the content from the CmsFile: what InputStream should I use?
PDFExtractor works with an extractText(InputStream) method.
public class PDFDocument implements I_DocumentConstants, I_DocumentFactory {

    public PDFDocument() {
    }

    public Document Document(CmsObject cmsobject, CmsFile cmsfile)
        throws CmsException
    {
        return Document(cmsobject, cmsfile, null);
    }

    public Document Document(CmsObject cmsobject, CmsFile cmsfile, HashMap hashmap)
        throws CmsException
    {
        Document document = (new BodylessDocument()).Document(cmsobject, cmsfile);

        // put the content of the pdf file into the document
        String contenido = new String(cmsfile.getContents());

        StringBufferInputStream in = new StringBufferInputStream(contenido);

        // ByteArrayInputStream in = new ByteArrayInputStream(contenido.getBytes());

        /* try {
            FileInputStream in = new FileInputStream(cmsfile.getPath() + cmsfile.getName());
        */

        PDFExtractor extractor = new PDFExtractor();

        String body = extractor.extractText(in);

        document.add(Field.Text("body", body));

        /* } catch (FileNotFoundException e) {
            e.toString();
            throw new CmsException();
        }
        */

        return (document);
    }
}


thanks
Ernesto
PS: Sorry for my poor English.
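Looking back at this with Stephan's advice quoted elsewhere in this archive (reread the file so its content is populated, then wrap the bytes in a ByteArrayInputStream instead of a StringBufferInputStream), the content part could look roughly like the sketch below. The OpenCms calls (readFile, getAbsolutePath, getContents) are taken from the quoted messages; the helper class, method name and omitted OpenCms imports are purely illustrative and unverified:

import java.io.ByteArrayInputStream;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch only: CmsObject, CmsFile and PDFExtractor are the OpenCms / text-mining
// classes referred to in this thread; their imports and exact signatures are assumed.
public class PdfContentHelper {

    public static void addPdfBody(Document document, CmsObject cmsobject, CmsFile cmsfile)
            throws Exception {
        // reread the file so getContents() returns the real bytes (per Stephan's advice)
        CmsFile full = cmsobject.readFile(cmsfile.getAbsolutePath());

        // work on bytes, not characters: converting binary PDF data through a String can corrupt it
        ByteArrayInputStream in = new ByteArrayInputStream(full.getContents());

        PDFExtractor extractor = new PDFExtractor();
        String body = extractor.extractText(in);

        document.add(Field.Text("body", body));
    }
}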




- Original Message - 
From: Hartmann, Waehrisch  Feykes GmbH [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, October 22, 2003 3:50 AM
Subject: Re: [opencms-dev] (no subject)


 Hi Ben,
 
 i think this won't work since the plainDocFactory will only be used for
 files of type plain but not for files of type binary.
 Recently we have done some additions to the module - by order of Lenord,
 Bauer  Co. GmbH - that could meet your needs. It introduces a more flexible
 way of defining docFactories that you can add new factories without having
 to recompile the whole module. So other modules (like the news) can bring
 their own docFactory and all you have to do is to edit the registry.xml.
 Here is an example:
 
  <docFactories>
      <docFactory enabled="true" type="plain">
          <fileType name="plaintext">
              <extension>.txt</extension>
              <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
          </fileType>
      </docFactory>
      <docFactory enabled="true" type="news">
          <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
      </docFactory>
  </docFactories>
 
 To index binary files all you need to add is this:
 
     <docFactory enabled="true" type="binary">
         <class>net.grcomputing.opencms.search.lucene.BodylessDocument</class>
     </docFactory>
 
 There should be no need for an extension mapping.
 
 For the interested people:
 For ContentDefinitions (like news) i introduced the following:
  <contentDefinitions>
      <contentDefinition type="news"> <!-- must match docFactory type -->
          <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
          <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
          <listMethod name="getNewsList">
              <param type="java.lang.Integer">1</param>
              <param type="java.lang.String">-1</param>
          </listMethod>
          <page uri="/news.html?__element=entry">
              <param method="getIntId" name="newsid"/>
          </page>
      </contentDefinition>
 
 In short:
 initClass is optional: For the news the news classes have to be loaded to
 initialize the db pool.
 listMethod: a method of the content definition class that returns a List of
 elements
 page: the page that can display an entry. Here a jsp that has a template
 element entry. It also needs the id of the news item.
 getIntId is a method of the content definition class and newsid is the url
 parameter the page needs. A link like
 news.html?__element=entry&newsid=xy
 will be generated.
 
 Best regards,
 Stephan
 
 
 - Original Message - 
 From: Ben Rometsch [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Wednesday, October 22, 2003 6:15 AM
 Subject: [opencms-dev] (no subject)
 
 
  Hi Matt,
 
  I am not having any joy! I've updated my registry.xml file, with the
  appropriate section reading:
 
  <luceneSearch>
  <mergeFactor>10</mergeFactor>
  <permCheck>true</permCheck>
  <indexDir>c:\search</indexDir>