AW: Retrieving Results - getting blank entries?

2003-06-16 Thread Borkenhagen, Michael (ofd-ko zdfin)
Maybe you get something like " " (a string that contains only whitespace). Try to trim() the strings before comparing them.
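
The trim() suggestion can be sketched as a small helper. This is only an illustration: the class and method names (BlankCheck, isBlank, shouldDisplay) are hypothetical and not part of Lucene, and the caller is assumed to pass in field.name() and field.stringValue().

```java
/** Hypothetical helper: treats null, empty and whitespace-only strings as blank. */
final class BlankCheck {
    private BlankCheck() {}

    /** True if s is null, empty, or consists only of characters that trim() removes. */
    static boolean isBlank(String s) {
        return s == null || s.trim().length() == 0;
    }

    /** A result is displayed only if both the field name and its value carry text. */
    static boolean shouldDisplay(String fieldName, String fieldValue) {
        return !isBlank(fieldName) && !isBlank(fieldValue);
    }
}
```

Note that trim() only strips characters up to U+0020, so a value made of non-breaking spaces (U+00A0) would still count as non-blank here and would need separate handling.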

-Original Message-
From: Rishabh Bajpai [mailto:[EMAIL PROTECTED]
Sent: Monday, 16 June 2003 08:25
To: Lucene Users List
Subject: Retrieving Results - getting blank entries?



Hi All,

I am retrieving results in the usual way:

construct a query, get the Hits object and iterate through it...
doc = hits.doc(i);

If any field name or value is null or blank, the result should not be
displayed:
if (
 (field.name() == null) ||
 (field.stringValue() == null) ||
 (field.name().equals("")) ||
 (field.stringValue().equals(""))
)
{
  addtoResultSet = false;
}

But in some rare cases I still get blank records displayed.
Is this a problem that happened during indexing, a bug in Lucene, or am I
just missing something?
Please help...
-Rishabh








-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





Bug in QueryParser ?

2003-06-13 Thread Borkenhagen, Michael (ofd-ko zdfin)
I got the following exception during my tests with a query like
word1 || word2 || word3
if one of the words, e.g. word2, is in the stop-word list of my Analyzer:

java.lang.ArrayIndexOutOfBoundsException: -1 < 0
        at java.util.Vector.elementAt(Vector.java:427)
        at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:171)
        at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:463)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:113)

I'm using Lucene 1.3 rc1.
Is this a bug?

Michael





AW: About Query...

2003-03-17 Thread Borkenhagen, Michael (ofd-ko zdfin)
Your syntax seems to be wrong; try
Author:Williams AND Title:Sword -Title:House
or
Author:Williams AND Title:Sword NOT Title:House
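
The point of both variants is that the excluded term carries its own field prefix; a bare -House is searched in the default field passed to parse(), not in Title. As a rough sketch, a query string of this shape could be assembled as follows. The FieldedQuery class is hypothetical and only builds the string; it does not escape special characters and does not call Lucene itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for query strings in which every term names its field. */
final class FieldedQuery {
    private final List<String> required = new ArrayList<>();
    private final List<String> prohibited = new ArrayList<>();

    /** Adds a clause that must match, e.g. Author:Williams. */
    FieldedQuery must(String field, String term) {
        required.add(field + ":" + term);
        return this;
    }

    /** Adds a clause that must not match, e.g. Title:House. */
    FieldedQuery mustNot(String field, String term) {
        prohibited.add(field + ":" + term);
        return this;
    }

    /** Joins required clauses with AND and appends each prohibited clause with NOT. */
    String build() {
        StringBuilder sb = new StringBuilder(String.join(" AND ", required));
        for (String clause : prohibited) {
            sb.append(" NOT ").append(clause);
        }
        return sb.toString().trim();
    }
}
```

The resulting string would then be handed to the query parser together with a default field and an analyzer, as in the original question.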

Michael

-Original Message-
From: Pierre Lacchini [mailto:[EMAIL PROTECTED]
Sent: Monday, 17 March 2003 10:47
To: Lucene (E-mail)
Subject: About Query...


Well guys, here's my (silly) question:

I have 2 fields in my index, for example Title and Author...

I want to perform a complex query like: search for "Williams" in the field
Author AND "Sword" in the field Title, WITHOUT "House" in the field
Title.
I tried this syntax: Author:Williams AND Title:Sword -House, but it
doesn't seem to work...
Is it possible? Or maybe I'm wrong about the syntax?

Thx for help ;)

Pierre Lacchini
Consultant développement

PeopleWare
12, rue du Cimetière
L-8413 Steinfort
Phone : + 352 399 968 35
http://www.peopleware.lu







AW: About Query...

2003-03-17 Thread Borkenhagen, Michael (ofd-ko zdfin)
Mmmh... good question. I really don't know :(

-Original Message-
From: Pierre Lacchini [mailto:[EMAIL PROTECTED]
Sent: Monday, 17 March 2003 12:32
To: 'Lucene Users List'
Subject: RE: About Query...


Sorry for my poor English...

If I perform a query over multiple fields, why do I have to specify the
name of a field in the parse method?
After all, I'm using 2 fields in the query...



-Original Message-
From: Borkenhagen, Michael (ofd-ko zdfin)
[mailto:[EMAIL PROTECTED]
Sent: Monday, 17 March 2003 12:07
To: 'Lucene Users List'
Subject: AW: About Query...


Yes, for sure.
Maybe I don't understand your question?

-Original Message-
From: Pierre Lacchini [mailto:[EMAIL PROTECTED]
Sent: Monday, 17 March 2003 12:26
To: 'Lucene Users List'
Subject: RE: About Query...


Yeah, thanks Michael, now it works fine :)

But in this case, does the second argument of the method parse(String query,
String field, Analyzer analyzer) of the QueryParser matter?




AW: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Borkenhagen, Michael (ofd-ko zdfin)
Ryan,

I tried to use textmining to extract text from Word 97 documents. Some German
characters like ä, ü etc. aren't parsed correctly, so I can't use it,
because many German words include these characters. I don't know whether the
cause is textmining or HDF from POI (HSSF from POI parses these characters
correctly). Do you have any hints for me?

Michael

-Original Message-
From: Ryan Ackley [mailto:[EMAIL PROTECTED]
Sent: Thursday, 6 March 2003 13:13
To: Lucene Users List
Subject: Re: my experiences - Re: Parsing Word Docs


David,

The textmining.org stuff only works on Word97 and above. It should work with
no exceptions on any Word 97 doc. If you have any problems then it is from
an earlier version (most likely Word 6.0) or it's not a Word document. If
this isn't the case you need to email me so I can fix it and make it better
for the benefit of everyone. I plan on adding support for Word 6 in the
future.

Ryan Ackley

- Original Message -
From: David Spencer [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, March 05, 2003 6:24 PM
Subject: my experiences - Re: Parsing Word Docs


 FYI I tried the textmining.org/poi combo on a collection of 350 Word
 docs people have developed here over the years, and it failed on 33% of
 them with exceptions being thrown about the formats being invalid.

 I tried antiword ( http://www.winfield.demon.nl/ ), a native & free
 *.exe, and it worked great (well, it seemed to process all the files fine).

 I've had similar experiences with PDF - I tried the 3 or so
 freeware/Java PDF text extractors and they were not as good as the exe,
 pdftotext, from foolabs (http://www.foolabs.com/xpdf/).

 Not satisfying to a Java developer, but these work better than anything
 else I can find.

 You get source and I use them on Windows & Linux, no prob.



 Eric Anderson wrote:

 I'm interested in using the textmining/textextraction utilities using
 Apache POI that Ryan was discussing. However, I'm having some difficulty
 determining what the insertion point would be to replace the default
 parser with the Word parser.

 Any assistance would be appreciated.
 
 
 
 
 
 LanRx Network Solutions, Inc.
 Providing Enterprise Level Solutions...On A Small Business Budget
 



AW: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Borkenhagen, Michael (ofd-ko zdfin)
thx a lot :) I'll try it

-Original Message-
From: Mario Ivankovits [mailto:[EMAIL PROTECTED]
Sent: Thursday, 6 March 2003 14:00
To: Lucene Users List
Subject: Re: my experiences - Re: Parsing Word Docs


The problems with German umlauts should be fixed.
I sent them a patch (see
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14735), and it should be
applied by now.
I haven't cross-checked it yet.

I currently use POI to index documents with Lucene, but I do not use the
standard approach of a Lucene Word-document class (like the PDF document class).
I did have some problems getting the text from old documents,
but in that case my system falls back to a simple STRINGS parser (it filters
every human-readable character from the document file).

byebye
Mario
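
A fallback of the kind Mario describes can be sketched roughly as follows. This is not his implementation: the class name PrintableStrings and the minLength parameter are assumptions, and the sketch keeps only printable ASCII bytes, so umlauts and text stored as 16-bit characters would be lost.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fallback extractor, similar in spirit to the Unix "strings" tool. */
final class PrintableStrings {
    private PrintableStrings() {}

    /** Returns every run of printable ASCII bytes that is at least minLength long. */
    static List<String> extract(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF;                 // treat the byte as unsigned
            if (c >= 0x20 && c <= 0x7E) {     // printable ASCII range
                current.append((char) c);
            } else {
                flush(current, minLength, runs);
            }
        }
        flush(current, minLength, runs);      // keep a run that ends at end of file
        return runs;
    }

    /** Stores the current run if it is long enough, then resets the buffer. */
    private static void flush(StringBuilder current, int minLength, List<String> runs) {
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        current.setLength(0);
    }
}
```

The extracted runs could be joined with spaces and indexed as the document body when the regular parser throws an exception.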




AW: [ANN] PDFBox 0.6.0

2003-03-06 Thread Borkenhagen, Michael (ofd-ko zdfin)
Ben,

Using PDFBox-0.5.6, and alternatively PDFBox-0.6.0, I get the following
stack trace:
java.lang.ClassCastException: org.pdfbox.cos.COSObject
        at org.pdfbox.encoding.DictionaryEncoding.init(DictionaryEncoding.java:66)
        at org.pdfbox.cos.COSObject.getEncoding(COSObject.java:269)
        at org.pdfbox.cos.COSObject.encode(COSObject.java:210)
        at org.pdfbox.util.PDFTextStripper.showString(PDFTextStripper.java:959)
        at org.pdfbox.util.PDFTextStripper.handleOperation(PDFTextStripper.java:788)
        at org.pdfbox.util.PDFTextStripper.process(PDFTextStripper.java:379)
        at org.pdfbox.util.PDFTextStripper.process(PDFTextStripper.java:366)
        at org.pdfbox.util.PDFTextStripper.processPageContents(PDFTextStripper.java:288)
        at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:231)
        at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:223)
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:148)
        ...
(stack trace from PDFBox 0.6.0)

I also get the error Eric reported, but only once. My indexer
continues parsing the other PDF documents after an error occurs.
Do you have any idea what causes the ClassCastException?

Michael
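
An indexer that keeps going after a single document fails, as described above, might look roughly like this. The names TolerantIndexer and DocumentHandler are hypothetical, and the handler stands in for whatever parses one PDF and adds it to the index; this is a sketch, not the code used in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical indexing loop that records failures instead of aborting. */
final class TolerantIndexer {
    private TolerantIndexer() {}

    /** Stand-in for the code that parses one file and adds it to the index. */
    interface DocumentHandler {
        void handle(String path) throws Exception;
    }

    /** Processes every path and returns the ones whose handler threw. */
    static List<String> indexAll(List<String> paths, DocumentHandler handler) {
        List<String> failed = new ArrayList<>();
        for (String path : paths) {
            try {
                handler.handle(path);
            } catch (Exception e) {
                // Also catches runtime exceptions such as ClassCastException.
                System.err.println("Skipping " + path + ": " + e);
                failed.add(path);
            }
        }
        return failed;
    }
}
```

Returning the failed paths makes it easy to attach the offending PDF files to a bug report afterwards.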
-Original Message-
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Thursday, 6 March 2003 14:45
To: Lucene Users List
Subject: Re: [ANN] PDFBox 0.6.0


In this release I have changed how I parsed the document, which may have
introduced this bug.  I have received another report of this and will have
it fixed for the next point release.

You said you tried with reasonably sized PDF repository.  Did you stop
indexing at this error or did you continue?  If you continued, is this the
only error that you got?

-Ben




-- 

On Thu, 6 Mar 2003, Eric Anderson wrote:

 Ben-
 In attempting to use the PDFBox-0.6.0, I rec'd the following error when
 attempting to scan a reasonably sized PDF repository.

 Any thoughts?


  caught a class java.io.EOFException
  with message: Unexpected end of ZLIB input stream


 Eric Anderson
 LanRx Network Solutions


 Quoting Ben Litchfield [EMAIL PROTECTED]:

  I would like to announce the next release of PDFBox.  PDFBox allows for
  PDF documents to be indexed using lucene through a simple interface.
  Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
  which will extract all text and PDF document summary properties as
  lucene fields.
 
  You can obtain the latest release from http://www.pdfbox.org
 
  Please send all bug reports to me and attach the PDF document when
  possible.
 
  RELEASE 0.6.0
  -Massive improvements to memory footprint.
  -Must call close() on the COSDocument (LucenePDFDocument does this for you)
  -Really fixed the bug where small documents were not being indexed.
  -Fixed bug where no whitespace existed between obj and start of object.
  Exception in thread main java.io.IOException: expected='obj'
  actual='obj/Pro
  -Fixed issue with spacing where textLineMatrix was not being copied
   properly
  -Fixed 'bug' where parsing would fail with some pdfs with double endobj
   definitions
  -Added PDF document summary fields to the lucene document
 
 
  Thank you,
  Ben Litchfield
  http://www.pdfbox.org
 
 
 
 

 LanRx Network Solutions, Inc.
 Providing Enterprise Level Solutions...On A Small Business Budget




AW: Best HTML Parser !!

2003-02-24 Thread Borkenhagen, Michael (ofd-ko zdfin)
I prefer JTidy http://lempinen.net/sami/jtidy/.

Michael
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Monday, 24 February 2003 15:03
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: Best HTML Parser !!


It's not possible to generalize like that.
I like NekoHTML.

Otis

--- Pierre Lacchini [EMAIL PROTECTED] wrote:
 Hello,
  
 I'm trying to index HTML files with Lucene.
 Do you know what's the best HTML parser in Java?
 The most powerful?
 I need to extract meta tags and many other different text fields...

 Thx for your help ;)
 





AW: IndexWriter addDocument NullPointerException

2003-02-24 Thread Borkenhagen, Michael (ofd-ko zdfin)
Yes, it is possible. Instead of just catching an exception you can do
anything else with it, e.g.
try {
...}
catch (MyException e) {
 System.err.println(e.getClass().getName());
}
But this is off-topic here; it's a general question about Java.

Michael
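
When an exception arrives with little or no stack trace, it can help to print its class, its message and the number of recorded frames explicitly. The following is a minimal sketch under that assumption; the class name ExceptionReport is hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical helper that summarises an exception, even one with an empty trace. */
final class ExceptionReport {
    private ExceptionReport() {}

    /** Returns class name, message, frame count and the full trace as one string. */
    static String describe(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        out.println("class:   " + t.getClass().getName());
        out.println("message: " + t.getMessage());
        out.println("frames:  " + t.getStackTrace().length);
        t.printStackTrace(out);   // prints whatever frames are available
        out.flush();
        return buffer.toString();
    }
}
```

In a catch block this would be used as System.err.println(ExceptionReport.describe(ex)); a frame count of 0 confirms that the trace really is empty rather than lost in a log file.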

-Original Message-
From: Günter Kukies [mailto:[EMAIL PROTECTED]
Sent: Monday, 24 February 2003 17:52
To: Lucene Users List
Subject: Re: IndexWriter addDocument NullPointerException


I removed the -server switch from the java command-line options and
everything works fine now.
I changed nothing in my code.

So is it possible in principle for an exception to be thrown without a stack trace?

Any comments about this phenomenon?

Günter


- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, February 24, 2003 4:31 PM
Subject: Re: IndexWriter addDocument NullPointerException


 If I were you I would make things simpler for myself by converting the
 code to something that I could run from the command line instead of
 having to go through Tomcat.

 You really need to capture your exception stack trace with line numbers,
 and then we can try helping.

 Otis


 --- Günter_Kukies [EMAIL PROTECTED] wrote:
  log("doc: " + doc); is handled by Tomcat and directed into special
  log files, so you can't see its output.

System.err.println("hallo1 " + doc);
ex.printStackTrace();
System.err.println("hallo2");
  this is what prints the relevant output.

  doc is never null, writer is never null, and I can't add null fields
  to a document.
 
 
  Günter
 
  - Original Message -
  From: Otis Gospodnetic [EMAIL PROTECTED]
  To: Lucene Users List [EMAIL PROTECTED]
  Sent: Monday, February 24, 2003 3:07 PM
  Subject: Re: IndexWriter addDocument NullPointerException
 
 
   My guess is that your 2 getDocument calls are the source, that is,
   that those PDF and TXT classes don't return a proper Document.
   I also don't see the output created by log("doc: " + doc);

   Otis


   if (path.matches("\\d+_\\d{4}_[a-z]{2,3}\\.pdf")) {
       doc = PDF_Document_Parser.getDocument(this, RealPath, file);
   }
   else if (path.matches("\\d+_\\d{4}_[a-z]{2,3}\\.txt")) {
       doc = TXT_Document_Parser.getDocument(this, RealPath, file);
   }
  
  
   --- Günter_Kukies [EMAIL PROTECTED] wrote:
So, the weekend is over.

Here is some code:

private void addDocument(IndexWriter writer, File file)
        throws IOException, InterruptedException {
    String path = file.getName();
    log(" -start Indexing: " + path);
    Document doc = null;
    if (path.matches("\\d+_\\d{4}_[a-z]{2,3}\\.pdf")) {
        doc = PDF_Document_Parser.getDocument(this, RealPath, file);
    }
    else if (path.matches("\\d+_\\d{4}_[a-z]{2,3}\\.txt")) {
        doc = TXT_Document_Parser.getDocument(this, RealPath, file);
    }
    else {
        log("do nothing");
    }

    log("doc: " + doc);
    if (doc != null) {
        try {
            writer.addDocument(doc);
        }
        catch (Exception ex) {
            System.err.println("hallo1 " + doc);
            ex.printStackTrace();
            System.err.println("hallo2");
            log("ERROR writer.addDocument(doc);");
        }
    }
    else {
        log(" Skipping " + path);
    }
    log(" -end Indexing: " + path);
}
   
   
   
Here is the output:

hallo1 DocumentTextcontents:[EMAIL PROTECTED]
Unindexedemail:[EMAIL PROTECTED] Unindexedname:Hans Dampf
Textsummary:Equipo de deteccion 2002 Texttitle:Equipo de deteccion 2002
Textdoctypeid:0001 Unindexedlifetime:0 [EMAIL PROTECTED]
Keywordmodified:0dcek766w Keywordusername:hda
Unindexedrelative_path_xml:documents/news_new/sub1/sub11/sub111/1045735974680_0001_hda.xml
Unindexedcategory:documents/news_new/sub1/sub11/sub111/
Keywordsearch_all:all [EMAIL PROTECTED]
Unindexedrelative_path:documents/news_new/sub1/sub11/sub111/1045735974680_0001_hda.pdf
java.lang.NullPointerException
hallo2
hallo1 DocumentTextcontents:[EMAIL PROTECTED]
Unindexedemail:[EMAIL PROTECTED] Unindexedname:Hans Dampf
Textsummary:testsummary Texttitle:testtitle Textdoctypeid:0001
Unindexedlifetime:0 [EMAIL PROTECTED]
Keywordmodified:0dcek76bm Keywordusername:hda
Unindexedrelative_path_xml:documents/news_new/sub1/sub11/sub111/1045735974850_0001_hda.xml
Unindexedcategory:documents/news_new/sub1/sub11/sub111/
Keywordsearch_all:all [EMAIL PROTECTED]
Unindexedrelative_path:documents/news_new/sub1/sub11/sub111/1045735974850_0001_hda.pdf
java.lang.NullPointerException
hallo2
   

AW: Using term-highlighter

2003-02-20 Thread Borkenhagen, Michael (ofd-ko zdfin)
You have to write a class that implements the TermHighlighter interface,
for example like this:

public class MyHighlighter implements TermHighlighter {

  public String highlightTerm(String term) {
    return "<font class='highlight'>" + term + "</font>";
  }

}

Use this class after searching:

Document doc = this.ivHits.doc(i);
String doctitle = doc.get(Konstanten.F_TITLE);
doctitle = LuceneTools.highlightTerms(doctitle, new MyHighlighter(),
  this.ivQuery, analyzer);

Regards,
Michael
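
If the highlighted text ends up in an HTML page, it may be worth escaping the term before wrapping it in markup. The sketch below shows one way to do that. The TermHighlighter interface declared here is only a local stand-in with the same single method as in the example above, so that the code compiles on its own; EscapingHighlighter and escapeHtml are hypothetical names.

```java
/** Local stand-in for the contributed TermHighlighter interface (assumption). */
interface TermHighlighter {
    String highlightTerm(String term);
}

/** Hypothetical highlighter that escapes HTML special characters in the term. */
final class EscapingHighlighter implements TermHighlighter {

    @Override
    public String highlightTerm(String term) {
        return "<font class='highlight'>" + escapeHtml(term) + "</font>";
    }

    /** Replaces &, < and > with their HTML entities. */
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Only the matched terms are escaped this way; the surrounding text would need the same treatment before it is written into a page.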

-Original Message-
From: Harpreet S Walia [mailto:[EMAIL PROTECTED]]
Sent: Friday, 21 February 2003 07:37
To: Lucene Users List
Subject: Using term-highlighter


Hi,

I am trying to use the term highlighter posted on the contributions page for
Lucene. I downloaded the files and made the changes mentioned in the
white paper to the classes in the Lucene search package.

Can anybody please tell me how to invoke the highlighter while searching?
Currently I perform the searches as follows:

org.apache.lucene.search.Searcher searcher = new IndexSearcher(indexPath);
Query query = QueryParser.parse(srchqry, field, new SimpleAnalyzer());
Hits hits = searcher.search(query);

What changes are needed in these search steps?

Thanks in advance !

Regards,
Harpreet








AW: Compile lucene

2003-01-12 Thread Borkenhagen, Michael (ofd-ko zdfin)
Here is the exact link:
http://www.mail-archive.com/lucene-user@jakarta.apache.org/
:))

-Original Message-
From: Oshima, Scott [mailto:[EMAIL PROTECTED]]
Sent: Friday, 10 January 2003 20:00
To: Lucene Users List
Subject: RE: Compile lucene


Can anyone send me a link to the Lucene mailing list email archives? These
emails build up fast and I can't store them locally, but they are too valuable to
delete. Thanks.

-Original Message-
From: Romo García, Javier [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 12, 2002 1:19 AM
To: Lucene Users List
Subject: Compile lucene


Hi everyone!

Is there a good guide anywhere to compile the source code of lucene?
I don't know very well how to start, specially with javacc.

Thanks






AW: PDFBox 0.5.6

2002-12-04 Thread Borkenhagen, Michael (ofd-ko zdfin)
Thank you very, very much!
This version is really great - it fixes most of the problems I had with
earlier versions!

-Original Message-
From: Ben Litchfield [mailto:[EMAIL PROTECTED]]
Sent: Friday, 29 November 2002 04:42
To: [EMAIL PROTECTED]
Subject: PDFBox 0.5.6



PDFBox version 0.5.6 is now available at http://www.pdfbox.org

PDFBox makes it easy to add PDF Documents to a lucene index.

Fixes over the last version

-Fixed bug in LucenePDFDocument where stream was not being closed and
small documents were not being indexed.
-Fixed a spacing issue for some PDF documents.
-Fixed error while parsing the version number
-Fixed NullPointer in persistence example.
-Create example lucene IndexFiles class which models the demo from lucene.
-Fixed bug where garbage at the end of file caused an infinite loop
-Fixed bug in parsing boolean values with stuff at the end like true


Ben Litchfield







AW: PDF parser

2002-11-22 Thread Borkenhagen, Michael (ofd-ko zdfin)
There are different parsers available; each has its own advantages
and disadvantages.
I use a combination of PDFBox http://www.pdfbox.org/ and Etymon PJ
http://www.etymon.com/pjc/, because their APIs are very simple. Both of them
parse PDF into a format of their own and provide interfaces to get the PDF
document's contents.

Other developers on this list prefer JPedal http://www.jpedal.org/, which
parses PDF into XML and provides an XML tree with the PDF document's contents, but the
documentation isn't very detailed.

Micha

-Original Message-
From: Thomas Chacko [mailto:[EMAIL PROTECTED]]
Sent: Friday, 22 November 2002 15:26
To: Lucene Users List
Subject: PDF parser


What's the best parser available to extract text from PDF documents?
Expecting a reply ASAP

Thanks in advance
Thomas Chacko

