INDEXREADER + MAXDOC

2005-01-04 Thread Karthik N S


Hi Guys,

Apologies...

Using the integer returned by the IndexReader.maxDoc() API,
is it possible to get the VALUES from the various field types?

ex:-   'docs.get(contents)  at  IndexReader.maxDoc()'

If so, how?




WITH WARM REGARDS 
HAVE A NICE DAY 
[ N.S.KARTHIK] 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: INDEXREADER + MAXDOC

2005-01-04 Thread Erik Hatcher
On Jan 4, 2005, at 5:19 AM, Karthik N S wrote:
> On using the integer number of  Indexreader.maxDoc() API ,
> Is it possible to get the VALUES from the various  fieldtypes.
> ex:-   'docs.get(contents)  at  IndexReader.maxdoc()'
>
> If so How...??

Just to be sure I understand... you want the last document in the
index?  IndexReader.document(n) will give you this.

Erik


RE: INDEXREADER + MAXDOC

2005-01-04 Thread Karthik N S
Hi Erik

Apologies...

  I would like to EXTRACT the DATA from the various fields of the last
document [as you said].

  Ex: at IndexReader.maxDoc = 100

  doc.get(Content) == ISBN100
  doc.get(name)== LUCENE IN ACTION
  doc.get(author)  == Erik Hatcher
  .

This is my Requirement.

Please
With regards
Karthik





Re: INDEXREADER + MAXDOC

2005-01-04 Thread Erik Hatcher
On Jan 4, 2005, at 7:29 AM, Karthik N S wrote:
> Hi Erik
>   I would like to EXTRACT the DATA from the various fields of the Last
> Document [as u said ]
>   Ex: at IndexReader.maxDoc = 100
>   doc.get(Content) == ISBN100
>   doc.get(name)== LUCENE IN ACTION
>   doc.get(author)  == Erik Hatcher
>   .
> This is my Requirement.

So...

    doc = reader.document(100)

Or am I missing something from what you're asking?  You need to have
the data *stored* in the fields you're going to retrieve from the
returned Document.

Erik
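
A note on the document number used in this exchange: in Lucene, IndexReader.maxDoc() returns one greater than the largest possible document number, so with maxDoc() == 100 the valid numbers run from 0 to 99, and the highest slot may belong to a deleted document. The following is a minimal sketch of picking the last live document number. It deliberately avoids the Lucene classes so that it runs on its own: the maxDoc argument and the isDeleted predicate are stand-ins (assumptions) for reader.maxDoc() and reader.isDeleted(n), and the class and method names are hypothetical.

```java
import java.util.function.IntPredicate;

public class LastDocSketch {

    /**
     * Returns the highest document number that is not deleted, or -1 if there is none.
     * maxDoc stands in for IndexReader.maxDoc(); isDeleted stands in for
     * IndexReader.isDeleted(int). Both are assumptions for illustration.
     */
    static int lastLiveDoc(int maxDoc, IntPredicate isDeleted) {
        // Valid document numbers are 0 .. maxDoc - 1.
        for (int n = maxDoc - 1; n >= 0; n--) {
            if (!isDeleted.test(n)) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Pretend index: 100 slots, the last two documents deleted.
        int last = lastLiveDoc(100, n -> n >= 98);
        System.out.println(last); // prints 97
    }
}
```

With a real reader, the returned number would be passed to reader.document(n); doc.get("name") then returns a value only for fields that were stored at index time, as Erik points out.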


lucenebook.com

2005-01-04 Thread Erik Hatcher
Otis and I have been working hard to get a website up for Lucene in 
Action and beyond.  It's finally in place at:

http://www.lucenebook.com

We haven't put in place as much explanation and help there as we 
should, and I'm sure by opening up the flood gates on it we'll uncover 
issues that need to be fixed.  If you come across anything that is 
broken or you have suggestions for improvement, just click the link to 
e-mail us at the top of the site.

Here are a few FAQs that we'll eventually post to the site:
* What are you searching?
  The search feature searches (using a MultiSearcher) the blog and book 
content.  The results will be a combination of highlighted book section 
snippets and highlighted full blog content.

* Can I see the book contents?
  The only actual book content visible is the snippets in the search 
results.  The search results may be useful if you don't have the book, 
but if you do have the book it will make more sense.  Who knows where 
this will evolve, but there is no current plan to provide more of the 
book content in the search results than this.  For hits in chapters 1 
and 3, there is a link provided to Manning's site where those sample 
chapters can be downloaded in their entirety for free.  The book search 
results are, of course, designed as teasers to (hopefully) show that we 
cover the topics you're interested in and that you should buy a copy of 
the book!  (my children need to eat too :)

* What's up with the hyphens in some of the search results?
  This is an artifact of how the book content was indexed (a text 
version of the PDF was processed, including the words split across 
lines).  These split words are, however, searchable!  There is a fair 
bit of analysis trickery going on to piece this stuff back together 
during indexing, but the stored content still contains the hyphens.

* What are the plans for the blog?
  Otis and I will post book-related announcements, news about Lucene in 
general, errata, etc.  You can subscribe to the blog with your 
favorite syndication reader using 
http://www.lucenebook.com/blog/?flavor=rdf (or rss, atom, or 
rss2) - yes, we should put the links to this on the site itself.

* How did you build this thing?
  This is a great topic for a case study.  Erik will be preparing a 
case study on this as an article (or series of articles) and using it 
for upcoming presentations.  The about page has some technical 
details, but more will follow.

Again, feedback is most welcome - please send it to 
[EMAIL PROTECTED] rather than posting replies to this list though.

Erik


Re: Deleting an index

2005-01-04 Thread Luke Shannon
If you opened an IndexReader, has it also been closed before you attempt
to delete?
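
The ordering Luke is asking about can be sketched without Lucene. On platforms that lock open files, a directory cannot be removed while any handle into it is still open, so a tearDown should close every reader or searcher first and only then delete. This is a minimal sketch under that assumption (the quoted message does not say which platform is involved); the class name, method name, and the plain input stream standing in for an open IndexReader are all hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TearDownSketch {

    /** Closes every handle first, then deletes the directory tree bottom-up. */
    static void closeAndDelete(Path indexDir, Closeable... handles) throws IOException {
        for (Closeable handle : handles) {
            if (handle != null) {
                handle.close(); // an open handle can keep index files locked
            }
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(indexDir)) {
            // Reverse order so files come before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p); // throws if something still cannot be removed
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        Path segment = Files.write(dir.resolve("segments"), new byte[] {1, 2, 3});
        Closeable reader = Files.newInputStream(segment); // plays the role of the open reader
        closeAndDelete(dir, reader);
        System.out.println(Files.exists(dir)); // prints false
    }
}
```

Using Files.delete rather than a silent File.delete() means a leftover lock surfaces as an exception naming the file, which helps locate whichever object is still holding the index.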

- Original Message - 
From: Scott Smith [EMAIL PROTECTED]
To: lucene-user@jakarta.apache.org
Sent: Monday, January 03, 2005 7:39 PM
Subject: Deleting an index


I'm writing some junit tests for my search code (which layers on top of
Lucene).  The tests all follow the same pattern:

1. setUp(): Create some directories; write some files to be indexed.
2. someTest: Call the indexer to create an index on the generated
files; do several searches and verify counts, expected hits, etc.
3. tearDown(): Delete all of the directories and associated files,
including the just-created index.



My problem is that I am unable to delete the index.  I've narrowed it
down to something in the search routine not letting go of the index file
(i.e., if I do the indexing and comment out the search, then everything
deletes fine).  The search code is pretty straightforward.  It creates
a new IndexSearcher (which it caches and hence uses for all searches in
the test).  Each individual search simply creates several QueryParsers
and then combines them to do a search using the cached IndexSearcher.
After the last search, I close() the IndexSearcher.  But something still
seems to have hold of the index.  I've tried nulling the hits object,
but that didn't seem to affect anything.



Any ideas?



Scott








Problems...

2005-01-04 Thread Ross Rankin
(Bear with me; I have inherited this system from another developer who is no
longer with the company.  So I am not familiar with Lucene at all.  I just
have got the task of Fixing the search.)  

 

I have a servlet that runs every 10 minutes and indexes, and I can see files
being created in the index path on that interval (fdt, fdx, fnm, frq, etc.);
however, the search function is no longer working.  I'm not getting anything
in the log that I can point to that says what is not working, the search or
the index.  But since the index files seem to change size/date stamp as they
have in the past, I'm leaning towards the search function.

 

I'm not sure where or how to troubleshoot.  Can I examine the indexes with
anything to see what is there and that it's meaningful?  Is there something
simple I can do to track down what doesn't work in the process?  Thanks.

 

Ross

 

Here's the search function:

public Hits search(String searchString, String resellerId) {
    int currentOffset = 0;
    try {
        currentOffset = Integer.parseInt(paramOffset);
    } catch (NumberFormatException e) {}

    System.out.println("\n\t\tSearch for " + searchString + " off = " +
            currentOffset);
    if (currentOffset > 0) {
        // if the user only requested the next n items from the search returns
        return hits;
    }

    // performs a new search
    try {
        hits = null;
        try {
            searcher.close();
        } catch (Exception e){}

        searcher = new IndexSearcher(pathToIndex);
        Analyzer analyzer = new StandardAnalyzer();

        String searchQuery = LuceneConstants.FIELD_RESELLER_IDS + ":"
                + resellerId
                + " AND "
                + LuceneConstants.FIELD_FULL_DESCRIPTION + ":" +
                searchString;

        Query query = null;
        try {
            query = QueryParser.parse(searchQuery,
                    LuceneConstants.FIELD_FULL_DESCRIPTION,
                    analyzer);
        } catch (ParseException e) {
            // if an exception occurs parsing the search string entered by the user
            // escapes all the special lucene chars and try to make the query again.
            searchQuery = LuceneConstants.FIELD_RESELLER_IDS + ":"
                    + resellerId + " AND "
                    + LuceneConstants.FIELD_FULL_DESCRIPTION + ":"
                    + escape(searchString);
            query = QueryParser.parse(searchQuery,
                    LuceneConstants.FIELD_FULL_DESCRIPTION,
                    analyzer);
        }
        System.out.println("Searching for: " +
                query.toString(LuceneConstants.FIELD_FULL_DESCRIPTION));

        hits = searcher.search(query);
        System.out.println(hits.length() + " total matching documents");
        //searcher.close();

    } catch (Exception e) {
        e.printStackTrace();
    }

    return hits;
}



Re: Problems...

2005-01-04 Thread Luke Shannon
I had a similar situation with the same problem.

I found the previous system was creating all the objects (including the
Searcher) and then updating the Index.

The result was the Searcher was not able to find any of the data just added
to the Index.

The solution for me was to move the creation of the Searcher to after the
Index had been updated and the Reader and Writer objects had been closed.
Also ensure the Searcher uses the same Analyzer as the IndexWriter used to
create the Index.

This is a good tool for checking what is in your index. It may help with the
troubleshooting:

http://www.getopt.org/luke/

Luke




1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-04 Thread Bill Janssen
I'm trying to figure out what changed between 1.4.1 and 1.4.3 to break
my application.  I couldn't use 1.4.2, because my app wouldn't compile
with 1.4.2, due to API changes.  With 1.4.3, the API incompatibilities
were fixed, but now the QueryParser seems to process query strings
differently.

For instance, with 1.4.1, when I use the query string

category:"user names"

during the parsing of the query, the method getFieldQuery is called,
which allows me to do some custom analysis of the search terms, for
particular field names.  However, with 1.4.3, getFieldQuery is *not*
called.  Why not?  Is something else called instead?

Also, what's the difference between major, minor, and micro release
numbers in the context of the Lucene project?  I'm still stuck on
1.4.1 due to these incompatibilities.  I'm a bit surprised that
differences between two micro releases of the same minor release would
cause this much difficulty.

Bill






how to create a long lasting unique key?

2005-01-04 Thread Peter Veentjer - Anchor Men
What is the best way to create a key for a document? I know the id (from hits)
cannot be used, but what is a good way to create a key?

I need this key for a web application. At the moment every document can be
identified with the file location key, but I would rather have some kind of
integer for the job (nobody needs to know the file location).





Re: how to create a long lasting unique key?

2005-01-04 Thread PA
On Jan 04, 2005, at 20:43, Peter Veentjer - Anchor Men wrote:
> What is the best way to create a key for a document?

UUID?
http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUID.html
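
A minimal sketch of the UUID suggestion, assuming Java 5 or later (java.util.UUID does not exist in earlier releases). The idea would be to generate the key once, when the document is first indexed, and store it in an untokenized field of its own so it survives re-indexing; the class and method names below are hypothetical. Note that a UUID is a 128-bit value printed as 36 characters, not the small integer asked for in the question.

```java
import java.util.UUID;

public class DocumentKeySketch {

    /** Returns a new random (version 4) UUID as a 36-character string. */
    static String newKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String key = newKey();
        System.out.println(key.length()); // prints 36
    }
}
```

Because the key carries no trace of the file path, it can be exposed in URLs of the web application without revealing where the file lives.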


Re: how to create a long lasting unique key?

2005-01-04 Thread Luke Shannon
This is taken from the example code written by Doug Cutting that ships with
Lucene.

It is the key our system uses. It also comes in handy when incrementally
updating.

Luke

public static String uid(File f) {
  // Append path and date into a string in such a way that lexicographic
  // sorting gives the same results as a walk of the file hierarchy. Thus
  // null (\u0000) is used both to separate directory components and to
  // separate the path from the date.
  return f.getPath().replace(dirSep, '\u0000') + "\u0000"
      + DateField.timeToString(f.lastModified());
}




Re: Problems...

2005-01-04 Thread Erik Hatcher
On Jan 4, 2005, at 10:53 AM, Ross Rankin wrote:
> I'm not sure where or how to troubleshoot.  Can I examine the indexes
> with anything to see what is there and that it's meaningful.  Is there
> something simple I can do to track down what doesn't work in the
> process?  Thanks.

Echoing a previous suggestion, use Luke to examine the index to make
sure it is in good shape.  You can do ad-hoc queries with it also.

One recommendation below

> String searchQuery = LuceneConstants.FIELD_RESELLER_IDS + ":"
> + resellerId
> + " AND "
> + LuceneConstants.FIELD_FULL_DESCRIPTION + ":" +
> searchString;

I highly recommend you use a BooleanQuery and AND in a TermQuery for
resellerId rather than textually concatenating it to the expression you
parse.  There are numerous issues that could come up with parsing it
this way, and it is potentially brittle.  If you ever switch analyzers such
that numbers are filtered out, you could be in trouble.

Erik
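
One of the issues with textual concatenation can be shown without Lucene at all: whatever the user types becomes part of the query syntax. The sketch below only builds the string, the same way the posted search() method does; the two field names are hypothetical placeholders for the LuceneConstants values, which the thread does not show.

```java
public class QueryConcatSketch {

    // Hypothetical field names; the real ones live in LuceneConstants in the posted code.
    static final String RESELLER_FIELD = "resellerIds";
    static final String DESCRIPTION_FIELD = "fullDescription";

    /** Builds the query text the same way the posted search() method does. */
    static String concat(String resellerId, String userText) {
        return RESELLER_FIELD + ":" + resellerId
                + " AND "
                + DESCRIPTION_FIELD + ":" + userText;
    }

    public static void main(String[] args) {
        // Ordinary input gives the intended shape.
        System.out.println(concat("42", "widget"));
        // prints resellerIds:42 AND fullDescription:widget

        // Input containing query syntax adds a clause the application never intended.
        System.out.println(concat("42", "widget OR resellerIds:7"));
        // prints resellerIds:42 AND fullDescription:widget OR resellerIds:7
    }
}
```

Building the query from objects, as recommended above, sidesteps this: the reseller restriction becomes a TermQuery added to a BooleanQuery as a required clause, and only the user's text goes through QueryParser, so the reseller ID is never parsed or analyzed.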


Parsing issue

2005-01-04 Thread Hetan Shah
Hello All,

Does any one know how to handle the following parsing error?

thanks for pointers/code snippets.

-H

While trying to parse a HTML file using IndexHTML I get

Parse Aborted: Encountered \ at line 8, column 1162.
Was expecting one of:
ArgName ...
= ...
TagEnd ...






Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-04 Thread Erik Hatcher
Bill,
If we broke API compatibility then we screwed up.  What getFieldQuery 
signature are you overriding?

As for version numbers - there are no strict conventions.  The API 
should not have broken in 1.4.2, nor in 1.4.3 - this is very 
unfortunate.  I caught what I thought were all of the incompatibilities 
introduced in 1.4.2, but apparently I missed something that perhaps my 
test cases didn't account for?

Erik


Re: Parsing issue

2005-01-04 Thread Erik Hatcher
Sure... clean up your HTML and it'll parse fine :)   Perhaps use JTidy 
to clean up the HTML.  Or switch to using a more forgiving parser like 
NekoHTML.

Erik


Re: Parsing issue

2005-01-04 Thread Hetan Shah
Has anyone used NekoHTML?  If so, how do I use it?  Is it a standalone
jar file that I include in my classpath and start using just like
IndexHTML?
Can someone share syntax and/or code if it is supposed to be used
programmatically?  I am looking at
http://www.apache.org/~andyc/neko/doc/html/ for more information; is that
the correct place to look?

Thanks,
-H


Re: Parsing issue

2005-01-04 Thread Otis Gospodnetic
That's the correct place to look and it includes code samples.
Yes, it's a Jar file that you add to the CLASSPATH and use ... hm,
normally programmatically, yes :).

Otis




Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-04 Thread Bill Janssen
Erik,

The signature I'm overriding is

protected Query getFieldQuery (String field,
   Analyzer a,
   String queryText)
throws ParseException

It gets called with a query string of the form

   field:text

but no longer with a query string of the form

   field:"text1 text2"

Bill

> What getFieldQuery signature are you overriding?





Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-04 Thread Daniel Naber
On Tuesday 04 January 2005 23:53, Bill Janssen wrote:

> protected Query getFieldQuery (String field,
>                                Analyzer a,
>                                String queryText)
>     throws ParseException

You're right, the problem is that we should call the deprecated method for 
example in getFieldQuery(String field, String queryText, int slop). 
However, there's a simple workaround: just remove the analyzer parameter 
from your method.

Regards
 Daniel

-- 
http://www.danielnaber.de


Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-04 Thread Bill Janssen
> You're right, the problem is that we should call the deprecated method for
> example in getFieldQuery(String field, String queryText, int slop).
> However, there's a simple workaround: just remove the analyzer parameter
> from your method.

Sure, if I wanted to ship different code for each micro-release of
Lucene (which, you might guess, I don't).  That signature doesn't
compile with 1.4.1.

Bill




Re: Problems...

2005-01-04 Thread Chris Hostetter


To start with, there has to be more to the search side of things than
what you included.  This search function is not static, which means it's
getting called on an object, which obviously has some internal state
(paramOffset, hits, and pathToIndex are a few that jump out at me).  What
are the values of those variables when this method gets called?

Second, there are at least two places in your code where potential
exceptions get thrown away and execution continues.  As a matter of good
practice, you should add logging to these spots to make sure you aren't
ignoring errors...

Third, you said "I'm not getting anything in the log that I can point to
that says what is not working", but what about what is/isn't in the log?
There are several System.out.println calls in this code ... I'm assuming
you're logging STDOUT, so what do those messages (with variables) say?
What is the value of currentOffset on the initial search?  What does the
query.toString look like?  How many total hits are being found when the
search is executed?  (Or is that line not getting logged because the
search is getting skipped because of some initial state in paramOffset?)




-Hoss
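
The second point can be illustrated with the offset parsing from the posted search function. This is a minimal sketch using java.util.logging rather than whatever logging the servlet actually has (an assumption; the thread only shows System.out.println); the class and method names are hypothetical. The fallback value is unchanged, but the reason for it now reaches the log.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class OffsetParsingSketch {

    private static final Logger LOG = Logger.getLogger(OffsetParsingSketch.class.getName());

    /** Parses the paging offset; falls back to 0 but records why. */
    static int parseOffset(String paramOffset) {
        try {
            return Integer.parseInt(paramOffset);
        } catch (NumberFormatException e) {
            // Same fallback as the posted code, but the failure is now visible in the log.
            LOG.log(Level.WARNING, "Bad offset '" + paramOffset + "', using 0", e);
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOffset("20"));   // prints 20
        System.out.println(parseOffset("next")); // prints 0, after logging a warning
    }
}
```

The same treatment applies to the empty catch around searcher.close(): logging the exception there would show whether a stale or failed searcher is involved.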





RE: Parsing issue

2005-01-04 Thread Chuck Williams
I use it and have yet to have a problem with it.  It uses the Xerces API
so you parse and access html files just like xml files.  Very cool,

Chuck




Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-04 Thread Bill Janssen
>> However, there's a simple workaround: just remove the analyzer parameter
>> from your method.
>
> Sure, if I wanted to ship different code for each micro-release of
> Lucene (which, you might guess, I don't).  That signature doesn't
> compile with 1.4.1.
>
> Bill

Let me be a bit more explicit.  My method (essentially an
after-method, for those Lisp'rs out there) begins thusly:

protected Query getFieldQuery (String field,
   Analyzer a,
   String queryText)
throws ParseException {

  Query x = super.getFieldQuery(field, a, queryText);

  ...
}

If I remove the "Analyzer a" from both the signature and the super
call, the super call won't compile because that method isn't in the
QueryParser in 1.4.1.  But my getFieldQuery() method won't even be
called in 1.4.1, because it doesn't exist in that version of the
QueryParser.

Bill
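
The compatibility fix discussed in this thread, having the newer method call the deprecated one, can be modeled in a few lines. This is a sketch with stand-in classes, not Lucene's QueryParser source: BaseParser and LegacySubclass are hypothetical names, String stands in for Query, and Object stands in for Analyzer. It shows that when the newer hook delegates to the older one, a subclass that overrides only the older signature is still reached.

```java
public class OverrideDispatchSketch {

    /** Stand-in for the parser base class; NOT Lucene's QueryParser. */
    static class BaseParser {
        /** Older hook that existing subclasses override (deprecated in this model). */
        protected String getFieldQuery(String field, Object analyzer, String text) {
            return field + ":" + text;
        }

        /** Newer hook. Delegating to the older one keeps old overrides reachable. */
        protected String getFieldQuery(String field, String text) {
            return getFieldQuery(field, null, text);
        }

        /** Entry point: the parser itself only calls the newer hook. */
        public String parse(String field, String text) {
            return getFieldQuery(field, text);
        }
    }

    /** Subclass written against the older signature only. */
    static class LegacySubclass extends BaseParser {
        @Override
        protected String getFieldQuery(String field, Object analyzer, String text) {
            return "custom(" + super.getFieldQuery(field, analyzer, text) + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(new LegacySubclass().parse("category", "user names"));
        // prints custom(category:user names)
    }
}
```

If the newer hook built its result directly instead of delegating, parse() would bypass LegacySubclass entirely, which matches the symptom reported for phrase queries in 1.4.3.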
