If you use the Field.Text(String name, Reader value) overload of the Field.Text factory method, the field is tokenized and indexed but *not* stored, which is why doc.get("contents") returns null. You will still be able to search for and find the document, but to retrieve the original contents you will have to store a copy of them elsewhere.
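One way to keep the text retrievable is to drain the Reader into a String before adding the field. The following is only a minimal sketch: the class name ReaderContents and the helper readFully are hypothetical names made up for illustration, and the Lucene call appears only in a comment, so the block compiles with the JDK alone.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene.
public class ReaderContents {

    /**
     * Reads everything from the given Reader and returns it as a String.
     * The whole document is held in memory, so this suits moderately sized
     * files only.
     */
    public static String readFully(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            // read() returns -1 once the end of the stream is reached
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Assumed usage with the code from the original question (Lucene 1.4):
    //   String contents = ReaderContents.readFully(parser.getReader());
    //   doc.add(Field.Text("contents", contents)); // String overload: stored
    //   doc.get("contents");                       // should no longer be null
}
```

Storing the text makes the index larger, so this trade-off is probably only worth it if you really need to read the contents back from the index.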
The Field.Text(String name, String value) version does store the document String itself, so that's probably the origin of the confusion.

> -----Original Message-----
> From: Luke Shannon [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 11 November 2004 20:17
> To: Lucene Users List
> Subject: HTMLParser.getReader returning null
>
> Hello;
>
> Things were working fine. I have been re-organizing my code
> to drop into QA when I noticed I was no longer getting search
> results for my HTML files.
> When I checked things out I confirmed I was still creating
> the Documents but realized no content was being indexed.
>
> HTMLParser parser = new HTMLParser(f);
>
> // Add the tag-stripped contents as a Reader-valued Text field so it will
> // get tokenized and indexed.
> doc.add(Field.Text("contents", parser.getReader()));
> System.out.println("The content is " + doc.get("contents"));
>
> The SOP line above outputs a null where the contents used to
> be. Any seen this before?
>
> Thanks,
>
> Luke
>
> ----- Original Message -----
> From: "Will Allen" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Thursday, November 11, 2004 1:59 PM
> Subject: RE: Bug in the BooleanQuery optimizer? ..TooManyClauses
>
> Any wildcard search will automatically expand your query to the number of
> terms it find in the index that suit the wildcard.
>
> For example:
>
> wild*, would become wild OR wilderness OR wildman etc for each of the terms
> that exist in your index.
>
> It is because of this, that you quickly reach the 1024 limit of clauses. I
> automatically set it to max int with the following line:
>
> BooleanQuery.setMaxClauseCount( Integer.MAX_VALUE );
>
> -----Original Message-----
> From: Sanyi [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 11, 2004 6:46 AM
> To: [EMAIL PROTECTED]
> Subject: Bug in the BooleanQuery optimizer? ..TooManyClauses
>
> Hi!
>
> First of all, I've read about BooleanQuery$TooManyClauses, so I know that it
> has a 1024 Clauses limit by default which is good enough for me, but I still
> think it works strange.
>
> Example:
> I have an index with about 20Million documents.
> Let's say that there is about 3000 variants in the entire document set of
> this word mask: cab*
> Let's say that about 500 documents are containing the word: spectrum
> Now, when I search for "cab* AND spectrum", I don't expect it to throw an
> exception.
> It should first restrict the search for the 500 documents containing the
> word "spectrum", then it should collect the variants of "cab*" withing these
> documents, which turns out in two or three variants of "cab*" (cable, cables,
> maybe some more) and the search should return let's say 10 documents.
>
> Similar example: When I search for "cab* AND nonexistingword" it still
> throws a TooManyClauses exception instead of saying "No results", since there
> is no "nonexistingword" in my document set, so it doesn't even have to start
> collecting the variations of "cab*".
>
> Is there any path for this issue?
> Thank you for your time!
>
> Sanyi
> (I'm using: lucene 1.4.2)
>
> p.s.: Sorry for re-sending this message, I was first sending it as an
> accidental reply to a wrong thread..
>
> __________________________________
> Do you Yahoo!?
> Check out the new Yahoo! Front Page.
> www.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]