Re: Size of Document
In the document types I usually index (.pdf, .docx/.doc, .eml), there is a metadata field called "stream_size" that holds the size of the document on disk, so you don't have to compute it. When you retrieve each document, you can read this field and, if you like, include its value in each hit list entry.

On 07/04/2018 05:26 AM, Chris and Helen Bamford wrote:
> Hi there,
>
> How can I calculate the total size of a Lucene Document that I'm about
> to write to an index so I know how many bytes I am writing please? I
> need it for some external metrics collection.
>
> Thanks
>
> - Chris
>
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
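For what it's worth, here is a rough sketch of how the "stream_size" idea could feed a metrics counter. It does not use the Lucene API: the document is modeled as a plain map of field names to values, and the class name DocSizeSketch, the method sizeInBytes, and the fallback (summing the UTF-8 length of all field values when "stream_size" is absent) are my own assumptions. Note that either number describes the source content, not the bytes Lucene ends up writing to the index.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of Lucene: a document is modeled as a
// simple field-name -> value map.
final class DocSizeSketch {

    /**
     * Returns the size in bytes to report for one document.
     * Prefers the "stream_size" metadata value; if it is missing or not a
     * number, falls back to the UTF-8 length of all field values.
     */
    static long sizeInBytes(Map<String, String> fields) {
        String streamSize = fields.get("stream_size");
        if (streamSize != null) {
            try {
                return Long.parseLong(streamSize.trim());
            } catch (NumberFormatException e) {
                // Malformed value: fall through to the estimate below.
            }
        }
        long total = 0;
        for (String value : fields.values()) {
            total += value.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // "stream_size" is present, so it is used as-is.
        System.out.println(sizeInBytes(Map.of("stream_size", "2048", "body", "hi")));
        // No "stream_size": estimate from the UTF-8 length of the values.
        System.out.println(sizeInBytes(Map.of("body", "h\u00e9llo")));
    }
}
```

If you need the size before indexing rather than at retrieval time, the same fallback can be applied to the field values you are about to add.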
Re: advanced search
You can just add a field to your indexed docs that always holds a fixed value. Then you can run queries like:

+doc:1 -id:test

karl wettin wrote:
> On 13 Oct 2006 at 09:59, tony yin wrote:
>
>> I want to search several fields using a NOT condition, but how?
>> For example: I store "test" in the {"id", "name", "value", ...} fields.
>> Now I search for "test" NOT in "id". That's it. Can anyone help me?
>
> You will not get any matches looking for just a boolean NOT-clause. It
> has to be combined with something that matches. Perhaps a
> MatchAllDocsQuery will do it for you.
>
> But to answer your question: a not-query is a Clause of a BooleanQuery.
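A small sketch of how such a query string could be assembled, in case it helps. It only builds text in the classic query parser syntax and does not call Lucene. The class NotQuerySketch and the method buildQuery are made up for illustration, the constant field "doc" with value "1" is the one from the example above, and the term is assumed to contain no characters that need escaping. It also assumes the parser's default operator is OR, so the clauses inside the parentheses are alternatives.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper that builds a query string; it does not parse or run it.
final class NotQuerySketch {

    /**
     * Builds "term in any of searchFields, but NOT in excludedField".
     * With no positive fields, the always-present clause +doc:1 supplies
     * the match that the prohibited clause is subtracted from.
     */
    static String buildQuery(String term, List<String> searchFields, String excludedField) {
        String positive;
        if (searchFields.isEmpty()) {
            positive = "+doc:1";
        } else {
            StringJoiner anyField = new StringJoiner(" ", "+(", ")");
            for (String field : searchFields) {
                anyField.add(field + ":" + term);
            }
            positive = anyField.toString();
        }
        return positive + " -" + excludedField + ":" + term;
    }

    public static void main(String[] args) {
        // prints: +(name:test value:test) -id:test
        System.out.println(buildQuery("test", List.of("name", "value"), "id"));
        // prints: +doc:1 -id:test
        System.out.println(buildQuery("test", List.of(), "id"));
    }
}
```

The first form answers the original question directly (match in "name" or "value", but not in "id"); the second is the pure exclusion, which only works because the fixed-value field gives the query something to match.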
Re: Collecting documents where only one field term matches
I wonder if you could accomplish your goal by creating another field during indexing that holds the number of terms in the "species" field. If that's possible, then you might get what you want with a query like:

+species:"homo sapien" +num_species:1

mad Cow wrote:
> Could some more experienced users suggest a solution to my problem?
>
> I have documents which contain multiple terms and phrases, and I wish
> to collect documents which match only the term I query for. For example:
>
> Doc1 contains species:"homo sapien" Mammalia
> Doc2 contains species:"homo sapien"
>
> I wish to collect documents ONLY with "homo sapien", but a search for
> species:"homo sapien" returns both documents, as they both contain the
> phrase. I have written code to cache every term for every field, and I
> hoped that I could do the search
>
> species:"homo sapien" -species:Mammalia
>
> Unfortunately, the terms homo and sapien seem to be separate, so when I
> collect every term to use with the "-" operator I end up with the query
>
> species:"homo sapien" -species:(homo Mammalia sapien)
>
> which isn't the same. Can anybody suggest another approach?
>
> Many thanks
> Iain
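To make the counting idea concrete, here is a minimal in-memory sketch. It is not Lucene code: documents are modeled as a map from a document id to the list of values in "species", and the class SingleValueSketch and method matchOnly are hypothetical. One assumption to flag: the count is taken over whole values such as "homo sapien", not over analyzed tokens, since the phrase would otherwise count as two terms and num_species:1 would never match it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the "count field" idea.
final class SingleValueSketch {

    /**
     * Returns the ids of documents whose "species" field contains the
     * wanted value and nothing else, i.e. the equivalent of
     * +species:"wanted" +num_species:1.
     */
    static List<String> matchOnly(Map<String, List<String>> speciesByDoc, String wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> doc : speciesByDoc.entrySet()) {
            List<String> species = doc.getValue();
            // This is the value that would be stored in num_species at index time.
            int numSpecies = species.size();
            if (species.contains(wanted) && numSpecies == 1) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("Doc1", List.of("homo sapien", "Mammalia"));
        docs.put("Doc2", List.of("homo sapien"));
        // prints: [Doc2]
        System.out.println(matchOnly(docs, "homo sapien"));
    }
}
```

The point of storing the count at index time is that the query no longer has to enumerate and exclude every other term in the field.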
Re: a "real" PhrasePrefixQuery
Paul,

Could you flesh out the implementation you describe below with some code or pseudocode?

Regards,
Terry

Paul Elschot wrote:
> On Friday 20 May 2005 11:30, Stanislav Jordanov wrote:
>> Is there a Lucene Query (or something that will do the job) like
>> "Star Wars tri*" that will match all docs containing a 3-word phrase:
>> 'Star' followed by 'Wars' followed by a word starting with 'tri'?
>> I.e. the above query will match both "Star Wars trilogy" and
>> "Star Wars triumph".
>
> You'll need an ordered SpanNearQuery over the following:
> - SpanTermQuery for "Star"
> - SpanTermQuery for "Wars"
> - SpanOrQuery over all SpanTermQuery's for terms matching tri*.
>
> The last one should be a SpanPrefixQuery, but that one is not available.
> Have a look in PrefixQuery.rewrite() on how to find all terms matching
> tri*, it's fairly straightforward.
>
> Regards,
> Paul Elschot
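Until a real implementation turns up, here is a plain-Java model of the two steps Paul describes, to show the logic rather than the Lucene API. Everything in it is an assumption for illustration: the class PhrasePrefixSketch, the sorted set standing in for the index's term dictionary, and the token list standing in for one document's analyzed text (already lowercased). expandPrefix plays the role of enumerating the terms matching tri*, and matches plays the role of the ordered, zero-slop span match whose last position accepts any of the expanded terms.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model; none of these names are Lucene classes.
final class PhrasePrefixSketch {

    /**
     * Collects every dictionary term that starts with the prefix. In a
     * sorted dictionary these terms form one contiguous run beginning at
     * the prefix, so the scan stops at the first non-matching term.
     */
    static Set<String> expandPrefix(NavigableSet<String> dictionary, String prefix) {
        Set<String> expanded = new TreeSet<>();
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            expanded.add(term);
        }
        return expanded;
    }

    /**
     * True if the tokens contain the leading terms at consecutive
     * positions, in order, immediately followed by any of the alternatives.
     */
    static boolean matches(List<String> tokens, List<String> leading, Set<String> lastAlternatives) {
        int n = leading.size();
        for (int start = 0; start + n < tokens.size(); start++) {
            if (tokens.subList(start, start + n).equals(leading)
                    && lastAlternatives.contains(tokens.get(start + n))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary = new TreeSet<>(
                List.of("star", "trek", "tribe", "trilogy", "triumph", "try", "wars"));
        Set<String> tri = expandPrefix(dictionary, "tri");
        // prints: [tribe, trilogy, triumph]
        System.out.println(tri);
        // prints: true
        System.out.println(matches(List.of("star", "wars", "trilogy"), List.of("star", "wars"), tri));
        // prints: false
        System.out.println(matches(List.of("star", "trek", "trilogy"), List.of("star", "wars"), tri));
    }
}
```

In Lucene terms, each expanded term would become one SpanTermQuery inside the SpanOrQuery, and the ordered SpanNearQuery would take "star", "wars", and that SpanOrQuery as its clauses.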