Re: Size of Document

2018-07-04 Thread Terry Steichen
In the document types I usually index (.pdf, .docx/.doc, .eml), there
exists a metadata field called "stream_size" that contains the size of
the document on disk.  You don't have to compute it.  Thus, when you
retrieve each document you can pull out the contents of this field and,
if you like, include it in each hitlist entry.
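
For the metrics side, here is a minimal sketch of pulling that field out of each retrieved document and totalling it. It is plain Java and does not call the Lucene API: each Map stands in for the stored fields of one hit, and the class and method names (StreamSizeMetrics, sizeOf, totalBytes) are made up for illustration. It assumes "stream_size" is stored as a decimal string; documents without it are skipped rather than guessed at.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: reads the "stream_size" metadata
// field from already-retrieved stored fields and totals it for metrics.
final class StreamSizeMetrics {

    // Returns the size recorded in "stream_size", or -1 if the field is
    // missing or is not a valid number.
    static long sizeOf(Map<String, String> storedFields) {
        String raw = storedFields.get("stream_size");
        if (raw == null) {
            return -1;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Sums the sizes of all hits that carry a usable "stream_size" value.
    static long totalBytes(List<Map<String, String>> hits) {
        long total = 0;
        for (Map<String, String> hit : hits) {
            long size = sizeOf(hit);
            if (size >= 0) {
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up hits; in practice these values come from the stored fields.
        List<Map<String, String>> hits = List.of(
                Map.of("id", "a.pdf", "stream_size", "2048"),
                Map.of("id", "b.eml", "stream_size", "512"),
                Map.of("id", "c.txt"));
        for (Map<String, String> hit : hits) {
            System.out.println(hit.get("id") + " (" + sizeOf(hit) + " bytes)");
        }
        System.out.println("total: " + totalBytes(hits));
    }
}
```

Note that this reports the size of the source file on disk, which is not necessarily the number of bytes the index itself grows by.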


On 07/04/2018 05:26 AM, Chris and Helen Bamford wrote:
> Hi there,
>
> How can I calculate the total size of a Lucene Document that I'm about
> to write to an index so I know how many bytes I am writing please?  I
> need it for some external metrics collection.
>
> Thanks
>
> - Chris
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>





Re: advanced search

2006-10-13 Thread Terry Steichen
You can just add a field to your indexed docs that always holds the same
fixed value (e.g. doc:1), so that a required clause on it matches every
document.  Then you can do queries like: +doc:1 -id:test
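
To make the logic of that query concrete, here is a small sketch in plain Java. It does not use the Lucene API: each Map stands in for one indexed document, field values are treated as single untokenized terms, and the names FixedFieldNotQuery, withMarker and search are hypothetical. The required clause on the marker field selects everything, and the prohibited clause then removes the unwanted documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the "+doc:1 -id:test" trick; not Lucene code.
final class FixedFieldNotQuery {

    // "Index time": copy the fields and add the marker field doc=1.
    static Map<String, String> withMarker(Map<String, String> fields) {
        Map<String, String> copy = new HashMap<>(fields);
        copy.put("doc", "1");
        return copy;
    }

    // "Query time": keep documents that match the required clause
    // (mustField:mustValue) and do not match the prohibited clause
    // (notField:notValue).
    static List<Map<String, String>> search(List<Map<String, String>> docs,
                                            String mustField, String mustValue,
                                            String notField, String notValue) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            boolean required = mustValue.equals(doc.get(mustField));
            boolean prohibited = notValue.equals(doc.get(notField));
            if (required && !prohibited) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                withMarker(Map.of("id", "test", "name", "alpha")),
                withMarker(Map.of("id", "42", "name", "test")));
        // Same shape as the query +doc:1 -id:test
        for (Map<String, String> hit : search(docs, "doc", "1", "id", "test")) {
            System.out.println("hit: id=" + hit.get("id") + " name=" + hit.get("name"));
        }
    }
}
```

To restrict the hits to "test" in the other fields, you would replace the marker clause with required clauses on those fields.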


karl wettin wrote:


On 13 Oct 2006, at 09:59, tony yin wrote:


I want to search several fields using a NOT condition, but how?
For example:
I store "test" in the {"id", "name", "value", ...} fields.
Now I want to search for "test", but NOT in "id". That's it.

Can anyone help me?


You will not get any matches by searching with only a boolean NOT clause. It
has to be combined with something that matches. Perhaps a
MatchAllDocsQuery will do it for you.


But to answer your question: a not-query is a Clause of a BooleanQuery.








Re: Collecting documents where only one field term matches

2005-04-04 Thread Terry Steichen
I wonder if you could accomplish your goal by creating another field 
during indexing which holds the number of terms in the "species" field.  
If that's possible, then you might get what you want with a query like: 
+species:"homo sapien" +num_species:1.
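
Roughly, the idea looks like this. The sketch is plain Java with no Lucene calls: Doc is a made-up in-memory stand-in for an indexed document, each entry in the species list counts as one value (so the phrase "homo sapien" counts once), and the names SpeciesCountFilter and onlySpecies are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the num_species idea; not Lucene code.
final class SpeciesCountFilter {

    static final class Doc {
        final String name;
        final List<String> species;
        final int numSpecies; // computed once, at "index time"

        Doc(String name, List<String> species) {
            this.name = name;
            this.species = new ArrayList<>(species);
            this.numSpecies = this.species.size();
        }
    }

    // Same shape as the query +species:"<value>" +num_species:1:
    // the value must be present and it must be the only one.
    static List<String> onlySpecies(List<Doc> docs, String value) {
        List<String> names = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.species.contains(value) && doc.numSpecies == 1) {
                names.add(doc.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("Doc1", List.of("homo sapien", "Mammalia")),
                new Doc("Doc2", List.of("homo sapien")));
        System.out.println(onlySpecies(docs, "homo sapien"));
    }
}
```

The point of the design is that the counting happens once per document when it is indexed, so the query does not need to know every other term that might appear in the field.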

mad Cow wrote:
Could some more experienced users suggest a solution to my problem? I 
have documents which contain multiple terms and phrases, and I wish to 
collect documents which match only the term I query for.

For example:
Doc1 contains,
  species:"homo sapien" Mammalia
Doc2 contains,
  species:"homo sapien"
I wish to collect documents ONLY with "homo sapien" but a search for 
species:"homo sapien" returns both documents as they both contain the 
phrase.
I have written code to cache every term for every field, and I hoped 
that I could do the search species:"homo sapien" -species:Mammalia. 
Unfortunately the terms homo and sapien seem to be indexed separately, so 
when I collect every term to use with the "-" operator I end up with 
this query:
species:"homo sapien" -species:(homo Mammalia sapien)

which isn't the same.
Can anybody suggest another approach?
Many thanks
Iain




Re: a "real" PhrasePrefixQuery

2005-05-20 Thread Terry Steichen
Paul,
Could you flesh out the implementation you describe below with some code 
or pseudocode?

Regards,
Terry

Paul Elschot wrote:
On Friday 20 May 2005 11:30, Stanislav Jordanov wrote:
 

Is there a Lucene Query (or something that will do a job) like:
"Star Wars tri*"
that will match all docs containing a 3 word phrase: 'Star' followed by 
'Wars' followed by a word starting with 'tri'.

I.e. the above query will match both "Star Wars trilogy" and "Star Wars 
triumph".
   

You'll need an ordered SpanNearQuery over the following:
- SpanTermQuery for "Star"
- SpanTermQuery for "Wars"
- SpanOrQuery over one SpanTermQuery per term matching tri*.
Ideally the last one would be a SpanPrefixQuery, but no such class is
available. Have a look at PrefixQuery.rewrite() to see how to find all
terms matching tri*; it's fairly straightforward.
Regards,
Paul Elschot
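
As a starting point for the pseudocode requested above, here is a sketch of the two steps in plain Java. It does not call Lucene: the sorted set stands in for the index's term dictionary, the token list stands in for one analyzed (lowercased) document, and the names PhrasePrefixSketch, expandPrefix and matches are made up. In Lucene itself, the expanded terms would each become a SpanTermQuery inside the SpanOrQuery, and the ordered match would be done by the SpanNearQuery described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model of "Star Wars tri*"; not Lucene code.
final class PhrasePrefixSketch {

    // Step 1: find every term that starts with the prefix. Because the terms
    // are sorted, all matches are adjacent: start at the prefix and stop at
    // the first term that no longer begins with it.
    static SortedSet<String> expandPrefix(SortedSet<String> terms, String prefix) {
        SortedSet<String> matches = new TreeSet<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    // Step 2: ordered match with no gaps. The leading words must appear in
    // order, immediately followed by any one of the expanded terms.
    static boolean matches(List<String> tokens, List<String> leading, Set<String> lastWords) {
        int phraseLength = leading.size() + 1;
        for (int start = 0; start + phraseLength <= tokens.size(); start++) {
            boolean leadingOk = tokens.subList(start, start + leading.size()).equals(leading);
            if (leadingOk && lastWords.contains(tokens.get(start + leading.size()))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("star", "trek", "tribute", "trilogy", "triumph", "wars"));
        SortedSet<String> tri = expandPrefix(terms, "tri");
        List<String> doc = Arrays.asList("the", "star", "wars", "trilogy");
        System.out.println(tri + " -> " + matches(doc, Arrays.asList("star", "wars"), tri));
    }
}
```

A very short prefix can expand to a large number of terms, so the size of the expansion is worth keeping an eye on.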
 
