Filter.getDocIdSet() returning null, and what this means for CachingWrapperFilter

2010-05-26 Thread Daniel Noll
in the cache from the entry is in the cache but it's null. Daniel -- Daniel Noll Forensic and eDiscovery Software Senior Developer The world's most advanced Nuix email data analysis http

Re: Filter.getDocIdSet() returning null, and what this means for CachingWrapperFilter

2010-05-26 Thread Daniel Noll
, which will probably result in another post sooner or later. Daniel

Questions about the new query parser framework

2010-05-02 Thread Daniel Noll
and tag:(a b) both parse to the same node structure (making it impossible to figure out which the user actually used)? Daniel

Re: Filters and multiple, per-segment calls to getDocIdSet

2010-03-25 Thread Daniel Noll
explicitly pass the docBase for the IndexReader - this would reduce the need to perform maths to determine the docBase ourselves, and also make it possible to parallelise those calls later. Daniel
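Until the framework passes the docBase in, it has to be derived from the order of the sub-readers and their maxDoc() values: it is the running sum of maxDoc() over all earlier sub-readers. A minimal sketch of that arithmetic, using java.util.BitSet in place of Lucene's DocIdSet and assuming the per-segment results are supplied in the top-level reader's order (the class name is hypothetical):

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper: lifts per-segment filter results to top-level doc IDs. */
final class PerSegmentBits {
    private PerSegmentBits() {}

    /**
     * perSegment.get(i) holds the matches of sub-reader i in its own local doc
     * IDs; maxDocs[i] is that sub-reader's maxDoc(). Both must be in the same
     * order the top-level reader uses.
     */
    static BitSet toTopLevel(List<BitSet> perSegment, int[] maxDocs) {
        if (perSegment.size() != maxDocs.length) {
            throw new IllegalArgumentException("one maxDoc per segment expected");
        }
        BitSet global = new BitSet();
        int docBase = 0; // sum of maxDoc() over all earlier sub-readers
        for (int i = 0; i < maxDocs.length; i++) {
            BitSet local = perSegment.get(i);
            for (int d = local.nextSetBit(0); d >= 0; d = local.nextSetBit(d + 1)) {
                global.set(docBase + d); // local doc d is top-level doc docBase + d
            }
            docBase += maxDocs[i];
        }
        return global;
    }
}
```

With two segments of maxDoc 5 and 2, local matches {1, 3} and {0} become top-level docs {1, 3, 5}.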

Filters and multiple, per-segment calls to getDocIdSet

2010-03-24 Thread Daniel Noll
explicitly not threadsafe. We weren't keeping any state in them anyway, but now we will have to, so there is potential for a lot of new bugs if a filter is somehow used by two queries running at the same time. Daniel

Re: Deleting documents without deleting them

2010-03-16 Thread Daniel Noll

Deleting documents without deleting them

2010-03-15 Thread Daniel Noll
even if I use a BitSet. :-( Is there any other way to go about it? Daniel

New Query Parser: converting a QueryNode back into a String?

2009-11-29 Thread Daniel Noll
to do this, but I thought I would throw the question in anyway.) Daniel

Finding the highest term in a field

2009-11-18 Thread Daniel Noll
terms until I find a term where there are terms higher than the term but no terms higher than the term for the next day? Daniel
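The search described above can be done as a binary search over days using only "seek to the first term >= t" plus forward stepping. A sketch under explicit assumptions: a NavigableSet<String> stands in for one field's terms (ceiling() plays the part of seeking a TermEnum, higher() the part of next()), every term starts with a yyyyMMdd day prefix, and no term lies outside the given day range. The class name is hypothetical:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.NavigableSet;

/** Hypothetical sketch: highest term when only forward seeks are available. */
final class HighestTermFinder {
    private HighestTermFinder() {}

    private static String dayPrefix(long epochDay) {
        return LocalDate.ofEpochDay(epochDay).format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    /** True if any term sorts at or after the start of the given day (one "seek"). */
    private static boolean hasTermsFrom(NavigableSet<String> terms, long epochDay) {
        return terms.ceiling(dayPrefix(epochDay)) != null;
    }

    /** Highest term, or null if no term sorts at or after firstDay. */
    static String highest(NavigableSet<String> terms, LocalDate firstDay, LocalDate lastDay) {
        long lo = firstDay.toEpochDay();
        long hi = lastDay.toEpochDay();
        if (!hasTermsFrom(terms, lo)) {
            return null;
        }
        // Invariant: hasTermsFrom(lo) is true. Find the last day for which it holds.
        while (lo < hi) {
            long mid = lo + (hi - lo + 1) / 2; // round up so the range always shrinks
            if (hasTermsFrom(terms, mid)) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        // No terms exist from day lo + 1 onward, so the rest all belong to day lo.
        String current = terms.ceiling(dayPrefix(lo));
        for (String next = terms.higher(current); next != null; next = terms.higher(next)) {
            current = next;
        }
        return current;
    }
}
```

Each probe costs one seek, so this is roughly log2(number of days) seeks plus a forward scan through the final day's terms.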

Re: Finding the highest term in a field

2009-11-18 Thread Daniel Noll
On Thu, Nov 19, 2009 at 16:01, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Nov 18, 2009 at 10:48 PM, Daniel Noll dan...@nuix.com wrote: But what if I want to find the highest?  TermEnum can't step backwards. I've also wanted to do the same. It's coming with the new flexible

IndexWriter.close() no longer seems to close everything

2009-11-08 Thread Daniel Noll
at this time (though I was under the impression that close() waited for merges and so forth to complete before returning.) Daniel

Re: IndexWriter.close() no longer seems to close everything

2009-11-08 Thread Daniel Noll
index state (before adding docs.) When the IndexWriter was opened, another reader was opened, so even though we thought we were closing both, it turned out there were two readers and one writer, and we were only closing one of the readers. Daniel

Directory.list() deprecation

2009-11-05 Thread Daniel Noll
filter.) Daniel

Re: Migrating from Hit/Hits to TopDocs/TopDocCollector

2009-06-10 Thread Daniel Noll
of documents in the index. It's a shame we don't have an inverted kind of HitCollector where we can say "give me the next hit", so that we can get the best of both worlds (like what StAX gives us in the XML world.) Daniel

Re: Phrase search

2009-06-10 Thread Daniel Noll
. gaming laptop cool= 2 (cool, gaming) And of course if it actually finds cool gaming computer it would get 6. Daniel

Extending StandardAnalyzer considered harmful

2009-06-03 Thread Daniel Noll
have prevented the problem in its entirety as we would have realised much sooner that it wasn't safe to override in the beginning. Daniel

ArrayIndexOutOfBoundsException from TermInfosReader.get (2.3.2)

2009-04-27 Thread Daniel Noll
2.4.1 and see if it has been fixed, but was there a bug along these lines in 2.3.2? Daniel

Internals question: BooleanQuery with many TermQuery children

2009-04-06 Thread Daniel Noll

Re: underscore a word separator in StandardAnalyzer?

2009-03-15 Thread Daniel Noll
a trivial analyser which breaks on commas be the way to go? Daniel

Re: Lucene: MultiSearcher

2009-03-08 Thread Daniel Noll
Michael McCandless wrote: You could look at the docID of each hit, and compare to the .maxDoc() of each underlying reader. There is also MultiSearcher#subSearcher(int) which also works as you add more without having to do the maths yourself. Daniel
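The "maths" in question is a search over each sub-index's first doc ID, which MultiSearcher#subSearcher(int) computes for you. A stand-alone sketch of the same idea, assuming you have the sub-indexes' maxDoc() values in order (class and method names are hypothetical). It picks the last sub-index whose start is <= the doc ID, which skips empty sub-indexes:

```java
/** Hypothetical stand-alone version of "which sub-index holds this doc ID?". */
final class SubIndexLocator {
    private final int[] starts; // starts[i] = first top-level doc ID of sub-index i

    SubIndexLocator(int[] maxDocs) {
        if (maxDocs.length == 0) {
            throw new IllegalArgumentException("no sub-indexes");
        }
        starts = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = base;
            base += maxDocs[i];
        }
    }

    /** Sub-index holding globalDoc; assumes 0 <= globalDoc < total maxDoc. */
    int subIndex(int globalDoc) {
        int lo = 0;
        int hi = starts.length - 1;
        // Binary search for the last i with starts[i] <= globalDoc. An empty
        // sub-index shares its start with the next one, so it is never chosen.
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** The doc ID relative to its own sub-index. */
    int localDoc(int globalDoc) {
        return globalDoc - starts[subIndex(globalDoc)];
    }
}
```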

Re: Optimum way to find all document without particular field

2009-03-04 Thread Daniel Noll

Re: double metaphone for misspellings

2008-12-17 Thread Daniel Noll
you would end up with a DoubleMetaphoneFilter, which you could then use with PerFieldAnalyzerWrapper to have it apply only to the fields you use that for. Daniel

Re: Query Search returns always the same id

2008-10-28 Thread Daniel Noll
particularly surprising that it isn't stored. ;-) Daniel

Re: QueryParser returning TermQuery instead of PhraseQuery?

2008-10-20 Thread Daniel Noll
you don't want stemming (talking about exact matches) yet you chose the snowball analyser (whose sole purpose is stemming, unless I am mistaken...) Daniel

Re: StandardTokenizer and Korean grouping with alphanum

2008-09-22 Thread Daniel Noll
[:letter:] which is much more convenient. Daniel

StandardTokenizer and Korean grouping with alphanum

2008-09-21 Thread Daniel Noll
I'm seeing some tokens come back with mixed digits and Hangul, and I'm questioning the correctness of that. Disclaimer: we're not performing any further processing of Korean in subsequent filters at the current point in time, and I don't know the language either. Daniel

Re: IndexSearcher.search

2008-09-19 Thread Daniel Noll
using it for that? Some of us obviously were. Daniel

Re: IndexSearcher.search

2008-09-16 Thread Daniel Noll
given search eventually. Maybe others have different opinions as they are working on webapps, where the user is already expecting paging before they even see the results page. Daniel

Re: IndexSearcher.search

2008-09-15 Thread Daniel Noll
? Daniel -- Daniel Noll - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: TopDocs question

2008-09-15 Thread Daniel Noll
to know the *number* of hits, and don't need the hits themselves, then you should just use a custom HitCollector which increments a counter. It will run much faster. Daniel
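Lucene 2.x's HitCollector has a single callback, collect(int doc, float score), so a counting collector is only a few lines. The sketch below declares a stand-in abstract class of that shape so it compiles without Lucene; with the real library you would extend HitCollector instead and pass the instance to Searcher.search(Query, HitCollector). The stand-in and class names are hypothetical:

```java
/** Stand-in for Lucene 2.x's HitCollector, declared so the sketch compiles alone. */
abstract class SimpleHitCollector {
    abstract void collect(int doc, float score);
}

/** Counts hits without building TopDocs or loading any stored fields. */
final class CountingCollector extends SimpleHitCollector {
    private int count;

    @Override
    void collect(int doc, float score) {
        count++; // doc and score are ignored; only the number of calls matters
    }

    int getCount() {
        return count;
    }
}
```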

Re: How to search

2008-08-25 Thread Daniel Noll
now, just not by default. Daniel

Re: How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Daniel Noll
UN_TOKENIZED. Source code QFT: } else if (index == Index.NO_NORMS) { this.isIndexed = true; this.isTokenized = false; this.omitNorms = true; } ... Daniel

Re: How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Daniel Noll
Kwon, Ohsang wrote: Why do you use to WildcardQuery? You are not need to whildcard. (maybe..) Use term query. What if you need to match a literal wildcard *and* an actual wildcard. :-) Daniel
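WildcardQuery in the 2.x releases does not, as far as we can tell, offer an escape for a literal '*' or '?', which is what makes mixing the two awkward. If you are willing to check candidate terms yourself (or hand a regex to something like contrib's RegexQuery), one option is to translate the pattern into a java.util.regex.Pattern with a backslash escape. The escape convention and class name are assumptions for illustration, not Lucene behaviour:

```java
import java.util.regex.Pattern;

/** Hypothetical translator: wildcard pattern with backslash-escaped literals. */
final class EscapedWildcard {
    private EscapedWildcard() {}

    /**
     * '*' matches any run of characters, '?' exactly one, and a backslash makes
     * the next character literal, so "a\*b*" means literal "a*b" then anything.
     */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '\\' && i + 1 < wildcard.length()) {
                literal.append(wildcard.charAt(++i)); // escaped character is literal
            } else if (c == '*' || c == '?') {
                flush(literal, regex);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c); // includes a trailing lone backslash
            }
        }
        flush(literal, regex);
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Appends the pending literal run, quoted so regex metacharacters are inert. */
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }
}
```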

Re: How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Daniel Noll

Re: Testing for field existence

2008-08-18 Thread Daniel Noll
to support older text indexes.) Daniel

Re: Lucene search for OR

2008-08-14 Thread Daniel Noll
for "the". If it gives no results then you won't find "or" either, without reindexing with stop words off. Daniel

Re: SpanRegexQuery

2008-07-31 Thread Daniel Noll

Re: Ignoring XML tags when Indexing

2008-07-25 Thread Daniel Noll
as you have a BufferedReader wrapped around the entire thing. Daniel

Re: MoreLikeThis from a field with a specific value

2008-07-15 Thread Daniel Noll
this category Not surprising at all. This is what you actually want: +(content:blah content:blah content:blah) +categoryId:2 Your original query's only REQUIRED constraint was that it match the category. Daniel

Re: Match all documents with non empty field

2008-07-02 Thread Daniel Noll
it in a QueryFilter to cache the result, but I found it to be fast enough even for relatively large document sets. Daniel

Re: How to retrieve number of documents based on a query ?

2008-06-25 Thread Daniel Noll
On Thursday 26 June 2008 15:09:44 java_is_everything wrote: Hi all. Is there a way to obtain the number of documents in the Lucene index (2.0.0), having a particular term indexed, much like what we do in a database ? I suspect the normal way is a HitCollector which does nothing but increment

Re: lucene search options

2008-06-23 Thread Daniel Noll
On Monday 23 June 2008 16:21:17 Aditi Goyal wrote: I think wildcard (*) cannot be used in the beginning :( Wrong: http://lucene.apache.org/java/2_3_0/api/core/org/apache/lucene/queryParser/QueryParser.html#setAllowLeadingWildcard(boolean) Daniel

Re: lucene search options

2008-06-23 Thread Daniel Noll
On Monday 23 June 2008 18:08:29 Aditi Goyal wrote: Oh. For one moment I was elated to hear the news. :( Is there any way out? *:* -jakarta apache Or subclass QueryParser and override the getBooleanQuery() method to do this behind the scenes using MatchAllDocsQuery. Daniel

Re: creating Array of IndexReaders

2008-06-22 Thread Daniel Noll
On Saturday 21 June 2008 18:57:49 Sebastin wrote: Since i am maintaining more than 1.5 years records in the windows 2003 server,based on the user input for example if the user wants to display june 1 - june 15 folders and fetch the records from them.if the user wants to display may 1-may15

Re: lucene memory consumption

2008-05-29 Thread Daniel Noll
On Friday 30 May 2008 08:17:52 Alex wrote: Hi, other than the in memory terms (.tii), and the few kilobytes of opened file buffer, where are some other sources of significant memory consumption when searching on a large index ? ( 100GB). The queries are just normal term queries. Norms can

Re: Is it possible to add multiple keywords to a single field from one doc?

2008-05-25 Thread Daniel Noll
On Monday 26 May 2008 02:25:40 Tom Conlon wrote: Hi Mark, For example: you have a content field (default) and you also have an 'attributes' field. I'd like to add multiple attributes for a given document rather than just one value and be able to somehow search on the attributes. Ie.

Re: Search for long titles - wildcard queries

2008-05-13 Thread Daniel Noll
On Saturday 10 May 2008 20:32:42 legrand thomas wrote: I think I cannot use the WildcardQuery because the term shouldn't start with * of ?. Should I use a QueryParser ? How can I do it ? WildcardQuery does permit a wildcard at the front, it's just much slower. Also, QueryParser allows

Re: Does Lucene Supports Billions of data

2008-04-30 Thread Daniel Noll
On Thursday 01 May 2008 00:01:48 John Wang wrote: I am not sure how well lucene would perform with 2 Billion docs in a single index anyway. Even if they're in multiple indexes, the doc IDs being ints will still prevent it going past 2Gi unless you wrap your own framework around it. Daniel
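Because doc IDs and maxDoc() are Java ints, the combined document count under one reader or searcher is capped at Integer.MAX_VALUE (2,147,483,647, just under 2Gi). A trivial guard you might run before combining several indexes. It sums in a long so the check itself cannot overflow, and the class name is hypothetical:

```java
/** Hypothetical guard: does a set of indexes fit in an int doc ID space? */
final class DocIdBudget {
    private DocIdBudget() {}

    /** True if the combined maxDoc() of all indexes still fits in an int. */
    static boolean fitsInIntDocIds(long... maxDocs) {
        long total = 0; // long, so the sum itself cannot wrap around
        for (long m : maxDocs) {
            total += m;
        }
        return total <= Integer.MAX_VALUE;
    }
}
```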

Re: Problems about using Lucene to generate tag cloud..

2008-04-02 Thread Daniel Noll
On Thursday 03 April 2008 08:08:09 Dominique Béjean wrote: Hum, it looks like it is not true. Use a do-while loop make the first terms.term().field() generate a null pointer exception. Depends which terms method you use. TermEnum terms = reader.terms();
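The difference is where the enum starts: IndexReader.terms() starts before the first term, so you call next() before reading, while IndexReader.terms(Term) starts on the first term >= the given one, so you read before calling next(). The sketch models both with a hypothetical forward-only enum over a sorted list so the two loop shapes can be compared; it is not Lucene's TermEnum, and a real loop over terms(Term) would also stop once term().field() changes:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical forward-only enum mimicking the two TermEnum starting positions. */
final class ForwardTermEnum {
    private final List<String> terms;
    private int pos;

    private ForwardTermEnum(List<String> terms, int pos) {
        this.terms = terms;
        this.pos = pos;
    }

    /** Like terms(): positioned before the first term. */
    static ForwardTermEnum unpositioned(List<String> sortedTerms) {
        return new ForwardTermEnum(sortedTerms, -1);
    }

    /** Like terms(Term): positioned on the first term >= from, if any. */
    static ForwardTermEnum seekedTo(List<String> sortedTerms, String from) {
        int i = 0;
        while (i < sortedTerms.size() && sortedTerms.get(i).compareTo(from) < 0) {
            i++;
        }
        return new ForwardTermEnum(sortedTerms, i);
    }

    boolean next() {
        if (pos < terms.size()) {
            pos++;
        }
        return pos < terms.size();
    }

    /** Current term, or null when not positioned on one. */
    String term() {
        return pos >= 0 && pos < terms.size() ? terms.get(pos) : null;
    }

    /** Loop shape for terms(): advance first, then read. */
    static List<String> readFromStart(List<String> sortedTerms) {
        List<String> out = new ArrayList<>();
        ForwardTermEnum e = unpositioned(sortedTerms);
        while (e.next()) {
            out.add(e.term());
        }
        return out;
    }

    /** Loop shape for terms(Term): read first, then advance. */
    static List<String> readFrom(List<String> sortedTerms, String from) {
        List<String> out = new ArrayList<>();
        ForwardTermEnum e = seekedTo(sortedTerms, from);
        if (e.term() == null) {
            return out; // nothing sorts at or after `from`
        }
        do {
            out.add(e.term());
        } while (e.next());
        return out;
    }
}
```

Using the while loop on a seeked enum silently skips the first term; using do-while on an unpositioned one calls term() before any term is current.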

Re: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread Daniel Noll
On Tuesday 01 April 2008 18:51:55 Dominique Béjean wrote: IndexReader reader = IndexReader.open(temp_index); TermEnum terms = reader.terms(); while (terms.next()) { String field = terms.term().field(); Gotcha: after calling terms() it's already pointing at

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields

2008-03-19 Thread Daniel Noll
On Thursday 20 March 2008 07:22:27 Mark Miller wrote: You might think, if I only ask for the top 10 docs, don't i only read 10 field values? But of course you don't know what docs will be returned as each search comes in...so you have to cache them all. If it lazily cached one field at a time,

Re: Contrib Highlighter and Phrase search

2008-03-19 Thread Daniel Noll
On Wednesday 19 March 2008 18:28:15 Itamar Syn-Hershko wrote: 1. Build a Radix tree (PATRICIA) and populate it with all search terms. Phrase queries will be considered as one big string, regardless their spaces. 2. Iterate through your text ignoring spaces and punctuation marks, and for each

Re: Question with Hits Interface

2008-03-18 Thread Daniel Noll
On Wednesday 19 March 2008 01:44:33 Ramdas M Ramakrishnan wrote: I am using a MultiFieldQueryParser to parse and search the index. Once I have the Hits and iterate thru it, I need to know the following? For every hit document I need to know under which indexed field was this Hit originating

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-17 Thread Daniel Noll
On Monday 17 March 2008 19:38:46 Michael McCandless wrote: Well ... expungeDeletes() first forces a flush, at which point the deletions are flushed as a .del file against the just flushed segment. Still, if you call expungeDeletes after every flush (commit) then it's only 1 segment whose

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-16 Thread Daniel Noll
On Thursday 13 March 2008 19:46:20 Michael McCandless wrote: But, when a normal merge of segments with deletions completes, your docIDs will shift. In trunk we now explicitly compute the docID shifting that happens after a merge, because we don't always flush pending deletes when flushing
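The shift is easy to model: once a merge drops deleted documents, each surviving document moves down by the number of deleted documents before it. A small sketch of that old-to-new mapping, as an illustration of the idea rather than Lucene's merge code (the class name is hypothetical):

```java
import java.util.BitSet;

/** Hypothetical model of how doc IDs move when deletions are merged away. */
final class DocIdCompaction {
    private DocIdCompaction() {}

    /**
     * oldToNew[old] is the doc ID after compaction, or -1 if the doc was deleted.
     * Surviving docs keep their relative order.
     */
    static int[] docMap(int maxDoc, BitSet deleted) {
        int[] oldToNew = new int[maxDoc];
        int next = 0;
        for (int old = 0; old < maxDoc; old++) {
            oldToNew[old] = deleted.get(old) ? -1 : next++;
        }
        return oldToNew;
    }
}
```

With six docs and docs 1 and 4 deleted, docs 2, 3 and 5 become 1, 2 and 3, so any external table keyed on the old IDs is now wrong for them.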

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-12 Thread Daniel Noll
On Wednesday 12 March 2008 19:36:57 Michael McCandless wrote: OK, I think very likely this is the issue: when IndexWriter hits an exception while processing a document, the portion of the document already indexed is left in the index, and then its docID is marked for deletion. You can see

Re: indexing api wrt Analyzer

2008-03-12 Thread Daniel Noll
On Thursday 13 March 2008 15:21:19 Asgeir Frimannsson wrote: I was hoping to have IndexWriter take an AnalyzerFactory, where the AnalyzerFactory produces Analyzer depending on some criteria of the document, e.g. language. With PerFieldAnalyzerWrapper, you can specify which analyzer to

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-11 Thread Daniel Noll
On Tuesday 11 March 2008 19:55:39 Michael McCandless wrote: Hi Daniel, 2.3 should be no different from 2.2 in that docIDs only shift when a merge of segments with deletions completes. Could it be the ConcurrentMergeScheduler? Merges now run in the background by default and commit whenever

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-11 Thread Daniel Noll
On Wednesday 12 March 2008 09:53:58 Erick Erickson wrote: But to me, it always seems...er...fraught to even *think* about relying on doc ids. I know you've been around the block with Lucene, but do you have a compelling reason to use the doc ID and not your own unique ID? From memory it was

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-11 Thread Daniel Noll
On Wednesday 12 March 2008 10:20:12 Michael McCandless wrote: Oh, so you do not see the problem with SerialMergeScheduler but you do with ConcurrentMergeScheduler? [...] Oh, there are no deletions?  Then this is very strange.  Is it   optimize that messes up the docIDs?  Or, is it when you add

Document ID shuffling under 2.3.x (on merge?)

2008-03-10 Thread Daniel Noll
Hi all. We're using the document ID to associate extra information stored outside Lucene. Some of this information is being stored at load-time and some afterwards; later on it turns out the information stored at load-time is returning the wrong results when converting the database contents

Re: searching for Nothing

2008-03-02 Thread Daniel Noll
On Monday 03 March 2008 05:40:39 Ghinwa Choueiter wrote: thank you. You were right. Indexing by does not do what I need. How would one represent a null index? Perhaps another way of asking the question is what query would return to me all the documents in the database (all-pass filter).

Re: When does QueryParser creates PhraseQueries

2008-02-28 Thread Daniel Noll
On Wednesday 27 February 2008 00:50:04 [EMAIL PROTECTED] wrote: Looks that this is really hard-coded behaviour, and not Analyzer-specific. The whitespace part is coded into QueryParser.jj, yes. So are the quotes and : and other query-specific things. I want to search for directories with

Re: Rebuilding Document from index?

2008-02-28 Thread Daniel Noll
On Wednesday 27 February 2008 03:33:53 Itamar Syn-Hershko wrote: I'm still trying to engineer the best possible solution for Lucene with Hebrew, right now my path is NOT using a stemmer by default, only by explicit request of the user. MoreLikeThis would only return relevant results if I will

Re: Inconsistent Search Speed

2008-02-28 Thread Daniel Noll
On Thursday 28 February 2008 01:52:27 Erick Erickson wrote: And don't iterate through the Hits object for more than 100 or so hits. Like Mark said. Really. Really don't G... Is there a good trick for avoiding this? Say you have a situation like this... - User searches - User sees first N

Re: Searching multiple indexes

2008-02-21 Thread Daniel Noll
On Tuesday 19 February 2008 21:08:59 [EMAIL PROTECTED] wrote: 1. IndexSearcher with a MultiReader will search the indexes sequentially? Not exactly. It will fuse the indexes together such that things like TermEnum will merge the ones from the real indexes, and will search using those
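Merging the TermEnums is essentially a k-way merge: each sub-index's terms are already sorted, so a priority queue keyed on each source's current term yields the combined sorted sequence. A simplified sketch over plain sorted lists; Lucene's own MultiTermEnum also sums docFreq for a term present in several indexes, which is left out here, and the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical k-way merge of per-index sorted term lists. */
final class MergedTerms {
    private MergedTerms() {}

    /** One sorted source plus its current head term. */
    private static final class Cursor {
        final Iterator<String> it;
        String head;

        Cursor(Iterator<String> it) {
            this.it = it;
            this.head = it.next(); // only built for non-empty sources
        }

        boolean advance() {
            if (it.hasNext()) {
                head = it.next();
                return true;
            }
            return false;
        }
    }

    /** Sorted union of all sources; a term present in several indexes appears once. */
    static List<String> merge(List<List<String>> perIndex) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> terms : perIndex) {
            if (!terms.isEmpty()) {
                queue.add(new Cursor(terms.iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll(); // source with the smallest current term
            if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(top.head)) {
                merged.add(top.head);
            }
            if (top.advance()) {
                queue.add(top); // re-insert with its new head
            }
        }
        return merged;
    }
}
```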

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-04 Thread Daniel Noll
On Monday 04 February 2008 21:51:39 Michael McCandless wrote: Even pre-2.3, you should have seen gains by adding threads, if indeed your hardware has good concurrency. And definitely with the changes in 2.3, you should see gains by adding threads. With regards to this, I have been wondering:

Re: Lucene to index OCR text

2008-01-28 Thread Daniel Noll
On Friday 25 January 2008 19:26:44 Paul Elschot wrote: There is no way to do exact phrase matching on OCR data, because no correction of OCR data will be perfect. Otherwise the OCR would have made the correction... snip suggestion to use fuzzy query The problem I see with a fuzzy query is

field:* query type, and prefix queries

2008-01-23 Thread Daniel Noll
Hi all... Just out of interest, why does field:* go via getWildcardQuery instead of getPrefixQuery? It seems to me that it should be treated as a prefix of "", but am I missing something important? Also, I've noticed that although RangeQuery was optimised in a recent version of Lucene,

Re: Question regarding adding documents

2008-01-07 Thread Daniel Noll
On Tuesday 08 January 2008 00:52:35 Developer Developer wrote: here is another approach. StandardAnalyzer st = new StandardAnalyzer(); StringReader reader= new StringReader(text to index...); TokenStream stream = st.tokenStream(content, reader); Then use the Field

Fullwidth alphanumeric characters, plus a question on Korean ranges

2008-01-06 Thread Daniel Noll
Hi all. We discovered that fullwidth letters are not treated as LETTER and fullwidth digits are not treated as DIGIT. This in itself is probably easy to fix (including the filter for normalising these back to the normal versions) but while sanity checking the blocks in StandardTokenizer.jj I
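The normalising filter mentioned above can be a simple offset: the fullwidth forms U+FF01 to U+FF5E line up one-to-one with ASCII U+0021 to U+007E, 0xFEE0 apart. A sketch of that mapping as a plain string function (wrapping it in a TokenFilter is left out, and the class name is hypothetical). java.text.Normalizer with Form.NFKC would also fold these characters, along with many other compatibility mappings you may not want:

```java
/** Hypothetical helper: folds fullwidth ASCII variants back to plain ASCII. */
final class FullwidthFolding {
    private FullwidthFolding() {}

    /** Maps U+FF01..U+FF5E to U+0021..U+007E; every other char is unchanged. */
    static String fold(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                out.append((char) (c - 0xFEE0)); // fixed offset between the two blocks
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```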

Re: Question regarding adding documents

2008-01-06 Thread Daniel Noll
On Monday 07 January 2008 11:35:59 chris.b wrote: is it possible to add a document to an index and, while doing so, get the terms in that document? If so, how would one do this? :x My first thought would be: when adding fields to the document, use the Field constructors which accept a

Re: Query.rewrite - help me to understand it

2007-12-16 Thread Daniel Noll
On Thursday 13 December 2007 23:07:49 游泳池的鱼 wrote: hehe ,you can do a test with PrefixQuery rewrite method,and extract terms . like this query = prefixQuery.rewrite(reader); query.extractTerms(set); for(String term : set){ System.out.println(term); } It will give you a

Re: DEFAULT_OPERATOR_AND globally ?

2007-12-11 Thread Daniel Noll
On Wednesday 12 December 2007 03:34:08 Helmut Jarausch wrote: Hi, I know how to set DEFAULT_OPERATOR_AND for an individual QueryParser Objekt (after creation) Since I always want this to be set, is there a means to set a (global) option such that any QueryParser object has this default

Tricky (maybe) query question

2007-12-05 Thread Daniel Noll
Hi all. Suppose you have a text index with a field used for deduplication, and then you later add a second field with further information that might also be used for deduplication. We'll call them A and B for the sake of brevity. If I have only a current text index, then I can use (a:foo AND

Re: How to check which field contains Term

2007-11-07 Thread Daniel Noll
On Thursday 08 November 2007 02:41:50 Lukasz Rzeszotarski wrote: I must write application, where client wants to make very complex query, like: find word blabla in (Content_1 OR Content_2) AND (...) AND (...)... and as a result he expectes not only documents, but also information in

Re: Storing Host and IP Information in Lucene

2007-09-10 Thread Daniel Noll
On Monday 10 September 2007 23:53:06 AnkitSinghal wrote: And if i make the field as UNTOKENIZED i cannot search for queries like host:xyz.* . I'm not sure why that wouldn't work. If the stored token is xyz.example.com, then xyz.* will certainly match it. Daniel

Re: How do YOU detect corrupt indexes?

2007-08-03 Thread Daniel Noll
consistency. Daniel -- Daniel Noll, Nuix Pty Ltd, Suite 79, 89 Jones St, Ultimo NSW 2007, Australia. Ph: +61 2 9280 0699 Web: http://nuix.com/ Fax: +61 2 9212 6902

Re: Getting only the Ids, not the whole documents.

2007-08-02 Thread Daniel Noll

Re: Lucene equivalent of SQL DISTINCT for a specific field's stored values

2007-07-26 Thread Daniel Noll
with untokenised fields, unless you have somewhere else you can store the untokenised version which is quicker to iterate over. Daniel

Re: Search for null

2007-07-25 Thread Daniel Noll
. For the least code you can probably do... BooleanFilter f = new BooleanFilter(); f.add(new FilterClause(RangeFilter.More(field, ""), BooleanClause.Occur.MUST_NOT)); Filter cached = new CachingWrapperFilter(f); Daniel

Re: How to make a case insensitive search using a FuzzyQuery?

2007-07-05 Thread Daniel Noll
? I don't see why you would use a FuzzyQuery for something where a normal PhraseQuery should suffice. Daniel

Can I delete without shuffling document IDs?

2007-06-28 Thread Daniel Noll
actual document ID to the sequence ID. (e.g. if documents 1000 through 1999 are deleted, there would be an entry in the table saying that ID 2000 starts at document ID 1000.) I just wanted to put the question out in case someone has solved the exact same problem already. Daniel
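A table like the one described stays small if each entry marks the start of a run: "sequence IDs from here onward start at this doc ID", looked up with a floor search. A sketch under those assumptions; the class and method names are hypothetical, and sequence IDs of deleted documents are assumed never to be looked up:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical run-length mapping from stable sequence IDs to current doc IDs. */
final class SequenceToDocId {
    private final TreeMap<Integer, Integer> runStarts = new TreeMap<>();

    SequenceToDocId() {
        runStarts.put(0, 0); // with no deletions, sequence ID == doc ID
    }

    /** Records that sequence IDs from seqStart onward begin at docStart. */
    void startRun(int seqStart, int docStart) {
        runStarts.put(seqStart, docStart);
    }

    /** Doc ID for a live, non-negative sequence ID. */
    int docId(int seqId) {
        Map.Entry<Integer, Integer> run = runStarts.floorEntry(seqId); // nearest run at or before seqId
        return run.getValue() + (seqId - run.getKey());
    }
}
```

For the example in the post, deleting documents 1000 to 1999 adds the single entry startRun(2000, 1000); sequence ID 2500 then maps to doc ID 1500.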

Re: Problem using RAMDirectory as a buffer

2007-06-21 Thread Daniel Noll
On Friday 22 June 2007 09:34:44 Tanya Levshina wrote: ramWriter.addDocument(doc); fsWriter.addIndexes(new Directory[] {ramDir,}); As IndexWriter already does this internally, I'm not exactly sure why you're trying to implement it again on the outside. Daniel

Re: negative queries

2007-06-18 Thread Daniel Noll
button. Good way to discourage potential contributors I suppose. Daniel

Re: negative queries

2007-06-18 Thread Daniel Noll
On Tuesday 19 June 2007 11:03:25 Erik Hatcher wrote: Good way to discourage potential contributors I suppose. And (most) spammers, which is really the point of requiring a profile. I believe this is called throwing the baby out with the bath water. Daniel

Re: negative queries

2007-06-17 Thread Daniel Noll
, however. Daniel

Re: negative queries

2007-06-14 Thread Daniel Noll
on it? Daniel

Re: regarding range search

2007-06-12 Thread Daniel Noll
want to do it inside ordinary text content as well.) Daniel

Re: I need 'cat???' to match 'cat' again!

2007-06-06 Thread Daniel Noll
to maintain a custom lucene package. Please help! Can you not use RegexQuery instead? Daniel
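One way to get the lenient behaviour back with a regex is to translate the wildcard pattern so that each '?' is optional. This assumes contrib's RegexQuery, which takes the pattern as the text of a Term, and that the pattern has to match the whole term; the test below checks full-string matches with java.util.regex. The class is hypothetical, and note it makes every '?' optional, not just trailing ones:

```java
import java.util.regex.Pattern;

/** Hypothetical translator: wildcard syntax where each '?' may also match nothing. */
final class LenientWildcard {
    private LenientWildcard() {}

    /** '*' becomes ".*", '?' becomes ".?" (zero or one char); literals are quoted. */
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".?");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }
}
```

So "cat???" becomes a pattern matching "cat" followed by up to three more characters: "cat" and "cats" match, "catalog" does not.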

Re: Concept Search

2007-05-16 Thread Daniel Noll
* be tricky is phrase queries since inserting a new term breaks the offsets AFAIK. Although, I suppose you could always store the concepts in a different field and not modify the analyser being used for the text itself. Daniel

Re: Concept Search

2007-05-16 Thread Daniel Noll
wrong before. Ah, I see. A feature I haven't toyed with just yet. That's rather nice. :-) Daniel

Re: Memory leak (JVM 1.6 only)

2007-05-15 Thread Daniel Noll
said I had allocated it 1.8GB, and someone *still* recommended this option. :-) Daniel

Re: Finding out which field matched for a query

2007-05-06 Thread Daniel Noll
, but I could find no information on the forum for this. Seems easy enough to do it by looking at the text of the fields (we do this ourselves for highlighting the hits.) Daniel

Re: Implementing lagre secure Lucene search system questions.

2007-05-03 Thread Daniel Noll
jim shirreffs wrote: Hi, I'm a relative Lucene newbe and would appreciate some expert advice. Sounds like you might want to start a new thread, otherwise people who know the answer to your problem might not see your post. Daniel

Re: Straigtforward stemming example? Dictionary needed?

2007-04-25 Thread Daniel Noll
sentences where you don't know which word it is. To use the example just given, "I saw wood" is pretty ambiguous without having more context. Daniel

Re: Adding large files to index

2007-04-25 Thread Daniel Noll
was on opposite sides of the split? etc. That being said, I haven't had issues adding files of this size. But then, our application doesn't require the ability to read at the same time some other thread is writing (so our memory requirements are lower to begin with.) Daniel

Re: IndexReader method semantics

2007-04-24 Thread Daniel Noll
documents. Daniel -- This message is intended only for the named recipient. If you are not the intended recipient you

Re: IndexReader method semantics

2007-04-22 Thread Daniel Noll
concern about trying to do something crazy like this.) Daniel

IndexReader method semantics

2007-04-19 Thread Daniel Noll
of the framework implement their own caching of the TermEnum, or is all the caching implemented in the reader and TermEnum implementations? Daniel
