combine to MultiTermQuery with OR
Hi, I want to combine two MultiTermQueries. One searches over FieldA, the other over FieldB. Both queries should be combined with the OR operator, so in Lucene syntax I want to search FieldA:Term1 OR FieldB:Term1, FieldA:Term2 OR FieldB:Term2, FieldA:Term3 OR FieldB:Term3... How can I do this? Greetings, Sascha - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Request to be added to the ContributorsGroup
Hi Charlie, You need to create an account on the wiki and tell us your account name. Steve On Feb 10, 2015, at 3:46 AM, Charlie Picorini charlie.picor...@gmail.com wrote: Dear Lucene Team, Please add me to the ContributorsGroup so that I can add IntraCherche, which is actually based on Lucene. Kind regards
StandardQueryParser with date/time fields stored as longs
Hello, I've done a lot of googling but haven't stumbled upon the magic answer: how does one use StandardQueryParser with numeric fields representing timestamps, to allow for range queries? When indexing, my timestamp fields are ISO 8601 strings. I'm parsing them and then storing the milliseconds epoch time as a long, i.e.: doc.add(new LongField("created", ts.getMillis(), Field.Store.NO)); From reading around, this seems to be the preferred way to index a timestamp (makes sense). However, how can you get StandardQueryParser to handle a query like created:[2010-01-01 TO 2014-12-31]? For other numeric fields, StandardQueryParser.setNumericConfigMap() is working just fine for me. It would seem that the created field would have to be part of this map in order to execute the range query properly, but that there must also be a component to parse the date/time strings in the query and convert them to long values, right? Thanks in advance, Jon
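One way to wire this up, sketched under the assumption that Lucene 4.x's NumericConfig accepts any java.text.NumberFormat: a small NumberFormat subclass (the name DateToLongFormat is hypothetical) that parses the query's date strings into epoch-millis longs, which you would then register for the created field via setNumericConfigMap. The class itself is plain JDK; only the comment's NumericConfig wiring is Lucene.

```java
import java.text.FieldPosition;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// A NumberFormat that turns "yyyy-MM-dd" strings into epoch-millis Longs.
// In Lucene 4.x you would plug it in roughly like:
//   configMap.put("created", new NumericConfig(8, new DateToLongFormat(), NumericType.LONG));
//   parser.setNumericConfigMap(configMap);
public class DateToLongFormat extends NumberFormat {
    private final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    { sdf.setTimeZone(TimeZone.getTimeZone("UTC")); }

    @Override
    public Number parse(String source, ParsePosition pos) {
        Date d = sdf.parse(source, pos);            // advances pos on success
        return d == null ? null : d.getTime();      // epoch millis as Long
    }

    @Override
    public StringBuffer format(long number, StringBuffer toAppendTo, FieldPosition pos) {
        return toAppendTo.append(sdf.format(new Date(number)));
    }

    @Override
    public StringBuffer format(double number, StringBuffer toAppendTo, FieldPosition pos) {
        return format((long) number, toAppendTo, pos);
    }
}
```

With this in the config map, created:[2010-01-01 TO 2014-12-31] should be parsed into a numeric range over the stored longs.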
Aw: Re: combine to MultiTermQuery with OR
Hm, I already thought this could be the solution but didn't know how to do the OR operation, so I tried this: BooleanQuery bquery = new BooleanQuery(); bquery.add(queryFieldA, BooleanClause.Occur.SHOULD); bquery.add(queryFieldB, BooleanClause.Occur.SHOULD); Is this the correct way? Sent: Tuesday, 10 February 2015, 17:31 From: Ian Lea ian@gmail.com To: java-user@lucene.apache.org Subject: Re: combine to MultiTermQuery with OR org.apache.lucene.search.BooleanQuery. -- Ian.
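For the record, two SHOULD clauses in a BooleanQuery do give OR semantics: a document matches if at least one clause matches (as long as there are no MUST clauses alongside). A minimal sketch against the 4.x API, with hypothetical WildcardQuery instances standing in for the two MultiTermQueries:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class OrCombine {
    // Builds (FieldA:term OR FieldB:term); SHOULD means "match at least one clause"
    static Query eitherField(String term) {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new WildcardQuery(new Term("FieldA", term)), BooleanClause.Occur.SHOULD);
        bq.add(new WildcardQuery(new Term("FieldB", term)), BooleanClause.Occur.SHOULD);
        return bq;
    }
}
```

To search several terms (Term1, Term2, ...), build one such per-term query each and combine those in an outer BooleanQuery with whatever occurrence you need across terms.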
Re: combine to MultiTermQuery with OR
org.apache.lucene.search.BooleanQuery. -- Ian.
Re: Lucene search in attachments
If you don’t index content, you won’t be able to search for it, I guess. That said, Tika can have this extracted-characters limit. See indexedChars below: tika().parseToString(new BytesStreamInput(content, false), metadata, indexedChars); [1] https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L456 -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs
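Outside the Elasticsearch mapper, the same limit exists on the plain Tika facade and can be raised or removed before extraction; a sketch, assuming the org.apache.tika.Tika facade class and a hypothetical file path:

```java
import java.io.File;
import org.apache.tika.Tika;

public class ExtractAll {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        // parseToString caps output at 100,000 chars by default; -1 removes the cap
        tika.setMaxStringLength(-1);
        String text = tika.parseToString(new File("attachment.pdf")); // path is illustrative
        System.out.println(text.length());
    }
}
```

The extracted string is then what you would index into a Lucene text field.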
Re: Lucene search in attachments
Thank you David. Yes, it has a restriction of 10,000 characters. But for large files, what could be done in that case? Best Regards, Sreedevi S
Lucene search in attachments
Hi, What is the best method to search in attachments in Lucene? I am new to Lucene and I am using version 4.10.2. By making use of Tika, I know I can convert files to text and then index that as another field. But for large files that will not be an ideal solution. I believe the maximum characters per field is 10,000. So, what would be an ideal method to search attachments then? Best Regards, Sreedevi S
Re: Lucene search in attachments
I don’t understand. If you don’t raise this restriction to a higher value (or set it to -1), not all of the text will be extracted, so only a subset of the text will be indexed. Non-indexed parts of the text won’t be searchable. Did I misunderstand your question? -- David Pilato | Technical Advocate | Elasticsearch.com
Re: Lucene search in attachments
No David. I can increase the value or set it to -1 to make it unlimited, but I still cannot be sure that my whole text will be searchable, which is still a problem with large files because only the part which is indexed will be searchable. I was looking for some alternatives. Best Regards, Sreedevi S
RE: Lucene search in attachments
Hi, There is no restriction to 10,000 characters inside Lucene and there never was one. In earlier Lucene versions (a long time ago) there was an implicit restriction to 10,000 TERMS (not characters). This is no longer the case. If you still want such a limit, you have to wrap your Analyzer: http://goo.gl/SRf45A If you have a limitation to 10,000 characters somewhere, it might be your Tika text extraction. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
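The Analyzer-wrapping approach behind that link is (in 4.x) LimitTokenCountAnalyzer; a sketch, assuming Lucene 4.10.2 and StandardAnalyzer as the delegate:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.miscellaneous.LimitTokenCountAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class LimitedAnalyzerFactory {
    // Re-impose a 10,000-token-per-field cap, should you actually want one;
    // tokens past the limit are simply not indexed
    static Analyzer make() {
        Analyzer base = new StandardAnalyzer(Version.LUCENE_4_10_2);
        return new LimitTokenCountAnalyzer(base, 10000);
    }
}
```

Pass the wrapped analyzer to your IndexWriterConfig; without the wrapper there is no per-field token cap in modern Lucene.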
RE: Lucene search in attachments
Hi, -Original Message- From: sreedevi s [mailto:sreedevi.payik...@gmail.com] Sent: Tuesday, February 10, 2015 10:46 AM To: java-user@lucene.apache.org Subject: Re: Lucene search in attachments Hi Uwe, Thank you for the info update. I will remove the limit in Tika and check. So, my understanding is: currently Lucene doesn't have any restriction on the number of terms per field, but when a term is greater than 2^15 bytes it is silently ignored at indexing time – a message is logged to infoStream if enabled, but no error is thrown. Yes. There is only a limit on a single term *after* text analysis. But keep in mind that some Analyzers like StandardAnalyzer have other limits way below that one. On the other hand, if you index your documents as StringField or with KeywordAnalyzer, there is no tokenization done at all; in that case the whole field is indexed as a single term - but that’s not useful for searching in full text anyway. So use a suitable analyzer! Is that right? Yes! Uwe
RE: Indexing and searching a DateTime range
Hi, OK. I found the Alfresco code on GitHub. So it's open source, it seems. And I found the DateTimeAnalyser, so I will just take that code as a starting point: https://github.com/lsbueno/alfresco/tree/master/root/projects/repository/source/java/org/alfresco/repo/search/impl/lucene/analysis This won't help you: a) it is outdated code from very early Lucene versions; b) it would be slow, because it does not use the numeric features of Lucene, so your code would be very slow if you search for date ranges. Basically, I don't really understand your problem: if you use Lucene directly, you are responsible for processing the text before it goes into the index. If you want to create one Lucene Document per line, it is your turn to do this. Lucene has no functionality to split documents. You have to process your input and bring it into the format that Lucene wants: Documents consisting of key/value pairs. Analyzers are only there for processing one specific field and tokenizing the input (so the index contains words and not the whole field as one term). Analyzers have nothing to do with analysis of the structure of log lines (they only work on one field, which does not help for structured queries like on a date). So basically your indexing workflow is:
- Open the log file
- Read the log file line by line
- Create a Lucene IndexDocument instance
- Extract interesting key/value pairs from your log line, e.g. by using regular expressions (like Logstash does). This would for example detect the date, the class name from Log4J files, or whatever else
- Put those key/value pairs as fields (numeric, text, ...) into the Lucene IndexDocument: one field for the date, one field for message content, one field for classname, ... (those fields don't need to be stored, unless you want to display them in search results, see below)
- In addition, it is wise to add an additional Lucene TextField instance (STORED=TRUE, INDEXED=TRUE, with a good Analyzer) that contains the whole line (redundant). By STORING it, you are able to return the whole log line in your search results
- Index the document
- Process the next line
If you don't want to write this code on your own, use Logstash and Elasticsearch (or write a separate plugin for Logstash that indexes to Lucene). But your comment is strange: you say Elasticsearch and Logstash are too slow for many log lines. How would Lucene alone then be faster? Elasticsearch also uses Lucene under the hood. If it is slow, the main problem is in most cases incorrect data types while indexing (like using a text field for dates and doing ranges). It is the same as indexing a number in a relational database as a String and then doing LIKE queries instead of real numeric comparisons - just wrong and slow. Uwe
Thank you everybody for the time to respond. 2015-02-10 9:55 GMT+09:00 Gergely Nagy foge...@gmail.com: Thank you Barry, I really appreciate your time to respond. Let me clarify this a little bit more; I think it was not clear. I know how to parse dates, that is not the question here. (See my previous email: how can I pipe my converter logic into the indexing process?) All of your solutions would work fine if I wanted to index per document. Which I do NOT want to do. What I would like to do is index per log line. I need to do a full-text search, but with the additional requirement to filter those search hits by a DateTime range. I hope this makes it clearer. So any suggestions how to do that? Sidenote: I saw that Alfresco implemented this analyzer, called DateTimeAnalyzer, but Alfresco is not open source. So I was wondering how to implement the same. Actually, after wondering for 2 days, I became convinced that writing an Analyzer should be the way to go. I will post my solution later if I have working code.
2015-02-10 8:50 GMT+09:00 Barry Coughlan b.coughl...@gmail.com: Hi Gergely, Writing an analyzer would work but it is unnecessarily complicated. You could just parse the date from the string in your input code and index it in a LongField like this: SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S'Z'"); format.setTimeZone(TimeZone.getTimeZone("UTC")); long t = format.parse("2015-02-08 00:02:06.123Z INFO...").getTime(); Barry
On Tue, Feb 10, 2015 at 12:21 AM, Gergely Nagy foge...@gmail.com wrote: Thank you for taking your time to respond, Karthik. Can you show me an example how to convert a DateTime to milliseconds? I mean, how can I pipe my converter logic into the indexing process? I suspect I need to write my own Analyzer/Tokenizer to achieve this. Is this correct?
2015-02-09 22:58 GMT+09:00 KARTHIK SHIVAKUMAR nskarthi...@gmail.com: Hi, A long time ago I used to store datetime in milliseconds. TermRangeQuery used to work in perfect condition. Convert all datetime to milliseconds and index.
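The per-line workflow described above can be sketched roughly as follows, assuming Lucene 4.x, an IndexWriter opened elsewhere, and the log line's timestamp sitting at the start of each line in Barry's format (real extraction would use regexes for more fields):

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

public class LogIndexer {
    // One Lucene Document per log line: a numeric timestamp field plus the stored line
    static void indexLog(IndexWriter writer, String path) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try (BufferedReader r = Files.newBufferedReader(Paths.get(path))) {
            String line;
            while ((line = r.readLine()) != null) {
                Document doc = new Document();
                // numeric field -> fast NumericRangeQuery filtering by date
                doc.add(new LongField("timestamp", fmt.parse(line).getTime(), Field.Store.NO));
                // whole line, analyzed and stored, for full-text search and display
                doc.add(new TextField("line", line, Field.Store.YES));
                writer.addDocument(doc);
            }
        }
    }
}
```

At query time you would combine a full-text query on "line" with a NumericRangeQuery.newLongRange on "timestamp".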
Re: Lucene search in attachments
Hi Uwe, Thank you for the info update. I will remove the limit in Tika and check. So, my understanding is: currently Lucene doesn't have any restriction on the number of terms per field, but when a term is greater than 2^15 bytes it is silently ignored at indexing time – a message is logged to infoStream if enabled, but no error is thrown. Is that right? Best Regards, Sreedevi S
re-mapping lucene index
We use the MMapDirectory impl. in our search application. Occasionally we need to do a full reindex by dropping the entire directory contents. How does re-mapping work with MMapDirectory when the directory contents are going to be replaced with new ones? Is this going to be seamless, or is an application restart required? Additional info: we use SearcherManager to acquire searchers and we periodically refresh searchers.
Re: re-mapping lucene index
Searching and indexing apps run in different JVMs. We use Lucene 4.7 with the default open mode. For full indexing, we use java.io.File.delete() to recursively delete the index directory contents. Will remapping cause any issues in this case if I don't use the options you suggested?
Re: re-mapping lucene index
Just open a new IndexWriter with OpenMode.CREATE. It will replace the index. Or, if you already have an IW open, use deleteAll. Mike McCandless http://blog.mikemccandless.com
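That suggestion looks roughly like this against the 4.7 API (the index path and analyzer choice are illustrative):

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Reindex {
    // OpenMode.CREATE makes the writer replace any existing index at this path,
    // transactionally, instead of the app rm -rf'ing the directory itself
    static IndexWriter openFresh(String dir) throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_47,
                new StandardAnalyzer(Version.LUCENE_47));
        cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        return new IndexWriter(FSDirectory.open(new File(dir)), cfg);
    }
}
```

Readers already open on the old generation keep working until they are closed and the SearcherManager refreshes onto the new one.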
Re: re-mapping lucene index
It's fine if writer and reader are in separate JVMs. You really should not rm -rf the index yourself. It's better to let Lucene do it, e.g. it's transactional at that point, so that if your new IndexWriter (that deleted all docs) crashes before it could commit, the old index is still intact. It also ensures file names won't be reused, which is important on Windows if you still have readers open on the index. Regardless of which approach you use, the old mappings will remain alive until you've closed all open readers against the old index. Mike McCandless http://blog.mikemccandless.com
RE: re-mapping lucene index
Hi, In Linux/Solaris/BSD/... operating systems you can delete files while they are open (or mmapped, it does not matter). The inode/file on disk stays alive until everything is closed (delete-on-last-close semantics); it just disappears from the directory listing, so you cannot open new handles to the file. This means: if there are still index readers open, deleting the underlying directory and/or its files has no effect on the IndexReader - you can still search it (until you close it). But in any case, don't do this! Just let IndexWriter clean up by explicitly creating a new index. Uwe
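The delete-on-last-close behavior is easy to demonstrate with plain JDK I/O on a POSIX system, no Lucene involved: a stream opened before the unlink keeps reading after the file's directory entry is gone.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteOnLastClose {
    // Opens a handle, unlinks the file, and reads through the old handle anyway
    static String readAfterDelete() throws IOException {
        Path p = Files.createTempFile("seg", ".bin");
        Files.write(p, "still here".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = Files.newInputStream(p)) {
            Files.delete(p); // name is gone from the directory listing...
            // ...but the inode lives on until the last handle is closed
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterDelete());
    }
}
```

This is exactly why an open IndexReader survives the deletion of its segment files on these systems; on Windows the unlink would fail instead.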
Re: Re: combine to MultiTermQuery with OR
Yep, that looks good to me.

--
Ian.

On Tue, Feb 10, 2015 at 5:01 PM, Sascha Janz sascha.j...@gmx.net wrote:

Hm, I already thought this could be the solution, but I didn't know how to do the OR operation. So I tried this:

BooleanQuery bquery = new BooleanQuery();
bquery.add(queryFieldA, BooleanClause.Occur.SHOULD);
bquery.add(queryFieldB, BooleanClause.Occur.SHOULD);

Is this the correct way?

Sent: Tuesday, February 10, 2015, 5:31 PM
From: Ian Lea ian@gmail.com
To: java-user@lucene.apache.org
Subject: Re: combine to MultiTermQuery with OR

org.apache.lucene.search.BooleanQuery.

--
Ian.

On Tue, Feb 10, 2015 at 3:28 PM, Sascha Janz sascha.j...@gmx.net wrote:

Hi, I want to combine two MultiTermQueries. One searches over FieldA, one over FieldB. Both queries should be combined with the OR operator. So in Lucene syntax I want to search FieldA:Term1 OR FieldB:Term1, FieldA:Term2 OR FieldB:Term2, FieldA:Term3 OR FieldB:Term3... How can I do this?

Greetings,
Sascha
Re: combine to MultiTermQuery with OR
Hi Sascha,

You can do it with a BooleanQuery: take your queries and OR them together with BooleanClause.Occur.SHOULD.

-Nitin

On Tuesday 10 February 2015 08:58 PM, Sascha Janz wrote:

Hi, I want to combine two MultiTermQueries. One searches over FieldA, one over FieldB. Both queries should be combined with the OR operator. So in Lucene syntax I want to search FieldA:Term1 OR FieldB:Term1, FieldA:Term2 OR FieldB:Term2, FieldA:Term3 OR FieldB:Term3... How can I do this?

Greetings,
Sascha
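Putting the thread's answer into a self-contained sketch: two MultiTermQueries (WildcardQuery is one subclass) over two fields, combined with SHOULD clauses. The field names, terms, and in-memory directory are illustrative, and it uses the pre-5.0 BooleanQuery API that matches the code quoted above:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class OrTwoQueries {
    public static int search() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(
                Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47)));
        Document d = new Document();
        d.add(new TextField("FieldA", "term1", Store.NO));  // only FieldA matches
        w.addDocument(d);
        w.close();

        // SHOULD + SHOULD = logical OR: a document matches if either clause matches.
        BooleanQuery or = new BooleanQuery();
        or.add(new WildcardQuery(new Term("FieldA", "term*")), Occur.SHOULD);
        or.add(new WildcardQuery(new Term("FieldB", "term*")), Occur.SHOULD);

        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
        return searcher.search(or, 10).totalHits;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(search());  // the document is found via the FieldA clause
    }
}
```

To repeat this for Term2, Term3, ... you would build one such SHOULD pair per term and combine the pairs as needed (each pair with Occur.MUST in an outer BooleanQuery for the comma-separated list in the question, if that comma means AND).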
search on a field by a single word
Hi folks,

I have a question as follows: suppose there are 3 documents with these values in field name:

1) a b c
2) a b
3) a

I want to retrieve doc 3) only. I tried syntax like name:a, but that is not correct: it matches all three documents. Is there any way to solve this? Please help me! Thanks ahead!
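One common approach (my suggestion, not from the thread) is to index a second, unanalyzed copy of the field with StringField. The whole value then becomes a single term, so a TermQuery for "a" on that field matches only the document whose entire value is "a". The field name name_exact is made up for this sketch:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class ExactFieldMatch {
    public static int search() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(
                Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47)));
        for (String value : new String[] { "a b c", "a b", "a" }) {
            Document d = new Document();
            d.add(new TextField("name", value, Store.YES));         // analyzed, for normal search
            d.add(new StringField("name_exact", value, Store.NO));  // whole value as one term
            w.addDocument(d);
        }
        w.close();
        IndexSearcher s = new IndexSearcher(DirectoryReader.open(dir));
        // Matches only the document whose entire name is exactly "a".
        return s.search(new TermQuery(new Term("name_exact", "a")), 10).totalHits;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(search());  // prints 1
    }
}
```

Note that StringField does not lowercase or tokenize, so the query term must match the stored value exactly, character for character.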
Request to be added to the ContributorsGroup
Dear Lucene Team, Please add me to the contributorsGroup so that I can add IntraCherche which is actually based on Lucene. Kind regards,
BulkScorer and .explain() compute scores separately?
I have subclassed BooleanQuery and changed the BooleanWeight constructor to change the way the coord and idf components of the similarity formula are computed, and my changes work as expected when calling IndexSearcher.explain(). However, I now find that when just calling IndexSearcher.search(), the scores reported for each document, and the resulting ranking, are quite different from what .explain() shows me. What is going on? Clearly scores are computed somewhere else when done by BulkScorer, and not in BooleanQuery.BooleanWeight(). I have been looking at the code, but it's mighty confusing and I still haven't figured out how to make the same changes on this pipeline. Please help!

Thanks!
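I can't reproduce the poster's subclass here, but a common alternative (an assumption on my part, not the poster's code) is to override coord and idf on a custom Similarity set on the IndexSearcher: that hook is consulted by both the bulk-scoring path and explain(), so the two stay consistent. A sketch against the Lucene 4.x API, with made-up override values:

```java
import org.apache.lucene.search.similarities.DefaultSimilarity;

// Hypothetical custom similarity: flatten idf and disable the coord factor.
public class FlatSimilarity extends DefaultSimilarity {
    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f;  // ignore document frequency entirely
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        return 1.0f;  // do not reward matching more optional clauses
    }
}
// Usage: searcher.setSimilarity(new FlatSimilarity());
// Both search() and explain() on that searcher then use the same formula.
```

The design point: BooleanWeight reads coord/idf from the searcher's Similarity at scoring time, so customizing the Similarity avoids touching the query/weight/scorer classes at all.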