Some questions about index...
Hi all,

for simplicity I would like to use the index as my data storage while taking advantage of the highly optimised Lucene index structure.

1) Can I store all the information of the text file, but also apply an analyser? E.g. I use the StopAnalyzer. After finding the document, I want to extract the original text from the index as well. Does this require that I store the information twice in two different fields (one indexed and one unindexed)?

2) I would like to extract information from the index in a boolean way. I know that Lucene is a VSM (vector space model) which provides Boolean operators. This, however, does not change how it works. For example, I have a field which contains an ID number and I want to use the search like a database operation (e.g. to find the document with id=1). I can solve the problem by searching with the query id:1. However, this does not ensure that I will get only one result. Usually the first result is the document I want, but it could happen that this sometimes does not work. What happens if there should be no results? I guess if I search for id=5 and 5 did not exist, I would probably get 50, 51, ... just because they contain 5. Has anybody worked with this and can suggest a stable solution?

A good solution for these two questions would help me avoid a database which would need to replicate most of the data I already have in my Lucene index...

Kind Regards,
Karl

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Some questions about index...
On Feb 5, 2005, at 10:04 AM, Karl Koch wrote:

> 1) Can I store all the information of the text file, but also apply an analyser? E.g. I use the StopAnalyzer. After finding the document, I want to extract the original text from the index as well. Does this require that I store the information twice in two different fields (one indexed and one unindexed)?

You should use a single stored, tokenized, and indexed field for this purpose. Be cautious of how you construct the Field object to achieve this.

> 2) I would like to extract information from the index in a boolean way. I know that Lucene is a VSM which provides Boolean operators. This, however, does not change how it works. For example, I have a field which contains an ID number and I want to use the search like a database operation (e.g. to find the document with id=1). I can solve the problem by searching with the query id:1. However, this does not ensure that I will get only one result. Usually the first result is the document I want, but it could happen that this sometimes does not work.

Why wouldn't it work? For ID-type fields, use a Field.Keyword (stored, indexed, but not tokenized). Search for a specific ID using a TermQuery (don't use QueryParser for this, please). If the ID values are unique, you'll get either zero or one result.

> What happens if there should be no results? I guess if I search for id=5 and 5 did not exist, I would probably get 50, 51, ... just because they contain 5. Has anybody worked with this and can suggest a stable solution?

No, this would not be the case, unless you're analyzing the ID field with some strange character-by-character analyzer or doing a wildcard *5* type query.

> A good solution for these two questions would help me avoid a database which would need to replicate most of the data I already have in my Lucene index...

You're on the right track, and avoiding a database when it is overkill or duplicative is commendable :)

Erik
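For readers following along, the difference between a tokenized text field and an untokenized ID field can be sketched without Lucene itself. The following is only a toy model under stated assumptions, not Lucene code: `ToyIndex`, `findById` and `findByWord` are hypothetical names, and the stop-word list is made up for the example. It keeps the original text once (the "stored" part), indexes the text word by word with stop words dropped, and indexes the ID as one whole term, which is why a lookup for 5 cannot return 50 or 51.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for an index; every name here is hypothetical, none of it is Lucene API.
class ToyIndex {
    // Assumed stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "is"));

    private final List<String> storedText = new ArrayList<>();           // original text, kept verbatim
    private final Map<String, Set<Integer>> idTerms = new HashMap<>();   // untokenized: whole ID is one term
    private final Map<String, Set<Integer>> textTerms = new HashMap<>(); // tokenized, stop words dropped

    void add(String id, String text) {
        int doc = storedText.size();
        storedText.add(text);
        idTerms.computeIfAbsent(id, k -> new TreeSet<>()).add(doc);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                textTerms.computeIfAbsent(token, k -> new TreeSet<>()).add(doc);
            }
        }
    }

    // Exact term match on the ID: "5" matches only a document whose ID is exactly "5".
    List<String> findById(String id) {
        return lookup(idTerms, id);
    }

    // Exact term match on one analyzed word of the text.
    List<String> findByWord(String word) {
        return lookup(textTerms, word.toLowerCase(Locale.ROOT));
    }

    private List<String> lookup(Map<String, Set<Integer>> postings, String term) {
        List<String> hits = new ArrayList<>();
        for (int doc : postings.getOrDefault(term, Collections.emptySet())) {
            hits.add(storedText.get(doc)); // return the stored original, not the tokens
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("50", "The history of Lucene");
        index.add("51", "An index is a data structure");
        System.out.println(index.findById("5"));        // prints []
        System.out.println(index.findByWord("lucene")); // prints [The history of Lucene]
    }
}
```

The sketch stores the text only once, which mirrors the "single stored, tokenized, and indexed field" advice; whether the same holds for a particular Lucene version depends on how the Field is constructed.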
ParallelMultiSearcher and many RemoteSearchers
Hello, lucene-user.

Does anyone know whether ParallelMultiSearcher combined with many RemoteSearchers is a good way to get fast search over an index distributed across many servers? For example, I have 5 servers with a 50 GB index on each. The indexes are updated interactively. On a 6th server I want to run a ParallelMultiSearcher that connects to the other 5 servers through RemoteSearcher. Is it okay to go with the RMI-based RemoteSearcher class in this case? I am concerned about the response time and speed of the system...

Yura Smolsky
Re: Some questions about index...
Thank you for the fast, straightforward and useful comments.

Keeping in mind what was said, has anybody actually thought about implementing a kind of database layer on top of a Lucene index? A database would be an index, columns would be fields and entries would be documents. At least everything that requires only a single table could be done. A SELECT would be a search ... :-)

Karl
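To make the proposed mapping concrete, here is a rough sketch of what such a single-table layer might look like. It is purely illustrative and uses no Lucene classes: `ToyTable`, `insert` and `select` are hypothetical names, rows stand in for documents, and each column is treated as an untokenized field matched by exact value. Anything beyond equality on one column (joins, ranges, updates) is outside what this sketch covers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-table layer: rows play the role of documents,
// columns the role of untokenized fields. Not Lucene API.
class ToyTable {
    private final List<Map<String, String>> rows = new ArrayList<>();
    // column name -> exact value -> positions of the rows holding that value
    private final Map<String, Map<String, List<Integer>>> index = new HashMap<>();

    void insert(Map<String, String> row) {
        int pos = rows.size();
        rows.add(new HashMap<>(row)); // keep a copy, like a stored document
        for (Map.Entry<String, String> cell : row.entrySet()) {
            index.computeIfAbsent(cell.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(cell.getValue(), k -> new ArrayList<>())
                 .add(pos);
        }
    }

    // Roughly: SELECT * FROM table WHERE column = value
    List<Map<String, String>> select(String column, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        Map<String, List<Integer>> byValue = index.get(column);
        if (byValue == null) {
            return hits; // unknown column: no rows
        }
        List<Integer> positions = byValue.get(value);
        if (positions == null) {
            return hits; // no row has this exact value
        }
        for (int pos : positions) {
            hits.add(rows.get(pos));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        Map<String, String> row = new HashMap<>();
        row.put("id", "1");
        row.put("author", "Karl");
        table.insert(row);
        System.out.println(table.select("id", "1").size()); // prints 1
        System.out.println(table.select("id", "5").size()); // prints 0
    }
}
```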