Re: indexing Chinese language
We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after reindexing the index size went from 1.5 GB to 2.7 GB. Is that expected behavior? Is there any switch or trick to avoid the index files more than doubling in size?

Koji Sekiguchi-2 wrote:
CharFilter can normalize (convert) traditional Chinese to simplified Chinese or vice versa, if you define mapping.txt. Here is a sample of Chinese character normalization: https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
See SOLR-822 for the details: https://issues.apache.org/jira/browse/SOLR-822
Koji

revathy arun wrote:
Hi,
When I index Chinese content using the Chinese tokenizer and analyzer in Solr 1.3, some of the Chinese text files are getting indexed but others are not. Since Chinese has many different language subtypes, such as standard Chinese, simplified Chinese, etc., which of these does the Chinese tokenizer support, and is there any method to find the type of Chinese language from the file?
Rgds
Re: Which caches should use the solr.FastLRUCache
FastLRUCache is designed to be lock-free, so it is well suited for caches which are hit several times in a request. I guess there is no harm in using FastLRUCache across all the caches.

On Thu, Jun 4, 2009 at 3:22 AM, Robert Purdy rdpu...@gmail.com wrote:
Hey there, does anyone have any advice on which caches (filterCache, queryResultCache, documentCache, fieldValueCache) should be implemented using the solr.FastLRUCache in Solr 1.4, and what are the pros and cons vs. the solr.LRUCache?
Thanks Robert.

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Token filter on multivalue field
Isn't it better to use an UpdateProcessor for this?

On Thu, Jun 4, 2009 at 1:52 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
Hello,
It's ugly, but the first thing that came to mind was ThreadLocal.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: David Giffin da...@giffin.org
To: solr-user@lucene.apache.org
Sent: Wednesday, June 3, 2009 1:57:42 PM
Subject: Token filter on multivalue field

Hi There,
I'm working on a unique token filter to eliminate duplicates on a multivalued field. My filter works properly for a single-value field, but it seems that a new TokenFilter is created for each value in the multivalued field. I need to maintain an array of used tokens across all of the values in the multivalued field. Is there a good way to do this? Here is my current code:

    import java.io.IOException;
    import java.util.ArrayList;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class UniqueTokenFilter extends TokenFilter {
        // terms already emitted; this state is per-filter-instance,
        // which is why it resets for each value of a multivalued field
        private ArrayList<String> words;

        public UniqueTokenFilter(TokenStream input) {
            super(input);
            this.words = new ArrayList<String>();
        }

        @Override
        public final Token next(Token in) throws IOException {
            for (Token token = input.next(in); token != null; token = input.next(in)) {
                if (!words.contains(token.term())) {
                    words.add(token.term());
                    return token;
                }
            }
            return null;
        }
    }

Thanks,
David

--
Noble Paul | Principal Engineer | AOL | http://aol.com
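Following up on Noble's suggestion: below is a minimal sketch of what an UpdateRequestProcessor for this could look like, assuming the Solr 1.3 processor API; the class name and the "keywords" field are made up for illustration. Note it de-duplicates whole field values before analysis runs, which is often what is actually wanted for a multivalued field:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.LinkedHashSet;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class DedupValuesProcessor extends UpdateRequestProcessor {
        public DedupValuesProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            // "keywords" is a hypothetical multivalued field name
            Collection<Object> values = doc.getFieldValues("keywords");
            if (values != null) {
                // LinkedHashSet drops duplicates while keeping original order
                doc.setField("keywords", new ArrayList<Object>(new LinkedHashSet<Object>(values)));
            }
            super.processAdd(cmd);
        }
    }

A factory wrapping this (an UpdateRequestProcessorFactory returning it from getInstance) would then be registered in an updateRequestProcessorChain in solrconfig.xml.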
RE: Strange behaviour with copyField
What is the defaultOperator set in your schema.xml? Are you sure that it matches for au and not author?

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Thursday, June 04, 2009 2:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behaviour with copyField

On Jun 3, 2009, at 5:09 AM, James Grant wrote:

I've been hitting my head against a wall all morning trying to figure this out and haven't managed to get anywhere, and I wondered if anybody here can help. I have defined a field type

    <fieldType name="text_au" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      </analyzer>
    </fieldType>

I have two fields

    <field name="au" type="text_au" indexed="true" stored="true" required="false" multiValued="true"/>
    <field name="author" type="text_au" indexed="true" stored="false" multiValued="true"/>

I don't see the difference, as they are the same FieldType for each field, text_au. Is this a typo or am I missing something?

and a copyField line

    <copyField source="au" dest="author"/>

The idea is to allow searching for authors, so a search for author:(Hobbs A.U.) will match the au field value "Hobbs A. U." (notice the space).

What would lower casing do for handling the space?

However the query au:(Hobbs A.U.) matches and the query author:(Hobbs A.U.) does not. Any ideas?

How are you indexing?

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Field Compression
Is it correct to assume that using field compression will cause performance issues if we decide to allow search over this field? I.e.:

    <field name="id" type="sint" indexed="true" stored="true" required="true"/>
    <field name="title" type="text" indexed="true" stored="true" omitNorms="true"/>
    <field name="file_location" type="string" indexed="false" stored="true"/>
    <field name="body" type="text" indexed="true" stored="false" omitNorms="true"/>

If I decide to add compressed="true" to the body field, and I allow search on body, would that be a problem? And conversely: what if I add compressed="true" but never search on this field?

Stu Hood-3 wrote:
I just finished watching this talk about a column-store RDBMS, which has a long section on column compression. Specifically, it talks about the gains from compressing similar data together, and how lazily decompressing data only when it must be processed is great for memory/CPU cache usage. http://youtube.com/watch?v=yrLd-3lnZ58
While interesting, it's not relevant to Lucene's stored field storage. On the other hand, it did get me thinking about stored field compression and lazy field loading. Can anyone give me some pointers about compressThreshold values that would be worth experimenting with? Our stored fields are often between 20 and 300 characters, and we're willing to spend more time indexing if it will make searching less IO-bound.
Thanks,
Stu Hood
Architecture Software Developer
Mailtrust, a Rackspace Company
Re: spell checking
Yao Ge wrote:
Maybe we should call this "alternative search terms" or "suggested search terms" instead of spell checking. It is misleading, as there is no right or wrong in spelling; there are only popular (term frequency?) alternatives.

I had exactly the same difficulty in understanding the concept, because of the name given to the feature, which usually denotes just what it says, i.e. a spellchecker driven by an authoritative dictionary and a set of rules, as integrated in word processors, in order to ensure orthography. What we have here is quite different from a spellchecker. IMHO, a name conveying the actual meaning, along the lines of "suggest", would make more sense.

Michael Ludwig
HashDocSet's maxSize and loadFactor
Hey there,
I am trying to optimize the setup of HashDocSet. I have read the documentation here:
http://wiki.apache.org/solr/SolrPerformanceFactors#head-2de2e9a6f806ab8a3afbd73f1d99ece48e27b3ab
but can't exactly understand it. Does it mean that maxSize should be 0.005 x the number of docs in my index, or that maxSize should be approximately the same as the number of docs in my index? And what's the loadFactor?
Thanks in advance
Re: indexing Chinese language
Hmmm, are you quite sure that you emptied the index first and didn't just add all the documents a second time to the index? Also, when you say the index almost doubled, were you looking only at the size of the *directory*? Solr might have been holding a copy of the old index open while you built a new one...

Best
Erick

On Thu, Jun 4, 2009 at 2:20 AM, Fer-Bj fernando.b...@gmail.com wrote:
We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after reindexing the index size went from 1.5 GB to 2.7 GB. Is that expected behavior? Is there any switch or trick to avoid the index files more than doubling in size?

Koji Sekiguchi-2 wrote:
CharFilter can normalize (convert) traditional Chinese to simplified Chinese or vice versa, if you define mapping.txt. Here is a sample of Chinese character normalization: https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
See SOLR-822 for the details: https://issues.apache.org/jira/browse/SOLR-822
Koji

revathy arun wrote:
Hi,
When I index Chinese content using the Chinese tokenizer and analyzer in Solr 1.3, some of the Chinese text files are getting indexed but others are not. Since Chinese has many different language subtypes, such as standard Chinese, simplified Chinese, etc., which of these does the Chinese tokenizer support, and is there any method to find the type of Chinese language from the file?
Rgds
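If old segments are the cause, an explicit commit or optimize should let Lucene delete the superseded files. A quick way to check, assuming the stock XML update handler on the default port:

    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<optimize/>'

If the directory shrinks back to roughly the original size afterwards, the extra space was just the old index generation still on disk.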
Re: Field Compression
Warning: This is from a Lucene perspective. I don't think it matters. I'm pretty sure that COMPRESS only applies to *storing* the data, not putting the tokens in the index (this latter is what's searched)... It *will* cause performance issues if you load that field for a large number of documents on a particular search. I know Lucene itself has lazy field loading that helps in this case, but I don't know how to persuade Solr to use it (it may even lazy-load automatically). But this is separate from searching...

Best
er...@nottoomuchhelpbutimtrying

On Thu, Jun 4, 2009 at 4:07 AM, Fer-Bj fernando.b...@gmail.com wrote:
Is it correct to assume that using field compression will cause performance issues if we decide to allow search over this field? I.e.:

    <field name="id" type="sint" indexed="true" stored="true" required="true"/>
    <field name="title" type="text" indexed="true" stored="true" omitNorms="true"/>
    <field name="file_location" type="string" indexed="false" stored="true"/>
    <field name="body" type="text" indexed="true" stored="false" omitNorms="true"/>

If I decide to add compressed="true" to the body field, and I allow search on body, would that be a problem? And conversely: what if I add compressed="true" but never search on this field?

Stu Hood-3 wrote:
I just finished watching this talk about a column-store RDBMS, which has a long section on column compression. Specifically, it talks about the gains from compressing similar data together, and how lazily decompressing data only when it must be processed is great for memory/CPU cache usage. http://youtube.com/watch?v=yrLd-3lnZ58
While interesting, it's not relevant to Lucene's stored field storage. On the other hand, it did get me thinking about stored field compression and lazy field loading. Can anyone give me some pointers about compressThreshold values that would be worth experimenting with? Our stored fields are often between 20 and 300 characters, and we're willing to spend more time indexing if it will make searching less IO-bound.
Thanks,
Stu Hood
Architecture Software Developer
Mailtrust, a Rackspace Company
SpellCheckComponent: queryAnalyzerFieldType
Shalin Shekhar Mangar wrote:
| If you use spellcheck.q parameter for specifying
| the spelling query, then the field's analyzer will
| be used [...] If you use the q parameter, then the
| SpellingQueryConverter is used.

http://markmail.org/message/k35r7qmpatjvllsc - message
http://markmail.org/thread/gypvpfnsd5sggkpx - whole thread

Is it correct to say that when I intend to always use the spellcheck.q parameter, I do not need to specify a queryAnalyzerFieldType in my spellcheck searchComponent, which I define in solrconfig.xml?

Given the limitations of the SpellingQueryConverter laid out in the thread referred to above, it seems you want to use the spellcheck.q parameter for anything that cannot be encoded in ASCII. Is that true?

Michael Ludwig
Re: spell checking
"Query suggest."

--wunder

On 6/4/09 1:25 AM, Michael Ludwig m...@as-guides.com wrote:
Yao Ge wrote:
Maybe we should call this "alternative search terms" or "suggested search terms" instead of spell checking. It is misleading, as there is no right or wrong in spelling; there are only popular (term frequency?) alternatives.

I had exactly the same difficulty in understanding the concept, because of the name given to the feature, which usually denotes just what it says, i.e. a spellchecker driven by an authoritative dictionary and a set of rules, as integrated in word processors, in order to ensure orthography. What we have here is quite different from a spellchecker. IMHO, a name conveying the actual meaning, along the lines of "suggest", would make more sense.

Michael Ludwig
Re: Questions regarding IT search solution
Hi,

Any help/pointers on the following message would really help me..

Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:
From: Silent Surfer silentsurfe...@yahoo.com
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,
I am new to the Lucene forum and this is my first question. I need a clarification from you.

Requirement:
1. Build an IT search tool for logs similar to Splunk (only with respect to searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log files are mainly server logs like JBoss and custom application server logs (may or may not be log4j logs), and file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search almost in real time.
4. Support distributed search.

Our search criteria can be based on a keyword, timestamp, IP address, etc. Can anyone throw some light on whether Solr/Lucene is the right solution for this? Appreciate any quick help in this regard.

Thanks,
Surfer
Re: Which caches should use the solr.FastLRUCache
2009/6/4 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
FastLRUCache is designed to be lock-free, so it is well suited for caches which are hit several times in a request. I guess there is no harm in using FastLRUCache across all the caches.

Gets are cheaper, but evictions are more expensive. If the cache hit rate is low, the old synchronized cache may be faster, unless you have a ton of CPUs... not sure where the crossover point is though.

-Yonik
http://www.lucidimagination.com
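In solrconfig.xml the choice is just the class attribute on each cache element; a sketch with the stock attributes, split along the hit-ratio lines Yonik describes (sizes illustrative):

    <!-- high hit ratio, read-mostly: FastLRUCache avoids lock contention on gets -->
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
    <!-- low hit ratio / high churn: the synchronized LRUCache may still win -->
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>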
Re: Questions regarding IT search solution
Why build one? Don't those already exist? Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed not to scale. People were already trying different approaches ten years ago.

wunder

On 6/4/09 8:41 AM, Silent Surfer silentsurfe...@yahoo.com wrote:
Hi,
Any help/pointers on the following message would really help me..
Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:
From: Silent Surfer silentsurfe...@yahoo.com
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,
I am new to the Lucene forum and this is my first question. I need a clarification from you.

Requirement:
1. Build an IT search tool for logs similar to Splunk (only with respect to searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log files are mainly server logs like JBoss and custom application server logs (may or may not be log4j logs), and file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search almost in real time.
4. Support distributed search.

Our search criteria can be based on a keyword, timestamp, IP address, etc. Can anyone throw some light on whether Solr/Lucene is the right solution for this? Appreciate any quick help in this regard.

Thanks,
Surfer
Faceting on text fields
I am indexing a database with over 1 million rows. Two of the fields contain unstructured text, but the size of each field is limited (256 characters).

I came up with the idea of visualizing the text fields as a text cloud by turning the two text fields into facets. The font weight and size of each facet value (word) are derived from the facet counts. I used a simpler field type so that there is no stemming of these facet values:

    <fieldType name="word" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

The facet query is considerably slower compared to other facets on structured database fields (with highly repeated values). What I found interesting is that even after I constrained the search results to just a few hundred hits using other facets, these text facets are still very slow.

I understand that text fields are not good candidates for faceting, as they can contain a very large number of unique values. But why is it still slow after the matching documents are reduced to hundreds? Is it because the whole filter is cached (regardless of the matching docs) and I don't have enough filter cache size to fit the whole list? The following is my filterCache setting:

    <filterCache class="solr.LRUCache" size="5120" initialSize="512" autowarmCount="128"/>

Lastly, what I really want is to give users a chance to visualize and filter on the top relevant words in the free-text fields. Are there alternatives to the facet field approach? Term vectors? I could do client-side processing based on the top N (say 100) hits, but that is my last option.
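For reference, the kind of request involved looks something like the following; "comments" stands in for one of the text fields above, and facet.limit/facet.mincount are standard facet parameters that keep the returned cloud manageable:

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=comments&facet.limit=100&facet.mincount=2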
statistics about word distances in solr
Hi,

I was wondering if there's an option to return statistics about distances from the query terms to the most frequent terms in the result documents. At present I return the most frequent terms using facet search, which returns, for each word in the result documents, the number of occurrences (within the results). The additional information I'm looking for is the average distance between these terms and my search term.

So let's say I have two docs:
"the house is red"
"I live in a red house"

The search for "house" should also return the info: the:1 is:1 red:1.5 I:5 live:4 and so on...

As I wasn't able to find such a function, I thought about two solutions to the problem:
1) Use facet search and implement a different facet.method which calculates the average distance of a word to the given search term. Solr doesn't seem to provide an interface for implementing a different method, so I think this solution would be a bit dodgy and would lead to problems with the next Solr upgrade.
2) Use the TermVectorComponent, which returns the position of each word within a document; I could calculate the distance based on this data in the application. But the TermVectorComponent returns information per document, which means I would need to return all documents of the result set, which is probably not recommended.

My questions are:
a) Did I miss a function of Solr that already does what I'm looking for?
b) Is solution 2) feasible even if I always have to return all docs of the result set (the content doesn't need to be returned, just the statistics)?
c) Are there interfaces for amending facet search the way I described which I might have missed?

Thanks
Jens
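For option 2, term vectors have to be stored at index time before the TermVectorComponent (in Solr 1.4) can return them; a sketch, where "content" is a hypothetical field name:

    <!-- schema.xml: store positions alongside the term vectors -->
    <field name="content" type="text" indexed="true" stored="true" termVectors="true" termPositions="true"/>

A request with tv=true&tv.positions=true against a handler that includes the component would then return per-document term positions, from which the application can compute the distance averages.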
Re: Field Compression
On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:
It *will* cause performance issues if you load that field for a large number of documents on a particular search. I know Lucene itself has lazy field loading that helps in this case, but I don't know how to persuade Solr to use it (it may even lazy-load automatically). But this is separate from searching...

Lazy loading is an option configured in solrconfig.xml.
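Specifically, it's a one-line switch in the <query> section of solrconfig.xml:

    <enableLazyFieldLoading>true</enableLazyFieldLoading>

With this enabled, stored fields not requested via the fl parameter are only loaded when actually accessed.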
Re: SpellCheckComponent: queryAnalyzerFieldType
On Thu, Jun 4, 2009 at 7:24 PM, Michael Ludwig m...@as-guides.com wrote:
Shalin Shekhar Mangar wrote:
| If you use spellcheck.q parameter for specifying
| the spelling query, then the field's analyzer will
| be used [...] If you use the q parameter, then the
| SpellingQueryConverter is used.
http://markmail.org/message/k35r7qmpatjvllsc - message
http://markmail.org/thread/gypvpfnsd5sggkpx - whole thread
Is it correct to say that when I intend to always use the spellcheck.q parameter, I do not need to specify a queryAnalyzerFieldType in my spellcheck searchComponent, which I define in solrconfig.xml?

Yes, that is correct. Even if a queryAnalyzerFieldType is not specified and your query uses q, then WhitespaceTokenizer is used by default.

Given the limitations of the SpellingQueryConverter laid out in the thread referred to above, it seems you want to use the spellcheck.q parameter for anything that cannot be encoded in ASCII. Is that true?

Umm, no actually. SpellingQueryConverter was written for a very simple use-case dealing with ASCII only. But there is no reason why we cannot extend it to cover the full UTF-8 set. I'm sorry I forgot to follow up on the old thread where you and Jonathan posted a regex that should work. Can you please open an issue and, if possible, give a patch?

--
Regards,
Shalin Shekhar Mangar.
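For reference, the component definition being discussed looks roughly like this; the field and dictionary names are illustrative:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <!-- only consulted when the check comes in via q; spellcheck.q uses the field's own analyzer -->
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="spellcheckIndexDir">./spellchecker</str>
      </lst>
    </searchComponent>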
Re: Faceting on text fields
Are you using Solr 1.3? You might want to try the latest 1.4 test build - faceting has changed a lot.

-Yonik
http://www.lucidimagination.com

On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge yao...@gmail.com wrote:
I am indexing a database with over 1 million rows. Two of the fields contain unstructured text, but the size of each field is limited (256 characters). I came up with the idea of visualizing the text fields as a text cloud by turning the two text fields into facets. The font weight and size of each facet value (word) are derived from the facet counts. I used a simpler field type so that there is no stemming of these facet values:

    <fieldType name="word" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

The facet query is considerably slower compared to other facets on structured database fields (with highly repeated values). What I found interesting is that even after I constrained the search results to just a few hundred hits using other facets, these text facets are still very slow. I understand that text fields are not good candidates for faceting, as they can contain a very large number of unique values. But why is it still slow after the matching documents are reduced to hundreds? Is it because the whole filter is cached (regardless of the matching docs) and I don't have enough filter cache size to fit the whole list? The following is my filterCache setting:

    <filterCache class="solr.LRUCache" size="5120" initialSize="512" autowarmCount="128"/>

Lastly, what I really want is to give users a chance to visualize and filter on the top relevant words in the free-text fields. Are there alternatives to the facet field approach? Term vectors? I could do client-side processing based on the top N (say 100) hits, but that is my last option.
Re: Which caches should use the solr.FastLRUCache
Thanks for the good information :)

Well, I haven't had any evictions in any of the caches, and the hit ratio is 0.51 in the queryResultCache, 0.77 in the documentCache, 1.00 in the fieldValueCache, and 0.99 in the filterCache. So in your opinion, should the documentCache and queryResultCache use the old implementation on a single-CPU quad-core machine?

Also, right now I have all caches using the solr.FastLRUCache (tried with both cleanupThread = false and true), and I have noticed some queries taking 53 ms on a freshly warmed new searcher (when nothing else is querying the slave), but when the slave is busy the same query, which should be using the caches, sometimes takes 8 seconds. Any thoughts?

Thanks Robert.

Yonik Seeley-2 wrote:
2009/6/4 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
FastLRUCache is designed to be lock-free, so it is well suited for caches which are hit several times in a request. I guess there is no harm in using FastLRUCache across all the caches.

Gets are cheaper, but evictions are more expensive. If the cache hit rate is low, the old synchronized cache may be faster, unless you have a ton of CPUs... not sure where the crossover point is though.
-Yonik
http://www.lucidimagination.com
Re: HashDocSet's maxSize and loadFactor
On Thu, Jun 4, 2009 at 7:52 AM, Marc Sturlese marc.sturl...@gmail.com wrote:
Hey there,
I am trying to optimize the setup of HashDocSet.

Be aware that in the latest versions of Solr 1.4, HashDocSet is no longer used by Solr.
https://issues.apache.org/jira/browse/SOLR-1169

I have read the documentation here:
http://wiki.apache.org/solr/SolrPerformanceFactors#head-2de2e9a6f806ab8a3afbd73f1d99ece48e27b3ab
but can't exactly understand it. Does it mean that maxSize should be 0.005 x the number of docs in my index, or that maxSize should be approximately the same as the number of docs in my index?

The former.

And... what's the loadFactor?

loadFactor: the size of the hash table compared to the number of elements stored.
http://en.wikipedia.org/wiki/Hash_table

-Yonik
http://www.lucidimagination.com
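For anyone staying on 1.3, the setting lives in solrconfig.xml; e.g. for a 600,000-document index, 0.005 x 600,000 gives:

    <!-- maxSize ~= 0.005 x number of documents in the index -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>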
Index Comma Separated numbers
Hi,
One of the fields to be indexed is price, which is comma-separated, e.g., 12,034.00. How can I index it as a number? I am using DIH to pull the data. Thanks.
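One option, assuming the data comes through the DataImportHandler: DIH ships a NumberFormatTransformer that parses locale-formatted numbers. A sketch, with the entity and column names made up:

    <entity name="item" query="select price from items" transformer="NumberFormatTransformer">
      <!-- parses "12,034.00" into a plain number using the default locale -->
      <field column="price" formatStyle="number"/>
    </entity>

Alternatively, DIH's RegexTransformer could simply strip the commas before the value reaches a numeric field type.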
Re: Faceting on text fields
Yes, I am using 1.3. When is 1.4 due for release?

Yonik Seeley-2 wrote:
Are you using Solr 1.3? You might want to try the latest 1.4 test build - faceting has changed a lot.
-Yonik
http://www.lucidimagination.com

On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge yao...@gmail.com wrote:
I am indexing a database with over 1 million rows. Two of the fields contain unstructured text, but the size of each field is limited (256 characters). I came up with the idea of visualizing the text fields as a text cloud by turning the two text fields into facets. The font weight and size of each facet value (word) are derived from the facet counts. I used a simpler field type so that there is no stemming of these facet values:

    <fieldType name="word" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

The facet query is considerably slower compared to other facets on structured database fields (with highly repeated values). What I found interesting is that even after I constrained the search results to just a few hundred hits using other facets, these text facets are still very slow. I understand that text fields are not good candidates for faceting, as they can contain a very large number of unique values. But why is it still slow after the matching documents are reduced to hundreds? Is it because the whole filter is cached (regardless of the matching docs) and I don't have enough filter cache size to fit the whole list? The following is my filterCache setting:

    <filterCache class="solr.LRUCache" size="5120" initialSize="512" autowarmCount="128"/>

Lastly, what I really want is to give users a chance to visualize and filter on the top relevant words in the free-text fields. Are there alternatives to the facet field approach? Term vectors? I could do client-side processing based on the top N (say 100) hits, but that is my last option.
How to disable posting updates from a remote server
Hi,
I find that I am freely able to post to my production Solr server from any other host that can run the post command. So somebody could wipe out the whole index by posting a delete query. Is there a way Solr can be configured so that it will take updates ONLY from the server on which it is running?
Thanks
- ashok
Re: How to disable posting updates from a remote server
Take a look at the security section in the wiki; you could do this with firewall rules or password access.

On Thursday, June 4, 2009, ashokc ash...@qualcomm.com wrote:
Hi,
I find that I am freely able to post to my production Solr server from any other host that can run the post command. So somebody could wipe out the whole index by posting a delete query. Is there a way Solr can be configured so that it will take updates ONLY from the server on which it is running?
Thanks
- ashok
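As a sketch of the firewall approach, assuming Solr on the default port 8983 (note this blocks all remote access, including queries; a servlet-container password constraint on just the /update path is the finer-grained alternative):

    # accept connections to Solr only from the local machine
    iptables -A INPUT -p tcp --dport 8983 -s 127.0.0.1 -j ACCEPT
    iptables -A INPUT -p tcp --dport 8983 -j DROP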
Re: Customizing results
Hello,

If you know what language the user specified (or is associated with), then you just have to ensure the fl URL parameter contains that field (and any other fields you want returned). So if the language/locale is de_de, then make sure the request has fl=location_de_de,another_field,another_field, and not, for example, location_it_it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Manepalli, Kalyan kalyan.manepa...@orbitz.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 4, 2009 12:36:30 PM
Subject: Customizing results

Hi,
I am trying to customize the response that I receive from Solr. In the index I have multiple fields that contain the same data in different languages. At query time the client specifies the language. Based on this param, I want to return the value, copied into a different field.

E.g.:
Lubang, Filippinerne
Lubang, Philippinen
Lubang, Philippines
Lubang, Filipinas

If the user specifies the language as de_de, then I want to return the result as "Lubang, Philippinen". What is the most optimal way of doing this? Any suggestions on this will be helpful.

Thanks,
Kalyan Manepalli
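As a concrete example of the fl approach (field names follow the location_<locale> pattern from the question):

    http://localhost:8983/solr/select?q=lubang&fl=id,location_de_de,score

The client still sees the field under its locale-specific name, which is the limitation raised in the follow-up later in this digest.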
Re: Questions regarding IT search solution
I would also be interested to know what other existing solutions exist. Splunk's advantage is that it does extraction of the fields with advanced searching functionality (it has lexers/parsers for multiple content types). I believe that's the function of Solr desired in the original posting. At the time they came out (2004), I was not aware of any good open source solutions to do what they did. And I would have loved one, as I was analyzing multi-gigabyte logs. Hadoop might be a way to process the files, but what would do the indexing and searching?

Regards,
Alex.

On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wunderw...@netflix.com wrote:
Why build one? Don't those already exist? Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed not to scale. People were already trying different approaches ten years ago.
wunder

On 6/4/09 8:41 AM, Silent Surfer silentsurfe...@yahoo.com wrote:
Hi,
Any help/pointers on the following message would really help me..
Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:
From: Silent Surfer silentsurfe...@yahoo.com
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,
I am new to the Lucene forum and this is my first question. I need a clarification from you.

Requirement:
1. Build an IT search tool for logs similar to Splunk (only with respect to searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log files are mainly server logs like JBoss and custom application server logs (may or may not be log4j logs), and file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search almost in real time.
4. Support distributed search.

Our search criteria can be based on a keyword, timestamp, IP address, etc. Can anyone throw some light on whether Solr/Lucene is the right solution for this? Appreciate any quick help in this regard.

Thanks,
Surfer
Re: Is there Downside to a huge synonyms file?
On Tue, Jun 2, 2009 at 11:28 PM, anuvenk anuvenkat...@hotmail.com wrote: I'm using query time synonyms. These don't currently work if the synonyms expand to more than one option, and those options have a different number of words. -Yonik http://www.lucidimagination.com
Re: indexing Chinese language
I can't tell what that analyzer does, but I'm guessing it uses n-grams? Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629 instead?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Fer-Bj fernando.b...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 4, 2009 2:20:03 AM
Subject: Re: indexing Chinese language

We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after reindexing the index size went from 1.5 GB to 2.7 GB. Is that expected behavior? Is there any switch or trick to avoid the index files more than doubling in size?

Koji Sekiguchi-2 wrote:
CharFilter can normalize (convert) traditional Chinese to simplified Chinese or vice versa, if you define mapping.txt. Here is a sample of Chinese character normalization: https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
See SOLR-822 for the details: https://issues.apache.org/jira/browse/SOLR-822
Koji

revathy arun wrote:
Hi,
When I index Chinese content using the Chinese tokenizer and analyzer in Solr 1.3, some of the Chinese text files are getting indexed but others are not. Since Chinese has many different language subtypes, such as standard Chinese, simplified Chinese, etc., which of these does the Chinese tokenizer support, and is there any method to find the type of Chinese language from the file?
Rgds
Re: Customizing results
Aha, so you really want to rename the field at response time? I wonder if this is something that could be done with (or should be added to) response writers. That's where I'd go look first.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Manepalli, Kalyan kalyan.manepa...@orbitz.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 4, 2009 5:30:40 PM
Subject: RE: Customizing results

Otis,
With that solution, the client has to accept all types of location fields (location_de_de, location_it_it). I want to copy the result into a "location" field, so that the client can just accept "location".

Thanks,
Kalyan Manepalli

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Thursday, June 04, 2009 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Customizing results

Hello,
If you know what language the user specified (or is associated with), then you just have to ensure the fl URL parameter contains that field (and any other fields you want returned). So if the language/locale is de_de, then make sure the request has fl=location_de_de,another_field,another_field, and not, for example, location_it_it.
Otis

----- Original Message -----
From: Manepalli, Kalyan
To: solr-user@lucene.apache.org
Sent: Thursday, June 4, 2009 12:36:30 PM
Subject: Customizing results

Hi,
I am trying to customize the response that I receive from Solr. In the index I have multiple fields that contain the same data in different languages. At query time the client specifies the language. Based on this param, I want to return the value, copied into a different field. E.g.:
Lubang, Filippinerne
Lubang, Philippinen
Lubang, Philippines
Lubang, Filipinas
If the user specifies the language as de_de, then I want to return the result as "Lubang, Philippinen". What is the most optimal way of doing this? Any suggestions on this will be helpful.
Thanks,
Kalyan Manepalli
Re: Questions regarding IT search solution
Hi,

As Alex correctly pointed out, my main intention is to figure out whether Solr/Lucene offers the functionality to replicate what Splunk does in terms of building indexes etc. for enabling search capabilities.

We evaluated Splunk, but it is not a very cost-effective solution for us, as we may have logs running into a few GB per day with around 25-30 servers running, and Splunk's licensing model is based on the size of logs per day; on top of that, the license is valid for only one year.

With this background, any further inputs on this are greatly appreciated.

Thanks,
Surfer

--- On Thu, 6/4/09, Alexandre Rafalovitch arafa...@gmail.com wrote:
From: Alexandre Rafalovitch arafa...@gmail.com
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 9:27 PM

I would also be interested to know what other existing solutions exist. Splunk's advantage is that it does extraction of the fields with advanced searching functionality (it has lexers/parsers for multiple content types). I believe that's the function of Solr desired in the original posting. At the time they came out (2004), I was not aware of any good open source solutions to do what they did. And I would have loved one, as I was analyzing multi-gigabyte logs. Hadoop might be a way to process the files, but what would do the indexing and searching?

Regards,
Alex.

On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wunderw...@netflix.com wrote:
Why build one? Don't those already exist? Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed not to scale. People were already trying different approaches ten years ago.
wunder

On 6/4/09 8:41 AM, Silent Surfer silentsurfe...@yahoo.com wrote:
Hi,
Any help/pointers on the following message would really help me..
Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:
From: Silent Surfer silentsurfe...@yahoo.com
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,
I am new to the Lucene forum and this is my first question. I need a clarification from you.

Requirement:
1. Build an IT search tool for logs similar to Splunk (only with respect to searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log files are mainly server logs like JBoss and custom application server logs (may or may not be log4j logs), and file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search almost in real time.
4. Support distributed search.

Our search criteria can be based on a keyword, timestamp, IP address, etc. Can anyone throw some light on whether Solr/Lucene is the right solution for this? Appreciate any quick help in this regard.

Thanks,
Surfer
Re: Questions regarding IT search solution
My guess is Solr/Lucene would work. Not sure how well/fast, but it would, especially if you avoid range queries (or use tdate), and especially if you shard/segment indices smartly, so that at query time you send (or distribute, if you have to) the query to only those shards that have the data (if your query is for a limited time period).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Silent Surfer silentsurfe...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 4, 2009 5:52:21 PM
Subject: Re: Questions regarding IT search solution

Hi,

As Alex correctly pointed out, my main intention is to figure out whether Solr/Lucene offers the functionality to replicate what Splunk does in terms of building indexes etc. for enabling search capabilities. We evaluated Splunk, but it is not a very cost-effective solution for us, as we may have logs running into a few GB per day with around 25-30 servers running, and Splunk's licensing model is based on the size of logs per day; on top of that, the license is valid for only one year. With this background, any further inputs on this are greatly appreciated.

Thanks,
Surfer

--- On Thu, 6/4/09, Alexandre Rafalovitch wrote:
I would also be interested to know what other existing solutions exist. Splunk's advantage is that it does extraction of the fields with advanced searching functionality (it has lexers/parsers for multiple content types). I believe that's the function of Solr desired in the original posting. At the time they came out (2004), I was not aware of any good open source solutions to do what they did. And I would have loved one, as I was analyzing multi-gigabyte logs. Hadoop might be a way to process the files, but what would do the indexing and searching?
Regards,
Alex.

On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
Why build one? Don't those already exist? Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed not to scale. People were already trying different approaches ten years ago.
wunder

On 6/4/09 8:41 AM, Silent Surfer wrote:
Hi,
Any help/pointers on the following message would really help me..
Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer wrote:
From: Silent Surfer
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,
I am new to the Lucene forum and this is my first question. I need a clarification from you.

Requirement:
1. Build an IT search tool for logs similar to Splunk (only with respect to searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log files are mainly server logs like JBoss and custom application server logs (may or may not be log4j logs), and file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search almost in real time.
4. Support distributed search.

Our search criteria can be based on a keyword, timestamp, IP address, etc. Can anyone throw some light on whether Solr/Lucene is the right solution for this? Appreciate any quick help in this regard.

Thanks,
Surfer
Determining Search Query Category
Hi,

I have more than 20 categories for my search application. I'm interested in finding the category of a query entered by the user dynamically, instead of asking the user to filter the results through a long list of categories. It's a general question, not specific to Solr; any suggestion about how to approach this problem would be helpful.

Thanks
Ram
Re: how to do exact search with solrj
I still have a problem with exact matching.

query.setQuery("title:\"hello the world\"");

This will return all docs with a title containing "hello the world"; i.e., "hello the world, Jack" will also be matched. What I want is exactly "hello the world". Setting this field to string instead of text doesn't work well either, because I want something like "Hello, The World" to be matched as well. Any idea? Thanks.

--- On Sat, 5/30/09, Avlesh Singh avl...@gmail.com wrote:
From: Avlesh Singh avl...@gmail.com
Subject: Re: how to do exact search with solrj
To: solr-user@lucene.apache.org
Date: Saturday, May 30, 2009, 11:45 PM

You need exact match for all the three tokens? If yes, try query.setQuery("title:\"hello the world\"");
Cheers
Avlesh

On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai djian...@yahoo.com wrote:
I tried, but it seems it's not working right.

--- On Sat, 5/30/09, Avlesh Singh avl...@gmail.com wrote:
From: Avlesh Singh avl...@gmail.com
Subject: Re: how to do exact search with solrj
To: solr-user@lucene.apache.org
Date: Saturday, May 30, 2009, 10:56 PM

query.setQuery("title:hello the world") is what you need.
Cheers
Avlesh

On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai djian...@yahoo.com wrote:
Hi,
I want to search "hello the world" in the title field using solrj. I set the query filter
query.addFilterQuery("title");
query.setQuery("hello the world");
but it returns non-exact-match results as well. I know one way to do it is to set the title field to string instead of text. But is there any other way I can do it? If I do the search through the web interface Solr Admin with title:"hello the world", it returns exact matches. Thanks.
JB
Re: indexing Chinese language
What we usually do to reindex is:
1. stop Solr
2. rm -rf data (that is, remove everything in /opt/solr/data/)
3. mkdir data
4. start Solr
5. start the reindex

With this we're sure about not having old copies of the index. To check the index size we do:

    cd data
    du -sh

Otis Gospodnetic wrote:
I can't tell what that analyzer does, but I'm guessing it uses n-grams? Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629 instead?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Fer-Bj fernando.b...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 4, 2009 2:20:03 AM
Subject: Re: indexing Chinese language

We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after reindexing the index size went from 1.5 GB to 2.7 GB. Is that expected behavior? Is there any switch or trick to avoid the index files more than doubling in size?

Koji Sekiguchi-2 wrote:
CharFilter can normalize (convert) traditional Chinese to simplified Chinese or vice versa, if you define mapping.txt. Here is a sample of Chinese character normalization: https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
See SOLR-822 for the details: https://issues.apache.org/jira/browse/SOLR-822
Koji

revathy arun wrote:
Hi,
When I index Chinese content using the Chinese tokenizer and analyzer in Solr 1.3, some of the Chinese text files are getting indexed but others are not. Since Chinese has many different language subtypes, such as standard Chinese, simplified Chinese, etc., which of these does the Chinese tokenizer support, and is there any method to find the type of Chinese language from the file?
Rgds
Re: Field Compression
Here is what we have: for all the documents we have a field called small_body, which is a 60-character max text field where we store the abstract for each article. We have about 8,000,000 documents indexed, and we usually display this small_body on our listing pages. Each listing page loads 50 documents at a time; that is to say, we would need to decompress small_body for every document we display.

I'll probably compress this field and run a one-week test to see the outcome, and roll it back if necessary. Last question: what's the best way to determine the compress threshold?

Grant Ingersoll-6 wrote:
On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:
It *will* cause performance issues if you load that field for a large number of documents on a particular search. I know Lucene itself has lazy field loading that helps in this case, but I don't know how to persuade Solr to use it (it may even lazy-load automatically). But this is separate from searching...

Lazy loading is an option configured in solrconfig.xml.
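A sketch of what this could look like in schema.xml, assuming Solr 1.3's compressed/compressThreshold options (the exact attribute placement is from memory and worth double-checking against the 1.3 example schema). Note that gzip only pays off on longer values, so a 60-character field may see little space saving for the CPU cost:

    <fieldType name="compressedText" class="solr.TextField" compressThreshold="50"/>
    <field name="small_body" type="compressedText" indexed="false" stored="true" compressed="true"/>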
Re: Questions regarding IT search solution
Hi,

It is encouraging to know that a Solr/Lucene solution may work. Can anyone using Solr/Lucene for such a scenario confirm that the solution is in use and working fine? That would be really helpful, as I only started looking into Solr/Lucene a couple of days back, and it might be difficult to be 100% confident before proposing the solution approach in the next couple of days.

Thanks,
Surfer

--- On Thu, 6/4/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
From: Otis Gospodnetic otis_gospodne...@yahoo.com
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 10:26 PM

My guess is Solr/Lucene would work. Not sure how well/fast, but it would, especially if you avoid range queries (or use tdate), and especially if you shard/segment indices smartly, so that at query time you send (or distribute if you have to) the query to only those shards that have the data (if your query is for a limited time period).
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Silent Surfer silentsurfe...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 4, 2009 5:52:21 PM
Subject: Re: Questions regarding IT search solution

Hi,
As Alex correctly pointed out, my main intention is to figure out whether Solr/Lucene offers the functionality to replicate what Splunk does in terms of building indexes etc. for enabling search capabilities. We evaluated Splunk, but it is not a very cost-effective solution for us, as we may have logs running into a few GB per day with around 25-30 servers running, and Splunk's licensing model is based on the size of logs per day; on top of that, the license is valid for only one year. With this background, any further inputs on this are greatly appreciated.
Thanks,
Surfer

--- On Thu, 6/4/09, Alexandre Rafalovitch wrote:
I would also be interested to know what other existing solutions exist. Splunk's advantage is that it does extraction of the fields with advanced searching functionality (it has lexers/parsers for multiple content types). I believe that's the function of Solr desired in the original posting. At the time they came out (2004), I was not aware of any good open source solutions to do what they did. And I would have loved one, as I was analyzing multi-gigabyte logs. Hadoop might be a way to process the files, but what would do the indexing and searching?
Regards,
Alex.

On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
Why build one? Don't those already exist? Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed not to scale. People were already trying different approaches ten years ago.
wunder

On 6/4/09 8:41 AM, Silent Surfer wrote:
Hi,
Any help/pointers on the following message would really help me..
Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer wrote:
From: Silent Surfer
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,
I am new to the Lucene forum and this is my first question. I need a clarification from you.

Requirement:
1. Build an IT search tool for logs similar to Splunk (only with respect to searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log files are mainly server logs like JBoss and custom application server logs (may or may not be log4j logs), and file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search almost in real time.
4. Support distributed search.

Our search criteria can be based on a keyword, timestamp, IP address, etc. Can anyone throw some light on whether Solr/Lucene is the right solution for this? Appreciate any quick help in this regard.

Thanks,
Surfer
Re: Questions regarding IT search solution
Hey, Your system sounds similar to the work don by Stu Hood at Rackspace in their Mailtrust unit. See http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-datafor more details and inspiration. Regards, Jeff On Thu, Jun 4, 2009 at 4:58 PM, silentsurfe...@yahoo.com wrote: Hi, This is encouraging to know that solr/lucene solution may work. Can anyone using solr/lucene for such scenario can confirm that the solution is used and working fine? That would be really helpful, as I just started looking into the solr/lucene solution only couple of days back and might be difficult to be 100% confident before proposing the solution approach in next couple of days. Thanks,Surfer --- On Thu, 6/4/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: From: Otis Gospodnetic otis_gospodne...@yahoo.com Subject: Re: Questions regarding IT search solution To: solr-user@lucene.apache.org Date: Thursday, June 4, 2009, 10:26 PM My guess is Solr/Lucene would work. Not sure how well/fast, but it would, esp. if you avoid range queries (or use tdate), and esp. if you shard/segment indices smartly, so that at query time you send (or distribute if you have to) the query to only those shards that have the data (if your query is for a limited time period). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Silent Surfer silentsurfe...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 5:52:21 PM Subject: Re: Questions regarding IT search solution Hi, As Alex correctly pointed out my main intention is to figure out whether Solr/lucene offer functionalities to replicate what Splunk is doing in terms of building indexes etc for enabling search capabilities. We evaluated Splunk, but it is not very cost effective solution for us as we may have logs running into few GBs per day as there can be around 25-20 servers running, and Splunk licensing model is based of size of logs per day that too, the license valid for only 1 year. With this back ground, any further inputs on this are greatly appreciated. Thanks,Surfer --- On Thu, 6/4/09, Alexandre Rafalovitch wrote: From: Alexandre Rafalovitch Subject: Re: Questions regarding IT search solution To: solr-user@lucene.apache.org Date: Thursday, June 4, 2009, 9:27 PM I would also be interested to know what other existing solutions exist. Splunk's advantage is that it does extraction of the fields with advanced searching functionality (it has lexers/parsers for multiple content types). I believe that's the Solr's function desired in original posting. At the time they came out (2004), I was not aware of any good open source solutions to do what they did. And I would have loved one, as I was analyzing multi-gigabite logs. Hadoop might be a way to process the files, but what would do the indexing and searching? Regards, Alex. On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwoodwrote: Why build one? Don't those already exist? Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed to not scale. People were already trying different approaches ten years ago. wunder On 6/4/09 8:41 AM, Silent Surfer wrote: Hi, Any help/pointers on the following message would really help me.. Thanks,Surfer --- On Tue, 6/2/09, Silent Surfer wrote: From: Silent Surfer Subject: Questions regarding IT search solution To: solr-user@lucene.apache.org Date: Tuesday, June 2, 2009, 5:45 PM Hi, I am new to Lucene forum and it is my first question.I need a clarification from you. 
Requirement:
1. Build an IT search tool for logs, similar to Splunk (only w.r.t. searching logs, not reporting, graphs etc.), using Solr/Lucene. The log files are mainly server logs such as JBoss and custom application server logs (may or may not be log4j logs), and the file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to do search almost in realtime.
4. Support distributed search.
Our search criterion can be based on a keyword, timestamp, IP address etc. Can anyone throw some light on whether Solr/Lucene is the right solution for this? Appreciate any quick help in this regard. Thanks, Surfer
using UpdateRequestProcessor from a custom analyzer
Is it possible to create a custom analyzer (index time) that uses UpdateRequestProcessor to add new fields to posts, based on the tokens generated by the other analyzers that have been run (before my custom analyzer)? The content of said fields must differ from post to post based on the tokens extracted from each one of them. Thank you very much for any answer/suggestion you can give me!!! G. -- View this message in context: http://www.nabble.com/using-UpdateRequestProcessor-from-a-custom-analyzer-tp23880160p23880160.html Sent from the Solr - User mailing list archive at Nabble.com.
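For context: an analyzer runs per field and cannot add new fields to a document, whereas an UpdateRequestProcessor runs before analysis and sees the whole SolrInputDocument, so deriving new fields from existing content is usually done there. A minimal sketch, assuming Solr 1.3-era package names; the class and the field names (body, body_length) are hypothetical, not from the original post:

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class AddDerivedFieldProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                                UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            // hypothetical rule: derive a new field from an existing one, before analysis runs
            Object body = doc.getFieldValue("body");
            if (body != null) {
              doc.addField("body_length", body.toString().length());
            }
            super.processAdd(cmd); // hand the document on to the rest of the chain
          }
        };
      }
    }

The factory would then be registered in an updateRequestProcessorChain in solrconfig.xml and referenced from the update handler.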
Re: Determining Search Query Category
Ram, Typical queries are short, so they are hard to categorize using statistical approaches. Maybe categorization of queries would work with a custom set of rules applied to queries? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: ram_sj rpachaiyap...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 6:26:33 PM Subject: Determining Search Query Category Hi, I have more than 20 categories for my search application. I'm interested in finding the category of the query entered by the user dynamically, instead of asking the user to filter the results through a long list of categories. It's a general question, not specific to Solr, but any suggestion about how to approach this problem will be helpful. Thanks Ram -- View this message in context: http://www.nabble.com/Determining-Search-Query-Category-tp23878965p23878965.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to do exact search with solrj
I don't think there is anything ready to be used in Solr (but it would be easy to add), but if you indexed your values with custom beginning-of-string and end-of-string anchors, you'd be able to get your exact matching working. For example, convert hello the world to $hello the world$ before indexing (and make sure you use the string type or KeywordTokenizer -- things that won't remove any characters). Then search for $hello the world$. This will not match $hello the world, Jack$. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jianbin Dai djian...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 6:42:39 PM Subject: Re: how to do exact search with solrj I still have a problem with exact matching. query.setQuery(title:\hello the world\); This will return all docs with a title containing hello the world, i.e., hello the world, Jack will also be matched. What I want is exactly hello the world. Setting this field to string instead of text doesn't work well either, because I want something like Hello, The World to be matched as well. Any idea? Thanks. --- On Sat, 5/30/09, Avlesh Singh wrote: From: Avlesh Singh Subject: Re: how to do exact search with solrj To: solr-user@lucene.apache.org Date: Saturday, May 30, 2009, 11:45 PM You need an exact match for all three tokens? If yes, try query.setQuery(title:\hello the world\); Cheers Avlesh On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai wrote: I tried, but it seems it's not working right. --- On Sat, 5/30/09, Avlesh Singh wrote: From: Avlesh Singh Subject: Re: how to do exact search with solrj To: solr-user@lucene.apache.org Date: Saturday, May 30, 2009, 10:56 PM query.setQuery(title:hello the world) is what you need. Cheers Avlesh On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai wrote: Hi, I want to search for hello the world in the title field using solrj. I set the query filter query.addFilterQuery(title); query.setQuery(hello the world); but it returns non-exact matches as well. I know one way to do it is to set the title field to string instead of text. But is there any other way I can do it? If I do the search through the Solr Admin web interface with title:hello the world, it returns exact matches. Thanks. JB
Re: how to do exact search with solrj
I re-read your original request. Here is the recipe that should work: * Define a new field type that uses KeywordTokenizer and LowerCaseFilter. * Make your field be of the above type. * Use those begin/end anchor characters at index and search time. I believe that should work. Please try it and let us know. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Otis Gospodnetic otis_gospodne...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 8:47:50 PM Subject: Re: how to do exact search with solrj I don't think there is anything ready to be used in Solr (but it would be easy to add), but if you indexed your values with custom beginning-of-string and end-of-string anchors, you'd be able to get your exact matching working. For example, convert hello the world to $hello the world$ before indexing (and make sure you use the string type or KeywordTokenizer -- things that won't remove any characters). Then search for $hello the world$. This will not match $hello the world, Jack$. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jianbin Dai To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 6:42:39 PM Subject: Re: how to do exact search with solrj I still have a problem with exact matching. query.setQuery(title:\hello the world\); This will return all docs with a title containing hello the world, i.e., hello the world, Jack will also be matched. What I want is exactly hello the world. Setting this field to string instead of text doesn't work well either, because I want something like Hello, The World to be matched as well. Any idea? Thanks. --- On Sat, 5/30/09, Avlesh Singh wrote: From: Avlesh Singh Subject: Re: how to do exact search with solrj To: solr-user@lucene.apache.org Date: Saturday, May 30, 2009, 11:45 PM You need an exact match for all three tokens? If yes, try query.setQuery(title:\hello the world\); Cheers Avlesh On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai wrote: I tried, but it seems it's not working right. --- On Sat, 5/30/09, Avlesh Singh wrote: From: Avlesh Singh Subject: Re: how to do exact search with solrj To: solr-user@lucene.apache.org Date: Saturday, May 30, 2009, 10:56 PM query.setQuery(title:hello the world) is what you need. Cheers Avlesh On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai wrote: Hi, I want to search for hello the world in the title field using solrj. I set the query filter query.addFilterQuery(title); query.setQuery(hello the world); but it returns non-exact matches as well. I know one way to do it is to set the title field to string instead of text. But is there any other way I can do it? If I do the search through the Solr Admin web interface with title:hello the world, it returns exact matches. Thanks. JB
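For reference, a sketch of what that recipe could look like in schema.xml; the type and field names (text_exact, title_exact) are made up for illustration:

    <!-- KeywordTokenizer keeps the whole value as one token; LowerCase makes matching case-insensitive -->
    <fieldType name="text_exact" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="title_exact" type="text_exact" indexed="true" stored="true"/>

You would then index $hello the world$ into title_exact and query title_exact:"$hello the world$". Lowercasing takes care of Hello The World, but punctuation differences (e.g. the comma in Hello, The World) would still need to be normalized separately.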
Re: indexing Chinese language
First: you don't have to restart Solr; you can build new index data to replace the old data and tell Solr to use the new index for search. You can find helper shell scripts shipped with Solr. Second: you don't have to restart Solr; just keep the id the same. Example: old doc id:1, title:hi; new doc id:1, title:welcome. Just index the new data; Solr will delete the old doc and insert the new one, like a replace, but it will use more time and resources. You can find the indexed doc count on the Solr admin page. On Fri, Jun 5, 2009 at 7:42 AM, Fer-Bj fernando.b...@gmail.com wrote: What we usually do to reindex is: 1. stop solr 2. rm -r data (that is, to remove everything in /opt/solr/data/) 3. mkdir data 4. start solr 5. start reindex. With this we're sure we aren't keeping old copies of the index. To check the index size we do: cd data; du -sh Otis Gospodnetic wrote: I can't tell what that analyzer does, but I'm guessing it uses n-grams? Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629 instead? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Fer-Bj fernando.b...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 2:20:03 AM Subject: Re: indexing Chinese language We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after reindexing the index size went from 1.5 GB to 2.7 GB. Is that expected behavior? Is there any switch or trick to avoid having a doubled index file size? Koji Sekiguchi-2 wrote: CharFilter can normalize (convert) traditional Chinese to simplified Chinese or vice versa, if you define mapping.txt. Here is the sample of Chinese character normalization: https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG See SOLR-822 for the detail: https://issues.apache.org/jira/browse/SOLR-822 Koji revathy arun wrote: Hi, When I index Chinese content using the Chinese tokenizer and analyzer in Solr 1.3, some of the Chinese text files are getting indexed but others are not. Since Chinese has many different language subtypes, as in standard Chinese, simplified Chinese etc., which of these does the Chinese tokenizer support, and is there any method to find the type of Chinese language from the file? Rgds -- View this message in context: http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/indexing-Chienese-langage-tp22033302p23879730.html Sent from the Solr - User mailing list archive at Nabble.com. -- regards j.L ( I live in Shanghai, China)
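To illustrate the second point: with a uniqueKey of id, posting a document whose id already exists replaces the old document. A minimal add message, using the id/title values from the example above:

    <add>
      <doc>
        <field name="id">1</field>
        <field name="title">welcome</field>
      </doc>
    </add>

After a <commit/>, the old id:1 document (title:hi) is gone and the new one (title:welcome) is searchable.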
Re: indexing Chinese language
On Mon, Feb 16, 2009 at 4:30 PM, revathy arun revas...@gmail.com wrote: Hi, When I index Chinese content using the Chinese tokenizer and analyzer in Solr 1.3, some of the Chinese text files are getting indexed but others are not. Are you sure your analyzer handles it well? If not sure, you can use the analysis link on the Solr admin page to check it. Since Chinese has many different language subtypes, as in standard Chinese, simplified Chinese etc., which of these does the Chinese tokenizer support, and is there any method to find the type of Chinese language from the file? Rgds -- regards j.L ( I live in Shanghai, China)
Re: Which caches should use the solr.FastLRUCache
On Thu, Jun 4, 2009 at 11:29 PM, Robert Purdy rdpu...@gmail.com wrote: Thanks for the good information :) Well, I haven't had any evictions in any of the caches in years, but the hit ratio is 0.51 in the queryResultCache, 0.77 in the documentCache, 1.00 in the fieldValueCache, and 0.99 in the filterCache. So in your opinion, should the documentCache and queryResultCache use the old LRUCache on a single-CPU quad-core machine? Also, right now I have all caches using solr.FastLRUCache (tried with both cleanupThread = false and true), and I have noticed some queries taking 53 ms on a freshly warmed new searcher (when nothing else is querying the slave), but when the slave is busy the same query, which should be using the caches, sometimes takes 8 secs? Any thoughts? This overhead may not be because of the cache itself. Some queries are definitely missing the cache, and they are likely to take time. If cleanupThread=true, then evictions should not add extra time. Thanks Robert. Yonik Seeley-2 wrote: 2009/6/4 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: FastLRUCache is designed to be lock free, so it is well suited for caches which are hit several times in a request. I guess there is no harm in using FastLRUCache across all the caches. Gets are cheaper, but evictions are more expensive. If the cache hit rate is low, the old synchronized cache may be faster, unless you have a ton of CPUs... not sure where the crossover point is though. -Yonik http://www.lucidimagination.com -- View this message in context: http://www.nabble.com/Which-caches-should-use-the-solr.FastLRUCache-tp23860182p23874898.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
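For anyone trying this, the cache implementation is just the class attribute on the cache declarations in solrconfig.xml. A sketch with illustrative sizes (not tuned recommendations); cleanupThread moves eviction work off the request thread:

    <filterCache class="solr.FastLRUCache"
                 size="16384"
                 initialSize="4096"
                 autowarmCount="4096"
                 cleanupThread="true"/>

The same class swap applies to the queryResultCache, documentCache, and fieldValueCache declarations.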
Re: Determining Search Query Category
If you haven't already given this a thought, you may want to try out an auto-complete feature, suggesting those categories upfront. Cheers Avlesh On Fri, Jun 5, 2009 at 3:56 AM, ram_sj rpachaiyap...@gmail.com wrote: Hi, I have more than 20 categories for my search application. I'm interested in finding the category of the query entered by the user dynamically, instead of asking the user to filter the results through a long list of categories. It's a general question, not specific to Solr, but any suggestion about how to approach this problem will be helpful. Thanks Ram -- View this message in context: http://www.nabble.com/Determining-Search-Query-Category-tp23878965p23878965.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index Comma Separated numbers
Did you try the NumberFormatTransformer? On Fri, Jun 5, 2009 at 12:09 AM, Jianbin Dai djian...@yahoo.com wrote: Hi, One of the fields to be indexed is price, which is comma separated, e.g., 12,034.00. How can I index it as a number? I am using DIH to pull the data. Thanks. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
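For reference, NumberFormatTransformer is declared on the DIH entity in data-config.xml; a minimal sketch (the entity name and SQL are hypothetical):

    <entity name="item" transformer="NumberFormatTransformer"
            query="select id, price from items">
      <field column="price" formatStyle="number"/>
    </entity>

formatStyle="number" runs the value through java.text.NumberFormat, so a comma-grouped value like 12,034.00 is parsed into a plain number before indexing.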
Re: Determining Search Query Category
Can you analyze the logs to see which categories people choose for each query? When there are enough queries and a clear preference, you can highlight that choice. wunder On 6/4/09 9:21 PM, Avlesh Singh avl...@gmail.com wrote: If you haven't already given this a thought, you may want to try out an auto-complete feature, suggesting those categories upfront. Cheers Avlesh On Fri, Jun 5, 2009 at 3:56 AM, ram_sj rpachaiyap...@gmail.com wrote: Hi, I have more than 20 categories for my search application. I'm interested in finding the category of the query entered by the user dynamically, instead of asking the user to filter the results through a long list of categories. It's a general question, not specific to Solr, but any suggestion about how to approach this problem will be helpful. Thanks Ram -- View this message in context: http://www.nabble.com/Determining-Search-Query-Category-tp23878965p23878965.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Customizing results
How are you accessing Solr? SolrJ? Does this help? https://issues.apache.org/jira/browse/SOLR-1129 On Fri, Jun 5, 2009 at 3:00 AM, Manepalli, Kalyan kalyan.manepa...@orbitz.com wrote: Otis, With that solution, the client has to accept all types of location fields (location_de_de, location_it_it). I want to copy the result into a location field, so that the client can just accept location. Thanks, Kalyan Manepalli -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Thursday, June 04, 2009 4:16 PM To: solr-user@lucene.apache.org Subject: Re: Customizing results Hello, If you know what language the user specified (or is associated with), then you just have to ensure the fl URL parameter contains that field (and any other fields you want returned). So if the language/locale is de_de, then make sure the request has fl=location_de_de,another_field,another_field, and not, for example, location_it_it. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan kalyan.manepa...@orbitz.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 12:36:30 PM Subject: Customizing results Hi, I am trying to customize the response that I receive from Solr. In the index I have multiple fields that contain the same data in different languages. At query time the client specifies the language. Based on this param, I want to return the value copied into a different field. Eg: Lubang, Filippinerne Lubang, Philippinen Lubang, Philippines Lubang, Filipinas If the user specifies language as de_de, then I want to return the result as Lubang, Philippinen. What is the most optimal way of doing this? Any suggestions on this will be helpful. Thanks, Kalyan Manepalli -- - Noble Paul | Principal Engineer| AOL | http://aol.com
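SOLR-1129 is about binding dynamic fields to SolrJ beans, which fits this case: all the location_* variants can land in one map on the bean. A sketch assuming that patch is applied (the bean and field names are illustrative):

    import java.util.Map;
    import org.apache.solr.client.solrj.beans.Field;

    public class Place {
      @Field("id")
      String id;

      // SOLR-1129: a wildcard @Field collects location_de_de, location_it_it, ...
      @Field("location_*")
      Map<String, Object> locations;
    }

After queryResponse.getBeans(Place.class), the client could read locations.get("location_" + locale) and expose it as a single location value.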
Re: Customizing results
Nice suggestion, Noble! If you are using SolrJ, then this particular binding can be the answer to your question. Cheers Avlesh 2009/6/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com How are you accessing Solr? SolrJ? Does this help? https://issues.apache.org/jira/browse/SOLR-1129 On Fri, Jun 5, 2009 at 3:00 AM, Manepalli, Kalyan kalyan.manepa...@orbitz.com wrote: Otis, With that solution, the client has to accept all types of location fields (location_de_de, location_it_it). I want to copy the result into a location field, so that the client can just accept location. Thanks, Kalyan Manepalli -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Thursday, June 04, 2009 4:16 PM To: solr-user@lucene.apache.org Subject: Re: Customizing results Hello, If you know what language the user specified (or is associated with), then you just have to ensure the fl URL parameter contains that field (and any other fields you want returned). So if the language/locale is de_de, then make sure the request has fl=location_de_de,another_field,another_field, and not, for example, location_it_it. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan kalyan.manepa...@orbitz.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 12:36:30 PM Subject: Customizing results Hi, I am trying to customize the response that I receive from Solr. In the index I have multiple fields that contain the same data in different languages. At query time the client specifies the language. Based on this param, I want to return the value copied into a different field. Eg: Lubang, Filippinerne Lubang, Philippinen Lubang, Philippines Lubang, Filipinas If the user specifies language as de_de, then I want to return the result as Lubang, Philippinen. What is the most optimal way of doing this? Any suggestions on this will be helpful. Thanks, Kalyan Manepalli -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: how to do exact search with solrj
And the field should be of type text, right, Otis? Does one still need those anchors if the type is string with the filters you suggested? Cheers Avlesh On Fri, Jun 5, 2009 at 6:35 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I re-read your original request. Here is the recipe that should work: * Define a new field type that uses KeywordTokenizer and LowerCaseFilter. * Make your field be of the above type. * Use those begin/end anchor characters at index and search time. I believe that should work. Please try it and let us know. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Otis Gospodnetic otis_gospodne...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 8:47:50 PM Subject: Re: how to do exact search with solrj I don't think there is anything ready to be used in Solr (but it would be easy to add), but if you indexed your values with custom beginning-of-string and end-of-string anchors, you'd be able to get your exact matching working. For example, convert hello the world to $hello the world$ before indexing (and make sure you use the string type or KeywordTokenizer -- things that won't remove any characters). Then search for $hello the world$. This will not match $hello the world, Jack$. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jianbin Dai To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 6:42:39 PM Subject: Re: how to do exact search with solrj I still have a problem with exact matching. query.setQuery(title:\hello the world\); This will return all docs with a title containing hello the world, i.e., hello the world, Jack will also be matched. What I want is exactly hello the world. Setting this field to string instead of text doesn't work well either, because I want something like Hello, The World to be matched as well. Any idea? Thanks. --- On Sat, 5/30/09, Avlesh Singh wrote: From: Avlesh Singh Subject: Re: how to do exact search with solrj To: solr-user@lucene.apache.org Date: Saturday, May 30, 2009, 11:45 PM You need an exact match for all three tokens? If yes, try query.setQuery(title:\hello the world\); Cheers Avlesh On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai wrote: I tried, but it seems it's not working right. --- On Sat, 5/30/09, Avlesh Singh wrote: From: Avlesh Singh Subject: Re: how to do exact search with solrj To: solr-user@lucene.apache.org Date: Saturday, May 30, 2009, 10:56 PM query.setQuery(title:hello the world) is what you need. Cheers Avlesh On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai wrote: Hi, I want to search for hello the world in the title field using solrj. I set the query filter query.addFilterQuery(title); query.setQuery(hello the world); but it returns non-exact matches as well. I know one way to do it is to set the title field to string instead of text. But is there any other way I can do it? If I do the search through the Solr Admin web interface with title:hello the world, it returns exact matches. Thanks. JB
Re: Customizing results
Hi Otis, Is it a good idea to provide an aliasing feature for Solr, similar to SQL's 'as'? In SQL we can do select location_da_dk as location; Solr may have fl.alias=location_da_dk:location --Noble On Fri, Jun 5, 2009 at 3:10 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Aha, so you really want to rename the field at response time? I wonder if this is something that could be done with (or should be added to) response writers. That's where I'd go look first. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan kalyan.manepa...@orbitz.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 5:30:40 PM Subject: RE: Customizing results Otis, With that solution, the client has to accept all types of location fields (location_de_de, location_it_it). I want to copy the result into a location field, so that the client can just accept location. Thanks, Kalyan Manepalli -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Thursday, June 04, 2009 4:16 PM To: solr-user@lucene.apache.org Subject: Re: Customizing results Hello, If you know what language the user specified (or is associated with), then you just have to ensure the fl URL parameter contains that field (and any other fields you want returned). So if the language/locale is de_de, then make sure the request has fl=location_de_de,another_field,another_field, and not, for example, location_it_it. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 12:36:30 PM Subject: Customizing results Hi, I am trying to customize the response that I receive from Solr. In the index I have multiple fields that contain the same data in different languages. At query time the client specifies the language. Based on this param, I want to return the value copied into a different field. Eg: Lubang, Filippinerne Lubang, Philippinen Lubang, Philippines Lubang, Filipinas If the user specifies language as de_de, then I want to return the result as Lubang, Philippinen. What is the most optimal way of doing this? Any suggestions on this will be helpful. Thanks, Kalyan Manepalli -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Customizing results
Generally a good idea, but be prepared to entertain requests asking to be able to perform queries using those aliases as well. I mean, when you talk about something similar to aliases in SQL, those aliases can be used in the where clause of SQL scripts too. Cheers Avlesh 2009/6/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Hi Otis, Is it a good idea to provide an aliasing feature for Solr, similar to SQL's 'as'? In SQL we can do select location_da_dk as location; Solr may have fl.alias=location_da_dk:location --Noble On Fri, Jun 5, 2009 at 3:10 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Aha, so you really want to rename the field at response time? I wonder if this is something that could be done with (or should be added to) response writers. That's where I'd go look first. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan kalyan.manepa...@orbitz.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 5:30:40 PM Subject: RE: Customizing results Otis, With that solution, the client has to accept all types of location fields (location_de_de, location_it_it). I want to copy the result into a location field, so that the client can just accept location. Thanks, Kalyan Manepalli -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Thursday, June 04, 2009 4:16 PM To: solr-user@lucene.apache.org Subject: Re: Customizing results Hello, If you know what language the user specified (or is associated with), then you just have to ensure the fl URL parameter contains that field (and any other fields you want returned). So if the language/locale is de_de, then make sure the request has fl=location_de_de,another_field,another_field, and not, for example, location_it_it. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 12:36:30 PM Subject: Customizing results Hi, I am trying to customize the response that I receive from Solr. In the index I have multiple fields that contain the same data in different languages. At query time the client specifies the language. Based on this param, I want to return the value copied into a different field. Eg: Lubang, Filippinerne Lubang, Philippinen Lubang, Philippines Lubang, Filipinas If the user specifies language as de_de, then I want to return the result as Lubang, Philippinen. What is the most optimal way of doing this? Any suggestions on this will be helpful. Thanks, Kalyan Manepalli -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Customizing results
On Fri, Jun 5, 2009 at 10:20 AM, Avlesh Singh avl...@gmail.com wrote: Generally a good idea, but be prepared to entertain requests asking to be able to perform queries using those aliases as well. I mean, when you talk about something similar to aliases in SQL, those aliases can be used in the where clause of SQL scripts too. I guess that can be a separate issue, but this can be implemented as a post-processing step at the ResponseWriter level. The current problem is that there are too many response writers, and we would have to change and test them all. Cheers Avlesh 2009/6/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Hi Otis, Is it a good idea to provide an aliasing feature for Solr, similar to SQL's 'as'? In SQL we can do select location_da_dk as location; Solr may have fl.alias=location_da_dk:location --Noble On Fri, Jun 5, 2009 at 3:10 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Aha, so you really want to rename the field at response time? I wonder if this is something that could be done with (or should be added to) response writers. That's where I'd go look first. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan kalyan.manepa...@orbitz.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 5:30:40 PM Subject: RE: Customizing results Otis, With that solution, the client has to accept all types of location fields (location_de_de, location_it_it). I want to copy the result into a location field, so that the client can just accept location. Thanks, Kalyan Manepalli -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Thursday, June 04, 2009 4:16 PM To: solr-user@lucene.apache.org Subject: Re: Customizing results Hello, If you know what language the user specified (or is associated with), then you just have to ensure the fl URL parameter contains that field (and any other fields you want returned). So if the language/locale is de_de, then make sure the request has fl=location_de_de,another_field,another_field, and not, for example, location_it_it. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 12:36:30 PM Subject: Customizing results Hi, I am trying to customize the response that I receive from Solr. In the index I have multiple fields that contain the same data in different languages. At query time the client specifies the language. Based on this param, I want to return the value copied into a different field. Eg: Lubang, Filippinerne Lubang, Philippinen Lubang, Philippines Lubang, Filipinas If the user specifies language as de_de, then I want to return the result as Lubang, Philippinen. What is the most optimal way of doing this? Any suggestions on this will be helpful. Thanks, Kalyan Manepalli -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com
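Until something like fl.alias exists, the rename can also be done client-side after the query; a SolrJ sketch, where the locale handling and method names are hypothetical:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class LocationAliaser {
      // copy location_<locale> into a plain "location" field on each result document
      public static void aliasLocation(SolrServer server, String locale) throws Exception {
        SolrQuery q = new SolrQuery("*:*");
        q.setFields("id", "location_" + locale); // e.g. location_de_de
        QueryResponse rsp = server.query(q);
        for (SolrDocument doc : rsp.getResults()) {
          Object val = doc.getFieldValue("location_" + locale);
          if (val != null) {
            doc.setField("location", val);
            doc.removeFields("location_" + locale);
          }
        }
      }
    }

This keeps the response writers untouched at the cost of a small post-processing pass in the client.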