Re:The search response time is too loong
We used SOLR 1.4. All queries were executed in the SOLR back end. My guess is that I/O operations are consuming most of the time.

From: newsam new...@zju.edu.cn
Reply-To: solr-user@lucene.apache.org, newsam new...@zju.edu.cn
To: solr-user@lucene.apache.org
Subject: Re:The search response time is too loong
Date: Mon, 27 Sep 2010 16:05:49 +0800

I have set up a SOLR searcher instance with Tomcat 5.5.21. However, the response time is too long. Here is my scenario:

1. The index file is 8.2G. The doc num is 6110745.
2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem.

I used Key:* to query all records via localhost:8080. The response time is 68703 milliseconds. The CPU load is 50% and memory usage is over 400M.

Any comments are welcome.
RE: spellcheck on multiple fields?
You can use copyField to get multiple fields into the field you use for spell checking. Don't forget to set that field to multiValued.

-Original message-
From: Savannah Beckett savannah_becket...@yahoo.com
Sent: Mon 27-09-2010 10:08
To: solr-user@lucene.apache.org;
Subject: spellcheck on multiple fields?

Is it possible to do spellcheck on multiple fields in my solr index? If so, how? The following setup works for only one field:

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">myfield</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="accuracy">0.5</str>
      <str name="buildOnCommit">true</str>
    </lst>

Thanks.
Re: TokenFilter that removes payload ?
Robert, Erick,

I appreciate your suggestions, but we use Type for another purpose. Also, the product is out and we can't change the design so easily. So the conclusion seems to be that there is no such TokenFilter. I'll write one. Thanks.

On Sep 27, 2010, at 1:00 PM, Robert Muir wrote:

> On Sun, Sep 26, 2010 at 11:49 PM, Teruhiko Kurosaka k...@basistech.com wrote:
>
>> As I understand it, payloads go to the Lucene index. In most cases, the part-of-speech tags are not used if retrieved by the search applications. So they shouldn't go to the index. So I'd like to know if there is an existing TokenFilter that does this. Otherwise, I'd like to write one.
>
> I agree with Erick, I think a better approach would be to put the part of speech tags into another attribute. For example, you can put them in TypeAttribute, which is not stored in the index by default. Then, if the user wants to store them in the index, they just add TypeAsPayloadTokenFilterFactory, which copies the type into the payload... but otherwise they would not be stored.
>
> --
> Robert Muir
> rcm...@gmail.com

T. Kuro Kurosaka, 415-227-9600x122, 617-386-7122(direct)
Re: Solr UIMA integration
Hi Tommaso,

All UIMA dependencies (uima-core, AlchemyAPIAnnotator, OpenCalaisAnnotator, Tagger, WhitespaceTokenizer) are 2.3.1-SNAPSHOT. All were checked out from svn:

AlchemyAPIAnnotator: http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator
OpenCalaisAnnotator: http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator
Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger
WhitespaceTokenizer: http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer
solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima

I am using the latest Solr version checked out from svn; I guess it is later than 1.4.1.

Tommaso, could you upload all the dependency jars here? http://code.google.com/p/solr-uima/downloads/list

Thanks
Mahesh

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-UIMA-integration-tp1528253p1587660.html
Sent from the Solr - User mailing list archive at Nabble.com.
Multi-lingual auto-complete?
I want to provide auto-complete to users when they're inputting tags. The auto-complete tag suggestions would be based on tags that are already in the system.

Multiple tags are separated by commas. A single tag could contain multiple words, such as "Apple computer".

One issue is that a tag could be in multiple languages, including both languages that use whitespace as a word separator (e.g. English, French) and languages that don't (e.g. CJK). An example of such a multi-lingual tag is "Apple 电脑".

If a user types "apple", I'd like the autocomplete suggestions to include both "Apple computer" (i.e. matches are case insensitive) and "green apple" (i.e. matches aren't restricted to prefixes). And a user typing "电脑" should match "Apple 电脑".

Is it possible to do that?

I read the article: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

In that article KeywordTokenizerFactory is used. If I changed it to CJKTokenizer, would that work? With an input of "Apple 电脑", what would CJKTokenizer produce?

- is it "Apple", "电", "脑" ? or
- is it "A", "p", "p", "l", "e", "电", "脑" ?

Any help would be greatly appreciated.

Andy
Re: Concurrent DB updates and delta import misses few records
You could get it from Solr, yes. That didn't even occur to me because when I was designing my scripts, I didn't yet have a fully integrated Solr index. :) With hindsight, I still wouldn't get it from Solr. I would lose some flexibility and ease of administration. It's certainly possible to store all build-related tracking information in the database. The build system for our old search product did it that way. I decided to go with simple text files in an NFS-mounted directory for the rewrite. It's easier for me to administer, just ssh to a server and examine or modify simple one-line text files. On the script side, the files get read into a Perl hash. With the old system, I found it cumbersome to go through the database interfaces. The only thing that's still in the database is the delete table, because it is populated by triggers on the metadata table. On 9/23/2010 12:48 AM, Shashikant Kore wrote: Thanks for the pointer, Shawn. It, definitely, is useful. I am wondering if you could retrieve minDid from the solr rather than storing it externally. Max id from Solr index and max id from DB should define the lower and upper thresholds, respectively, of the delta range. Am I missing something?
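The tracking approach described above (one-line text files in a shared directory, read into a hash by the build script) can be sketched roughly as follows. This is only an illustration in Python rather than the Perl the poster uses; the `.txt` suffix, the helper names, and the `maxDid` key are assumptions made for the example (only `minDid` is mentioned in the thread).

```python
import tempfile
from pathlib import Path


def read_tracking_files(directory):
    """Read every one-line *.txt file in `directory` into a dict.

    The file stem becomes the key and the stripped first line the value.
    """
    state = {}
    for path in sorted(Path(directory).glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        state[path.stem] = lines[0].strip() if lines else ""
    return state


def write_tracking_value(directory, key, value):
    """Overwrite the one-line file for `key` with `value`."""
    Path(directory, f"{key}.txt").write_text(f"{value}\n", encoding="utf-8")


def demo():
    # Hypothetical keys: minDid comes from the thread, maxDid is invented here.
    with tempfile.TemporaryDirectory() as tmp:
        write_tracking_value(tmp, "minDid", 1000)
        write_tracking_value(tmp, "maxDid", 2500)
        return read_tracking_files(tmp)
```

Because each value lives in its own small file, it can be inspected or corrected by hand over ssh, which is the ease of administration the poster describes.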
Re: Re:The search response time is too loong
By "mem usage is over 400M", do you mean the Tomcat memory size? If you don't give your caches enough room to grow, you will choke performance. You should adjust your Tomcat settings to let the cache grow to at least 1GB; 2GB would be better. You may also want to look into warming the cache (http://wiki.apache.org/solr/SolrCaching) to make the first call a little faster.

For comparison, I also have about 8GB in my index but only 2.8 million documents. My search query times, on a smaller box than the one you specify, are 6533 milliseconds on an unwarmed (newly rebooted) instance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html
Re: Re:The search response time is too loong
Also, how many rows are you requesting at one time? I've seen cases where the query time is blazing fast and the response writing is terribly slow because of too many documents being sent in the response.

On Mon, Sep 27, 2010 at 6:37 AM, kenf_nc ken.fos...@realestate.com wrote:

> mem usage is over 400M, do you mean Tomcat mem size? If you don't give your cache sizes enough room to grow you will choke the performance. You should adjust your Tomcat settings to let the cache grow to at least 1GB or better would be 2GB. You may also want to look into http://wiki.apache.org/solr/SolrCaching warming the cache to make the first time call a little faster.
>
> For comparison, I also have about 8GB in my index but only 2.8 million documents. My search query times on a smaller box than you specify are 6533 milliseconds on an unwarmed (newly rebooted) instance.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html
urgent SOLR query server request hangs
Hi,

We are running into issues with SOLR queries. Our solr queries just hang. We are using SOLR 1.3 and below is the stack trace from threaddump. We are clueless about what can be causing this issue. We are in the midst of firefighting with our customer and any help is appreciated.

Thanks,
Bharat

TP-Processor113 daemon prio=3 tid=0x071c3400 nid=0x134 runnable [0xfd7ed72a..0xfd7ed72a3920]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        - locked 0xfd7f26c1caf0 (a java.io.BufferedInputStream)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
        - locked 0xfd7f2a260c50 (a sun.net.www.protocol.http.HttpURLConnection)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:373)
        at com.xxx..search.solr.SolrSearchServiceImpl.query(SolrSearchServiceImpl.java:271)
        at com.xxx..search.Searchable.query(Searchable.java:460)
        at com.xxx..search.JobReqSearchObject.query(JobReqSearchObject.java:903)

Thanks
Bharat Jain
Is Solr right for our project?
(I apologize in advance if I missed something in your documentation, but I've read through the Wiki on the subject of distributed searches and didn't find anything conclusive.)

We are currently evaluating Solr and Autonomy. Solr is attractive due to its open source background, following, and price. Autonomy is expensive, but we know for a fact that it can handle our distributed search requirements perfectly.

What we need to know is whether Solr has capabilities that match or roughly approximate Autonomy's Distributed Search Handler (DiSH). It acts as a front end for all of Autonomy's IDOL search servers (which correspond in this scenario to Solr shards). It is configured to know what is on each shard and which servers hold each shard, and it intelligently farms out queries based on that configuration. There is no need to specify which IDOL servers to hit while querying; the DiSH just knows where to go. Additionally, I believe that in cases where an index piece is mirrored, it also monitors server health and falls back intelligently on other backup instances of a shard/index piece.

I'd appreciate it if someone could give me a frank explanation of where Solr stands in this area.

Thanks,
Mike
Re: Is Solr right for my business situation ?
When do you need to deploy?

As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next. The existing spatial search has some serious problems and is deprecated. Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk. If you are deploying a year from now, that might change.

There is not any support for SQL-like statements or for joins. The best practice for Solr is to think of your data as a single table, essentially creating a view from your database. The rows become Solr documents, the columns become Solr fields.

wunder

On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:

> I am sure these kind of questions keep coming to you guys, but I want to raise the same question in a different context...my own business situation. I am very very new to solr and though I have tried to read through the documentation, I have nowhere near completing the whole read.
>
> The need is like this - We have a huge rdbms database/table. A single table perhaps houses 100+ million rows. Though oracle is doing a fine job of handling the insertion and updation of data, the querying is where our main concerns lie. Since we have spatial data, the index building takes hours and hours for such tables.
>
> That's when we thought of moving away from standard rdbms and thought of trying something different and fast. My last week has been spent in a journey reading through bigtable to hadoop to hbase, to hive and then finally landed on solr. As far as I am in my tests, it looks pretty good, but I have a few unanswered questions still. Trying this group for them :) (I am sure I can find some answers if I read/google more on the topic, but now I'm being lazy and feel asking the people who are already using it/or perhaps developing it is a better bet).
>
> 1. Can I get my solr instance to load data (fresh data for indexing) from a stream (imagine a mq kind of queue, or similar) ?
> 2. Can I host my solr instance to use hbase as the database/file system (read HDFS) ?
> 3. are there somewhere any reports available (as in benchmarks ) for a solr instance's performance ?
> 4. are there any APIs available which might help me apply ANSI sql kind of statements to my solr data ?
>
> It would be great if people could help share their experience in the area... if it's too much trouble writing all of it, perhaps url would be easier... I welcome all kinds of help here... any advice/suggestions are good ...
>
> Looking forward to your viewpoints..
>
> --raghav..
>
> ** This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter. Thank you. **
> CLLD
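Walter's advice in this message, that rows become Solr documents and columns become Solr fields, can be illustrated with a small sketch. It builds a Solr XML update message (`<add><doc><field name="...">`) from rows given as dictionaries. The column names and the decision to skip NULL values are assumptions for the example, not something stated in the thread.

```python
import xml.etree.ElementTree as ET


def rows_to_add_xml(rows):
    """Build a Solr XML <add> message: one <doc> per row, one <field> per column."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for column, value in row.items():
            if value is None:
                continue  # assumption: skip NULL columns instead of sending empty fields
            field = ET.SubElement(doc, "field", name=column)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical rows, as they might come back from a database view.
example_rows = [
    {"id": 1, "name": "depot", "lat": 12.5, "note": None},
    {"id": 2, "name": "office", "lat": 48.1, "note": "main site"},
]
```

Flattening into a single view first, as suggested above, is what makes this one-row-to-one-document mapping possible, since there are no joins at query time.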
RE: bi-grams for common terms - any analyzers do that?
Hi Jonathan,

> I'm afraid I'm having trouble understanding "if the analyzer returns more than one position back from a queryparser token"

> I'm not sure if "the queryparser forms a phrase query without explicit phrase quotes" is a problem for me, I had no idea it happened until now, never noticed, and still don't really understand in what circumstances it happens.

The problem I had was for a Boolean query "l'art AND historie": the WordDelimiterFilter tokenized l'art as two tokens, "l" at position 1 and "art" at position 2. So the queryparser decided this means a phrase query for "l" followed immediately by "art". See http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance for details.

This would happen whenever any token filter splits a token into more than one token, for example a filter that splits "foo-bar" into "foo" "bar".

The exception is SynonymFilter or something like it. In the case of SynonymFilter, it's not really a case of splitting one token into multiple tokens; rather, given one token of input, it outputs all the synonyms of the term. However, all the tokens have the same position attribute. (see: http://www.lucidimagination.com/search/document/CDRG_ch05_5.6.19?q=synonym%20filter)

So for example, for the string "the small thing", if you had a synonym list for small:

small=tiny,teeny

input:
position | 1   | 2     | 3
token    | the | small | thing

Would output:
position | 1   | 2     | 2    | 2     | 3
token    | the | small | tiny | teeny | thing

In this case, when the queryParser gets back "small teeny tiny", since they have the same position, they are not turned into a phrase query.

For l'art:

input:
position | 1
token    | l'art

output:
position | 1 | 2
token    | l | art

In this case there are two tokens with different positions, so it treats them as a phrase query.

Tom Burton-West
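The rule Tom describes can be written down as a toy model: if the tokens produced from one query-parser chunk occupy more than one position, they are treated as a phrase; if they all share a position, they are treated as alternatives. The sketch below is only a simplified illustration of that rule in Python. It is not the Lucene or Solr query parser code, and the function names and output strings are invented for the example.

```python
def distinct_positions(tokens):
    """tokens: list of (term, position) pairs from analysing ONE query-parser chunk."""
    return sorted({position for _, position in tokens})


def describe_query(tokens):
    """Return a readable description of how the chunk would be interpreted."""
    if len(distinct_positions(tokens)) > 1:
        # Tokens at different positions: interpreted as a phrase, in position order.
        ordered = sorted(tokens, key=lambda token: token[1])
        return 'phrase: "' + " ".join(term for term, _ in ordered) + '"'
    # All tokens share one position (the synonym case): alternatives, no phrase.
    return "any of: " + " | ".join(term for term, _ in tokens)


# The two cases from the message above.
split_case = [("l", 1), ("art", 2)]                       # l'art after WordDelimiterFilter
synonym_case = [("small", 2), ("tiny", 2), ("teeny", 2)]  # small with its synonyms
```

Here `describe_query(split_case)` reports a phrase, while `describe_query(synonym_case)` reports alternatives, matching the two tables in the message.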
RE: bi-grams for common terms - any analyzers do that?
Hi Yonik,

> If the new autoGeneratePhraseQueries is off, position doesn't matter, and the query will be treated as "index OR reader".

Just wanted to make sure: in Solr, does autoGeneratePhraseQueries = off treat the query with the *default* query operator as set in SolrConfig, rather than necessarily using the Boolean OR operator?

i.e. if
<solrQueryParser defaultOperator="AND"/>
and
autoGeneratePhraseQueries = off

then
IndexReader -> index reader -> index AND reader

Tom
Re: The search response time is too loong
2010/9/27 newsam new...@zju.edu.cn:
> I have set up a SOLR searcher instance with Tomcat 5.5.21. However, the response time is too long. Here is my scenario:
> 1. The index file is 8.2G. The doc num is 6110745.
> 2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem.
>
> I used Key:* to query all records via localhost:8080. The response time is 68703 milliseconds. The CPU load is 50% and memory usage is over 400M.

If you want to get all records, use q=*:* instead of Key:*. That should give you faster results, way faster :)

Why are you actually requesting all results, and how many of them are you fetching? Maybe it would be a good idea to explain your use case / problem first.

simon

> Any comments are welcome.
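As a rough illustration of simon's two points (match all documents with q=*:* and fetch only a limited page of results), the sketch below builds such a request URL with Python's standard library. The `q`, `rows`, and `start` parameters are standard Solr query parameters; the `/solr/select` path on localhost:8080 is an assumption about this particular installation, and the helper names are invented for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, query="*:*", rows=10, start=0):
    """Build a select URL that matches everything but returns only `rows` documents."""
    params = {"q": query, "rows": rows, "start": start}
    return base_url + "?" + urlencode(params)


def query_params(url):
    """Decode the query string again, handy for checking what will be sent."""
    return parse_qs(urlsplit(url).query)


# Assumed endpoint: host and port are from the thread, the path is a guess.
example_url = build_select_url("http://localhost:8080/solr/select", rows=10)
```

Keeping `rows` small means the total hit count can still be read from the response without the server having to write millions of documents back.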
Question Related to sorting on Date
Hi all,

I have a question about sorting on a date field. I have a Date field that is indexed as a string and looks like 5/2/2008 4:33:30 PM. I want to sort on this field by date only; the time does not matter. Any suggestions on how I could ignore the time part of this field and sort on just the date?
Re: Is Solr right for my business situation ?
Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

> When do you need to deploy?
>
> As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next.

It will be in 3.x, the next release

> The existing spatial search has some serious problems and is deprecated. Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk. If you are deploying a year from now, that might change.
>
> There is not any support for SQL-like statements or for joins. The best practice for Solr is to think of your data as a single table, essentially creating a view from your database. The rows become Solr documents, the columns become Solr fields.

There is now group-by capabilities in trunk as well, which may or may not help.

> wunder
>
> On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
>
>> I am sure these kind of questions keep coming to you guys, but I want to raise the same question in a different context...my own business situation. I am very very new to solr and though I have tried to read through the documentation, I have nowhere near completing the whole read.
>>
>> The need is like this - We have a huge rdbms database/table. A single table perhaps houses 100+ million rows. Though oracle is doing a fine job of handling the insertion and updation of data, the querying is where our main concerns lie. Since we have spatial data, the index building takes hours and hours for such tables.
>>
>> That's when we thought of moving away from standard rdbms and thought of trying something different and fast. My last week has been spent in a journey reading through bigtable to hadoop to hbase, to hive and then finally landed on solr. As far as I am in my tests, it looks pretty good, but I have a few unanswered questions still. Trying this group for them :) (I am sure I can find some answers if I read/google more on the topic, but now I'm being lazy and feel asking the people who are already using it/or perhaps developing it is a better bet).
>>
>> 1. Can I get my solr instance to load data (fresh data for indexing) from a stream (imagine a mq kind of queue, or similar) ?

Yes, with a little bit of work.

>> 2. Can I host my solr instance to use hbase as the database/file system (read HDFS) ?

Probably, but I doubt it will be fast. Local disk is usually the best. 100+ M rows is large but not unreasonable.

>> 3. are there somewhere any reports available (as in benchmarks ) for a solr instance's performance ?

You can probably search the web for these. I've personally seen several installs w/ 1B+ docs and subsecond search and faceting and heard of others. You might look at the stuff the Hathi trust has put up.

>> 4. are there any APIs available which might help me apply ANSI sql kind of statements to my solr data ?

No. Question back: what kinds of things are you trying to do?

>> It would be great if people could help share their experience in the area... if it's too much trouble writing all of it, perhaps url would be easier... I welcome all kinds of help here... any advice/suggestions are good ...
>>
>> Looking forward to your viewpoints..
>>
>> --raghav..

--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
Re: Is Solr right for my business situation ?
Grant Ingersoll wrote: There is now group-by capabilities in trunk as well, which may or may not help. Really, the field collapsing stuff has been committed to trunk finally? Or are you talking about something else? If it's the field collapsing stuff, and it's been committed to trunk, does that mean it'll be in the 3.0 release? Jonathan
Re: Is Solr right for my business situation ?
Hi Jonathan,

Field collapsing is available in 1.4 by applying the patch at https://issues.apache.org/jira/browse/SOLR-236

-Ravi

From: Jonathan Rochkind rochk...@jhu.edu
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 9:18:20 PM
Subject: Re: Is Solr right for my business situation ?

Grant Ingersoll wrote:
> There is now group-by capabilities in trunk as well, which may or may not help.

Really, the field collapsing stuff has been committed to trunk finally? Or are you talking about something else? If it's the field collapsing stuff, and it's been committed to trunk, does that mean it'll be in the 3.0 release?

Jonathan
Re: Is Solr right for my business situation ?
Right, I know. I was curious about its current closeness to being in the main distro rather than a patch.

Among other things, when those who know better decide it goes in the core distro, that makes me more comfortable that they've decided it works acceptably, and also makes me more comfortable that it will continue to be supported in _future_ versions without someone having to prepare a new patch.

Ravi Julapalli wrote:

> Hi Jonathan,
>
> Field collapsing is available in 1.4 by applying the patch at https://issues.apache.org/jira/browse/SOLR-236
>
> -Ravi
>
> From: Jonathan Rochkind rochk...@jhu.edu
> To: solr-user@lucene.apache.org solr-user@lucene.apache.org
> Sent: Mon, September 27, 2010 9:18:20 PM
> Subject: Re: Is Solr right for my business situation ?
>
> Grant Ingersoll wrote:
>> There is now group-by capabilities in trunk as well, which may or may not help.
>
> Really, the field collapsing stuff has been committed to trunk finally? Or are you talking about something else? If it's the field collapsing stuff, and it's been committed to trunk, does that mean it'll be in the 3.0 release?
>
> Jonathan
Re: Is Solr right for my business situation ?
@Walter Underwood:

Walter Underwood wrote:
> Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk.

Could you give me the link to this trunk? I need it very much! Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592330.html
resources for relevancy score tuning
Can someone share some good resources (books, articles, links, etc.) for tuning relevancy scores with multiple factors? I'm playing with different fields and boosts in my 'qf', 'pf', and 'bf' defaults but I feel like I'm shooting in the dark. http://wiki.apache.org/solr/SolrRelevancyCookbook has a couple of individual tips, but I need some help devising a good combination of boosts across multiple fields for scoring. E.g., I want to tweak scoring derived from a primary identifier field, a name field, a description field, a rating field, and a number of downloads field. But it seems when I adjust any single factor, it affects too many others. Thanks, -L
Re: Is Solr right for my business situation ?
Wow, that is a relief! I was going to have to look at ElasticSearch instead.

Dennis Gearon

Signature Warning
EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Mon, 9/27/10, Grant Ingersoll gsing...@apache.org wrote:

From: Grant Ingersoll gsing...@apache.org
Subject: Re: Is Solr right for my business situation ?
To: solr-user@lucene.apache.org
Date: Monday, September 27, 2010, 12:35 PM

> Inline.
>
> On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:
>
>> When do you need to deploy?
>>
>> As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next.
>
> It will be in 3.x, the next release
Re: Is Solr right for my business situation ?
Ah, totally looked over that news: spatial search in 3.x! :-D :-D Any idea already when this will be released? Awesome to hear that it has been moved forward! :)

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592448.html
Re: Question Related to sorting on Date
Hi Ahson,

You'll really want to store an additional date field (make it a TrieDateField type) that has only the date, and in the reverse order from how you've shown it. You can still keep the one you've got, just use it only for 'human viewing' rather than sorting.

Something like: 20080205 if your example is 5 Feb, or 20080502 for May 2nd.

This way, the parsing is most efficient, you won't have to do any tricky parsing at sort time, and, when your index gets large, your sorted searches will remain fast.

On Mon, Sep 27, 2010 at 7:45 PM, Ahson Iqbal mianah...@yahoo.com wrote:

> hi all
>
> I have a question related to sorting of date field i have Date field that is indexed like a string and look like 5/2/2008 4:33:30 PM i want to do sorting on this field on the basis of date, time does not matters. any suggestion how i could ignore the time part from this field and just sort on the date?
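The conversion suggested in this reply could be done at indexing time, before the document is sent to Solr. The following is a minimal sketch in Python under stated assumptions: the input always looks like the example in the question, and the `day_first` flag (an invented parameter) covers the ambiguity the reply points out between 5 Feb and May 2nd. The second helper shows the ISO 8601 UTC form that Solr date fields expect, with the time zeroed out.

```python
from datetime import datetime


def _parse(raw, day_first=False):
    """Parse strings such as '5/2/2008 4:33:30 PM' (assumed input format)."""
    fmt = "%d/%m/%Y %I:%M:%S %p" if day_first else "%m/%d/%Y %I:%M:%S %p"
    return datetime.strptime(raw, fmt)


def sortable_date_key(raw, day_first=False):
    """Return a yyyymmdd string; plain string order then equals date order."""
    return _parse(raw, day_first).strftime("%Y%m%d")


def solr_date_value(raw, day_first=False):
    """Return the date at midnight in the form Solr date fields use."""
    return _parse(raw, day_first).strftime("%Y-%m-%dT00:00:00Z")
```

With the example from the question, `sortable_date_key("5/2/2008 4:33:30 PM")` gives `20080502`, and passing `day_first=True` gives `20080205`, the two values mentioned in the reply.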
DIH XML Entity Help (Newbie)
I am trying to configure the data-config.xml using the XPathEntityProcessor to index nested xml entities such as the following:

    <study>
      <intervention>
        <intervention_type>Drug</intervention_type>
        <intervention_name>fentanyl sublingual spray</intervention_name>
      </intervention>
      <intervention>
        <intervention_type>Other</intervention_type>
        <intervention_name>questionnaire administration</intervention_name>
      </intervention>
    </study>

The data-config.xml looks like this:

    <entity name="intervention"
            url="${studiesdir.fileAbsolutePath}"
            processor="XPathEntityProcessor"
            forEach="/clinical_study/intervention/">
      <field column="intervention_type_t" multiValued="true" xpath="/clinical_study/intervention/intervention_type" />
      <field column="intervention_name_t" multiValued="true" xpath="/clinical_study/intervention/intervention_name" />
    </entity>

but it only indexes the first occurrence of intervention_type_t and intervention_name_t, and they are placed as children of the root entity instead of being children of intervention.

I would appreciate your help!

Thanks in advance,
Aurelia

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-XML-Entity-Help-Newbie-tp1592723p1592723.html
Need help with spellcheck city name
Hi,

I have city name as a text field, and I want to do spellcheck on it. I use the settings from http://wiki.apache.org/solr/SpellCheckComponent

If I set up city name as a text field and spellcheck "San Jos" for "San Jose", I get "ojos" as the suggestion for "Jos". I checked the extended results and found that "Jose" is in the middle of all 10 suggestions in terms of score and frequency.

I then set city name as a string field and ran the spellcheck again; I got "Van" for "San" and "Ross" for "Jos", which is weird because "San" is correct.

How do you set up the spellchecker to spellcheck city names? A city name can have multiple words.

Thanks.
Re: Need help with spellcheck city name
Maybe process the city name as a single token?

On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:

Hi, I have the city name as a text field, and I want to do spellcheck on it. I use the settings from the wiki (http://wiki.apache.org/solr/SpellCheckComponent). If I set up the city name as a text field and spellcheck "San Jos" for "San Jose", I get "ojos" as the suggestion for "Jos". I checked the extended results and found that "Jose" is in the middle of all 10 suggestions in terms of score and frequency. I then set the city name as a string field and ran the spellcheck again; I got "Van" for "San" and "Ross" for "Jos", which is weird because "San" is correct. How do you set up the spellchecker to spellcheck city names? A city name can have multiple words. Thanks.
Re: Need help with spellcheck city name
No, it doesn't work; I got a weird result. I set my city name field to be parsed as a single token, as follows:

<fieldType name="autocomplete1" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I got the following result for spellcheck:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="san">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">3</int>
      <arr name="suggestion">
        <str>swan</str>
      </arr>
    </lst>
    <lst name="clar">
      <int name="numFound">1</int>
      <int name="startOffset">4</int>
      <int name="endOffset">8</int>
      <arr name="suggestion">
        <str>clark</str>
      </arr>
    </lst>
  </lst>

From: Tom Hill solr-l...@worldware.com
To: solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 3:52:48 PM
Subject: Re: Need help with spellcheck city name

Maybe process the city name as a single token?

On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:

Hi, I have the city name as a text field, and I want to do spellcheck on it. I use the settings from the wiki (http://wiki.apache.org/solr/SpellCheckComponent). If I set up the city name as a text field and spellcheck "San Jos" for "San Jose", I get "ojos" as the suggestion for "Jos". I checked the extended results and found that "Jose" is in the middle of all 10 suggestions in terms of score and frequency. I then set the city name as a string field and ran the spellcheck again; I got "Van" for "San" and "Ross" for "Jos", which is weird because "San" is correct. How do you set up the spellchecker to spellcheck city names? A city name can have multiple words. Thanks.
Re: Need help with spellcheck city name
Hmmm, did you rebuild your spelling index after the config changes? It really looks like you're somehow getting results from a field other than city. Are you also sure that your cityname field is of type autocomplete1?

Shooting in the dark here, but these results are so weird that I suspect it's something fundamental.

Best
Erick

On Mon, Sep 27, 2010 at 8:05 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:

No, it doesn't work; I got a weird result. I set my city name field to be parsed as a single token, as follows:

<fieldType name="autocomplete1" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I got the following result for spellcheck:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="san">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">3</int>
      <arr name="suggestion">
        <str>swan</str>
      </arr>
    </lst>
    <lst name="clar">
      <int name="numFound">1</int>
      <int name="startOffset">4</int>
      <int name="endOffset">8</int>
      <arr name="suggestion">
        <str>clark</str>
      </arr>
    </lst>
  </lst>

From: Tom Hill solr-l...@worldware.com
To: solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 3:52:48 PM
Subject: Re: Need help with spellcheck city name

Maybe process the city name as a single token?

On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:

Hi, I have the city name as a text field, and I want to do spellcheck on it. I use the settings from the wiki (http://wiki.apache.org/solr/SpellCheckComponent). If I set up the city name as a text field and spellcheck "San Jos" for "San Jose", I get "ojos" as the suggestion for "Jos". I checked the extended results and found that "Jose" is in the middle of all 10 suggestions in terms of score and frequency. I then set the city name as a string field and ran the spellcheck again; I got "Van" for "San" and "Ross" for "Jos", which is weird because "San" is correct. How do you set up the spellchecker to spellcheck city names? A city name can have multiple words. Thanks.
Re: Need help with spellcheck city name
No, I checked: there is a city called Swan in Iowa, so the suggestion is coming from the city index, and so is Clark. But why does it favor Swan over San? Spellcheck gets weird once I treat the city name as one token. If I do it the old way, it lets San through and corrects Jos to Ojos instead of Jose, because Ojos is ranked #1 and Jose is in the middle. Any more suggestions? Ranking by frequency first and then by score doesn't work either.

From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 5:24:25 PM
Subject: Re: Need help with spellcheck city name

Hmmm, did you rebuild your spelling index after the config changes? It really looks like you're somehow getting results from a field other than city. Are you also sure that your cityname field is of type autocomplete1?

Shooting in the dark here, but these results are so weird that I suspect it's something fundamental.

Best
Erick

On Mon, Sep 27, 2010 at 8:05 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:

No, it doesn't work; I got a weird result. I set my city name field to be parsed as a single token, as follows:

<fieldType name="autocomplete1" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I got the following result for spellcheck:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="san">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">3</int>
      <arr name="suggestion">
        <str>swan</str>
      </arr>
    </lst>
    <lst name="clar">
      <int name="numFound">1</int>
      <int name="startOffset">4</int>
      <int name="endOffset">8</int>
      <arr name="suggestion">
        <str>clark</str>
      </arr>
    </lst>
  </lst>

From: Tom Hill solr-l...@worldware.com
To: solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 3:52:48 PM
Subject: Re: Need help with spellcheck city name

Maybe process the city name as a single token?

On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:

Hi, I have the city name as a text field, and I want to do spellcheck on it. I use the settings from the wiki (http://wiki.apache.org/solr/SpellCheckComponent). If I set up the city name as a text field and spellcheck "San Jos" for "San Jose", I get "ojos" as the suggestion for "Jos". I checked the extended results and found that "Jose" is in the middle of all 10 suggestions in terms of score and frequency. I then set the city name as a string field and ran the spellcheck again; I got "Van" for "San" and "Ross" for "Jos", which is weird because "San" is correct. How do you set up the spellchecker to spellcheck city names? A city name can have multiple words. Thanks.
Re: Is Solr right for our project?
Solr will match this in version 3.1, which is the next major release. Read this page for feature descriptions: http://wiki.apache.org/solr/SolrCloud

Coming to a trunk near you - see https://issues.apache.org/jira/browse/SOLR-1873

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 27. sep. 2010, at 17.44, Mike Thomsen wrote:

(I apologize in advance if I missed something in your documentation, but I've read through the Wiki on the subject of distributed searches and didn't find anything conclusive.)

We are currently evaluating Solr and Autonomy. Solr is attractive due to its open source background, following and price. Autonomy is expensive, but we know for a fact that it can handle our distributed search requirements perfectly.

What we need to know is whether Solr has capabilities that match or roughly approximate Autonomy's Distributed Search Handler (DiSH). It acts as a front-end for all of Autonomy's IDOL search servers (which correspond in this scenario to Solr shards). It is configured to know what is on each shard and which servers hold each shard, and it intelligently farms out queries based on that configuration. There is no need to specify which IDOL servers to hit while querying; the DiSH just knows where to go. Additionally, I believe that in cases where an index piece is mirrored, it also monitors server health and falls back intelligently on other backup instances of a shard/index piece based on that.

I'd appreciate it if someone could give me a frank explanation of where Solr stands in this area.

Thanks,
Mike
Re: FieldType for storing date
: I was wondering what would be the best FieldType for storing date with a
: millisecond precision that would allow me to sort and run range queries
: against this field. We would like to achieve the best query performance,
: minimal heap - fieldcache - requirements, good indexing throughput and
: minimal index size in that order.

If you don't need sortMissingLast or sortMissingFirst, then TrieDateField should be exactly what you are looking for.

: We could probably use TrieLongField, however, as we understand, this
: doubles the heap requirements for fieldcache. Was wondering if there is
: a clever way of achieving this without adding to the heap.

TrieDateField uses the long[] FieldCache, so I'm not sure what you mean by "doubles the heap requirements" ... unless you are comparing to int? In that case, using TrieIntField seems like what you want. (But if you are comparing to DateField, the FieldCache for TrieDateField is going to be a lot smaller.)

-Hoss

--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
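A back-of-the-envelope sketch of the heap comparison discussed above, assuming the FieldCache holds one primitive value per document (8 bytes for a long[] entry, 4 bytes for an int[] entry) and ignoring array headers and any other overhead. The function name and the 10 million document count are hypothetical, chosen only for illustration.

```python
LONG_BYTES = 8  # one long[] entry per document
INT_BYTES = 4   # one int[] entry per document


def fieldcache_bytes(max_doc, bytes_per_value):
    """Rough FieldCache size: one fixed-width value per document."""
    return max_doc * bytes_per_value


if __name__ == "__main__":
    docs = 10_000_000  # hypothetical index size
    print("long[] cache:", fieldcache_bytes(docs, LONG_BYTES), "bytes")
    print("int[] cache: ", fieldcache_bytes(docs, INT_BYTES), "bytes")
```

For 10 million documents this comes to 80,000,000 bytes for long values against 40,000,000 bytes for int values, which is the "doubling" that only appears when the comparison is against an int-based field.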
Search Interface
Hi everybody, I'm implementing my first Solr engine for conceptual tests. I'm crawling my intranet wiki to run some searches. The engine is already working fine, but I need an interface for running my searches. Does anybody know where I can find a search interface that I can customize? Thanks -- Claudio Devecchi flickr.com/cdevecchi
Re: Solr UIMA integration
Hi Maheshkumar,

I attached a patch for inclusion of this project as a Solr contrib module [1]. There you can find the patch to apply to the Solr trunk, along with the needed jars (attached as a zip archive). I think your issue could be related to the fact that the GC project's dependency is on Solr 1.4.1, not on trunk, so the patch should fix it.

Hope this helps,
Tommaso

[1] : https://issues.apache.org/jira/browse/SOLR-2129

2010/9/27 maheshkumar maheshkuma...@gmail.com

Hi Tommaso,

All UIMA dependencies (uima-core, AlchemyAPIAnnotator, OpenCalaisAnnotator, Tagger, WhitespaceTokenizer) are 2.3.1-SNAPSHOT. All are checked out from svn:

AlchemyAPIAnnotator: http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator
OpenCalaisAnnotator: http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator
Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger
WhitespaceTokenizer: http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer
solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima

I am using the latest Solr version checked out from svn; I guess it is newer than 1.4.1.

Tommaso, would it be possible for you to upload all the dependency jars here: http://code.google.com/p/solr-uima/downloads/list

Thanks
Mahesh

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-UIMA-integration-tp1528253p1587660.html Sent from the Solr - User mailing list archive at Nabble.com.