Re: Custom filter development
On Mon, May 9, 2011 at 5:07 AM, solrfan a2701...@jnxjn.com wrote:

Hi, I would like to write my own filter. I try to use the following class, but the one-to-one mapping is a problem for me. I want to map a given Token, for example "a", to three Tokens "a1", "a2", "a3". I also want to do a one-to-one mapping of "b" to "c", and I want to have the possibility to remove a Token "d". How can I do this when the next() method returns only one Token, not a collection?

Buffer them internally. Look at SynonymFilter.java, it does exactly this.

Tom

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Custom-filter-development-tp2918459p2918459.html
Sent from the Solr - User mailing list archive at Nabble.com.
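Tom's buffering suggestion can be illustrated outside Lucene with plain Java. The sketch below (class and token names are invented for illustration) shows a wrapper whose next() still hands back one item at a time, but which can expand one input into several, rewrite it, or drop it, by draining an internal queue before pulling more input:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/**
 * Buffering pattern sketch: one-to-many ("a" -> "a1","a2","a3"),
 * one-to-one ("b" -> "c"), and removal ("d" dropped), even though
 * next() can only ever return a single item.
 */
public class ExpandingIterator implements Iterator<String> {
    private final Iterator<String> input;
    private final Deque<String> pending = new ArrayDeque<String>();

    public ExpandingIterator(Iterator<String> input) {
        this.input = input;
    }

    public boolean hasNext() {
        // Refill the buffer from the input only when it is empty.
        while (pending.isEmpty() && input.hasNext()) {
            String tok = input.next();
            if (tok.equals("a")) {            // one-to-many expansion
                pending.addAll(Arrays.asList("a1", "a2", "a3"));
            } else if (tok.equals("b")) {     // one-to-one rewrite
                pending.add("c");
            } else if (!tok.equals("d")) {    // "d" is removed entirely
                pending.add(tok);
            }
        }
        return !pending.isEmpty();
    }

    public String next() {
        hasNext();                            // ensure the queue is filled
        return pending.remove();              // emit from the head of the buffer
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("a", "b", "d", "x");
        Iterator<String> it = new ExpandingIterator(in.iterator());
        StringBuilder out = new StringBuilder();
        while (it.hasNext()) {
            out.append(it.next()).append(' ');
        }
        System.out.println(out.toString().trim()); // a1 a2 a3 c x
    }
}
```

A Lucene TokenFilter follows the same shape: its token-producing method emits from the buffer while the buffer is non-empty, and only pulls the next token from its input when the buffer runs dry, which is exactly what SynonymFilter does.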
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Hi John,

WeakReferences allow things to get GC'd if there are no other references to the object referred to. My understanding is that WeakHashMaps use weak references for the keys in the HashMap. What this means is that the keys in the HashMap can be GC'd once there are no other references to the key. I _think_ this occurs when the IndexReader is closed. It does not mean that objects in the FieldCache will get evicted in low memory conditions, unless that field cache entry is no longer needed (i.e. the IndexReader has closed). It just means they can be collected when they are no longer needed (but not before).

So, if you are seeing the FieldCache for the current IndexReader taking up 2.1 GB, that's probably the current cache usage. There isn't a knob you can turn to cut the cache size, but you can evaluate your usage of the cache. Some ideas:

How many fields are you searching on? Sorting on? Are you sorting on String fields where you could be using a numeric field? Numerics save space. Do you need to sort on every field that you are sorting on? Could you facet on fewer fields? For a String field, do you have too many distinct values? If so, can you reduce the number of unique terms? You might check your faceting algorithms and see if you could use enum instead of fc for some of them. Check your statistics page; what's your insanity count?

Tom

On Fri, Dec 10, 2010 at 12:17 PM, John Russell jjruss...@gmail.com wrote:

I have been load testing Solr 1.4.1 and have been running into OOM errors. Not out of heap, but with the "GC overhead limit exceeded" message, meaning that it didn't actually run out of heap space but just spent too much CPU time trying to make room and gave up. I got a heap dump, sent it through the Eclipse MAT, and found that a single WeakHashMap in FieldCacheImpl called readerCache is taking up 2.1GB of my 2.6GB heap.
From my understanding of WeakHashMaps, the GC should be able to collect those references if it needs to, but for some reason it isn't here. My questions are:

1) Any ideas why the GC is not collecting those weak references in that single hashmap?
2) Is there a knob in the solr config that can limit the size of that cache?

Also, after the OOM is thrown, solr doesn't respond much at all and throws the exception below. However, when I go to the code I see this:

try {
    processor.processAdd(addCmd);
    addCmd.clear();
} catch (IOException e) {
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "ERROR adding document " + document);
}

So it's swallowing the IOException and throwing a new one without setting the cause, so I can't see what the IOException is. Is this fixed in any newer version? Should I open a bug?

Thanks a lot for your help

John

SEVERE: org.apache.solr.common.SolrException: ERROR adding document SolrInputDocument[{de.id=de.id(1.0)={C2B3B03F112C549254560A568C18}, de.type=de.type(1.0)={Social Contact}, sc.author=sc.author(1.0)={Author-3944}, sc.sourceType=sc.sourceType(1.0)={rss}, sc.link=sc.link(1.0)={http://www.cisco.com/feed/date_12.07.10_16.18.03/idx/10752}, sc.title=sc.title(1.0)={Title-erat metus eget vestibulum}, sc.publishedDate=sc.publishedDate(1.0)={Tue Dec 07 16:22:09 EST 2010}, sc.createdDate=sc.createdDate(1.0)={Tue Dec 07 16:20:20 EST 2010}, sc.socialContactStatus=sc.socialContactStatus(1.0)={unread}, sc.socialContactStatusUserId=sc.socialContactStatusUserId(1.0)={}, sc.socialContactStatusDate=sc.socialContactStatusDate(1.0)={Tue Dec 07 16:20:20 EST 2010}, sc.tags=sc.tags(1.0)={[]}, sc.authorId=sc.authorId(1.0)={}, sc.replyToId=sc.replyToId(1.0)={}, sc.replyToAuthor=sc.replyToAuthor(1.0)={}, sc.replyToAuthorId=sc.replyToAuthorId(1.0)={}, sc.feedId=sc.feedId(1.0)={[124852]}, filterResult_124932_ti=filterResult_124932_ti(1.0)={67}, filterStatus_124932_s=filterStatus_124932_s(1.0)={COMPLETED},
filterResult_124937_ti=filterResult_124937_ti(1.0)={67}, filterStatus_124937_s=filterStatus_124937_s(1.0)={COMPLETED}, campaignDateAdded_124957_tdt=campaignDateAdded_124957_tdt(1.0)={Tue Dec 07 16:20:20 EST 2010}, campaignStatus_124957_s=campaignStatus_124957_s(1.0)={NEW}, campaignDateAdded_124947_tdt=campaignDateAdded_124947_tdt(1.0)={Tue Dec 07 16:20:20 EST 2010}, campaignStatus_124947_s=campaignStatus_124947_s(1.0)={NEW}, sc.campaignResultsSummary=sc.campaignResultsSummary(1.0)={[NEW, NEW]}}]
    at org.apache.solr.handler.BinaryUpdateRequestHandler$2.document(BinaryUpdateRequestHandler.java:81)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:136)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:126)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:210)
    at
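Tom's point about weak keys can be demonstrated with a pure-JDK sketch (no Solr dependencies; the "reader" object is a stand-in, not a real IndexReader). A WeakHashMap entry becomes collectable only once no strong references to its key remain, which is why a readerCache keyed on an open, strongly-referenced IndexReader cannot be evicted no matter how tight memory gets:

```java
import java.util.Map;
import java.util.WeakHashMap;

/**
 * WeakHashMap behavior demo: the entry survives while the key is strongly
 * referenced, and is expunged after the last strong reference is dropped
 * and a GC runs.
 */
public class WeakMapDemo {
    /** Returns {entry present while key referenced, entry gone after drop}. */
    static boolean[] run() throws InterruptedException {
        Map<Object, String> cache = new WeakHashMap<Object, String>();
        Object reader = new Object(); // stands in for an open IndexReader
        cache.put(reader, "field cache entry");

        System.gc();
        Thread.sleep(50);
        // Still present: 'reader' is strongly reachable from this frame.
        boolean presentWhileReferenced = cache.containsKey(reader);

        reader = null; // "close" the reader: drop the last strong reference
        for (int i = 0; i < 100 && !cache.isEmpty(); i++) {
            System.gc();       // GC is only a hint, so retry briefly
            Thread.sleep(10);
        }
        boolean goneAfterDrop = cache.isEmpty();
        return new boolean[] { presentWhileReferenced, goneAfterDrop };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.println("present while key referenced: " + r[0]);
        System.out.println("collected after dereference:  " + r[1]);
    }
}
```

So the 2.1 GB entry in John's heap dump is not a GC bug: the IndexReader key was still strongly referenced, so the "weak" entry was, in practice, as strong as any other.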
Re: singular/plurals
Check out this page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Look, in particular, for stemming.

On Fri, Dec 10, 2010 at 7:58 PM, Jack O jack_...@yahoo.com wrote:

Hello, I need one more bit of help: what do I have to do so that search will work for singulars and plurals? I would really appreciate all your help. /J
Re: command line parameters for solr
java -jar start.jar --help

More docs here: http://docs.codehaus.org/display/JETTY/A+look+at+the+start.jar+mechanism

Personally, I usually limit access to localhost by using whatever firewall the machine uses.

Tom

On Fri, Dec 10, 2010 at 7:55 PM, Jack O jack_...@yahoo.com wrote:

Hello, where do I find the list of command line parameters for starting solr (java -jar start.jar blahblah...)? I am especially looking for how to specify my own jetty config file. I want to allow access to solr from localhost only. I would really appreciate all your help. /J
Re: Delete by query or Id very slow
I'd bet it's the optimize that's taking the time, and not the delete. You don't really need to optimize these days, and you certainly don't need to do it on every delete. And you can give Solr a list of ids to delete, which would be more efficient. I don't believe you can tell which ones failed, if any do, when you delete with a list, but you are not using unsuccessful now anyway.

Tom

On Thu, Dec 9, 2010 at 7:55 AM, Ravi Kiran ravi.bhas...@gmail.com wrote:

Thank you Tom for responding. On average the docs are around 25-35 KB. The code is as follows; kindly let me know if you see anything weird, a second pair of eyes always helps :-)

public List<String> deleteDocs(List<String> ids) throws SolrCustomException {
    CommonsHttpSolrServer server = (CommonsHttpSolrServer) getServerInstance();
    List<String> unsuccessful = new ArrayList<String>();
    try {
        if (ids != null && !ids.isEmpty()) {
            for (String id : ids) {
                server.deleteById(id);
            }
            server.commit();
            server.optimize();
        }
    } catch (IOException ioex) {
        throw new SolrCustomException("IOException while deleting: ", ioex);
    } catch (SolrServerException solrex) {
        throw new SolrCustomException("Could not delete: ", solrex);
    }
    return unsuccessful;
}

private SolrServer getServerInstance() throws SolrCustomException {
    if (server != null) {
        return server;
    } else {
        String url = getServerURL();
        log.debug("Server URL: " + url);
        try {
            server = new CommonsHttpSolrServer(url);
            server.setSoTimeout(100); // socket read timeout
            server.setConnectionTimeout(100);
            server.setDefaultMaxConnectionsPerHost(1000);
            server.setMaxTotalConnections(1000);
            server.setFollowRedirects(false); // defaults to false
            // allowCompression defaults to false. Server side must support gzip or deflate for this to have any effect.
            server.setAllowCompression(true);
            server.setMaxRetries(1); // defaults to 0. 1 not recommended.
        } catch (MalformedURLException mex) {
            throw new SolrCustomException("Cannot resolve Solr Server at '" + url + "'\n", mex);
        }
        return server;
    }
}

Thanks, Ravi Kiran Bhaskar

On Wed, Dec 8, 2010 at 6:16 PM, Tom Hill solr-l...@worldware.com wrote:

That's a pretty low number of documents for autocommit. It means that by the time you get to 850,000 documents, you will have created 8,500 segments, and that's not counting merges. How big are your documents? I just created an 850,000 document index (and a 3.5M doc index) with tiny documents (id and title), and they deleted quickly (17 milliseconds). Maybe you could post your delete code? Are you doing anything else (like commit/optimize)?

Tom

On Wed, Dec 8, 2010 at 12:55 PM, Ravi Kiran ravi.bhas...@gmail.com wrote:

Hello, I am using Solr 1.4.1. When I delete by query or id from SolrJ it is very, very slow, almost like a hang. The core from which I am deleting has close to 850K documents in the index. In solrconfig.xml, autocommit is set as follows. Any idea how to speed up the deletion process? Please let me know if any more info is required.

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Perform a <commit/> automatically under certain conditions:
       maxDocs - number of updates since last commit is greater than this
       maxTime - oldest uncommitted update (in ms) is this long ago -->
  <autoCommit>
    <maxDocs>100</maxDocs>
    <maxTime>12</maxTime>
  </autoCommit>
</updateHandler>

Thanks, Ravi Kiran Bhaskar
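Tom's "list of ids" suggestion maps directly onto Solr's XML update message, which accepts multiple id elements in a single delete command (the ids below are placeholders for illustration):

```xml
<delete>
  <id>doc-1</id>
  <id>doc-2</id>
  <id>doc-3</id>
</delete>
```

In SolrJ, the equivalent is to batch the ids into one update request rather than calling deleteById once per id (later SolrJ versions also accept a whole List of ids in a single deleteById call; check the API of the version you are on), followed by one commit and no optimize.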
Re: Triggering a reload of replicated configuration files
On Thu, Dec 9, 2010 at 4:49 AM, Ophir Adiv firt...@gmail.com wrote:

On Thu, Dec 9, 2010 at 2:25 PM, Upayavira u...@odoko.co.uk wrote:

On Thu, 09 Dec 2010 13:34 +0200, Ophir Adiv firt...@gmail.com wrote:

Hi, I added a configuration file which is updated in one of the master cores' conf directory, and also added the file name to the list of confFiles. As expected, after an index change and commit, this file gets replicated to the slave core. However, the problem that remains is how to reload this file's data after it is replicated. What I did on the master core is to initiate a core reload, and through a custom CoreAdminHandler override handleReloadAction() to reload the new file too. But this cannot be done on the slave, since the master, which triggers the update, is unaware who its slaves are. Any ideas on how to do this?

http://wiki.apache.org/solr/CoreAdmin#RELOAD

Doesn't this do it? Upayavira

This works on the master core, since the application knows its master cores - but this does not trigger a reload on the slave cores.

I believe it does. See SnapPuller.java:

if (successfulInstall) {
    LOG.info("Configuration files are modified, core will be reloaded");
    logReplicationTimeAndConfFiles(modifiedConfFiles, successfulInstall); // write to a file time of replication and conf files.
    reloadCore();
}

And I tested it a while ago, and it seemed to be working. Check your logs for errors, perhaps?

Tom
Re: How badly does NTFS file fragmentation impact search performance? 1.1X? 10X? 100X?
If you can benchmark before and after, please post the results when you are done! Things like your index's size and the amount of RAM in your computer will help make it meaningful. If all of your index can be cached, I don't think fragmentation is going to matter much once you get warmed up.

Tom

On Wed, Dec 8, 2010 at 9:59 AM, Will Milspec will.mils...@gmail.com wrote:

Hi all, Pardon if this isn't the best place to post this email... maybe it belongs on the lucene-user list. Also, it's basically Windows-specific, so not of use to everyone...

The question: does NTFS fragmentation affect search performance a little bit or a lot? It's obvious that fragmentation will slow things down, but is it a factor of 1.1, 10, or 100 (i.e. what order of magnitude)? As a follow-up: should solr/lucene users periodically remind Windows sysadmins to defrag their drives?

On a production system, I ran the Windows defrag analyzer and found heavy fragmentation on the lucene index:

11,839 492 MB \data\index\search\_6io5.cfs
7,153 433 MB \data\index\search\_5ld6.cfs
6,953 661 MB \data\index\search\_8jvj.cfs
5,824 74 MB \data\index\search\_5ld7.frq
5,691 356 MB \data\index\search\_9eev.fdt
5,638 352 MB \data\index\search\_8mqi.fdt
5,629 352 MB \data\index\search\_8jvj.fdt
5,609 351 MB \data\index\search\_88z8.fdt
5,590 355 MB \data\index\search\_96l5.fdt
5,568 354 MB \data\index\search\_8zjn.fdt
5,471 342 MB \data\index\search\_5wgo.fdt
5,466 342 MB \data\index\search\_5uo1.fdt
5,450 340 MB \data\index\search\_5hrn.fdt
5,429 345 MB \data\index\search\_6nyy.fdt
5,371 353 MB \data\index\search\_8sob.fdt

Incidentally, we periodically experience some *very* slow searches. Out of curiosity, I checked for file fragmentation (using 'analyze' mode of the NTFS defragger). Nota bene: Windows Sysinternals has a utility, Contig.exe, which allows you to defragment individual drives/directories. We'll use that to defragment the index directories.
Re: only index synonyms
Hi Lee,

Sorry, I think Erick and I both thought the issue was converting the synonyms, not removing the other words. To keep only a set of words that match a list, use the KeepWordFilterFactory, with your list of synonyms.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeepWordFilterFactory

I'd put the synonym filter first in your configuration for the field, then the keep words filter factory.

Tom

On Tue, Dec 7, 2010 at 12:06 PM, lee carroll lee.a.carr...@googlemail.com wrote:

ok thanks for your response. To summarise the solution then: to only index synonyms you must only send words that will match the synonym list. If words without synonym matches are in the field to be indexed, these words will be indexed. No way to avoid this by using schema.xml config.

thanks lee c

On 7 December 2010 13:21, Erick Erickson erickerick...@gmail.com wrote:

OK, the light finally dawns. *If* you have a defined list of words to remove, you can put them in with your stopwords and add a stopword filter to the field in schema.xml. Otherwise, you'll have to do some pre-processing and only send to solr the words you want. I'm assuming you have a list of valid words (i.e. the words in your synonyms file) and could pre-filter the input to remove everything else. In that case you don't need a synonyms filter since you're controlling the whole process anyway.

Best Erick

On Tue, Dec 7, 2010 at 6:07 AM, lee carroll lee.a.carr...@googlemail.com wrote:

Hi Tom, this seems to place "This is a scenic line of words" in the index. I just want "scenic" and "words" in the index. I'm not at a terminal at the moment but will try again to make sure. I'm sure I'm missing the obvious.

Cheers lee

On 7 Dec 2010 07:40, Tom Hill solr-l...@worldware.com wrote:

Hi Lee,

On Mon, Dec 6, 2010 at 10:56 PM, lee carroll lee.a.carr...@googlemail.com wrote: Hi Erik

Nope, Erik is the other one. :-)

thanks for the reply. I only want the synonyms to be in the index, how can I achieve that?
Sorry, probably missing something obvious in the docs.

Exactly what he said, use the => syntax. You've already got it. Add the lines

pretty => scenic
text => words

to synonyms.txt, and it will do what you want.

Tom

On 7 Dec 2010 01:28, Erick Erickson erickerick...@gmail.com wrote:

See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory with the => syntax, I think that's what you're looking for.

Best Erick

On Mon, Dec 6, 2010 at 6:34 PM, lee carroll lee.a.carr...@googlemail.com wrote:

Hi, can the following use case be achieved?

Value to be analysed at index time: "this is a pretty line of text". Synonym list is pretty => scenic, text => words. Value placed in the index is "scenic words". That is to say, only the matching synonyms. Basically I want to produce a normalised set of phrases for faceting.

Cheers Lee C
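Putting the two replies together, the analyzer chain Tom describes might look like this in schema.xml. This is a sketch: the fieldType name and the keep-words file name are invented for illustration, and keepwords.txt would contain the right-hand sides of the synonym rules (scenic, words):

```xml
<fieldType name="text_syn_only" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- rewrite first: pretty => scenic, text => words -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <!-- then drop every token that is not on the keep list -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```

With this chain, "this is a pretty line of text" indexes as just "scenic words", since everything without a synonym mapping falls off the keep list.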
Re: customer ping response
Hi Tri,

Well, I wouldn't really recommend this, but I just tried making a custom XMLResponseWriter that wrote the response you wanted. So you can use it with any request handler you want. Works fine, but it's pretty hack-y. The downside is, you are writing code, and you have to modify SolrCore. But it's trivial to do. So, I wouldn't recommend it, but it was fun to play around with. :)

It's probably easier to fix the load balancer, which is almost certainly just looking for any string you specify. Just change what it's expecting. They are built so you can configure this.

Tom

On Tue, Dec 7, 2010 at 5:56 PM, Erick Erickson erickerick...@gmail.com wrote:

That's the query term being sent to the server.

On Tue, Dec 7, 2010 at 8:50 PM, Tri Nguyen tringuye...@yahoo.com wrote:

Hi, I'm reading the wiki. What does q=apache mean in the url? http://localhost:8983/solr/select/?stylesheet=&q=apache&wt=xslt&tr=example.xsl

thanks, tri

From: Markus Jelsma markus.jel...@openindex.io To: Tri Nguyen tringuye...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Tue, December 7, 2010 4:35:28 PM Subject: Re: customer ping response

Well, you can go a long way with xslt, but I wouldn't know how to embed the server name in the response, as Solr simply doesn't return that information. You'd have to patch the response Solr's giving, or put a small script in front that can embed the server name.

I need to return this:

<?xml version="1.0" encoding="UTF-8"?>
<admin>
  <status>
    <name>Server</name>
    <value>ok</value>
  </status>
</admin>

From: Markus Jelsma markus.jel...@openindex.io To: solr-user@lucene.apache.org Cc: Tri Nguyen tringuye...@yahoo.com Sent: Tue, December 7, 2010 4:27:32 PM Subject: Re: customer ping response

Of course! The ping request handler behaves like any other request handler and accepts at least the wt parameter [1]. Use xslt [2] to transform the output to any desirable form, or use other response writers [3]. Why, anyway? Is it a load balancer that only wants an OK output or something?
[1]: http://wiki.apache.org/solr/CoreQueryParameters
[2]: http://wiki.apache.org/solr/XsltResponseWriter
[3]: http://wiki.apache.org/solr/QueryResponseWriter

Can I have a custom xml response for the ping request?

thanks, Tri
Re: complex boolean filtering in fq queries
For one thing, you wouldn't have fq= in there, except at the beginning:

fq=location:national OR (location:CA AND city:"San Francisco")

more below...

On Tue, Dec 7, 2010 at 10:25 PM, Andy angelf...@yahoo.com wrote:

Forgot to add, my defaultOperator is AND.

--- On Wed, 12/8/10, Andy angelf...@yahoo.com wrote:

From: Andy angelf...@yahoo.com Subject: complex boolean filtering in fq queries To: solr-user@lucene.apache.org Date: Wednesday, December 8, 2010, 1:21 AM

I have a facet query that requires some complex boolean filtering. Something like: fq=location:national OR (fq=location:CA AND fq=city:San Francisco)

1) How do I turn the above filters into a REST query string?

Do you mean URL encoding it? You can just type your query into the search box in the admin UI, and copy from the resulting URL.

2) Do I need the double quotes around San Francisco?

Yes. Else it will be (city:San) (Francisco). Probably not what you want.

3) Will complex boolean filters like this substantially slow down query performance?

That's not very complex, and the filter may be cached. Probably won't be a problem.

Tom

Thanks
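As a concrete illustration of Tom's two points, the whole filter goes into a single fq parameter, with spaces and quotes URL-encoded (the host, port, and core path below are hypothetical):

```
http://localhost:8983/solr/select?q=*:*&fq=location:national+OR+(location:CA+AND+city:%22San+Francisco%22)
```

Here %22 is the encoded double quote and + the encoded space, so the phrase "San Francisco" survives as one term rather than splitting into (city:San) (Francisco).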
Re: Index version on slave nodes
Just off the top of my head: aren't you able to use a slave as a repeater, so it's configured as both a master and a slave? http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater This would seem to require that the slave return the same values as its master for indexversion. What happens if you configure your slave as a master, also? Does that get the behavior you want?

Tom

On Tue, Dec 7, 2010 at 8:16 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Yes, I read that too in the replication request handler's source comments. But I would find it convenient if it would just use the same values as we see using the details command. Any devs agree? Then I'd open a ticket for this one.

On Tuesday 07 December 2010 17:14:09 Xin Li wrote:

I read it somewhere (sorry for not remembering the source): the indexversion command gets the replicable index version number. Since it is a slave machine, the result is 0.

Thanks,

On Tue, Dec 7, 2010 at 11:06 AM, Markus Jelsma markus.jel...@openindex.io wrote:

But why? I'd expect valid version numbers, although the replication handler's source code seems to agree with you, judging from the comments.

On Monday 06 December 2010 17:49:16 Xin Li wrote:

I think this is expected behavior. You have to issue the details command to get the real indexversion for slave machines.

Thanks, Xin

On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi, The indexversion command in the replicationHandler on slave nodes returns 0 for indexversion and generation, while the details command does return the correct information. I haven't found an existing ticket on this one, although https://issues.apache.org/jira/browse/SOLR-1573 has similarities.
Cheers,

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Need help with spellcheck city name
Maybe process the city name as a single token? On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett savannah_becket...@yahoo.com wrote: Hi, I have city name as a text field, and I want to do spellcheck on it. I use setting in http://wiki.apache.org/solr/SpellCheckComponent If I setup city name as text field and do spell check on San Jos for San Jose, I get suggestion for Jos as ojos. I checked the extendedresult and I found that Jose is in the middle of all 10 suggestions in term of score and frequency. I then set city name as string field, and spell check again, I got Van for San and Ross for Jos, which is weird because San is correct. How do you setup spellchecker to spellcheck city names? City name can have multiple words. Thanks.
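One way to do that in schema.xml is a sketch like the following (the fieldType name is made up for illustration): KeywordTokenizerFactory keeps the whole field value as one token, so the spellchecker built from this field compares "San Jos" against complete names like "san jose" instead of word by word:

```xml
<fieldType name="city_spell" class="solr.TextField">
  <analyzer>
    <!-- the entire city name becomes a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A copyField from the display city field into a field of this type would then feed the spellcheck dictionary without changing how the original field is searched.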
Re: Delete Dynamic Fields
Delete all docs with the dynamic fields, and then optimize. On Wed, Sep 22, 2010 at 1:58 PM, Moiz Bhukhiya moiz.bhukh...@gmail.com wrote: Hi All: I had used dynamic fields for some of my fields and then later decided to make it static. I removed that dynamic field from the schema but I still see it on admin interface(FIELD LIST). Could somebody please point me out how can I remove these dynamic fields? Thanks, Moiz
Re: Searching solr with a two word query
It will probably be clearer if you don't use the pseudo-boolean operators, and just use + for required terms. If you look at your output from debug, you see your query becomes:

all_text:open +all_text:excel +presentation_id:294 +type:blob

Note that all_text:open does not have a + sign, but all_text:excel has one. So all_text:open is not required, but all_text:excel is. I think this is because AND marks both of its operands as required (which puts the + on +all_text:excel), but "open" has no explicit operator, so it uses the default OR, which marks that term as optional.

What I would suggest you do is:

opening excellent +presentation_id:294 +type:blob

which I think is much clearer. I think you could also do

opening excellent presentation_id:294 AND type:blob

but I think it's non-obvious how the result will differ from

opening excellent AND presentation_id:294 AND type:blob

so I wouldn't use either of the last two.

Tom

p.s. Not sure what is going on with the last lines of your debug output for the query. Is that really what shows up after presentation ID? I see Euro, hash mark, zero, semi-colon, and H with stroke:

<str name="parsedquery_toString">all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob</str>

On Mon, Sep 20, 2010 at 12:46 PM, n...@frameweld.com wrote:

Say I had a two-word query that was "opening excellent". I would like it to return something like: opening excellent opening opening opening excellent excellent excellent Instead of: opening excellent excellent excellent excellent If I did a search, I would like the first word alone to also show up in the results, because currently my results show both words in one result and only the second word for the rest of the results. I've done a search on each word by itself, and there are results for them. Thanks.
-----Original Message-----
From: Erick Erickson erickerick...@gmail.com
Sent: Monday, September 20, 2010 2:37pm
To: solr-user@lucene.apache.org
Subject: Re: Searching solr with a two word query

I'm missing what you really want out of your query; your phrase "either word as a single result" just isn't connecting in my grey matter. Could you give some example inputs and outputs that demonstrate what you want?

Best Erick

On Mon, Sep 20, 2010 at 11:41 AM, n...@frameweld.com wrote:

I noticed that my defaultOperator is OR, and that does have an effect on what does come up. If I were to change that to AND, it's an exact match to my query, but I would like similar matches with either word as a single result. Is there another value I can use? Or maybe I should use another query parser? Thanks.

- Noel

-----Original Message-----
From: Erick Erickson erickerick...@gmail.com
Sent: Monday, September 20, 2010 10:05am
To: solr-user@lucene.apache.org
Subject: Re: Searching solr with a two word query

Here's an excellent description of the Lucene query operators and how they differ from strict boolean logic: http://www.gossamer-threads.com/lists/lucene/java-user/47928

But the short form is (and boy, doesn't the fact that the URL escapes spaces as '+', which is also a Lucene operator, make looking at these interesting) that the first term is essentially a SHOULD clause in a Lucene BooleanQuery and is matching your docs all by itself.
HTH Erick

On Mon, Sep 20, 2010 at 8:58 AM, n...@frameweld.com wrote:

Here is my raw query:

q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&json.nl=map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=span+class%3Dhl&hl.simple.post=%2Fspan&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on

and here is what I get on the debugQuery:

<lst name="debug">
  <str name="rawquerystring">opening excellent AND presentation_id:294 AND type:blob</str>
  <str name="querystring">opening excellent AND presentation_id:294 AND type:blob</str>
  <str name="parsedquery">all_text:open +all_text:excel +presentation_id:294 +type:blob</str>
  <str name="parsedquery_toString">all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob</str>
  <lst name="explain">
    <str name="1435675blob">
3.1143723 = (MATCH) sum of:
  0.46052343 = (MATCH) weight(all_text:open in 4457), product of:
    0.5531408 = queryWeight(all_text:open), product of:
      5.3283896 = idf(docFreq=162, maxDocs=12359)
      0.10381013 = queryNorm
    0.8325609 = (MATCH) fieldWeight(all_text:open in 4457), product of:
      1.0 = tf(termFreq(all_text:open)=1)
      5.3283896 = idf(docFreq=162, maxDocs=12359)
      0.15625 = fieldNorm(field=all_text, doc=4457)
  0.74662465 = (MATCH) weight(all_text:excel in 4457), product of:
    0.7043054 = queryWeight(all_text:excel), product of:
      6.7845535 = idf(docFreq=37, maxDocs=12359)
      0.10381013 = queryNorm
    1.0600865 = (MATCH)
Re: Odd query result
When I run it, with that fieldType, it seems to work for me. Here's a sample query output:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">17</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">xtext:I-Car</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">ALLCAPS</str>
      <str name="xtext">I-CAR</str>
    </doc>
    <doc>
      <str name="id">CAMEL</str>
      <str name="xtext">I-Car</str>
    </doc>
  </result>
</response>

Did I miss something? Could you show the output with debugQuery=on for the user's failing query? Assuming I did this right, the next thing I'd look for is a copyField. Is the user's query really being executed against this field? Schema.xml could be useful, too.

Tom

On Tue, Apr 20, 2010 at 10:19 AM, Charlie Jackson charlie.jack...@cision.com wrote:

I've got an odd scenario with a query a user's running. The user is searching for the term I-Car. It will hit if the document contains the term I-CAR (all caps) but not if it's I-Car. When I throw the terms into the analysis page, the resulting tokens look identical, and my I-Car tokens hit on either term.
Here's the definition of the field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I'm pretty sure this has to do with the settings on the WordDelimiterFilterFactory, but I must be missing something, because I don't see anything that would cause the behavior I'm seeing.
Re: Problem with suggest search
You need a query string with the standard request handler (dismax has q.alt). Try q=*:* if you are trying to get facets for all documents. And yes, a friendlier error message would be a good thing. Tom On Mon, Mar 15, 2010 at 9:03 AM, David Rühr d...@marketing-factory.de wrote: Hi List. We have two servers, dev and live. Dev is not our problem, but on live we see this error with the facet.prefix parameter - if there is no q param - for suggest search: HTTP Status 500 - null java.lang.NullPointerException at java.io.StringReader.<init>(StringReader.java:54) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at org.apache.solr.search.QParser.getQuery(QParser.java:137) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:85) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:811) The query looks like: facet=on&facet.mincount=1&facet.limit=10&json.nl=map&wt=json&rows=0&version=1.2&omitHeader=true&fl=content&start=0&q=&facet.prefix=mate&facet.field=content&fq=group:0+OR+group:-2+OR+group:1+OR+group:11+-group:-1&fq=language:0 When we add the q param, e.g. q=material, we have no error. Anyone have the same error, or can help? Thanks to all. David
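A client-side sketch of the workaround Tom suggests: default a missing q to the match-all query before building the suggest request. The parameter values mirror the failing request in the thread; the function itself is hypothetical client code, not part of Solr:

```python
from urllib.parse import urlencode

def suggest_params(prefix, q=None):
    # The standard request handler NPEs on a missing/empty q
    # (dismax has q.alt), so fall back to the match-all query.
    # Field and param values are taken from the thread's request;
    # the function name is made up for illustration.
    return urlencode([
        ("q", q if q else "*:*"),
        ("rows", 0),
        ("facet", "on"),
        ("facet.field", "content"),
        ("facet.prefix", prefix),
        ("facet.mincount", 1),
        ("facet.limit", 10),
        ("wt", "json"),
        ("omitHeader", "true"),
    ])
```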
Re: java.lang.OutOfMemoryError, VM may need to be forcibly terminated
Hi - The best way is probably to add more RAM. :-) That error apparently results from running out of perm gen space, and with 512m, you may not have much perm gen space. Options for increasing it can be found at http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp But if you don't have enough memory, that's just going to move the problem. You can watch memory usage with jconsole, or get more detail with something like YourKit. Tom On Fri, Mar 12, 2010 at 10:17 AM, Oleg Burlaca o...@burlaca.com wrote: Hello, I've searched the list for this kind of error but never found one similar to my case: Java HotSpot(TM) Client VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGTERM to handler- the VM may need to be forcibly terminated I use the latest stable SOLR 1.4 and start it with Jetty from the /example/ folder. Sometimes SOLR dies without writing to stderrout.log (I use the script from http://wiki.apache.org/solr/SolrJetty). The messages above appear in the standard error stream instead of the log file (i.e. directly in the SSH window). I've set:

  <New class="org.mortbay.thread.BoundedThreadPool">
    <Set name="minThreads">2</Set>
    <Set name="lowThreads">2</Set>
    <Set name="maxThreads">2</Set>
  </New>

Is there a way to solve this? SOLR is on a VPS with 512MB of RAM. Regards, Oleg Burlaca
Re: Warning : no lockType configured for...
Hi Mani, Mani EZZAT wrote: I'm dynamically creating cores with a new index, using the same schema and solrconfig.xml Does the problem occur if you use the same configuration in a single, static core? Tom -- View this message in context: http://old.nabble.com/Re%3A-Warning-%3A-no-lockType-configured-for...-tp27740724p27758951.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple Cores Vs. Single Core for the following use case
Hi - I'd probably go with a single core on this one, just for ease of operations. But here are some thoughts: One advantage I can see to multiple cores, though, would be better idf calculations. With individual cores, each user only sees the idf for his own documents. With a single core, the idf will be across all documents. In theory, better relevance. On the other hand, multi-core will use more RAM to start with, and I would expect it to use more disk (one term dictionary per core). Filters would add to the memory footprint of the multiple core setup. However, if you only end up sorting/faceting on some of the cores, your memory use with multiple cores may actually be less. With multiple cores, each field cache only covers one user's docs. With a single core, you have one field cache entry per doc in the whole corpus. Depending on usage patterns, index sizes, etc., this could be a significant amount of memory. Tom On Wed, Jan 27, 2010 at 11:38 AM, Amit Nithian anith...@gmail.com wrote: It sounds to me that multiple cores won't scale... wouldn't you have to create multiple configurations, one per core, and does the ranking function change per user? I would imagine that the filter method would work better: the caching is there, and as mentioned earlier it would be fast for multiple searches. If you have searches for the same user, then add them to your warming queries list so that on server startup, the cache will be warm for certain users that you know tend to do a lot of searches. This can be known empirically or by log mining. I haven't used multiple cores, but I suspect that having that many configuration files parsed and loaded in memory can't be good for memory usage over filter caching. Just my 2 cents. Amit On Wed, Jan 27, 2010 at 8:58 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Thanks Didier for your response. And in your opinion, this should be as fast as if I would getCore(userId) -- provided that the core is already open -- and then search for Paris? 
matt --- On Wed, 1/27/10, didier deshommes dfdes...@gmail.com wrote: From: didier deshommes dfdes...@gmail.com Subject: Re: Multiple Cores Vs. Single Core for the following use case To: solr-user@lucene.apache.org Date: Wednesday, January 27, 2010, 10:52 AM On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: What I am trying to understand is the search/filter algorithm. If I have 1 core with all documents and I search for Paris for userId=123, is lucene going to first search for all Paris documents and then apply a filter on the userId ? If this is the case, then I am better off having a specific index for the user=123 because this will be faster If you want to apply the filter to userid first, use filter queries (http://wiki.apache.org/solr/CommonQueryParameters#fq). This will filter by userid first then search for Paris. didier --- On Wed, 1/27/10, Marc Sturlese marc.sturl...@gmail.com wrote: From: Marc Sturlese marc.sturl...@gmail.com Subject: Re: Multiple Cores Vs. Single Core for the following use case To: solr-user@lucene.apache.org Date: Wednesday, January 27, 2010, 2:22 AM In case you are going to use core per user take a look to this patch: http://wiki.apache.org/solr/LotsOfCores Trey-13 wrote: Hi Matt, In most cases you are going to be better off going with the userid method unless you have a very small number of users and a very large number of docs/user. The userid method will likely be much easier to manage, as you won't have to spin up a new core every time you add a new user. I would start here and see if the performance is good enough for your requirements before you start worrying about it not being efficient. That being said, I really don't have any idea what your data looks like. How many users do you have? How many documents per user? Are any documents shared by multiple users? 
-Trey On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour matthieu_lab...@yahoo.comwrote: Hi Shall I set up Multiple Core or Single core for the following use case: I have X number of users. When I do a search, I always know for which user I am doing a search Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and add a userId field to each document? If I choose the 1 core solution then I am concerned with performance. Let's say I search for NewYork ... If lucene returns all New York matches for all users and then filters based on the userId, then this is going to be less efficient than if I have sharded per user and send the request for New York to the user's core Thank you for your help matt -- View this message in context:
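The fq approach didier describes in the thread above can be sketched as a request builder. The field name userId is taken from the discussion; the function name is hypothetical client code:

```python
from urllib.parse import urlencode

def user_search_params(user_id, query):
    # Single-core approach: the per-user restriction goes into a filter
    # query (fq), which Solr caches independently of q and which does not
    # affect relevance scores; q stays the scored full-text query.
    return urlencode([("q", query), ("fq", f"userId:{user_id}")])
```

Because the fq set is cached, repeated searches for the same user reuse the filter rather than re-filtering every match, which is the efficiency concern raised in the thread.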
Re: Plurals in solr indexing
I recommend getting familiar with the analysis tool included with Solr. From Solr's main admin screen, click on analysis, check verbose, and enter your text, and you can see the changes that happen during analysis. It's really helpful, especially when getting started. Tom On Wed, Jan 27, 2010 at 2:41 AM, murali k ilar...@gmail.com wrote: Hi, I am having trouble with indexing plurals. I have a schema with the following fields: gender (field) - string (field type) (e.g. data: Boys); all (field) - text (field type) - solr.WhitespaceTokenizerFactory, solr.SynonymFilterFactory, solr.WordDelimiterFilterFactory, solr.LowerCaseFilterFactory, SnowballPorterFilterFactory. I am using copyField from gender to all and searching on the all field. When I search for Boy, I get results; if I search for Boys, I don't get results. I have tried things like: boys bikes - no results; boy bikes - works. kid and kids are synonyms for boy and boys, so I tried adding kid,kids,boy,boys in synonyms, hoping it would work; it doesn't work that way. I also have other content fields which are copied to all, and they contain words like kids, boys, etc... any idea? -- View this message in context: http://old.nabble.com/Plurals-in-solr-indexing-tp27335639p27335639.html Sent from the Solr - User mailing list archive at Nabble.com.
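An illustrative toy of what the analysis tool would show for a stemmed field: a stemmer normalizes the indexed token and the query token to a shared root, which is how Boy and Boys meet. The rule below is a stand-in, NOT the real SnowballPorterFilterFactory:

```python
def toy_stem(token):
    # Stand-in for Porter-style stemming (NOT SnowballPorterFilterFactory):
    # lowercase, then strip common plural endings. Because the same rule
    # runs at index time and query time, "Boys" in a document and "boy"
    # in a query both become "boy" and match.
    token = token.lower()
    if token.endswith("ies"):
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token
```

If the search only hits the string-typed gender field (no analysis at all), "Boys" must match character-for-character, which is the kind of mismatch the analysis page makes visible.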
Re: Improvising solr queries
Hi - Something doesn't make sense to me here: On Mon, Jan 4, 2010 at 5:55 AM, dipti khullar dipti.khul...@gmail.com wrote: - optimize runs on master every 7 minutes - using postOptimize, we execute snapshooter on master - snappuller/snapinstaller on 2 slaves runs every 10 minutes Why would you optimize every 7 minutes, and update the slaves every ten? After 70 minutes you'll be doing both at the same time. How about optimizing every ten minutes, at :00, :10, :20, :30, :40, :50, and then pulling every ten minutes at :01, :11, :21, :31, :41, :51 (assuming your optimize completes in one minute)? Or did I misunderstand something? The issue gets resolved as soon as we optimize the slave index. In the solr admin, it shows only 4 requests/sec handled, with 400 ms response time. From your earlier description, it seems like you should only be distributing an optimized index, so optimizing the slave should be a no-op. Check what files you have on the slave after snappulling. Tom
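The collision Tom points out is just the least common multiple of the two periods - a quick arithmetic check:

```python
from math import gcd

def first_collision(period_a, period_b):
    # Two repeating jobs started together coincide again after the
    # least common multiple of their periods: lcm(7, 10) = 70 minutes.
    return period_a * period_b // gcd(period_a, period_b)
```

With both jobs on a 10-minute period (offset by a minute, as suggested), they never overlap instead of colliding every 70 minutes.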
Re: Case Insensitive search not working
Did you rebuild the index? Changing the analyzer for the index doesn't affect already-indexed documents. Tom On Tue, Dec 8, 2009 at 11:57 AM, insaneyogi3008 insaney...@gmail.com wrote: Hello, I tried to force case-insensitive search by having the following setting in my schema.xml file, which I guess is standard for case-insensitive searches:

  <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

However, when I perform searches on San Jose and san jose, I get 16 and 0 responses back, respectively. Is there anything else I'm missing here? -- View this message in context: http://old.nabble.com/Case-Insensitive-search-not-working-tp26699734p26699734.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: why no results?
Hi - That's a common one to get bit by. The string On Mon, Dec 7, 2009 at 7:44 PM, regany re...@newzealand.co.nz wrote: hi all - newbie solr question - I've indexed some documents and can search / receive results using the following schema - BUT ONLY when searching on the id field. If I try searching on the title, subtitle, body or text field I receive NO results. Very confused. :confused: Can anyone see anything obvious I'm doing wrong? Regan.

  <?xml version="1.0" ?>
  <schema name="core0" version="1.1">
    <types>
      <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    </types>
    <fields>
      <!-- general -->
      <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
      <field name="title" type="string" indexed="true" stored="true" multiValued="false"/>
      <field name="subtitle" type="string" indexed="true" stored="true" multiValued="false"/>
      <field name="body" type="string" indexed="true" stored="true" multiValued="false"/>
      <field name="text" type="string" indexed="true" stored="false" multiValued="true"/>
    </fields>
    <!-- field to use to determine and enforce document uniqueness. -->
    <uniqueKey>id</uniqueKey>
    <!-- field for the QueryParser to use when an explicit fieldname is absent -->
    <defaultSearchField>text</defaultSearchField>
    <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
    <solrQueryParser defaultOperator="OR"/>
    <!-- copyFields group fields into one single searchable indexed field for speed. -->
    <copyField source="title" dest="text"/>
    <copyField source="subtitle" dest="text"/>
    <copyField source="body" dest="text"/>
  </schema>

-- View this message in context: http://old.nabble.com/why-no-results--tp26688249p26688249.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: why no results?
Sorry, just discovered a keyboard shortcut for send. :-) That's a common one to get bit by. The fieldtype StrField indexes the entire field as one item. So you can only find it if your search term is everything in the field. That is, fox will not find The Quick Brown Fox, because it's not the whole field. The ID field probably works because it has one term in it. 1 finds 1 just fine. Try solr.TextField instead. Tom On Mon, Dec 7, 2009 at 7:47 PM, Tom Hill solr-l...@worldware.com wrote: Hi - That's a common one to get bit by. The string On Mon, Dec 7, 2009 at 7:44 PM, regany re...@newzealand.co.nz wrote: hi all - newbie solr question - I've indexed some documents and can search / receive results using the following schema - BUT ONLY when searching on the id field. If I try searching on the title, subtitle, body or text field I receive NO results. Very confused. :confused: Can anyone see anything obvious I'm doing wrong Regan.

  <?xml version="1.0" ?>
  <schema name="core0" version="1.1">
    <types>
      <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    </types>
    <fields>
      <!-- general -->
      <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
      <field name="title" type="string" indexed="true" stored="true" multiValued="false"/>
      <field name="subtitle" type="string" indexed="true" stored="true" multiValued="false"/>
      <field name="body" type="string" indexed="true" stored="true" multiValued="false"/>
      <field name="text" type="string" indexed="true" stored="false" multiValued="true"/>
    </fields>
    <!-- field to use to determine and enforce document uniqueness. -->
    <uniqueKey>id</uniqueKey>
    <!-- field for the QueryParser to use when an explicit fieldname is absent -->
    <defaultSearchField>text</defaultSearchField>
    <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
    <solrQueryParser defaultOperator="OR"/>
    <!-- copyFields group fields into one single searchable indexed field for speed. -->
    <copyField source="title" dest="text"/>
    <copyField source="subtitle" dest="text"/>
    <copyField source="body" dest="text"/>
  </schema>

-- View this message in context: http://old.nabble.com/why-no-results--tp26688249p26688249.html Sent from the Solr - User mailing list archive at Nabble.com.
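The StrField-vs-TextField difference Tom describes can be modeled in a few lines. This is illustrative only, not Solr's actual indexing code:

```python
def str_field_terms(value):
    # solr.StrField, roughly: the whole field value is a single term.
    return [value]

def text_field_terms(value):
    # A tokenized solr.TextField, roughly: whitespace tokens, lowercased.
    return value.lower().split()

def matches(terms, query_term):
    # A term query hits only if the query term equals an indexed term.
    return query_term in terms
```

So with the schema above, a search for fox misses the string-typed document The Quick Brown Fox, while the id field works because its whole value is the one term you type.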
Re: deleteById without solrj?
http://wiki.apache.org/solr/UpdateXmlMessages#A.22delete.22_by_ID_and_by_Query On Thu, Dec 3, 2009 at 11:57 AM, Joel Nylund jnyl...@yahoo.com wrote: Is there a url based approach to delete a document? thanks Joel
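For reference, the delete messages described on that wiki page can be built like this (a sketch: the XML is then POSTed to /solr/update with a text/xml content type):

```python
from xml.sax.saxutils import escape

def delete_by_id(doc_id):
    # The <delete><id> update message (delete by unique key).
    return f"<delete><id>{escape(doc_id)}</id></delete>"

def delete_by_query(query):
    # The <delete><query> update message (delete everything matching).
    return f"<delete><query>{escape(query)}</query></delete>"
```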
Re: Multi-Term Synonyms
Hi Brad, I suspect that this section from the wiki for SynonymFilterFactory might be relevant: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Keep in mind that while the SynonymFilter will happily work with synonyms containing multiple words (i.e.: sea biscuit, sea biscit, seabiscuit), the recommended approach for dealing with synonyms like this is to expand the synonym when indexing. This is because there are two potential issues that can arise at query time: 1. The Lucene QueryParser tokenizes on white space before giving any text to the Analyzer, so if a person searches for the words sea biscit, the analyzer will be given the words sea and biscit separately, and will not know that they match a synonym. ... Tom On Tue, Nov 24, 2009 at 10:47 AM, brad anderson solrinter...@gmail.com wrote: Hi Folks, I was trying to get multi-term synonyms to work. I'm experiencing some strange behavior and would like some feedback. In the synonyms file I have the line: thomas, boll holly, thomas a, john q => tom And I have a document with the text field as: tom However, when I do a search on boll holly, it does not return the document with tom. The same thing happens if I do a query on john q. But if I do a query on thomas, it gives me the document. Also, if I quote boll holly or john q, it gives back the document. When I look at the analyzer page on the solr admin page, it is transforming boll holly to tom when it isn't quoted. Why is it not returning the document? Is there some configuration I can make so it does return the document if I do an unquoted search on boll holly? My synonym filter is defined as follows, and is only defined on the query side: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> I've also tried changing the synonym file to be tom, thomas, boll holly, thomas a, john q This produces the same results. Thanks, Brad
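The wiki's caveat can be seen with a toy model: the query parser splits on whitespace before the analyzer (and its synonym map) ever sees the text, so only a quoted phrase reaches the SynonymFilter intact. Everything below is illustrative, not Lucene code; the synonym entries mirror the thread:

```python
# Toy synonym map modeled on the thread; NOT Lucene's SynonymFilter.
SYNONYMS = {"boll holly": "tom", "john q": "tom", "thomas": "tom"}

def analyze(text):
    # The analyzer sees a whole string, so a multi-word synonym key
    # can match here.
    return SYNONYMS.get(text.lower(), text.lower())

def parse_unquoted(query):
    # The QueryParser tokenizes on whitespace FIRST, so the analyzer
    # only ever receives one word at a time for an unquoted query.
    return [analyze(word) for word in query.split()]

def parse_phrase(query):
    # A quoted phrase is passed to the analyzer intact.
    return [analyze(query)]
```

This reproduces Brad's symptoms: the quoted phrase and the single-word synonym map to tom, while the unquoted multi-word query never does.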
Webinar: An Introduction to Basics of Search and Relevancy with Apache Solr hosted by Lucid Imagination
In this introductory technical presentation, renowned search expert Mark Bennett, CTO of Search Consultancy New Idea Engineering, will present practical tips and examples to help you quickly get productive with Solr, including: * Working with the web command line and controlling your inputs and outputs * Understanding the DISMAX parser * Using the Explain output to tune your results relevance * Using the Schema browser Wednesday, December 2, 2009 11:00am PST / 2:00pm EST Click here to sign up: http://www.eventsvc.com/lucidimagination/120209?trk=WR-DEC2009-AP
Talk on Solr - Oakland, CA June 18, 2008
Hi - I'll be giving a talk on Solr at the East Bay Innovations Group (eBig) Java SIG on Wed, June 18. http://www.ebig.org/index.cfm?fuseaction=Calendar.eventDetail&eventID=16 This is an introductory / overview talk intended to get you from What is Solr? Why would I use it? to Cool, now I know enough to go home and start playing with Solr. Tom -- View this message in context: http://www.nabble.com/Talk-on-Solr---Oakland%2C-CA-June-18%2C-2008-tp17880636p17880636.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr + Tomcat Undeploy Leaks
I certainly have seen memory problems when I just drop a new war file in place. So now I usually stop tomcat and restart. I used to see problems (pre-1.0) when I just redeployed repeatedly, without even accessing the app, but I've got a little script running in the background that has done that 50 times now, without running out of space. Are you on a current version? I'm on 1.2. Tom On 10/18/07, Mike Klaas [EMAIL PROTECTED] wrote: I'm not sure that many people are dynamically taking down/starting up Solr webapps in servlet containers. I certainly prefer process-level management of my (many) Solr instances. -Mike On 18-Oct-07, at 10:40 AM, Stu Hood wrote: Any ideas? Has anyone experienced this problem with other containers? I'm not tied to Tomcat if I can find another servlet host with a REST api for deploying apps. Thanks, Stu -Original Message- From: Stu Hood [EMAIL PROTECTED] Sent: Wednesday, October 17, 2007 4:46pm To: solr-user@lucene.apache.org Subject: Solr + Tomcat Undeploy Leaks Hello, I'm using the Tomcat Manager app with 6.0.14 to start and stop Solr instances, and I believe I am running into a variant of the linked issue: http://wiki.apache.org/jakarta-commons/Logging/UndeployMemoryLeak?action=print According to `top`, the 'size' of the Tomcat process reaches the limit I have set for it with the Java -Xmx flag soon after starting and launching a few instances. The 'RSS' varies based on how full the caches are at any particular time, but I don't think it ever reaches the 'size'. After a few days, I will get OOM errors in the logs when I try and start new instances (note: this is typically in the middle of the night, when usage is low), and all of the instances will stop responding until I (hard) restart Tomcat. Has anyone run into this issue before? Is logging the culprit? If so, what options do I have (besides setting up a cron job to restart Tomcat nightly...)? Thanks, Stu Hood Webmail.us You manage your business. We'll manage your email.(R)
Re: Availability Issues
Hi - We're definitely not seeing that. What do your logs show? What do your schema/solrconfig look like? Tom On 10/8/07, David Whalen [EMAIL PROTECTED] wrote: Hi All. I'm seeing all these threads about availability and I'm wondering why my situation is so different than others'. We're running SOLR 1.2 with a 2.5G heap size. On any given day, the system becomes completely unresponsive. We can't even get /solr/admin/ to come up, much less any select queries. The only thing we can do is kill the SOLR process and re-start it. We are indexing over 25 million documents and we add about as much as we remove daily, so the number remains fairly constant. Again, it seems like other folks are having a much easier time with SOLR than we are. Can anyone help by sharing how you've got it configured? Does anyone have a similar experience? TIA. DW
Re: Solr live at Netflix
Nice! And there seem to be some improvements. For example, Gamers and Gamera no longer stem to the same word :-) Tom On 10/2/07, Walter Underwood [EMAIL PROTECTED] wrote: Here at Netflix, we switched over our site search to Solr two weeks ago. We've seen zero problems with the server. We average 1.2 million queries/day on a 250K item index. We're running four Solr servers with simple round-robin HTTP load-sharing. This is all on 1.1. I've been too busy tuning to upgrade. Thanks everyone, this is a great piece of software. wunder -- Walter Underwood Search Guy, Netflix
Re: pluggable functions
Hi - I'm not sure what you mean by a reflection based approach, but I've been thinking about doing this for a bit, since we needed it, too. I'd just thought about listing class names in the config file. The functions would probably need to extend a subclass of ValueSource which will handle argument parsing for the function, so you won't need to hard code the parsing in a VSParser subclass. I think this might simplify the existing code a bit. You might have to do a bit of reflection to instantiate the function. Did you have an alternate approach in mind? Are there any other things this would need to do? Is anyone else working on this? Tom On 9/18/07, Jon Pierce [EMAIL PROTECTED] wrote: I see Yonik recently opened an issue in JIRA to track the addition of pluggable functions (https://issues.apache.org/jira/browse/SOLR-356). Any chance this will be implemented soon? It would save users like me from having to hack the Solr source or write custom request handlers for trivial additions (e.g., adding a distance function), not to mention changes to downstream dependencies (e.g., solr-ruby). Perhaps a reflection-based approach would do the trick? - Jon
Re: Query for German Special Characters (i.e., ä, ö, ß)
Hi Marc, Are you using the same stemmer on your queries that you use when indexing? Try the analysis function in the admin UI, to see how things are stemmed for indexing vs. querying. If they don't match for really and fünny, and do match for kraßen, then that's your problem. Tom On 9/14/07, Marc Bechler [EMAIL PROTECTED] wrote: Hi, oops, the URIEncoding was lost during the update to tomcat 6.0.14. Thanks for the advice. But now I am really curioused. After indexing the document from scratch, I have the effect that queries to this and is work fine, whereas queries to really and fünny do not return the result. Fünnily ;-) , after extending my sometext to This is really fünny kraßen., queries to really and fünny still do not work, but kraßen is found. Now I am somehow confused -- hopefully anyone has a good explanation ;-) Regards, marc Tom Hill schrieb: If you are using tomcat, try adding URIEncoding=UTF-8 to your tomcat connector. Connector port=8080 maxHttpHeaderSize=8192 maxThreads=150 minSpareThreads=25 maxSpareThreads=75 enableLookups=false redirectPort=8443 acceptCount=100 connectionTimeout=2 disableUploadTimeout=true URIEncoding=UTF-8 / use the analysis page of the admin interface to check to see what's happening to your queries, too. http://localhost:8080/solr/admin/analysis.jsp?highlight=on (your port # may vary) Tom On 9/13/07, Marc Bechler [EMAIL PROTECTED] wrote: Hi SOLR kings, I'm just playing around with queries, but I was not able to query for any special characters like the German Umlaute (i.e., ä, ö, ü). Maybe others might have the same effects and already found a solution ;-) Here is my example: I have one field called sometext of type text (the one delivered with the SOLR example). I indexed a few words similar to field name=sometext ![CDATA[ This is really fünny ]]/field Works fine, and searching for really shows the result and fünny will be displayed correctly. 
However, the query for fünny using the /solr/admin page is resolved (correctly) to the URL ...q=f%C3%BCnny... but does not find the document. And now the question: Any ideas? ;-) Cheers, marc
Re: Query for German Special Characters (i.e., ä, ö, ß)
Hi Marc, The searches are going to look for an exact match of the query (after analysis) in the index (after analysis). So, realli will not match really. So you want to have the same stemmer (probably not the English one, given your examples) in both in index analyzer, and the query analyzer. I've appended the section from solr 1.2 example schema.xml, note EnglishPorterFilterFactory is in both sections. That would be what you want to do, with the appropriate stemmer for your application. Or, you could use no stemmer for BOTH, but I think most people go with stemming. At least, I do. :-) Tom fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ !-- in this example, we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- filter class=solr.StopFilterFactory ignoreCase=true words= stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected= protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words= stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected= protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType On 9/14/07, Marc Bechler [EMAIL PROTECTED] wrote: Index for really: 5* really. 
Query for really: 5* really, 2* realli (from: EnglishPorterFilterFactory {protected=protwords.txt}, RemoveDuplicatesTokenFilterFactory {}) For this everyting is completely fine. Is a complete matching required between index and query or is a partial matching also okay? Thanks for helping me marc Tom Hill schrieb: Hi Marc, Are you using the same stemmer on your queries that you use when indexing? Try the analysis function in the admin UI, to see how things are stemmed for indexing vs. querying. If they don't match for really and fünny, and do match for kraßen, then that's your problem. Tom On 9/14/07, Marc Bechler [EMAIL PROTECTED] wrote: Hi, oops, the URIEncoding was lost during the update to tomcat 6.0.14. Thanks for the advice. But now I am really curioused. After indexing the document from scratch, I have the effect that queries to this and is work fine, whereas queries to really and fünny do not return the result. Fünnily ;-) , after extending my sometext to This is really fünny kraßen., queries to really and fünny still do not work, but kraßen is found. Now I am somehow confused -- hopefully anyone has a good explanation ;-) Regards, marc Tom Hill schrieb: If you are using tomcat, try adding URIEncoding=UTF-8 to your tomcat connector. Connector port=8080 maxHttpHeaderSize=8192 maxThreads=150 minSpareThreads=25 maxSpareThreads=75 enableLookups=false redirectPort=8443 acceptCount=100 connectionTimeout=2 disableUploadTimeout=true URIEncoding=UTF-8 / use the analysis page of the admin interface to check to see what's happening to your queries, too. http://localhost:8080/solr/admin/analysis.jsp?highlight=on (your port # may vary) Tom On 9/13/07, Marc Bechler [EMAIL PROTECTED] wrote: Hi SOLR kings, I'm just playing around with queries, but I was not able to query for any special characters like the German Umlaute ( i.e., ä, ö, ü). 
Maybe others might have the same effects and already found a solution ;-) Here is my example: I have one field called sometext of type text (the one delivered with the SOLR example). I indexed a few words similar to field name=sometext ![CDATA[ This is really fünny ]]/field Works fine, and searching for really shows the result and fünny will be displayed correctly. However, the query for fünny using the /solr/admin page is resolved (correctly) to the URL ...q=f%C3%BCnny... but does not find the document. And now the question: Any ideas? ;-) Cheers, marc
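The mismatch in this thread - the query side stems really to realli (via EnglishPorterFilterFactory) while the index side stored the term unstemmed - can be illustrated with a toy rule. The stemming function below is a stand-in for the real Porter algorithm:

```python
def porter_ish_stem(token):
    # Stand-in for the Porter stemmer's "-ly" handling
    # ("really" -> "realli"); NOT the real algorithm.
    return token[:-1] + "i" if token.endswith("ly") else token

# Index side WITHOUT the stemmer: the term is stored as-is.
index_terms = {"really"}

def query_matches(term):
    # Query side WITH the stemmer: the query term is rewritten first,
    # so it no longer lines up with the unstemmed index.
    return porter_ish_stem(term) in index_terms

# Fix: run the same stemmer at index time too, as in the schema above.
stemmed_index = {porter_ish_stem(t) for t in index_terms}

def query_matches_fixed(term):
    return porter_ish_stem(term) in stemmed_index
```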
Re: Slow response
Hi Mike, Thanks for clarifying what has been a bit of a black box to me. A couple of questions, to increase my understanding, if you don't mind. If I am only using fields with multiValued=false, with a type of string or integer (untokenized), does solr automatically use approach 2? Or is this something I have to actively configure? And is approach 2 better than 1? Or vice versa? Or is the answer it depends? :-) If, as I suspect, the answer was it depends, are there any general guidelines on when to use one approach or the other? Thanks, Tom On 9/6/07, Mike Klaas [EMAIL PROTECTED] wrote: On 6-Sep-07, at 3:25 PM, Mike Klaas wrote: There are essentially two facet computation strategies: 1. cached bitsets: a bitset for each term is generated and intersected with the query result bitset. This is more general and performs well up to a few thousand terms. 2. field enumeration: cache the field contents, and generate counts using this data. Relatively independent of #unique terms, but requires at most a single facet value per field per document. So, if you factor author into Primary author/Secondary author, where each is guaranteed to only have one value per doc, this could greatly accelerate your faceting. There are probably fewer unique subjects, so strategy 1 is likely fine. To use strategy 2, just make sure that multivalued=false is set for those fields in schema.xml I forgot to mention that strategy 2 also requires a single token for each doc (see http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3) -Mike
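Toy versions of the two strategies Mike describes, just to make the mechanics concrete. The data and structures here are made up for illustration; real Lucene/Solr internals differ:

```python
from collections import Counter

# Made-up documents keyed by doc id, with a single-valued "author" field.
docs = {1: {"author": "smith"}, 2: {"author": "jones"}, 3: {"author": "smith"}}

# Strategy 1: one cached "bitset" (here, a set of doc ids) per term,
# intersected with the query result set. Cost grows with #unique terms.
term_bitsets = {"smith": {1, 3}, "jones": {2}}

def facet_by_bitsets(result_ids):
    return {t: len(ids & result_ids) for t, ids in term_bitsets.items()}

# Strategy 2: enumerate the cached field value of each matching doc.
# Roughly independent of #unique terms, but needs at most one value
# per field per document.
def facet_by_enumeration(result_ids):
    return Counter(docs[d]["author"] for d in result_ids)
```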
Re: Query for German Special Characters (i.e., ä, ö, ß)
If you are using Tomcat, try adding URIEncoding="UTF-8" to your Tomcat connector:

<Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="2" disableUploadTimeout="true" URIEncoding="UTF-8" />

Use the analysis page of the admin interface to check what's happening to your queries, too: http://localhost:8080/solr/admin/analysis.jsp?highlight=on (your port # may vary). Tom

On 9/13/07, Marc Bechler [EMAIL PROTECTED] wrote: Hi SOLR kings, I'm just playing around with queries, but I was not able to query for any special characters like the German umlauts (i.e., ä, ö, ü). Maybe others have seen the same effects and already found a solution ;-) Here is my example: I have one field called "sometext" of type "text" (the one delivered with the Solr example). I indexed a few words, similar to <field name="sometext"><![CDATA[ This is really fünny ]]></field>. This works fine: searching for "really" shows the result, and "fünny" is displayed correctly. However, the query for "fünny" using the /solr/admin page is resolved (correctly) to the URL ...q=f%C3%BCnny... but does not find the document. And now the question: any ideas? ;-) Cheers, marc
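As a quick sanity check on the encoding side (illustrative only, not a Solr fix): %C3%BC really is the correct UTF-8 percent-encoding of ü, so the URL the admin page builds is fine — the problem is the server decoding it with the wrong charset, which is what the URIEncoding setting addresses.

```python
from urllib.parse import quote, unquote

# "fünny" percent-encoded as UTF-8, the same form the admin page produces
encoded = quote("fünny")
print(encoded)  # f%C3%BCnny

# decoded with the correct charset, it round-trips cleanly
assert unquote(encoded) == "fünny"
```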
Re: update servlet not working
I don't use the Java client, but when I switched to 1.2, I'd get that message when I forgot to add the content type header, as described in CHANGES.txt:

9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using the new request dispatcher (SOLR-104). This requires posted content to have a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8' The response format matches that of /select and returns standard error codes. To enable solr1.1 style /update, do not map /update to any handler in solrconfig.xml (ryan)

But your request log shows a GET; it should be a POST, I would think. I'd double-check the parameters on post.jar.

On 9/6/07, Benjamin Li [EMAIL PROTECTED] wrote: oops, sorry, it says "missing content stream". As far as logs go: I have a request log, but didn't find anything with stack traces. Where is it? We're using the example one packaged with Solr. GET /solr/update HTTP/1.1 400 1401. Just to make sure, I typed: java -jar post.jar solrfile.xml Thanks! On 9/6/07, Chris Hostetter [EMAIL PROTECTED] wrote: : We are able to navigate to the solr/admin page, but when we try to : POST an xml document via the command line, there is a fatal error. It : seems that the solr/update servlet isn't running, giving a http 400 : error. a 400 could mean a lot of things ... what is the full HTTP response you get back from Solr? what kinds of stack traces show up in the Jetty log output? -Hoss -- cheers, ben
Re: Facet for multiple values field
Hi - I wouldn't facet on a text field; I tend to use string, for the reasons you describe. E.g. use

<field name="neighborhood_id" type="string" indexed="true" stored="true" multiValued="true"/>

or, in your example,

<field name="sensor" type="string" indexed="true" stored="true" multiValued="true"/>

If I have multiple values, I add them as separate occurrences of the field I am faceting on. If you still need them all in one field for other reasons, use copyField to assemble them. Tom

On 8/30/07, Giri [EMAIL PROTECTED] wrote: Hi, I am trying to get the facet values from a field that contains multiple words. For example, I have a field "keywords" with values like: keywords = relative humidity, air temperature, atmospheric moisture (note: I am combining multiple keywords into one single field, with a comma delimiter). When I query for facets, I get something like:

- relative (10)
- humidity (10)
- temperature (5)

But I really need to display:

- relative humidity (10)
- air temperature (5)

How can I do this? I know I am missing something in my schema field type declaration. I would appreciate it if anyone could post an example schema field type that can handle this. Thanks! Here is my schema excerpt:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

And the field is declared as:

<field name="sensor" type="text" indexed="true" stored="true"/>
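If it helps, a sketch of the string-field-plus-copyField approach (field names hypothetical, not from the original schema):

```xml
<!-- one string value per keyword; this is the field to facet on -->
<field name="keyword" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- combined text field, if you still need everything searchable in one place -->
<field name="keywords_text" type="text" indexed="true" stored="false"/>
<copyField source="keyword" dest="keywords_text"/>
```

Each keyword is then added as a separate value of "keyword" at index time, so facet counts come back as whole phrases ("relative humidity") rather than individual tokens.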
Re: How to realize index spaces
Hi - On 8/23/07, Marc Bechler [EMAIL PROTECTED] wrote: I was wondering whether or not it is possible to realize different index spaces with one Solr instance. Example: imagine you want to have 2 index spaces that coexist independently (and which can be identified, e.g., by a unique id). In your query, you specify an id, and the query should be performed only in the index space with the respective id. Moreover, it should be possible to add/remove additional index spaces dynamically. Just add a field that tells which index space the document belongs to (belonging to multiple is OK). Then, to query only that index space, add, for example, fq=space:product to your query URL (assuming you named the field 'space' and wanted the 'product' space). There's a related example in the example solrconfig; look at <requestHandler name="partitioned" class="solr.DisMaxRequestHandler">. Tom
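A sketch of what that looks like end to end (field name and values hypothetical):

```xml
<!-- at index time, tag each document with its index space -->
<add><doc>
  <field name="id">42</field>
  <field name="space">product</field>
  <field name="name">Some product</field>
</doc></add>
```

Then restrict a query to that space with a filter query, e.g. /solr/select?q=widget&fq=space:product — the fq clause is cached separately from q, so repeating it across queries is cheap.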
Synonym questions
Hi - Just looking at synonyms, and I had a couple of questions.

1) For some of my synonyms, it seems to make sense to simply replace the original word with the other (e.g. theatre => theater, so searches for either will find either). For others, I want to add an alternate term while preserving the original (e.g. cirque => circus, so searches for circus find "Cirque du Soleil", but searches for cirque only match cirque, not circus). I was thinking that the best way to do this was with two different synonym filters: the replace filter used both at index and query time, the other only at index time. Does doing this with two synonym filters make sense? Section from my schema.xml:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_replace.txt" ignoreCase="true" expand="false" includeOrig="false"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_add.txt" ignoreCase="true" expand="false" includeOrig="true"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_replace.txt" ignoreCase="true" expand="false" includeOrig="false"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

2) For this to work, I need to use includeOrig. It appears that includeOrig is hard coded to false in SynonymFilterFactory. Is there any reason for this? It's pretty easy to change (diff below); any reason this should not be supported? Thanks, Tom

Diffing vs.
my local copy of 1.2, but it appears to be the same in HEAD.

--- src/java/org/apache/solr/analysis/SynonymFilterFactory.java
+++ src/java/org/apache/solr/analysis/SynonymFilterFactory.java (working copy)
@@ -37,6 +37,7 @@
     ignoreCase = getBoolean("ignoreCase", false);
     expand = getBoolean("expand", true);
+    includeOrig = getBoolean("includeOrig", false);

     if (synonyms != null) {
       List<String> wlist = null;
@@ -57,8 +58,9 @@
   private SynonymMap synMap;
   private boolean ignoreCase;
   private boolean expand;
+  private boolean includeOrig;

-  private static void parseRules(List<String> rules, SynonymMap map, String mappingSep, String synSep, boolean ignoreCase, boolean expansion) {
+  private void parseRules(List<String> rules, SynonymMap map, String mappingSep, String synSep, boolean ignoreCase, boolean expansion) {
     int count=0;
     for (String rule : rules) {
       // To use regexes, we need an expression that specifies an odd number of chars.
@@ -88,7 +90,6 @@
       }
     }
-    boolean includeOrig=false;
     for (List<String> fromToks : source) {
       count++;
       for (List<String> toToks : target) {
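For concreteness, the two synonym files might look something like this (contents hypothetical, using the explicit-mapping syntax of Solr synonym files; the patch above is what makes includeOrig configurable):

```
# synonyms_replace.txt -- applied at index and query time, includeOrig=false:
# both terms are normalized to "theater", so either query matches either doc
theatre => theater

# synonyms_add.txt -- applied at index time only, includeOrig=true:
# docs containing "cirque" also get "circus" indexed, but the original
# "cirque" token is kept, so "cirque" queries still match only "cirque"
cirque => circus
```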
Returning errors from request handler
Hi - With Solr 1.2, when using XmlUpdateRequestHandler, if I post a valid command like <commit/> I get a response like:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
</response>

Nice, valid XML. But if I have an error (for example, posting the misspelled <commit/comit>) I get an HTML page back. This tends to confuse the client software. Is there a way to get a return like:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">1</int><exception>blah, blah, blah</exception><int name="QTime">0</int></lst>
</response>

I've seen comments in solrconfig about setting handleSelect to true or false, but didn't see any difference with either setting. I've actually written my own handler, but since XmlUpdateHandler does the same thing, I thought it would make a simple example. Am I doing something wrong? Is there some config I need to do, or is that just how it is? Tom
Using request parameters in dismax boost functions
Hi - Perhaps I'm missing something obvious, but is there a way to get values from the user's request as arguments to boost functions in dismax? I'm thinking about distance-based weighting for search results, which requires the user's x,y. Tom
Optimizing frequently updated index
Hi - I have an index that is updated fairly frequently (every few seconds), and I'm replicating to several slave servers. Because of the frequent updates, I'm usually pushing an index that is not optimized. And, as it takes several minutes to optimize, I don't want to do it every time I replicate (at least not on the master). I was wondering if it makes sense to replicate to a slave instance, optimize it there, and then distribute the optimized index from that first-level slave? Any thoughts? Thanks, Tom
Re: optimize/ takes an hour
Hi - What happens if updates occur during the optimize? Thanks, Tom
Re: Index corruptions?
Hi Charlie, On 5/3/07, Charlie Jackson [EMAIL PROTECTED] wrote: I have a couple of questions regarding index corruptions. 1) Has anyone using Solr in a production environment ever experienced an index corruption? If so, how frequently do they occur? I once had all slaves complain about a missing file in the index. The master never had a problem, and the problem went away at the next snapshot. Is the cp -lr in snapshooter really guaranteed to be atomic? Or is it just fast, and unlikely to be interrupted? This has only occurred once over the last 5 months. 2) It seems like the CollectionDistribution setup would be a good way to put in place a recovery plan for (or at least have some viable backups of) the index. However, I have a small concern that if the index gets corrupted on the master server, the corruption would propagate down to the slave servers as well. Is this concern unfounded? I would expect this to be true. Also, each of the snapshots taken by snapshooter is a viable full index, correct? If so, that means I'd have a backup of the index each and every time a commit (or optimize, for that matter) is done, which would be awesome. That's my understanding. Tom
Re: Group results by field?
Hi Matthew, You might be able to get away with just using facets, depending on whether your goal is to provide a clickable list of style_ids to the user, or to return only one search result for each style_id. For a list of clickable styles, it's basic faceting, and it works really well: http://wiki.apache.org/solr/SimpleFacetParameters — facet on style_id, present the list of facets to the user, and if the user selects style_id=37, reissue the query with one more clause (+style_id:37). If you want the ability to show only one search result from each group, then you might consider the structure of your data. Is each style/size a separate record? Or is each style a record with multi-valued sizes? The latter might give you what you really want. Or, if you really want to remove dups from search results, you could do what I've done: I ended up modifying SolrIndexSearcher, replacing FieldSortedHitQueue and ScorePriorityQueue with versions that remove dups based on a particular field. Tom

On 5/2/07, Matthew Runo [EMAIL PROTECTED] wrote: Hello! I was wondering - is it possible to search and group the results by a given field? For example, I have an index with several million records. Most of them are different sizes of the same style_id. I'd love to be able to do group.by=style_id or something like that in the results, and provide the style_id as a clickable link to see all the sizes of that style. Any ideas? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
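A sketch of the two-step request flow (URLs and parameter values hypothetical):

```
# 1) facet on style_id to build the clickable list of styles
/solr/select?q=boots&facet=true&facet.field=style_id

# 2) user clicks style_id 37: reissue the search with one more clause,
#    q = "boots +style_id:37" (URL-encoded below)
/solr/select?q=boots+%2Bstyle_id%3A37
```

Adding the clause to q is what the message above describes; a filter query (fq=style_id:37) would give the same result set and is cached independently of the main query.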
Re: browse a facet without a query?
Hi - On 4/23/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 4/23/07, Jennifer Seaman [EMAIL PROTECTED] wrote: When there is no q, Solr complains. How can I browse a facet without a keyword query? For example, I want to view all documents for a given state; ?q=fq=state:California With a relatively recent nightly build, you can use q=*:* Before that, use an open-ended range query like q=state:[* TO *] I was doing the q=state:[* TO *] for a short time, and found it very slow. I switched to doing a query on a single field that covered the part of the index I was interested in, for example: inStock:true. And I got much faster performance. I was seeing execution times in seconds; for example, I just manually tried this and got 2.2 seconds for the [* TO *] query, and 50 milliseconds for the latter (inStock:true), uncached. In my case the filter query hits about 80% of the docs, so it's doing a similar amount of work. I don't know how well *:* performs, but if it is similar to state:[* TO *], I would benchmark it before using it. For us, facet queries are a high percentage of traffic, so the time was critical. It might even be worth adding a field, if you don't already have an appropriate one. Tom