Using facets to narrow results with multiword field
Hi,

I'm trying to build a "narrow your search" feature using facets. I have some products and would like to use the brand as a narrowing filter. I defined two field types and two fields in the schema:

<fieldType name="brand_string" class="solr.TextField" sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="lower_string" class="solr.TextField" sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="brand" type="brand_string" indexed="true" stored="true" default="none"/>
<field name="lbrand" type="lower_string" indexed="true" stored="false" default="none"/>
<copyField source="brand" dest="lbrand"/>

I'm using facet.field=lbrand and get good results, e.g. Geomax, GeoMax and GEOMAX all fall into geomax. But when I filter, I get strange results:

brand:geomax gives numFound=0
lbrand:geomax gives numFound=57 (GEOMAX, GeoMag, Geomag)

How should I redefine brand so that narrowing works correctly?

Tomek
Re: Using facets to narrow results with multiword field
Correction:

I'm using facet.field=lbrand and get good results, e.g. Geomag, GeoMag and GEOMAG all fall into geomag. But when I filter, I get strange results:

brand:geomag gives numFound=0
lbrand:geomag gives numFound=57 (GEOMAG, GeoMag, Geomag)

How should I redefine brand so that narrowing works correctly? Of course, all of the words are the same (only the case differs).

TK
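The numbers above are consistent with what the two field types do to a value. The following is only a simplified Python model of the two analyzers, not Solr code; the function names and the three sample values are illustrative assumptions. It assumes the query term is analyzed the same way as the indexed value, which is the case when a field type declares a single analyzer.

```python
from collections import Counter

def brand_string(value):
    # KeywordTokenizer keeps the whole value as one token; TrimFilter strips whitespace.
    return value.strip()

def lower_string(value):
    # Same as brand_string, plus LowerCaseFilter.
    return value.lower().strip()

def search(values, analyzer, term):
    # Simplified model: a value matches only if its single indexed token
    # equals the analyzed query term.
    wanted = analyzer(term)
    return [v for v in values if analyzer(v) == wanted]

brands = ["GEOMAG", "GeoMag", "Geomag"]

facets = Counter(lower_string(b) for b in brands)      # facet.field=lbrand
brand_hits = search(brands, brand_string, "geomag")    # brand:geomag
lbrand_hits = search(brands, lower_string, "geomag")   # lbrand:geomag
exact_hits = search(brands, brand_string, "GeoMag")    # brand:GeoMag
```

In this model brand:geomag finds nothing because no value was indexed in exactly that case, while lbrand:geomag finds all three spellings. That suggests filtering on the same field the facet was computed on (lbrand) and keeping brand only as the stored value for display.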
Re: Huge load and long response times during search
Hi,

> : I'm using SOLR(1.4) to search among about 3,500,000 documents. After the
> : server kernel was updated to 64bit system has started to suffer.
>
> ...if the *only* thing that was upgraded was switching the kernel from 32bit to 64bit, then perhaps you are getting bit by java now using 64 bit pointers instead of 32 bit pointers, causing a lot more ram to be eaten up by the pointers?
>
> it's not something i've done a lot of testing on, but i've heard other people claim that it can cause some serious problems if you don't actually need 64bit pointers for accessing huge heaps.
>
> ...that said, you should really double check exactly what changed when your server was upgraded ... perhaps the upgrade included a new filesystem type, or changes to RAID settings, or even hardware changes ... if your problems started when an upgrade took place, then looking into what exactly changed during the upgrade should be your first step.

The kernel was the only thing that changed. There was no hardware update, and nobody touched the filesystem either. So now this is a 32bit Debian with a 64bit kernel.

I have heard from our admins that the previous kernel had a grsec patch which regularly killed java processes with signal 11.

To find out whether SOLR alone is the problem, or whether it is the coexistence with other services on one machine, we are going to move Solr to another machine (same configuration) which is lightly used (a small PHP app serving data from memcache that is filled once per hour).

Tom
Re: Boost document based on field length
Hi,

> I think i'm reading the question differently than Grant -- his suggestion applies when you are searching in the description field, and don't want documents with shorter descriptions to score higher when the same terms match the same number of times (the default behavior of lengthNorm)
>
> my understanding is that you want documents that don't have a description to score lower than documents that do -- and you might be querying against completely different fields (description might not even be indexed)
>
> in that case there is no easy way to achieve this with just the description field ... the easy thing to do is to index a boolean has_description field and then incorporate that into your query (or as the input to a function query)

You got my point, Hoss. In my case long description = good value. And your intuition is amazing ;-) I do have a field which is not used in search at all (image URL), but docs with an image have greater value for me than docs without one.

I would then add two fields (a boolean for the photo and an int for the description length), fill them in during indexing and play with them at search time.

Thanks,
Tom
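The plan of filling two extra fields at indexing time could look roughly like the sketch below. It is written in Python for brevity and is only an illustration: the field names has_photo, description_length and image_url are assumptions, not names from the actual schema, and the dictionary stands in for whatever document structure the indexing code uses.

```python
def add_quality_fields(doc):
    """Return a copy of doc with two derived fields for boosting at search time."""
    enriched = dict(doc)
    description = (doc.get("description") or "").strip()
    image_url = (doc.get("image_url") or "").strip()
    # Int field: length of the description in characters (0 when missing).
    enriched["description_length"] = len(description)
    # Boolean field: True only when the document has a non-empty image URL.
    enriched["has_photo"] = bool(image_url)
    return enriched

with_both = add_quality_fields(
    {"id": 1, "description": "Magnetic construction set", "image_url": "http://example.com/1.jpg"}
)
bare = add_quality_fields({"id": 2, "description": None})
```

Computing the values once per document at indexing time keeps the search side simple: the query only has to read two plain fields.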
Get one document from each category
Hi,

I have the following case: the documents in my index are categorized (category_id, a sortable int field). I would like to get the top three documents matching the user's query, BUT each has to be from a different category. For example, from the returned set (doc_id : category_id):

1:1
2:1
3:1
4:2
5:1
6:2
7:3
8:4

I would like to get docs 1, 4 and 7. Is that possible without querying 3 times? Often many of the docs at the beginning (more than my limit) are from the same category. I'm using PHP Apache Solr, so I would like to avoid processing large sets of data in my PHP-based application.

Tomek
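To make the desired result precise, here is the selection rule written out as a small Python sketch. It only describes the expected output for the example above; it is not a Solr feature, and the function name is made up for illustration. Doing this on the client is exactly the processing the question hopes to avoid.

```python
def top_per_category(ranked, limit=3):
    """Pick the first document of each category, in ranking order, up to limit."""
    seen_categories = set()
    picked = []
    for doc_id, category_id in ranked:
        if category_id in seen_categories:
            continue  # a better-ranked doc from this category was already taken
        seen_categories.add(category_id)
        picked.append(doc_id)
        if len(picked) == limit:
            break
    return picked

# (doc_id, category_id) pairs in the order returned by the search
ranked = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2), (7, 3), (8, 4)]
result = top_per_category(ranked)  # [1, 4, 7]
```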
Re: Huge load and long response times during search
Hi,

Otis Gospodnetic wrote:
> Tom, It looks like the machine might simply be running too many things. If the load is around 1 when Solr is not running, and this is a dual-core server, it shows it's already relatively busy (cca 50% idle).

The server is running PostgreSQL and Apache/PHP as well, but without Solr the server is in more than good condition (the load is usually less than 1; even during rush hours we have observed a 1-minute load average of 0.68). It is a double dual-core machine, so a load of 1 means 25%, am I right (4 cores)?

> Your caches are not small, so I am guessing you either have to have a relatively big heap, or your heap is not large enough and it's the GC that's causing high CPU load.

Java starts with -Xmx3584m. Should that be fine for such cache settings? By the way, I'm wondering if we need such caches at all. I checked the query frequency for the last 10 days (~7 unique users): the most frequent phrase appears ~150 times, and only 11 queries occur more than 100 times. I did not count cases where a user repeated the same query to go to the next page. Is it worth keeping quite a big cache in this case?

> If you are seeing Solr causing lots of IO, that's a sign the box doesn't have enough memory for all those servers running comfortably on it.

We do have some free memory to use. The server has 8G of RAM and mostly uses up to 6G; I haven't seen the swap used yet. I will try giving more RAM to Java and using smaller caches to see if that works.

Tom
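The "load 1 means 25%" arithmetic can be written down as a tiny Python helper. This is only a rough rule of thumb, and the function name is made up: on Linux the load average also counts tasks waiting on disk I/O, so it is not a pure CPU figure.

```python
def load_per_core(load_average, cores):
    """Load average divided by the number of cores (1.0 = one runnable task per core)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores

idle_case = load_per_core(1.0, 4)   # load of 1 on 4 cores -> 0.25
rush_hour = load_per_core(0.68, 4)  # observed 1-minute average without Solr
with_solr = load_per_core(6.0, 4)   # load of about 6 from the original report -> 1.5
```

A value above 1.0, as in the last case, means that on average more tasks were runnable or blocked on I/O than there are cores.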
Boost document based on field length
Hi,

I would like to boost documents with longer descriptions so that documents with a zero-length description move down. Is it possible to boost a document based on field length at search time, or is the only way to store the field length as an int in a separate field while indexing?

Tom
Huge load and long response times during search
Hi,

I'm using SOLR (1.4) to search among about 3,500,000 documents. After the server kernel was updated to 64bit, the system started to suffer. Our server has 8G of RAM and two Intel Core 2 Duo CPUs. We used to have average loads of around 2-2.5. That was not as good as it should be, but as long as HTTP response times were acceptable we did not care too much ;-)

For the last few days the average load has usually been around 6, sometimes going as high as 20. The PHP, MySQL and PostgreSQL based application is mostly fine, but when it tries to access SOLR it takes ages to load a page. In top, the java process (Jetty) takes 200-250% of CPU, and iotop shows that most of the disk operations are done by SOLR threads as well. When we shut down Jetty, the load goes down to 1.5 or even less than 1.

My index is ~12G. Below is a part of my solrconfig.xml:

<query>
  <maxBooleanClauses>1024</maxBooleanClauses>
  <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
  <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
  <documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <useFilterForSortedQuery>true</useFilterForSortedQuery>
  <queryResultWindowSize>40</queryResultWindowSize>
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
  <HashDocSet maxSize="3000" loadFactor="0.75"/>
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
      <lst> <str name="q">solr</str> <str name="sort">price</str> <str name="start">0</str> <str name="rows">10</str> </lst>
      <lst> <str name="q">solr</str> <str name="sort">rekomendacja</str> <str name="start">0</str> <str name="rows">10</str> </lst>
      <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
    </arr>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
      <lst><str name="q">static firstSearcher warming query from solrconfig.xml</str></lst>
    </arr>
  </listener>
  <useColdSearcher>false</useColdSearcher>
</query>

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf"> name^90.0 scategory^450.0 brand^90.0 text^0.01 description^30 </str>
    <str name="pf"> </str>
    <str name="bf"> </str>
    <str name="fl"> brand,description,id,name,price,score </str>
    <str name="mm"> 4&lt;100% 5&lt;90% </str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

Sample query parameters from the log look like this:

2009-11-20 21:07:15 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={spellcheck=true&wt=json&rows=20&json.nl=map&start=520&facet=true&spellcheck.collate=true&fl=id,name,description,preparation,url,shop_id&q=camera&qt=dismax&version=1.3&hl.fl=name,description,atributes,brand,url&facet.field=shop_id&facet.field=brand&hl.fragsize=200&spellcheck.count=5&hl.snippets=3&hl=true} hits=3784 status=0 QTime=83
2009-11-20 21:07:15 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/spellCheckCompRH params={spellcheck=true&wt=json&rows=20&json.nl=map&start=520&facet=true&spellcheck.collate=true&fl=id,name,description,preparation,url,shop_id&q=camera&qt=dismax&version=1.3&hl.fl=name,description,atributes,brand,url&facet.field=shop_id&facet.field=brand&hl.fragsize=200&spellcheck.count=5&hl.snippets=3&hl=true} hits=3784 status=0 QTime=16

And at last the question ;-) How can I speed up the search? Which parameters should I check first to find out where the bottleneck is?

Sorry for the verbose post, but I wanted to give as clear a picture as possible.

Thanks in advance,
Tom
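One way to start looking for the bottleneck is to pull hits and QTime (reported by Solr in milliseconds) out of the request log and list the slowest requests. The sketch below is in Python and assumes the log line layout shown in the sample above; the function names are made up, and the params in the sample string are shortened for readability.

```python
import re

# Matches the part of a request log line that follows "webapp=/solr".
LOG_PATTERN = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>.*?)\} "
    r"hits=(?P<hits>\d+) status=(?P<status>\d+) QTime=(?P<qtime>\d+)"
)

def parse_requests(log_text):
    """Return one dict per logged request found in log_text."""
    return [
        {
            "path": m.group("path"),
            "params": m.group("params"),
            "hits": int(m.group("hits")),
            "status": int(m.group("status")),
            "qtime_ms": int(m.group("qtime")),
        }
        for m in LOG_PATTERN.finditer(log_text)
    ]

def slowest(requests, n=10):
    """Return the n requests with the highest QTime, slowest first."""
    return sorted(requests, key=lambda r: r["qtime_ms"], reverse=True)[:n]

# Shortened versions of the two sample lines (params trimmed).
sample = (
    "INFO: [] webapp=/solr path=/select params={q=camera&qt=dismax} "
    "hits=3784 status=0 QTime=83\n"
    "INFO: [] webapp=/solr path=/spellCheckCompRH params={q=camera&qt=dismax} "
    "hits=3784 status=0 QTime=16\n"
)
requests = parse_requests(sample)
worst = slowest(requests, n=1)
```

Running this over a full day of logs and comparing the QTime values with the page load times seen from PHP would show whether the time is spent inside Solr's query execution or elsewhere.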