Re: Facet pivot 50.000.000 different values
Hi Mikhail, yes the thing is that I need to take into account different queries and that's why I can't use the Terms Component. Cheers. 2013/5/17 Mikhail Khludnev mkhlud...@griddynamics.com On Fri, May 17, 2013 at 12:47 PM, Carlos Bonilla carlosbonill...@gmail.com wrote: We only need to calculate how many different B values have more than 1 document, but it takes ages. Carlos, It's not clear whether you need to take the results of a query into account or just gather statistics from the index. If the latter, you can just enumerate the terms and look at TermsEnum.docFreq(). Am I getting it right? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Best query method
Hi, I am using Solr 4.2.1. My index has products from different stores with different attributes. If I want to get the count of all products which belong to store X, are coloured red, and are in stock, my question is: which way of querying is better in terms of performance and cache usage?
1) q=*:*&fq=(store:X) AND (colour:red) AND (in-stock:true)
2) q=store:X&fq=(colour:red) AND (in-stock:true)
3) q=store:X&fq=colour:red&fq=in-stock:true
If there is any other option better than these three, please let me know. I am assuming that whichever filter eliminates more products should come first (q, then the list of fq's). ./zahoor
Re: Searching for terms having embedded white spaces like word1 word2
Thank you so very much Jack for your prompt reply. Your solution worked for us. I have another issue in querying fields having values of the sort <string>This is good</string><string>This is also good</string><string>This is excellent</string>. I want to perform 'StartsWith' as well as 'Contains' searches on this field. The field definition is as follows:

<fieldType name="cust_str" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please suggest how to perform the above mentioned search. -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-for-terms-having-embedded-white-spaces-like-word1-word2-tp4064170p4064355.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching for terms having embedded white spaces like word1 word2
Ideally, such a text search should be done using tokenized text and a span query. Maybe you could do it using the surround query parser, but you should be able to do it using the LucidWorks Search query parser: this is BEFORE:1 (good OR excellent). But, given that you have a keyword tokenizer with embedded white space, you should be able to write a Lucene regex query for the same as raw text, something like [untested!]:
Contains: /this\\s+is\\s+(\\w\\s+)?(good|excellent)/
Starts with: /^this\\s+is\\s+(\\w\\s+)?(good|excellent)/
Ends with: /this\\s+is\\s+(\\w\\s+)?(good|excellent)$/
Exact match: /^this\\s+is\\s+(\\w\\s+)?(good|excellent)$/
Caveat: such character-level regex matching is NOT guaranteed to be speedy and really should only be used for relatively small datasets. -- Jack Krupansky -Original Message- From: kobe.free.wo...@gmail.com Sent: Saturday, May 18, 2013 6:30 AM To: solr-user@lucene.apache.org Subject: Re: Searching for terms having embedded white spaces like word1 word2 Thank you so very much Jack for your prompt reply. Your solution worked for us. I have another issue in querying fields having values of the sort <string>This is good</string><string>This is also good</string><string>This is excellent</string>. I want to perform 'StartsWith' as well as 'Contains' searches on this field. The field definition is as follows:

<fieldType name="cust_str" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please suggest how to perform the above mentioned search.
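As a concrete sketch, Jack's "contains" pattern would go into the q parameter as a field query. The field name cust_str is taken from the schema earlier in this thread; I write the regex with single backslashes, as it would appear in a raw query string, and use \w+ instead of \w so a whole extra word like "also" can match. As Jack says, this is untested, and the string must be URL-encoded before it goes on the wire.

```shell
# Hypothetical "contains" regex query against the cust_str field;
# single backslashes, as they would appear in the raw query string.
Q='cust_str:/this\s+is\s+(\w+\s+)?(good|excellent)/'
printf '%s\n' "$Q"
```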
Re: Best query method
You'll have to decide whether cached or uncached filter queries work best for your particular application. If you can use cached filter queries, that's better, and then separating or factoring the filter query terms is better. But if you have so much data, or so little memory, or such complex queries that caching is too expensive, you can go with uncached filter queries. You can then also assign a cost to each filter query to control the order in which they are executed. Example:
q=*:*&fq={!cache=false cost=5}inStock:true&fq={!frange l=1 u=4 cache=false cost=50}sqrt(popularity)
See: http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
But, start simple, with separate, cached filter queries, and only get fancy if you have problems with query latency. -- Jack Krupansky -Original Message- From: J Mohamed Zahoor Sent: Saturday, May 18, 2013 5:59 AM To: solr-user@lucene.apache.org Subject: Best query method Hi, I am using Solr 4.2.1. My index has products from different stores with different attributes. If I want to get the count of all products which belong to store X, are coloured red, and are in stock, my question is: which way of querying is better in terms of performance and cache usage?
1) q=*:*&fq=(store:X) AND (colour:red) AND (in-stock:true)
2) q=store:X&fq=(colour:red) AND (in-stock:true)
3) q=store:X&fq=colour:red&fq=in-stock:true
If there is any other option better than these three, please let me know. I am assuming that whichever filter eliminates more products should come first (q, then the list of fq's). ./zahoor
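Jack's "separate, cached filter queries" advice corresponds to Zahoor's option 3. A minimal sketch of what that request looks like, assuming a default local Solr install at /solr/select (the store/colour/in-stock field names come from Zahoor's message):

```shell
# Option 3: each fq is cached and reused independently by the filterCache,
# so colour:red can be reused even when the store changes.
BASE='http://localhost:8983/solr/select'
QUERY="${BASE}?q=*:*&fq=store:X&fq=colour:red&fq=in-stock:true&rows=0"
printf '%s\n' "$QUERY"
# Against a live server: curl "$QUERY"
```

With this factoring, the order of the fq parameters does not matter for cached filters; ordering only comes into play with the cache=false/cost local params Jack shows above.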
Re: Adding field in Schema.xml
Hi Alex, Where do I need to mention the types? Kindly tell me in detail. I use the Drupal framework. It has given a schema file. In that there are already some long type fields, and these are actually shown by Solr as part of the index. Whatever long field I am adding does not show up as part of the index. Best Regards kamal On Fri, May 17, 2013 at 7:47 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Do you have the types corresponding to those fields present? Specifically, long. You don't get any special type names out of the box; they all need to be present in the types area. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, May 17, 2013 at 8:49 AM, Kamal Palei palei.ka...@gmail.com wrote: Hi All, I am trying to add a few fields in the schema.xml file as below:

<field name="salary" type="long" indexed="true" stored="true"/>
<field name="experience" type="long" indexed="true" stored="true"/>
<field name="last_updated_date" type="tdate" indexed="true" stored="true" default="NOW" multiValued="false"/>
<dynamicField name="rs_*" type="long" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="rd_*" type="tdate" indexed="true" stored="true" multiValued="false"/>

Only the last_updated_date entry (the one in bold in the original message) is getting added. Is there any syntax issue with the other 4 entries? Kindly let me know. Thanks kamal
Re: Upgrading from SOLR 3.5 to 4.2.1 Results.
These numbers are really great. Would you mind sharing your h/w configuration and JVM params? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrading-from-SOLR-3-5-to-4-2-1-Results-tp4064266p4064370.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrading from SOLR 3.5 to 4.2.1 Results.
Rishi, Fantastic! Thank you so very much for sharing the details. Jason On May 17, 2013, at 12:29 PM, Rishi Easwaran rishi.easwa...@aol.com wrote: Hi All, It's Friday 3:00pm, warm and sunny outside, and it was a good week. Figured I'd share some good news. I work for the AOL mail team and we use SOLR for our mail search backend. We have been using it since pre-SOLR 1.4 and are strong supporters of the SOLR community. We deal with millions of indexes and billions of requests a day across our complex. We finished the full rollout of SOLR 4.2.1 into our production last week. Some key highlights:
- ~75% reduction in search response times
- ~50% reduction in SOLR disk busy, which in turn helped with a ~90% reduction in errors
- Garbage collection total stop reduction by over 50%, moving application throughput into the 99.8% - 99.9% range
- ~15% reduction in CPU usage
We did not tune our application moving from 3.5 to 4.2.1, nor update Java. For the most part it was a binary upgrade, with patches for our special use case. Now going forward we are looking at prototyping SOLR Cloud for our search system, upgrading Java and Tomcat, and tuning our application further. Lots of fun stuff :) Have a great weekend everyone. Thanks, Rishi.
Re: Java heap space exception in 4.2.1
aah… I was doing a facet on a double field which had 6 decimal places… No surprise that the Lucene cache got full… ./zahoor On 17-May-2013, at 11:56 PM, J Mohamed Zahoor zah...@indix.com wrote: Memory increases a lot with queries which have facets… ./Zahoor On 17-May-2013, at 10:00 PM, Shawn Heisey s...@elyograg.org wrote: On 5/17/2013 1:17 AM, J Mohamed Zahoor wrote: I moved to 4.2.1 from 4.1 recently.. everything was working fine until I added a few more stats queries.. Now I am getting this error so frequently that Solr does not run even for 2 minutes continuously. All 5GB is getting used instantaneously in a few queries... Someone on IRC ran into memory problems upgrading from 4.0 to 4.2. It wasn't OOM errors; they were just using a lot more heap than they were before and running into constant full garbage collections. There is another message on this list about someone who upgraded from 3.5 to 4.2 and is having memory troubles. The person on IRC made most of their fields unstored and reindexed, which fixed the problem for them. They only needed a few fields stored. Because the IRC user was on 4.0, I originally thought it had something to do with compressed stored fields, but on this thread, they started with 4.1. If that was the released 4.1.0 and not a SNAPSHOT version, then they had compressed stored fields before the upgrade. The user on IRC was not using termvectors or docvalues, which would be potential pain points unique to 4.2. I'm using 4.2.1 with no trouble in my setup, but I do have a heap that's considerably larger than I need. There are no apparent memory leaks - it's been running for over a month with updates once a minute. I've finally switched over from the 3.5.0 index to the new one, so for the last few days, it has been also taking our full query load. What could have changed between 4.1 and 4.2 to cause dramatically increased memory usage? From my /admin/system: <date name="startTime">2013-04-05T15:52:55.751Z</date> Thanks, Shawn
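Zahoor's diagnosis above (per-value faceting on a 6-decimal double blows up the field cache) can often be worked around with range faceting, which buckets values instead of enumerating every distinct one. A sketch, where the field name "price" and the bucket bounds are hypothetical and the core URL assumes a default local install:

```shell
# Range facet over a high-cardinality double field: one bucket per gap,
# instead of one facet count per distinct 6-decimal value.
URL='http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.range=price&facet.range.start=0&facet.range.end=100&facet.range.gap=10'
printf '%s\n' "$URL"
# Against a live server: curl "$URL"
```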
Re: having trouble storing large text blob fields - returns binary address in search results
hello your comment made me think - so i decided to double check myself. i opened up the schema in squirrel and made sure that the two columns in question were actually of type TEXT in the schema - check. i went into the db-config.xml and removed all references to ClobTransformer, removed the cast directives from the fields as well as the clob=true on the two fields - i pasted the db-config.xml below for reference - check. i restarted jboss - thus restarting solr - check. i went into the solr dataimport admin screen and did a clean import - check. after the import was complete - i queried a part that i knew would have one of the clob fields - results are pasted below as well - you can see the binary address in the attributes field.

<?xml version="1.0"?>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="accessoryIndicator">N</str>
    <str name="attributes">[B@5b372219</str>
    <str name="availabilityStatus">PIA</str>
    <arr name="divProductTypeDesc"><str>Refrigerators and Freezers</str></arr>
    <str name="divProductTypeId">0046</str>
    <str name="id">12001892,0046,464</str>
    <str name="itemModelDesc">VALVE, WATER</str>
    <str name="itemModelNo">12001892</str>
    <str name="itemModelNoExactMatchStr">12001892</str>
    <int name="itemType">1</int>
    <str name="otcStockIndicator">Y</str>
    <int name="partCnt">1</int>
    <str name="partCondition">N</str>
    <arr name="plsBrandDesc"><str/></arr>
    <str name="plsBrandId">464</str>
    <str name="productIndicator">N</str>
    <int name="rankNo">13</int>
    <float name="sellingPrice">53.54</float>
    <str name="sourceOrderNo">464 </str>
    <str name="subbedFlag">Y</str>
  </doc>
</result>

<document>
  <entity transformer="TemplateTransformer" name="core1-parts" query="select summ.*, 1 as item_type, 1 as part_cnt, '' as brand, mst.acy_prt_fl, mst.dil_tx, mst.hzd_mtl_typ_cd, mst.otc_cre_stk_fl, mst.prd_fl, mst.prt_cmt_tx, mst.prt_cnd_cd, mst.prt_inc_qt, mst.prt_made_by, mst.sug_qt, att.attr_val, rsr.rsr_val, case when sub.orb_itm_id is null then 'N' else 'Y' end as subbed_flag from prtxtps_prt_summ as summ left outer join prtxtpm_prt_mast as mst on mst.orb_itm_id = summ.orb_itm_id and mst.prd_gro_id = summ.prd_gro_id and mst.spp_id = summ.spp_id left outer join tmpxtpa_prt_attr as att on att.orb_itm_id = summ.orb_itm_id and att.prd_gro_id = summ.prd_gro_id and att.spp_id = summ.spp_id left outer join tmpxtpr_prt_rsr as rsr on rsr.orb_itm_id = summ.orb_itm_id and rsr.prd_gro_id = summ.prd_gro_id and rsr.spp_id = summ.spp_id left outer join tmpxtps_prt_sub as sub on sub.orb_itm_id = summ.orb_itm_id and sub.prd_gro_id = summ.prd_gro_id and sub.spp_id = summ.spp_id where summ.spp_id = '464'">
    <field column="id" name="id" template="${core1-parts.orb_itm_id},${core1-parts.prd_gro_id},${core1-parts.spp_id}"/>
    <field column="orb_itm_id" name="itemModelNo"/>
    <field column="prd_gro_id" name="divProductTypeId"/>
    <field column="ds_tx" name="itemModelDesc"/>
    <field column="spp_id" name="plsBrandId"/>
    <field column="rnk_no" name="rankNo"/>
    <field column="item_type" name="itemType"/>
    <field column="brand" name="plsBrandDesc"/>
    <field column="prd_gro_ds" name="divProductTypeDesc"/>
    <field column="part_cnt" name="partCnt"/>
    <field column="avail" name="availabilityStatus"/>
    <field column="price" name="sellingPrice"/>
    <field column="prt_son" name="sourceOrderNo"/>
    <field column="prt_src_cd" name="sourceIdCode"/>
    <field column="rte_cd" name="sourceRouteCode"/>
    <field column="acy_prt_fl" name="accessoryIndicator"/>
    <field column="dil_tx" name="disclosure"/>
    <field column="hzd_mtl_typ_cd" name="hazardousMaterialCode"/>
    <field column="otc_cre_stk_fl" name="otcStockIndicator"/>
    <field column="prd_fl" name="productIndicator"/>
    <field column="prt_cmt_tx" name="comment"/>
    <field column="prt_cnd_cd" name="partCondition"/>
    <field column="prt_inc_qt" name="qtyIncluded"/>
    <field column="prt_made_by" name="madeBy"/>
    <field column="sug_qt" name="suggestedQty"/>
    <field column="attr_val" name="attributes"/>
    <field column="rsr_val" name="restrictions"/>
    <field column="subbed_flag"
Wide vs Tall document in Solr 4.2.1
Hi, We recently decided to move from Solr version 3.5 to 4.2.1. The transition seemed to be smooth from a development point of view, but I see some intermittent issues with our cluster. Some information:
- We use the classic Master/Slave model (we have plans to move to Cloud v4.3)
- #documents: 300K, with around 150 fields (including dynamic)
- index size: 10GB
Most of the fields are multiValued (type String) and the size of the array in those varies from 5 to 50K, so 30% of our popular documents are tall. Not all information in these multivalued fields is required, so at the application layer we loop and eliminate the unwanted values. These are stored in such fashion because of the 1-to-many mapping in the SQL DB. The issue that we observed is high CPU and memory utilization while retrieving these documents with large multivalued fields. So my question is whether it's possible to turn this tall document into a wide document so only the required information is fetched. Is this a better approach to look for? Any other thoughts are welcome. thanks Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Wide-vs-Tall-document-in-Solr-4-2-1-tp4064409.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Zookeeper Ensemble Startup Parameters For SolrCloud?
I have read this about Zookeeper: Zookeeper servers have an active connections limit, which by default is 30. Do you define it higher than 30 for Solr? 2013/5/17 vsilgalis vsilga...@gmail.com As an example, I have 9 SOLR nodes (3 clusters of 3) using different versions of SOLR (4.1, 4.1, and 4.2.1), utilizing the same zookeeper ensemble (3 servers), using chroot for the different configs across clusters. My zookeeper servers are just VMs, dual-core with 1GB of RAM, and are only used for SOLRCloud. The JVM settings for zookeeper are a starting heap size of 256MB and a max heap size of 512MB, or: -Xms256m -Xmx512m. I have never seen it use more than the specified starting heap size of 256MB. -- View this message in context: http://lucene.472066.n3.nabble.com/Zookeeper-Ensemble-Startup-Parameters-For-SolrCloud-tp4063905p4064279.html Sent from the Solr - User mailing list archive at Nabble.com.
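The connection limit the first poster refers to is presumably ZooKeeper's maxClientCnxns setting, which caps concurrent connections from a single source IP and is configured in zoo.cfg. A sketch of raising it (the value 60 here is illustrative, not a recommendation, and whether 30 is really your default depends on your ZooKeeper version):

```
# zoo.cfg fragment -- maxClientCnxns limits concurrent client
# connections per source IP; 0 disables the limit entirely.
maxClientCnxns=60
```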
Re: Wide vs Tall document in Solr 4.2.1
: We recently decided to move from Solr version 3.5 to 4.2.1. The transition ... : Most of the fields are multiValued (type String) and the size of array in : those vary from 5 to 50K. So our 30% of popular documents are tall. Not all ... : Issues that we observed is high CPU and Memory utilization while retrieving : these document with large multivalued fields. Are you certain you are using 4.2.1 and not 4.2? There was a particularly bad bug related to enableLazyFieldLoading affecting Solr 4.0, 4.1, and 4.2, but it should *not* affect 4.2.1... https://issues.apache.org/jira/browse/SOLR-4589 If you are seeing slow response times and heavy CPU spikes, it would help if you could take some thread dumps during those CPU spikes to see what is chewing up CPU ... you may just be seeing the effects of stored field compression -- which uses more CPU on stored field retrieval to decompress the blocks of field values, but allows the index size to be much smaller so more things can be cached in RAM. : So my questions is if its possible to make this tall document to a wide : document so only required information is fetched. Is this a better : approach to look for? Any other thoughts are welcomed. I don't really understand what you mean by tall vs wide (i thought i understood what you meant by tall initially, but i don't understand what you mean by "make the tall document wide"). Just in case it's not obvious: if there are stored fields you don't want back in the response, leave them out of your fl param and only request the fields you actually want. -Hoss
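The fl trimming Hoss mentions at the end can be sketched as follows, assuming a default local core and the hypothetical stored field names id and name; only the listed fields are decompressed and returned, so a 50K-value multivalued field that is left out never has to be retrieved:

```shell
# Request only the stored fields actually needed, instead of the
# whole (tall) document.
URL='http://localhost:8983/solr/select?q=*:*&fl=id,name&rows=10'
printf '%s\n' "$URL"
# Against a live server: curl "$URL"
```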
Re: Adding field in Schema.xml
Hi Alex, I just saw that in the types area, long is already defined as:

<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

Hence I hope I should be able to declare a long type index in the fields area as shown below:

<field name="salary" type="long" indexed="true" stored="true"/>
<field name="experience" type="long" indexed="true" stored="true"/>

Not sure why it is not taking effect. Best Regards Kamal On Sat, May 18, 2013 at 6:23 PM, Kamal Palei palei.ka...@gmail.com wrote: Hi Alex, Where do I need to mention the types? Kindly tell me in detail. I use the Drupal framework. It has given a schema file. In that there are already some long type fields, and these are actually shown by Solr as part of the index. Whatever long field I am adding does not show up as part of the index. Best Regards kamal On Fri, May 17, 2013 at 7:47 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Do you have the types corresponding to those fields present? Specifically, long. You don't get any special type names out of the box; they all need to be present in the types area. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, May 17, 2013 at 8:49 AM, Kamal Palei palei.ka...@gmail.com wrote: Hi All, I am trying to add a few fields in the schema.xml file as below:

<field name="salary" type="long" indexed="true" stored="true"/>
<field name="experience" type="long" indexed="true" stored="true"/>
<field name="last_updated_date" type="tdate" indexed="true" stored="true" default="NOW" multiValued="false"/>
<dynamicField name="rs_*" type="long" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="rd_*" type="tdate" indexed="true" stored="true" multiValued="false"/>

Only the last_updated_date entry (the one in bold in the original message) is getting added. Is there any syntax issue with the other 4 entries? Kindly let me know. Thanks kamal
Re: Upgrading from SOLR 3.5 to 4.2.1 Results.
Awesome news Rishi! Looking forward to your SolrCloud updates. On Sat, May 18, 2013 at 12:59 AM, Rishi Easwaran rishi.easwa...@aol.com wrote: Hi All, It's Friday 3:00pm, warm and sunny outside, and it was a good week. Figured I'd share some good news. I work for the AOL mail team and we use SOLR for our mail search backend. We have been using it since pre-SOLR 1.4 and are strong supporters of the SOLR community. We deal with millions of indexes and billions of requests a day across our complex. We finished the full rollout of SOLR 4.2.1 into our production last week. Some key highlights:
- ~75% reduction in search response times
- ~50% reduction in SOLR disk busy, which in turn helped with a ~90% reduction in errors
- Garbage collection total stop reduction by over 50%, moving application throughput into the 99.8% - 99.9% range
- ~15% reduction in CPU usage
We did not tune our application moving from 3.5 to 4.2.1, nor update Java. For the most part it was a binary upgrade, with patches for our special use case. Now going forward we are looking at prototyping SOLR Cloud for our search system, upgrading Java and Tomcat, and tuning our application further. Lots of fun stuff :) Have a great weekend everyone. Thanks, Rishi. -- Regards, Shalin Shekhar Mangar.
Re: Adding field in Schema.xml
On 19 May 2013 08:36, Kamal Palei palei.ka...@gmail.com wrote: Hi Alex, I just saw that in the types area, long is already defined as:

<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

Hence I hope I should be able to declare a long type index in the fields area as shown below:

<field name="salary" type="long" indexed="true" stored="true"/>
<field name="experience" type="long" indexed="true" stored="true"/>

Yes, this should be fine. Not sure why it is not taking effect. What do you mean by "not taking effect"? You do not seem to have made this clear anywhere in the thread. Besides adding the fields to Solr's schema.xml, you have to make sure that field values are picked up and indexed properly into Solr. How are you indexing? Have you reindexed after adding the fields? Are you getting any errors in the logs after indexing? Regards, Gora
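One quick way to answer Gora's "have you reindexed?" question is to query for any document carrying the new field. A sketch, assuming a default local install and the salary field from this thread (%5B*%20TO%20*%5D is the URL-encoded form of [* TO *]):

```shell
# If numFound is 0 after a full reindex, no document is getting the
# field, which points at the indexing side rather than schema.xml.
URL='http://localhost:8983/solr/select?q=salary:%5B*%20TO%20*%5D&rows=0'
printf '%s\n' "$URL"
# Against a live server: curl "$URL"
```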