RE: CollapseQParserPluging Incorrect Facet Counts
Thanks Joel. I don't know why I was unable to find the "understanding collapsing" email thread via the search I did on the site, but I found it in my own email search now. We'll look into our specific scenario and see if we can find a workaround. Thanks!

CARLOS MAROTO
M +1 626 354 7750

-----Original Message-----
From: Joel Bernstein [mailto:joels...@gmail.com]
Sent: Friday, June 19, 2015 1:18 PM
To: solr-user@lucene.apache.org
Subject: Re: CollapseQParserPluging Incorrect Facet Counts

If you see the last comment on https://issues.apache.org/jira/browse/SOLR-6143, you'll see there is a discussion starting about adding this feature.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 19, 2015 at 4:14 PM, Joel Bernstein wrote:

> The CollapsingQParserPlugin does not provide facet counts that are
> the same as the group.facet feature in Grouping. It provides facet
> counts that behave like group.truncate.
>
> The CollapsingQParserPlugin only collapses the result set. The facet
> counts are then generated for the collapsed result set by the
> FacetComponent.
>
> This has been a hot topic of late.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Jun 19, 2015 at 3:54 PM, Carlos Maroto wrote:
>
>> Hi,
>>
>> We are comparing results between Field Collapsing (&group* parameters)
>> and CollapsingQParserPlugin. We noticed that some facets are returning
>> incorrect counts.
>>
>> Here are the relevant parameters of one of our test queries:
>>
>> Field Collapsing:
>> ---
>> q=red%20dress&facet=true&facet.mincount=1&facet.limit=-1&facet.field=searchcolorfacet&group=true&group.field=groupid&group.facet=true&group.ngroups=true
>>
>> ngroups = 5964
>>
>> ...
>> 11
>> ...
>>
>> CollapsingQParserPlugin:
>> ---
>> q=red%20dress&facet=true&facet.mincount=1&facet.limit=-1&facet.field=searchcolorfacet&fq=%7B!collapse%20field=groupid%7D
>>
>> numFound = 5964 (same)
>>
>> ...
>> 8
>> ...
>>
>> When we change the CollapsingQParserPlugin query by adding
>> "&fq=searchcolorfacet:red", the numFound value is 11, effectively
>> showing all 11 hits with that color. The facet count for red now
>> shows the correct value of 11 as well.
>>
>> Has anyone seen something similar?
>>
>> Thanks,
>> Carlos
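To make Joel's explanation concrete, here is a toy, in-memory sketch (not Solr code) of the two counting strategies. It assumes the collapse keeps the highest-scoring document per group (the plugin's default head selection) and that the color facet is single-valued; the documents and field names are made up. With group.facet, red is counted once for every group that contains any red item; after collapsing, red is only counted for groups whose head document happens to be red, which is why the collapsed count can come out lower (8 vs 11 above).

```python
from collections import defaultdict

# Toy documents: "groupid" plays the role of the collapse/group field and
# "color" stands in for a single-valued searchcolorfacet (assumption).
SAMPLE_DOCS = [
    {"id": "a1", "groupid": "g1", "color": "red", "score": 0.9},
    {"id": "a2", "groupid": "g1", "color": "blue", "score": 0.5},
    {"id": "b1", "groupid": "g2", "color": "blue", "score": 0.8},
    {"id": "b2", "groupid": "g2", "color": "red", "score": 0.4},
    {"id": "c1", "groupid": "g3", "color": "red", "score": 0.7},
]


def group_facet_counts(docs, group_field, facet_field):
    """group.facet=true semantics: number of distinct groups containing each value."""
    groups_per_value = defaultdict(set)
    for doc in docs:
        groups_per_value[doc[facet_field]].add(doc[group_field])
    return {value: len(groups) for value, groups in groups_per_value.items()}


def collapse_heads(docs, group_field, score_field="score"):
    """Keep one head document per group: the highest-scoring one (first wins on ties)."""
    heads = {}
    for doc in docs:
        key = doc[group_field]
        if key not in heads or doc[score_field] > heads[key][score_field]:
            heads[key] = doc
    return list(heads.values())


def collapsed_facet_counts(docs, group_field, facet_field):
    """Collapse first, then facet: only the group heads contribute to the counts."""
    counts = defaultdict(int)
    for doc in collapse_heads(docs, group_field):
        counts[doc[facet_field]] += 1
    return dict(counts)


if __name__ == "__main__":
    print("group.facet counts:", group_facet_counts(SAMPLE_DOCS, "groupid", "color"))
    print("collapsed counts:  ", collapsed_facet_counts(SAMPLE_DOCS, "groupid", "color"))
    print("collapsed numFound:", len(collapse_heads(SAMPLE_DOCS, "groupid")))
```

Filtering to red documents before collapsing (as the extra fq=searchcolorfacet:red does, since the collapse runs after the other filters) makes every surviving head red, so the collapsed red count then matches the group.facet count.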
RE: How to do a Data sharding for data in a database table
As stated previously, using Field Collapsing (group parameters) tends to significantly slow down queries. In my experience, search response gets even worse when:

- Requesting facets, which more often than not I do in my query formulation
- Asking for the facet counts to be on the groups via the group.facet=true parameter (far worse in some of my use cases that had a lot of distinct values for at least one of the facets)
- Queries are matching many hits, both in individual counts (hundreds of thousands or more in our case) and in total group counts (in the few thousands)

As someone else also mentioned, switching to CollapsingQParserPlugin will likely reduce the response time significantly, given its different implementation. Using CollapsingQParserPlugin means that you:

1- Have to change how the query gets created
2- May need to change how you consume the Solr response (depending on what you are using today)
3- Will not have the total number of individual hits (the count before collapsing), because the numFound returned by the CollapsingQParserPlugin represents the total number of groups (like group.ngroups does)
4- May have an issue with facet value counts not being exact in the CollapsingQParserPlugin response

With respect to sharding, there are multiple considerations. The most relevant, given your need for grouping, is to implement custom routing of documents to shards so that all members of a group are indexed in the same shard, if you can. Otherwise your grouping across shards will have some issues (particularly with counts, I believe).

CARLOS MAROTO
http://www.searchtechnologies.com/
M +1 626 354 7750

-----Original Message-----
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Friday, June 19, 2015 12:08 PM
To: solr-user@lucene.apache.org
Subject: RE: How to do a Data sharding for data in a database table

Also, since you are tuning for relative times, you can tune on the smaller index. Surely, you will want to test at scale.
But tuning query, analyzer or schema options is usually easier to do on a smaller index. If you get a 3x improvement at small scale, it may only be 2.5x at full scale.

E.g., storing the group field as doc values is one option that can help grouping performance in some cases (at least according to this list; I haven't tried it yet).

The number of distinct values of the grouping field is important as well. If there are very many, you may want to try CollapsingQParserPlugin.

The point being, some of these options may require reindexing! So, again, it is a much easier and faster process to tune on a smaller index.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, June 19, 2015 2:33 PM
To: solr-user@lucene.apache.org
Subject: Re: How to do a Data sharding for data in a database table

Do be aware that turning on &debug=query adds a load. I've seen the debug component take 90% of the query time (to be fair, it usually takes a much smaller percentage). But if you set debug=all, you'll see a section at the end of the response with the time each component took, so you'll have a sense of the relative time used by each component.

Best,
Erick

On Fri, Jun 19, 2015 at 11:06 AM, Wenbin Wang wrote:

> For now, the index size is 6.5 M records, and the performance is
> good enough. I will rebuild the index for all the records (14 M) and
> test it again with debug turned on.
>
> Thanks
>
> On Fri, Jun 19, 2015 at 12:10 PM, Erick Erickson wrote:
>
>> First and most obvious thing to try:
>>
>> bq: the Solr was started with maximal 4G for JVM, and index size is < 2G
>>
>> Bump your JVM to 8G, perhaps 12G. The size of the index on disk is
>> very loosely coupled to JVM requirements. It's quite possible that
>> you're spending all your time in GC cycles.
>> Consider gathering GC characteristics; see:
>> http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
>>
>> As Charles says, on the face of it the system you describe should
>> handle quite a load, so it feels like things can be tuned and you
>> won't have to resort to sharding. Sharding inevitably imposes some
>> overhead, so it's best to go there last.
>>
>> From my perspective, this is, indeed, an XY problem. You're assuming
>> that sharding is your solution. But you really haven't identified the
>> _problem_ other than "queries are too slow". Let's nail down the
>> reason queries are taking a second before jumping into sharding. I've
>> just spent too much of my life fixing the wrong thing ;)
>>
>> It would be useful to see a couple of sample queries so we can get a
>> feel for how complex they are. Especially if you append, as Charles
>> mentions, "d
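Carlos's suggestion of routing all members of a group to the same shard can be sketched with SolrCloud's compositeId router, which hashes on the part of the document id before a '!' separator. The sketch below only builds such ids from a hypothetical groupid and product id and buckets them by routing prefix; it does not reproduce Solr's actual hashing, and it assumes the collection uses the compositeId router (typically the default for collections created with a fixed numShards).

```python
from collections import defaultdict

SEPARATOR = "!"  # compositeId separator between the routing prefix and the doc id


def routed_id(group_id: str, doc_id: str) -> str:
    """Prefix the document id with its group so the compositeId router co-locates the group."""
    for part in (group_id, doc_id):
        if SEPARATOR in part:
            raise ValueError(f"{part!r} must not contain {SEPARATOR!r}")
    return f"{group_id}{SEPARATOR}{doc_id}"


def routing_prefix(composite_id: str) -> str:
    """Return the part of the id that determines the shard (everything before the first '!')."""
    return composite_id.split(SEPARATOR, 1)[0]


def ids_by_prefix(ids):
    """Bucket ids by routing prefix; each bucket would land on a single shard."""
    buckets = defaultdict(list)
    for composite_id in ids:
        buckets[routing_prefix(composite_id)].append(composite_id)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical group ids and product ids.
    ids = [routed_id("g1", "sku-1"), routed_id("g1", "sku-2"), routed_id("g2", "sku-3")]
    print(ids_by_prefix(ids))
```

Because the shard is chosen from the prefix alone, every document of a group hashes to the same shard, which is what keeps per-group counts consistent when grouping or collapsing in a distributed request.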
CollapseQParserPluging Incorrect Facet Counts
Hi,

We are comparing results between Field Collapsing (&group* parameters) and CollapsingQParserPlugin. We noticed that some facets are returning incorrect counts.

Here are the relevant parameters of one of our test queries:

Field Collapsing:
---
q=red%20dress&facet=true&facet.mincount=1&facet.limit=-1&facet.field=searchcolorfacet&group=true&group.field=groupid&group.facet=true&group.ngroups=true

ngroups = 5964

...
11
...

CollapsingQParserPlugin:
---
q=red%20dress&facet=true&facet.mincount=1&facet.limit=-1&facet.field=searchcolorfacet&fq=%7B!collapse%20field=groupid%7D

numFound = 5964 (same)

...
8
...

When we change the CollapsingQParserPlugin query by adding "&fq=searchcolorfacet:red", the numFound value is 11, effectively showing all 11 hits with that color. The facet count for red now shows the correct value of 11 as well.

Has anyone seen something similar?

Thanks,
Carlos
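For reference, the two requests in this post differ only in how grouping is expressed. Here is a small sketch using Python's standard urllib.parse that builds both query strings from the same base parameters (copied from the post). Note that urlencode encodes spaces as '+' and '!' as '%21'; Solr decodes these the same way as the %20 form used above.

```python
from urllib.parse import urlencode

# Parameters shared by both requests (taken from the test query in the post).
BASE_PARAMS = [
    ("q", "red dress"),
    ("facet", "true"),
    ("facet.mincount", "1"),
    ("facet.limit", "-1"),
    ("facet.field", "searchcolorfacet"),
]


def grouping_query(group_field: str) -> str:
    """Field Collapsing (Result Grouping) request with group-based facet counts."""
    return urlencode(BASE_PARAMS + [
        ("group", "true"),
        ("group.field", group_field),
        ("group.facet", "true"),
        ("group.ngroups", "true"),
    ])


def collapse_query(group_field: str, extra_fqs=()) -> str:
    """CollapsingQParserPlugin request: grouping becomes a {!collapse} filter query."""
    params = BASE_PARAMS + [("fq", "{!collapse field=%s}" % group_field)]
    params += [("fq", fq) for fq in extra_fqs]
    return urlencode(params)


if __name__ == "__main__":
    print(grouping_query("groupid"))
    print(collapse_query("groupid"))
    print(collapse_query("groupid", extra_fqs=["searchcolorfacet:red"]))
```

The group.* parameters disappear entirely in the collapse version; the facet parameters stay the same, but they are now computed over the collapsed result set.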
Two Spellcheck Components in a Single Solr Search
Hi,

Has anyone configured two spellchecker components in Solr so that a single search returns two different sets of suggestions?

*Use Case:* Combined index of business names and categories of those businesses
*Sample Query:* thisle (misspelling by the user)
*Expected Results:* Thistle (actual name of a business)
*Current Suggestion:* tiles (“tiles” is a more common term than “thistle” in the spellcheck field and is therefore considered a better suggestion by the spellchecker)
*Expected Suggestions:* Since we want one spellchecker to work against a field that indexes category content and another spellchecker to work against a field that indexes business names, we would expect two different suggestions: “tiles” (from the categories spellchecker) and “thistle” (from the business names spellchecker)

I tried:
1- Configuring two different spellcheckers and calling both as in the searchHandler; each spellchecker has a different field configured to generate the suggestions
2- Configuring two based on different fields in the searchComponent configuration for the spellcheck component

I can only get suggestions from one of the components. Any ideas?
Using Update Handler to Combine Data from 2 Cores
Hi,

Say I have an index of "Product Types" and a different index of "Products" that belong to one of the types in the other index. Users will search for attributes of types and products combined, so the two distinct but related indices must be combined into a single, flattened index so that searching and relevancy ranking can be done appropriately. Let's call this 3rd index the type+product index.

I've been asked by a customer to implement a custom update processor chain for the 3rd index that will get as input two values that define a relationship between a product and its corresponding type. In other words, each document posted to the type+product index would simply contain a value that corresponds to the uniqueId of a product type doc and another value that represents the uniqueId of the specific product of that type. One update processor would then read all fields stored in the product type index and append them to the document; another update processor would then take the other key, read the stored fields in the products index, and append them as well, so that the doc is ready to be indexed into the 3rd core with the merged content.

I have already explained to the customer that this would be custom development, for which we would need to extend various classes and implement the desired logic ourselves (preferably without modifying anything in trunk).

Has anyone implemented something similar? Is there anything that would prevent this from being possible in Solr?

Here is an example scenario to illustrate what I've been asked to implement.

Product Types:
T1 car
T2 truck
T3 motorcycle

Products:
1 white $14500
2 red $5600
3 white $3300
4 blue $88000

Possible searches:
white car
red motorcycle
white truck

Notice that with the two independent data sets above it is not possible to support these searches.
Therefore the idea is to create a 3rd index (core) which will take the relationships:

typeId = T1, prodId = 1
typeId = T3, prodId = 2
typeId = T3, prodId = 3
typeId = T2, prodId = 4

and generate, through a custom update processing chain, an index consisting of:

Type+Product
T1+1 car white $14500
T3+2 motorcycle red $5600
T3+3 motorcycle white $3300
T2+4 truck blue $88000

Thanks,
Carlos
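The two-step enrichment described above can be sketched in plain Python, with in-memory dicts standing in for the stored fields of the Product Types and Products cores. This only illustrates the merge logic under those assumptions; in Solr each step would be a custom UpdateRequestProcessor written in Java, and the field names used here (type, color, price) and the naive term matcher are hypothetical.

```python
# In-memory stand-ins for the stored fields of the two source cores (hypothetical field names).
PRODUCT_TYPES = {"T1": {"type": "car"}, "T2": {"type": "truck"}, "T3": {"type": "motorcycle"}}
PRODUCTS = {
    "1": {"color": "white", "price": 14500},
    "2": {"color": "red", "price": 5600},
    "3": {"color": "white", "price": 3300},
    "4": {"color": "blue", "price": 88000},
}
# The relationship documents posted to the type+product core.
RELATIONSHIPS = [("T1", "1"), ("T3", "2"), ("T3", "3"), ("T2", "4")]


def enrich(type_id, prod_id, types=PRODUCT_TYPES, products=PRODUCTS):
    """Build one flattened type+product document from a (typeId, prodId) pair."""
    doc = {"id": f"{type_id}+{prod_id}", "typeId": type_id, "prodId": prod_id}
    doc.update(types[type_id])     # step 1: append the product type's stored fields
    doc.update(products[prod_id])  # step 2: append the product's stored fields
    return doc


def matches(doc, query):
    """Naive AND match of whitespace-separated terms against the doc's field values."""
    values = {str(v).lower() for v in doc.values()}
    return all(term.lower() in values for term in query.split())


if __name__ == "__main__":
    merged = [enrich(t, p) for t, p in RELATIONSHIPS]
    for q in ("white car", "red motorcycle", "white truck"):
        print(q, "->", [d["id"] for d in merged if matches(d, q)])
```

Once the fields are flattened into one document, "white car" and "red motorcycle" each match exactly one merged doc, while "white truck" correctly matches nothing (the only truck is blue).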
Setting a Key/Tag/Label for each group.query Result Set
Hi,

I'm trying to get results in a single Solr call through multiple group.query definitions. I'm getting the results I want, but each group is presented under a "name" consisting of the query used for that group. I'd like to change the "name" of each group to something meaningful instead.

I'm looking for something similar to the "key" feature in Facets (https://wiki.apache.org/solr/SimpleFacetParameters#key_:_Changing_the_output_key).

For example, the current output I get is:

... 5849 5849 ...

Where I'd like to see something like:

... 5849 5849 ...

Does anyone know of a way to do this?

Thanks,
Carlos
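Since each group.query result is keyed by its query string, one client-side workaround is to keep a label-to-query map and rename the keys after the response comes back. The sketch below assumes the JSON response format (wt=json), where each group.query result appears under "grouped" keyed by its query string; the labels, queries and the sample response are hypothetical, and each query string must be unique for the mapping to be reversible.

```python
# Hypothetical labels mapped to the group.query strings sent to Solr.
LABELED_QUERIES = {
    "cars": "type:car",
    "trucks": "type:truck",
}


def group_query_params(labeled):
    """Request parameters with one group.query per labeled query."""
    params = [("wt", "json"), ("group", "true")]
    params += [("group.query", query) for query in labeled.values()]
    return params


def relabel_groups(grouped, labeled):
    """Rename the query-string keys of response['grouped'] to the friendly labels."""
    label_for = {query: label for label, query in labeled.items()}
    if len(label_for) != len(labeled):
        raise ValueError("group.query strings must be unique to be relabeled")
    # Keys with no matching label are passed through unchanged.
    return {label_for.get(key, key): value for key, value in grouped.items()}


if __name__ == "__main__":
    # Assumed shape of the 'grouped' section of a JSON response.
    grouped = {
        "type:car": {"matches": 5849, "doclist": {"numFound": 0, "docs": []}},
        "type:truck": {"matches": 5849, "doclist": {"numFound": 0, "docs": []}},
    }
    print(relabel_groups(grouped, LABELED_QUERIES))
```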
RE: Solr Suggester component doesn't return hits for non-English words
Hi Dejan,

I wouldn't say your problem is caused by the words being non-English, as there is nothing in Solr that indicates whether a term is English or not. I think it is a configuration issue in your implementation for the current data set or test. I would start by trying the following:

- In the suggest search component, the threshold parameter may prevent either or both of your suggestions from being considered. Make sure that "marcos" and "dejan" appear in at least 0.5% (per the 0.005 threshold value) of your document set. If they don't, that explains it: the suggester considers those terms too rare to be included as suggestions. Perhaps set the threshold to 0 to find out whether the suggester returns them then (check the references to "threshold" in the Suggester wiki article, particularly the details at http://wiki.apache.org/solr/Suggester#Dictionary).
- If you still don't get them as suggestions but you do get some new suggestions as a result of the new value, then you may have a lot of other rare terms matching "mar" or "de", and you'd need to adjust other parameters, such as "spellcheck.count" in the request handler, or others.

Additionally, check your configuration in general. For example, the request handler has "spellcheck.onlymorepopular" all in lowercase, and Solr may ignore it (the correct name is "spellcheck.onlyMorePopular").
You may not care about it and it shouldn't affect your current case, but it is better to reduce things to basics when troubleshooting (remove or disable settings you don't need until you resolve the current issue).

Hope this helps,
Carlos
www.searchtechnologies.com

-----Original Message-----
From: Dejan Caric [mailto:dejan.ca...@gmail.com]
Sent: Sunday, February 24, 2013 4:35 AM
To: solr-user@lucene.apache.org
Subject: Solr Suggester component doesn't return hits for non-English words

Hi everyone,

I have defined a suggest component like this:

suggest org.apache.solr.spelling.suggest.Suggester org.apache.solr.spelling.suggest.tst.TSTLookup autosuggest_general 0.005 true true suggest true 5 true suggest

and autosuggest_general field like this:

The suggester component doesn't return any hits for non-English words. I want to get auto-complete for the word `Marcos`. So when I call

http://localhost:8983/solr/mycore/suggest?q=mar

I get the following response:

0 2

And regular search returns 10 hits:

http://localhost:8983/solr/mycore/select?q=autosuggest_general:marcos

For `de` I get the following response:

0 1 3 0 2 design developer development design

`design`, `developer`, and `development` are fine, but I don't get `dejan` in the suggestions, and that word does exist in the autosuggest_general field.

http://localhost:8983/solr/mycore/select?q=autosuggest_general:dejan

returns

0 1 autosuggest_general:dejan ...

I'm using Solr 4.1. Any help would be greatly appreciated!

// Dejan
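Carlos's first suggestion hinges on the threshold arithmetic: with threshold=0.005, a term has to occur in roughly 0.5% of the documents to make it into the suggester's dictionary. A rough sketch of that filter follows; the document frequencies and the 10,000-document index size are made up for illustration, and Solr's exact rounding at the boundary may differ.

```python
def min_docs(threshold: float, num_docs: int) -> float:
    """Approximate number of documents a term must appear in to pass the threshold."""
    return threshold * num_docs


def dictionary_terms(doc_freqs: dict, num_docs: int, threshold: float) -> list:
    """Terms frequent enough to be added to the lookup dictionary (sorted for readability)."""
    cutoff = min_docs(threshold, num_docs)
    return sorted(term for term, df in doc_freqs.items() if df >= cutoff)


if __name__ == "__main__":
    # Hypothetical document frequencies in a 10,000-document index.
    doc_freqs = {"design": 1200, "developer": 300, "development": 80, "marcos": 10, "dejan": 3}
    print(min_docs(0.005, 10_000))                     # about 50 documents
    print(dictionary_terms(doc_freqs, 10_000, 0.005))  # rare names are filtered out
    print(dictionary_terms(doc_freqs, 10_000, 0.0))    # threshold 0 keeps every term
```

Under these made-up numbers, a name that appears in only a handful of documents never reaches the dictionary, which would produce exactly the symptom described: regular search finds "marcos", but the suggester does not offer it.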
Re: numFound is not correct while using Result Grouping
Use group.ngroups; see the FieldCollapsing page in the Solr wiki.

Carlos Maroto
Search Architect at Search Technologies (www.searchtechnologies.com)

Nicholas Ding wrote:

Hello,

I grouped the result and set group.main=true. I was expecting numFound to equal the number of groups, but it did not. How do I get the number of groups?

Thanks
Nicholas
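The difference Nicholas ran into is that numFound counts matching documents, while group.ngroups reports the number of distinct groups. Here is a minimal sketch of both counts over made-up documents, plus a helper that reads ngroups from a grouped JSON response. It assumes the default grouped response format (not group.main=true), where ngroups appears under grouped.<group.field> when group.ngroups=true is sent; the sample response is hypothetical.

```python
def document_and_group_counts(docs, group_field):
    """numFound-style document count vs. ngroups-style distinct group count."""
    num_found = len(docs)
    ngroups = len({doc[group_field] for doc in docs})
    return num_found, ngroups


def ngroups_from_response(response: dict, group_field: str) -> int:
    """Read ngroups from response['grouped'][group_field] (requires group.ngroups=true)."""
    return response["grouped"][group_field]["ngroups"]


if __name__ == "__main__":
    docs = [{"id": 1, "groupid": "a"}, {"id": 2, "groupid": "a"}, {"id": 3, "groupid": "b"}]
    print(document_and_group_counts(docs, "groupid"))  # (3, 2)
    response = {"grouped": {"groupid": {"matches": 3, "ngroups": 2, "groups": []}}}
    print(ngroups_from_response(response, "groupid"))  # 2
```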