RE: CollapseQParserPluging Incorrect Facet Counts

2015-06-19 Thread Carlos Maroto
Thanks Joel,

I don't know why I was unable to find the "understanding collapsing" email 
thread via the search I did on the site but I found it in my own email search 
now.

We'll look into our specific scenario and see if we can find a workaround.  
Thanks!

CARLOS MAROTO   
M +1 626 354 7750

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Friday, June 19, 2015 1:18 PM
To: solr-user@lucene.apache.org
Subject: Re: CollapseQParserPluging Incorrect Facet Counts

If you see the last comment on:

https://issues.apache.org/jira/browse/SOLR-6143

You'll see there is a discussion starting about adding this feature.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 19, 2015 at 4:14 PM, Joel Bernstein  wrote:

> The CollapsingQParserPlugin does not provide facet counts that are 
> the same as the group.facet feature in Grouping. It provides facet 
> counts that behave like group.truncate.
>
> The CollapsingQParserPlugin only collapses the result set. The facet 
> counts are then generated for the collapsed result set by the 
> FacetComponent.
>
> This has been a hot topic of late.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Jun 19, 2015 at 3:54 PM, Carlos Maroto wrote:
>
>> Hi,
>>
>> We are comparing results between Field Collapsing (&group* 
>> parameters) and CollapseQParserPlugin.  We noticed that some facets 
>> are returning incorrect counts.
>>
>> Here are the relevant parameters of one of our test queries:
>>
>> Field Collapsing:
>> ---
>>
>> q=red%20dress&facet=true&facet.mincount=1&facet.limit=-1&facet.field=
>> searchcolorfacet&group=true&group.field=groupid&group.facet=true
>> &group.ngroups=true
>>
>> ngroups = 5964
>>
>> <lst name="searchcolorfacet">
>> ...
>> <int name="red">11</int>
>> ...
>> </lst>
>>
>> CollapseQParserPlugin:
>> ---
>>
>> q=red%20dress&facet=true&facet.mincount=1&facet.limit=-1&facet.field=
>> searchcolorfacet&fq=%7B!collapse%20field=groupid%7D
>>
>> numFound = 5964 (same)
>>
>> <lst name="searchcolorfacet">
>> ...
>> <int name="red">8</int>
>> ...
>> </lst>
>>
>> When we change the CollapseQParserPlugin query by adding 
>> "&fq=searchcolorfacet:red", the numFound value is 11, effectively 
>> showing all 11 hits with that color.  The facet count for red now 
>> shows the correct value of 11 as well.
>>
>> Has anyone seen something similar?
>>
>> Thanks,
>> Carlos
>>
>
>
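
To make Joel's distinction concrete, here is a toy Python sketch (not Solr code). It assumes the collapse step keeps only the top-scoring document of each group and that facets are then counted on those survivors, whereas group.facet counts the groups that contain a value anywhere. The documents, scores, and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical documents: (doc id, groupid, searchcolorfacet, score).
# All values are invented for illustration.
DOCS = [
    ("d1", "g1", "red", 9.0),
    ("d2", "g1", "blue", 5.0),
    ("d3", "g2", "blue", 8.0),
    ("d4", "g2", "red", 3.0),
    ("d5", "g3", "red", 7.0),
]


def group_facet_counts(docs):
    """group.facet-style: count distinct groups containing each facet value."""
    groups_per_value = defaultdict(set)
    for _doc_id, group, color, _score in docs:
        groups_per_value[color].add(group)
    return {color: len(groups) for color, groups in groups_per_value.items()}


def collapsed_facet_counts(docs):
    """Collapse-style: keep the top-scoring doc per group, then facet on those."""
    heads = {}
    for doc in docs:
        group, score = doc[1], doc[3]
        if group not in heads or score > heads[group][3]:
            heads[group] = doc
    counts = defaultdict(int)
    for _doc_id, _group, color, _score in heads.values():
        counts[color] += 1
    return dict(counts)


print(group_facet_counts(DOCS))
print(collapsed_facet_counts(DOCS))
# Filtering to red first means every surviving group head is red.
print(collapsed_facet_counts([d for d in DOCS if d[2] == "red"]))
```

In this toy data, group g2 contains a red document, but its top-scoring document is blue, so the collapsed count for red is lower than the group.facet count. Filtering on the color before collapsing removes the competing documents, which is consistent with the count going back up once fq=searchcolorfacet:red is added.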


RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Carlos Maroto
As stated previously, using Field Collapsing (group parameters) tends to 
slow down queries significantly.  In my experience, search response gets even 
worse when:
- Requesting facets, which I do more often than not in my query formulation
- Asking for the facet counts to be on the groups via the group.facet=true 
parameter (much worse in some of my use cases that had a lot of distinct values 
for at least one of the facets)
- Queries match many hits, i.e. individual counts (hundreds of thousands 
or more in our case) and total group counts (in the few thousands)

As someone also stated, switching to CollapseQParserPlugin will likely reduce 
the response time significantly given its different implementation.  Using 
CollapseQParserPlugin means that you:

1- Have to change how the query gets created
2- May need to change how you consume the Solr response (depending on what you 
are using today)
3- Will not have the total number of individual hits (the count before 
collapsing) because the numFound returned by the CollapseQParserPlugin 
represents the total number of groups (like group.ngroups does)
4- May have an issue with facet value counts not being exact in the 
CollapseQParserPlugin response
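
As a rough illustration of point 1, the Python sketch below builds both parameter sets side by side. The parameter names and values mirror the test queries posted to this list in the CollapseQParserPlugin facet-count thread. The helper names are hypothetical, and urlencode writes a space as "+" and "!" as "%21", which differs only cosmetically from the %20 form used in that thread.

```python
from urllib.parse import urlencode

# Parameters shared by both query styles.
BASE = [
    ("q", "red dress"),
    ("facet", "true"),
    ("facet.mincount", "1"),
    ("facet.limit", "-1"),
    ("facet.field", "searchcolorfacet"),
]


def grouping_params():
    """Field Collapsing: grouping is driven by the group.* parameters."""
    return BASE + [
        ("group", "true"),
        ("group.field", "groupid"),
        ("group.facet", "true"),
        ("group.ngroups", "true"),
    ]


def collapse_params():
    """CollapseQParserPlugin: collapsing is expressed as a filter query."""
    return BASE + [("fq", "{!collapse field=groupid}")]


def query_string(params):
    """Encode a list of (name, value) pairs as a URL query string."""
    return urlencode(params)


print(query_string(grouping_params()))
print(query_string(collapse_params()))
```

The response handling (point 2) changes as well: with grouping you read ngroups, while with the collapse filter the group count arrives as numFound.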

With respect to sharding, there are multiple considerations.  The most relevant 
one, given your need for grouping, is to implement custom routing of documents 
to shards so that, if you can, all members of a group are indexed in the same 
shard.  Otherwise your grouping across shards will have some issues 
(particularly with counts, I believe).
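
One way to picture the routing idea is the "prefix!id" convention used by SolrCloud's compositeId router, where the part before "!" decides the shard. The Python sketch below only illustrates the principle: the shard count, the function names, and the crc32 hash are assumptions made for the example, and Solr's router uses its own hash function.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def routed_id(group_id, doc_id):
    """Prefix the document id with its group, compositeId-style: 'group!doc'."""
    return f"{group_id}!{doc_id}"


def shard_for(routed):
    """Stand-in for the router: hash only the prefix before '!'.

    crc32 is used purely for illustration; it is not Solr's hash.
    """
    prefix = routed.split("!", 1)[0]
    return zlib.crc32(prefix.encode("utf-8")) % NUM_SHARDS


# Two members of the same group always land on the same shard.
print(shard_for(routed_id("g1", "d1")) == shard_for(routed_id("g1", "d2")))
```

Because only the group prefix is hashed, every document of a group is co-located, which is what grouping and collapsing need in order to count groups correctly.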

CARLOS MAROTO   
http://www.searchtechnologies.com/
M +1 626 354 7750

-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] 
Sent: Friday, June 19, 2015 12:08 PM
To: solr-user@lucene.apache.org
Subject: RE: How to do a Data sharding for data in a database table

Also, since you are tuning for relative times, you can tune on the smaller 
index.   Surely, you will want to test at scale.   But tuning query, analyzer 
or schema options is usually easier to do on a smaller index.   If you get a 3x 
improvement at small scale, it may only be 2.5x at full scale.

E.g. storing the group field as doc values is one option that can help grouping 
performance in some cases (at least according to this list, I haven't tried it 
yet).

The number of distinct values of the grouping field is important as well.  If 
there are very many, you may want to try CollapsingQParserPlugin. 

The point being, some of these options may require reindexing!   So, again, it 
is a much easier and faster process to tune on a smaller index.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, June 19, 2015 2:33 PM
To: solr-user@lucene.apache.org
Subject: Re: How to do a Data sharding for data in a database table

Do be aware that turning on &debug=query adds a load. I've seen the debug 
component take 90% of the query time. (to be fair it usually takes a much 
smaller percentage).

But you'll see a section at the end of the response if you set debug=all with 
the time each component took so you'll have a sense of the relative time used 
by each component.
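
For anyone who wants to compare components programmatically, here is a small Python sketch. It assumes a JSON response (wt=json) whose debug section contains a nested "timing" block shaped like the sample below; the numbers and the helper name are invented.

```python
# Illustrative shape of the "debug" section of a JSON response with debug=all.
# The numbers are invented.
SAMPLE_DEBUG = {
    "timing": {
        "time": 120.0,
        "prepare": {"time": 2.0, "query": {"time": 1.0}, "facet": {"time": 1.0}},
        "process": {
            "time": 118.0,
            "query": {"time": 30.0},
            "facet": {"time": 80.0},
            "debug": {"time": 8.0},
        },
    }
}


def component_shares(debug, phase="process"):
    """Return each component's share of the phase time, largest first."""
    section = debug["timing"][phase]
    total = section["time"]
    shares = {}
    for name, value in section.items():
        if isinstance(value, dict):  # skip the phase-level "time" entry
            shares[name] = value["time"] / total if total else 0.0
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


for name, share in component_shares(SAMPLE_DEBUG).items():
    print(f"{name}: {share:.0%}")
```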

Best,
Erick

On Fri, Jun 19, 2015 at 11:06 AM, Wenbin Wang  wrote:
> As for now, the index size is 6.5 M records, and the performance is 
> good enough. I will re-build the index for all the records (14 M) and 
> test it again with debug turned on.
>
> Thanks
>
>
> On Fri, Jun 19, 2015 at 12:10 PM, Erick Erickson wrote:
>
>> First and most obvious thing to try:
>>
>> bq: the Solr was started with maximal 4G for JVM, and index size is < 
>> 2G
>>
>> Bump your JVM to 8G, perhaps 12G. The size of the index on disk is 
>> very loosely coupled to JVM requirements. It's quite possible that 
>> you're spending all your time in GC cycles. Consider gathering GC 
>> characteristics, see:
>> http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
>>
>> As Charles says, on the face of it the system you describe should 
>> handle quite a load, so it feels like things can be tuned and you 
>> won't have to resort to sharding.
>> Sharding inevitably imposes some overhead so it's best to go there last.
>>
>> From my perspective, this is, indeed, an XY problem. You're assuming 
>> that sharding is your solution. But you really haven't identified the 
>> _problem_ other than "queries are too slow". Let's nail down the 
>> reason queries are taking a second before jumping into sharding. I've 
>> just spent too much of my life fixing the wrong thing ;)
>>
>> It would be useful to see a couple of sample queries so we can get a 
>> feel for how complex they are. Especially if you append, as Charles 
>> mentions, "d

CollapseQParserPluging Incorrect Facet Counts

2015-06-19 Thread Carlos Maroto
Hi,

We are comparing results between Field Collapsing (&group* parameters) and
CollapseQParserPlugin.  We noticed that some facets are returning incorrect
counts.

Here are the relevant parameters of one of our test queries:

Field Collapsing:
---
q=red%20dress&facet=true&facet.mincount=1&facet.limit=-1&facet.field=searchcolorfacet&group=true&group.field=groupid&group.facet=true
&group.ngroups=true

ngroups = 5964

<lst name="searchcolorfacet">
...
<int name="red">11</int>
...
</lst>

CollapseQParserPlugin:
---
q=red%20dress&facet=true&facet.mincount=1&facet.limit=-1&facet.field=searchcolorfacet&fq=%7B!collapse%20field=groupid%7D

numFound = 5964 (same)

<lst name="searchcolorfacet">
...
<int name="red">8</int>
...
</lst>

When we change the CollapseQParserPlugin query by adding
"&fq=searchcolorfacet:red", the numFound value is 11, effectively showing
all 11 hits with that color.  The facet count for red now shows the correct
value of 11 as well.

Has anyone seen something similar?

Thanks,
Carlos


Two Spellcheck Components in a Single Solr Search

2014-11-13 Thread Carlos Maroto
Hi,



Has anyone configured two spellchecker components in Solr so that a single
search returns two different sets of suggestions?



*Use Case:* Combined index of business names and categories of those
businesses

*Sample Query:* thisle  (misspelling by the user)

*Expected Results:* Thistle (actual name of a business)

*Current Suggestion:* tiles (“tiles” is a more common term than “thistle”
in the spellcheck field and therefore considered as a better suggestion by
the spellchecker)

*Expected Suggestions:*  Since we want to configure one spellchecker to
work against a field that indexes categories content and another
spellchecker that indexes business names, then we would expect two
different suggestions: “tiles” (from the categories spellchecker) and
“thistle” (from the business names spellchecker)



I tried:

1- Configuring two different spellcheckers and calling both from the
searchHandler, with each spellchecker configured with a different field to
generate the suggestions

2- Configuring two spellchecker definitions based on different
fields in the searchComponent configuration for the spellcheck component


I can only get suggestions from one of the components.



Any ideas?


Using Update Handler to Combine Data from 2 Cores

2014-08-28 Thread Carlos Maroto
Hi,

Say I have an index of "Product Types" and a different index of "Products"
that belong to one of the types in the other index.  Users will do their
searches for attributes of types and products combined so the two distinct,
but related indices must be combined into a single, flattened index so that
the searches and relevancy ranking can be done appropriately.  Let's call
this 3rd index the type+product index.

I've been asked by a customer to implement a custom update processor chain
for the 3rd index that will get as input two values that define a
relationship between a product and its corresponding type.  In other words,
the documents posted to the type+product index would simply be a value that
corresponds with the uniqueId of a product type doc and another value that
represents the uniqueId of the specific product of that type.  An update
processor would then read all fields stored in the product type index and
append them to the document, then another update processor would take the
other key and read the stored fields in the products index to also append
them to the doc that will then be ready to be indexed into the 3rd core for
merged content.

I explained to the customer already that this would be custom development,
for which we would need to extend various classes and implement ourselves
the desired logic (not modifying anything in trunk, preferably).

Has anyone implemented something similar? Is there anything that would
prevent this from being possible in Solr?

Here is an example scenario to illustrate what I've been asked to implement.
Product Types:
--------------
T1  car
T2  truck
T3  motorcycle

Products:
---------
1   white   $14500
2   red     $ 5600
3   white   $ 3300
4   blue    $88000

Possible searches:
------------------
white car
red motorcycle
white truck

Notice that with the two independent data sets above it is not possible to
answer these searches.  Hence the idea of creating a 3rd index (core),
which will take the relationships:

typeId = T1, prodId = 1
typeId = T3, prodId = 2
typeId = T3, prodId = 3
typeId = T2, prodId = 4

To generate, through a custom update processing chain, an index consisting of:
Type+Product
------------
T1+1   car          white   $14500
T3+2   motorcycle   red     $ 5600
T3+3   motorcycle   white   $ 3300
T2+4   truck        blue    $88000
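
The data flow described above can be sketched in a few lines of Python. This is only a model of the merge, not a Solr update processor (which would be written against Solr's Java API); the field names "type", "color", and "price" and the helper names are assumptions made for the example.

```python
# The two source "indexes" and the relationships from the example above.
PRODUCT_TYPES = {
    "T1": {"type": "car"},
    "T2": {"type": "truck"},
    "T3": {"type": "motorcycle"},
}
PRODUCTS = {
    "1": {"color": "white", "price": 14500},
    "2": {"color": "red", "price": 5600},
    "3": {"color": "white", "price": 3300},
    "4": {"color": "blue", "price": 88000},
}
RELATIONS = [("T1", "1"), ("T3", "2"), ("T3", "3"), ("T2", "4")]


def merge(type_id, prod_id):
    """Look up both records by key and flatten them into one document."""
    doc = {"id": f"{type_id}+{prod_id}"}
    doc.update(PRODUCT_TYPES[type_id])  # first step: append type fields
    doc.update(PRODUCTS[prod_id])       # second step: append product fields
    return doc


def matches(doc, query):
    """Naive search: every query term must equal some field value."""
    values = {str(v) for v in doc.values()}
    return all(term in values for term in query.split())


MERGED = [merge(t, p) for t, p in RELATIONS]
print([d["id"] for d in MERGED if matches(d, "white car")])
```

Once the documents are flattened, a search such as "white car" can be answered from a single index, which is the point of the third core.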

Thanks,
Carlos


Setting a Key/Tag/Label for each group.query Result Set

2014-07-30 Thread Carlos Maroto
Hi,

I'm trying to get results in a single Solr call through multiple
group.query definitions.  I'm getting the results I want but, each group is
presented under a "name" consisting of the query used for that group.

I'd like to change the "name" of each group to some meaningful name
instead.  I'm looking for something similar to the "key" feature in Facets
(
https://wiki.apache.org/solr/SimpleFacetParameters#key_:_Changing_the_output_key
)

For example, the current output I get is:
...


 
5849


 
5849


...

Where I'd like to see something like:
...

   
5849



5849


...

Does anyone know about a way to do this?

Thanks,
Carlos


RE: Solr Suggester component doesn't return hits for non-English words

2013-02-25 Thread Carlos Maroto
Hi Dejan,

I wouldn't say your problem is because the words are non-English words, as there 
is nothing in Solr to indicate whether the terms are in English or not.  I think 
it is a configuration issue in your implementation for the current data set or 
test.  I would start by trying the following:

- In the searchComponent, the threshold attribute may prevent either or 
both of your suggestions from being considered.  Make sure that "marcos" and 
"dejan" appear in at least 0.5% (per the 0.005 value in the parameter) of your 
document set.  If they don't, then that explains it: the suggester considers 
those too rare to be included as a suggestion.  Perhaps set it to 0 to find out 
if the suggester returns them then (check a couple of references to 
"threshold" in the Suggester wiki article, particularly the details at 
http://wiki.apache.org/solr/Suggester#Dictionary )
- If you still don't get them as suggestions but you get some new suggestions 
as a result of the new threshold value, then you may have a lot of other rare 
terms matching "mar" or "de" and you'd need to adjust other parameters, such as 
"spellcheck.count" in the requestHandler or others
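
As a back-of-the-envelope check of the first point, the Python sketch below treats the threshold as a minimum fraction of documents that must contain the term. This is a simplification: the exact comparison and rounding inside Solr may differ, and the function name and index sizes are hypothetical. The figure of 10 hits comes from the regular search for "marcos" in the original message.

```python
def passes_threshold(doc_hits, total_docs, threshold=0.005):
    """True if the term appears in at least `threshold` of all documents."""
    return total_docs > 0 and doc_hits / total_docs >= threshold


# "marcos" matches 10 documents; whether that is enough depends on index size.
for total in (1_000, 100_000):
    print(total, passes_threshold(10, total))
```

With 10 matching documents the term clears a 0.005 threshold in an index of 1,000 documents but not in one of 100,000, so the size of the test index matters when reproducing the problem.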

Additionally, check your configuration in general.  For example, the 
requestHandler has "spellcheck.onlymorepopular" all in lowercase and Solr may 
ignore it (the correct name is "spellcheck.onlyMorePopular").  You may not care 
about it and it shouldn't affect your current case, but it is better to reduce 
things to basics when troubleshooting something (remove/disable settings you 
don't need until you resolve the current issue).

Hope this helps,
Carlos
www.searchtechnologies.com

-Original Message-
From: Dejan Caric [mailto:dejan.ca...@gmail.com] 
Sent: Sunday, February 24, 2013 4:35 AM
To: solr-user@lucene.apache.org
Subject: Solr Suggester component doesn't return hits for non-English words

Hi everyone,

I have defined a suggest component like this:



suggest
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.tst.TSTLookup

autosuggest_general
0.005
true




true
suggest
true
5
true


suggest



and autosuggest_general field like this:











The suggester component doesn't return any hits for non-English words.

I want to get auto-complete for word `Marcos`.
So when I call http://localhost:8983/solr/mycore/suggest?q=mar I get the 
following response:



0
2






And regular search returns 10 hits:
http://localhost:8983/solr/mycore/select?q=autosuggest_general:marcos

For `de` I get the following response:



0
1




3
0
2

design
developer
development


design




`design`, `developer`, and `development` are fine but I don't get `dejan` in 
suggestions and that word does exist in autosuggest_general field.

http://localhost:8983/solr/mycore/select?q=autosuggest_general:dejan returns



0
1

autosuggest_general:dejan



...



I'm using Solr 4.1

Any help would be greatly appreciated!

// Dejan


Re: numFound is not correct while using Result Grouping

2013-02-25 Thread Carlos Maroto
Use group.ngroups; see the FieldCollapsing page in the Solr wiki.

Carlos Maroto
Search Architect at Search Technologies (www.searchtechnologies.com)



Nicholas Ding  wrote:


Hello,

I grouped the result and set group.main=true. I was expecting numFound
to equal the number of groups, but it did not.

How do I get the number of groups?

Thanks
Nicholas