Re: field collapsing performance in sharded environment

2013-11-15 Thread Paul Masurel
That's not the way grouping is done. In the first round, all shards return their 10 best groups (represented as their 10 best grouping values). As a result, it's a three-round process instead of the two rounds of a regular search, so observing an increase in latency is normal but not in the realm of what
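A rough sketch of the first round described above, not Solr's actual code: each shard reports its best grouping values and the coordinator merges them into a global top list. The function names, the sample shards and the use of a single relevance score per document are assumptions made for illustration.

```python
import heapq


def shard_top_groups(docs, k):
    """Round one on one shard: keep the best score seen for each
    grouping value, then return the k best (value, score) pairs."""
    best = {}
    for group_value, score in docs:
        if score > best.get(group_value, float("-inf")):
            best[group_value] = score
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])


def merge_top_groups(per_shard, k):
    """Coordinator side: merge the candidates of every shard and keep
    the k best grouping values overall."""
    best = {}
    for candidates in per_shard:
        for group_value, score in candidates:
            if score > best.get(group_value, float("-inf")):
                best[group_value] = score
    ranked = heapq.nlargest(k, best.items(), key=lambda item: item[1])
    return [group_value for group_value, _ in ranked]


# Hypothetical shards: (grouping value, score) for each matching document.
shard_a = [("red", 0.9), ("blue", 0.4), ("red", 0.2)]
shard_b = [("green", 0.7), ("blue", 0.8)]

per_shard = [shard_top_groups(shard, 2) for shard in (shard_a, shard_b)]
top_groups = merge_top_groups(per_shard, 2)
print(top_groups)
```

The later rounds would then ask the shards for documents belonging to the selected grouping values, which is where the extra round trip comes from.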

Re: Does MMap works on the Virtual Box?

2013-08-16 Thread Paul Masurel
Hi, You can mmap a size bigger than your memory without any problem. Part of your file will just not be loaded into RAM, because you don't access it too often. If you are short on memory, consider deactivating Host IO Caching, as it will only be redundant with your guest OS page cache
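To illustrate the point about memory mapping, here is a small sketch using Python's standard mmap module (not Lucene's MMapDirectory). The helper name and the temporary file are made up for the example; the file here is tiny, but the same call works on files larger than RAM because the OS only loads the pages that are actually touched.

```python
import mmap
import os
import tempfile


def read_byte_via_mmap(path, offset):
    """Map the whole file read-only and return the byte at `offset`.
    Only the pages that are accessed get pulled into memory."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset]


# Hypothetical file: one page of zeros followed by a single marker byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"\x2a")
    path = tmp.name

value = read_byte_via_mmap(path, 4096)
os.remove(path)
print(value)
```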

Re: Unexpected behavior when sorting groups

2013-08-06 Thread Paul Masurel
overall sorting of groups. > > The latest comment there suggests that it's a bug in distributed mode, but > I don't think that's the case since I'm only using one instance of Solr > with no sharding or anything. > > -Original Message- > From: Paul Ma

Re: Unexpected behavior when sorting groups

2013-08-06 Thread Paul Masurel
nstance of Solr > with no sharding or anything. > This is not a bug. If I get some time, I'll try to write a post about how collapsing works in Solr. Even though it is counterintuitive, what you are asking for is actually a difficult problem. Regards, Paul > -Original Message

Re: Solr grouping performace

2013-08-05 Thread Paul Masurel
Collapsing is not that slow actually. With a high number of groups, you may just have to leave group.ngroups set to false. If you need to get the overall number of groups, you may have to patch Lucene. https://issues.apache.org/jira/browse/LUCENE-3972?page=com.atlassian.jira.plugin.system.issuetab
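As a sketch of what such a request could look like, the snippet below builds the query string for a grouped search with group.ngroups turned off. The helper name and the "category" field are assumptions for illustration; group, group.field and group.ngroups are the standard Solr grouping parameters.

```python
from urllib.parse import urlencode


def grouped_query_params(query, group_field, want_group_count=False):
    """Build the query string of a grouped search.
    group.ngroups stays false unless the group count is really needed."""
    return urlencode({
        "q": query,
        "group": "true",
        "group.field": group_field,
        "group.ngroups": "true" if want_group_count else "false",
    })


# "category" is a hypothetical grouping field.
params = grouped_query_params("*:*", "category")
print(params)
```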

Re: Unexpected behavior when sorting groups

2013-08-04 Thread Paul Masurel
Dear Tony, The behavior you described is correct, and what you are asking for is impossible with Solr as it is. I wouldn't, however, say it is a limitation of Solr: your problem is actually difficult and requires some preprocessing. One solution, if it is feasible for you, is to precompute the lowest

Re: Group and performing statistics on groups

2013-08-01 Thread Paul Masurel
https://issues.apache.org/jira/browse/SOLR-2931 Please add a word on the JIRA describing your need and keep an eye on the ticket. I might release such a plugin some time soon. Regards, Paul On Fri, Jul 26, 2013 at 4:16 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Hi, > > I thin

Re: Pagination on grouped results

2013-08-01 Thread Paul Masurel
Let me copy-paste an answer I wrote yesterday :) To get the number of groups, you are expected to set group.ngroups=true. Even then, the result will only give you an upper bound in a distributed environment. To get the exact number of groups, you need to shard along your grouping field. If you
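A small simulation of why the count is only an upper bound, assuming the coordinator simply adds up the number of distinct groups reported by each shard. The routing functions and sample values are hypothetical; the point is that a group present on several shards is counted once per shard unless documents are routed by their grouping value.

```python
import zlib


def summed_ngroups(shards):
    """What the coordinator is assumed to report: the sum of the
    distinct group counts of every shard."""
    return sum(len(set(shard)) for shard in shards)


def route_round_robin(group_values, shard_count):
    """Spread documents over shards without looking at the group."""
    shards = [[] for _ in range(shard_count)]
    for position, value in enumerate(group_values):
        shards[position % shard_count].append(value)
    return shards


def route_by_group(group_values, shard_count):
    """Shard along the grouping field: one group lives on one shard."""
    shards = [[] for _ in range(shard_count)]
    for value in group_values:
        shards[zlib.crc32(value.encode("utf-8")) % shard_count].append(value)
    return shards


# Hypothetical grouping values of six documents, three distinct groups.
values = ["a", "b", "a", "c", "b", "a"]
exact = len(set(values))
overcounted = summed_ngroups(route_round_robin(values, 2))
colocated = summed_ngroups(route_by_group(values, 2))
print(exact, overcounted, colocated)
```

With round-robin routing the summed count is 5 for 3 real groups, while routing by grouping value gives the exact figure.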

Re: Unexpected character '<' (code 60) expected '='

2013-08-01 Thread Paul Masurel
You can check the validity of your XML with xmllint very simply. xmllint Does this return an error? On Thu, Aug 1, 2013 at 9:59 AM, deniz wrote: > Vineet Mishra wrote > > I am using Solr 3.5 with the posting XML file size of just 1Mb. > > > > > > On Wed, Jul 31, 2013 at 8:19 PM, Shawn Heisey <

Re: TrieField and FieldCache confusion

2013-08-01 Thread Paul Masurel
Thank you very much for your very fast answer and all the pointers. That's what I thought, but then I got confused by the last note http://wiki.apache.org/solr/StatsComponent "TrieFields has to use a precisionStep of -1 to avoid using UnInvertedField

Re: FieldCollapsing issues in SolrCloud 4.4

2013-07-31 Thread Paul Masurel
+and+Indexing+Data+in+SolrCloud On Wed, Jul 31, 2013 at 8:02 PM, Ali, Saqib wrote: > Hello Paul, > > Can you please explain what you mean by: > "To get the exact number of groups, you need to shard along your grouping > field" > > Thanks! :) > > > On We

Re: FieldCollapsing issues in SolrCloud 4.4

2013-07-31 Thread Paul Masurel
Do you mean you get different results with group=true? numFound is supposed to return the number of ungrouped hits. To get the number of groups, you are expected to set group.ngroups=true. Even then, the result will only give you an upper bound in a distributed environment. To get the exact numbe

TrieField and FieldCache confusion

2013-07-31 Thread Paul Masurel
ues when working with TrieField with the precisionStep higher than 0. If not, what did I get wrong? Regards, Paul Masurel e-mail: paul.masu...@gmail.com