Re: Fetching results with a minimum from each category

2013-12-30 Thread Jilal Oussama
I don't get it: if b and d only have 4 products, how can you fetch one more
for them?

But anyway, if you really know what you are doing, then I can say that this
is quite specific, so you may have to do it in 2 Solr queries.

Also see the Grouping Component; it may help you get what you want
(grouping by the category field):

http://wiki.apache.org/solr/FieldCollapsing
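
The two-query idea might be sketched as follows, in Python. This is only an illustration under assumptions: the first query is taken to return the top n documents, and a second, grouped query (for instance with group=true, group.field=category and group.limit=m) is taken to return the best documents per category. The function name and the document keys "id", "category" and "score" are hypothetical, not part of Solr.

```python
from collections import Counter

def ensure_minimum_per_category(top_n, per_category, m):
    """Top up the top-n list so each category has at least m docs, when available.

    top_n: list of dicts (hypothetical keys "id", "category", "score"), best first.
    per_category: dict mapping category -> list of its best docs, best first.
    """
    result = list(top_n)
    seen = {doc["id"] for doc in result}
    counts = Counter(doc["category"] for doc in result)
    for category, docs in per_category.items():
        for doc in docs:
            if counts[category] >= m:
                break  # this category already has enough documents
            if doc["id"] not in seen:
                result.append(doc)
                seen.add(doc["id"])
                counts[category] += 1
    # Keep the merged list ordered by score, best first.
    result.sort(key=lambda doc: doc["score"], reverse=True)
    return result
```

If a category simply has fewer than m matching products in the index, nothing more can be added for it, which is the limitation raised at the top of this reply.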



2013/12/30 nish nishantsharma...@gmail.com

 I am using Solr 4.4.0. The search is performed on products, each of which
 has a category field. I want to retrieve the top n products. But if some
 category has fewer than m products among the top n, then I want to retrieve
 more products only for those categories.

 E.g. I have 4 categories a, b, c, d, with n=20 and m=5. Now let's say the top
 20 (=n) have the following category distribution (a:6, b:4, c:6, d:4).
 Categories b and d have fewer than m (=5) products. So I would like to fetch
 one more product (with the next highest score) for both of these categories.

 Is there a way I can do this using Solr?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Fetching-results-with-a-minimum-from-each-category-tp4108659.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Query Slowliness

2013-12-27 Thread Jilal Oussama
Thank you guys for your replies,

Sorry that I forgot to mention that I have allocated 10 GB of memory to the
Java Heap.


2013/12/26 Shawn Heisey s...@elyograg.org

 On 12/26/2013 3:38 AM, Jilal Oussama wrote:
  Solr was hosted on an Amazon EC2 m1.large (2 vCPU with 4 ECU, 7.5 GB
  memory & 840 GB storage) and contained several cores for different usage.

  When I manually executed a query through Solr Admin (a query containing
  10~15 terms, with some of them having boosts over one field, and limited
  to one result without any sorting or faceting etc.) it took around 700
  ms, and the core contained 7 million documents.

  When the scripts are executed things get slower: my query takes 7~10 s.

  Then I turned to SolrCloud, expecting a huge performance increase.

  I installed it on a cluster of 5 Amazon EC2 c3.2xlarge instances (8 vCPU
  with 28 ECU, 15 GB memory & 160 GB SSD storage), then I created one
  collection to contain the core I was querying. I sharded it into 25 shards
  (each node containing 5 shards, without replication); each shard took
  54 MB of storage.

  I tested my query on the new SolrCloud: it takes 70 ms! A huge
  improvement, which is very good!

  I tested my scripts again (I have 30 scripts running at the same time)
  and, to my surprise, things ran fast for 5 seconds and then became really
  slow again (query time).

  I updated solrconfig.xml to remove the query caches (I don't need them
  since the queries are very different and are one-time queries) and changed
  the index memory to 1 GB, but only got a small improvement (3~4 s for each
  query?!).

 Your SolrCloud setup has 35 times as much CPU power (just basing this on
 the ECU numbers) as your single-server setup, ten times as much memory,
 and a lot more IOPS because you moved to SSD.  A 10X increase in single
 query performance is not surprising.

 You have not indicated how much memory is assigned to the java heap on
 each server.  I think that there are three possible problems happening
 here, with a strong possibility that the third one is happening at the
 same time as one of the other two:

 1) Full garbage collections are too frequent because the heap is too small.
 2) Garbage collections take too long because the heap is very large and
 GC is not tuned.
 3) Extremely high disk I/O because the OS disk cache is too small for
 the index size.

 Some information on these that might be helpful:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 The general solution for good Solr performance is to throw hardware,
 especially memory, at the problem.  It's worth pointing out that any
 level of hardware investment has an upper limit on the total query
 volume it can support.  Running 30 test scripts at the same time will be
 difficult for all but the most powerful and expensive hardware to deal
 with, especially if every query is different.  A five-server cloud where
 each server has 8 CPU cores and 15GB of memory is pretty small, all
 things considered.

 Thanks,
 Shawn
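
The "35 times" and "ten times" figures in the reply above can be checked with a little arithmetic. The snippet below only uses the numbers quoted in the original message (ECU and memory per instance, five nodes); the variable names are made up for the illustration.

```python
# Figures quoted in the original message.
old_ecu, old_mem_gb = 4, 7.5               # one m1.large
nodes, node_ecu, node_mem_gb = 5, 28, 15   # five c3.2xlarge instances

cpu_ratio = nodes * node_ecu / old_ecu         # total ECU of the cluster vs. the single server
mem_ratio = nodes * node_mem_gb / old_mem_gb   # total memory of the cluster vs. the single server
print(f"CPU (ECU) ratio: {cpu_ratio:g}x, memory ratio: {mem_ratio:g}x")
```

This compares raw totals only; it says nothing about how a load of 30 concurrent clients is spread over the 25 shards.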




Solr Query Slowliness

2013-12-26 Thread Jilal Oussama
Hi all,

I have multiple Python scripts querying Solr with the sunburnt module.

Solr was hosted on an Amazon EC2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory
& 840 GB storage) and contained several cores for different usage.

When I manually executed a query through Solr Admin (a query containing
10~15 terms, with some of them having boosts over one field, and limited to
one result without any sorting or faceting etc.) it took around 700 ms, and
the core contained 7 million documents.

When the scripts are executed things get slower: my query takes 7~10 s.

Then I turned to SolrCloud, expecting a huge performance increase.

I installed it on a cluster of 5 Amazon EC2 c3.2xlarge instances (8 vCPU
with 28 ECU, 15 GB memory & 160 GB SSD storage), then I created one collection
to contain the core I was querying. I sharded it into 25 shards (each node
containing 5 shards, without replication); each shard took 54 MB of storage.

I tested my query on the new SolrCloud: it takes 70 ms! A huge improvement,
which is very good!

I tested my scripts again (I have 30 scripts running at the same time) and,
to my surprise, things ran fast for 5 seconds and then became really slow
again (query time).

I updated solrconfig.xml to remove the query caches (I don't need them
since the queries are very different and are one-time queries) and changed
the index memory to 1 GB, but only got a small improvement (3~4 s for each
query?!).

Any ideas?

PS: My index will not stay at 7M documents; it will grow to 100M+, and
that may make things worse.


Re: Solr Query Slowliness

2013-12-26 Thread Jilal Oussama
Thanks Rafal for your reply,

My scripts are running on other independent machines, so they do not
affect Solr. I did mention that the queries are not the same (that is why I
removed the query cache from solrconfig.xml), and I only get 1 result from
Solr (the top-scored one, so no sorting is needed since results are ordered
by score by default).



2013/12/26 Rafał Kuć r@solr.pl

 Hello!

 Could you tell us more about your scripts? What do they do? Are the
 queries the same? How many results do you fetch with your scripts, and
 so on?

 --
 Regards,
  Rafał Kuć
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/





Re: Solr Query Slowliness

2013-12-26 Thread Jilal Oussama
This is an example of a query:

http://myip:8080/solr/TestCatMatch_shard12_replica1/select?q=Royal+Cashmere+RC+106+CS+Silk+Cashmere+V+Neck+Moss+Green+Men^10+s+Sweater+Cashmere^3+Men^3+Sweaters^3+Clothing^3&rows=1&wt=json&indent=true

In return:

{
  "responseHeader":{
    "status":0,
    "QTime":191},
  "response":{"numFound":4539784,"start":0,"maxScore":2.0123534,"docs":[
      {
        "Sections":"fashion",
        "IdsCategories":11101911,
        "IdProduct":"ef6b8d7cf8340d0c8935727a07baebab",
        "Id":"11101911-ef6b8d7cf8340d0c8935727a07baebab",
        "Name":"Uniqlo Men Cashmere V Neck Sweater Men Clothing Sweaters Cashmere",
        "_version_":1455419757424541696}]
  }}

(This query was executed when no script was running, so the QTime is only
191 ms, but it may take up to 3 s when they are.)


Of course it can be smaller or bigger and of course that affects the
execution time (the execution times I spoke of are the internal ones
returned by solr, not calculated by me).

And yes the CPU is fully used.
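
For reference, a request of the same shape as the example query in this message could be assembled with Python's standard library as sketched below. The helper name and the shortened term list are hypothetical; the parameters (q, rows, wt, indent) are the ones visible in the example URL.

```python
from urllib.parse import urlencode

def build_select_url(base_url, terms, rows=1):
    """Build a /select URL from (term, boost) pairs; boost may be None."""
    q = " ".join(t if b is None else f"{t}^{b}" for t, b in terms)
    params = {"q": q, "rows": rows, "wt": "json", "indent": "true"}
    return f"{base_url}/select?{urlencode(params)}"

url = build_select_url(
    "http://myip:8080/solr/TestCatMatch_shard12_replica1",
    [("Cashmere", 3), ("Men", 3), ("Sweaters", 3), ("Clothing", 3)],
)
print(url)
```

Note that urlencode percent-encodes the boost marker (^ becomes %5E) and turns spaces into +, which is equivalent to the hand-written form once the server decodes it.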


2013/12/26 Rafał Kuć r@solr.pl

 Hello!

 Different queries can have different execution times, that's why I
 asked about the details. When running the scripts, is Solr CPU fully
 utilized? To tell more I would like to see what queries are run
 against Solr from scripts.

 Do you have any information on network throughput between the server
 you are running scripts on and the Solr cluster? You wrote that the
 scripts are fine for 5 seconds and then they get slow. If your Solr
 cluster is not fully utilized I would take a look at the queries and
 what they return (i.e. using faceting with facet.limit=-1) and see
 if the network is able to process those.

 --
 Regards,
  Rafał Kuć
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/






Re: [ANNOUNCE] Solr wiki editing change

2013-03-28 Thread Jilal Oussama
Please add OussamaJilal to the group.

Thank you.


2013/3/28 Steve Rowe sar...@gmail.com

 On Mar 28, 2013, at 9:25 AM, Andy Lester a...@petdance.com wrote:
  On Mar 24, 2013, at 10:18 PM, Steve Rowe sar...@gmail.com wrote:
  Please request either on the solr-user@lucene.apache.org or on
 d...@lucene.apache.org to have your wiki username added to the
 ContributorsGroup page - this is a one-time step.
 
  Please add my username, AndyLester, to the approved editors list.
  Thanks.

 Added to solr ContributorsGroup.


Re: SOLR - Recommendation on architecture

2013-03-08 Thread Jilal Oussama
I would not recommend Windows either.


2013/3/8 Kobe J kobe.free.wo...@gmail.com

 We are planning to use SOLR 4.1 for full text indexing. Following is the
 hardware configuration of the web server that we plan to install SOLR on:-

 *CPU*: 2 x Dual Core (4 cores)

 *RAM*: 12GB

 *Storage*: 212GB

 *OS Version* – Windows 2008 R2



 The dataset to be imported will have approx. 800k records, with 450 fields
 per record. Query response time should be between 200 ms and 800 ms.



 Please suggest if the current single server implementation should work fine
 and if the specified configuration is enough for the requirement.



Re: removing whitespaces in query

2013-03-07 Thread Jilal Oussama
You can use two fields: in one you keep the original data, and you use the
second one as a copy field with the Pattern Replace Filter combined with
the Keyword Tokenizer.
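
To illustrate what that analysis chain is meant to achieve, here is a small Python sketch of the normalization alone, not of the Solr configuration. It assumes the whole field value reaches the replacement step as one token (which is what a keyword tokenizer does) and that all whitespace is then removed, so the indexed value and the user's query end up identical. The function name is hypothetical.

```python
import re

def strip_whitespace(value):
    # Mimics a pattern-replace step that swaps every whitespace run for nothing.
    return re.sub(r"\s+", "", value)

for user_input in ["50 A 91", "Frei 91 \\: 9984"]:
    print(repr(user_input), "->", repr(strip_whitespace(user_input)))
```

A tokenizer that splits on whitespace would hand the filter three separate tokens instead, and there would be no spaces left for the replacement to remove.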


2013/3/7 Jochen Lienhard lienh...@ub.uni-freiburg.de

 Hello,

 we have indexed a field, where we have removed the whitespaces before the
 indexing.

 For example:

 50A91
 Frei91\:9984

 Now we want to allow the users to search for:

 50 A 91
 Frei 91 \: 9984

 Our idea was to add a PatternReplaceFilterFactory in the query analyzer to
 remove the whitespaces:
 <charFilter class="solr.PatternReplaceFilterFactory" pattern="(\s+)"
 replacement="" replace="all"/>

 But it does not work.

 For normal queries (we are using VuFind as the frontend) we can remove the
 whitespace in the YAML part, but if the user searches with wildcards, the
 YAML does not work, so we hope to find a solution in Solr.

 We are using solr 3.6.

 Thanks for ideas and hints.

 Greetings from Germany

 Jochen

 --
 Dr. rer. nat. Jochen Lienhard
 Dezernat EDV

 Albert-Ludwigs-Universität Freiburg
 Universitätsbibliothek
 Rempartstr. 10-16  | Postfach 1629
 79098 Freiburg | 79016 Freiburg

 Telefon: +49 761 203-3908
 E-Mail: lienh...@ub.uni-freiburg.de
 Internet: www.ub.uni-freiburg.de




Re: Building a central index with Lucene + Solr

2013-03-05 Thread Jilal Oussama
I use Solarium as a PHP library too, and I would greatly recommend it.


2013/3/5 Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu

 Agreed, PHP and Solr are an excellent combination. I'm using Solr 3.6 +
 PHP (Symfony2 + NelmioSolariumBundle + Solarium) and getting excellent
 results. Even Solarium on its own as a PHP library is great; right now it
 lacks Solr 4 support, but for Solr 3.6 it's great.

 - Original Message -
 From: David Quarterman da...@corexe.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, 5 March 2013 10:56:18
 Subject: RE: Building a central index with Lucene + Solr

 Hi Alvaro,

 I agree with Otis & Alexandre (esp. Windows + PHP!). However, there are
 plenty of people using Solr & PHP out there very successfully. There's
 another good package at http://code.google.com/p/solr-php-client/ which
 is easy to implement and has some example usage.

 Regards,

 DQ



 From: Álvaro Vargas Quezada [mailto:al...@outlook.com]
 Sent: 05 March 2013 14:53
 To: solr-user@lucene.apache.org
 Subject: Building a central index with Lucene + Solr



 Hi everyone!



 I'm trying to develop a central index. I installed Solr and reached the
 screen that I attach, but the problem is that I don't know how to continue
 from this point. I wanted to develop an app in PHP which uses Solr, but I
 don't know how. Can anyone help me, maybe with a tutorial or something
 like that?



 Thanks and greetz from Chile!






Re: create cores dynamically

2013-03-03 Thread Jilal Oussama
For me, it always creates the data dir next to the conf dir I specified
(this should depend on your core configuration) and loads the core into Solr
(and I think this is what it is supposed to do).
On Mar 3, 2013 3:18 AM, adeelmahmood adeelmahm...@gmail.com wrote:

 I am not sure I understand how creating cores dynamically is supposed to
 work. From what I have figured out, I need to specify the instanceDir as the
 path to a directory which contains the conf files. So I have a directory
 that serves as a template for the configuration files, but when I use this
 path, Solr adds the data directory next to this template conf directory,
 which defeats the purpose. I was hoping that it would copy the template
 files into a new directory created for the core. Is that not how it's
 supposed to work?

 Any help is appreciated.

 Thanks
 Adeel
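
As a rough illustration of the call being discussed, the CoreAdmin CREATE request can be built as below. The helper name, host and paths are hypothetical; action, name, instanceDir and dataDir are CoreAdmin parameters. The sketch assumes, in line with the behaviour described in this thread, that Solr uses the instanceDir as given rather than copying a template, so any copying of a template conf directory would have to happen before the request is sent.

```python
from urllib.parse import urlencode

def create_core_url(solr_base, name, instance_dir, data_dir=None):
    """Build a CoreAdmin CREATE URL; data_dir is only sent when given."""
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    if data_dir is not None:
        params["dataDir"] = data_dir
    return f"{solr_base}/admin/cores?{urlencode(params)}"

# Hypothetical host and paths, for illustration only.
print(create_core_url("http://localhost:8983/solr", "core1", "/opt/solr/cores/core1"))
```

Passing an explicit dataDir is one way to keep index data away from a shared configuration directory, if that is the goal.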



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/create-cores-dynamically-tp4044279.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Query with whitespace

2013-03-01 Thread Jilal Oussama
You can also specify in your schema that the default query operator is AND.
On Mar 1, 2013 5:35 PM, Jack Park jackp...@topicquests.org wrote:

 I found a tiny notice about just using quotes; tried it in the admin
 query console and it works, e.g. label:"car house" would fetch any
 document for which the label field contained that phrase.

 Jack

 On Fri, Mar 1, 2013 at 9:17 AM, Shawn Heisey s...@elyograg.org wrote:
  On 3/1/2013 8:50 AM, vsl wrote:
 
  I would like to send a query like "car house". My expectation is to have
  resulting documents that contain both "car" and "house". Unfortunately,
  Apache Solr out of the box returns documents as if the whitespace between
  them was treated as OR. Does anybody know how to fix this?


  Three solutions come to mind: 1) Set the q.op parameter to AND.  2) Send
  "car AND house" instead, or "+car +house".  3) Use the edismax query
  parser (defType=edismax) and set the mm parameter to 100%.  The wiki should
  have info on all these.
 
  Thanks,
  Shawn
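
The three options listed in the reply above can be written out as request parameters. The sketch below only builds the query strings with Python's standard library; the function name and dictionary keys are hypothetical, while q.op, defType and mm are the parameters named in the reply.

```python
from urllib.parse import urlencode

def and_query_variants(terms):
    """Return three parameter sets that each require all terms to match."""
    plain = " ".join(terms)
    return {
        "q.op": {"q": plain, "q.op": "AND"},
        "explicit": {"q": " AND ".join(terms)},
        "edismax": {"q": plain, "defType": "edismax", "mm": "100%"},
    }

for name, params in and_query_variants(["car", "house"]).items():
    print(name, urlencode(params))
```

The percent sign in mm=100% is encoded as %25 on the wire, which is easy to forget when typing the URL by hand.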
 



RE: Can't search words in quotes

2013-02-27 Thread Jilal Oussama
You are welcome Alex
Glad it worked fine.
On Feb 28, 2013 6:15 AM, Alex Cougarman acoug...@bwc.org wrote:

 Thanks, Oussama. That was very useful information and we have added the
 double quotes. One interesting trick: we had to change the way we did it to
 wrap the pattern value in single quotes so we could have double quotes
 inside.

 Warm regards,
 Alex Cougarman

 Bahá'í World Centre
 Haifa, Israel
 Office: +972-4-835-8683
 Cell: +972-54-241-4742
 acoug...@bwc.org


 -Original Message-
 From: Oussama Jilal [mailto:jilal.ouss...@gmail.com]
 Sent: 26 February 2013 12:17 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Can't search words in quotes

 The pattern you are using in the PatternTokenizerFactory does not contain
 double quotes, so indexing the text "The Promulgation of Universal Peace"
 will result in the following tokens: "The / Promulgation / of / Universal
 / Peace", which is why Peace" will not match Peace.
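
The effect can be reproduced outside Solr with a plain regular-expression split. This is only an approximation of a pattern tokenizer used as a splitter (the helper name is hypothetical and empty tokens are simply dropped), but it shows the quote staying attached to the first and last tokens until the double quote is added to the character class.

```python
import re

def pattern_tokenize(text, pattern):
    """Split text on every match of pattern and drop empty tokens."""
    return [tok for tok in re.split(pattern, text) if tok]

text = '"The Promulgation of Universal Peace"'
original = r'[\s\.\?\!,:;]'      # the pattern from the schema
with_quotes = r'[\s\.\?\!,:;"]'  # same pattern with the double quote added

print(pattern_tokenize(text, original))
print(pattern_tokenize(text, with_quotes))
```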


 On 02/26/2013 08:08 AM, Alex Cougarman wrote:
  Hi. We have run into an interesting situation when searching for words
  that are within double-quotes in our documents. For example, when we
  enter the following search: promulgation AND peace
 
  The document in question has this text exactly (with the double quotes):
  "The Promulgation of Universal Peace"
  However, it finds and highlights the word Promulgation but not the
  word Peace. Here's the field's definition in our schema.xml:
 
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\.\?\!,:;]"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\.\?\!,:;]"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
 
  Warm regards,
  Alex Cougarman
 
  Bahá'í World Centre
  Haifa, Israel
  Office: +972-4-835-8683
  Cell: +972-54-241-4742
  acoug...@bwc.org
 
 

 --
 Oussama Jilal




Re: Solr Grouping and empty fields

2013-02-24 Thread Jilal Oussama
Oh, this is a good one! Thank you very much, Teun. (But I will have to ask
you: how do you generate a unique value for the copy field when the original
one is empty? Do you do this manually, or can Solr do it?)
And thanks again.
On Feb 24, 2013 12:11 PM, Teun Duynstee t...@duynstee.com wrote:

 We had a comparable situation. We created an extra field and at index time
 copy the value if there is one and create a unique dummy value if there is
 none. We couldn't just make the initial field required, because it has a
 meaning other than just a grouping key.
 Teun
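
One way to do this on the indexing side, before documents are sent to Solr, might look like the sketch below. Everything in it is an assumption made for illustration: the field names Z, Z_group and id, the dummy-value prefix, and the function itself. It follows the idea described in this thread of reusing the document key when the real value is missing.

```python
def with_grouping_key(doc, source_field="Z", group_field="Z_group", id_field="id"):
    """Return a copy of doc in which group_field is always populated."""
    out = dict(doc)
    value = doc.get(source_field)
    if value in (None, ""):
        # No real value: fall back to a per-document dummy so the document groups alone.
        out[group_field] = f"__nogroup__{doc[id_field]}"
    else:
        out[group_field] = value
    return out

print(with_grouping_key({"id": "42", "Z": ""}))
```

Grouping would then be done on the extra field (here Z_group) while the original field keeps its real meaning. The prefix only has to be something that can never occur as a genuine value.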
 On 22 Feb 2013 20:47, Daniel Collins danwcoll...@gmail.com wrote:

  We had something similar to be fair, a cluster information field which
  was unfortunately optional, so all the documents that didn't have this
  field set grouped together.

  It isn't Solr's fault, to be fair, we told it to group on the values of
  field Z, null is a valid value and lots of documents have that value so
  they all group together.  We got what we asked for :-)

  Our solution was to make that field mandatory, and in our indexing
  pipeline we will set that field to some unique value (same as the
  document key if necessary) if it isn't set already to ensure that every
  document has that field set appropriately.
 
  -Original Message- From: Oussama Jilal
  Sent: Friday, February 22, 2013 5:25 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr Grouping and empty fields
 
  OK, I'm sorry if I did not explain my need well. I'll try to give a
  better explanation.

  What I have: millions of documents that have a field X, another field
  Y and another field Z which is not required (so it can be empty in some
  documents and not in others).

  What I want to do: search for docs whose field X equals something and
  group them by field Z (so that only 1 document is returned for every
  field Z value), BUT I want documents that have field Z empty to be
  included in the results (all of them), and I want to sort the results by
  field Y (so I can't split the request into two requests).
 
  I hope that this is clearer.
 
 
  On 02/22/2013 03:59 PM, Jack Krupansky wrote:
 
  What?!?! You want them grouped but not grouped together?? What on earth
  does that mean?! I mean, either they are included or they are not. All
  results will be in some group, so where exactly do you want these "not to
  be grouped together" documents to be grouped? In any case, please clarify
  what your expectations really are.
 
  -- Jack Krupansky
  -Original Message- From: Oussama Jilal
  Sent: Friday, February 22, 2013 7:17 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr Grouping and empty fields
 
  Thank you Johannes, but I want the documents having the field empty to
  be included in the results, just not to be grouped together, and if I
  understood your solution correctly, it will simply remove those
  documents from the results (Note : The field values are very variable
  and unknown to me).
 
  On 02/22/2013 02:53 PM, Johannes Rodenwald wrote:
 
  Hi Oussama,
 
  If you have only a few distinct, unchanging values in the field that you
  group upon, you could implement a FilterQuery (query parameter fq) and
  add it to the query, allowing all valid values, but not an empty field.
  For example:

  fq=my_grouping_string_field:( value_a OR value_b OR value_c OR value_d )

  If you use Solr 4.x, you should be able to group upon an integer field,
  allowing a range filter:
  (I still work with 3.6, which can only group on string fields, so I didn't
  test this one)

  fq=my_grouping_integer_field:[1 TO *]
 
  --
  Johannes Rodenwald
 
 
  - Original Message -
  From: Oussama Jilal jilal.ouss...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Friday, 22 February 2013 12:32:13
  Subject: Solr Grouping and empty fields
 
  Hi,
 
  I need to group some results in solr based on a field, but I don't want
  documents having that field empty to be grouped together, does anyone
  know how to achieve that ?
 
 
 
  --
  Oussama Jilal
 
 



Re: Solr Grouping and empty fields

2013-02-23 Thread Jilal Oussama
Thank you Daniel, it is a nice idea.

I wish there was a better solution, but we will go with yours it seems.

(still open for any other idea)
On Feb 22, 2013 7:47 PM, Daniel Collins danwcoll...@gmail.com wrote:

 We had something similar to be fair, a cluster information field which was
 unfortunately optional, so all the documents that didn't have this field
 set grouped together.

 It isn't Solr's fault, to be fair, we told it to group on the values of
 field Z, null is a valid value and lots of documents have that value so
 they all group together.  We got what we asked for :-)

 Our solution was to make that field mandatory, and in our indexing
 pipeline we will set that field to some unique value (same as the document
 key if necessary) if it isn't set already to ensure that every document has
 that field set appropriately.
