Re: sort groups by the sum of the scores of the documents within each group

2014-05-11 Thread Frankcis
Hi Erick, sorry to bother you again. I sent the client requirement to you
on the Solr mailing list but haven't received a reply. I would appreciate
your advice.


2014-05-06 13:24 GMT+08:00 Frankcis [via Lucene] 
ml-node+s472066n413486...@n3.nabble.com:

> Thank you, Erick, you're a good man.
> This is the client requirement:
> The forum has many discussions, each with content posted under a subject.
> A keyword search returns the documents whose content or subject matches
> the query. These documents should be grouped by subject, and the groups
> sorted by the sum of the scores of the documents in each subject.
>
> I would be glad to hear your suggestions.








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4135044.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Erick Erickson
I don't think so. Solr excels at getting the score of single
documents, not aggregation.

It's not at all clear to me, though, that the sum of documents' scores
is a reasonable thing to sort by. Consider grouping on a very common
term. You'd never do this, but group on the elements of a text field.
Then the group 'a' would sort to the top almost always (or maybe 'the'
or...).

This sounds like an XY problem. What use case are you trying to solve?

Best,
Erick

On Sun, May 4, 2014 at 9:31 PM, frank shi finalxc...@gmail.com wrote:
> Currently, Solr grouping (http://wiki.apache.org/solr/FieldCollapsing) sorts
> groups by the score of the top document within each group. E.g.
> [...]
> "groups":[{
>   "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
>   "doclist":{"numFound":9,"start":0,"maxScore":4.729042,"docs":[
>     {
>       "id":"7481df771afe39fab368ce19dfeeb528",
>       [...],
>       "score":4.729042},
>     {
>       "id":"c879e95b5f16343dad8b1248133727c2",
>       [...],
>       "score":4.6635237},
>     {
>       "id":"485b9aec90fd3ef381f013c51ab6a4df",
>       [...],
>       "score":4.347174}]
>   }},
> [...]
> Is there an out-of-the-box way to sort groups by the sum of the scores of
> the documents within each group? E.g.
> [...]
> "groups":[{
>   "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
>   "doclist":{"numFound":9,"start":0,"scoreSum":13.739738,"docs":[
>     {
>       "id":"7481df771afe39fab368ce19dfeeb528",
>       [...],
>       "score":4.729042},
>     {
>       "id":"c879e95b5f16343dad8b1248133727c2",
>       [...],
>       "score":4.6635237},
>     {
>       "id":"485b9aec90fd3ef381f013c51ab6a4df",
>       [...],
>       "score":4.347174}]
>   }},
> [...]
> With the release of sorting by Function Query
> (https://issues.apache.org/jira/browse/SOLR-1297), it seems that there
> should be a way to use the sum() function
> (http://wiki.apache.org/solr/FunctionQuery). But it's not quite close enough
> since the score field is not part of the documents.
>
> I feel like I'm close but I'm missing some obvious piece. I'm using Solr
> 4.6.





Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
My schema.xml:
<schema name="example core one" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
               omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
               positionIncrementGap="0"/>
    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <fieldtype name="textComplex" class="solr.TextField"
               positionIncrementGap="100" omitNorms="false"
               autoGeneratePhraseQueries="false">
      <analyzer type="query">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
                   mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="false" expand="true"/>
      </analyzer>
      <analyzer type="index">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
                   mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="false" expand="true"/>
      </analyzer>
    </fieldtype>
  </types>

  <fields>
    <field name="id" type="uuid" indexed="true" stored="true"
           multiValued="false" required="true"/>
    <field name="name" type="textComplex" indexed="true" stored="true"
           multiValued="false"/>
    <field name="type" type="string" indexed="true" stored="true"
           multiValued="false"/>
    <field name="price" type="long" indexed="true" stored="true"/>

    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>

  <defaultSearchField>name</defaultSearchField>

  <solrQueryParser defaultOperator="OR"/>
</schema>

Update docs:
"docs": [
  {
    "name": "苹果4s",
    "type": "手机",
    "price": 2000,
    "id": "4017e35a-6b19-45b6-b945-382340ca1eec",
    "_version_": 1466799722505175000
  },
  {
    "name": "苹果5",
    "type": "手机",
    "price": 5000,
    "id": "4052d9f3-f6d9-458f-8bb0-477b17852f37",
    "_version_": 1466799735745544200
  },
  {
    "name": "三星",
    "type": "手机",
    "price": 3000,
    "id": "468abce8-8bb9-4f51-9900-8d4d6abc02ac",
    "_version_": 1466799747596550100
  },
  {
    "name": "摩托罗拉i3",
    "type": "电脑",
    "price": 1000,
    "id": "db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd",
    "_version_": 1466799757491961900
  },
  {
    "name": "摩托罗拉i5",
    "type": "电脑",
    "price": 1500,
    "id": "f211525f-bc3c-4ea7-aded-1c46a94ecd1c",
    "_version_": 1466799766311534600
  }
]
Thank you, Erick.
I want to sort groups by the sum of the documents' scores within each
group. As you said, Solr excels at getting the score of single documents.
In Solr 4.6, groups are sorted by default by the maxScore of the documents
within each group, not by the sum of their scores. I could compute the sum
of scores in the client program, but that is not a good idea. I know that
Solr's stats component can compute statistics over a long field, so I had
the idea of running it on the score field, but score is a pseudo-field and
stats.field doesn't support it. In addition, as the schema.xml shows, I
group on a string field (type), which is not tokenized.
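
The client-side workaround mentioned above can be sketched in a few lines. This is only a minimal Python sketch under stated assumptions, not something Solr provides: it assumes the query was sent with group=true, group.field=type and fl=*,score, and that the JSON response has already been parsed into a dict. The function name sort_groups_by_score_sum and the added scoreSum key are hypothetical, and the scores in the sample are made up for illustration.

```python
from typing import Any


def sort_groups_by_score_sum(response: dict[str, Any], group_field: str) -> list[dict[str, Any]]:
    """Return the groups of a parsed Solr grouped response, ordered by summed score."""
    groups = response["grouped"][group_field]["groups"]
    ranked = []
    for group in groups:
        docs = group["doclist"]["docs"]
        # Only the documents actually returned (at most group.limit per group)
        # are summed, so the total is partial whenever numFound > len(docs).
        score_sum = sum(doc["score"] for doc in docs)
        # "scoreSum" is a hypothetical key added here; Solr does not return it.
        ranked.append({**group, "scoreSum": score_sum})
    ranked.sort(key=lambda g: g["scoreSum"], reverse=True)
    return ranked


# Made-up sample in the shape of a grouped response (scores are illustrative).
sample = {
    "grouped": {
        "type": {
            "matches": 4,
            "groups": [
                {"groupValue": "手机",
                 "doclist": {"numFound": 2, "start": 0, "maxScore": 3.0,
                             "docs": [{"id": "a", "score": 3.0},
                                      {"id": "b", "score": 1.0}]}},
                {"groupValue": "电脑",
                 "doclist": {"numFound": 2, "start": 0, "maxScore": 2.5,
                             "docs": [{"id": "c", "score": 2.5},
                                      {"id": "d", "score": 2.5}]}},
            ],
        }
    }
}

for group in sort_groups_by_score_sum(sample, "type"):
    print(group["groupValue"], group["scoreSum"])
```

In this sample Solr's default ordering (by maxScore) would put 手机 first, while the summed ordering puts 电脑 first. The sum is only complete if every document of every group is fetched, which is one reason doing this in the client is expensive.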





Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Erick Erickson
You haven't answered _why_ this is a good idea. I'm having a hard
time understanding what would be _useful_ about sorting this way. Just
because the sum of scores in a group is greater than the sum of scores
in another says _nothing_ about how relevant any of the docs in the group
are relative to each other.

I mean group 1 could have 10M documents all with a score of .01 and group
2 could have 1 document with a score of 1,000 and group 1 would sort
first.

So unless you have some unusual use-case which you haven't yet articulated,
this seems like a bad idea.
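
The arithmetic behind that example can be checked with a short Python sketch. The numbers are the hypothetical ones from the paragraph above, not real Solr scores, and the variable names are made up for illustration.

```python
# Hypothetical groups from the example: 10M documents scoring .01 each,
# versus a single document scoring 1,000.
group1_count, group1_score = 10_000_000, 0.01
group2_scores = [1000.0]

# Sum of scores per group.
group1_sum = group1_count * group1_score   # roughly 100,000
group2_sum = sum(group2_scores)            # 1,000

# Best single score per group (what the default group ordering uses).
group1_max = group1_score
group2_max = max(group2_scores)

first_by_sum = "group 1" if group1_sum > group2_sum else "group 2"
first_by_max = "group 1" if group1_max > group2_max else "group 2"

print("first by sum of scores:", first_by_sum)
print("first by max score:", first_by_max)
```

Sorting by sum puts the group of many weak matches first, while sorting by max score puts the group with the single strong match first.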

Best,
Erick

On Mon, May 5, 2014 at 7:20 PM, Frankcis finalxc...@gmail.com wrote:
> [...]


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
Thank you, Erick, you're right. Sorting by the maxScore of the documents
within each group is more meaningful than sorting by the sum of scores in a
group, especially in cases like the one you describe (group 1 could have
10M documents all with a score of .01 and group 2 could have 1 document
with a score of 1,000, and group 1 would sort first). But the client
requires this feature. Can you tell me how to achieve it?





Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
Thank you, Erick, you're a good man.
This is the client requirement:
The forum has many discussions, each with content posted under a subject.
A keyword search returns the documents whose content or subject matches
the query. These documents should be grouped by subject, and the groups
sorted by the sum of the scores of the documents in each subject.

I would be glad to hear your suggestions.
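
For the grouping half of that requirement, the request could be assembled roughly as below. This is a hedged Python sketch, not a tested setup: group, group.field, group.limit, rows, fl and wt are standard Solr request parameters, but the field name subject, the base URL, the helper name grouped_query_url and the limit values are assumptions for illustration. It does not make Solr sort groups by summed score; the returned per-document scores would still have to be summed outside Solr.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust host, port and core name to the actual installation.
DEFAULT_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def grouped_query_url(keyword: str,
                      group_field: str = "subject",  # hypothetical field name
                      base_url: str = DEFAULT_SELECT_URL) -> str:
    """Build a select URL that groups keyword matches by one field."""
    params = {
        "q": keyword,
        "fl": "*,score",          # include each document's score
        "group": "true",          # turn on result grouping
        "group.field": group_field,
        "group.limit": 100,       # docs returned per group (assumed value)
        "rows": 10,               # number of groups returned (assumed value)
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


print(grouped_query_url("solr"))
```

Any document beyond group.limit in a group is not returned, so its score cannot contribute to a sum computed from the response.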




