Re: sort groups by the sum of the scores of the documents within each group
Hi Erick,

Sorry to bother you again. I sent the client requirement to you on the Solr mailing list but have not received a reply, and I would like your advice.

2014-05-06 13:24 GMT+08:00 Frankcis [via Lucene] ml-node+s472066n413486...@n3.nabble.com:

Thank you, Erick, you're a good man. This is the client requirement: in the forum, there are many discussions under different subjects. A keyword search returns the documents whose content or subject matches the query. These documents should be grouped by subject, and the groups sorted by the sum of the document scores within each subject. I would be glad to hear your suggestions.

-- View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4135044.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: sort groups by the sum of the scores of the documents within each group
I don't think so. Solr excels at getting the score of single documents, not aggregation.

It's not at all clear to me, though, that the sum of documents' scores is a reasonable thing to sort by. Consider grouping on a very common term. You'd never do this, but group on the elements of a text field. Then the group 'a' would sort to the top almost always (or maybe 'the' or...).

This sounds like an XY problem. What use-case are you trying to solve?

Best,
Erick

On Sun, May 4, 2014 at 9:31 PM, frank shi finalxc...@gmail.com wrote:

Currently, solr grouping (http://wiki.apache.org/solr/FieldCollapsing) sorts groups by the score of the top document within each group. E.g.

[...]
"groups":[{
    "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
    "doclist":{"numFound":9,"start":0,"maxScore":4.729042,"docs":[
        {"id":"7481df771afe39fab368ce19dfeeb528", [...], "score":4.729042},
        {"id":"c879e95b5f16343dad8b1248133727c2", [...], "score":4.6635237},
        {"id":"485b9aec90fd3ef381f013c51ab6a4df", [...], "score":4.347174}]
    }},
[...]

Is there an out-of-the-box way to sort groups by the sum of the scores of the documents within each group? E.g.

[...]
"groups":[{
    "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
    "doclist":{"numFound":9,"start":0,"scoreSum":13.739738,"docs":[
        {"id":"7481df771afe39fab368ce19dfeeb528", [...], "score":4.729042},
        {"id":"c879e95b5f16343dad8b1248133727c2", [...], "score":4.6635237},
        {"id":"485b9aec90fd3ef381f013c51ab6a4df", [...], "score":4.347174}]
    }},
[...]

With the release of sorting by Function Query (https://issues.apache.org/jira/browse/SOLR-1297), it seems that there should be a way to use the sum() function (http://wiki.apache.org/solr/FunctionQuery). But it's not quite close enough since the score field is not part of the documents.

I feel like I'm close but I'm missing some obvious piece. I'm using Solr 4.6.

-- View this message in context: http://lucene.472066.n3.nabble.com/sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134607.html
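As a rough illustration of the ordering being asked for, the sketch below sums the scores in each group of an already-fetched grouped response and re-sorts the groups on the client. It is written in Python only because the thread names no client language, it is not a Solr feature, and the second group is a hypothetical one added purely so that the max-score order and the sum order differ.

```python
# First group: the three documents shown in the question (fields elided as
# [...] in the post are left out). Second group: HYPOTHETICAL, for illustration.
groups = [
    {
        "groupValue": "hypothetical-second-group",
        "doclist": {"docs": [
            {"id": "hypothetical-doc", "score": 5.0},
        ]},
    },
    {
        "groupValue": "81cb63020d0339adb019a924b2a9e0c2",
        "doclist": {"docs": [
            {"id": "7481df771afe39fab368ce19dfeeb528", "score": 4.729042},
            {"id": "c879e95b5f16343dad8b1248133727c2", "score": 4.6635237},
            {"id": "485b9aec90fd3ef381f013c51ab6a4df", "score": 4.347174},
        ]},
    },
]


def score_sum(group):
    """Sum the scores of the documents returned for one group."""
    return sum(doc["score"] for doc in group["doclist"]["docs"])


def sort_groups_by_score_sum(groups):
    """Return the groups ordered by descending sum of document scores."""
    return sorted(groups, key=score_sum, reverse=True)


ranked = sort_groups_by_score_sum(groups)
for group in ranked:
    print(group["groupValue"], round(score_sum(group), 6))
```

The sum only covers the documents actually returned for each group (3 of numFound 9 in the example above), so group.limit would have to be large enough to return every matching document for the totals to be exact.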
Re: sort groups by the sum of the scores of the documents within each group
my schema.xml:

<schema name="example core one" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <fieldtype name="textComplex" class="solr.TextField" positionIncrementGap="100" omitNorms="false" autoGeneratePhraseQueries="false">
      <analyzer type="query">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
      </analyzer>
      <analyzer type="index">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
      </analyzer>
    </fieldtype>
  </types>
  <fields>
    <field name="id" type="uuid" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="name" type="textComplex" indexed="true" stored="true" multiValued="false"/>
    <field name="type" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="price" type="long" indexed="true" stored="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

update docs:

"docs": [
  { "name": "苹果4s", "type": "手机", "price": 2000, "id": "4017e35a-6b19-45b6-b945-382340ca1eec", "_version_": 1466799722505175000 },
  { "name": "苹果5", "type": "手机", "price": 5000, "id": "4052d9f3-f6d9-458f-8bb0-477b17852f37", "_version_": 1466799735745544200 },
  { "name": "三星", "type": "手机", "price": 3000, "id": "468abce8-8bb9-4f51-9900-8d4d6abc02ac", "_version_": 1466799747596550100 },
  { "name": "摩托罗拉i3", "type": "电脑", "price": 1000, "id": "db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd", "_version_": 1466799757491961900 },
  { "name": "摩托罗拉i5", "type": "电脑", "price": 1500, "id": "f211525f-bc3c-4ea7-aded-1c46a94ecd1c", "_version_": 1466799766311534600 }
]

Thank you, Erick. I want to sort groups by the sum of the documents' scores within each group. As you said, Solr excels at getting the score of single documents: in Solr 4.6, the default ordering of groups depends on the maxScore of the documents within each group, not on the sum of their scores. I could compute the sum of the documents' scores in the client program, but that is not a good idea. I know that Solr's stats component can compute statistics on a long field, so I had the idea of using it on the score field, but score is a pseudo-field and stats.field doesn't support it.

In addition, as the schema.xml shows, I group on the values of a string field (type), which is not tokenized.

-- View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134830.html
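For reference, the grouped request described above (grouping on the type field and asking for each document's score) could be built roughly as follows. This is a sketch in Python, not taken from the thread: the base URL, port, and core name are hypothetical placeholders, and the group.limit of 100 is an arbitrary assumption. The parameters q, group, group.field, group.limit, fl, and wt are standard Solr request parameters.

```python
from urllib.parse import urlencode

# HYPOTHETICAL base URL and core name; adjust to the real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_group_query(keyword, group_limit=100):
    """Build a select URL that groups matches on the 'type' field.

    group_limit is an assumed value: it must be at least the size of the
    largest group if every document's score is to be returned.
    """
    params = {
        "q": keyword,                 # searched against the default field (name)
        "group": "true",              # enable result grouping
        "group.field": "type",        # the untokenized string field from the schema
        "group.limit": group_limit,   # documents returned per group
        "fl": "id,name,type,score",   # include the score pseudo-field
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_group_query("苹果"))
```

Because score is only available per returned document, any client-side sum is limited to the documents that group.limit lets through.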
Re: sort groups by the sum of the scores of the documents within each group
You haven't answered _why_ this is a good idea. I'm having a hard time understanding what would be _useful_ about sorting this way. Just because the sum of scores in a group is greater than the sum of scores in another says _nothing_ about how relevant any of the docs in the group are relative to each other. I mean group 1 could have 10M documents all with a score of .01 and group 2 could have 1 document with a score of 1,000 and group 1 would sort first.

So unless you have some unusual use-case which you haven't yet articulated, this seems like a bad idea.

Best,
Erick

On Mon, May 5, 2014 at 7:20 PM, Frankcis finalxc...@gmail.com wrote: my scheme.xml: [...]
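Erick's counter-example can be checked with a few lines of arithmetic. The sketch below is in Python and uses only the figures from his reply (10M documents at .01 versus 1 document at 1,000). The dictionary layout and function names are illustrative, not anything Solr provides.

```python
# Figures from Erick's counter-example; the structure is illustrative only.
groups = {
    "group 1": {"num_docs": 10_000_000, "score_per_doc": 0.01},
    "group 2": {"num_docs": 1, "score_per_doc": 1000.0},
}


def sum_of_scores(group):
    """Total score when every document in the group has the same score."""
    return group["num_docs"] * group["score_per_doc"]


def max_score(group):
    """Best single-document score (all documents share one score here)."""
    return group["score_per_doc"]


# Order the group names by each criterion, highest first.
by_sum = sorted(groups, key=lambda name: sum_of_scores(groups[name]), reverse=True)
by_max = sorted(groups, key=lambda name: max_score(groups[name]), reverse=True)

print("ordered by sum of scores:", by_sum)
print("ordered by max score:   ", by_max)
```

Group 1 totals roughly 100,000 against 1,000 for group 2, so the sum puts the group of weak matches first, while the max-score ordering puts the single strong match first.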
Re: sort groups by the sum of the scores of the documents within each group
Thank you, Erick. You're right: sorting by the maxScore of the documents within each group is more meaningful than sorting by the sum of scores in a group, especially in a case like the one you describe (group 1 could have 10M documents all with a score of .01 and group 2 could have 1 document with a score of 1,000, and group 1 would sort first). But the client requires this feature. Can you tell me how to achieve it?

-- View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134856.html
Re: sort groups by the sum of the scores of the documents within each group
Thank you, Erick, you're a good man. This is the client requirement: in the forum, there are many discussions under different subjects. A keyword search returns the documents whose content or subject matches the query. These documents should be grouped by subject, and the groups sorted by the sum of the document scores within each subject. I would be glad to hear your suggestions.

-- View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134869.html