Re: sort groups by the sum of the scores of the documents within each group
Hi Erick,

Sorry to bother you again. I sent the client requirement to you on the Solr mailing list but have not received a reply, and I would like your advice.

2014-05-06 13:24 GMT+08:00 Frankcis [via Lucene] ml-node+s472066n413486...@n3.nabble.com:

Thank you, Erick, you're a good man. This is the client requirement:

The forum contains many discussions under different subjects. A keyword search returns the documents whose content or subject matches the query. Group these documents by subject, then sort the groups by the sum of the scores of the documents in each subject.

I would be pleased to hear your suggestions.

--
View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4135044.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr does not recognize language
My pleasure!

2014-05-06 16:43 GMT+08:00 Victor Pascual [via Lucene] ml-node+s472066n413488...@n3.nabble.com:

Thank you very much, Ahmet, for your help. It finally worked! For anyone interested, all your hints were more than useful. I basically had two problems:

- I didn't have my language detection chain in the update/json requestHandler
- I didn't create the field where the detected language should be stored

Again, thanks for your help!

On Mon, May 5, 2014 at 5:19 PM, Ahmet Arslan [hidden email] wrote:

Hi Victor,

I don't know mysolr. I assume you are using /update/json, so let's add your chain to its defaults section.

    <requestHandler name="/update/json" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="stream.contentType">application/json</str>
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

On Monday, May 5, 2014 4:06 PM, Victor Pascual [hidden email] wrote:

Hi there,

I'm indexing my documents using mysolr. I mainly generate a list of JSON objects and then run:

    solr.update(documents_array, 'json')

On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan [hidden email] wrote:

Hi Victor,

How do you index your documents? Your last config looks correct. However, if you use the data import handler, for example, you need to add update.chain there too. The same applies to the extraction request handler if you are using Solr Cell.

    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">/home/username/data-config.xml</str>
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

By the way, the URL http://localhost:8080/solr/update?commit=true&update.chain=langid was just an example, meant for feeding XML update messages by the POST method, not for use in a browser.

Ahmet

On Monday, May 5, 2014 11:04 AM, Victor Pascual [hidden email] wrote:

Thank you very much for your help, Ahmet. However, the language detection is still not working. :(

My solrconfig.xml didn't contain that lst section inside the update requestHandler. This is the content I added:

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

    <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="defaults">
          <str name="langid.fl">text</str>
          <str name="langid.langField">lang</str>
        </lst>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

Now, your suggested query http://localhost:8080/solr/update?commit=true&update.chain=langid returns

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">14</int>
      </lst>
    </response>

And there is still no lang field in my documents. Any idea what I am doing wrong?

On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan [hidden email] wrote:

Hi,

solr/update should be used, not /solr/select:

    curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'

By the way, don't you have the following definition in your solrconfig.xml?

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

On Tuesday, April 29, 2014 4:50 PM, Victor Pascual [hidden email] wrote:

Hi Ahmet,

Thanks for your reply. Adding update.chain=langid to my query doesn't work:

    IP:8080/solr/select/?q=*%3A*&update.chain=langid

Regarding defining the chain in an UpdateRequestHandler... sorry for the lame question, but shall I paste those three lines into solrconfig.xml, or shall I add them somewhere else? There is no UpdateRequestHandler in my solrconfig.

Thanks!

On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan [hidden email] wrote:

Hi,

Did you attach your chain to an UpdateRequestHandler? You can do it by adding update.chain=langid to the URL or by defining it in a defaults section as follows:

    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>

On Tuesday, April 29, 2014 3:18 PM, Victor Pascual
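The point that runs through this thread is that the langid chain has to reach the handler that actually receives the documents. A minimal sketch in Python of passing update.chain on the URL instead of relying on the handler defaults, assuming a Solr instance at http://localhost:8983/solr and the /update/json handler quoted above. The helper names build_update_url and build_update_request are hypothetical and this is not mysolr's API; the sample document only assumes the text field named in langid.fl.

```python
import json
import urllib.parse
import urllib.request


def build_update_url(base_url, chain="langid", commit=True):
    """Return the /update/json URL with commit and update.chain set."""
    params = {"commit": "true" if commit else "false", "update.chain": chain}
    return base_url.rstrip("/") + "/update/json?" + urllib.parse.urlencode(params)


def build_update_request(base_url, docs, chain="langid"):
    """Build (but do not send) a POST request carrying docs as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        build_update_url(base_url, chain),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed base URL and sample document; adjust for your installation.
request = build_update_request(
    "http://localhost:8983/solr", [{"id": "1", "text": "hola mundo"}]
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Whichever way the chain is attached, the field named in langid.langField (lang in the config above) also has to exist in the schema, which was the second problem reported in the thread.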
Re: Solr does not recognize language
I think you should check that your schema.xml and solrconfig.xml are both encoded as UTF-8.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134643.html
Re: Solr does not recognize language
Because if the two files are not both encoded as UTF-8, building the index will produce garbled text, and of course you will not get the expected result.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134647.html
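One way to run the encoding check suggested above is to try decoding each file as UTF-8. A minimal sketch, assuming the two files sit in the current directory under their usual names; the paths and the helper name is_utf8 are illustrative only.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Assumed file locations; point these at your core's conf directory.
for name in ("schema.xml", "solrconfig.xml"):
    path = Path(name)
    if path.exists():
        status = "UTF-8 OK" if is_utf8(path.read_bytes()) else "NOT valid UTF-8"
        print(name, status)
```

A failed decode is decisive, but a successful one is only a hint: a pure-ASCII file passes no matter what encoding the XML declaration claims.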
Re: sort groups by the sum of the scores of the documents within each group
My schema.xml:

    <schema name="example core one" version="1.1">
      <types>
        <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
        <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
        <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
        <fieldtype name="textComplex" class="solr.TextField" positionIncrementGap="100" omitNorms="false" autoGeneratePhraseQueries="false">
          <analyzer type="query">
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
          </analyzer>
          <analyzer type="index">
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
          </analyzer>
        </fieldtype>
      </types>
      <fields>
        <field name="id" type="uuid" indexed="true" stored="true" multiValued="false" required="true" />
        <field name="name" type="textComplex" indexed="true" stored="true" multiValued="false" />
        <field name="type" type="string" indexed="true" stored="true" multiValued="false" />
        <field name="price" type="long" indexed="true" stored="true" />
        <field name="_version_" type="long" indexed="true" stored="true"/>
      </fields>
      <uniqueKey>id</uniqueKey>
      <defaultSearchField>name</defaultSearchField>
      <solrQueryParser defaultOperator="OR"/>
    </schema>

The indexed docs:

    "docs": [
      { "name": "苹果4s", "type": "手机", "price": 2000,
        "id": "4017e35a-6b19-45b6-b945-382340ca1eec", "_version_": 1466799722505175000 },
      { "name": "苹果5", "type": "手机", "price": 5000,
        "id": "4052d9f3-f6d9-458f-8bb0-477b17852f37", "_version_": 1466799735745544200 },
      { "name": "三星", "type": "手机", "price": 3000,
        "id": "468abce8-8bb9-4f51-9900-8d4d6abc02ac", "_version_": 1466799747596550100 },
      { "name": "摩托罗拉i3", "type": "电脑", "price": 1000,
        "id": "db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd", "_version_": 1466799757491961900 },
      { "name": "摩托罗拉i5", "type": "电脑", "price": 1500,
        "id": "f211525f-bc3c-4ea7-aded-1c46a94ecd1c", "_version_": 1466799766311534600 }
    ]

Thank you, Erick.

I want to sort groups by the sum of the documents' scores within each group. As you said, Solr excels at scoring single documents. In Solr 4.6, the default ordering of groups depends on the maxScore of the documents within each group, not on the sum of their scores. I can compute the sum of the scores in the client program, but that is not a good idea.

I know that Solr's stats component can compute statistics over a long field, so I had the idea of using it on the score. But score is a pseudo-field, and stats.field doesn't support it.

In addition, as the schema.xml shows, I group on the values of a string field (type), which is not tokenized.

--
View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134830.html
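For reference, the client-side fallback mentioned in this message can be sketched as follows. It assumes the query was sent with group=true, group.field=type, fl=*,score and a group.limit large enough to return every document of each group (otherwise the sums only cover the documents returned). The function name and the sample scores are invented for illustration, and this is post-processing in the client, not a Solr feature.

```python
def sort_groups_by_score_sum(response, field):
    """Order the groups of a Solr grouped response by the sum of their document scores."""
    groups = response["grouped"][field]["groups"]
    totals = [
        (group["groupValue"], sum(doc["score"] for doc in group["doclist"]["docs"]))
        for group in groups
    ]
    # Highest total first.
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Hand-written sample in the shape of a grouped JSON response; scores are invented.
sample = {
    "grouped": {
        "type": {
            "matches": 5,
            "groups": [
                {"groupValue": "电脑", "doclist": {"numFound": 2, "start": 0, "docs": [
                    {"name": "摩托罗拉i3", "score": 0.75},
                    {"name": "摩托罗拉i5", "score": 0.25},
                ]}},
                {"groupValue": "手机", "doclist": {"numFound": 3, "start": 0, "docs": [
                    {"name": "苹果4s", "score": 0.5},
                    {"name": "苹果5", "score": 0.5},
                    {"name": "三星", "score": 0.25},
                ]}},
            ],
        }
    }
}

ranked = sort_groups_by_score_sum(sample, "type")
for value, total in ranked:
    print(value, total)
```

In this sample the 电脑 group holds the single best-scoring document and would come first under the default maxScore ordering, while the 手机 group comes first by sum. The drawback is the one already noted in the message: every matching document has to be shipped to the client, and any paging over groups has to happen there as well.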
Re: Solr does not recognize language
Hi iorixxx,

I'm Frankcis, not Victor. Did you send this to the wrong address?

2014-05-05 23:20 GMT+08:00 iorixxx [via Lucene] ml-node+s472066n4134713...@n3.nabble.com:

Hi Victor,

I don't know mysolr. I assume you are using /update/json, so let's add your chain to its defaults section.

    <requestHandler name="/update/json" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="stream.contentType">application/json</str>
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

On Monday, May 5, 2014 4:06 PM, Victor Pascual [hidden email] wrote:

Hi there,

I'm indexing my documents using mysolr. I mainly generate a list of JSON objects and then run:

    solr.update(documents_array, 'json')

On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan [hidden email] wrote:

Hi Victor,

How do you index your documents? Your last config looks correct. However, if you use the data import handler, for example, you need to add update.chain there too. The same applies to the extraction request handler if you are using Solr Cell.

    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">/home/username/data-config.xml</str>
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

By the way, the URL http://localhost:8080/solr/update?commit=true&update.chain=langid was just an example, meant for feeding XML update messages by the POST method, not for use in a browser.

Ahmet

On Monday, May 5, 2014 11:04 AM, Victor Pascual [hidden email] wrote:

Thank you very much for your help, Ahmet. However, the language detection is still not working. :(

My solrconfig.xml didn't contain that lst section inside the update requestHandler. This is the content I added:

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

    <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="defaults">
          <str name="langid.fl">text</str>
          <str name="langid.langField">lang</str>
        </lst>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

Now, your suggested query http://localhost:8080/solr/update?commit=true&update.chain=langid returns

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">14</int>
      </lst>
    </response>

And there is still no lang field in my documents. Any idea what I am doing wrong?

On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan [hidden email] wrote:

Hi,

solr/update should be used, not /solr/select:

    curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'

By the way, don't you have the following definition in your solrconfig.xml?

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

On Tuesday, April 29, 2014 4:50 PM, Victor Pascual [hidden email] wrote:

Hi Ahmet,

Thanks for your reply. Adding update.chain=langid to my query doesn't work:

    IP:8080/solr/select/?q=*%3A*&update.chain=langid

Regarding defining the chain in an UpdateRequestHandler... sorry for the lame question, but shall I paste those three lines into solrconfig.xml, or shall I add them somewhere else? There is no UpdateRequestHandler in my solrconfig.

Thanks!

On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan [hidden email] wrote:

Hi,

Did you attach your chain to an UpdateRequestHandler? You can do it by adding update.chain=langid to the URL or by defining it in a defaults section as follows:

    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>

On Tuesday, April 29, 2014 3:18 PM, Victor Pascual [hidden email] wrote:

Dear all,

I'm a new user of Solr. I've managed to index a bunch of documents (in fact, they are tweets) and everything works quite smoothly. Nevertheless, it looks like Solr neither detects the language of my documents nor removes stopwords accordingly so that I can extract the most frequent terms. I've added this piece of XML to my solrconfig.xml, as well as the Tika lib jars.

    <updateRequestProcessorChain name="langid">
      <processor class
Re: sort groups by the sum of the scores of the documents within each group
Thank you, Erick. You're right: sorting by the maxScore of the documents within each group is more effective than sorting by the sum of the scores in a group, especially in a case like the one you describe (group 1 could have 10M documents all with a score of .01, group 2 could have 1 document with a score of 1,000, and group 1 would sort first). But the client requires this function. Can you tell me how to achieve it?

--
View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134856.html
Re: sort groups by the sum of the scores of the documents within each group
Thank you, Erick, you're a good man. This is the client requirement:

The forum contains many discussions under different subjects. A keyword search returns the documents whose content or subject matches the query. Group these documents by subject, then sort the groups by the sum of the scores of the documents in each subject.

I would be pleased to hear your suggestions.

--
View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134869.html