Re: sort groups by the sum of the scores of the documents within each group

2014-05-11 Thread Frankcis
Hi Erick, sorry to bother you again. I sent the client requirement to you
on the Solr mailing list but have not seen a reply, and I would like your advice.


2014-05-06 13:24 GMT+08:00 Frankcis [via Lucene] 
ml-node+s472066n413486...@n3.nabble.com:

 Thank you, Erick, you're a good man.
 This is the client requirement:
 The forum holds many discussions under different subjects. A keyword search
 returns the documents whose content or subject matches the query. These
 documents should be grouped by subject, and the groups sorted by the sum of
 the scores of the documents within each subject.

 I would be glad to hear your suggestions.








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4135044.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr does not recognize language

2014-05-06 Thread Frankcis
my pleasure!


2014-05-06 16:43 GMT+08:00 Victor Pascual [via Lucene] 
ml-node+s472066n413488...@n3.nabble.com:

 Thank you very much Ahmet for your help.
 It finally worked!

 For anyone interested, all your hints were more than useful. I basically
 had two problems:
 - Didn't have my language detection chain in the update/json requestHandler
 - Didn't create the field where the detected language should be stored

 Again, thanks for your help!
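
A minimal sketch of the first fix from the client side, assuming Python's standard library rather than mysolr (whose API is not shown in this thread). It builds a POST to /update/json with update.chain=langid passed explicitly, so the chain runs even if the handler's defaults do not name it. The host, port, document id and text are placeholders, and the request is only constructed, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(docs,
                         base_url="http://localhost:8983/solr/update/json",
                         chain="langid"):
    """Build (but do not send) a JSON update request that names the chain."""
    # urlencode joins the parameters with '&', which is easy to lose
    # when a URL is typed or pasted by hand.
    query = urlencode({"commit": "true", "update.chain": chain})
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base_url}?{query}",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Placeholder document; "text" is the field named in langid.fl in the thread.
req = build_update_request([{"id": "1", "text": "Hola, ¿qué tal?"}])
print(req.full_url)
# urllib.request.urlopen(req) would send it to a running Solr instance.
```

The second fix is on the schema side: the field named in langid.langField (lang in this thread) has to exist in schema.xml, otherwise there is nowhere to store the detected language.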


 On Mon, May 5, 2014 at 5:19 PM, Ahmet Arslan wrote:

  Hi Victor,
 
  I don't know mysolr. I assume you are using /update/json, so let's add your
  chain to the defaults section.
 
  <requestHandler name="/update/json" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="stream.contentType">application/json</str>
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>
 
 
 
 
  On Monday, May 5, 2014 4:06 PM, Victor Pascual wrote:
  Hi there,
 
  I'm indexing my documents using mysolr. I mainly generate a list of JSON
  objects and then run: solr.update(documents_array,'json')
 
 
 
  On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan wrote:
 
   Hi Victor,
  
   How do you index your documents? Your last config looks correct. However,
   if you use the data import handler, for example, you need to add
   update.chain there too. The same goes for the extracting request handler
   if you are using solr-cell.
  
   <requestHandler name="/dataimport"
                   class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">/home/username/data-config.xml</str>
       <str name="update.chain">langid</str>
     </lst>
   </requestHandler>
  
   By the way, the URL
   http://localhost:8080/solr/update?commit=true&update.chain=langid was
   just an example, meant for feeding XML update messages by the POST method,
   not for use in a browser.
  
   Ahmet
  
   On Monday, May 5, 2014 11:04 AM, Victor Pascual wrote:
  
   Thank you very much for your help, Ahmet.
  
   However, the language detection is still not working. :(
   My solrconfig.xml didn't contain that lst section inside the update
   requestHandler.
   This is the content I added:
  
   <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
     <lst name="defaults">
       <str name="update.chain">langid</str>
     </lst>
   </requestHandler>
  
   <updateRequestProcessorChain name="langid">
     <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
       <lst name="defaults">
         <str name="langid.fl">text</str>
         <str name="langid.langField">lang</str>
       </lst>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>
  
   Now, your suggested query
   http://localhost:8080/solr/update?commit=true&update.chain=langid returns
  
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">14</int>
     </lst>
   </response>
  
   And there is still no lang field in my documents.
   Any idea what I am doing wrong?
  
  
  
  
   On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan wrote:
  
   Hi,
   
   /solr/update should be used, not /solr/select:
   
   curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'
   
   By the way, don't you have the following definition in your solrconfig.xml?
   
   <requestHandler name="/update" class="solr.UpdateRequestHandler">
     <lst name="defaults">
       <str name="update.chain">langid</str>
     </lst>
   </requestHandler>
   
   
   
   
   On Tuesday, April 29, 2014 4:50 PM, Victor Pascual wrote:
   Hi Ahmet,
   
   thanks for your reply. Adding update.chain=langid to my query doesn't
   work: IP:8080/solr/select/?q=*%3A*&update.chain=langid
   Regarding defining the chain in an UpdateRequestHandler... sorry for the
   lame question, but shall I paste those three lines into solrconfig.xml, or
   shall I add them somewhere else?
   
   There is no UpdateRequestHandler in my solrconfig.
   
   Thanks!
   
   
   
   On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan wrote:
   
    Hi,
   
    Did you attach your chain to an UpdateRequestHandler?
   
    You can do it by adding update.chain=langid to the URL or by defining it
    in a defaults section as follows:
   
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
   
   
   
On Tuesday, April 29, 2014 3:18 PM, Victor Pascual 
  

Re: Solr does not recognize language

2014-05-05 Thread Frankcis
I think you should check that your schema.xml and solrconfig.xml are both
encoded as UTF-8.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134643.html


Re: Solr does not recognize language

2014-05-05 Thread Frankcis
Because if the two files are not both encoded as UTF-8, the text in the index
will be garbled, and of course you will not get the expected result.
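
One way to run that check is sketched below in Python, as an illustration only. The helper name decodes_as_utf8 is made up for this example. It takes raw bytes and reports whether they decode cleanly as UTF-8. Note that a clean decode does not prove the file was written as UTF-8, but a failure does show that it was not.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the given bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same text encoded two ways: only the first is valid UTF-8.
utf8_bytes = "手机".encode("utf-8")
gbk_bytes = "手机".encode("gbk")
print(decodes_as_utf8(utf8_bytes), decodes_as_utf8(gbk_bytes))
```

To check a config file, pass its raw contents, for example the result of open(path, "rb").read() for schema.xml and for solrconfig.xml.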



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134647.html


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
My schema.xml:
<schema name="example core one" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
               omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
               positionIncrementGap="0"/>
    <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
    <fieldtype name="textComplex" class="solr.TextField"
               positionIncrementGap="100" omitNorms="false"
               autoGeneratePhraseQueries="false">
      <analyzer type="query">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
                   mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="false" expand="true"/>
      </analyzer>
      <analyzer type="index">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
                   mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="false" expand="true"/>
      </analyzer>
    </fieldtype>
  </types>

  <fields>
    <field name="id" type="uuid" indexed="true" stored="true"
           multiValued="false" required="true" />
    <field name="name" type="textComplex" indexed="true" stored="true"
           multiValued="false" />
    <field name="type" type="string" indexed="true" stored="true"
           multiValued="false" />
    <field name="price" type="long" indexed="true" stored="true" />

    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>

  <defaultSearchField>name</defaultSearchField>

  <solrQueryParser defaultOperator="OR"/>
</schema>

Indexed docs:
"docs": [
  {
    "name": "苹果4s",
    "type": "手机",
    "price": 2000,
    "id": "4017e35a-6b19-45b6-b945-382340ca1eec",
    "_version_": 1466799722505175000
  },
  {
    "name": "苹果5",
    "type": "手机",
    "price": 5000,
    "id": "4052d9f3-f6d9-458f-8bb0-477b17852f37",
    "_version_": 1466799735745544200
  },
  {
    "name": "三星",
    "type": "手机",
    "price": 3000,
    "id": "468abce8-8bb9-4f51-9900-8d4d6abc02ac",
    "_version_": 1466799747596550100
  },
  {
    "name": "摩托罗拉i3",
    "type": "电脑",
    "price": 1000,
    "id": "db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd",
    "_version_": 1466799757491961900
  },
  {
    "name": "摩托罗拉i5",
    "type": "电脑",
    "price": 1500,
    "id": "f211525f-bc3c-4ea7-aded-1c46a94ecd1c",
    "_version_": 1466799766311534600
  }
]
Thank you, Erick.
I want to sort groups by the sum of the documents' scores within each
group. As you said, Solr excels at scoring single documents. In Solr 4.6,
groups are sorted by default on the maxScore of the documents within each
group, not on the sum of their scores. I can compute the sum of the scores
in the client program, but that is not a good idea. I know that Solr's stats
component can compute statistics over a long field, so I had the idea of
running it on the score field, but score is a pseudo-field and stats.field
doesn't support it. In addition, as the schema.xml shows, I group on the
values of a string field (type) that is not tokenized.
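
For reference, the client-side fallback mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: it presumes the matching documents were already fetched with their score included (for example via fl=*,score) and that enough rows were requested to cover every match. The function name and the score values are invented for the example.

```python
from collections import defaultdict


def sort_groups_by_score_sum(docs, group_field="type"):
    """Return (group_value, total_score, docs) tuples, highest total first."""
    totals = defaultdict(float)
    members = defaultdict(list)
    for doc in docs:
        key = doc[group_field]
        totals[key] += doc["score"]
        members[key].append(doc)
    # Sort by summed score, descending; ties fall back to the group value
    # so the order is deterministic.
    return sorted(
        ((key, totals[key], members[key]) for key in totals),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical scores, for illustration only.
docs = [
    {"name": "苹果4s", "type": "手机", "score": 0.5},
    {"name": "苹果5", "type": "手机", "score": 0.25},
    {"name": "三星", "type": "手机", "score": 0.25},
    {"name": "摩托罗拉i3", "type": "电脑", "score": 0.75},
    {"name": "摩托罗拉i5", "type": "电脑", "score": 0.5},
]
ranked = sort_groups_by_score_sum(docs)
for group, total, group_docs in ranked:
    print(group, total, len(group_docs))
```

The drawback is the one already noted: every matching document has to travel to the client before the groups can be ranked, which gets expensive for large result sets.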



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134830.html


Re: Solr does not recognize language

2014-05-05 Thread Frankcis
Hi iorixxx, I'm Frankcis, not Victor. Did you send this to the wrong address?


2014-05-05 23:20 GMT+08:00 iorixxx [via Lucene] 
ml-node+s472066n4134713...@n3.nabble.com:

 Hi Victor,

 I don't know mysolr. I assume you are using /update/json, so let's add your
 chain to the defaults section.

 <requestHandler name="/update/json" class="solr.UpdateRequestHandler">
   <lst name="defaults">
     <str name="stream.contentType">application/json</str>
     <str name="update.chain">langid</str>
   </lst>
 </requestHandler>




 On Monday, May 5, 2014 4:06 PM, Victor Pascual wrote:
 Hi there,

 I'm indexing my documents using mysolr. I mainly generate a list of JSON
 objects and then run: solr.update(documents_array,'json')



 On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan wrote:

  Hi Victor,
 
  How do you index your documents? Your last config looks correct. However,
  if you use the data import handler, for example, you need to add
  update.chain there too. The same goes for the extracting request handler
  if you are using solr-cell.
 
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/home/username/data-config.xml</str>
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>
 
  By the way, the URL
  http://localhost:8080/solr/update?commit=true&update.chain=langid was
  just an example, meant for feeding XML update messages by the POST method,
  not for use in a browser.
 
  Ahmet
 
  On Monday, May 5, 2014 11:04 AM, Victor Pascual wrote:
 
  Thank you very much for your help, Ahmet.
 
  However, the language detection is still not working. :(
  My solrconfig.xml didn't contain that lst section inside the update
  requestHandler.
  This is the content I added:
 
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>
 
  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
      <lst name="defaults">
        <str name="langid.fl">text</str>
        <str name="langid.langField">lang</str>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
 
  Now, your suggested query
  http://localhost:8080/solr/update?commit=true&update.chain=langid returns
 
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">14</int>
    </lst>
  </response>
 
  And there is still no lang field in my documents.
  Any idea what I am doing wrong?
 
 
 
 
  On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan wrote:
 
  Hi,
  
  /solr/update should be used, not /solr/select:
  
  curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'
  
  By the way, don't you have the following definition in your solrconfig.xml?
  
  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>
  
  
  
  
  On Tuesday, April 29, 2014 4:50 PM, Victor Pascual wrote:
  Hi Ahmet,
  
  thanks for your reply. Adding update.chain=langid to my query doesn't
  work: IP:8080/solr/select/?q=*%3A*&update.chain=langid
  Regarding defining the chain in an UpdateRequestHandler... sorry for the
  lame question, but shall I paste those three lines into solrconfig.xml, or
  shall I add them somewhere else?
  
  There is no UpdateRequestHandler in my solrconfig.
  
  Thanks!
  
  
  
  On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan wrote:
  
   Hi,
  
   Did you attach your chain to an UpdateRequestHandler?
  
   You can do it by adding update.chain=langid to the URL or by defining it
   in a defaults section as follows:
  
   <lst name="defaults">
     <str name="update.chain">langid</str>
   </lst>
  
  
  
   On Tuesday, April 29, 2014 3:18 PM, Victor Pascual wrote:
   Dear all,
  
   I'm a new user of Solr. I've managed to index a bunch of documents (in
   fact, they are tweets) and everything works quite smoothly.
  
   Nevertheless, it looks like Solr doesn't detect the language of my
   documents nor remove stopwords accordingly so I can extract the most
   frequent terms.
  
   I've added this piece of XML to my solrconfig.xml as well as the Tika lib
   jars.
  
   updateRequestProcessorChain name=langid
  processor
  
  
 
 class

Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
Thank you, Erick, you're right. The maxScore of the documents within each
group is a more effective sort key than the sum of the scores in a group,
especially in a case like the one you describe (group 1 could have 10M
documents all with a score of .01 and group 2 could have 1 document with a
score of 1,000, and group 1 would sort first). But the client requires this
function. Can you tell me how to achieve it?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134856.html


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
Thank you, Erick, you're a good man.
This is the client requirement:
The forum holds many discussions under different subjects. A keyword search
returns the documents whose content or subject matches the query. These
documents should be grouped by subject, and the groups sorted by the sum of
the scores of the documents within each subject.

I would be glad to hear your suggestions.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134869.html