Re: just testing if my emails are reaching the mailing list
Hi, I got it from the solr user list. Roland uyilmaz wrote (on Wed, 14 Oct 2020, 9:39): > Hello all, > > I have never got an answer to my questions on this mailing list yet, and > my mail client shows INVALID next to my mail address, so I thought I should > check if my emails are reaching you. > > Can anyone reply? > > Regards > > -- > uyilmaz >
Adding several new fields to managed-schema by solrj
Hi folks, I am using solr 8.5.0 in standalone mode and use the CoreAdmin API and Schema API of solrj to create a new core and its fields in managed-schema. Is there any way to add several fields to managed-schema via solrj without processing them one by one? The following two rows get the job done at about 4 sec/field, which is extremely slow: SchemaRequest.AddField schemaRequest = new SchemaRequest.AddField(fieldAttributes); SchemaResponse.UpdateResponse response = schemaRequest.process(solrC); The core is empty, as the field creation is part of the core-creation process. The Schema API docs say: "It is possible to perform one or more add requests in a single command. The API is transactional and all commands in a single call either succeed or fail together." I am looking for the equivalent of this approach in solrj. Is there any? Cheers, Roland
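For reference, a sketch of the batched Schema API call the docs describe: a single HTTP POST to /solr/&lt;core&gt;/schema can carry several add-field commands at once. The field names and types below are invented for illustration.

```json
{
  "add-field": [
    {"name": "field_a", "type": "string",       "stored": true},
    {"name": "field_b", "type": "pint",         "stored": true},
    {"name": "field_c", "type": "text_general", "stored": false}
  ]
}
```

On the SolrJ side, if I remember correctly there is a SchemaRequest.MultiUpdate class that takes a List of SchemaRequest.Update objects and sends them as one transactional request, which should avoid the per-field round trip.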
allTermsRequired not working for me with suggester
Hi folks, I have allTermsRequired=true defined in the suggester component. Despite this, if I run the following query: http://localhost:8983/solr/pocwithedgengram/suggesthandler?allTermsRequired=true=*%3A*=Arany%20J%C3%A1nos I get back the following result (it is only a snippet from the result): { "term":" Tardy János", "weight":0, "payload":""}, { "term":"Arany János", "weight":0, "payload":""}, { "term":"Arany László", "weight":0, "payload":""}, I expected only the second one. How can I make suggestions for multi-term queries if I would like all query terms to be found in the matched document? The typical use case is that a user has already typed some terms fully and the last one partially. (I do not want to use the terms component here because it is difficult to deal with on the client side. The user can edit any part of his multi-term expression, and it is not trivial in javascript to find out which term should be queried.) Cheers, Roland
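One thing worth checking: allTermsRequired is a parameter of the infix lookup implementations and, as far as I can tell, is read from the suggester definition in solrconfig.xml rather than from the request URL, so passing it as a query parameter may simply be ignored. A sketch of where it belongs (component, suggester, and field names are invented):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">author</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <!-- assumption: must be set here, at config/build time -->
    <str name="allTermsRequired">true</str>
  </lst>
</searchComponent>
```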
highlight if the field and hl.fl have different analysis
Hi folks, I have an author field with a very simple definition: I have a suggester-friendly definition of this field: I do not use the suggester component as it gives back strings and I need specific documents, so I apply the following approach in solrconfig: all edismax * author_ngram^5 title_ngram^10* id,imageUrl,title,price,author 3<74% author_ngram^15 title_ngram^30 0.1 true * author title* original As you see, my query parser searches in the author_ngram field, which is a copyfield and of course not stored. On the other hand, I would like to show the customers meaningful fields like author. Despite this, the highlighter gives back partially good results: if the author field is Arany János and I search for Arany Já, I get back Arany János with only "Arany" highlighted. The second term is not highlighted. I need help on two issues: 1. Why did it work even partially if the analysis of the query field and the highlight fields are different? 2. If it is able to handle the different analysis, what can I do to support multi-field highlighting? Thanks, Roland
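One likely reason it works at all: hl.requireFieldMatch defaults to false, so terms extracted from the author_ngram query are allowed to highlight the author field wherever that field's own analysis happens to produce the same tokens. A hedged parameter sketch for highlighting differently-analyzed fields; hl.q re-specifies the raw user input just for highlighting, so the highlighter analyzes it per highlighted field instead of reusing the ngram-field query:

```text
hl=true
hl.fl=author,title
hl.requireFieldMatch=false
hl.q=Arany Já
```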
unified highlighter method works unexpectedly
Hi All, I use Solr 8.4.1 and am implementing suggester functionality. As part of the suggestions I would like to show product info, so I had to implement this functionality with normal query parsers instead of the suggester component. I applied an edge n-gram filter without stemming to speed up the analysis of the query, which is crucial for suggester functionality. I could use the highlight component with the edismax query parser without any problem. This is a typical output if hl.method=original (this is the default): { "responseHeader":{ "status":0, "QTime":4, "params":{ "mm":"3<74%", "q":"Arany Já", "tie":"0.1", "defType":"edismax", "hl":"true", "echoParams":"all", "qf":"author_ngram^5 title_ngram^10", "fl":"id,imageUrl,title,price", "pf":"author_ngram^15 title_ngram^30", "hl.fl":"title", "hl.method":"original", "_":"1585830768672"}}, "response":{"numFound":2,"start":0,"docs":[ { "id":"369", "title":"Arany János összes költeményei", "price":185.0, "imageUrl":"https://cdn.bknw.net/prd/covers_big/369.jpg"}, { "id":"26321", "title":"Arany János összes költeményei", "price":1400.0, "imageUrl":"https://cdn.bknw.net/prd/covers_big/26321.jpg"}] }, "highlighting":{ "369":{ "title":["\n \n Arany\n \n János összes költeményei"]}, "26321":{ "title":["\n \n Arany\n \n János összes költeményei"]}}} If I change the method to unified, I get an unexpected result: { "responseHeader":{ "status":0, "QTime":5, "params":{ "mm":"3<74%", "q":"Arany Já", "tie":"0.1", "defType":"edismax", "hl":"true", "echoParams":"all", "qf":"author_ngram^5 title_ngram^10", "fl":"id,imageUrl,title,price", "pf":"author_ngram^15 title_ngram^30", "hl.fl":"title", "hl.method":"unified", "_":"1585830768672"}}, "response":{"numFound":2,"start":0,"docs":[ { "id":"369", "title":"Arany János összes költeményei", "price":185.0, "imageUrl":"https://cdn.bknw.net/prd/covers_big/369.jpg"}, { "id":"26321", "title":"Arany János összes költeményei", "price":1400.0, "imageUrl":"https://cdn.bknw.net/prd/covers_big/26321.jpg"}]
}, "highlighting":{ "369":{ "title":[]}, "26321":{ "title":[]}}} Any idea why the newest method fails to deliver the same results? Thanks, Roland
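For what it's worth, one hedged thing to try with the unified method: give the highlighter its own query over the stored field via hl.q, since the edismax query only references the *_ngram fields and the unified highlighter may therefore find no query terms that apply to title at all.

```text
hl.method=unified
hl.fl=title
hl.q=Arany Já
hl.qparser=lucene
```

This is a sketch, not a confirmed fix; hl.q and hl.qparser exist as highlighting parameters, but whether they resolve this particular cross-field case would need testing.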
Re: expand=true throws error
Hi Munendra, Yes, it was indeed the problem. Thank you very much for your help. Expand is just a separate parameter. Now it is working. Thanks, Roland Munendra S N wrote (on Tue, 31 Mar 2020, 5:22): > > Case 3, let's extend it with expand=true: > > { "responseHeader":{ "status":0, "QTime":1, "params":{ > > "q":"author:\"William > > Shakespeare\"", "fq":"{!collapse field=title}&expand=true", "_": > > "1585603593269"}}, > > > I think it is because the expand=true parameter is not passed properly. As you > can see from the params in the responseHeader section, q and fq are separate > keys but expand=true is appended to the fq value. > > If passed correctly, it should look something like this > > > { "responseHeader":{ "status":0, "QTime":1, "params":{ > > "q":"author:\"William > > Shakespeare\"", "fq":"{!collapse field=title}", "expand": "true", "_": > > "1585603593269"}}, > > > > Regards, > Munendra S N > > > > On Tue, Mar 31, 2020 at 3:07 AM Szűcs Roland > wrote: > > > Hi Munendra, > > Let's see the 3 scenarios: > > 1. Query without collapse > > 2. Query with collapse > > 3. Query with collapse and expand > > I made a mini book database for this: > > Case 1: > > { "responseHeader":{ "status":0, "QTime":0, "params":{ > > "q":"author:\"William > > Shakespeare\"", "_":"1585603593269"}}, > "response":{"numFound":4,"start":0, > > "docs":[ { "id":"1", "author":"William Shakespeare", "title":"The Taming > of > > the Shrew", "format":"ebook", "_version_":1662625767773700096}, { > "id":"2", > > "author":"William Shakespeare", "title":"The Taming of the Shrew", > > "format":"paper", "_version_":1662625790857052160}, { "id":"3", "author":"William > > Shakespeare", "title":"The Taming of the Shrew", "format":"audiobook", > > "_version_":1662625809553162240}, { "id":"4", "author":"William > > Shakespeare", > > "title":"Much Ado about Nothing", "format":"paper", "_version_": > > 1662625868323749888}] }} > > As you can see there are 3 different formats of the same book.
> > > > Case 2: > > { "responseHeader":{ "status":0, "QTime":2, "params":{ > > "q":"author:\"William > > Shakespeare\"", "fq":"{!collapse field=title}", "_":"1585603593269"}}, " > > response":{"numFound":2,"start":0,"docs":[ { "id":"1", "author":"William > > Shakespeare", "title":"The Taming of the Shrew", "format":"ebook", " > > _version_":1662625767773700096}, { "id":"4", "author":"William > > Shakespeare", > > "title":"Much Ado about Nothing", "format":"paper", "_version_": > > 1662625868323749888}] }} > > Collapse post filter worked as I expected. > > Case 3 let;s extend it with expand=true: > > { "responseHeader":{ "status":0, "QTime":1, "params":{ > > "q":"author:\"William > > Shakespeare\"", "fq":"{!collapse field=title}=true", "_": > > "1585603593269"}}, "response":{"numFound":2,"start":0,"docs":[ { > "id":"1", > > " > > author":"William Shakespeare", "title":"The Taming of the Shrew", > "format": > > "ebook", "_version_":1662625767773700096}, { "id":"4", "author":"William > > Shakespeare", "title":"Much Ado about Nothing", "format":"paper", > > "_version_ > > ":1662625868323749888}] }} > > > > As you can see nothing as changed. There is no additional section of the > > response. > > > > Cheers, > > Roland > > > > Munendra S N ezt írta (időpont: 2020. márc. > 30., > > H, 17:46): > > > > > Please share the complete request. Also, does number of results change > > with > > > & without collapse. Usually title would be unique every document. If > that > > > is the case then, there won't be anything to expand right? > > > > > > On Mon, Mar 30, 2020, 8:22 PM Szűcs Roland < > szucs.rol...@bookandwalk.hu> > > > wrote: > > > > > > > Hi Munendra, > > > > I do not get error . The strange thing is that I get exactly the same > > > > response with fq={!collapse field=title} versus fq={!collapse > > > > field=title}=true. > > > > Collapse works properly as a standalone fq but expand has no impact. > > How > > > > can I have access to the "hidden" documents then? 
> > > > > > > > Roland > > > > > > > > Munendra S N ezt írta (időpont: 2020. > márc. > > > 30., > > > > H, 16:47): > > > > > > > > > Hey, > > > > > Could you please share the stacktrace or error message you > received? > > > > > > > > > > On Mon, Mar 30, 2020, 7:58 PM Szűcs Roland < > > > szucs.rol...@bookandwalk.hu> > > > > > wrote: > > > > > > > > > > > Hi All, > > > > > > > > > > > > I manage to use edismax queryparser in solr 8.4.1 with collapse > > > without > > > > > any > > > > > > problem. I tested it with the SOLR admin GUI. So fq={!collapse > > > > > field=title} > > > > > > worked fine. > > > > > > > > > > > > As soon as I use the example from the documentation and use: > > > > > fq={!collapse > > > > > > field=title}=true, I did not get back any additional > output > > > with > > > > > > section expanded. > > > > > > > > > > > > Any idea? > > > > > > > > > > > > Thanks in advance, > > > > > > Roland > > > > > > > > > > > > > > > > > > > > >
Re: expand=true throws error
Hi Munendra, Let's see the 3 scenarios: 1. Query without collapse 2. Query with collapse 3. Query with collapse and expand I made a mini book database for this: Case 1: { "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"author:\"William Shakespeare\"", "_":"1585603593269"}}, "response":{"numFound":4,"start":0,"docs":[ { "id":"1", "author":"William Shakespeare", "title":"The Taming of the Shrew", "format":"ebook", "_version_":1662625767773700096}, { "id":"2", "author":"William Shakespeare", "title":"The Taming of the Shrew", "format":"paper", "_version_":1662625790857052160}, { "id":"3", "author":"William Shakespeare", "title":"The Taming of the Shrew", "format":"audiobook", "_version_":1662625809553162240}, { "id":"4", "author":"William Shakespeare", "title":"Much Ado about Nothing", "format":"paper", "_version_":1662625868323749888}] }} As you can see there are 3 different formats of the same book. Case 2: { "responseHeader":{ "status":0, "QTime":2, "params":{ "q":"author:\"William Shakespeare\"", "fq":"{!collapse field=title}", "_":"1585603593269"}}, "response":{"numFound":2,"start":0,"docs":[ { "id":"1", "author":"William Shakespeare", "title":"The Taming of the Shrew", "format":"ebook", "_version_":1662625767773700096}, { "id":"4", "author":"William Shakespeare", "title":"Much Ado about Nothing", "format":"paper", "_version_":1662625868323749888}] }} The collapse post filter worked as I expected. Case 3, let's extend it with expand=true: { "responseHeader":{ "status":0, "QTime":1, "params":{ "q":"author:\"William Shakespeare\"", "fq":"{!collapse field=title}&expand=true", "_":"1585603593269"}}, "response":{"numFound":2,"start":0,"docs":[ { "id":"1", "author":"William Shakespeare", "title":"The Taming of the Shrew", "format":"ebook", "_version_":1662625767773700096}, { "id":"4", "author":"William Shakespeare", "title":"Much Ado about Nothing", "format":"paper", "_version_":1662625868323749888}] }} As you can see nothing has changed.
There is no additional section in the response. Cheers, Roland Munendra S N wrote (on Mon, 30 Mar 2020, 17:46): > Please share the complete request. Also, does the number of results change with > and without collapse? Usually title would be unique for every document. If that > is the case then there won't be anything to expand, right? > > On Mon, Mar 30, 2020, 8:22 PM Szűcs Roland > wrote: > > > Hi Munendra, > > I do not get an error. The strange thing is that I get exactly the same > > response with fq={!collapse field=title} versus fq={!collapse > > field=title}&expand=true. > > Collapse works properly as a standalone fq but expand has no impact. How > > can I get access to the "hidden" documents then? > > > > Roland > > > > Munendra S N wrote (on Mon, 30 Mar 2020, > > 16:47): > > > Hey, > > > Could you please share the stacktrace or error message you received? > > > > > > On Mon, Mar 30, 2020, 7:58 PM Szűcs Roland < > szucs.rol...@bookandwalk.hu> > > > wrote: > > > > > > > Hi All, > > > > > > > > I managed to use the edismax query parser in solr 8.4.1 with collapse > without > > > any > > > > problem. I tested it with the SOLR admin GUI. So fq={!collapse > > > field=title} > > > > worked fine. > > > > > > > > As soon as I use the example from the documentation and use > > > fq={!collapse > > > > field=title}&expand=true, I did not get back any additional output > with > > > > the expanded section. > > > > > > > > Any idea? > > > > > > > > Thanks in advance, > > > > Roland > > > > > > >
Re: expand=true throws error
Hi Munendra, I do not get an error. The strange thing is that I get exactly the same response with fq={!collapse field=title} versus fq={!collapse field=title}&expand=true. Collapse works properly as a standalone fq but expand has no impact. How can I get access to the "hidden" documents then? Roland Munendra S N wrote (on Mon, 30 Mar 2020, 16:47): > Hey, > Could you please share the stacktrace or error message you received? > > On Mon, Mar 30, 2020, 7:58 PM Szűcs Roland > wrote: > > > Hi All, > > > > I managed to use the edismax query parser in solr 8.4.1 with collapse without > any > > problem. I tested it with the SOLR admin GUI. So fq={!collapse > field=title} > > worked fine. > > > > As soon as I use the example from the documentation and use > fq={!collapse > > field=title}&expand=true, I did not get back any additional output with > > the expanded section. > > > > Any idea? > > > > Thanks in advance, > > Roland > > >
expand=true throws error
Hi All, I managed to use the edismax query parser in solr 8.4.1 with collapse without any problem. I tested it with the SOLR admin GUI. So fq={!collapse field=title} worked fine. As soon as I use the example from the documentation and use fq={!collapse field=title}&expand=true, I did not get back any additional output with the expanded section. Any idea? Thanks in advance, Roland
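For the record, the working form of this request keeps expand as its own top-level parameter rather than letting it fuse into the fq value:

```text
q=author:"William Shakespeare"
fq={!collapse field=title}
expand=true
```

With that, the response gains a separate expanded section holding the collapsed-away documents per group.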
spellchecker offers fewer alternatives
Hi All, My question is whether it is a feature or a bug in the solr spellchecker with the default distance measure and maxEdits 2: A multiValued field includes "József" and its ASCII-folded version "Jozsef", to support mobile search where users usually do not take the time to type József. When I make a query with spellcheck.q=Józzef, interestingly I get back only Jozsef as an alternative. Is it normal that in the case of multiValued fields only one term is returned? Secondly, I tried collations with spellcheck.q="Józzef Atila", where the real author field includes either József Attila or Jozsef Attila. I got a suggestion for Józzef like before, and for Atila I correctly got Attila, but I always get null collations in solrj with Solr 8.4.1. Here is my relevant solrconfig: default textSpell shortTextSpell solr.DirectSolrSpellChecker internal 0.5 2 2 5 4 0.01 schema: Thanks in advance, Roland
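The solrconfig excerpt above lost its XML markup in the archive; judging by the surviving values (default, textSpell, shortTextSpell, internal, 0.5, 2, 2, 5, 4, 0.01) it resembles the stock DirectSolrSpellChecker example. A reconstructed sketch — the parameter-to-value mapping is my assumption:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">shortTextSpell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">2</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>
```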
deduplication of suggester results is not enough
Hi All, I have been following the suggester-related discussions for quite a while. Everybody agrees that it is not the expected behaviour for a Suggester, where the terms are the entities and not the documents, to return the same string representation several times. One suggestion was to do deduplication on the client side of Solr. That is very easy in most client solutions, as any set-based data structure solves it. *But deduplication leaves one important problem unsolved: suggest.count.* If I have 15 matches from the suggester and suggest.count=10, and the first 9 matches are the same, I will get back only 2 after deduplication and the remaining 5 unique terms will never be shown. What is the solution for this? Cheers, Roland
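A hedged sketch of the usual client-side workaround: over-fetch from the suggester (request, say, suggest.count=50 instead of 10), then keep the first N distinct terms in ranking order, so duplicates no longer crowd out unique suggestions. The class and method names here are mine:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class SuggestDedupe {
    // Keep the first `count` distinct terms, preserving the suggester's
    // ranking order. Call this with more raw suggestions than `count`
    // (over-fetch) so that duplicates do not starve the result list.
    public static List<String> topDistinct(List<String> terms, int count) {
        LinkedHashSet<String> distinct = new LinkedHashSet<>(terms);
        List<String> out = new ArrayList<>();
        for (String t : distinct) {
            if (out.size() == count) break;
            out.add(t);
        }
        return out;
    }
}
```

The trade-off is extra work on the Solr side, and there is still no guarantee that 50 raw suggestions contain 10 distinct ones, so heavily skewed data can come up short.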
suggestion with multiple context field
Hi All, Is there any way to define multiple context fields with the suggester? It is a typical use case in an ecommerce environment that the facets are listed in the sidebar and act as filter queries when the user selects them. I am looking for similar functionality for the suggester. Do you know how to solve this? A potential workaround could be using normal queries with the fq parameter and an N-gram based index analysis chain. Can that be fast enough to keep up with the speed of typing? Thanks, Roland
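One hedged workaround, since contextField accepts only a single field: copy the values of several facet fields into one combined contexts field at index time, prefixing each value with its facet name so they stay distinguishable, then filter with a boolean suggest.cfq (context filter query) over those combined values. Field and value names below are invented:

```text
# index time: suggest_contexts holds values like "category_books", "format_ebook"
# request time: boolean filtering over the combined context values
suggest.q=arany&suggest.cfq=category_books AND format_ebook
```

Whether suggest.cfq applies depends on the lookup implementation; it is supported by the infix lookups as far as I know.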
Re: how to add multiple values for a filter query in Solrj
Thanks Avi, it worked. Raboah, Avi wrote (on Tue, 24 Mar 2020, 11:08): > You can do something like that if we are talking about the same filter query > name. > > addFilterQuery(String.format("%s:(%s %s)", filterName, value1, value2)); > > > -Original Message- > From: Szűcs Roland > Sent: Tuesday, March 24, 2020 11:35 AM > To: solr-user@lucene.apache.org > Subject: how to add multiple value for a filter query in Solrj > > Hi All, > > I use Solr 8.4.1 and the latest solrj client. > There is a field which can have 3 different values. If I use the > admin UI, I write the following to the fq: filterName:"value1" > filterName:"value2" and it works as expected. > If I use the solrJ SolrQuery.addFilterQuery method and call it twice, like: > addFilterQuery(filterName+":\""+value1+"\""); > addFilterQuery(filterName+":\""+value2+"\""); > I get no documents back. > > Can somebody help me with what syntax is appropriate in solrj to add filter > queries one by one when there is one filter field but multiple values? > > Thanks, > > Roland > > > This electronic message may contain proprietary and confidential > information of Verint Systems Inc., its affiliates and/or subsidiaries. The > information is intended to be for the use of the individual(s) or > entity(ies) named above. If you are not the intended recipient (or > authorized to receive this e-mail for the intended recipient), you may not > use, copy, disclose or distribute to anyone this message or any information > contained in this message. If you have received this electronic message in > error, please notify us by replying to this e-mail. >
how to add multiple values for a filter query in Solrj
Hi All, I use Solr 8.4.1 and the latest solrj client. There is a field which can have 3 different values. If I use the admin UI, I write the following to the fq: filterName:"value1" filterName:"value2" and it works as expected. If I use the solrJ SolrQuery.addFilterQuery method and call it twice, like: addFilterQuery(filterName+":\""+value1+"\""); addFilterQuery(filterName+":\""+value2+"\""); I get no documents back. Can somebody help me with what syntax is appropriate in solrj to add filter queries one by one when there is one filter field but multiple values? Thanks, Roland
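The underlying behavior, as I understand it: each fq added with addFilterQuery is a separate filter and the filters intersect (AND), so two fqs on the same single-valued field can never both match. The fix is one fq that ORs the alternatives. A small helper sketch (the class and method names are mine):

```java
public class FilterQueryUtil {
    // Build a single filter query accepting any of the given values,
    // e.g. anyOf("format", "ebook", "paper") -> format:("ebook" OR "paper").
    // Multiple fq parameters are intersected by Solr, so alternatives for
    // one field must live inside a single fq.
    public static String anyOf(String field, String... values) {
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(" OR ");
            sb.append('"').append(values[i]).append('"');
        }
        return sb.append(')').toString();
    }
}
```

Used with SolrJ it would look like query.addFilterQuery(FilterQueryUtil.anyOf("format", "ebook", "paper")); — one call per field, not per value.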
phrase boosting by edismax
Hi all, Context: I use solr 8.4.1. I have a small database with around 3500 book documents. I realized that I cannot search on the copyfield of all fields (author, title, publisher, description) because description has a different analysis workflow than the others (it has stemming and stop word removal, the others have not). That's why I query the same search expression against multiple fields. If I run a query for "József Attila", I get the following response (I echoed all the parameters for clarity): {status=0,QTime=52,params={facet.field=[category, format],qt=/query,debug=true,spellcheck.dictionary=[default, wordbreak],echoParams=all,indent=true,fl=title,author,publisher,description,category,price,stock,created,format,imageUrl,rows=40,version=2,q=title:"{"q":"József Attila"}" category:"{"q":"József Attila"}" publisher:"{"q":"József Attila"}" description:"{"q":"József Attila"}" author:"{"q":"József Attila"}",defType=edismax,qf=title^10 author^7 category^5 publisher^3 description,spellcheck=on,pf=title^30 author^14 category^10 publisher^6 description^2,facet.mincount=1,facet=true,wt=javabin}} When I checked in solrj debug mode what the query was translated to by solr, it was not what I expected based on the documentation ( https://lucene.apache.org/solr/guide/8_4/the-extended-dismax-query-parser.html#using-slop ): rawquerystring=title:"{"q":"József Attila"}" category:"{"q":"József Attila"}" publisher:"{"q":"József Attila"}" description:"{"q":"József Attila"}" author:"{"q":"József Attila"}", querystring=title:"{"q":"József Attila"}" category:"{"q":"József Attila"}" publisher:"{"q":"József Attila"}" description:"{"q":"József Attila"}" author:"{"q":"József Attila"}", parsedquery=+(DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0)) DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila | (title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0)) DisjunctionMaxQuery(((category:})^5.0)) category:{ DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0)) DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0)) DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila | (title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0)) DisjunctionMaxQuery(((category:})^5.0)) DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0)) DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0)) DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila | (title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0)) DisjunctionMaxQuery(((category:})^5.0)) DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0)) DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0)) DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila | (title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0)) DisjunctionMaxQuery(((category:})^5.0)) DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0)) DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0)) DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila | (title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0)) 
DisjunctionMaxQuery(((category:})^5.0))) DisjunctionMaxQuery(((title:"q józsef attila q józsef attila q józsef attila q józsef attila q józsef attila")^30.0 | (author:"q józsef attila q józsef attila q józsef attila q józsef attila q józsef attila")^14.0 | (publisher:"q józsef attila q józsef attila q józsef attila q józsef attila q józsef attila")^6.0 | (description:"q józsef attila q józsef attila q józsef attila q józsef attila q józsef attila")^2.0)),parsedquery_toString=+(((publisher:q)^3.0 | description:q | (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0) ((category::)^5.0) ((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0) ((publisher:attila)^3.0 | description:attila | (title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0) ((category:})^5.0) category:{ ((publisher:q)^3.0 | description:q | (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0) ((category::)^5.0) ((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 |
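Reading the parsed query above, the root cause looks like the client sent serialized JSON objects as the q value — hence the stray category:{, category:} and q terms in every DisjunctionMaxQuery. With edismax, qf and pf already spread the search across fields, so the request should carry only the raw phrase. A hedged sketch built from the parameters echoed above:

```text
q=József Attila
defType=edismax
qf=title^10 author^7 category^5 publisher^3 description
pf=title^30 author^14 category^10 publisher^6 description^2
```

With a clean q, the pf fields should produce the phrase-boost clauses the Using Slop documentation describes.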
Re: more like this query parser with faceting
Thanks David, this is the page I was looking for. Roland David Hastings wrote (on Mon, 12 Aug 2019, 20:52): > should be fine, > https://cwiki.apache.org/confluence/display/solr/MoreLikeThisHandler > > for more info > > On Mon, Aug 12, 2019 at 2:49 PM Szűcs Roland > wrote: > > > Hi David, > > Thanks for the fast reply. Am I right that I can combine fq with mlt only if > I > > use more like this as a query parser? > > > > Is there a way to achieve the same with mlt as a request handler? > > Roland > > > > David Hastings wrote (on Mon, 12 > > Aug 2019, 20:44): > > > The easiest way will be to pass in a filter query (fq) > > > > > > On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland < > > szucs.rol...@bookandwalk.hu> > > > wrote: > > > > > > > Hi All, > > > > > > > > Is there any tutorial or example of how to use the more-like-this > > functionality > > > > when we have some other constraints set by the user through faceting > > > > parameters, like a price range or product category for example? > > > > > > > > Cheers, > > > > Roland > > > > > > >
Re: more like this query parser with faceting
Hi David, Thanks for the fast reply. Am I right that I can combine fq with mlt only if I use more like this as a query parser? Is there a way to achieve the same with mlt as a request handler? Roland David Hastings wrote (on Mon, 12 Aug 2019, 20:44): > The easiest way will be to pass in a filter query (fq) > > On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland > wrote: > > > Hi All, > > > > Is there any tutorial or example of how to use the more-like-this functionality > > when we have some other constraints set by the user through faceting > > parameters, like a price range or product category for example? > > > > Cheers, > > Roland > > >
more like this query parser with faceting
Hi All, Is there any tutorial or example of how to use the more-like-this functionality when we have some other constraints set by the user through faceting parameters, like a price range or product category for example? Cheers, Roland
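A hedged sketch of the combination being asked about, using the MoreLikeThis query parser so that ordinary fq and facet parameters apply on top. The document id and field names are invented:

```text
q={!mlt qf=description,title}12345
fq=category:books
fq=price:[0 TO 3000]
facet=true
facet.field=category
```

The {!mlt qf=...}id form finds documents similar to the one with the given uniqueKey; because it is a normal query, collapse, facets, and filter queries compose with it, which the MLT request handler does not support in the same way.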
Re: Problem with solr suggester in case of non-ASCII characters
Hi Erick, Thanks for your advice. I already removed it from the field definition used by the suggester and it works great. I will consider removing it from the processing of the other fields as well. I have only 7000 docs with an index size of 18MB so far, so the memory footprint is not a key issue for me. Best, Roland Erick Erickson wrote (on Wed, 31 Jul 2019, 14:24): > Roland: > > Have you considered just not using stopwords anywhere? Largely they’re a > holdover > from a long time ago when every byte counted. Plus using stopwords has > “interesting” > issues with things like highlighting and phrase queries and the like. > > Sure, not using stopwords will make your index larger, but so will a > copyfield… > > Your call of course, but stopwords are over-used IMO. > > I’m stealing Walter Underwood’s thunder here ;) > > Best, > Erick > > > On Jul 30, 2019, at 2:11 PM, Szűcs Roland > wrote: > > > > Hi Furkan, > > > > Thanks for the suggestion, I always forget the most effective debugging tool, > > the Analysis page. > > > > It turned out that "Jó" was a stop word and was eliminated during > > text analysis. What I will do is create a new field type without > > stop word removal and use it like this: > > > name="suggestAnalyzerFieldType">short_text_hu_without_stop_removal > > > > Thanks again > > > > Roland > > > > Furkan KAMACI wrote (on Tue, 30 Jul > 2019, > > 16:17): > > > >> Hi Roland, > >> > >> Could you check the Analysis tab ( > >> https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell > >> how > >> the term is analyzed for both query and index?
> >> > >> Kind Regards, > >> Furkan KAMACI > >> > >> On Tue, Jul 30, 2019 at 4:50 PM Szűcs Roland < > szucs.rol...@bookandwalk.hu> > >> wrote: > >> > >>> Hi All, > >>> > >>> I have an author suggester (searchcomponent and the related request > >>> handler) defined in solrconfig: > >>> > >>>> > >>> > >>> author > >>> AnalyzingInfixLookupFactory > >>> DocumentDictionaryFactory > >>> BOOK_productAuthor > >>> short_text_hu > >>> suggester_infix_author > >>> false > >>> false > >>> 2 > >>> > >>> > >>> > >>> >>> startup="lazy" > > >>> > >>> true > >>> 10 > >>> author > >>> > >>> > >>> suggest > >>> > >>> > >>> > >>> Author field has just a minimal text processing in query and index time > >>> based on the following definition: > >>> >>> positionIncrementGap="100" multiValued="true"> > >>> > >>> > >>> > >>> >>> ignoreCase="true"/> > >>> > >>> > >>> > >>> > >>> >>> ignoreCase="true"/> > >>> > >>> > >>> > >>> >>> docValues="true"/> > >>> >>> docValues="true" multiValued="true"/> > >>> >>> positionIncrementGap="100"> > >>> > >>> > >>> > >>> >> words="lang/stopwords_ar.txt" > >>> ignoreCase="true"/> > >>> > >>> > >>> > >>> > >>> > >>> When I use qeries with only ASCII characters, the results are correct: > >>> "Al":{ > >>> "term":"Alexandre Dumas", "weight":0, "payload":""} > >>> > >>> When I try it with Hungarian authorname with special character: > >>> "Jó":"author":{ > >>> "Jó":{ "numFound":0, "suggestions":[]}} > >>> > >>> When I try it with three letters, it works again: > >>> "Józ":"author":{ > >>> "Józ":{ "numFound":10, "suggestions":[{ "term":"Bajza József", " > >>> weight":0, "payload":""}, { "term":"Eötvös József", "weight":0, > " > >>> payload":""}, { "term":"Eötvös József", "weight":0, > >> "payload":""}, { > >>> "term":"Eötvös József", "weight":0, "payload":""}, { > >>> "term":"József > >>> Attila", "weight":0, "payload":""}.. > >>> > >>> Any idea how can it happen that a longer string has more matches than a > >>> shorter one. It is inconsistent. 
What can I do to fix it as it would > >>> results poor customer experience. > >>> They would feel that sometimes they need 2 sometimes 3 characters to > get > >>> suggestions. > >>> > >>> Thanks in advance, > >>> Roland > >>> > >> > >
Re: Problem with solr suggester in case of non-ASCII characters
Hi Furkan,

Thanks for the suggestion; I always forget the most effective debugging tool, the Analysis page. It turned out that "Jó" was a stop word, so it was eliminated during text analysis. What I will do is create a new field type without stop-word removal and use it like this: short_text_hu_without_stop_removal

Thanks again,
Roland

Furkan KAMACI wrote (on 30 Jul 2019, 16:17):
> Hi Roland,
>
> Could you check the Analysis tab (
> https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell
> how the term is analyzed for both query and index?
>
> Kind Regards,
> Furkan KAMACI
>
> On Tue, Jul 30, 2019 at 4:50 PM Szűcs Roland wrote:
> [...]
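The diagnosis above can be illustrated outside Solr. The sketch below is plain Python, not Solr code; the stop-word set and the prefix matching are simplified stand-ins for the real analysis chain and AnalyzingInfixLookupFactory:

```python
# Toy model of the reported behaviour: a stop filter in the suggest
# analyzer silently removes the Hungarian word "jó", so the whole
# two-letter query disappears and nothing can match.
HUNGARIAN_STOPWORDS = {"a", "az", "és", "jó"}  # tiny illustrative subset

def analyze(text, remove_stopwords=True):
    tokens = [t.lower() for t in text.split()]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in HUNGARIAN_STOPWORDS]
    return tokens

def suggest(prefix, names, remove_stopwords=True):
    terms = analyze(prefix, remove_stopwords)
    if not terms:  # the query was eaten entirely by the stop filter
        return []
    return [n for n in names
            if all(any(w.lower().startswith(t) for w in n.split())
                   for t in terms)]

names = ["Bajza József", "Eötvös József", "Arany János"]
print(suggest("Jó", names))   # [] - "jó" is a stop word
print(suggest("Józ", names))  # both Józsefs - "józ" is not a stop word
print(suggest("Jó", names, remove_stopwords=False))  # both Józsefs again
```

This mirrors the fix above: the new field type without stop-word removal corresponds to `remove_stopwords=False`.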
Problem with solr suggester in case of non-ASCII characters
Hi All,

I have an author suggester (search component and the related request handler) defined in solrconfig:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">author</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">BOOK_productAuthor</str>
    <str name="suggestAnalyzerFieldType">short_text_hu</str>
    <str name="indexPath">suggester_infix_author</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <int name="minPrefixChars">2</int>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">author</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

The author field (short_text_hu) has only minimal text processing at query and index time.

When I use queries with only ASCII characters, the results are correct:

"Al": { "term":"Alexandre Dumas", "weight":0, "payload":"" }

When I try it with a Hungarian author name containing a special character, I get nothing:

"Jó": { "numFound":0, "suggestions":[] }

When I try it with three letters, it works again:

"Józ": { "numFound":10, "suggestions":[
  { "term":"Bajza József", "weight":0, "payload":"" },
  { "term":"Eötvös József", "weight":0, "payload":"" },
  { "term":"Eötvös József", "weight":0, "payload":"" },
  { "term":"Eötvös József", "weight":0, "payload":"" },
  { "term":"József Attila", "weight":0, "payload":"" }, ... ] }

Any idea how it can happen that a longer string has more matches than a shorter one? It is inconsistent. What can I do to fix it, as it would result in a poor customer experience: users would feel that they sometimes need 2 and sometimes 3 characters to get suggestions.

Thanks in advance,
Roland
Re: very slow frequent updates
Thanks again, Jeff. I will check the documentation of join queries, because I have never used them before.

Regards,
Roland

2016-02-24 19:07 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:
>
> I suspect your problem is the intersection of "very large document" and "high rate of change". Either of those alone would be fine.
>
> You're correct: if the thing you need to search or sort by is the thing with a high change rate, you probably aren't going to be able to peel those things out of your index.
>
> Perhaps you could work something out with join queries? So you have two kinds of documents - book content and book price - and your high-frequency change is limited to documents with very little data.
>
> On 2/24/16, 4:01 AM, "roland.sz...@booknwalk.com on behalf of Szűcs Roland" <szucs.rol...@bookandwalk.hu> wrote:
> [...]
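Jeff's two-document-type idea above can be sketched as follows. This is a toy in-memory Python model, not Solr code (only the {!join ...} syntax quoted in the comment is real Solr query syntax); the ids and field names are made up for illustration:

```python
# Two document types: big static "book" docs and tiny volatile "price" docs.
# A price change re-indexes only the small price doc, never the 1 MB book doc.
books = [
    {"id": "b1", "title": "Great Expectations", "content": "..."},
    {"id": "b2", "title": "Oliver Twist", "content": "..."},
]
prices = [
    {"book_id": "b1", "price": 9.99},
    {"book_id": "b2", "price": 4.49},
]

def update_price(book_id, new_price):
    # touches only a ~1 KB price doc
    for p in prices:
        if p["book_id"] == book_id:
            p["price"] = new_price

def books_cheaper_than(limit):
    # same semantics as q={!join from=book_id to=id}price:[0 TO limit]
    ids = {p["book_id"] for p in prices if p["price"] <= limit}
    return [b for b in books if b["id"] in ids]

update_price("b1", 3.00)
print([b["title"] for b in books_cheaper_than(5.0)])
# ['Great Expectations', 'Oliver Twist']
```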
Re: very slow frequent updates
I have checked it already in the ref. guide. It is stated that you cannot search in external fields:
https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes

Really, I am very curious whether my problem is just unusual, or whether SOLR mainly focuses on search rather than this kind of end-to-end support. How does this approach work with 1 million documents with frequently changing prices?

Thanks for your time,

Roland

2016-02-24 12:39 GMT+01:00 Stefan Matheis <matheis.ste...@gmail.com>:
> Depending on what features you actually need, it might be worth a look at "External File Fields", Roland?
>
> -Stefan
>
> On Wed, Feb 24, 2016 at 12:24 PM, Szűcs Roland <szucs.rol...@bookandwalk.hu> wrote:
> [...]

--
Szűcs Roland <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
<https://bookandwalk.hu/>
Managing Director, Phone: +36 1 210 81 13
Bookandwalk.hu <https://bokandwalk.hu/>
Re: very slow frequent updates
Thanks, Jeff, for your help.

Can this work in a production environment? Imagine my customer initiates a query with 1,000 docs in the result set. I cannot use SOLR's pagination, as the field that is the basis of the sort - for example the price - is not included in the schema. The customer wants the list in descending order of price.

So I have to get all 1,000 doc ids from Solr and find their metadata in an SQL database, or in a cache in the best case. Is this the way you suggested? Is it not too slow?

Regards,
Roland

2016-02-23 19:29 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:
>
> My suggestion would be to split your problem domain. Use Solr exclusively for search - index the id and only those fields you need to search on. Then use some other data store for retrieval. Get the id's from the solr results, and look them up in the data store to get the rest of your fields. This allows you to keep your solr docs as small as possible, and you only need to update them when a *searchable* field changes.
>
> Every "update" in solr is a delete/insert. Even the "atomic update" feature is just a shortcut for that. It requires stored fields because the data from the stored fields gets copied into the new insert.
>
> On 2/22/16, 12:21 PM, "Roland Szűcs" <roland.sz...@booknwalk.com> wrote:
>
> >Hi folks,
> >
> >We use SOLR 5.2.1. We have ebooks stored in SOLR. The majority of the fields do not change at all, like content, author, publisher. Only the price field changes frequently.
> >
> >We let customers do full-text search, so we indexed the content field. Due to the frequency of the price updates we use the atomic update feature. As a requirement of atomic updates we have to store all the fields, even the content field, which is 1 MB/document and which we did not want to store, just index.
> >
> >When we updated 100 documents with atomic update, it took about 3 minutes. Taking into account that our metadata per document is 1 KB and our content field per document is 1 MB, we use 1000x more memory to accelerate the update process.
> >
> >I am almost 100% sure that we are doing something wrong.
> >
> >What is the best practice for frequent updates when 99% of a given document is constant forever?
> >
> >Thanks in advance
> >
> >--
> >Roland Szűcs <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> >Connect with me on Linkedin <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> ><https://bookandwalk.hu/>
> >CEO Phone: +36 1 210 81 13
> >Bookandwalk.hu <https://bokandwalk.hu/>

--
Szűcs Roland <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
<https://bookandwalk.hu/>
Managing Director, Phone: +36 1 210 81 13
Bookandwalk.hu <https://bokandwalk.hu/>
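Jeff's split (search in Solr, retrieve volatile fields elsewhere) together with the price-sort worry raised above can be sketched like this; everything here is a made-up in-memory stand-in for Solr and the SQL/cache store:

```python
# 1) "Solr" returns only the matching ids; 2) the app sorts them by the
# price held in an external store; 3) pagination happens app-side on the
# sorted id list.
search_index = {"dickens": ["b1", "b2", "b3", "b4"]}  # stand-in for Solr
price_store = {"b1": 9.99, "b2": 4.49, "b3": 12.00, "b4": 7.25}  # e.g. SQL

def search_sorted_by_price(q, page=0, page_size=2):
    ids = search_index.get(q, [])                                  # search
    ids = sorted(ids, key=lambda i: price_store[i], reverse=True)  # external sort
    start = page * page_size
    return ids[start:start + page_size]                            # paginate

print(search_sorted_by_price("dickens", page=0))  # ['b3', 'b1']
print(search_sorted_by_price("dickens", page=1))  # ['b4', 'b2']
```

For a 1,000-id result set, the in-app sort is cheap; the expensive part is the per-id metadata lookup, which the external store should serve in one batched query rather than 1,000 individual ones.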
AnalyzingInfixLookupFactory, EdgeNGram with multiple terms
Hi all,

I have a working suggester component and request handler in my Solr 5.2.1 instance. It works as I expected, but I need a solution which handles multiple query terms "correctly". I have a string field, title. Let's see the following case:

title 1: Green Apple Color
title 2: Apple the master of innovation
title 3: Apple the master of presentation

Using EdgeNGram (minGramSize=3) on the copy of the string title field, I get the following:

suggest.q="Appl": all documents are matched - fine.
suggest.q="Apple inno": all documents are matched - wrong, as the user expectation is to have only title 2 matched.

Is there any way to make the suggester component smarter, so that it handles multi-term queries as users expect? AnalyzingInfixLookupFactory was a great improvement for matching terms not only at the beginning of an expression but also in the middle or at the end. I think that if we could apply an "AND" relationship among the terms of a multi-term query, as in normal queries, it would help.

Any idea is appreciated.

--
Szűcs Roland <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
<https://bookandwalk.hu/>
Managing Director, Phone: +36 1 210 81 13
Bookandwalk.hu <https://bokandwalk.hu/>
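The "AND" semantics asked for above can be sketched in plain Python (this is the desired behaviour, not an existing Solr feature): every fully typed term must match some word of the title, and the last, possibly half-typed term is treated as a prefix:

```python
titles = [
    "Green Apple Color",
    "Apple the master of innovation",
    "Apple the master of presentation",
]

def suggest(query, candidates):
    terms = query.lower().split()
    *whole, last = terms  # the last term may still be half-typed
    out = []
    for title in candidates:
        words = title.lower().split()
        # AND over the complete terms, prefix match for the last one
        if all(t in words for t in whole) and any(w.startswith(last) for w in words):
            out.append(title)
    return out

print(suggest("Appl", titles))        # all three titles
print(suggest("Apple inno", titles))  # only 'Apple the master of innovation'
```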
Re: MoreLikeThisHandler with multiple input documents
Hi Alessandro,

Exactly. The response time varies, but let's look at another concrete example. This is my call:

http://localhost:8983/solr/bandwpl/mlt?q=id:10812&fl=id

This is my result:

{
  "responseHeader":{ "status":0, "QTime":6232 },
  "response":{ "numFound":4564, "start":0, "docs":[
    { "id":"11335" }, { "id":"14984" }, { "id":"13948" }, { "id":"11105" },
    { "id":"12122" }, { "id":"12315" }, { "id":"19145" }, { "id":"11843" },
    { "id":"11640" }, { "id":"19053" } ] },
  "interestingTerms":[
    "content:hinduski",1.0,
    "content:hindus",1.0174515,
    "content:głowa",1.0453196,
    "content:życie",1.0666888,
    "content:czas",1.0824177,
    "content:kobieta",1.0927386,
    "content:indie",1.119314,
    "content:quentin",1.1349105,
    "content:madras",1.239089,
    "content:musieć",1.2626213,
    "content:matka",1.2966589,
    "content:chcieć",1.299024,
    "content:domu",1.3370595,
    "content:stać",1.4053295,
    "content:sari",1.4284334,
    "content:ojciec",1.4596463,
    "content:lindsay",1.5857035,
    "content:wiedzieć",1.6952671,
    "content:powiedzieć",1.8430523,
    "content:baba",1.8915937,
    "content:mieć",2.1113522,
    "content:Nata",2.4373012,
    "content:Gopal",2.518996,
    "content:david",3.0211911,
    "content:Trixie",7.082156 ] }

Cheers,
Roland

2015-09-30 10:16 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
> I am still missing why you quote the number of the documents...
> If you have 5600 polish books, but you use the MLT only when you land in the page of a specific book...
> I think I still miss the point!
> MLT on 1 polish book takes 7 secs?
>
> 2015-09-30 9:10 GMT+01:00 Szűcs Roland <szucs.rol...@bookandwalk.hu>:
> [...]
Re: MoreLikeThisHandler with multiple input documents
Hi Alessandro,

You are right. I forgot to mention one important factor. For 3,000 Hungarian e-books the approach you mentioned is absolutely fine, as the response time is some 0.7 sec. But when I use the same mlt for 5,600 Polish e-books, the response time is 7 sec, which is definitely not acceptable for the users.

Regards,
Roland

2015-09-29 17:19 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
> Hi Roland,
> you said "The main goal is that when a customer is on the product page".
> But if you are in a product page, I guess you have the product Id.
> If you have the product id, you can simply execute the MLT request with the single Doc Id in input.
>
> Why do you need to calculate beforehand?
>
> Cheers
>
> 2015-09-29 15:44 GMT+01:00 Szűcs Roland <szucs.rol...@bookandwalk.hu>:
>
> > Hello Upayavira,
> >
> > The main goal is that when a customer is on the product page of an e-book and he does not like it somehow, I want to immediately offer him/her alternative e-books on the same topic. If I expect the customer to click on a button like "similar e-books", I lose half of them, as they are lazy to click anywhere. So I would like to present the alternative e-books on the product pages without clicking.
> >
> > I assumed the best idea was to calculate the similar e-books for all the others (n*(n-1) similarity calculations) and present only the top 5. I planned to do it when our server is not busy. At this point I found the description of mlt as a search component, which seemed to be a good candidate, as it calculates the similar documents for the whole result set of the query. So if I say q=*:* and the mlt component is enabled, I get similar documents for my entire document set. The only problem with this approach was that the mlt search component does not give back the interesting terms for my tag cloud calculation.
> > That's why I tried to mix the flexibility of the mlt component (multiple docs accepted as input) with the robustness of the MoreLikeThisHandler (having interesting terms).
> >
> > If there is no solution, I will use the mlt component and solve the tag cloud calculation another way. By the way, if I am not mistaken, the 5.3.1 version takes the union of the feature sets of the mlt component and the handler.
> >
> > Best Regards,
> > Roland
> >
> > 2015-09-29 14:38 GMT+02:00 Upayavira <u...@odoko.co.uk>:
> >
> > > Let's take a step back. So, you have 3000 or so docs, and you want to know which documents are similar to these.
> > >
> > > Why do you want to know this? What feature do you need to build that will use that information? Knowing this may help us to arrive at the right technology for you.
> > >
> > > For example, you might want to investigate offline clustering algorithms (e.g. [1], which might be a bit dense to follow). A good book on machine learning, if you are okay with Python, is "Programming Collective Intelligence", as it explains the usual algorithms with simple for loops, making them very clear.
> > >
> > > Or, you could do searches and then cluster the results at search time (so if you search for 100 docs, it will identify clusters within those 100 matching documents). That might get you there. See [2].
> > >
> > > So, if you let us know what the end goal is, perhaps we can suggest an alternative approach, rather than burying ourselves neck-deep in MLT problems.
> > >
> > > Upayavira
> > >
> > > [1] http://mylazycoding.blogspot.co.uk/2012/03/cluster-apache-solr-data-using-apache_13.html
> > > [2] https://cwiki.apache.org/confluence/display/solr/Result+Clustering
> > >
> > > On Tue, Sep 29, 2015, at 12:42 PM, Szűcs Roland wrote:
> > > [...]
Re: MoreLikeThisHandler with multiple input documents
Hello Upayavira,

We use an AJAX call, and it can work when it takes only a few seconds (even the 7 sec can be acceptable in this case), as the customers first focus on the product page, and only if they are not satisfied with the e-book will they need the offer. I am just starting to worry about what will happen if we move to the market of English e-books with 1 million titles.

I will try the clustering as well; or, using the termvector component, we can implement our own more-like-this calculation, as we realized that sometimes fewer than 25 interesting terms are enough to make a good recommendation, and that can make the calculation faster. If you see my previous email with the interesting terms, it shows clearly that half of the terms would be enough, or even less. What a pity that there is no such parameter as mlt.interestingtermcount for the more like this handler, which would default to 25 but which we could lower in solrconfig to make the calculation less resource-intensive.

Thank you, Upayavira and Alessandro, for all the help and effort. I see the options much more clearly now.

Cheers,
Roland

2015-09-30 10:23 GMT+02:00 Upayavira <u...@odoko.co.uk>:
> Could you do the MLT as a separate (AJAX) request? They appear a little afterwards, whilst the user is already reading the page?
>
> Or, you could do offline clustering, in which case, overnight, you compare every document with every other, using a (likely non-solr) clustering algorithm, and store those in a separate core. Then you can request those immediately after your search query. Or reindex your content with that data stored alongside.
>
> Upayavira
>
> On Wed, Sep 30, 2015, at 09:16 AM, Alessandro Benedetti wrote:
> [...]
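The wished-for mlt.interestingtermcount parameter does not exist, but the same effect can be approximated client-side: parse the interestingTerms list the handler already returns, keep only the top-k scored terms, and issue a smaller OR query. A sketch, with a few (term, score) pairs copied from the Polish example earlier in the thread:

```python
# (term, score) pairs as returned in "interestingTerms" (higher score =
# more interesting); keep only the k best and build a reduced query string.
interesting = [
    ("content:trixie", 7.08), ("content:david", 3.02), ("content:gopal", 2.52),
    ("content:nata", 2.44), ("content:miec", 2.11), ("content:baba", 1.89),
]

def reduced_mlt_query(terms, k):
    top = sorted(terms, key=lambda t: t[1], reverse=True)[:k]
    parts = []
    for field_term, _score in top:
        field, term = field_term.split(":", 1)
        parts.append(f'{field}:"{term}"')
    return " OR ".join(parts)

print(reduced_mlt_query(interesting, 3))
# content:"trixie" OR content:"david" OR content:"gopal"
```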
Re: MoreLikeThisHandler with multiple input documents
Hi Alessandro,

My original goal was to get offline suggestions based on content similarity for every e-book we have. We wanted to run a bulk more-like-this calculation in the evening, when the usage of our site is low and we submit new e-books. Real-time more like this can take a while, as we typically have long documents (2-5 MB of text) with all the content indexed. When we upload a new document, we wanted to recalculate the more-like-this suggestions and the tf-idf based tag clouds. Both of them are delivered by the MoreLikeThisHandler, but only for one document, as you wrote.

The text input is not good for us because we need the similar-doc list for each of the matched documents. If I put together the text of 10 documents, I cannot separate which suggestion relates to which matched document, and the tag cloud will also belong to the mixed text.

Most likely we will use the MoreLikeThisHandler for each of the documents, parse the JSON response, and store the result in an SQL database.

Thanks for your help.

2015-09-29 11:18 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
> Hi Roland,
> what is your exact requirement?
> Do you want to basically build a "description" for a set of documents and then find documents in the index similar to this description?
>
> By default, based on my experience (and on the code), this is the entry point for the Lucene More Like This:
>
> org.apache.lucene.queries.mlt.MoreLikeThis
>
> /**
>  * Return a query that will return docs like the passed lucene document ID.
>  * @param docNum the documentID of the lucene doc to generate the 'More Like This" query for.
>  * @return a query that will return docs like the passed lucene document ID.
>  */
> public Query like(int docNum) throws IOException {
>   if (fieldNames == null) {
>     // gather list of valid fields from lucene
>     Collection<String> fields = MultiFields.getIndexedFields(ir);
>     fieldNames = fields.toArray(new String[fields.size()]);
>   }
>   return createQuery(retrieveTerms(docNum));
> }
>
> It means that talking about "documents" you can feed only one Solr doc.
> But you can also feed the MLT with simple text.
> So you should study your use case better and understand which option fits better:
> 1) customising the MLT component starting from Lucene
> 2) doing some processing client side and using the "text" similarity feature.
>
> Cheers
>
> 2015-09-29 10:05 GMT+01:00 Roland Szűcs <roland.sz...@booknwalk.com>:
>
> > Hi all,
> >
> > Is it possible to feed multiple solr ids to a MoreLikeThisHandler?
> >
> > false details title,content 4 title^12 content^1 2 10 true json true
> >
> > When I call this: http://localhost:8983/solr/bandwhu/mlt?q=id:8&fl=id it works fine. Is there any way to have a kind of "bulk" call of the more like this handler? I need the interesting terms as well, and as far as I know, if I use more like this as a search component it does not return them, so it is not an alternative.
> >
> > Thanks in advance,
> >
> > --
> > Roland Szűcs <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> > Connect with me on Linkedin <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> > <https://bookandwalk.hu/>
> > CEO Phone: +36 1 210 81 13
> > Bookandwalk.hu <https://bokandwalk.hu/>
>
> --
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England

--
Szűcs Roland <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
Connect with me on LinkedIn <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
<https://bookandwalk.hu/>
Managing Director, Phone: +36 1 210 81 13
Bookandwalk.hu <https://bokandwalk.hu/>
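The nightly batch described above can be sketched end-to-end. This is an illustrative Python stand-in with a hand-rolled tf-idf and cosine similarity; a real pipeline would pull term vectors from Solr instead of raw strings, and the toy corpus is made up:

```python
import math
from collections import Counter

docs = {  # id -> analyzed text (toy corpus)
    "b1": "apple fruit green apple orchard",
    "b2": "apple company phone innovation",
    "b3": "orchard fruit harvest green",
}

def tfidf_vectors(corpus):
    n = len(corpus)
    # document frequency of each term
    df = Counter(t for text in corpus.values() for t in set(text.split()))
    return {doc_id: {t: c * math.log(1 + n / df[t])
                     for t, c in Counter(text.split()).items()}
            for doc_id, text in corpus.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb)

vecs = tfidf_vectors(docs)
# for every book, precompute its top-5 most similar books (to be stored
# in the SQL database, as the email describes)
similar = {d: sorted((o for o in vecs if o != d),
                     key=lambda o: cosine(vecs[d], vecs[o]), reverse=True)[:5]
           for d in vecs}
print(similar["b1"])  # ['b3', 'b2'] - b3 shares fruit/green/orchard
```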
Re: MoreLikeThisHandler with multiple input documents
Hello Upayavira, Thanks dealing with my issue. I have applied already the termVectors=true to all fileds involved in the more like this calculation. I have just 3 000 documents each of them is represented by a relativly big term vector with more than 20 000 unique terms. If I run the more like this handler for a solr doc it takes close to 1 sec to get back the first 10 similar documents. Aftwr this I have to pass the docid-s to my other application which find the cover of the e-book and other metadata and put it on the web. The end-to-end process takes too much time from customer perspective that is why I tried to find solution for offline more like this calculation. But if my app has to call the morelikethishandler for each doc it puts overhead for the offline calculation. Best Regards, Roland 2015-09-29 13:01 GMT+02:00 Upayavira <u...@odoko.co.uk>: > If MoreLikeThis is slow for large documents that are indexed, have you > enabled term vectors on the similarity fields? > > Basically, what more like this does is this: > > * decide on what terms in the source doc are "interesting", and pick the > 25 most interesting ones > * build and execute a boolean query using these interesting terms. > > Looking at the first phase of this in more detail: > > If you pass in a document using stream.body, it will analyse this > document into terms, and then calculate the most interesting terms from > that. > > If you reference document in your index with a field that is stored, it > will take the stored version, and analyse it and identify the > interesting terms from there. > > If, however, you have stored term vectors against that field, this work > is not needed. You have already done much of the work, and the > identification of your "interesting terms" will be much faster. > > Thus, on the content field of your documents, add termVectors="true" in > your schema, and re-index. Then you could well find MLT becoming a lot > more efficient. 
> > Upayavira
>
> On Tue, Sep 29, 2015, at 10:39 AM, Szűcs Roland wrote:
> > Hi Alessandro,
> >
> > My original goal was to get offline suggestions on content-based similarity for every e-book we have. We wanted to run a bulk more-like-this calculation in the evening, when the usage of our site is low and we submit a new e-book. Real-time more like this can take a while, as we typically have long documents (2-5MB of text) with all the content indexed.
> >
> > When we upload a new document, we wanted to recalculate the more-like-this suggestions and the tf-idf based tag clouds. Both of them are delivered by the MoreLikeThisHandler, but only for one document, as you wrote.
> >
> > The text input is not good for us because we need the similar-document list for each of the matched documents. If I put together the text of 10 documents, I cannot separate which suggestion relates to which matched document, and the tag cloud will also belong to the mixed text.
> >
> > Most likely we will use the MoreLikeThisHandler for each of the documents, parse the JSON response, and store the result in a SQL database.
> >
> > Thanks for your help.
> >
> > 2015-09-29 11:18 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
> >
> > > Hi Roland,
> > > what is your exact requirement? Do you want to basically build a "description" for a set of documents and then find documents in the index, similar to this description?
> > > By default, based on my experience (and on the code), this is the entry point for the Lucene More Like This:
> > >
> > > org.apache.lucene.queries.mlt.MoreLikeThis
> > >
> > > /**
> > >  * Return a query that will return docs like the passed lucene document ID.
> > >  *
> > >  * @param docNum the documentID of the lucene doc to generate the 'More Like This' query for.
> > >  * @return a query that will return docs like the passed lucene document ID.
> > >  */
> > > public Query like(int docNum) throws IOException {
> > >   if (fieldNames == null) {
> > >     // gather list of valid fields from lucene
> > >     Collection<String> fields = MultiFields.getIndexedFields(ir);
> > >     fieldNames = fields.toArray(new String[fields.size()]);
> > >   }
> > >   return createQuery(retrieveTerms(docNum));
> > > }
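Upayavira's two-phase description of MLT quoted above (pick the most "interesting" terms from the source document, then build a boolean query from them) can be sketched in a few lines. This is an illustration of the idea only, not Lucene's actual implementation; the tf*idf scoring and all names here are my own:

```javascript
// Sketch of the two MLT phases: score each term of the source doc by a
// simple tf*idf, keep the top K "interesting" terms, then build an OR query.
// tf: term frequencies in the source doc, df: document frequencies,
// N: number of docs in the corpus.
function interestingTerms(tf, df, N, topK) {
  return Object.keys(tf)
    .map(t => ({ term: t, score: tf[t] * Math.log(N / (1 + (df[t] || 0))) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(x => x.term);
}

// Phase two: turn the interesting terms into a boolean (OR) query string.
function mltQuery(field, terms) {
  return terms.map(t => field + ":" + t).join(" OR ");
}
```

With a toy corpus of 3000 docs, a very common term like "the" scores low and drops out, which is exactly why rare-but-frequent-in-doc terms dominate the generated query.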
Re: MoreLikeThisHandler with multiple input documents
Hello Upayavira,

The main goal is that when a customer is on the product page of an e-book and somehow does not like it, I want to immediately offer her/him alternative e-books on the same topic. If I expect the customer to click on a button like "similar e-books", I lose half of them, as they are too lazy to click anywhere. So I would like to present the alternatives on the product pages without any clicking. I assumed the best idea was to calculate the similar e-books for every e-book against all the others (n*(n-1) similarity calculations) and present only the top 5. I planned to do it when our server is not busy.

At this point I found the description of mlt as a search component, which seemed to be a good candidate, as it calculates the similar documents for the entire result set of the query. So if I say q=*:* and the mlt component is enabled, I get similar documents for my entire document set. The only problem with this approach is that the mlt search component does not give back the interesting terms for my tag cloud calculation. That's why I tried to mix the flexibility of the mlt component (multiple docs accepted as input) with the robustness of the MoreLikeThisHandler (having interesting terms). If there is no solution, I will use the mlt component and solve the tag cloud calculation another way. By the way, if I am not mistaken, the 5.3.1 version takes the union of the feature sets of the mlt component and handler.

Best Regards, Roland

2015-09-29 14:38 GMT+02:00 Upayavira <u...@odoko.co.uk>:

> Let's take a step back. So, you have 3000 or so docs, and you want to know which documents are similar to these.
>
> Why do you want to know this? What feature do you need to build that will use that information? Knowing this may help us to arrive at the right technology for you.
>
> For example, you might want to investigate offline clustering algorithms (e.g. [1], which might be a bit dense to follow).
> A good book on machine learning, if you are okay with Python, is "Programming Collective Intelligence", as it explains the usual algorithms with simple for loops, making it very clear.
>
> Or, you could do searches, and then cluster the results at search time (so if you search for 100 docs, it will identify clusters within those 100 matching documents). That might get you there. See [2]
>
> So, if you let us know what the end-goal is, perhaps we can suggest an alternative approach, rather than burying ourselves neck-deep in MLT problems.
>
> Upayavira
>
> [1] http://mylazycoding.blogspot.co.uk/2012/03/cluster-apache-solr-data-using-apache_13.html
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Clustering
>
> On Tue, Sep 29, 2015, at 12:42 PM, Szűcs Roland wrote:
> > Hello Upayavira,
> >
> > Thanks for dealing with my issue. I have already applied termVectors=true to all fields involved in the more-like-this calculation. I have just 3,000 documents, each represented by a relatively big term vector with more than 20,000 unique terms. If I run the more-like-this handler for a Solr doc, it takes close to 1 sec to get back the first 10 similar documents. After this I have to pass the docids to my other application, which finds the cover of the e-book and other metadata and puts it on the web. The end-to-end process takes too much time from a customer perspective; that is why I tried to find a solution for offline more-like-this calculation. But if my app has to call the MoreLikeThisHandler for each doc, it adds overhead to the offline calculation.
> >
> > Best Regards,
> > Roland
> >
> > 2015-09-29 13:01 GMT+02:00 Upayavira <u...@odoko.co.uk>:
> >
> > > If MoreLikeThis is slow for large documents that are indexed, have you enabled term vectors on the similarity fields?
> > > > > > Basically, what more like this does is this: > > > > > > * decide on what terms in the source doc are "interesting", and pick > the > > > 25 most interesting ones > > > * build and execute a boolean query using these interesting terms. > > > > > > Looking at the first phase of this in more detail: > > > > > > If you pass in a document using stream.body, it will analyse this > > > document into terms, and then calculate the most interesting terms from > > > that. > > > > > > If you reference document in your index with a field that is stored, it > > > will take the stored version, and analyse it and identify the > > > interesting terms from there. > > > > > > If, however, you have stored term vectors against that field, this
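The offline batch described in this thread (calling the MoreLikeThisHandler once per document and also collecting the interesting terms for the tag cloud) could be driven by a small request builder like the one below. The /mlt handler path, the field names (id, content), and the parameter values are assumptions for illustration; mlt.fl and mlt.interestingTerms=details are standard MoreLikeThisHandler parameters:

```javascript
// Build the query string for one MoreLikeThisHandler call. The handler
// finds the source doc via q=id:<docId>, compares on the "content" field,
// and returns the interesting terms alongside the similar docs.
function mltRequestParams(docId) {
  return new URLSearchParams({
    q: "id:" + docId,
    "mlt.fl": "content",
    "mlt.interestingTerms": "details",
    rows: "5",
    wt: "json"
  }).toString();
}

// In the nightly batch, one might loop over all doc ids, e.g.:
// fetch("http://localhost:8983/solr/bandw/mlt?" + mltRequestParams("3222"))
//   .then(r => r.json())
//   .then(json => { /* store similar docs + interestingTerms */ });
```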
start solr 5.3.1 under windows and admin GUI shows 5.2.1 is running
Hi guys, I downloaded the latest version of Solr to my computer. When I started Solr as a standalone process on the default port, two strange things happened:

1. I got an error message: Failed to parse command line arguments due to: Unrecognized option: -maxWaitSecs. I did not use any argument when I started Solr, just the start command.

2. When I go to localhost in the web browser I see the attached picture. It shows that I am running version 5.2.1, although all the environment variables refer to a subdirectory of the 5.3.1 installation.

Any idea?

Best Regards
--
Szűcs Roland <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu> | Let's connect on LinkedIn | Ügyvezető (Managing Director) | Telefon: +36 1 210 81 13 | Bookandwalk.hu <https://bokandwalk.hu/>
Re: commit of xml update by AJAX
Thanks Erick, Your blog post made it clear. It was looong, but not too long. Roland

2015-08-29 19:00 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:

1. My first guess is that your autocommit section in solrconfig.xml has <openSearcher>false</openSearcher>. So the commitWithin happened, but a new searcher was not opened, thus the document is invisible. Try issuing a separate commit or change that value in solrconfig.xml and try again. Here's a long post on all this: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

2. No clue, since I'm pretty ajax-ignorant.

3. Because curl is easily downloadable at worst, most often already on someone's machine, and lets people at least get started. Pretty soon, though, for production situations people will use SolrJ or the like, or use one of the off-the-shelf tools packaged around Solr.

Best, Erick

On Sat, Aug 29, 2015 at 9:30 AM, Szűcs Roland <szucs.rol...@bookandwalk.hu> wrote:

Hello SOLR experts, I am new to Solr, as you will see from my problem. I am just trying to understand how Solr works. I use one core (BandW) on my local machine and I use JavaScript for learning purposes. I have a test schema.xml with two fields: id, title. I managed to run queries with faceting, autocomplete, etc. In all cases I used the Ajax POST method. For example my search was (searchWithSuggest.searchAjaxRequest is an XMLHttpRequest object):

var s = document.getElementById(searchWithSuggest.inputBoxId).value;
var params = 'q=' + s + '&start=0&rows=10';
a = searchWithSuggest.solrServer + '/query';
searchWithSuggest.searchAjaxRequest.open("POST", a, true);
searchWithSuggest.searchAjaxRequest.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
searchWithSuggest.searchAjaxRequest.send(encodeURIComponent(params));

It worked fine.
I thought that an XML update could work the same way, so I tried to add and index one new document by XML (a is an XMLHttpRequest object):

a.open("POST", "http://localhost:8983/solr/bandw/update", true);
a.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
a.send(encodeURIComponent("stream.body=<add commitWithin='5000'><doc><field name='id'>3222</field><field name='title'>Blade</field></doc></add>"));

I got a response with error: missing content stream. I changed only the a.open function call to this one:

a.open("POST", "http://localhost:8983/solr/bandw/update?commit=true", true);

The rest did not change. Finally, I got a response with no error from SOLR. Later it turned out that the new doc was not indexed at all. My questions:

1. If I get no error from Solr, what is wrong with the second solution and how can I fix it?
2. Is there any solution to put all the parameters into the a.send call, as in the case of queries? I tried

a.send(encodeURIComponent("commit=true&stream.body=<add commitWithin='5000'><doc><field name='id'>3222</field><field name='title'>Blade</field></doc></add>"));

but it was not working.
3. Why do 95% of the examples in the SOLR wiki pages relate to curl? Is this the most efficient alternative? Is there a mapping between the curl syntax and the POST request?

Best Regards, Roland
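Erick's point about openSearcher can be made concrete with a solrconfig.xml fragment: the hard autoCommit persists documents without opening a searcher (for durability), while a soft commit opens a new searcher and makes the documents visible. This is a sketch only; the interval values below are example numbers, not recommendations:

```xml
<!-- Hard commit: flushes to disk, but with openSearcher=false the
     newly indexed docs remain invisible to searches. -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: opens a new searcher, making docs visible. -->
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>
```

With this setup a commitWithin (or an explicit soft commit) controls visibility, while the hard commit interval only controls durability and transaction-log size.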
Re: commit of xml update by AJAX
Hi Upayavira, You were right. I only had to replace the Content-type with application/xml and it worked correctly. Roland

2015-08-30 11:22 GMT+02:00 Upayavira <u...@odoko.co.uk>:

On Sat, Aug 29, 2015, at 05:30 PM, Szűcs Roland wrote:

Hello SOLR experts, I am new to Solr, as you will see from my problem. I am just trying to understand how Solr works. I use one core (BandW) on my local machine and I use JavaScript for learning purposes. I have a test schema.xml with two fields: id, title. I managed to run queries with faceting, autocomplete, etc. In all cases I used the Ajax POST method. For example my search was (searchWithSuggest.searchAjaxRequest is an XMLHttpRequest object):

var s = document.getElementById(searchWithSuggest.inputBoxId).value;
var params = 'q=' + s + '&start=0&rows=10';
a = searchWithSuggest.solrServer + '/query';
searchWithSuggest.searchAjaxRequest.open("POST", a, true);
searchWithSuggest.searchAjaxRequest.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
searchWithSuggest.searchAjaxRequest.send(encodeURIComponent(params));

It worked fine. I thought that an XML update could work the same way, so I tried to add and index one new document by XML (a is an XMLHttpRequest object):

a.open("POST", "http://localhost:8983/solr/bandw/update", true);
a.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
a.send(encodeURIComponent("stream.body=<add commitWithin='5000'><doc><field name='id'>3222</field><field name='title'>Blade</field></doc></add>"));

I got a response with error: missing content stream. I changed only the a.open function call to this one:

a.open("POST", "http://localhost:8983/solr/bandw/update?commit=true", true);

The rest did not change. Finally, I got a response with no error from SOLR. Later it turned out that the new doc was not indexed at all. My questions:

1. If I get no error from Solr, what is wrong with the second solution and how can I fix it?
2. Is there any solution to put all the parameters into the a.send call, as in the case of queries?
I tried

a.send(encodeURIComponent("commit=true&stream.body=<add commitWithin='5000'><doc><field name='id'>3222</field><field name='title'>Blade</field></doc></add>"));

but it was not working.

3. Why do 95% of the examples in the SOLR wiki pages relate to curl? Is this the most efficient alternative? Is there a mapping between the curl syntax and the POST request?

Best Regards, Roland

You're using a POST to fake a GET - just make the Content-type text/xml (or application/xml, I forget) and call a.send() with the <add>...</add> XML itself. You may need the encodeURIComponent, not sure. The stream.body feature allows you to do an HTTP GET that has a stream within it, but you are already doing a POST so it isn't needed.

Upayavira
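Putting Upayavira's fix together, the working update call would look roughly like this. The core name (bandw) and the commit=true parameter come from the thread; the helper names and the commitWithin value are my own for illustration:

```javascript
// Build the <add> XML body for one document. Solr's XML update format
// wraps each document in <doc> with one <field> element per field.
function addDocXml(id, title) {
  return "<add commitWithin='5000'><doc>" +
         "<field name='id'>" + id + "</field>" +
         "<field name='title'>" + title + "</field>" +
         "</doc></add>";
}

// In the browser: POST the raw XML with an XML Content-type.
// Crucially, the body is NOT form-encoded and NOT wrapped in stream.body.
function sendUpdate(xml) {
  var a = new XMLHttpRequest();
  a.open("POST", "http://localhost:8983/solr/bandw/update?commit=true", true);
  a.setRequestHeader("Content-type", "application/xml");
  a.send(xml); // raw XML body, no encodeURIComponent
}

// sendUpdate(addDocXml("3222", "Blade"));
```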
commit of xml update by AJAX
Hello SOLR experts, I am new to Solr, as you will see from my problem. I am just trying to understand how Solr works. I use one core (BandW) on my local machine and I use JavaScript for learning purposes. I have a test schema.xml with two fields: id, title. I managed to run queries with faceting, autocomplete, etc. In all cases I used the Ajax POST method. For example my search was (searchWithSuggest.searchAjaxRequest is an XMLHttpRequest object):

var s = document.getElementById(searchWithSuggest.inputBoxId).value;
var params = 'q=' + s + '&start=0&rows=10';
a = searchWithSuggest.solrServer + '/query';
searchWithSuggest.searchAjaxRequest.open("POST", a, true);
searchWithSuggest.searchAjaxRequest.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
searchWithSuggest.searchAjaxRequest.send(encodeURIComponent(params));

It worked fine. I thought that an XML update could work the same way, so I tried to add and index one new document by XML (a is an XMLHttpRequest object):

a.open("POST", "http://localhost:8983/solr/bandw/update", true);
a.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
a.send(encodeURIComponent("stream.body=<add commitWithin='5000'><doc><field name='id'>3222</field><field name='title'>Blade</field></doc></add>"));

I got a response with error: missing content stream. I changed only the a.open function call to this one:

a.open("POST", "http://localhost:8983/solr/bandw/update?commit=true", true);

The rest did not change. Finally, I got a response with no error from SOLR. Later it turned out that the new doc was not indexed at all. My questions:

1. If I get no error from Solr, what is wrong with the second solution and how can I fix it?
2. Is there any solution to put all the parameters into the a.send call, as in the case of queries? I tried

a.send(encodeURIComponent("commit=true&stream.body=<add commitWithin='5000'><doc><field name='id'>3222</field><field name='title'>Blade</field></doc></add>"));

but it was not working.
3. Why do 95% of the examples in the SOLR wiki pages relate to curl?
Is this the most efficient alternative? Is there a mapping between the curl syntax and the POST request?

Best Regards, Roland
Re: multiple but identical suggestions in autocomplete
Hello Nutch Solr user, You are right, I use DocumentDictionaryFactory, as you can see in my solrconfig file:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggest_publisher</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">publisher</str>
    <str name="suggestAnalyzerFieldType">text_hu_suggest_ngram</str>
    <str name="indexPath">suggester_infix_dir_publisher</str>
    <str name="weightField">price</str>
    <str name="builOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

You wrote that you have developed a service between the UI and Solr. How can I use that one if I use JavaScript/Ajax on the client side? Thanks, Roland

2015-08-04 16:25 GMT+02:00 Nutch Solr User <nutchsolru...@gmail.com>:

Maybe you are using DocumentDictionaryFactory, because HighFrequencyDictionaryFactory will never return duplicate terms. We also had the same problem with DocumentDictionaryFactory + AnalyzingInfixSuggester. We have created a service between the UI and Solr which groups duplicate suggestions and returns a unique list to the UI, containing only unique suggestions.

- Nutch Solr User

The ultimate search engine would basically understand everything in the world, and it would always give you the right thing.

--
View this message in context: http://lucene.472066.n3.nabble.com/multiple-but-identical-suggestions-in-autocomplete-tp4220055p4220727.html
Sent from the Solr - User mailing list archive at Nabble.com.
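For the JavaScript/Ajax case asked about above, a pure client-side variant of that grouping service would simply collapse duplicates before rendering. A minimal sketch, assuming the {term, weight, payload} response shape that Solr's suggester returns in JSON:

```javascript
// Collapse duplicate suggester terms, keeping the first occurrence of each.
// Terms are compared trimmed and case-insensitively, since suggester output
// can contain entries that differ only in leading whitespace or case.
function dedupeSuggestions(suggestions) {
  var seen = {};
  return suggestions.filter(function (s) {
    var key = s.term.trim().toLowerCase();
    if (seen[key]) return false;
    seen[key] = true;
    return true;
  });
}
```

This would be applied to the suggestions array parsed out of the suggester's JSON response, right before building the dropdown.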
multiple but identical suggestions in autocomplete
Hello Guys, I use SOLR 5.2.1 and the relatively new solr.SuggestComponent. It worked fine at the beginning. I use this function to auto-complete the publisher names. I have 3,000 documents and 80 publishers. When I use the autocomplete feature, I get back the name of each matched publisher as many times as the number of book titles they published. If suggest.q=Har and the Harlequin publisher has 100 documents, I get back a JSON with 100 suggestions carrying the same publisher name. Obviously this is not my intention. I would like to get back the matched publisher name once, and later I will use a filter query for the selected publisher name. Any idea how I can get identical suggestions only once? Is there any parameter I can set in solrconfig.xml to solve this? Thanks in advance, Szűcs Roland
Re: autosuggest with solr.EdgeNGramFilterFactory no result found
Thanx Erick, Your blog article was the perfect answer to my problem. Rgds, Roland

2015-07-03 18:57 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:

OK, I think you took a wrong turn at the bakery. The FST-based suggesters are intended to look at the beginnings of fields. It is totally unnecessary to use ngrams; the FST that gets built does that _for_ you. Actually it builds an internal FST structure that does this en passant. For getting whole fields that match anywhere in the input, you probably want to think about AnalyzingInfixSuggester or FreeTextSuggester. The important bit here is that you shouldn't have to do so much work... This might help: http://lucidworks.com/blog/solr-suggester/

Best, Erick

On Fri, Jul 3, 2015 at 4:40 AM, Roland Szűcs <roland.sz...@bookandwalk.com> wrote:

I tried to set up an autosuggest feature with multiple dictionaries for the title, author and publisher fields. I used solr.EdgeNGramFilterFactory to optimize the performance of the autosuggest. I have a document in the index with the title: Romana.
When I test the text analysis for autosuggest (on the field title_suggest_ngram):

ENGTF:
text    raw_bytes            start  end  positionLength  type  position
rom     [72 6f 6d]           0      6    1               word  1
roma    [72 6f 6d 61]        0      6    1               word  1
roman   [72 6f 6d 61 6e]     0      6    1               word  1
romana  [72 6f 6d 61 6e 61]  0      6    1               word  1

If I try to run http://localhost:8983/solr/bandw/suggest?q=Roma, I get:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="suggest">
    <lst name="suggest_publisher">
      <lst name="Roma">
        <int name="numFound">0</int>
        <arr name="suggestions"/>
      </lst>
    </lst>
    <lst name="suggest_title">
      <lst name="Roma">
        <int name="numFound">0</int>
        <arr name="suggestions"/>
      </lst>
    </lst>
    <lst name="suggest_author">
      <lst name="Roma">
        <int name="numFound">0</int>
        <arr name="suggestions"/>
      </lst>
    </lst>
  </lst>
</response>

My relevant field definitions:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" omitNorms="true"/>
<field name="author" type="text_hu" indexed="true" stored="true" multiValued="true"/>
<field name="title" type="text_hu" indexed="true" stored="true" multiValued="false"/>
<field name="subtitle" type="text_hu" indexed="true" stored="true" multiValued="false"/>
<field name="publisher" type="text_hu" indexed="true" stored="true" multiValued="false"/>
<field name="title_suggest_ngram" type="text_hu_suggest_ngram" indexed="true" stored="false" multiValued="false" omitNorms="true"/>
<field name="author_suggest_ngram" type="text_hu_suggest_ngram" indexed="true" stored="false" multiValued="false" omitNorms="true"/>
<field name="publisher_suggest_ngram" type="text_hu_suggest_ngram" indexed="true" stored="false" multiValued="false" omitNorms="true"/>
<copyField source="title" dest="title_suggest_ngram"/>
<copyField source="author" dest="author_suggest_ngram"/>
<copyField source="publisher" dest="publisher_suggest_ngram"/>

My EdgeNGram-related field type definition:

<fieldType name="text_hu_suggest_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_hu.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="8"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_hu.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My request handler for suggest:

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">5</str>
    <str name="suggest.dictionary">suggest_author</str>
    <str name="suggest.dictionary">suggest_title</str>
    <str name="suggest.dictionary">suggest_publisher</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

And finally my search component:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggest_title</str>
    <str name="lookupImpl">FSTLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title_suggest_ngram</str>
    <str name="wightField">price</str>
    <str name="builOnStartup">true</str>
    <str name="buildOnCommit">true</str>
  </lst>
  <lst name="suggester">
    <str name="name">suggest_author</str>
    <str name="lookupImpl">FSTLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">author_suggest_ngram</str>
    <str name="wightField">price</str>
    <str name="builOnStartup">true</str>
    <str name="buildOnCommit">true</str>
  </lst>
  <lst name="suggester">
    <str name="name">suggest_publisher</str>
    <str name="lookupImpl">FSTLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">publisher_suggest_ngram</str>
    <str
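For reference, the index-time behaviour of solr.EdgeNGramFilterFactory with minGramSize=3 and maxGramSize=8 (the grams shown in the analysis output of this thread) can be reproduced in a few lines. This is an illustration of what the filter emits, not Solr's implementation:

```javascript
// Produce the edge n-grams of a single (already lowercased) token,
// from minGramSize up to maxGramSize characters, anchored at the start.
function edgeNGrams(token, min, max) {
  var grams = [];
  for (var n = min; n <= Math.min(max, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

// edgeNGrams("romana", 3, 8) -> ["rom", "roma", "roman", "romana"]
```

This also makes Erick's point concrete: since every gram is a prefix of the token, an FST-based suggester already covers these matches without the filter.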