Re: just testing if my emails are reaching the mailing list

2020-10-14 Thread Szűcs Roland
Hi,
I got it from the solr user list.


Roland

uyilmaz wrote (on Wed, Oct 14, 2020, 9:39):

> Hello all,
>
> I have never gotten an answer to my questions on this mailing list, and
> my mail client shows INVALID next to my mail address, so I thought I should
> check whether my emails are reaching you.
>
> Can anyone reply?
>
> Regards
>
> --
> uyilmaz 
>


Adding several new fields to managed-schema by solrj

2020-04-29 Thread Szűcs Roland
Hi folks,

I am using Solr 8.5.0 in standalone mode and use the CoreAdmin API and
Schema API of solrj to create a new core and its fields in managed-schema.
Is there any way to add several fields to managed-schema by solrj without
processing them one by one?

The following two lines get the job done, but at about 4 sec/field, which is
extremely slow:
SchemaRequest.AddField schemaRequest = new SchemaRequest.AddField(fieldAttributes);
SchemaResponse.UpdateResponse response = schemaRequest.process(solrC);

The core is empty, as the field creation is part of the core creation
process. The Schema API docs say:
It is possible to perform one or more add requests in a single command. The
API is transactional and all commands in a single call either succeed or
fail together.
I am looking for the equivalent of this approach in solrj.
Is there any?
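(A minimal solrj sketch using SchemaRequest.MultiUpdate, which batches several
schema changes into a single Schema API call; solrClient and allFieldAttributes
are placeholders for your own client and list of attribute maps:)

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

// one AddField update per new field, sent to Solr as a single request
List<SchemaRequest.Update> updates = new ArrayList<>();
for (Map<String, Object> fieldAttributes : allFieldAttributes) {
    updates.add(new SchemaRequest.AddField(fieldAttributes));
}
SchemaResponse.UpdateResponse response =
    new SchemaRequest.MultiUpdate(updates).process(solrClient);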

Cheers,
Roland


allTermsRequired not working for me with suggester

2020-04-08 Thread Szűcs Roland
Hi folks,

I have allTermsRequired=true defined in the suggester component. Despite
this, if I run the following query:
http://localhost:8983/solr/pocwithedgengram/suggesthandler?allTermsRequired=true=*%3A*=Arany%20J%C3%A1nos

I get back the following result (it is only a snippet from the response):

{ "term":" Tardy János", "weight":0, "payload":""},
{ "term":"Arany János", "weight":0, "payload":""},
{ "term":"Arany László", "weight":0, "payload":""},

I expected only the second one.

How can I make suggestions for multi-term queries if I would like all query
terms to be found in the matched document? The typical use case is that the
user has already typed in some terms fully and the last one partially. (I
do not want to use the terms component here because it is difficult to deal
with on the client side. The user can edit any part of his multi-term
expression, and it is not trivial in javascript to find out which term
should be queried.)
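(For reference, a hedged sketch of where allTermsRequired lives: the ref guide
lists it as a configuration option of the AnalyzingInfixLookupFactory-based
suggester rather than a request parameter, so passing it on the URL may simply
be ignored. Names other than the parameters themselves are placeholders:)

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <!-- every term of the typed query must match for a suggestion to be returned -->
    <str name="allTermsRequired">true</str>
  </lst>
</searchComponent>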

Cheers,
Roland


highlight if the field and hl.fl have different analysis

2020-04-04 Thread Szűcs Roland
Hi folks,
I have an author field with a very simple definition:

[field and fieldType XML stripped by the mail archive]
I have a suggester-friendly definition of this field:

[edge-ngram fieldType and copyField XML stripped by the mail archive]
I do not use the suggester component, as it gives back strings and I need
specific documents, so I apply the following approach in solrconfig
(markup reconstructed; the archive stripped the XML tags):

<requestHandler name="/suggesthandler" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <str name="defType">edismax</str>
    <str name="qf">author_ngram^5 title_ngram^10</str>
    <str name="fl">id,imageUrl,title,price,author</str>
    <str name="mm">3&lt;74%</str>
    <str name="pf">author_ngram^15 title_ngram^30</str>
    <str name="tie">0.1</str>
    <str name="hl">true</str>
    <str name="hl.fl">author title</str>
    <str name="hl.method">original</str>
  </lst>
</requestHandler>

As you see, my query parser searches in the author_ngram field, which is a
copyField and of course not stored. On the other hand, I would like to show
the customers the meaningful fields like author.

Despite this, the highlighter gives back partially good results:

If the author field is Arany János and I search for Arany Já, I get back
<b>Arany</b> János. The second term is not highlighted.

I need help on two issues:
1. Why did it work even partially, if the analysis of the query field and
the highlight field are different?
2. If it is able to handle the different analysis, what can I do to support
the multi-field highlighting?

Thanks,
Roland


unified highlighter method works unexpectedly

2020-04-02 Thread Szűcs Roland
Hi All,

I use Solr 8.4.1 and am implementing suggester functionality. As part of the
suggestions I would like to show product info, so I had to implement this
functionality with normal query parsers instead of the suggester component. I
applied an edge-ngram filter without stemming to speed up the analysis of the
query, which is crucial for the suggester functionality.
I can use the Highlight component with the edismax query parser without any
problem. This is a typical output if hl.method=original (this is the
default):
{ "responseHeader":{ "status":0, "QTime":4, "params":{ "mm":"3<74%", "q":"Arany
Já", "tie":"0.1", "defType":"edismax", "hl":"true", "echoParams":"all", "qf
":"author_ngram^5 title_ngram^10", "fl":"id,imageUrl,title,price",
"pf":"author_ngram^15
title_ngram^30", "hl.fl":"title", "hl.method":"original", "_":
"1585830768672"}}, "response":{"numFound":2,"start":0,"docs":[ { "id":"369",
"title":"Arany János összes költeményei", "price":185.0, "imageUrl":"
https://cdn.bknw.net/prd/covers_big/369.jpg"}, { "id":"26321", "title":"Arany
János összes költeményei", "price":1400.0, "imageUrl":"
https://cdn.bknw.net/prd/covers_big/26321.jpg"}] }, "highlighting":{ "369":{
"title":["\n \n Arany\n \n János összes költeményei"]}, "
26321":{ "title":["\n \n Arany\n \n János összes
költeményei"]}}}

If I change the method to unified, I get an unexpected result:
{ "responseHeader":{ "status":0, "QTime":5, "params":{ "mm":"3<74%", "q":"Arany
Já", "tie":"0.1", "defType":"edismax", "hl":"true", "echoParams":"all", "qf
":"author_ngram^5 title_ngram^10", "fl":"id,imageUrl,title,price",
"pf":"author_ngram^15
title_ngram^30", "hl.fl":"title", "hl.method":"unified", "_":"1585830768672"
}}, "response":{"numFound":2,"start":0,"docs":[ { "id":"369", "title":"Arany
János összes költeményei", "price":185.0, "imageUrl":"
https://cdn.bknw.net/prd/covers_big/369.jpg"}, { "id":"26321", "title":"Arany
János összes költeményei", "price":1400.0, "imageUrl":"
https://cdn.bknw.net/prd/covers_big/26321.jpg"}] }, "highlighting":{ "369":{
"title":[]}, "26321":{ "title":[]}}}

Any idea why the newer method fails to deliver the same results?

Thanks,
Roland


Re: expand=true throws error

2020-03-31 Thread Szűcs Roland
Hi Munendra,

Yes, indeed that was the problem. Thank you very much for your help. expand is
just a separate parameter, not part of the fq value. Now it is working.
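(For the record, a sketch of the corrected request, with expand passed as its
own parameter instead of being appended to the fq value:)

q=author:"William Shakespeare"&fq={!collapse field=title}&expand=true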

Thanks,
Roland

Munendra S N wrote (on Tue, Mar 31, 2020, 5:22):

> > Case 3, let's extend it with expand=true:
> > { "responseHeader":{ "status":0, "QTime":1, "params":{
> > "q":"author:\"William
> > Shakespeare\"", "fq":"{!collapse field=title}&expand=true", "_":
> > "1585603593269"}},
> >
> I think it is because the expand=true parameter is not passed properly. As you
> can see from the params in the responseHeader section, q and fq are separate
> keys, but expand=true is appended to the fq value.
>
> If passed correctly, it should look something like this
>
> > { "responseHeader":{ "status":0, "QTime":1, "params":{
> > "q":"author:\"William
> > Shakespeare\"", "fq":"{!collapse field=title}", "expand": "true", "_":
> > "1585603593269"}},
> >
>
> Regards,
> Munendra S N
>
>
>
> On Tue, Mar 31, 2020 at 3:07 AM Szűcs Roland 
> wrote:
>
> > Hi Munendra,
> > Let's see the 3 scenarios:
> > 1. Query without collapse
> > 2. Query with collapse
> > 3. Query with collapse and expand
> > I made a mini book database for this:
> > Case 1:
> > { "responseHeader":{ "status":0, "QTime":0, "params":{
> > "q":"author:\"William
> > Shakespeare\"", "_":"1585603593269"}},
> "response":{"numFound":4,"start":0,"
> > docs":[ { "id":"1", "author":"William Shakespeare", "title":"The Taming
> of
> > the Shrew", "format":"ebook", "_version_":1662625767773700096}, {
> "id":"2",
> > "author":"William Shakespeare", "title":"The Taming of the Shrew",
> > "format":
> > "paper", "_version_":1662625790857052160}, { "id":"3", "author":"William
> > Shakespeare", "title":"The Taming of the Shrew", "format":"audiobook", "
> > _version_":1662625809553162240}, { "id":"4", "author":"William
> > Shakespeare",
> > "title":"Much Ado about Nothing", "format":"paper", "_version_":
> > 1662625868323749888}] }}
> > As you can see there are 3 different formats of the same book.
> >
> > Case 2:
> > { "responseHeader":{ "status":0, "QTime":2, "params":{
> > "q":"author:\"William
> > Shakespeare\"", "fq":"{!collapse field=title}", "_":"1585603593269"}}, "
> > response":{"numFound":2,"start":0,"docs":[ { "id":"1", "author":"William
> > Shakespeare", "title":"The Taming of the Shrew", "format":"ebook", "
> > _version_":1662625767773700096}, { "id":"4", "author":"William
> > Shakespeare",
> > "title":"Much Ado about Nothing", "format":"paper", "_version_":
> > 1662625868323749888}] }}
> > Collapse post filter worked as I expected.
> > Case 3, let's extend it with expand=true:
> > { "responseHeader":{ "status":0, "QTime":1, "params":{
> > "q":"author:\"William
> > Shakespeare\"", "fq":"{!collapse field=title}&expand=true", "_":
> > "1585603593269"}}, "response":{"numFound":2,"start":0,"docs":[ {
> "id":"1",
> > "
> > author":"William Shakespeare", "title":"The Taming of the Shrew",
> "format":
> > "ebook", "_version_":1662625767773700096}, { "id":"4", "author":"William
> > Shakespeare", "title":"Much Ado about Nothing", "format":"paper",
> > "_version_
> > ":1662625868323749888}] }}
> >
> > As you can see nothing has changed. There is no additional section in the
> > response.
> >
> > Cheers,
> > Roland
> >
> > Munendra S N wrote (on Mon, Mar 30, 2020, 17:46):
> >
> > > Please share the complete request. Also, does the number of results change
> > > with and without collapse? Usually title would be unique for every document.
> > > If that is the case then there won't be anything to expand, right?
> > >
> > > On Mon, Mar 30, 2020, 8:22 PM Szűcs Roland <
> szucs.rol...@bookandwalk.hu>
> > > wrote:
> > >
> > > > Hi Munendra,
> > > > I do not get an error. The strange thing is that I get exactly the same
> > > > response with fq={!collapse field=title} versus fq={!collapse
> > > > field=title}&expand=true.
> > > > Collapse works properly as a standalone fq, but expand has no impact.
> > > > How can I get access to the "hidden" documents then?
> > > >
> > > > Roland
> > > >
> > > > Munendra S N wrote (on Mon, Mar 30, 2020, 16:47):
> > > >
> > > > > Hey,
> > > > > Could you please share the stacktrace or error message you
> received?
> > > > >
> > > > > On Mon, Mar 30, 2020, 7:58 PM Szűcs Roland <
> > > szucs.rol...@bookandwalk.hu>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I managed to use the edismax query parser in Solr 8.4.1 with collapse
> > > > > > without any problem. I tested it with the Solr admin UI. So
> > > > > > fq={!collapse field=title} worked fine.
> > > > > >
> > > > > > As soon as I use the example from the documentation and use
> > > > > > fq={!collapse field=title}&expand=true, I do not get back any
> > > > > > additional output with the expanded section.
> > > > > >
> > > > > > Any idea?
> > > > > >
> > > > > > Thanks in advance,
> > > > > > Roland
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: expand=true throws error

2020-03-30 Thread Szűcs Roland
Hi Munendra,
Let's see the 3 scenarios:
1. Query without collapse
2. Query with collapse
3. Query with collapse and expand
I made a mini book database for this:
Case 1:
{ "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"author:\"William
Shakespeare\"", "_":"1585603593269"}}, "response":{"numFound":4,"start":0,"
docs":[ { "id":"1", "author":"William Shakespeare", "title":"The Taming of
the Shrew", "format":"ebook", "_version_":1662625767773700096}, { "id":"2",
"author":"William Shakespeare", "title":"The Taming of the Shrew", "format":
"paper", "_version_":1662625790857052160}, { "id":"3", "author":"William
Shakespeare", "title":"The Taming of the Shrew", "format":"audiobook", "
_version_":1662625809553162240}, { "id":"4", "author":"William Shakespeare",
"title":"Much Ado about Nothing", "format":"paper", "_version_":
1662625868323749888}] }}
As you can see there are 3 different formats of the same book.

Case 2:
{ "responseHeader":{ "status":0, "QTime":2, "params":{ "q":"author:\"William
Shakespeare\"", "fq":"{!collapse field=title}", "_":"1585603593269"}}, "
response":{"numFound":2,"start":0,"docs":[ { "id":"1", "author":"William
Shakespeare", "title":"The Taming of the Shrew", "format":"ebook", "
_version_":1662625767773700096}, { "id":"4", "author":"William Shakespeare",
"title":"Much Ado about Nothing", "format":"paper", "_version_":
1662625868323749888}] }}
Collapse post filter worked as I expected.
Case 3, let's extend it with expand=true:
{ "responseHeader":{ "status":0, "QTime":1, "params":{ "q":"author:\"William
Shakespeare\"", "fq":"{!collapse field=title}&expand=true", "_":
"1585603593269"}}, "response":{"numFound":2,"start":0,"docs":[ { "id":"1", "
author":"William Shakespeare", "title":"The Taming of the Shrew", "format":
"ebook", "_version_":1662625767773700096}, { "id":"4", "author":"William
Shakespeare", "title":"Much Ado about Nothing", "format":"paper", "_version_
":1662625868323749888}] }}

As you can see nothing has changed. There is no additional section in the
response.

Cheers,
Roland

Munendra S N wrote (on Mon, Mar 30, 2020, 17:46):

> Please share the complete request. Also, does the number of results change with
> and without collapse? Usually title would be unique for every document. If that
> is the case then there won't be anything to expand, right?
>
> On Mon, Mar 30, 2020, 8:22 PM Szűcs Roland 
> wrote:
>
> > Hi Munendra,
> > I do not get an error. The strange thing is that I get exactly the same
> > response with fq={!collapse field=title} versus fq={!collapse
> > field=title}&expand=true.
> > Collapse works properly as a standalone fq, but expand has no impact. How
> > can I get access to the "hidden" documents then?
> >
> > Roland
> >
> > Munendra S N wrote (on Mon, Mar 30, 2020, 16:47):
> >
> > > Hey,
> > > Could you please share the stacktrace or error message you received?
> > >
> > > On Mon, Mar 30, 2020, 7:58 PM Szűcs Roland <
> szucs.rol...@bookandwalk.hu>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I managed to use the edismax query parser in Solr 8.4.1 with collapse
> > > > without any problem. I tested it with the Solr admin UI. So fq={!collapse
> > > > field=title} worked fine.
> > > >
> > > > As soon as I use the example from the documentation and use
> > > > fq={!collapse field=title}&expand=true, I do not get back any additional
> > > > output with the expanded section.
> > > >
> > > > Any idea?
> > > >
> > > > Thanks in advance,
> > > > Roland
> > > >
> > >
> >
>


Re: expand=true throws error

2020-03-30 Thread Szűcs Roland
Hi Munendra,
I do not get an error. The strange thing is that I get exactly the same
response with fq={!collapse field=title} versus fq={!collapse
field=title}&expand=true.
Collapse works properly as a standalone fq, but expand has no impact. How
can I get access to the "hidden" documents then?

Roland

Munendra S N wrote (on Mon, Mar 30, 2020, 16:47):

> Hey,
> Could you please share the stacktrace or error message you received?
>
> On Mon, Mar 30, 2020, 7:58 PM Szűcs Roland 
> wrote:
>
> > Hi All,
> >
> > I managed to use the edismax query parser in Solr 8.4.1 with collapse
> > without any problem. I tested it with the Solr admin UI. So fq={!collapse
> > field=title} worked fine.
> >
> > As soon as I use the example from the documentation and use
> > fq={!collapse field=title}&expand=true, I do not get back any additional
> > output with the expanded section.
> >
> > Any idea?
> >
> > Thanks in advance,
> > Roland
> >
>


expand=true throws error

2020-03-30 Thread Szűcs Roland
Hi All,

I managed to use the edismax query parser in Solr 8.4.1 with collapse without any
problem. I tested it with the SOLR admin GUI. So fq={!collapse field=title}
worked fine.

As soon as I use the example from the documentation and use fq={!collapse
field=title}&expand=true, I did not get back any additional output with
the expanded section.

Any idea?

Thanks in advance,
Roland


spellchecker offers fewer alternatives

2020-03-28 Thread Szűcs Roland
Hi All,

My question is whether it is a feature or a bug in the Solr spellchecker with
the default distance measure and maxEdits 2:
A multiValued field includes "József" and its ASCII-folded
version "Jozsef", to support mobile search where users usually do not take
the time to type József.
When I make a query with spellcheck.q=Józzef, interestingly I get back
only Jozsef as an alternative.

Is it normal that in the case of multiValued fields only one term is returned?

Secondly, I tried collations with spellcheck.q="Józzef Atila", where the real
author field includes either József Attila or Jozsef Attila.

I got a suggestion for Józzef like before, and for Atila I correctly got
Attila, but I always get null collations in solrj with Solr 8.4.1.
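(One hedged note on the null collations: as far as I know, Solr only builds
collations when spellcheck.collate is enabled on the request or handler, so it
may be worth checking the request handler too. A sketch, with the handler name
being a placeholder:)

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <!-- without this the response contains no collations at all -->
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>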
Here is my relevant solrconfig (markup reconstructed from the archived
values; the archive stripped the XML tags, so some tag names are inferred):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="fieldType">shortTextSpell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">2</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>
schema:

[field and fieldType XML stripped by the mail archive]
Thanks in advance,
Roland


deduplication of suggester results is not enough

2020-03-26 Thread Szűcs Roland
Hi All,

I have been following the suggester-related discussions for quite a while.
Everybody agrees that returning the same string representation several times
is not the expected behaviour from a suggester, where the terms are the
entities, not the documents.

One suggestion was to deduplicate on the client side of Solr. This is very
easy in most client solutions, as any set-based data structure solves it.

*But deduplication leaves one important problem unsolved: suggest.count.*

If I have 15 matches from the suggester and suggest.count=10 and the first
9 matches are the same, I will get back only 2 after deduplication, and the
remaining 5 unique terms will never be shown.

What is the solution for this?
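(A client-side workaround sketch in solrj: over-fetch and deduplicate until
enough unique terms are collected. The handler path, dictionary name, and
client are placeholders:)

import java.util.LinkedHashSet;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.Suggestion;

SolrQuery query = new SolrQuery();
query.setRequestHandler("/suggest");
query.set("suggest.q", userInput);
query.set("suggest.count", 50); // over-fetch so duplicates do not crowd out unique terms
QueryResponse rsp = solrClient.query(query);

// keep insertion order, drop duplicates, stop at the 10 terms actually wanted
LinkedHashSet<String> unique = new LinkedHashSet<>();
for (Suggestion s : rsp.getSuggesterResponse().getSuggestions().get("mySuggester")) {
    unique.add(s.getTerm());
    if (unique.size() == 10) break;
}

This only pushes the limit out rather than removing it, but a generous
suggest.count makes the truncation rare in practice.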

Cheers,
Roland


suggestion with multiple context fields

2020-03-26 Thread Szűcs Roland
Hi All,

Is there any way to define multiple context fields with the suggester?

It is a typical use case in an ecommerce environment that the facets are
listed in the sidebar and act as filter queries when the user
selects them. I am looking for similar functionality for the suggester. Do
you know how to solve this?

A potential workaround could be using normal queries with the fq parameter and
an N-gram based index analysis chain. Can it be fast enough to follow the
speed of typing?
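(A hedged sketch of another option: as far as I can tell, the suggester takes
a single contextField, but its suggest.cfq context filter accepts boolean
queries, so several facet values can be copyFielded into one context field and
combined at query time. All names here are placeholders:)

<!-- in the suggester definition -->
<str name="contextField">suggest_context</str>

<!-- at query time (spaces URL-encoded in practice) -->
/suggesthandler?suggest=true&suggest.q=arany&suggest.cfq=category_fiction AND format_ebook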

Thanks,
Roland


Re: how to add multiple values for a filter query in Solrj

2020-03-24 Thread Szűcs Roland
Thanks Avi, it worked.

Raboah, Avi wrote (on Tue, Mar 24, 2020, 11:08):

> You can do something like that if we are talking about the same filter query
> name.
>
> addFilterQuery(String.format("%s:(%s %s)", filterName, value1, value2));
>
>
> -Original Message-
> From: Szűcs Roland 
> Sent: Tuesday, March 24, 2020 11:35 AM
> To: solr-user@lucene.apache.org
> Subject: how to add multiple value for a filter query in Solrj
>
> Hi All,
>
> I use Solr 8.4.1 and the latest solrj client.
> There is a field, let's say filterName, which can have 3 different values. If
> I use the admin UI, I write the following in fq: filterName:"value1"
> filterName:"value2" and it works as expected.
> If I use the solrJ SolrQuery.addFilterQuery method and call it twice, like:
> addFilterQuery(filterName+":\""+value1+"\"");
> addFilterQuery(filterName+":\""+value2+"\"");
> I get no documents back.
>
> Can somebody help me with the appropriate solrj syntax to add filter
> queries one by one when there is one filter field but multiple values?
>
> Thanks,
>
> Roland
>
>


how to add multiple values for a filter query in Solrj

2020-03-24 Thread Szűcs Roland
Hi All,

I use Solr 8.4.1 and the latest solrj client.
There is a field, let's say filterName, which can have 3 different values. If
I use the admin UI, I write the following in fq: filterName:"value1"
filterName:"value2" and it works as expected.
If I use the solrJ SolrQuery.addFilterQuery method and call it twice, like:
addFilterQuery(filterName+":\""+value1+"\"");
addFilterQuery(filterName+":\""+value2+"\"");
I get no documents back.

Can somebody help me with the appropriate solrj syntax to add filter
queries one by one when there is one filter field but multiple values?
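(The likely cause, for what it's worth: multiple fq parameters are intersected,
so two separate addFilterQuery calls demand both values on the same document.
A sketch of a single fq whose clauses are OR'ed instead, with
filterName/value1/value2 as above:)

// one filter query whose clauses are OR'ed, instead of two AND'ed filter queries
query.addFilterQuery(String.format("%s:(\"%s\" OR \"%s\")", filterName, value1, value2));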

Thanks,

Roland


phrase boosting by edismax

2020-03-20 Thread Szűcs Roland
Hi all,

Context:
I use Solr 8.4.1. I have a small database of books, around 3500 documents.
I realized that I cannot search on the copyField of all fields (author,
title, publisher, description), because description has a different analysis
chain than the others (it has stemming and stop-word removal, the
others do not).

That's why I query the same search expression against multiple fields. If I run
the query "József Attila", I get the following response (I echoed all the
parameters for clarity):
{status=0,QTime=52,params={facet.field=[category,
format],qt=/query,debug=true,spellcheck.dictionary=[default,
wordbreak],echoParams=all,indent=true,fl=title,author,publisher,description,category,price,stock,created,format,imageUrl,rows=40,version=2,q=title:"{"q":"József
Attila"}" category:"{"q":"József Attila"}" publisher:"{"q":"József
Attila"}" description:"{"q":"József Attila"}" author:"{"q":"József
Attila"}",defType=edismax,qf=title^10 author^7 category^5 publisher^3
description,spellcheck=on,pf=title^30 author^14 category^10 publisher^6
description^2,facet.mincount=1,facet=true,wt=javabin}}

When I checked in solrj debug mode what is the translated query by solr it
was not what I expected based on the documentation (
https://lucene.apache.org/solr/guide/8_4/the-extended-dismax-query-parser.html#using-slop
):
rawquerystring=title:"{"q":"József Attila"}" category:"{"q":"József
Attila"}" publisher:"{"q":"József Attila"}" description:"{"q":"József
Attila"}" author:"{"q":"József Attila"}",

querystring=title:"{"q":"József Attila"}" category:"{"q":"József Attila"}"
publisher:"{"q":"József Attila"}" description:"{"q":"József Attila"}"
author:"{"q":"József Attila"}",

parsedquery=+(DisjunctionMaxQuery(((publisher:q)^3.0 | description:q |
(title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0))
DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0)) category:{
DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 |
(category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0))
DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 |
(category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0))
DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 |
(category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0))
DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 |
(category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0))) DisjunctionMaxQuery(((title:"q
józsef attila q józsef attila q józsef attila q józsef attila q józsef
attila")^30.0 | (author:"q józsef attila q józsef attila q józsef attila q
józsef attila q józsef attila")^14.0 | (publisher:"q józsef attila q józsef
attila q józsef attila q józsef attila q józsef attila")^6.0 |
(description:"q józsef attila q józsef attila q józsef attila q józsef
attila q józsef attila")^2.0)),parsedquery_toString=+(((publisher:q)^3.0 |
description:q | (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0)
((category::)^5.0) ((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0)
((publisher:attila)^3.0 | description:attila | (title:attila)^10.0 |
(category:Attila)^5.0 | (author:attila)^7.0) ((category:})^5.0) category:{
((publisher:q)^3.0 | description:q | (title:q)^10.0 | (category:q)^5.0 |
(author:q)^7.0) ((category::)^5.0) ((publisher:józsef)^3.0 |
description:józsef | (title:józsef)^10.0 | 

Re: more like this query parser with faceting

2019-08-12 Thread Szűcs Roland
Thanks David.
This is the page I was looking for.

Roland

David Hastings wrote (on Mon, Aug 12, 2019, 20:52):

> should be fine,
> https://cwiki.apache.org/confluence/display/solr/MoreLikeThisHandler
>
> for more info
>
> On Mon, Aug 12, 2019 at 2:49 PM Szűcs Roland 
> wrote:
>
> > Hi David,
> > Thanks for the fast reply. Am I right that I can combine fq with mlt only
> > if I use more like this as a query parser?
> >
> > Is there a way to achieve the same with mlt as a request handler?
> > Roland
> >
> > David Hastings wrote (on Mon, Aug 12, 2019, 20:44):
> >
> > > The easiest way will be to pass in a filter query (fq)
> > >
> > > On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland <
> > szucs.rol...@bookandwalk.hu>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Is there any tutorial or example of how to use the more-like-this
> > > > functionality when we have some other constraints set by the user
> > > > through faceting parameters, like a price range or a product category?
> > > >
> > > > Cheers,
> > > > Roland
> > > >
> > >
> >
>


Re: more like this query parser with faceting

2019-08-12 Thread Szűcs Roland
Hi David,
Thanks for the fast reply. Am I right that I can combine fq with mlt only if I
use more like this as a query parser?

Is there a way to achieve the same with mlt as a request handler?
Roland

David Hastings wrote (on Mon, Aug 12, 2019, 20:44):

> The easiest way will be to pass in a filter query (fq)
>
> On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland 
> wrote:
>
> > Hi All,
> >
> > Is there any tutorial or example of how to use the more-like-this functionality
> > when we have some other constraints set by the user through faceting
> > parameters, like a price range or a product category?
> >
> > Cheers,
> > Roland
> >
>


more like this query parser with faceting

2019-08-12 Thread Szűcs Roland
Hi All,

Is there any tutorial or example of how to use the more-like-this functionality
when we have some other constraints set by the user through faceting
parameters, like a price range or a product category?
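(A sketch with the MLT query parser, where filter queries combine exactly as in
any other search; the field names and values are placeholders:)

q={!mlt qf=content}10812&fq=category:fiction&fq=price:[0 TO 3000]

Since {!mlt} produces an ordinary query, faceting and fq filtering apply
unchanged on top of it.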

Cheers,
Roland


Re: Problem with solr suggester in case of non-ASCII characters

2019-07-31 Thread Szűcs Roland
Hi Erick,

Thanks for your advice.
I already removed it from the field definition used by the suggester, and it
works great. I will consider taking it out of the processing of the other
fields entirely. I have only 7000 docs with an index size of 18MB so far, so the
memory footprint is not a key issue for me.

Best,
Roland

Erick Erickson wrote (on Wed, Jul 31, 2019, 14:24):

> Roland:
>
> Have you considered just not using stopwords anywhere? Largely they’re a
> holdover
> from a long time ago when every byte counted. Plus using stopwords has
> “interesting”
> issues with things like highlighting and phrase queries and the like.
>
> Sure, not using stopwords will make your index larger, but so will a
> copyfield…
>
> Your call of course, but stopwords are over-used IMO.
>
> I’m stealing Walter Underwood’s thunder here ;)
>
> Best,
> Erick
>
> > On Jul 30, 2019, at 2:11 PM, Szűcs Roland 
> wrote:
> >
> > Hi Furkan,
> >
> > Thanks for the suggestion; I always forget the most effective debugging tool,
> > the analysis page.
> >
> > It turned out that "Jó" was a stop word and it was eliminated during the
> > text analysis. What I will do is create a new field type without
> > stop-word removal and use it like this:
> > <str name="suggestAnalyzerFieldType">short_text_hu_without_stop_removal</str>
> >
> > Thanks again
> >
> > Roland
> >
> > Furkan KAMACI wrote (on Tue, Jul 30, 2019, 16:17):
> >
> >> Hi Roland,
> >>
> >> Could you check Analysis tab (
> >> https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell
> >> how
> >> the term is analyzed for both query and index?
> >>
> >> Kind Regards,
> >> Furkan KAMACI
> >>
> >> On Tue, Jul 30, 2019 at 4:50 PM Szűcs Roland <
> szucs.rol...@bookandwalk.hu>
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I have an author suggester (searchcomponent and the related request
> >>> handler) defined in solrconfig:
> >>> [suggester, request handler, and field-type XML stripped by the archive;
> >>> see the reconstructed config in the original post below]
> >>>
> >>> Author field has just a minimal text processing in query and index time
> >>> based on the following definition:
> >>>
> >>> [fieldType and field XML stripped by the archive]
> >>>
> >>> When I use queries with only ASCII characters, the results are correct:
> >>> "Al":{ "term":"Alexandre Dumas", "weight":0, "payload":""}
> >>>
> >>> When I try it with a Hungarian author name with a special character,
> >>> "Jó": "author":{ "Jó":{ "numFound":0, "suggestions":[]}}
> >>>
> >>> When I try it with three letters, it works again:
> >>> "Józ": "author":{ "Józ":{ "numFound":10, "suggestions":[
> >>> { "term":"Bajza József", "weight":0, "payload":""},
> >>> { "term":"Eötvös József", "weight":0, "payload":""},
> >>> { "term":"Eötvös József", "weight":0, "payload":""},
> >>> { "term":"Eötvös József", "weight":0, "payload":""},
> >>> { "term":"József Attila", "weight":0, "payload":""}..
> >>>
> >>> Any idea how it can happen that a longer string has more matches than a
> >>> shorter one? It is inconsistent. What can I do to fix it, as it would
> >>> result in poor customer experience?
> >>> They would feel that sometimes they need 2 and sometimes 3 characters to
> >>> get suggestions.
> >>>
> >>> Thanks in advance,
> >>> Roland
> >>>
> >>
>
>


Re: Problem with solr suggester in case of non-ASCII characters

2019-07-30 Thread Szűcs Roland
Hi Furkan,

Thanks for the suggestion; I always forget the most effective debugging tool,
the analysis page.

It turned out that "Jó" was a stop word and it was eliminated during the
text analysis. What I will do is create a new field type without
stop-word removal and use it like this:
<str name="suggestAnalyzerFieldType">short_text_hu_without_stop_removal</str>

Thanks again

Roland

Furkan KAMACI wrote (on Tue, Jul 30, 2019, 16:17):

> Hi Roland,
>
> Could you check Analysis tab (
> https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell
> how
> the term is analyzed for both query and index?
>
> Kind Regards,
> Furkan KAMACI
>
> On Tue, Jul 30, 2019 at 4:50 PM Szűcs Roland 
> wrote:
>
> > Hi All,
> >
> > I have an author suggester (searchcomponent and the related request
> > handler) defined in solrconfig:
> > [suggester, request handler, and field-type XML stripped by the archive;
> > see the reconstructed config in the original post below]
> >
> > Author field has just a minimal text processing in query and index time
> > based on the following definition:
> >
> > [fieldType and field XML stripped by the archive]
> >
> > When I use queries with only ASCII characters, the results are correct:
> > "Al":{ "term":"Alexandre Dumas", "weight":0, "payload":""}
> >
> > When I try it with a Hungarian author name with a special character,
> > "Jó": "author":{ "Jó":{ "numFound":0, "suggestions":[]}}
> >
> > When I try it with three letters, it works again:
> > "Józ": "author":{ "Józ":{ "numFound":10, "suggestions":[
> > { "term":"Bajza József", "weight":0, "payload":""},
> > { "term":"Eötvös József", "weight":0, "payload":""},
> > { "term":"Eötvös József", "weight":0, "payload":""},
> > { "term":"Eötvös József", "weight":0, "payload":""},
> > { "term":"József Attila", "weight":0, "payload":""}..
> >
> > Any idea how it can happen that a longer string has more matches than a
> > shorter one? It is inconsistent. What can I do to fix it, as it would
> > result in poor customer experience?
> > They would feel that sometimes they need 2 and sometimes 3 characters to get
> > suggestions.
> >
> > Thanks in advance,
> > Roland
> >
>


Problem with solr suggester in case of non-ASCII characters

2019-07-30 Thread Szűcs Roland
Hi All,

I have an author suggester (search component and the related request
handler) defined in solrconfig (markup reconstructed from the archived
values; the archive stripped the XML tags, so a few parameter names are
inferred):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">author</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">BOOK_productAuthor</str>
    <str name="suggestAnalyzerFieldType">short_text_hu</str>
    <str name="indexPath">suggester_infix_author</str>
    <str name="allTermsRequired">false</str>
    <str name="buildOnStartup">false</str>
    <int name="minPrefixChars">2</int>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">author</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

The author field has just minimal text processing at query and index time,
based on the following definition:

[fieldType and field XML stripped by the mail archive]
When I use queries with only ASCII characters, the results are correct:
"Al":{ "term":"Alexandre Dumas", "weight":0, "payload":""}

When I try it with a Hungarian author name with a special character,
"Jó": "author":{ "Jó":{ "numFound":0, "suggestions":[]}}

When I try it with three letters, it works again:
"Józ": "author":{ "Józ":{ "numFound":10, "suggestions":[
{ "term":"Bajza József", "weight":0, "payload":""},
{ "term":"Eötvös József", "weight":0, "payload":""},
{ "term":"Eötvös József", "weight":0, "payload":""},
{ "term":"Eötvös József", "weight":0, "payload":""},
{ "term":"József Attila", "weight":0, "payload":""}..

Any idea how it can happen that a longer string has more matches than a
shorter one? It is inconsistent. What can I do to fix it, as it would
result in poor customer experience?
They would feel that sometimes they need 2 and sometimes 3 characters to get
suggestions.

Thanks in advance,
Roland


Re: very slow frequent updates

2016-02-24 Thread Szűcs Roland
Thanks again Jeff. I will check the documentation of join queries, because I
have never used them before.
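(A sketch of the two-document idea with the join query parser, assuming small
price documents that carry only a book reference and a price; all field names
are placeholders:)

q={!join from=book_id to=id}price:[500 TO 1500]

Here the tiny price documents are the only ones reindexed on a price change,
and the join maps a match on them back to the big book documents.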

Regards

Roland

2016-02-24 19:07 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:

>
> I suspect your problem is the intersection of “very large document” and
> “high rate of change”. Either of those alone would be fine.
>
> You’re correct, if the thing you need to search or sort by is the thing
> with a high change rate, you probably aren’t going to be able to peel those
> things out of your index.
>
> Perhaps you could work something out with join queries? So you have two
> kinds of documents - book content and book price - and your high-frequency
> change is limited to documents with very little data.
>
>
>
>
>
> On 2/24/16, 4:01 AM, "roland.sz...@booknwalk.com on behalf of Szűcs
> Roland" <roland.sz...@booknwalk.com on behalf of
> szucs.rol...@bookandwalk.hu> wrote:
>
> >I have checked it already in the ref. guide. It is stated that you cannot
> >search on external fields:
> >
> https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
> >
> >Really, I am very curious whether my problem is unusual, or whether SOLR
> >mainly focuses on search rather than end-to-end support.
> >How does this approach work with 1 million documents with frequently changing
> >prices?
> >
> >Thanks your time,
> >
> >Roland
> >
> >2016-02-24 12:39 GMT+01:00 Stefan Matheis <matheis.ste...@gmail.com>:
> >
> >> Depending of what features you do actually need, might be worth a look
> >> on "External File Fields" Roland?
> >>
> >> -Stefan
> >>
> >> On Wed, Feb 24, 2016 at 12:24 PM, Szűcs Roland
> >> <szucs.rol...@bookandwalk.hu> wrote:
> >> > Thanks Jeff your help,
> >> >
> >> > Can it work in production environment? Imagine when my customer
> initiate
> >> a
> >> > query having 1 000 docs in the result set. I can not use the
> pagination
> >> of
> >> > SOLR as the field which is the basis of the sort is not included in
> the
> >> > schema for example the price. The customer wants the list in
> descending
> >> > order of the price.
> >> >
> >> > So I have to get all the 1000 docids from solr and find the metadata
> of
> >> > them in a sql database or in cache in best case. This is the way you
> >> > suggested? Is it not too slow?
> >> >
> >> > Regards,
> >> > Roland
> >> >
> >> > 2016-02-23 19:29 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:
> >> >
> >> >>
> >> >> My suggestion would be to split your problem domain. Use Solr
> >> exclusively
> >> >> for search - index the id and only those fields you need to search
> on.
> >> Then
> >> >> use some other data store for retrieval. Get the id’s from the solr
> >> >> results, and look them up in the data store to get the rest of your
> >> fields.
> >> >> This allows you to keep your solr docs as small as possible, and you
> >> only
> >> >> need to update them when a *searchable* field changes.
> >> >>
> >> >> Every “update" in solr is a delete/insert. Even the "atomic update”
> >> >> feature is just a shortcut for that. It requires stored fields
> because
> >> the
> >> >> data from the stored fields gets copied into the new insert.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On 2/22/16, 12:21 PM, "Roland Szűcs" <roland.sz...@booknwalk.com>
> >> wrote:
> >> >>
> >> >> >Hi folks,
> >> >> >
> >> >We use SOLR 5.2.1. We have ebooks stored in SOLR. The majority of the
> >> >fields do not change at all, like content, author, publisher. Only the
> >> >price field changes frequently.
> >> >> >
> >> >> >We let the customers to make full text search so we indexed the
> content
> >> >> >filed. Due to the frequency of the price updates we use the atomic
> >> update
> >> >> >feature. As a requirement of the atomic updates we have to store all
> >> the
> >> >> >fields even the content field which is 1MB/document and we did not
> >> want to
> >> >> >sto

Re: very slow frequent updates

2016-02-24 Thread Szűcs Roland
I have checked it already in the ref. guide. It is stated that you cannot
search on external fields:
https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
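(That said, if sorting by the fresh price is the actual requirement rather than
searching on it, an external file field can still drive a sort through a
function query. A sketch, with all names being placeholders:)

<!-- schema: price kept outside the index, read from an external file -->
<fieldType name="externalPrice" class="solr.ExternalFileField" keyField="id" defVal="0"/>
<field name="price_ext" type="externalPrice" indexed="false" stored="false"/>

sort=field(price_ext) desc

The values live in an external_price_ext file in the index data directory and
can be refreshed without reindexing the documents.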

Really, I am very curious whether my problem is unusual, or whether SOLR
mainly focuses on search rather than end-to-end support.
How does this approach work with 1 million documents with frequently changing
prices?

Thanks for your time,

Roland

2016-02-24 12:39 GMT+01:00 Stefan Matheis <matheis.ste...@gmail.com>:

> Depending on what features you actually need, "External File Fields" might
> be worth a look, Roland?
>
> -Stefan
>
> On Wed, Feb 24, 2016 at 12:24 PM, Szűcs Roland
> <szucs.rol...@bookandwalk.hu> wrote:
> > Thanks Jeff for your help,
> >
> > Can it work in a production environment? Imagine when my customer initiates
> > a query having 1000 docs in the result set. I cannot use the pagination
> > of SOLR, as the field which is the basis of the sort is not included in the
> > schema, for example the price. The customer wants the list in descending
> > order of the price.
> >
> > So I have to get all the 1000 doc ids from solr and find the metadata of
> > them in a sql database, or in a cache in the best case. Is this the way you
> > suggested? Is it not too slow?
> >
> > Regards,
> > Roland
> >
> > 2016-02-23 19:29 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:
> >
> >>
> >> My suggestion would be to split your problem domain. Use Solr
> exclusively
> >> for search - index the id and only those fields you need to search on.
> Then
> >> use some other data store for retrieval. Get the id’s from the solr
> >> results, and look them up in the data store to get the rest of your
> fields.
> >> This allows you to keep your solr docs as small as possible, and you
> only
> >> need to update them when a *searchable* field changes.
> >>
> >> Every “update" in solr is a delete/insert. Even the "atomic update”
> >> feature is just a shortcut for that. It requires stored fields because
> the
> >> data from the stored fields gets copied into the new insert.
> >>
> >>
> >>
> >>
> >>
> >> On 2/22/16, 12:21 PM, "Roland Szűcs" <roland.sz...@booknwalk.com>
> wrote:
> >>
> >> >Hi folks,
> >> >
> >> >We use SOLR 5.2.1. We have ebooks stored in SOLR. The majority of the
> >> >fields do not change at all, like content, author, publisher. Only the
> >> >price field changes frequently.
> >> >
> >> >We let the customers to make full text search so we indexed the content
> >> >filed. Due to the frequency of the price updates we use the atomic
> update
> >> >feature. As a requirement of the atomic updates we have to store all
> the
> >> >fields even the content field which is 1MB/document and we did not
> want to
> >> >store it just index it.
> >> >
> >> >As we wanted to update 100 documents with atomic update it took about 3
> >> >minutes. Taking into account that our metadata /document is 1 Kb and
> our
> >> >content field / document is 1MB we use 1000 more memory to accelerate
> the
> >> >update process.
> >> >
> >> >I am almost 100% sure that we make something wrong.
> >> >
> >> >What is the best practice of the frequent updates when 99% part of a
> given
> >> >document is constant forever?
> >> >
> >> >Thank in advance
> >> >
> >> >--
> >> ><https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu> Roland
> >> Szűcs
> >> ><https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu> Connect
> >> with
> >> >me on Linkedin <
> >> https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> >> ><https://bookandwalk.hu/>
> >> >CEO Phone: +36 1 210 81 13
> >> >Bookandwalk.hu <https://bokandwalk.hu/>
> >>
> >
> >
> >
> > --
> > Roland Szűcs <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> > Connect with me on Linkedin <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> > CEO Phone: +36 1 210 81 13
> > Bookandwalk.hu <https://bokandwalk.hu/>
>



-- 
Roland Szűcs <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
Connect with me on Linkedin <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
CEO Phone: +36 1 210 81 13
Bookandwalk.hu <https://bokandwalk.hu/>


Re: very slow frequent updates

2016-02-24 Thread Szűcs Roland
Thanks Jeff for your help,

Can it work in a production environment? Imagine when my customer initiates a
query having 1000 docs in the result set. I cannot use the pagination of
SOLR, as the field which is the basis of the sort is not included in the
schema, for example the price. The customer wants the list in descending
order of the price.

So I have to get all the 1000 doc ids from solr and find the metadata of
them in a sql database, or in a cache in the best case. Is this the way you
suggested? Is it not too slow?

Regards,
Roland

2016-02-23 19:29 GMT+01:00 Jeff Wartes <jwar...@whitepages.com>:

>
> My suggestion would be to split your problem domain. Use Solr exclusively
> for search - index the id and only those fields you need to search on. Then
> use some other data store for retrieval. Get the id’s from the solr
> results, and look them up in the data store to get the rest of your fields.
> This allows you to keep your solr docs as small as possible, and you only
> need to update them when a *searchable* field changes.
>
> Every “update" in solr is a delete/insert. Even the "atomic update”
> feature is just a shortcut for that. It requires stored fields because the
> data from the stored fields gets copied into the new insert.
>
>
>
>
>
> On 2/22/16, 12:21 PM, "Roland Szűcs" <roland.sz...@booknwalk.com> wrote:
>
> >Hi folks,
> >
> >We use SOLR 5.2.1. We have ebooks stored in SOLR. The majority of the
> >fields do not change at all, like content, author, publisher. Only the
> >price field changes frequently.
> >
> >We let the customers make full-text searches, so we indexed the content
> >field. Due to the frequency of the price updates we use the atomic update
> >feature. As a requirement of atomic updates we have to store all the
> >fields, even the content field, which is 1MB/document and which we did not
> >want to store, just index.
> >
> >When we wanted to update 100 documents with atomic updates, it took about 3
> >minutes. Taking into account that our metadata/document is 1 KB and our
> >content field/document is 1 MB, we use 1000x more memory to accelerate the
> >update process.
> >
> >I am almost 100% sure that we are doing something wrong.
> >
> >What is the best practice for frequent updates when 99% of a given
> >document is constant forever?
> >
> >Thank in advance
> >
> >--
> ><https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu> Roland
> Szűcs
> ><https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu> Connect
> with
> >me on Linkedin <
> https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> ><https://bookandwalk.hu/>
> >CEO Phone: +36 1 210 81 13
> >Bookandwalk.hu <https://bokandwalk.hu/>
>



-- 
Roland Szűcs <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
Connect with me on Linkedin <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
CEO Phone: +36 1 210 81 13
Bookandwalk.hu <https://bokandwalk.hu/>


AnalyzingInfixLookupFactory, EdgeNGram with multiple terms

2015-11-20 Thread Szűcs Roland
Hi all,

I have a working suggester component and request handler in my Solr 5.2.1
instance. It is working as I expected, but I need a solution which handles
multiple query terms "correctly".

I have a string field, title. Let's see the following case:
title 1: Green Apple Color
title 2: Apple the master of innovation
title 3: Apple the master of presentation
Using EdgeNGram with minGramSize=3 on a copy of the string title field, I get
the following:
suggest.q="Appl": all documents are matched; fine.

suggest.q="Apple inno": all documents are matched; wrong, as the user
expectation is to have only title 2 matched.

Is there any way to make the suggester component smart enough to handle
multi-term queries as the user expects? AnalyzingInfixLookupFactory was a great
improvement for handling terms not only at the beginning of an expression
but also in the middle or at the end.

I think if we could apply an "AND" relationship among the terms of a
multi-term query, like in the case of normal queries, it would help.

Any idea is appreciated.
-- 
Roland Szűcs <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
Connect with me on Linkedin <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
CEO Phone: +36 1 210 81 13
Bookandwalk.hu <https://bokandwalk.hu/>


Re: MoreLikeThisHandler with multiple input documents

2015-09-30 Thread Szűcs Roland
Hi Alessandro,

Exactly. The response time varies, but let's take another concrete example.
This is my call: http://localhost:8983/solr/bandwpl/mlt?q=id:10812&fl=id

This is my result:

{
  "responseHeader":{
"status":0,
"QTime":6232},
  "response":{"numFound":4564,"start":0,"docs":[
  {
"id":"11335"},
  {
"id":"14984"},
  {
"id":"13948"},
  {
"id":"11105"},
  {
"id":"12122"},
  {
"id":"12315"},
  {
"id":"19145"},
  {
"id":"11843"},
  {
"id":"11640"},
  {
"id":"19053"}]
  },
  "interestingTerms":[
"content:hinduski",1.0,
"content:hindus",1.0174515,
"content:głowa",1.0453196,
"content:życie",1.0666888,
"content:czas",1.0824177,
"content:kobieta",1.0927386,
"content:indie",1.119314,
"content:quentin",1.1349105,
"content:madras",1.239089,
"content:musieć",1.2626213,
"content:matka",1.2966589,
"content:chcieć",1.299024,
"content:domu",1.3370595,
"content:stać",1.4053295,
"content:sari",1.4284334,
"content:ojciec",1.4596463,
"content:lindsay",1.5857035,
"content:wiedzieć",1.6952671,
"content:powiedzieć",1.8430523,
"content:baba",1.8915937,
"content:mieć",2.1113522,
"content:Nata",2.4373012,
"content:Gopal",2.518996,
"content:david",3.0211911,
"content:Trixie",7.082156]}


Cheers,

Roland


2015-09-30 10:16 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>
:

> I am still missing why you quote the number of documents...
> If you have 5600 Polish books, but you use the MLT only when you land on
> the page of a specific book...
> I think I am still missing the point!
> MLT on 1 Polish book takes 7 secs?
>
>
> 2015-09-30 9:10 GMT+01:00 Szűcs Roland <szucs.rol...@bookandwalk.hu>:
>
> > Hi Alessandro,
> >
> > You are right, I forgot to mention one important factor. For 3000 Hungarian
> > e-books the approach you mentioned is absolutely fine, as the response time
> > is some 0.7 sec. But when I use the same mlt for 5600 Polish e-books, the
> > response time is 7 sec, which is definitely not acceptable for the users.
> >
> > Regards,
> > Roland
> >
> > 2015-09-29 17:19 GMT+02:00 Alessandro Benedetti <
> > benedetti.ale...@gmail.com>
> > :
> >
> > > Hi Roland,
> > > you said "The main goal is that when a customer is on the product page".
> > > But if you are on a product page, I guess you have the product id.
> > > If you have the product id, you can simply execute the MLT request with
> > > the single doc id as input.
> > >
> > > Why do you need to calculate beforehand?
> > >
> > > Cheers
> > >
> > > 2015-09-29 15:44 GMT+01:00 Szűcs Roland <szucs.rol...@bookandwalk.hu>:
> > >
> > > > Hello Upayavira,
> > > >
> > > > The main goal is that when a customer is on the product page of an
> > > > e-book and somehow does not like it, I want to immediately offer him
> > > > or her alternative e-books on the same topic. If I expect the customer
> > > > to click on a button like "similar e-books", I lose half of them, as
> > > > they are lazy to click anywhere. So I would like to present the
> > > > alternatives of the e-books on the product pages without clicking.
> > > >
> > > > I assumed the best idea was to calculate the similar e-books for all
> > > > the others (an n*(n-1) similarity calculation) and present only the
> > > > top 5. I planned to do it when our server is not busy. At this point I
> > > > found the description of mlt as a search component, which seemed to be
> > > > a good candidate, as it calculates the similar documents for the whole
> > > > result set of the query. So if I say q=*:* and the mlt component is
> > > > enabled, I get similar documents for my entire document set. The only
> > > > problem with this approach was that the mlt search component does not
> > > > give back the

Re: MoreLikeThisHandler with multiple input documents

2015-09-30 Thread Szűcs Roland
Hi Alessandro,

You are right, I forgot to mention one important factor. For 3000 Hungarian
e-books the approach you mentioned is absolutely fine, as the response time
is some 0.7 sec. But when I use the same mlt for 5600 Polish e-books, the
response time is 7 sec, which is definitely not acceptable for the users.

Regards,
Roland

2015-09-29 17:19 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>
:

> Hi Roland,
> you said "The main goal is that when a customer is on the product page".
> But if you are on a product page, I guess you have the product id.
> If you have the product id, you can simply execute the MLT request with
> the single doc id as input.
>
> Why do you need to calculate beforehand?
>
> Cheers
>
> 2015-09-29 15:44 GMT+01:00 Szűcs Roland <szucs.rol...@bookandwalk.hu>:
>
> > Hello Upayavira,
> >
> > The main goal is that when a customer is on the product page of an e-book
> > and somehow does not like it, I want to immediately offer him or her
> > alternative e-books on the same topic. If I expect the customer to
> > click on a button like "similar e-books", I lose half of them, as they are
> > lazy to click anywhere. So I would like to present the alternatives of the
> > e-books on the product pages without clicking.
> >
> > I assumed the best idea was to calculate the similar e-books for all the
> > others (an n*(n-1) similarity calculation) and present only the top 5. I
> > planned to do it when our server is not busy. At this point I found the
> > description of mlt as a search component, which seemed to be a good
> > candidate, as it calculates the similar documents for the whole result set
> > of the query. So if I say q=*:* and the mlt component is enabled, I get
> > similar documents for my entire document set. The only problem with this
> > approach was that the mlt search component does not give back the
> > interesting terms for my tag cloud calculation.
> >
> > That's why I tried to mix the flexibility of the mlt component (multiple
> > docs accepted as input) with the robustness of MoreLikeThisHandler (having
> > interesting terms).
> >
> > If there is no solution, I will use the mlt component and solve the tag
> > cloud calculation another way. By the way, if I am not mistaken, the 5.3.1
> > version takes the union of the feature sets of the mlt component and
> > handler.
> >
> > Best Regards,
> > Roland
> >
> >
> >
> > 2015-09-29 14:38 GMT+02:00 Upayavira <u...@odoko.co.uk>:
> >
> > > Let's take a step back. So, you have 3000 or so docs, and you want to
> > > know which documents are similar to these.
> > >
> > > Why do you want to know this? What feature do you need to build that
> > > will use that information? Knowing this may help us to arrive at the
> > > right technology for you.
> > >
> > > For example, you might want to investigate offline clustering
> algorithms
> > > (e.g. [1], which might be a bit dense to follow). A good book on
> machine
> > > learning if you are okay with Python is "Programming Collective
> > > Intelligence" as it explains the usual algorithms with simple for loops
> > > making it very clear.
> > >
> > > Or, you could do searches, and then cluster the results at search time
> > > (so if you search for 100 docs, it will identify clusters within those
> > > 100 matching documents). That might get you there. See [2]
> > >
> > > So, if you let us know what the end-goal is, perhaps we can suggest an
> > > alternative approach, rather than burying ourselves neck-deep in MLT
> > > problems.
> > >
> > > Upayavira
> > >
> > > [1]
> > >
> > >
> >
> http://mylazycoding.blogspot.co.uk/2012/03/cluster-apache-solr-data-using-apache_13.html
> > > [2] https://cwiki.apache.org/confluence/display/solr/Result+Clustering
> > >
> > > On Tue, Sep 29, 2015, at 12:42 PM, Szűcs Roland wrote:
> > > > Hello Upayavira,
> > > >
> > > > Thanks for dealing with my issue. I have already applied
> > > > termVectors=true to all fields involved in the more-like-this
> > > > calculation. I have just 3000 documents, each of them represented by a
> > > > relatively big term vector with more than 20 000 unique terms. If I
> > > > run the more like this handler for a solr doc, it takes close to 1 sec
> > > > to get back the first 10 similar documents.

Re: MoreLikeThisHandler with multiple input documents

2015-09-30 Thread Szűcs Roland
Hello Upayavira,

We use the ajax call, and it can work when it takes only some seconds (even
the 7 sec can be acceptable in this case), as the customers first focus on
the product page and only need the offer if they are not satisfied with the
e-book. I am just starting to worry about what will happen if we move to
the market of English e-books with 1 million titles. I will try the
clustering as well, or we can implement our own more like this calculation
using the termvector component, as we realized that sometimes fewer than 25
interesting terms are enough to make a good recommendation, and that can
make the calculation faster. If you look at my previous email with the
interesting terms, it shows clearly that half of the terms would be enough,
or even fewer. What a pity that there is no such parameter for the more
like this handler as mlt.interestingtermcount, which would default to 25
but could be lowered in solrconfig to make the calculation less resource
intensive.
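
Just to make the idea concrete, here is a rough, untested sketch of what I
mean (the core and field names are placeholders, the /tvrh handler has to
be wired up in solrconfig.xml, and real code would also escape special
characters in the terms):

// Sketch: a cheaper "more like this" built from the TermVectorComponent.
// Fetch tf-idf scores for one document, keep only the top N terms, and
// turn them into a plain boolean query. "content" is a placeholder field
// that would need termVectors="true" in the schema.
const SOLR = "http://localhost:8983/solr/mycore";

async function topTerms(docId, n) {
  const url = SOLR + "/tvrh?q=id:" + docId +
    "&tv.fl=content&tv.tf_idf=true&wt=json&json.nl=map";
  const rsp = await (await fetch(url)).json();
  const doc = Object.values(rsp.termVectors)[0]; // the single matched doc
  return Object.entries(doc.content)             // term -> {tf, df, "tf-idf"}
    .map(([term, stats]) => [term, stats["tf-idf"]])
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([term]) => term);
}

async function cheapMoreLikeThis(docId) {
  const terms = await topTerms(docId, 25);       // the ~25 terms from above
  const q = "content:(" + terms.join(" OR ") + ")";
  const url = SOLR + "/select?q=" + encodeURIComponent(q) +
    "&fq=-id:" + docId + "&rows=5&fl=id";
  return (await fetch(url)).json();
}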

Thank you Upayavira and Alessandro for all the help and effort. I see the
options much clearer now.

Cheers,
Roland

2015-09-30 10:23 GMT+02:00 Upayavira <u...@odoko.co.uk>:

> Could you do the MLT as a separate (AJAX) request? They appear a little
> afterwards, whilst the user is already reading the page?
>
> Or, you could do offline clustering, in which case, overnight, you
> compare every document with every other, using a (likely non-solr)
> clustering algorithm, and store those in a separate core. Then you can
> request those immediately after your search query. Or reindex your
> content with that data stored alongside.
>
> Upayavira
>
> On Wed, Sep 30, 2015, at 09:16 AM, Alessandro Benedetti wrote:
> > I am still missing why you quote the number of the documents...
> > If you have 5600 Polish books, but you use the MLT only when you land on
> > the page of a specific book ...
> > I think I still miss the point!
> > MLT on 1 Polish book takes 7 secs?
> >
> >
> > 2015-09-30 9:10 GMT+01:00 Szűcs Roland <szucs.rol...@bookandwalk.hu>:
> >
> > > Hi Alessandro,
> > >
> > > You are right. I forgot to mention one important factor. For the 3000
> > > Hungarian e-books the approach you mentioned is absolutely fine, as the
> > > response time is some 0.7 sec. But when I use the same mlt for 5600
> > > Polish e-books, the response time is 7 sec, which is definitely not
> > > acceptable for the users.
> > >
> > > Regards,
> > > Roland
> > >
> > > 2015-09-29 17:19 GMT+02:00 Alessandro Benedetti <
> > > benedetti.ale...@gmail.com>
> > > :
> > >
> > > > Hi Roland,
> > > > you said "The main goal is that when a customer is on the product
> > > > page".
> > > > But if you are in a  product page, I guess you have the product Id.
> > > > If you have the product id , you can simply execute the MLT request
> with
> > > > the single Doc Id in input.
> > > >
> > > > Why do you need to calculate beforehand?
> > > >
> > > > Cheers
> > > >
> > > > 2015-09-29 15:44 GMT+01:00 Szűcs Roland <szucs.rol...@bookandwalk.hu
> >:
> > > >
> > > > > Hello Upayavira,
> > > > >
> > > > > The main goal is that when a customer is on the product page of an
> > > > > e-book and he does not like it somehow, I want to immediately offer
> > > > > her/him alternative e-books in the same topic. If I expect the
> > > > > customer to click on a button like "similar e-books", I lose half of
> > > > > them as they are too lazy to click anywhere. So I would like to
> > > > > present the alternatives of the e-books on the product pages without
> > > > > clicking.
> > > > >
> > > > > I assumed the best idea was to calculate the similar e-books for
> > > > > all the others (n*(n-1) similarity calculations) and present only
> > > > > the top 5. I planned to do it when our server is not busy. At this
> > > > > point I found the description of mlt as a search component, which
> > > > > seemed to be a good candidate as it calculates the similar documents
> > > > > for the whole result set of the query. So if I say q=*:* and the mlt
> > > > > component is enabled, I get similar documents for my entire document
> > > > > set. The only problem with this approach was that the mlt
> > > > > s

Re: MoreLikeThisHandler with multiple input documents

2015-09-29 Thread Szűcs Roland
Hi Alessandro,

My original goal was to get offline suggestions on content-based similarity
for every e-book we have. We wanted to run a bulk more like this
calculation in the evening, when the usage of our site is low and we submit
a new e-book. Real time more like this can take a while, as we have
typically long documents (2-5MB text) with all the content indexed.

When we upload a new document, we wanted to recalculate the more like this
suggestions and the tf-idf based tag clouds. Both of them are delivered by
the MoreLikeThisHandler, but only for one document, as you wrote.

The text input is not good for us because we need the similar doc list for
each of the matched documents. If I put together the text of 10 documents,
I cannot separate which suggestion relates to which matched document, and
the tag cloud will also belong to the mixed text.

Most likely we will use the MoreLikeThisHandler for each of the documents,
parse the JSON response and store the result in an SQL database, roughly
along the lines of the sketch below.
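
(Untested sketch; the core name is a placeholder and saveSimilar() only
stands in for the real database write:)

// Nightly job idea: one MoreLikeThisHandler call per book, keeping the
// top 5 similar docs plus the interesting terms for the tag cloud.
const SOLR = "http://localhost:8983/solr/mycore";

async function allIds() {
  const rsp = await (await fetch(SOLR + "/select?q=*:*&fl=id&rows=10000")).json();
  return rsp.response.docs.map(d => d.id);
}

async function saveSimilar(id, docs, terms) {
  console.log(id, docs.map(d => d.id), terms); // stand-in for the DB write
}

async function buildSimilarities() {
  for (const id of await allIds()) {
    const url = SOLR + "/mlt?q=id:" + id +
      "&fl=id&rows=5&mlt.interestingTerms=details&wt=json";
    const rsp = await (await fetch(url)).json();
    await saveSimilar(id, rsp.response.docs, rsp.interestingTerms);
  }
}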

Thanks for your help.

2015-09-29 11:18 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>
:

> Hi Roland,
> what is your exact requirement ?
> Do you want to basically build a "description" for a set of documents and
> then find documents in the index, similar to this description ?
>
> By default , based on my experience ( and on the code) this is the entry
> point for the Lucene More Like This :
>
>
> > org.apache.lucene.queries.mlt.MoreLikeThis:
> >
> > /**
> >  * Return a query that will return docs like the passed lucene document ID.
> >  *
> >  * @param docNum the documentID of the lucene doc to generate the
> >  *               'More Like This' query for.
> >  * @return a query that will return docs like the passed lucene document ID.
> >  */
> > public Query like(int docNum) throws IOException {
> >   if (fieldNames == null) {
> >     // gather list of valid fields from lucene
> >     Collection<String> fields = MultiFields.getIndexedFields(ir);
> >     fieldNames = fields.toArray(new String[fields.size()]);
> >   }
> >   return createQuery(retrieveTerms(docNum));
> > }
>
> It means that talking about "documents" you can feed only one Solr doc.
>
> But you can also feed the MLT with simple text.
>
> So you should study your use case better and understand which option
> fits better:
>
> 1) customising the MLT component starting from Lucene
>
> 2) doing some processing client side and use the "text" similarity feature.
>
>
> Cheers
>
>
> 2015-09-29 10:05 GMT+01:00 Roland Szűcs <roland.sz...@bookandwalk.com>:
>
> > Hi all,
> >
> > Is it possible to feed multiple solr id for a MoreLikeThisHandler?
> >
> > <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
> >   <lst name="defaults">
> >     <bool name="mlt.match.include">false</bool>
> >     <str name="mlt.interestingTerms">details</str>
> >     <str name="mlt.fl">title,content</str>
> >     <int name="mlt.mintf">4</int>
> >     <str name="mlt.qf">title^12 content^1</str>
> >     <int name="mlt.mindf">2</int>
> >     <int name="rows">10</int>
> >     <bool name="mlt.boost">true</bool>
> >     <str name="wt">json</str>
> >     <bool name="indent">true</bool>
> >   </lst>
> > </requestHandler>
> >
> > when I call this: http://localhost:8983/solr/bandwhu/mlt?q=id:8&fl=id
> > it works fine. Is there any way to have a kind of "bulk" call of the
> > more like this handler? I need the interesting terms as well, and as far
> > as I know, if I use more like this as a search component it does not
> > return them, so it is not an alternative.
> >
> > Thanks in advance,
> >
> >
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>





Re: MoreLikeThisHandler with multiple input documents

2015-09-29 Thread Szűcs Roland
Hello Upayavira,

Thanks for dealing with my issue. I have already applied termVectors=true
to all fields involved in the more like this calculation. I have just 3 000
documents, each of them represented by a relatively big term vector with
more than 20 000 unique terms. If I run the more like this handler for a
solr doc, it takes close to 1 sec to get back the first 10 similar
documents. After this I have to pass the doc ids to my other application,
which finds the cover of the e-book and other metadata and puts it on the
web. The end-to-end process takes too much time from a customer
perspective; that is why I tried to find a solution for offline more like
this calculation. But if my app has to call the MoreLikeThisHandler for
each doc, it puts overhead on the offline calculation.

Best Regards,
Roland

2015-09-29 13:01 GMT+02:00 Upayavira <u...@odoko.co.uk>:

> If MoreLikeThis is slow for large documents that are indexed, have you
> enabled term vectors on the similarity fields?
>
> Basically, what more like this does is this:
>
> * decide on what terms in the source doc are "interesting", and pick the
> 25 most interesting ones
> * build and execute a boolean query using these interesting terms.
>
> Looking at the first phase of this in more detail:
>
> If you pass in a document using stream.body, it will analyse this
> document into terms, and then calculate the most interesting terms from
> that.
>
> If you reference document in your index with a field that is stored, it
> will take the stored version, and analyse it and identify the
> interesting terms from there.
>
> If, however, you have stored term vectors against that field, this work
> is not needed. You have already done much of the work, and the
> identification of your "interesting terms" will be much faster.
>
> Thus, on the content field of your documents, add termVectors="true" in
> your schema, and re-index. Then you could well find MLT becoming a lot
> more efficient.
>
> Upayavira
>
> On Tue, Sep 29, 2015, at 10:39 AM, Szűcs Roland wrote:
> > Hi Alessandro,
> >
> > My original goal was to get offline suggestions on content-based
> > similarity for every e-book we have. We wanted to run a bulk more like
> > this calculation in the evening, when the usage of our site is low and
> > we submit a new e-book. Real time more like this can take a while, as we
> > have typically long documents (2-5MB text) with all the content indexed.
> >
> > When we upload a new document, we wanted to recalculate the more like
> > this suggestions and the tf-idf based tag clouds. Both of them are
> > delivered by the MoreLikeThisHandler, but only for one document, as you
> > wrote.
> >
> > The text input is not good for us because we need the similar doc list
> > for each of the matched documents. If I put together the text of 10
> > documents, I cannot separate which suggestion relates to which matched
> > document, and the tag cloud will also belong to the mixed text.
> >
> > Most likely we will use the MoreLikeThisHandler for each of the
> > documents, parse the JSON response and store the result in an SQL
> > database.
> >
> > Thanks for your help.
> >
> > 2015-09-29 11:18 GMT+02:00 Alessandro Benedetti
> > <benedetti.ale...@gmail.com>
> > :
> >
> > > Hi Roland,
> > > what is your exact requirement ?
> > > Do you want to basically build a "description" for a set of documents
> and
> > > then find documents in the index, similar to this description ?
> > >
> > > By default , based on my experience ( and on the code) this is the
> entry
> > > point for the Lucene More Like This :
> > >
> > >
> > > > org.apache.lucene.queries.mlt.MoreLikeThis:
> > > >
> > > > /**
> > > >  * Return a query that will return docs like the passed lucene document ID.
> > > >  *
> > > >  * @param docNum the documentID of the lucene doc to generate the
> > > >  *               'More Like This' query for.
> > > >  * @return a query that will return docs like the passed lucene document ID.
> > > >  */
> > > > public Query like(int docNum) throws IOException {
> > > >   if (fieldNames == null) {
> > > >     // gather list of valid fields from lucene
> > > >     Collection<String> fields = MultiFields.getIndexedFields(ir);
> > > >     fieldNames = fields.toArray(new String[fields.size()]);
> > > >   }
> > > >   return createQuery(retrieveTer

Re: MoreLikeThisHandler with multiple input documents

2015-09-29 Thread Szűcs Roland
Hello Upayavira,

The main goal is that when a customer is on the product page of an e-book
and he does not like it somehow, I want to immediately offer her/him
alternative e-books in the same topic. If I expect the customer to click on
a button like "similar e-books", I lose half of them as they are too lazy
to click anywhere. So I would like to present the alternatives of the
e-books on the product pages without clicking.

I assumed the best idea was to calculate the similar e-books for all the
others (n*(n-1) similarity calculations) and present only the top 5. I
planned to do it when our server is not busy. At this point I found the
description of mlt as a search component, which seemed to be a good
candidate as it calculates the similar documents for the whole result set
of the query. So if I say q=*:* and the mlt component is enabled, I get
similar documents for my entire document set, something like the request
sketched below. The only problem with this approach is that the mlt search
component does not give back the interesting terms for my tag cloud
calculation.
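
(The field names here are only placeholders from our schema, so treat this
as a sketch:)

// The MLT *search component* enabled on a normal query, so every matched
// document comes back with its own list of similar documents.
const params = new URLSearchParams({
  q: "*:*",
  fl: "id",
  rows: "100",
  mlt: "true",                // turn on the MoreLikeThis component
  "mlt.fl": "title,content",  // placeholder similarity fields
  "mlt.count": "5",           // similar docs returned per matched doc
});

fetch("http://localhost:8983/solr/mycore/select?" + params)
  .then(r => r.json())
  // the moreLikeThis section is keyed by the uniqueKey of each matched doc
  .then(rsp => console.log(rsp.moreLikeThis));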

That's why I tried to mix the flexibility of the mlt component (multiple
docs accepted as input) with the robustness of the MoreLikeThisHandler
(having interesting terms).

If there is no solution, I will use the mlt component and solve the tag
cloud calculation another way. By the way, if I am not mistaken, the 5.3.1
version takes the union of the feature sets of the mlt component and the
handler.

Best Regards,
Roland



2015-09-29 14:38 GMT+02:00 Upayavira <u...@odoko.co.uk>:

> Let's take a step back. So, you have 3000 or so docs, and you want to
> know which documents are similar to these.
>
> Why do you want to know this? What feature do you need to build that
> will use that information? Knowing this may help us to arrive at the
> right technology for you.
>
> For example, you might want to investigate offline clustering algorithms
> (e.g. [1], which might be a bit dense to follow). A good book on machine
> learning if you are okay with Python is "Programming Collective
> Intelligence" as it explains the usual algorithms with simple for loops
> making it very clear.
>
> Or, you could do searches, and then cluster the results at search time
> (so if you search for 100 docs, it will identify clusters within those
> 100 matching documents). That might get you there. See [2]
>
> So, if you let us know what the end-goal is, perhaps we can suggest an
> alternative approach, rather than burying ourselves neck-deep in MLT
> problems.
>
> Upayavira
>
> [1]
>
> http://mylazycoding.blogspot.co.uk/2012/03/cluster-apache-solr-data-using-apache_13.html
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Clustering
>
> On Tue, Sep 29, 2015, at 12:42 PM, Szűcs Roland wrote:
> > Hello Upayavira,
> >
> > Thanks for dealing with my issue. I have already applied
> > termVectors=true to all fields involved in the more like this
> > calculation. I have just 3 000 documents, each of them represented by a
> > relatively big term vector with more than 20 000 unique terms. If I run
> > the more like this handler for a solr doc, it takes close to 1 sec to
> > get back the first 10 similar documents. After this I have to pass the
> > doc ids to my other application, which finds the cover of the e-book and
> > other metadata and puts it on the web. The end-to-end process takes too
> > much time from a customer perspective; that is why I tried to find a
> > solution for offline more like this calculation. But if my app has to
> > call the MoreLikeThisHandler for each doc, it puts overhead on the
> > offline calculation.
> >
> > Best Regards,
> > Roland
> >
> > 2015-09-29 13:01 GMT+02:00 Upayavira <u...@odoko.co.uk>:
> >
> > > If MoreLikeThis is slow for large documents that are indexed, have you
> > > enabled term vectors on the similarity fields?
> > >
> > > Basically, what more like this does is this:
> > >
> > > * decide on what terms in the source doc are "interesting", and pick
> the
> > > 25 most interesting ones
> > > * build and execute a boolean query using these interesting terms.
> > >
> > > Looking at the first phase of this in more detail:
> > >
> > > If you pass in a document using stream.body, it will analyse this
> > > document into terms, and then calculate the most interesting terms from
> > > that.
> > >
> > > If you reference document in your index with a field that is stored, it
> > > will take the stored version, and analyse it and identify the
> > > interesting terms from there.
> > >
> > > If, however, you have stored term vectors against that field, this

start solr 5.3.1 under windows and admin GUI shows 5.2.1 is running

2015-09-26 Thread Szűcs Roland
Hi guys,

I downloaded the latest version of solr to my computer. When I started solr
as a standalone process on the default port, two strange things happened:
1. I got an error message: Failed to parse command line arguments due to:
Unrecognized option: -maxWaitSecs. I did not use any argument when I
started solr, just the start command.

2. When I go to localhost in the web browser, I see the attached picture.
It shows that I am running version 5.2.1, although all the environment
variables refer to a subdirectory of the 5.3.1 installation.

Any idea?

Best Regards




Re: commit of xml update by AJAX

2015-08-30 Thread Szűcs Roland
Thanks Erick,

Your blog post made it clear. It was looong, but not too long.

Roland

2015-08-29 19:00 GMT+02:00 Erick Erickson erickerick...@gmail.com:

 1 My first guess is that your autocommit
 section in solrconfig.xml has <openSearcher>false</openSearcher>.
 So the commitWithin happened but a new searcher
 was not opened, thus the document is invisible.
 Try issuing a separate commit or change that value
 in solrconfig.xml and try again.

 Here's a long post on all this:

 https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 2 No clue since I'm pretty ajax-ignorant.

 3 because curl is easily downloadable at worst and most often
 already on someone's machine, so it lets people at least get started.
 Pretty soon, though, for production situations people will use SolrJ
 or the like, or use one of the off-the-shelf tools packaged around
 Solr.

 Best
 Erick

 On Sat, Aug 29, 2015 at 9:30 AM, Szűcs Roland
 szucs.rol...@bookandwalk.hu wrote:
  Hello SOLR experts,
 
  I am new to solr, as you will see from my problem. I am just trying to
  understand how solr works. I use one core (BandW) on my local machine and
  I use javascript for my learning purposes.
 
  I have a test schema.xml with two fields: id, title. I managed to run
  queries with faceting, autocomplete, etc. In all cases I used the Ajax
  POST method. For example my search was
  (searchWithSuggest.searchAjaxRequest is an XMLHttpRequest object):
  var s=document.getElementById(searchWithSuggest.inputBoxId).value;
  var params='q='+s+'&start=0&rows=10';
  a=searchWithSuggest.solrServer+'/query';
  searchWithSuggest.searchAjaxRequest.open("POST",a, true);
  searchWithSuggest.searchAjaxRequest.setRequestHeader("Content-type",
  "application/x-www-form-urlencoded");
  searchWithSuggest.searchAjaxRequest.send(encodeURIComponent(params));
 
  It worked fine. I thought that an xml update could work the same way, so
  I tried to add and index one new document by xml (a is an XMLHttpRequest
  object):
  a.open("POST","http://localhost:8983/solr/bandw/update",true);
  a.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
  a.send(encodeURIComponent("stream.body=<add commitWithin='5000'><doc><field
  name='id'>3222</field><field name='title'>Blade</field></doc></add>"));
 
  I got a response with the error: missing content stream.
 
  I changed only the a.open function call to this one:
  a.open("POST","http://localhost:8983/solr/bandw/update?commit=true",true);
  the rest did not change.
  Finally, I got a response with no error from SOLR. Later it turned out
  that the new doc was not indexed at all.
 
  My questions:
  1. If I get no error from solr, what is wrong with the second solution
  and how can I fix it?
  2. Is there any solution to put all the parameters into the a.send call,
  as in the case of queries? I tried
  a.send(encodeURIComponent("commit=true&stream.body=<add
  commitWithin='5000'><doc><field name='id'>3222</field><field
  name='title'>Blade</field></doc></add>")); but it was not working.
  3. Why do 95% of the examples in the SOLR wiki pages relate to curl? Is
  this the most efficient alternative? Is there a mapping between a curl
  syntax and the post request?
 
  Best Regards,
  Roland
 






Re: commit of xml update by AJAX

2015-08-30 Thread Szűcs Roland
Hi Upayavira,

You were right. I only had to replace the Content-type with application/xml
and it worked correctly; the working call looks like the snippet below.
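
(A minimal version, with my test id/title data:)

// Working update: send a real XML body with an XML Content-type, and ask
// Solr for a commit so the new doc becomes visible to searches.
var a = new XMLHttpRequest();
a.open("POST", "http://localhost:8983/solr/bandw/update?commit=true", true);
a.setRequestHeader("Content-type", "application/xml");
a.send("<add><doc>" +
       "<field name='id'>3222</field>" +
       "<field name='title'>Blade</field>" +
       "</doc></add>");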

Roland

2015-08-30 11:22 GMT+02:00 Upayavira u...@odoko.co.uk:



 On Sat, Aug 29, 2015, at 05:30 PM, Szűcs Roland wrote:
  Hello SOLR experts,
 
  I am new to solr, as you will see from my problem. I am just trying to
  understand how solr works. I use one core (BandW) on my local machine and
  I use javascript for my learning purposes.
 
  I have a test schema.xml with two fields: id, title. I managed to run
  queries with faceting, autocomplete, etc. In all cases I used the Ajax
  POST method. For example my search was
  (searchWithSuggest.searchAjaxRequest is an XMLHttpRequest object):
  var s=document.getElementById(searchWithSuggest.inputBoxId).value;
  var params='q='+s+'&start=0&rows=10';
  a=searchWithSuggest.solrServer+'/query';
  searchWithSuggest.searchAjaxRequest.open("POST",a, true);
  searchWithSuggest.searchAjaxRequest.setRequestHeader("Content-type",
  "application/x-www-form-urlencoded");
  searchWithSuggest.searchAjaxRequest.send(encodeURIComponent(params));
 
  It worked fine. I thought that an xml update could work the same way, so
  I tried to add and index one new document by xml (a is an XMLHttpRequest
  object):
  a.open("POST","http://localhost:8983/solr/bandw/update",true);
  a.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
  a.send(encodeURIComponent("stream.body=<add commitWithin='5000'><doc><field
  name='id'>3222</field><field name='title'>Blade</field></doc></add>"));
 
  I got a response with the error: missing content stream.
 
  I changed only the a.open function call to this one:
  a.open("POST","http://localhost:8983/solr/bandw/update?commit=true",true);
  the rest did not change.
  Finally, I got a response with no error from SOLR. Later it turned out
  that the new doc was not indexed at all.
 
  My questions:
  1. If I get no error from solr, what is wrong with the second solution
  and how can I fix it?
  2. Is there any solution to put all the parameters into the a.send call,
  as in the case of queries? I tried
  a.send(encodeURIComponent("commit=true&stream.body=<add
  commitWithin='5000'><doc><field name='id'>3222</field><field
  name='title'>Blade</field></doc></add>")); but it was not working.
  3. Why do 95% of the examples in the SOLR wiki pages relate to curl? Is
  this the most efficient alternative? Is there a mapping between a curl
  syntax and the post request?
 
  Best Regards,
  Roland

 You're using a POST to fake a GET - just make the Content-type text/xml
 (or application/xml, I forget) and call a.send("<add></add>");

 You may need the encodeURIComponent, not sure.

 The stream.body feature allows you to do an HTTP GET that has a stream
 within it, but you are already doing a POST so it isn't needed.

 Upayavira




-- 
https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/huSzűcs Roland
https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/huIsmerkedjünk
meg a Linkedin https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu
-en https://bookandwalk.hu/ÜgyvezetőTelefon: +36 1 210 81 13Bookandwalk.hu
https://bokandwalk.hu/


commit of xml update by AJAX

2015-08-29 Thread Szűcs Roland
Hello SOLR experts,

I am new to solr, as you will see from my problem. I am just trying to
understand how solr works. I use one core (BandW) on my local machine and I
use javascript for my learning purposes.

I have a test schema.xml with two fields: id, title. I managed to run
queries with faceting, autocomplete, etc. In all cases I used the Ajax POST
method. For example my search was (searchWithSuggest.searchAjaxRequest is
an XMLHttpRequest object):
var s=document.getElementById(searchWithSuggest.inputBoxId).value;
var params='q='+s+'&start=0&rows=10';
a=searchWithSuggest.solrServer+'/query';
searchWithSuggest.searchAjaxRequest.open("POST",a, true);
searchWithSuggest.searchAjaxRequest.setRequestHeader("Content-type",
"application/x-www-form-urlencoded");
searchWithSuggest.searchAjaxRequest.send(encodeURIComponent(params));

It worked fine. I thought that an xml update could work the same way, so I
tried to add and index one new document by xml (a is an XMLHttpRequest
object):
a.open("POST","http://localhost:8983/solr/bandw/update",true);
a.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
a.send(encodeURIComponent("stream.body=<add commitWithin='5000'><doc><field
name='id'>3222</field><field name='title'>Blade</field></doc></add>"));

I got a response with the error: missing content stream.

I changed only the a.open function call to this one:
a.open("POST","http://localhost:8983/solr/bandw/update?commit=true",true);
the rest did not change.
Finally, I got a response with no error from SOLR. Later it turned out that
the new doc was not indexed at all.

My questions:
1. If I get no error from solr, what is wrong with the second solution and
how can I fix it?
2. Is there any solution to put all the parameters into the a.send call, as
in the case of queries? I tried
a.send(encodeURIComponent("commit=true&stream.body=<add
commitWithin='5000'><doc><field name='id'>3222</field><field
name='title'>Blade</field></doc></add>")); but it was not working.
3. Why do 95% of the examples in the SOLR wiki pages relate to curl? Is
this the most efficient alternative? Is there a mapping between a curl
syntax and the post request?

Best Regards,
Roland



Re: multiple but identical suggestions in autocomplete

2015-08-04 Thread Szűcs Roland
Hello Nutch Solr user,

You are right, I use DocumentDictionaryFactory, as you can see in my
solrconfig file:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggest_publisher</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">publisher</str>
    <str name="suggestAnalyzerFieldType">text_hu_suggest_ngram</str>
    <str name="indexPath">suggester_infix_dir_publisher</str>
    <str name="weightField">price</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
You wrote that you have developed a service between the ui and solr.
How can I use that one if I use javascript / ajax on the client side?
Until then I will probably deduplicate the suggestions in the browser, as
in the sketch below.
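
(A rough, untested sketch, assuming the default JSON layout of the suggest
response:)

// Collapse duplicate suggester terms in the browser before rendering;
// rsp is the parsed JSON of a /suggest response.
function uniqueSuggestions(rsp, dictionary, query) {
  var suggestions = rsp.suggest[dictionary][query].suggestions;
  var seen = {};
  return suggestions.filter(function (s) {
    if (seen[s.term]) return false;
    seen[s.term] = true;
    return true;
  });
}
// e.g. uniqueSuggestions(rsp, "suggest_publisher", "Har")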

Thanks,
Roland

2015-08-04 16:25 GMT+02:00 Nutch Solr User nutchsolru...@gmail.com:

 May be you are using DocumentDictionaryFactory, because
 HighFrequencyDictionaryFactory will never return duplicate terms.

 We also had the same problem with *DocumentDictionaryFactory +
 AnalyzingInfixSuggester*. We have created one service between the UI and
 Solr which groups duplicate suggestions and returns a list to the UI that
 contains only unique suggestions.



 -
 Nutch Solr User

 The ultimate search engine would basically understand everything in the
 world, and it would always give you the right thing.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/multiple-but-identical-suggestions-in-autocomplete-tp4220055p4220727.html
 Sent from the Solr - User mailing list archive at Nabble.com.






multiple but identical suggestions in autocomplete

2015-07-31 Thread Szűcs Roland
Hello Guys,

I use SOLR 5.2.1 and the relatively new solr.SuggestComponent. It worked
fine at the beginning. I use this function to auto-complete the publisher
names. I have 3000 documents and 80 publishers. When I use the autocomplete
feature, I get back the name of each matched publisher as many times as the
number of book titles they published.

If suggest.q=Har and the Harlequin publisher has 100 documents, I get back
a json with 100 suggestions carrying the same publisher name. Obviously
that is not my intention. I would like to get back the matched publisher
name only once, and later I will apply a filter query on the selected
publisher name.

Any idea how I can get identical suggestions only once? Is there any
parameter I can set in solrconfig.xml to solve this?

Thanks in advance,




Re: autosuggest with solr.EdgeNGramFilterFactory no result found

2015-07-07 Thread Szűcs Roland
Thanx Erick,

Your blog article was the perfect answer to my problem.

Rgds,

Roland

2015-07-03 18:57 GMT+02:00 Erick Erickson erickerick...@gmail.com:

 OK, I think you took a wrong turn at the bakery

 The FST-based suggesters are intended to look at the
 beginnings of fields. It is totally unnecessary to use
 ngrams, the FST that gets built does that _for_ you.
 Actually it builds an internal FST structure that does
 this en passant.

 For getting whole fields that are anywhere in the input
 field, you probably want to think about
 AnalyzingInfixSuggester or FreeTextSuggester.

 The important bit here is that you shouldn't have to do
 so much work...

 This might help:

 http://lucidworks.com/blog/solr-suggester/

 Best,
 Erick

 On Fri, Jul 3, 2015 at 4:40 AM, Roland Szűcs
 roland.sz...@bookandwalk.com wrote:
  I tried to setup an autosuggest feature with multiple dictionaries for
  title , author and publisher fields.
 
  I used the solr.EdgeNGramFilterFactory to optimize the performance of the
  auto suggest.
 
  I have a document in the index with title: Romana.
 
  When I test the text analysis for auto suggest (on the field
  title_suggest_ngram):
  ENGTF
  text    raw_bytes            start  end  positionLength  type  position
  rom     [72 6f 6d]           0      6    1               word  1
  roma    [72 6f 6d 61]        0      6    1               word  1
  roman   [72 6f 6d 61 6e]     0      6    1               word  1
  romana  [72 6f 6d 61 6e 61]  0      6    1               word  1
  If I try to run http://localhost:8983/solr/bandw/suggest?q=Roma, I get:
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <lst name="suggest">
      <lst name="suggest_publisher">
        <lst name="Roma">
          <int name="numFound">0</int>
          <arr name="suggestions"/>
        </lst>
      </lst>
      <lst name="suggest_title">
        <lst name="Roma">
          <int name="numFound">0</int>
          <arr name="suggestions"/>
        </lst>
      </lst>
      <lst name="suggest_author">
        <lst name="Roma">
          <int name="numFound">0</int>
          <arr name="suggestions"/>
        </lst>
      </lst>
    </lst>
  </response>
 
  my relevant field definitions:
  <field name="id" type="string" indexed="true" stored="true"
         required="true" multiValued="false" omitNorms="true"/>
  <field name="author" type="text_hu" indexed="true" stored="true"
         multiValued="true"/>
  <field name="title" type="text_hu" indexed="true" stored="true"
         multiValued="false"/>
  <field name="subtitle" type="text_hu" indexed="true" stored="true"
         multiValued="false"/>
  <field name="publisher" type="text_hu" indexed="true" stored="true"
         multiValued="false"/>
  <field name="title_suggest_ngram" type="text_hu_suggest_ngram"
         indexed="true" stored="false" multiValued="false" omitNorms="true"/>
  <field name="author_suggest_ngram" type="text_hu_suggest_ngram"
         indexed="true" stored="false" multiValued="false" omitNorms="true"/>
  <field name="publisher_suggest_ngram" type="text_hu_suggest_ngram"
         indexed="true" stored="false" multiValued="false" omitNorms="true"/>
  <copyField source="title" dest="title_suggest_ngram"/>
  <copyField source="author" dest="author_suggest_ngram"/>
  <copyField source="publisher" dest="publisher_suggest_ngram"/>
 
  My EdgeNGram related field type definition:
  <fieldType name="text_hu_suggest_ngram" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords_hu.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
              maxGramSize="8"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords_hu.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 
  My requesthandler for suggest:
  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">5</str>
      <str name="suggest.dictionary">suggest_author</str>
      <str name="suggest.dictionary">suggest_title</str>
      <str name="suggest.dictionary">suggest_publisher</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>
 
  And finally my searchcomponent:
  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">suggest_title</str>
      <str name="lookupImpl">FSTLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title_suggest_ngram</str>
      <str name="weightField">price</str>
      <str name="buildOnStartup">true</str>
      <str name="buildOnCommit">true</str>
    </lst>
    <lst name="suggester">
      <str name="name">suggest_author</str>
      <str name="lookupImpl">FSTLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">author_suggest_ngram</str>
      <str name="weightField">price</str>
      <str name="buildOnStartup">true</str>
      <str name="buildOnCommit">true</str>
    </lst>
    <lst name="suggester">
      <str name="name">suggest_publisher</str>
      <str name="lookupImpl">FSTLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">publisher_suggest_ngram</str>
      <str