Re: MLT and facetting

2019-02-25 Thread Zheng Lin Edwin Yeo
Hi Martin,

What are the settings for your /mlt requestHandler in solrconfig.xml?

Regards,
Edwin
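
For context, a bare-bones /mlt request handler definition in solrconfig.xml
looks roughly like the sketch below; the field names and defaults here are
placeholders rather than Martin's actual settings:

    <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
      <lst name="defaults">
        <str name="mlt.fl">text</str>
        <str name="mlt.mintf">1</str>
        <str name="mlt.mindf">1</str>
      </lst>
    </requestHandler>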

On Tue, 26 Feb 2019 at 14:43, Martin Frank Hansen (MHQ)  wrote:

> Hi Edwin,
>
> Thanks for your response.
>
> Yes you are right. It was simply the search parameters from Solr.
>
> The query looks like this:
>
> http://
> .../solr/.../mlt?df=text&facet.field=Journalnummer&facet=on&fl=id,Journalnummer&q=id:*6512815*
>
> best regards,
>
> Martin
>
>
> Internal - KMD A/S
>
> -Original Message-
> From: Zheng Lin Edwin Yeo 
> Sent: 26. februar 2019 03:54
> To: solr-user@lucene.apache.org
> Subject: Re: MLT and facetting
>
> Hi Martin,
>
> I think there are some pictures which are not being sent through in the
> email.
>
> Do send your query that you are using, and which version of Solr you are
> using?
>
> Regards,
> Edwin
>
> On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ) 
> wrote:
>
> > Hi,
> >
> >
> >
> > I am trying to combine the mlt functionality with facets, but Solr
> > throws
> > org.apache.solr.common.SolrException: ":"Unable to compute facet
> > ranges, facet context is not set".
> >
> >
> >
> > What I am trying to do is quite simple, find similar documents using
> > mlt and group these using the facet parameter. When using mlt and
> > facets separately everything works fine, but not when combining the
> functionality.
> >
> >
> >
> >
> >
> > {
> >
> >   "responseHeader":{
> >
> > "status":500,
> >
> > "QTime":109},
> >
> >   "match":{"numFound":1,"start":0,"docs":[
> >
> >   {
> >
> > "Journalnummer":" 00759",
> >
> > "id":"6512815"  },
> >
> >   "response":{"numFound":602234,"start":0,"docs":[
> >
> >   {
> >
> > "Journalnummer":" 00759",
> >
> > "id":"6512816",
> >
> >   {
> >
> > "Journalnummer":" 00759",
> >
> > "id":"6834653"
> >
> >   {
> >
> > "Journalnummer":" 00739",
> >
> > "id":"6202373"
> >
> >   {
> >
> > "Journalnummer":" 00739",
> >
> > "id":"6748105"
> >
> >
> >
> >   {
> >
> > "Journalnummer":" 00803",
> >
> > "id":"7402155"
> >
> >   },
> >
> >   "error":{
> >
> > "metadata":[
> >
> >   "error-class","org.apache.solr.common.SolrException",
> >
> >   "root-error-class","org.apache.solr.common.SolrException"],
> >
> > "msg":"Unable to compute facet ranges, facet context is not set",
> >
> > "trace":"org.apache.solr.common.SolrException: Unable to compute
> > facet ranges, facet context is not set\n\tat
> > org.apache.solr.handler.component.RangeFacetProcessor.getFacetRangeCou
> > nts(RangeFacetProcessor.java:66)\n\tat
> > org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetC
> > omponent.java:331)\n\tat
> > org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetC
> > omponent.java:295)\n\tat
> > org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLike
> > ThisHandler.java:240)\n\tat
> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> > rBase.java:199)\n\tat
> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)\n\tat
> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)\n\
> > tat
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)\n\tat
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> > .java:377)\n\tat
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> > .java:323)\n\tat
> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletH
> > andler.java:1634)\n\tat
> > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:
> > 533)\n\tat
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
> > va:146)\n\tat
> > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java
> > :548)\n\tat
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.
> > java:132)\n\tat
> > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandle
> > r.java:257)\n\tat
> > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandle
> > r.java:1595)\n\tat
> > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandle
> > r.java:255)\n\tat
> > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandle
> > r.java:1317)\n\tat
> > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler
> > .java:203)\n\tat
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:4
> > 73)\n\tat
> > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler
> > .java:1564)\n\tat
> > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler
> > .java:201)\n\tat
> > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler
> > .java:1219)\n\tat
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
> > va:144)\n\tat
> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(Conte
> > xtHandlerCo

SOLR Tokenizer “solr.SimplePatternSplitTokenizerFactory” splits at unexpected characters

2019-02-25 Thread Stephan Damson
Hi!

I'm having unexpected results with the solr.SimplePatternSplitTokenizerFactory. 
The pattern used is actually from an example in the SOLR documentation and I do 
not understand where I made a mistake or why it does not work as expected.
If we take the example input "operative", the analyzer shows that during 
indexing the input gets split into the tokens "ope", "a" and "ive"; that is, 
the tokenizer splits at the characters "r" and "t" rather than at the expected 
whitespace characters (CR, TAB). Just to be sure, I also tried using an extra 
backslash in the pattern (e.g. \t vs. \\t), but this did not change how the 
input is tokenized during indexing.

What am I missing?
SOLR version used is 7.5.0.
The definition of the field type in the schema is as follows (the XML markup 
was stripped from the archived copy of this message):

Many thanks in advance for any help you can provide!
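
Since the XML did not survive the archive, here is a minimal sketch of a field
type using this tokenizer, assuming the whitespace-splitting pattern shown in
the reference guide example (this is a reconstruction, not the actual schema):

    <fieldType name="text_ws_split" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/>
      </analyzer>
    </fieldType>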


RE: MLT and facetting

2019-02-25 Thread Martin Frank Hansen (MHQ)
Sorry, I forgot to mention that we are using Solr 7.5. 


Internal - KMD A/S

-Original Message-
From: Martin Frank Hansen (MHQ)  
Sent: 26. februar 2019 07:43
To: solr-user@lucene.apache.org
Subject: RE: MLT and facetting

Hi Edwin,

Thanks for your response. 

Yes you are right. It was simply the search parameters from Solr. 

The query looks like this:

http://.../solr/.../mlt?df=text&facet.field=Journalnummer&facet=on&fl=id,Journalnummer&q=id:*6512815*

best regards,

Martin


Internal - KMD A/S

-Original Message-
From: Zheng Lin Edwin Yeo 
Sent: 26. februar 2019 03:54
To: solr-user@lucene.apache.org
Subject: Re: MLT and facetting

Hi Martin,

I think there are some pictures which are not being sent through in the email.

Do send your query that you are using, and which version of Solr you are using?

Regards,
Edwin

On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ)  wrote:

> Hi,
>
>
>
> I am trying to combine the mlt functionality with facets, but Solr 
> throws
> org.apache.solr.common.SolrException: ":"Unable to compute facet 
> ranges, facet context is not set".
>
>
>
> What I am trying to do is quite simple, find similar documents using 
> mlt and group these using the facet parameter. When using mlt and 
> facets separately everything works fine, but not when combining the 
> functionality.
>
>
>
>
>
> {
>
>   "responseHeader":{
>
> "status":500,
>
> "QTime":109},
>
>   "match":{"numFound":1,"start":0,"docs":[
>
>   {
>
> "Journalnummer":" 00759",
>
> "id":"6512815"  },
>
>   "response":{"numFound":602234,"start":0,"docs":[
>
>   {
>
> "Journalnummer":" 00759",
>
> "id":"6512816",
>
>   {
>
> "Journalnummer":" 00759",
>
> "id":"6834653"
>
>   {
>
> "Journalnummer":" 00739",
>
> "id":"6202373"
>
>   {
>
> "Journalnummer":" 00739",
>
> "id":"6748105"
>
>
>
>   {
>
> "Journalnummer":" 00803",
>
> "id":"7402155"
>
>   },
>
>   "error":{
>
> "metadata":[
>
>   "error-class","org.apache.solr.common.SolrException",
>
>   "root-error-class","org.apache.solr.common.SolrException"],
>
> "msg":"Unable to compute facet ranges, facet context is not set",
>
> "trace":"org.apache.solr.common.SolrException: Unable to compute 
> facet ranges, facet context is not set\n\tat 
> org.apache.solr.handler.component.RangeFacetProcessor.getFacetRangeCou
> nts(RangeFacetProcessor.java:66)\n\tat
> org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetC
> omponent.java:331)\n\tat
> org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetC
> omponent.java:295)\n\tat
> org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLike
> ThisHandler.java:240)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> rBase.java:199)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)\n\
> tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> .java:377)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> .java:323)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletH
> andler.java:1634)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:
> 533)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
> va:146)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java
> :548)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.
> java:132)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandle
> r.java:257)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandle
> r.java:1595)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandle
> r.java:255)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandle
> r.java:1317)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler
> .java:203)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:4
> 73)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler
> .java:1564)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler
> .java:201)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler
> .java:1219)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
> va:144)\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(Conte
> xtHandlerCollection.java:219)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerColle
> ction.java:126)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.
> java:132)\n\tat
> org.eclipse.jetty.rewrite.handler.Rew

LTR feature based on other collection data

2019-02-25 Thread Kamal Kishore Aggarwal
Hi,

I am working on LTR with Solr 6.6.2, specifically on custom feature creation,
and I have been able to create a few custom features as per our requirements.

However, there are certain features for which the data is stored in another
collection: things like click counts, the last date a product was ordered, etc.
This information lives in a second collection, and we are not planning to copy
it into the first collection.

Now we need to use the data in that other collection when generating the LTR
score of a document. We are open to developing custom components as well.

Is there a way we can modify our query using some kind of join? We know joins
are expensive, though.
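
One direction to explore, sketched purely as an assumption (it presumes the
click/order data lives in a co-located, singly-sharded collection called
"signals" whose product_id field matches the main collection's id), is a
SolrFeature whose query joins across collections. It would only yield a
match/no-match style signal, and it carries the usual cross-collection join
restrictions and cost:

    {
      "name"   : "orderedRecently",
      "class"  : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "{!join fromIndex=signals from=product_id to=id}ordered_date:[NOW-30DAYS TO NOW]"
      }
    }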

Please suggest. Thanks in advance.

Regards
Kamal Kishore


RE: MLT and facetting

2019-02-25 Thread Martin Frank Hansen (MHQ)
Hi Dave, 

Thanks for your suggestion. I was under the impression that this could be done 
in a single search, but if that is not possible I will try splitting it into 
two searches. 

Is the best way to do this through SolrJ? 

Best regards

Martin
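
As an illustration of Dave's two-search suggestion, a minimal SolrJ sketch
could look like the following; the collection name is a placeholder, the id and
facet field come from the thread, and error handling plus the empty-result case
are left out:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrDocument;

    SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/my_collection").build();

    // Search 1: ask the /mlt handler for documents similar to one seed document.
    SolrQuery mlt = new SolrQuery("id:6512815");
    mlt.setRequestHandler("/mlt");
    mlt.set("df", "text");
    mlt.setFields("id");
    mlt.setRows(100);
    List<String> ids = new ArrayList<>();
    for (SolrDocument doc : solr.query(mlt).getResults()) {
      ids.add((String) doc.getFieldValue("id"));
    }

    // Search 2: facet over only the similar documents found in search 1.
    // (Guard against an empty id list before building this query.)
    SolrQuery facets = new SolrQuery("id:(" + String.join(" OR ", ids) + ")");
    facets.setRows(0);
    facets.setFacet(true);
    facets.addFacetField("Journalnummer");
    System.out.println(solr.query(facets).getFacetField("Journalnummer").getValues());
    solr.close();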


Internal - KMD A/S

-Original Message-
From: Dave  
Sent: 26. februar 2019 05:39
To: solr-user@lucene.apache.org
Subject: Re: MLT and facetting

Use the mlt to get the queries to use for getting facets in a two search 
approach

> On Feb 25, 2019, at 10:18 PM, Zheng Lin Edwin Yeo  
> wrote:
> 
> Hi Martin,
> 
> I think there are some pictures which are not being sent through in 
> the email.
> 
> Do send your query that you are using, and which version of Solr you 
> are using?
> 
> Regards,
> Edwin
> 
>> On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ)  wrote:
>> 
>> Hi,
>> 
>> 
>> 
>> I am trying to combine the mlt functionality with facets, but Solr 
>> throws
>> org.apache.solr.common.SolrException: ":"Unable to compute facet 
>> ranges, facet context is not set".
>> 
>> 
>> 
>> What I am trying to do is quite simple, find similar documents using 
>> mlt and group these using the facet parameter. When using mlt and 
>> facets separately everything works fine, but not when combining the 
>> functionality.
>> 
>> 
>> 
>> 
>> 
>> {
>> 
>>  "responseHeader":{
>> 
>>"status":500,
>> 
>>"QTime":109},
>> 
>>  "match":{"numFound":1,"start":0,"docs":[
>> 
>>  {
>> 
>>"Journalnummer":" 00759",
>> 
>>"id":"6512815"  },
>> 
>>  "response":{"numFound":602234,"start":0,"docs":[
>> 
>>  {
>> 
>>"Journalnummer":" 00759",
>> 
>>"id":"6512816",
>> 
>>  {
>> 
>>"Journalnummer":" 00759",
>> 
>>"id":"6834653"
>> 
>>  {
>> 
>>"Journalnummer":" 00739",
>> 
>>"id":"6202373"
>> 
>>  {
>> 
>>"Journalnummer":" 00739",
>> 
>>"id":"6748105"
>> 
>> 
>> 
>>  {
>> 
>>"Journalnummer":" 00803",
>> 
>>"id":"7402155"
>> 
>>  },
>> 
>>  "error":{
>> 
>>"metadata":[
>> 
>>  "error-class","org.apache.solr.common.SolrException",
>> 
>>  "root-error-class","org.apache.solr.common.SolrException"],
>> 
>>"msg":"Unable to compute facet ranges, facet context is not set",
>> 
>>"trace":"org.apache.solr.common.SolrException: Unable to compute 
>> facet ranges, facet context is not set\n\tat 
>> org.apache.solr.handler.component.RangeFacetProcessor.getFacetRangeCo
>> unts(RangeFacetProcessor.java:66)\n\tat
>> org.apache.solr.handler.component.FacetComponent.getFacetCounts(Facet
>> Component.java:331)\n\tat 
>> org.apache.solr.handler.component.FacetComponent.getFacetCounts(Facet
>> Component.java:295)\n\tat 
>> org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLik
>> eThisHandler.java:240)\n\tat 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandl
>> erBase.java:199)\n\tat 
>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)\n
>> \tat 
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)\n\ta
>> t 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilte
>> r.java:377)\n\tat 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilte
>> r.java:323)\n\tat 
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(Servlet
>> Handler.java:1634)\n\tat 
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java
>> :533)\n\tat 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.j
>> ava:146)\n\tat 
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.jav
>> a:548)\n\tat 
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper
>> .java:132)\n\tat 
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandl
>> er.java:257)\n\tat 
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandl
>> er.java:1595)\n\tat 
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandl
>> er.java:255)\n\tat 
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandl
>> er.java:1317)\n\tat 
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandle
>> r.java:203)\n\tat 
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:
>> 473)\n\tat 
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandle
>> r.java:1564)\n\tat 
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandle
>> r.java:201)\n\tat 
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandle
>> r.java:1219)\n\tat 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.j
>> ava:144)\n\tat 
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(Cont
>> extHandlerCollection.java:219)\n\tat
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerColl
>> ection.java:126)\n\tat 
>> org.eclipse.jetty.server.h

RE: MLT and facetting

2019-02-25 Thread Martin Frank Hansen (MHQ)
Hi Edwin,

Thanks for your response. 

Yes, you are right. The images were just the search parameters from Solr. 

The query looks like this:

http://.../solr/.../mlt?df=text&facet.field=Journalnummer&facet=on&fl=id,Journalnummer&q=id:*6512815*

best regards,

Martin


Internal - KMD A/S

-Original Message-
From: Zheng Lin Edwin Yeo  
Sent: 26. februar 2019 03:54
To: solr-user@lucene.apache.org
Subject: Re: MLT and facetting

Hi Martin,

I think there are some pictures which are not being sent through in the email.

Do send your query that you are using, and which version of Solr you are using?

Regards,
Edwin

On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ)  wrote:

> Hi,
>
>
>
> I am trying to combine the mlt functionality with facets, but Solr 
> throws
> org.apache.solr.common.SolrException: ":"Unable to compute facet 
> ranges, facet context is not set".
>
>
>
> What I am trying to do is quite simple, find similar documents using 
> mlt and group these using the facet parameter. When using mlt and 
> facets separately everything works fine, but not when combining the 
> functionality.
>
>
>
>
>
> {
>
>   "responseHeader":{
>
> "status":500,
>
> "QTime":109},
>
>   "match":{"numFound":1,"start":0,"docs":[
>
>   {
>
> "Journalnummer":" 00759",
>
> "id":"6512815"  },
>
>   "response":{"numFound":602234,"start":0,"docs":[
>
>   {
>
> "Journalnummer":" 00759",
>
> "id":"6512816",
>
>   {
>
> "Journalnummer":" 00759",
>
> "id":"6834653"
>
>   {
>
> "Journalnummer":" 00739",
>
> "id":"6202373"
>
>   {
>
> "Journalnummer":" 00739",
>
> "id":"6748105"
>
>
>
>   {
>
> "Journalnummer":" 00803",
>
> "id":"7402155"
>
>   },
>
>   "error":{
>
> "metadata":[
>
>   "error-class","org.apache.solr.common.SolrException",
>
>   "root-error-class","org.apache.solr.common.SolrException"],
>
> "msg":"Unable to compute facet ranges, facet context is not set",
>
> "trace":"org.apache.solr.common.SolrException: Unable to compute 
> facet ranges, facet context is not set\n\tat 
> org.apache.solr.handler.component.RangeFacetProcessor.getFacetRangeCou
> nts(RangeFacetProcessor.java:66)\n\tat
> org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetC
> omponent.java:331)\n\tat 
> org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetC
> omponent.java:295)\n\tat 
> org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLike
> ThisHandler.java:240)\n\tat 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> rBase.java:199)\n\tat 
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)\n\
> tat 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> .java:377)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> .java:323)\n\tat 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletH
> andler.java:1634)\n\tat 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:
> 533)\n\tat 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
> va:146)\n\tat 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java
> :548)\n\tat 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.
> java:132)\n\tat 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandle
> r.java:257)\n\tat 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandle
> r.java:1595)\n\tat 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandle
> r.java:255)\n\tat 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandle
> r.java:1317)\n\tat 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler
> .java:203)\n\tat 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:4
> 73)\n\tat 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler
> .java:1564)\n\tat 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler
> .java:201)\n\tat 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler
> .java:1219)\n\tat 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
> va:144)\n\tat 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(Conte
> xtHandlerCollection.java:219)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerColle
> ction.java:126)\n\tat 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.
> java:132)\n\tat 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler
> .java:335)\n\tat 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.
> java:132)\n\tat 
> org.eclipse.jetty.server.Server.handle(Server.java:531)\

Re: MLT and facetting

2019-02-25 Thread Dave
Use the mlt to get the queries to use for getting facets in a two search 
approach

> On Feb 25, 2019, at 10:18 PM, Zheng Lin Edwin Yeo  
> wrote:
> 
> Hi Martin,
> 
> I think there are some pictures which are not being sent through in the
> email.
> 
> Do send your query that you are using, and which version of Solr you are
> using?
> 
> Regards,
> Edwin
> 
>> On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ)  wrote:
>> 
>> Hi,
>> 
>> 
>> 
>> I am trying to combine the mlt functionality with facets, but Solr throws
>> org.apache.solr.common.SolrException: ":"Unable to compute facet ranges,
>> facet context is not set".
>> 
>> 
>> 
>> What I am trying to do is quite simple, find similar documents using mlt
>> and group these using the facet parameter. When using mlt and facets
>> separately everything works fine, but not when combining the functionality.
>> 
>> 
>> 
>> 
>> 
>> {
>> 
>>  "responseHeader":{
>> 
>>"status":500,
>> 
>>"QTime":109},
>> 
>>  "match":{"numFound":1,"start":0,"docs":[
>> 
>>  {
>> 
>>"Journalnummer":" 00759",
>> 
>>"id":"6512815"  },
>> 
>>  "response":{"numFound":602234,"start":0,"docs":[
>> 
>>  {
>> 
>>"Journalnummer":" 00759",
>> 
>>"id":"6512816",
>> 
>>  {
>> 
>>"Journalnummer":" 00759",
>> 
>>"id":"6834653"
>> 
>>  {
>> 
>>"Journalnummer":" 00739",
>> 
>>"id":"6202373"
>> 
>>  {
>> 
>>"Journalnummer":" 00739",
>> 
>>"id":"6748105"
>> 
>> 
>> 
>>  {
>> 
>>"Journalnummer":" 00803",
>> 
>>"id":"7402155"
>> 
>>  },
>> 
>>  "error":{
>> 
>>"metadata":[
>> 
>>  "error-class","org.apache.solr.common.SolrException",
>> 
>>  "root-error-class","org.apache.solr.common.SolrException"],
>> 
>>"msg":"Unable to compute facet ranges, facet context is not set",
>> 
>>"trace":"org.apache.solr.common.SolrException: Unable to compute facet
>> ranges, facet context is not set\n\tat
>> org.apache.solr.handler.component.RangeFacetProcessor.getFacetRangeCounts(RangeFacetProcessor.java:66)\n\tat
>> org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:331)\n\tat
>> org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:295)\n\tat
>> org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:240)\n\tat
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>> org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat
>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnect

Re: MLT and facetting

2019-02-25 Thread Zheng Lin Edwin Yeo
Hi Martin,

I think there are some pictures that did not come through in the email.

Could you send the query that you are using, and let us know which version of
Solr you are on?

Regards,
Edwin

On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ)  wrote:

> Hi,
>
>
>
> I am trying to combine the mlt functionality with facets, but Solr throws
> org.apache.solr.common.SolrException: ":"Unable to compute facet ranges,
> facet context is not set".
>
>
>
> What I am trying to do is quite simple, find similar documents using mlt
> and group these using the facet parameter. When using mlt and facets
> separately everything works fine, but not when combining the functionality.
>
>
>
>
>
> {
>
>   "responseHeader":{
>
> "status":500,
>
> "QTime":109},
>
>   "match":{"numFound":1,"start":0,"docs":[
>
>   {
>
> "Journalnummer":" 00759",
>
> "id":"6512815"  },
>
>   "response":{"numFound":602234,"start":0,"docs":[
>
>   {
>
> "Journalnummer":" 00759",
>
> "id":"6512816",
>
>   {
>
> "Journalnummer":" 00759",
>
> "id":"6834653"
>
>   {
>
> "Journalnummer":" 00739",
>
> "id":"6202373"
>
>   {
>
> "Journalnummer":" 00739",
>
> "id":"6748105"
>
>
>
>   {
>
> "Journalnummer":" 00803",
>
> "id":"7402155"
>
>   },
>
>   "error":{
>
> "metadata":[
>
>   "error-class","org.apache.solr.common.SolrException",
>
>   "root-error-class","org.apache.solr.common.SolrException"],
>
> "msg":"Unable to compute facet ranges, facet context is not set",
>
> "trace":"org.apache.solr.common.SolrException: Unable to compute facet
> ranges, facet context is not set\n\tat
> org.apache.solr.handler.component.RangeFacetProcessor.getFacetRangeCounts(RangeFacetProcessor.java:66)\n\tat
> org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:331)\n\tat
> org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:295)\n\tat
> org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:240)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
> org.eclipse.jetty.util.thread.strategy.

Re: how to get high-availability for Solr csv update handler?

2019-02-25 Thread Walter Underwood
We send batches of updates to a load balancer. The cluster gets the updates to 
the right leader with very little overhead. When we get an error, we resend the 
update batch. The load balancer will find a healthy node to receive it. This is 
simple, robust, and fast.

One handy tip: if a batch fails with a 400, we back off and resend it in 
batches of 1 document each so we can identify the bad one. This saves a ton of 
time trying to manually find the bad document.
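
A minimal sketch of that back-off idea, assuming a SolrJ client pointed at the
load balancer (the names are placeholders and this is a sketch of the idea, not
the actual production code):

    void indexBatch(SolrClient solr, List<SolrInputDocument> batch) {
      try {
        solr.add(batch);               // normal path: send the whole batch
      } catch (Exception batchFailure) {
        // The batch was rejected (e.g. a 400): resend one document at a time
        // so the bad document identifies itself.
        for (SolrInputDocument doc : batch) {
          try {
            solr.add(doc);
          } catch (Exception docFailure) {
            System.err.println("Bad document " + doc.getFieldValue("id")
                + ": " + docFailure.getMessage());
          }
        }
      }
    }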

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 25, 2019, at 1:31 PM, Ganesh Sethuraman  
> wrote:
> 
> Thanks for details and updates. We are looking at load balancers not
> because of the little improvement in performance. But more for high
> availability. Other alternative is, if the update fails on one server using
> curl, on error we have to call another SOLR server. I was looking to see if
> there any other way to get the working leader from the Zookeeper before the
> update, is there a way to query zookeeper for the same? But, I understand
> there is no guarantee that leader wont change during the large CSV file
> update. But at least some protection during planed server restarts can be
> managed.
> 
> Regarding the Solrj option, it certainly seems to be best option, do we
> have the python solr client to it which can be Solr Leader aware? like how
> it is done in the solrj (java) client.
> 
> Regards,
> Ganesh
> 
> On Mon, Feb 25, 2019 at 3:00 PM Shawn Heisey  wrote:
> 
>> On 2/25/2019 11:15 AM, Ganesh Sethuraman wrote:
>>> We are using Solr Cloud 7.2.1. We are using Solr CSV update handler to do
>>> bulk update (several Millions of docs) in to multiple collections. When
>> we
>>> make a call to the CSV update handler using curl command line (as below),
>>> we are pointing to single server in Solr. During the problem time, when
>> one
>>> of the Solr server goes down this approach could fail. Is there any way
>>> that we do this to send the write to the leader, like how the solrj does,
>>> through the simple curl command(s) line?
>> 
>> The SolrJ client named CloudSolrClient is able to do this because it is
>> a full ZooKeeper client that has instant access to the clusterstate
>> maintained by your Solr servers.
>> 
>> To get that capability in any other client would require that the client
>> is aware of the ZooKeeper ensemble in the same way.  Curl cannot do this.
>> 
>>> 
>>> In the request below for some reason, if the SOLR1-SERVER is down, the
>>> request will fail, even though the new leader say SOLR2-SERVER is up.
>>> 
>>> curl 'http://
>> <>:8983/solr/my_collection/update?commit=true'
>>> --data-binary @example/exampledocs/books.csv -H
>>> 'Content-type:application/csv'
>>> 
>>> 1. I can create load balancer / ALB infront of solr, but that may not
>> still
>>> identify the Leader for efficiency.
>> 
>> A load balancer won't be able to identify the leader unless it is
>> capable of talking to ZooKeeper and knows how Solr represents data in
>> ZK.  Have you measured the efficiency improvement that comes from
>> sending to the leader?  If that improvement is small, it's probably not
>> worth implementing something that talks to ZooKeeper.  I know there are
>> people who don't try to send to leaders that are achieving very fast
>> indexing rates ... I suspect that the improvement obtained by sending to
>> leaders is relatively small.
>> 
>>> 2. I can write a solrj client to update, but i am not sure if i will get
>>> the efficiency of  bulk update? not sure about the simplicity of the curl
>>> as well.
>> 
>> SolrJ is probably more efficient than something like curl, because it
>> utilizes a compact binary format for data transfer in both directions,
>> called javabin.  With curl, you would most likely be using a text format
>> like json, xml, or csv.
>> 
>> SolrJ clients are fully thread-safe.  Which means you can use a single
>> instance to send updates in parallel with multiple threads.  That is the
>> best way to achieve good indexing performance with Solr.
>> 
>> Thanks,
>> Shawn
>> 



Re: how to get high-availability for Solr csv update handler?

2019-02-25 Thread Ganesh Sethuraman
Thanks for the details. We are looking at load balancers not for the small
performance improvement but for high availability. The alternative is that if
the update via curl fails on one server, we have to call another Solr server on
error. I was looking for another way to get the current leader from ZooKeeper
before the update; is there a way to query ZooKeeper for that? I understand
there is no guarantee that the leader won't change during a large CSV file
update, but at least planned server restarts could be handled.
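
For what it is worth, one way to see the current leaders without a ZooKeeper
client is Solr's own Collections API, which any live node can answer, e.g.
(host and collection name are placeholders):

    curl 'http://any-solr-node:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=my_collection'

The per-shard replica entries in the response carry a "leader" flag, though as
noted above the leader can still change between that call and the update.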

Regarding the SolrJ option, it certainly seems to be the best one. Is there a
Python Solr client that can be leader-aware in the same way as the SolrJ (Java)
client?

Regards,
Ganesh

On Mon, Feb 25, 2019 at 3:00 PM Shawn Heisey  wrote:

> On 2/25/2019 11:15 AM, Ganesh Sethuraman wrote:
> > We are using Solr Cloud 7.2.1. We are using Solr CSV update handler to do
> > bulk update (several Millions of docs) in to multiple collections. When
> we
> > make a call to the CSV update handler using curl command line (as below),
> > we are pointing to single server in Solr. During the problem time, when
> one
> > of the Solr server goes down this approach could fail. Is there any way
> > that we do this to send the write to the leader, like how the solrj does,
> > through the simple curl command(s) line?
>
> The SolrJ client named CloudSolrClient is able to do this because it is
> a full ZooKeeper client that has instant access to the clusterstate
> maintained by your Solr servers.
>
> To get that capability in any other client would require that the client
> is aware of the ZooKeeper ensemble in the same way.  Curl cannot do this.
>
> >
> > In the request below for some reason, if the SOLR1-SERVER is down, the
> > request will fail, even though the new leader say SOLR2-SERVER is up.
> >
> > curl 'http://
> <>:8983/solr/my_collection/update?commit=true'
> > --data-binary @example/exampledocs/books.csv -H
> > 'Content-type:application/csv'
> >
> > 1. I can create load balancer / ALB infront of solr, but that may not
> still
> > identify the Leader for efficiency.
>
> A load balancer won't be able to identify the leader unless it is
> capable of talking to ZooKeeper and knows how Solr represents data in
> ZK.  Have you measured the efficiency improvement that comes from
> sending to the leader?  If that improvement is small, it's probably not
> worth implementing something that talks to ZooKeeper.  I know there are
> people who don't try to send to leaders that are achieving very fast
> indexing rates ... I suspect that the improvement obtained by sending to
> leaders is relatively small.
>
> > 2. I can write a solrj client to update, but i am not sure if i will get
> > the efficiency of  bulk update? not sure about the simplicity of the curl
> > as well.
>
> SolrJ is probably more efficient than something like curl, because it
> utilizes a compact binary format for data transfer in both directions,
> called javabin.  With curl, you would most likely be using a text format
> like json, xml, or csv.
>
> SolrJ clients are fully thread-safe.  Which means you can use a single
> instance to send updates in parallel with multiple threads.  That is the
> best way to achieve good indexing performance with Solr.
>
> Thanks,
> Shawn
>


Re: Schema configuration field defaults

2019-02-25 Thread Erick Erickson
Sure. In both cases define a fieldType with those attributes set however you 
want. Any field that is defined with that fieldType will have the defaults you 
specify unless overridden on the field definition itself.
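
As a sketch of that, with made-up names, the defaults live on the field type
and individual fields can override them:

    <fieldType name="string_unindexed" class="solr.StrField"
               stored="true" indexed="false" multiValued="true"/>

    <field name="raw_payload" type="string_unindexed"/>                <!-- inherits indexed=false, multiValued=true -->
    <field name="raw_id"      type="string_unindexed" indexed="true"/> <!-- overrides indexed on the field -->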

Best,
Erick

> On Feb 25, 2019, at 9:08 AM, Dionte Smith  wrote:
> 
> Hi,
> 
> I have two questions about the field default values for multivalued and 
> indexed.
> 
> 
>  1.  Is it possible to make new fields have the indexed attribute set to 
> false by default for a schema? I understand this wouldn't normally be the 
> case, but we have a use case where it would be preferable as many fields may 
> be dynamically added via JSON.
>  2.  Is it possible to do the same for the multivalued attribute? If so, what 
> would happen if a field was dynamically added via JSON and it contained an 
> array? Would Solr be able to determine that the field should instead be 
> created with the multivalued attribute set to true?
> 
> Kind Regards,
> 
> Dionté Smith
> Software Developer
> dionte.sm...@gm.com
> 
> 
> 



Re: how to get high-availability for Solr csv update handler?

2019-02-25 Thread Shawn Heisey

On 2/25/2019 11:15 AM, Ganesh Sethuraman wrote:

We are using Solr Cloud 7.2.1. We are using Solr CSV update handler to do
bulk update (several Millions of docs) in to multiple collections. When we
make a call to the CSV update handler using curl command line (as below),
we are pointing to single server in Solr. During the problem time, when one
of the Solr server goes down this approach could fail. Is there any way
that we do this to send the write to the leader, like how the solrj does,
through the simple curl command(s) line?


The SolrJ client named CloudSolrClient is able to do this because it is 
a full ZooKeeper client that has instant access to the clusterstate 
maintained by your Solr servers.


To get that capability in any other client would require that the client 
is aware of the ZooKeeper ensemble in the same way.  Curl cannot do this.




In the request below for some reason, if the SOLR1-SERVER is down, the
request will fail, even though the new leader say SOLR2-SERVER is up.

curl 'http://<>:8983/solr/my_collection/update?commit=true'
--data-binary @example/exampledocs/books.csv -H
'Content-type:application/csv'

1. I can create load balancer / ALB infront of solr, but that may not still
identify the Leader for efficiency.


A load balancer won't be able to identify the leader unless it is 
capable of talking to ZooKeeper and knows how Solr represents data in 
ZK.  Have you measured the efficiency improvement that comes from 
sending to the leader?  If that improvement is small, it's probably not 
worth implementing something that talks to ZooKeeper.  I know there are 
people who don't try to send to leaders that are achieving very fast 
indexing rates ... I suspect that the improvement obtained by sending to 
leaders is relatively small.



2. I can write a solrj client to update, but i am not sure if i will get
the efficiency of  bulk update? not sure about the simplicity of the curl
as well.


SolrJ is probably more efficient than something like curl, because it 
utilizes a compact binary format for data transfer in both directions, 
called javabin.  With curl, you would most likely be using a text format 
like json, xml, or csv.


SolrJ clients are fully thread-safe.  Which means you can use a single 
instance to send updates in parallel with multiple threads.  That is the 
best way to achieve good indexing performance with Solr.
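
A minimal CloudSolrClient sketch, with placeholder ZooKeeper hosts and 
collection name, looks like this:

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    CloudSolrClient client = new CloudSolrClient.Builder(
        Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build();
    client.setDefaultCollection("my_collection");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "book-1");
    client.add(doc);     // routed to the current shard leader using cluster state from ZK
    client.commit();
    client.close();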


Thanks,
Shawn


Schema configuration field defaults

2019-02-25 Thread Dionte Smith
Hi,

I have two questions about the field default values for multivalued and indexed.


  1.  Is it possible to make new fields have the indexed attribute set to false 
by default for a schema? I understand this wouldn't normally be the case, but 
we have a use case where it would be preferable as many fields may be 
dynamically added via JSON.
  2.  Is it possible to do the same for the multivalued attribute? If so, what 
would happen if a field was dynamically added via JSON and it contained an 
array? Would Solr be able to determine that the field should instead be created 
with the multivalued attribute set to true?

Kind Regards,

Dionté Smith
Software Developer
dionte.sm...@gm.com





how to get high-availability for Solr csv update handler?

2019-02-25 Thread Ganesh Sethuraman
Hi

We are using Solr Cloud 7.2.1, and we use the Solr CSV update handler to do
bulk updates (several millions of docs) into multiple collections. When we call
the CSV update handler from the curl command line (as below), we point at a
single Solr server, so when that server goes down this approach can fail. Is
there any way to send the write to the leader, the way SolrJ does, using simple
curl commands?

In the request below, if SOLR1-SERVER is down the request will fail, even
though the new leader, say SOLR2-SERVER, is up.

curl 'http://<>:8983/solr/my_collection/update?commit=true'
--data-binary @example/exampledocs/books.csv -H
'Content-type:application/csv'

1. I can create a load balancer / ALB in front of Solr, but that still may not
identify the leader, so it is not ideal for efficiency.
2. I can write a SolrJ client to do the updates, but I am not sure whether I
will get the efficiency of a bulk update, or keep the simplicity of curl.

Any best practices for this would be good to have.

Regards
Ganesh


Re: Is anyone using proxy caching in front of solr?

2019-02-25 Thread Walter Underwood
Multiple caches can have the same hit rate as a single cache if the same query 
is always sent back to the same replica. This works great until a replica goes 
down. If the queries are then reshuffled across all replicas, all the caches 
have the wrong content, which is very expensive. Instead, the queries need to 
be redistributed among the replicas that are still up. We learned this the hard 
way at Infoseek in the late 1990s.

Overall, it is much easier to use a single HTTP cache in front of the cluster.
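
For an external cache to validate responses, Solr also has to emit the relevant
headers; the solrconfig.xml block that controls them looks roughly like this
sketch (the values are placeholders, and note that the stock example configs
set never304="true", which turns these headers off):

    <requestDispatcher>
      <httpCaching never304="false" lastModFrom="openTime" etagSeed="Solr">
        <cacheControl>max-age=300, public</cacheControl>
      </httpCaching>
    </requestDispatcher>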

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 25, 2019, at 8:43 AM, Michael Gibney  wrote:
> 
> Tangentially related, possibly of interest regarding solr-internal cache
> hit ratio (esp. with a lot of replicas):
> https://issues.apache.org/jira/browse/SOLR-13257
> 
> On Mon, Feb 25, 2019 at 11:33 AM Walter Underwood 
> wrote:
> 
>> Don’t worry about one and two character queries, because they will almost
>> always be served from cache.
>> 
>> There are only 26 one-letter queries (36 if you use numbers). Almost all
>> of those will be in the query results cache and will be very fast with very
>> little server load. The common two-letter queries will also be cached.
>> 
>> An external HTTP cache can be effective, especially if you have a lot of
>> replicas. The single cache will have a higher hit rate than the individual
>> servers.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Feb 25, 2019, at 7:57 AM, Edward Ribeiro 
>> wrote:
>>> 
>>> Maybe you could add a length filter factory to filter out queries with 2
>> or
>>> 3 characters using
>>> 
>> https://lucene.apache.org/solr/guide/7_4/filter-descriptions.html#FilterDescriptions-LengthFilter
>>> ?
>>> 
>>> PS: this filter requires a max length too.
>>> 
>>> Edward
>>> 
>>> On Thu, 21 Feb 2019 at 04:52, Furkan KAMACI 
>>> wrote:
>>> 
 Hi Joakim,
 
 I suggest you to read these resources:
 
 http://lucene.472066.n3.nabble.com/Varnish-td4072057.html
 http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
 https://wiki.apache.org/solr/SolrAndHTTPCaches
 
 which gives information about HTTP Caching including Varnish Cache,
 Last-Modified, ETag, Expires, Cache-Control headers.
 
 Kind Regards,
 Furkan KAMACI
 
 On Wed, Feb 20, 2019 at 11:18 PM Joakim Hansson <
 joakim.hansso...@gmail.com>
 wrote:
 
> Hello dear user list!
> I work at a company in retail where we use solr to perform searches as
 you
> type.
> As soon as you type more than 1 characters in the search field solr
 starts
> serving hits.
> Of course this generates a lot of "unnecessary" queries (in the sense
 that
> they are never shown to the user) which is why I started thinking about
> using something like squid or varnish to cache a bunch of these 2-4
> character queries.
> 
> It seems most stuff I find about it is from pretty old sources, but as
 far
> as I know solrcloud doesn't have distributed cache support.
> 
> Our indexes aren't updated that frequently, about 4 - 6 times a day. We
> don't use a lot of shards and replicas (biggest index is split to 3
 shards
> with 2 replicas). All shards/replicas are not on the same solr host.
> Our solr setup handles around 80-200 queries per second during the day
 with
> peaks at >1500 before holiday season and sales.
> 
> I haven't really read up on the details yet but it seems like I could
>> use
> etags and Expires headers to work around having to do some of that
> "unnecessary" work.
> 
> Is anyone doing this? Why? Why not?
> 
> - peace!
> 
 
>> 
>> 



Re: Is anyone using proxy caching in front of solr?

2019-02-25 Thread Michael Gibney
Tangentially related, possibly of interest regarding solr-internal cache
hit ratio (esp. with a lot of replicas):
https://issues.apache.org/jira/browse/SOLR-13257

On Mon, Feb 25, 2019 at 11:33 AM Walter Underwood 
wrote:

> Don’t worry about one and two character queries, because they will almost
> always be served from cache.
>
> There are only 26 one-letter queries (36 if you use numbers). Almost all
> of those will be in the query results cache and will be very fast with very
> little server load. The common two-letter queries will also be cached.
>
> An external HTTP cache can be effective, especially if you have a lot of
> replicas. The single cache will have a higher hit rate than the individual
> servers.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Feb 25, 2019, at 7:57 AM, Edward Ribeiro 
> wrote:
> >
> > Maybe you could add a length filter factory to filter out queries with 2
> or
> > 3 characters using
> >
> https://lucene.apache.org/solr/guide/7_4/filter-descriptions.html#FilterDescriptions-LengthFilter
> > ?
> >
> > PS: this filter requires a max length too.
> >
> > Edward
> >
> > On Thu, 21 Feb 2019 at 04:52, Furkan KAMACI 
> > wrote:
> >
> >> Hi Joakim,
> >>
> >> I suggest you to read these resources:
> >>
> >> http://lucene.472066.n3.nabble.com/Varnish-td4072057.html
> >> http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
> >> https://wiki.apache.org/solr/SolrAndHTTPCaches
> >>
> >> which gives information about HTTP Caching including Varnish Cache,
> >> Last-Modified, ETag, Expires, Cache-Control headers.
> >>
> >> Kind Regards,
> >> Furkan KAMACI
> >>
> >> On Wed, Feb 20, 2019 at 11:18 PM Joakim Hansson <
> >> joakim.hansso...@gmail.com>
> >> wrote:
> >>
> >>> Hello dear user list!
> >>> I work at a company in retail where we use solr to perform searches as
> >> you
> >>> type.
> >>> As soon as you type more than 1 characters in the search field solr
> >> starts
> >>> serving hits.
> >>> Of course this generates a lot of "unnecessary" queries (in the sense
> >> that
> >>> they are never shown to the user) which is why I started thinking about
> >>> using something like squid or varnish to cache a bunch of these 2-4
> >>> character queries.
> >>>
> >>> It seems most stuff I find about it is from pretty old sources, but as
> >> far
> >>> as I know solrcloud doesn't have distributed cache support.
> >>>
> >>> Our indexes aren't updated that frequently, about 4 - 6 times a day. We
> >>> don't use a lot of shards and replicas (biggest index is split to 3
> >> shards
> >>> with 2 replicas). All shards/replicas are not on the same solr host.
> >>> Our solr setup handles around 80-200 queries per second during the day
> >> with
> >>> peaks at >1500 before holiday season and sales.
> >>>
> >>> I haven't really read up on the details yet but it seems like I could
> use
> >>> etags and Expires headers to work around having to do some of that
> >>> "unnecessary" work.
> >>>
> >>> Is anyone doing this? Why? Why not?
> >>>
> >>> - peace!
> >>>
> >>
>
>


Re: Is anyone using proxy caching in front of solr?

2019-02-25 Thread Walter Underwood
Don’t worry about one and two character queries, because they will almost 
always be served from cache.

There are only 26 one-letter queries (36 if you use numbers). Almost all of 
those will be in the query results cache and will be very fast with very little 
server load. The common two-letter queries will also be cached.

An external HTTP cache can be effective, especially if you have a lot of 
replicas. The single cache will have a higher hit rate than the individual 
servers.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 25, 2019, at 7:57 AM, Edward Ribeiro  wrote:
> 
> Maybe you could add a length filter factory to filter out queries with 2 or
> 3 characters using
> https://lucene.apache.org/solr/guide/7_4/filter-descriptions.html#FilterDescriptions-LengthFilter
> ?
> 
> PS: this filter requires a max length too.
> 
> Edward
> 
> On Thu, 21 Feb 2019 at 04:52, Furkan KAMACI 
> wrote:
> 
>> Hi Joakim,
>> 
>> I suggest you to read these resources:
>> 
>> http://lucene.472066.n3.nabble.com/Varnish-td4072057.html
>> http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
>> https://wiki.apache.org/solr/SolrAndHTTPCaches
>> 
>> which gives information about HTTP Caching including Varnish Cache,
>> Last-Modified, ETag, Expires, Cache-Control headers.
>> 
>> Kind Regards,
>> Furkan KAMACI
>> 
>> On Wed, Feb 20, 2019 at 11:18 PM Joakim Hansson <
>> joakim.hansso...@gmail.com>
>> wrote:
>> 
>>> Hello dear user list!
>>> I work at a company in retail where we use solr to perform searches as
>> you
>>> type.
>>> As soon as you type more than 1 characters in the search field solr
>> starts
>>> serving hits.
>>> Of course this generates a lot of "unnecessary" queries (in the sense
>> that
>>> they are never shown to the user) which is why I started thinking about
>>> using something like squid or varnish to cache a bunch of these 2-4
>>> character queries.
>>> 
>>> It seems most stuff I find about it is from pretty old sources, but as
>> far
>>> as I know solrcloud doesn't have distributed cache support.
>>> 
>>> Our indexes aren't updated that frequently, about 4 - 6 times a day. We
>>> don't use a lot of shards and replicas (biggest index is split to 3
>> shards
>>> with 2 replicas). All shards/replicas are not on the same solr host.
>>> Our solr setup handles around 80-200 queries per second during the day
>> with
>>> peaks at >1500 before holiday season and sales.
>>> 
>>> I haven't really read up on the details yet but it seems like I could use
>>> etags and Expires headers to work around having to do some of that
>>> "unnecessary" work.
>>> 
>>> Is anyone doing this? Why? Why not?
>>> 
>>> - peace!
>>> 
>> 



Re: Is anyone using proxy caching in front of solr?

2019-02-25 Thread Edward Ribeiro
Maybe you could add a length filter factory to filter out queries with 2 or
3 characters using
https://lucene.apache.org/solr/guide/7_4/filter-descriptions.html#FilterDescriptions-LengthFilter
?

PS: this filter requires a max length too.

Edward
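
A sketch of that suggestion as a query-time analyzer chain (the min/max values
are placeholders):

    <fieldType name="text_minlen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.LengthFilterFactory" min="3" max="512"/>
      </analyzer>
    </fieldType>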

On Thu, 21 Feb 2019 at 04:52, Furkan KAMACI 
wrote:

> Hi Joakim,
>
> I suggest you to read these resources:
>
> http://lucene.472066.n3.nabble.com/Varnish-td4072057.html
> http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
> https://wiki.apache.org/solr/SolrAndHTTPCaches
>
> which gives information about HTTP Caching including Varnish Cache,
> Last-Modified, ETag, Expires, Cache-Control headers.
>
> Kind Regards,
> Furkan KAMACI
>
> On Wed, Feb 20, 2019 at 11:18 PM Joakim Hansson <
> joakim.hansso...@gmail.com>
> wrote:
>
> > Hello dear user list!
> > I work at a company in retail where we use Solr to perform search as you
> > type.
> > As soon as you type more than one character in the search field, Solr
> > starts serving hits.
> > Of course this generates a lot of "unnecessary" queries (in the sense that
> > they are never shown to the user), which is why I started thinking about
> > using something like Squid or Varnish to cache a bunch of these 2-4
> > character queries.
> >
> > Most of the material I find about this is from pretty old sources, but as
> > far as I know SolrCloud doesn't have distributed cache support.
> >
> > Our indexes aren't updated that frequently, about 4-6 times a day. We
> > don't use a lot of shards and replicas (the biggest index is split into 3
> > shards with 2 replicas), and the shards/replicas are not all on the same
> > Solr host.
> > Our Solr setup handles around 80-200 queries per second during the day,
> > with peaks above 1500 before the holiday season and sales.
> >
> > I haven't really read up on the details yet, but it seems like I could use
> > ETags and Expires headers to avoid some of that "unnecessary" work.
> >
> > Is anyone doing this? Why? Why not?
> >
> > - peace!
> >
>


MLT and facetting

2019-02-25 Thread Martin Frank Hansen (MHQ)
Hi,

I am trying to combine the MLT functionality with facets, but Solr throws 
org.apache.solr.common.SolrException: "Unable to compute facet ranges, facet 
context is not set".

What I am trying to do is quite simple: find similar documents using MLT and 
group these using the facet parameters. When using MLT and facets separately 
everything works fine, but not when the two are combined.
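
For reference, the stack trace further down shows the request going through
solr.MoreLikeThisHandler; a bare-bones /mlt handler definition in solrconfig.xml
would look something like the sketch below (the field name is only a
placeholder, not my actual configuration):

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <!-- Placeholder similarity field; mlt.mintf, mlt.mindf etc. can also be set here -->
    <str name="mlt.fl">text</str>
  </lst>
</requestHandler>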

[two inline screenshots did not come through]

{
  "responseHeader":{
    "status":500,
    "QTime":109},
  "match":{"numFound":1,"start":0,"docs":[
      {
        "Journalnummer":" 00759",
        "id":"6512815"}]
  },
  "response":{"numFound":602234,"start":0,"docs":[
      {
        "Journalnummer":" 00759",
        "id":"6512816"},
      {
        "Journalnummer":" 00759",
        "id":"6834653"},
      {
        "Journalnummer":" 00739",
        "id":"6202373"},
      {
        "Journalnummer":" 00739",
        "id":"6748105"},
      {
        "Journalnummer":" 00803",
        "id":"7402155"}]
  },
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"Unable to compute facet ranges, facet context is not set",
"trace":"org.apache.solr.common.SolrException: Unable to compute facet 
ranges, facet context is not set\n\tat 
org.apache.solr.handler.component.RangeFacetProcessor.getFacetRangeCounts(RangeFacetProcessor.java:66)\n\tat
 
org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:331)\n\tat
 
org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:295)\n\tat
 
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:240)\n\tat
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)\n\tat 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
 org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
 org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java

RE: Solr 7.7 Cloud - can it adapt to Zookeeper address changes while running?

2019-02-25 Thread Addison, Alex (LNG-LON)
Is there a guide to doing this with Elastic IPs or Route 53 that you're aware 
of? If not, I'll aim to publish one as a blog entry. This is the solution we're 
working on, but I thought it was worth asking the community in case there was a 
simpler way.

-Original Message-
From: Jörn Franke 
Sent: 25 February 2019 07:51
To: solr-user@lucene.apache.org
Cc: Thirunilathil, Shalini (LNG-LON) ; 
Allen, Steve P. (LNG-LON) ; Irwin, Max (LNG-CON) 

Subject: Re: Solr 7.7 Cloud - can it adapt to Zookeeper address changes while 
running?


And/or AWS Route 53.
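
A rough sketch of the idea (the host names below are invented): give each
ZooKeeper node a stable DNS record and point Solr at those names, e.g. via
zkHost in solr.xml, so the connect string never has to change when an instance
behind a name is replaced:

<!-- solr.xml sketch; other stock <solrcloud> settings omitted for brevity -->
<solr>
  <solrcloud>
    <str name="zkHost">zk1.internal.example.com:2181,zk2.internal.example.com:2181,zk3.internal.example.com:2181/solr</str>
  </solrcloud>
</solr>

The same connect string can also be supplied as ZK_HOST in solr.in.sh or with
bin/solr start -z.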

> On 25.02.2019 at 08:46, Jörn Franke  wrote:
>
> Elastic IP addresses?
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
>
>> On 25.02.2019 at 08:22, Addison, Alex (LNG-LON) 
>> wrote:
>>
>> Hi all, we're looking at how to run Solr & Zookeeper in production. We're 
>> running everything in AWS, and for resiliency we're using Exhibitor with 
>> Zookeeper and keeping Zookeeper in an auto-scaling group just to re-create 
>> instances that are terminated for whatever reason.
>> Unfortunately it's not simple to set this up so that Zookeeper retains a 
>> fixed IP or DNS name through such re-creation (i.e. the new virtual machine 
>> will have a new name and IP address); is there a way to inform Solr that the 
>> set of Zookeeper nodes it should talk to has changed? We're using Solr Cloud 
>> 7.7.
>>
>> Thanks,
>> Alex Addison
>>
>>
>> 
>>


