Solr Repeaters/Slaves are replicating on every commit on Master instead of on Optimize

2019-08-31 Thread Monil Parikh
Hello Solr Users,

I am trying to get a Master-Repeater-Slave config to work, but I am facing a
replication-related issue on luceneMatchVersion 7.7.1.

I posted the question on Stack Overflow with all the details:
https://stackoverflow.com/questions/57741934/solr-repeaters-slaves-replicating-are-every-commit-on-master-instead-of-optimize

Thanks in advance!


Re: Multi-lingual Search & Accent Marks

2019-08-31 Thread Walter Underwood
> On Aug 31, 2019, at 12:00 PM, Toke Eskildsen wrote:
> 
> Whenever we do this normalisation, we index two versions in our index: A very 
> lightly normalised (lowercased) field and a heavily normalised field: If a 
> record has a title "Köket" (kitchen in Swedish), we store title_orig:köket 
> and title_norm:køket. […] Going with what we do, my answer would be: Yes, do 
> preserve and also remove :-)


Right after I posted, I realized that I wanted to say “include all” as an 
option. They can even be in the same field with synonyms at the same token 
position.

Also, don’t worry too much about creating junk terms in the index with nonsense 
transliterations. Terms are cheap in search indexes (up to a point). So it 
really is OK to have all of these indexed at the same position, even if the 
last one is garbage. This still has the schön/schon problem, but at least there 
is a match.

coöperation
cooperation
cooeperation (typewriter umlaut version)
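
As a rough illustration of the "include all" idea, here is a minimal Java sketch that produces the forms of one token that could be indexed at the same position. The class and method names are hypothetical, and it uses only the JDK; in Solr itself this work would normally happen inside the analysis chain rather than in client code. It uses "schön" from the example above, so it shares the same schön/schon ambiguity.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Solr or Lucene.
final class TokenVariants {

    /** Returns the distinct forms of one token that would share a token position. */
    static Set<String> variants(String token) {
        // Compose first so that "o" + combining diaeresis is treated like a precomposed ö.
        String lower = Normalizer.normalize(token, Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
        Set<String> out = new LinkedHashSet<>();
        out.add(lower);                          // original form, only lowercased
        out.add(stripMarks(lower));              // accent marks removed
        out.add(stripMarks(typewriter(lower)));  // umlauts written as vowel + "e"
        return out;
    }

    /** Decomposes the string and drops all combining marks. */
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    /** Typewriter-style transliteration of the lowercase umlauts ä, ö and ü. */
    static String typewriter(String s) {
        return s.replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue");
    }

    public static void main(String[] args) {
        System.out.println(variants("Sch\u00f6n")); // three forms: schön, schon, schoen
    }
}
```

A token without accent marks yields a single form, so the extra terms only appear where they can matter.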

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Multi-lingual Search & Accent Marks

2019-08-31 Thread Toke Eskildsen
Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:
> Just wanting to test the waters here – for those of you with search engines
> that index multiple languages, do you use ASCII-folding in your schema?

Our primary search engine is for Danish users, with sources being bibliographic 
records with titles and other metadata in many different languages. We 
normalise to Danish, meaning that most ligatures are removed, but also that 
letters such as Swedish ö become Danish ø. The rules for normalisation are 
dictated by Danish library practice and were implemented by a resident librarian.

Whenever we do this normalisation, we index two versions in our index: A very 
lightly normalised (lowercased) field and a heavily normalised field: If a 
record has a title "Köket" (kitchen in Swedish), we store title_orig:köket and 
title_norm:køket. edismax is used to ensure that both fields are searched by 
default (plus an explicit field alias "title" is set to point to both 
title_orig and title_norm for qualified searches) and that matches in 
title_orig have more weight in the relevance calculation.
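
A minimal Java sketch of this two-field setup follows. It is illustration only: the class and method names are hypothetical, the normalisation is done client-side here although it would normally live in the Solr analysis chain, only the ö to ø rule is taken from the description above, and the boost value of 2 is an assumed number. The request parameters use the standard edismax qf parameter and its per-field f.<alias>.qf form for the "title" alias.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
final class TitleFields {

    /** Light normalisation: lowercase only. */
    static String light(String title) {
        return title.toLowerCase(Locale.ROOT);
    }

    /** Heavy normalisation: lowercase, then fold Swedish ö to Danish ø. */
    static String heavy(String title) {
        // Assumes precomposed (NFC) input; a real rule set would be much longer.
        return light(title).replace('\u00f6', '\u00f8');
    }

    /** Field values for one record, keyed by the field names used above. */
    static Map<String, String> document(String title) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title_orig", light(title));
        doc.put("title_norm", heavy(title));
        return doc;
    }

    /** Request parameters that search both fields and favour title_orig. */
    static Map<String, String> queryParams(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");
        params.put("q", userQuery);
        params.put("qf", "title_orig^2 title_norm");          // boost 2 is an assumption
        params.put("f.title.qf", "title_orig^2 title_norm");  // alias for title:... queries
        return params;
    }

    public static void main(String[] args) {
        System.out.println(document("K\u00f6ket"));
        System.out.println(queryParams("k\u00f8ket"));
    }
}
```

With this layout a query for "køket" still finds the Swedish record through title_norm, while a query for "köket" matches both fields and ranks higher through title_orig.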

> We are onboarding Spanish documents into our index right now and keep
> going back and forth on whether we should preserve accent marks.

Going with what we do, my answer would be: Yes, do preserve and also remove 
:-). You could even have 3 or more levels of normalisation, depending on how 
much time you have for polishing.

- Toke Eskildsen


Re: ExecutorService support in SolrIndexSearcher

2019-08-31 Thread Michael McCandless
We pass ExecutorService to Lucene's IndexSearcher at Amazon (for
customer-facing product search) and it's a big win on long-pole query
latencies, but it hurts red-line QPS (cluster capacity) a bit, due to less
efficient collection across segments and thread context switching.

I'm surprised it's not an option for Solr and Elasticsearch ... for certain
applications it's a huge win.

And yes as David points out -- Collectors (CollectorManagers) need to
support this "gather results for each segment separately then reduce in the
end" mode...
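
To make that mode concrete, here is a minimal sketch in plain Java (no Lucene dependency) of "collect per segment, then reduce". Everything in it is hypothetical: segments are modelled as int arrays and the "query" simply counts occurrences of a value. It loosely mirrors the shape of Lucene's CollectorManager (newCollector plus reduce), but it is not the Lucene or Solr implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of per-segment search, not Lucene code.
final class PerSegmentSearch {

    /** Per-segment "collector": counts matches using only local state. */
    static int collectSegment(int[] segment, int wanted) {
        int hits = 0;
        for (int value : segment) {
            if (value == wanted) {
                hits++;
            }
        }
        return hits;
    }

    /** Submits one task per segment, then reduces the partial counts on the calling thread. */
    static int search(List<int[]> segments, int wanted, ExecutorService executor) {
        List<Future<Integer>> partials = new ArrayList<>();
        for (int[] segment : segments) {
            Callable<Integer> task = () -> collectSegment(segment, wanted);
            partials.add(executor.submit(task));
        }
        int total = 0;
        try {
            for (Future<Integer> partial : partials) {
                total += partial.get(); // reduce step: no shared mutable state is touched
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("segment search failed", e.getCause());
        }
        return total;
    }

    public static void main(String[] args) {
        List<int[]> segments = List.of(new int[] {1, 2, 2}, new int[] {2, 3}, new int[] {4});
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            System.out.println(search(segments, 2, executor)); // prints 3
        } finally {
            executor.shutdown();
        }
    }
}
```

Because each task writes only to its own local counter, threads never race on a shared structure such as a single bitset; the cost is the extra reduce pass and the thread hand-offs, which fits the latency versus capacity trade-off described above.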

Mike McCandless

http://blog.mikemccandless.com


On Fri, Aug 30, 2019 at 4:45 PM David Smiley wrote:

> It'd take some work to do that.  Years ago I recall Etsy did a POC and
> shared their experience at Lucene/Solr Revolution in Washington DC; I
> attended the presentation with great interest.  One of the major obstacles,
> if I recall, was that the Collector needs to support this mode of operation,
> and in particular Solr's means of flipping bits in a big bitset to accumulate
> the DocSet had to be careful so that multiple threads don't try to
> overwrite the same underlying "long" in the long[].
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Aug 26, 2019 at 7:02 AM Aghasi Ghazaryan wrote:
>
> > Hi,
> >
> > Lucene's IndexSearcher
> > <http://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/IndexSearcher.html#IndexSearcher-org.apache.lucene.index.IndexReaderContext-java.util.concurrent.ExecutorService->
> > supports running searches for each segment separately, using the provided
> > ExecutorService.
> > I wonder why SolrIndexSearcher does not support the same, as it may
> > improve query performance a lot?
> >
> > Thanks, looking forward to hearing from you.
> >
> > Regards
> > Aghasi Ghazaryan
> >
>