Solr Repeaters/Slaves are replicating on every commit on Master instead of on Optimize
Hello Solr Users,

I am trying to get a Master-Repeater-Slave configuration to work and am facing a replication-related issue on luceneMatchVersion 7.7.1. I have posted all the details on Stack Overflow:

https://stackoverflow.com/questions/57741934/solr-repeaters-slaves-replicating-are-every-commit-on-master-instead-of-optimize

Thanks in advance!
Re: Multi-lingual Search & Accent Marks
> On Aug 31, 2019, at 12:00 PM, Toke Eskildsen wrote:
>
> Whenever we do this normalisation, we index two versions in our index: A very
> lightly normalised (lowercased) field and a heavily normalised field: If a
> record has a title "Köket" (kitchen in Swedish), we store title_orig:köket
> and title_norm:køket. […] Going with what we do, my answer would be: Yes, do
> preserve and also remove :-)

Right after I posted, I realized that I wanted to say “include all” as an option. They can even be in the same field with synonyms at the same token position.

Also, don’t worry too much about creating junk terms in the index with nonsense transliterations. Terms are cheap in search indexes (up to a point). So it really is OK to have all of these indexed at the same position, even if the last one is garbage. This still has the schön/schon problem, but at least there is a match.

coöperation
cooperation
cooepoeration (typewriter umlaut version)

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
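For readers who want to see the "include all" idea concretely, here is a minimal Python sketch, not Solr configuration. It assumes a whitespace tokenizer and a small, hypothetical transliteration table (`TYPEWRITER`); the function names are made up for illustration. Each token is emitted in its original, accent-stripped, and typewriter-umlaut forms, all at the same token position.

```python
import unicodedata

# Hypothetical "typewriter" transliterations, for illustration only.
TYPEWRITER = {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue"}  # ä, ö, ü


def strip_accents(token: str) -> str:
    """Drop combining marks after decomposition (schön -> schon)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def typewriter(token: str) -> str:
    """Replace umlauts with two-letter spellings (schön -> schoen)."""
    return "".join(TYPEWRITER.get(ch, ch) for ch in token)


def variants(token: str) -> list:
    """Return the distinct forms of a token, original form first."""
    token = unicodedata.normalize("NFC", token).lower()
    seen = []
    for form in (token, strip_accents(token), typewriter(token)):
        if form not in seen:
            seen.append(form)
    return seen


def analyse(text: str) -> list:
    """Emit (position, term) pairs; all variants share their token's position."""
    return [(pos, term)
            for pos, tok in enumerate(text.split())
            for term in variants(tok)]
```

With this sketch a query for "schon" matches a document containing "schön", which is exactly the schön/schon ambiguity mentioned above: the match is found, but the two words are no longer distinguished at that position.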
Re: Multi-lingual Search & Accent Marks
Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:
> Just wanting to test the waters here – for those of you with search engines
> that index multiple languages, do you use ASCII-folding in your schema?

Our primary search engine is for Danish users, with sources being bibliographic records with titles and other metadata in many different languages. We normalise to Danish, meaning that most ligatures are removed, but also that letters such as Swedish ö become Danish ø. The rules for normalisation are dictated by Danish library practice and were implemented by a resident librarian.

Whenever we do this normalisation, we index two versions in our index: A very lightly normalised (lowercased) field and a heavily normalised field: If a record has a title "Köket" (kitchen in Swedish), we store title_orig:köket and title_norm:køket. edismax is used to ensure that both fields are searched by default (plus an explicit field alias "title" is set to point to both title_orig and title_norm for qualified searches) and that matches in title_orig have more weight in the relevance calculation.

> We are onboarding Spanish documents into our index right now and keep
> going back and forth on whether we should preserve accent marks.

Going with what we do, my answer would be: Yes, do preserve and also remove :-). You could even have 3 or more levels of normalisation, depending on how much time you have for polishing.

- Toke Eskildsen
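As a rough illustration of the two-field approach described in this message, the following Python sketch builds a lightly normalised and a heavily normalised version of a title and scores a query against both. It is not Solr or edismax code. The rule table contains only the ö to ø example from the message, and the weights in `WEIGHTS` and all function names are assumptions chosen for illustration.

```python
import unicodedata

# Stand-in for the librarian-maintained rules; only the ö -> ø mapping
# comes from the message above, the real rule set is not shown there.
DANISH_RULES = str.maketrans({"\u00f6": "\u00f8"})  # ö -> ø

# Hypothetical field weights: matches in title_orig count for more.
WEIGHTS = {"title_orig": 2.0, "title_norm": 1.0}


def light_normalise(text: str) -> str:
    """Very light normalisation: canonical composition plus lowercasing."""
    return unicodedata.normalize("NFC", text).lower()


def heavy_normalise(text: str) -> str:
    """Heavy normalisation: light normalisation plus the Danish rules."""
    return light_normalise(text).translate(DANISH_RULES)


def index_fields(title: str) -> dict:
    """Build both indexed versions of a title."""
    return {
        "title_orig": light_normalise(title),
        "title_norm": heavy_normalise(title),
    }


def score(query: str, doc: dict) -> float:
    """Sum the weights of every field in which the query term occurs."""
    analysed = {
        "title_orig": light_normalise(query),
        "title_norm": heavy_normalise(query),
    }
    return sum(weight for field, weight in WEIGHTS.items()
               if analysed[field] in doc[field].split())
```

A query typed as "Köket" matches both fields and scores higher than one typed as "køket", which only matches the heavily normalised field. That mirrors the intent of giving title_orig more weight.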
Re: ExecutorService support in SolrIndexSearcher
We pass ExecutorService to Lucene's IndexSearcher at Amazon (for customer facing product search) and it's a big win on long-pole query latencies, but hurts red-line QPS (cluster capacity) a bit, due to less efficient collection across segments and thread context switching.

I'm surprised it's not an option for Solr and Elasticsearch ... for certain applications it's a huge win.

And yes as David points out -- Collectors (CollectorManagers) need to support this "gather results for each segment separately then reduce in the end" mode...

Mike McCandless
http://blog.mikemccandless.com

On Fri, Aug 30, 2019 at 4:45 PM David Smiley wrote:
> It'd take some work to do that. Years ago I recall Etsy did a POC and
> shared their experience at Lucene/Solr Revolution in Washington DC; I
> attended the presentation with great interest. One of the major obstacles,
> if I recall, was the Collector needs to support this mode of operation, and
> in particular Solr's means of flipping bits in a big bitset to accumulate
> the DocSet had to be careful so that multiple threads don't try to
> overwrite the same underlying "long" in the long[].
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Mon, Aug 26, 2019 at 7:02 AM Aghasi Ghazaryan wrote:
> > Hi,
> >
> > Lucene's IndexSearcher
> > <http://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/IndexSearcher.html#IndexSearcher-org.apache.lucene.index.IndexReaderContext-java.util.concurrent.ExecutorService->
> > supports running searches for each segment separately, using the provided
> > ExecutorService.
> > I wonder why SolrIndexSearcher does not support the same as it may improve
> > queries performance a lot?
> >
> > Thanks, looking forward to hearing from you.
> >
> > Regards
> > Aghasi Ghazaryan
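The "gather results for each segment separately then reduce in the end" mode can be sketched in a few lines of Python. This is a toy model and not the Lucene or Solr API: a segment is assumed to be a list of `(doc_id, text)` pairs, scoring is a plain term count, and `search_segment` and `search` are hypothetical names. The point is only the shape: each task collects into its own private result list, so no two threads write to shared state (the concern raised about the shared bitset), and a single reduce step merges the per-segment results.

```python
import heapq
from concurrent.futures import Executor


def search_segment(segment, term, k):
    """Collect the top-k (score, doc_id) hits inside one segment.

    The hit list is local to this call, so concurrent calls never
    touch the same data structure.
    """
    hits = [(text.split().count(term), doc_id) for doc_id, text in segment]
    return heapq.nlargest(k, [hit for hit in hits if hit[0] > 0])


def search(segments, term, k, executor: Executor):
    """Search every segment on the executor, then reduce to a global top-k."""
    futures = [executor.submit(search_segment, seg, term, k) for seg in segments]
    per_segment = [future.result() for future in futures]  # gather
    merged = (hit for hits in per_segment for hit in hits)
    return heapq.nlargest(k, merged)                        # reduce
```

A possible call, assuming a standard-library thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

segments = [
    [(0, "solr solr lucene"), (1, "lucene")],
    [(2, "solr"), (3, "solr solr solr")],
]
with ThreadPoolExecutor(max_workers=2) as pool:
    top = search(segments, "solr", 2, pool)
```

Because of Python's global interpreter lock this toy will not show a real speedup; it only illustrates why the collection side has to be structured per segment before an executor can be used at all.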