Question about SolrCloud joining
Hello, I have a question about SolrCloud joining. I knew SolrCloud can only join when the "from" index is not split into shards, but when I tested it I ran into a problem that confused me. I tested on version 8.2.

Assume I have two collections like the "joining" sample on the official Solr website: one collection called "movies" and another called "movieDirectors".

movies fields: id, title, director_id
movieDirectors fields: id, name, has_oscar

I started two nodes on my laptop (8983 and 8984); the shard and replica layout is shown below:

[image: image.png]

movieDirectors has 2 docs:

[image: image.png]

movies also has 2 docs:

[image: image.png]

Everything is OK when I run the query {!join from=id fromIndex=movieDirectors to=director_id}has_oscar:true on both 8983 and 8984; I get the expected result:

[image: image.png]

But when I run {!join from=director_id fromIndex=movies to=id}title:"Dunkirk" on 8983, I get 1 doc, and if I filter by title:"Get Out", I get nothing. I understand that "Get Out" does not exist on 8983.

[image: image.png]
[image: image.png]

Now here is the question: when I run {!join from=director_id fromIndex=movies to=id}title:"Dunkirk" on 8984, I get "SolrCloud join: multiple shards not yet supported movies" no matter what the filter value is. I found the following code:

[image: image.png]

When I run the join from movies on 8983, the slice count is 2 because movies has 2 shards. fromReplica is assigned in the second iteration, because in the first iteration the zkController node is 8983 while the replica's node is 8984. But when I run it on 8984, fromReplica is assigned in the first iteration (the zkController node and the replica's node are both 8984), so the second iteration throws "SolrCloud join: multiple shards not yet supported".

Thanks for your patience, this got long. I am confused about why the code detects "multiple shards" this way, because the result is also wrong when running on 8983 even though no exception is thrown. Why not use slice count > 1 to detect "multiple shards"? Or maybe there is a better way? Please advise. Thanks in advance!
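To make the order dependence concrete, here is a minimal standalone sketch of the slice loop described above (a paraphrase with hypothetical names, not the actual JoinQParserPlugin code): the check throws only when a local replica was already found in an earlier slice, so the outcome depends on iteration order rather than on the actual shard count.

```java
import java.util.List;

public class JoinShardCheckDemo {
    // Each slice is represented by the list of node names hosting its
    // replicas. The exception fires only if a local replica was found in
    // an EARLIER slice -- iteration order, not shard count, decides it.
    static String findLocalReplica(String localNode, List<List<String>> slices) {
        String fromReplica = null;
        for (List<String> replicaNodes : slices) {
            if (fromReplica != null) {
                throw new IllegalStateException(
                        "SolrCloud join: multiple shards not yet supported");
            }
            for (String node : replicaNodes) {
                if (node.equals(localNode)) {
                    fromReplica = node; // the real code records the core name
                    break;
                }
            }
        }
        return fromReplica;
    }

    public static void main(String[] args) {
        // movies has 2 shards; in this layout the slice on node 8984
        // happens to come first in iteration order, matching the report.
        List<List<String>> slices = List.of(List.of("8984"), List.of("8983"));

        // On node 8983 the local replica is found in the LAST slice, so
        // the loop ends without throwing despite there being two shards.
        System.out.println(findLocalReplica("8983", slices)); // prints 8983

        // On node 8984 it is found in the first slice, and the check at
        // the top of the second iteration throws.
        try {
            findLocalReplica("8984", slices);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking `slices.size() > 1` up front, as the question suggests, would at least fail consistently on every node instead of silently returning partial results on some of them.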
8.2.0 getting warning - unable to load jetty, not starting JettyAdminServer
Hi, I am getting the following warning in the Solr admin UI logs. I did not get this warning in Solr 8.1.1. Please note that I am using the Solr docker slim image from here - https://hub.docker.com/_/solr/

Unable to load jetty, not starting JettyAdminServer
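One guess, offered as an assumption rather than a confirmed diagnosis: Solr 8.2 moved to the ZooKeeper 3.5 line, whose new AdminServer logs exactly this message when Jetty is not on its classpath. If you do not need that ZooKeeper admin endpoint, it can be switched off with ZooKeeper's own system property, e.g. in solr.in.sh:

```shell
# Assumption: the warning comes from the ZooKeeper 3.5 AdminServer bundled
# with Solr 8.2. Disabling it via ZooKeeper's system property should
# silence the warning without affecting Solr itself.
SOLR_OPTS="$SOLR_OPTS -Dzookeeper.admin.enableServer=false"
echo "$SOLR_OPTS"
```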
Re: HttpShardHandlerFactory
Mark,

Another thing to check is that I believe the configuration you posted may not actually be taking effect. Unless I'm mistaken, I think the correct element name to configure the shard handler is "shardHandler*Factory*", not "shardHandler" ... as in, '<shardHandlerFactory class="HttpShardHandlerFactory">'.

The element name is documented correctly in the refGuide page for "Format of solr.xml":
https://lucene.apache.org/solr/guide/8_1/format-of-solr-xml.html#the-shardhandlerfactory-element

... but the incorrect (?) element name is included in the refGuide page for "Distributed Requests":
https://lucene.apache.org/solr/guide/8_1/distributed-requests.html#configuring-the-shardhandlerfactory

Michael

On Fri, Aug 16, 2019 at 9:40 AM Shawn Heisey wrote:
> On 8/16/2019 3:51 AM, Mark Robinson wrote:
> > I am trying to understand the socket timeout and connection timeout in
> > the HttpShardHandlerFactory:
> >
> >   <shardHandler class="HttpShardHandlerFactory">
> >     <int name="connTimeout">10</int>
> >     <int name="socketTimeout">20</int>
> >   </shardHandler>
>
> The shard handler is used when that Solr instance needs to make
> connections to another Solr instance (which could be itself, as odd as
> that might sound). It does not apply to the requests that you make from
> outside Solr.
>
> > 1. Could someone please help me understand the effect of using such low
> > values of 10 ms and 20 ms as given above inside my /select handler?
>
> A connection timeout of 10 milliseconds *might* result in connections
> not establishing at all. This is translated down to the TCP socket as
> the TCP connection timeout -- the time limit imposed on making the TCP
> connection itself, which as I understand it is the completion of the
> "SYN", "SYN/ACK", and "ACK" sequence. If the two endpoints of the
> connection are on a LAN, you might never see a problem from this -- LAN
> connections are very low latency. But if they are across the Internet,
> they might never work.
>
> The socket timeout of 20 milliseconds means that if the connection goes
> idle for 20 milliseconds, it will be forcibly closed. So if it took 25
> milliseconds for the remote Solr instance to respond, this Solr instance
> would have given up and closed the connection. It is extremely common
> for requests to take 100, 500, 2000, or more milliseconds to respond.
>
> > 2. What are the guidelines for setting these parameters? Should they be
> > low or high?
>
> I would probably use a value of about 5000 (five seconds) for the
> connection timeout if everything's on a local LAN. I might go as high
> as 15 seconds if there's a high-latency network between them, but five
> seconds is probably long enough too.
>
> For the socket timeout, you want a value that's considerably longer than
> you expect requests to ever take. Probably somewhere between two and
> five minutes.
>
> > 3. How can I test the effect of this chunk of code after adding it to my
> > /select handler, i.e. make sure the above snippet is working? That is
> > why I gave such low values, and I thought that when I fired a query I
> > would get both timeout errors in the logs. But I did not! Or is it that
> > if no request comes within the above time frame (10 ms, 20 ms) the
> > socket will time out and the connection will be lost? So to test this,
> > should I apply a load of say 100 TPS with these low values, then
> > increase the values to maybe 1000 ms and 1500 ms respectively, and see
> > fewer timeout error messages?
>
> If you were running a multi-server SolrCloud setup (or a single-server
> setup with multiple shards and/or replicas), you probably would see
> problems from values that low. But if Solr never has any need to make
> connections to satisfy a request, then the values will never take effect.
>
> If you want to control these values for requests made from outside Solr,
> you will need to do it in your client software that is making the request.
>
> Thanks,
> Shawn
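Putting Michael's element-name correction together with Shawn's suggested values, a corrected snippet might look like the following (a sketch only; parameter names assumed from the 8.1 ref guide, values from Shawn's advice of ~5 s connect and two to five minutes socket):

```xml
<!-- Inside solrconfig.xml; element name per the "Format of solr.xml"
     ref guide page, timeouts in milliseconds. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="connTimeout">5000</int>       <!-- 5 seconds -->
    <int name="socketTimeout">120000</int>   <!-- 2 minutes -->
  </shardHandlerFactory>
</requestHandler>
```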
Re: Slow Indexing scaling issue
Hi Parmeshwor,

2 hours for 3 GB of data seems too slow. We scale up to PBs in this way:

1) Ignore all commits from clients via IgnoreCommitOptimizeUpdateProcessorFactory.
2) Do heavy processing on an external Tika server instead of Solr Cell with the embedded Tika feature.
3) Adjust autocommit, soft commit, and shard size according to your needs.
4) Adjust JVM parameters.
5) Do not use swap if you can avoid it.

Kind Regards,
Furkan KAMACI

On Tue, Aug 13, 2019 at 8:37 PM Erick Erickson wrote:
> Here's some sample SolrJ code using Tika outside of Solr's Extracting
> Request Handler, along with some info about why loading Solr with the job
> of extracting text is not optimal speed-wise:
>
> https://lucidworks.com/post/indexing-with-solrj/
>
> > On Aug 13, 2019, at 12:15 PM, Jan Høydahl wrote:
> >
> > You may want to review
> > https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-SlowIndexing
> > for some hints.
> >
> > Make sure to index with multiple parallel threads. Also remember that
> > using /extract on the Solr side is resource intensive and may make your
> > cluster slow and unstable. It is better to use Tika or similar on the
> > client side and send text docs to Solr.
> >
> > Jan Høydahl
> >
> >> On 13 Aug 2019 at 16:52, Parmeshwor Thapa <thapa.parmesh...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> We are having some issues scaling Solr indexing and are looking for
> >> suggestions.
> >>
> >> Setup: We have two SolrCloud (7.4) instances running in separate cloud
> >> VMs with an external ZooKeeper ensemble. We are sending async /
> >> non-blocking HTTP requests to index documents in Solr:
> >> 2 cloud VMs (4 cores * 32 GB), 16 GB allocated for the JVM.
> >>
> >> We are sending all types of documents to Solr, which it extracts and
> >> indexes using the /update/extract request handler. We have stopwords.txt
> >> and a dictionary (7 MB) for stemming.
> >>
> >> Issue: indexing speed is quite slow for us. It is taking around 2 hours
> >> to index around 3 GB of data, 10,000 documents (PDF, xls, word, etc).
> >> We are planning to index approximately 10 TB of data.
> >>
> >> Below are the schema and solrconfig settings. [The XML was stripped by
> >> the mail archive; the recoverable fragments are: a phonetic filter with
> >> languageSet="auto" ruleType="APPROX" concat="true"; an OpenNLP analysis
> >> chain with tokenizerModel="en-token.bin", sentenceModel="en-sent.bin",
> >> posTaggerModel="en-pos-maxent.bin", and
> >> dictionary="en-lemmatizer-again.dict.txt"; a BEST_COMPRESSION codec
> >> setting; autoCommit values 1000, 60, false; autoSoftCommit maxTime
> >> ${solr.autoSoftCommit.maxTime:-1}; and a startup="lazy" handler with
> >> class="solr.extraction.ExtractingRequestHandler" and the values true,
> >> ignored_, content.]
> >>
> >> *Thanks,*
> >>
> >> *Parmeshwor Thapa*
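Furkan's first tip can be sketched as an update processor chain in solrconfig.xml. This follows the pattern shown in the Solr ref guide; the chain name is arbitrary, and marking it default applies it to all update requests:

```xml
<updateRequestProcessorChain name="ignore-commit-from-client" default="true">
  <!-- Swallow explicit commit/optimize requests from clients and return
       200, letting autoCommit/autoSoftCommit control visibility instead. -->
  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
    <int name="statusCode">200</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```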
Re: Multiple Request to solr from dotnet application
Hi,

Can you provide an example of what you want to achieve? Multiple requests in parallel? Are those requests related?

Best regards

> On 19.08.2019 at 01:44, Prabhu Dhanaraj wrote:
>
> Hi Team
>
> I would like to know if there is any way we can combine multiple
> requests and send them to Solr. We are using a .NET application to send
> the requests to Solr. Please let us know if there is any article or
> sample code related to this.
>
> Thanks
> Prabhu
>
> American Express made the following annotations
>
> "This message and any attachments are solely for the intended recipient
> and may contain confidential or privileged information. If you are not
> the intended recipient, any disclosure, copying, use, or distribution of
> the information included in this message and any attachments is
> prohibited. If you have received this communication in error, please
> notify us by reply e-mail and immediately and permanently delete this
> message and any attachments. Thank you."
Multiple Request to solr from dotnet application
Hi Team

I would like to know if there is any way we can combine multiple requests and send them to Solr. We are using a .NET application to send the requests to Solr. Please let us know if there is any article or sample code related to this.

Thanks
Prabhu