Re: Need an advice for architecture.
: FWIW: I used the script below to build myself 3.8 million documents, with
: 300 "text fields" consisting of anywhere from 1-10 "words" (integers
: between 1 and 200)

Whoops ... forgot to post the script...

#!/usr/bin/perl
use strict;
use warnings;

my $num_docs = 3_800_000;
my $max_words_in_field = 10;
my $words_in_vocab = 200;
my $num_fields = 300;

# header
print "id";
map { print ",${_}_t" } 1..$num_fields;
print "\n";

while ($num_docs--) {
  print "$num_docs"; # uniqueKey
  for (1..$num_fields) {
    my $words_in_field = int(rand($max_words_in_field));
    print ",\"";
    map { print int(rand($words_in_vocab)) . " " } 0..$words_in_field;
    print "\"";
  }
  print "\n";
}
Re: Need an advice for architecture.
: SQL DB 4M documents with up to 5000 metadata fields each document [2xXeon
: 2.1Ghz, 32GB RAM]
: Actual Solr: 1 Core version 4.6, 3.8M documents, schema has 300 metadata
: fields to import, size 3.6GB [2xXeon 2.4Ghz, 32GB RAM]
: (atm we need 35h to build the index and about 24h for a mass update which
: affects the production)

The first question I have is why you are using a version of Solr that's
almost 5 years old.

The second question you should consider is what your indexing process looks
like, whether it's multithreaded or not, and whether the bottleneck is your
network/DB.

The third question to consider is your Solr configuration / schema: how
complex the Solr-side indexing process is -- i.e.: are these 300 fields all
TextFields with complex analyzers?

FWIW: I used the script below to build myself 3.8 million documents, with
300 "text fields" consisting of anywhere from 1-10 "words" (integers
between 1 and 200).

The resulting CSV file was 24GB, and using a simple curl command to index it
with a single client thread (and a single Solr thread) against Solr 7.4
running with the sample techproducts configs took less than 2 hours on my
laptop (less CPU & half as much RAM compared to your server) while I was
doing other stuff.

(I would bet your current indexing speed has very little to do with Solr and
is largely a factor of your source DB and how you are sending the data to
Solr.)

-Hoss
http://www.lucidworks.com/
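Hoss's generator can be shrunk to a quick smoke test. The sketch below writes a miniature CSV in the same shape (100 docs, 5 fields, 1-10 "words" per field from a 200-integer vocabulary) and, only if SOLR_URL is set, posts it single-threaded the same way he describes; the output path, field names, and URL are assumptions, not values from the thread.

```shell
#!/usr/bin/env bash
# Tiny version of the 3.8M-doc generator from the thread.
num_docs=100
num_fields=5
max_words=10
vocab=200
out=/tmp/synth_docs.csv

{
  # header: id,1_t,2_t,...  (field names are an assumption)
  printf 'id'
  for f in $(seq 1 "$num_fields"); do printf ',%s_t' "$f"; done
  printf '\n'
  i="$num_docs"
  while [ "$i" -gt 0 ]; do
    i=$((i - 1))
    printf '%s' "$i"            # uniqueKey
    for f in $(seq 1 "$num_fields"); do
      words=$((RANDOM % max_words + 1))   # 1-10 "words" per field
      printf ',"'
      w=0
      while [ "$w" -lt "$words" ]; do
        printf '%s ' "$((RANDOM % vocab))"
        w=$((w + 1))
      done
      printf '"'
    done
    printf '\n'
  done
} > "$out"

# Single-threaded indexing, as in Hoss's test. Only runs if SOLR_URL is
# set (e.g. http://localhost:8983/solr/techproducts -- an assumption).
if [ -n "${SOLR_URL:-}" ]; then
  curl "$SOLR_URL/update?commit=true" \
       -H 'Content-Type: text/csv' --data-binary @"$out"
fi
```

Scaling num_docs/num_fields back up reproduces roughly the 24GB file described above; the point is that a single client thread is often enough to saturate Solr's CSV handler.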
Re: Document Count Difference Between Solr Versions 4.7 and 7.3
: I performed a bulk reindex against one of our larger databases for the first
: time using solr 7.3. The document count was substantially less (like at
: least 15% less) than our most recent bulk reindex from the previous solr 4.7
: server. I will perform a more careful analysis, but I am assuming the
: document count should not be different against the same database, even
: accounting for the schema updates required for going from 4.7 to 7.3.

Was the exact same source data used in both cases? ... You mentioned "most
recent bulk reindex", but it's not clear if the source data changed since
that last index job.

What does your bulk indexing code look like? Does it log errors from Solr?
Were there any errors in the Solr logs?

-Hoss
http://www.lucidworks.com/
RE: SOLR 7.2.1 on SLES 11?
Welp, that didn't go spectacularly. All the OpenSuSE SLES 11 downloads are
RPM, both source and compiled. Non-relocatable. I did attempt to rebuild,
but it choked on the following dependencies:

  audit-devel is needed by bash-4.3-286.1.x86_64
  fdupes is needed by bash-4.3-286.1.x86_64
  patchutils is needed by bash-4.3-286.1.x86_64

If I can find a repository for them I can throw that into Zypper, but thus
far I've failed. Anyone out there have any suggestions?

-----Original Message-----
From: Lichte, Lucas R - DHS (Tek Systems) [mailto:lucas.lic...@dhs.wisconsin.gov]
Sent: Wednesday, July 11, 2018 3:12 PM
To: solr-user@lucene.apache.org
Subject: RE: SOLR 7.2.1 on SLES 11?

Thanks for the heads-up on that bug; it looks like we'll be doing some
script editing either way. I think 1 is the most popular with the team at
this point, but I'll take the temperature and see how people feel.

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Wednesday, July 11, 2018 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 7.2.1 on SLES 11?

On 7/11/2018 12:09 PM, Lichte, Lucas R - DHS (Tek Systems) wrote:
> Hello, we're trying to get SOLR 7.2.1 running on SLES 11, but we hit issues
> with BASH 3 and the ${distro_string,,} at the beginning of
> install_solr_service.sh. We're just trying to get this upgraded without
> tossing out the old DB servers so we can get the content team happy and move
> on to redesigning the environment. We're wondering if anyone else has hit
> this, and if they have any lessons learned.
>
> As we see it, there are a few options:
>
> 1. Install OpenSUSE BASH 4, maybe in /opt
>
> 2. Update the lowercase method to something from BASH 3 (pipe to tr?)
>
> 3. Do this by hand without install_solr_service.sh
>
> 4. Build new Redhat servers, migrate the DB, and nuke these things.

Both bash 4 and SLES 11 are more than nine years old. Upgrades are
definitely recommended.
The option that might be fastest is the second one you've presented --
changing anything in the scripts that requires bash 4 so it's compatible
with bash 3. If you're comfortable with modifying a shell script in this
way, this is a good option.

The first option is probably a little bit safer -- install bash 4, and make
sure that this is the version used when installing and when starting Solr.
That could be a PATH adjustment, or changing the shebang in each script.

There is another issue you're going to need to deal with on SLES. A fix for
this issue has not been committed to the source repository:

https://issues.apache.org/jira/browse/SOLR-11853

Thanks,
Shawn
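For the record, option 2 from the thread is a one-line change: the bash-4-only `${distro_string,,}` expansion can be replaced with a tr pipeline that bash 3 (and plain sh) handles. A sketch, with a made-up sample value standing in for whatever install_solr_service.sh actually detects:

```shell
#!/usr/bin/env bash
# "SLES 11 (x86_64)" is just a sample value, not read from a real system.
distro_string="SLES 11 (x86_64)"

# bash 4+ only (fails with a syntax error on bash 3):
#   lc="${distro_string,,}"

# bash 3 compatible equivalent:
lc=$(printf '%s' "$distro_string" | tr '[:upper:]' '[:lower:]')
echo "$lc"
```

The tr form is also POSIX sh compatible, so the same edit keeps the script working even if the shebang ends up pointing at /bin/sh.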
Re: CDCR documentation typo
Thanks, but I think that section has been reworked; that typo isn't in the
current documentation. It's doubtful that we'll re-release that reference
guide.

Best,
Erick

On Thu, Jul 19, 2018 at 3:14 AM, Yair Yotam wrote:
> Hi,
>
> CDCR documentation page for v 7.1:
> https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html
>
> Contains a typo in the "real world" scenario section - solrconfig.xml:
> Target & Source should be lowercase.
>
> Using this configuration as a reference will result in a generic,
> non-informative exception.
>
> Regards,
> Yair
Re: CDCR documentation typo
Thank you for sharing this with others. As for the documentation, it looks
like it has been refactored and fixed already:
https://lucene.apache.org/solr/guide/7_4/cdcr-config.html

Regards,
Alex.

On 19 July 2018 at 06:14, Yair Yotam wrote:
> Hi,
>
> CDCR documentation page for v 7.1:
> https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html
>
> Contains a typo in the "real world" scenario section - solrconfig.xml:
> Target & Source should be lowercase.
>
> Using this configuration as a reference will result in a generic,
> non-informative exception.
>
> Regards,
> Yair
Re: Need an advice for architecture.
Are you doing a commit after every document? Is the index on local disk?

That is very slow indexing. With four shards and smaller documents, we can
index about a million documents per minute.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 19, 2018, at 1:28 AM, Emir Arnautović wrote:
>
> Hi Francois,
> If I got your numbers right, you are indexing on a single server and the
> indexing rate is ~31 doc/s. I would first check if something is wrong with
> the indexing logic. Check where the bottleneck is: do you read documents
> from the DB fast enough, do you batch documents…
> Assuming you cannot get a better rate than 30 doc/s and that the
> bottleneck is Solr, then in order to finish in 6h you need to parallelise
> indexing on Solr by splitting the index across ~6 servers, for an overall
> indexing rate of 180 doc/s.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>> On 19 Jul 2018, at 09:59, servus01 wrote:
>>
>> Would like to ask what your recommendations are for a new performant Solr
>> architecture.
>>
>> SQL DB: 4M documents with up to 5000 metadata fields each document [2xXeon
>> 2.1Ghz, 32GB RAM]
>> Actual Solr: 1 core, version 4.6, 3.8M documents, schema has 300 metadata
>> fields to import, size 3.6GB [2xXeon 2.4Ghz, 32GB RAM]
>> (atm we need 35h to build the index and about 24h for a mass update,
>> which affects production)
>>
>> Building the index should take less than 6h. Sometimes we change some of
>> the metadata fields, which affects most of the documents, and therefore a
>> mass update / reindex is necessary. A reindex is OK for about 6h (at
>> night) but should not have an impact on user queries. Anyway, any faster
>> indexing is very welcome. We will have max. 20 - 30 concurrent users.
>>
>> So I asked myself: how many nodes, shards, replicas etc.? Could someone
>> please give me a recommendation for a fast working architecture.
>>
>> really appreciate this, best
>>
>> Francois
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
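If the importer really is committing after every document, the usual fix is to let Solr manage commits and have the client commit once at the end of the job. A minimal solrconfig.xml sketch; the interval values below are placeholders, not recommendations from this thread:

```xml
<!-- Hard commit: flush to disk regularly without opening a new searcher,
     so indexing throughput isn't paid for with searcher churn. -->
<autoCommit>
  <maxTime>60000</maxTime>           <!-- every 60s; placeholder value -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: controls how quickly new docs become searchable. -->
<autoSoftCommit>
  <maxTime>300000</maxTime>          <!-- every 5 min; placeholder value -->
</autoSoftCommit>
```

With this in place, per-document commits from the client are unnecessary and actively harmful to indexing rate.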
Re: Document Count Difference Between Solr Versions 4.7 and 7.3
Monitor the logging in the admin interface while indexing. Also make sure to
issue a commit when done, so the docs are visible in the collection before
you compare the document counts.

On Thu, Jul 19, 2018 at 10:30 AM, THADC wrote:
> Hi,
>
> I performed a bulk reindex against one of our larger databases for the
> first time using solr 7.3. The document count was substantially less (like
> at least 15% less) than our most recent bulk reindex from the previous
> solr 4.7 server. I will perform a more careful analysis, but I am assuming
> the document count should not be different against the same database, even
> accounting for the schema updates required for going from 4.7 to 7.3.
>
> Any response appreciated. Thank you.
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
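One way to rule out a missing commit when comparing counts is to force a commit and then ask for numFound with rows=0. A sketch: the actual curl calls are guarded behind SOLR_URL since the host and collection are assumptions, and the numFound extraction is a plain grep/sed hack on Solr's default JSON shape, not a proper JSON parser.

```shell
#!/usr/bin/env bash
# Pull numFound out of a Solr JSON select response read on stdin.
num_found() {
  grep -o '"numFound":[0-9]*' | head -n 1 | sed 's/.*://'
}

# Only runs against a real server if SOLR_URL is set, e.g.
# http://localhost:8983/solr/mycollection (an assumption).
if [ -n "${SOLR_URL:-}" ]; then
  curl -s "$SOLR_URL/update?commit=true" > /dev/null   # make pending docs visible
  curl -s "$SOLR_URL/select?q=*:*&rows=0" | num_found
fi
```

Running this against both the 4.7 and 7.3 collections after a commit gives a like-for-like count; if the numbers still differ, the Solr logs from the indexing run are the next place to look.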
Document Count Difference Between Solr Versions 4.7 and 7.3
Hi,

I performed a bulk reindex against one of our larger databases for the first
time using solr 7.3. The document count was substantially less (like at
least 15% less) than our most recent bulk reindex from the previous solr 4.7
server. I will perform a more careful analysis, but I am assuming the
document count should not be different against the same database, even
accounting for the schema updates required for going from 4.7 to 7.3.

Any response appreciated. Thank you.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr Nodes Killed During a ReIndexing Process on New VMs Out of Memory Error
Thanks, I made the heap size considerably larger and it's fine now. Thank
you.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
CDCR documentation typo
Hi,

CDCR documentation page for v 7.1:
https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html

Contains a typo in the "real world" scenario section - solrconfig.xml:
Target & Source should be lowercase.

Using this configuration as a reference will result in a generic,
non-informative exception.

Regards,
Yair
Problem in QueryElevationComponent with solr 7.4.0
Hello. We are using solr 6.6.2 and want to upgrade it to version 7.4.0, but
we have a problem with QueryElevationComponent when adding the parameters
"elevateIds=..." and "fl=[elevated]".

Example of query:

/solr/products/select?omitHeader=true=1,2,3,4,5=*:*=0=20=id,[elevated]=true=category_1_id_is:123=true

and in the response we get HTTP error 500 with this stack trace:

java.lang.AssertionError: Expected an IndexableField but got: class java.lang.String
    at org.apache.solr.response.transform.BaseEditorialTransformer.getKey(BaseEditorialTransformer.java:72)
    at org.apache.solr.response.transform.BaseEditorialTransformer.transform(BaseEditorialTransformer.java:52)
    at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:123)
    at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:59)
    at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:276)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:162)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
    at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
    at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
    at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
    at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:787)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:524)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
    at java.lang.Thread.run(Thread.java:748)

The configuration of the select request handler in solrconfig.xml is
explicit: 10 query facet stats debug spellcheck elevator
And the elevator component config is: string elevate.xml
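The query above lost its parameter names somewhere in transit. For comparison, a well-formed elevation request assembled from the standard QueryElevationComponent parameter names looks like the sketch below; the collection name and filter query are copied from the message, while enableElevation/forceElevation are an assumption about which boolean flags were intended.

```shell
#!/usr/bin/env bash
# Assemble an elevation query piece by piece so each parameter is visible.
params='q=*:*&start=0&rows=20&omitHeader=true'
params="$params&fl=id,[elevated]"              # [elevated] doc transformer
params="$params&elevateIds=1,2,3,4,5"          # explicit elevation IDs
params="$params&fq=category_1_id_is:123"
params="$params&enableElevation=true&forceElevation=true"  # assumed flags

# Host/port are placeholders.
url="http://localhost:8983/solr/products/select?$params"
echo "$url"

# Only issue the request if explicitly asked to:
if [ -n "${RUN_QUERY:-}" ]; then
  curl -s "$url"
fi
```

On an affected 7.4.0 build, the combination of elevateIds and fl=[elevated] is exactly what triggers the AssertionError shown in the trace.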
Re: Need an advice for architecture.
Hi Francois,
If I got your numbers right, you are indexing on a single server and the
indexing rate is ~31 doc/s. I would first check if something is wrong with
the indexing logic. Check where the bottleneck is: do you read documents
from the DB fast enough, do you batch documents…
Assuming you cannot get a better rate than 30 doc/s and that the bottleneck
is Solr, then in order to finish in 6h you need to parallelise indexing on
Solr by splitting the index across ~6 servers, for an overall indexing rate
of 180 doc/s.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 19 Jul 2018, at 09:59, servus01 wrote:
>
> Would like to ask what your recommendations are for a new performant Solr
> architecture.
>
> SQL DB: 4M documents with up to 5000 metadata fields each document [2xXeon
> 2.1Ghz, 32GB RAM]
> Actual Solr: 1 core, version 4.6, 3.8M documents, schema has 300 metadata
> fields to import, size 3.6GB [2xXeon 2.4Ghz, 32GB RAM]
> (atm we need 35h to build the index and about 24h for a mass update, which
> affects production)
>
> Building the index should take less than 6h. Sometimes we change some of
> the metadata fields, which affects most of the documents, and therefore a
> mass update / reindex is necessary. A reindex is OK for about 6h (at
> night) but should not have an impact on user queries. Anyway, any faster
> indexing is very welcome. We will have max. 20 - 30 concurrent users.
>
> So I asked myself: how many nodes, shards, replicas etc.? Could someone
> please give me a recommendation for a fast working architecture.
>
> really appreciate this, best
>
> Francois
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
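Emir's rates can be sanity-checked with shell integer arithmetic: 3.8M documents in 35 hours is about 30 doc/s, and fitting the same job into a 6-hour window needs roughly six times that.

```shell
#!/usr/bin/env bash
docs=3800000

# Current: 3.8M docs over 35 hours.
current_rate=$((docs / (35 * 3600)))   # integer division -> 30 doc/s

# Target: same docs over a 6-hour window.
target_rate=$((docs / (6 * 3600)))     # integer division -> 175 doc/s (~180)

echo "current ~${current_rate} doc/s, target ~${target_rate} doc/s"
```

The ~6x gap between the two rates is where the "split across ~6 servers" suggestion comes from, assuming the per-server rate really is capped at ~30 doc/s.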
Need an advice for architecture.
Would like to ask what your recommendations are for a new performant Solr
architecture.

SQL DB: 4M documents with up to 5000 metadata fields each document [2xXeon
2.1Ghz, 32GB RAM]
Actual Solr: 1 core, version 4.6, 3.8M documents, schema has 300 metadata
fields to import, size 3.6GB [2xXeon 2.4Ghz, 32GB RAM]
(atm we need 35h to build the index and about 24h for a mass update, which
affects production)

Building the index should take less than 6h. Sometimes we change some of the
metadata fields, which affects most of the documents, and therefore a mass
update / reindex is necessary. A reindex is OK for about 6h (at night) but
should not have an impact on user queries. Anyway, any faster indexing is
very welcome. We will have max. 20 - 30 concurrent users.

So I asked myself: how many nodes, shards, replicas etc.? Could someone
please give me a recommendation for a fast working architecture.

really appreciate this, best

Francois

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html