Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()
Actually, your second example is probably a straightforward:

reduce(select(...), group(...), by="k1")

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 14, 2018 at 7:33 PM, Joel Bernstein wrote:
> Take a look at the reduce() function. You'll have to write a custom reduce
> operation but you can follow the example here:
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/ops/GroupOperation.java
>
> You can plug in your custom reduce operation in the solrconfig.xml and use
> it like any other function. If you're interested in working on this you
> could create a ticket and I can provide guidance.
> […]
Re: Streaming Expressions: Merge array values? Inverse of cartesianProduct()
Take a look at the reduce() function. You'll have to write a custom reduce
operation, but you can follow the example here:

https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/ops/GroupOperation.java

You can plug your custom reduce operation into solrconfig.xml and use it
like any other function. If you're interested in working on this you could
create a ticket and I can provide guidance.

Joel Bernstein
http://joelsolr.blogspot.com/

2018-06-14 13:13 GMT-04:00 Christian Spitzlay <christian.spitz...@biologis.com>:
> Hi,
>
> is there a way to merge array values?
> […]
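Conceptually, the custom reduce operation described above is just a fold over tuples that arrive sorted by the group key. Here is a minimal sketch of that logic in Python — purely an illustration of what such an operation would compute; a real implementation would extend solrj's ReduceOperation, as GroupOperation.java does:

```python
from itertools import groupby

def merge_by_key(tuples, key="k1", value="k2"):
    """Merge value arrays of consecutive tuples sharing the same key.

    Assumes the input is already sorted by `key` (as a reduce() stream
    would be). Scalars are wrapped in lists, so this also behaves as an
    inverse of cartesianProduct().
    """
    out = []
    for k, group in groupby(tuples, key=lambda t: t[key]):
        merged = []
        for t in group:
            v = t[value]
            merged.extend(v if isinstance(v, list) else [v])
        out.append({key: k, value: merged})
    return out

docs = [
    {"k1": "1", "k2": ["a", "b"]},
    {"k1": "2", "k2": ["c", "d"]},
    {"k1": "2", "k2": ["e", "f"]},
]
print(merge_by_key(docs))
# [{'k1': '1', 'k2': ['a', 'b']}, {'k1': '2', 'k2': ['c', 'd', 'e', 'f']}]
```

The same function handles the second (scalar) example from the question, returning the values grouped into lists per key.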
Re: Exception when processing streaming expression
We have to check the behavior of the innerJoin. I suspect that it's closing
the second stream when the first stream has finished. This would cause a
broken pipe with the second stream. The export handler has specific code
that eats the broken-pipe exception so it doesn't end up in the logs. The
select handler does not have this code.

In general you never want to use the select handler and set the rows
parameter to such a big number. If you have that many rows you'll want to
use the export handler, which is designed to export the entire result set.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 14, 2018 at 1:30 PM, Christian Spitzlay <christian.spitz...@biologis.com> wrote:
> What does that mean exactly? If I set the rows parameter to 10
> the exception still occurs. AFAICT all this happens internally during the
> processing of the streaming expression. Why wouldn't the select send
> the EOF tuple when it reaches the end of the documents?
> Or why wouldn't the receiving end wait for it to appear?
> Due to an incredibly low timeout used internally?
>
> Christian Spitzlay
> […]
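For what it's worth, the "Broken pipe" at the bottom of the stack traces in this thread is the generic OS-level symptom of one side writing after the other side has closed its end of the connection. A minimal Python illustration of the mechanism (nothing Solr-specific about it):

```python
import os

# Simulate a reader (e.g. a stream that is shut down before EOF) going
# away while the writer (the handler still streaming tuples) keeps going.
read_fd, write_fd = os.pipe()
os.close(read_fd)  # the reader closes its end early

got_broken_pipe = False
try:
    os.write(write_fd, b"data the reader never asked for")
except BrokenPipeError:
    # Jetty surfaces this same OS condition as
    # java.io.IOException: Broken pipe
    got_broken_pipe = True
finally:
    os.close(write_fd)

print(got_broken_pipe)  # True
```

This is why the export handler can safely swallow the exception: the condition is expected whenever a consumer stops reading early, and says nothing about data corruption.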
Re: Suggestions for debugging performance issue
On 6/12/2018 12:06 PM, Chris Troullis wrote:
> The issue we are seeing is with 1 collection in particular, after we set up
> CDCR, we are getting extremely slow response times when retrieving
> documents. Debugging the query shows QTime is almost nothing, but the
> overall responseTime is like 5x what it should be. The problem is
> exacerbated by larger result sizes. IE retrieving 25 results is almost
> normal, but 200 results is way slower than normal. I can run the exact same
> query multiple times in a row (so everything should be cached), and I still
> see response times way higher than another environment that is not using
> CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just that
> we are using the CDCRUpdateLog. The problem started happening even before
> we enabled CDCR.
>
> In a lower environment we noticed that the transaction logs were huge
> (multiple gigs), so we tried stopping solr and deleting the tlogs then
> restarting, and that seemed to fix the performance issue. We tried the same
> thing in production the other day but it had no effect, so now I don't know
> if it was a coincidence or not.

There is one other cause besides CDCR buffering that I know of for huge
transaction logs, and it has nothing to do with CDCR: a lack of hard
commits.

It is strongly recommended to have autoCommit set to a reasonably short
interval (about a minute in my opinion, but 15 seconds is VERY common).
Most of the time openSearcher should be set to false in the autoCommit
config, and other mechanisms (which might include autoSoftCommit) should
be used for change visibility. The example autoCommit settings might seem
superfluous because they don't affect what's searchable, but it is
actually a very important configuration to keep.

Are the docs in this collection really big, by chance?
As I went through previous threads you've started on the mailing list, I
have noticed that none of your messages provided some details that would
be useful for looking into performance problems:

* What OS vendor and version Solr is running on.
* Total document count on the server (counting all index cores).
* Total index size on the server (counting all cores).
* What the total of all Solr heaps on the server is.
* Whether there is software other than Solr on the server.
* How much total memory the server has installed.

If you name the OS, I can use that information to help you gather some
additional info which will actually show me most of that list. Total
document count is something that I cannot get from the info I would help
you gather.

Something else that can cause performance issues is GC pauses. If you
provide a GC log (the script that starts Solr logs this by default), we
can analyze it to see if that's a problem. Attachments to messages on the
mailing list typically do not make it to the list, so a file-sharing
website is a better way to share large logfiles. A paste website is good
for log data that's smaller.

Thanks,
Shawn
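For reference, the hard/soft commit setup recommended above looks roughly like this in solrconfig.xml. This is a sketch using the illustrative intervals from the advice (one-minute hard commits, soft commits for visibility), not required values:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes to disk and rotates transaction logs
       regularly, but does not open a new searcher. -->
  <autoCommit>
    <maxTime>60000</maxTime>        <!-- ~1 minute; 15000 is also common -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Visibility of changes handled separately via soft commits. -->
  <autoSoftCommit>
    <maxTime>120000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Without the hard commit, tlogs grow until the next explicit commit, which matches the multi-gigabyte tlogs described in the question.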
Re: Changing Field Assignments
On 6/14/2018 12:10 PM, Terry Steichen wrote:
> I don't disagree at all, but have a basic question: How do you easily
> transition from a system using a dynamic schema to one using a fixed one?

Not sure you need to actually transition. Just remove the config in
solrconfig.xml that causes Solr to invoke the update chain where the
unknown fields are added, upload the new config to zookeeper, and reload
the collection. When you do that, indexing with unknown fields will fail,
and if the indexing program has good error handling, somebody is going to
notice the failure.

The major difficulty with this will be more of a people problem than a
technical problem. You have to convince people who use the Solr install
that it's a lot better that they get an indexing error and ask you to fix
it. They may not care that you've got a major problem on your hands when
the system makes a mistake adding a field.

> I'm running 6.6.0 in cloud mode (only because it's necessary, as I
> understand it, to be in cloud mode for the authentication/authorization
> to work). In my server/solr/configsets subdirectory there are
> directories "data_driven_schema_configs" and "basic_configs". Both
> contain a file named "managed_schema." Which one is the active one?

As of Solr 6.5.0, the basic authentication plugin also works in non-cloud
(standalone) mode.

https://issues.apache.org/jira/browse/SOLR-9481

I will typically recommend cloud mode to anyone setting up a brand new
Solr installation, mostly because it automates a lot of the steps of
setting up high availability. I don't use cloud mode myself, because it
didn't exist when I set up my systems. Converting to cloud mode would
require rewriting all of the tools I've written that keep the indexes up
to date. I might do that one day, but not today.

In cloud mode, neither of the managed-schema files you have mentioned is
active.
The active config (solrconfig.xml, the schema, and all files mentioned in
either of those) is in zookeeper, not on the disk.

> From the AdminUI, each collection has an associated "managed_schema"
> (under the "Files" option). I'm guessing that this collection-specific
> managed_schema is the result of the automated field discovery process,
> presumably using some baseline version (in configsets) to start with.

If you create a collection with "bin/solr create", the config that you
give it is usually uploaded to zookeeper and all shard replicas in the
collection use that uploaded config. In older versions like 6.6.0,
basic_configs is used if no source config is named. In newer versions,
_default is used.

When the update processor adds an unknown field, it is added to the
managed-schema file in zookeeper and the collection is reloaded. The
source configset on disk is not touched.

> If that's true, then it would presumably make sense to save this
> collection-specific managed_schema to disk as schema.xml. I further
> presume I'd create a config subdirectory for each of said collections
> and put schema.xml there. Is that right?

As long as you're in cloud mode, all your index configs are in zookeeper.
Any config you have on disk is NOT what is actually being used.

https://lucene.apache.org/solr/guide/6_6/using-zookeeper-to-manage-configuration-files.html

> Every time I read (and reread, and reread, ...) the Solr docs they seem
> to be making certain (very basic) assumptions that I'm unclear about, so
> your help in the preceding would be most appreciated.

The Solr documentation is not very friendly to novices. Writing
documentation that an expert can use is sometimes difficult, but most
developers can manage it. Writing documentation that a novice can use is
much harder, because it's not easy for someone who has intimate knowledge
of the system to step back and look at it from a place where that
knowledge isn't available.
Some success has been achieved in later documentation versions. It's going
to take a lot of time and effort before most of Solr's documentation is
novice-friendly.

Thanks,
Shawn
Solr basic auth
Hi,

I have configured basic auth for SolrCloud. It works well when I access
the Solr URL directly. I have integrated this Solr with the test.com
domain. Now if I access the Solr URL like test.com/solr it prompts for
credentials, but I don't want it to ask in this case since it is a known
domain. Is there any way to achieve this? I would much appreciate a quick
response.

I'm using the default security and want to allow my domain through by
default without prompting for any credentials. My security.json is below:

{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [{"name": "security-edit", "role": "admin"}],
    "user-role": {"solr": "admin"}
  }
}
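One common approach, assuming test.com fronts Solr with a reverse proxy you control and the proxy is the only path in, is to have the proxy inject the Authorization header itself so browsers on the known domain are never prompted. The header is just the base64 encoding of user:password. The sketch below uses Solr's stock example credentials solr/SolrRocks purely as a placeholder — substitute your real ones:

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build the HTTP Basic auth header value a trusted proxy could inject."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

# Solr's well-known example credentials -- replace with your own.
print(basic_auth_header("solr", "SolrRocks"))
# Basic c29scjpTb2xyUm9ja3M=
```

Note the security trade-off: anyone who can reach Solr through that proxy gets those credentials' privileges, so restrict what the injected user is authorized to do.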
Re: Changing Field Assignments
Shawn,

I don't disagree at all, but have a basic question: How do you easily
transition from a system using a dynamic schema to one using a fixed one?

I'm running 6.6.0 in cloud mode (only because it's necessary, as I
understand it, to be in cloud mode for the authentication/authorization
to work). In my server/solr/configsets subdirectory there are directories
"data_driven_schema_configs" and "basic_configs". Both contain a file
named "managed_schema." Which one is the active one?

From the AdminUI, each collection has an associated "managed_schema"
(under the "Files" option). I'm guessing that this collection-specific
managed_schema is the result of the automated field discovery process,
presumably using some baseline version (in configsets) to start with.

If that's true, then it would presumably make sense to save this
collection-specific managed_schema to disk as schema.xml. I further
presume I'd create a config subdirectory for each of said collections and
put schema.xml there. Is that right? And I have to do this for each
collection, right?

Every time I read (and reread, and reread, ...) the Solr docs they seem
to be making certain (very basic) assumptions that I'm unclear about, so
your help in the preceding would be most appreciated.

Thanks.

Terry

On 06/14/2018 01:51 PM, Shawn Heisey wrote:
> […]
Re: Indexing to replica instead leader
On 6/8/2018 3:56 AM, SOLR4189 wrote:
> /When a document is sent to a Solr node for indexing, the system first
> determines which Shard that document belongs to, and then which node is
> currently hosting the leader for that shard. The document is then forwarded
> to the current leader for indexing, and the leader forwards the update to
> all of the other replicas./
>
> So my question, what does happen when I'm sending index request to replica
> server instead leader server?
>
> Replica becomes a leader for this request? Or replica becomes only federator
> that resends request to leader and then leader will resend to replica?

Terminology nit: The leader *is* a replica. It just has a temporary
special job. It doesn't lose its status as a replica when it is elected
leader.

If you send a document update to an index that is not the leader for the
correct shard, it will do just what you said above -- figure out the
correct shard, figure out which replica is the leader of that shard, and
forward the request there. That leader will index the request itself and
then handle updating the other replicas. It will also reply to the index
where you sent the request, which will reply to you.

The leader role will not change to another core unless there is a leader
election and the existing leader loses that election. An election is not
going to happen without a significant cluster event. Examples are an
explicit election request, or the core/server with the leader role going
down.

Thanks,
Shawn
Re: Changing Field Assignments
On 6/11/2018 2:02 PM, Terry Steichen wrote:
> I am using Solr (6.6.0) in the automatic mode (where it discovers
> fields). It's working fine with one exception. The problem is that
> the discovered "meta_creation_date" field is assigned the type
> TrieDateField.
>
> Unfortunately, that type is limited in a number of ways (like sorting,
> abbreviated forms and etc.). What I'd like to do is have that
> ("meta_creation_date") field assigned to a different type, like
> DateRangeField.
>
> Is it possible to accomplish this (during indexing) by creating a copy
> field to a different type, and using the copy field in the query? Or
> via some kind of function operation (which I've never understood)?

What you are describing is precisely why I never use the mode where Solr
automatically adds unknown fields.

If the field does not exist in the schema before you index the document,
then the best Solr can do is precisely what is configured in the update
processor that adds unknown fields. You can adjust that config, but it
will always be a general-purpose guess.

What is actually needed for multiple unknown fields is often outside what
that update processor is capable of detecting and configuring
automatically. For that reason, I set up the schema manually, and I want
indexing to fail if the input documents contain fields that I haven't
defined. Then whoever is doing the indexing can contact me with their
error details, and I can add new fields with the exact required
definition.

Thanks,
Shawn
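If the schema is managed explicitly as suggested above, pinning the field from the question to DateRangeField would look roughly like this in the schema. This is a sketch, not a drop-in: the type name and the indexed/stored attributes are assumptions, only the field name and class come from the thread:

```xml
<!-- Declare the type, then the explicit field definition. -->
<fieldType name="dateRange" class="solr.DateRangeField"/>
<field name="meta_creation_date" type="dateRange" indexed="true" stored="true"/>
```

With an explicit definition in place, the field-guessing update processor never gets a chance to pick TrieDateField for this field.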
Re: Exception when processing streaming expression
What does that mean exactly? If I set the rows parameter to 10
the exception still occurs. AFAICT all this happens internally during the
processing of the streaming expression. Why wouldn't the select send the
EOF tuple when it reaches the end of the documents? Or why wouldn't the
receiving end wait for it to appear? Due to an incredibly low timeout used
internally?

Christian Spitzlay

> Am 14.06.2018 um 19:18 schrieb Susmit :
>
> Hi,
> This may be expected if one of the streams is closed early - does not
> reach the EOF tuple
>
> Sent from my iPhone
>
>> On Jun 14, 2018, at 9:53 AM, Christian Spitzlay
>> <christian.spitz...@biologis.com> wrote:
>>
>> Here is one I stripped down as far as I could:
>> […]
Re: Exception when processing streaming expression
Hi,

This may be expected if one of the streams is closed early - does not
reach the EOF tuple.

Sent from my iPhone

> On Jun 14, 2018, at 9:53 AM, Christian Spitzlay
> <christian.spitz...@biologis.com> wrote:
>
> Here is one I stripped down as far as I could:
> […]
Streaming Expressions: Merge array values? Inverse of cartesianProduct()
Hi,

is there a way to merge array values?

Something that transforms

{
  "k1": "1",
  "k2": ["a", "b"]
},
{
  "k1": "2",
  "k2": ["c", "d"]
},
{
  "k1": "2",
  "k2": ["e", "f"]
}

into

{
  "k1": "1",
  "k2": ["a", "b"]
},
{
  "k1": "2",
  "k2": ["c", "d", "e", "f"]
}

And an inverse of cartesianProduct() that transforms

{
  "k1": "1",
  "k2": "a"
},
{
  "k1": "2",
  "k2": "b"
},
{
  "k1": "2",
  "k2": "c"
}

into

{
  "k1": "1",
  "k2": ["a"]
},
{
  "k1": "2",
  "k2": ["b", "c"]
}

Christian
Re: Exception when processing streaming expression
Here is one I stripped down as far as I could:

innerJoin(
  sort(
    search(kmm,
      q="sds_endpoint_uuid:(2f927a0b\-fe38\-451e\-9103\-580914a77e82)",
      fl="sds_endpoint_uuid,sds_to_endpoint_uuid",
      sort="sds_to_endpoint_uuid ASC",
      qt="/export"),
    by="sds_endpoint_uuid ASC"),
  search(kmm,
    q=ss_search_api_datasource:entity\:as_metadata,
    fl="sds_metadata_of_uuid",
    sort="sds_metadata_of_uuid ASC",
    qt="/select",
    rows=1),
  on="sds_endpoint_uuid=sds_metadata_of_uuid")

The exception happens both via PHP (search_api_solr / Solarium) and via
the Solr admin UI.
(version: Solr 7.3.1 on macOS High Sierra 10.13.5)

It seems to be related to the fact that the second stream uses "select".
- If I use "export" the exception doesn't occur.
- If I set the rows parameter "low enough" so I do not get any results
  the exception doesn't occur either.

BTW: Do you know of any tool for formatting and/or syntax highlighting
these expressions?

Christian Spitzlay

> Am 13.06.2018 um 23:02 schrieb Joel Bernstein :
>
> Can you provide some example expressions that are causing these
> exceptions?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Jun 13, 2018 at 9:02 AM, Christian Spitzlay <
> christian.spitz...@biologis.com> wrote:
>
>> Hi,
>>
>> I am seeing a lot of (reproducible) exceptions in my solr log file
>> when I execute streaming expressions:
>>
>> o.a.s.s.HttpSolrCall Unable to write response, client closed connection
>> or we are shutting down
>> org.eclipse.jetty.io.EofException
>>     at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:292)
>>     at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:429)
>>     at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:322)
>>     at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:372)
>>     at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:794)
>> […]
>>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
>>     at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
>>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
>>     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
>>     at java.base/java.lang.Thread.run(Thread.java:844)
>> Caused by: java.io.IOException: Broken pipe
>>     at java.base/sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
>>     at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
>>     at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:148)
>>     at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
>>     at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:272)
>>     ... 69 more
>>
>> I have read up on the exception message and found
>> http://lucene.472066.n3.nabble.com/Unable-to-write-response-client-closed-connection-or-we-are-shutting-down-tt4350349.html#a4350947
>> but I don't understand how an early client connect can cause what I am
>> seeing:
>>
>> What puzzles me is that the response has been delivered in full to the
>> client library, including the document with EOF.
>>
>> So Solr must have already processed the streaming expression and returned
>> the result. It's just that the log is filled with stacktraces of this
>> exception that suggests something went wrong. I don't understand why this
>> happens when the query seems to have succeeded.
>>
>> Best regards,
>> Christian
Re: Cost of enabling doc values
My claim is it simply doesn't matter. You either have those bytes lying around on disk in the DV case, using OS memory, or in the cumulative Java heap in the non-DV case. If you're doing one of the three operations I know of no situation where I would _not_ enable docValues. The Lucene people put a lot of effort into making things compact, so what you're coming up with is probably an upper bound. Frankly I'd just enable the DV fields, index a bunch of docs and look at the cumulative sizes of your dvd and dvm files. I'd probably index, say, 10M docs and measure the two extensions, then index 10M more and use the delta between 10M and 20M to extrapolate. I also use the size of those files to get something of a sense of how much OS memory I need for those operations (searching not included yet). Gives me a sense of whether what I want to do is possible or not. Long blog on the topic of sizing, but it sums up as "try it and see": https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Thu, Jun 14, 2018 at 8:34 AM, root23 wrote: > Thanks for the detailed explanation, Erick. > I did a little math as you suggested. Just wanted to see if i am doing it > right. > So we have around 4 billion docs in production and around 70 nodes. > > To support the business use case we have around 18 fields on which we have > to enable docvalues for sorting. >
> FieldType       Count   Size per value
> TrieIntField    2       4 bytes
> StrField        7       20 bytes
> IntField        1       4 bytes
> Bool            1       1 byte
> TrieDateField   2       10 bytes
> TextField       5       10 bytes
>
> Some of them i approximated the bytes, like for StrField and TextField, based > on the no. of characters we usually have in those fields. I am not sure > how much the TrieDateField will take. Please feel free to correct me if > i am way off. > > So according to the above, the total size for a doc = 2*4 + 7*20 + 4 + 1 + 20 + 50 = > 223 bytes.
> > So for 4 billion docs it comes to approximately 8.92e11 bytes, or 892 GB. > > Does that math sound right or am i way off ? > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
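Erick's measure-and-extrapolate advice can be scripted; a minimal sketch (the index path and the 10M-doc checkpoints are whatever you choose, nothing here comes from Solr itself):

```python
import os

def docvalues_bytes(index_dir):
    """Sum the sizes of the Lucene docValues files (.dvd = data, .dvm = metadata)."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            if name.endswith((".dvd", ".dvm")):
                total += os.path.getsize(os.path.join(root, name))
    return total

def extrapolate(size_at_10m, size_at_20m, target_docs, step=10_000_000):
    """Use the 10M -> 20M delta as bytes-per-doc and project to target_docs."""
    per_doc = (size_at_20m - size_at_10m) / step
    return per_doc * target_docs
```

Run `docvalues_bytes()` against a core's index directory after each checkpoint; using the delta rather than the absolute size discounts one-time per-segment overhead.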
Re: Cost of enabling doc values
Thanks for the detailed explanation, Erick. I did a little math as you suggested. Just wanted to see if i am doing it right. So we have around 4 billion docs in production and around 70 nodes. To support the business use case we have around 18 fields on which we have to enable docvalues for sorting.

FieldType       Count   Size per value
TrieIntField    2       4 bytes
StrField        7       20 bytes
IntField        1       4 bytes
Bool            1       1 byte
TrieDateField   2       10 bytes
TextField       5       10 bytes

Some of them i approximated the bytes, like for StrField and TextField, based on the no. of characters we usually have in those fields. I am not sure how much the TrieDateField will take. Please feel free to correct me if i am way off. So according to the above, the total size for a doc = 2*4 + 7*20 + 4 + 1 + 20 + 50 = 223 bytes. So for 4 billion docs it comes to approximately 8.92e11 bytes, or 892 GB. Does that math sound right or am i way off ? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
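As a sanity check on the arithmetic above, here is the same estimate in a few lines of Python (the per-value sizes are root23's rough guesses from the table, not measured values):

```python
# (count, estimated bytes per value) per field type, from the table above
fields = {
    "TrieIntField":  (2, 4),
    "StrField":      (7, 20),
    "IntField":      (1, 4),
    "Bool":          (1, 1),
    "TrieDateField": (2, 10),
    "TextField":     (5, 10),
}

bytes_per_doc = sum(n * size for n, size in fields.values())
num_docs = 4_000_000_000
total_bytes = bytes_per_doc * num_docs

print(bytes_per_doc)        # 223 bytes per document
print(total_bytes / 10**9)  # 892.0 GB across the cluster
```

Spread over the ~70 nodes mentioned, that is roughly 12.7 GB of docValues per node, which (per Erick's explanation) lands in the OS page cache rather than the Java heap.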
Re: Logging Every document to particular core
You can enable the DEBUG level for the LogUpdateProcessorFactory category: https://github.com/apache/lucene-solr/blob/228a84fd6db3ef5fc1624d69e1c82a1f02c51352/solr/core/src/java/org/apache/solr/update/processor/LogUpdateProcessorFactory.java#L100 On Wed, Jun 13, 2018 at 5:00 PM, govind nitk wrote: > Hi, > > Is there any way to log all the data getting indexed to a particular core > only? > > > Regards, > govind > -- Sincerely yours Mikhail Khludnev
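For example, with the log4j 1.x setup that Solr shipped with at the time, that would be a single line in server/resources/log4j.properties (the path and syntax differ if your install uses log4j2 — treat this as a sketch to adapt, not a verified config for your version):

```properties
# log every add/delete handled by the update chain at DEBUG level
log4j.logger.org.apache.solr.update.processor.LogUpdateProcessorFactory=DEBUG
```

Note this logs update activity per request handler, so it covers all cores using that update chain; filter by core name in the log output if you only care about one.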
Re: Cost of enabling doc values
Depending on what your documents look like, it could be that enabling docValues would allow you to save space by switching to stored="false" since Solr can fetch the stored value from docValues. I say it depends on your documents and use case since sometimes it may be slower to access a docValue just to read one field if all the other fields come from stored values. If you do not do matches/lookups/range-queries on some fields you may even be able to set indexed="false" and save space in the inverted index. A benefit of having docValues enabled is that it then lets you do atomic updates to your docs, to re-index from an existing index (not from source) and to use streaming expressions on all fields. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 14. jun. 2018 kl. 04:13 skrev Erick Erickson : > > I pretty much agree with your business side. > > The rough size of a docValues field is one value of size X per doc. So > say you have an int field. Size is near maxDoc * 4 bytes. This is not > totally accurate, there is some int packing done for instance, but > it'll do. If you really want an accurate count, look at the > before/after size of your *.dvd, *.dvm segment files in your index. > > However, it's "pay me now or pay me later". The critical operations > are faceting, grouping and sorting. If you do any of those operations > on a field that is _not_ docValues=true, it will be uninverted on the > _java heap_, where it will consume GC cycles, put pressure on all your > other operations, etc. This process will be done _every_ time you open > a new searcher and use these fields. > > If the field _does_ have docValues=true, that will be held in the OS's > memory space, _not_ the JVM's heap, due to using MMapDirectory (see: > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html). > Among other virtues, it can be swapped out (although you don't want it > to be, it's still better than OOMing).
Plus loading it is just reading > it off disk rather than the expensive uninversion process. > > And if you don't do any of those operations (grouping, sorting and > faceting), then the bits just sit there on disk doing nothing. > > So say you carefully define what fields will be used for any of the > three operations and enable docValues. Then 3 months later the > business side comes back with "oh, we need to facet on another field". > Your choices are: > 1> live with the increased heap usage and other resource contention. > Perhaps along the way panicking because your processes OOM and prod > goes down. > or > 2> reindex from scratch, starting with a totally new collection. > > And note the fragility here. Your application can be humming along > just fine for months. Then one fine day someone innocently submits a > query that sorts on a new field that has docValues=false and B-OOM. > > If (and only if) you can _guarantee_ that fieldX will never be used > for any of the three operations, then turning off docValues for that > field will save you some disk space. But that's the only advantage. > Well, alright. If you have to do a full index replication that'll > happen a bit faster too. > > So I prefer to err on the side of caution. I recommend making fields > docValues=true unless I can absolutely guarantee (and business _also_ > agrees) > 1> that fieldX will never be used for sorting, grouping or faceting, > or > 2> that if they can't promise that, they will give me time to > completely reindex. > > Best, > Erick > > > On Wed, Jun 13, 2018 at 4:30 PM, root23 wrote: >> Hi all, >> Does anyone know how much the index size typically increases when we enable doc >> values on a field. >> Our business side wants to enable sorting on most of our fields. I am >> trying to push back, saying that it will increase the index size, since >> enabling docValues will create the uninverted index.
>> >> I know the size probably depends on what values are in the fields but i need >> a general idea so that i can convince them that enabling on the fields is >> costly and it will incur this much cost. >> >> If anyone knows how to find this out looking at an existing solr index which >> has docvalues enabled , that will also be great help. >> >> Thanks !!! >> >> >> >> -- >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
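In schema terms the trade-off Erick describes is set per field; a hypothetical sorting field could be declared like this in the managed schema (field and type names are illustrative, not from the thread):

```xml
<!-- docValues=true: sorting/faceting/grouping are served from the on-disk
     column store via the OS page cache, instead of uninverting the field
     onto the Java heap on every new searcher -->
<field name="price" type="pint" indexed="true" stored="true" docValues="true"/>
```

Flipping docValues on an existing field requires a full reindex, which is exactly why Erick recommends deciding this up front rather than after data is in production.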
Re: Solr Suggest Component and OOM
I didn't get any answer to my questions (unless you meant you have 25 million different values for those fields...). Please read my answer again and elaborate further. Does your problem happen for the 2 different suggesters? Cheers - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Hardware-Aware Solr Coud Sharding?
You could also look into the Autoscaling stuff in 7.x which can be programmed to move shards around based on system load and HW specs on the various nodes, so in theory that framework (although still a bit unstable) will suggest moving some replicas from weak nodes over to more powerful ones. If you "overshard" your system, i.e. if you have three nodes and create a collection with 9 shards, then there will be three shards per node, and Solr can suggest moving one of them off to another server. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 12. jun. 2018 kl. 18:39 skrev Erick Erickson : > > In a mixed-hardware situation you can certainly place replicas as you > choose. Create a minimal collection or use the special nodeset EMPTY > and then place your replicas one-by-one. > > You can also consider "replica placement rules", see: > https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html. > I _think_ this would be a variant of "rack aware". In this case you'd > provide a "snitch" that says something about the hardware > characteristics and the rules you'd define would be sensitive to that. > > WARNING: haven't done this myself so don't have any examples to point to > > Best, > Erick > > On Tue, Jun 12, 2018 at 8:34 AM, Shawn Heisey wrote: >> On 6/12/2018 9:12 AM, Michael Braun wrote: >>> The way to handle this right now looks to be running additional Solr >>> instances on nodes with increased resources to balance the load (so if the >>> machines are 1x, 1.5x, and 2x, run 2 instances, 3 instances, and 4 >>> instances, respectively). Has anyone looked into other ways of handling >>> this that don't require the additional Solr instance deployments? >> >> Usually, no. In most cases, you only want to run one Solr instance per >> server. One Solr instance can handle many individual shard replicas.
>> If there are more individual indexes on a Solr instance, then it is >> likely to be able to take advantage of additional system resources >> without running another Solr instance. >> >> The only time you should run multiple Solr instances is when the heap >> requirements for running the required indexes with one instance would be >> way too big. Splitting the indexes between two instances with smaller >> heaps might end up with much better garbage collection efficiency. >> >> https://lucene.apache.org/solr/guide/7_3/taking-solr-to-production.html#running-multiple-solr-nodes-per-host >> >> Thanks, >> Shawn >>
Re: Can replace the IP with the hostname or some unique identifier for each node in Solr
See this FAQ https://github.com/docker-solr/docker-solr/blob/master/Docker-FAQ.md#can-i-run-zookeeper-and-solr-clusters-under-docker -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 8. jun. 2018 kl. 14:52 skrev akshat : > > Hi, > > I have deployed Solr in Docker swarm and am scaling the replicas as 3. > > What I have achieved -> Created the Solr core replicas in the other > containers. > > Blocker -> When I kill a container, Docker swarm brings another container with a different IP. So, when I see > the graph it is still pointing to the older dead node. But in ZooKeeper > live_nodes, I can see the newly registered node. > > So, for experimenting I am doing it manually through the GUI: I point to the new node by > manually deleting the older node from the collections in the Solr GUI and creating a replica in > the new node. > > My question -> Is there some way we can trick Solr by replacing the IP which it shows in the graph with some unique > identifier, so that when swarm brings up the new node it is still > pointing to the unique identifier name, not the IP? > > -- > Regards > Akshat Singh
Re: Solr Suggest Component and OOM
Anyone from the Solr team who can shed some more light? On Tue, Jun 12, 2018 at 8:13 PM, Ratnadeep Rakshit wrote: > I observed that the build works if the data size is below 25M. The moment > the records go beyond that, this OOM error shows up. Solr itself shows 56% > usage of 20GB space during the build. So, are there some settings I need to > change to handle a larger data size? > > On Tue, Jun 12, 2018 at 3:17 PM, Alessandro Benedetti < > a.benede...@sease.io> wrote: > >> Hi, >> first of all the two different suggesters you are using are based on >> different data structures ( with different memory utilisation) : >> >> - FuzzyLookupFactory -> FST ( in memory and stored binary on disk) >> - AnalyzingInfixLookupFactory -> Auxiliary Lucene Index >> >> Both the data structures should be very memory efficient ( both in >> building >> and storage). >> What is the cardinality of the fields you are building suggestions from ? >> ( >> site_address and site_address_other) >> What is the memory situation in Solr when you start the suggester >> building ? >> You are allocating much more memory to the JVM Solr process than to the OS ( >> which in your situation means the OS can't fit the entire index in cache, the >> ideal scenario). >> >> I would recommend to put some monitoring in place ( there are plenty of >> open >> source tools to do that) >> >> Regards >> >> >> >> - >> --- >> Alessandro Benedetti >> Search Consultant, R&D Software Engineer, Director >> Sease Ltd. - www.sease.io >> -- >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >> > >
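For context, a suggester of the FuzzyLookupFactory kind discussed here is configured in solrconfig.xml roughly as below (component and suggester names are illustrative; the field name comes from the thread). Building explicitly via suggest.build=true rather than on every commit matters once the dictionary reaches tens of millions of entries:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">addressSuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">site_address</str>
    <!-- avoid rebuilding the full FST on every commit -->
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```

With buildOnCommit off, a rebuild is triggered on demand with a request like /suggest?suggest.build=true, so the expensive build can be scheduled instead of happening implicitly.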
Re: A good KV store/plugins to go with Solr
The approach that Alfresco/Solr takes with this is to store the original document in the filesystem when it indexes content. This way you can be frugal about which fields are stored in the index. Then Alfresco/Solr can retrieve the original document as part of the results using a doc transformer. This may be an approach that Solr could adopt. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jun 14, 2018 at 8:10 AM, Jan Høydahl wrote: > You could fetch the data from your application directly ;) > Also, Streaming Expressions have a jdbc() function but then you will > need to know what to query for. It also has a fetch() function which > enriches documents with fields from another collection. It would probably > be possible to write a fetchKV() function which per result document fetches > data from an external JDBC (or other) source and enriches on the fly. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > > 5. jun. 2018 kl. 05:38 skrev Erick Erickson : > > > > Well, you can always throw more replicas at the problem as well. > > > > But Andrea's comment is spot on. When Solr stores a field, it > > compresses it. So to fetch the stored info, it has to: > > 1> seek the disk > > 2> decompress at minimum 16K > > 3> assemble the response. > > > > All the while perhaps causing memory to be consumed, adding to GC > > issues and the like. > > > > One possibility is to implement a doc transformer. See the class > > ValueAugmenterFactory for a model. What that does is, for each doc > > returned in the result set, call the transform method. > > > > Another approach would be to only index the first, say, 1K characters > > and just return _that_, along with a link for the full doc that you > > get from another store. Or, indeed from Solr itself since that would > > only be one doc at a time. If you put this in as a string type with > > docValues=true you would avoid most of the disk seek/decompression > > issues.
> > > > Best, > > Erick > > > > On Mon, Jun 4, 2018 at 12:27 PM, Andrea Gazzarini > wrote: > >> Hi Sam, I have been in a similar scenario (not recently so my answer > could > >> be outdated). As far as I remember caching, at least in that scenario, > >> didn't help so much, probably because the field size. > >> > >> So we went with the second option: a custom SearchComponent connected > with > >> Redis. I'm not aware if such component is available somewhere but, trust > >> me, it's a very easy thing to write. > >> > >> Best, > >> Andrea > >> > >> On Mon, 4 Jun 2018, 20:45 Sambhav Kothari, > wrote: > >> > >>> Hi everyone, > >>> > >>> We at MetaBrainz are trying to scale our solr cloud instance but are > >>> hitting a bottle-neck. > >>> > >>> Each of the documents in our solr index is accompanied by a '_store' > field > >>> that store our API compatible response for that document (which is > >>> basically parsed and displayed by our custom response writer). > >>> > >>> The main problem is that this field is very large (It takes upto > 60-70% of > >>> our index) and because of this, Solr is struggling to keep up with our > >>> required reqs/s. > >>> > >>> Any ideas on how to improve upon this? > >>> > >>> I have a couple of options in mind - > >>> > >>> 1. Use caches extensively. > >>> 2. Have solr return only a doc id and fetch the response string from a > KV > >>> store/fast db. > >>> > >>> About 2 - are there any solr plugins will allow me to do this? > >>> > >>> Thanks, > >>> Sam > >>> > >
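Erick's "index only the first 1K characters" idea amounts to truncating the big stored field at index time and keeping a pointer to the full document elsewhere; a minimal sketch of the document-preparation step, with hypothetical field names and KV-store route:

```python
def prepare_doc(doc, big_field="_store", prefix_len=1024):
    """Index only a short prefix of the large field; link out to the full doc."""
    out = dict(doc)  # leave the caller's dict untouched
    full = out.get(big_field, "")
    out[big_field] = full[:prefix_len]              # small stored prefix
    out["full_doc_url"] = "/kv/" + str(out["id"])   # hypothetical KV-store route
    return out
```

The search index then carries only the prefix for display, and the application resolves full_doc_url against the KV store for the handful of documents actually rendered per page.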
Re: A good KV store/plugins to go with Solr
You could fetch the data from your application directly ;) Also, Streaming Expressions have a jdbc() function but then you will need to know what to query for. It also has a fetch() function which enriches documents with fields from another collection. It would probably be possible to write a fetchKV() function which per result document fetches data from an external JDBC (or other) source and enriches on the fly. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 5. jun. 2018 kl. 05:38 skrev Erick Erickson : > > Well, you can always throw more replicas at the problem as well. > > But Andrea's comment is spot on. When Solr stores a field, it > compresses it. So to fetch the stored info, it has to: > 1> seek the disk > 2> decompress at minimum 16K > 3> assemble the response. > > All the while perhaps causing memory to be consumed, adding to GC > issues and the like. > > One possibility is to implement a doc transformer. See the class > ValueAugmenterFactory for a model. What that does is, for each doc > returned in the result set, call the transform method. > > Another approach would be to only index the first, say, 1K characters > and just return _that_, along with a link for the full doc that you > get from another store. Or, indeed from Solr itself since that would > only be one doc at a time. If you put this in as a string type with > docValues=true you would avoid most of the disk seek/decompression > issues. > > Best, > Erick > > On Mon, Jun 4, 2018 at 12:27 PM, Andrea Gazzarini > wrote: >> Hi Sam, I have been in a similar scenario (not recently so my answer could >> be outdated). As far as I remember caching, at least in that scenario, >> didn't help so much, probably because of the field size. >> >> So we went with the second option: a custom SearchComponent connected with >> Redis. I'm not aware if such component is available somewhere but, trust >> me, it's a very easy thing to write.
>> >> Best, >> Andrea >> >> On Mon, 4 Jun 2018, 20:45 Sambhav Kothari, wrote: >> >>> Hi everyone, >>> >>> We at MetaBrainz are trying to scale our solr cloud instance but are >>> hitting a bottle-neck. >>> >>> Each of the documents in our solr index is accompanied by a '_store' field >>> that store our API compatible response for that document (which is >>> basically parsed and displayed by our custom response writer). >>> >>> The main problem is that this field is very large (It takes upto 60-70% of >>> our index) and because of this, Solr is struggling to keep up with our >>> required reqs/s. >>> >>> Any ideas on how to improve upon this? >>> >>> I have a couple of options in mind - >>> >>> 1. Use caches extensively. >>> 2. Have solr return only a doc id and fetch the response string from a KV >>> store/fast db. >>> >>> About 2 - are there any solr plugins will allow me to do this? >>> >>> Thanks, >>> Sam >>>
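The fetch() decorator Jan mentions enriches each tuple as it streams past by looking up matching documents in another collection; its general shape is roughly as follows (collection and field names are illustrative, not from the thread):

```
fetch(kvCollection,
      search(mainCollection, q="*:*", fl="id,title", sort="id asc", qt="/export"),
      fl="_store",
      on="id=doc_id")
```

Here each tuple from the inner search() is joined against kvCollection where kvCollection's doc_id equals the tuple's id, pulling the large _store field in only for documents actually returned.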
Re: Logging Every document to particular core
Isn't the Transaction Log what you are looking for? Read this good blog post as a reference: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Cheers - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Solr 7.2.1 Master-slave replication Issue
Hi, Facing an issue with Solr 7.2.1 master-slave replication. Replication itself is working fine, but if I disable replication from the master, the slaves show no data (numFound=0). The slave is not serving the data it had before replication was disabled. I suspect the index generation is getting updated on the slave, which was not the case in previous Solr versions. Please advise. Thanks, Nitin
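For reference, a typical legacy master-slave setup in solrconfig.xml looks roughly like this (host, core and file names are illustrative). It may also be worth checking how replication was disabled — via command=disablereplication on the master versus command=disablepoll on the slave — since that affects what the slave does with its current index:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <!-- slave polls this URL for newer index generations -->
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

A slave that has successfully replicated should keep serving its last-fetched index even when the master stops publishing; numFound=0 suggests the slave's index directory was swapped or emptied rather than simply left un-updated.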