RE: Where is Stored values resides ?
Hi,

To my best knowledge the getopt Luke is not supported anymore. Use this instead: https://github.com/DmitryKey/luke

Regards,
Dmitry

-----Original Message-----

Hi Prabaharan,

You can use Luke to open an index. http://www.getopt.org/luke/

-----Original Message-----
From: Rajendran, Prabaharan [mailto:rajendra...@dnb.com]
Sent: Friday, June 24, 2016 3:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Where is Stored values resides ?

Thanks for the information, I just want to see how it stores data.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 24 June 2016 00:51
To: solr-user
Subject: Re: Where is Stored values resides ?

stored="true" data is kept in *.fdt and *.fdx files in the index directory; see the "Summary of File Extensions" section at:
http://lucene.apache.org/core/6_1_0/core/org/apache/lucene/codecs/lucene60/package-summary.html#package_description

The stored data is compressed, so you can't really read it with, say, a text editor. But the data stored there is just a copy of the input. What are you trying to do that you need to look at it?

Best,
Erick

On Thu, Jun 23, 2016 at 8:32 AM, Rajendran, Prabaharan wrote:
> Hi,
>
> I understand that indexed ("indexed=true") values reside in the data/index directory.
>
> May I know where the stored ("stored=true") values are placed? Is it possible to see the contents?
>
> Thanks,
> Prabaharan
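Erick's point that stored values are just compressed copies of the input can be checked directly with a few lines of Lucene, which is essentially what Luke does under the hood. A minimal sketch, assuming Lucene 5.x/6.x on the classpath; the index path argument is a placeholder:

    import java.nio.file.Paths;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexableField;
    import org.apache.lucene.store.FSDirectory;

    public class DumpStoredFields {
        public static void main(String[] args) throws Exception {
            // Open the index directory read-only, e.g. .../data/index
            try (DirectoryReader reader =
                     DirectoryReader.open(FSDirectory.open(Paths.get(args[0])))) {
                // Fetch the stored fields of the first document
                Document doc = reader.document(0);
                for (IndexableField f : doc.getFields()) {
                    System.out.println(f.name() + " = " + f.stringValue());
                }
            }
        }
    }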
Re: Solr Date Query "Intraday"
Not if I'm reading this right. You want the docs from June 1 with a timestamp between 13:00 and 16:00, but not one from, say, 11:00. Ditto for the other days, right?

If it's a predictable interval or a predictable granularity (i.e. the resolution you want is always going to be even hours), you could index a field hour_of_day, in which case the query is something like:

timestamp:[2016-06-01T00:00:00Z TO 2016-06-10T23:59:59Z] AND hour_of_day:[13 TO 16]

That falls down if the intervals need to be arbitrary, of course.

There might be something tricky you can do with geospatial, believe it or not. I confess I have to look it up every time, but here's a place to start: https://wiki.apache.org/solr/SpatialForTimeDurations

Best,
Erick

On Fri, Jul 22, 2016 at 3:26 PM, Felipe Vinturini wrote:
> Hi all,
>
> Is there a way to query Solr between dates and query like "intraday", between hours in those days? Something like: I want to search field "text" with value "test" and field "date" between 20160601 AND 20160610, and only between certain hours of those days: 1PM and 4PM?
>
> I know I could loop over the dates, I just would like to know if there is another way to do it in Solr. My Solr version is: 4.10.2.
>
> Also, is there a "name" for this kind of query?
>
> Thanks a lot for your attention and help.
>
> Regards,
> Felipe.
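A sketch of that query through SolrJ, assuming SolrJ 5.x/6.x (on 4.x the client class is HttpSolrServer instead); the collection and field names are placeholders, and hour_of_day must be populated as an integer field at index time:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class IntradayQuery {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/mycollection");
            SolrQuery q = new SolrQuery("text:test");
            // Overall date window
            q.addFilterQuery("timestamp:[2016-06-01T00:00:00Z TO 2016-06-10T23:59:59Z]");
            // The 1PM-4PM slice of each day, via the extra indexed field
            q.addFilterQuery("hour_of_day:[13 TO 16]");
            QueryResponse rsp = client.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
            client.close();
        }
    }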
Re: loading zookeeper data
bq: Zookeeper seems a step backward.

For stand-alone Solr, I tend to agree it's a bit awkward. But as Shawn says, there's no _need_ to run Zookeeper with a more recent Solr. Running Solr without Zookeeper is perfectly possible; we call that "stand alone". And if you have no need for sharding etc., there's no compelling reason to run SolrCloud. Well, there are some good reasons having to do with fail-over and the like, but...

Where SolrCloud becomes compelling is when you _do_ need to shard and deal with HA/DR. Then the added step of maintaining things in Zookeeper is a small price to pay for _not_ having to be sure that all the configs on all the servers are all the same. Imagine a cluster with several hundred replicas out there. Being absolutely sure that all of them have the same configs, have been restarted and the like becomes daunting. So having to do an "upconfig" is a good tradeoff IMO.

The bin/solr script has a "zk -upconfig" parameter that'll take care of pushing the configs up. Since you already have the configs in VCS, your process is just to pull them from VCS to "somewhere", then:

bin/solr zk -upconfig -z zookeeper_address -n configset_name -d directory_you_downloaded_to_from_VCS

Thereafter you simply refer to the configset by name when you create a collection, and the rest of it is automatic. Every time a core reloads it gets the new configs.

If you're trying to manipulate _cores_, that may be where you're going wrong. Think of them as _collections_. What's not clear from your problem statement is whether these cores on the various machines are part of the same collection or not. Do you have multiple shards in one logical index? Or do you have multiple collections that have masters/slaves (in which case the master and all the slaves that point to it will be a "collection")? Do all of the cores you have use the same configurations? Or is each set of master/slaves using a different configuration?

Best,
Erick

On Fri, Jul 22, 2016 at 4:41 PM, Aristedes Maniatis wrote:
> On 22/07/2016 5:22pm, Aristedes Maniatis wrote:
>> But then what? In the production cluster it seems I then need to
>>
>> 1. Grab the latest configuration bundle for each core and unpack them
>> 2. Launch Java
>> 3. Execute the Solr jars (from the production server since it must be the right version)
>>    - with org.apache.solr.cloud.ZkCLI
>>    - and some parameters pointing to the production Zookeeper cluster
>>    - pointing also to the unpacked config files
>> 4. Parse the output to understand if any error happened
>> 5. Wait for Solr to pick up the new configuration and do any final production checks
>
> Shawn wrote:
>
>> If you *do* want to run in cloud mode, then you will need to use zkcli to upload config changes to zookeeper and then issue a collection reload with the Collections API. This will find and reload all the cores related to that collection, across the entire cloud. You have the option of using the ZkCLI java class, or the zkcli.sh script that can be found in all 5.x and 6.x installs at server/scripts/cloud-scripts. As of version 5.3, the jars required for zkcli are already unpacked before Solr is started.
>
> Thanks Shawn,
>
> I'm trying to understand the common workflow of deploying configuration to Zookeeper. I'm new to that tool, so at this point it appears to be a big black box which can only be populated with data with a specific Java application. Surely others here on this list use configuration management tools and other non-manual workflows.
> I've written a little gradle task to wrap up sending data to zookeeper:
>
> task deployConfig {
>     description = 'Upload configuration to production zookeeper cluster.'
>     file('src/main/resources/solr').eachDir { core ->
>         doLast {
>             javaexec {
>                 classpath configurations.zookeeper
>                 main = 'org.apache.solr.cloud.ZkCLI'
>                 args = [
>                     "-confdir", core,
>                     "-zkhost", "solr.host.com:2181",
>                     "-cmd", "upconfig",
>                     "-confname", core.name
>                 ]
>             }
>         }
>     }
> }
>
> That does the trick, although I've not yet figured out how to know whether it was successful, because it doesn't return anything. And as I outlined above, it is quite cumbersome to automate. Are you saying that everyone who runs SolrCloud runs all these scripts against their production jars by hand?
>
> Zookeeper seems a step backward from files on disk in terms of ease of automation, inspecting for problems, version control, and a new point of failure.
>
> Perhaps because I'm new to it I'm missing a set of tools that make all that much easier. Or for that matter, I'm missing an understanding of what problem Zookeeper solves.
>
> Ari
>
> --
> Aristedes Maniatis
> CEO, ish
> https://www.ish.com.au
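An alternative to shelling out to ZkCLI from Gradle is to call SolrJ's config-upload API in-process, where failures surface as ordinary exceptions rather than silent console output. A minimal sketch, assuming SolrJ 5.x+ on the classpath; the Zookeeper address, directory, and config name are placeholders:

    import java.nio.file.Paths;
    import org.apache.solr.common.cloud.SolrZkClient;
    import org.apache.solr.common.cloud.ZkConfigManager;

    public class UploadConfig {
        public static void main(String[] args) throws Exception {
            // 30s connection timeout; adjust to taste
            SolrZkClient zkClient = new SolrZkClient("solr.host.com:2181", 30000);
            try {
                // Throws on failure, so success/failure is explicit
                new ZkConfigManager(zkClient).uploadConfigDir(
                    Paths.get("src/main/resources/solr/mycore"), "mycore");
            } finally {
                zkClient.close();
            }
        }
    }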
Re: loading zookeeper data
On 22/07/2016 5:22pm, Aristedes Maniatis wrote:
> But then what? In the production cluster it seems I then need to
>
> 1. Grab the latest configuration bundle for each core and unpack them
> 2. Launch Java
> 3. Execute the Solr jars (from the production server since it must be the right version)
>    - with org.apache.solr.cloud.ZkCLI
>    - and some parameters pointing to the production Zookeeper cluster
>    - pointing also to the unpacked config files
> 4. Parse the output to understand if any error happened
> 5. Wait for Solr to pick up the new configuration and do any final production checks

Shawn wrote:
> If you *do* want to run in cloud mode, then you will need to use zkcli to upload config changes to zookeeper and then issue a collection reload with the Collections API. This will find and reload all the cores related to that collection, across the entire cloud. You have the option of using the ZkCLI java class, or the zkcli.sh script that can be found in all 5.x and 6.x installs at server/scripts/cloud-scripts. As of version 5.3, the jars required for zkcli are already unpacked before Solr is started.

Thanks Shawn,

I'm trying to understand the common workflow of deploying configuration to Zookeeper. I'm new to that tool, so at this point it appears to be a big black box which can only be populated with data with a specific Java application. Surely others here on this list use configuration management tools and other non-manual workflows.

I've written a little gradle task to wrap up sending data to zookeeper:

task deployConfig {
    description = 'Upload configuration to production zookeeper cluster.'
    file('src/main/resources/solr').eachDir { core ->
        doLast {
            javaexec {
                classpath configurations.zookeeper
                main = 'org.apache.solr.cloud.ZkCLI'
                args = [
                    "-confdir", core,
                    "-zkhost", "solr.host.com:2181",
                    "-cmd", "upconfig",
                    "-confname", core.name
                ]
            }
        }
    }
}

That does the trick, although I've not yet figured out how to know whether it was successful, because it doesn't return anything. And as I outlined above, it is quite cumbersome to automate. Are you saying that everyone who runs SolrCloud runs all these scripts against their production jars by hand?

Zookeeper seems a step backward from files on disk in terms of ease of automation, inspecting for problems, version control, and a new point of failure.

Perhaps because I'm new to it I'm missing a set of tools that make all that much easier. Or for that matter, I'm missing an understanding of what problem Zookeeper solves.

Ari

--
Aristedes Maniatis
CEO, ish
https://www.ish.com.au
GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A
RE: Any option to NOT return stack trace in Solr response?
Hi Alex,

Thanks for confirming my finding. When it comes to Solr interfacing to a client, I agree completely. However, I was hoping to limit the noise at Solr and not have to add extra code to filter out the exceptions.

Just wondering: wouldn't it be a cleaner RESTful interface if, instead of reporting the stack trace in the response, Solr returned an error code and a basic message pointing back to the Solr log for details such as the stack trace? I am curious, what use case would require the stack trace in the response? If there is interest, I could open a JIRA and come up with a patch.

Regards,
Koorosh

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Thursday, July 21, 2016 6:54 PM
To: solr-user
Subject: Re: Any option to NOT return stack trace in Solr response?

I don't think there is a flag. But the bigger question is whether you are exposing Solr directly to the client? You should not be. You should have a middleware client that talks to Solr and then generates the web UI or whatever. If you give untrusted access to Solr, there are too many things that can be done, starting from deleting the whole index.

It might be possible to have a smart proxy and expose Solr with heavily filtered valid URLs, then you would need to scrub the response. That's all I can think of without hacking and reregistering with your own response handler (probably not that hard).

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/

On 22 July 2016 at 03:35, Koorosh Vakhshoori wrote:
> Hi all,
>
> Got a Solr 5.2.1 installation. I am getting the following error response when calling the TERMS component. Now the error is not the point; I know what is going on in this instance. However, to address security concerns, I am trying to have Solr truncate the stack trace in the response. Of course I would still want Solr to log the error in its log file. What I was wondering: is there a flag or option I can set in solrconfig.xml, globally or under TERMS, to omit the trace or just return 'java.lang.NullPointerException'? I have looked at the source code and don't see anything relevant. However, I may have missed something. Appreciate any suggestions and pointers.
> status: 500, QTime: 5
>
> java.lang.NullPointerException
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:322)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2067)
>     at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.filters.CorsFilter.handleNonCORS(CorsFilter.java:439)
>     at org.apache.catalina.filters.CorsFilter.doFilter(CorsFilter.java:178)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
>     at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
>     at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
>     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:655)
>     at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:222)
>     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
>     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1523)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurr
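Along the lines of Alex's "smart proxy" suggestion, the scrubbing step itself is small. A minimal sketch, assuming JSON responses (wt=json) and Jackson on the classpath; the class and method names are illustrative, not part of any Solr API:

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ObjectNode;

    public class ErrorScrubber {
        private static final ObjectMapper MAPPER = new ObjectMapper();

        // Remove the stack trace from a Solr JSON error body before
        // forwarding it to an untrusted client; the error code and
        // message are preserved for the caller.
        public static String scrub(String solrResponseBody) throws Exception {
            ObjectNode root = (ObjectNode) MAPPER.readTree(solrResponseBody);
            if (root.has("error")) {
                ((ObjectNode) root.get("error")).remove("trace");
            }
            return MAPPER.writeValueAsString(root);
        }
    }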
Solr Date Query "Intraday"
Hi all,

Is there a way to query Solr between dates and query like "intraday", between hours in those days? Something like: I want to search field "text" with value "test" and field "date" between 20160601 AND 20160610, and only between certain hours of those days: 1PM and 4PM?

I know I could loop over the dates, I just would like to know if there is another way to do it in Solr. My Solr version is: 4.10.2.

Also, is there a "name" for this kind of query?

Thanks a lot for your attention and help.

Regards,
Felipe.
Re: solr.NRTCachingDirectoryFactory
On 7/22/16 9:56 AM, Erick Erickson wrote:
> OK, scratch autowarming. In fact your autowarm counts are quite high, I suspect far past "diminishing returns". I usually see autowarm counts < 64, but YMMV. Are you seeing actual hit ratios that are decent on those caches (admin UI >> plugins/stats >> cache)? And your cache sizes are also quite high in my experience; it's probably worth measuring the utilization there as well. And, BTW, your filterCache can occupy up to 2G of your heap. That's probably not your central problem, but it's something to consider.

Will look into it.

> So I don't know why your queries are taking that long; my assumption is that they may simply be very complex queries, or you have grouping on, or...

Queries are a bit complex for sure.

> I guess the next thing I'd do is start trying to characterize which queries are slow. Grouping? Pivot faceting? 'Cause from everything you've said so far it's surprising that you're seeing queries take this long; something doesn't feel right, but what it is I don't have a clue.

Thanks

> Best,
> Erick
>
> On Fri, Jul 22, 2016 at 9:15 AM, Rallavagu wrote:
>> On 7/22/16 8:34 AM, Erick Erickson wrote:
>>> Mostly this sounds like a problem that could be cured with autowarming. But two things are conflicting here:
>>> 1> you say "We have a requirement to have updates available immediately (NRT)"
>>> 2> your docs aren't available for 120 seconds given your autoSoftCommit settings, unless you're specifying -Dsolr.autoSoftCommit.maxTime=some_other_interval as a startup parameter.
>>
>> Yes. We have 120 seconds available.
>>
>>> So assuming you really do have a 120 second autocommit time, you should be able to smooth out the spikes by appropriate autowarming. You also haven't indicated what your filterCache and queryResultCache settings are. They come with a default of 0 for autowarm. But what is their size? And do you see a correlation between longer queries and the 2 minute intervals? And do you have some test harness in place (jmeter works well) to demonstrate that differences in your configuration help or hurt? I can't over-emphasize the importance of this, otherwise if you rely on somebody simply saying "it's slow" you have no way to know what effect changes have.
>>
>> Here is the cache configuration. We have run load tests using JMeter with directory pointing to Solr and also tests that are pointing to the application that queries Solr. In both cases, we have noticed the results being slower.
>>
>> Thanks
>>
>>> Best,
>>> Erick
>>>
>>> On Thu, Jul 21, 2016 at 11:22 PM, Shawn Heisey wrote:
>>>> On 7/21/2016 11:25 PM, Rallavagu wrote:
>>>>> There is no other software running on the system and it is completely dedicated to Solr. It is running on Linux. Here is the full version.
>>>>>
>>>>> Linux version 3.8.13-55.1.6.el7uek.x86_64 (mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>>>>
>>>> Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:
>>>>
>>>> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>>>>
>>>> If you know what to look for, you can figure out all the relevant memory details from that.
>>>>
>>>> Thanks,
>>>> Shawn
Re: Should streaming place load on the app server?
Since I'm using SolrJ as a conduit to Solr, to have the searches processed on a Solr server I need to wrap everything in a ParallelStream object. Got it, thanks!

Joel Bernstein wrote:
> If you just use the Java API directly, the code executes in the VM where the code is run. You could use the ParallelStream to send the code to a SolrCloud worker to execute the code as well. In this scenario the code is serialized to a Streaming Expression and sent across the wire to the Solr node.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Jul 22, 2016 at 11:23 AM, tedsolr wrote:
>> The streaming API looks like it's meant to be run from the client app server - very similar to a standard Solr search. When I run a basic streaming operation the memory consumption occurs on the app server JVM, not the Solr server JVM. The opposite of what I was expecting.
>>
>> (pseudo code)
>> Stream A = new CloudSolrStream();
>> Stream B = new CloudSolrStream();
>> Stream C = new HashJoinStream(A, B);
>> Stream D = new SortStream(C);
>> Stream E = new ReducerStream(D);
>> E.open();
>>
>> The SortStream is processed in memory when open() is called. Can the processing be pushed off to the Solr cluster? Is that what the Parallel stream will do - using worker collections?
>>
>> confused,
>> Ted
Re: solr.NRTCachingDirectoryFactory
Also, here is the link to the screenshot:

https://dl.dropboxusercontent.com/u/39813705/Screen%20Shot%202016-07-22%20at%2010.40.21%20AM.png

Thanks

On 7/21/16 11:22 PM, Shawn Heisey wrote:
> On 7/21/2016 11:25 PM, Rallavagu wrote:
>> There is no other software running on the system and it is completely dedicated to Solr. It is running on Linux. Here is the full version.
>>
>> Linux version 3.8.13-55.1.6.el7uek.x86_64 (mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>
> Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:
>
> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>
> If you know what to look for, you can figure out all the relevant memory details from that.
>
> Thanks,
> Shawn
Re: solr.NRTCachingDirectoryFactory
Here is the snapshot of memory usage from "top" as you mentioned. The first row is the "solr" process. Thanks.

  PID  USER    PR  NI     VIRT     RES     SHR  S  %CPU  %MEM      TIME+  COMMAND
29468  solr    20   0  27.536g  0.013t  3.297g  S  45.7  27.6    4251:45  java
21366  root    20   0  14.499g  217824   12952  S   1.0   0.4  192:11.54  java
 2077  root    20   0  14.049g  190824    9980  S   0.7   0.4   62:44.00  java
  511  root    20   0   125792   56848   56616  S   0.0   0.1    9:33.23  systemd-journal
  316  splunk  20   0   232056   44284   11804  S   0.7   0.1   84:52.74  splunkd
 1045  root    20   0   257680   39956    6836  S   0.3   0.1    7:05.78  puppet
32631  root    20   0   360956   39292    4788  S   0.0   0.1    4:55.37  mcollectived
  703  root    20   0   250372    9000     976  S   0.0   0.0    1:35.52  rsyslogd
 1058  nslcd   20   0   454192    6004    2996  S   0.0   0.0   15:08.87  nslcd

On 7/21/16 11:22 PM, Shawn Heisey wrote:
> On 7/21/2016 11:25 PM, Rallavagu wrote:
>> There is no other software running on the system and it is completely dedicated to Solr. It is running on Linux. Here is the full version.
>>
>> Linux version 3.8.13-55.1.6.el7uek.x86_64 (mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>
> Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:
>
> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>
> If you know what to look for, you can figure out all the relevant memory details from that.
>
> Thanks,
> Shawn
Re: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE
Hi all - please help me here.

On Thursday, July 21, 2016, SRINI SOLR wrote:
> Hi All -
> Could you please help me on spell check on a multi-word phrase as a whole...
>
> Scenario -
> I have a problem with Solr spellcheck suggestions for multi-word phrases. With the query for 'red chillies':
>
> q=red+chillies&wt=xml&indent=true&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true
>
> I get (abridged):
>
> numFound: 2, startOffset: 4, endOffset: 12, origFreq: 0
> suggestions: chiller (freq 4), challis (freq 2)
> correctlySpelled: false
> collation: red chiller
>
> The problem is, even though 'chiller' has 4 results in the index, 'red chiller' has none. So we end up suggesting a phrase with 0 results.
>
> What can I do to make spellcheck work on the whole phrase only?
>
> Please help me here...
Re: Should streaming place load on the app server?
A Streaming Expression can be sent to any SolrCloud node in any collection. You can set up collections that have no data and just execute the expressions; the expressions reference other collections that hold data. Collections that only execute expressions we can call "worker collections"; collections that hold data we can call "search collections". This allows you to have servers with different specifications for workers and search nodes, and to offload the workloads onto different collections.

The parallel function sends a streaming expression to N worker nodes. Each worker node executes the expression and processes a partition of the data. Both the search function and the topic function can be parallelized like this, so operations like joins that wrap a search can be parallelized.

If you just use the Java API directly, the code executes in the VM where the code is run. You could use the ParallelStream to send the code to a SolrCloud worker to execute the code as well. In this scenario the code is serialized to a Streaming Expression and sent across the wire to the Solr node.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jul 22, 2016 at 11:23 AM, tedsolr wrote:
> The streaming API looks like it's meant to be run from the client app server - very similar to a standard Solr search. When I run a basic streaming operation the memory consumption occurs on the app server JVM, not the Solr server JVM. The opposite of what I was expecting.
>
> (pseudo code)
> Stream A = new CloudSolrStream();
> Stream B = new CloudSolrStream();
> Stream C = new HashJoinStream(A, B);
> Stream D = new SortStream(C);
> Stream E = new ReducerStream(D);
> E.open();
>
> The SortStream is processed in memory when open() is called. Can the processing be pushed off to the Solr cluster? Is that what the Parallel stream will do - using worker collections?
>
> confused,
> Ted
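To illustrate Joel's point about wrapping a stream in ParallelStream, here is a minimal sketch assuming the SolrJ 6.x streaming API. The zkHost, collection, and field names are placeholders, and the wrapped search must carry a partitionKeys parameter so each worker reads a distinct partition:

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.comp.ComparatorOrder;
    import org.apache.solr.client.solrj.io.comp.FieldComparator;
    import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
    import org.apache.solr.client.solrj.io.stream.ParallelStream;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class ParallelExample {
        public static void main(String[] args) throws Exception {
            String zkHost = "zk1:2181";
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("q", "*:*");
            params.set("fl", "id,field_a");
            params.set("sort", "field_a asc");
            params.set("partitionKeys", "field_a"); // required for parallel execution
            CloudSolrStream search = new CloudSolrStream(zkHost, "searchCollection", params);

            // Ship the wrapped stream to 4 workers in the worker collection;
            // worker results are merged back in field_a order.
            ParallelStream parallel = new ParallelStream(zkHost, "workerCollection",
                    search, 4, new FieldComparator("field_a", ComparatorOrder.ASCENDING));
            try {
                parallel.open();
                for (Tuple t = parallel.read(); !t.EOF; t = parallel.read()) {
                    System.out.println(t.getString("id"));
                }
            } finally {
                parallel.close();
            }
        }
    }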
Re: solr.NRTCachingDirectoryFactory
OK, scratch autowarming. In fact your autowarm counts are quite high, I suspect far past "diminishing returns". I usually see autowarm counts < 64, but YMMV. Are you seeing actual hit ratios that are decent on those caches (admin UI >> plugins/stats >> cache)? And your cache sizes are also quite high in my experience; it's probably worth measuring the utilization there as well.

And, BTW, your filterCache can occupy up to 2G of your heap. That's probably not your central problem, but it's something to consider.

So I don't know why your queries are taking that long; my assumption is that they may simply be very complex queries, or you have grouping on, or...

I guess the next thing I'd do is start trying to characterize which queries are slow. Grouping? Pivot faceting? 'Cause from everything you've said so far it's surprising that you're seeing queries take this long; something doesn't feel right, but what it is I don't have a clue.

Best,
Erick

On Fri, Jul 22, 2016 at 9:15 AM, Rallavagu wrote:
> On 7/22/16 8:34 AM, Erick Erickson wrote:
>> Mostly this sounds like a problem that could be cured with autowarming. But two things are conflicting here:
>> 1> you say "We have a requirement to have updates available immediately (NRT)"
>> 2> your docs aren't available for 120 seconds given your autoSoftCommit settings, unless you're specifying -Dsolr.autoSoftCommit.maxTime=some_other_interval as a startup parameter.
>
> Yes. We have 120 seconds available.
>
>> So assuming you really do have a 120 second autocommit time, you should be able to smooth out the spikes by appropriate autowarming. You also haven't indicated what your filterCache and queryResultCache settings are. They come with a default of 0 for autowarm. But what is their size? And do you see a correlation between longer queries and the 2 minute intervals? And do you have some test harness in place (jmeter works well) to demonstrate that differences in your configuration help or hurt? I can't over-emphasize the importance of this, otherwise if you rely on somebody simply saying "it's slow" you have no way to know what effect changes have.
>
> Here is the cache configuration:
>
> <filterCache size="5000" initialSize="5000" autowarmCount="500"/>
>
> <queryResultCache size="2" initialSize="2" autowarmCount="500"/>
>
> <documentCache size="10" initialSize="10" autowarmCount="0"/>
>
> We have run load tests using JMeter with directory pointing to Solr and also tests that are pointing to the application that queries Solr. In both cases, we have noticed the results being slower.
>
> Thanks
>
>> Best,
>> Erick
>>
>> On Thu, Jul 21, 2016 at 11:22 PM, Shawn Heisey wrote:
>>> On 7/21/2016 11:25 PM, Rallavagu wrote:
>>>> There is no other software running on the system and it is completely dedicated to Solr. It is running on Linux. Here is the full version.
>>>>
>>>> Linux version 3.8.13-55.1.6.el7uek.x86_64 (mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>>>
>>> Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:
>>>
>>> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>>>
>>> If you know what to look for, you can figure out all the relevant memory details from that.
>>>
>>> Thanks,
>>> Shawn
Re: solr.NRTCachingDirectoryFactory
On 7/22/16 8:34 AM, Erick Erickson wrote:
> Mostly this sounds like a problem that could be cured with autowarming. But two things are conflicting here:
> 1> you say "We have a requirement to have updates available immediately (NRT)"
> 2> your docs aren't available for 120 seconds given your autoSoftCommit settings, unless you're specifying -Dsolr.autoSoftCommit.maxTime=some_other_interval as a startup parameter.

Yes. We have 120 seconds available.

> So assuming you really do have a 120 second autocommit time, you should be able to smooth out the spikes by appropriate autowarming. You also haven't indicated what your filterCache and queryResultCache settings are. They come with a default of 0 for autowarm. But what is their size? And do you see a correlation between longer queries and the 2 minute intervals? And do you have some test harness in place (jmeter works well) to demonstrate that differences in your configuration help or hurt? I can't over-emphasize the importance of this, otherwise if you rely on somebody simply saying "it's slow" you have no way to know what effect changes have.

Here is the cache configuration:

<filterCache size="5000" initialSize="5000" autowarmCount="500"/>

<queryResultCache size="2" initialSize="2" autowarmCount="500"/>

<documentCache size="10" initialSize="10" autowarmCount="0"/>

We have run load tests using JMeter with directory pointing to Solr and also tests that are pointing to the application that queries Solr. In both cases, we have noticed the results being slower.

Thanks

> Best,
> Erick
>
> On Thu, Jul 21, 2016 at 11:22 PM, Shawn Heisey wrote:
>> On 7/21/2016 11:25 PM, Rallavagu wrote:
>>> There is no other software running on the system and it is completely dedicated to Solr. It is running on Linux. Here is the full version.
>>>
>>> Linux version 3.8.13-55.1.6.el7uek.x86_64 (mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>>
>> Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:
>>
>> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>>
>> If you know what to look for, you can figure out all the relevant memory details from that.
>>
>> Thanks,
>> Shawn
Re: solr.NRTCachingDirectoryFactory
Mostly this sounds like a problem that could be cured with autowarming. But two things are conflicting here:

1> you say "We have a requirement to have updates available immediately (NRT)"
2> your docs aren't available for 120 seconds given your autoSoftCommit settings, unless you're specifying -Dsolr.autoSoftCommit.maxTime=some_other_interval as a startup parameter.

So assuming you really do have a 120 second autocommit time, you should be able to smooth out the spikes by appropriate autowarming. You also haven't indicated what your filterCache and queryResultCache settings are. They come with a default of 0 for autowarm. But what is their size? And do you see a correlation between longer queries and the 2 minute intervals? And do you have some test harness in place (jmeter works well) to demonstrate that differences in your configuration help or hurt? I can't over-emphasize the importance of this, otherwise if you rely on somebody simply saying "it's slow" you have no way to know what effect changes have.

Best,
Erick

On Thu, Jul 21, 2016 at 11:22 PM, Shawn Heisey wrote:
> On 7/21/2016 11:25 PM, Rallavagu wrote:
>> There is no other software running on the system and it is completely dedicated to Solr. It is running on Linux. Here is the full version.
>>
>> Linux version 3.8.13-55.1.6.el7uek.x86_64 (mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015
>
> Run the top program, press shift-M to sort by memory usage, and then grab a screenshot of the terminal window. Share it with a site like dropbox, imgur, or something similar, and send the URL. You'll end up with something like this:
>
> https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0
>
> If you know what to look for, you can figure out all the relevant memory details from that.
>
> Thanks,
> Shawn
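For reference, cache settings in solrconfig.xml along the lines Erick suggests would look something like this. The sizes below are illustrative starting points, not recommendations; tune them against measured hit ratios:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>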
Re: Solr query - response status
Thanks Shawn for your insight!

On Fri, Jul 22, 2016 at 6:32 PM, Shawn Heisey wrote:
> On 7/22/2016 12:41 AM, Shyam R wrote:
>> I see that SOLR returns status value as 0 for successful searches:
>>
>> org.apache.solr.core.SolrCore; [users_shadow_shard1_replica1] webapp=/solr path=/user/ping params={} status=0 QTime=0
>>
>> I do see that the status comes back as 400 whenever the search is invalid (invoking a query with parameters that are not available in the target collection). What are the legitimate values of status, and what is the reason for choosing 0?
>
> Solr (Jetty, really) sends back "200" for the HTTP status code when the request status is zero.
>
> The reason Solr uses a status of zero internally has its origins in the way most operating systems deal with program exit codes. Almost universally, when a program exits with an exit code of 0, it tells the operating system that the exit was normal, no errors. Any positive number indicates some kind of error. The reason this is not reversed is simple -- unlike HTTP, which has multiple codes meaning success, operating systems must handle many different error codes, but only one success code. So the success code is assigned to the number that's inherently different from the rest -- zero.
>
> Internally, Solr doesn't necessarily know that the response is going to use HTTP, although that is the most common method. In the mind of a typical open source developer, an exit status of ANY positive number means there was an error, including 200. Once control is handed off to Jetty, the zero success status is translated to the most-used success code for HTTP.
>
> Any number could *potentially* be valid for the status in Solr logs, but I've only ever seen zero, 40x, and 50x. The 40x series means there was a problem detected in the request; 50x means an error happened inside Solr itself after the request was determined to be good. The ping handler will return a 503 status if the health check is put into a disabled state.
>
> Thanks,
> Shawn

--
Ph: 9845704792
Re: Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)
Well, if it is a bug, you can spoof it by not issuing any commits until the indexing is completed. Certainly not elegant, and you risk having to re-index from scratch if your machine dies.

Or take explicit control over it, which in your case might be preferable, through the replication API; see:
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler

Best,
Erick

On Fri, Jul 22, 2016 at 7:00 AM, Alessandro Bon wrote:
> Thanks for your answer Shawn,
>
> If I got you right, you are saying that regardless of whether the "replicateAfter" directive is "commit" or "optimize", a replication is triggered whenever a segment merge occurs. Is that right? Or is it triggered only when a full index merge occurs, which could happen after a commit as well (other than after an optimization)?
>
> I would love to switch to SolrCloud, and for sure I will in the future, but right now I just have to get the old master/slave architecture to work properly.
>
> Thanks again,
> Alessandro
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, July 22, 2016 3:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)
>
> On 7/22/2016 4:02 AM, Alessandro Bon wrote:
>> Issue: Full index replicas occur sometimes on master startup and after commits, despite only the optimize directive being specified. In the case of replica on commit, it occurs only for sufficiently big commits. Replication correctly starts again at the end of my indexing job, after the optimization phase. As a result of this behaviour I get incomplete indexes on slaves during the indexing process.
>
> There's a known bug where full index replication happens after master restart. This was supposed to be fixed in 5.5.2 and 6.1.0, but you say you are running 5.5.2.
>
> https://issues.apache.org/jira/browse/SOLR-9036
>
> All replications are *supposed* to be delta replications -- only new/changed files. Note that normal commits can cause segment merging, up to and including the entire index if conditions are just right. Segment merges can result in new segment files that are very large, which could take a long time to replicate.
>
> Optimizing the index is a forced merge to one segment. This will always lead to a full-index replication, because the entire index is rewritten into a single segment and all the other segment files are deleted.
>
> You might want to give SolrCloud a try. There are no masters and no slaves. It is a true redundant cluster.
>
> Thanks,
> Shawn
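The commands behind that link are plain HTTP calls against the replication handler. For example (host, port, and core name are placeholders):

Stop a slave from polling the master while a rebuild is in progress:

    http://slave_host:8983/solr/core_name/replication?command=disablepoll

After the master finishes indexing and optimizing, pull the index once, then resume normal polling:

    http://slave_host:8983/solr/core_name/replication?command=fetchindex
    http://slave_host:8983/solr/core_name/replication?command=enablepoll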
Should streaming place load on the app server?
The streaming API looks like it's meant to be run from the client app server - very similar to a standard Solr search. When I run a basic streaming operation the memory consumption occurs on the app server JVM, not the Solr server JVM. The opposite of what I was expecting.

(pseudo code)

Stream A = new CloudSolrStream();
Stream B = new CloudSolrStream();
Stream C = new HashJoinStream(A, B);
Stream D = new SortStream(C);
Stream E = new ReducerStream(D);
E.open();

The SortStream is processed in memory when open() is called. Can the processing be pushed off to the Solr cluster? Is that what the Parallel stream will do - using worker collections?

confused,
Ted
RE: Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)
Thanks for your answer Shawn,

If I got you right, you are saying that regardless of whether the "replicateAfter" directive is "commit" or "optimize", a replication is triggered whenever a segment merge occurs. Is that right? Or is it triggered only when a full index merge occurs, which could happen after a commit as well (other than after an optimization)?

I would love to switch to SolrCloud, and for sure I will in the future, but right now I just have to get the old master/slave architecture to work properly.

Thanks again,
Alessandro

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, July 22, 2016 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)

On 7/22/2016 4:02 AM, Alessandro Bon wrote:
> Issue: Full index replicas occur sometimes on master startup and after commits, despite only the optimize directive being specified. In the case of replica on commit, it occurs only for sufficiently big commits. Replication correctly starts again at the end of my indexing job, after the optimization phase. As a result of this behaviour I get incomplete indexes on slaves during the indexing process.

There's a known bug where full index replication happens after master restart. This was supposed to be fixed in 5.5.2 and 6.1.0, but you say you are running 5.5.2.

https://issues.apache.org/jira/browse/SOLR-9036

All replications are *supposed* to be delta replications -- only new/changed files. Note that normal commits can cause segment merging, up to and including the entire index if conditions are just right. Segment merges can result in new segment files that are very large, which could take a long time to replicate.

Optimizing the index is a forced merge to one segment. This will always lead to a full-index replication, because the entire index is rewritten into a single segment and all the other segment files are deleted.

You might want to give SolrCloud a try. There are no masters and no slaves. It is a true redundant cluster.

Thanks,
Shawn
Re: Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)
On 7/22/2016 4:02 AM, Alessandro Bon wrote:
> Issue: Full index replicas occur sometimes on master startup and after commits, despite only the optimize directive being specified. In the case of replica on commit, it occurs only for sufficiently big commits. Replication correctly starts again at the end of my indexing job, after the optimization phase. As a result of this behaviour I get incomplete indexes on slaves during the indexing process.

There's a known bug where full index replication happens after master restart. This was supposed to be fixed in 5.5.2 and 6.1.0, but you say you are running 5.5.2.

https://issues.apache.org/jira/browse/SOLR-9036

All replications are *supposed* to be delta replications -- only new/changed files. Note that normal commits can cause segment merging, up to and including the entire index if conditions are just right. Segment merges can result in new segment files that are very large, which could take a long time to replicate.

Optimizing the index is a forced merge to one segment. This will always lead to a full-index replication, because the entire index is rewritten into a single segment and all the other segment files are deleted.

You might want to give SolrCloud a try. There are no masters and no slaves. It is a true redundant cluster.

Thanks,
Shawn
Re: loading zookeeper data
On 7/22/2016 1:22 AM, Aristedes Maniatis wrote:
> I'm not new to Solr, but I'm upgrading from Solr 4 to 5 and needing to use the new Zookeeper configuration requirement. It is adding a lot of extra complexity to our deployment and I want to check that we are doing it right.

Zookeeper is not required for Solr 5, or even for Solr 6. It's only required for SolrCloud. SolrCloud is an operating mode that is not mandatory. SolrCloud has been around since Solr 4.0.0.

> The problem we want to escape is that this configuration causes outages and other random issues each time the Solr master does a full reload. It shouldn't, but it does, and hopefully the new SolrCloud will be better.

The fact that Solr does a full replication when the master is restarted/reloaded is a bug. This bug is fixed in 5.5.2 and 6.1.0.

https://issues.apache.org/jira/browse/SOLR-9036

If you *do* want to run in cloud mode, then you will need to use zkcli to upload config changes to zookeeper and then issue a collection reload with the Collections API. This will find and reload all the cores related to that collection, across the entire cloud. You have the option of using the ZkCLI java class, or the zkcli.sh script that can be found in all 5.x and 6.x installs at server/scripts/cloud-scripts. As of version 5.3, the jars required for zkcli are already unpacked before Solr is started.

Thanks,
Shawn
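A typical invocation of the bundled script, followed by the collection reload Shawn mentions, looks like this (Zookeeper hosts, paths, and names are placeholders):

    server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
        -cmd upconfig -confdir /path/to/configset/conf -confname myconfig

    http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection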
Re: Solr query - response status
On 7/22/2016 12:41 AM, Shyam R wrote:
> I see that SOLR returns status value as 0 for successful searches:
>
> org.apache.solr.core.SolrCore; [users_shadow_shard1_replica1] webapp=/solr path=/user/ping params={} status=0 QTime=0
>
> I do see that the status comes back as 400 whenever the search is invalid (invoking a query with parameters that are not available in the target collection). What are the legitimate values of status, and what is the reason for choosing 0?

Solr (Jetty, really) sends back "200" for the HTTP status code when the request status is zero.

The reason Solr uses a status of zero internally has its origins in the way most operating systems deal with program exit codes. Almost universally, when a program exits with an exit code of 0, it tells the operating system that the exit was normal, no errors. Any positive number indicates some kind of error. The reason this is not reversed is simple -- unlike HTTP, which has multiple codes meaning success, operating systems must handle many different error codes, but only one success code. So the success code is assigned to the number that's inherently different from the rest -- zero.

Internally, Solr doesn't necessarily know that the response is going to use HTTP, although that is the most common method. In the mind of a typical open source developer, an exit status of ANY positive number means there was an error, including 200. Once control is handed off to Jetty, the zero success status is translated to the most-used success code for HTTP.

Any number could *potentially* be valid for the status in Solr logs, but I've only ever seen zero, 40x, and 50x. The 40x series means there was a problem detected in the request; 50x means an error happened inside Solr itself after the request was determined to be good. The ping handler will return a 503 status if the health check is put into a disabled state.

Thanks,
Shawn
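On the client side, SolrJ surfaces the same internal status. A minimal sketch, assuming SolrJ 5.x/6.x (the URL and core name are placeholders); note that a bad request surfaces as an exception carrying the HTTP 4xx/5xx code rather than a nonzero return value:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class StatusCheck {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/users");
            QueryResponse rsp = client.query(new SolrQuery("*:*"));
            // 0 means success, mirroring the status=0 seen in the Solr log
            System.out.println("status = " + rsp.getStatus());
            client.close();
        }
    }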
Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)
Hi everyone,

I am experiencing a replication issue on a master/slave configuration.

Issue: Full index replicas occur sometimes on master startup and after commits, despite only the optimize directive being specified. In the case of replica on commit, it occurs only for sufficiently big commits. Replication correctly starts again at the end of my indexing job, after the optimization phase. As a result of this behaviour I get incomplete indexes on slaves during the indexing process.

Solr version: 5.5.2

Configuration (abridged):

<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
<luceneMatchVersion>5.5.1</luceneMatchVersion>
<dataDir>${solr.data.dir:}</dataDir>
[...]
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${solr.master.enable:false}</str>
    <str name="replicateAfter">optimize</str>
    <str name="backupAfter">optimize</str>
    <str name="maxNumberOfBackups">${solr.numberOfVersionToKeep:3}</str>
  </lst>
  <lst name="slave">
    <str name="enable">${solr.slave.enable:false}</str>
    <str name="masterUrl">${solr.master.url:}/replication</str>
    <str name="pollInterval">${solr.replication.pollInterval:00:00:30}</str>
  </lst>
</requestHandler>
[...]

Any idea on how to solve this issue would be greatly appreciated.

Many thanks,
Alessandro
Re: Searching Home's, Homes and Home
Thanks for all the responses... I have checked these options; none of them has worked so far. Each option gives only two results, not the third one. I am checking some more options, and if you can share more ideas, that would be great.

Thanks,
Surender Singh
loading zookeeper data
Hi everyone

I'm not new to Solr, but I'm upgrading from Solr 4 to 5 and needing to use the new Zookeeper configuration requirement. It is adding a lot of extra complexity to our deployment and I want to check that we are doing it right.

1. We are using Saltstack to push files to deployment servers. That makes it easy to put files anywhere I want, run scripts, etc. If you don't know Salt, it is a lot like Puppet or other configuration management tools. Salt is all Python.
2. We use Jenkins to build and test.
3. Deployment servers are all FreeBSD.

Now, in the old days, I could just push the right core configuration files to each Solr instance (we have three cores), make sure one is the master and use cron to ensure the master updates. The other Solr slaves all update nicely.

The problem we want to escape is that this configuration causes outages and other random issues each time the Solr master does a full reload. It shouldn't, but it does, and hopefully the new SolrCloud will be better.

Now, I can still deploy Solr and Zookeeper using Salt. All that works well and is easy. But how do I get the configuration files from our development/test environment (built and tested with Jenkins) into production? Obviously I want those config files in version control. And maybe Jenkins can zip up the 8 configuration files (per core) and push them to our artifact repository.

But then what? In the production cluster it seems I then need to:

1. Grab the latest configuration bundle for each core and unpack them
2. Launch Java
3. Execute the Solr jars (from the production server since it must be the right version)
   - with org.apache.solr.cloud.ZkCLI
   - and some parameters pointing to the production Zookeeper cluster
   - pointing also to the unpacked config files
4. Parse the output to understand if any error happened
5. Wait for Solr to pick up the new configuration and do any final production checks

Am I missing some really simple step, or is this what we must now do?

I'm thinking that gradle might help with 2 & 3 above, since then at least it can launch the right version of Java, download the right Solr version and execute against that. And maybe that can run from Jenkins as a "release" step. Is that a good approach?

Cheers
Ari

--
Aristedes Maniatis
CEO, ish
https://www.ish.com.au
GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A