Re: SolrJ Socket Leak
Jared, I faced a similar issue when using CloudSolrServer with Solr. As Shawn pointed out, the TIME_WAIT status happens when the connection is closed by the HTTP client. The HTTP client closes a connection whenever it thinks the connection is stale (https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d5e405). Even the docs point out that stale connection checking cannot be completely reliable. I see two ways to get around this:

1. Enable SO_REUSEADDR.
2. Disable stale connection checks.

Also, by default, when we create a CSS it does not explicitly configure any HTTP client parameters (https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java#L124). In this case, the default configuration parameters (max connections, max connections per host) are used for the HTTP connection. You can explicitly configure these params when creating a CSS using HttpClientUtil:

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
    params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
    params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
    params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 3);
    final HttpClient client = HttpClientUtil.createClient(params);
    LBHttpSolrServer lb = new LBHttpSolrServer(client);
    CloudSolrServer server = new CloudSolrServer(zkConnect, lb);

Currently, I am using HttpClient 4.3.2 and building the client when creating the CSS. I also use the SO_REUSEADDR option, and I haven't seen TIME_WAIT connections since (maybe because of better handling of stale connections in 4.3.2, or because the SO_REUSEADDR param is enabled). My current HTTP client code looks like this (works only with HttpClient 4.3.2):

    HttpClientBuilder httpBuilder = HttpClientBuilder.create();
    Builder socketConfig = SocketConfig.custom();
    socketConfig.setSoReuseAddress(true);
    socketConfig.setSoTimeout(1);
    httpBuilder.setDefaultSocketConfig(socketConfig.build());
    httpBuilder.setMaxConnTotal(300);
    httpBuilder.setMaxConnPerRoute(100);
    httpBuilder.disableRedirectHandling();
    httpBuilder.useSystemProperties();
    HttpClient httpClient = httpBuilder.build();
    LBHttpSolrServer lb = new LBHttpSolrServer(httpClient, parser);
    CloudSolrServer server = new CloudSolrServer(zkConnect, lb);

There should be a way to configure socket reuse with 4.2.3 too; you can try different configurations. I am surprised you have TIME_WAIT connections even after 30 minutes, because a TIME_WAIT connection should be closed by default within 2 minutes by the OS, I think.

HTH,

-- Kiran Chitturi

On 2/13/14 12:38 PM, Jared Rodriguez jrodrig...@kitedesk.com wrote:

> I am using Solr/SolrJ 4.6.1 along with Apache HttpClient 4.3.2 as part of
> a web application which connects to the Solr server via SolrJ using
> CloudSolrServer(). The web application is wired up with Guice, and there
> is a single instance of the CloudSolrServer class used by all inbound
> requests. All this is running on Amazon.
>
> Basically, everything looks and runs fine for a while, but even with
> moderate concurrency, SolrJ starts leaving sockets open. We are handling
> only about 250 connections to the web app per minute, and each of these
> issues from 3 - 7 requests to Solr. Over a 30 minute period of this type
> of use, we end up with many 1000s of lingering sockets. I can see these
> when running netstat:
>
>     tcp 0 0 ip-10-80-14-26.ec2.in:41098 ip-10-99-145-47.ec2.i:glrpc TIME_WAIT
>
> All to the same target host, which is my Solr server.
> There are no other pieces of infrastructure on that box, just Solr.
> Eventually, the server just dies as no further sockets can be opened and
> the opened ones are not reused. The Solr server itself is unfazed and
> running like a champ: average time per request of 0.126, as seen in the
> Solr admin UI query handler stats.
>
> Apache HttpClient had a bunch of leakage in version 4.2.x that they
> cleaned up and refactored in 4.3.x, which is why I upgraded. Currently,
> SolrJ makes use of the old leaky 4.2 classes for establishing connections
> and using a connection pool.
> http://www.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.3.x.txt
>
> -- Jared Rodriguez
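For reference, Kiran's option 2 (disabling stale connection checks) could look roughly like this with HttpClient 4.3.x. This is a sketch, not code from the thread; the pool sizes are just the values used above:

    import org.apache.http.client.HttpClient;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.impl.client.HttpClientBuilder;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

    public class NoStaleCheckFactory {
        public static CloudSolrServer create(String zkConnect) throws Exception {
            RequestConfig rc = RequestConfig.custom()
                    .setStaleConnectionCheckEnabled(false) // option 2: skip the stale check
                    .build();
            HttpClient httpClient = HttpClientBuilder.create()
                    .setDefaultRequestConfig(rc)
                    .setMaxConnTotal(300)
                    .setMaxConnPerRoute(100)
                    .build();
            // Hand the pre-built client to SolrJ so CSS does not build its own.
            LBHttpSolrServer lb = new LBHttpSolrServer(httpClient);
            return new CloudSolrServer(zkConnect, lb);
        }
    }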
Re: DIH
On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey s...@elyograg.org wrote:

> On 2/14/2014 10:45 PM, William Bell wrote:
>> On virtual cores the DIH handler is really slow. On a 12 core box it
>> only uses 1 core while indexing. Does anyone know how to do Java
>> threading from a SQL query into Solr? Examples? I can use SolrJ to do
>> it, or I might be able to modify DIH to enable threading.
>
> At some point in 3.x threading was enabled in DIH, but it was removed
> since people were having issues with it (we never did). If you know how
> to fix DIH so it can do multiple indexing threads safely, please open an
> issue and upload a patch.

Please! Don't do it. Never again! https://issues.apache.org/jira/browse/SOLR-3011

As far as I understand, the general idea is to find the DIH successor:
https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424

> I'm still using DIH for full rebuilds, but I'd actually like to replace
> it with a rebuild routine written in SolrJ. I currently achieve decent
> speed by running DIH on all my shards at the same time. I do use SolrJ
> for once-a-minute index maintenance, but the code that I've written to
> pull data out of SQL and write it to Solr is not able to index millions
> of documents in a single thread as fast as DIH does. I have been building
> a multithreaded design in my head, but I haven't had a chance to write
> real code and see whether it's actually a good design.
>
> For me, the bottleneck is definitely Solr, not the database. I recently
> wrote a test program that uses my current SolrJ indexing method. If I
> skip the server.add(docs) line, it can read all 91 million docs from the
> database and build SolrInputDocument objects for them in 2.5 hours or
> less, all with a single thread. When I do a real rebuild with DIH, it
> takes a little more than 4.5 hours -- and that is inherently
> multithreaded, because it's doing all the shards simultaneously. I have
> no idea how long it would take with a single-threaded SolrJ program.
>
> Thanks,
> Shawn

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
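A sketch of the kind of multithreaded SolrJ rebuild Shawn describes — not his code. RowSource and the row-to-document mapping are hypothetical stand-ins for the existing SQL fetch/build logic, and nextBatch is assumed to be thread-safe:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelIndexer {

        /** Hypothetical batched row source; nextBatch returns null when the
            SQL cursor is exhausted, and must be safe to call concurrently. */
        public interface RowSource {
            List<Object[]> nextBatch(int size);
        }

        public static void index(final SolrServer server, final RowSource rows,
                                 int threads) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        List<Object[]> batch;
                        while ((batch = rows.nextBatch(1000)) != null) {
                            List<SolrInputDocument> docs =
                                    new ArrayList<SolrInputDocument>(batch.size());
                            for (Object[] row : batch) {
                                docs.add(buildDoc(row));
                            }
                            try {
                                server.add(docs); // the slow step; now overlapped across threads
                            } catch (Exception e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.DAYS);
        }

        /** Hypothetical mapping of one SQL row to a document. */
        static SolrInputDocument buildDoc(Object[] row) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", row[0]);
            return doc;
        }
    }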
Re: Solr index filename doesn't match with Solr version
Thanks Shawn and Tri for your info and explanations.

Tien

On Mon, Feb 17, 2014 at 1:36 PM, Tri Cao tm...@me.com wrote:

> Lucene's main file formats actually don't change a lot in 4.x (or even
> 5.x), and the newer codecs just delegate to previous versions for most
> file types. The newer file types don't typically include Lucene's version
> in file names. For example, the Lucene 4.6 codec basically delegates the
> stored fields and term vector file formats to 4.1, the doc format to 4.0,
> etc., and only implements the new segment info/field infos formats (the
> .si and .fnm files).
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/core/src/java/org/apache/lucene/codecs/lucene46/Lucene46Codec.java#L50
>
> Hope this helps,
> Tri

On Feb 16, 2014, at 08:52 PM, Shawn Heisey s...@elyograg.org wrote:

> On 2/16/2014 7:25 PM, Nguyen Manh Tien wrote:
>> I upgraded recently from Solr 4.0 to Solr 4.6. I checked the Solr index
>> folder and found these files:
>>
>>     _aars_Lucene41_0.doc
>>     _aars_Lucene41_0.pos
>>     _aars_Lucene41_0.tim
>>     _aars_Lucene41_0.tip
>>
>> I don't know why they don't have Lucene46 in the file name.
>
> This is an indication that this part of the index is using a file format
> introduced in Lucene 4.1. Here's what I have for one of my index segments
> on a Solr 4.6.1 server:
>
>     _5s7_2h.del
>     _5s7.fdt
>     _5s7.fdx
>     _5s7.fnm
>     _5s7_Lucene41_0.doc
>     _5s7_Lucene41_0.pos
>     _5s7_Lucene41_0.tim
>     _5s7_Lucene41_0.tip
>     _5s7_Lucene45_0.dvd
>     _5s7_Lucene45_0.dvm
>     _5s7.nvd
>     _5s7.nvm
>     _5s7.si
>     _5s7.tvd
>     _5s7.tvx
>
> It shows the same pieces as your list, but I am also using docValues in
> my index, and those files indicate that they are using the format from
> Lucene 4.5. I'm not sure why there are not version numbers in *all* of
> the file extensions -- that happens in the Lucene layer, which is a bit
> of a mystery to me.
>
> Thanks,
> Shawn
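A quick way to see the delegation Tri describes is to ask the codec for its per-file formats — a small sketch, assuming lucene-core 4.6 is on the classpath. The postings format is a per-field wrapper whose default delegate is Lucene41, which is where the "Lucene41" in _X_Lucene41_0.doc/.pos/.tim/.tip comes from:

    import org.apache.lucene.codecs.Codec;
    import org.apache.lucene.codecs.lucene46.Lucene46Codec;

    public class CodecPeek {
        public static void main(String[] args) {
            Codec codec = new Lucene46Codec();
            // Print which format implementation handles each file family.
            System.out.println("postings:      " + codec.postingsFormat());
            System.out.println("doc values:    " + codec.docValuesFormat());
            System.out.println("stored fields: " + codec.storedFieldsFormat());
        }
    }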
Re: DIH
There have been a couple of discussions about finding a DIH successor (including on the Heliosearch list), but no real momentum as far as I can tell. I think somebody will have to really pitch in and do the same couple of scenarios DIH does in several different frameworks (TodoMVC style). That should get it going.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Mon, Feb 17, 2014 at 7:40 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
> [...]
Re: DIH
Hi Mikhail,

Can you please elaborate on what you mean? My understanding is that there is no multi-threading support in DIH, and that for some reason it won't get it back. Am I correct?

Regarding Apache Flume: how can it be a DIH replacement? Can I index rich documents on my disk using Flume? Can I fetch documents from wikipedia, jira, twitter, dropbox, rdbms, rss, or the file system by using it?

Ahmet

On Monday, February 17, 2014 10:41 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
> [...]
Re: DIH
I haven't tried Apache Flume, but the manual seems to suggest 'yes' to a large number of your checklist items: http://flume.apache.org/FlumeUserGuide.html

When you say 'rich document' indexing, the keyword you are looking for is (Apache) Tika, as that's what's actually doing the job under the covers. Whether it can replicate your specific requirements is a question only you can answer for yourself, of course. When you do, maybe let us know, so we can learn too. :-)

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Mon, Feb 17, 2014 at 8:11 PM, Ahmet Arslan iori...@yahoo.com wrote:
> [...]
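As a tiny illustration of the Tika route — a sketch only; the core URL and field names are illustrative, and it assumes tika-core plus tika-parsers (and SolrJ) on the classpath:

    import java.io.File;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.Tika;

    public class TikaIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
            Tika tika = new Tika(); // detects the file type and picks a parser
            for (String path : args) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", path);
                doc.addField("content", tika.parseToString(new File(path)));
                server.add(doc);
            }
            server.commit();
        }
    }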
Re: Solr Load Testing Issues
Sorry, I didn't make myself clear. I have 20 machines in the configuration; each shard/replica is on its own machine.

On 14 February 2014 19:44, Shawn Heisey s...@elyograg.org wrote:

> On 2/14/2014 5:28 AM, Annette Newton wrote:
>> Solr Version: 4.3.1
>> Number Shards: 10
>> Replicas: 1
>> Heap size: 15GB
>> Machine RAM: 30GB
>> Zookeeper timeout: 45 seconds
>>
>> We are continuing the fight to keep our Solr setup functioning. As a
>> result of this we have made significant changes to our schema to reduce
>> the amount of data we write. I set up a new cluster to reindex our data;
>> initially I ran the import with no replicas and achieved quite
>> impressive results. Our peak was 60,000 new documents per minute, with
>> no shard losses and no outages due to garbage collection (which is an
>> issue we see in production). At the end of the load the index stood at
>> 97,000,000 documents and 20GB per shard. During the highest insertion
>> rate I would say that querying suffered, but that is not of concern
>> right now.
>
> Solr 4.3.1 has a number of problems when it comes to large clouds.
> Upgrading to 4.6.1 would be strongly advisable, but that's only something
> to try after looking into the rest of what I have to say.
>
> If I read what you've written correctly, you are running all this on one
> machine. To put it bluntly, this isn't going to work well unless you put
> a LOT more memory into that machine. For good performance, Solr relies on
> the OS disk cache, because reading from the disk is VERY expensive in
> terms of time. The OS will automatically use RAM that's not being used
> for other purposes for the disk cache, so that it can avoid reading off
> the disk as much as possible.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Below is a summary of what that wiki page says, with your numbers as I
> understand them. If I am misunderstanding your numbers, then this advice
> may need adjustment. Note that when I see one replica I take that to mean
> replicationFactor=1, so there is only one copy of the index. If you
> actually mean that you have *two* copies, then you have twice as much
> data as I've indicated below, and your requirements will be even larger.
>
> With ten shards that are each 20GB in size, your total index size is
> 200GB. With 15GB of heap, your ideal memory size for that server would be
> 215GB -- the 15GB heap plus enough extra to fit the entire 200GB index
> into RAM. In reality you probably don't need that much, but it's likely
> that you would need at least half the index to fit into RAM at any one
> moment, which adds up to 115GB. If you're prepared to deal with
> moderate-to-severe performance problems, you **MIGHT** be able to get
> away with only 25% of the index fitting into RAM, which still requires
> 65GB of RAM; but with SolrCloud, such performance problems usually mean
> that the cloud won't be stable, so it's not advisable to even try it.
>
> One of the bits of advice on the wiki page is to split your index into
> shards and put it on more machines, which drops the memory requirements
> for each machine. You're already using a multi-shard SolrCloud, so you
> probably just need more hardware. If you had one 20GB shard on a machine
> with 30GB of RAM, you could probably use a heap size of 4-8GB per machine
> and have plenty of RAM left over to cache the index very well. You could
> most likely add another 50% to the index size and still be OK.
> Thanks,
> Shawn

--
Annette Newton
Database Administrator
ServiceTick Ltd
T: +44 (0)1603 618326
Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ
www.servicetick.com | www.sessioncam.com
Solrcloud: no registered leader found and new searcher error
I have configured SolrCloud as follows: http://lucene.472066.n3.nabble.com/file/n4117724/Untitled.png

solr.xml:

    <solr persistent="true" sharedLib="lib">
      <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}"
             hostPort="${jetty.port:}" hostContext="solr">
        <core loadOnStartup="true" instanceDir="document\" transient="false" name="document"/>
        <core loadOnStartup="true" instanceDir="contract\" transient="false" name="contract"/>
      </cores>
    </solr>

I have added all the required config for SolrCloud, referring to this: http://wiki.apache.org/solr/SolrCloud#Required_Config

I am adding data to core "document". Now when I try to index using SolrNet (solr.Add(doc)), I get this error:

    SEVERE: org.apache.solr.common.SolrException: No registered leader was found, collection:document slice:shard2
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:481)

and this error also:

    SEVERE: null:java.lang.RuntimeException: SolrCoreState already closed
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:84)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:520)

I guess it is because the leader is from core "contract" and I am trying to index into core "document"? Is there a way to change the leader, and how? How can I change the state of the shards from "gone" to "active"?

Also, when I try to query q=*:*, this is shown:

    org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)

I read that this searcher error appears when the number of commits is exceeded, but I did not issue a commit command, so how would the commits be exceeded? It also suggests some warming settings, so I added this to solrconfig.xml, but I still get the same error:

    <query>
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst>
            <str name="q">solr</str><str name="start">0</str><str name="rows">10</str>
          </lst>
          <lst>
            <str name="q">rocks</str><str name="start">0</str><str name="rows">10</str>
          </lst>
        </arr>
      </listener>
      <maxWarmingSearchers>2</maxWarmingSearchers>
    </query>

I have just started with SolrCloud; please tell me if I am doing anything wrong in the SolrCloud configuration. Also, I did not find good material for SolrCloud on Windows 7 with Apache Tomcat, so please suggest something for that too. Thanks a lot.
Re: SolrCloud how to spread out to multiple nodes
Thanks, I'm going to give this a try.
Facet cache issue when deleting documents from the index
Hi guys,

I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is not invalidated when documents are deleted from the index. Sadly for me, I cannot reproduce this issue with an integration test like this:

    --8<--
    SolrInstance server = getSolrInstance();

    SolrInputDocument document = new SolrInputDocument();
    document.setField("id", "foo");
    document.setField("locale", "en");
    server.add(document);
    server.commit();

    document = new SolrInputDocument();
    document.setField("id", "bar");
    document.setField("locale", "en");
    server.add(document);
    server.commit();

    SolrQuery query = new SolrQuery("*:*");
    query.set("facet", "on");
    query.set("facet.field", "locale");

    QueryResponse response = server.query(query);
    Assert.assertEquals(2, response.getResults().size());
    FacetField localeFacet = response.getFacetField("locale");
    Assert.assertEquals(1, localeFacet.getValues().size());
    Count en = localeFacet.getValues().get(0);
    Assert.assertEquals("en", en.getName());
    Assert.assertEquals(2, en.getCount());

    server.delete("foo");
    server.commit();

    response = server.query(query);
    Assert.assertEquals(1, response.getResults().size());
    localeFacet = response.getFacetField("locale");
    Assert.assertEquals(1, localeFacet.getValues().size());
    en = localeFacet.getValues().get(0);
    Assert.assertEquals("en", en.getName());
    Assert.assertEquals(1, en.getCount());
    --8<--

Nevertheless, when I do the 'same' in my real environment, the count for the locale facet remains 2 after one of the documents is deleted. The search result count is fine, which is why I think it's a facet cache issue. Note that the facet count remains 2 even after I restart the server, so the cache is persisted on the file system.

Strangely, the facet count is updated correctly if I modify the document instead of deleting it (i.e. removing a keyword from the content so that it isn't matched by the search query any more). So it looks like only delete triggers the issue.

Now, an interesting fact is that if, in my real environment, I delete one of the documents and then add a new one, the facet count becomes 3. So the last commit to the index, which inserts a new document, doesn't trigger a re-computation of the facet cache; the previous facet cache is simply incremented, so the error is perpetuated. At this point I don't even know how to fix the facet cache without deleting the Solr data folder so that the full index is rebuilt.

I'm still trying to figure out what the difference is between the integration test and my real environment (as I used the same schema and configuration). Do you know what might be wrong?

Thanks,
Marius
Solr Suggester not working in sharding (distributed search)
I have two Solr servers (Solr 4.5.1) running sharded. I have implemented the Solr suggester using the SpellCheckComponent for auto-suggest. When I execute the suggest URL on an individual core, the Solr suggestions come back properly:

    http://localhost:8986/solr/core1/suggest?spellcheck.q=city%20of
    http://localhost:8987/solr/core1/suggest?spellcheck.q=city%20of

When I fire the URL per the Solr wiki (https://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support), the result does not come back and the exception below occurs.

URL:

    http://localhost:8986/solr/core1/select?shards=localhost:8986/solr/core1,localhost:8987/solr/core1&spellcheck.q=city%20of&shards.qt=%2Fsuggest&qt=suggest

    java.lang.NullPointerException
        at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:843)
        at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:649)
        at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:619)

For reference, below are my schema.xml and solrconfig.xml entries:

    <searchComponent class="solr.SpellCheckComponent" name="suggest">
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
        <str name="field">sugg</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

    <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
      <lst name="defaults">
        <str name="spellcheck">on</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.collate">true</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>

    <fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

I have a unique field, id, which is stored=true in schema.xml. Can anyone please suggest the
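One thing that may be worth trying (a guess based on the wiki page cited above, not a verified fix) is to send the top-level request to the /suggest handler itself rather than to /select with qt=suggest, keeping shards.qt pointed at the same handler:

    http://localhost:8986/solr/core1/suggest?spellcheck.q=city%20of
        &shards=localhost:8986/solr/core1,localhost:8987/solr/core1
        &shards.qt=/suggest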
Re: DIH
On Mon, Feb 17, 2014 at 1:11 PM, Ahmet Arslan iori...@yahoo.com wrote:

> My understanding is that there is no multi-threading support in DIH, and
> that for some reason it won't get it back. Am I correct?

The threads parameter worked in 3.6 or so, but it was removed from 4.x because it caused a lot of instability.

> Regarding Apache Flume: how can it be a DIH replacement? Can I index rich
> documents on my disk using Flume? Can I fetch documents from wikipedia,
> jira, twitter, dropbox, rdbms, rss, or the file system by using it?

I don't know Flume, and I'm not even ready to propose a DIH replacement candidate. I'd personally consider an old-school ETL, 'cause I'm mostly interested in joining RDBMS tables.

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Facet cache issue when deleting documents from the index
Hi Marius,

Facets are computed from indexed terms. Can you commit with the expungeDeletes=true flag?

Ahmet

On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea mariusdumitru.flo...@xwiki.com wrote:
> [...]
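If the client API doesn't expose that flag directly, it can be sent on an explicit commit request from SolrJ — a sketch, assuming extra params set this way are passed through to the update handler:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.UpdateRequest;

    public class ExpungeCommit {
        public static void commitExpunge(SolrServer server) throws Exception {
            UpdateRequest commit = new UpdateRequest();
            // commit=true, waitFlush=true, waitSearcher=true
            commit.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            // Assumption: forwarded as a request param alongside the commit.
            commit.setParam("expungeDeletes", "true");
            commit.process(server);
        }
    }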
Re: Facet cache issue when deleting documents from the index
Hi,

Also, I noticed that in your code snippet you have server.delete("foo"), which does not exist. deleteById and deleteByQuery are the methods defined on the SolrServer implementation.

On Monday, February 17, 2014 1:42 PM, Ahmet Arslan iori...@yahoo.com wrote:
> [...]
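For reference, the stock SolrJ calls would be:

    import org.apache.solr.client.solrj.SolrServer;

    public class DeleteExample {
        static void deleteDoc(SolrServer server) throws Exception {
            server.deleteById("foo");          // or: server.deleteByQuery("id:foo");
            server.commit();
        }
    }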
Solr cloud hangs
Hi,

I have a quite annoying problem with Solr Cloud. I have a cluster with 8 shards, each with 2 replicas (Solr 4.6.1). After some time the cluster doesn't respond to any update requests, and restarting the cluster nodes doesn't help. There are a lot of stack traces like this one (waiting for a very long time):

    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
    org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
    org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
    org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    java.lang.Thread.run(Thread.java:722)

Do you have any idea where I can look?

--
Pawel
Re: Facet cache issue when deleting documents from the index
On Mon, Feb 17, 2014 at 2:00 PM, Ahmet Arslan iori...@yahoo.com wrote:

> Hi,
>
> Also, I noticed that in your code snippet you have server.delete("foo"),
> which does not exist. deleteById and deleteByQuery are the methods
> defined on the SolrServer implementation.

Yes, sorry, I have a wrapper over the SolrInstance that doesn't do much. In the case of delete it just forwards the call to deleteById. I'll check the expungeDeletes=true flag and post back the results.

Thanks,
Marius
Best way to copy data from SolrCloud to standalone Solr?
Hi all,

I have a production SolrCloud server which has multiple sharded indexes, and I need to copy all of the indexes to a (non-cloud) Solr server within our QA environment. Can I ask for advice on the best way to do this, please?

I've searched the web and found solr2solr (https://github.com/dbashford/solr2solr), but the author states that this is best for small indexes, and ours are rather large at ~20GB each. I've also looked at replication, but can't find a definitive reference on how this should be done between SolrCloud and Solr.

Any guidance is very much appreciated.

Best wishes,

Daniel

--
Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk
daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk
Re: Solr cloud hangs
Can you share the full stack trace dump?

- Mark

http://about.me/markrmiller

On Feb 17, 2014, at 7:07 AM, Pawel Rog pawelro...@gmail.com wrote:
> [...]
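For reference, a full dump of all threads can be captured with the JDK's jstack tool against the running Solr JVM (the pid placeholder is, of course, yours to fill in):

    jstack -l <solr-jvm-pid> > solr-threads.txt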
Re: Solrcloud: no registered leader found and new searcher error
I think commits are not really the issue here. It _looks_ like at least one node in your document collection is failing to start; in fact, your shard 2. On the Solr admin screen, the cloud section on the left should show you the states of all your nodes; make sure they're all green. My guess is that if you look at the Solr logs on the nodes that aren't coming up, you'll have a better idea of what's happening. You need to get all the nodes running first, before worrying about messages like the ones you're showing.

Best,
Erick

On Mon, Feb 17, 2014 at 1:28 AM, sweety sweetyshind...@yahoo.com wrote:
> [...]
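Besides the admin UI, the cluster state can also be dumped straight from ZooKeeper with the zkcli script that ships in Solr's cloud-scripts directory (host and port are illustrative):

    ./zkcli.sh -zkhost localhost:2181 -cmd get /clusterstate.json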
Re: block join and atomic updates
Hello,

It sounds like you need to switch to query-time join.

On 15.02.2014 at 21:57, m...@preselect-media.com wrote:

> Any suggestions?
>
> Quoting m...@preselect-media.com:
>
>> Yonik Seeley yo...@heliosearch.com:
>>> On Thu, Feb 13, 2014 at 8:25 AM, m...@preselect-media.com wrote:
>>>> Is there any workaround to perform atomic updates on blocks, or do I
>>>> have to re-index the parent document and all its children again every
>>>> time I want to update a field?
>>>
>>> The latter, unfortunately.
>>
>> Is there any plan to change this behavior in the near future?
>>
>> So, I'm thinking of alternatives without losing the benefit of block
>> join. Let me try to explain an idea I just thought about: Let's say I
>> have a parent document A with a number of fields I want to update
>> regularly, and a number of child documents AC_1 ... AC_n which are only
>> indexed once and aren't going to change anymore. So, if I index A and
>> AC_* in a block and I update A, the block is gone. But if I create an
>> additional document AF which only contains something like a foreign key
>> to A, and index AF + AC_* as a block (not A + AC_* anymore), could I
>> perform a {!parent ...} query on AF + AC_* and make a join from the
>> results to get A? Does this make any sense, and is it even possible? ;-)
>> And if it's possible, how can I do it?
>>
>> Thanks,
>> - Moritz
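A rough sketch of how Moritz's AF idea could be combined with the query-time join Mikhail suggests — the field names a_ref, type, and child_field are hypothetical: the inner query selects AF parents from the static AF + AC_* blocks, and the outer join maps AF's foreign key onto A's id:

    q={!join from=a_ref to=id v=$qq}
    qq={!parent which='type:AF'}child_field:some_value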
Re: Solrcloud: no registered leader found and new searcher error
How do I get them running?
Re: Solr cloud hangs
Hi,

Here is the whole stack trace: https://gist.github.com/anonymous/9056783

--
Pawel

On Mon, Feb 17, 2014 at 4:53 PM, Mark Miller markrmil...@gmail.com wrote:
> [...]
Re: Best way to copy data from SolrCloud to standalone Solr?
On 2/17/2014 8:32 AM, Daniel Bryant wrote:
> [...]

If the master index isn't changing at the time of the copy, and you're on a non-Windows platform, you should be able to copy the index directory directly. On a Windows platform, whether you can copy the index while Solr is using it would depend on how Solr/Lucene opens the files. A typical Windows file open will prevent anything else from opening them, and I do not know whether Lucene is smarter than that.

SolrCloud requires the replication handler to be enabled in all configs, but during normal operation it does not actually use replication. This is a confusing thing for some users. I *think* you can configure the replication handler on slave cores with a non-cloud config so that they point at the master cores, and it should replicate the main Lucene index, but not the config files. I have no idea whether things will work right if you configure other master options like replicateAfter and config files, and I also don't know if those options might cause problems for SolrCloud itself. Those options shouldn't be necessary for just getting the data into a dev environment, though.

Thanks,
Shawn
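For reference, the slave-side handler Shawn describes might look like this, one slave core per master shard core (the master URL and poll interval are illustrative):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://prod-host:8983/solr/collection1_shard1_replica1</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>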
Re: Solr cloud hangs
There are also many errors in the Solr log like this one:

org.apache.solr.update.StreamingSolrServers$1; error org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:232)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

-- Pawel On Mon, Feb 17, 2014 at 8:01 PM, Pawel Rog pawelro...@gmail.com wrote: Hi, Here is the whole stack trace: https://gist.github.com/anonymous/9056783 -- Pawel
Re: Solrcloud: no registered leader found and new searcher error
Well, first determine whether they are running or not. Then look at the Solr log for that node when you try to start it up, and post the results if you're still puzzled. You've given us no information about what the error (if any) is, so I'm speculating here. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Mon, Feb 17, 2014 at 10:27 AM, sweety sweetyshind...@yahoo.com wrote: How do I get them running?
Re: Solr Autosuggest - Strange issue with leading numbers in query
Hi Erik, Thanks a lot for your reply. I expect it to return zero suggestions, since the suggested keyword doesn't actually start with numbers. Expected results: Searching for ga - returns galaxy. Searching for gal - returns galaxy. Searching for 12321312321312ga - should not return any suggestion, since no such keyword (combination) exists in the index. Thanks
Re: Could not connect or ping a core after import a big data into it...
Sir, after experimenting I found that if there are more than (roughly) 1000 documents in the core, the problem shows up. When I then make a query from the command window, it shows:

Exception in thread "main" org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at ExampleSolrJClient.handler(ExampleSolrJClient.java:107)
at ExampleSolrJClient.main(ExampleSolrJClient.java:53)
Re: SolrJ Socket Leak
Kiran, Shawn, thank you both for the info; you are both absolutely correct. The issue was not that sockets were leaked, but that wait time is a killer. I ended up fixing the problem by changing the http.maxConnections system property, which Apache HttpClient uses internally to set up the PoolingClientConnectionManager. Previously this had no value and was defaulting to 5. That meant that any time there were more than 50 (maxConnections * max per route) concurrent connections to the Solr server, non-reusable connections were opening and closing, and thus sitting in that idle state... too many sockets. The fix was simply tuning the pool by setting http.maxConnections to a higher value representing the number of concurrent users that I expect. Problem fixed, and a modest speed improvement simply from higher socket reuse. Thank you both for the help! Jared
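For reference, a minimal sketch of the fix Jared describes, assuming the client was built with useSystemProperties() as in Kiran's snippet; the value 100 is an assumption, size it to your expected concurrency. The property must be set before the HttpClient is constructed, either on the client JVM's command line:

java -Dhttp.maxConnections=100 ...

or early in application startup:

// Must run before any CloudSolrServer/HttpClient is constructed;
// HttpClientBuilder.useSystemProperties() reads http.maxConnections
// when sizing the connection pool.
System.setProperty("http.maxConnections", "100");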
Re: Could not connect or ping a core after import a big data into it...
I found that in this strange situation I could import, update, or delete data (using DIH or SolrJ), but queries would wait forever. When I deleted all the documents (or just reduced the document count) and restarted the server, the problem disappeared.
Is it possible to load new elevate.xml on the fly?
Hi, I am trying to figure out a way to switch between multiple elevate.xml files on the fly, via query parameters. We have a scenario where we need to elevate documents based on authentication (same core) without creating a new search handler:

* For authenticated customers: elevate documents based on elevate1.xml
* For non-authenticated customers: elevate documents based on elevate2.xml

I am not sure if there is a way to implement this using any other method. Any help in this regard is appreciated.
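For context, the stock QueryElevationComponent binds a single file per component in solrconfig.xml, which is why per-request switching isn't supported out of the box; elevate1.xml below stands in for the poster's file, the rest is the standard example config:

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate1.xml</str>
</searchComponent>

A second file would normally mean a second component wired into a second request handler, which is exactly what the poster wants to avoid.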
Re: Could not connect or ping a core after import a big data into it...
I solved it; my mistake. I was using Solr 4.6.1 jars, but in my solrconfig.xml I had luceneMatchVersion set to 4.5. I just copied it from my last project and didn't check it. A really careless mistake on my part.
Re: Best way to copy data from SolrCloud to standalone Solr?
I do know for certain that the backup command on a cloud core still works. We have a script like this running on a cron to snapshot indexes:

curl -s 'http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp'

(not really using /tmp for this; parameters changed to protect the guilty) The admin handler for replication doesn't seem to be there, but the actual API seems to work normally. Michael Della Bitta, appinions inc.
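For a one-off copy rather than continuous polling, the same API can pull an index on demand; the hostnames and core names here are placeholders:

curl 'http://qa-host:8080/solr/core1/replication?command=fetchindex&masterUrl=http://prod-host:8080/solr/core1/replication'

The fetchindex command tells the receiving core to fetch the index once from the named replication handler, with no slave section needed in its config.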
Boost Query Example
Hi, can someone help me with a boost sort query example?

http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&indent=true&fq=SKU:223-CL10V3^100 OR SKU:223-CL1^90

There is no difference in the result order; let me know if I am missing something. I would also like the exact match for SKU:223-CL10V3^100 to be ordered first. Thanks Ravi
Re: Facet cache issue when deleting documents from the index
I tried to set the expungeDeletes flag but it didn't fix the problem. The SolrServer doesn't expose a way to set this flag, so I had to use:

new UpdateRequest().setAction(UpdateRequest.ACTION.COMMIT, true, true, 1, true).process(solrServer);

Any other hints? Note that I managed to run my test in my real environment at runtime and it passed, so it seems the behaviour depends on the size of the documents that are committed (added to or deleted from the index). Thanks, Marius On Mon, Feb 17, 2014 at 2:32 PM, Marius Dumitru Florea mariusdumitru.flo...@xwiki.com wrote: On Mon, Feb 17, 2014 at 2:00 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Also I noticed that in your code snippet you have server.delete("foo"); which does not exist. deleteById and deleteByQuery methods are defined in the SolrServer implementation. Yes, sorry, I have a wrapper over the SolrInstance that doesn't do much. In the case of delete it just forwards the call to deleteById. I'll check the expungeDeletes=true flag and post back the results. Thanks, Marius On Monday, February 17, 2014 1:42 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Marius, Facets are computed from indexed terms. Can you commit with the expungeDeletes=true flag? Ahmet On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea mariusdumitru.flo...@xwiki.com wrote: Hi guys, I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is not invalidated when documents are deleted from the index. Sadly, for me, I cannot reproduce this issue with an integration test like this:

--8<--
SolrInstance server = getSolrInstance();
SolrInputDocument document = new SolrInputDocument();
document.setField("id", "foo");
document.setField("locale", "en");
server.add(document);
server.commit();

document = new SolrInputDocument();
document.setField("id", "bar");
document.setField("locale", "en");
server.add(document);
server.commit();

SolrQuery query = new SolrQuery("*:*");
query.set("facet", "on");
query.set("facet.field", "locale");
QueryResponse response = server.query(query);
Assert.assertEquals(2, response.getResults().size());
FacetField localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
Count en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(2, en.getCount());

server.delete("foo");
server.commit();

response = server.query(query);
Assert.assertEquals(1, response.getResults().size());
localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(1, en.getCount());
--8<--

Nevertheless, when I do the 'same' in my real environment, the count for the locale facet remains 2 after one of the documents is deleted. The search result count is fine, so that's why I think it's a facet cache issue. Note that the facet count remains 2 even after I restart the server, so the cache is persisted on the file system. Strangely, the facet count is updated correctly if I modify the document instead of deleting it (i.e. removing a keyword from the content so that it isn't matched by the search query any more). So it looks like only delete triggers the issue. Now, an interesting fact is that if, on my real environment, I delete one of the documents and then add a new one, the facet count becomes 3. So the last commit to the index, which inserts a new document, doesn't trigger a re-computation of the facet cache. The previous facet cache is simply incremented, so the error is perpetuated.
At this point I don't even know how to fix the facet cache without deleting the Solr data folder so that the full index is rebuilt. I'm still trying to figure out what the difference is between the integration test and my real environment (I used the same schema and configuration). Do you know what might be wrong? Thanks, Marius
Re: Boost Query Example
Hi, Filter queries don't affect score, so boosting won't have an effect there. If you want those query terms to get boosted, move them into the q parameter. http://wiki.apache.org/solr/CommonQueryParameters#fq Hope that helps! Michael Della Bitta, appinions inc.
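Following Michael's suggestion, the boosted clauses move from fq into q; a sketch, with spaces left unencoded for readability:

http://localhost:8983/solr/ProductCollection/select?q=SKU:223-CL10V3^100 OR SKU:223-CL1^90&wt=json&indent=true

With both clauses in q, the ^100 and ^90 boosts now contribute to the score, so under the default score ordering the more heavily boosted exact match should rank first.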
DIH and Tika
Is there a way to specify the document types that Tika parses? In my DIH I index the content of a SQL database which has a field that points to the SQL record's binary file (which could be Word, PDF, JPG, MOV, etc.). Tika then uses the document URL to index that document's content. However there are a lot of document types that Tika cannot parse. I'd like to limit Tika to just parsing Word and PDF documents so that I don't have to wait for Tika to determine the document type and whether or not it can parse it. I suspect that the number of exceptions being thrown over documents that Tika cannot read is increasing my indexing time significantly. Any guidance is appreciated. -Teague
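One way to approach this (a sketch only; the table, column, and field names are made up for illustration) is to filter in the parent entity's SQL so that TikaEntityProcessor only ever sees Word and PDF files:

<entity name="doc" dataSource="db"
        query="SELECT id, file_url FROM docs
               WHERE file_url LIKE '%.pdf' OR file_url LIKE '%.doc' OR file_url LIKE '%.docx'">
  <entity name="text" processor="TikaEntityProcessor" dataSource="bin"
          url="${doc.file_url}" format="text" onError="skip">
    <field column="text" name="content"/>
  </entity>
</entity>

onError="skip" additionally drops any document whose parse still fails, instead of aborting the import.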
Escape \\n from getting highlighted - highlighter component
Hi, When searching for text like 'talk n text', the highlighter component also adds the <em> tags to special characters like \n. Is there a way to avoid highlighting the special characters? \\r\\n Family Messaging comes back as \\r\\<em>n</em> Family Messaging
Re: Solr Autosuggest - Strange issue with leading numbers in query
Ah, OK, I thought you were indexing things like 123412335ga, but not so. Afraid I'm fresh out of ideas, although I might try using TermsComponent to examine the index and see if, somehow, there _are_ terms with leading numbers in the output. It's also possible that numbers are stripped when building the FST that is used, but I don't know one way or the other. Best, Erick
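Picking up Erick's TermsComponent suggestion, a quick way to check for terms with leading digits; the core and field names are placeholders:

http://localhost:8983/solr/collection1/terms?terms.fl=suggest_field&terms.regex=[0-9].*&terms.limit=20

If this returns no terms, the numbers are being stripped at index time and the suggester's behavior comes from somewhere else, e.g. the FST build.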
Re: Could not connect or ping a core after import a big data into it...
Glad it's resolved, and thanks for letting us know; it removes some uncertainty. Erick
Re: SOLR suggester component - Get suggestion dump
I started using the terms component to view the terms and the counts:

terms?terms.fl=autocomplete_phrase&terms.regex=a.*&terms.limit=1000
Preventing multiple on-deck searchers without causing failed commits
We're using Solr version 4.2.1, in case new functionality has helped with this issue. We have our Solr servers doing automatic soft commits with maxTime=1000. We also have a scheduled job that triggers a hard commit every fifteen minutes. When one of these hard commits happens while a soft commit is already in progress, we get that ubiquitous warning:

PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Recently, we had an occasion to have a second scheduled job also issue a hard commit every now and then. Since our maxWarmingSearchers value was set to the default, 2, we occasionally had a hard commit trigger when two other searchers were already warming up, which led to this:

org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request

as the servers started responding with a 503 HTTP response. It seems like automatic soft commits wait until the hard commits are out of the way before they proceed. Is there a way to do the same for hard commits? Since we're passing waitSearcher=true in the update request that triggers the hard commits, I would expect the request to block until the server had enough headroom to service the commit. I did not expect that we'd start getting 503 responses. Is there a way to pull this off, either via some extra request parameters or via some server-side configuration?
Slow 95th-percentile
Hi all, I'm having trouble getting my Solr setup to deliver consistent performance. Average select latency is great, but the 95th percentile is dismal (10x the average). It's probably something slightly misconfigured. I've seen it have nice, low-variance latencies for a few hours here and there, but can't figure out what's different during those times.

* I'm running 4.1.0 using SolrCloud: 3 replicas of 1 shard on 3 EC2 boxes (8 procs, 30GB RAM, SSDs). Load peaks around 30 selects per second and about 150 updates per second.
* The index has about 11GB of data in 14M docs; the other, 10MB of data in 3K docs. Stays around 30 segments.
* Soft commits after 10 seconds, hard commits after 120 seconds. Though, turning off the update traffic doesn't seem to have any effect on the select latencies.
* I think GC latency is low. Running 3GB heaps with 1G new size. GC time is around 3ms per second.

Here's a typical select query:

fl=*,sortScore:textScore&sort=textScore desc&start=0&q=text:((soccer OR MLS OR "premier league" OR FIFA OR "world cup") OR (sorority OR fraternity OR "greek life" OR dorm OR campus))&wt=json&fq=startTime:[139265640 TO 139271754]&fq={!frange l=2 u=3}timeflag(startTime)&fq={!frange l=139265640 u=139269594 cache=false}timefix(startTime,-2160)&fq=privacy:OPEN&defType=edismax&rows=131

Anyone have any suggestions on where to look next? Or, if you know someone in the bay area who would consult for an hour or two and help me track it down, that'd be great too. Thanks! -Allan
Re: Preventing multiple on-deck searchers without causing failed commits
Remember this mantra: Hard commits are about durability, soft commits are about visibility. You might already know this, but it is the key to figuring out how to handle commits, whether they are user-triggered or done automatically by the server. With Solr 4.x, it's best to *always* configure autoCommit with openSearcher=false. This does a hard commit but does not open a new searcher. The result: Data is flushed to disk and the current transaction log is closed. New documents will not be searchable after this kind of commit. For maxTime and maxDocs, pick values that won't result in huge transaction logs, which increase Solr startup time. http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup For document visibility, you can rely on autoSoftCommit, and you indicated that you already have it configured. Decide how long you can wait for new content that has just been indexed. Do you *really* need new data to be searchable within one second? If so, you're good. If not, increase the maxTime value here. Be sure to make the value at least a little bit longer than the amount of time it takes for a soft commit to finish, including cache warmup time. Thanks, Shawn
Re: Slow 95th-percentile
The first thing to say is that it's fairly normal for the 95th and 99th percentile values to be quite a lot higher than the median and average values. I don't have actual values, so I don't know if it's bad or not. You're good on the most important performance-related resource, which is memory for the OS disk cache. The only thing that stands out as a possible problem from what I know so far is garbage collection. It might be a case of full garbage collections happening too frequently, or it might be a case of garbage collection pauses taking too long. It might even be a combination of both. To fix frequent full collections, increase the heap size. To fix the other problem, use the CMS collector and tune it. Two bits of information will help with recommendations: your java startup options and your solrconfig.xml. You're using an option in your query that I've never seen before; I don't know if frange is slow or not. One last thing that might cause problems is super-frequent commits. I could also be completely wrong! Thanks, Shawn
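A starting point for the CMS tuning Shawn mentions; every value here is an assumption to be adjusted against your own GC logs, not a recommendation:

-Xms3g -Xmx3g -Xmn1g
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
-verbose:gc -Xloggc:gc.log

The last line logs collections so you can confirm whether the 95th-percentile spikes actually line up with GC pauses before tuning further.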
Re: Preventing multiple on-deck searchers without causing failed commits
Increasing the maxTime value doesn't actually solve the problem, though; it just makes it a little less likely. Really, the soft commits aren't the problem here, as far as we can tell. It's that a request that triggers a hard commit simply fails when the server is already at maxWarmingSearchers. I would expect the request to queue up and wait until the server could handle it.
Re: Preventing multiple on-deck searchers without causing failed commits
I think I put too much information in my reply. Apologies. Here's the most important information to deal with first: Don't send hard commits at all. Configure autoCommit in your server config, with the all-important openSearcher parameter set to false. That will take care of all your hard commit needs, but those commits will never open a new searcher, so they cannot cause an overlap with the soft commits that DO open a new searcher. Thanks, Shawn
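In solrconfig.xml terms, the setup Shawn describes looks like this; the maxTime values are illustrative, and both elements live inside updateHandler:

<autoCommit>
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Hard commits then flush and truncate the transaction log without ever warming a searcher, so only the soft commits compete for the maxWarmingSearchers slots.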
Re: Limit amount of search result
hi Samee, Thank you very much for your suggestion. I've got it working now ;) Chun.