Re: Frequent deletions
Also see this G+ post I wrote up recently showing how the % of deletions changes over time for an "every add also deletes a previous document" stress test: https://plus.google.com/112759599082866346694/posts/MJVueTznYnD

Mike McCandless
http://blog.mikemccandless.com

On Wed, Dec 31, 2014 at 12:21 PM, Erick Erickson erickerick...@gmail.com wrote:
It's usually not necessary to optimize. As more indexing happens you should see background merges that reclaim the space, so I wouldn't worry about it unless you're seeing actual problems that have to be addressed. Here's a great visualization of the process: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html See especially the third video, TieredMergePolicy, which is the default.

If you insist, however, try a commit with expungeDeletes=true, and if that isn't enough you can issue a force merge (aka optimize) command from the URL (or curl etc.) as: http://localhost:8983/solr/techproducts/update?optimize=true

But please don't do this unless it's absolutely necessary. You state that you have frequent deletions, but eventually this should all happen in the background. Optimize is a fairly expensive operation and should be used judiciously.

Best, Erick

On Wed, Dec 31, 2014 at 1:32 AM, ig01 inna.gel...@elbitsystems.com wrote:
Hello,
We perform frequent deletions from our index, which greatly increases the index size. How can we perform an optimization in order to reduce the size? Please advise.
Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/Frequent-deletions-tp4176689.html
Sent from the Solr - User mailing list archive at Nabble.com.
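For reference, a minimal SolrJ sketch of the two options Erick mentions - a commit with expungeDeletes=true and a force merge - might look like this (the core URL and the target segment count are hypothetical examples, not recommendations):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ReclaimDeletesExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URL
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/techproducts");

    // Option 1: commit with expungeDeletes=true - merges away segments that
    // are mostly deleted documents; cheaper than a full optimize.
    UpdateRequest commit = new UpdateRequest();
    commit.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    commit.setParam("expungeDeletes", "true");
    commit.process(server);

    // Option 2: force merge (aka optimize) down to a single segment.
    // Expensive - only do this if it is really necessary.
    server.optimize(true, true, 1);

    server.shutdown();
  }
}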
Re: Create core problem in tomcat
You may have a field type in your schema that uses a stopwords.txt file, like this:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

If so, you must have the files stopwords_ar.txt and stopwords_en.txt in INSTANCE_DIR/conf/lang/ and stopwords.txt in INSTANCE_DIR/conf/.

Sincerely,
Mahmoud

On Thu, Jan 1, 2015 at 9:18 AM, Noora noora.sa...@gmail.com wrote:
Hi,
I'm using Apache Solr 4.7.2 and Apache Tomcat. I can't create a core with a query in my Solr, while I can do it with Jetty using the same config. The first problem was solved by passing the system property -Dsolr.allow.unsafe.resourceloading=true to the JVM, which I did in my catalina.sh. Now my error is:
Unable to create core: uut8
Caused by: Can't find resource 'stopwords.txt' in classpath or conf
My query is:
http://10.1.221.210:8983/solr/admin/cores?action=CREATE&name=my_core&instanceDir=my_core&dataDir=data&configSet=myConfig
Can anyone help me?
Re: SpellCheck (AutoComplete) Not Working In Distributed Environment
Shawn, when running SolrCloud do you even have to include the shards parameter? Shouldn't the shards.qt parameter alone suffice?

On Dec 30, 2014 7:17 PM, Shawn Heisey apa...@elyograg.org wrote:
On 12/30/2014 5:03 PM, Charles Sanders wrote:
Thanks for the suggestion. I did not do that originally because the documentation states "This parameter is not required for the /select request handler," which is what I am using. But I gave it a go, even though I'm not certain of the shard names. Now I have an NPE.
solr/collection1/select?q=kernel+pro&rows=1&wt=json&indent=true&shards.qt=/ac&shards=shard1,shard2

If this is not SolrCloud, then the shards parameter must include most of the full base URL for each shard that you will be querying. You can only use a bare shard name if you're running SolrCloud. The shards.qt parameter that you have used means that when the shards are consulted, the /ac handler will be used rather than /select.

Here's an example of a shards parameter that will combine results from three cores on two machines. When not running SolrCloud, this is how you do distributed searching:
shards=idxa2.example.com:8981/solr/ai-inclive,idxa1.example.com:8981/solr/ai-0live,idxa2.example.com:8981/solr/ai-1live

SolrCloud hides almost all of this complexity.

Thanks,
Shawn
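For what it's worth, the non-SolrCloud distributed request Shawn describes looks roughly like this in SolrJ (hosts and core names copied from his example; the query string is a made-up autocomplete prefix):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSuggestExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://idxa1.example.com:8981/solr/ai-0live");

    SolrQuery q = new SolrQuery("kernel pro");
    q.setRows(1);
    // Non-SolrCloud distributed search: list host:port/core for every shard,
    // without the http:// prefix.
    q.set("shards",
        "idxa2.example.com:8981/solr/ai-inclive,"
      + "idxa1.example.com:8981/solr/ai-0live,"
      + "idxa2.example.com:8981/solr/ai-1live");
    // Handler the shards should use for the sub-requests.
    q.set("shards.qt", "/ac");

    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound());

    server.shutdown();
  }
}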
Re: Mixing 4.x SolrJ and Solr.war - compatible?
So it seems I can't upgrade Solr beyond 4.7 as long as I'm running SolrJ on a Java 6 JVM. With any luck I might be able to compile a SolrJ newer than 4.7 with Java 6. I'll check that next. Thanks Shawn, that's very helpful!

On Wed, Dec 31, 2014 at 6:54 PM, Shawn Heisey apa...@elyograg.org wrote:
On 12/31/2014 6:23 AM, Gili Nachum wrote:
Can I use SolrJ v4.7 with the latest 4.x Solr.war? Should I switch the writer from javabin back to XML to ensure compatibility? http://wiki.apache.org/solr/Solrj#SolrJ.2FSolr_cross-version_compatibility I'm using CloudSolrServer. My client is running on Java 6 so I can't go beyond 4.7.

If you're running SolrCloud, I would not try it. SolrCloud is evolving very quickly and so many things have changed that running mismatched versions is likely to break. I have tried CloudSolrServer from 4.6.0 against a SolrCloud running 4.2.1, and a simple query would not work because of changes in the data contained in zookeeper.

If you're not using CloudSolrServer, it should work very well even with a wide version discrepancy. Switching to XML is not required. The javabin version has only changed once, in version 3.1.0. It has been stable since that time.

Thanks,
Shawn
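As Shawn says below, switching to XML is not required; just for completeness, this is roughly what forcing plain-XML transport would look like with a non-cloud SolrJ client (the core URL is hypothetical):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.request.RequestWriter;

public class XmlTransportExample {
  public static void main(String[] args) {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    server.setRequestWriter(new RequestWriter());  // use the XML request writer for updates
    server.setParser(new XMLResponseParser());     // XML responses instead of the javabin default
    server.shutdown();
  }
}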
Re: Mixing 4.x SolrJ and Solr.war - compatible?
On 1/1/2015 6:34 AM, Gili Nachum wrote:
So it seems I can't upgrade Solr beyond 4.7 as long as I'm running SolrJ on a Java 6 JVM. With any luck I might be able to compile a SolrJ newer than 4.7 with Java 6. I'll check that next. Thanks Shawn, that's very helpful!

Solr 4.8 and later (compared to 4.7) have a very large number of code changes that will not compile under Java 6. It would be a *major* undertaking to change those back. When the project declared that Java 7 was the minimum version, we weren't just making a statement ... it really did become the minimum requirement.

Oracle has stopped providing support for Java 6. Java 7 will also reach end of support in April 2015, so you might actually want to consider moving to Java 8, which has even more performance improvements over Java 7. Along with the latest Java 7 or Java 8, you should probably change your garbage collection tuning to G1. With a lot of invaluable help from Oracle employees, I've worked out a good set of parameters: https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Thanks,
Shawn
Re: Queries not supported by Lucene Query Parser syntax
Yes, you are always limited by the query parser syntax, but of course you can always write your own query parser as well. There is an open issue for an XML-based query parser that would give you greater control, but it's not committed yet: https://issues.apache.org/jira/browse/SOLR-839

-- Jack Krupansky

On Thu, Jan 1, 2015 at 4:08 AM, Leonid Bolshinsky leonid...@gmail.com wrote:
Hello,
Are we always limited by the query parser syntax when passing a query string to Solr? What about the query elements which are not supported by the syntax? For example, BooleanQuery.setMinimumNumberShouldMatch(n) is translated by BooleanQuery.toString() into ~n, but this is not valid query syntax. So how can we express this via query syntax in Solr?

And a more general question: given a Lucene Query object which was built programmatically by legacy code (which is using Lucene, not Solr), is there any way to translate it into a Solr query (which must be a string)? As Query.toString() doesn't have to be valid Lucene query syntax, does it mean that the Solr query string must be manually translated from the Lucene query object? Is there any utility that performs this job? And, again, what about queries not supported by the query syntax, like CustomScoreQuery, PayloadTermQuery etc.? Are we always limited in Solr by the query parser syntax?

Thanks,
Leonid
Re: Queries not supported by Lucene Query Parser syntax
Hi Leonid,
Have you had a look at the edismax query parser [1]? Isn't that of any use for your requirement? I am not sure whether it is exactly what you are looking for, but the question seemed related to it.
[1] http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax

On Thu, Jan 1, 2015 at 2:38 PM, Leonid Bolshinsky leonid...@gmail.com wrote:
Hello,
Are we always limited by the query parser syntax when passing a query string to Solr? What about the query elements which are not supported by the syntax? For example, BooleanQuery.setMinimumNumberShouldMatch(n) is translated by BooleanQuery.toString() into ~n, but this is not valid query syntax. So how can we express this via query syntax in Solr?

And a more general question: given a Lucene Query object which was built programmatically by legacy code (which is using Lucene, not Solr), is there any way to translate it into a Solr query (which must be a string)? As Query.toString() doesn't have to be valid Lucene query syntax, does it mean that the Solr query string must be manually translated from the Lucene query object? Is there any utility that performs this job? And, again, what about queries not supported by the query syntax, like CustomScoreQuery, PayloadTermQuery etc.? Are we always limited in Solr by the query parser syntax?

Thanks,
Leonid
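For the specific minimum-should-match example from the question, the dismax/edismax parsers expose that as the mm request parameter; a small sketch (the query terms are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class MinShouldMatchExample {
  public static void main(String[] args) {
    // Roughly the equivalent of BooleanQuery.setMinimumNumberShouldMatch(2)
    // for a query with three optional (SHOULD) clauses.
    SolrQuery q = new SolrQuery("foo bar baz");
    q.set("defType", "edismax");
    q.set("q.op", "OR");   // keep all clauses optional
    q.set("mm", "2");      // at least two of them must match
    System.out.println(q); // prints the request parameters
  }
}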
Queries not supported by Lucene Query Parser syntax
Hello,
Are we always limited by the query parser syntax when passing a query string to Solr? What about the query elements which are not supported by the syntax? For example, BooleanQuery.setMinimumNumberShouldMatch(n) is translated by BooleanQuery.toString() into ~n, but this is not valid query syntax. So how can we express this via query syntax in Solr?

And a more general question: given a Lucene Query object which was built programmatically by legacy code (which is using Lucene, not Solr), is there any way to translate it into a Solr query (which must be a string)? As Query.toString() doesn't have to be valid Lucene query syntax, does it mean that the Solr query string must be manually translated from the Lucene query object? Is there any utility that performs this job? And, again, what about queries not supported by the query syntax, like CustomScoreQuery, PayloadTermQuery etc.? Are we always limited in Solr by the query parser syntax?

Thanks,
Leonid
ignoring bad documents during index
Suppose I need to index a bulk of several documents (D1 D2 D3 D4) - 4 documents in one request. If, e.g., D3 is incorrect, an exception will be thrown and an HTTP response with 400 Bad Request will be returned. Documents D1 and D2 will be indexed, but D4 will not, and no indication of this will be returned.

1. Is it possible to ignore such an error and continue to index D4?
2. What would be the best way to add information about failed documents? I thought about an update processor, with try/catch in the add command, and in case of an exception adding the doc ID to the response. Or would it be better to implement a component or response writer to add the info?

--
View this message in context: http://lucene.472066.n3.nabble.com/ignoring-bad-documents-during-index-tp4176947.html
Sent from the Solr - User mailing list archive at Nabble.com.
Garbage Collection tuning - G1 is now a good option
I've been working with Oracle employees to find better GC tuning options. The results are good enough to share with the community: https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning With the latest Java 7 or Java 8 version, and a couple of tuning options, G1GC has grown up enough to be a viable choice. Two of the settings on that list were critical for making the performance acceptable with my testing: ParallelRefProcEnabled and G1HeapRegionSize. I've included some notes on the wiki about how you can size the G1 heap regions appropriately for your own index. Thanks, Shawn
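As a rough illustration only - the heap and region sizes below are hypothetical placeholders, the wiki page has the actual tested values - a G1-tuned Solr 4.x start command would look something like:

java -Xms4g -Xmx4g \
     -XX:+UseG1GC \
     -XX:+ParallelRefProcEnabled \
     -XX:G1HeapRegionSize=8m \
     -jar start.jar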
Re: Queries not supported by Lucene Query Parser syntax
Hi Leonid,
Here is another un-committed parser: https://issues.apache.org/jira/browse/LUCENE-5205
Ahmet

On Thursday, January 1, 2015 5:59 PM, Roman Chyla roman.ch...@gmail.com wrote:
Hi Leonid,
I didn't look into the Solr qparser for a long time, but I think you should be able to combine different query parsers in one query. Look at the SolrQueryParser code; maybe now you can specify a custom query parser for every clause (?), something like: foo AND {!lucene}bar. I don't know, but it's worth exploring.

There is another implementation of a query language which I know allows combining different query parsers in one (because I wrote it); there the query goes this way:

edismax(dog cat AND lucene((foo AND bar)~3))

meaning: use edismax to build the main query, but let the lucene query parser build the third clause - the nested '(foo AND bar)~3' (parsers are expressed as function operators, so you can use any query parser that exists in Solr). It is here, https://issues.apache.org/jira/browse/LUCENE-5014, but that was not reviewed/integrated either.

So no, you are not always limited by the query parser - you can combine them (in a more or less limited fashion). But yes, the query parsers limit the expressiveness of your query language, but not what can be searched (they will all produce a Query object).

Best,
roman

On Thu, Jan 1, 2015 at 10:15 AM, Jack Krupansky jack.krupan...@gmail.com wrote:
Yes, you are always limited by the query parser syntax, but of course you can always write your own query parser as well. There is an open issue for an XML-based query parser that would give you greater control, but it's not committed yet: https://issues.apache.org/jira/browse/SOLR-839

-- Jack Krupansky

On Thu, Jan 1, 2015 at 4:08 AM, Leonid Bolshinsky leonid...@gmail.com wrote:
Hello,
Are we always limited by the query parser syntax when passing a query string to Solr? What about the query elements which are not supported by the syntax? For example, BooleanQuery.setMinimumNumberShouldMatch(n) is translated by BooleanQuery.toString() into ~n, but this is not valid query syntax. So how can we express this via query syntax in Solr?

And a more general question: given a Lucene Query object which was built programmatically by legacy code (which is using Lucene, not Solr), is there any way to translate it into a Solr query (which must be a string)? As Query.toString() doesn't have to be valid Lucene query syntax, does it mean that the Solr query string must be manually translated from the Lucene query object? Is there any utility that performs this job? And, again, what about queries not supported by the query syntax, like CustomScoreQuery, PayloadTermQuery etc.? Are we always limited in Solr by the query parser syntax?

Thanks,
Leonid
Re: Frequent deletions
Is there a specific list of which data structures are sparse and non-sparse for Lucene and Solr (referencing the G+ post)? I imagine this is obvious to low-level hackers, but it could actually be nice to summarize it somewhere for troubleshooting.

Regards,
Alex.
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 1 January 2015 at 05:22, Michael McCandless luc...@mikemccandless.com wrote:
Also see this G+ post I wrote up recently showing how the % of deletions changes over time for an "every add also deletes a previous document" stress test: https://plus.google.com/112759599082866346694/posts/MJVueTznYnD

Mike McCandless
http://blog.mikemccandless.com

On Wed, Dec 31, 2014 at 12:21 PM, Erick Erickson erickerick...@gmail.com wrote:
It's usually not necessary to optimize. As more indexing happens you should see background merges that reclaim the space, so I wouldn't worry about it unless you're seeing actual problems that have to be addressed. Here's a great visualization of the process: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html See especially the third video, TieredMergePolicy, which is the default.

If you insist, however, try a commit with expungeDeletes=true, and if that isn't enough you can issue a force merge (aka optimize) command from the URL (or curl etc.) as: http://localhost:8983/solr/techproducts/update?optimize=true

But please don't do this unless it's absolutely necessary. You state that you have frequent deletions, but eventually this should all happen in the background. Optimize is a fairly expensive operation and should be used judiciously.

Best, Erick

On Wed, Dec 31, 2014 at 1:32 AM, ig01 inna.gel...@elbitsystems.com wrote:
Hello,
We perform frequent deletions from our index, which greatly increases the index size. How can we perform an optimization in order to reduce the size? Please advise.
Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/Frequent-deletions-tp4176689.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Queries not supported by Lucene Query Parser syntax
Hi Leonid,
I didn't look into the Solr qparser for a long time, but I think you should be able to combine different query parsers in one query. Look at the SolrQueryParser code; maybe now you can specify a custom query parser for every clause (?), something like: foo AND {!lucene}bar. I don't know, but it's worth exploring.

There is another implementation of a query language which I know allows combining different query parsers in one (because I wrote it); there the query goes this way:

edismax(dog cat AND lucene((foo AND bar)~3))

meaning: use edismax to build the main query, but let the lucene query parser build the third clause - the nested '(foo AND bar)~3' (parsers are expressed as function operators, so you can use any query parser that exists in Solr). It is here, https://issues.apache.org/jira/browse/LUCENE-5014, but that was not reviewed/integrated either.

So no, you are not always limited by the query parser - you can combine them (in a more or less limited fashion). But yes, the query parsers limit the expressiveness of your query language, but not what can be searched (they will all produce a Query object).

Best,
roman

On Thu, Jan 1, 2015 at 10:15 AM, Jack Krupansky jack.krupan...@gmail.com wrote:
Yes, you are always limited by the query parser syntax, but of course you can always write your own query parser as well. There is an open issue for an XML-based query parser that would give you greater control, but it's not committed yet: https://issues.apache.org/jira/browse/SOLR-839

-- Jack Krupansky

On Thu, Jan 1, 2015 at 4:08 AM, Leonid Bolshinsky leonid...@gmail.com wrote:
Hello,
Are we always limited by the query parser syntax when passing a query string to Solr? What about the query elements which are not supported by the syntax? For example, BooleanQuery.setMinimumNumberShouldMatch(n) is translated by BooleanQuery.toString() into ~n, but this is not valid query syntax. So how can we express this via query syntax in Solr?

And a more general question: given a Lucene Query object which was built programmatically by legacy code (which is using Lucene, not Solr), is there any way to translate it into a Solr query (which must be a string)? As Query.toString() doesn't have to be valid Lucene query syntax, does it mean that the Solr query string must be manually translated from the Lucene query object? Is there any utility that performs this job? And, again, what about queries not supported by the query syntax, like CustomScoreQuery, PayloadTermQuery etc.? Are we always limited in Solr by the query parser syntax?

Thanks,
Leonid
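For completeness: without the LUCENE-5014 patch, stock Solr already allows a limited form of parser mixing through the _query_ nested-query hook of the lucene parser. A rough sketch with made-up field names:

import org.apache.solr.client.solrj.SolrQuery;

public class NestedParserExample {
  public static void main(String[] args) {
    // The outer query uses the default lucene parser; the nested clause is
    // handed to edismax via local-params syntax inside _query_:"...".
    SolrQuery q = new SolrQuery(
        "title:dog AND _query_:\"{!edismax qf='title body'}cat lucene\"");
    System.out.println(q);
  }
}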
Re: Queries not supported by Lucene Query Parser syntax
Hello Leonid,
Yep, this problem exists and it makes migration from Lucene to Solr hard. You might be interested in Parboiled: http://www.youtube.com/watch?v=DXiRYfFGHJE

The simplest way to solve it is to serialize the Lucene Query instance into a parameter or the request body. Unfortunately, Query is not Serializable, but it's possible to do this with non-invasive serializers like XStream. Then a QParserPlugin can read this param or body and deserialize the Lucene Query instance. Have a good hack!

On Thu, Jan 1, 2015 at 12:08 PM, Leonid Bolshinsky leonid...@gmail.com wrote:
Hello,
Are we always limited by the query parser syntax when passing a query string to Solr? What about the query elements which are not supported by the syntax? For example, BooleanQuery.setMinimumNumberShouldMatch(n) is translated by BooleanQuery.toString() into ~n, but this is not valid query syntax. So how can we express this via query syntax in Solr?

And a more general question: given a Lucene Query object which was built programmatically by legacy code (which is using Lucene, not Solr), is there any way to translate it into a Solr query (which must be a string)? As Query.toString() doesn't have to be valid Lucene query syntax, does it mean that the Solr query string must be manually translated from the Lucene query object? Is there any utility that performs this job? And, again, what about queries not supported by the query syntax, like CustomScoreQuery, PayloadTermQuery etc.? Are we always limited in Solr by the query parser syntax?

Thanks,
Leonid

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
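A minimal sketch of the QParserPlugin half of that idea (the class name is hypothetical; it assumes the client serialized the Query with a compatible XStream configuration and that every Query class involved is on Solr's classpath):

import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;
import com.thoughtworks.xstream.XStream;

public class SerializedQueryQParserPlugin extends QParserPlugin {

  @Override
  public void init(NamedList args) {
    // no configuration needed for this sketch
  }

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        try {
          // The "query string" is the XStream XML produced on the client side.
          return (Query) new XStream().fromXML(qstr);
        } catch (Exception e) {
          throw new SyntaxError("Could not deserialize query: " + e);
        }
      }
    };
  }
}

Registered as a queryParser in solrconfig.xml, it could then be selected per request with defType or {!...} local params.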
Re: ignoring bad documents during index
Hello,
Please find answers inline below.

On Thu, Jan 1, 2015 at 11:59 PM, SolrUser1543 osta...@gmail.com wrote:
1. Is it possible to ignore such an error and continue to index D4?

This can be done by catching and swallowing the exception in a custom UpdateRequestProcessor.

2. What would be the best way to add information about failed documents? I thought about an update processor, with try/catch in the add command, and in case of an exception adding the doc ID to the response. Or would it be better to implement a component or response writer to add the info?

It turns out that you can add this info into SolrQueryResponse.getValues() even in a custom UpdateRequestProcessor, and it should be returned in the response.

--
View this message in context: http://lucene.472066.n3.nabble.com/ignoring-bad-documents-during-index-tp4176947.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
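A minimal sketch of that approach - an update processor that swallows per-document failures and reports the failed ids in the response (class name hypothetical; a real implementation would want logging and probably a cap on how many failures to tolerate):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class IgnoreFailuresUpdateProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    final List<String> failedIds = new ArrayList<String>();
    rsp.add("failedIds", failedIds);  // shows up in the update response

    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        try {
          super.processAdd(cmd);                 // D1, D2, D4 keep flowing
        } catch (Exception e) {
          failedIds.add(cmd.getPrintableId());   // remember D3 instead of aborting
        }
      }
    };
  }
}

The factory would be wired into an updateRequestProcessorChain in solrconfig.xml, ahead of RunUpdateProcessorFactory.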
Re: Garbage Collection tuning - G1 is now a good option
But tons of people on this mailing list do not recommend AggressiveOpts. Why do you recommend it?

On Thu, Jan 1, 2015 at 12:10 PM, Shawn Heisey apa...@elyograg.org wrote:
I've been working with Oracle employees to find better GC tuning options. The results are good enough to share with the community: https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning With the latest Java 7 or Java 8 version, and a couple of tuning options, G1GC has grown up enough to be a viable choice. Two of the settings on that list were critical for making the performance acceptable with my testing: ParallelRefProcEnabled and G1HeapRegionSize. I've included some notes on the wiki about how you can size the G1 heap regions appropriately for your own index.

Thanks,
Shawn

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
UseLargePages
Do you think setting aside 2GB for UseLargePages would generally help indexing or not? I can imagine it might help.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Re: poor performance when connecting to CloudSolrServer(zkHosts) using solrJ
My two cents: do check network connectivity. In the past I remember that changing the zookeeper server name to the actual IP improved the speed a bit. DNS sometimes takes time to resolve a hostname. Could be worth trying this option.

Thanks,
Hussain

On Mon, Dec 29, 2014 at 6:31 PM, Shawn Heisey apa...@elyograg.org wrote:
On 12/29/2014 6:52 PM, zhangjia...@dcits.com wrote:
I set up a SolrCloud and wrote a simple SolrJ program to query Solr data as below, but it takes about 40 seconds to new a CloudSolrServer instance; less than 100 milliseconds would be acceptable. What is going on when newing CloudSolrServer, and how do I fix this issue?

String zkHost = "bicenter1.dcc:2181,datanode2.dcc:2181";
String defaultCollection = "hdfsCollection";
long startms = System.currentTimeMillis();
CloudSolrServer server = new CloudSolrServer(zkHost);
server.setDefaultCollection(defaultCollection);
server.setZkConnectTimeout(3000);
server.setZkClientTimeout(6000);
long endms = System.currentTimeMillis();
System.out.println(endms - startms);
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "id:*hbase*");
params.set("sort", "price desc");
params.set("start", 0);
params.set("rows", 10);
try {
    QueryResponse response = server.query(params);
    SolrDocumentList results = response.getResults();
    for (SolrDocument doc : results) {
        String rowkey = doc.getFieldValue("id").toString();
    }
} catch (SolrServerException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
server.shutdown();

The only part of the constructor for CloudSolrServer that I cannot easily look at is the part that creates the httpclient, because ultimately that calls code outside of Solr, in the HttpComponents project. Everything that I *can* see is code that should happen extremely quickly, and the httpclient creation code is something that I have used myself and never had any noticeable delay. The constructor for CloudSolrServer does *NOT* contact zookeeper or Solr, it merely sets up the instance. Nothing is contacted until a request is made. I examined the CloudSolrServer code from branch_5x.

I tried out your code (with SolrJ 4.6.0 against a SolrCloud 4.2.1 cluster). Although the query itself encountered an exception in zookeeper (probably from the version discrepancy between Solr and SolrJ), the elapsed time printed out from the CloudSolrServer initialization was 240 milliseconds on the first run, 60 milliseconds on a second run, and 64 milliseconds on a third run. Those are all MUCH less than the 1000 milliseconds that would represent one second, and incredibly less than the 40,000 milliseconds that would represent 40 seconds.

Side issue: I hope that you have more than two zookeeper servers in your ensemble. A two-node zookeeper ensemble is actually *less* reliable than a single node, because a failure of EITHER of those two nodes will result in a loss of quorum. Three nodes is the minimum required for a redundant zookeeper ensemble.

Thanks,
Shawn
RE: poor performance when connecting to CloudSolrServer(zkHosts) using solrJ
While I'm not a net optimization whiz, a properly configured DNS client will cache recently resolved lookups; this way, even though you are referring to the Fully Qualified Domain Name (FQDN), the local DNS client will return the recently acquired IP address (within the constraints of the domain's configuration). In other words, while there is overhead between the local workstation/computer and the DNS client, it will NOT require access to the configured DNS server upstream.

Enjoy,
Steve

Date: Thu, 1 Jan 2015 14:30:19 -0800
Subject: Re: poor performance when connecting to CloudSolrServer(zkHosts) using solrJ
From: mohd.huss...@gmail.com
To: solr-user@lucene.apache.org

My two cents: do check network connectivity. In the past I remember that changing the zookeeper server name to the actual IP improved the speed a bit. DNS sometimes takes time to resolve a hostname. Could be worth trying this option.

Thanks,
Hussain

On Mon, Dec 29, 2014 at 6:31 PM, Shawn Heisey apa...@elyograg.org wrote:
On 12/29/2014 6:52 PM, zhangjia...@dcits.com wrote:
I set up a SolrCloud and wrote a simple SolrJ program to query Solr data as below, but it takes about 40 seconds to new a CloudSolrServer instance; less than 100 milliseconds would be acceptable. What is going on when newing CloudSolrServer, and how do I fix this issue?

String zkHost = "bicenter1.dcc:2181,datanode2.dcc:2181";
String defaultCollection = "hdfsCollection";
long startms = System.currentTimeMillis();
CloudSolrServer server = new CloudSolrServer(zkHost);
server.setDefaultCollection(defaultCollection);
server.setZkConnectTimeout(3000);
server.setZkClientTimeout(6000);
long endms = System.currentTimeMillis();
System.out.println(endms - startms);
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "id:*hbase*");
params.set("sort", "price desc");
params.set("start", 0);
params.set("rows", 10);
try {
    QueryResponse response = server.query(params);
    SolrDocumentList results = response.getResults();
    for (SolrDocument doc : results) {
        String rowkey = doc.getFieldValue("id").toString();
    }
} catch (SolrServerException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
server.shutdown();

The only part of the constructor for CloudSolrServer that I cannot easily look at is the part that creates the httpclient, because ultimately that calls code outside of Solr, in the HttpComponents project. Everything that I *can* see is code that should happen extremely quickly, and the httpclient creation code is something that I have used myself and never had any noticeable delay. The constructor for CloudSolrServer does *NOT* contact zookeeper or Solr, it merely sets up the instance. Nothing is contacted until a request is made. I examined the CloudSolrServer code from branch_5x.

I tried out your code (with SolrJ 4.6.0 against a SolrCloud 4.2.1 cluster). Although the query itself encountered an exception in zookeeper (probably from the version discrepancy between Solr and SolrJ), the elapsed time printed out from the CloudSolrServer initialization was 240 milliseconds on the first run, 60 milliseconds on a second run, and 64 milliseconds on a third run. Those are all MUCH less than the 1000 milliseconds that would represent one second, and incredibly less than the 40,000 milliseconds that would represent 40 seconds.

Side issue: I hope that you have more than two zookeeper servers in your ensemble. A two-node zookeeper ensemble is actually *less* reliable than a single node, because a failure of EITHER of those two nodes will result in a loss of quorum.
Three nodes is the minimum required for a redundant zookeeper ensemble. Thanks, Shawn