RE: inconsistent number of results returned in solr cloud
Hi,

I am using Solr 4.0 (not BETA) and have created a 2-shard, 2-replica configuration. When I query Solr with a filter query, it returns an inconsistent result count. Without the filter query, it returns the same, consistent count every time. I don't understand why. Can anyone help with this?

Best regards,
Hardik Upadhyay
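A minimal sketch for confirming that the count really varies. It issues the same q/fq query several times with rows=0 and collects the distinct numFound values. More than one value means the count is inconsistent. The sketch assumes the JSON response writer (wt=json) and a collection base URL such as http://localhost:8983/solr/collection1. That URL, the fq value in the usage line, and the helper names are hypothetical, not from the thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_query_url(base_url, q, fq=None):
    """Build a select URL that returns only numFound (rows=0)."""
    params = [("q", q), ("rows", "0"), ("wt", "json")]
    if fq is not None:
        params.append(("fq", fq))
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(response_text):
    """Extract response.numFound from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


def distinct_counts(response_texts):
    """Sorted list of the distinct numFound values seen across runs."""
    return sorted({num_found(text) for text in response_texts})


def fetch_counts(url, runs=10):
    """Run the same query `runs` times against a live node (needs network)."""
    bodies = []
    for _ in range(runs):
        with urlopen(url) as resp:
            bodies.append(resp.read().decode("utf-8"))
    return distinct_counts(bodies)
```

For example, `fetch_counts(count_query_url(base, "*:*", fq="type:book"))` returning something like `[5, 7]` rather than a single value would reproduce the symptom.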
Re: inconsistent number of results returned in solr cloud
Check for duplicate IDs. A quick way is to facet on the id field and set facet.mincount to 2.

-Mike
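A minimal sketch of Mike's duplicate-ID check, assuming the unique key field is named `id` and the default JSON facet layout (json.nl=flat, i.e. `[term1, count1, term2, count2, ...]`). The base URL and helper names are hypothetical. facet.limit=-1 returns every ID with a count of at least 2 rather than only the top 100. On a large index, faceting on a unique field can be memory-hungry, so this is best run on a small collection.

```python
import json
from urllib.parse import urlencode


def duplicate_id_facet_url(base_url, id_field="id"):
    """Facet on the unique key field, keeping only values seen twice or more."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),            # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", id_field),
        ("facet.mincount", "2"),  # an id counted twice or more is duplicated
        ("facet.limit", "-1"),    # return all such ids, not just the top 100
        ("wt", "json"),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def duplicate_ids(response_text, id_field="id"):
    """Turn the flat [term, count, term, count, ...] list into a dict."""
    flat = json.loads(response_text)["facet_counts"]["facet_fields"][id_field]
    return dict(zip(flat[0::2], flat[1::2]))
```

An empty dict means no ID occurs more than once across the shards that answered the query.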
Re: inconsistent number of results returned in solr cloud
Just glad it's resolved.

Erick

On Thu, Nov 29, 2012 at 7:46 PM, Buttler, David buttl...@llnl.gov wrote:
> Sorry, yes, I had been using the BETA version.
RE: inconsistent number of results returned in solr cloud
Sorry, yes, I had been using the BETA version. I have deleted all of that, replaced the jars with the released versions (and reduced my core count), and now I get consistent results. I guess I missed that JIRA ticket. Sorry for the false alarm.

Dave

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, November 23, 2012 4:25 AM
To: solr-user@lucene.apache.org
Subject: Re: inconsistent number of results returned in solr cloud

Dave: I should have asked this first. What version of Solr are you using?
Re: inconsistent number of results returned in solr cloud
Dave:

I should have asked this first. What version of Solr are you using? I'm not sure whether it was fixed in BETA or not (it certainly is in the 4.0 GA release). There was a problem with adding a doclist via SolrJ. Here's one related JIRA, although it wasn't the main fix: https://issues.apache.org/jira/browse/SOLR-3001. I suspect that's the known problem Mark mentioned, because what you're seeing _sure_ sounds similar.

Best
Erick

On Mon, Nov 19, 2012 at 12:49 PM, Buttler, David buttl...@llnl.gov wrote:
> Answers inline below
RE: inconsistent number of results returned in solr cloud
Answers inline below.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, November 17, 2012 6:40 AM
To: solr-user@lucene.apache.org
Subject: Re: inconsistent number of results returned in solr cloud

Hmmm, first an aside. If by "commit after every batch of documents" you mean after every call to server.add(doclist), there's no real need to do that unless you're striving for really low latency. The usual recommendation is to use commitWithin when adding and to commit only at the very end of the run. This shouldn't actually be germane to your issue, just an FYI.

DB Good point. The code for committing docs to Solr is fairly old. I will update it since I don't have a latency requirement.

So you're saying that the inconsistency is permanent? By that I mean it keeps coming back inconsistently for minutes/hours/days?

DB Yes, it is permanent. I have collections that have been up for weeks and are still returning inconsistent results, and I haven't been adding any additional documents.

DB Related to this, I seem to have a discrepancy between the number of documents I think I am sending to Solr and the number of documents it is reporting. I tried reducing the number of shards for one of my small collections, so I deleted all references to this collection and reloaded it. I think I have 260 documents submitted (counted from a Hadoop job). Solr returns a count of ~430 (it varies), and the first returned document is not consistent.

I guess if I were trying to test this I'd need to know how you added subsequent collections. In particular, what you did re: ZooKeeper as you added each collection.

DB These are my steps:
DB 1. Create the collection via the HTTP API: http://<host>:<port>/solr/admin/collections?action=CREATE&name=<collection>&numShards=6&collection.configName=<collection>
DB 2. Relaunch one of my JVM processes, bootstrapping the collection:
DB    java -Xmx16g -Dcollection.configName=<collection> -Djetty.port=<port> -DzkHost=<zkhost> -Dsolr.solr.home=<solr home> -DnumShards=6 -Dbootstrap_confdir=conf -jar start.jar
DB 3. Load data.
DB
DB Let me know if something is unclear. I can run through the process again and document it more carefully.
DB
DB Thanks for looking at it,
DB Dave

Best
Erick
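As a sketch of step 1, the Collections API CREATE request can be built with each parameter URL-encoded and joined by `&`. The parameter names (action, name, numShards, collection.configName) are the ones in Dave's step. The host, port, collection name, config name and the helper name below are placeholders.

```python
from urllib.parse import urlencode


def create_collection_url(host, port, name, num_shards, config_name):
    """Build a Collections API CREATE URL (hypothetical helper)."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", str(num_shards)),
        ("collection.configName", config_name),
    ]
    return "http://%s:%d/solr/admin/collections?%s" % (host, port, urlencode(params))
```

The resulting string can be passed to curl or opened with `urllib.request.urlopen`. Building it programmatically avoids separator mistakes of the kind that are easy to make when a URL is typed or pasted by hand.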
Re: inconsistent number of results returned in solr cloud
Hmmm, first an aside. If by "commit after every batch of documents" you mean after every call to server.add(doclist), there's no real need to do that unless you're striving for really low latency. The usual recommendation is to use commitWithin when adding and to commit only at the very end of the run. This shouldn't actually be germane to your issue, just an FYI.

So you're saying that the inconsistency is permanent? By that I mean it keeps coming back inconsistently for minutes/hours/days?

I guess if I were trying to test this I'd need to know how you added subsequent collections. In particular, what you did re: ZooKeeper as you added each collection.

Best
Erick

On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David buttl...@llnl.gov wrote:
> My typical way of adding documents is through SolrJ, where I commit after every batch of documents (where the batch size is configurable).
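A minimal sketch of Erick's suggestion over plain HTTP rather than SolrJ. It sends documents in batches with the commitWithin request parameter and issues no explicit commit per batch. It assumes a Solr 4.x /update handler that accepts a JSON array of documents with Content-Type application/json. The base URL, batch size, 60-second window and helper names are illustrative only. The requests are built here but not sent.

```python
import json
from urllib.request import Request


def batches(docs, size):
    """Split an iterable of documents into lists of at most `size` docs."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def add_request(base_url, batch, commit_within_ms=60000):
    """Build (but do not send) an update request for one batch.

    No explicit commit here: commitWithin asks Solr to make the documents
    visible within the given number of milliseconds.
    """
    url = "%s/update?commitWithin=%d" % (base_url.rstrip("/"), commit_within_ms)
    return Request(url, data=json.dumps(batch).encode("utf-8"),
                   headers={"Content-Type": "application/json"})
```

Usage would be along the lines of `for b in batches(docs, 500): urlopen(add_request(base, b))`, followed by one commit at the very end of the run.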
Re: inconsistent number of results returned in solr cloud
How did you do the final commit? Can you try a lone commit (with openSearcher=true) and see if that affects things? Trying to determine if this is a known issue or not.

- Mark

On Nov 16, 2012, at 1:34 PM, Buttler, David buttl...@llnl.gov wrote:

> Hi all,
>
> I buried an issue in my last post, so let me pop it up.
>
> I have a cluster with 10 collections on it. The first collection I loaded works perfectly. But every subsequent collection returns an inconsistent number of results for each query. The queries can be simply *:*, or more complex facet queries. If I go to individual cores and issue the query, with distrib=false, I get a consistent number of results. I am wondering if there is some delay in returning results from my shards, and the queried node just times out and displays the number of results that it has received so far. If there is such a timeout, it must be very small, as my QTime is around 11 ms.
>
> Dave
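A sketch of the per-core check Dave describes. It queries each core with distrib=false and rows=0, groups the numFound values by shard, and flags shards whose replicas disagree. One plausible reading, not confirmed in this thread, is that a distributed query reads one replica per shard. If so, replicas holding different counts would make the distributed total vary from request to request. The core URLs, shard and core names, and helper names are hypothetical. If every shard's replicas agree, the per-shard counts are summed into the total a distributed *:* query would be expected to return.

```python
from urllib.parse import urlencode


def core_count_url(core_url, q="*:*"):
    """Count query against a single core only (distrib=false)."""
    params = [("q", q), ("rows", "0"), ("distrib", "false"), ("wt", "json")]
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def shard_report(counts):
    """counts: {shard_name: {core_name: numFound}}, each shard with >= 1 core.

    Returns (mismatched, expected_total). `mismatched` holds the shards whose
    replicas report different counts. `expected_total` is the sum of one count
    per shard, or None if any shard is mismatched.
    """
    mismatched = {s: c for s, c in counts.items() if len(set(c.values())) > 1}
    if mismatched:
        return mismatched, None
    return {}, sum(next(iter(c.values())) for c in counts.values())
```

The numFound values would come from fetching `core_count_url(...)` for every core, for example with the `num_found` approach used for the filter-query check.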
RE: inconsistent number of results returned in solr cloud
My typical way of adding documents is through SolrJ, where I commit after every batch of documents (the batch size is configurable). I have now tried committing several times from the command line (curl), with and without openSearcher=true. It does not affect anything.

Dave

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Friday, November 16, 2012 11:04 AM
To: solr-user@lucene.apache.org
Subject: Re: inconsistent number of results returned in solr cloud

How did you do the final commit? Can you try a lone commit (with openSearcher=true) and see if that affects things?
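A minimal sketch of the "lone commit" Mark asks for: a request to /update that carries only commit=true and openSearcher=true, with no documents. It is roughly what `curl "http://<host>:<port>/solr/<collection>/update?commit=true&openSearcher=true"` does. The collection URL and helper name are hypothetical. The sketch also assumes the update handler accepts a parameter-only GET, as the curl form suggests.

```python
from urllib.parse import urlencode
from urllib.request import Request


def commit_request(collection_url, open_searcher=True):
    """Build a stand-alone commit request (no documents in the body)."""
    params = [
        ("commit", "true"),
        ("openSearcher", "true" if open_searcher else "false"),
    ]
    # data is None, so urllib sends this as a GET
    return Request(collection_url.rstrip("/") + "/update?" + urlencode(params))
```

Sending it is then `urllib.request.urlopen(commit_request(url))`. Comparing counts before and after this single commit is the test Mark describes.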