Re: Wildcard search makes no sense!!
Many many thanks for the replies - it was helpful for me to start understanding how this works. I'm using 3.5, which explains a lot. What I have done is: if the query contains a *, I lowercase the query before sending it to Solr. This seems to have solved the issue, given your explanation above. Many thanks. Something that is still not clear in my mind is how this tokenising works. For example, with the filters I have, when I run the analyser I get:
Field: Hello You
Hello|You
Hello|You
Hello|You
hello|you
hello|you
Does this mean that the index is stored as 'hello|you' (the final one), and that when I run a query and it goes through the filters, whatever the end result is must match the 'hello|you' in order to return a result? -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162284.html Sent from the Solr - User mailing list archive at Nabble.com.
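The client-side workaround described above can be sketched in a few lines (a minimal sketch; the class and method names are mine, and whether you also want to cover the '?' wildcard depends on your analysis chain):

```java
import java.util.Locale;

public class WildcardQueryFixer {
    // On Solr 3.5, wildcard terms bypass the analysis chain, so "Hello*"
    // is never lowercased and fails to match the indexed term "hello".
    // Lowercasing the query client-side before sending it to Solr works
    // around this for lowercase-only chains (it cannot emulate stemming).
    public static String normalize(String query) {
        if (query.contains("*") || query.contains("?")) {
            return query.toLowerCase(Locale.ROOT);
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hello*"));   // hello*
        System.out.println(normalize("Hello You")); // unchanged
    }
}
```

Note this lowercases the whole query string, which is fine for simple term queries but would also lowercase field names and operators in a more complex query.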
Master-Slave setup using SolrCloud
Hello, We are trying to move our traditional master-slave Solr configuration to SolrCloud. As our index size is very small (around 1 GB), we have only one shard. So basically, we have the same master-slave configuration, with one leader and 6 replicas. We are experimenting with the maxTime of both autoCommit and autoSoftCommit. Currently, autoCommit maxTime is 15 minutes and autoSoftCommit is 1 minute (let me know if these values do not make sense). Caches are set such that warmup time is at most 20 seconds. We have continuous indexing requests, mostly updating existing documents. A few requests delete/add documents. The problem we are facing is that we are getting very frequent NullPointerExceptions. We get 200-300 such exceptions within a period of 30 seconds, and then for the next few minutes it works fine. Stacktrace of the NullPointerException:
ERROR - 2014-10-02 18:09:38.464; org.apache.solr.common.SolrException; null:java.lang.NullPointerException
at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1257)
at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:720)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)
I am not sure what would be causing it. My guess is that we get these exceptions whenever it tries to replay the tlog. Is anything wrong in my configuration? -Sachin-
RE: Solr Replication during Tomcat shutdown causes shutdown to hang/fail
I was helping to look into this with Nick, and I think we may have figured out the core of the problem... The problem is easily reproducible by starting replication on the slave and then sending a shutdown command to Tomcat (e.g. catalina.sh stop). With a debugger attached, it looks like the fsyncService thread is blocking VM shutdown because it is created as a non-daemon thread. Essentially what seems to be happening is that the fsyncService thread is running when 'catalina.sh stop' is executed. This goes in and calls SnapPuller.destroy(), which aborts the current sync. Around line 517 of SnapPuller, there is code that is supposed to clean up the fsyncService thread, but I don't think it is getting executed, because the thread that called SnapPuller.fetchLatestIndex() is configured as a daemon thread, so the JVM ends up shutting that down before it can clean up the fsyncService... So it seems like:
if (fsyncService != null) ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService);
could be added around line 1706 of SnapPuller.java, or
puller.setDaemon(false);
could be added around line 230 of ReplicationHandler.java. However, this needs some additional work (and I think it might need to be added regardless), since the cleanup code in SnapPuller (around line 517) that shuts down the fsync thread never gets executed when logReplicationTimeAndConfFiles() throws an IOException, bypassing the rest of the finally block... So the call to logReplicationTimeAndConfFiles() around line 512 would need to get wrapped with a try/catch block to catch the IOException... I can submit patches if needed... and cross-post to the dev mailing list... -Phil
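A minimal sketch of the control flow being proposed (the class and methods here are hypothetical stand-ins, not the real SnapPuller code; logReplicationTimeAndConfFiles() is simulated to always throw, which is the failure mode described above):

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FsyncCleanupSketch {
    // Hypothetical stand-in for SnapPuller.logReplicationTimeAndConfFiles();
    // it always throws, simulating an I/O failure during shutdown.
    static void logReplicationTimeAndConfFiles() throws IOException {
        throw new IOException("simulated I/O failure during shutdown");
    }

    public static boolean fetchLatestIndex() {
        ExecutorService fsyncService = Executors.newSingleThreadExecutor();
        try {
            // ... replication work would happen here ...
        } finally {
            // Wrapping the risky call in its own try/catch means an
            // IOException no longer skips the rest of the finally block,
            // so the fsyncService shutdown below always runs.
            try {
                logReplicationTimeAndConfFiles();
            } catch (IOException e) {
                // log and continue with cleanup
            }
            fsyncService.shutdownNow();
        }
        return fsyncService.isShutdown();
    }

    public static void main(String[] args) {
        // Despite the IOException, the executor is shut down.
        System.out.println(fetchLatestIndex());
    }
}
```

Without the inner try/catch, the exception would propagate out of the finally block and the executor would never be shut down, which is exactly the non-daemon-thread hang described.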
Re: Performance improvement in latest version comparing to v1.4
On 10/1/2014 11:23 PM, Danesh Kuruppu wrote: Currently we are using Solr for service metadata indexing and searching. We have an embedded Solr server running in our application, and we are using Solr version 1.4. I have some doubts to clear up. 1. What are the performance improvements we can gain from updating to the latest Solr version (4.10.1)? One of the key areas that's better/faster is indexing, but there are performance improvements for querying too. Solr and Lucene have evolved considerably in the four years since Solr 1.4 (using Lucene 2.9) was released. 2. Currently we are using embedded Solr; I have an idea of moving to a standalone server. What is the best way of using a standalone server in our Java webapp? The embedded server is still available, although even in 1.4 it was not recommended for anything but a proof of concept. You should simply install one or more standalone servers and access them from your app via http. On a LAN, the overhead introduced by http is minimal. Since you're already using the embedded server, that's simply a matter of changing EmbeddedSolrServer to HttpSolrServer or CloudSolrServer (depending on whether or not you use SolrCloud). You can also remove the solr-* and lucene-* jars from your classpath ... you just need the solrj jar and its standard dependencies, which can be found in the binary download or the compiled source code under dist/solrj-lib. Thanks, Shawn
Re: Wildcard search makes no sense!!
On 10/2/2014 4:33 AM, waynemailinglist wrote: Something that is still not clear in my mind is how this tokenising works. For example with the filters I have when I run the analyser I get: Field: Hello You Hello|You Hello|You Hello|You hello|you hello|you Does this mean that the index is stored as 'hello|you' (the final one) and that when I run a query and it goes through the filters whatever the end result of that is must match the 'hello|you' in order to return a result? The index has two terms for this field if this is the whole input -- hello and you -- which can be searched for individually. The tokenizer does the initial job of separating the input into tokens (terms) ... some filters can create additional terms, depending on exactly what's left when the tokenizer is done. Thanks, Shawn
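Shawn's point can be modeled with a toy analysis chain (an illustration only: a real Solr chain is configured in schema.xml, not hand-written like this, but it shows that "Hello You" becomes two separate indexed terms rather than one combined one):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AnalysisSketch {
    // Toy model of an analysis chain: a whitespace tokenizer followed by
    // a lowercase filter. The tokenizer splits the input into tokens;
    // each filter then transforms (or adds to) the token stream.
    public static List<String> analyze(String input) {
        return Arrays.stream(input.trim().split("\\s+")) // tokenizer step
                .map(String::toLowerCase)                // filter step
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "Hello You" ends up as two terms, "hello" and "you",
        // each individually searchable; there is no "hello|you" term.
        System.out.println(analyze("Hello You"));
    }
}
```

The pipe characters in the analysis screen are just the UI's way of showing the tokens side by side; they are not part of what gets indexed.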
Re: Master-Slave setup using SolrCloud
On 10/2/2014 6:58 AM, Sachin Kale wrote: snip Your automatic commit settings are fine. If you had tried to use a very small maxTime like 1000 (1 second), I would tell you that it's probably too short. The tlogs only get replayed when a core is first started or reloaded. These appear to be errors during queries, having nothing at all to do with indexing. 
I can't be sure with the available information (no Solr version, incomplete stacktrace, no info about what request caused and received the error), but if I had to guess, I'd say you probably changed your schema so that certain fields are now required that weren't required before, and didn't reindex, so those fields are not present on every document. Or it might be that you added a uniqueKey and didn't reindex, and that field is not present on every document. http://wiki.apache.org/solr/HowToReindex Thanks, Shawn
Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail
On 10/2/2014 7:25 AM, Phil Black-Knight wrote: I was helping to look into this with Nick I think we may have figured out the core of the problem... The problem is easily reproducible by starting replication on the slave and then sending a shutdown command to tomcat (e.g. catalina.sh stop). With a debugger attached, it looks like the fsyncService thread is blocking VM shutdown because it is created as a non-daemon thread. snip I can submit patches if needed... and cross post to the dev mailing list... File a detailed issue in Jira and attach your patch there. This is our bugtracker. You need an account on the Apache jira instance to do this: https://issues.apache.org/jira/browse/SOLR Thanks, Shawn
Re: Wildcard search makes no sense!!
right, prior to 3.6, the standard way to handle wildcards was to, essentially, pre-analyze the terms that had wildcards. This works fine for simple filters, things like lowercasing for instance, but doesn't work so well for things like stemming. So you're doing what can be done at this point, but moving to 4.x (or even 3.6) would solve it better. Best, Erick On Thu, Oct 2, 2014 at 6:29 AM, Shawn Heisey apa...@elyograg.org wrote: snip
RE: SolrCould read-only replicas
Erick, Thank you for your response. Yup, when I said it is not possible to have a cross-continent data center replica, I meant that we never ever want to do that, because of the latency. What I was hoping is that I could have SolrCloud in my DataCenter A (DC-A) and get all the benefits of sharding (scaling/parallel computing) and failover redundancy within the same data center. If I could then have a read-only replica (with no guaranteed consistency, of course) of this entire cloud in my DataCenter B (DC-B), that would make my reads over DC-B faster without making my writes slow. To clarify, all the writes were going to go against DC-A only. The read-only cluster in DC-B could also be made the master in case the entire DC-A went down. DC-B wouldn't be guaranteed to be in sync with the DC-A master, but in my use case I could live with that. Seems like that is not possible out-of-the-box if I am using Solr 4.0+ in cloud mode. It is either SolrCloud or a cross-data-center read-only replica; can't do both at the same time. I think that is what you confirmed as well. If I have it wrong, please let me know. Also, any thoughts on the easiest way to accomplish a read-only replica of the entire SolrCloud cluster? Thanks! Tikoo From: Sandeep Tikoo Sent: Saturday, September 27, 2014 9:43 PM To: 'solr-user@lucene.apache.org' Subject: SolrCould read-only replicas Hi- I have been reading up on SolrCloud and it seems that it is not possible to have a cross-datacenter read-only slave anymore, but wanted to ask here to be sure. We currently have a pre-Solr 4.0 installation with the master instance in our US mid-west datacenter. The datacenter in Europe has read replicas which pull data using solr.ReplicationHandler. We wanted to upgrade to SolrCloud. As far as I have been able to figure out, with SolrCloud you cannot have a read-only replica anymore. A replica has to be able to become a leader, and writes against all replicas for a shard have to succeed. Because of the strong consistency model across replicas, it seems that replicas cannot be across datacenters anymore. So my question is: how can we have a read-only replica in a remote datacenter in Solr 4.0+, similar to pre-Solr 4.0? Is it not possible anymore without doing it all yourself? cheers, Tikoo
Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail
see the ticket here: https://issues.apache.org/jira/browse/SOLR-6579 including a patch to fix it. On Thu, Oct 2, 2014 at 9:44 AM, Shawn Heisey apa...@elyograg.org wrote: snip
DIH - cacheImpl=SortedMapBackedCache - empty rows from sub entity
Hello, I am fighting with cacheImpl=SortedMapBackedCache. I want to refactor my ugly entities, so I am trying out sub-entities with caching. My problem is that my cached subquery does not return any values from the select. But why? This is my entity:
<entity name="en1" pk="id" transformer="DateFormatTransformer"
        query="SELECT id, product FROM table WHERE product = 'abc'">
  <entity name="en2" pk="id" transformer="DateFormatTransformer"
          cacheImpl="SortedMapBackedCache"
          query="SELECT id, code FROM table2" where="id = '${en1.id}'"/>
</entity>
This is very fast and clear and nice... but it does not work. Nothing from table2 is coming into my index =( BUT if I remove the line with cacheImpl=SortedMapBackedCache, all data is present, but every row is selected one by one. I thought that this construct would hopefully replace my ugly big join query in a single entity!?
Re: Master-Slave setup using SolrCloud
If I look into the logs, many times I get only the following line, without any stacktrace:
ERROR - 2014-10-02 19:35:25.516; org.apache.solr.common.SolrException; java.lang.NullPointerException
These exceptions are not coming continuously; once every 10-15 minutes. But once it starts, there are 800-1000 such exceptions, one after another. Is it related to cache warmup? I can provide the following information regarding the setup: We are now using Solr 4.10.0. Memory allocated to each Solr instance is 7 GB. I guess it is more than sufficient for a 1 GB index, right? Indexes are stored on a normal, local filesystem. I am using three caches:
Query cache: size 4096, autoWarmCount 2048
Filter cache: size 8192, autoWarmCount 4096
Document cache: size 4096
I am experimenting with commit maxTime for both soft and hard commits, after referring to the following: http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Hence, I set the following:
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:90}</maxTime>
</autoSoftCommit>
Also, we are getting the following warning many times:
java.lang.NumberFormatException: For input string: 5193.0
Earlier we were on Solr 4.4.0, and when we upgraded to 4.10.0, we pointed it at the same index we were using for 4.4.0. On Thu, Oct 2, 2014 at 7:11 PM, Shawn Heisey apa...@elyograg.org wrote: snip
RE: DIH - cacheImpl=SortedMapBackedCache - empty rows from sub entity
Try using the cacheKey/cacheLookup parameters instead:
<entity name="en1" pk="id" transformer="DateFormatTransformer"
        query="SELECT id, product FROM table WHERE product = 'abc'">
  <entity name="en2" cacheKey="id" cacheLookup="en1.id"
          transformer="DateFormatTransformer" cacheImpl="SortedMapBackedCache"
          query="SELECT id, code FROM table2"/>
</entity>
James Dyer Ingram Content Group (615) 213-4311 -----Original Message----- From: stockii [mailto:stock.jo...@googlemail.com] Sent: Thursday, October 02, 2014 9:19 AM To: solr-user@lucene.apache.org Subject: DIH - cacheImpl=SortedMapBackedCache - empty rows from sub entity snip
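Roughly, this is the join that SortedMapBackedCache performs with cacheKey/cacheLookup (a toy model using plain collections; the real cache stores full rows and handles types and iteration, but the shape is the same: read the child result set once into a sorted map keyed by cacheKey, then look up each parent row's cacheLookup value instead of re-running the child query):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DihCacheSketch {
    // Build the cache once from the child entity's full result set.
    // Each row is {id, code}; rows sharing an id collect under one key.
    public static Map<String, List<String>> buildCache(List<String[]> childRows) {
        Map<String, List<String>> cache = new TreeMap<>();
        for (String[] row : childRows) {
            cache.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return cache;
    }

    public static void main(String[] args) {
        List<String[]> table2 = List.of(
                new String[]{"1", "A"},
                new String[]{"1", "B"},
                new String[]{"2", "C"});
        Map<String, List<String>> cache = buildCache(table2);
        // A parent row with id=1 pulls both child codes from the cache
        // with a single lookup, no per-row SQL query:
        System.out.println(cache.get("1"));
    }
}
```

This is also why the `where=` variant fails with caching enabled: the child query runs once, before any `${en1.id}` value exists to substitute, so the cache must be keyed explicitly via cacheKey/cacheLookup.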
Upgrade from solr 4.4 to 4.10.1
I need to upgrade from Solr 4.4 to version 4.10.1 and am not sure if I need to reindex. The following from http://wiki.apache.org/solr/Solr4.0 leads me to believe I don't: The guarantee for this alpha release is that the index format will be the 4.0 index format, supported through the 5.x series of Lucene/Solr, unless there is a critical bug (e.g. that would cause index corruption) that would prevent this. I've been looking through the change logs and news and the following from http://lucene.apache.org/solr/solrnews.html makes me think that maybe I do need to reindex: Solr 4.6 Release Highlights: ... New default index format: Lucene46Codec ... It will not be an easy task to reindex the files so I am hoping the answer is that it is not necessary. Thanks for any advice, Grainne
Re: SolrCould read-only replicas
Here is a different approach. Set up independent Solr Cloud clusters in each data center. Send all updates into a persistent message queue (Amazon SQS, whatever) and have each cluster get updates from the queue. The two clusters are both live and configured identically, so there is nothing to change in a failover. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Oct 2, 2014, at 7:07 AM, Sandeep Tikoo sti...@digitalriver.com wrote: snip
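The queue-based design Walter describes can be sketched like this (a toy model: the "clusters" are plain lists and the queue is in-memory; a real setup would use SQS or similar as the durable queue, plus a SolrJ client per cluster, with each data center consuming independently):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class DualClusterSketch {
    // Every update goes into one shared queue; each data center's cluster
    // consumes the same stream of updates, so both stay identically
    // configured and there is nothing to reconfigure on failover.
    public static void drain(Queue<String> updates, List<List<String>> clusters) {
        String doc;
        while ((doc = updates.poll()) != null) {
            for (List<String> cluster : clusters) {
                cluster.add(doc); // "index" the update into each cluster
            }
        }
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>(List.of("doc1", "doc2"));
        List<String> dcA = new ArrayList<>();
        List<String> dcB = new ArrayList<>();
        drain(queue, List.of(dcA, dcB));
        // Both data centers now hold the same documents.
        System.out.println(dcA.equals(dcB));
    }
}
```

The trade-off versus cross-DC replication is that consistency between the clusters is only as tight as queue consumption lag, which matches the "eventual, good enough" requirement stated earlier in the thread.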
Re: Upgrade from solr 4.4 to 4.10.1
You should of course perform a test first to be sure, but you shouldn't need to reindex. Running an optimize on your cores or collections will upgrade them to the new format, or you could use Lucene's IndexUpgrader tool. In the meantime, bringing up your data in 4.10.1 will work, it just won't take advantage of some of the file format improvements. However, it is somewhat of a design smell that you can't reindex. In my experience, it is extremely valuable to be able to reindex your data at will. Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Thu, Oct 2, 2014 at 12:06 PM, Grainne grainne_rei...@harvard.edu wrote: snip
Re: Wildcard search makes no sense!!
Ok I think I understand your points there. Just to clarify: say the term was "Large increased" and my filters went something like:
Large|increased
Large|increase|increased
large|increase|increased
the final tokens indexed would be large|increase|increased? Once again thanks for all the help. On Thu, Oct 2, 2014 at 2:30 PM, Shawn Heisey-2 [via Lucene] ml-node+s472066n4162306...@n3.nabble.com wrote: snip
Re: Solr + Federated Search Question
Ahmet, Jeff, Thanks. Some terms are a bit overloaded. By federated, I do mean the ability to query multiple, disparate repositories. So, no, all of my data would not necessarily be in Solr. Solr would be one of several - databases, filesystems, document stores, etc. - that I would like to plug in. The content in each repository would be of different types (the shape/schema of the content would differ significantly). Thanks, Alejandro On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky j...@basetechnology.com wrote: Alejandro, you'll have to clarify how you are using the term federated search. I mean, technically Ahmet is correct in that Solr queries can be fanned out to shards and the results from each shard aggregated (federated) into a single result list, but... more traditionally, federated refers to disparate databases or search engines. See: http://en.wikipedia.org/wiki/Federated_search So, please tell us a little more about what you are really trying to do. I mean, is all of your data in Solr, in multiple collections, or on multiple Solr servers, or... is only some of your data in Solr and some is in other search engines? Another approach taken with Solr is that indeed all of your source data may be in disparate databases, but you perform an ETL (Extract, Transform, and Load) process to ingest all of that data into Solr and then simply directly search the data within Solr. -- Jack Krupansky -----Original Message----- From: Ahmet Arslan Sent: Wednesday, October 1, 2014 9:35 AM To: solr-user@lucene.apache.org Subject: Re: Solr + Federated Search Question Hi, Federation is possible. Solr has distributed search support with the shards parameter. Ahmet On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana acalbaz...@gmail.com wrote: Hello, I have a general question about Solr in a federated search context. I understand that Solr does not do federated search and that different tools are often used to incorporate Solr indexes into a federated/enterprise search solution. 
Does anyone have recommendations on any products (open source or otherwise) that address this space? Thanks, Alejandro
Re: Solr + Federated Search Question
Alexandre, Thanks. I will have a look. Alejandro On Wed, Oct 1, 2014 at 3:03 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: http://project.carrot2.org/ is worth having a look at. It supports Solr well. In fact, a subset of it is shipped with Solr. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 1 October 2014 09:29, Alejandro Calbazana acalbaz...@gmail.com wrote: snip
Re: Upgrade from solr 4.4 to 4.10.1
Hi Michael, Thanks for the quick response. Running optimize on the index sounds like a good idea. Do you know if that is possible from the command line? I agree it is an omission that we can't easily reindex the files, and that is a story I need to prioritize. Thanks again, Grainne
Export feature issue in Solr 4.10
Hi All, I'm trying to use the Solr 4.10 export feature, but I'm getting an error. Maybe I missed something. Here's the scenario: 1. Download Solr 4.10.0 2. Use the collection1 schema out of the box 3. Add docValues="true" to the price and pages fields in schema.xml 4. Index books.json from the command line: curl http://localhost:8984/solr/collection1/update -H "Content-Type: text/json" --data-binary @example/exampledocs/books.json 5. Try running this query: http://localhost:8984/solr/collection1/export?q=*:*&sort=price%20asc&fl=price 6. Here's the error I get: java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=4 (got docID=4) at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:182) at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:109) at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:700) at org.apache.solr.util.SolrPluginUtils.optimizePreFetchDocs(SolrPluginUtils.java:213) at org.apache.solr.handler.component.QueryComponent.doPrefetch(QueryComponent.java:623) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:507) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) ... Any ideas what could be missing? Thanks, A. Adel
Load existing index to solrCloud?
Hi, I have an index created by Lucene a while back. I just set up SolrCloud, and since I already have the index I don't want to reindex my documents. How do I load the existing index into an empty new SolrCloud collection? I am able to load it into a standalone Solr instance, but I'm not sure how to load it correctly into SolrCloud so that the index gets redistributed. Thanks in advance! -- View this message in context: http://lucene.472066.n3.nabble.com/Load-existing-index-to-solrCloud-tp4162362.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade from solr 4.4 to 4.10.1
Yes, you can just do something like curl "http://mysolrserver:mysolrport/solr/mycollectionname/update?optimize=true". You should expect heavy disk activity while this completes. I wouldn't do more than one collection at a time. Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Thu, Oct 2, 2014 at 12:55 PM, Grainne grainne_rei...@harvard.edu wrote: Hi Michael, Thanks for the quick response. Running optimize on the index sounds like a good idea. Do you know if that is possible from the command line? I agree it is an omission to not be easily able to reindex files and that is a story I need to prioritize. Thanks again, Grainne -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-from-solr-4-4-to-4-10-1-tp4162340p4162359.html Sent from the Solr - User mailing list archive at Nabble.com.
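For scripting that optimize call across several environments, the URL can be assembled programmatically. A minimal sketch (the host, port, and collection names are placeholders, and `waitSearcher` is the standard commit/optimize parameter for blocking until the new searcher is open):

```python
from urllib.parse import urlencode, urlunsplit

def optimize_url(host, port, collection, wait_searcher=True):
    """Build the URL for an explicit optimize call against a Solr core.

    host/port/collection are placeholders -- substitute your own values.
    """
    params = urlencode({"optimize": "true",
                        "waitSearcher": "true" if wait_searcher else "false"})
    return urlunsplit(("http", f"{host}:{port}",
                       f"/solr/{collection}/update", params, ""))

print(optimize_url("mysolrserver", 8983, "mycollectionname"))
# http://mysolrserver:8983/solr/mycollectionname/update?optimize=true&waitSearcher=true
```

Feed the result to curl or an HTTP client; as Michael notes, expect heavy disk I/O and run one collection at a time.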
Re: Solr + Federated Search Question
Hi Alejandro, So your example is better called as metasearch. Here a quotation from a book. Instead of retrieving information from a single information source using one search engine, one can utilize multiple search engines or a single search engine retrieving documents from a plethora of document collections. A scenario where multiple engines are used is known as metasearch, while the scenario where a single engine retrieves from multiple collections is known as federation. In both these scenarios, the final result of the retrieval effort needs to be a single, unified ranking of documents, based on several ranked lists. Ahmet On Thursday, October 2, 2014 7:29 PM, Alejandro Calbazana acalbaz...@gmail.com wrote: Ahmet,Jeff, Thanks. Some terms are a bit overloaded. By federated, I do mean the ability to query multiple, disparate, repositories. So, no. All of my data would not necessarily be in Solr. Solr would be one of several - databases, filesystems, document stores, etc... that I would like to plug-in. The content in each repository would be of different types (the shape/schema of the content would differ significantly). Thanks, Alejandro On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky j...@basetechnology.com wrote: Alejandro, you'll have to clarify how you are using the term federated search. I mean, technically Ahmet is correct in that Solr queries can be fanned out to shards and the results from each shard aggregated (federated) into a single result list, but... more traditionally, federated refers to disparate databases or search engines. See: http://en.wikipedia.org/wiki/Federated_search So, please tell us a little more about what you are really trying to do. I mean, is all of your data in Solr, in multiple collections, or on multiple Solr servers, or... is only some of your data in Solr and some is in other search engines? 
Another approach taken with Solr is that indeed all of your source data may be in disparate databases, but you perform an ETL (Extract, Transform, and Load) process to ingest all of that data into Solr and then simply directly search the data within Solr. -- Jack Krupansky -Original Message- From: Ahmet Arslan Sent: Wednesday, October 1, 2014 9:35 AM To: solr-user@lucene.apache.org Subject: Re: Solr + Federated Search Question Hi, Federation is possible. Solr has distributed search support with shards parameter. Ahmet On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana acalbaz...@gmail.com wrote: Hello, I have a general question about Solr in a federated search context. I understand that Solr does not do federated search and that different tools are often used to incorporate Solr indexes into a federated/enterprise search solution. Does anyone have recommendations on any products (open source or otherwise) that addresses this space? Thanks, Alejandro
Does Solr handle an sshfs mounted index
I am currently running Solr 4.4.0 on RHEL 6. The index used to be mounted via NFS and it all worked perfectly fine. For security reasons we switched the index to be sshfs-mounted, and this seems to cause Solr to fail after a while. If we switch back to NFS it works again. The behavior is strange - Solr starts up and issues an error: ... Oct 02, 2014 11:43:00 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher ... Caused by: java.io.FileNotFoundException: /path/to/collection/data/index/_10_Lucene41_0.tim (Operation not permitted) ... While Solr is running, if, as the same user, I look at the mounted path I get the same behavior: -bash-4.1$ ls /mounted/filesystem/path ls: reading directory /mounted/filesystem/path: Operation not permitted When I shut down Solr it behaves as expected and I get the file listing. The file is there. Several of us, including unix systems people, are looking at why this might be happening and have yet to figure it out. Does anyone know if it is possible to run Solr where the index is mounted via sshfs? Thanks for any advice, Grainne -- View this message in context: http://lucene.472066.n3.nabble.com/Does-Solr-handle-an-sshfs-mounted-index-tp4162375.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr + Federated Search Question
Thanks Ahmet. Yay! New term :) Although it does look like federated and metasearch can be used interchangeably. Alejandro On Thu, Oct 2, 2014 at 2:37 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Alejandro, So your example is better called as metasearch. Here a quotation from a book. Instead of retrieving information from a single information source using one search engine, one can utilize multiple search engines or a single search engine retrieving documents from a plethora of document collections. A scenario where multiple engines are used is known as metasearch, while the scenario where a single engine retrieves from multiple collections is known as federation. In both these scenarios, the final result of the retrieval effort needs to be a single, unified ranking of documents, based on several ranked lists. Ahmet On Thursday, October 2, 2014 7:29 PM, Alejandro Calbazana acalbaz...@gmail.com wrote: Ahmet,Jeff, Thanks. Some terms are a bit overloaded. By federated, I do mean the ability to query multiple, disparate, repositories. So, no. All of my data would not necessarily be in Solr. Solr would be one of several - databases, filesystems, document stores, etc... that I would like to plug-in. The content in each repository would be of different types (the shape/schema of the content would differ significantly). Thanks, Alejandro On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky j...@basetechnology.com wrote: Alejandro, you'll have to clarify how you are using the term federated search. I mean, technically Ahmet is correct in that Solr queries can be fanned out to shards and the results from each shard aggregated (federated) into a single result list, but... more traditionally, federated refers to disparate databases or search engines. See: http://en.wikipedia.org/wiki/Federated_search So, please tell us a little more about what you are really trying to do. I mean, is all of your data in Solr, in multiple collections, or on multiple Solr servers, or... 
is only some of your data in Solr and some is in other search engines? Another approach taken with Solr is that indeed all of your source data may be in disparate databases, but you perform an ETL (Extract, Transform, and Load) process to ingest all of that data into Solr and then simply directly search the data within Solr. -- Jack Krupansky -Original Message- From: Ahmet Arslan Sent: Wednesday, October 1, 2014 9:35 AM To: solr-user@lucene.apache.org Subject: Re: Solr + Federated Search Question Hi, Federation is possible. Solr has distributed search support with shards parameter. Ahmet On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana acalbaz...@gmail.com wrote: Hello, I have a general question about Solr in a federated search context. I understand that Solr does not do federated search and that different tools are often used to incorporate Solr indexes into a federated/enterprise search solution. Does anyone have recommendations on any products (open source or otherwise) that addresses this space? Thanks, Alejandro
RE: Upgrade from solr 4.4 to 4.10.1
Michael Della Bitta [michael.della.bi...@appinions.com] wrote: You should of course perform a test first to be sure, but you shouldn't need to reindex. One gotcha is that support for docValuesFormat="Disk" was removed in Solr 4.9, so it simply can't open an index using that format. Fortunately it can be handled by changing the format in the schema and performing an optimize using the old Solr version. How the performance/memory trade-off of the Disk format falls under "critical bug" and thus is reason enough to break backwards compatibility, I don't know. - Toke Eskildsen
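For reference, the change Toke describes is a schema.xml edit followed by an optimize on the *old* Solr version. A sketch, with a hypothetical field type name, of what the before/after might look like:

```xml
<!-- Before (Solr 4.x; an index written this way is rejected by 4.9+): -->
<fieldType name="string_dv" class="solr.StrField" docValues="true"
           docValuesFormat="Disk"/>

<!-- After: drop the docValuesFormat attribute to fall back to the default
     codec, then run an optimize with the OLD Solr version so all segments
     are rewritten in the default format before upgrading. -->
<fieldType name="string_dv" class="solr.StrField" docValues="true"/>
```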
Re: Does Solr handle an sshfs mounted index
Grainne, I would recommend that you do not do this. In fact, I would recommend you not use NFS as well, although that’s more likely to work, just not ideally. Solr’s going to do best when it’s working with fast, local storage that the OS can cache natively. Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions | g+: plus.google.com/appinions w: appinions.com On Oct 2, 2014, at 14:44, Grainne grainne_rei...@harvard.edu wrote: I am currently running Solr 4.4.0 on RHEL 6. The index used to be mounted via nfs and it all worked perfectly fine. For security reasons we switched the index to be sshfs mounted - and this seems to cause solr to fail after a while. If we switch back to nfs it works again. The behavior is strange - Solr starts up and issues an error: ... Oct 02, 2014 11:43:00 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher ... Caused by: java.io.FileNotFoundException: /path/to/collection/data/index/_10_Lucene41_0.tim (Operation not permitted) ... While Solr is running, if, as the same user, I look at the mounted path I get the same behavior: -bash-4.1$ ls /mounted/filesystem/path ls: reading directory /mounted/filesystem/path: Operation not permitted When I shut down Solr it behaves as expected and I get the file listing. The file is there and Several of us, including unix systems people, are looking at why this might be happening and have yet to figure it out. Does anyone know if it possible to run Solr where the index is mounted via sshfs? Thanks for any advice, Grainne -- View this message in context: http://lucene.472066.n3.nabble.com/Does-Solr-handle-an-sshfs-mounted-index-tp4162375.html Sent from the Solr - User mailing list archive at Nabble.com.
Silent ping request logging
Hi, The ping request log line below generates too much noise in my Solr log: INFO: [main0] webapp=/solr path=/admin/ping params={} status=0 QTime=0 I don't want to change the global logging level to eliminate this. Instead, I wonder if there is a way to change the logging level just for such ping requests in the code. If so, which class should I look into? Your help will be really appreciated. -- Best, Junyang
Re: Export feature issue in Solr 4.10
Yep getting the same error. Investigating... Joel Bernstein Search Engineer at Heliosearch On Thu, Oct 2, 2014 at 12:59 PM, Ahmed Adel ahmed.a...@badrit.com wrote: Hi All, I'm trying to use Solr 4.10 export feature, but I'm getting an error. Maybe I missed something. Here's the scenario: 1. Download Solr 4.10.0 2. Use collection1 schema out of the box 3. Add docValues=true to price and pages fields in schema.xml 4. Index books.json using command line: curl http://localhost:8984/solr/collection1/update -H Content-Type: text/json --data-binary @example/exampledocs/books.json 5. Try running this query: http://localhost:8984/solr/collection1/export?q=*:*sort=price%20ascfl=price 6. Here's the error I get: java.lang.IllegalArgumentException: docID must be = 0 and maxDoc=4 (got docID=4) at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:182) at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:109) at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:700) at org.apache.solr.util.SolrPluginUtils.optimizePreFetchDocs(SolrPluginUtils.java:213) at org.apache.solr.handler.component.QueryComponent.doPrefetch(QueryComponent.java:623) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:507) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) ... Any ideas what could be missing? Thanks, A. Adel
Re: Silent ping request logging
Please allow me to re-phrase my question a bit: I want to eliminate the dummy ping request, not the ones that are generated by concrete queries. On Thu, Oct 2, 2014 at 4:10 PM, Junyang Xin xinj...@gmail.com wrote: Hi, The ping request log as below generates too much noise in my solr log: INFO: [main0] webapp=/solr path=/admin/ping params={} status=0 QTime=0 I don't want to change the global logging level to eliminate this. Instead, I wonder if there is a way to change the logging level just for such ping requests in the code. If so, which class I should look into? Your help will be really appreciated. -- Best, Junyang -- Best, Junyang
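One blunt option for the log4j.properties setup shipped with Solr 4.x, under the assumption (worth verifying in your version's source) that this line is emitted by SolrCore's own logger:

```properties
# Hedged sketch: raising this logger to WARN suppresses the per-request
# INFO lines (including /admin/ping) -- but note it also silences ALL
# other request logging from SolrCore, not only pings.
log4j.logger.org.apache.solr.core.SolrCore=WARN
```

If you only want the dummy pings gone, a finer-grained route is filtering at the log4j appender level, or pointing the health check at a custom handler that extends org.apache.solr.handler.PingRequestHandler.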
Re: Export feature issue in Solr 4.10
There is a bug in how the export handler works when you have very few documents in the index and solrconfig.xml is configured to enable lazy document loading: <enableLazyFieldLoading>true</enableLazyFieldLoading> The tests didn't catch this because lazy loading was set to the default, which is false in the tests. The manual testing I did didn't catch this because I tested with a large number of documents in the index. Your example will work if you change it to: <enableLazyFieldLoading>false</enableLazyFieldLoading> And if you load a typical index with lots of documents you should have no problems running with lazy loading enabled. I'll create a jira to fix this issue. Joel Bernstein Search Engineer at Heliosearch On Thu, Oct 2, 2014 at 4:10 PM, Joel Bernstein joels...@gmail.com wrote: Yep getting the same error. Investigating... Joel Bernstein Search Engineer at Heliosearch On Thu, Oct 2, 2014 at 12:59 PM, Ahmed Adel ahmed.a...@badrit.com wrote: Hi All, I'm trying to use the Solr 4.10 export feature, but I'm getting an error. Maybe I missed something. Here's the scenario: 1. Download Solr 4.10.0 2. Use the collection1 schema out of the box 3. Add docValues="true" to the price and pages fields in schema.xml 4. Index books.json from the command line: curl http://localhost:8984/solr/collection1/update -H "Content-Type: text/json" --data-binary @example/exampledocs/books.json 5. Try running this query: http://localhost:8984/solr/collection1/export?q=*:*&sort=price%20asc&fl=price 6.
Here's the error I get: java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=4 (got docID=4) at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:182) at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:109) at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:700) at org.apache.solr.util.SolrPluginUtils.optimizePreFetchDocs(SolrPluginUtils.java:213) at org.apache.solr.handler.component.QueryComponent.doPrefetch(QueryComponent.java:623) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:507) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) ... Any ideas what could be missing? Thanks, A. Adel
SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Hi All, I am trying to query a 6-node Solr 4.7 cluster with 3 shards and a replication factor of 2. I have fronted these 6 Solr nodes with a load balancer. What I notice is that every time I do a search of the form q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a result only once in every 3 tries, telling me that the load balancer is distributing the requests between the 3 shards and SolrCloud only returns a result if the request goes to the core that has that id. However, if I do a simple search like q=*:*, I consistently get the right aggregated results back for all the documents across all the shards for every request from the load balancer. Can someone please let me know what this is symptomatic of? Somehow SolrCloud seems to be doing search query distribution and aggregation for queries of type *:* only. Thanks.
Regarding Default Scoring For Solr
If I add this to the end of my query string I get a score back: fl=*,score Is this the default score? I did read some info on scoring - it is detailed, granular, and conceptual - but because of limited time I can't go into the how of the score calculation at the moment. Are the links below a good start for understanding the default calculation, or is there a more tutorial-style explanation? http://www.lucenetutorial.com/advanced-topics/scoring.html http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html -- View this message in context: http://lucene.472066.n3.nabble.com/Regarding-Default-Scoring-For-Solr-tp4162411.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Hmmm, nothing quite makes sense here. Here are some experiments: 1 avoid the load balancer and issue queries like http://solr_server:8983/solr/collection/q=whatever&distrib=false The distrib=false bit will keep SolrCloud from trying to send the queries anywhere; they'll be served only from the node you address them to. That'll help check whether the nodes are consistent. You should be getting back the same results from each replica in a shard (i.e. 2 of your 6 machines). Next, try your failing query the same way. Next, try your failing query from a browser, pointing it at successive nodes. Where is the first place problems show up? My _guess_ is that your load balancer isn't quite doing what you think, or your cluster isn't set up the way you think it is, but those are guesses. Best, Erick On Thu, Oct 2, 2014 at 2:51 PM, S.L simpleliving...@gmail.com wrote: Hi All, I am trying to query a 6-node Solr 4.7 cluster with 3 shards and a replication factor of 2. I have fronted these 6 Solr nodes with a load balancer. What I notice is that every time I do a search of the form q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a result only once in every 3 tries, telling me that the load balancer is distributing the requests between the 3 shards and SolrCloud only returns a result if the request goes to the core that has that id. However, if I do a simple search like q=*:*, I consistently get the right aggregated results back for all the documents across all the shards for every request from the load balancer. Can someone please let me know what this is symptomatic of? Somehow SolrCloud seems to be doing search query distribution and aggregation for queries of type *:* only. Thanks.
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Erick, Thanks for your reply, I tried your suggestions. 1. When not using the load balancer, if *I have distrib=false* I get consistent results across the replicas. 2. However, here's the interesting part: while not using the load balancer, if I *don't have distrib=false*, then when I query a particular node I get the same behaviour as if I were using a load balancer, meaning the distributed search from a node works intermittently. Does this give any clue? On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, nothing quite makes sense here. Here are some experiments: 1 avoid the load balancer and issue queries like http://solr_server:8983/solr/collection/q=whatever&distrib=false The distrib=false bit will keep SolrCloud from trying to send the queries anywhere; they'll be served only from the node you address them to. That'll help check whether the nodes are consistent. You should be getting back the same results from each replica in a shard (i.e. 2 of your 6 machines). Next, try your failing query the same way. Next, try your failing query from a browser, pointing it at successive nodes. Where is the first place problems show up? My _guess_ is that your load balancer isn't quite doing what you think, or your cluster isn't set up the way you think it is, but those are guesses. Best, Erick On Thu, Oct 2, 2014 at 2:51 PM, S.L simpleliving...@gmail.com wrote: Hi All, I am trying to query a 6-node Solr 4.7 cluster with 3 shards and a replication factor of 2. I have fronted these 6 Solr nodes with a load balancer. What I notice is that every time I do a search of the form q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a result only once in every 3 tries, telling me that the load balancer is distributing the requests between the 3 shards and SolrCloud only returns a result if the request goes to the core that has that id.
However if I do a simple search like q=*:* , I consistently get the right aggregated results back of all the documents across all the shards for every request from the load balancer. Can someone please let me know what this is symptomatic of ? Somehow Solr Cloud seems to be doing search query distribution and aggregation for queries of type *:* only. Thanks.
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Erick, I would like to add that the interesting behavior, i.e. point #2 that I mentioned in my earlier reply, happens in all the shards. If this were a distributed search issue it should not have manifested itself in the shard that contains the key I am searching for; it looks like the search is just failing as a whole intermittently. Also, the collection is being actively indexed as I query this, could that be an issue too? Thanks. On Thu, Oct 2, 2014 at 10:24 PM, S.L simpleliving...@gmail.com wrote: Erick, Thanks for your reply, I tried your suggestions. 1. When not using the load balancer, if *I have distrib=false* I get consistent results across the replicas. 2. However, here's the interesting part: while not using the load balancer, if I *don't have distrib=false*, then when I query a particular node I get the same behaviour as if I were using a load balancer, meaning the distributed search from a node works intermittently. Does this give any clue? On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, nothing quite makes sense here. Here are some experiments: 1 avoid the load balancer and issue queries like http://solr_server:8983/solr/collection/q=whatever&distrib=false The distrib=false bit will keep SolrCloud from trying to send the queries anywhere; they'll be served only from the node you address them to. That'll help check whether the nodes are consistent. You should be getting back the same results from each replica in a shard (i.e. 2 of your 6 machines). Next, try your failing query the same way. Next, try your failing query from a browser, pointing it at successive nodes. Where is the first place problems show up? My _guess_ is that your load balancer isn't quite doing what you think, or your cluster isn't set up the way you think it is, but those are guesses.
Best, Erick On Thu, Oct 2, 2014 at 2:51 PM, S.L simpleliving...@gmail.com wrote: Hi All, I am trying to query a 6 node Solr4.7 cluster with 3 shards and a replication factor of 2 . I have fronted these 6 Solr nodes using a load balancer , what I notice is that every time I do a search of the form q=*:*fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a result only once in every 3 tries , telling me that the load balancer is distributing the requests between the 3 shards and SolrCloud only returns a result if the request goes to the core that as that id . However if I do a simple search like q=*:* , I consistently get the right aggregated results back of all the documents across all the shards for every request from the load balancer. Can someone please let me know what this is symptomatic of ? Somehow Solr Cloud seems to be doing search query distribution and aggregation for queries of type *:* only. Thanks.
Boosting Top selling items
I have been working to try and identify top-selling items in an eCommerce app and boost those in the results. The struggle I am having is that our catalog stores products and parts in the same taxonomy. Since parts are ordered more frequently, when you search for something like TV you see cables and antennas first. My theory is that someone needs to tag products as "Top Selling" as a facet and then use faceted search, to avoid an artificial boost that screws up document relevance. Anyone fight with anything similar? Interested in discussing with other eCommerce search developers. Regards, Bob
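One common pattern for this (sketched below with hypothetical field names, is_top_seller and doc_type, that your schema would need to define): keep the curated "top seller" flag as a normal boolean field, then apply a gentle edismax boost query rather than a per-document index-time boost, so base relevance stays intact and the facet remains available for filtering:

```python
from urllib.parse import urlencode

def build_search_params(user_query):
    """Sketch of an edismax request that boosts flagged top sellers and
    nudges products above parts. Field names are hypothetical."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # additive boost for curated top sellers (bq is a standard
        # dismax/edismax parameter)
        "bq": "is_top_seller:true^5",
        # multiplicative nudge: prefer products over parts
        "boost": "if(termfreq(doc_type,'product'),1.5,1.0)",
        # expose the flag as a facet so users can filter explicitly
        "facet": "true",
        "facet.field": "is_top_seller",
    }
    return urlencode(params)
```

Tuning the ^5 and 1.5 weights against real click/order data is the hard part; starting small keeps the boost from swamping text relevance.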
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
bq: Also, the collection is being actively indexed as I query this, could that be an issue too? Not if the documents you're searching aren't being added as you search (and all your autocommit intervals have expired). I would turn off indexing for testing; it's just one more variable that can get in the way of understanding this. Do note that if the problem were endemic to Solr, there would probably be a _lot_ more noise out there. So to recap: 0 we can take the load balancer out of the picture altogether. 1 when you query each shard individually with distrib=false, every replica in a particular shard returns the same count. 2 when you query without distrib=false you get varying counts. This is very strange and not at all expected. Let's try it again without indexing going on And what do you mean by indexing anyway? How are documents being fed to your system? Best, Erick@PuzzledAsWell On Thu, Oct 2, 2014 at 7:32 PM, S.L simpleliving...@gmail.com wrote: Erick, I would like to add that the interesting behavior, i.e. point #2 that I mentioned in my earlier reply, happens in all the shards. If this were a distributed search issue it should not have manifested itself in the shard that contains the key I am searching for; it looks like the search is just failing as a whole intermittently. Also, the collection is being actively indexed as I query this, could that be an issue too? Thanks. On Thu, Oct 2, 2014 at 10:24 PM, S.L simpleliving...@gmail.com wrote: Erick, Thanks for your reply, I tried your suggestions. 1. When not using the load balancer, if *I have distrib=false* I get consistent results across the replicas. 2. However, here's the interesting part: while not using the load balancer, if I *don't have distrib=false*, then when I query a particular node I get the same behaviour as if I were using a load balancer, meaning the distributed search from a node works intermittently. Does this give any clue?
On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, nothing quite makes sense here Here are some experiments: 1 avoid the load balancer and issue queries like http://solr_server:8983/solr/collection/q=whateverdistrib=false the distrib=false bit will cause keep SolrCloud from trying to send the queries anywhere, they'll be served only from the node you address them to. that'll help check whether the nodes are consistent. You should be getting back the same results from each replica in a shard (i.e. 2 of your 6 machines). Next, try your failing query the same way. Next, try your failing query from a browser, pointing it at successive nodes. Where is the first place problems show up? My _guess_ is that your load balancer isn't quite doing what you think, or your cluster isn't set up the way you think it is, but those are guesses. Best, Erick On Thu, Oct 2, 2014 at 2:51 PM, S.L simpleliving...@gmail.com wrote: Hi All, I am trying to query a 6 node Solr4.7 cluster with 3 shards and a replication factor of 2 . I have fronted these 6 Solr nodes using a load balancer , what I notice is that every time I do a search of the form q=*:*fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a result only once in every 3 tries , telling me that the load balancer is distributing the requests between the 3 shards and SolrCloud only returns a result if the request goes to the core that as that id . However if I do a simple search like q=*:* , I consistently get the right aggregated results back of all the documents across all the shards for every request from the load balancer. Can someone please let me know what this is symptomatic of ? Somehow Solr Cloud seems to be doing search query distribution and aggregation for queries of type *:* only. Thanks.
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Erick, 0 The load balancer is out of the picture. 1 When I query with *distrib=false*, I get consistent results as expected for those shards that don't have the key, i.e. I don't get results back from those shards. However, I just realized that with *distrib=false* in the query against the shard that is supposed to contain the key, only the replica of that shard returns the result and the leader does not. It looks like the replica and the leader do not have the same data, and only the replica contains the key for that shard. 2 By indexing I mean this collection is being populated by a web crawler. So it looks like 1 above points to the leader and replica being out of sync for at least one shard. On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson erickerick...@gmail.com wrote: bq: Also, the collection is being actively indexed as I query this, could that be an issue too? Not if the documents you're searching aren't being added as you search (and all your autocommit intervals have expired). I would turn off indexing for testing; it's just one more variable that can get in the way of understanding this. Do note that if the problem were endemic to Solr, there would probably be a _lot_ more noise out there. So to recap: 0 we can take the load balancer out of the picture altogether. 1 when you query each shard individually with distrib=false, every replica in a particular shard returns the same count. 2 when you query without distrib=false you get varying counts. This is very strange and not at all expected. Let's try it again without indexing going on And what do you mean by indexing anyway? How are documents being fed to your system?
Best, Erick@PuzzledAsWell

On Thu, Oct 2, 2014 at 7:32 PM, S.L simpleliving...@gmail.com wrote: Erick, I would like to add that the interesting behavior, i.e. point #2 that I mentioned in my earlier reply, happens in all the shards. If this were a distributed search issue it should not have manifested itself in the shard that contains the key I am searching for; it looks like the search as a whole is just failing intermittently. Also, the collection is being actively indexed as I query this, could that be an issue too? Thanks.

On Thu, Oct 2, 2014 at 10:24 PM, S.L simpleliving...@gmail.com wrote: Erick, Thanks for your reply, I tried your suggestions. 1) When not using the load balancer, if I *have distrib=false* I get consistent results across the replicas. 2) However, here's the interesting part: while not using the load balancer, if I *don't have distrib=false*, then when I query a particular node I get the same behaviour as if I were using the load balancer, meaning the distributed search from a node works intermittently. Does this give any clue?
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Hmmm. Assuming that you aren't re-indexing the doc you're searching for... Try issuing http://blah blah:8983/solr/collection/update?commit=true. That'll force all the docs to be searchable. Does 1) still hold for the document in question? Because this is exactly backwards from what I'd expect. I'd expect, if anything, the replica (I'm trying to call it the "follower" when a distinction needs to be made, since the leader is a replica too) to be out of sync. This is still a Bad Thing, but the leader gets first crack at indexing things. bq: only the replica of the shard that has this key returns the result, and the leader does not. Just to be sure we're talking about the same thing: when you say leader, you mean the shard leader, right? The filled-in circle on the graph view from the admin/cloud page. And let's see your soft and hard commit settings please. Best, Erick
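Erick's commit-then-recheck suggestion can be sketched as below. The hostnames (leader1 for the shard leader, follower1 for its replica), the collection name, and the port are placeholders; the script prints the curl commands rather than running them, so nothing is issued against a live cluster until you run the printed lines yourself.

```shell
#!/bin/sh
# Placeholder hosts: leader1 is the shard leader, follower1 its replica.
DOC='id:9e78c064-919f-4ef3-b236-dc66351b4acf'
# Hard commit that opens a new searcher, so everything indexed so far
# becomes visible before the counts are compared.
echo "curl -s 'http://leader1:8983/solr/collection1/update?commit=true'"
# Same doc, same shard, one node at a time; with distrib=false each node
# answers from its own core, so the two numFound values should match.
for host in leader1 follower1; do
  echo "curl -s 'http://${host}:8983/solr/collection1/select?q=${DOC}&distrib=false&rows=0&wt=json'"
done
```

If the follower finds the doc after the commit but the leader still doesn't, the two cores really have diverged rather than one simply lagging behind an uncommitted update.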
Sorting Joins
Is it possible to join documents and use a field from the "from"-side documents to sort the results? For example, I need to search employees and sort on different fields of the company each employee is joined to. What would that query look like? We've looked at various resources but haven't found any concise examples that work. Thanks, Eric
Re: Sorting Joins
Hello, Did you look into https://issues.apache.org/jira/browse/SOLR-6234 ? On Fri, Oct 3, 2014 at 9:30 AM, Eric Katherman e...@knackhq.com wrote: Is it possible to join documents and use a field from the from documents to sort the results? For example, I need to search employees and sort on different fields of the company each employee is joined to. What would that query look like? We've looked at various resources but haven't found any concise examples that work. Thanks, Eric -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
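For what it's worth, SOLR-6234 (the score-join parser) lets a join carry a score derived from the "from"-side documents, which is one way to express "sort employees by a company field". A sketch only, under several assumptions: employees and companies live in the same collection, each employee has a company_id matching its company's id, companies have a numeric company_rank field, and employee ids never appear as company_id values (all field names here are hypothetical). The script prints the curl command so the query can be inspected before running it.

```shell
#!/bin/sh
# {!func}company_rank scores each matching from-side doc by its
# company_rank value; score=max propagates that score through the join
# to the employee docs, and sort=score desc then orders employees by
# their company's rank. $rankq is a Solr local-param reference, kept
# literal here by the single quotes.
Q='{!join from=id to=company_id score=max v=$rankq}'
RANKQ='{!func}company_rank'
echo "curl -s 'http://localhost:8983/solr/collection1/select'" \
     "--data-urlencode 'q=${Q}'" \
     "--data-urlencode 'rankq=${RANKQ}'" \
     "--data-urlencode 'sort=score desc'"
```

This relies on the scoring join modes added by SOLR-6234, so it won't work on the plain {!join} parser in older releases; check the JIRA issue for which Solr version you need.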