Re: Fully automated replica creation in AWS
Not sure if this will meet all your needs, but you can probably do most of the work using AWS Lambda. I haven't used it personally, but it is supposed to launch custom code in response to events. I guess you could create a small Java class to do the required work following the birth of a new server or the death of the old one. If you can find an event source that provides you the entry point, you'll probably be fine.

https://aws.amazon.com/lambda/faqs/

Good luck

From: Erick Erickson
Sent: Wednesday, December 9, 2015 2:19 PM
To: solr-user
Subject: Re: Fully automated replica creation in AWS

Not that I know of. The two systems are somewhat disconnected. AWS doesn't know that Solr lives on those nodes, it's just spinning one up, right? Albeit with Solr running. There's nothing in Solr that auto-detects the existence of a new Solr node and automagically assigns collections and/or replicas.

How would either system intuit that this new node is replacing something else and "do the right thing"? I'll tell you how: by interrogating Zookeeper, seeing that for some specific collection shardX has fewer replicas than other shards, and issuing the Collections API ADDREPLICA command.

But now there are _three_ systems that need to be coordinated, and doing the right thing in your situation would be the wrong thing in another. The last thing many sys ops want is having replicas started without their knowledge.

And on top of that, I have doubts about the model. Having AWS elastically spin up a new replica is a heavyweight operation from Solr's perspective. I mean, this potentially copies a many-GB set of index files from one place to another, which could take a long time. Is that really what's desired here?

I have seen some folks spin up/down Solr instances based on a schedule if they know roughly when the peak load will be, but again there's nothing built in to handle this.

Best,
Erick

On Wed, Dec 9, 2015 at 10:15 AM, Ugo Matrangolo wrote:
> Hi,
>
> I was trying to set up a SolrCloud cluster in AWS backed by an ASG (auto
> scaling group) serving a replicated collection. I have just come across a
> case where one of the Solr nodes became unresponsive, with AWS killing it and
> spinning up a new one.
>
> Unfortunately, this new Solr node did not join as a replica of the existing
> collection, requiring human intervention to configure it as a new replica.
>
> I was wondering if there is something around that will make this process
> fully automated by detecting that a new node just joined the cluster and
> instructing it (e.g. via Collections API) to join as a replica of a given
> collection.
>
> Best
> Ugo
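The approach Erick describes (find the shard that is short on replicas, then call ADDREPLICA) could be sketched roughly as below. This is only a minimal Python sketch, not a tested supervisor: the `shards` mapping stands in for whatever you read from the cluster state in ZooKeeper, and the function names, the `jobs` collection and the host are made up for illustration. Only the `ADDREPLICA` action and its `collection`, `shard` and `node` parameters come from the Collections API.

```python
from urllib.parse import urlencode


def under_replicated_shards(shards):
    """Return the names of shards that have fewer replicas than the fullest shard.

    `shards` maps a shard name to the list of its live replicas
    (hypothetical structure, e.g. derived from the cluster state).
    """
    counts = {name: len(replicas) for name, replicas in shards.items()}
    target = max(counts.values(), default=0)
    return sorted(name for name, n in counts.items() if n < target)


def addreplica_url(base_url, collection, shard, node=None):
    """Build a Collections API ADDREPLICA request URL (the request is not sent)."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        params["node"] = node  # optional: pin the new replica to a given node
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Hypothetical cluster state: shard2 lost a replica when its node was killed.
state = {"shard1": ["core_a", "core_b"], "shard2": ["core_c"]}
for shard in under_replicated_shards(state):
    print(addreplica_url("http://localhost:8983/solr", "jobs", shard))
```

Whether a replica should be added automatically at all is the open question raised above; the sketch only shows the mechanics, and a real tool would need to decide when not to act.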
Re: are there any SolrCloud supervisors?
I would be interested in seeing it in action. Do you have any documentation available on what it does and how? Thanks

From: r b
Sent: Friday, October 2, 2015 3:09 PM
To: solr-user@lucene.apache.org
Subject: are there any SolrCloud supervisors?

I've been working on something that just monitors ZooKeeper to add and remove nodes from collections. The use case is that I put SolrCloud in an autoscaling group on EC2 and, as instances go up and down, I need them added to the collection. It's something I've built for work and could clean up to share on GitHub if there is much interest.

I asked in the IRC about a SolrCloud supervisor utility but wanted to extend that question to this list. Are there any more "full featured" supervisors out there?

-renning
Re: JSON Facet Analytics API in Solr 5.1
I prefer the second way. I find it more readable and shorter. Thanks for making Solr even better ;)

From: Yonik Seeley ysee...@gmail.com
Sent: Friday, April 17, 2015 12:20 PM
To: solr-user@lucene.apache.org
Subject: Re: JSON Facet Analytics API in Solr 5.1

Does anyone have any thoughts on the current general structure of JSON facets? The current general form of a facet command is:

  facet_name : { facet_type : facet_args }

For example:

  top_authors : { terms : {
    field : author,
    limit : 5,
  }}

One alternative I considered in the past is having the type in the args:

  top_authors : {
    type : terms,
    field : author,
    limit : 5
  }

It's a flatter structure... probably better in some ways, but worse in other ways. Thoughts / preferences?

-Yonik

On Tue, Apr 14, 2015 at 4:30 PM, Yonik Seeley ysee...@gmail.com wrote:

Folks, there's a new JSON Facet API in the just released Solr 5.1 (actually, a new facet module under the covers too). It's marked as experimental so we have time to change the API based on your feedback. So let us know what you like, what you would change, what's missing, or any other ideas you may have!

I've just started the documentation for the reference guide (on our confluence wiki), so for now the best doc is on my blog:

http://yonik.com/json-facet-api/
http://yonik.com/solr-facet-functions/
http://yonik.com/solr-subfacets/

I'll also be hanging out more on the #solr-dev IRC channel on freenode if you want to hit me up there about any development ideas.

-Yonik
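To make the two shapes easier to compare, here is a small Python sketch that rewrites the nested form into the flatter one. It only manipulates dictionaries; `flatten_facet` is a made-up helper name, and nothing here says which form Solr ends up accepting.

```python
def flatten_facet(command):
    """Turn {facet_type: facet_args} into {"type": facet_type, **facet_args}."""
    if len(command) != 1:
        raise ValueError("expected exactly one facet type key")
    (facet_type, facet_args), = command.items()
    return {"type": facet_type, **facet_args}


# Nested form from the message: facet_name : { facet_type : facet_args }
nested = {"top_authors": {"terms": {"field": "author", "limit": 5}}}

# Flat alternative: the type moves into the args.
flat = {name: flatten_facet(cmd) for name, cmd in nested.items()}
print(flat)
```

One thing the sketch makes visible: in the flat form, `type` shares a namespace with the facet arguments, so an argument that is itself called `type` would collide with it.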
Re: indexing db records via SolrJ
Do you have any references to such integrations (Solr + Storm)? Thanks

From: mike st. john mstj...@gmail.com
Sent: Monday, March 16, 2015 2:39 PM
To: solr-user@lucene.apache.org
Subject: Re: indexing db records via SolrJ

Take a look at some of the integrations people are using with Apache Storm. We do something similar on a larger scale, having created a pgsql spout and a Solr indexing bolt.

-msj

On Mon, Mar 16, 2015 at 11:08 AM, Hal Roberts hrobe...@cyber.law.harvard.edu wrote:

We import anywhere from five to fifty million small documents a day from a postgres database. I wrestled to get the DIH stuff to work for us for about a year and was much happier when I ditched that approach and switched to writing the few hundred lines of relatively simple code to handle directly the logic of what gets updated and how it gets queried from postgres ourselves. The DIH stuff is great for lots of cases, but if you are getting to the point of trying to hack its undocumented internals, I suspect you are better off spending a day or two of your time just writing all of the update logic yourself.

We found a relatively simple combination of postgres triggers, export to csv based on those triggers, and then just calling update/csv to work best for us.

-hal

On 3/16/15 9:59 AM, Shawn Heisey wrote:

On 3/16/2015 7:15 AM, sreedevi s wrote:
> I had checked this post. I don't know whether this is possible, but my query is whether I can use the configuration for DIH for indexing via SolrJ

You can use SolrJ for accessing DIH. I have code that does this, but only for full index rebuilds. It won't be particularly obvious how to do it. Writing code that can interpret DIH status and know when it finishes, succeeds, or fails is very tricky because DIH only uses human-readable status info, not machine-readable, and the info is not very consistent.

I can't just share my code, because it's extremely convoluted ... but the general gist is to create a SolrQuery object, use setRequestHandler to set the handler to /dataimport or whatever your DIH handler is, and set the other parameters on the request like command to full-import and so on.

Thanks,
Shawn

--
Hal Roberts
Fellow
Berkman Center for Internet & Society
Harvard University
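Shawn describes the SolrJ route (a SolrQuery pointed at the DIH handler with `command=full-import`). The same request can also be expressed as a plain HTTP URL; the Python sketch below only builds that URL and does not send it. The core URL, the `dih_url` name and the `wt=json` choice are assumptions for illustration; `/dataimport` and the `full-import` command come from the message, and `status` is the DIH command for polling progress.

```python
from urllib.parse import urlencode


def dih_url(core_url, command, handler="/dataimport", **extra):
    """Build a DataImportHandler request URL for the given command.

    `handler` should match whatever path your DIH handler is registered under.
    """
    params = {"command": command, "wt": "json", **extra}
    return core_url.rstrip("/") + handler + "?" + urlencode(params)


# Hypothetical core; start a full import, then poll its status.
print(dih_url("http://localhost:8983/solr/core1", "full-import"))
print(dih_url("http://localhost:8983/solr/core1", "status"))
```

As Shawn warns, the hard part is not issuing the request but interpreting the status response reliably, which this sketch does not attempt.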
Re: Delete By query on a multi-value field
Hi Lokesh,

Thanks for the information. I forgot to mention that the system I am working on is still using 3.5, so I will probably have to reindex the whole set of documents. Unless someone knows how to get around this...

From: Lokesh Chhaparwal xyzlu...@gmail.com
Sent: Monday, February 2, 2015 11:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Delete By query on a multi-value field

Hi Jean,

Please see the issues

https://issues.apache.org/jira/browse/SOLR-3862
https://issues.apache.org/jira/browse/SOLR-5992

Both of them are resolved. The *remove* clause (atomic update) was added in the 4.9.0 release. I haven't checked it, though.

Thanks,
Lokesh

On Tue, Feb 3, 2015 at 7:26 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote:

Hi All,

Is there a way to delete a value from a multi-value field without reindexing anything?

Let's say I have three documents A, B and C with field XYZ set to [1,2,3], [2,3,4] and [1] respectively. I'd like to remove anything that has the value '1' in the field XYZ. That is, I want to remove the value '1' from the field, deleting the document only if '1' is the only value present.

Deleting documents such as C (single value) is easy with a delete by query through the update handler, but what about document A?

Thanks for any hint
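For readers on a release that has the *remove* atomic update (4.9.0 or later according to the message above, so not the 3.5 system mentioned in the reply), the per-document decision could look roughly like the Python sketch below. It is only a sketch: `plan_removal` is a made-up helper, the ids A, B, C and the field XYZ are taken from the question, and nothing is sent to Solr.

```python
import json


def plan_removal(doc_id, values, value, field="XYZ"):
    """Decide what to do with one document when `value` must disappear.

    Returns None (nothing to do), ("delete", id) when `value` is the only
    value present, or ("update", doc) with an atomic-update `remove` clause.
    """
    if value not in values:
        return None
    if all(v == value for v in values):
        return ("delete", doc_id)
    return ("update", {"id": doc_id, field: {"remove": value}})


docs = {"A": ["1", "2", "3"], "B": ["2", "3", "4"], "C": ["1"]}
for doc_id, values in docs.items():
    action = plan_removal(doc_id, values, "1")
    print(doc_id, json.dumps(action))
```

The "update" documents would be posted to the update handler as atomic updates, and the "delete" ids handled with an ordinary delete, which matches the behaviour asked for: A loses the value, B is untouched, C is deleted.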
Delete By query on a multi-value field
Hi All,

Is there a way to delete a value from a multi-value field without reindexing anything?

Let's say I have three documents A, B and C with field XYZ set to [1,2,3], [2,3,4] and [1] respectively. I'd like to remove anything that has the value '1' in the field XYZ. That is, I want to remove the value '1' from the field, deleting the document only if '1' is the only value present.

Deleting documents such as C (single value) is easy with a delete by query through the update handler, but what about document A?

Thanks for any hint
RE: ANN: Solr Next
Hi Yonik,

Very impressive results. Looking forward to using this on our systems. Any idea what's the plan for this feature? Will it make its way into Solr 4.9, or do we have to switch to Heliosearch to be able to use it?

Thanks

-----Original Message-----
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: June-09-14 10:50 AM
To: solr-user@lucene.apache.org
Subject: Re: ANN: Solr Next

On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley ysee...@gmail.com wrote:
[...]
> Next major feature: Native Code Optimizations. In addition to moving more large data structures off-heap (like UnInvertedField?), I am planning to implement native code optimizations for certain hotspots. Native code faceting would be an obvious first choice since it can often be a CPU bottleneck.

It's in! Abbreviated report: 2x performance increase over stock solr faceting (which is already fast!)

http://heliosearch.org/native-code-faceting/

-Yonik
http://heliosearch.org -- making solr shine

Project resources:
https://github.com/Heliosearch/heliosearch
https://groups.google.com/forum/#!forum/heliosearch
https://groups.google.com/forum/#!forum/heliosearch-dev
Freenode IRC: #heliosearch #heliosearch-dev
RE: Strange Behavior with Solr in Tomcat.
I would try a thread dump and check the output to see what's going on. You could also strace the process if you're running on Unix, or change the log level in Solr to get more information logged.

-----Original Message-----
From: S.L [mailto:simpleliving...@gmail.com]
Sent: June-06-14 2:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Strange Behavior with Solr in Tomcat.

Anyone folks?

On Wed, Jun 4, 2014 at 10:25 AM, S.L simpleliving...@gmail.com wrote:

Hi Folks,

I recently started using the spellchecker in my solrconfig.xml. I am able to build up an index in Solr. But if I ever shut down Tomcat, I am not able to restart it. The server never prints the server startup time in seconds in the logs, nor does it print any error messages in the catalina.out file.

The only way for me to get around this is by deleting the data directory of the index and then starting the server; obviously this makes me lose my index.

Just wondering if anyone has faced a similar issue and was able to solve it.

Thanks.
RE: Strange behaviour when tuning the caches
Hi Otis,

We saw some improvement when increasing the size of the caches. Since then, we followed Shawn's advice on the filterCache and gave some additional RAM to the JVM in order to reduce GC. The performance is very good right now, and although we are still experiencing some instability, it is not at the same level as before. With our current settings the number of evictions is actually very low, so we might be able to reduce some caches to free up some additional memory for the JVM to use.

As for the queries, it is a set of 5 million queries taken from our logs, so they vary a lot. All I can say is that all queries involve either grouping/field collapsing and/or radius search around a point. Our largest customer is using a set of 8-10 filters that are translated as fq parameters. The collection contains around 13 million documents distributed on 5 shards with 2 replicas. The second collection has the same configuration and is used for indexing or as a fail-over index in case the first one fails.

We'll keep making adjustments today, but we are pretty close to having something that performs while being stable.

Thanks all for your help.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: June-03-14 12:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behaviour when tuning the caches

Hi Jean-Sebastien,

One thing you didn't mention is whether, as you are increasing (I assume) cache sizes, you actually see performance improve? If not, then maybe there is no value in increasing cache sizes.

I assume you changed only one cache at a time? Were you able to get any one of them to the point where there were no evictions without things breaking?

What are your queries like, can you share a few examples?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Mon, Jun 2, 2014 at 11:09 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote:

Thanks for your quick response. Our JVM is configured with a heap of 8GB, so we are pretty close to the optimal configuration you are mentioning. The only other programs running are Zookeeper (which has its own storage device) and a proprietary API (with a heap of 1GB) we have on top of Solr to serve our customer's requests.

I will look into the filterCache to see if we can better use it.

Thanks for your help

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: June-02-14 10:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behaviour when tuning the caches

On 6/2/2014 8:24 AM, Jean-Sebastien Vachon wrote:
> We have yet to determine where the exact breaking point is. The two patterns we are seeing are:
>
> - less cache (around 20-30% hit/ratio), poor performance but overall good stability

When caches are too small, a low hit ratio is expected. Increasing them is a good idea, but only increase them a little bit at a time. The filterCache in particular should not be increased dramatically, especially the autowarmCount value. Filters can take a very long time to execute, so a high autowarmCount can result in commits taking forever.

Each filter entry can take up a lot of heap memory -- in terms of bytes, it is the number of documents in the core divided by 8. This means that if the core has 10 million documents, each filter entry (for JUST that core) will take over a megabyte of RAM.

> - more cache (over 90% hit/ratio), improved performance but almost no stability. In that case, we start seeing messages such as "No shards hosting shard X" or "cancelElection did not find election node to remove"

This would not be a direct result of increasing the cache size, unless perhaps you've increased them so they are *REALLY* big and you're running out of RAM for the heap or OS disk cache.

> Anyone, has any advice on what could cause this? I am beginning to suspect the JVM version, is there any minimal requirements regarding the JVM?

Oracle Java 7 is recommended for all releases, and required for Solr 4.8. You just need to stay away from 7u40, 7u45, and 7u51 because of bugs in Java itself. Right now, the latest release is recommended, which is 7u60. The 7u21 release that you are running should be perfectly fine.

With six 9.4GB cores per node, you'll achieve the best performance if you have about 60GB of RAM left over for the OS disk cache to use -- the size of your index data on disk. You did mention that you have 92GB of RAM per node, but you have not said how big your Java heap is, or whether there is other software on the machine that may be eating up RAM for its heap or data.

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
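Shawn's rule of thumb for filterCache entries (one bit per document in the core, so the document count divided by 8 gives bytes) is easy to turn into a quick estimate. The Python sketch below does just that arithmetic; the function names and the cache size of 512 entries are made up for illustration, and the result should be read as a rough upper bound for one core, not a measurement.

```python
def filter_entry_bytes(max_doc):
    """Approximate size of one filterCache entry: one bit per document."""
    return (max_doc + 7) // 8  # round up to whole bytes


def filter_cache_bytes(max_doc, entries):
    """Approximate heap used by a filterCache holding `entries` full entries."""
    return filter_entry_bytes(max_doc) * entries


per_entry = filter_entry_bytes(10_000_000)      # the 10-million-document example
print(per_entry)                                # bytes per entry
print(filter_cache_bytes(10_000_000, 512))      # hypothetical cache of 512 entries
```

For the 10-million-document core this gives 1,250,000 bytes per entry, which is the "over a megabyte" figure quoted above; multiply by the cache size and by the number of cores on the node to see how quickly a large filterCache eats into the heap.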
RE: Strange behaviour when tuning the caches
Yes, we are already using it.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: June-03-14 11:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behaviour when tuning the caches

Hi,

Have you seen https://wiki.apache.org/solr/CollapsingQParserPlugin ? May help with the field collapsing queries.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Tue, Jun 3, 2014 at 8:41 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote:

Hi Otis,

We saw some improvement when increasing the size of the caches. Since then, we followed Shawn's advice on the filterCache and gave some additional RAM to the JVM in order to reduce GC. The performance is very good right now, and although we are still experiencing some instability, it is not at the same level as before. With our current settings the number of evictions is actually very low, so we might be able to reduce some caches to free up some additional memory for the JVM to use.

As for the queries, it is a set of 5 million queries taken from our logs, so they vary a lot. All I can say is that all queries involve either grouping/field collapsing and/or radius search around a point. Our largest customer is using a set of 8-10 filters that are translated as fq parameters. The collection contains around 13 million documents distributed on 5 shards with 2 replicas. The second collection has the same configuration and is used for indexing or as a fail-over index in case the first one fails.

We'll keep making adjustments today, but we are pretty close to having something that performs while being stable.

Thanks all for your help.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: June-03-14 12:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behaviour when tuning the caches

Hi Jean-Sebastien,

One thing you didn't mention is whether, as you are increasing (I assume) cache sizes, you actually see performance improve? If not, then maybe there is no value in increasing cache sizes.

I assume you changed only one cache at a time? Were you able to get any one of them to the point where there were no evictions without things breaking?

What are your queries like, can you share a few examples?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Mon, Jun 2, 2014 at 11:09 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote:

Thanks for your quick response. Our JVM is configured with a heap of 8GB, so we are pretty close to the optimal configuration you are mentioning. The only other programs running are Zookeeper (which has its own storage device) and a proprietary API (with a heap of 1GB) we have on top of Solr to serve our customer's requests.

I will look into the filterCache to see if we can better use it.

Thanks for your help

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: June-02-14 10:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behaviour when tuning the caches

On 6/2/2014 8:24 AM, Jean-Sebastien Vachon wrote:
> We have yet to determine where the exact breaking point is. The two patterns we are seeing are:
>
> - less cache (around 20-30% hit/ratio), poor performance but overall good stability

When caches are too small, a low hit ratio is expected. Increasing them is a good idea, but only increase them a little bit at a time. The filterCache in particular should not be increased dramatically, especially the autowarmCount value. Filters can take a very long time to execute, so a high autowarmCount can result in commits taking forever.

Each filter entry can take up a lot of heap memory -- in terms of bytes, it is the number of documents in the core divided by 8. This means that if the core has 10 million documents, each filter entry (for JUST that core) will take over a megabyte of RAM.

> - more cache (over 90% hit/ratio), improved performance but almost no stability. In that case, we start seeing messages such as "No shards hosting shard X" or "cancelElection did not find election node to remove"

This would not be a direct result of increasing the cache size, unless perhaps you've increased them so they are *REALLY* big and you're running out of RAM for the heap or OS disk cache.

> Anyone, has any advice on what could cause this? I am beginning to suspect the JVM version, is there any minimal requirements regarding the JVM
Strange behaviour when tuning the caches
Hi All,

We have a 5-node setup running Solr 4.8.1 and we are trying to get the most out of it by tuning Solr caches. Following is the output of the script version.sh provided with Tomcat:

Server version: Apache Tomcat/7.0.39
Server built:   Mar 22 2013 12:37:24
Server number:  7.0.39.0
OS Name:        Linux
OS Version:     3.0.76-0.11-default
Architecture:   amd64
JVM Version:    1.7.0_21-b11
JVM Vendor:     Oracle Corporation

To measure the performance, we are running a simple set of queries using JMeter with 25 threads from another host (not a member of our cloud).

We tried to tune the different caches (mostly the documentCache, filterCache and queryResultCache) to reduce the number of evictions, but the cloud became very unstable at some point.

Each server has 92GB of RAM and has 2 collections (1 shard and two replicas) for a total of 6 cores per node. Each core is around 9.4GB in size according to the Core admin panel.

We have yet to determine where the exact breaking point is. The two patterns we are seeing are:

- less cache (around 20-30% hit/ratio), poor performance but overall good stability
- more cache (over 90% hit/ratio), improved performance but almost no stability. In that case, we start seeing messages such as "No shards hosting shard X" or "cancelElection did not find election node to remove"

Anyone, has any advice on what could cause this? I am beginning to suspect the JVM version, is there any minimal requirements regarding the JVM?

Thanks
RE: Strange behaviour when tuning the caches
Thanks for your quick response. Our JVM is configured with a heap of 8GB, so we are pretty close to the optimal configuration you are mentioning. The only other programs running are Zookeeper (which has its own storage device) and a proprietary API (with a heap of 1GB) we have on top of Solr to serve our customer's requests.

I will look into the filterCache to see if we can better use it.

Thanks for your help

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: June-02-14 10:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behaviour when tuning the caches

On 6/2/2014 8:24 AM, Jean-Sebastien Vachon wrote:
> We have yet to determine where the exact breaking point is. The two patterns we are seeing are:
>
> - less cache (around 20-30% hit/ratio), poor performance but overall good stability

When caches are too small, a low hit ratio is expected. Increasing them is a good idea, but only increase them a little bit at a time. The filterCache in particular should not be increased dramatically, especially the autowarmCount value. Filters can take a very long time to execute, so a high autowarmCount can result in commits taking forever.

Each filter entry can take up a lot of heap memory -- in terms of bytes, it is the number of documents in the core divided by 8. This means that if the core has 10 million documents, each filter entry (for JUST that core) will take over a megabyte of RAM.

> - more cache (over 90% hit/ratio), improved performance but almost no stability. In that case, we start seeing messages such as "No shards hosting shard X" or "cancelElection did not find election node to remove"

This would not be a direct result of increasing the cache size, unless perhaps you've increased them so they are *REALLY* big and you're running out of RAM for the heap or OS disk cache.

> Anyone, has any advice on what could cause this? I am beginning to suspect the JVM version, is there any minimal requirements regarding the JVM?

Oracle Java 7 is recommended for all releases, and required for Solr 4.8. You just need to stay away from 7u40, 7u45, and 7u51 because of bugs in Java itself. Right now, the latest release is recommended, which is 7u60. The 7u21 release that you are running should be perfectly fine.

With six 9.4GB cores per node, you'll achieve the best performance if you have about 60GB of RAM left over for the OS disk cache to use -- the size of your index data on disk. You did mention that you have 92GB of RAM per node, but you have not said how big your Java heap is, or whether there is other software on the machine that may be eating up RAM for its heap or data.

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
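The RAM budget discussed in this exchange can be checked with a few lines of arithmetic. The Python sketch below uses only the figures quoted in the thread (92GB per node, an 8GB Solr heap, a 1GB heap for the proprietary API, six cores of about 9.4GB each); the function name is made up, and the ZooKeeper heap and the OS's own needs are left out because the thread does not give them.

```python
def os_cache_headroom_gb(total_ram_gb, heaps_gb, index_gb):
    """RAM left after JVM heaps and after caching the whole index, in GB."""
    available_for_cache = total_ram_gb - sum(heaps_gb)
    return available_for_cache - index_gb


index_gb = 6 * 9.4                     # six cores of roughly 9.4GB each
available_gb = 92 - (8 + 1)            # RAM not claimed by the two known heaps
headroom_gb = os_cache_headroom_gb(92, [8, 1], index_gb)

print(round(index_gb, 1), available_gb, round(headroom_gb, 1))
```

With these numbers the index is a little under 57GB, in line with the "about 60GB" Shawn suggests leaving for the OS disk cache, and roughly 83GB is available for it, so the whole index should fit in the page cache with some room to spare.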
RE: Question regarding the lastest version of HeliosSearch
Thanks for the information Yonik.

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: May-16-14 8:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Question regarding the lastest version of HeliosSearch

On Thu, May 15, 2014 at 3:44 PM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote:
> I spent some time today playing around with subfacets and facets functions now available in helios search 0.05 and I have some concerns... They look very promising.

Thanks, glad for the feedback!

> [...] the response looks good except for one little thing... the mincount is not respected whenever I specify the facet.stat parameter. Removing it will cause the mincount to be respected but then I need this parameter.

Right, the mincount parameter is not yet implemented. Hopefully soon!

> { val:1133, unique(job_id):0, <== what is this? count:0},
> Many zero entries following...
>
> I was wondering where the extra entries were coming from... the position_id = 1133 above is not even a match for my query (its title is Audit Consultant). I've also noticed a similar behaviour when using subfacets. It looks like the number of items returned always match the facet.limit parameter. If not enough values are present for a given entry then the bucket is filled with documents not matching the original query.

Right... straight Solr faceting will do this too (unless you have a mincount > 0). We're just looking at terms in the field and we don't have enough context to know if some 0's make more sense than others to return.

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters & fieldcache
Question regarding the lastest version of HeliosSearch
Hi All,

I spent some time today playing around with subfacets and facets functions now available in helios search 0.05 and I have some concerns... They look very promising.

I indexed 10 000 documents and built some queries to look at each feature and found some weird behaviour that I could not explain.

The first query I made was to find all documents having the word java in their title and then compute a facet on the field position_id with stats about the field job_id. Basically, I want the number of unique job_ids for each position_id for all matching documents.

http://localhost:8983/solr/current/select?q=title:java&facet=on&facet.field=position_id&facet.stat=unique(job_id)&rows=1&facet.limit=10&facet.mincount=1&wt=json&indent=on&fl=job_id,position_id,super_alias_id

The response looks good except for one little thing... the mincount is not respected whenever I specify the facet.stat parameter. Removing it will cause the mincount to be respected but then I need this parameter.

Without the parameter the facet looks like this:

facet_counts:{
  facet_queries:{},
  facet_fields:{
    position_id:[
      265151,5,
      927284,1,
      1662380,1,
      2625553,1,
      2862455,1,
      4128904,1,
      4253203,1]},   <=== accounted for all 11 documents

And now when adding the parameter:

facets:{
  position_id:{
    stats:{
      unique(job_id):11,   <== again, 11 documents, which is good
      count:11},
    buckets:[
      { val:265151, unique(job_id):5, count:5},
      { val:927284, unique(job_id):1, count:1},
      { val:1662380, unique(job_id):1, count:1},
      { val:2625553, unique(job_id):1, count:1},
      { val:2862455, unique(job_id):1, count:1},
      { val:4128904, unique(job_id):1, count:1},
      { val:4253203, unique(job_id):1, count:1},
      { val:1133, unique(job_id):0, count:0},   <== what is this?
      Many zero entries following...

I was wondering where the extra entries were coming from... the position_id = 1133 above is not even a match for my query (its title is Audit Consultant).

I've also noticed a similar behaviour when using subfacets. It looks like the number of items returned always matches the facet.limit parameter. If not enough values are present for a given entry then the bucket is filled with documents not matching the original query.

Am I doing something wrong?
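Until a server-side mincount is honoured for these stats facets, one workaround is to drop the empty buckets on the client. The Python sketch below assumes the response has already been parsed into dictionaries shaped like the buckets shown above; `drop_empty_buckets` is a made-up helper and the sample data is abridged from the message.

```python
def drop_empty_buckets(buckets, mincount=1):
    """Keep only the buckets whose count reaches `mincount`."""
    return [b for b in buckets if b.get("count", 0) >= mincount]


# Abridged from the response in the message.
buckets = [
    {"val": 265151, "unique(job_id)": 5, "count": 5},
    {"val": 927284, "unique(job_id)": 1, "count": 1},
    {"val": 1133, "unique(job_id)": 0, "count": 0},
]

kept = drop_empty_buckets(buckets)
print([b["val"] for b in kept])
```

This only hides the zero entries; the server still returns up to facet.limit buckets, so a limit that is too small could in principle be filled with zero buckets at the expense of nothing useful, which is worth keeping in mind when choosing it.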
RE: Transformation on a numeric field
Thanks for the information. I will look into this, but I'm curious to know why something this basic requires an external script... Does anyone know why we can't have an analysis chain on a numeric field? It looks to me like it would be very useful to be able to manipulate/transform a value without an external resource.

Thanks

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: April-15-14 4:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Transformation on a numeric field

You can use an update processor. The stateless script update processor will let you write arbitrary JavaScript code, which can do this calculation. You should be able to figure it out from the wiki:

http://wiki.apache.org/solr/ScriptUpdateProcessor

My e-book has plenty of script examples for this processor as well.

We could also write a generic script that takes a source and destination field name and then does a specified operation on it, like add an offset or multiply by a scale factor.

-- Jack Krupansky

-----Original Message-----
From: Jean-Sebastien Vachon
Sent: Tuesday, April 15, 2014 3:57 PM
To: 'solr-user@lucene.apache.org'
Subject: Transformation on a numeric field

Hi All,

I am looking for a way to index a numeric field and its value divided by 1 000 into another numeric field. I thought about using a CopyField with a PatternReplaceFilterFactory to keep only the first few digits (cutting the last three).

Solr complains that I can not have an analysis chain on a numeric field:

Core: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType truncated_salary: FieldType: TrieIntField (truncated_salary) does not support specifying an analyzer. Schema file is /data/solr/solr-no-cloud/Core1/schema.xml

Is there a way to accomplish this?

Thanks
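Jack's suggestion is a script update processor on the Solr side. A different option, not proposed in the thread, is to compute the derived value in the indexing client before the document is sent. The Python sketch below shows that client-side variant only; the `salary` source field is a guess, `truncated_salary` is borrowed from the error message, and the helper names are made up.

```python
def truncate_thousands(value):
    """Drop the last three digits of an integer, truncating toward zero."""
    sign = -1 if value < 0 else 1
    return sign * (abs(value) // 1000)


def with_truncated_field(doc, source="salary", target="truncated_salary"):
    """Return a copy of `doc` with `target` set to `source` divided by 1000."""
    enriched = dict(doc)
    if source in enriched:
        enriched[target] = truncate_thousands(int(enriched[source]))
    return enriched


# Hypothetical document about to be indexed.
print(with_truncated_field({"id": "42", "salary": 54999}))
```

The trade-off is the usual one: doing it in the client keeps the Solr configuration simple, while an update processor guarantees the derived field is filled in no matter which process writes to the index.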
Transformation on a numeric field
Hi All, I am looking for a way to index a numeric field and its value divided by 1 000 into another numeric field. I thought about using a CopyField with a PatternReplaceFilterFactory to keep only the first few digits (cutting the last three). Solr complains that I can not have an analysis chain on a numeric field: Core: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType truncated_salary: FieldType: TrieIntField (truncated_salary) does not support specifying an analyzer. Schema file is /data/solr/solr-no-cloud/Core1/schema.xml Is there a way to accomplish this ? Thanks
RE: Were changes made to facetting on multivalued fields recently?
Thanks to both of you. I finally found the issue and you were right (again) ;) The problem was not coming from the full indexing code containing the SQL replace statement but from another process whose job is to keep our index up to date. This process had no idea that commas were to be replaced by spaces for some fields (and it should not know about this either). I changed the tokenizer used for the field to the following and everything is fine now.

<tokenizer class="solr.PatternTokenizerFactory" pattern=","/>

Thanks for your help

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: April-10-14 1:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Were changes made to facetting on multivalued fields recently?

bq: The SQL query contains a Replace statement that does this

Well, I suspect that's where the issue is. The facet values being reported include:

<int name="4,1">134826</int>

which indicates that the incoming text to Solr still has the commas. Solr is seeing the commas and all. You can cure this by using PatternReplaceCharFilterFactory and doing the substitution at index time if you want to.

That doesn't clarify why the behavior has changed though, but my supposition is that it has nothing to do with Solr, and something about your SQL statement is different.

Best, Erick

On Thu, Apr 10, 2014 at 9:33 AM, Jean-Sebastien Vachon <jean-sebastien.vac...@wantedanalytics.com> wrote:
> The SQL query contains a Replace statement that does this
>
> -----Original Message-----
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: April-10-14 11:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Were changes made to facetting on multivalued fields recently?
>
> On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote:
> > Here are the field definitions for both our old and new index... as you can see they are identical. We've been using this chain and field type starting with Solr 1.4 and never had any problem. As for the documents, both indexes are using the same data source. They could be slightly out of sync from time to time but we tend to index them on a daily basis. Both indexes are also using the same code (indexing through SolrJ) to index their content. The source is a column in MySQL that contains entries such as 4,1 that get stored in a multivalued field after replacing commas by spaces.
> >
> > OLD (4.6.1):
> > <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >   </analyzer>
> > </fieldType>
> > <field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true" />
>
> Just so you know, there's nothing here that would require the field to be multivalued. WhitespaceTokenizerFactory does not create multiple field values, it creates multiple terms. If you are actually inserting multiple values for the field in SolrJ, then you would need a multivalued field.
>
> What is replacing the commas with spaces? I don't see anything here that would do that. It sounds like that part of your indexing is not working.
>
> Thanks, Shawn
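To make the difference between the two tokenizers concrete, here is a rough Python approximation of how each one would treat the values discussed in this thread. It is only a sketch: the real Solr tokenizers have more options, and the assumption that empty tokens are dropped is stated in the code.

```python
import re


def whitespace_tokens(value):
    # Approximates a whitespace tokenizer: split on runs of whitespace.
    return value.split()


def pattern_tokens(value, pattern=","):
    # Approximates a pattern tokenizer whose pattern marks the delimiters.
    # Assumption: empty tokens are dropped.
    return [tok for tok in re.split(pattern, value) if tok]


if __name__ == "__main__":
    for raw in ("4,1", "4 5 1"):
        print(repr(raw), whitespace_tokens(raw), pattern_tokens(raw))
```

With these approximations, `4,1` stays a single term under whitespace splitting, which matches the unexpected facet value reported earlier. Note also that a comma pattern alone would leave a space-separated value such as `4 5 1` as one token, so the fix relies on every writer sending comma-separated values.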
RE: Were changes made to facetting on multivalued fields recently?
Here are the field definitions for both our old and new index... as you can see they are identical. We've been using this chain and field type starting with Solr 1.4 and never had any problem. As for the documents, both indexes are using the same data source. They could be slightly out of sync from time to time but we tend to index them on a daily basis. Both indexes are also using the same code (indexing through SolrJ) to index their content. The source is a column in MySQL that contains entries such as 4,1 that get stored in a multivalued field after replacing commas by spaces.

OLD (4.6.1):
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true" />

NEW (4.7.1):
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true" />

It looks like the /analysis/field handler is not active in our current setup. I will look into this and perform additional checks later as we are currently doing a full reindex of our DB.

Thanks for your time

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: April-09-14 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Were changes made to facetting on multivalued fields recently?

On 4/9/2014 2:15 PM, Erick Erickson wrote:
> Right, but the response in the doc when you make a request is almost, but not quite totally, unrelated to how facet values are tallied. It's all about what tokens are actually in your index, which you can see in the schema browser...

Supplement to what Erick has told you:

SOLR-5512 seems to be related to facets using docValues. The commit for that issue looks like it only touches on that specifically. If you do not have (and never have had) docValues on this field, then SOLR-5512 should not apply. I am reasonably sure that for facets on fields with docValues, your facets would reflect the *stored* information, not the indexed information.

Finally, I don't think that docValues work on fieldtypes whose class is solr.TextField, which is the only class that can have an analysis chain that would turn 4 5 1 into three separate tokens. The response that you shared where the value is 4 5 1 looks like there is only one value in the field -- so for that document, it is effectively the same as one that is single-valued.

Bottom line: It looks like either your analysis chain is working differently in the newer version, or you have documents in your newer index that are not in the older one.

Can you share the field and fieldType definitions from both versions? Did your luceneMatchVersion change with the upgrade? If you are using DIH to populate your index, can you also share your DIH config?

Thanks, Shawn
RE: Were changes made to facetting on multivalued fields recently?
The SQL query contains a Replace statement that does this.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: April-10-14 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Were changes made to facetting on multivalued fields recently?

On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote:
> Here are the field definitions for both our old and new index... as you can see they are identical. We've been using this chain and field type starting with Solr 1.4 and never had any problem. As for the documents, both indexes are using the same data source. They could be slightly out of sync from time to time but we tend to index them on a daily basis. Both indexes are also using the same code (indexing through SolrJ) to index their content. The source is a column in MySQL that contains entries such as 4,1 that get stored in a multivalued field after replacing commas by spaces.
>
> OLD (4.6.1):
> <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   </analyzer>
> </fieldType>
> <field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true" />

Just so you know, there's nothing here that would require the field to be multivalued. WhitespaceTokenizerFactory does not create multiple field values, it creates multiple terms. If you are actually inserting multiple values for the field in SolrJ, then you would need a multivalued field.

What is replacing the commas with spaces? I don't see anything here that would do that. It sounds like that part of your indexing is not working.

Thanks, Shawn
Were changes made to facetting on multivalued fields recently?
Hi All, We just discovered that the response from Solr (4.7.1) when faceting on one of our multi-valued fields has changed considerably. In the past (4.6.1 and prior versions as well) we used to have something like this (there are 7 possible values for this attribute):

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="ad_job_type_id">
      <int name="1">11454652</int>
      <int name="4">11387070</int>
      <int name="5">2095603</int>
      <int name="3">809992</int>
      <int name="2">567244</int>
      <int name="6">139389</int>
      <int name="7">4120</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
</lst>

And now with 4.7.1 we are getting this:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="ad_job_type_id">
      <int name="1">10954552</int>
      <int name="4">10884418</int>
      <int name="5">2000530</int>
      <int name="3">784491</int>
      <int name="2">535935</int>
      <int name="4,1">134826</int>
      <int name="5,1">11770</int>
      ... there are too many values to list them all ...

I checked the change log for 4.7.1 and only saw an optimization made for https://issues.apache.org/jira/browse/SOLR-5512

Is there any new configuration directive that we should be aware of? Thanks
RE: Were changes made to facetting on multivalued fields recently?
Thanks Erick, I will check this as soon as I can. In the meantime, here is a sample query and how the field looks in our index. It looks good to me (at least, that is what shows up in our other, older indexes as well).

http://10.0.5.227:8201/solr/Current/select?q=*:*&fl=ad_job_type_id&fq=ad_job_type_id:[*%20TO%20*]&facet=on&facet.field=ad_job_type_id&rows=1

<result name="response" numFound="12204004" start="0" maxScore="1.0">
  <doc>
    <arr name="ad_job_type_id">
      <str>4 5 1</str>
    </arr>
  </doc>
</result>

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: April-09-14 2:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Were changes made to facetting on multivalued fields recently?

That is...um...very strange. It looks to me like you have somehow indexed a bunch of new values. I'm guessing here, but it's suspicious that you have a value 4,1. Should that have been indexed as 4 and 1 as separate tokens? So here's what I'd do:

1. Take a look at the solr/admin/schema browser output for that field in the two versions. I suspect you'll see 7 values in 4.6 and a bazillion in 4.7.1.
2. If 1 is true, take a look at the admin/analysis page for the field in question and see some sample index-time inputs, especially for the theoretical 4,1 entries. I suspect that 4.6 will break these up into two tokens and 4.7.1 won't.
3. If 2 is true, take a very careful look at the index-time analysis chains in the two versions. I bet they're different and that accounts for your observations.
4. Try 1-3, discover I'm totally off base, and paste the schema.xml definitions for the field in question in both 4.6 and 4.7.1 to this thread and we can take a look.

This should not have changed between 4.6 and 4.7.1, at least not intentionally.

Best, Erick

On Wed, Apr 9, 2014 at 11:04 AM, Jean-Sebastien Vachon <jean-sebastien.vac...@wantedanalytics.com> wrote:
> Hi All, We just discovered that the response from Solr (4.7.1) when faceting on one of our multi-valued fields has changed considerably. In the past (4.6.1 and prior versions as well) we used to have something like this (there are 7 possible values for this attribute):
>
> <lst name="facet_counts">
>   <lst name="facet_queries"/>
>   <lst name="facet_fields">
>     <lst name="ad_job_type_id">
>       <int name="1">11454652</int>
>       <int name="4">11387070</int>
>       <int name="5">2095603</int>
>       <int name="3">809992</int>
>       <int name="2">567244</int>
>       <int name="6">139389</int>
>       <int name="7">4120</int>
>     </lst>
>   </lst>
>   <lst name="facet_dates"/>
> </lst>
>
> And now with 4.7.1 we are getting this:
>
> <lst name="facet_counts">
>   <lst name="facet_queries"/>
>   <lst name="facet_fields">
>     <lst name="ad_job_type_id">
>       <int name="1">10954552</int>
>       <int name="4">10884418</int>
>       <int name="5">2000530</int>
>       <int name="3">784491</int>
>       <int name="2">535935</int>
>       <int name="4,1">134826</int>
>       <int name="5,1">11770</int>
>       ... there are too many values to list them all ...
>
> I checked the change log for 4.7.1 and only saw an optimization made for https://issues.apache.org/jira/browse/SOLR-5512
>
> Is there any new configuration directive that we should be aware of? Thanks
RE: Update single field through SolrJ
Hi, Thanks for pointing me in the proper direction. I managed to change my code to send atomic updates through SolrJ, but this morning we experienced something weird. I sent a large batch of updates and deletes through SolrJ and our cloud quickly became unusable and unresponsive (no leader for a shard, etc.). We looked through the logs and could not find a particular reason for this. We waited quite some time, but some nodes were not showing any progress in their recovery, so we restarted them (we are running Tomcat 7.0.39) and everything came back as if nothing had happened.

Has anyone experienced something similar?

We are currently running Solr 4.6.1 on a 5-node cluster with both ZK 3.4.5 and Solr on them (ZK has its own storage device to minimize the impact). Both are also running under JRE 1.7.0_21 in 64-bit mode. Our index has 5 shards with 2 replicas.

Thanks for your help

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: March-28-14 3:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Update single field through SolrJ

On 3/28/2014 1:02 PM, Jean-Sebastien Vachon wrote:
> I'd like to know how (if it is possible) to update a field's value using SolrJ. I looked at the API and could not figure it out, so for now I'm using the UpdateHandler by sending it a JSON formatted document illustrating the required changes. Is there a way to do the same through SolrJ?

The feature you are after is called Atomic Updates. In order to use this feature *all* of your fields must be stored, except for copyField destinations. See especially the Caveats and Limitations section of the first link below:

http://wiki.apache.org/solr/Atomic_Updates
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

To do this with SolrJ, you must use a Map for the field value instead of just one or more regular values:

http://stackoverflow.com/questions/16234045/solr-how-to-use-the-new-field-update-modes-atomic-updates-with-solrj

Thanks, Shawn
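The "Map as field value" idea is easiest to see in the JSON form of an atomic update, where each changed field maps to a small object naming the modifier. The sketch below only builds that payload; the document id and the field name `position_score` are chosen purely for illustration, and sending the payload to the update handler is left out.

```python
import json


def atomic_update(doc_id, changes, id_field="id"):
    """Build one atomic-update document: each changed field maps to {"set": value}."""
    doc = {id_field: doc_id}
    for field, value in changes.items():
        # "set" replaces the stored value; the nested mapping plays the same
        # role as the Map used for the field value in SolrJ.
        doc[field] = {"set": value}
    return doc


# Hypothetical document id and field name, for illustration only.
payload = json.dumps([atomic_update("doc-42", {"position_score": 0.75})])

if __name__ == "__main__":
    print(payload)
```

Fields that are not mentioned in the update document keep their stored values, which is why the caveat about all fields being stored matters.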
Update single field through SolrJ
Hi All, I'd like to know how (if it is possible) to update a field's value using SolrJ. I looked at the API and could not figure it out, so for now I'm using the UpdateHandler by sending it a JSON formatted document illustrating the required changes. Is there a way to do the same through SolrJ? Thanks
RE: Recherche avec et sans espaces
Hello Antoine, I see only two solutions to your problem.

1) Use synonyms, but you will be limited to the cases known in advance, so this is a solution that does not scale in the long run.
2) Otherwise, consider adding a second field (probably populated through a copyField) that does not use a WhitespaceTokenizer (the KeywordTokenizerFactory class looks like a good candidate) and search on both fields (fq=champ1:la redoute OR champ2:la redoute).

The administration page (/solr/admin/analysis.jsp) lets you see exactly what happens for different values and fields.

Also, you will have a much better chance of getting answers to your questions if they are written in English. ;)

Good luck

-----Original Message-----
From: Antoine REBOUL [mailto:antoine.reb...@gmail.com]
Sent: November-04-13 11:42 AM
To: solr-user@lucene.apache.org
Subject: Recherche avec et sans espaces

Hello, I would like searches on a text field to return results even when the spaces are entered incorrectly (for example: la redoute=laredoute). Today my text field is defined as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.ElisionFilterFactory" articles="elisions.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms2.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.ElisionFilterFactory" articles="elisions.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Thanks in advance for any answers. Regards.

Antoine Reboul
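The second suggestion (a separate field that keeps the whole value as one token) can be illustrated outside Solr. The sketch below goes one step further than the reply and also removes whitespace, so that both spellings produce the same key; that extra step is an assumption made here for illustration, and inside Solr it would have to be done by the analysis chain of the second field.

```python
def squashed_key(text):
    # Lowercase and remove all whitespace so that "La Redoute" and
    # "laredoute" end up with the same single-token key.
    return "".join(text.lower().split())


if __name__ == "__main__":
    for query in ("la redoute", "laredoute", "La  Redoute"):
        print(repr(query), "->", squashed_key(query))
```

Because the same normalization is applied to the indexed value and to the query, a match on this key field does not depend on where the user typed the spaces.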
RE: Regarding improving performance of the solr
Have you checked the hit ratio of the different caches? Try to tune them to get rid of all evictions if possible. Tuning the size of the caches and warming your searcher can give you a pretty good improvement. You might want to check your analysis chain as well to make sure you're not doing anything that is not necessary.

-----Original Message-----
From: prabu palanisamy [mailto:pr...@serendio.com]
Sent: September-06-13 4:55 AM
To: solr-user@lucene.apache.org
Subject: Regarding improving performance of the solr

Hi, I am currently using Solr 3.5.0 with Java 1.6 and have indexed a Wikipedia dump (50 GB). I am searching Solr with text (which is actually Twitter tweets). Currently it takes an average of 210 milliseconds for each post, out of which 200 milliseconds are consumed by the Solr server (QTime). I used the jconsole monitoring tool. The stats are:

Heap usage - 10-50 MB
No. of threads - 10-20
No. of classes - 3800
CPU usage - 10-15%

Currently I am loading all the fields of the Wikipedia dump. I only need the Freebase category and the Wikipedia category. I want to know how to optimize the Solr server to improve the performance. Could you please help me optimize the performance?

Thanks and Regards
Prabu
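A small helper makes the "check the hit ratio and evictions" advice repeatable across caches. This is only a sketch: it assumes the statistics have already been collected into a dictionary with `lookups`, `hits`, and `evictions` keys (the counter names shown on the Solr cache statistics page), and how they are fetched is left out.

```python
def cache_report(stats):
    """Summarize one cache's counters: hit ratio and whether it evicted entries."""
    lookups = stats.get("lookups", 0)
    hits = stats.get("hits", 0)
    # Guard against division by zero for a cache that was never queried.
    hit_ratio = hits / lookups if lookups else 0.0
    return {
        "hit_ratio": round(hit_ratio, 2),
        "has_evictions": stats.get("evictions", 0) > 0,
    }


if __name__ == "__main__":
    print(cache_report({"lookups": 200, "hits": 150, "evictions": 12}))
```

A cache that reports evictions together with a low hit ratio is a candidate for a larger size; a cache with zero lookups is not being consulted at all, which is a different problem from a badly sized one.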
RE: Flushing cache without restarting everything?
How can you validate that the changes you just made had any impact on the performance of the cloud if you don't have the same starting conditions? What we basically do is run a batch of requests to warm up the index and then launch the benchmark itself. That way we can measure the impact of our change(s). Otherwise there is absolutely no way we can be sure what is responsible for the gain or loss of performance. Restarting a cloud is actually a real pain; I just want to know if there is a faster way to proceed.

-----Original Message-----
From: Dmitry Kan [mailto:solrexp...@gmail.com]
Sent: August-22-13 7:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Flushing cache without restarting everything?

But is it really good benchmarking if you flush the cache? Wouldn't you want to benchmark against a system that is comparable to what runs under real (=production) load?

Dmitry

On Tue, Aug 20, 2013 at 9:39 PM, Jean-Sebastien Vachon <jean-sebastien.vac...@wantedanalytics.com> wrote:
> I just want to run benchmarks and want to have the same starting conditions.
>
> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: August-20-13 2:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Flushing cache without restarting everything?
>
> Why? What are you trying to achieve with this? --wunder
>
> On Aug 20, 2013, at 11:04 AM, Jean-Sebastien Vachon wrote:
> > Hi All, Is there a way to flush the cache of all nodes in a Solr Cloud (by reloading all the cores, through the collection API, ...) without having to restart all nodes? Thanks
RE: Flushing cache without restarting everything?
I was afraid someone would tell me that... thanks for your input.

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: August-22-13 9:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Flushing cache without restarting everything?

On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
> Is there a way to flush the cache of all nodes in a Solr Cloud (by reloading all the cores, through the collection API, ...) without having to restart all nodes?

As MMapDirectory shares data with the OS disk cache, flushing of Solr-related caches on a machine should involve:

1) Shut down all Solr instances on the machine
2) Clear the OS read cache ('sudo echo 1 > /proc/sys/vm/drop_caches' on a Linux box)
3) Start the Solr instances

I do not know of any Solr-supported way to do step 2. For our performance tests we use custom scripts to perform the steps.

- Toke Eskildsen, State and University Library, Denmark
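The three steps above can be sketched as a small plan builder that a benchmark script could hand to its own command runner. Everything about stopping and starting Solr is site-specific, so `stop_cmd` and `start_cmd` are placeholders supplied by the caller. The sketch deliberately pipes through `sudo tee`, because a redirect written after `sudo echo` is performed by the unprivileged shell rather than by root.

```python
def drop_caches_command(level=1):
    """Shell command that asks Linux to drop cached data.

    Level 1 drops the page cache, 2 drops dentries and inodes, 3 drops both.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # sync first so dirty pages are written out before the cache is dropped.
    return f"sync && echo {level} | sudo tee /proc/sys/vm/drop_caches"


def flush_plan(stop_cmd, start_cmd, level=1):
    # Order matters: stop Solr, clear the OS cache, then start Solr again.
    return [stop_cmd, drop_caches_command(level), start_cmd]


if __name__ == "__main__":
    # The stop/start commands below are hypothetical placeholders.
    for step in flush_plan("stop-solr-placeholder", "start-solr-placeholder"):
        print(step)
```

The function only returns the commands; running them on every node of the cloud, in this order, is left to whatever tooling the benchmark already uses.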
Flushing cache without restarting everything?
Hi All, Is there a way to flush the cache of all nodes in a Solr Cloud (by reloading all the cores, through the collection API, ...) without having to restart all nodes? Thanks
RE: Flushing cache without restarting everything?
I just want to run benchmarks and want to have the same starting conditions.

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: August-20-13 2:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Flushing cache without restarting everything?

Why? What are you trying to achieve with this? --wunder

On Aug 20, 2013, at 11:04 AM, Jean-Sebastien Vachon wrote:
> Hi All, Is there a way to flush the cache of all nodes in a Solr Cloud (by reloading all the cores, through the collection API, ...) without having to restart all nodes? Thanks
Huge discrepancy between QTime and ElapsedTime
Hi All, I am running some benchmarks to tune our Solr 4.3 cloud and noticed that while the reported QTime is quite satisfactory (100 ms or so), the elapsed time is quite large (around 5 seconds). The collection contains 12.8M documents and the index size on disk is about 35 GB. I have only one shard and 4 replicas (we intend to have 5 shards but wanted to see how Solr would perform with only one shard so that we could benefit from all Solr functions).

I checked for huge GC pauses but found none. I also checked whether we had intensive IO and we don't. All five nodes have 48 GB of RAM, of which 4 GB is allocated to Tomcat 7 and Solr. The caches have a hit ratio over 80%. Zookeeper is running on the same boxes (5 instances, one per node) but there does not seem to be much activity going on.

This is a sample query:

http://10.0.5.211:8201/solr/Current/select?fq=position_first_seen_date_id:[3484 TO 3516]&q= (title:java OR semi_clean_title:java OR ad_description:java)&rows=10&start=0&fl=job_id,position_id,super_alias_id,advertiser,super_alias,credited_source_id,position_first_seen_date_id,position_last_seen_date_id, position_posted_date_id, position_refreshed_date_id, position_job_type_id, position_function_id,position_green_code,title_id,semi_clean_title_id,clean_title_id,position_empl_count,place_id, state_id,county_id,msa_id,country_id,position_id,position_job_type_mva, ad_activity_status_id, position_score, ad_score,position_salary,position_salary_range_id,position_salary_source,position_naics_6_code,position_education_level_id, is_staffing,is_bulk,is_anonymous,is_third_party,is_dirty,ref_num,tags,lat,long,position_duns_number,url,advertiser_id, title, semi_clean_title, ad_description, position_description, ad_bls_salary, position_bls_salary, covering_source_id, content_model_id,position_soc_2011_8_code&group.field=position_id&group=true&group.ngroups=false&group.main=true&sort=position_first_seen_date_id desc,score desc

Any idea what could cause this?
RE: Huge discrepancy between QTime and ElapsedTime
Thanks Shawn and Scott for your feedback. It is really appreciated.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: August-14-13 12:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Huge discrepancy between QTime and ElapsedTime

> On 8/14/2013 9:09 AM, Jean-Sebastien Vachon wrote:
> > I am running some benchmarks to tune our Solr 4.3 cloud and noticed that while the reported QTime is quite satisfactory (100 ms or so), the elapsed time is quite large (around 5 seconds). The collection contains 12.8M documents and the index size on disk is about 35 GB. I have only one shard and 4 replicas (we intend to have 5 shards but wanted to see how Solr would perform with only one shard so that we could benefit from all Solr functions).
>
> As your other reply from Scott says, you may be dealing with the fact that Solr must fetch stored field data from the index on disk and decompress it. Solr 4.1 and later have compressed stored fields. There is no way other than writing custom Solr code to turn off the compression. If the documents are very large, the decompression step can be a big performance penalty.
>
> You have a VERY large field list (the fl parameter). Have you tried just leaving that parameter off so that Solr will return all stored fields instead of identifying each field? This might not help at all, I'm just putting it out there as something to try.

I will give it a try.

> You also have grouping enabled. From what I understand, that can be slow. If you turn that off, what happens to your elapsed times?

Yes, grouping is slow, but not as bad as it was in Solr 1.4, which we are still using in production with a similar index (actually it has 17M documents on 6 shards, but all on the same server). I expect grouping to be much faster in 4.x than in 1.4, and I don't have this problem in 1.4. It's true, however, that I have some additional stored fields in my new setup. But this was done to limit the number of times I have to fetch the information from MySQL.

> Your free RAM vs. index size is good, assuming that there's nothing else on your Solr servers. With 12.8 million documents plus the use of grouping and sorting, you might need a larger Java heap. Try increasing it to 5GB as an initial test and see if that makes any difference, either good or bad. Your email says you checked for huge GC, but without knowing exactly how you checked, it's difficult to know what you would have actually found.

I turned GC logging on and analyzed the resulting file. I've also taken a few heap dumps and all generations seem to be properly sized. I will give a larger heap a try to see if it affects the performance.

> Thanks, Shawn
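When chasing this kind of gap, it helps to record both numbers for every request. The sketch below is one way to do that, under the assumption that the response is requested in JSON so that `responseHeader.QTime` can be read directly; `fetch` is a caller-supplied function (hypothetical here) that performs the HTTP request and returns the raw response body.

```python
import json
import time


def time_query(fetch):
    """Call fetch() and compare wall-clock time with the reported QTime."""
    started = time.perf_counter()
    body = fetch()
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    qtime_ms = json.loads(body)["responseHeader"]["QTime"]
    return {
        "qtime_ms": qtime_ms,
        "elapsed_ms": elapsed_ms,
        # Time not accounted for by QTime, e.g. reading stored fields,
        # writing the response, and network transfer.
        "outside_qtime_ms": elapsed_ms - qtime_ms,
    }


if __name__ == "__main__":
    def fake_fetch():
        # Stand-in for a real HTTP call: pretend the round trip took 50 ms.
        time.sleep(0.05)
        return '{"responseHeader": {"status": 0, "QTime": 10}}'

    print(time_query(fake_fetch))
```

Running the same query with a trimmed `fl` list or with grouping turned off, and comparing the `outside_qtime_ms` values, is a cheap way to test the suggestions made above.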
queryResultCache showing all zeros
Hi, We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran about 200 000 queries taken from our production environment and measured the performance of the cloud over a collection of 14M documents with the default Solr settings. We are now trying to tune the different caches and when I look at each node of the cloud, all of them are showing no activity (see below) regarding the queryResultCache... all other caches are showing some activity. Any idea what could cause this?

* org.apache.solr.search.LRUCache
* version: 1.0
* description: LRU Cache(maxSize=512, initialSize=512)
* src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java $
* stats:
  * lookups: 0
  * hits: 0
  * hitratio: 0.00
  * inserts: 0
  * evictions: 0
  * size: 0
  * warmupTime: 0
  * cumulative_lookups: 0
  * cumulative_hits: 0
  * cumulative_hitratio: 0.00
  * cumulative_inserts: 0
  * cumulative_evictions: 0
RE: queryResultCache showing all zeros
Looks like the problem might not be related to Solr but to a proprietary system we have on top of it. I made some queries with facets and the cache was updated. We are looking into this... I should not have assumed that the problem was coming from Solr ;) I'll let you know if there is anything.

From: Chris Hostetter
Sent: Wednesday, July 31, 2013 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: queryResultCache showing all zeros

: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran
: about 200 000 queries taken from our production environment and measured
: the performance of the cloud over a collection of 14M documents with the
: default Solr settings. We are now trying to tune the different caches
: and when I look at each node of the cloud, all of them are showing no
: activity (see below) regarding the queryResultCache... all other caches
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing? Do you have useFilterForSortedQuery in your solrconfig.xml?

-Hoss
RE: queryResultCache showing all zeros
OK, I might have found a Solr issue after I fixed a problem in our system. This is the kind of query we are making:

http://10.0.5.214:8201/solr/Current/select?fq=position_refreshed_date_id:[2747%20TO%203501]&fq=position_soc_2011_8_code:41101100&fq=country_id:1&fq=position_job_type_id:4&fq=position_education_level_id:8&fq=position_salary_range_id:2&fq=is_dirty:false&fq=is_staffing:false&fq=-position_soc_2011_2_code:99&fq=-covering_source_id:(839%20OR%201145%20OR%2025%20OR%20802%20OR%20777%20OR%2085%20OR%20881%20OR%20775%20OR%201558%20OR%20743%20OR%20800%20OR%201580%20OR%201147%20OR%201690%20OR%20674%20OR%20894%20OR%20791)&q=%20(title:photographer%20OR%20ad_description:photographer%20OR%20super_alias:photographer)%20AND%20(_val_:%22sum(product(75,div(5000,sum(50,sub(3500,position_refreshed_date_id,product(0.75,job_score),product(0.75,source_score))%22)&facet=true&facet.mincount=1&f.state_id.facet.limit=10&facet.field=state_id&facet.field=position_salary_range_id&facet.field=position_job_type_id&facet.field=position_naics_6_code&facet.field=place_id&facet.field=position_education_level_id&facet.field=position_soc_2011_8_code&f.position_salary_range_id.facet.limit=10&f.position_job_type_id.facet.limit=10&f.position_naics_6_code.facet.limit=10&f.place_id.facet.limit=10&f.position_education_level_id.facet.limit=10&f.position_soc_2011_8_code.facet.limit=10&rows=10&start=0&fl=job_id,position_id,super_alias_id,advertiser,super_alias,credited_source_id,position_first_seen_date_id,position_last_seen_date_id,%20position_posted_date_id,%20position_refreshed_date_id,%20position_job_type_id,%20position_function_id,position_green_code,title_id,semi_clean_title_id,clean_title_id,position_empl_count,place_id,%20state_id,county_id,msa_id,country_id,position_id,position_job_type_mva,%20ad_activity_status_id,%20position_score,%20ad_score,position_salary,position_salary_range_id,position_salary_source,position_naics_6_code,position_education_level_id,%20is_staffing,is_bulk,is_anonymous,is_third_party,is_dirty,ref_num,tags,lat,long,position_duns_number,url,advertiser_id,%20title,%20semi_clean_title,%20ad_description,%20position_description,%20ad_bls_salary,%20position_bls_salary,%20covering_source_id,%20content_model_id,position_soc_2011_8_code,position_noc_2006_4_id&group.field=position_id&group=true&group.ngroups=true&group.main=true&sort=score%20desc

It's quite long, but this request uses both faceting and grouping. If I remove the grouping then the cache is used. Is this normal behavior or a bug?

Thanks

From: Jean-Sebastien Vachon
Sent: Wednesday, July 31, 2013 2:38 PM
To: solr-user@lucene.apache.org
Subject: RE: queryResultCache showing all zeros

Looks like the problem might not be related to Solr but to a proprietary system we have on top of it. I made some queries with facets and the cache was updated. We are looking into this... I should not have assumed that the problem was coming from Solr ;) I'll let you know if there is anything.

From: Chris Hostetter
Sent: Wednesday, July 31, 2013 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: queryResultCache showing all zeros

: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran
: about 200 000 queries taken from our production environment and measured
: the performance of the cloud over a collection of 14M documents with the
: default Solr settings. We are now trying to tune the different caches
: and when I look at each node of the cloud, all of them are showing no
: activity (see below) regarding the queryResultCache... all other caches
: are showing some activity. Any idea what could cause this?

Can you show us some examples of the types of queries you are executing? Do you have useFilterForSortedQuery in your solrconfig.xml?

-Hoss
RE: queryResultCache showing all zeros
Also, we do not have useFilterForSortedQuery in our config, so we are relying on the default, which I guess is false.
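To compare the grouped and ungrouped variants of a request like the one in this thread, it can help to build the URL from a parameter list so that the `&` separators and repeated `fq` parameters are not lost. The sketch below is only an illustration: it uses a shortened subset of the parameters from the thread, and the function name `build_select_url` is made up for this example. Running both variants and then checking the queryResultCache statistics on each node would show whether grouping is what keeps the cache at zero.

```python
from urllib.parse import urlencode

# Host and core name are the ones quoted in the thread.
BASE_URL = "http://10.0.5.214:8201/solr/Current/select"


def build_select_url(grouped):
    """Return a shortened version of the thread's query, with or without grouping.

    Hypothetical helper: only a handful of the original parameters are kept.
    """
    params = [
        ("q", "title:photographer OR ad_description:photographer"),
        ("fq", "country_id:1"),      # repeated keys are kept as separate parameters
        ("fq", "is_dirty:false"),
        ("facet", "true"),
        ("facet.mincount", "1"),
        ("facet.field", "state_id"),
        ("rows", "10"),
        ("start", "0"),
        ("sort", "score desc"),
    ]
    if grouped:
        params += [
            ("group", "true"),
            ("group.field", "position_id"),
            ("group.ngroups", "true"),
            ("group.main", "true"),
        ]
    # urlencode joins the pairs with '&' and percent-encodes the values.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url(grouped=True))
    print(build_select_url(grouped=False))
```

Keeping the two variants identical except for the `group.*` parameters makes it easier to attribute any difference in cache activity to grouping alone.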
RE: Problem with document routing with Solr 4.2.1
Hi All,

Evan Sayer from LucidWorks found the problem in our schema, so this problem is not related to SolrCloud itself (well, it is, but at least it is not a bug).

I don't know why :( but at some point we changed the type of the id field from 'string' to 'text'. Since we are doing custom hashing and the id field was tokenized, Solr could not find the documents again when collecting the responses from each shard.

We changed the id field back to the 'string' type and it is now working.
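The effect described in this message can be pictured with a toy model. The sketch below is not Solr code: the tokenizer is an assumed stand-in for whatever analyzer the 'text' type used, and the function names are invented for illustration. It only shows why an exact lookup by the full id can fail once the id value has been split into several terms.

```python
import re


def index_terms(value, tokenized):
    """Toy model: return the set of terms indexed for one id value."""
    if not tokenized:
        # 'string'-like field: the whole value is kept as a single term.
        return {value}
    # 'text'-like field (assumed analyzer): split on non-alphanumerics, lowercase.
    return {tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", value) if tok}


def can_find_by_id(doc_id, tokenized):
    """An exact-term lookup succeeds only if the full id is an indexed term."""
    return doc_id in index_terms(doc_id, tokenized)


if __name__ == "__main__":
    for tokenized in (False, True):
        print(tokenized, index_terms("fieldA!42", tokenized),
              can_find_by_id("fieldA!42", tokenized))
```

In this model the untokenized id is found and the tokenized one is not, which matches the symptom in the thread (documents counted, but no fields returned in a distributed query).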
Problem with document routing with Solr 4.2.1
Hi All,

I just started indexing data in my brand new Solr Cloud running on 4.2.1. Since I am a big user of the grouping feature, I need to route my documents to the proper shard. Following the instructions found here:

http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud

I set my document id to something like 'fieldA!id', where fieldA is the key I want to use to distribute my documents. (All documents with the same value for fieldA will be sent to the same shard.)

When I query my index, I can see that the number of documents increases, but there are no fields at all in the index.

http://10.0.5.211:8201/solr/Current/select?q=*:*

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">11</int>
        <lst name="params">
          <str name="q">*:*</str>
        </lst>
      </lst>
      <result name="response" numFound="26318" start="0" maxScore="1.0"/>
    </response>

Specifying fields in the 'fl' parameter does nothing.

What am I doing wrong?
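The idea behind an id such as 'fieldA!id' can be sketched as follows: only the part before the '!' decides the shard, so documents sharing that prefix end up together. This is a rough illustration, not Solr's router. It uses CRC32 and a simple modulo as stand-ins for Solr's own hash and shard ranges, and the shard count of 15 is the numShards value mentioned elsewhere in this thread. The name `shard_for` is hypothetical.

```python
import zlib

NUM_SHARDS = 15  # numShards value quoted in this thread


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard from the part of the id before '!'.

    Stand-in hash (CRC32 modulo the shard count); Solr's real routing differs.
    """
    route_key = doc_id.split("!", 1)[0]
    return zlib.crc32(route_key.encode("utf-8")) % num_shards


if __name__ == "__main__":
    for doc_id in ("acme!1", "acme!2", "globex!1"):
        print(doc_id, "-> shard", shard_for(doc_id))
```

Co-locating all documents with the same prefix is what grouping on that key relies on, since each group can then be computed entirely within one shard.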
RE: Problem with document routing with Solr 4.2.1
I know. If I stop routing the documents and simply use a standard 'id' field, then I am getting back my fields.

I forgot to tell you how the collection was created:

http://localhost:8201/solr/admin/collections?action=CREATE&name=Current&numShards=15&replicationFactor=3&maxShardsPerNode=9

Since I am using the numShards parameter, composite routing should be working... unless I misunderstood something.

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: May-23-13 10:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Problem with document routing with Solr 4.2.1

That's strange. The default value of the rows param is 10, so you should be getting 10 results back unless your StandardRequestHandler config in solrconfig has set rows to 0 or none of your fields are stored.

--
Regards,
Shalin Shekhar Mangar.
RE: Problem with document routing with Solr 4.2.1
If that can help: adding distrib=false or shard.keys= gives back results.
RE: Problem with document routing with Solr 4.2.1
I must add that shard.keys= does not return anything on two of my nodes. But that is to be expected, since I'm using a replication factor of 3 on a cloud of 5 servers.
RE: SOlr 3.5 and sharding
Hi Erick,

It looks like we are saying the exact same thing but with different terms ;) I looked at the Solr glossary and you might be right; maybe I should talk about partitions instead of shards.

Since my last message, I've configured the replication between the master and slave and everything is working fine, except for my original question about the number of documents not matching my expectations.

I'll try to clarify a few things and come back to this question...

Machine A (which I called the master node) is where the indexing takes place. It consists of four Solr instances that will (eventually) each contain 1/4 of the entire collection. It's just that, at this moment, since I have no control over which partition a given document is sent to, I made copies of the same index for all partitions. Each Solr instance has a replication handler configured.

I will eventually get to the point of changing the indexing code to distribute documents evenly across all partitions, but the person who can give me access to this portion is not available right now, so I can do nothing about it.

Machine B has the same four shards, set up to be replicas of the corresponding shards on machine A. Machine B also contains another Solr instance with the default handler configured to use the four local partitions. This instance receives clients' requests, collects the results from each partition and then selects the best matches to form the final response. We intend to add new slaves that are exact copies of Machine B and load-balance client requests across all slaves.

My original question was: if each partition has 1000 documents matching a certain keyword, and I know all partitions have the same content, then I was expecting to receive 4*1000 documents for the same keyword. But that is not the case.

Replication is not an issue here, since the same request on the master node gives me the same result. Each shard, when called individually, gives 1000 documents. But when I call them using the shards=xxx parameter, I get a little less than 4000 documents.

I was just curious to know why this was happening... Is this a bug? Or something I am misunderstanding...

Thanks for your time and contribution to Solr!

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: January-17-13 8:46 AM
To: solr-user@lucene.apache.org
Subject: Re: SOlr 3.5 and sharding

You're still confusing shards (or at least mixing up the terminology) with simple replication. Shards are when you split up the index into several sub-indexes and configure the sub-indexes to know about each other. Say you have 1M docs in 2 shards. 500K of them would go on one shard and 500K on the other. But logically you have a single index of 1M docs. So the two shards have to know about each other, and when you send a request to one of them, it automatically queries the other (as well as itself), collects the responses and combines them, returning the top N to the requester.

This is totally different from replication. In replication (master/slave), each node has all 1M documents. Each node can work totally in isolation. An incoming request is handled by the slave without contacting any other node.

If you're copying around indexes AND configuring them as though they were shards, each request will be distributed to all shards and the results collated, giving you the same doc repeatedly in your result set.

If you have no access to the indexing code, you really can't go to a sharded setup.

Polling is when the slaves periodically ask the master "has anything changed?" If so, the slave pulls down the changes. The polling interval is configured in solrconfig.xml _on the slave_. So let's say you index docs to the master. For some interval, until the slaves poll the master and get an updated index, the number of searchable docs on the master will be different than for the slaves. Additionally, you may have the issue of the polling intervals for the slaves being offset from one another, so for some brief interval the counts on the slaves may be different as well.

Best
Erick
RE: SOlr 3.5 and sharding
Hi Erick,

Thanks for your comments, but I am migrating an existing index (single instance) to a sharded setup and currently I have no access to the code involved in the indexing process. That's why I made a simple copy of the index on each shard.

In the end, the data will be distributed among all shards, and I was just curious to know why I did not get the expected number of documents with my four shards.

Can you elaborate on this polling interval thing? I am pretty sure I never heard about this...

Regards

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: January-15-13 8:00 AM
To: solr-user@lucene.apache.org
Subject: Re: SOlr 3.5 and sharding

You're confusing shards and slaves here. Shards are splitting a logical index amongst N machines, where each machine contains a portion of the index. In that setup, you have to configure the slaves to know about the other shards, and the incoming query has to be distributed amongst all the shards to find all the docs.

In your case, since you're really replicating (rather than sharding), you only have to query _one_ slave; the query doesn't need to be distributed.

So pulling all the sharding stuff out of your config files, putting a load balancer in front of your slaves and only sending the request to one of them would be the place I'd start.

Also, don't be at all surprised if the number of hits from the _master_ (which you shouldn't be searching, BTW) is different than the slaves; there's the polling interval to consider.

Best
Erick
RE: SOlr 3.5 and sharding
OK, I see what Erick meant now. Thanks.

The original index I'm working on contains about 120k documents. Since I have no access to the code that pushes documents into the index, I made four copies of the same index.

The master node contains no data at all; it simply uses the data available in its four shards. Knowing that I have 1000 documents matching the keyword java on each shard, I was expecting to receive 4000 documents out of my sharded setup. There are only a few documents that are not accounted for (the result count is about 3996, which is pretty close but not accurate).

Right now, the index is static, so there is no need for any replication and the polling interval has no effect. Later this week, I will configure the replication and have the indexing modified to distribute the documents to each shard using a simple ID modulo 4 rule.

Were my expectations wrong about the number of documents?

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: January-15-13 9:21 AM
To: solr-user@lucene.apache.org
Subject: Re: SOlr 3.5 and sharding

He was referring to a master/slave setup, where a slave will poll the master periodically asking for index updates. That frequency is configured in solrconfig.xml on the slave.

So, you are saying that you have, say, 1m documents in your master index. You then copy your index to four other boxes. At that point you have 1m documents on each of those four. Eventually, you'll delete some docs, so you'd have 250k on each. You're wondering why, before the deletes, you're not seeing 1m docs on each of your instances. Or are you wondering why you're not seeing 1m docs when you do a distributed query across all four of these boxes?

Is that correct?

Upayavira
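The "ID modulo 4" rule mentioned in this message can be sketched in a few lines. This is only an illustration of the rule as stated, assuming numeric, roughly consecutive ids; the function names are invented, and the 120k figure is the index size quoted in the thread.

```python
from collections import Counter

NUM_PARTITIONS = 4


def partition_for(doc_id, num_partitions=NUM_PARTITIONS):
    """Return the partition index for a numeric document id."""
    return doc_id % num_partitions


def distribution(doc_ids):
    """Count how many of the given ids land on each partition."""
    return Counter(partition_for(doc_id) for doc_id in doc_ids)


if __name__ == "__main__":
    # With consecutive ids, the 120k documents split evenly across 4 partitions.
    print(distribution(range(120_000)))
```

With such a rule each document lives on exactly one partition, which is the precondition for a distributed query to report the expected total.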
RE: SOlr 3.5 and sharding
OK, that was my first thought... thanks for the confirmation.

-Original Message-
From: Michael Ryan [mailto:mr...@moreover.com]
Sent: January-14-13 10:06 AM
To: solr-user@lucene.apache.org
Subject: RE: SOlr 3.5 and sharding

If you have the same documents -- with the same uniqueKey -- across multiple shards, the count will not be what you expect. You'll need to ensure that each document exists only on a single shard.

-Michael

-Original Message-
From: Jean-Sebastien Vachon [mailto:jean-sebastien.vac...@wantedanalytics.com]
Sent: Monday, January 14, 2013 9:59 AM
To: solr-user@lucene.apache.org
Subject: SOlr 3.5 and sharding

Hi,

I'm setting up a small Solr setup consisting of 1 master node and 4 shards. For now, all four shards contain the exact same data. When I perform a query on each individual shard for the word 'java', I receive the same number of docs (as expected). However, when I go through the master node using the shards parameter, the number of results is off by a few documents. There is nothing special in my setup, so I'm looking for hints on why I am getting this problem.

Thanks
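One way to picture why the distributed count ends up "a little less" than 4 x 1000 is the simplified model below. It assumes, and this is only an assumption that has not been checked against the Solr 3.5 source, that the coordinating node adds up the per-shard counts and can only discount duplicates it actually sees among the rows returned by each shard. The function name and data layout are invented for the example.

```python
def merge_shard_results(shard_results, rows=10):
    """Merge per-shard results given as (num_found, [(doc_id, score), ...]).

    Toy model: the total starts as the sum of the per-shard counts, and only
    duplicates seen among the returned rows are subtracted from it.
    """
    total = sum(num_found for num_found, _ in shard_results)
    seen = {}
    for _, docs in shard_results:
        for doc_id, score in docs:
            if doc_id in seen:
                total -= 1  # same uniqueKey already returned by another shard
                seen[doc_id] = max(seen[doc_id], score)
            else:
                seen[doc_id] = score
    # Keep the best-scoring unique documents as the final page of results.
    top = sorted(seen.items(), key=lambda item: item[1], reverse=True)[:rows]
    return total, top


if __name__ == "__main__":
    # Four identical shards, each reporting 1000 matches and the same two top rows.
    shards = [(1000, [("a", 1.0), ("b", 0.5)])] * 4
    print(merge_shard_results(shards))
```

Under this model the reported total depends on how many duplicate rows happen to be returned, which is why the safe fix is the one given above: make sure each uniqueKey exists on a single shard only.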
RE: FieldCache
10 unique terms on 1.5M documents, each with 50+ fields? I don't think so ;)

What I mean is controlling its size like the other caches. There are currently no options in solrconfig.xml to control this cache. Is Solr/Lucene managing this all by itself?

It could be that my understanding of the FieldCache is wrong. I thought this was the main cache for Lucene. Is that right?

Thanks for your feedback

-Original Message-
From: pravesh [mailto:suyalprav...@yahoo.com]
Sent: May-26-11 2:58 AM
To: solr-user@lucene.apache.org
Subject: Re: FieldCache

This is because you may have only 10 unique terms in your indexed field. BTW, what do you mean by controlling the FieldCache?
FieldCache
Hi All, Since there is no way of controlling the size of Lucene's internal FieldCache, how can we make sure that we are making good use of it? One of my shards has close to 1.5M documents and the fieldCache contains only about 10 elements. Is there anything we can do to control this? Thanks
SOLR-2209
Hi All, I am having some problems with the presence of unnecessary parentheses in my query. A query such as: title:software AND (title:engineer) will return no results. Removing the parentheses fixes the issue, but since my users can enter parentheses themselves I need to find a way to fix or work around this bug. I found that this is related to SOLR-2209 but there is no activity on this bug. Does anyone know if this will get fixed some time in the future or if it is already fixed in Solr 4? Otherwise, could someone point me to the code handling this so that I can attempt to make a fix? Thx
RE: SOLR-2209
I'm using Solr 1.4... I thought I had a case without a NOT but it seems to work now :S It might be a glitch on my server. The problem is easily reproducible with the NOT operator: http://10.0.5.221:8983/jobs/select?q=title:java%20AND%20(-title:programmer) http://10.0.5.221:8983/jobs/select?q=title:java%20AND%20(-(title:programmer)) Both queries return 0 results, while... http://10.0.5.221:8983/jobs/select?q=title:java%20AND%20-(title:programmer) (note the position of the negation operator) returns more than 50 000 results. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: May-19-11 9:53 AM To: solr-user@lucene.apache.org Subject: Re: SOLR-2209 What version of Solr are you using? Because this works fine for me. Could you attach the results of adding debugQuery=on in both instances? The parsed form of the query is identical in 1.4.1 as far as I can tell. The bug you're referencing is a peculiarity of the not (-) operator I think. Best Erick On Thu, May 19, 2011 at 7:25 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedtech.com wrote: Hi All, I am having some problems with the presence of unnecessary parentheses in my query. A query such as: title:software AND (title:engineer) will return no results. Removing the parentheses fixes the issue, but since my users can enter parentheses themselves I need to find a way to fix or work around this bug. I found that this is related to SOLR-2209 but there is no activity on this bug. Does anyone know if this will get fixed some time in the future or if it is already fixed in Solr 4? Otherwise, could someone point me to the code handling this so that I can attempt to make a fix? Thx
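A possible client-side workaround for the two failing URLs above, sketched under the assumption that the cause is a parenthesized group containing only a negative clause: give the group something positive to subtract from by inserting `*:*`. The helper name `guard_negative_groups` is made up, and the regex is deliberately naive (it ignores quoted strings and the NOT keyword), so treat it as a starting point, not a parser.

```python
import re

# Matches an opening parenthesis directly followed by a '-' clause.
_NEGATIVE_GROUP_START = re.compile(r"\(\s*-")


def guard_negative_groups(query: str) -> str:
    """Rewrite '(-x)' as '(*:* -x)' so the group has a positive clause."""
    return _NEGATIVE_GROUP_START.sub("(*:* -", query)


failing = "title:java AND (-title:programmer)"
rewritten = guard_negative_groups(failing)
# The form that already works in the thread is left untouched.
working = guard_negative_groups("title:java AND -(title:programmer)")
```

With this rewrite the first query becomes `title:java AND (*:* -title:programmer)`, which reads as "all documents except programmer" inside the group.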
RE: Exact match on a field with stemming
I'm curious to know why Solr is not respecting the phrase. If it considers "manager" as a phrase... shouldn't it return only documents containing that phrase? -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: April-11-11 3:42 PM To: solr-user@lucene.apache.org Subject: Re: Exact match on a field with stemming Hi, Using quotes means "use this as a phrase", not "use this as a literal". :) I think copying to an unstemmed field is the only/common work-around. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Pierre-Luc Thibeault pierre-luc.thibea...@wantedtech.com To: solr-user@lucene.apache.org Sent: Mon, April 11, 2011 2:55:04 PM Subject: Exact match on a field with stemming Hi all, Is there a way to perform an exact match query on a field that has stemming enabled by using the standard /select handler? I thought that putting a word inside double quotes would enable this behaviour, but if I query my field with a single word like “manager” I receive results containing the word “management”. I know I can use a CopyField with different types but that would double the size of my index… Is there an alternative? Thanks
RE: Exact match on a field with stemming
Thanks for the clarification. This makes sense. -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: April-11-11 7:54 PM To: solr-user@lucene.apache.org Subject: FW: Exact match on a field with stemming
> I'm curious to know why Solr is not respecting the phrase. If it considers "manager" as a phrase... shouldn't it return only documents containing that phrase?
A phrase means to Solr (or rather to the lucene and dismax query parsers, which are what understand double-quoted phrases) "these tokens in exactly this order". So a phrase of one token, "manager", is exactly the same as if you didn't use the double quotes. It's only one token, so "all the tokens in this phrase in exactly the order specified" is, well, just the same as one token without phrase quotes. If you've set up a stemmed field at indexing time, then "manager" and "management" are stemmed IN THE INDEX, probably to something like "manag". There is no longer any information in the index (at least in that field) on what the original literal was; it's been stemmed in the index. So there's no way possible for it to only match certain un-stemmed versions -- at least using that field. And when you enter either 'manager' or 'management' at query time, it is analyzed and stemmed to match that stemmed something-like "manag" in the index either way. If it didn't analyze and stem at query time, then instead the query would just match NOTHING, because neither 'manager' nor 'management' are in the index at all, only the stemmed versions. So, yes, double quotes are interpreted as a phrase, and only documents containing that phrase are returned, you got it. -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: April-11-11 3:42 PM To: solr-user@lucene.apache.org Subject: Re: Exact match on a field with stemming Hi, Using quotes means "use this as a phrase", not "use this as a literal". :) I think copying to an unstemmed field is the only/common work-around. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Pierre-Luc Thibeault pierre-luc.thibea...@wantedtech.com To: solr-user@lucene.apache.org Sent: Mon, April 11, 2011 2:55:04 PM Subject: Exact match on a field with stemming Hi all, Is there a way to perform an exact match query on a field that has stemming enabled by using the standard /select handler? I thought that putting a word inside double quotes would enable this behaviour, but if I query my field with a single word like “manager” I receive results containing the word “management”. I know I can use a CopyField with different types but that would double the size of my index. Is there an alternative? Thanks
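Jonathan's explanation can be replayed with a toy model. The sketch below is not Solr code: `toy_stem` is a two-rule stand-in for the real Snowball/Porter stemmer, the two documents are invented, and the "exact" index plays the role of the unstemmed copyField target mentioned in the thread.

```python
SUFFIXES = ("ement", "er")  # toy rules only, NOT the real Porter algorithm


def toy_stem(token: str) -> str:
    """Lowercase and strip one known suffix (illustrative stemmer)."""
    token = token.lower()
    for suffix in SUFFIXES:
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token


# Hypothetical documents.
docs = {1: "Project manager", 2: "Waste management"}


def build_index(analyze):
    """Inverted index: analyzed term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index


def search(index, analyze, word):
    """A one-token phrase is just a term lookup after analysis."""
    return index.get(analyze(word), set())


stemmed_index = build_index(toy_stem)   # like the stemmed field
exact_index = build_index(str.lower)    # like an unstemmed copy of the field

stemmed_hits = search(stemmed_index, toy_stem, "manager")
exact_hits = search(exact_index, str.lower, "manager")
```

In the stemmed index both words collapse to the same term, so the quoted single word still finds both documents; only the unstemmed copy can tell them apart, which is why the copyField approach costs index size.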
Re: spatial query parinsg error: org.apache.lucene.queryParser.ParseException
Try this... http://localhost:8080/solr/select?wt=json&indent=true&q={!spatial%20lat=37.326375%20lng=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft - Original Message - From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Wednesday, December 01, 2010 7:51 PM Subject: spatial query parinsg error: org.apache.lucene.queryParser.ParseException I am trying to get spatial search to work on my Solr installation. I am running version 1.4.1 with the Jayway Team spatial-solr-plugin. I am performing the search with the following url: http://localhost:8080/solr/select?wt=json&indent=true&q=title:Art%20Loft{!spatial%20lat=37.326375%20lng=-121.892639%20radius=3%20unit=km%20threadCount=3} The result that I get is the following error: HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'title:Art Loft{!spatial lat=37.326375 lng=-121.892639 radius=3 unit=km threadCount=3}': Encountered RANGEEX_GOOP lng=-121.892639 at line 1, column 38. Was expecting: } Not sure why it would be complaining about the lng parameter in the query. I double-checked to make sure that I had the right name for the longitude field in my solrconfig.xml file. Any help/suggestions would be greatly appreciated Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
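The difference between the two URLs is only where the `{!spatial ...}` block sits: the suggested fix puts it at the very start of `q`, before `title:Art Loft`. A small sketch of building such a URL so the ordering and the encoding are handled in one place; `spatial_query_url` is a made-up helper, and the parameter names (`lat`, `lng`, `radius`, `unit`, `threadCount`) are copied from the thread, not checked against the plugin.

```python
from urllib.parse import urlencode, quote, parse_qs, urlsplit


def spatial_query_url(base, text_query, lat, lng, radius_km):
    """Build a select URL with the local-params block at the start of q."""
    local_params = "{!spatial lat=%s lng=%s radius=%s unit=km threadCount=3}" % (
        lat, lng, radius_km)
    params = {
        "wt": "json",
        "indent": "true",
        "q": local_params + text_query,  # local params first, query text after
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base + "?" + urlencode(params, quote_via=quote)


url = spatial_query_url("http://localhost:8080/solr/select",
                        "title:Art Loft", 37.326375, -121.892639, 3)
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
```

Letting `urlencode` escape the braces, spaces and `!` avoids hand-editing `%20` sequences, which is where the `&` separators and ordering tend to get mangled.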
Re: spatial query parinsg error: org.apache.lucene.queryParser.ParseException
I just saw the parameter 'lng' in your query... I believe it should be 'long'. Give it a try if the link I sent you is not working - Original Message - From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Wednesday, December 01, 2010 11:39 PM Subject: Re: spatial query parinsg error: org.apache.lucene.queryParser.ParseException Thanks Jean-Sebastien. I forwarded it to my partner. His membership is still being held up. I'll be the go-between until he has access. Dennis Gearon - Original Message From: Jean-Sebastien Vachon js.vac...@videotron.ca To: solr-user@lucene.apache.org Sent: Wed, December 1, 2010 7:12:20 PM Subject: Re: spatial query parinsg error: org.apache.lucene.queryParser.ParseException Try this... http://localhost:8080/solr/select?wt=json&indent=true&q={!spatial%20lat=37.326375%20lng=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft - Original Message - From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Wednesday, December 01, 2010 7:51 PM Subject: spatial query parinsg error: org.apache.lucene.queryParser.ParseException I am trying to get spatial search to work on my Solr installation. I am running version 1.4.1 with the Jayway Team spatial-solr-plugin. I am performing the search with the following url: http://localhost:8080/solr/select?wt=json&indent=true&q=title:Art%20Loft{!spatial%20lat=37.326375%20lng=-121.892639%20radius=3%20unit=km%20threadCount=3} The result that I get is the following error: HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'title:Art Loft{!spatial lat=37.326375 lng=-121.892639 radius=3 unit=km threadCount=3}': Encountered RANGEEX_GOOP lng=-121.892639 at line 1, column 38. Was expecting: } Not sure why it would be complaining about the lng parameter in the query. I double-checked to make sure that I had the right name for the longitude field in my solrconfig.xml file. Any help/suggestions would be greatly appreciated Dennis Gearon
Facet.query and collapsing
Hi All, I'm in a situation where I need to perform a facet on a query with field collapsing. Let's say the main query is something like this: q=title:apple&fq={!tag=sources}source_id:(33 OR 44)&facet=on&facet.field={!ex=sources}source_id&facet.query=source_id:(33 OR 44)&collapse=on&collapse.field=hash_id I'd like my facet query to return the number of unique documents (based on the hash_id field) that are associated with either source 33 or 44. Right now, the query works but the count returned is larger than expected since no collapsing is performed on the facet query's result set. Is there any way of doing this? I'd like to be able to do this without performing a second request. Thanks NOTE: I'm using Solr 1.4.1 with patch 236 (https://issues.apache.org/jira/browse/SOLR-236)
Re: Looking for help with Solr implementation
Yes we did. Sorry for this. We both made the same error replying to the mailing list. - Original Message - From: Thumuluri, Sai sai.thumul...@verizonwireless.com To: solr-user@lucene.apache.org Sent: Saturday, November 13, 2010 8:41 AM Subject: RE: Looking for help with Solr implementation Please refrain using this mailing group for soliciting and take it offline -Original Message- From: AC [mailto:acanuc...@yahoo.com] Sent: Sat 11/13/2010 1:12 AM To: solr-user@lucene.apache.org Subject: Re: Looking for help with Solr implementation Hey Jean-Sebastien, Thanks for the reply. It sounds like your experience is exactly what is needed for my project. To give you some background this project is for a personal project related to biomedical field that I'm trying to get up off the ground. The site is www.antibodyreview.com It is a portal site for researchers in the biotech industry specifically focused on antibodies - not sure how up you may be on biomedical research :) Anyway I have collected a lot of information about proteins and antibodies from various sources which people can search and browse. The site is and will be free to access by anyone. The current search uses MySQL but our requirements for how the site needs to operate cannot be properly handled by MySQL. Searches can take ~8-10 sec and this is clearly not acceptable. If you try the default search on the index page you can see how slow it is. Suggested terms to try: Akt, p53, PTEN, AIF. So there are several different items indexed in solr that we want to search: 1. Protein Information (~42,000 MySQL DB records) 2. Products (expect to host 200,000 product records, currently ~20,000 products) http://www.antibodyreview.com/products.php (current product search is faceted but also takes way too long) 3. Articles (text from ~120,000 articles) Article search can be accessed from the protein pages and advanced search page: http://www.antibodyreview.com/advsearch.php 4. 
Images (~100,000 image captions) Image search is found on this page http://www.antibodyreview.com/gallery.php The current solr search which has been set-up can be seen on this page: www.antibodyreview.com/proteins3.php (search bar on this page uses solr). It is clearly much faster and meets our needs so it seems clear that using solr is the solution to the search issue. The last programmer mentioned that he had indexed all the data and it is now just a matter of setting up the search queries in solr. The most complicated query to set-up will be the products as it requires faceted search. The other searches are failry routine or have more limited facets/options. If it looks like their is mutual interest I can share with you a document that he created that explains how things have been set-up which should help you get started. Please let me know what you think. Regards, Abe From: Jean-Sebastien Vachon js.vac...@videotron.ca To: solr-user@lucene.apache.org Sent: Fri, November 12, 2010 7:09:06 PM Subject: Re: Looking for help with Solr implementation Hi, If you're still looking for someone, I might be interested in getting more information about your project. From you initial message that does not seem to be a lot of work so I might be willing to give you some time. I've been working with Solr for the last 7 months on my full-time job and I'm currently managing a Solr based project that use Field collapsing, facetting, custom scoring with function queries and a custom query handler. Contact me if you're interested - Original Message - From: AC acanuc...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, November 11, 2010 7:43 PM Subject: Looking for help with Solr implementation Hi, Not sure if this is the correct place to post but I'm looking for someone to help finish a Solr install on our LAMP based website. This would be a paid project. The programmer that started the project got too busy with his full-time job to finish the project. 
Solr has been installed and a basic search is working but we need to configure it to work across the site and also set-up faceted search. I tried posting on some popular freelance sites but haven't been able to find anyone with real Solr expertise / experience. If you think you can help me with this project please let me know and I can supply more details. Regards, Abe
Re: Looking for help with Solr implementation
Hi, If you're still looking for someone, I might be interested in getting more information about your project. From your initial message, that does not seem to be a lot of work, so I might be willing to give you some time. I've been working with Solr for the last 7 months in my full-time job and I'm currently managing a Solr-based project that uses field collapsing, faceting, custom scoring with function queries and a custom query handler. Contact me if you're interested - Original Message - From: AC acanuc...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, November 11, 2010 7:43 PM Subject: Looking for help with Solr implementation Hi, Not sure if this is the correct place to post but I'm looking for someone to help finish a Solr install on our LAMP based website. This would be a paid project. The programmer that started the project got too busy with his full-time job to finish the project. Solr has been installed and a basic search is working but we need to configure it to work across the site and also set up faceted search. I tried posting on some popular freelance sites but haven't been able to find anyone with real Solr expertise / experience. If you think you can help me with this project please let me know and I can supply more details. Regards, Abe
Re: Looking for help with Solr implementation
Sorry all, I obviously meant to send this to the original poster - Original Message - From: Jean-Sebastien Vachon js.vac...@videotron.ca To: solr-user@lucene.apache.org Sent: Friday, November 12, 2010 10:09 PM Subject: Re: Looking for help with Solr implementation Hi, If you're still looking for someone, I might be interested in getting more information about your project. From your initial message, that does not seem to be a lot of work, so I might be willing to give you some time. I've been working with Solr for the last 7 months in my full-time job and I'm currently managing a Solr-based project that uses field collapsing, faceting, custom scoring with function queries and a custom query handler. Contact me if you're interested - Original Message - From: AC acanuc...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, November 11, 2010 7:43 PM Subject: Looking for help with Solr implementation Hi, Not sure if this is the correct place to post but I'm looking for someone to help finish a Solr install on our LAMP based website. This would be a paid project. The programmer that started the project got too busy with his full-time job to finish the project. Solr has been installed and a basic search is working but we need to configure it to work across the site and also set up faceted search. I tried posting on some popular freelance sites but haven't been able to find anyone with real Solr expertise / experience. If you think you can help me with this project please let me know and I can supply more details. Regards, Abe
problem with wildcard
Hi All, I'm having some trouble with a query using a wildcard and I was wondering if anyone could tell me why these two similar queries do not return the same number of results. Basically, the query I'm making should return all docs whose title starts with (or contains) the string lowe'. I suspect some analyzer is causing this behaviour and I'd like to know if there is a way to fix this problem.

1) select?q=*:*&fq=title:(+lowe')&debugQuery=on&rows=0

<result name="response" numFound="302" start="0"/>
<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
  <str name="parsedquery_toString">*:*</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  <arr name="filter_queries"> <str>title:( lowe')</str> </arr>
  <arr name="parsed_filter_queries"> <str>title:low</str> </arr>
  ...
</lst>

2) select?q=*:*&fq=title:(+lowe'*)&debugQuery=on&rows=0

<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
  <str name="parsedquery_toString">*:*</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  <arr name="filter_queries"> <str>title:( lowe'*)</str> </arr>
  <arr name="parsed_filter_queries"> <str>title:lowe'*</str> </arr>
  ...
</lst>

The title field is defined as:

<field name="title" type="text" indexed="true" stored="true" required="false"/>

where the text type is:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. add enablePositionIncrements=true in both the index and query analyzers to leave a 'gap' for more accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
Re: problem with wildcard
On 2010-11-11, at 3:45 PM, Ahmet Arslan wrote:
>> I'm having some trouble with a query using some wildcard and I was wondering if anyone could tell me why these two similar queries do not return the same number of results. Basically, the query I'm making should return all docs whose title starts (or contain) the string lowe'. I suspect some analyzer is causing this behaviour and I'd like to know if there is a way to fix this problem.
>> 1) select?q=*:*&fq=title:(+lowe')&debugQuery=on&rows=0
> wildcard queries are not analyzed http://search-lucene.com/m/pnmlH14o6eM1/
Yeah I found out about this a couple of minutes after I posted my problem. If there is no analyzer then why is Solr not finding any documents when a single quote precedes the wildcard?
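One way to read the debug output from the original post: the analyzed filter became `title:low`, so "low" is the kind of term that is actually stored in the index, while the wildcard filter stayed `title:lowe'*` and is matched against those stored terms as typed. The toy model below illustrates that mismatch; the term list is invented (only "low" comes from the thread) and none of this is Solr code.

```python
# Hypothetical terms stored for the title field after index-time analysis.
indexed_terms = {"low", "s", "home", "improv"}


def term_query(analyzed_term: str) -> bool:
    """An analyzed query is compared with stored terms after analysis."""
    return analyzed_term in indexed_terms


def prefix_query(raw_prefix: str):
    """A wildcard query skips analysis: the raw prefix must match stored terms."""
    return sorted(t for t in indexed_terms if t.startswith(raw_prefix))


# title:(lowe') was parsed to title:low in the debug output.
analyzed_match = term_query("low")
# title:(lowe'*) stayed as lowe'* and no stored term starts with "lowe'".
raw_prefix_matches = prefix_query("lowe'")
# A prefix that agrees with the stored form does match.
stem_prefix_matches = prefix_query("low")
```

Under this reading the single quote is not special by itself; the raw prefix simply does not exist in the index once the analyzer has split, lowercased and stemmed the original text.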
Re: Problem escaping question marks
Have you tried encoding it with %3F? first_name:*%3F* On 2010-11-04, at 1:44 AM, Stephen Powis wrote: I'm having difficulty properly escaping ? in my search queries. It seems as though it matches any character. Some info, a simplified schema and query to explain the issue I'm having. I'm currently running Solr 1.4.1.
Schema:
<field name="id" type="sint" indexed="true" stored="true" required="true"/>
<field name="first_name" type="string" indexed="true" stored="true" required="false"/>
I want to return any first name with a question mark in it.
Query: first_name: *\?*
Returns all documents with any character in it. Can anyone lend a hand? Thanks! Stephen
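There are two layers of escaping in play here: the backslash is for the query parser, the percent-encoding is for HTTP. A short sketch of keeping both, so that the backslash and the question mark both survive the trip to the server; whether the 1.4.1 parser then treats `\?` inside a wildcard term as a literal is not settled by this thread.

```python
from urllib.parse import quote, unquote

# Query-parser level: the backslash asks the parser to take '?' literally.
raw_query = r"first_name:*\?*"

# HTTP level: percent-encode everything so '\' and '?' are sent unchanged.
encoded = quote(raw_query, safe="")
url = "http://localhost:8983/solr/select?q=" + encoded  # host/path are placeholders

# What the server sees after URL decoding.
round_trip = unquote(encoded)
```

Sending a bare `%3F` decodes to a plain `?` on the server, which the parser still sees as the single-character wildcard; keeping `%5C%3F` at least delivers the escaped form.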
Re: RAM increase
You will also need to switch to a 64-bit JVM. You might have to add the `-d64` flag as well as `-Xms` and `-Xmx`. - Original Message - From: Gora Mohanty g...@mimirtech.com To: solr-user@lucene.apache.org Sent: Thursday, October 21, 2010 2:34 AM Subject: Re: RAM increase On Thu, Oct 21, 2010 at 10:46 AM, satya swaroop satya.yada...@gmail.com wrote: Hi all, I increased my RAM size to 8GB and I want 4GB of it to be used for Solr itself. Can anyone tell me the way to allocate the RAM for Solr? [...] You will need to set up the allocation of RAM for Java, via the -Xmx and -Xms variables. If you are using something like Tomcat, that would be done in the Tomcat configuration file. E.g., this option can be added inside /etc/init.d/tomcat6 on new Debian/Ubuntu systems. Regards, Gora
Re: Need help with field collapsing and out of memory error
Can you tell us what your current settings are for the fieldCollapseCache? I had similar issues with field collapsing and I found out that this cache was responsible for most of the OOM exceptions. Reduce or even remove this cache from your configuration and it should help. On 2010-09-01, at 1:10 PM, Moazzam Khan wrote: Hi guys, I have about 20k documents in the Solr index (and there's a lot of text in each of them). I have field collapsing enabled on a specific field (AdvisorID). The thing is, if I have field collapsing enabled in the search request I don't get the correct count for the total number of records that matched. It always says that the number of rows I asked to get back is the total number of records it found. And, when I run a query with search criteria *:* (to get the total number of advisors in the index) Solr runs out of memory and gives me an error saying SEVERE: java.lang.OutOfMemoryError: Java heap space at java.nio.CharBuffer.wrap(CharBuffer.java:350) at java.nio.CharBuffer.wrap(CharBuffer.java:373) at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138) at java.lang.StringCoding.decode(StringCoding.java:173) This is going to be a huge problem later on when we index 50k documents. These are the options I am running Solr with: java -Xms2048M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:PermSize=1024m -XX:MaxPermSize=1024m -jar start.jar Is there any way I can get the counts and not run out of memory? Thanks in advance, Moazzam
bug or feature???
Hi, Can someone tell me why the two following queries do not return the same results? Is that a bug or a feature? http://localhost:8983/jobs/select?fq=title:(NOT janitor)&fq=description:(NOT janitor)&q=*:* http://localhost:8983/jobs/select?q=title:(NOT janitor) AND description:(NOT janitor) The second query returns no result while the first one returns 6097276 documents. Thanks
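A toy set model of what may be going on, assuming Lucene-style boolean semantics where a prohibited clause can only subtract from what the required clauses matched. In the first URL each negative clause is a top-level filter applied to `*:*`; in the second each `(NOT janitor)` is a nested group with nothing positive inside. The document ids and the `boolean_query` helper are invented for illustration.

```python
# Hypothetical index: four documents, one with "janitor" in each field.
ALL_DOCS = {1, 2, 3, 4}
title_janitor = {1}
description_janitor = {2}


def boolean_query(must=(), must_not=()):
    """Prohibited clauses only remove docs matched by the required ones."""
    if not must:
        return set()  # nothing positive to subtract from
    result = set.intersection(*must)
    for excluded in must_not:
        result = result - excluded
    return result


# q=*:* with two negative filters: each one subtracts from all documents.
with_filters = boolean_query(must=[ALL_DOCS],
                             must_not=[title_janitor, description_janitor])

# q=title:(NOT janitor) AND description:(NOT janitor):
# each nested group is purely negative, so it matches nothing on its own.
nested = boolean_query(must=[
    boolean_query(must_not=[title_janitor]),
    boolean_query(must_not=[description_janitor]),
])
```

In this model the usual workaround is to put a match-all clause inside each group, e.g. `title:(*:* NOT janitor)`, so the subtraction has something to start from.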
Re: question about the fieldCollapseCache
They used to be in the branches if I recall correctly, but you're right: they aren't there anymore. Maybe someone else can explain why... it looks like they restructured the repository for the Solr/Lucene merge. On 2010-06-15, at 4:54 AM, Rakhi Khatwani wrote: Hi, I tried downloading solr 1.4.1 from the site. but it shows an empty directory. where did u get solr 1.4.1 from? Regards, Raakhi On Tue, Jun 8, 2010 at 10:35 PM, Jean-Sebastien Vachon js.vac...@videotron.ca wrote: Hi All, I've been running some tests using 6 shards, each one containing about 1 million documents. Each shard is running in its own virtual machine with 7 GB of RAM (5 GB allocated to the JVM). After about 1100 unique queries the shards start to struggle and run out of memory. I've reduced all other caches without significant impact. When I completely remove the fieldCollapseCache, the server can keep up for hours and uses only 2 GB of RAM. (I'm even considering returning to a 32-bit JVM.) The size of the fieldCollapseCache was set to 5000 items. How can 5000 items eat 3 GB of RAM? Can someone tell me what is put in this cache? Has anyone experienced this kind of problem? I am running Solr 1.4.1 with patch 236. All requests are collapsing on a single field (pint) and collapse.maxdocs is set to 200 000. Thanks for any hints...
Re: question about the fieldCollapseCache
OK great. I believe this should be mentioned in the wiki. Later On 2010-06-09, at 4:06 AM, Martijn v Groningen wrote: The fieldCollapseCache should not be used as it is now; it uses too much memory. It stores any information relevant for a field collapse search, like document collapse counts, collapsed document ids / fields, collapsed docset and uncollapsed docset (everything per unique search). So the memory usage will grow for each unique query (and fast, with all this information). So I think it's best to disable this cache for now. Martijn On 8 June 2010 19:05, Jean-Sebastien Vachon js.vac...@videotron.ca wrote: Hi All, I've been running some tests using 6 shards, each one containing about 1 million documents. Each shard is running in its own virtual machine with 7 GB of RAM (5 GB allocated to the JVM). After about 1100 unique queries the shards start to struggle and run out of memory. I've reduced all other caches without significant impact. When I completely remove the fieldCollapseCache, the server can keep up for hours and uses only 2 GB of RAM. (I'm even considering returning to a 32-bit JVM.) The size of the fieldCollapseCache was set to 5000 items. How can 5000 items eat 3 GB of RAM? Can someone tell me what is put in this cache? Has anyone experienced this kind of problem? I am running Solr 1.4.1 with patch 236. All requests are collapsing on a single field (pint) and collapse.maxdocs is set to 200 000. Thanks for any hints...
Re: Diagnosing solr timeout
Have you looked at the garbage collector statistics? I've experienced this kind of issue in the past and I was getting huge spikes when the GC was doing its job. On 2010-06-09, at 10:52 AM, Paul wrote: Hi all, In my app, it seems like solr has become slower over time. The index has grown a bit, and there are probably a few more people using the site, but the changes are not drastic. I notice that when a solr search is made, the amount of cpu and ram spikes precipitously. I notice in the solr log a bunch of entries in the same second that end in: status=0 QTime=212 status=0 QTime=96 status=0 QTime=44 status=0 QTime=276 status=0 QTime=8552 status=0 QTime=16 status=0 QTime=20 status=0 QTime=56 and then: status=0 QTime=315919 status=0 QTime=325071 My questions: How do I figure out what to fix? Do I need to start java with more memory? How do I tell what is the correct amount of memory to use? Is there something particularly inefficient about something else in my configuration, or the way I'm formulating the solr request, and how would I narrow down what it could be? I can't tell, but it seems like it happens after solr has been running unattended for a little while. Should I have a cron job that restarts solr every day? Could the solr process be starved by something else on the server (although -- the only other thing that is particularly running is apache/passenger/rails app)? In other words, I'm at a total loss about how to fix this. Thanks! P.S.
In case this helps, here's the exact log entry for the first item that failed: Jun 9, 2010 1:02:52 PM org.apache.solr.core.SolrCore execute INFO: [resources] webapp=/solr path=/select params={hl.fragsize=600&facet.missing=true&facet=false&facet.mincount=1&ids=http://pm.nlx.com/xtf/view?docId%3Dshelley/shelley.04.xml;chunk.id%3Ddiv.ww.shelleyworks.v4.44,http://pm.nlx.com/xtf/view?docId%3Dshelley/shelley.06.xml;chunk.id%3Ddiv.ww.shelleyworks.v6.67,http://pm.nlx.com/xtf/view?docId%3Dtennyson_c/tennyson_c.02.xml;chunk.id%3Ddiv.tennyson.v2.1115,http://pm.nlx.com/xtf/view?docId%3Dmarx/marx.39.xml;chunk.id%3Ddiv.marx.engels.39.325,http://pm.nlx.com/xtf/view?docId%3Dshelley_j/shelley_j.01.xml;chunk.id%3Ddiv.ww.shelley.journals.v1.80,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.116,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.115,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.75,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.01.xml;chunk.id%3Ddiv.eliot.novels.bede.76,http://pm.nlx.com/xtf/view?docId%3Demerson/emerson.05.xml;chunk.id%3Dralph.waldo.v5.d083,http://pm.nlx.com/xtf/view?docId%3Dshelley/shelley.04.xml;chunk.id%3Ddiv.ww.shelleyworks.v4.31,http://pm.nlx.com/xtf/view?docId%3Dshelley_j/shelley_j.01.xml;chunk.id%3Ddiv.ww.shelley.journals.v1.88,http://pm.nlx.com/xtf/view?docId%3Deliot/eliot.03.xml;chunk.id%3Ddiv.eliot.romola.48&facet.limit=-1&hl.fl=text&hl.maxAnalyzedChars=512000&wt=javabin&hl=true&rows=30&version=1&fl=uri,archive,date_label,genre,source,image,thumbnail,title,alternative,url,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,role_EGR,role_ETR,role_CRE,freeculture,is_ocr,federation,has_full_text,source_xml,uri&start=0&q=(*:*+AND+(life)+AND+(death)+AND+(of)+AND+(jason)+AND+federation:NINES)+OR+(*:*+AND+(life)+AND+(death)+AND+(of)+AND+(jason)+AND+federation:NINES+-genre:Citation)^5&facet.field=genre&facet.field=archive&facet.field=freeculture&facet.field=has_full_text&facet.field=federation&isShard=true&fq=year:1882} status=0 QTime=315919
Re: Diagnosing solr timeout
I use the following article as a reference when dealing with GC-related issues: http://www.petefreitag.com/articles/gctuning/

I suggest you activate the verbose GC option and send the GC stats to a file. I don't remember exactly what the option was, but you should find the information easily.

Good luck

On 2010-06-09, at 11:35 AM, Paul wrote:
> > Have you looked at the garbage collector statistics? I've experienced this kind of issue in the past and I was getting huge spikes when the GC was doing its job.
>
> I haven't, and I'm not sure what a good way to monitor this is. The problem occurs maybe once a week on a server. Should I run jstat the whole time and redirect the output to a log file? Is there another way to get that info?
>
> Also, I was suspecting GC myself. So, if it is the problem, what do I do about it? It seems like increasing RAM might make the problem worse, because the JVM would wait longer to GC and then have more to do.
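For what it's worth, here is a rough sketch of the GC logging advice above. It only assembles a java command line using the standard HotSpot options of the Java 6 era (-verbose:gc, -Xloggc, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps). The log path, the heap size and the use of start.jar (the Jetty launcher from the Solr example directory) are assumptions made for illustration, so adapt them to your own setup.

```python
def gc_logging_flags(log_path):
    """Return HotSpot (Java 6 era) options that write GC activity to a file."""
    return [
        "-verbose:gc",              # report every collection
        "-Xloggc:" + log_path,      # send the GC report to this file
        "-XX:+PrintGCDetails",      # per-generation sizes before and after
        "-XX:+PrintGCTimeStamps",   # seconds since JVM start for each event
    ]


def solr_command(log_path, heap="4g"):
    """Build a java command line; heap size and start.jar are assumptions."""
    return (["java", "-Xms" + heap, "-Xmx" + heap]
            + gc_logging_flags(log_path)
            + ["-jar", "start.jar"])


if __name__ == "__main__":
    # Hypothetical log location, change it to suit your server.
    print(" ".join(solr_command("/var/log/solr/gc.log")))
```

With the log in place, a long pause should show up as a full collection whose timestamp lines up with the slow request, which is easier than keeping jstat running for a week.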
question about the fieldCollapseCache
Hi All, I've been running some tests using 6 shards, each containing about 1 million documents. Each shard is running in its own virtual machine with 7 GB of RAM (5 GB allocated to the JVM). After about 1100 unique queries the shards start to struggle and run out of memory. I've reduced all the other caches without significant impact. When I completely remove the fieldCollapseCache, the server can keep up for hours and uses only 2 GB of RAM. (I'm even considering returning to a 32-bit JVM.) The size of the fieldCollapseCache was set to 5000 items. How can 5000 items eat 3 GB of RAM? Can someone tell me what is put in this cache? Has anyone experienced this kind of problem? I am running Solr 1.4.1 with patch 236. All requests are collapsing on a single field (pint) and collapse.maxdocs is set to 200 000. Thanks for any hints...
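To put the numbers above in perspective, here is a small back-of-the-envelope calculation. It assumes the whole 3 GB difference (5 GB heap versus the 2 GB used without the cache) is held by the fieldCollapseCache, and, for the second figure, that each of the 1100 unique queries created exactly one cache entry. Both are assumptions, not something confirmed by the patch.

```python
def bytes_per_entry(total_bytes, entries):
    """Average memory held per cache entry."""
    return total_bytes / entries


heap_growth = 3 * 10**9   # ~3 GB attributed to the cache (assumption)

# If the cache were full at its configured size of 5000 items:
full_cache = bytes_per_entry(heap_growth, 5000)
print("%.0f KB per entry with 5000 entries" % (full_cache / 1000))

# If memory ran out after only ~1100 unique queries, one entry each:
after_1100 = bytes_per_entry(heap_growth, 1100)
print("%.1f MB per entry with 1100 entries" % (after_1100 / 10**6))
```

Either way each entry would have to weigh in at hundreds of kilobytes or more, which suggests the cache stores per-query structures sized by the result set rather than small values.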
Re: Faceted search not working?
Is the FacetComponent loaded at all?

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="components">
    <str>query</str>
    <str>facet</str>
  </arr>
</requestHandler>

On 2010-05-25, at 3:32 AM, Sascha Szott wrote:
> Hi Birger,
>
> Birger Lie wrote:
> > I don't think the boolean fields are mapped to on and off :)
>
> You can use true and on interchangeably.
>
> -Sascha
>
> > -birger
> >
> > -Original Message- From: Ilya Sterin [mailto:ster...@gmail.com] Sent: 24. mai 2010 23:11 To: solr-user@lucene.apache.org Subject: Faceted search not working?
> >
> > I'm trying to perform a faceted search without any luck. The result set doesn't return any facet information...
> >
> > http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title
> >
> > I'm getting the result set, but no facet information is present. Is there something else that needs to happen to turn faceting on? I'm using the latest Solr 1.4 release. Data is indexed from the database using the dataimporter.
> >
> > Thanks. Ilya Sterin
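A frequent cause of this symptom is a request where the parameters are not separated by "&", so Solr sees a single long q value and never reads facet=on. The sketch below builds the request from the thread with Python's standard urllib so each parameter is encoded and separated properly. The helper name facet_url is made up for this example.

```python
from urllib.parse import urlencode


def facet_url(base, query, facet_fields):
    """Build a select URL with faceting enabled on the given fields."""
    params = [("q", query), ("facet", "on")]
    # facet.field may be repeated, once per field to facet on
    params += [("facet.field", field) for field in facet_fields]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(facet_url("http://localhost:8080/solr/select/", "title:*", ["title"]))
```

If the response still has no facet_counts section with a well-formed URL, the handler's component list shown above is the next thing to check.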
Re: jmx issue with solr
Hi,

Try adding these options...

-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

On 2010-05-19, at 3:44 AM, Na_D wrote:
> Hi, I am trying to start Solr with the following command:
>
> java -Dsolr.solr.home=./example-DIH/solr/ -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=3000
>
> On doing so an error is reported:
>
> Error: Password file read access must be restricted: C:\Program Files\Java\jdk1.6.0_18\jre\lib\management\jmxremote.password
>
> The jmxremote.password file is there in the lib\management folder and it has been set to read-only. Still the error persists. I am using Windows XP SP3 Version 2002, just mentioning it in case it is of any help. Please do put in your suggestions.
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/jmx-issue-with-solr-tp828478p828478.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: JTeam Spatial Plugin
Hi,

Thanks for your suggestion, but I received more information about this issue from one of JTeam's developers, and he told me that my problem is caused by the plugin not supporting sharding at this time. In my case, I noticed that the individual shards were computing the distance through the geo_distance field. However, the master Solr instance controlling the shards was losing this information because of the lack of support for shards. For now there is no quick workaround that I know of.

Later,

On 2010-05-11, at 2:54 PM, Michael wrote:
> Try using geo_distance in the return fields.
>
> On Thu, Apr 29, 2010 at 9:26 AM, Jean-Sebastien Vachon js.vac...@videotron.ca wrote:
> > Hi All, I am using JTeam's Spatial Plugin RC3 to perform spatial searches on my index and it works great. However, I can't seem to get it to return the computed distances. My query component is run before the geoDistanceComponent and the distanceField is set to distance. Fields for lat/long are defined as well, and the different tier fields are in the results. Increasing the radius causes the number of matches to increase, so I guess that my setup is working...
> >
> > Here is a sample query and its output (I removed some of the fields to keep it short):
> >
> > /select?passkey=sample&q={!spatial%20lat=40.27%20long=-76.29%20radius=22%20calc=arc}title:engineer&wt=json&indent=on&fl=*,distance
> >
> > { "responseHeader":{ "status":0, "QTime":69, "params":{ "fl":"*,distance", "indent":"on", "q":"{!spatial lat=40.27 long=-76.29 radius=22 calc=arc}title:engineer", "wt":"json"}}, "response":{"numFound":223,"start":0,"docs":[ { "title":"Electrical Engineer", "long":-76.3054962158203, "lat":40.037899017334, "_tier_9":-3.004, "_tier_10":-6.0008, "_tier_11":-12.0016, "_tier_12":-24.0031, "_tier_13":-47.0061, "_tier_14":-93.00122, "_tier_15":-186.00243, "_tier_16":-372.00485}, }}
> >
> > This output suggests to me that everything is in place. Does anyone know how to fetch the computed distance? I tried adding the field 'distance' to my list of fields but it didn't work.
> >
> > Thanks
JTeam Spatial Plugin
Hi All, I am using JTeam's Spatial Plugin RC3 to perform spatial searches on my index and it works great. However, I can't seem to get it to return the computed distances. My query component is run before the geoDistanceComponent and the distanceField is set to distance. Fields for lat/long are defined as well, and the different tier fields are in the results. Increasing the radius causes the number of matches to increase, so I guess that my setup is working...

Here is a sample query and its output (I removed some of the fields to keep it short):

/select?passkey=sample&q={!spatial%20lat=40.27%20long=-76.29%20radius=22%20calc=arc}title:engineer&wt=json&indent=on&fl=*,distance

{ "responseHeader":{ "status":0, "QTime":69, "params":{ "fl":"*,distance", "indent":"on", "q":"{!spatial lat=40.27 long=-76.29 radius=22 calc=arc}title:engineer", "wt":"json"}}, "response":{"numFound":223,"start":0,"docs":[ { "title":"Electrical Engineer", "long":-76.3054962158203, "lat":40.037899017334, "_tier_9":-3.004, "_tier_10":-6.0008, "_tier_11":-12.0016, "_tier_12":-24.0031, "_tier_13":-47.0061, "_tier_14":-93.00122, "_tier_15":-186.00243, "_tier_16":-372.00485}, }}

This output suggests to me that everything is in place. Does anyone know how to fetch the computed distance? I tried adding the field 'distance' to my list of fields but it didn't work.

Thanks
Re: Reg: Indexing Date Fields
I guess you can simply use a range query such as:

fq=createdDate:[date1 TO date2]

On 2010-04-15, at 7:30 AM, Venkata Sai Krishna Vepakomma wrote:
> Hi,
>
> 1) How do I query for data between 2 dates? I have specified the following field definition in schema.xml:
>
> <field name="createdDate" type="long" indexed="true" stored="true" />
>
> I have long values for date fields. When I query with long values, I always get all the results.
>
> 2) For indexing to work efficiently and for querying between date ranges, is it OK to use long values, or do I need to use the 'Date' type with specific formats?
>
> Please let me know your thoughts.
>
> Thanks, Regards, Venkat
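As an illustration of the range filter suggested above, the sketch below turns two dates into long values and builds the fq clause. It assumes the long values stored in createdDate are milliseconds since the Unix epoch in UTC, which the original message does not state, so check what your indexing code actually writes. Whether the range is evaluated numerically also depends on how the long type is defined in your schema.

```python
from datetime import datetime, timezone


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    return int(dt.timestamp() * 1000)


def long_range_fq(field, start, end):
    """Build an inclusive Solr range filter on a long-valued field."""
    return "%s:[%d TO %d]" % (field, to_epoch_millis(start), to_epoch_millis(end))


if __name__ == "__main__":
    start = datetime(2010, 4, 1, tzinfo=timezone.utc)
    end = datetime(2010, 4, 15, tzinfo=timezone.utc)
    # Pass the result as the value of the fq parameter (URL-encoded).
    print(long_range_fq("createdDate", start, end))
```

Note that TO must be upper case and both bounds must be present (use * for an open end), otherwise the clause will not be parsed as a range.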
Collapse problem
Hi All, I'd like to know if anyone else is experiencing the same problem we are facing. Basically, we are running a query with field collapsing (Solr 1.4 with patch 236). The response tells us that there are about 2700 documents matching our query. However, I cannot get past the 431st document. From this point on, the response does not contain any documents. If I run the same query without collapsing, then I can iterate through all the results without problem. This tells me that the problem is not related to the shards. Any hints?
Re: Benchmarking Solr
Hi,

Why don't you use JMeter? It would give you greater control over the tests you wish to make. It has many different samplers that will let you run different scenarios using your existing set of queries. ab is great when you want to evaluate the performance of your server under heavy load, but other than this, I don't see much use for it. JMeter offers many more options once you get to know it a little.

Good luck

- Original Message - From: Blargy zman...@hotmail.com To: solr-user@lucene.apache.org Sent: Friday, April 09, 2010 9:46 PM Subject: Benchmarking Solr

> I am about to deploy Solr into our production environment and I would like to do some benchmarking to determine how many slaves I will need to set up. Currently the only way I know how to benchmark is to use Apache Benchmark, but I would like to be able to send random requests to Solr... not just one request over and over.
>
> I have a sample data set of 5000 user-entered queries and I would like to be able to use AB to benchmark against all these random queries. Is this possible?
>
> FYI our current index is ~1.5 gigs with ~5m documents and we will be using faceting quite extensively. Our average number of requests per day is ~2m. We will be running RHEL with about 8-12g of RAM. Any idea how many slaves might be required to handle our load?
>
> Thanks
>
> -- View this message in context: http://n3.nabble.com/Benchmarking-Solr-tp709561p709561.html
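Whatever tool ends up sending the requests, two small pieces can be prepared up front: the average request rate implied by ~2m requests per day, and a randomized list of URLs built from the sample queries. The sketch below does both. The function names and the base URL are made up for this example, and the average rate says nothing about peak traffic, which is usually what sizing should be based on.

```python
import random
from urllib.parse import urlencode


def average_qps(requests_per_day):
    """Average queries per second for a daily request count."""
    return requests_per_day / 86400.0


def random_query_urls(base, queries, count, seed=None):
    """Pick `count` queries at random (with repeats) and build select URLs."""
    rng = random.Random(seed)   # seed makes a run reproducible
    return [base + "?" + urlencode({"q": rng.choice(queries)})
            for _ in range(count)]


if __name__ == "__main__":
    print(round(average_qps(2000000), 1))   # 23.1
    sample = ["java developer", "nurse", "truck driver"]   # stand-in queries
    for url in random_query_urls("http://localhost:8983/solr/select", sample, 5, seed=1):
        print(url)
```

The generated list can be written to a file and fed to JMeter (for example through a CSV data set) so that each request uses a different query.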
Excluding field from the results
Hi, Is there an easy way to prevent a field from being returned in the response? We can use fl=field1,field2,field3,... but our software has an option that must control whether or not a given field is present in the response. So what I'd like to do is tell Solr to return all fields except one. Does Solr support this? I imagine the syntax could look like this: fl=*,-description. Since this is not working, is there any other way of doing this? Otherwise, I will have to manage multiple lists of fields. Thanks
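Until something like fl=*,-description exists, one way to avoid maintaining several field lists by hand is to keep a single master list on the client and derive the fl value from it. A minimal sketch follows. The field names are placeholders and fl_without is a made-up helper, not a Solr feature.

```python
def fl_without(all_fields, excluded):
    """Return an fl value listing every known field except the excluded ones."""
    excluded = set(excluded)
    return ",".join(f for f in all_fields if f not in excluded)


if __name__ == "__main__":
    # Placeholder field names, replace with the fields from your schema.
    fields = ["id", "title", "description", "url"]
    print("fl=" + fl_without(fields, ["description"]))   # fl=id,title,url
```

The drawback is that the master list has to be kept in sync with the schema, including any dynamic fields.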
Spatial queries
Hi All, I am using the package from JTeam to perform spatial searches on my index. I'd like to know if it is possible to build a query that uses multiple clauses. Here is an example: q={!spatial lat=123 long=456 radius=10} OR {!spatial lat=111 long=222 radius=20}title:java Basically, that would return all documents that have the word java in the title field and that are either within 10 miles of the first location OR within 20 miles of the second. I've made a few attempts but it does not seem to be supported. I'm still wondering whether it would make sense to support this kind of query. I could use multiple queries and merge the results myself, but I also need faceting. Thanks
Re: Recommended OS
On 2010-03-18, at 1:03 PM, K Wong wrote:
> http://wiki.apache.org/solr/FAQ#What_are_the_Requirements_for_running_a_Solr_server.3F
>
> I have Solr running on CentOS 5.4. It runs fine on the OpenJDK 1.6.0 and Tomcat 5. If I were to do it again, I'd probably just stick with Jetty.

Would you mind explaining why you would stick with Jetty instead of Tomcat?

> You really will need to read the docs to get the settings right as there is no one-size-fits-all setting. (re your mem/dsk question)
>
> K
>
> On Thu, Mar 18, 2010 at 9:51 AM, blargy zman...@hotmail.com wrote:
> > Does anyone have any recommendations on which OS to use when setting up Solr search server? Any memory/disk space recommendations?
> >
> > Thanks
> >
> > -- View this message in context: http://old.nabble.com/Recommended-OS-tp27948306p27948306.html
Spatial search in Solr 1.5
Hi All, I'm trying to figure out how to perform spatial searches using Solr 1.5 (from the trunk). Is the support for spatial search built in? I ask because none of the patches I tried could be applied to the source tree. If it is built in, can someone tell me how to configure it? I find the available information very confusing, so I hope someone will be able to give me some hints... Thanks
Profiling Solr
Hi,

I'm trying to identify the bottleneck in order to get acceptable performance from a single shard containing 4.7 million documents on my own machine (Mac Pro, quad core, 8 GB of RAM with 4 GB allocated to the JVM). I tried using YourKit but I don't get anything about the Solr classes. I'm new to YourKit so I might be doing something wrong, but it seems pretty straightforward. I am running Solr within a Tomcat instance within Eclipse. Does anyone have an idea about what could be wrong in my setup?

I'm making individual requests (one at a time) and the response times are horrible (about 15 seconds on average). I need to bring this well below 1 second.

Here is a sample query:

http://localhost:8983/jobs_part3/select/?q=*:*&collapse=true&collapse.field=hash_id&facet=true&facet.field=county_id&facet.field=advertiser_id&facet.field=county_id&sort=county_id+asc&rows=100&collapse.type=adjacent

I know that collapsing results takes a big toll on performance, but it is a must-have for us.

Thanks for any hints.

= JVM Parameters =
-Xms4g -Xmx4g -d64 -server
Multi valued fields
Hi All, I'd like to know if it is possible to do the following on a multi-valued field. Given the following data: document A: field1 = [A B C D], document B: field1 = [A B], document C: field1 = [A]. Can I build a query such as -field1:A which will return all documents that do not have A as the exclusive value of the field? By exclusive I mean that I don't want documents that only have A in their list of values. In my sample case, the query would return documents A and B, because they both have other values in field1. Is this kind of query possible with Solr/Lucene? Thanks
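To make the intended semantics concrete, here is the same rule written as a plain client-side filter over the three sample documents. This is not a Solr query and says nothing about how Lucene would evaluate it. It only pins down what "not exclusively A" means: keep a document when at least one of its values differs from A.

```python
def has_value_other_than(values, lone):
    """True when the multi-valued field holds at least one value besides `lone`."""
    return any(v != lone for v in values)


docs = {
    "A": ["A", "B", "C", "D"],
    "B": ["A", "B"],
    "C": ["A"],
}

matches = [name for name, values in docs.items()
           if has_value_other_than(values, "A")]
print(matches)   # ['A', 'B']
```

Note that a plain negative clause on A would exclude all three documents, since each of them contains A, which is why the question needs something more than a simple NOT.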
Re: Index size
Hi,

Each document can be up to 10K. Most of it comes from a single field which is both indexed and stored. The data is uncompressed because compression would eat up too much CPU considering the volume we have. We have around 30 fields in all. We also need to compute some facets, collapse the documents forming the result set, and be able to sort them on any field.

Thx

On 2010-02-25, at 5:50 PM, Otis Gospodnetic wrote:
> It depends on many factors - how big those docs are (compare a tweet to a news article to a book chapter), whether you store the data or just index it, whether you compress it, how and how much you analyze the data, etc.
>
> Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/
>
> - Original Message From: Jean-Sebastien Vachon js.vac...@videotron.ca To: solr-user@lucene.apache.org Sent: Wed, February 24, 2010 8:57:21 AM Subject: Index size
>
> > Hi All, I'm currently looking into integrating Solr and I'd like to have some hints on the size of the index (number of documents) I could possibly host on a server running a Double-Quad server (16 cores) with 48Gb of RAM running Linux. Basically, I need to determine how many of these servers would be required to host about half a billion documents. Should I set up multiple Solr instances (in virtual machines or not) or should I run a single instance (with multicores or not) using all available memory as the cache? I also made some tests with sharding on this same server and I could not see any improvement (at least not with 4.5 million documents). Should all the shards be hosted on different servers? I shall try with more documents in the following days. Thx
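For a sense of scale, the figures in this thread give a simple upper bound on the raw stored data: half a billion documents at up to 10K each. The sketch below works that out and compares it with the 48 GB of RAM on one server. It treats 10K as 10,000 bytes and every document as maximum size, both of which are assumptions, and it is not an estimate of the final index size, which depends on analysis, stored fields and so on, as Otis points out.

```python
def raw_size_bytes(num_docs, max_doc_bytes):
    """Upper bound on raw document data if every doc hits the maximum size."""
    return num_docs * max_doc_bytes


num_docs = 500 * 10**6        # "about half a billion documents"
max_doc_bytes = 10 * 10**3    # "up to 10K" per document (assumed 10,000 bytes)
server_ram = 48 * 10**9       # 48 GB of RAM per server

total = raw_size_bytes(num_docs, max_doc_bytes)
print("%.1f TB of raw data at most" % (total / 10**12))
print("about %.0f times the RAM of one server" % (total / server_ram))
```

The ratio alone shows the data cannot be served from memory on a handful of machines, so disk speed and the number of shards per server will matter more than cache size.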
Index size
Hi All, I'm currently looking into integrating Solr and I'd like to have some hints on the size of the index (number of documents) I could possibly host on a server running a Double-Quad server (16 cores) with 48Gb of RAM running Linux. Basically, I need to determine how many of these servers would be required to host about half a billion documents. Should I set up multiple Solr instances (in virtual machines or not) or should I run a single instance (with multicores or not) using all available memory as the cache? I also made some tests with sharding on this same server and I could not see any improvement (at least not with 4.5 million documents). Should all the shards be hosted on different servers? I shall try with more documents in the following days. Thx