Re: Problem with solr deployment on weblogic 10.3
On 4/24/2013 10:47 PM, Radhakrishna Repala wrote:

I'm new to Solr. While deploying Solr in WebLogic I got the following exception. Please help me in this regard.
Error 500--Internal Server Error
java.lang.NoSuchMethodError: replaceEach

The replaceEach method is included in Apache commons-lang 2.4 and later. The solr.war file contains a jar for this library, version 2.6 for the latest Solr versions. From what I have been able to determine using Google, WebLogic 10.3 ships an older version of the Apache commons-lang library, and that version takes precedence, so the one included in Solr is not being used.

It looks like the solution is adding some config to the weblogic.xml file in the solr.war so that WebLogic prefers application classes. I filed SOLR-4762. I do not know if this change might have unintended consequences.

http://ananthkannan.blogspot.com/2009/08/beware-of-stringutilscontainsignorecase.html
https://issues.apache.org/jira/browse/SOLR-4762

Thanks,
Shawn
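For reference, the usual way to make WebLogic prefer a webapp's own jars is the prefer-web-inf-classes setting in WEB-INF/weblogic.xml. A sketch (verify the namespace against your WebLogic version's descriptor schema; this is not the exact config from SOLR-4762):

```xml
<!-- WEB-INF/weblogic.xml inside solr.war: tell WebLogic to load classes
     bundled in the webapp (e.g. commons-lang 2.6) before its own copies. -->
<weblogic-web-app xmlns="http://www.bea.com/ns/weblogic/weblogic-web-app">
  <container-descriptor>
    <prefer-web-inf-classes>true</prefer-web-inf-classes>
  </container-descriptor>
</weblogic-web-app>
```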
Re: Update on shards
Hi,

It seems not to work in my case. We are using the solr PHP module for talking to Solr. Currently we have 2 collections, 'intradesk' and 'lvs', on 10 Solr hosts (shards: 5, replicationFactor: 2). Because there is no more disk space I created 6 new hosts for a collection 'messages' (shards: 3, replicationFactor: 2).

'intradesk' + 'lvs': solr01-dcg, solr01-gs, solr02-dcg, solr02-gs, solr03-dcg, solr03-gs, solr04-dcg, solr04-gs, solr05-dcg, solr05-gs
'messages': solr06-dcg, solr06-gs, solr07-dcg, solr07-gs, solr08-dcg, solr08-gs

So when doing a select I can talk to any host. When updating I must talk to a host with at least 1 shard on it. I created the new 'messages' collection with the following command to get it onto the new hosts (06-08):

http://solr01-dcg.intnet.smartbit.be:8983/solr/admin/collections?action=CREATE&name=messages&numShards=3&replicationFactor=2&collection.configName=smsc&createNodeSet=solr06-gs.intnet.smartbit.be:8983_solr,solr06-dcg.intnet.smartbit.be:8983_solr,solr07-gs.intnet.smartbit.be:8983_solr,solr07-dcg.intnet.smartbit.be:8983_solr,solr08-gs.intnet.smartbit.be:8983_solr,solr08-dcg.intnet.smartbit.be:8983_solr

They all use the same config set, 'smsc'.
Below is the code:

$client = new SolrClient(array(
    'hostname' => 'solr01-dcg.intnet.smartbit.be',
    'port'     => 8983,
    'login'    => '***',
    'password' => '***',
    'path'     => 'solr/messages',
    'wt'       => 'json'
));

$doc = new SolrInputDocument();
$doc->addField('id', $uniqueID);
$doc->addField('smsc_ssid', $ssID);
$doc->addField('smsc_module', $i['module']);
$doc->addField('smsc_modulekey', $i['moduleKey']);
$doc->addField('smsc_courseid', $courseID);
$doc->addField('smsc_description', $i['description']);
$doc->addField('smsc_content', $i['content']);
$doc->addField('smsc_lastdate', $lastdate);
$doc->addField('smsc_userid', $userID);
$client->addDocument($doc);

The exception I get looks like this:

exception 'SolrClientException' with message 'Unsuccessful update request. Response Code 200. (null)'

Nothing special to find in the Solr log. Any idea?

Arkadi

On 04/24/2013 08:43 PM, Mark Miller wrote:

Sorry - need to correct myself - updates worked the same as read requests - they also needed to hit a SolrCore in order to get forwarded to the right node. I was not thinking clearly when I said this applied to just reads and not writes. Both needed a SolrCore to do their work - with the request proxying, this is no longer the case, so you can hit Solr instances with no SolrCores, or with SolrCores that are not part of the collection you are working with, and both read and write side requests are now proxied to a suitable node that has a SolrCore that can do the search or forward the update (or accept the update).
- Mark

On Apr 23, 2013, at 3:38 PM, Mark Miller markrmil...@gmail.com wrote:

We have a 3rd release candidate for 4.3 being voted on now. I have never tested this feature with Tomcat - only Jetty. Users have reported it does not work with Tomcat. That leads one to think it may have a problem in other containers as well. A previous contributor donated a patch that explicitly flushes a stream in our proxy code - he says this allows the feature to work with Tomcat. I committed this fix - the flush can't hurt, and given the previous contributions of this individual, I'm fairly confident it makes things work in Tomcat. I have no first-hand knowledge that it does work, though. You might take the RC for a spin and test it out yourself:

http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/

- Mark

On Apr 23, 2013, at 3:20 PM, Furkan KAMACI furkankam...@gmail.com wrote:

Hi Mark; All in all, you are saying that when 4.3 is tagged at the repository (I mean when it is ready), this feature will work for Tomcat too in a stable version?

2013/4/23 Mark Miller markrmil...@gmail.com

On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote:

What exactly is the 'request proxying' thing that doesn't work on Tomcat? Is this something different from basic SolrCloud operation, where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on Tomcat before.

Before 4.2, if you made a read request to a node that didn't contain part of the collection you were
Re: Using Solr For a Real Search Engine
Hi Otis;

You are right. start.jar starts up a Jetty instance, and there is a war file under the example directory that it deploys, is that true?

2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com

Suggestion: Don't call this embedded Jetty, to avoid confusion with the actual embedded Jetty.

Otis
Solr & ElasticSearch Support
http://sematext.com/

On Apr 23, 2013 4:56 PM, Furkan KAMACI furkankam...@gmail.com wrote:

Thanks for the answers. I will go with embedded Jetty for my SolrCloud. If I run into something important I will share my experiences with you.

2013/4/23 Shawn Heisey s...@elyograg.org

On 4/23/2013 2:25 PM, Furkan KAMACI wrote:

Is there any documentation that explains using Jetty as embedded or not? I use Solr deployed at Tomcat, but after your message I will consider Jetty. If we think about other issues, e.g. when I want to update my Solr jars/wars etc. (this is just a foo example), are there any pros and cons to Tomcat or Jetty?

The Jetty in the example is only 'embedded' in the sense that you don't have to install it separately. It is not special -- the Jetty components are not changed at all; a subset of them is just included in the Solr download with a tuned configuration file. If you go to www.eclipse.org/jetty and download the latest stable-8 version, you'll see some familiar things - start.jar, an etc directory, a lib directory, and a contexts directory. They have more in them than the example does -- extra functionality Solr doesn't need. If you want to start the downloaded version, you can use 'java -jar start.jar' just like you do with Solr.

Thanks,
Shawn
Re: filter before facet
On Wed, 2013-04-24 at 23:10 +0200, Daniel Tyreus wrote:

But why is it slow to generate facets on a result set of 0? Furthermore, why does it take the same amount of time to generate facets on a result set of 2,000 as on 100,000 documents?

The default faceting method for your query is field cache. Field cache faceting works by generating a structure holding all the values for the field across the whole corpus. It is exactly the same work whether you hit 0, 2K or 100M documents with your query. After the structure has been built, the actual counting of values in the facet is fast; there is not much difference between 2K and 100K hits.

This leads me to believe that the FQ is being applied AFTER the facets are calculated on the whole data set. For my use case it would make a ton of sense to apply the FQ first and then facet. Is it possible to specify this behavior, or do I need to get into the code and get my hands dirty?

As you write later, you have tried fc, enum and fcs, with fcs having the fastest first-request time. That is understandable, as it is segment-oriented and (nearly) just a matter of loading the values sequentially from storage. However, the general observation is that it is about 10 times as slow as the fc method for subsequent queries. Since you are doing NRT, that might still leave fcs as the best method for you.

As for creating a new faceting implementation that avoids the startup penalty by using only the found documents: technically it is quite simple - use stored fields, iterate the hits and request the values. Unfortunately this scales poorly with the number of hits, so unless you can guarantee that you will always have small result sets, this is probably not a viable option.

- Toke Eskildsen, State and University Library, Denmark
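The methods discussed above are chosen per request with the facet.method parameter. A sketch of the relevant parameters (the field names here are invented for illustration, not taken from the thread):

```
q=*:*&fq=user_id:42&rows=0&facet=true&facet.field=category&facet.method=fcs
```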
Re: JVM Parameters to Startup Solr?
On Wed, 2013-04-24 at 18:03 +0200, Mark Miller wrote:

On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:

-XX:OnOutOfMemoryError="kill -9 %p" -XX:+HeapDumpOnOutOfMemoryError

The way I like to handle this is to have the OOM trigger a little script or set of cmds that logs the issue and kills the process.

We treat all Errors as fatal by writing to a dedicated log and shutting down the JVM (which triggers the load balancer etc.). Unfortunately that means that some XML + XSLT combinations can bring the JVM down due to StackOverflowError. This might be a little too diligent, as the Oracle JVM running on Linux (our current setup) is resilient to threads hitting stack overflow.

- Toke Eskildsen, State and University Library, Denmark
Re: solr.StopFilterFactory doesn't work with wildcard
1) I use StopFilterFactory in the multiterm analyzer because without it the query analyzer doesn't work with multi-terms, in particular terms with a wildcard.

2) I expect this:

<str name="rawquerystring">search_string_ss_i:(hp* pavilion* series* d4*)</str>
<str name="querystring">search_string_ss_i:(hp* pavilion* series* d4*)</str>
<str name="parsedquery">+search_string_ss_i:hp* +search_string_ss_i:pavilion* +search_string_ss_i:d4*</str>
<str name="parsedquery_toString">+search_string_ss_i:hp* +search_string_ss_i:pavilion* +search_string_ss_i:d4*</str>

i.e. I expect that StopFilterFactory will work the same as for a query without wildcards.

Thanks for your answer

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-StopFilterFactory-doesn-t-work-with-wildcard-tp4058581p4058856.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: JVM Parameters to Startup Solr?
Could you explain what you mean by such kind of scripts? What do they check and do, exactly?

2013/4/25 Toke Eskildsen t...@statsbiblioteket.dk

On Wed, 2013-04-24 at 18:03 +0200, Mark Miller wrote:

On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:

-XX:OnOutOfMemoryError="kill -9 %p" -XX:+HeapDumpOnOutOfMemoryError

The way I like to handle this is to have the OOM trigger a little script or set of cmds that logs the issue and kills the process.

We treat all Errors as fatal by writing to a dedicated log and shutting down the JVM (which triggers the load balancer etc.). Unfortunately that means that some XML + XSLT combinations can bring the JVM down due to StackOverflowError. This might be a little too diligent, as the Oracle JVM running on Linux (our current setup) is resilient to threads hitting stack overflow.

- Toke Eskildsen, State and University Library, Denmark
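Such a script is typically just a few lines. A hypothetical sketch (the function name, log path, and message format are assumptions, not from the thread) of what the JVM's -XX:OnOutOfMemoryError hook could invoke:

```shell
# Hypothetical OOM handler, shown as a shell function for readability.
# In practice, save the body as a script and start the JVM with:
#   -XX:OnOutOfMemoryError="/path/to/oom_handler.sh %p"
# where %p expands to the pid of the JVM that hit the OutOfMemoryError.

OOM_LOG="${OOM_LOG:-/tmp/solr_oom.log}"   # assumed log location

oom_handler() {
    pid="$1"
    # Record the event so monitoring/ops can see why the node went away...
    echo "$(date '+%Y-%m-%d %H:%M:%S') OutOfMemoryError in Solr pid $pid, killing it" >> "$OOM_LOG"
    # ...then take the instance down hard, so the load balancer fails over.
    kill -9 "$pid"
}
```

The point of killing rather than limping on is that a JVM after an OOM is in an unknown state; a dead node is easier for a load balancer to handle than a half-broken one.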
Re: Solr metrics in Codahale metrics and Graphite?
Hi Walter, Dmitry,

I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with some work-in-progress. Have a look!

Alan Woodward
www.flax.co.uk

On 23 Apr 2013, at 07:40, Dmitry Kan wrote:

Hello Walter, Have you had a chance to get something working with graphite, codahale and solr? Has anyone else tried these tools with the Solr 3.x family? How much work is it to set things up? We have tried zabbix in the past. Even though it required lots of up-front investment in configuration, it looks like a compelling option. In the meantime, we are looking into something more solr-tailored yet simple, even without metrics persistence. Tried: jconsole and viewing stats via JMX. The main point for us now is to gather the RAM usage. Dmitry

On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wun...@wunderwood.org wrote:

If it isn't obvious, I'm glad to help test a patch for this. We can run a simulated production load in dev and report to our metrics server. wunder

On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:

That approach sounds great. --wunder

On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:

I've been thinking about how to improve this reporting, especially now that metrics-3 (which removes all of the funky thread issues we ran into last time I tried to add it to Solr) is close to release. I think we could go about it as follows:

* refactor the existing JMX reporting to use metrics-3. This would mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and adding a JmxReporter, keeping the existing config logic to determine which JMX server to use. PluginInfoHandler and SolrMBeanInfoHandler translate the metrics-3 data back into SolrMBean format to keep the reporting backwards-compatible. This seems like a lot of work for no visible benefit, but…

* we can then add the ability to define other metrics reporters in solrconfig.xml. There are already reporters for Ganglia and Graphite - you just add them to the Solr lib/ directory, configure them in solrconfig, and voila - Solr can be monitored using the same devops tools you use to monitor everything else.

Does this sound sane?

Alan Woodward
www.flax.co.uk

On 6 Apr 2013, at 20:49, Walter Underwood wrote:

Wow, that really doesn't help at all, since these seem to only be reported in the stats page. I don't need another non-standard, app-specific set of metrics, especially one that needs polling. I need metrics delivered to the common system that we use for all our servers. This is also why SPM is not useful for us, sorry Otis. Also, there is no time period on these stats. How do you graph the 95th percentile? I know there was a lot of work on these, but they seem really useless to me. I'm picky about metrics; working at Netflix does that to you. wunder

On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:

In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder

On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:

It's there! :) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wun...@wunderwood.org wrote:

That sounds great. I'll check out the bug; I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder

On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:

On 3/29/2013 12:07 PM, Walter Underwood wrote: What are folks using for this?

I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of codahale metrics internally for request handler statistics - see SOLR-1972. First we tried including the jar and using the API, but that created thread leak problems, so the source code was added.

Thanks,
Shawn

--
Walter Underwood
wun...@wunderwood.org
Re: Solr 3.6.1: changing a field from stored to not stored
Good to know I missed something about Solr replication. Thanks Jan.

On 24 April 2013 17:42, Jan Høydahl jan@cominvent.com wrote:

"I would create a new core as slave of the existing configuration without replicating the core schema and configuration. This way I can get the"

This won't work, as master/slave replication copies the index files as-is. You should re-index all your data. You don't need to take down the cluster to do that; just re-index on top of what's there already, and your index will become smaller and smaller as merging kicks out the old data :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

24. apr. 2013 kl. 15:59 skrev Majirus FANSI majirus@gmail.com:

I would create a new core as slave of the existing configuration without replicating the core schema and configuration. This way I can get the information from one index to the other while saving space, as fields in the new schema are mainly not stored. After the replication I would swap the cores for the online core to point to the right index dir and conf, i.e. the one with fewer stored fields. Maj

On 24 April 2013 01:48, Petersen, Robert robert.peter...@mail.rakuten.com wrote:

Hey, I just want to verify one thing before I start doing this: function queries only require fields to be indexed but don't require them to be stored, right?

-----Original Message-----
From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com]
Sent: Tuesday, April 23, 2013 4:39 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 3.6.1: changing a field from stored to not stored

Good info, thanks Hoss! I was going to add a more specific fl= parameter to my queries at the same time. Currently I am doing fl=*,score, so that will have to be changed.

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Tuesday, April 23, 2013 4:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6.1: changing a field from stored to not stored

: index? I noticed I am unnecessarily storing some fields in my index and
: I'd like to stop storing them without having to 'reindex the world' and
: let the changes just naturally percolate into my index as updates come
: in the normal course of things. Do you guys think I could get away with
: this?

Yes, you can easily get away with this type of change w/o re-indexing; however, you won't gain any immediate index size savings until each and every existing doc has been reindexed and the old copies expunged from the index via segment merges.

The one hiccup that can affect people when doing this is what happens if you use something like fl=* (and likely hl=* as well) ... many places in Solr will try to avoid failure if a stored field is found in the index which isn't defined in the schema, and treat that stored value as a string (legacy behavior designed to make it easier for people to point Solr at old Lucene indexes built w/o using Solr) ... so if these stored values are not strings, you might get some weird data in your response for these documents.

-Hoss
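For reference, the change under discussion is just flipping the stored attribute on the field definition in schema.xml. A sketch with a made-up field name (not from the thread):

```xml
<!-- schema.xml: hypothetical field that was previously stored="true".
     Function queries and sorting only need indexed="true";
     stored="true" is only needed to return the raw value in results. -->
<field name="popularity_boost" type="tfloat" indexed="true" stored="false"/>
```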
Re: Solr metrics in Codahale metrics and Graphite?
Hi Alan,

Great! What is the solr version you are patching?

Speaking of graphite, we have set it up recently to monitor our shard farm. So far, since RAM usage has been the most important metric, we were fine with the pidstat command and a little script generating stats for carbon. Having some additional stats from Solr itself would certainly be great to have.

Dmitry

On Thu, Apr 25, 2013 at 12:01 PM, Alan Woodward a...@flax.co.uk wrote:

Hi Walter, Dmitry, I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with some work-in-progress. Have a look!

Alan Woodward
www.flax.co.uk

[...]
Exact matching in Solr 3.6.1
Hi, is it possible to get an exact-match result if the search term is combined, e.g. cats AND London NOT Leeds? In previous threads I have read that it is possible to create a new field of String type and perform a phrase search on it, but nowhere had the above combined search term been taken into consideration.

BR
Pawel

--
View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865.html
Re: Solr metrics in Codahale metrics and Graphite?
This is on top of trunk at the moment, but it would be backported to 4.4 if there was interest.

Alan Woodward
www.flax.co.uk

On 25 Apr 2013, at 10:32, Dmitry Kan wrote:

Hi Alan, Great! What is the solr version you are patching? Speaking of graphite, we have set it up recently to monitor our shard farm. So far, since RAM usage has been the most important metric, we were fine with the pidstat command and a little script generating stats for carbon. Having some additional stats from Solr itself would certainly be great to have. Dmitry

[...]
Re: Exact matching in Solr 3.6.1
Hi Pawel,

Not sure which parser you are using; I am using edismax and tried using the bq parameter to boost the results having exact matches to the top. You may try something like:

q=cats AND London NOT Leeds&bq=cats^50

In edismax, the pf and pf2 parameters also need some tuning to get the exact results at the top.

HTH,
Sandeep

On 25 April 2013 10:33, vsl ociepa.pa...@gmail.com wrote:

Hi, is it possible to get an exact-match result if the search term is combined, e.g. cats AND London NOT Leeds? In previous threads I have read that it is possible to create a new field of String type and perform a phrase search on it, but nowhere had the above combined search term been taken into consideration. BR Pawel
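The String-field approach mentioned in the question is set up with a copyField in schema.xml. A sketch (field names here are invented for illustration):

```xml
<!-- schema.xml: keep an un-analyzed copy of the text for exact matching -->
<field name="content_exact" type="string" indexed="true" stored="false"/>
<copyField source="content" dest="content_exact"/>
```

Note that with a string type each clause is matched against the field's full, un-analyzed value, so a clause like content_exact:London only matches documents whose value is exactly "London"; boolean operators combine such clauses as usual.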
Preparing Solr 4.2.1 for IntelliJ fails - invalid sha1
Hi all,

I'm trying to run 'ant idea' on 4.2.* and I'm getting "invalid sha1" error messages (see below). I'll appreciate any help,

Shahar

===
. . .
resolve
ivy:retrieve
:: problems summary ::
WARNINGS
problem while downloading module descriptor: http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (72ms)
problem while downloading module descriptor: http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (53ms)
problem while downloading module descriptor: http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (53ms)
problem while downloading module descriptor: http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (58ms)
module not found: org.apache.ant#ant;1.8.2
. . .
public: tried http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
sonatype-releases: tried http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
maven.restlet.org: tried http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
working-chinese-mirror: tried http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom
problem while downloading module descriptor: http://repo1.maven.org/maven2/junit/junit/4.10/junit-4.10.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (74ms)
problem while downloading module descriptor: http://oss.sonatype.org/content/repositories/releases/junit/junit/4.10/junit-4.10.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (60ms)
problem while downloading module descriptor: http://maven.restlet.org/junit/junit/4.10/junit-4.10.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (58ms)
problem while downloading module descriptor: http://mirror.netcologne.de/maven2/junit/junit/4.10/junit-4.10.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (60ms)
module not found: junit#junit;4.10
. . .
:: UNRESOLVED DEPENDENCIES ::
:: org.apache.ant#ant;1.8.2: not found
:: junit#junit;4.10: not found
:: com.carrotsearch.randomizedtesting#junit4-ant;2.0.8: not found
:: com.carrotsearch.randomizedtesting#randomizedtesting-runner;2.0.8: not found
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

D:\apache_solr_4.2.1\lucene\common-build.xml:348: impossible to resolve dependencies: resolve failed - see output for details
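One hedged observation, not from the thread itself: "expected=!--" suggests Ivy cached the start of an HTML page (often served by an intercepting proxy or a failed download) instead of the real checksum file. A common remedy is to delete the cached entries for the failing modules and re-run 'ant idea'. A sketch (the function name and module list are assumptions):

```shell
# Hypothetical cleanup: remove the Ivy cache entries for the modules that
# failed with "invalid sha1", forcing a fresh download on the next resolve.
clean_ivy_entries() {
    cache="${1:-$HOME/.ivy2/cache}"   # default Ivy cache location
    rm -rf "$cache/org.apache.ant" "$cache/junit" "$cache/com.carrotsearch.randomizedtesting"
}
```

If the errors return for every repository after a clean cache, check whether an HTTP proxy is rewriting the responses.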
Re: Solr faceted search UI
Hi Richa,

In your webapp I guess you have at least a view and a service layer. The indexing and search modules should preferably be hosted at the service layer. I recommend you read the API doc (http://lucene.apache.org/solr/4_2_1/solr-solrj/index.html) to get a sense of what you can do with SolrJ. Following is a basic example of facets with SolrJ:

// adding the query keyword to the SolrQuery object
mySolrQuery.setQuery(queryBuilder.toString());
// add a facet field
mySolrQuery.addFacetField(myFieldName);
// add a facet query
validatedFromTheLast7DaysFacetQuery = validationDateField + ":[NOW/DAY-7DAY TO NOW]";
mySolrQuery.addFacetQuery(validatedFromTheLast7DaysFacetQuery);
// send the request via HTTP POST, as with HTTP GET you run into issues when the request string is too long
QueryResponse queryResponse = getSolrHttpServer().query(mySolrQuery, METHOD.POST);
// write a transformer to convert the Solr response to a format understandable by the caller (the client of the search service)
// list of results to transform
SolrDocumentList responseSolrDocumentList = queryResponse.getResults();
// get the facet fields, iterate over the list, parse each FacetField and extract the information you are interested in
queryResponse.getFacetFields();
// get the facet queries from the response
Map<String, Integer> mapOfFacetQueries = queryResponse.getFacetQuery();

The keys of this map are your facet queries. The values are the counts you display to the user. In general, I have an identifier for each facet query. When I parse the keys of this map of facet queries, I return the identifier of each facet along with its count (if the count > 0, of course). The caller is aware of this identifier, so it knows what to display to the user. When the user clicks on a facet, you send it as a search criterion along with the initial keywords to the search service. The criterion resulting from the facet is treated as a filter query. That is how faceted search works.
Adding a filter to your query is as simple as this snippet mySolrQuery.addFilterQuery(myfilterQuery). should you are filtering because your user click on the previously defined facet query, then the filter query is the same as the facet query. that is myfilterQuery = validationDateField + :[NOW/DAY-7DAY TO NOW]. I hope this helps. Cheers, Maj On 24 April 2013 17:27, richa striketheg...@gmail.com wrote: Hi Maj, Thanks for your suggestion. Tell me one thing, do you have any example on solrj? suppose I decide to use solrj in simple web application, to display faceted search on web page. Where will this fit into? what will be the flow? Please suggest. Thanks On Wed, Apr 24, 2013 at 11:01 AM, Majirus FANSI [via Lucene] ml-node+s472066n4058610...@n3.nabble.com wrote: Hi richa, You can use solrJ ( http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr) to query your solr index. On the wiki page indicated, you will see example of faceted search using solrJ. 2009 article by Yonik available on searchhubhttp://searchhub.org/2009/09/02/faceted-search-with-solr/ is a good tutorial on faceted search. Whether you go for MVC framework or not is up to you. It is recommend tough to develop search engine application in a Service Oriented Architecture. Regards, Maj On 24 April 2013 16:43, richa [hidden email] http://user/SendEmail.jtp?type=nodenode=4058610i=0 wrote: Hi, I am working on a POC, where I have to display faceted search result on web page. can anybody please help me to suggest what all set up I need to configure to display. I would prefer java technologies. Just to mention, I have solr cloud running on remote server. I would like to know: 1. Should I use MVC framework? 2. How will my local interact with remote solr server? 3. How will I send query through java code and what technology I should use to display faceted search result? Please help me on this. 
Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598.html Sent from the Solr - User mailing list archive at Nabble.com.
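The SolrJ calls Maj walks through above map directly onto plain HTTP request parameters on the /select handler. As a rough, library-free illustration of what such a facet request looks like on the wire (the field name "category" and the parameter values are invented for the sketch; this is not SolrJ itself, only the query string it would produce):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetRequestSketch {
    // Builds the query-string portion of a Solr select request,
    // percent-encoding each parameter value.
    static String buildQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "test");               // setQuery(...)
        params.put("facet", "true");
        params.put("facet.field", "category"); // addFacetField(...), hypothetical field
        // addFacetQuery(...): the date-range facet from the message above
        params.put("facet.query", "validationDate:[NOW/DAY-7DAY TO NOW]");
        System.out.println(buildQueryString(params));
    }
}
```

SolrJ assembles (and POSTs) an equivalent parameter set for you; the sketch only shows why a long list of facet queries can overflow a GET URL, which is the reason for the METHOD.POST advice above.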
Re: Exact matching in Solr 3.6.1
Thanks for your reply. I am using edismax as well. What I want to get is the exact match without other results that could be close to the given term. -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exact matching in Solr 3.6.1
As indicated previously, yes, exact matching is possible in Solr. You, the developer, have full control over the exactness or inexactness of all queries. If any query is inexact in some way, it is solely due to decisions that you, the developer, have made. Generally speaking, inexactness, fuzziness if you will, is the precise quality that most developers - and users - are looking for in search. I mean, generally, having to be precise and exact in search requests... is tedious and a real drag, and something to be avoided - in general. But, that's what string fields, the white space tokenizer, the regular expression tokenizer, and full developer control of the token filter sequence are for - to let you, the developer, have full control, including all aspects of exactness of search. As to your specific question - there is nothing about the AND, OR, or NOT (or + or -) operators that is in any way anything other than exact, in terms of document matching. OR can be considered a form of inexactness in that the presence of a term is optional, but AND means absolutely MUST, and NOT means absolutely MUST_NOT. About as exact as anything could get. Scoring and relevancy are another story, but have nothing to do with matching or exactness. Exactness and matching only affect whether a document is counted in numFound and included in results or not, not the ordering of results. But why are you asking? Is there some problem you are trying to solve? Is there some query that is not giving you the results you expect? If this is simply a general information question, fine, answered. But if you are trying to solve some problem, you will need to clearly state your problem rather than asking some general, abstract question. -- Jack Krupansky -----Original Message----- From: vsl Sent: Thursday, April 25, 2013 5:33 AM To: solr-user@lucene.apache.org Subject: Exact matching in Solr 3.6.1 Hi, is it possible to get an exact matched result if the search term is combined, e.g. 
cats AND London NOT Leeds In the previous threads I have read that it is possible to create a new field of String type and perform a phrase search on it, but nowhere had the above mentioned combined search term been taken into consideration. BR Pawel -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865.html Sent from the Solr - User mailing list archive at Nabble.com.
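Jack's point that AND and NOT are exact, in terms of document matching, can be shown with a toy matcher over token sets (an illustration of MUST/MUST_NOT semantics, not Lucene's actual implementation; the sample terms are invented):

```java
import java.util.List;
import java.util.Set;

public class BooleanMatchSketch {
    // A document matches when it contains every MUST term
    // and none of the MUST_NOT terms - a purely exact test.
    static boolean matches(Set<String> docTerms, List<String> must, List<String> mustNot) {
        return docTerms.containsAll(must)
            && mustNot.stream().noneMatch(docTerms::contains);
    }

    public static void main(String[] args) {
        Set<String> doc1 = Set.of("london", "cats", "glenvilet");
        Set<String> doc2 = Set.of("london", "cat", "glenvilet", "leeds");
        // "cats AND london NOT leeds" against unstemmed terms:
        System.out.println(matches(doc1, List.of("cats", "london"), List.of("leeds"))); // true
        System.out.println(matches(doc2, List.of("cats", "london"), List.of("leeds"))); // false
    }
}
```

Any "inexactness" observed in practice comes from how the terms were produced by analysis (stemming, lowercasing), not from the Boolean operators themselves.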
Re: Exact matching in Solr 3.6.1
I think in that case, making a field String type is your option, however remember that it'd be case sensitive. Another approach is to create a case-insensitive field type and do searches on those fields only.

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true" compressThreshold="10">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Can you provide your fields and dismax config and, if possible, records you would like and records you do not want? -S On 25 April 2013 11:50, vsl ociepa.pa...@gmail.com wrote: Thanks for your reply. I am using edismax as well. What I want to get is the exact match without other results that could be close to the given term. -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058876.html Sent from the Solr - User mailing list archive at Nabble.com.
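Functionally, the string_ci type above keeps the whole field value as a single token (KeywordTokenizerFactory) and lower-cases it, so matching against it reduces to comparing lower-cased whole strings. A rough stdlib analog of that normalization (not Solr code, just the behavior the analyzer chain implies):

```java
import java.util.Locale;

public class CaseInsensitiveExactMatch {
    // KeywordTokenizerFactory emits the entire input as one token;
    // LowerCaseFilterFactory then lower-cases it. Index-time and
    // query-time values both pass through this normalization, so
    // "exact match" means equality of the normalized whole values.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    static boolean exactMatch(String indexedValue, String query) {
        return normalize(indexedValue).equals(normalize(query));
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("London Cats", "london CATS")); // true: case folded
        System.out.println(exactMatch("London Cats", "london"));      // false: whole-value match only
    }
}
```

This also shows the trade-off: a string_ci field only ever matches the complete value, never individual words within it.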
Re: Exact matching in Solr 3.6.1
I will explain my case in the example below. We have three documents with the given content:
First document: london cats glenvilet
Second document: london cat glenvilet leeds
Third document: london cat glenvilet
Search term: cats AND London NOT Leeds
Expected result: First document
Current result: First document, Third document
Additionally, the next requirement says that when I type as search term: cats AND Londo NOT Leeds then I should get the spell check collation: cats AND London NOT Leeds -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058890.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exact matching in Solr 3.6.1
It sounds as if your field type is doing stemming - mapping cats to cat. That is a valuable feature of search, but if you wish to turn it off... go ahead and do so by editing the field type. But just be aware that turning off stemming is a great loss of search flexibility. Who knows, maybe you might want to have both stemmed and unstemmed fields in an edismax query and give a higher boost to the unstemmed field - but it's not up to us to guess your requirements. We're dependent on you clearly expressing your requirements. As indicated before, you, the developer, have complete control here. But... it is up to you, the developer, to choose wisely, to suit your application requirements. But if you don't describe your requirements with greater precision and detail, we won't be able to be of much help to you. Your second (of only two) requirement relates to spellcheck, which is completely unrelated to query matching and exactness. Yes, Solr has a spellcheck capability, and yes, it does collation. Is that all you are asking? If there is a specific issue, please be specific about it. -- Jack Krupansky -----Original Message----- From: vsl Sent: Thursday, April 25, 2013 8:00 AM To: solr-user@lucene.apache.org Subject: Re: Exact matching in Solr 3.6.1 I will explain my case in the example below. We have three documents with the given content:
First document: london cats glenvilet
Second document: london cat glenvilet leeds
Third document: london cat glenvilet
Search term: cats AND London NOT Leeds
Expected result: First document
Current result: First document, Third document
Additionally, the next requirement says that when I type as search term: cats AND Londo NOT Leeds then I should get the spell check collation: cats AND London NOT Leeds -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058890.html Sent from the Solr - User mailing list archive at Nabble.com.
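The reason "cats" matches documents that only contain "cat" is that the stemmer maps both to the same index term at both index and query time. A toy suffix-stripping rule (far cruder than the Snowball stemmer Solr actually uses, but enough to show the conflation):

```java
public class StemmingSketch {
    // Toy stemmer: strips a plural "s" (except after another "s").
    // SnowballPorterFilterFactory applies many more rules, but the
    // effect on this example is the same: cats and cat become one term.
    static String toyStem(String term) {
        if (term.endsWith("s") && !term.endsWith("ss") && term.length() > 2) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(toyStem("cats")); // cat
        System.out.println(toyStem("cat"));  // cat
        // Both index terms and query terms are stemmed, so the query
        // "cats" matches a document that only ever contained "cat".
        System.out.println(toyStem("cats").equals(toyStem("cat"))); // true
    }
}
```

Removing the stemming filter from the field type (or querying an unstemmed copy of the field) is what restores the distinction between the two surface forms.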
Re: Solr metrics in Codahale metrics and Graphite?
We are very much interested in 3.4. On Thu, Apr 25, 2013 at 12:55 PM, Alan Woodward a...@flax.co.uk wrote: This is on top of trunk at the moment, but would be back ported to 4.4 if there was interest. Alan Woodward www.flax.co.uk On 25 Apr 2013, at 10:32, Dmitry Kan wrote: Hi Alan, Great! What is the solr version you are patching? Speaking of graphite, we have set it up recently to monitor our shard farm. So far, since RAM usage has been the most important metric, we were fine with the pidstat command and a little script generating stats for carbon. Having some additional stats from SOLR itself would certainly be great to have. Dmitry On Thu, Apr 25, 2013 at 12:01 PM, Alan Woodward a...@flax.co.uk wrote: Hi Walter, Dmitry, I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with some work-in-progress. Have a look! Alan Woodward www.flax.co.uk On 23 Apr 2013, at 07:40, Dmitry Kan wrote: Hello Walter, Have you had a chance to get something working with graphite, codahale and solr? Has anyone else tried these tools with the Solr 3.x family? How much work is it to set things up? We have tried zabbix in the past. Even though it required lots of up front investment on configuration, it looks like a compelling option. In the meantime, we are looking into something more solr-tailored yet simple. Even without metrics persistence. Tried: jconsole and viewing stats via jmx. The main point for us now is to gather RAM usage. Dmitry On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wun...@wunderwood.org wrote: If it isn't obvious, I'm glad to help test a patch for this. We can run a simulated production load in dev and report to our metrics server. wunder On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote: That approach sounds great. 
--wunder On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote: I've been thinking about how to improve this reporting, especially now that metrics-3 (which removes all of the funky thread issues we ran into last time I tried to add it to Solr) is close to release. I think we could go about it as follows: * refactor the existing JMX reporting to use metrics-3. This would mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and adding a JmxReporter, keeping the existing config logic to determine which JMX server to use. PluginInfoHandler and SolrMBeanInfoHandler translate the metrics-3 data back into SolrMBean format to keep the reporting backwards-compatible. This seems like a lot of work for no visible benefit, but… * we can then add the ability to define other metrics reporters in solrconfig.xml. There are already reporters for Ganglia and Graphite - you just add them to the Solr lib/ directory, configure them in solrconfig, and voila - Solr can be monitored using the same devops tools you use to monitor everything else. Does this sound sane? Alan Woodward www.flax.co.uk On 6 Apr 2013, at 20:49, Walter Underwood wrote: Wow, that really doesn't help at all, since these seem to only be reported in the stats page. I don't need another non-standard app-specific set of metrics, especially one that needs polling. I need metrics delivered to the common system that we use for all our servers. This is also why SPM is not useful for us, sorry Otis. Also, there is no time period on these stats. How do you graph the 95th percentile? I know there was a lot of work on these, but they seem really useless to me. I'm picky about metrics; working at Netflix does that to you. wunder On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote: In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote: It's there! 
:) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue Otis -- Solr & ElasticSearch Support http://sematext.com/ On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wun...@wunderwood.org wrote: That sounds great. I'll check out the bug, I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote: On 3/29/2013 12:07 PM, Walter Underwood wrote: What are folks using for this? I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of codahale metrics internally for request handler statistics - see SOLR-1972. First we tried including the jar and using the API, but that created thread leak problems, so the source code was added. Thanks, Shawn -- Walter Underwood wun...@wunderwood.org -- Walter Underwood
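For context on what a Graphite reporter like the one discussed above ultimately emits: Graphite's carbon daemon accepts a plaintext protocol of one sample per line, "metric.path value unix-timestamp", commonly over TCP port 2003. A minimal formatter, independent of any metrics library (the Solr metric names here are invented for illustration):

```java
public class GraphitePlaintextSketch {
    // Formats one sample in Graphite's plaintext protocol:
    // "<metric path> <value> <unix timestamp>\n"
    static String format(String path, double value, long epochSeconds) {
        return String.format("%s %s %d\n", path, value, epochSeconds);
    }

    public static void main(String[] args) {
        // Hypothetical metric names; a real reporter would write these
        // lines to a socket connected to the carbon host.
        System.out.print(format("solr.core0.select.p95_ms", 42.5, 1366900000L));
        System.out.print(format("solr.core0.select.rate_per_s", 17.0, 1366900000L));
    }
}
```

A reporter plugin is essentially a scheduled task that walks a metrics registry and writes lines like these, which is why dropping a reporter jar into lib/ and configuring it is all the integration that is needed.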
Re: [solr 3.4] anomaly during distributed facet query with 102 shards
Are there any distrib facet gurus on the list? I would be ready to try sensible ideas, including on the source code level, if someone of you could give me a hand. Dmitry On Wed, Apr 24, 2013 at 3:08 PM, Dmitry Kan solrexp...@gmail.com wrote: Hello list, We deal with an anomaly when doing a distributed facet query against 102 shards. The problem manifests itself in both the frontend solr (router) and a shard. Each time the request is executed, always a different shard is affected (at random, hence the anomaly). The query is:

http://router_host:router_port/solr/select?q=test&facet=true&facet.field=field_of_type_long&facet.limit=1330&facet.mincount=1&rows=1&facet.sort=index&facet.zeros=false&facet.offset=0

I have omitted the shards parameter. The router log:

request: http://10.155.244.181:9150/solr/select
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

Notice the port of the shard that is affected. 
That port changes all the time, even for the same request. The log entry is prepended with the lines: SEVERE: org.apache.solr.common.SolrException: Internal Server Error / Internal Server Error (they are not in the pastebin link). The shard log:

Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
at java.lang.Thread.run(Thread.java:722)

Apr 24, 2013 11:08:49 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={} status=500 QTime=2
Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at
Re: Exact matching in Solr 3.6.1
Exact matching is just one of my cases. Currently I perform search on a field with the given definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

This field definition fulfils all other requirements. Examples: special characters; passengers -> passenger. The case with exact matching is the last one I have to complete. The problem with cats -> cat is caused by SnowballPorterFilterFactory. This is what I know. The question is whether it is possible to handle exact matching (edismax) with only one result, as described in the previous post, without influencing the existing functionality? BR Pawel -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058907.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr maven install - authorization problem when downloading maven.restlet.org dependencies
Hi, I'm trying to build Solr 4.2.x with Maven and I'm getting the following error in solr-core:

[INFO] BUILD FAILURE
[INFO] Total time: 1.341s
[INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013
[INFO] Final Memory: 12M/174M
[ERROR] Failed to execute goal on project solr-core: Could not resolve dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT: Failed to collect dependencies for [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile), commons-codec:commons-codec:jar:1.7 (compile), commons-cli:commons-cli:jar:1.2 (compile), commons-fileupload:commons-fileupload:jar:1.2.1 (compile), org.restlet.jee:org.restlet:jar:2.1.1 (compile), org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile), org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile), org.slf4j:slf4j-jdk14:jar:1.6.4 (compile), commons-io:commons-io:jar:2.1 (compile), commons-lang:commons-lang:jar:2.6 (compile), com.google.guava:guava:jar:13.0.1 (compile), org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?), org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?), org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?), org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime), javax.servlet:servlet-api:jar:2.4 (provided), org.apache.httpcomponents:httpclient:jar:4.2.3 (compile), org.apache.httpcomponents:httpmime:jar:4.2.3 (compile), org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]: Failed to read artifact descriptor for org.restlet.jee:org.restlet:jar:2.1.1: Could not transfer artifact org.restlet.jee:org.restlet:pom:2.1.1 from/to maven-restlet (http://maven.restlet.org): Not authorized, ReasonPhrase:Unauthorized. -> [Help 1]

Has anyone encountered this issue? Thanks, Shahar.
Re: Exact matching in Solr 3.6.1
Hi Pawel, If you are searching on any field of type text_general as defined in your schema, you are stuck with the Porter stemmer. In fact, with your settings Solr is not aware of a term like cats, only cat. Thus there is no way to do an exact match on cats in this case. What you can do is create a new field type and, with the copyField facility, save a verbatim version of your data in that field while the field of type text_general still performs stemming. Finally, add the new field to the list of searchable fields with a higher boost so that an exact match receives the highest score. Hope this helps. regards, Maj On 25 April 2013 14:43, vsl ociepa.pa...@gmail.com wrote: Exact matching is just one of my cases. Currently I perform search on a field with the given definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

This field definition fulfils all other requirements. 
Examples: special characters; passengers -> passenger. The case with exact matching is the last one I have to complete. The problem with cats -> cat is caused by SnowballPorterFilterFactory. This is what I know. The question is whether it is possible to handle exact matching (edismax) with only one result, as described in the previous post, without influencing the existing functionality? BR Pawel -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058907.html Sent from the Solr - User mailing list archive at Nabble.com.
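Maj's suggestion - keep the stemmed field and copy a verbatim version into a second field, then boost the exact field under edismax - would look roughly like the following in schema.xml. This is a sketch only: the type name text_exact and the field names content and content_exact are invented, and text_exact is assumed to be the same chain as text_general minus the stemmer so tokens stay unstemmed.

```
<!-- Sketch with invented names: an unstemmed sibling of text_general. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="content_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="content" dest="content_exact"/>
```

With edismax, searching both fields with a boost on the exact one (e.g. qf=content content_exact^10) keeps the existing stemmed behavior while scoring unstemmed hits highest.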
FieldCache insanity with field used as facet and group
Hello, I am using the Lucene FieldCache with SolrCloud and I have insane instances with messages like:

VALUEMISMATCH: Multiple distinct value objects for SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',class org.apache.lucene.index.SortedDocValues,0.5=org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',int,null=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713

All insane instances are for a field merchantid of type int used as a facet and group field. I'm using a custom SearchHandler which makes two sub-queries: a first query with group.field=merchantid and a second query with facet.field=merchantid. When I'm using the parameter facet.method=enum, I don't have the insane instances, but I'm not sure it is the right fix. Can this insanity have a performance impact? How can I fix it? Elodie Sannier, Kelkoo SAS
Re: Solr maven install - authorization problem when downloading maven.restlet.org dependencies
Building Solr 4.2.1 worked fine for me. Here is the relevant portion of ivy-settings.xml that I had to change:

<chain name="default" returnFirst="true" checkmodified="true" changingPattern=".*SNAPSHOT">
  <resolver ref="local"/>
  <!-- <resolver ref="local-maven-2" /> -->
  <resolver ref="main"/>
  <!-- <resolver ref="sonatype-releases" /> -->
  <!-- COMMENTED OUT <resolver ref="maven.restlet.org" /> -->
  <resolver ref="working-chinese-mirror" />
</chain>

Dmitry On Thu, Apr 25, 2013 at 3:53 PM, Shahar Davidson shah...@checkpoint.com wrote: Hi, I'm trying to build Solr 4.2.x with Maven and I'm getting the following error in solr-core:

[INFO] BUILD FAILURE
[INFO] Total time: 1.341s
[INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013
[INFO] Final Memory: 12M/174M
[ERROR] Failed to execute goal on project solr-core: Could not resolve dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT: Failed to collect dependencies for [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile), commons-codec:commons-codec:jar:1.7 (compile), commons-cli:commons-cli:jar:1.2 (compile), commons-fileupload:commons-fileupload:jar:1.2.1 (compile), org.restlet.jee:org.restlet:jar:2.1.1 (compile), org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile), org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile), org.slf4j:slf4j-jdk14:jar:1.6.4 (compile), commons-io:commons-io:jar:2.1 (compile), commons-lang:commons-lang:jar:2.6 (compile), com.google.guava:guava:jar:13.0.1 (compile), org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?), org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?), org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?), org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime), javax.servlet:servlet-api:jar:2.4 (provided), org.apache.httpcomponents:httpclient:jar:4.2.3 (compile), org.apache.httpcomponents:httpmime:jar:4.2.3 (compile), org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]: Failed to read artifact descriptor for org.restlet.jee:org.restlet:jar:2.1.1: Could not transfer artifact org.restlet.jee:org.restlet:pom:2.1.1 from/to maven-restlet (http://maven.restlet.org): Not authorized, ReasonPhrase:Unauthorized. -> [Help 1]

Has anyone encountered this issue? Thanks, Shahar.
Re: Preparing Solr 4.2.1 for IntelliJ fails - invalid sha1
Hi Shahar, I suspect you may have an older version of Ivy installed - the errors you're seeing look like IVY-1194 https://issues.apache.org/jira/browse/IVY-1194, which was fixed in Ivy 2.2.0. Lucene/Solr uses Ivy 2.3.0. Take a look in C:\Users\account\.ant\lib\ and remove older versions of ivy-*.jar, then run 'ant ivy-bootstrap' from the Solr source code to download ivy-2.3.0.jar to C:\Users\account\.ant\lib\. Just now on a Windows 7 box, I downloaded solr-4.2.1-src.tgz from one of the Apache mirrors, unpacked it, deleted my C:\Users\account\.ivy2\ directory (so that ivy would re-download everything), and ran 'ant idea' from a cmd window. BUILD SUCCESSFUL. Steve On Apr 25, 2013, at 6:07 AM, Shahar Davidson shah...@checkpoint.com wrote: Hi all, I'm trying to run 'ant idea' on 4.2.* and I'm getting invalid sha1 error messages. (see below) I'll appreciate any help, Shahar === . . . resolve ivy:retrieve :: problems summary :: WARNINGS problem while downloading module descriptor: http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (72ms) problem while downloading module descriptor: http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (53ms) problem while downloading module descriptor: http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (53ms) problem while downloading module descriptor: http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid sha1: expected=!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (58ms) module not found: org.apache.ant#ant;1.8.2 . . . 
public: tried http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom sonatype-releases: tried http://oss.sonatype.org/content/repositories/releases/org/apache/ant/ant/1.8.2/ant-1.8.2.pom maven.restlet.org: tried http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom working-chinese-mirror: tried http://mirror.netcologne.de/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom problem while downloading module descriptor: http://repo1.maven.org/maven2/junit/junit/4.10/junit-4.10.pom: invalid sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (74ms) problem while downloading module descriptor: http://oss.sonatype.org/content/repositories/releases/junit/junit/4.10/junit-4.10.pom: invalid sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (60ms) problem while downloading module descriptor: http://maven.restlet.org/junit/junit/4.10/junit-4.10.pom: invalid sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (58ms) problem while downloading module descriptor: http://mirror.netcologne.de/maven2/junit/junit/4.10/junit-4.10.pom: invalid sha1: expected=<!-- computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (60ms) module not found: junit#junit;4.10 . . . . :: :: UNRESOLVED DEPENDENCIES :: :: :: org.apache.ant#ant;1.8.2: not found :: junit#junit;4.10: not found :: com.carrotsearch.randomizedtesting#junit4-ant;2.0.8: not found :: com.carrotsearch.randomizedtesting#randomizedtesting-runner;2.0.8: not found :: :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS D:\apache_solr_4.2.1\lucene\common-build.xml:348: impossible to resolve dependencies: resolve failed - see output for details
Re: Did something change with Payloads?
Hi Jim, I faced almost the same issue with payloads recently, and thought I would write about it. Please see the link below (my blog); I hope it helps. http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/ Additionally, as Mark Miller said, with Solr 4.x you have to add documents one by one during indexing to reflect payload scores correctly. For example: solr.addBean(doc); solr.commit(); When you try to add documents as a collection through addBeans(), only one .PAY file is created and all documents are scored with the payload score of the first document indexed. There is surely some problem with the Lucene 4.1 codec APIs, so for now the above solution should work. I probably need to write a sequel to my first article covering the above point on indexing. :) Thanks, Hari. -- View this message in context: http://lucene.472066.n3.nabble.com/Did-something-change-with-Payloads-tp4049561p4058919.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr maven install - authorization problem when downloading maven.restlet.org dependencies
Hi Shahar, On a Windows 7 box, after downloading solr-4.2.1-src.tgz from one of the Apache mirrors and unpacking it, I did the following from a cmd window: PROMPT> cd solr-4.2.1 PROMPT> ant get-maven-poms PROMPT> cd maven-build PROMPT> mvn install Is the above what you did? After a while, I see: - [INFO] [INFO] Building Apache Solr Core [INFO] task-segment: [install] [INFO] Downloading: http://maven.restlet.org/org/restlet/jee/org.restlet/2.1.1/org.restlet-2.1.1.pom 614b downloaded (org.restlet-2.1.1.pom) Downloading: http://maven.restlet.org/org/restlet/jee/org.restlet.parent/2.1.1/org.restlet.parent-2.1.1.pom 7K downloaded (org.restlet.parent-2.1.1.pom) Downloading: http://maven.restlet.org/org/restlet/jee/org.restlet.ext.servlet/2.1.1/org.restlet.ext.servlet-2.1.1.pom 907b downloaded (org.restlet.ext.servlet-2.1.1.pom) Downloading: http://maven.restlet.org/org/restlet/jee/org.restlet/2.1.1/org.restlet-2.1.1.jar […] 709K downloaded (org.restlet-2.1.1.jar) Downloading: http://maven.restlet.org/org/restlet/jee/org.restlet.ext.servlet/2.1.1/org.restlet.ext.servlet-2.1.1.jar 19K downloaded (org.restlet.ext.servlet-2.1.1.jar) - It's possible that the Restlet maven repository was temporarily malfunctioning. Have you tried again? 
Steve On Apr 25, 2013, at 8:53 AM, Shahar Davidson shah...@checkpoint.com wrote: Hi, I'm trying to build Solr 4.2.x with Maven and I'm getting the following error in solr-core: [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 1.341s [INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013 [INFO] Final Memory: 12M/174M [INFO] [ERROR] Failed to execute goal on project solr-core: Could not resolve dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT: Failed to collect dependencies for [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile), org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile), commons-codec:commons-codec:jar:1.7 (compile), commons-cli:commons-cli:jar:1.2 (compile), commons-fileupload:commons-fileupload:jar:1.2.1 (compile), org.restlet.jee:org.restlet:jar:2.1.1 (compile), org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile), org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile), org.slf4j:slf4j-jdk14:jar:1.6.4 (compile), commons-io:commons-io:jar:2.1 (compile), commons-lang:commons-lang:jar:2.6 (compile), com.google.guava:guava:jar:13.0.1 (compile), 
org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?), org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?), org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?), org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime), javax.servlet:servlet-api:jar:2.4 (provided), org.apache.httpcomponents:httpclient:jar:4.2.3 (compile), org.apache.httpcomponents:httpmime:jar:4.2.3 (compile), org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]: Failed to read artifact descriptor for org.restlet.jee:org.restlet:jar:2.1.1: Could not transfer artifact org.restlet.jee:org.restlet:pom:2.1.1 from/to maven-restlet (http://maven.restlet.org): Not authorized, ReasonPhrase:Unauthorized. - [Help 1] Has anyone encountered this issue? Thanks, Shahar.
Re: [solr 3.4] anomaly during distributed facet query with 102 shards
On Thu, Apr 25, 2013 at 8:32 AM, Dmitry Kan solrexp...@gmail.com wrote: Are there any distrib facet gurus on the list? I would be ready to try sensible ideas, including on the source code level, if someone of you could give me a hand. The Lucene/Solr Revolution conference is coming up next week, so I think many are busy creating their presentations. What version of Solr are you using? Have you tried using a newer version? Is it reproducible with a smaller cluster? If so, you could try using the included Jetty server instead of Tomcat to rule out that factor. -Yonik http://lucidworks.com
Re: [solr 3.4] anomaly during distributed facet query with 102 shards
Thanks, Yonik. Yes, I supposed that. We are in the pre-release phase, so we have the pressure. Solr 3.4. Would setting up 4.2.1 router work with 3.4 shards? On 25 Apr 2013 17:11, Yonik Seeley yo...@lucidworks.com wrote: On Thu, Apr 25, 2013 at 8:32 AM, Dmitry Kan solrexp...@gmail.com wrote: Are there any distrib facet gurus on the list? I would be ready to try sensible ideas, including on the source code level, if someone of you could give me a hand. The Lucene/Solr Revolution conference is coming up next week, so I think many are busy creating their presentations. What version of Solr are you using? Have you tried using a newer version? Is it reproducable with a smaller cluster? If so, you could try using the included Jetty server instead of Tomcat to rule out that factor. -Yonik http://lucidworks.com
What is the difference between a Join Query and Embedded Entities in Solr DIH?
Hello guys, I saw this thread on Stack Overflow, but am still not satisfied with the answers. I am trying to index data across multiple tables using Solr's Data Import Handler. The official wiki on the DIH suggests using embedded entities to link multiple tables like so: <document> <entity name="item" pk="id" query="SELECT * FROM item"> <entity name="member" pk="memberid" query="SELECT * FROM member WHERE memberid='${item.memberid}'"> </entity> </entity> </document> Another way that works is: <document> <entity name="item" pk="id" query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid"> </entity> </document> Are these two methods functionally different? Is there a performance difference? Another thought: if using join tables in MySQL, the SQL query method with multiple joins could cause multiple documents to be indexed instead of one. -- View this message in context: http://lucene.472066.n3.nabble.com/What-is-the-difference-between-a-Join-Query-and-Embedded-Entities-in-Solr-DIH-tp4058923.html Sent from the Solr - User mailing list archive at Nabble.com.
Question on storage and index/data management in solr
Hi, I am relatively new to Solr and evaluating it for my project. I will have lots of data coming in at a fast rate (say 10 MB per second), and I need the recent data (the last 24 hours, or the last 100GB) to be searchable faster than the old data. I did a bit of reading on the controls provided by Solr and came across the concept of mergeFactor (defaults to 10) - this means Solr merges every 10 segments into one. However, I need something like this: 1. Keep each of the last 24 hours' segments separate. 2. Merge segments generated between 48 and 24 hours ago into one; similarly for segments created between 72 and 48 hours ago, and so on for the last week. 3. Similarly, merge each of the previous 4 weeks' data into one segment per week. 4. Merge all previous months' data into one segment per month. I am not sure if such a configuration is possible in Solr. If not, are there APIs which will allow me to do this? Also, I want to understand how Solr stores data, and whether it has a dependency on the way data is stored. Since the volumes are high, it would be great if the data were compressed when stored (while still searchable). If that is possible, I would like to know what kind of compression Solr does. Thank you for the responses. Regards, Vinay
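For context on the mergeFactor knob mentioned above: it is set in solrconfig.xml. A minimal sketch follows (element names as in the stock Solr 4.x example config). Note this controls size-tiered merging only; the time-based merge schedule described above is not something the stock merge policies support, so it is usually approximated at the application level with one core or collection per time period.

```xml
<indexConfig>
  <!-- merge roughly every 10 similar-sized segments into one -->
  <mergeFactor>10</mergeFactor>
  <!-- the default policy in recent versions; it merges by segment size, not age -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
</indexConfig>
```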
Re: Exact matching in Solr 3.6.1
Thanks for your reply, but this solution does not fulfill my requirement because other documents (not exact matches) will be returned as well. -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058929.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is the difference between a Join Query and Embedded Entities in Solr DIH?
I think JOIN is more performant, as - by default - DIH will run an inner query for each outer one. You can use a cached source, but JOIN will still be more efficient. The nested entities are more useful when the sources are heterogeneous (e.g. DB and XML) or when you need to do custom transformers in between. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Apr 25, 2013 at 10:17 AM, Gustav xbihy...@sharklasers.com wrote: Hello guys, I saw this thread on Stack Overflow, but am still not satisfied with the answers. I am trying to index data across multiple tables using Solr's Data Import Handler. The official wiki on the DIH suggests using embedded entities to link multiple tables like so: <document> <entity name="item" pk="id" query="SELECT * FROM item"> <entity name="member" pk="memberid" query="SELECT * FROM member WHERE memberid='${item.memberid}'"> </entity> </entity> </document> Another way that works is: <document> <entity name="item" pk="id" query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid"> </entity> </document> Are these two methods functionally different? Is there a performance difference? Another thought: if using join tables in MySQL, the SQL query method with multiple joins could cause multiple documents to be indexed instead of one. -- View this message in context: http://lucene.472066.n3.nabble.com/What-is-the-difference-between-a-Join-Query-and-Embedded-Entities-in-Solr-DIH-tp4058923.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: What is the difference between a Join Query and Embedded Entities in Solr DIH?
Gustav, DIH should give you the same results in both scenarios. The performance trade-offs depend on your data. In your case, it looks like there is a 1-to-1 or many-to-1 relationship between item and member, so use the SQL Join. You'll get all of your data in one query and you'll be using your RDBMS for what it does best. But if there were a 1-to-many relationship between item and member, and especially if each item had several member rows, you might get better performance using the child entity setup. Note that by default DIH is going to do an n+1 select on member: for every row in item, it will issue a separate query to the db. Also, DIH does not use prepared statements, so this might be a bad choice. To work around this, specify cacheImpl='SortedMapBackedCache' on the child entity (this is the same as using CachedSqlEntityProcessor instead of SqlEntityProcessor). Do not include a where clause in this child entity. Instead, specify cacheKey='memberId' and cacheLookup='item.memberId'. DIH will now pull down your entire member table in 1 query and cache it in memory, then it can do fast hash joins against item. But if your member table is too big to fit into memory, then you need to use a disk-backed cache instead of SortedMapBackedCache. For that, see https://issues.apache.org/jira/browse/SOLR-2948 and https://issues.apache.org/jira/browse/SOLR-2613 . James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Gustav [mailto:xbihy...@sharklasers.com] Sent: Thursday, April 25, 2013 9:17 AM To: solr-user@lucene.apache.org Subject: What is the difference between a Join Query and Embedded Entities in Solr DIH? Hello guys, I saw this thread on Stack Overflow, but am still not satisfied with the answers. I am trying to index data across multiple tables using Solr's Data Import Handler. 
The official wiki on the DIH suggests using embedded entities to link multiple tables like so: <document> <entity name="item" pk="id" query="SELECT * FROM item"> <entity name="member" pk="memberid" query="SELECT * FROM member WHERE memberid='${item.memberid}'"> </entity> </entity> </document> Another way that works is: <document> <entity name="item" pk="id" query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid"> </entity> </document> Are these two methods functionally different? Is there a performance difference? Another thought: if using join tables in MySQL, the SQL query method with multiple joins could cause multiple documents to be indexed instead of one. -- View this message in context: http://lucene.472066.n3.nabble.com/What-is-the-difference-between-a-Join-Query-and-Embedded-Entities-in-Solr-DIH-tp4058923.html Sent from the Solr - User mailing list archive at Nabble.com.
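The cache settings James describes could look like this in data-config.xml. This is a sketch only: the table and column names come from the example in this thread, and the cacheKey/cacheLookup values must match your actual column names.

```xml
<document>
  <entity name="item" pk="id" query="SELECT * FROM item">
    <!-- no WHERE clause: the whole member table is fetched in one query,
         cached in memory, and hash-joined against item.memberid -->
    <entity name="member" pk="memberid"
            query="SELECT * FROM member"
            cacheImpl="SortedMapBackedCache"
            cacheKey="memberid"
            cacheLookup="item.memberid"/>
  </entity>
</document>
```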
SolrJ Custom RowMapper
Hi All, Does SolrJ have an option for a custom RowMapper or BeanPropertyRowMapper (I'm using Spring/JDBC terms). I know the QueryResponse has a getBeans method, but I would like to create my own mapping and plug it in. Any pointers? Thanks, Luis
Using another way instead of DIH
Hi all, Using DIH to build the index is slow: fetching 2 million rows takes 20 minutes. I am not very familiar with Solr, so I am trying to use Lucene directly to build the index files from the DB and then move them to the Solr folder. I am not sure that is the right way. Is there any other good way? Thanks a lot. -- View this message in context: http://lucene.472066.n3.nabble.com/Using-another-way-instead-of-DIH-tp4058937.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using Solr For a Real Search Engine
Hi, No, start.jar is not deployed. That *is* Jetty. This is what the real Embedded Jetty is about: http://wiki.eclipse.org/Jetty/Tutorial/Embedding_Jetty What we have here in Solr is just an *included* Jetty, so it's easier to get started. That's all. :) Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 25, 2013 at 3:30 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis; You are right. start.jar starts up a Jetty, and there is a war file under the example directory that it deploys to itself, is that true? 2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com Suggestion: Don't call this embedded Jetty to avoid confusion with the actual embedded Jetty. Otis Solr ElasticSearch Support http://sematext.com/ On Apr 23, 2013 4:56 PM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks for the answers. I will go with the included Jetty for my SolrCloud. If I run into anything important I will share my experiences with you. 2013/4/23 Shawn Heisey s...@elyograg.org On 4/23/2013 2:25 PM, Furkan KAMACI wrote: Is there any documentation that explains using Jetty as embedded or not? I use Solr deployed at Tomcat, but after your message I will consider Jetty. If we think about other issues, e.g. when I want to update my Solr jars/wars etc. (this is just a foo example), are there any pros and cons between Tomcat and Jetty? The Jetty in the example is only 'embedded' in the sense that you don't have to install it separately. It is not special -- the Jetty components are not changed at all, a subset of them is just included in the Solr download with a tuned configuration file. If you go to www.eclipse.org/jetty and download the latest stable-8 version, you'll see some familiar things - start.jar, an etc directory, a lib directory, and a contexts directory. They have more in them than the example does -- extra functionality Solr doesn't need. If you want to start the downloaded version, you can use 'java -jar start.jar' just like you do with Solr. Thanks, Shawn
Re: what is the maximum XML file size to import?
Hi, Even if you could import giant files, I'd avoid it because it feels like just asking for trouble. Chunk the file into smaller pieces. You can index such smaller pieces in parallel, too, and end up with faster indexing as the result. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 25, 2013 at 12:10 AM, Sharmila Thapa shar...@gmail.com wrote: Yes, I have again tried to post the XML of size 2.02GB; now it throws a different error message http://lucene.472066.n3.nabble.com/file/n4058825/1.png While searching for the cause of this error message, I found that Java's setFixedLengthStreamingMode method throws this error. Reference: http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html. So the file size is limited to what fits in a signed 32-bit value, i.e. 2GB. I have also tried setting an unlimited Java heap size, but that does not work. So is there anything that can be done to support XML files of up to 6GB? If possible I would like to try to use java -Durl to import the XML data. If this does not work, then I will try other alternatives as you have suggested, such as DIH. -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-the-maximum-XML-file-size-to-import-tp4058263p4058825.html Sent from the Solr - User mailing list archive at Nabble.com.
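Otis's chunking suggestion can be sketched with a line-oriented split (a toy illustration with hypothetical file names; a real Solr add-docs file would need an XML-aware splitter so that each chunk keeps a well-formed <add>...</add> wrapper):

```shell
# Build a toy file with one <doc> element per line, then split it into
# 2-line chunks (chunk_aa, chunk_ab, ...) that could be posted separately.
printf '<doc>1</doc>\n<doc>2</doc>\n<doc>3</doc>\n<doc>4</doc>\n' > docs.xml
split -l 2 docs.xml chunk_
wc -l < chunk_aa
```

Each chunk can then be posted to Solr independently, which also allows posting several chunks in parallel.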
Re: Using Solr For a Real Search Engine
Thanks, Otis I got it. 2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com Hi, No, start.jar is not deployed. That *is* Jetty. This is what the real Embedded Jetty is about: http://wiki.eclipse.org/Jetty/Tutorial/Embedding_Jetty What we have here is Solr is just an *included* Jetty, so it's easier to get started. That's all. :) Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 25, 2013 at 3:30 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis; You are right. start.jar starts up an Jetty and there is a war file under example directory and deploys start.jar to itself, is that true? 2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com Suggestion : Don't call this embedded Jetty to avoid confusion with the actual embedded jetty. Otis Solr ElasticSearch Support http://sematext.com/ On Apr 23, 2013 4:56 PM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks for the answers. I will go with embedded Jetty for my SolrCloud. If I face with something important I would want to share my experiences with you. 2013/4/23 Shawn Heisey s...@elyograg.org On 4/23/2013 2:25 PM, Furkan KAMACI wrote: Is there any documentation that explains using Jetty as embedded or not? I use Solr deployed at Tomcat but after you message I will consider about Jetty. If we think about other issues i.e. when I want to update my Solr jars/wars etc.(this is just an foo example) does any pros and cons Tomcat or Jetty has? The Jetty in the example is only 'embedded' in the sense that you don't have to install it separately. It is not special -- the Jetty components are not changed at all, a subset of them is just included in the Solr download with a tuned configuration file. If you go to www.eclipse.org/jetty and download the latest stable-8 version, you'll see some familiar things - start.jar, an etc directory, a lib directory, and a contexts directory. They have more in them than the example does -- extra functionality Solr doesn't need. 
If you want to start the downloaded version, you can use 'java -jar start.jar' just like you do with Solr. Thanks, Shawn
Re: Exact matching in Solr 3.6.1
Well then just do an exact match ONLY! It sounds like you haven't worked out the inconsistencies in your requirements. To be clear: We're not offering you solutions - that's your job. We're only pointing out tools that you can use. It is up to you to utilize the tools wisely to implement your solution. I suspect that you simply haven't experimented enough with various boosts to assure that the unstemmed result is consistently higher. Maybe you need a custom stemmer or stemmer override so that passengers does get stemmed to passenger, but cats does not (but dogs does.) That can be a choice that you can make, but I would urge caution. Still, it is a decision that you can make - it's not a matter of Solr forcing or preventing you. I still think boosting of an unstemmed field should be sufficient. But until you clarify the inconsistencies in your requirements, we won't be able to make much progress. -- Jack Krupansky -Original Message- From: vsl Sent: Thursday, April 25, 2013 10:45 AM To: solr-user@lucene.apache.org Subject: Re: Exact matching in Solr 3.6.1 Thanks for your reply, but this solution does not fulfill my requirement because other documents (not exact matches) will be returned as well. -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058929.html Sent from the Solr - User mailing list archive at Nabble.com.
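Jack's unstemmed-field boost might be sketched in schema.xml like this (the field and type names here are hypothetical; text_exact stands for a field type whose analyzer tokenizes and lowercases but does not stem):

```xml
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="body_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="body" dest="body_exact"/>
```

Querying with dismax/edismax and something like qf=body body_exact^10 would then rank documents whose unstemmed tokens match the query above the merely stemmed matches.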
How to Clean Zookeeper Data for Solr
I have a ZooKeeper ensemble with three machines. I started a cluster with one shard, but then decided to change my shard count. I want to clean the ZooKeeper data, but whatever I do I always get one shard, and the rest of the added Solr nodes come up as replicas. What should I do?
Re: Exact matching in Solr 3.6.1
Agree with Jack. The current field type text_general is designed to match the query tokens instead of exact matches - so it's not able to fulfill your requirements. Can you use a flat file (http://wiki.apache.org/solr/FileBasedSpellChecker) as the spell check dictionary instead? That way you can search on an exact-matched field while generating spell check suggestions from the file instead of from the index. -S On 25 April 2013 16:25, Jack Krupansky j...@basetechnology.com wrote: Well then just do an exact match ONLY! It sounds like you haven't worked out the inconsistencies in your requirements. To be clear: We're not offering you solutions - that's your job. We're only pointing out tools that you can use. It is up to you to utilize the tools wisely to implement your solution. I suspect that you simply haven't experimented enough with various boosts to assure that the unstemmed result is consistently higher. Maybe you need a custom stemmer or stemmer override so that passengers does get stemmed to passenger, but cats does not (but dogs does.) That can be a choice that you can make, but I would urge caution. Still, it is a decision that you can make - it's not a matter of Solr forcing or preventing you. I still think boosting of an unstemmed field should be sufficient. But until you clarify the inconsistencies in your requirements, we won't be able to make much progress. -- Jack Krupansky -Original Message- From: vsl Sent: Thursday, April 25, 2013 10:45 AM To: solr-user@lucene.apache.org Subject: Re: Exact matching in Solr 3.6.1 Thanks for your reply, but this solution does not fulfill my requirement because other documents (not exact matches) will be returned as well. -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058929.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Clean Zookeeper Data for Solr
This is what I have done. 1. Turn off all your Solr nodes. 2. Ssh to one of your zookeeper machines and run Zookeeper's CLI. On my machine, it's in /usr/lib/zookeeper/bin. 3. If you've chrooted Solr, just rmr /solr_chroot_dir. Otherwise, use rmr to delete these files and folders: clusterstate.json aliases.json live_nodes overseer overseer_elect collections If you use a chroot jail, make it again with create /solr_chroot_dir [] 4. Use Solr's zkCli to upload your configs again. 5. Start all your Solr nodes. 6. Create your collections again. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Thu, Apr 25, 2013 at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote: I have a Zookeepeer ensemble with three machines. I have started a cluster with one shard. However I decided to change my shard number. I want to clean Zookeeper data but whatever I do I always get one shard and rest of added Solr nodes are as replica. What should I do?
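Steps 2-3 above could look like the following zkCli session (a sketch: the zkCli.sh path and ZooKeeper address are assumptions for your install, and no chroot is assumed). Writing the commands to a file first makes them easy to review before running them against the ensemble:

```shell
# The znodes listed above, as zkCli 'rmr' commands.
cat > zk-clean-commands.txt <<'EOF'
rmr /clusterstate.json
rmr /aliases.json
rmr /live_nodes
rmr /overseer
rmr /overseer_elect
rmr /collections
EOF
# With all Solr nodes stopped, feed them to the ZooKeeper CLI, e.g.:
#   /usr/lib/zookeeper/bin/zkCli.sh -server localhost:2181 < zk-clean-commands.txt
wc -l < zk-clean-commands.txt
```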
RE: Using another way instead of DIH
If you post your data-config.xml here, someone might be able to find something you could change to speed things up. If the issue is parallelization, then you could possibly partition your data somehow and then run multiple DIH request handlers at the same time. This might be easier than writing your own update program. If you still think you need to write something custom, see this: http://wiki.apache.org/solr/Solrj#Adding_Data_to_Solr James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: xiaoqi [mailto:belivexia...@gmail.com] Sent: Thursday, April 25, 2013 10:01 AM To: solr-user@lucene.apache.org Subject: Using another way instead of DIH hi,all i using DIH to build index is slow , when it fetch 2 million rows , it will spend 20 minutes , very slow. i am not very familar with solr , try to using lucene direct building index file from db then move to solr folder. i am not sure ,that is right way. or any other good way? thanks a lot . -- View this message in context: http://lucene.472066.n3.nabble.com/Using-another-way-instead-of-DIH-tp4058937.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deletes and inserts
Thanks Michael, How do you handle configurations in zookeeper? I tried reusing the same configuration but I'm getting an error message that may mean that doesn't work. Or maybe I'm doing something wrong. On Wed, Apr 24, 2013 at 12:50 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: We're using aliases to control visibility of collections we rebuild from scratch nightly. It works pretty well. If you run CREATEALIAS again, it'll switch to a new one, not augment the old one. If for some reason, you want to bridge more than one collection, you can add more than one collection to the alias at creation time, but then it becomes read-only. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Wed, Apr 24, 2013 at 12:26 PM, Jon Strayer j...@strayer.org wrote: We are using a Solr collection to serve auto complete suggestions. We'd like for the update to be without any noticeable delay for the users. I've been looking at adding new cores, loading them with the new data and then swapping them with the current ones, but I don't see how that would work in a cloud installation. It seems that when I create a new core it is part of the collection and the old data will start replicating to it. Is that correct? I've also looked at standing up a new collection and then adding an alias for it, but that's not well documented. If the alias already exists and I add it to another collection is it removed from the first collection? I'm open to any suggestions. -- To *know* is one thing, and to know for certain *that* we know is another. --William James -- To *know* is one thing, and to know for certain *that* we know is another. --William James
Re: How to Clean Zookeeper Data for Solr
Nice. Sounds like FAQ/Wiki material, Mike! :) Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 25, 2013 at 11:33 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: This is what I have done. 1. Turn off all your Solr nodes. 2. Ssh to one of your zookeeper machines and run Zookeeper's CLI. On my machine, it's in /usr/lib/zookeeper/bin. 3. If you've chrooted Solr, just rmr /solr_chroot_dir. Otherwise, use rmr to delete these files and folders: clusterstate.json aliases.json live_nodes overseer overseer_elect collections If you use a chroot jail, make it again with create /solr_chroot_dir [] 4. Use Solr's zkCli to upload your configs again. 5. Start all your Solr nodes. 6. Create your collections again. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Thu, Apr 25, 2013 at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote: I have a Zookeepeer ensemble with three machines. I have started a cluster with one shard. However I decided to change my shard number. I want to clean Zookeeper data but whatever I do I always get one shard and rest of added Solr nodes are as replica. What should I do?
Re: How to Clean Zookeeper Data for Solr
You said: Otherwise, use rmr to delete these files and folders. Can you give an example? 2013/4/25 Otis Gospodnetic otis.gospodne...@gmail.com Nice. Sounds like FAQ/Wiki material, Mike! :) Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 25, 2013 at 11:33 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: This is what I have done. 1. Turn off all your Solr nodes. 2. Ssh to one of your zookeeper machines and run Zookeeper's CLI. On my machine, it's in /usr/lib/zookeeper/bin. 3. If you've chrooted Solr, just rmr /solr_chroot_dir. Otherwise, use rmr to delete these files and folders: clusterstate.json aliases.json live_nodes overseer overseer_elect collections If you use a chroot jail, make it again with create /solr_chroot_dir [] 4. Use Solr's zkCli to upload your configs again. 5. Start all your Solr nodes. 6. Create your collections again. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Thu, Apr 25, 2013 at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote: I have a Zookeepeer ensemble with three machines. I have started a cluster with one shard. However I decided to change my shard number. I want to clean Zookeeper data but whatever I do I always get one shard and rest of added Solr nodes are as replica. What should I do?
Re: How to Clean Zookeeper Data for Solr
What are you doing to clean zk? You should be able to simply use the ZkCli clear cmd: http://wiki.apache.org/solr/SolrCloud#Command_Line_Util Just make sure you stop your Solr instances before clearing it. Clearing out zk from under a running Solr instance is not a good thing to do. This should be as simple as: stop your Solr instances, use the clear command on / or /solr (whatever the root is in zk for your Solr stuff), start your Solr instances, create the collection again. - Mark On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote: I have a Zookeeper ensemble with three machines. I have started a cluster with one shard. However I decided to change my shard number. I want to clean Zookeeper data but whatever I do I always get one shard and the rest of the added Solr nodes come up as replicas. What should I do?
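Mark's stop → clear → start sequence boils down to one ZkCLI invocation run between stopping and restarting the Solr nodes. A sketch of building that command (the classpath layout, zkhost string, and /solr chroot below are illustrative assumptions, not taken from the thread):

```python
# Build the Solr ZkCLI command that clears everything under `root`.
# Only run the resulting command while ALL Solr instances are stopped --
# clearing zk out from under a running cluster is exactly what Mark warns against.

def zkcli_clear_cmd(zkhost, root="/solr"):
    """Return the java command line for ZkCLI's clear command as a list."""
    return [
        "java",
        "-classpath", "example/solr-webapp/webapp/WEB-INF/lib/*",  # assumed layout
        "org.apache.solr.cloud.ZkCLI",
        "-zkhost", zkhost,
        "-cmd", "clear",
        root,
    ]

cmd = zkcli_clear_cmd("zk1:2181,zk2:2181,zk3:2181")
```

After clearing, re-upload the configs with ZkCLI's upconfig, start the Solr nodes, and recreate the collections, as in Michael's step list.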
Re: How to Clean Zookeeper Data for Solr
Of course deleting the collection and then recreating it should also work - if it doesn't, there is a bug to address. - Mark On Apr 25, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote: What are you doing to clean zk? You should be able to simply use the ZkCli clear cmd: http://wiki.apache.org/solr/SolrCloud#Command_Line_Util Just make sure you stop your Solr instances before clearing it. Clearing out zk from under a running Solr instance is not a good thing to do. This should be as simple as, stop your Solr instances, use the clean command on / or /solr (whatever the root is in zk for you Solr stuff), start your Solr instances, create the collection again. - Mark On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote: I have a Zookeepeer ensemble with three machines. I have started a cluster with one shard. However I decided to change my shard number. I want to clean Zookeeper data but whatever I do I always get one shard and rest of added Solr nodes are as replica. What should I do?
Re: How to Clean Zookeeper Data for Solr
Hi; It would be great if you could help: I have erased the data and used these commands. First I run: java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr -Dsolr.data.dir=/home/solr-4.2.1/solr/data -Dnumshards=2 -Dbootstrap_confdir=/home/solr-4.2.1/solr/collection1/conf -Dcollection.configName=myconf -jar start.jar and then: java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr -Dsolr.data.dir=/home/solr-4.2.1/solr/data -jar start.jar However, when I look at the graph in the Admin GUI there is only one shard with two replicas. Why is it not two shards? 2013/4/25 Mark Miller markrmil...@gmail.com Of course deleting the collection and then recreating it should also work - if it doesn't, there is a bug to address. - Mark
Re: How to Clean Zookeeper Data for Solr
Today I learned there's a clear command in the command line util. :) Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Thu, Apr 25, 2013 at 12:00 PM, Mark Miller markrmil...@gmail.com wrote: What are you doing to clean zk? You should be able to simply use the ZkCli clear cmd: http://wiki.apache.org/solr/SolrCloud#Command_Line_Util Just make sure you stop your Solr instances before clearing it. Clearing out zk from under a running Solr instance is not a good thing to do. This should be as simple as, stop your Solr instances, use the clean command on / or /solr (whatever the root is in zk for you Solr stuff), start your Solr instances, create the collection again. - Mark On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote: I have a Zookeepeer ensemble with three machines. I have started a cluster with one shard. However I decided to change my shard number. I want to clean Zookeeper data but whatever I do I always get one shard and rest of added Solr nodes are as replica. What should I do?
Re: How to Clean Zookeeper Data for Solr
Ooppss, I wrote numshards, I think it should be numShards 2013/4/25 Michael Della Bitta michael.della.bi...@appinions.com Today I learned there's a clear command in the command line util. :) Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Thu, Apr 25, 2013 at 12:00 PM, Mark Miller markrmil...@gmail.com wrote: What are you doing to clean zk? You should be able to simply use the ZkCli clear cmd: http://wiki.apache.org/solr/SolrCloud#Command_Line_Util Just make sure you stop your Solr instances before clearing it. Clearing out zk from under a running Solr instance is not a good thing to do. This should be as simple as, stop your Solr instances, use the clean command on / or /solr (whatever the root is in zk for you Solr stuff), start your Solr instances, create the collection again. - Mark On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote: I have a Zookeepeer ensemble with three machines. I have started a cluster with one shard. However I decided to change my shard number. I want to clean Zookeeper data but whatever I do I always get one shard and rest of added Solr nodes are as replica. What should I do?
Re: How to Clean Zookeeper Data for Solr
I think it's numShards, not numshards. - Mark On Apr 25, 2013, at 12:07 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; If you can help it would be nice: I have erased the data. I use that commands: Firstly I do that: java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr -Dsolr.data.dir=/home/solr-4.2.1/solr/data -Dnumshards=2 -Dbootstrap_confdir=/home/solr-4.2.1/solr/collection1/conf -Dcollection.configName=myconf -jar start.jar and do that: java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr -Dsolr.data.dir=/home/solr-4.2.1/solr/data -jar start.jar However when I look at the graph at Admin GUI there is only one shard but two replicas? What is the problem why it is not two shards? 2013/4/25 Mark Miller markrmil...@gmail.com Of course deleting the collection and then recreating it should also work - if it doesn't, there is a bug to address. - Mark On Apr 25, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote: What are you doing to clean zk? You should be able to simply use the ZkCli clear cmd: http://wiki.apache.org/solr/SolrCloud#Command_Line_Util Just make sure you stop your Solr instances before clearing it. Clearing out zk from under a running Solr instance is not a good thing to do. This should be as simple as, stop your Solr instances, use the clean command on / or /solr (whatever the root is in zk for you Solr stuff), start your Solr instances, create the collection again. - Mark On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote: I have a Zookeepeer ensemble with three machines. I have started a cluster with one shard. However I decided to change my shard number. I want to clean Zookeeper data but whatever I do I always get one shard and rest of added Solr nodes are as replica. What should I do?
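The system property is case-sensitive: -DnumShards=2 is honored, while -Dnumshards=2 is silently ignored and Solr falls back to a single shard, which matches the one-shard/two-replicas graph described above. A tiny illustrative check (not part of Solr) makes the distinction explicit:

```python
# Illustrate why "-Dnumshards=2" produced one shard: the property lookup
# is case-sensitive, so only the exact name "numShards" counts.

def parse_num_shards(jvm_args):
    """Return the numShards value if the flag is present with correct casing."""
    for arg in jvm_args:
        if arg.startswith("-DnumShards="):
            return int(arg.split("=", 1)[1])
    return None  # Solr would default to a single shard

good = ["-Xms512M", "-DnumShards=2", "-jar", "start.jar"]
bad = ["-Xms512M", "-Dnumshards=2", "-jar", "start.jar"]
```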
Re: How do set compression for compression on stored fields in SOLR 4.2.1
Hi, Is the question how/where to set that? This is what I found in my repo checkout: $ ffxg COMPRE ./core/src/test-files/solr/collection1/conf/solrconfig-slave.xml: str name=compressionCOMPRESSION/str Hm, but that's about replication compression. Maybe we don't have any examples of this in configs? Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Apr 24, 2013 at 3:06 PM, William Bell billnb...@gmail.com wrote: https://issues.apache.org/jira/browse/LUCENE-4226 It mentions that we can set compression mode: FAST, HIGH_COMPRESSION, FAST_DECOMPRESSION. -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: How to Clean Zookeeper Data for Solr
Ok, it works 2013/4/25 Mark Miller markrmil...@gmail.com I think it's numShards, not numshards. - Mark On Apr 25, 2013, at 12:07 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; If you can help it would be nice: I have erased the data. I use that commands: Firstly I do that: java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr -Dsolr.data.dir=/home/solr-4.2.1/solr/data -Dnumshards=2 -Dbootstrap_confdir=/home/solr-4.2.1/solr/collection1/conf -Dcollection.configName=myconf -jar start.jar and do that: java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr -Dsolr.data.dir=/home/solr-4.2.1/solr/data -jar start.jar However when I look at the graph at Admin GUI there is only one shard but two replicas? What is the problem why it is not two shards? 2013/4/25 Mark Miller markrmil...@gmail.com Of course deleting the collection and then recreating it should also work - if it doesn't, there is a bug to address. - Mark On Apr 25, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote: What are you doing to clean zk? You should be able to simply use the ZkCli clear cmd: http://wiki.apache.org/solr/SolrCloud#Command_Line_Util Just make sure you stop your Solr instances before clearing it. Clearing out zk from under a running Solr instance is not a good thing to do. This should be as simple as, stop your Solr instances, use the clean command on / or /solr (whatever the root is in zk for you Solr stuff), start your Solr instances, create the collection again. - Mark On Apr 25, 2013, at 11:27 AM, Furkan KAMACI furkankam...@gmail.com wrote: I have a Zookeepeer ensemble with three machines. I have started a cluster with one shard. However I decided to change my shard number. I want to clean Zookeeper data but whatever I do I always get one shard and rest of added Solr nodes are as replica. What should I do?
Re: Deletes and inserts
We've successfully reused the same config in Zookeeper across multiple collections and using aliases. Could you describe your problem? What does the error say? Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Thu, Apr 25, 2013 at 11:44 AM, Jon Strayer j...@strayer.org wrote: Thanks Michael, How do you handle configurations in zookeeper? I tried reusing the same configuration but I'm getting an error message that may mean that doesn't work. Or maybe I'm doing something wrong. On Wed, Apr 24, 2013 at 12:50 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: We're using aliases to control visibility of collections we rebuild from scratch nightly. It works pretty well. If you run CREATEALIAS again, it'll switch to a new one, not augment the old one. If for some reason you want to bridge more than one collection, you can add more than one collection to the alias at creation time, but then it becomes read-only. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Wed, Apr 24, 2013 at 12:26 PM, Jon Strayer j...@strayer.org wrote: We are using a Solr collection to serve auto complete suggestions. We'd like for the update to be without any noticeable delay for the users. I've been looking at adding new cores, loading them with the new data and then swapping them with the current ones, but I don't see how that would work in a cloud installation. It seems that when I create a new core it is part of the collection and the old data will start replicating to it. Is that correct? I've also looked at standing up a new collection and then adding an alias for it, but that's not well documented. If the alias already exists and I add it to another collection, is it removed from the first collection? I'm open to any suggestions.
-- To *know* is one thing, and to know for certain *that* we know is another. --William James
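Michael's nightly-rebuild workflow hinges on CREATEALIAS repointing an existing alias rather than augmenting it. A sketch of building the Collections API call (host, alias, and collection names here are invented for illustration):

```python
# Build a Collections API CREATEALIAS URL. Re-running this with a new
# collection name repoints the alias; passing more than one collection
# creates a read-only "bridging" alias, as described above.
from urllib.parse import urlencode

def createalias_url(host, alias, *collections):
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"http://{host}/solr/admin/collections?{urlencode(params)}"

# Nightly rebuild: point the live alias at today's freshly built collection.
url = createalias_url("solr01:8983", "suggest", "suggest_20130425")
```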
Re: how to get display Jessionid with solr results
You should look into the documentation of your load balancer to see how you can enable sticky sessions. If you've already done that and the load balancer requires jsessionid rather than using its own sticky session method, it looks like documentation for using jsessionid with Jetty is here: http://wiki.eclipse.org/Jetty/Howto/SessionIds Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Wed, Apr 24, 2013 at 6:36 PM, gpssolr2020 psgoms...@gmail.com wrote: Hi, We are using jetty as a container for solr 3.6. We have two slave servers to serve queries for user requests, and queries are distributed to either slave through a load balancer. When one user sends a first search request, say it goes to slave1; when that user queries again we want to send the query to the same server with the help of the Jsessionid. How do we achieve this? How do we get that Jsessionid with the solr search results? Please provide your suggestions. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-get-display-Jessionid-with-solr-results-tp4058751.html Sent from the Solr - User mailing list archive at Nabble.com.
Need to log query request before it is processed
I would like to log query requests before they are processed. Currently, it seems they are only logged after being processed. I've tried enabling a finer logging level but that didn't seem to help. I've enabled request logging in Jetty, but most queries come in as POSTs from SolrJ. I was thinking of adding a query request logger as a first-component, but wanted to see what others have done for this? Thanks. Tim
Re: Problem with solr deployment on weblogic 10.3
On 4/25/2013 12:04 AM, Shawn Heisey wrote: It looks like the solution is adding some config to the weblogic.xml file in the solr.war so that weblogic prefers application classes. I filed SOLR-4762. I do not know if this change might have unintended consequences. http://ananthkannan.blogspot.com/2009/08/beware-of-stringutilscontainsignorecase.html https://issues.apache.org/jira/browse/SOLR-4762 Radhakrishna: Do you know how to extract solr.war, change the WEB-INF/weblogic.xml file, and repack it? I have created a patch for the Solr source code, but I don't have weblogic, so I can't test it to make sure it works. I am running tests to make sure that the change doesn't break anything else. Alternatively, you could download the source code, apply the patch I uploaded to SOLR-4762, build Solr, and try the changed version. You never said what version of Solr you are using. The important part of the patch should apply correctly to the source of most versions. The CHANGES.txt part of the patch will fail on anything older than the 4.x dev branch (4.4), but that's not an important part of the patch. Thanks, Shawn
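For anyone following along, the weblogic.xml knob in question is generally WebLogic's prefer-web-inf-classes switch, which makes the container load the war's own jars (including commons-lang 2.6) ahead of WebLogic's older copies. A sketch of what such a change could look like — untested against WebLogic, as noted above:

```xml
<!-- WEB-INF/weblogic.xml inside solr.war: a sketch, untested on WebLogic -->
<weblogic-web-app xmlns="http://xmlns.oracle.com/weblogic/weblogic-web-app">
  <container-descriptor>
    <!-- Prefer the commons-lang 2.6 jar bundled in the war over the older
         copy WebLogic ships, whose StringUtils lacks replaceEach -->
    <prefer-web-inf-classes>true</prefer-web-inf-classes>
  </container-descriptor>
</weblogic-web-app>
```

See SOLR-4762 for the actual patch and any caveats about unintended consequences.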
Re: What is the difference between a Join Query and Embedded Entities in Solr DIH?
On 4/25/2013 8:17 AM, Gustav wrote: Are these two methods functionally different? Is there a performance difference? Another thought would be that, if using join tables in MySQL, using the SQL query method with multiple joins could cause multiple documents to be indexed instead of one. They may be equivalent in terms of results, but they work differently and probably will NOT have the same performance. When using nested entities in DIH, the main entity results in one SQL query, but the inner entities will result in a separate SQL query for every single item returned by the main query. If you have exactly 1 million rows in your main table and you're using a nested config with two entities, you will be executing 1,000,001 queries. DIH will be spending a fair amount of time doing nothing but waiting for the latency on a million individual queries via JDBC. It probably also results in extra work for the database server. With a server-side join, you're down to one query via JDBC, and the database server is doing the work of combining your tables, normally something it can do very efficiently. Thanks, Shawn
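To make the comparison concrete, here is a sketch of the two DIH data-config styles being discussed (the table and column names are invented; only the shape matters):

```xml
<!-- Nested entities: one outer query, plus one inner query PER outer row -->
<entity name="item" query="SELECT id, name FROM item">
  <entity name="tag"
          query="SELECT tag FROM tag WHERE item_id = '${item.id}'"/>
</entity>

<!-- Server-side join: a single query; the database combines the tables.
     Each (item, tag) pair comes back as its own row, so duplicate
     uniqueKey values need handling (e.g. aggregate the tags in SQL). -->
<entity name="item"
        query="SELECT i.id, i.name, t.tag
               FROM item i LEFT JOIN tag t ON t.item_id = i.id"/>
```

The aggregation caveat in the second form is Gustav's "multiple documents" concern: without it, later rows for the same id simply replace earlier ones.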
Re: Question on storage and index/data management in solr
On 4/25/2013 8:39 AM, Vinay Rai wrote: 1. Keep each of last 24 hours segments separate. 2. Segments generated between last 48 to 24 hours to be merged into one. Similarly, for segments created between 72 to 48 hours and so on for last 1 week. 3. Similarly, merge previous 4 week's data into one segment each week. 4. Merge all previous months data into one segment each month. I am not sure if there is a configuration possible in solr application. If not, are there APIs which will allow me to do this? To accomplish this exact scenario, you would probably have to write a custom merge policy class for Lucene. If you do so, I hope you'll strongly consider donating it to the Lucene/Solr project. Another approach: Use distributed search and put the divisions you are looking at into separate indexes (shards) in their own cores. You can then manually do whatever index merging your situation requires. Constructing the shards parameter for your queries will take some work. Here's a blog post about this method and a video of the Lucene Revolution talk mentioned in the blog post: http://www.loggly.com/blog/2010/08/our-solr-system/ http://loggly.com/videos/lucene-revolution-2010/ I had the honor of being there for that talk in Boston. They've done some amazing things with Solr. Also, I want to understand how solr stores data or does it have a dependency on the way data is stored. Since the volumes are high, it would be great if the data is compressed and stored (while still searchable). If it is possible, I would like to know what kind of compression does solr do? Solr 4.1 uses compression for stored fields. Solr 4.2 also uses compression for term vectors. From a performance perspective, compression is probably not viable at this time for the indexed data, but if that changes in the future, I'm sure that it will be added. 
Here is documentation on the file format used by Solr 4.2: http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description Thanks, Shawn
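Constructing the shards parameter Shawn mentions is mostly string assembly; a sketch for daily time-partitioned cores (the host and the logs_YYYYMMDD core-naming scheme are illustrative assumptions):

```python
# Assemble a "shards" parameter spanning the last N daily cores, in the
# spirit of the time-partitioned index layout described above.
from datetime import date, timedelta

def shards_param(host, days, today):
    cores = [
        f"{host}/solr/logs_{today - timedelta(days=d):%Y%m%d}"
        for d in range(days)
    ]
    return ",".join(cores)

param = shards_param("idx1:8983", 3, today=date(2013, 4, 25))
# One query can then search all three days: /select?q=...&shards=<param>
```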
Re: Solr metrics in Codahale metrics and Graphite?
On 4/25/2013 6:30 AM, Dmitry Kan wrote: We are very much interested in 3.4. On Thu, Apr 25, 2013 at 12:55 PM, Alan Woodward a...@flax.co.uk wrote: This is on top of trunk at the moment, but would be back ported to 4.4 if there was interest. This will be bad news, I'm sorry: All remaining work on 3.x versions happens in the 3.6 branch. This branch is in maintenance mode. It will only get fixes for serious bugs with no workaround. Improvements and new features won't be considered at all. You're welcome to try backporting patches from newer issues. Due to the major differences in the 3x and 4x codebases, the best case scenario is that you'll be facing a very manual task. Some changes can't be backported because they rely on other features only found in 4.x code. Thanks, Shawn
Atomic update issue with 4.0 and 4.2.1
Hi everyone, We have hit this strange bug using the atomic update functionality of both SOLR 4.0 and SOLR 4.2.1. We're currently posting a JSON formatted file to the core's updater using a simple curl method, however we've run into a very bizarre error where periodically it will fail and return a 400 error message. If we were to send the exact same request and file 5 minutes later, sometimes it will be accepted and return a 200, and other times it will continue to throw 400's. This tends to happen when the SOLR is receiving a lot of updates, and restarting tomcat seems to clear up the issue; however, I feel that there is probably something important that I am missing. The error message that it throws is quite strange and I don't really feel that it means very much, because we can fire the exact same message 5 minutes later and it will happily fill that field. I am positive that I am only sending the value 965.00 in this case. 2013-04-25 00:20:39,373 [ERROR] org.apache.solr.core.SolrCore org.apache.solr.common.SolrException: ERROR: [doc=1764656] Error adding field 'maxPrice'='java.math.BigDecimal:965.' msg=For input string: java.math.BigDecimal:965.
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:300)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:199)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:387)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:112)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:96)
at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:60)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NumberFormatException: For input string: java.math.BigDecimal:965.
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
at java.lang.Float.parseFloat(Float.java:452)
at org.apache.solr.schema.TrieField.createField(TrieField.java:598)
at org.apache.solr.schema.TrieField.createFields(TrieField.java:655)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:180)
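One plausible reading of the trace (an assumption, not a confirmed diagnosis): somewhere a java.math.BigDecimal was serialized as its toString() form, so TrieField received the literal string "java.math.BigDecimal:965.00" instead of a number, and Float.parseFloat failed. For comparison, this is what a well-formed JSON atomic update for that field looks like, with the price as a plain JSON number:

```python
# A well-formed atomic update sends maxPrice as a plain JSON number;
# if a BigDecimal object's string form leaks into the payload instead,
# Float.parseFloat fails exactly as in the stack trace above.
import json

update = [{"id": "1764656", "maxPrice": {"set": 965.00}}]
body = json.dumps(update)
# POST `body` to /solr/<core>/update with Content-Type: application/json
```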
Cloudspace and Solr Support Page
Hi there, We offer Solr support and were wondering how we would go about being added to the Solr Support page http://wiki.apache.org/solr/Support? Thanks so much for your time! -- Nina Talley, Account Manager, Cloudspace (http://www.cloudspace.com/), Office: 877.823.8808, 11551 University Blvd, Suite 2, Orlando, FL 32817
Massive Positions Files
Hi All, I'm indexing a pretty large collection of documents (about 500K relatively long documents taking up 1TB space, mostly in MS Office formats), and am confused about the file sizes in the index. I've gotten through about 180K documents, and the *.pos files add up to 325GB, while all of the rest combined are using less than 5GB--including some large stored fields and term vectors. It makes sense to me that the compression on stored fields helps to keep that part down on large text fields, and that term vectors wouldn't be too big since they don't need position information, but the magnitude of the difference is alarming. Is that to be expected? Is there any way to reduce the size of the positions index if phrase searching is a requirement? I am using Solr 4.2.1. These documents have a number of small metadata elements, along with the big content field. Like the default schema, I'm storing but not indexing the content field, and a lot of the fields get put into a catchall that is indexed and uses term vectors, but is not stored. Thanks, Mike
Re: Cloudspace and Solr Support Page
Hi, Just give your WIKI user name and we'll give you access to edit that page to add yourself. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 25. apr. 2013 kl. 21:39 skrev Nina Talley n...@cloudspace.com: Hi there, We offer Solr support and were wondering how we would go about being added to the Solr Support page http://wiki.apache.org/solr/Support? Thanks so much for your time! -- [image: Cloudspace.com] http://www.cloudspace.com/Nina TalleyAccount ManagerOffice: 877.823.8808 11551 University Blvd, Suite 2 Orlando, FL 32817
Re: Reordered DBQ.
OK. Thanks for explanation. On 23 April 2013 23:16, Yonik Seeley yo...@lucidworks.com wrote: On Tue, Apr 23, 2013 at 3:51 PM, Marcin Rzewucki mrzewu...@gmail.com wrote: Recently I noticed a lot of Reordered DBQs detected messages in logs. As far as I checked in logs it could be related with deleting documents, but not sure. Do you know what is the reason of those messages ? For high throughput indexing, we version updates on the leader and forward onto other replicas w/o strict serialization. If on a leader, an add happened before a DBQ, then on a replica the DBQ is serviced before the add, Solr detects this reordering and fixes it. It's not an error or an indication that anything is wrong (hence the INFO level log message). -Yonik http://lucidworks.com
How To Make Index Backup at SolrCloud?
I use SolrCloud. Let's assume that I want to move all indexes from one place to another. There may be two reasons for that: First: I will close down my whole system and, some time later, use new machines with the previous indexes somewhere else (if it is a must, they may have the same network topology). Second: I know that SolrCloud handles failures, but I want to back up my indexes for a disaster event. How can I back up my indexes? I know that I can start up new nodes and close the old ones, so I can move my indexes to other machines. However, how can I do such a backup (should I just copy the data folder of the Solr nodes and put them on new Solr nodes after I change the Zookeeper configuration)? What do folks do?
Re: filter before facet
On Thu, Apr 25, 2013 at 12:35 AM, Toke Eskildsen t...@statsbiblioteket.dkwrote: This leads me to believe that the FQ is being applied AFTER the facets are calculated on the whole data set. For my use case it would make a ton of sense to apply the FQ first and then facet. Is it possible to specify this behavior or do I need to get into the code and get my hands dirty? As for creating a new faceting implementation that avoids the startup penalty by using only the found documents, then it is technically quite simple: Use stored fields, iterate the hits and request the values. Unfortunately this scales poorly with the number of hits, so unless you can guarantee that you will always have small result sets, this is probably not a viable option. Thank you Toke for your detailed reply. I have perhaps an unusual use case where we may have hundreds of thousands of users each with a few thousand documents. On some queries I can guarantee the result size will be small compared to the entire corpus since I'm filtering on one user's documents. I may give this alternative faceting implementation a try. Best regards, Daniel
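Toke's "iterate the hits and request the values" approach amounts to counting stored field values client-side over the already-filtered result set. A minimal sketch, assuming the hits come back as dicts (the document shape is an assumption for illustration):

```python
# Client-side faceting over a small, pre-filtered result set: fetch the
# stored field values for the hits and count them. Only viable when the
# result set is guaranteed to be small, as discussed above.
from collections import Counter

def facet_counts(hits, field):
    """Count facet values across hit documents; handles multivalued fields."""
    counts = Counter()
    for doc in hits:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(values)
    return counts

hits = [{"tags": ["a", "b"]}, {"tags": ["a"]}, {"tags": "c"}]
counts = facet_counts(hits, "tags")
```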
Re: Massive Positions Files
These are the postings for all terms - the lists of positions for every occurrence of every term for all documents. Sounds to me like it could be huge. Did you try a back of the envelope calculation? 325 GB divided by 180K docs is roughly 1.8 MB per doc (call it 2 MB). How many words in a document? You say they are long. Even if there were 500,000 to 1 million postings per long document, that would work out to 2 to 4 bytes or so per posting. I have no idea how big an average term posting might be, but these numbers do not seem at all unreasonable. Now, let's see what kind of precise answer the Lucene guys give you! -- Jack Krupansky -Original Message- From: Mike Sent: Thursday, April 25, 2013 4:00 PM To: solr-user@lucene.apache.org Subject: Massive Positions Files
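Spelling out the back-of-the-envelope numbers from Mike's figures (the posting-count range for a "long" document is an assumption):

```python
# Back-of-the-envelope check on the .pos size, using the figures from
# Mike's message: 325 GB of position data across 180K documents.
pos_bytes = 325 * 10**9
docs = 180_000

per_doc = pos_bytes / docs  # ~1.8 MB of position data per document

# Assume a long document contributes 500K-1M token positions:
bytes_per_posting_low = per_doc / 1_000_000   # ~1.8 bytes/posting
bytes_per_posting_high = per_doc / 500_000    # ~3.6 bytes/posting
# A couple of bytes per posting is plausible for a positions index.
```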
Re: Need to log query request before it is processed
Hi Tim, Have you tried enabling the logging levels on httpclient, which is used by the solrj classes internally? Thx, Sudhakar. On Thu, Apr 25, 2013 at 10:12 AM, Timothy Potter thelabd...@gmail.com wrote: I would like to log query requests before they are processed. Currently, it seems they are only logged after being processed. I've tried enabling a finer logging level but that didn't seem to help. I've enabled request logging in Jetty but most queries come in as POSTs from SolrJ I was thinking of adding a query request logger as a first-component but wanted to see what others have done for this? Thanks. Tim
Re: SolrJ Custom RowMapper
Hey Luis, Check this example in the source: TestDocumentObjectBinder https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/solrj/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java Thx, Sudhakar. On Thu, Apr 25, 2013 at 7:56 AM, Luis Lebolo luis.leb...@gmail.com wrote: Hi All, Does SolrJ have an option for a custom RowMapper or BeanPropertyRowMapper (I'm using Spring/JDBC terms)? I know the QueryResponse has a getBeans method, but I would like to create my own mapping and plug it in. Any pointers? Thanks, Luis
Re: How To Make Index Backup at SolrCloud?
You can use the index backup command that's part of index replication, check the Wiki. Otis Solr ElasticSearch Support http://sematext.com/ On Apr 25, 2013 5:23 PM, Furkan KAMACI furkankam...@gmail.com wrote: I use SolrCloud. Let's assume that I want to move all indexes from one place to another. There maybe two reasons for that: First one is that: I will close all my system and I will use new machines with previous indexes (if it is a must they may have same network topology) at anywhere else after some time later. Second one is that: I know that SolrCloud handles failures but I will back up my indexes for a disaster event. How can I back up my indexes? I know that I can start up new nodes and I can close the old ones so I can move my indexes to other machines. However how can I do such kind of backup (should I just copy data folder of Solr nodes and put them to new Solr nodes after I change Zookeeper configuration)? What folks do?
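For a single core, the backup command Otis refers to is triggered through the replication handler with an HTTP request along these lines (host, core name, and location here are examples, not values from the thread):

```shell
# Build the backup URL for one core's replication handler.
core_url="http://localhost:8983/solr/collection1"        # example host/core
backup_url="${core_url}/replication?command=backup&location=/backups/solr"
echo "$backup_url"
# To actually run it against a live node:
#   curl "$backup_url"
```

In SolrCloud you would repeat this against one replica of each shard; the backup is per-core, not per-collection.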
Re: Too many close, count -1
One outside possibility (and 4.3 should refuse to start if this is the case): is it possible that more than one of your cores has the same name? FWIW, Erick On Tue, Apr 23, 2013 at 5:30 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Subject: Re: Too many close, count -1 Thanks for the details, nothing jumps out at me, but we're now tracking this in SOLR-4753... https://issues.apache.org/jira/browse/SOLR-4753 -Hoss
Re: Query specific replica
bq: I was wondering whether it is possible to query the same core every request Not that I know of. You can ping a single node by appending distrib=false, but that won't then look at multiple shards. If you don't have any shards, this would work, I think... Best, Erick On Tue, Apr 23, 2013 at 6:31 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Hello, Since I replicated my shards (I have 2 cores per shard now), I get a remarkable increase in qTime. I assume it happens since my memory has to be split between twice as many cores as before. In my low-qps use-case, I use replicas as shard backups only (in case one of my servers goes down) and not for the ability to serve parallel requests. In this case performance decreases because both cores of the shard are active. I was wondering whether it is possible to query the same core on every request, instead of load balancing between the different replicas? Only if the leader replica went down would the second replica start serving requests. Cheers, Manu
Re: Luke misreporting index-time boosts?
I think you're kinda missing the idea of index-time boosting. The semantics of this (as I remember Chris Hostetter explaining it) is "this document's content is more important than other documents' content". By doing an index-time boost that's the same for all your documents, you're effectively doing nothing to the relative ranks of the results. Not quite sure what Luke is doing here, but using debugQuery=on will give you the actual scores of the actual documents. And if you're doing anything like wildcards or *:* queries, shortcuts are taken that set the scores to 1.0. If none of that helps, I'm out of my depth <G>.. Best, Erick On Wed, Apr 24, 2013 at 6:01 AM, Timothy Hill timothy.d.h...@gmail.com wrote: Hello, all I have recently been attempting to apply index-time boosts to fields using the following syntax:

<add>
  <doc>
    <field name="important_field" boost="5">bleah bleah bleah</field>
    <field name="standard_field" boost="2">content here</field>
    <field name="trivial_field">content here</field>
  </doc>
  <doc>
    <field name="important_field" boost="5">content here</field>
    <field name="standard_field" boost="2">bleah bleah bleah</field>
    <field name="trivial_field">content here</field>
  </doc>
</add>

The intention is that matches on important_field should contribute more to the score than matches on trivial_field (so that a search across all fields for the term 'content' would return the second document above the first), while still being able to use the standard query parser. Looking at output from Luke, however, all fields are reported as having a boost of 1.0. The following possibilities occur to me: (1) The entire index-time-boosting approach is misconceived. (2) Luke is misreporting, because index-time boosting alters more fundamental aspects of scoring (the tf-idf calculations, I suppose), and the index-time boost is thus invisible to it. (3) Some combination of (1) and (2). Can anyone help illuminate the situation for me? Documentation for these questions seems patchy. Thanks, Tim
Re: Facets with OR clause
If you're talking about _filter queries_, Kai's answer is good. But your question is confusing: you talk about facet queries, but then use fq, which is a filter query and has nothing to do with facets at all, unless you're talking about turning facet information into filter queries. FWIW, Erick On Wed, Apr 24, 2013 at 6:43 AM, Kai Becker m...@kai-becker.com wrote: Try fq=(groups:group1 OR locations:location1) On 24.04.2013 at 12:39, vsl wrote: Hi, my request contains the following. There are 3 facets: groups, locations, categories. When I select some items then I see this syntax in my request: fq=groups:group1&fq=locations:location1 Is it possible to add an OR clause between facet items in the query? -- View this message in context: http://lucene.472066.n3.nabble.com/Facets-with-OR-clause-tp4058553.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Using another way instead of DIH
Thanks for the help. data-config.xml? I cannot find this file. Do you mean data-import.xml or solrconfig.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/Using-another-way-instead-of-DIH-tp4058937p4059067.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud 4.2 - Distributed Requests failing with NPE
: trace: java.lang.NullPointerException
:   at org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)
:   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)
yea, definitely a bug. Raintung reported this recently, and made a patch available... https://issues.apache.org/jira/browse/SOLR-4705 -Hoss
Re: How do set compression for compression on stored fields in SOLR 4.2.1
: Subject: How do set compression for compression on stored fields in SOLR 4.2.1 : : https://issues.apache.org/jira/browse/LUCENE-4226 : It mentions that we can set compression mode: : FAST, HIGH_COMPRESSION, FAST_DECOMPRESSION. The compression details are hardcoded into the various codecs. If you wanted to customize this, you'd need to write your own codec subclass... https://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/compressing/class-use/CompressionMode.html See, for example, the implementations of Lucene41StoredFieldsFormat and Lucene42TermVectorsFormat...

public final class Lucene41StoredFieldsFormat extends CompressingStoredFieldsFormat {
  /** Sole constructor. */
  public Lucene41StoredFieldsFormat() {
    super("Lucene41StoredFields", CompressionMode.FAST, 1 << 14);
  }
}

public final class Lucene42TermVectorsFormat extends CompressingTermVectorsFormat {
  /** Sole constructor. */
  public Lucene42TermVectorsFormat() {
    super("Lucene41StoredFields", "", CompressionMode.FAST, 1 << 12);
  }
}

-Hoss
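To actually get such a codec subclass loaded by Solr, you would also point solrconfig.xml at a codec factory. A sketch, where the factory class name is hypothetical (Solr 4.x ships solr.SchemaCodecFactory, which you would subclass or replace):

```xml
<!-- solrconfig.xml: load a custom codec. com.example.MyCodecFactory
     is a hypothetical CodecFactory subclass that returns a codec
     built with your chosen CompressionMode. -->
<codecFactory class="com.example.MyCodecFactory"/>
```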
Re: Question on storage and index/data management in solr
Thank you very much Shawn for a detailed response. Let me read all the documentation you pointed to and digest it. Sure, if I do end up using Solr and need to make this change, I would love to also submit it to the Lucene/Solr project. Regards, Vinay From: Shawn Heisey s...@elyograg.org To: solr-user@lucene.apache.org Sent: Thursday, April 25, 2013 11:32 PM Subject: Re: Question on storage and index/data management in solr On 4/25/2013 8:39 AM, Vinay Rai wrote: 1. Keep each of the last 24 hours' segments separate. 2. Merge the segments generated between 48 and 24 hours ago into one; similarly for segments created between 72 and 48 hours ago, and so on for the last week. 3. Similarly, merge each of the previous 4 weeks' data into one segment per week. 4. Merge all previous months' data into one segment per month. I am not sure if such a configuration is possible in the Solr application. If not, are there APIs which will allow me to do this? To accomplish this exact scenario, you would probably have to write a custom merge policy class for Lucene. If you do so, I hope you'll strongly consider donating it to the Lucene/Solr project. Another approach: Use distributed search and put the divisions you are looking at into separate indexes (shards) in their own cores. You can then manually do whatever index merging your situation requires. Constructing the shards parameter for your queries will take some work. Here's a blog post about this method and a video of the Lucene Revolution talk mentioned in the blog post: http://www.loggly.com/blog/2010/08/our-solr-system/ http://loggly.com/videos/lucene-revolution-2010/ I had the honor of being there for that talk in Boston. They've done some amazing things with Solr. Also, I want to understand how Solr stores data, and whether it has a dependency on the way data is stored. Since the volumes are high, it would be great if the data were compressed and stored (while still searchable). If that is possible, I would like to know what kind of compression Solr does.
Solr 4.1 uses compression for stored fields. Solr 4.2 also uses compression for term vectors. From a performance perspective, compression is probably not viable at this time for the indexed data, but if that changes in the future, I'm sure that it will be added. Here is documentation on the file format used by Solr 4.2: http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description Thanks, Shawn
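For reference on where Shawn's suggested custom merge policy would plug in: in Solr 4.x the merge policy is configured in the indexConfig section of solrconfig.xml. A sketch using the stock TieredMergePolicy; a time-based scheme like the one described above would replace the class attribute with your own implementation:

```xml
<indexConfig>
  <!-- Replace the class with a custom time-bucketed MergePolicy
       implementation to get the hourly/daily/weekly merge scheme. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>
```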
Re: Facets with 5000 facet fields - Out of memory error during the query time
I got more information with the responses.Now, It's time to re look into the number of facets to be configured. Thanks, Siva http://smarttechies.wordpress.com/ -- View this message in context: http://lucene.472066.n3.nabble.com/Facets-with-5000-facet-fields-Out-of-memory-error-during-the-query-time-tp4048450p4059079.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud 4.2 - Distributed Requests failing with NPE
Thank you Hoss for looking into it. -Sudhakar. On Thu, Apr 25, 2013 at 6:50 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: trace: java.lang.NullPointerException
:   at org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)
:   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)
yea, definitely a bug. Raintung reported this recently, and made a patch available... https://issues.apache.org/jira/browse/SOLR-4705 -Hoss