Re: No wildcards with solr.ASCIIFoldingFilterFactory?
Thank you, Mark! Let me see whether I understand your idea correctly. I would have to write a plugin like LuceneQParserPlugin that uses not SolrQueryParser but a MySolrQueryParser, which is based on SolrQueryParser and uses AnalyzingQueryParser methods.

I think this is too difficult for me because I am not a programmer. Maybe I could get the code together, but I have no experience in debugging Java applications. Is there another way to solve this problem without coding? Or could I ask the Solr community to help us write such a plugin? Maybe there are other people who are interested in this kind of analyzed wildcard search?

Vladimir
facets: case and accent insensitive sort
Hi! When I ask Solr for facets with the parameter facet.sort=index, it gives me the facets sorted alphabetically, but the sort is case and accent sensitive. I found no way to have the facets returned with their original case and accents and sorted alphabetically without sensitivity to case and accents. Is there anything I can do to achieve this without having to retrieve all facets and sort them myself? (We have fields with many, many facets, and doing so hurts performance a lot.)

Sebastien.
Re: facets: case and accent insensitive sort
On Fri, Jun 26, 2009 at 4:06 PM, Sébastien Lamy lamys...@free.fr wrote:
> Hi! When I ask Solr for facets with the parameter facet.sort=index, it gives me the facets sorted alphabetically, but the sort is case and accent sensitive. I found no way to have the facets returned with their original case and accents and sorted alphabetically without sensitivity to case and accents. Is there anything I can do to achieve this without having to retrieve all facets and sort them myself? (We have fields with many, many facets, and doing so hurts performance a lot.)

Faceting is done on indexed values, so if your indexed values keep their original case and accents, they will be sorted accordingly. You could use a copyField to store these values into a string type and facet on that.

--
Regards,
Shalin Shekhar Mangar.
How much data can Solr handle?
We're looking to build a search solution that can contain as many as 10 million different items, and I was wondering whether Solr can handle that amount of data. Has anybody done any testing or published any results for a Solr installation working on huge amounts of data like this?

//Daniel
--
Daniel Löfquist
Software Engineer
CDON.COM
Bergsgatan 20, Box 385, SE 201 23 Malmö, Sweden
Office: +46 40 601 61 00
Direct: +46 40 601 61 16
Fax: +46 40 601 61 20
E-mail: daniel.lofqu...@it.cdon.com
CDON.COM http://www.cdon.com/
Re: How much data can Solr handle?
On Fri, Jun 26, 2009 at 1:27 PM, Daniel Löfquist daniel.lofqu...@it.cdon.com wrote:
> We're looking to build a search solution that can contain as many as 10 million different items, and I was wondering whether Solr can handle that amount of data.

10M documents is quite a common load. We're currently running two installations, with about 4M documents in one and 6M documents (articles) in the other. Both run on single machines with sub-0.1 s search times.

> Has anybody done any testing or published any results for a Solr installation working on huge amounts of data like this?

There's a page on the wiki dedicated to listing known companies and installations based on Solr: http://wiki.apache.org/solr/PublicServers

Hopefully that'll give you an idea. It shouldn't be too hard to just try it out (I'm guessing you could do most of the setup in a day or two).

Hope that helps!
--mats
Re: facets: case and accent insensitive sort
Shalin Shekhar Mangar wrote:
> On Fri, Jun 26, 2009 at 4:06 PM, Sébastien Lamy lamys...@free.fr wrote:
>> Hi! When I ask Solr for facets with the parameter facet.sort=index, it gives me the facets sorted alphabetically, but the sort is case and accent sensitive. I found no way to have the facets returned with their original case and accents and sorted alphabetically without sensitivity to case and accents. Is there anything I can do to achieve this without having to retrieve all facets and sort them myself? (We have fields with many, many facets, and doing so hurts performance a lot.)
>
> Faceting is done on indexed values, so if your indexed values keep their original case and accents, they will be sorted accordingly. You could use a copyField to store these values into a string type and facet on that.

If I use a copyField to store into a string type and facet on that, my problem remains: the facets are sorted case and accent sensitively, and I want an *insensitive* sort.

If I use a copyField to store into a type that strips accents and case (e.g. alphaOnlySort), then Solr returns facet values with no accents and no case, and I want the facet values returned by Solr to *have accents and case*.
Re: facets: case and accent insensitive sort
On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy lamys...@free.fr wrote:
> If I use a copyField to store into a string type and facet on that, my problem remains: the facets are sorted case and accent sensitively, and I want an *insensitive* sort.
>
> If I use a copyField to store into a type that strips accents and case (e.g. alphaOnlySort), then Solr returns facet values with no accents and no case, and I want the facet values returned by Solr to *have accents and case*.

Ah, of course you are right. There is no way to do this right now except on the client side.

--
Regards,
Shalin Shekhar Mangar.
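For readers who end up sorting on the client side as suggested above, here is a minimal sketch in Python. It is not something from the thread: the function names `fold_key` and `sort_facets` and the sample facet values are made up for illustration, and it assumes the facet list is small enough to fetch in full, which is exactly the cost the original poster wanted to avoid.

```python
import unicodedata


def fold_key(value: str) -> str:
    """Build a case- and accent-insensitive sort key (hypothetical helper)."""
    # NFKD decomposition splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks so that "É" and "E" compare equal.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless comparison.
    return stripped.casefold()


def sort_facets(facets):
    """Sort (value, count) pairs by folded key, keeping the original values."""
    # The original value is a tie-breaker so the order is deterministic.
    return sorted(facets, key=lambda pair: (fold_key(pair[0]), pair[0]))


# Made-up facet values and counts, as they might come back from a facet request.
facets = [("Zèbre", 3), ("école", 7), ("Eau", 2), ("abricot", 5), ("Étude", 1)]
sorted_facets = sort_facets(facets)
```

The values keep their accents and case in the output; only the sort key is folded. Stripping combining marks is a rough approximation of accent-insensitive ordering and does not follow any particular locale's collation rules.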
Re: facets: case and accent insensitive sort
Shalin Shekhar Mangar wrote:
> On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy lamys...@free.fr wrote:
>> If I use a copyField to store into a string type and facet on that, my problem remains: the facets are sorted case and accent sensitively, and I want an *insensitive* sort.
>>
>> If I use a copyField to store into a type that strips accents and case (e.g. alphaOnlySort), then Solr returns facet values with no accents and no case, and I want the facet values returned by Solr to *have accents and case*.
>
> Ah, of course you are right. There is no way to do this right now except on the client side.

Thank you for your response. Would it be easy to modify Solr to behave the way I want? Where should I start investigating?
Upgrade to solr 1.4
Hi, I need to upgrade from solr 1.3 to solr 1.4. I was wondering if there is a particular revision of 1.4 that I should use that is considered very stable for a production environment?
Re: Upgrade to solr 1.4
David Baker wrote:
> Hi, I need to upgrade from solr 1.3 to solr 1.4. I was wondering if there is a particular revision of 1.4 that I should use that is considered very stable for a production environment?

Well, if it's not declared stable and offered on the download page, I don't think you can rely on it being very stable for a production environment.
Re: Upgrade to solr 1.4
Solr in general is fairly stable in trunk. That isn't to say that a critical error can't get through, because that does happen, but the test suite is pretty comprehensive. With Solr 1.4 getting closer and closer, I think you'll see the pace of change dropping off.

I think it's one of those things that you have to judge for yourself. Are the features/fixes/enhancements in 1.4 trunk worth a potential risk? I assume that as part of deployment into production you have some sort of defined criteria that say Solr can be added? Testing of server capacity/performance, etc.? Those might tell you whether there are any issues with Solr 1.4 trunk that would need to delay your deployment.

Eric

On Jun 26, 2009, at 10:58 AM, Julian Davchev wrote:
> David Baker wrote:
>> Hi, I need to upgrade from solr 1.3 to solr 1.4. I was wondering if there is a particular revision of 1.4 that I should use that is considered very stable for a production environment?
>
> Well, if it's not declared stable and offered on the download page, I don't think you can rely on it being very stable for a production environment.

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal
Re: Query Filter fq with OR operator
I would like to submit a JIRA issue for this. Can anyone tell me where to go?

-Yao

Otis Gospodnetic wrote:
> Brian,
>
> Opening a JIRA issue, if one doesn't already exist, is the best way. If you can provide a patch, even better!
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message -----
> From: brian519 bpear...@desire2learn.com
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 16, 2009 1:32:41 PM
> Subject: Re: Query Filter fq with OR operator
>
>> This feature is very important to me. Should I post something on the dev forum? I'm not sure what the proper protocol is for adding a feature to the roadmap.
>>
>> Thanks,
>> Brian.
Re: Query Filter fq with OR operator
Hello Yao,

A contribution would be great. Here is information about how to contribute: http://wiki.apache.org/solr/HowToContribute

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Yao Ge yao...@gmail.com
To: solr-user@lucene.apache.org
Sent: Friday, June 26, 2009 11:20:25 AM
Subject: Re: Query Filter fq with OR operator

> I would like to submit a JIRA issue for this. Can anyone tell me where to go?
>
> -Yao
>
> Otis Gospodnetic wrote:
>> Brian,
>>
>> Opening a JIRA issue, if one doesn't already exist, is the best way. If you can provide a patch, even better!
>>
>> Otis
>>
>> ----- Original Message -----
>> From: brian519
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, June 16, 2009 1:32:41 PM
>> Subject: Re: Query Filter fq with OR operator
>>
>>> This feature is very important to me. Should I post something on the dev forum? I'm not sure what the proper protocol is for adding a feature to the roadmap.
>>>
>>> Thanks,
>>> Brian.
Re: Query Filter fq with OR operator
On Fri, Jun 26, 2009 at 8:50 PM, Yao Ge yao...@gmail.com wrote:
> I would like to submit a JIRA issue for this. Can anyone tell me where to go?

An issue has been opened already. You may want to add a vote to it: https://issues.apache.org/jira/browse/SOLR-1223

--
Regards,
Shalin Shekhar Mangar.
Re: How much data can Solr handle?
Hi Daniel,

How much Solr can handle really depends on the hardware you run it on, the type of documents you index in it, and the query rate and type. 10M doesn't sound like a large number even for an average server today (e.g. 4 GB of RAM, 1-2 cores), web-page-sized documents, and a query rate of a few dozen simple keyword, boolean, or phrase queries per second.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Daniel Löfquist daniel.lofqu...@it.cdon.com
To: solr-user@lucene.apache.org
Sent: Friday, June 26, 2009 7:27:45 AM
Subject: How much data can Solr handle?

> We're looking to build a search solution that can contain as many as 10 million different items, and I was wondering whether Solr can handle that amount of data. Has anybody done any testing or published any results for a Solr installation working on huge amounts of data like this?
>
> //Daniel
Re: Upgrade to solr 1.4
Netflix is running a nightly build from May in production. We did our normal QA on it, then ran it on one of our five servers for two weeks. No problems. It is handling about 10% more traffic with 10% less CPU.

We deployed 1.4 to all our servers yesterday.

wunder

On 6/26/09 7:58 AM, Julian Davchev j...@drun.net wrote:
> David Baker wrote:
>> Hi, I need to upgrade from solr 1.3 to solr 1.4. I was wondering if there is a particular revision of 1.4 that I should use that is considered very stable for a production environment?
>
> Well, if it's not declared stable and offered on the download page, I don't think you can rely on it being very stable for a production environment.
Re: Upgrade to solr 1.4
On Fri, Jun 26, 2009 at 9:11 PM, Walter Underwood wunderw...@netflix.com wrote:
> Netflix is running a nightly build from May in production. We did our normal QA on it, then ran it on one of our five servers for two weeks. No problems. It is handling about 10% more traffic with 10% less CPU.

Wow, that is good news! Are you also using the Java-based replication?

> We deployed 1.4 to all our servers yesterday.

Can you tell us which revision you used?

--
Regards,
Shalin Shekhar Mangar.
Re: Upgrade to solr 1.4
We are using the script replication. I have no interest in spending time configuring and QA'ing a different method when the scripts work fine.

We are running the nightly from 2009-05-11.

wunder

On 6/26/09 8:51 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> On Fri, Jun 26, 2009 at 9:11 PM, Walter Underwood wunderw...@netflix.com wrote:
>> Netflix is running a nightly build from May in production. We did our normal QA on it, then ran it on one of our five servers for two weeks. No problems. It is handling about 10% more traffic with 10% less CPU.
>
> Wow, that is good news! Are you also using the Java-based replication?
>
>> We deployed 1.4 to all our servers yesterday.
>
> Can you tell us which revision you used?
Re: Upgrade to solr 1.4
We are using a trunk build from approximately the same time with little to no issues, including with the new replication.

--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562

From: Shalin Shekhar Mangar shalinman...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Fri, 26 Jun 2009 21:21:44 +0530
To: solr-user@lucene.apache.org
Subject: Re: Upgrade to solr 1.4

> On Fri, Jun 26, 2009 at 9:11 PM, Walter Underwood wunderw...@netflix.com wrote:
>> Netflix is running a nightly build from May in production. We did our normal QA on it, then ran it on one of our five servers for two weeks. No problems. It is handling about 10% more traffic with 10% less CPU.
>
> Wow, that is good news! Are you also using the Java-based replication?
>
>> We deployed 1.4 to all our servers yesterday.
>
> Can you tell us which revision you used?
>
> --
> Regards,
> Shalin Shekhar Mangar.
Error while trying to index
I am trying to index a Solr server from a nightly build. I get the following error in my catalina.out:

26-Jun-2009 5:52:06 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 4
26-Jun-2009 5:52:06 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchFieldError: log
    at com.pjaol.search.solr.update.LocalUpdaterProcessor.processAdd(LocalUpdateProcessorFactory.java:138)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1292)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
Scaling out/up or a mix
Hi.

I currently have an index which is 16 GB per machine (8 machines = 128 GB; the data is stored externally, not in the index) and is growing like crazy (we are indexing blogs, which are crazy by nature). I have only allocated 2 GB per machine to the Solr app, since we are running some other stuff there in parallel.

Each doc should be roughly the size of a blog post, no more than 20k. We currently have about 90M documents, and the number is increasing rapidly, so getting into the G+ document range is not too far away.

Because of search performance, I think I need to move these instances to dedicated index/search machines (or index on some machines and search on others). Anyway, I would like to get some feedback about two things:

1. What is the most important hardware aspect when it comes to adding documents to the index and optimizing it?
1.1 Is it disk I/O write throughput? (Sequential or random I/O?)
1.2 Is it RAM?
1.3 Is it CPU?
My guess would be disk I/O. Right or wrong?

2. What is the most important hardware aspect when it comes to searching documents in my setup? (The result set is limited to the top 10 matches, with paging.) We facet and sort on the publishedDate of the entry (memory intensive, I presume).
2.1 Is it disk read throughput? (Sequential or random I/O?)
2.2 Is it RAM?
2.3 Is it CPU?
I have no clue, since the data might not fit into memory. What is then the most important factor? Read performance while scanning the index? CPU while comparing fields and collecting results?

What I'm trying to find out is what I can do to get the most bang for the buck with a limited (aren't we all limited?) budget.

Kindly
//Marcus
--
Marcus Herou
CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
Re: How much data can Solr handle?
The total number of bytes of input data is a more useful number than the number of documents.

400 million documents was our peak at my last job. They were maybe 300-500 bytes of text each, for 1k of disk space per document. The index was thus 400 gigabytes. The problems were:

1) System administration: the logistics of the index were a nightmare. Optimize took 14 hours, and a full copy to the query servers took half an hour. Optimize needs twice the index size in the same partition.
2) Sorting creates an array with one element for every document. We needed 32G of RAM in a server to allow sorted results.
3) Faceting on some fields was likewise impossible, since faceting makes an array of facet values. Faceting on timestamps was a no-no.

The servers were Dell 2950s with 2 or 4 processors, 32G of RAM, and six 300 GB high-speed SATA drives in RAID-5 for 1.2 terabytes of space.

Basic searching was a little slower than on the smaller index, but still 50 ms for pre-cached queries.

On Fri, Jun 26, 2009 at 8:28 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> Hi Daniel,
>
> How much Solr can handle really depends on the hardware you run it on, the type of documents you index in it, and the query rate and type. 10M doesn't sound like a large number even for an average server today (e.g. 4 GB of RAM, 1-2 cores), web-page-sized documents, and a query rate of a few dozen simple keyword, boolean, or phrase queries per second.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message -----
> From: Daniel Löfquist daniel.lofqu...@it.cdon.com
> To: solr-user@lucene.apache.org
> Sent: Friday, June 26, 2009 7:27:45 AM
> Subject: How much data can Solr handle?
>
>> We're looking to build a search solution that can contain as many as 10 million different items, and I was wondering whether Solr can handle that amount of data. Has anybody done any testing or published any results for a Solr installation working on huge amounts of data like this?
>>
>> //Daniel

--
Lance Norskog
goks...@gmail.com
650-922-8831 (US)
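The sizing figures in the message above can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not a Solr capacity model: the function name `index_estimates` is made up, "1k" is treated as 1000 bytes, and the 4 bytes per sort-array entry is an assumption (the message only says the array has one element per document).

```python
def index_estimates(num_docs, bytes_per_doc_on_disk, bytes_per_sort_entry=4):
    """Rough disk and memory estimates (hypothetical helper, decimal gigabytes)."""
    index_bytes = num_docs * bytes_per_doc_on_disk
    return {
        # On-disk index size: documents times average disk footprint.
        "index_gb": index_bytes / 1e9,
        # The message says optimize needs twice the index size in one partition.
        "optimize_peak_gb": 2 * index_bytes / 1e9,
        # One array element per document for each sorted field;
        # bytes_per_sort_entry is an assumed element size, not a figure from the thread.
        "sort_array_gb": num_docs * bytes_per_sort_entry / 1e9,
    }


# Figures from the message: 400 million documents at about 1k of disk each.
est = index_estimates(400_000_000, 1_000)
```

With these inputs the index comes out at 400 GB and the optimize peak at 800 GB, which fits within the 1.2 terabytes of disk described. The sort-array figure covers a single field under the assumed element size; string sorts and additional fields would need more.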