Re: Multiple Cores Vs. Single Core for the following use case
In case you are going to use a core per user, take a look at this patch: http://wiki.apache.org/solr/LotsOfCores

Trey-13 wrote:
> Hi Matt,
> In most cases you are going to be better off going with the userid method, unless you have a very small number of users and a very large number of docs per user. The userid method will likely be much easier to manage, as you won't have to spin up a new core every time you add a new user. I would start here and see if the performance is good enough for your requirements before you start worrying about it not being efficient.
> That being said, I really don't have any idea what your data looks like. How many users do you have? How many documents per user? Are any documents shared by multiple users?
> -Trey
>
> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour <matthieu_lab...@yahoo.com> wrote:
>> Hi,
>> Should I set up multiple cores or a single core for the following use case: I have X number of users, and when I do a search I always know which user I am searching for. Should I set up X cores, one for each user? Or should I set up one core and add a userId field to each document?
>> If I choose the single-core solution, then I am concerned about performance. Let's say I search for "New York": if Lucene returns all "New York" matches for all users and then filters based on the userId, that is going to be less efficient than if I had sharded per user and sent the "New York" request to that user's core.
>> Thank you for your help
>> matt

-- View this message in context: http://old.nabble.com/Multiple-Cores-Vs.-Single-Core-for-the-following-use-case-tp27332288p27335403.html Sent from the Solr - User mailing list archive at Nabble.com.
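On the single-core option, the usual pattern is to put the per-user restriction in an fq filter query rather than in the main query: Solr caches each filter in its filterCache and intersects it with the scored query, instead of scoring every user's matches and discarding most of them. A minimal sketch of the request URL (host, core, and field names are hypothetical, not from the thread):

```java
public class PerUserQuery {
    // Hypothetical host and field names. The per-user restriction goes in
    // fq, not q: Solr caches the filter's DocSet in the filterCache and
    // intersects it with the query, rather than scoring all users' matches.
    static String url(String query, int userId) {
        return "http://localhost:8983/solr/select?q=" + query
                + "&fq=userId:" + userId;
    }

    public static void main(String[] args) {
        System.out.println(url("new+york", 42));
    }
}
```

Because the fq does not contribute to scoring, the same cached filter is reused across every query that user runs.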
Re: Fastest way to use solrj
how many fields are there in each doc? The binary format just reduces overhead; it does not touch/compress the payload.

2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
> I have 3 million documents, each having 5000 chars. The xml file is about 15GB. The binary file is also about 15GB. I was a bit surprised about this. It doesn't bother me much though. At least it performs better.
> /Tim

2010/1/27 Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>:
> if you write only a few docs you may not observe much difference in size. if you write a large number of docs you may observe a big difference.

2010/1/27 Tim Terlegård:
> I got the binary format to work perfectly now. Performance is better than with xml. Thanks! Although, it doesn't look like a binary file is smaller in size than an xml file?
> /Tim

2010/1/21 Tim Terlegård:
> Yes, it worked! Thank you very much. But do I need to use curl, or can I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't use BinaryRequestWriter then I don't know how to do this.

2010/1/21 Noble Paul:
> if your data is serialized using JavaBinUpdateRequestCodec, you may POST it using curl. If you are writing directly, use CommonsHttpSolrServer.

2010/1/20 Tim Terlegård:
> BinaryRequestWriter does not read from a file and post it. Is there any other way, or is this use case not supported? I tried this:
>
>   $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin
>   $ curl host/solr/update -F stream.body='<commit/>'
>
> Solr did read the file, because solr complained when the file wasn't in the format the JavaBinUpdateRequestCodec expected. But no data is added to the index for some reason.

2010/1/20 Noble Paul:
> how did you create the file /tmp/data.bin? what is the format?

2010/1/20 Tim Terlegård:
> I wrote this in the first email. It's in the javabin format (I think). I did it like this (groovy code):
>
>   fieldId = new NamedList()
>   fieldId.add("name", "id")
>   fieldId.add("val", "9-0")
>   fieldId.add("boost", null)
>   fieldText = new NamedList()
>   fieldText.add("name", "text")
>   fieldText.add("val", "Some text")
>   fieldText.add("boost", null)
>   fieldNull = new NamedList()
>   fieldNull.add("boost", null)
>   doc = [fieldNull, fieldId, fieldText]
>   docs = [doc]
>   root = new NamedList()
>   root.add("docs", docs)
>   fos = new FileOutputStream("data.bin")
>   new JavaBinCodec().marshal(root, fos)
>
> /Tim

2010/1/20 Noble Paul:
> JavaBin is a format. Use this method: JavaBinUpdateRequestCodec#marshal(UpdateRequest updateRequest, OutputStream os). The output of this can be posted to solr and it should work.

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: Fastest way to use solrj
I have 6 fields. The text field is the biggest; it contains almost all of the 5000 chars.
/Tim

2010/1/27 Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>:
> how many fields are there in each doc? the binary format just reduces overhead. it does not touch/compress the payload.
Re: solr1.5
Good question indeed: I'm waiting, as many others I guess, for the SOLR-236 patch (the collapse thing :) ).
David

On Tue, Jan 26, 2010 at 4:24 PM, Matthieu Labour <matth...@strateer.com> wrote:
> Hi, quick question: is there any release date scheduled for Solr 1.5 with all the wonderful patches (StreamingUpdateSolrServer etc.)?
> Thank you!
scenario with FQ parameter
Hi all: I am trying to figure out a way to do the following:

  qf=field1^10 field2^20 field3^100
  fq=*:9 OR (field1:xyz)

Expected results: the above should return documents where 9 appears in any of the fields (field1, field2 or field3) OR field1 matches xyz.

I know I can use a copyField (say 'text') to copy all the fields and then use:

  qf=field1^10 field2^20 field3^100
  fq=text:9 OR (field1:xyz^100.0)

but in doing so, the boost weights specified in the qf parameter have no effect on the score. I am using Solr 1.4 and the search handler is dismax. Is there any way I can achieve the above expected results but still affect the score with the qf parameter?

Thanks,
~Ravi Gidwani.
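For context, the setup described above would look roughly like this in solrconfig.xml. This is only a sketch: the field names come from the question, and "text" is assumed to be the copyField catch-all mentioned there.

```xml
<!-- Sketch of a dismax handler with the question's boosts; the fq with the
     per-request restriction is passed on the request, not configured here. -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">field1^10 field2^20 field3^100</str>
  </lst>
</requestHandler>
```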
Re: Wildcard Search and Filter in Solr
Ashok: Maybe this will help: http://gravi2.blogspot.com/2009/05/solr-wildcards-and-omitnorms.html
~Ravi

On Tue, Jan 26, 2010 at 9:56 PM, ashokcz <ashokkumar.gane...@tcs.com> wrote:
> Hi, I just looked at analysis.jsp and found out what it does during index / query:
>
>   Index Analyzer: Intel -> intel -> intel -> intel -> intel -> intel
>   Query Analyzer: Inte* -> Inte* -> inte* -> inte -> inte -> inte -> int
>
> I think somewhere my configuration or my definition of the type "text" is wrong. This is my configuration:
>
>   <fieldType class="solr.TextField" name="text">
>     <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory" catenateAll="0" catenateNumbers="0" catenateWords="0" generateNumberParts="1" generateWordParts="1"/>
>       <filter class="solr.StopFilterFactory"/>
>       <filter class="solr.TrimFilterFactory"/>
>       <filter class="solr.PorterStemFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory" catenateAll="0" catenateNumbers="0" catenateWords="0" generateNumberParts="1" generateWordParts="1"/>
>       <filter class="solr.StopFilterFactory"/>
>       <filter class="solr.TrimFilterFactory"/>
>       <filter class="solr.PorterStemFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
> I think I am missing some basic configuration for doing wildcard searches, but could not figure it out. Can someone help please?
>
> Ahmet Arslan wrote:
>> [ashokcz wrote:]
>>> Hi, I'm trying to use wildcard keywords in my search term and filter term, but I didn't get any results. Searched a lot but could not find any lead. Can someone help me with this? I'm using solr 1.2.0 and have a few records indexed with vendorName value as "Intel". In the solr admin interface I'm trying to do the search like this:
>>> http://localhost:8983/solr/select?indent=on&version=2.2&q=intel&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=
>>> and I'm getting the result properly, but when I use q=inte* no records are returned. The same is the case for the filter query: on using fq=VendorName:Intel I get my results, but on using fq=VendorName:Inte* no results are returned. I can guess I'm making a mistake in a few obvious things, but could not figure it out. Can someone pls help me out :) :)
>> If q=intel returns documents while q=inte* does not, it means that the fieldType of your defaultSearchField is reducing the token intel into something. Can you find out by using /admin/analysis.jsp what happens to Intel/intel at index and query time? What is your defaultSearchField? Is it VendorName? It is expected that fq=VendorName:Intel returns results while fq=VendorName:Inte* does not, because prefix queries are not analyzed. But it is strange that q=inte* does not return anything. Maybe your index analyzer is reducing Intel into int or ıntel? I am not 100% sure, but solr 1.2.0 may use the default locale in the lowercase operation. What is your default locale? It is better to see what happens to the word Intel using the analysis.jsp page.

-- View this message in context: http://old.nabble.com/Wildcard-Search-and-Filter-in-Solr-tp27306734p27334486.html
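Ahmet's locale hunch is easy to reproduce with plain JDK code (this is a generic Java demonstration, not Solr code): String.toLowerCase with the default locale is locale-sensitive, and under a Turkish locale the dotted capital I lowercases to the dotless ı, so an index term "ıntel" would never match inte*.

```java
import java.util.Locale;

public class LocaleLowercase {
    public static void main(String[] args) {
        // English locale: "Intel" lowercases to "intel", as expected
        String en = "Intel".toLowerCase(Locale.ENGLISH);
        // Turkish locale: capital 'I' maps to dotless 'ı' (U+0131),
        // giving "ıntel" -- the mismatch suspected in the thread
        String tr = "Intel".toLowerCase(new Locale("tr"));
        System.out.println(en + " / " + tr);
    }
}
```

This is why analyzers generally avoid default-locale case folding and why checking analysis.jsp output, as suggested above, is the right diagnostic.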
Plurals in solr indexing
Hi, I am having trouble with indexing plurals. I have a schema with the following fields:

  gender (field) - string (field type) (e.g. data "Boys")
  all (field) - text (field type) - solr.WhitespaceTokenizerFactory, solr.SynonymFilterFactory, solr.WordDelimiterFilterFactory, solr.LowerCaseFilterFactory, SnowballPorterFilterFactory

I am using copyField from gender to all, and searching on the all field. When I search for "Boy" I get results; if I search for "Boys" I don't get results. I have tried things like:

  boys bikes - no results
  boy bikes - works

kid and kids are synonyms for boy and boys, so I tried adding kid,kids,boy,boys in synonyms hoping it would work; it doesn't work that way. I also have other content fields which are copied to all, and they contain words like kids, boys etc. Any idea?

-- View this message in context: http://old.nabble.com/Plurals-in-solr-indexing-tp27335639p27335639.html
Re: Plurals in solr indexing
I have found that my synonyms.txt file had:

  kids,boys,girls,childrens,children,boys&girls,kid,boy,girl

I ran the analyzer and somehow it was matching with "girl". I am not sure what's happening yet, so I removed the ampersand entry:

  kids,boys,girls,childrens,children,boy,girl,kid

I guessed that when I add them comma-separated it will act as a group, and when any one of the words is queried, matches will be returned. It is working now, after I made that change in the synonyms.txt file.

murali k wrote:
> Hi, I am having trouble with indexing plurals [...]

-- View this message in context: http://old.nabble.com/Plurals-in-solr-indexing-tp27335639p27336508.html
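For reference, SynonymFilterFactory's synonyms.txt supports two line formats; the comma-separated group form is what the fix above reliesies on. A sketch using the thread's terms:

```
# Comma-separated group: with expand="true", a query for any of these
# terms matches all of the others (this is the symmetric group behavior
# the poster guessed at).
boy,boys,kid,kids,girl,girls

# Explicit mapping: terms on the left are rewritten to the right-hand
# side, which is one-directional rather than a group.
childrens => children
```

Multi-word entries and punctuation inside a term (like an ampersand) are taken literally after tokenization, which can produce surprising matches like the "girl" hit described above.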
Re: Fastest way to use solrj
The binary format just reduces overhead. In your case, all the data is in the big text field, which is not compressed. But overall, the parsing is a lot faster for the binary format, so you see a perf boost.

2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
> I have 6 fields. The text field is the biggest; it contains almost all of the 5000 chars.
> /Tim

--
Noble Paul | Systems Architect | AOL | http://aol.com
Help using CachedSqlEntityProcessor
Hi, I have looked on the wiki. Using the CachedSqlEntityProcessor looked like it would be simple, but I am getting no speed benefit and am not sure if I have even got the syntax correct. I have a main root entity called 'article', and then I have a number of sub-entities. One such entity is:

  <entity name="LinkedCategory" pk="LinkedCatAricleId"
          query="SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}')"
          processor="CachedSqlEntityProcessor"
          where="LinkedCatArticleId = article.CmsArticleId"
          deltaQuery="SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}') AND (convert(varchar(50), LastUpdateDate) > '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) > '${dataimporter.article.last_index_time}')"
          parentDeltaQuery="SELECT * from vArticleSummaryDetail_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}')">
    <field column="LinkedCategoryBC" name="LinkedCategoryBreadCrumb"/>
  </entity>

As you can see, I have added (for the main query - not worrying about the delta queries yet!!) the processor and the 'where', but I'm not sure if it's correct. Can anyone point me in the right direction???

Thanks
Kirsty

-- View this message in context: http://old.nabble.com/Help-using-CachedSqlEntityProcessor-tp27337635p27337635.html
RE: solr with tomcat in cluster mode
Hi again,

I finally set up my Solr cluster with tomcat6. The configuration I use is two tomcat servers on the same machine on different ports (e.g. localhost:8180/solr and localhost:8280/solr, for testing purposes), with different indexes on each server and index replication through Solr's replication handler. It's working fine for me and is very quick.

Now I want to load balance these two tomcat servers, but without using the Apache HTTP server. Is there any solution for that???

-----Original Message-----
From: Matt Mitchell [mailto:goodie...@gmail.com]
Sent: Friday, January 22, 2010 9:33 PM
To: solr-user@lucene.apache.org
Subject: Re: solr with tomcat in cluster mode

Hey Otis,

We're indexing on a separate machine because we want to keep our production nodes away from processes like indexing. The indexing server also has a ton of resources available, more so than the production nodes. We set it up as an indexing server at one point and have decided to stick with it. We're not indexing to the same index as the search indexes because we want to be able to step back a day or two if needed. So we do the SWAP when things are done and OK.

So that last part you mentioned about the searchers needing to re-open will happen with a SWAP, right? Is your concern that there will be a lag time, making it so the slaves will be out of sync for some small period of time? Would it be simpler/better to move to using Solr's native slave/master feature? I'd love to hear any suggestions you might have.

Thanks,
Matt

On Fri, Jan 22, 2010 at 1:58 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> This should work fine. But why are you indexing to a separate index/core? Why not index in the very same index you are searching? Slaves won't see changes until their searchers re-open.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
> ----- Original Message -----
> From: Matt Mitchell <goodie...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Fri, January 22, 2010 9:44:03 AM
> Subject: Re: solr with tomcat in cluster mode
>
> We have a similar setup and I'd be curious to see how folks are doing this as well. Our setup: a few servers and an F5 load balancer. Each Solr instance points to a shared index. We use a separate server for indexing. When the index is complete, we do some juggling using the Core Admin SWAP function and update the shared index. I've wondered about having a shared index across multiple instances of (read-only) Solr -- any problems there?
>
> Matt
>
> On Fri, Jan 22, 2010 at 9:35 AM, ZAROGKIKAS, GIORGOS <g.zarogki...@multirama.gr> wrote:
>> Hi, I'm using solr 1.4 with tomcat on a single pc, and I want to turn it into cluster mode with 2 nodes and load balancing, but I can't find info on how to do it. Is there any manual or a recorded procedure on the internet to do that? Or is there anyone who can help me?
>> Thanks in advance
>> PS: I use Windows Server 2008 for the OS
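One option that avoids Apache httpd entirely is putting any HTTP reverse proxy in front of the two Tomcats. The sketch below uses nginx (an assumption on my part, not something from the thread) with the poster's two local ports; nginx round-robins across the upstream servers by default.

```nginx
# Sketch: nginx as a load balancer for the two local Tomcat/Solr instances
upstream solr_cluster {
    server localhost:8180;
    server localhost:8280;
}

server {
    listen 8080;
    location /solr {
        proxy_pass http://solr_cluster;
    }
}
```

If the search clients use SolrJ, another route is client-side balancing with LBHttpSolrServer (shipped with Solr 1.4), which round-robins requests across a list of Solr URLs without any proxy at all.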
Starting Jetty Server using JettySolrRunner
Hi, I am trying to run a solr server using JettySolrRunner; however, I keep getting the following exception:

Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/home/ithurs/shellworkspace/SolrPOC
        at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:260)
        at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:228)
        at org.apache.solr.core.Config.<init>(Config.java:101)
        at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:130)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:134)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
        at org.mortbay.jetty.Server.doStart(Server.java:210)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:99)
        at org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:93)
        at com.germinait.solr.jetty.StartStopJetty.main(StartStopJetty.java:9)

Jan 27, 2010 4:48:56 PM org.apache.solr.core.CoreContainer finalize
SEVERE: CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
Jan 27, 2010 4:48:56 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/home/ithurs/shellworkspace/SolrPOC
        (same stack trace as above)
Jan 27, 2010 4:48:56 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init() done
Jan 27, 2010 4:48:56 PM sun.reflect.NativeMethodAccessorImpl invoke0
WARNING: failed SocketConnector @ 0.0.0.0:8983
java.net.BindException: Address already in use
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
        at java.net.ServerSocket.bind(ServerSocket.java:319)
        at java.net.ServerSocket.<init>(ServerSocket.java:185)
        at java.net.ServerSocket.<init>(ServerSocket.java:141)
        at org.mortbay.jetty.bio.SocketConnector.newServerSocket(SocketConnector.java:78)
        at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:72)
        at org.mortbay.jetty.AbstractConnector.doStart(AbstractConnector.java:252)
        at org.mortbay.jetty.bio.SocketConnector.doStart(SocketConnector.java:145)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.Server.doStart(Server.java:221)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:99)
        at org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:93)
        at com.germinait.solr.jetty.StartStopJetty.main(StartStopJetty.java:9)
Jan 27, 2010 4:48:56 PM sun.reflect.NativeMethodAccessorImpl invoke0

Is there any way to specify the current working directory? And what if we have multicore with several cores, where each core has a solrconfig.xml in its conf folder; how would we start a jetty server from the API in that case?

Regards,
Raakhi Khatwani
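On the working-directory question: Solr resolves its instance directory from the solr.solr.home system property (or a JNDI entry) before falling back to ./solr under the current working directory, so setting that property before constructing JettySolrRunner points Solr at the right conf/solrconfig.xml. For multicore, the same directory would hold a solr.xml listing each core's instanceDir. A minimal sketch; the path below is hypothetical:

```java
public class SolrHomeDemo {
    // Sets the property Solr's SolrResourceLoader consults when locating
    // solrconfig.xml; the directory must contain conf/solrconfig.xml (or,
    // for multicore, a solr.xml describing each core).
    static String setHome(String path) {
        System.setProperty("solr.solr.home", path);
        return System.getProperty("solr.solr.home");
    }

    public static void main(String[] args) {
        // Hypothetical path based on the cwd shown in the trace above
        System.out.println(setHome("/home/ithurs/shellworkspace/SolrPOC/solr"));
    }
}
```

The separate BindException in the log is unrelated: port 8983 was already taken by another process, so the runner must either stop the old instance or be given a free port.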
Re: Plurals in solr indexing
It would be more informative for you to actually post your schema definitions for the fields in question, along with your copyField. The summary in your first post leaves a lot of questions unanswered. But a couple of things:

1) Beware the SOLR string type. It does NOT tokenize the input. The text type is usually what people want, unless they are doing something special-purpose.
2) WordDelimiterFilterFactory is often a source of misunderstanding; take a close look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
3) I'd strongly advise either really getting to know the admin page in SOLR and/or getting a copy of Luke to examine your index and see if what you *think* is in there actually is.
4) Try running your queries with debugQuery=on and see what that shows.

HTH
Erick

On Wed, Jan 27, 2010 at 6:09 AM, murali k <ilar...@gmail.com> wrote:
> I have found that my synonyms.txt file had kids,boys,girls,childrens,children,boys&girls,kid,boy,girl. I ran the analyzer and somehow it was matching with "girl" [...]
Re: Help using CachedSqlEntityProcessor
I recently had issues with CachedSqlEntityProcessor too, figuring out how to use the syntax. After a while, I managed to get it working with cacheKey and cacheLookup. I think this is 1.4-specific, though. It seems you have double WHERE clauses, one in the query and one in the where attribute. Try using cacheKey and cacheLookup instead, something like this:

  <entity name="LinkedCategory" pk="LinkedCatArticleId"
          query="SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock)"
          processor="CachedSqlEntityProcessor"
          cacheKey="LINKEDCATARTICLEID"
          cacheLookup="article.CMSARTICLEID"
          deltaQuery="SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), LastUpdateDate) > '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) > '${dataimporter.article.last_index_time}'"
          parentDeltaQuery="SELECT * from vArticleSummaryDetail_SolrSearch (nolock)">
    <field column="LinkedCategoryBC" name="LinkedCategoryBreadCrumb"/>
  </entity>

/Rolf

Den 2010-01-27 12.36, skrev KirstyS <kirst...@gmail.com>:
> Hi, I have looked on the wiki. Using the CachedSqlEntityProcessor looked like it would be simple, but I am getting no speed benefit and am not sure if I have even got the syntax correct [...]
Re: Lock problems: Lock obtain timed out
Can anyone think of a reason why these locks would hang around for more than 2 hours? I have been monitoring them and they look like they are very short lived. On Tue, Jan 26, 2010 at 10:15 AM, Ian Connor ian.con...@gmail.com wrote: We traced one of the lock files, and it had been around for 3 hours. A restart removed it - but is 3 hours normal for one of these locks? Ian. On Mon, Jan 25, 2010 at 4:14 PM, mike anderson saidthero...@gmail.comwrote: I am getting this exception as well, but disk space is not my problem. What else can I do to debug this? The solr log doesn't appear to lend any other clues.. Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990 Jan 25, 2010 4:02:22 PM org.apache.solr.common.SolrException log SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@ /solr8984/index/lucene-98c1cb272eb9e828b1357f68112231e0-write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:85) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1402) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:190) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98) at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Should I consider changing the lock timeout settings (currently set to defaults)? If so, I'm not sure what to base these values on. Thanks in advance, mike On Wed, Nov 4, 2009 at 8:27 PM, Lance Norskog goks...@gmail.com wrote: This will not ever work reliably. You should have 2x total disk space for the index. Optimize, for one, requires this. 
On Wed, Nov 4, 2009 at 6:37 AM, Jérôme Etévé jerome.et...@gmail.com wrote: Hi, It seems this situation is caused by some No space left on device exceptions: SEVERE: java.io.IOException: No space left on device at java.io.RandomAccessFile.writeBytes(Native Method) at java.io.RandomAccessFile.write(RandomAccessFile.java:466) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192) at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96) I'd better try to set my maxMergeDocs and mergeFactor to more adequate values for my app (I'm indexing ~15 GB of data on a 20 GB device, so I guess there's a problem when solr tries to merge the index bits being built). At the moment, they are set to <mergeFactor>100</mergeFactor> and <maxMergeDocs>2147483647</maxMergeDocs> Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net -- Lance Norskog goks...@gmail.com -- Regards, Ian Connor 1 Leighton St
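Lance's 2x-disk-space rule of thumb can be turned into a rough pre-flight check. This Python sketch uses Jerome's numbers and a hypothetical helper; it is not part of Solr itself:

```python
import shutil

def has_optimize_headroom(index_bytes, path="/"):
    """Hypothetical pre-flight check: an optimize can temporarily need
    a full extra copy of the index on disk (the 2x rule of thumb)."""
    return shutil.disk_usage(path).free >= index_bytes

# Jerome's case: a ~15 GB index on a 20 GB device leaves ~5 GB free,
# well short of the ~15 GB of headroom an optimize may need.
index_gb, device_gb = 15, 20
free_gb = device_gb - index_gb
print(free_gb >= index_gb)  # False: not enough headroom to optimize safely
```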
Re: Multiple Cores Vs. Single Core for the following use case
@Marc: Thank you, Marc. This is a logic we had to implement in the client application. Will look into applying the patch to replace our homegrown logic @Trey: I have 1000 users per machine. 1 core / user. Each core is 35000 documents. Documents are small...each core goes from 100MB to 1.3GB at most. There are 7 types of documents. What I am trying to understand is the search/filter algorithm. If I have 1 core with all documents and I search for Paris for userId=123, is lucene going to first search for all Paris documents and then apply a filter on the userId ? If this is the case, then I am better off having a specific index for the user=123 because this will be faster
Re: Multiple Cores Vs. Single Core for the following use case
On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: What I am trying to understand is the search/filter algorithm. If I have 1 core with all documents and I search for Paris for userId=123, is lucene going to first search for all Paris documents and then apply a filter on the userId ? If this is the case, then I am better off having a specific index for the user=123 because this will be faster If you want to apply the filter to userid first, use filter queries (http://wiki.apache.org/solr/CommonQueryParameters#fq). This will filter by userid first then search for Paris. didier
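The fq approach didier describes can be sketched in Python; the host, port, and userId field name below are illustrative, not from the thread:

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical host and field name; the point is that the user
# restriction travels as a separate fq parameter, which Solr can
# cache independently of the main query term.
params = {"q": "Paris", "fq": "userId:123", "wt": "xml"}
url = "http://localhost:8983/solr/select?" + urlencode(params)

# The fq clause round-trips intact through URL encoding:
assert parse_qs(url.split("?", 1)[1])["fq"] == ["userId:123"]
```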
update doc success, but could not find the new value
I am using http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10 to update a document. The responseHeader's status is 0. But when I search the new value, it couldn't be found.
Re: Multiple Cores Vs. Single Core for the following use case
I've not looked at the filtering for quite a while, but if you're getting lots of similar queries, the filter's caching can play a huge part in speeding up queries, so even if the first query for paris was slow, subsequent queries from different users for the same terms will be sped up considerably (especially if you're using the FastLRUCache). If filtering is slow for your queries, why not try simply using a boolean query (i.e., for the example discussed: paris AND userId:123)? This would remove the cross-user usefulness of the caches, if I understand them correctly, but may speed up uncached searches. Toby. -- Toby Cole, Senior Software Engineer, Semantico Limited
Re: Multiple Cores Vs. Single Core for the following use case
Thanks Didier for your response And in your opinion, this should be as fast as if I would getCore(userId) -- provided that the core is already open -- and then search for Paris ? matt
How to Implement SpanQuery in Solr . . ?
I am about to attempt to implement SpanQuery in Solr 1.4. I noticed there is a JIRA to add it in 1.5: * https://issues.apache.org/jira/browse/SOLR-1337 I also noticed a couple of email threads from Grant and Yonik about trying to implement it, such as: * http://old.nabble.com/SpanQuery-support-td15246477.html So . . . * Question: Has anyone started working on SOLR-1337 for Solr 1.5? And if not . . . * Question: Is the best way to go about it to follow the following recipe? 1. Configure a. Specify a new parser plugin in solrconfig.xml: b. <queryParser name="mySpanQueryParser" class="SpanQueryParserPlugin"/> 2. Implement a. Use the FooQParserPlugin as a starting template (https://svn.apache.org/repos/asf/lucene/solr/trunk/src/test/org/apache/solr/core/SOLR749Test.java) 3. Access a. Access the current query type via 'q=mySpanQueryParser ' Most grateful for any thoughts, Christopher
Re: How to Implement SpanQuery in Solr . . ?
As always, I'd try starting with what the user interface (in this case, syntax) should look like. It makes sense to add elementary spans first. {!spannear a=query1 b=query2 slop=10} Thinking about implementation... what would really magnify the usefulness of the basic API above is to convert non-span queries to span queries automatically. This is useful because the sub-queries of a span query must be span queries, and most query parsers generate non-span queries. I think there is code in the highlighter that uses spans that can do this conversion. -Yonik http://www.lucidimagination.com
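The kind of match a {!spannear a=query1 b=query2 slop=10} syntax implies can be modeled over term positions. This is a simplified, in-order-only Python sketch, not Lucene's actual SpanNearQuery implementation:

```python
# Simplified, in-order span-near check over term positions; real
# SpanNearQuery also handles nesting, overlap and ordering options.
def span_near(pos_a, pos_b, slop):
    return any(0 < b - a <= slop + 1 for a in pos_a for b in pos_b)

doc = "new york style pizza in new jersey".split()
positions = {}
for i, term in enumerate(doc):
    positions.setdefault(term, []).append(i)

assert span_near(positions["new"], positions["york"], slop=0)     # adjacent
assert not span_near(positions["york"], positions["pizza"], slop=0)
assert span_near(positions["york"], positions["pizza"], slop=1)   # one gap
```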
filter query error
Newbie using Solr 1.4. I am trying to use a filter query that filters on more than one value for a given filter, i.e. filters on field equals value1 or value2. If I enter the following 2 urls in a browser I get back the correct results I am looking for: http://localhost:8080/apache-solr-1.4.0/select/?q=help&fl=*,score&fq=+searchScope:SRM+searchScope:SMN&indent=on or http://localHost:8080/apache-solr-1.4.0/select/?q=help&fl=*,searchScope,score&fq=searchScope:(SRM+OR+SMN)&indent=on But when I try to do it programmatically I get an error. It only works when I am filtering on 1 value, but when I try more than one value it fails. See code snippet and error message below. When I use filter2 or filter3 it fails, but filter1 gives me no errors. Not sure what I am doing wrong. Any help would be greatly appreciated. -----Begin Code snippet----- String query = "help"; //String filter1 = "searchScope:SRM"; //String filter2 = "+searchScope:SRM+searchScope:SMN"; String filter3 = "searchScope:(SRM+OR+SMN)"; SolrQuery solrQuery = new SolrQuery(query); solrQuery.addFilterQuery(filter3); QueryResponse response = solr.query(solrQuery); -----End Code snippet----- I have tried using SolrQuery solrQuery = new SolrQuery(ClientUtils.escapeQueryChars(query)); solrQuery.addFilterQuery(ClientUtils.escapeQueryChars(filter)); But that returns no results. Also note that if I cut and paste the url from the error message below, it fails when I paste it in a browser, but I can get it to work only if I remove the wt=javabin parameter.
-----Error Message----- Exception in thread "main" org.apache.solr.client.solrj.SolrServerException: Error executing query at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) at com.xyz.search.SolrSearch.performSearch(SolrSearch.java:126) at com.xyz.search.SearchMain.main(SearchMain.java:23) Caused by: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse ' searchScope:SRM searchScope:SMN': Encountered ":" at line 1, column 28. Was expecting one of: <EOF> <AND> ... <OR> ... <NOT> ... "+" ... "-" ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... org.apache.lucene.queryParser.ParseException: Cannot parse ' searchScope:SRM searchScope:SMN': Encountered ":" at line 1, column 28. Was expecting one of: <EOF> <AND> ... <OR> ... <NOT> ... "+" ... "-" ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... request: http://localhost:8080/apache-solr-1.4.0/select?q=help&fq=+searchScope:SRM+searchScope:SMN&hl=true&rows=15&wt=javabin&version=1 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) ... 3 more
Re: update doc success, but could not find the new value
Ummm, you have to provide a *lot* more detail before anyone can help. Have you used Luke or the admin page to examine your index and determine that the update did, indeed, work? Have you tried firing your query with debugQuery=on to see if the fields searched are the ones you expect? etc. Erick
doc with missing highlight info
Hi, I have a query where the query matches the document but no highlighting info is returned. Why? Normally, highlighting returns correctly. This query is different from others in that it uses a phrase like CR1428-Occ1 Field:
<field name="destSpan" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
query: http://localhost:8080/solr/select?q=destSpan%3A%28%22CR1428-Occ2%22%29&fl=destSpan&hl=true&hl.fl=destSpan
results:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="fl">destSpan</str>
<str name="q">destSpan:("CR1428-Occ2")</str>
<str name="hl.fl">destSpan</str>
<str name="hl">true</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="destSpan">CR1428-Occ2 abcCR1428 ...</str>
</doc>
</result>
<lst name="highlighting">
<lst name="6de31965cda3612c0932a4ea51aba23f8c666c7f"/>
</lst>
</response>
Tim Harsch Sr. Software Engineer Dell Perot Systems
RE: update doc success, but could not find the new value
I am using the example schema, only with two fields, id and body. Id is a string field, body is a text field. I use another program to do an http post to update the document, the url is http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10 , and the data is <add> <doc> <field name="id">id1</field> <field name="body">test body</field> </doc> </add> I get the responseHeader back, the status is 0. Then I go to the admin page and do a search, the query is body:test. The result is numFound = 0. I think the reason should be that the index is not updated with the updated document. What should I do? What is missing? Jennifer Luo
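For reference, the update request from this thread can be assembled with explicit parameter separators. This Python sketch only builds the URL and body and does not contact a server; an actual POST would send the body with Content-Type: text/xml:

```python
from urllib.parse import urlencode, parse_qs

# The update URL from the thread, rebuilt with explicit '&' separators
# (note commitWithin is expressed in milliseconds).
params = {"commit": "true", "overwrite": "true", "commitWithin": "10"}
url = "http://localhost:8983/solr/update?" + urlencode(params)

body = ('<add><doc>'
        '<field name="id">id1</field>'
        '<field name="body">test body</field>'
        '</doc></add>')

qs = parse_qs(url.split("?", 1)[1])
assert qs["commit"] == ["true"] and qs["commitWithin"] == ["10"]
```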
Re: filter query error
I am trying to use a filter query that filters on more than one value for a given filter, i.e. filters on field equals value1 or value2. [...] When I use filter2 or filter3 it fails, but filter1 gives me no errors: //String filter1 = "searchScope:SRM"; //String filter2 = "+searchScope:SRM+searchScope:SMN"; String filter3 = "searchScope:(SRM+OR+SMN)"; You need to replace + with space: String filter3 = "searchScope:(SRM OR SMN)"; should work.
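The reason the same string behaves differently in a browser and in a client library can be shown with Python's URL-encoding helpers: a browser treats + in a URL as an encoded space, while a client encodes the raw string as-is, so a literal + reaches the query parser unchanged:

```python
from urllib.parse import unquote_plus, quote_plus

# In a browser URL, '+' is the encoding of a space, so the server sees:
assert unquote_plus("searchScope:(SRM+OR+SMN)") == "searchScope:(SRM OR SMN)"

# A client library encodes the raw string instead, so a literal '+' is
# sent as %2B and reaches the query parser as '+', not as a space:
assert quote_plus("searchScope:(SRM+OR+SMN)") == "searchScope%3A%28SRM%2BOR%2BSMN%29"
```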
Re: Wildcard Search and Filter in Solr
Hi, just looked at analysis.jsp and found out what it does during index / query. Index Analyzer: Intel → intel → intel → intel → intel → intel. If the resultant token is intel, then q=inte* should return documents. What does it say when you add debugQuery=on to your search url? And why are you using an old version of solr?
Re: Wildcard Search and Filter in Solr
Note that the query analyzer output is NOT doing query _parsing_, but rather taking the string you passed and running it through the query analyzer only. When using the default query parser, Inte* will be a search for terms that begin with inte. It is odd that you're not finding it. But you're using a pretty old version of Solr and quite likely something here has been fixed since. Give Solr 1.4 a try. Erik On Jan 27, 2010, at 12:56 AM, ashokcz wrote: Hi, just looked at analysis.jsp and found out what it does during index / query. Index Analyzer: Intel → intel → intel → intel → intel → intel. Query Analyzer: Inte* → Inte* → inte* → inte → inte → inte → int. I think somewhere my configuration or my definition of the type text is wrong. This is my configuration:
<fieldType class="solr.TextField" name="text">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1"/>
<filter class="solr.StopFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1"/>
<filter class="solr.StopFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
I think i am missing some basic configuration for doing wildcard searches, but could not figure it out. Can someone help please Ahmet Arslan wrote: Hi, I m trying to use wildcard keywords in my search term and filter term, but i didnt get any results. Searched a lot but could not find any lead. Can someone help me in this.
i m using solr 1.2.0 and have few records indexed with vendorName value as Intel In solr admin interface i m trying to do the search like this http://localhost:8983/solr/select?indent=on&version=2.2&q=intel&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= and i m getting the result properly but when i use q=inte* no records are returned. the same is the case for Filter Query on using fq=VendorName:Intel i get my results. but on using fq=VendorName:Inte* no results are returned. I can guess i am doing a mistake in a few obvious things, but could not figure it out .. Can someone pls help me out :) :) If q=intel returns documents while q=inte* does not, it means that the fieldType of your defaultSearchField is reducing the token intel into something. Can you find out by using /admin/analysis.jsp what happens to Intel / intel at index and query time? What is your defaultSearchField? Is it VendorName? It is expected that fq=VendorName:Intel returns results while fq=VendorName:Inte* does not, because prefix queries are not analyzed. But it is strange that q=inte* does not return anything. Maybe your index analyzer is reducing Intel into int or ıntel? I am not 100% sure but solr 1.2.0 may use the default locale in its lowercase operation. What is your default locale? It is better to see what happens to the word Intel using the analysis.jsp page. -- View this message in context: http://old.nabble.com/Wildcard-Search-and-Filter-in-Solr-tp27306734p27334486.html Sent from the Solr - User mailing list archive at Nabble.com.
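Ahmet's point that prefix queries are not analyzed can be illustrated with a toy Python model, where a single lowercase step stands in for the whole index analyzer chain:

```python
# Toy model: index-time analysis lowercases terms, but prefix queries
# bypass analysis entirely, so the raw prefix is compared verbatim
# against the indexed (lowercased) terms.
def index_analyze(token):
    return token.lower()   # stand-in for the lowercase/stem filter chain

indexed_terms = {index_analyze(t) for t in ["Intel", "AMD", "Nvidia"]}

def prefix_search(prefix):
    return sorted(t for t in indexed_terms if t.startswith(prefix))

assert prefix_search("Inte") == []         # raw "Inte*" never matches
assert prefix_search("inte") == ["intel"]  # lowercase the prefix yourself
```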
Re: Help using CachedSqlEntityProcessor
Thanks. I am on 1.4, so maybe that is the problem. Will try when I get back to work tomorrow. Thanks Rolf Johansson-2 wrote: I recently had issues with CachedSqlEntityProcessor too, figuring out how to use the syntax. After a while, I managed to get it working with cacheKey and cacheLookup. I think this is 1.4 specific though. It seems you have double WHERE clauses, one in the query and one in the where attribute. Try using cacheKey and cacheLookup instead in something like this:
<entity name="LinkedCategory" pk="LinkedCatArticleId"
query="SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock)"
processor="CachedSqlEntityProcessor"
cacheKey="LINKEDCATARTICLEID"
cacheLookup="article.CMSARTICLEID"
deltaQuery="SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), LastUpdateDate) > '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) > '${dataimporter.article.last_index_time}'"
parentDeltaQuery="SELECT * from vArticleSummaryDetail_SolrSearch (nolock)">
<field column="LinkedCategoryBC" name="LinkedCategoryBreadCrumb"/>
</entity>
/Rolf On 2010-01-27 12:36, KirstyS kirst...@gmail.com wrote: Hi, I have looked on the wiki. Using the CachedSqlEntityProcessor looks like it was simple. But I am getting no speed benefit and am not sure if I have even got the syntax correct. I have a main root entity called 'article'. And then I have a number of sub entities.
One such entity is as such:
<entity name="LinkedCategory" pk="LinkedCatAricleId"
query="SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}')"
processor="CachedSqlEntityProcessor"
where="LinkedCatArticleId = article.CmsArticleId"
deltaQuery="SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}') AND (convert(varchar(50), LastUpdateDate) > '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) > '${dataimporter.article.last_index_time}')"
parentDeltaQuery="SELECT * from vArticleSummaryDetail_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}')">
<field column="LinkedCategoryBC" name="LinkedCategoryBreadCrumb"/>
</entity>
As you can see I have added (for the main query - not worrying about the delta queries yet!!) the processor and the 'where', but I am not sure if it's correct. Can anyone point me in the right direction??? Thanks Kirsty -- View this message in context: http://old.nabble.com/Help-using-CachedSqlEntityProcessor-tp27337635p27345412.html Sent from the Solr - User mailing list archive at Nabble.com.
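What CachedSqlEntityProcessor buys can be modeled in Python: the child query runs once, its rows are bucketed by cacheKey, and each parent row then does an in-memory cacheLookup instead of issuing a per-parent SQL query. The column values below are made up:

```python
# Toy model of CachedSqlEntityProcessor: run the child query once,
# bucket rows by cacheKey, then join each parent row in memory.
child_rows = [
    {"LINKEDCATARTICLEID": 1, "LinkedCategoryBC": "Home > News"},
    {"LINKEDCATARTICLEID": 1, "LinkedCategoryBC": "Home > Sport"},
    {"LINKEDCATARTICLEID": 2, "LinkedCategoryBC": "Home > Weather"},
]

cache = {}
for row in child_rows:                 # single pass over the child query
    cache.setdefault(row["LINKEDCATARTICLEID"], []).append(row)

parents = [{"CMSARTICLEID": 1}, {"CMSARTICLEID": 2}, {"CMSARTICLEID": 3}]
joined = {p["CMSARTICLEID"]: cache.get(p["CMSARTICLEID"], []) for p in parents}

assert [r["LinkedCategoryBC"] for r in joined[1]] == ["Home > News", "Home > Sport"]
assert joined[3] == []                 # parent with no child rows
```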
RE: doc with missing highlight info (bug found?!?)
The more I play with values, the more I realize highlighting seems to have a bug. It seems to do with tokenizing.
WILL match and highlight:
Query: TOKEN, Data: token
Query: SEARCH, Data: searching
Query: abcCR, Data: abcCR1428 (highlights abcCR)
WILL match and NOT highlight:
Query: abcCR1428, Data: abcCR1428
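One guess consistent with these observations, assuming the text field type uses WordDelimiterFilterFactory as in the schema quoted elsewhere in this digest: abcCR1428 is indexed only as sub-tokens, so the whole-string query term never aligns with a single indexed token for highlighting. A rough Python imitation of the splitting:

```python
import re

# Crude imitation of WordDelimiterFilter's case/digit splitting,
# purely to illustrate the hypothesis; this is not Lucene's algorithm.
def word_delimiter(token):
    return [t.lower() for t in re.findall(r"[A-Z][a-z]+|[a-z]+|[A-Z]+|[0-9]+", token)]

indexed = word_delimiter("abcCR1428")
assert indexed == ["abc", "cr", "1428"]

# A query sub-token like "abcCR" can align with indexed tokens, but the
# whole string "abcCR1428" exists nowhere as a single indexed token,
# which matches the highlight/no-highlight observations above.
assert "cr" in indexed
assert "abccr1428" not in indexed
```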
Re: Multiple Cores Vs. Single Core for the following use case
It sounds to me that multiple cores won't scale.. wouldn't you have to create multiple configurations per each core and does the ranking function change per user? I would imagine that the filter method would work better.. the caching is there and as mentioned earlier would be fast for multiple searches. If you have searches for the same user, then add that to your warming queries list so that on server startup, the cache will be warm for certain users that you know tend to do a lot of searches. This can be known empirically or by log mining. I haven't used multiple cores but I suspect that having that many configuration files parsed and loaded in memory can't be good for memory usage over filter caching. Just my 2 cents Amit On Wed, Jan 27, 2010 at 8:58 AM, Matthieu Labour matthieu_lab...@yahoo.comwrote: Thanks Didier for your response And in your opinion, this should be as fast as if I would getCore(userId) -- provided that the core is already open -- and then search for Paris ? matt --- On Wed, 1/27/10, didier deshommes dfdes...@gmail.com wrote: From: didier deshommes dfdes...@gmail.com Subject: Re: Multiple Cores Vs. Single Core for the following use case To: solr-user@lucene.apache.org Date: Wednesday, January 27, 2010, 10:52 AM On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: What I am trying to understand is the search/filter algorithm. If I have 1 core with all documents and I search for Paris for userId=123, is lucene going to first search for all Paris documents and then apply a filter on the userId ? If this is the case, then I am better off having a specific index for the user=123 because this will be faster If you want to apply the filter to userid first, use filter queries (http://wiki.apache.org/solr/CommonQueryParameters#fq). This will filter by userid first then search for Paris. 
didier --- On Wed, 1/27/10, Marc Sturlese marc.sturl...@gmail.com wrote: From: Marc Sturlese marc.sturl...@gmail.com Subject: Re: Multiple Cores Vs. Single Core for the following use case To: solr-user@lucene.apache.org Date: Wednesday, January 27, 2010, 2:22 AM In case you are going to use core per user take a look to this patch: http://wiki.apache.org/solr/LotsOfCores Trey-13 wrote: Hi Matt, In most cases you are going to be better off going with the userid method unless you have a very small number of users and a very large number of docs/user. The userid method will likely be much easier to manage, as you won't have to spin up a new core every time you add a new user. I would start here and see if the performance is good enough for your requirements before you start worrying about it not being efficient. That being said, I really don't have any idea what your data looks like. How many users do you have? How many documents per user? Are any documents shared by multiple users? -Trey On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour matthieu_lab...@yahoo.comwrote: Hi Shall I set up Multiple Core or Single core for the following use case: I have X number of users. When I do a search, I always know for which user I am doing a search Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and add a userId field to each document? If I choose the 1 core solution then I am concerned with performance. Let's say I search for NewYork ... If lucene returns all New York matches for all users and then filters based on the userId, then this is going to be less efficient than if I have sharded per user and send the request for New York to the user's core Thank you for your help matt -- View this message in context: http://old.nabble.com/Multiple-Cores-Vs.-Single-Core-for-the-following-use-case-tp27332288p27335403.html Sent from the Solr - User mailing list archive at Nabble.com.
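For readers following the fq suggestion in this thread, the split between the relevance query and the per-user filter is easiest to see in the raw request URL. Below is a minimal plain-Java sketch; the host/port and the userId field name are assumptions for illustration, not from the thread:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the single-core approach discussed above: q carries the
// relevance query, fq restricts results to one user's documents.
// Solr caches filter queries separately from q, so repeated searches
// by the same user reuse the cached userId filter.
public class UserFilterQuery {
    static String buildSearchUrl(String term, String userId) {
        return "http://localhost:8983/solr/select"
                + "?q=" + URLEncoder.encode(term, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode("userId:" + userId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("Paris", "123"));
        // http://localhost:8983/solr/select?q=Paris&fq=userId%3A123
    }
}
```

With SolrJ the equivalent is a SolrQuery plus addFilterQuery; the URL form above just makes the q/fq separation explicit.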
RE: update doc success, but could not find the new value
It works. I made some mistake in my code.

Jennifer Luo

-Original Message-
From: Jennifer Luo [mailto:jenni...@talenttech.com]
Sent: Wednesday, January 27, 2010 1:57 PM
To: solr-user@lucene.apache.org
Subject: RE: update doc success, but could not find the new value

I am using the example, with only two fields, id and body. Id is a string field, body is a text field. I use another program to do an http post to update the document; the url is http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10 and the data is

<add>
  <doc>
    <field name="id">id1</field>
    <field name="body">test body</field>
  </doc>
</add>

I get the responseHeader back, and the status is 0. Then I go to the admin page and do a search; the query is body:test. The result is numFound=0. I think the reason should be that the index is not updated with the updated document. What should I do? What is missing?

Jennifer Luo

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, January 27, 2010 1:39 PM
To: solr-user@lucene.apache.org
Subject: Re: update doc success, but could not find the new value

Ummm, you have to provide a *lot* more detail before anyone can help. Have you used Luke or the admin page to examine your index and determine that the update did, indeed, work? Have you tried firing your query with debugQuery=on to see if the fields searched are the ones you expect? etc. Erick

On Wed, Jan 27, 2010 at 11:54 AM, Jennifer Luo jenni...@talenttech.comwrote: I am using http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10 to update a document. The responseHeader's status is 0. But when I search the new value, it couldn't be found.
RE: update doc success, but could not find the new value
Check out Jetty's output or Tomcat's logs. The logging is very verbose and you can get a clearer picture.

Jennifer Luo said: I am using the example, with only two fields, id and body. Id is a string field, body is a text field. I use another program to do an http post to update the document; the url is http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10 and the data is

<add>
  <doc>
    <field name="id">id1</field>
    <field name="body">test body</field>
  </doc>
</add>

I get the responseHeader back, and the status is 0. Then I go to the admin page and do a search; the query is body:test. The result is numFound=0. I think the reason should be that the index is not updated with the updated document. What should I do? What is missing?

Jennifer Luo

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, January 27, 2010 1:39 PM
To: solr-user@lucene.apache.org
Subject: Re: update doc success, but could not find the new value

Ummm, you have to provide a *lot* more detail before anyone can help. Have you used Luke or the admin page to examine your index and determine that the update did, indeed, work? Have you tried firing your query with debugQuery=on to see if the fields searched are the ones you expect? etc. Erick

On Wed, Jan 27, 2010 at 11:54 AM, Jennifer Luo jenni...@talenttech.comwrote: I am using http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10 to update a document. The responseHeader's status is 0. But when I search the new value, it couldn't be found.
Re: Multiple Cores Vs. Single Core for the following use case
Hi - I'd probably go with a single core on this one, just for ease of operations. But here are some thoughts: One advantage I can see to multiple cores, though, would be better idf calculations. With individual cores, each user only sees the idf for his own documents. With a single core, the idf will be across all documents. In theory, better relevance. While multi-core will use more ram to start with, and I would expect it to use more disk (term dictionary per core). Filters would add to the memory footprint of the multiple core setup. However, if you only end up sorting/faceting on some of the cores, your memory use with multiple cores may actually be less. With multiple cores, each field cache only covers one user's docs. With single core, you have one field cache entry per doc in the whole corpus. Depending on usage patterns, index sizes, etc, this could be a significant amount of memory. Tom On Wed, Jan 27, 2010 at 11:38 AM, Amit Nithian anith...@gmail.com wrote: It sounds to me that multiple cores won't scale.. wouldn't you have to create multiple configurations per each core and does the ranking function change per user? I would imagine that the filter method would work better.. the caching is there and as mentioned earlier would be fast for multiple searches. If you have searches for the same user, then add that to your warming queries list so that on server startup, the cache will be warm for certain users that you know tend to do a lot of searches. This can be known empirically or by log mining. I haven't used multiple cores but I suspect that having that many configuration files parsed and loaded in memory can't be good for memory usage over filter caching. Just my 2 cents Amit On Wed, Jan 27, 2010 at 8:58 AM, Matthieu Labour matthieu_lab...@yahoo.comwrote: Thanks Didier for your response And in your opinion, this should be as fast as if I would getCore(userId) -- provided that the core is already open -- and then search for Paris ? 
matt --- On Wed, 1/27/10, didier deshommes dfdes...@gmail.com wrote: From: didier deshommes dfdes...@gmail.com Subject: Re: Multiple Cores Vs. Single Core for the following use case To: solr-user@lucene.apache.org Date: Wednesday, January 27, 2010, 10:52 AM On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: What I am trying to understand is the search/filter algorithm. If I have 1 core with all documents and I search for Paris for userId=123, is lucene going to first search for all Paris documents and then apply a filter on the userId ? If this is the case, then I am better off having a specific index for the user=123 because this will be faster If you want to apply the filter to userid first, use filter queries (http://wiki.apache.org/solr/CommonQueryParameters#fq). This will filter by userid first then search for Paris. didier --- On Wed, 1/27/10, Marc Sturlese marc.sturl...@gmail.com wrote: From: Marc Sturlese marc.sturl...@gmail.com Subject: Re: Multiple Cores Vs. Single Core for the following use case To: solr-user@lucene.apache.org Date: Wednesday, January 27, 2010, 2:22 AM In case you are going to use core per user take a look to this patch: http://wiki.apache.org/solr/LotsOfCores Trey-13 wrote: Hi Matt, In most cases you are going to be better off going with the userid method unless you have a very small number of users and a very large number of docs/user. The userid method will likely be much easier to manage, as you won't have to spin up a new core every time you add a new user. I would start here and see if the performance is good enough for your requirements before you start worrying about it not being efficient. That being said, I really don't have any idea what your data looks like. How many users do you have? How many documents per user? Are any documents shared by multiple users? 
-Trey On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour matthieu_lab...@yahoo.comwrote: Hi Shall I set up Multiple Core or Single core for the following use case: I have X number of users. When I do a search, I always know for which user I am doing a search Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and add a userId field to each document? If I choose the 1 core solution then I am concerned with performance. Let's say I search for NewYork ... If lucene returns all New York matches for all users and then filters based on the userId, then this is going to be less efficient than if I have sharded per user and send the request for New York to the user's core Thank you for your help matt -- View this message in context:
Re: Plurals in solr indexing
I recommend getting familiar with the analysis tool included with Solr. From Solr's main admin screen, click on "analysis", check "verbose", and enter your text, and you can see the changes that happen during analysis. It's really helpful, especially when getting started. Tom

On Wed, Jan 27, 2010 at 2:41 AM, murali k ilar...@gmail.com wrote: Hi, I am having trouble with indexing plurals. I have a schema with the following fields: gender (field) - string (field type) (e.g. data "Boys"); all (field) - text (field type) - solr.WhitespaceTokenizerFactory, solr.SynonymFilterFactory, solr.WordDelimiterFilterFactory, solr.LowerCaseFilterFactory, SnowballPorterFilterFactory. I am using copyField from gender to all and searching on the all field. When I search for Boy, I get results; if I search for Boys I don't get results. I have tried things like: boys bikes - no results; boy bikes - works. kid and kids are synonyms for boy and boys, so I tried adding kid,kids,boy,boys to synonyms hoping it would work; it doesn't work that way. I also have other content fields which are copied to all, and they contain words like kids, boys etc... any idea? -- View this message in context: http://old.nabble.com/Plurals-in-solr-indexing-tp27335639p27335639.html Sent from the Solr - User mailing list archive at Nabble.com.
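For reference, the analyzer chain murali lists would look roughly like this in schema.xml; the key point is that the stemmer (SnowballPorterFilterFactory) must run at both index and query time so that "Boys" and "Boy" reduce to the same term. This is only a sketch -- the filter options shown are assumptions, not taken from the original schema:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

The analysis tool Tom mentions shows the token stream after each of these stages, which makes it easy to spot where "Boys" and "Boy" diverge.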
Re: filter query error
thanks! that worked.

From: jxkmailbox...@yahoo.com jxkmailbox...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, January 27, 2010 1:28:07 PM
Subject: filter query error

NewBie using Solr 1.4. I am trying to use a filter query that filters on more than one value for a given filter, i.e. filters on field equals value1 or value2. If I enter the following 2 urls in a browser I get back the correct results I am looking for:

http://localhost:8080/apache-solr-1.4.0/select/?q=help&fl=*,score&fq=+searchScope:SRM+searchScope:SMN&indent=on

or

http://localHost:8080/apache-solr-1.4.0/select/?q=help&fl=*,searchScope,score&fq=searchScope:%28SRM+OR+SMN%29&indent=on

But when I try to do it programmatically I get an error. It only works when I am filtering on 1 value; when I try more than one value it fails. See code snippet and error message below. When I use filter2 or filter3 it fails, but filter1 gives me no errors. Not sure what I am doing wrong. Any help would be greatly appreciated.

-Begin Code snippet ---
String query = "help";
//String filter1 = "searchScope:SRM";
//String filter2 = "+searchScope:SRM+searchScope:SMN";
String filter3 = "searchScope:(SRM+OR+SMN)";
SolrQuery solrQuery = new SolrQuery(query);
solrQuery.addFilterQuery(filter3);
QueryResponse response = solr.query(solrQuery);
-End Code snippet ---

I have tried using

SolrQuery solrQuery = new SolrQuery(ClientUtils.escapeQueryChars(query));
solrQuery.addFilterQuery(ClientUtils.escapeQueryChars(filter));

But that returns no results. Also note that if I cut and paste the url from the error message below, it fails when I paste it in a browser, but I can get it to work only if I remove the wt=javabin parameter.
Error Message

Exception in thread "main" org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
    at com.xyz.search.SolrSearch.performSearch(SolrSearch.java:126)
    at com.xyz.search.SearchMain.main(SearchMain.java:23)
Caused by: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse ' searchScope:SRM searchScope:SMN': Encountered ":" at line 1, column 28. Was expecting one of: <EOF> <AND> ... <OR> ... <NOT> ... "+" ... "-" ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
org.apache.lucene.queryParser.ParseException: Cannot parse ' searchScope:SRM searchScope:SMN': Encountered ":" at line 1, column 28. Was expecting one of: <EOF> <AND> ... <OR> ... <NOT> ... "+" ... "-" ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
request: http://localhost:8080/apache-solr-1.4.0/select?q=help&fq=+searchScope:SRM+searchScope:SMN&hl=true&rows=15&wt=javabin&version=1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
    ... 3 more
Re: filter query error
thanks! that worked.

From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, January 27, 2010 2:00:32 PM
Subject: Re: filter query error

I am trying to use a filter query that filters on more than one value for a given filter, i.e. filters on field equals value1 or value2. If I enter the following 2 urls in a browser I get back the correct results I am looking for:

http://localhost:8080/apache-solr-1.4.0/select/?q=help&fl=*,score&fq=+searchScope:SRM+searchScope:SMN&indent=on

or

http://localHost:8080/apache-solr-1.4.0/select/?q=help&fl=*,searchScope,score&fq=searchScope:(SRM+OR+SMN)&indent=on

But when I try to do it programmatically I get an error. It only works when I am filtering on 1 value; when I try more than one value it fails. See code snippet and error message below. When I use filter2 or filter3 it fails, but filter1 gives me no errors. Not sure what I am doing wrong. Any help would be greatly appreciated.

-Begin Code snippet ---
String query = "help";
//String filter1 = "searchScope:SRM";
//String filter2 = "+searchScope:SRM+searchScope:SMN";
String filter3 = "searchScope:(SRM+OR+SMN)";

You need to replace + with space:

String filter3 = "searchScope:(SRM OR SMN)";

should work.
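Ahmet's point (that '+' is URL encoding for a space and belongs only in a hand-built URL, while SolrJ does its own encoding and expects the literal string) can be checked in isolation with a one-line conversion:

```java
// '+' is URL encoding for a space, so it belongs only in a hand-built URL.
// A filter string handed to SolrJ's addFilterQuery must contain real spaces;
// SolrJ performs the URL encoding itself.
public class FilterString {
    static String fromUrlForm(String urlEncodedFq) {
        return urlEncodedFq.replace('+', ' ');
    }

    public static void main(String[] args) {
        System.out.println(fromUrlForm("searchScope:(SRM+OR+SMN)"));
        // searchScope:(SRM OR SMN)
    }
}
```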
Can Solr be forced to return all field tags for a document even if the field is empty?l
I have a field Title and Summary. I've currently not set a default value for the Summary in my schema; it's just a text field with indexed="true" and stored="true", but not required. When the data is indexed, sometimes the documents don't have a summary, so Solr doesn't index that field. When a query is sent and we get the results for those documents returned, if they did not have a summary then there is no tag in the xml for that field. Is there a way to have the xml always return the field tags for each document in the result set even if the field has no data? I apologize ahead of time if this has been answered, but after doing a bit of searching I have not been able to find the answer elsewhere. Thanks Robbin
Re: doc with missing highlight info
Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] wrote: Hi, I have a query where the query matches the document but no highlighting info is returned. Why? Normally, highlighting returns correctly. This query is different from others in that it uses a phrase like "CR1428-Occ1".

Field:
<field name="destSpan" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

Query:
http://localhost:8080/solr/select?q=destSpan%3A%28%22CR1428-Occ2%22%29&fl=destSpan&hl=true&hl.fl=destSpan

Results:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="fl">destSpan</str>
      <str name="q">destSpan:("CR1428-Occ2")</str>
      <str name="hl.fl">destSpan</str>
      <str name="hl">true</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="destSpan">CR1428-Occ2 abcCR1428 ...</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="6de31965cda3612c0932a4ea51aba23f8c666c7f"/>
  </lst>
</response>

Tim Harsch Sr. Software Engineer Dell Perot Systems

Which Solr version are you using? If trunk, you are using FastVectorHighlighter, because destSpan has termVectors/termPositions/termOffsets on. If so, you can use the (traditional) Highlighter explicitly by specifying hl.useHighlighter=true: http://wiki.apache.org/solr/HighlightingParameters#hl.useHighlighter If you are using FVH, can you give me the info of <fieldType name="text"/>? Thanks, Koji -- http://www.rondhuit.com/en/
Re: solr with tomcat in cluster mode
Linux includes a load-balancer program, 'balance'. You set it up on a third port and configure it to use 'localhost:8180' and 'localhost:8280'.

On Wed, Jan 27, 2010 at 4:06 AM, ZAROGKIKAS,GIORGOS g.zarogki...@multirama.gr wrote: Hi again. I finally set up my Solr cluster with Tomcat 6. The configuration I use is two Tomcat servers on the same machine on different ports (e.g. localhost:8180/solr and localhost:8280/solr, for testing purposes), with a different index on each server and index replication through Solr's replication handler, and it's working fine for me and very quick. Now I want to load-balance these two Tomcat servers, but without using the Apache HTTP server. Is there any solution for that???

-Original Message- From: Matt Mitchell [mailto:goodie...@gmail.com] Sent: Friday, January 22, 2010 9:33 PM To: solr-user@lucene.apache.org Subject: Re: solr with tomcat in cluster mode

Hey Otis, We're indexing on a separate machine because we want to keep our production nodes away from processes like indexing. The indexing server also has a ton of resources available, more so than the production nodes. We set it up as an indexing server at one point and have decided to stick with it. We're not indexing the same index as the search indexes because we want to be able to step back a day or two if needed. So we do the SWAP when things are done and OK. So that last part you mentioned about the searchers needing to re-open will happen with a SWAP right? Is your concern that there will be a lag time, making it so the slaves will be out of sync for some small period of time? Would it be simpler/better to move to using Solr's native slave/master feature? I'd love to hear any suggestions you might have. Thanks, Matt

On Fri, Jan 22, 2010 at 1:58 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: This should work fine. But why are you indexing to a separate index/core? Why not index in the very same index you are searching? Slaves won't see changes until their searchers re-open.
Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, January 22, 2010 9:44:03 AM Subject: Re: solr with tomcat in cluster mode We have a similar setup and I'd be curious to see how folks are doing this as well. Our setup: A few servers and an F5 load balancer. Each Solr instance points to a shared index. We use a separate server for indexing. When the index is complete, we do some juggling using the Core Admin SWAP function and update the shared index. I've wondered about having a shared index across multiple instances of (read-only) Solr -- any problems there? Matt On Fri, Jan 22, 2010 at 9:35 AM, ZAROGKIKAS,GIORGOS g.zarogki...@multirama.gr wrote: Hi I'm using solr 1.4 with tomcat in a single pc and I want to turn it in cluster mode with 2 nodes and load balancing But I can't find info how to do Is there any manual or a recorded procedure on the internet to do that Or is there anyone to help me ? Thanks in advance Ps : I use windows server 2008 for OS -- Lance Norskog goks...@gmail.com
RE: Solr wiki link broken
Why don't we change the links to have FrontPage explicitly? Wouldn't that be the easiest fix, unless there are numerous other pages that reference the default page w/o FrontPage? -kuro

-Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Tuesday, January 26, 2010 4:41 PM To: solr-user@lucene.apache.org Subject: RE: Solr wiki link broken

: You are right. The wiki can't be read if the preferred language is not English. : The wiki system seems to implement or be configured to use a wrong way of choosing its locale. : Erik, let me know if I can help solving this.

Interesting. When accessing http://wiki.apache.org/solr/, MoinMoin evidently picks a translated version of the page to show each user based on the Accept-Language header sent by the browser. If it's en or unset, you get the same thing as http://wiki.apache.org/solr/FrontPage -- but if you have some other preferred language configured in your browser, then you get a different page; for example, de causes http://wiki.apache.org/solr/StartSeite to be loaded instead. (This behavior can be forced in spite of the Accept-Language header sent by the browser if you are logged into the wiki and change the Preferred language setting from Browser setting to something else ... but I don't recommend it, since I was stuck with German for about 10 minutes and got 500 errors every time I tried to change my preferences back.) This is presumably designed to make it easy to support a multi-language wiki, with users getting language-specific homepages that can then link out to language-specific versions of pages -- but that doesn't really help us much since we don't have any meaningful content on those language-specific homepages. According to this...
http://wiki.apache.org/solr/HelpOnLanguages ...we should be deleting all those unused pages, or have INFRA change our wiki config so that something other than FrontPage is our default (which now explains why Lucene-Java has FrontPageEN as the default). Any volunteers to help purge the wiki of (effectively) blank translation pages? ... it looks like they all (probably) have the comment ##master-page:FrontPage at the top, so they should be easy to identify even if you don't speak the language ... but they aren't very easy to search for since those comments don't appear in the generated page. -Hoss
Re: Can Solr be forced to return all field tags for a document even if the field is empty?l
This is kind of an unusual request, what higher-level problem are you trying to solve here? Because the field just *isn't there* in the underlying Lucene index for that document. I suppose you could index a not there token and just throw those values out from the response... Erick On Wed, Jan 27, 2010 at 6:19 PM, Turner, Robbin J robbin.j.tur...@boeing.com wrote: I have a field Title and Summary. I've currently not set a default value for the Summary in my schema, it's just a text field with indexed=true and stored=true, but not required. When the data is indexed sometimes the documents don't have a summary so then Solr doesn't index that field. When a query is sent and we get the results for those documents returned, if they did not have a summary then there is no tagged in the xml for that field. Is there a way to have the xml always return the field tags for each document in the result set even if the field has no data? I apologize ahead of time if this has been answered, but after doing a bit of search have not been able to find the answer elsewhere. Thanks Robbin
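As an alternative to indexing a placeholder token, the gap can also be papered over on the client after the response is parsed: walk each document and supply an empty value for any requested field that is absent. A plain-Java sketch; the field names and the use of simple maps for documents are assumptions for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class FillMissingFields {
    // Give every document an entry for every expected field, using ""
    // where Solr returned nothing (i.e. the field was never indexed).
    static void fillMissing(List<Map<String, String>> docs, List<String> fields) {
        for (Map<String, String> doc : docs) {
            for (String f : fields) {
                doc.putIfAbsent(f, "");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> doc = new java.util.HashMap<>();
        doc.put("Title", "Some article"); // this doc has no Summary stored
        List<Map<String, String>> docs = new java.util.ArrayList<>();
        docs.add(doc);
        fillMissing(docs, Arrays.asList("Title", "Summary"));
        System.out.println(doc.containsKey("Summary")); // true
    }
}
```

This keeps the index clean (no sentinel tokens to exclude from queries) at the cost of a small post-processing pass per response.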
Re: Help using CachedSqlEntityProcessor
cacheKey and cacheLookup are required attributes.

On Thu, Jan 28, 2010 at 12:51 AM, KirstyS kirst...@gmail.com wrote: Thanks. I am on 1.4..so maybe that is the problem. Will try when I get back to work tomorrow. Thanks

Rolf Johansson-2 wrote: I recently had issues with CachedSqlEntityProcessor too, figuring out how to use the syntax. After a while, I managed to get it working with cacheKey and cacheLookup. I think this is 1.4 specific though. It seems you have double WHERE clauses, one in the query and one in the where attribute. Try using cacheKey and cacheLookup instead, in something like this:

<entity name="LinkedCategory" pk="LinkedCatArticleId"
        query="SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock)"
        processor="CachedSqlEntityProcessor"
        cacheKey="LINKEDCATARTICLEID"
        cacheLookup="article.CMSARTICLEID"
        deltaQuery="SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), LastUpdateDate) > '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) > '${dataimporter.article.last_index_time}'"
        parentDeltaQuery="SELECT * from vArticleSummaryDetail_SolrSearch (nolock)">
    <field column="LinkedCategoryBC" name="LinkedCategoryBreadCrumb"/>
</entity>

/Rolf

Den 2010-01-27 12.36, skrev KirstyS kirst...@gmail.com: Hi, I have looked on the wiki. Using the CachedSqlEntityProcessor looks like it was simple. But I am getting no speed benefit and am not sure if I have even got the syntax correct. I have a main root entity called 'article'. And then I have a number of sub entities.
One such entity is as such:

<entity name="LinkedCategory" pk="LinkedCatAricleId"
        query="SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}')"
        processor="CachedSqlEntityProcessor"
        WHERE="LinkedCatArticleId = article.CmsArticleId"
        deltaQuery="SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}') AND (convert(varchar(50), LastUpdateDate) > '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) > '${dataimporter.article.last_index_time}')"
        parentDeltaQuery="SELECT * from vArticleSummaryDetail_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}')">
    <field column="LinkedCategoryBC" name="LinkedCategoryBreadCrumb"/>
</entity>

As you can see I have added (for the main query - not worrying about the delta queries yet!!) the processor and the 'where', but not sure if it's correct. Can anyone point me in the right direction??? Thanks Kirsty -- View this message in context: http://old.nabble.com/Help-using-CachedSqlEntityProcessor-tp27337635p27345412.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
transformer or filter...which is better
Hi, when the same thing can be done using either a transformer or a filter, which one is better, and why? Please help.
Re: Can Solr be forced to return all field tags for a document even if the field is empty?l
On 2010-01-28 03:21, Erick Erickson wrote: This is kind of an unusual request, what higher-level problem are you trying to solve here? Because the field just *isn't there* in the underlying Lucene index for that document. I suppose you could index a not there token and just throw those values out from the response... You can also implement a SearchComponent that post-processes results and based on the schema if a field is missing then it adds an empty node to the result. -- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
Solr + MySQL newbie question
I am planning to use Solr to power search on the site. Our db is MySQL and we need to index some tables in the schema into Solr. Based on my initial research, it appears that I need to write a Java program that will create xml documents (say mydocs.xml) with the <add> command, and then index them in Solr with: java -jar post.jar mydocs.xml. Kindly let me know if this is fine or whether some other, more sophisticated solution exists for MySQL syncing. -- Manish
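One option worth knowing about here: Solr's DataImportHandler (the same component discussed in the CachedSqlEntityProcessor thread above) can pull rows straight from MySQL over JDBC, with no intermediate XML files or custom Java program. A minimal data-config.xml sketch; the table, columns, and connection details below are placeholders, not a working configuration:

```xml
<dataConfig>
  <!-- JDBC connection to the MySQL database (placeholder credentials) -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <!-- Each row returned by the query becomes one Solr document;
         column names are mapped to schema fields below. -->
    <entity name="item" query="SELECT id, name, description FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
```

Assuming the handler is registered at /dataimport in solrconfig.xml, a full import is then triggered with a request like http://localhost:8983/solr/dataimport?command=full-import.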
How to disable wildcard search
Hi all, how can I remove/disable wildcard search in Solr? I have no requirement for wildcards. Is there any configuration to disable wildcard search in Solr? I am using SolrJ for searching. Thanks. With regards, Ranveer K Kumar
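There is no single switch that turns wildcards off in the standard query parser; the usual approach is to escape user input before building the query, e.g. with SolrJ's ClientUtils.escapeQueryChars (which escapes all query-syntax characters, as used elsewhere in this digest). A reduced, dependency-free sketch that neutralizes just the wildcard characters; limiting the escaping to '*' and '?' is an assumption for illustration:

```java
// Backslash-escape '*' and '?' so the query parser treats them as
// literal characters instead of wildcard operators.
public class NoWildcards {
    static String escapeWildcards(String userInput) {
        return userInput.replaceAll("([*?])", "\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escapeWildcards("foo*bar?"));
        // foo\*bar\?
    }
}
```

In practice, escaping the full special-character set via ClientUtils.escapeQueryChars is safer, since ranges, fuzzy operators, etc. would otherwise still be live.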