Solr terms search vs MySql FULLTEXT index and AGAINST
I am using Solr terms for auto suggest and I have 4 millions document in index and Its working fine. I want to know which will be more faster and efficient from 'MySql FULLTEXT index and AGAINST' and Solr terms search. Or Is there any other way in solr for auto suggest. I have separate application server and solr server. So I cannot cross domain request from browser at application server for JSON response. Ref. http://yuilibrary.com/forum/viewtopic.php?p=3203 -- View this message in context: http://www.nabble.com/Solr-terms-search-vs-MySql-FULLTEXT-index-and-AGAINST-tp25658300p25658300.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr and Garbage Collection
> Actually the CPU usage of the solr servers is almost insignificant (it was > like that before). >>The time spent on collecting memory dropped from 11% to 3.81% I even think that 3.81% from 5% is nothing (suspecting that SOLR uses 5% CPU, mostly loading large field values in memory) :))) (would be nice to load-stress-multithreaded except of waiting...) Most Expensive Query: faceting on all fields with generic query like *:*
Fwd: "Only one usage of each socket address" error
Seems like the post in the SolrNet group: http://groups.google.com/group/solrnet/browse_thread/thread/7e3034b626d3e82d?pli=1 helped me get trough. Thanks you solr-user's for helping out too! Steinar Videresendt melding: Fra: Steinar Asbjørnsen Dato: 28. september 2009 17.07.15 GMT+02.00 Til: solr-user@lucene.apache.org Emne: Re: "Only one usage of each socket address" error I'm using the add(MyObject) command form ()in a foreach loop to add my objects to the index. In the catalina-log i cannot see anything that helps me out. It stops at: 28.sep.2009 08:58:40 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[12345]} 0 187 28.sep.2009 08:58:40 org.apache.solr.core.SolrCore execute INFO: [core2] webapp=/solr path=/update params={} status=0 QTime=187 Whitch indicates nothing wrong. Are there any other logs that should be checked? What it seems like to me at the moment is that the foreach is passing objects(documents) to solr faster then solr can add them to the index. As in I'm eventually running out of connections (to solr?) or something. I'm running another incremental update that with other objects where the foreachs isn't quite as fast. This job has added over 100k documents without failing, and still going. Whereas the problematic job fails after ~3k. What I've learned trough the day tho, is that the index where my feed is failing is actually redundant. I.e I'm off the hook for now. Still I'd like to figure out whats going wrong. Steinar There's nothing in that output that indicates something we can help with over in solr-user land. What is the call you're making to Solr? Did Solr log anything anomalous? Erik On Sep 28, 2009, at 4:41 AM, Steinar Asbjørnsen wrote: I just posted to the SolrNet-group since i have the exact same(?) problem. Hope I'm not beeing rude posting here as well (since the SolrNet- group doesn't seem as active as this mailinglist). The problem occurs when I'm running an incremental feed(self made) of a index. My post: [snip] Whats happening is that i get this error message (in VS): "A first chance exception of type 'SolrNet.Exceptions.SolrConnectionException' occurred in SolrNet.DLL" And the web browser (which i use to start the feed says: "System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." At the time of writing my index contains 15k docs, and "lacks" ~700k docs that the incremental feed should take care of adding to the index. The error message appears after 3k docs are added, and before 4k docs are added. I'm committing each 1%1000==0. In addittion autocommit is set to: 1 More info: From schema.xml: I'm fetching data from a (remote) Sql 2008 Server, using sqljdbc4.jar. And Solr is running on a local Tomcat-installation. SolrNet version: 0.2.3.0 Solr Specification Version: 1.3.0.2009.08.29.08.05.39 [/snip] Any suggestions on how to fix this would be much apreceiated. Regards, Steinar
Showing few results for each category (facet)
Hi, I am looking for a way to do the following in solr: When somebody does a search, I want to show results by category (facet) such that I display 5 results from each category (along with showing the total number of results in each category which I can always do using the facet search). This is kind of an overview of all the search results and user can click on the category to see all the results pertaining to that category (normal facet search with filter). One way that I can think of doing this is by making as many queries as there are categories and show these results under each category. But this will be very inefficient. Is there any way I can do this ? Thanks & Regards, Varun Gupta
Usage of Sort and fq
Hi, Can some one let me know how to use sort and fq parameters in Solr. Any examples woould be appreciated. Regards Bhaskar
Re: Usage of Sort and fq
/?q=*:*&fq:category:animal&sort=child_count%20asc Search for all documents (of animals), and filter the ones that belong to the category "animal" and sort ascending by a field called child_count that contains number of children for each animal. You can pass multiple fq's with more "&fq=..." parameters. Secondary, tertiary sorts can be specified using comma (",") as the separator. i.e. "sort=fieldA asc,fieldB desc, fieldC asc, ..." Cheers Avlesh On Tue, Sep 29, 2009 at 3:51 PM, bhaskar chandrasekar wrote: > Hi, > > Can some one let me know how to use sort and fq parameters in Solr. > Any examples woould be appreciated. > > Regards > Bhaskar > > >
Re: Showing few results for each category (facet)
On Tue, Sep 29, 2009 at 11:36 AM, Varun Gupta wrote: > ... > > One way that I can think of doing this is by making as many queries as there > are categories and show these results under each category. But this will be > very inefficient. Is there any way I can do this ? Hi Varun! I think that doing multiple queries doesn't have to be inefficient, since Solr caches subsequent queries for the same term and facets. Imagine this as your first query: - q: xyz - facets: myfacet and this as a second query: - q:xyz - fq: myfacet=a Compared to the first query, the second query will be very fast, since all the hard work ahs been done in query one and then cached. At least that's my understanding. Please correct me if I'm wrong. Marian
Problem getting Solr home from JNDI in Tomcat
Hi all, I'm having problems getting Solr to start on Tomcat 6. Tomcat is installed in /opt/apache-tomcat , solr is in /opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr . My config file is in /opt/solr/conf/solrconfig.xml . I have a Solr-specific context file in /opt/apache-tomcat/conf/Catalina/localhost/solr.xml which looks like this: But when I start Solr and browse to it, it tells me: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/ at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:194) at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:162) at org.apache.solr.core.Config.(Config.java:100) at org.apache.solr.core.SolrConfig.(SolrConfig.java:113) at org.apache.solr.core.SolrConfig.(SolrConfig.java:70) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) at org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356) at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244) at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604) at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129) at javax.servlet.http.HttpServlet.service(HttpServlet.java:690) at javax.servlet.http.HttpServlet.service(HttpServlet.java:803) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Weirdly, the exact same context file works fine on a different machine. I've tried giving Context a docBase element (both absolute, and relative paths) but it makes no difference -- Solr still isn't seeing the right home directory. 
I also tried setting debug="1" but didn't see any more useful info anywhere. Any ideas? This is a total show-stopper for me as this is our production server. (Otherwise I'd think about taking it down and hardwiring the Solr home path into the server's context...) Yours hopefully, Andrew. -- View this message in context: http://www.nabble.com/Problem-getting-Solr-home-from-JNDI-in-Tomcat-tp25662200p25662200.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem getting Solr home from JNDI in Tomcat
This might be a bit of a hack but i got this in the web.xml of my applicatin and it works great. solr/home /Solr/WebRoot/WEB-INF/solr java.lang.String On Tue, Sep 29, 2009 at 2:32 PM, Andrew Clegg wrote: > > Hi all, I'm having problems getting Solr to start on Tomcat 6. > > Tomcat is installed in /opt/apache-tomcat , solr is in > /opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr . > My config file is in /opt/solr/conf/solrconfig.xml . > > I have a Solr-specific context file in > /opt/apache-tomcat/conf/Catalina/localhost/solr.xml which looks like this: > > > value="/opt/solr" override="true" /> > allow="128\.40\.46\..*,127\.0\.0\.1" /> > > > But when I start Solr and browse to it, it tells me: > > java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in > classpath or 'solr/conf/', cwd=/ at > > org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:194) > at > > org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:162) > at org.apache.solr.core.Config.(Config.java:100) at > org.apache.solr.core.SolrConfig.(SolrConfig.java:113) at > org.apache.solr.core.SolrConfig.(SolrConfig.java:70) at > > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117) > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) > at > > org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) > at > > org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) > at > > org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108) > at > > org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709) > at > org.apache.catalina.core.StandardContext.start(StandardContext.java:4356) > at > org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244) > at > > org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604) > at > > org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:690) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) at > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) > at > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) > at > > org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525) > at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568) > at > > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) > at > > org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) > at > > org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) > at > > org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) > at > > org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) > at > > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) > at > 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) > at > > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) > at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) > at java.lang.Thread.run(Thread.java:619) > > Weirdly, the exact same context file works fine on a different machine. > I've > tried giving Context a docBase element (both absolute, and relative paths) > but it makes no difference -- Solr still isn't seeing the right home > directory. I also tried setting debug="1" but didn't see any more useful > info anywhere. > > Any ideas? This is a total show-stopper for me as this is our production > server. (Otherwise I'd think about taking it down and hardwiring the Solr > home path into the server's context...) > > Yours hopefully, > > Andrew. > > -- > View this message in context: > http://www.nabble.com/Problem-getting-Solr-home-from-JNDI-in-Tomcat-tp25662200p25662200.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Problem with Wildcard...
Hi Users... i have a Problem I have a lot of fields, (type=text) for search in all fields i copy all fields in the default text field and use this for default search. Now i will search... This is into a Field "RI-MC500034-1" when i search "RI-MC500034-1" i found it... if i seacht "RI-MC5000*" i dosen´t when i search "500034" i found it... if i seacht "5000*" i dosen´t what can i do to use the Wildcards? KingArtus
Re: Measuring timing with debugQuery=true
Sorry for the delayed response ** *How big are your documents?* I have totally 1 million documents. I have totally 1950 fields in the index. Every document would probably have values for around 20 - 50 fields. *What is the total size of the index?* 1 GB *What's the amout of RAM on your box? How big is the JVM heap (and how much free memory is left on your system)?* I have 4 GB RAM. I am using Weblogic 10, 32 Bit. Since it is a windows box, I am able to allocate only 1 GB to the JVM. No other applications are running on the system. So the entire 4GB is at the disposal of the application. I am simulating load using a load tool (15 users) *Can you show what this slow query looks like (the whole request)?* q=*%3A*&rows=0&facet=true&facet.mincount=1&facet.limit=2&f.S9942.facet.limit=100&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true q=*%3A*&fq=S9942%3A%22TEXAS+INSTRUMENTS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true Other information Solr 1.3, JDK 1.5.0_14 regards Rahul On Mon, Sep 28, 2009 at 6:48 PM, Yonik Seeley wrote: > On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote: > > Yonik, > > I understand that the network can be a bottle-neck but I am pretty sure > that > > it is not. I am operating on a 100 MBPS intranet... How do I ensure > that > > stored fields are cached by the OS ? Only the Solr caches within the JVM > are > > under my control.. The result set has around 10K documents of which I > am > > retrieving only 10..I am displaying a max of only 3 fields per > document > > in my result set. Can the reading time for these stored fields be so long > ? > > It could be a seek per document if the index is too big to fit in the > OS cache - but that still wouldn't be as slow as you report. > Something is fishy here. > > How big are your documents? > What is the total size of the index? > What's the amout of RAM on your box? > How big is the JVM heap (and how much free memory is left on your system)? > Can you show what this slow query looks like (the whole request)? > > > I have totally around 1 million documents in my index Any > thoughts > > on why the FacetComponent does not take any time while the QueryComponent > > takes around 2.4s. > > It could be a field that has very few unique values and faceting just > completes quickly. > Make sure you're actually getting faceting data back (that it's > correctly turned on). > > -Yonik > http://www.lucidimagination.com > > > I am doing a faceted and keyword query ie I have both 'q' > > and 'fq' params in my query Thank you for your response. > > > > Regards > > Rahul > > > > On Mon, Sep 28, 2009 at 1:20 AM, Yonik Seeley < > yo...@lucidimagination.com> > > wrote: > >> > >> The response times in a Solr request don't include the time to read > >> stored fields (since the response is streamed) and doesn't include the > >> time to transfer/read the response (which can be increased by a > >> slow/congested network link, or a slow client that doesn't read the > >> response immediately). > >> > >> How many documents are you retrieving? Reading stored fields for > >> documents can be slow if they aren't cached by the OS since it's often > >> a disk seek per document read for a large index. 
> >> > >> -Yonik > >> http://www.lucidimagination.com > >> > >> > >> > >> On Sun, Sep 27, 2009 at 3:41 PM, Rahul R wrote: > >> > Hello, > >> > I am trying to measure why some of my queries take a long time. I am > >> > using > >> > EmbeddedSolrServer and with logging statements before and > >> > after the EmbeddedSolrServer.query(SolrQuery) function, I have found > the > >> > time to be around 16s. I added the debugQuery=true and the timing > >> > component > >> > for this reads as following: > >> > > >> > * > >> > > >> > > timing:{time=2438.0,prepare={time=0.0,org.apache.solr.handler.component.QueryComponent={time=0.0},org.apache.solr.handler.component.FacetComponent={time=0.0},org.apache.solr.handler.component.MoreLikeThisComponent={time=0.0},org.apache.solr.handler.component.HighlightComponent={time=0.0},org.apache.solr.handler.component.DebugComponent={time=0.0}},process={time=2438.0,org.apache.solr.handler.component.QueryComponent={time=2438.0},org.apache.solr.handler.component.FacetComponent={time=0.0},org.apache.solr.handler.component.MoreLikeThisComponent={time=0.0},org.apache.solr.handler.component.HighlightComponent={time=0.0},org.apache.solr.handler.component.DebugComponent={time=0.0}}} > >> > * > >> > > >> > As you can see, this shows only 2.4s being used by the query. I can't > >> > seem > >> > to figure out where the rest of the time is being spent. This is > within > >> > my > >> > office intranet and I don't think the request-res
Re: FileNotFoundException in Java replication handler backups
On Tue, Sep 29, 2009 at 3:19 AM, Mark Miller wrote: > Looks like a bug to me. I don't see the commit point being reserved in > the backup code - which means its likely be removed before its done > being copied. Gotto reserve it using the delete policy to keep around > for the full backup duration. I'd file a JIRA issue. > > Definitely a bug. Chris, please open an issue. I'll try to work up a patch. -- Regards, Shalin Shekhar Mangar.
Re: Measuring timing with debugQuery=true
I just want to clarify here that I understand my memory allocation might be less given the load on the system. The response times were only slightly better when we ran the test on a Solaris box with 12CPU, 24G RAM and with 3.2 GB allocated for the JVM. I know that I have a performance problem. My main concern is to identify the reasons for the inconsistency between the timing information shown between the debugQuery output (2.4s) and the entire time taken by the EmbeddedSolrServer.query(SolrQuery) function (16s). I feel that if I can find out where the remaining 13.6s gets used, then I can look to improve accordingly. Thank you. Regards Rahul On Tue, Sep 29, 2009 at 7:12 PM, Rahul R wrote: > Sorry for the delayed response > ** > *How big are your documents?* > I have totally 1 million documents. I have totally 1950 fields in the > index. Every document would probably have values for around 20 - 50 fields. > *What is the total size of the index?* > 1 GB > > *What's the amout of RAM on your box? How big is the JVM heap (and how > much free memory is left on your system)?* > I have 4 GB RAM. I am using Weblogic 10, 32 Bit. Since it is a windows box, > I am able to allocate only 1 GB to the JVM. No other applications are > running on the system. So the entire 4GB is at the disposal of the > application. I am simulating load using a load tool (15 users) > > *Can you show what this slow query looks like (the whole request)?* > > q=*%3A*&rows=0&facet=true&facet.mincount=1&facet.limit=2&f.S9942.facet.limit=100&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > > > q=*%3A*&fq=S9942%3A%22TEXAS+INSTRUMENTS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > > Other information > Solr 1.3, JDK 1.5.0_14 > > regards > Rahul > > On Mon, Sep 28, 2009 at 6:48 PM, Yonik Seeley < > yo...@lucidimagination.com> wrote: > >> On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote: >> > Yonik, >> > I understand that the network can be a bottle-neck but I am pretty sure >> that >> > it is not. I am operating on a 100 MBPS intranet... How do I ensure >> that >> > stored fields are cached by the OS ? Only the Solr caches within the JVM >> are >> > under my control.. The result set has around 10K documents of which >> I am >> > retrieving only 10..I am displaying a max of only 3 fields per >> document >> > in my result set. Can the reading time for these stored fields be so >> long ? >> >> It could be a seek per document if the index is too big to fit in the >> OS cache - but that still wouldn't be as slow as you report. >> Something is fishy here. >> >> How big are your documents? >> What is the total size of the index? >> What's the amout of RAM on your box? >> How big is the JVM heap (and how much free memory is left on your system)? >> Can you show what this slow query looks like (the whole request)? >> >> > I have totally around 1 million documents in my index Any >> thoughts >> > on why the FacetComponent does not take any time while the >> QueryComponent >> > takes around 2.4s. >> >> It could be a field that has very few unique values and faceting just >> completes quickly. >> Make sure you're actually getting faceting data back (that it's >> correctly turned on). 
>> >> -Yonik >> http://www.lucidimagination.com >> >> > I am doing a faceted and keyword query ie I have both 'q' >> > and 'fq' params in my query Thank you for your response. >> > >> > Regards >> > Rahul >> > >> > On Mon, Sep 28, 2009 at 1:20 AM, Yonik Seeley < >> yo...@lucidimagination.com> >> > wrote: >> >> >> >> The response times in a Solr request don't include the time to read >> >> stored fields (since the response is streamed) and doesn't include the >> >> time to transfer/read the response (which can be increased by a >> >> slow/congested network link, or a slow client that doesn't read the >> >> response immediately). >> >> >> >> How many documents are you retrieving? Reading stored fields for >> >> documents can be slow if they aren't cached by the OS since it's often >> >> a disk seek per document read for a large index. >> >> >> >> -Yonik >> >> http://www.lucidimagination.com >> >> >> >> >> >> >> >> On Sun, Sep 27, 2009 at 3:41 PM, Rahul R wrote: >> >> > Hello, >> >> > I am trying to measure why some of my queries take a long time. I am >> >> > using >> >> > EmbeddedSolrServer and with logging statements before and >> >> > after the EmbeddedSolrServer.query(SolrQuery) function, I have found >> the >> >> > time to be around 16s. I added the debugQuery=true and the timing >> >> > component >> >> > for this reads as following: >> >> > >> >> > * >> >> > >> >> > >> timing:{time=2438.0,prepare={time=0.0,org.apache.solr.handler.
Re: Problem getting Solr home from JNDI in Tomcat
Constantijn Visinescu wrote: > > This might be a bit of a hack but i got this in the web.xml of my > applicatin > and it works great. > > > >solr/home >/Solr/WebRoot/WEB-INF/solr >java.lang.String > > > That worked, thanks. You're right though, it is a bit of a hack -- I'd prefer to set the path from *outside* the app so it won't get overwritten when I upgrade. Now I've got a completely different error: "org.apache.lucene.index.CorruptIndexException: Unknown format version: -9". I think it might be time for a fresh install... Cheers, Andrew. -- View this message in context: http://www.nabble.com/Problem-getting-Solr-home-from-JNDI-in-Tomcat-tp25662200p25663931.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Measuring timing with debugQuery=true
It's harder debugging stuff like this with custom code (you say that you're using EmbeddedSolrServer) and different servlet containers. Perahps try putting your config files and index into the example jetty server, and then do a single request from curl or your web browser to see if the times are still long. -Yonik http://www.lucidimagination.com On Tue, Sep 29, 2009 at 9:42 AM, Rahul R wrote: > Sorry for the delayed response > > How big are your documents? > I have totally 1 million documents. I have totally 1950 fields in the index. > Every document would probably have values for around 20 - 50 fields. > What is the total size of the index? > 1 GB > What's the amout of RAM on your box? How big is the JVM heap (and how much > free memory is left on your system)? > I have 4 GB RAM. I am using Weblogic 10, 32 Bit. Since it is a windows box, > I am able to allocate only 1 GB to the JVM. No other applications are > running on the system. So the entire 4GB is at the disposal of the > application. I am simulating load using a load tool (15 users) > Can you show what this slow query looks like (the whole request)? > q=*%3A*&rows=0&facet=true&facet.mincount=1&facet.limit=2&f.S9942.facet.limit=100&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > q=*%3A*&fq=S9942%3A%22TEXAS+INSTRUMENTS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > > Other information > Solr 1.3, JDK 1.5.0_14 > > regards > Rahul > > On Mon, Sep 28, 2009 at 6:48 PM, Yonik Seeley > wrote: >> >> On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote: >> > Yonik, >> > I understand that the network can be a bottle-neck but I am pretty sure >> > that >> > it is not. I am operating on a 100 MBPS intranet... How do I ensure >> > that >> > stored fields are cached by the OS ? Only the Solr caches within the JVM >> > are >> > under my control.. The result set has around 10K documents of which >> > I am >> > retrieving only 10..I am displaying a max of only 3 fields per >> > document >> > in my result set. Can the reading time for these stored fields be so >> > long ? >> >> It could be a seek per document if the index is too big to fit in the >> OS cache - but that still wouldn't be as slow as you report. >> Something is fishy here. >> >> How big are your documents? >> What is the total size of the index? >> What's the amout of RAM on your box? >> How big is the JVM heap (and how much free memory is left on your system)? >> Can you show what this slow query looks like (the whole request)? >> >> > I have totally around 1 million documents in my index Any >> > thoughts >> > on why the FacetComponent does not take any time while the >> > QueryComponent >> > takes around 2.4s. >> >> It could be a field that has very few unique values and faceting just >> completes quickly. >> Make sure you're actually getting faceting data back (that it's >> correctly turned on). >> >> -Yonik >> http://www.lucidimagination.com >> >> > I am doing a faceted and keyword query ie I have both 'q' >> > and 'fq' params in my query Thank you for your response. 
>> > >> > Regards >> > Rahul >> > >> > On Mon, Sep 28, 2009 at 1:20 AM, Yonik Seeley >> > >> > wrote: >> >> >> >> The response times in a Solr request don't include the time to read >> >> stored fields (since the response is streamed) and doesn't include the >> >> time to transfer/read the response (which can be increased by a >> >> slow/congested network link, or a slow client that doesn't read the >> >> response immediately). >> >> >> >> How many documents are you retrieving? Reading stored fields for >> >> documents can be slow if they aren't cached by the OS since it's often >> >> a disk seek per document read for a large index. >> >> >> >> -Yonik >> >> http://www.lucidimagination.com >> >> >> >> >> >> >> >> On Sun, Sep 27, 2009 at 3:41 PM, Rahul R wrote: >> >> > Hello, >> >> > I am trying to measure why some of my queries take a long time. I am >> >> > using >> >> > EmbeddedSolrServer and with logging statements before and >> >> > after the EmbeddedSolrServer.query(SolrQuery) function, I have found >> >> > the >> >> > time to be around 16s. I added the debugQuery=true and the timing >> >> > component >> >> > for this reads as following: >> >> > >> >> > * >> >> > >> >> > >> >> > timing:{time=2438.0,prepare={time=0.0,org.apache.solr.handler.component.QueryComponent={time=0.0},org.apache.solr.handler.component.FacetComponent={time=0.0},org.apache.solr.handler.component.MoreLikeThisComponent={time=0.0},org.apache.solr.handler.component.HighlightComponent={time=0.0},org.apache.solr.handler.component.DebugComponent={time=0.0}},process={time=2438.0,org.apache.solr.handler.component.QueryComponent=
${dataimporter.last_index_time} as an argument to newerThan in FileListEntityProcessor?
Is this possible? I can't figure out a syntax that works, and all the examples show using last_index_time as an argument to an SQL query. -- Bill Dueber Library Systems Programmer University of Michigan Library
RE: Question on Access or viewing TermFrequency Vector via SOLR.
Grant, Thanks for the link. Based on the example, I think this is what I need. If effeciency is a problem, I will consider it. I see the note that tv.df can be expensive. I guess it all depends on how big the collection is. I'm a proponent of not reinvientin the wheel if it has already been invented And can be easily integrated into my task. I looked at the TermVecotrComponentExampleEnabled (Example output) and It looks like it is what I needed. -Peter > -Original Message- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: Monday, September 28, 2009 6:17 PM > To: solr-user@lucene.apache.org > Subject: Re: Question on Access or viewing TermFrequency > Vector via SOLR. > > > http://wiki.apache.org/solr/TermVectorComponent. You may > want to hack > in your own capabilities to implement your own TermVectorMapper for > efficiency reasons. > > On Sep 28, 2009, at 5:05 PM, Thung, Peter C CIV > SPAWARSYSCEN-PACIFIC, > 56340 wrote: > > > Mark, > > > > Thanks. I think this may be partially what I need. > > > > Basically, what I'm trying to figure out is the following > > If someone enters a keyword say > > Apple. > > I would like to find all the documents that have the word apple In > > them, and then for each document, the number of times it showed > > up in > > each > > Document. > > > > From the link you sent, (assuming I understand it > correctly), With the > > field name "name", it has the terms (values) within the field name > > "name" Of 1, 11, 120, 133, 184, etc.. With the respective counts of > > how many documents that match the term. (I have to wonder if it > > multiply counts documents if the term is in a document more > than once. > > > > It does not tell me which document matched a specific term, or the > > number of terms that are in a specific document, correct? > > > > > > -Peter > > > > > > > > ** > > Peter Thung > > Software Developer > > IBS Project Technical Lead -Web Developer > > > > Code 56340 - Net-centric ISR Development Branch > > Joint & National ISR Systems Division > > Inteligence, Surveillance and Reconnaissance Department > > US Navy Space & Naval Warfare Systems Center Pacific (SSC > PAC) Topside > > Campus, Bldg A33, room 0055 53560 Hull Street, San Diego, CA 92152 > > > > UNCLASS Email: peter.th...@navy.mil > > SIPRNET Email: thu...@spawar.navy.smil.mil > > COMM (Primary): (619) 553-6513 > > COMM (Secondary):(619) 553-0777 > > FAX: (619) 553-1586 > > ** > > > > > > > >> -Original Message- > >> From: Mark Miller [mailto:markrmil...@gmail.com] > >> Sent: Monday, September 28, 2009 1:50 PM > >> To: solr-user@lucene.apache.org > >> Subject: Re: Question on Access or viewing TermFrequency > Vector via > >> SOLR. > >> > >> > >> Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 wrote: > >>> is there a SOLR query that can access or view the > >> TermFrequencies for > >>> the various documents discovered, Or is the only wya to > >>> programmatically access this information. If so could > someon share > >>> an example and maybe a link for > >> information on > >>> how to do this? > >>> Some sample queries? > >>> > >>> Thank you in advance. > >>> > >>> > >>> -Peter > >>> > >>> > >>> > >>> > >>> > >> Close I can think of is: http://wiki.apache.org/solr/TermsComponent > >> > >> -- > >> - Mark > >> > >> http://www.lucidimagination.com > >> > >> > >> > >> > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > >
Re: Create new core on the fly
Hi, We are also facing the same issue. Is the LOAD action implemented yet? If not then what should we do to achieve the same functionality? Thanks, djain ryantxu wrote: > > The LOAD method will load a core from a schema/config file -- it will > not need to be in multicore.xml (the persist=true option should > serialize this change into multicore.xml) > > Henri's latest patch implements LOAD, but it needs some clean up to > apply cleanly to the current trunk. > > ryan > > > Doug Steigerwald wrote: >> Is it going to be possible (soon) to register new Solr cores on the >> fly? I know the LOAD action is yet to be implemented, but will that let >> you create new cores that are not listed in the multicore.xml? We're >> occasionally going to have to create new cores and would like to not >> have to stop/start Solr do to do this. >> >> We want to be able to create the core structure on the filesystem and >> register that core, or make changes to the multicore.xml file and tell >> Solr to reload the cores and pick up the new ones. >> >> Thanks. >> Doug >> > > > -- View this message in context: http://www.nabble.com/Create-new-core-on-the-fly-tp14585788p25666388.html Sent from the Solr - User mailing list archive at Nabble.com.
[ANN] Carrot2 version 3.1.0 released
Dear All, [Apologies for cross-posting.] This is just to let you know that we've released version 3.1.0 of Carrot2 Search Results Clustering Engine. The 3.1.0 release comes with: * Experimental support for clustering Chinese Simplified content (based on Lucene's Smart Chinese Analyzer) * Document Clustering Workbench usability improvements * Suffix Tree Clustering algorithm rewritten for better performance and clustering quality * Apache Solr clustering plugin (to be available in Solr 1.4, Grant's blog post: http://www.lucidimagination.com/blog/2009/09/28/solrs-new-clustering-capabilities/ ) Release notes: http://project.carrot2.org/release-3.1.0-notes.html On-line demo: http://search.carrot2.org Download: http://download.carrot2.org Project website: http://project.carrot2.org Thanks, Staszek -- Stanislaw Osinski, http://carrot2.org
Index backup with new replication?
Hey, I noticed with new in-process replication, it is not as straightforward to have (production serving) solr index snapshots for backup (it used to be a natural byproduct of the snapshot taking process.) I understand there are some command-line utilities for this (abc..) Can someone please explain how to use these to take a snapshot of a solr index, assuming it is being used in production? what are some guidelines? should I stop other processes that might be issuing updates and/or comitts while taking it or is it atomic (e.g hard link )? would be nice to have this in wiki too i think for the benefit of other users, having regular backup snapshots seems critical.. Thanks, -Chak -- View this message in context: http://www.nabble.com/Index-backup-with-new-replication--tp25667145p25667145.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ${dataimporter.last_index_time} as an argument to newerThan in FileListEntityProcessor?
On Tue, Sep 29, 2009 at 8:14 PM, Bill Dueber wrote: > Is this possible? I can't figure out a syntax that works, and all the > examples show using last_index_time as an argument to an SQL query. > > It is possible but it doesn't work right now. I've created an issue and I will give a patch shortly. https://issues.apache.org/jira/browse/SOLR-1473 -- Regards, Shalin Shekhar Mangar.
Re: Create new core on the fly
On Tue, Sep 29, 2009 at 10:01 PM, djain101 wrote: > > Is the LOAD action implemented yet? > Yes, see http://wiki.apache.org/solr/CoreAdmin -- Regards, Shalin Shekhar Mangar.
Re: Create new core on the fly
Thanks Shalin for quick response. On the wiki link you mentioned, it is saying "not implemented yet!". Can you please confirm again? If yes, then in which release it is available? Appreciate your quick response. Regards, Dharmveer Shalin Shekhar Mangar wrote: > > On Tue, Sep 29, 2009 at 10:01 PM, djain101 > wrote: > >> >> Is the LOAD action implemented yet? >> > > Yes, see http://wiki.apache.org/solr/CoreAdmin > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Create-new-core-on-the-fly-tp14585788p25669128.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Usage of Sort and fq
A description and examples of both parameters can be found here: http://wiki.apache.org/solr/CommonQueryParameters Thanks, Matt Weber On Sep 29, 2009, at 4:10 AM, Avlesh Singh wrote: /?q=*:*&fq:category:animal&sort=child_count%20asc Search for all documents (of animals), and filter the ones that belong to the category "animal" and sort ascending by a field called child_count that contains number of children for each animal. You can pass multiple fq's with more "&fq=..." parameters. Secondary, tertiary sorts can be specified using comma (",") as the separator. i.e. "sort=fieldA asc,fieldB desc, fieldC asc, ..." Cheers Avlesh On Tue, Sep 29, 2009 at 3:51 PM, bhaskar chandrasekar wrote: Hi, Can some one let me know how to use sort and fq parameters in Solr. Any examples woould be appreciated. Regards Bhaskar
Re: Create new core on the fly
On Wed, Sep 30, 2009 at 12:42 AM, djain101 wrote: > > Thanks Shalin for quick response. On the wiki link you mentioned, it is > saying "not implemented yet!". Can you please confirm again? If yes, then > in > which release it is available? > Ah, I'm sorry. You are right. Load is not implemented yet. The other way to achieve this is with the "create" and "unload" commands but you do have to specify the instanceDir, config, schema and dataDir for "create". There is some work in progress towards this feature which is targeted for 1.5 - see http://wiki.apache.org/solr/LotsOfCores -- Regards, Shalin Shekhar Mangar.
Re: Showing few results for each category (facet)
So, you want to display 5 results from each category and still know how many results are in each category. This is a perfect situation for the field collapsing patch: https://issues.apache.org/jira/browse/SOLR-236 http://wiki.apache.org/solr/FieldCollapsing Here is how I would do it. Add a field to your schema called category or whatever. Then while indexing you populate that field with whatever category the document belongs in. While executing a search, collapse the results on that field with a max collapse of 5. This will give you at most 5 results per category. Now, at the same time enable faceting on that field and DO NOT use the collapsing parameter to recount the facet vales. This means that the facet counts will be reflect the non-collapsed results. This facet should only be used to get the count for each category, not displayed to the user. On your search results page that gets the collapsed results, you can put a link that says "Show all X results from this category" where X is the value you pull out of the facet. When a user clicks that link you basically do the same search with field collapsing disabled, and a filter query on the specific category they want to see, for example: &fq=category:people. Hope this helps. Thanks, Matt Weber On Sep 29, 2009, at 4:55 AM, Marian Steinbach wrote: On Tue, Sep 29, 2009 at 11:36 AM, Varun Gupta wrote: ... One way that I can think of doing this is by making as many queries as there are categories and show these results under each category. But this will be very inefficient. Is there any way I can do this ? Hi Varun! I think that doing multiple queries doesn't have to be inefficient, since Solr caches subsequent queries for the same term and facets. Imagine this as your first query: - q: xyz - facets: myfacet and this as a second query: - q:xyz - fq: myfacet=a Compared to the first query, the second query will be very fast, since all the hard work ahs been done in query one and then cached. At least that's my understanding. Please correct me if I'm wrong. Marian
Re: Index backup with new replication?
The documentation could maybe be improved, but the basics of backup snapshots with the in-process (Java-based) replication handler actually seem pretty straightforward to me, now that I understand it: 1. You can make a snapshot whenever you want by hitting http://master_host:port/solr/replication?command=backup 2. You can have automatically triggered snapshots at commit time or optimize time by putting a backupAfter tag in the replication handler section of your solrconfig.xml. (See http://wiki.apache.org/solr/SolrReplication) In neither case do you need to stop Solr or stop modifying your index while the backup is in progress. Does anything in particular seem not straightforward? I guess there's no built-in way to purge old indexes from disk; that's a little inconvenient. If you want to use the command-line tools, I think those should be totally compatible with the new (Java) replication tools. I don't know as much about them, though. 2009/9/29 KaktuChakarabati : > > Hey, > I noticed with new in-process replication, it is not as straightforward to > have > (production serving) solr index snapshots for backup (it used to be a > natural byproduct > of the snapshot taking process.) > I understand there are some command-line utilities for this (abc..) > Can someone please explain how to use these to take a snapshot > of a solr index, assuming it is being used in production? what are some > guidelines? should I stop > other processes that might be issuing updates and/or comitts while taking it > or is it atomic (e.g hard link )? > > would be nice to have this in wiki too i think for the benefit of other > users, > having regular backup snapshots seems critical.. > > Thanks, > -Chak > -- > View this message in context: > http://www.nabble.com/Index-backup-with-new-replication--tp25667145p25667145.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: Questions on RandomSortField
: The question was either non-trivial or heavily uninteresting! No replies yet it's pretty non-trivial, and pretty interesting, but i'm also pretty behind on my solr-user email. I don't think there's anyway to do what you wanted without a custom plugin, so your efforts weren't in vain ... if we add the abiliity to sort by a ValueSource (aka function ... there's a Jira issue for this somewhere) then you could also do witha combination of functions so that anything in your category gets flattened to an extremely high constant, and everything else has a real score -- then a secondary sort on a random field would effectively only randomize the things in your category ... but we're not there yet. : Hoss, I have a small question (RandomSortField bears your signature) - Any : reason as to why RandomSortField#hash() and RandomSortField#getSeed() : methods are private? Having them public would have saved myself from : "owning" a copy in my class as well. just a general principle of API future-proofing: keep internals private unless you explicitly think through how subclasses will use them. I haven't thought it through all the way, but do you really need to copy everything? couldn't you get the SortField/Comparator from super and only delegate to it if the categories both match your specific categoryId? -Hoss
Re: XSD for Solr Response Format Version 2.2
: I am working on an XSD document for all the types in the response xml : version 2.2 : : Do you think there is a need for this? we haven't had one yet, and it doesn't seem like it's really caused any problems for people (plus the lack of response to this question suggests no one is super excited about it) but that doesn't mean it wouldn't be useful if you want to submit it. -Hoss
Re: Create new core on the fly
Hi Shalin, Can you please elaborate, why we need to do unload after create? So, if we do a create, will it modify the solr.xml everytime? Can it be avoided in subsequent requests for create? Also, if we want to implement Load, can you please give some directions to implement load action? Thanks, Dharmveer Shalin Shekhar Mangar wrote: > > On Wed, Sep 30, 2009 at 12:42 AM, djain101 > wrote: > > Ah, I'm sorry. You are right. Load is not implemented yet. The other way > to > achieve this is with the "create" and "unload" commands but you do have to > specify the instanceDir, config, schema and dataDir for "create". > > There is some work in progress towards this feature which is targeted for > 1.5 - see http://wiki.apache.org/solr/LotsOfCores > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Create-new-core-on-the-fly-tp14585788p25671905.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query performance
: Does the following query has any performance impact over : the second query? : +title:lucene +(title:lucene -name:sid) : +(title:lucene -name:sid) the second should in theory be faster then the first just because of reduced number of comparisons needed -- but wether or not you would actually notice a difference is mainly going to depend on your data. -Hoss
Re: How to configure Solr 1.3 on Websphere 6.1
: I have been trying to deploy Solr on websphere but no luck yet. : I was trying to deploy the war file under "dist" folder, but I kept getting : errors. (recent one is that it couldn't find the configuration file). When I Did you start by going through the tutorial using the instance of jetty included in the release? your error was most likelye related to the solrconfig.xml and schema.xml files -- which are not included in the war. they are specific to *your* use cases, not part of the app. examples are provided, and the tutorial shows you where to find them and how to use them. try the tutorial, get up to speed using jettty, check out the wiki, and then you'll probably find it much easier to make sense of running solr in various servlet containers... http://lucene.apache.org/solr/tutorial.html http://wiki.apache.org/solr/SolrInstall -Hoss
Re: Multiple DisMax Queries spanning across multiple fields
: For a particular requirement we have - we need to do a query that is a : combination of multiple dismax queries behind the scenes. (Using solr 1.4 : nightly ). ... : Creating a custom QParser works right away as below. ... : Curious to see if we have an alternate method to implement the same / any : other alternate suggestions to the problem itself. if your sets of q params are coming from the same source as your sets of qf params (ie: some complicated client code) then i probably would have just written a parser that had special markup for indicating a DisMaxQuery and let the client pass a complex string (that way you won't have to worry about changing the logic in your QParser if you get a request to structure the DisMaxQeries in a diffenret super query, the client can just do the restrcuturing). But if you're still in the spirit of the dismaax handler (query strings come from clients, qf and pf come from index owner) then i think you made teh right call. -Hoss
Re: Get access to CoreContainer
Yah, I just found it, and was going to reply to my own message with that exactly! My next question is how to get the port the request was on? On Tue, Sep 29, 2009 at 4:01 PM, Mark Miller wrote: > Jason Rutherglen wrote: >> Howdy, >> >> I was wondering what the best way is to access the current >> instance of CoreContainer? It seems like the only way to do this >> is to extend CoreAdminHandler. I'd prefer a way via a way to >> access CoreContainer from SolrCore or RequestHandlerBase. >> >> The use case is, I want to implement a SearchHandler that by >> default, searches all of the local cores by automatically >> inserting a shards param of the form >> "localhost:8080/solr/core0,localhost:8080/solr/core1" into the >> request. I'll be dynamically creating and unloading cores and so >> do not want to edit solrconfig each time a core changes. >> >> Thanks! >> > SolrCore.getCoreDescriptor().getCoreContainer() > > -- > - Mark > > http://www.lucidimagination.com > > > >
Re: Get access to CoreContainer
Jason Rutherglen wrote: > Howdy, > > I was wondering what the best way is to access the current > instance of CoreContainer? It seems like the only way to do this > is to extend CoreAdminHandler. I'd prefer a way via a way to > access CoreContainer from SolrCore or RequestHandlerBase. > > The use case is, I want to implement a SearchHandler that by > default, searches all of the local cores by automatically > inserting a shards param of the form > "localhost:8080/solr/core0,localhost:8080/solr/core1" into the > request. I'll be dynamically creating and unloading cores and so > do not want to edit solrconfig each time a core changes. > > Thanks! > SolrCore.getCoreDescriptor().getCoreContainer() -- - Mark http://www.lucidimagination.com
Get access to CoreContainer
Howdy, I was wondering what the best way is to access the current instance of CoreContainer? It seems like the only way to do this is to extend CoreAdminHandler. I'd prefer a way via a way to access CoreContainer from SolrCore or RequestHandlerBase. The use case is, I want to implement a SearchHandler that by default, searches all of the local cores by automatically inserting a shards param of the form "localhost:8080/solr/core0,localhost:8080/solr/core1" into the request. I'll be dynamically creating and unloading cores and so do not want to edit solrconfig each time a core changes. Thanks!
Re: Sorting/paging problem
: 2009-09-23T19:25:03.400Z : : 2009-09-23T19:25:19.951 : : 2009-09-23T20:10:07.919Z is that a cut/paste error, or did you really get a date back from Solr w/o the trailing "Z" ?!?!?! ... : So, not only is the date sorting wrong, but the exact same document : shows up on the next page, also still out of date order. I've seen the : same document show up in 4-5 pages in some cases. It's always the last : record on the page, too. If I change the page size, the problem seems to that is really freaking weird. can you reproduce this in a simple example? maybe an index that's small enough (and doesn't contain confidential information) that you could zip up and post online? -Hoss
Re: Get access to CoreContainer
Unfortunately, because they don't want you counting on access to the servlet request due to embedded Solr and what not, to get that type of info you have to override and use your own SolrDispatchFilter: protected void execute( HttpServletRequest req, SolrRequestHandler handler, SolrQueryRequest sreq, SolrQueryResponse rsp) { // a custom filter could add more stuff to the request before passing it on. // for example: sreq.getContext().put( "HttpServletRequest", req ); Jason Rutherglen wrote: > Yah, I just found it, and was going to reply to my own message with > that exactly! > > My next question is how to get the port the request was on? > > On Tue, Sep 29, 2009 at 4:01 PM, Mark Miller wrote: > >> Jason Rutherglen wrote: >> >>> Howdy, >>> >>> I was wondering what the best way is to access the current >>> instance of CoreContainer? It seems like the only way to do this >>> is to extend CoreAdminHandler. I'd prefer a way via a way to >>> access CoreContainer from SolrCore or RequestHandlerBase. >>> >>> The use case is, I want to implement a SearchHandler that by >>> default, searches all of the local cores by automatically >>> inserting a shards param of the form >>> "localhost:8080/solr/core0,localhost:8080/solr/core1" into the >>> request. I'll be dynamically creating and unloading cores and so >>> do not want to edit solrconfig each time a core changes. >>> >>> Thanks! >>> >>> >> SolrCore.getCoreDescriptor().getCoreContainer() >> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> >> >> >> -- - Mark http://www.lucidimagination.com
Re: Get access to CoreContainer
I'll just allow the user to pass in the port via a param for now. Thx! On Tue, Sep 29, 2009 at 4:13 PM, Mark Miller wrote: > Unfortunately, because they don't want you counting on access to the > servlet request due to embedded Solr and what not, to get that type of > info you have to override and use your own SolrDispatchFilter: > > protected void execute( HttpServletRequest req, SolrRequestHandler > handler, SolrQueryRequest sreq, SolrQueryResponse rsp) { > > // a custom filter could add more stuff to the request before > passing it on. > // for example: sreq.getContext().put( "HttpServletRequest", req ); > > > Jason Rutherglen wrote: >> Yah, I just found it, and was going to reply to my own message with >> that exactly! >> >> My next question is how to get the port the request was on? >> >> On Tue, Sep 29, 2009 at 4:01 PM, Mark Miller wrote: >> >>> Jason Rutherglen wrote: >>> Howdy, I was wondering what the best way is to access the current instance of CoreContainer? It seems like the only way to do this is to extend CoreAdminHandler. I'd prefer a way via a way to access CoreContainer from SolrCore or RequestHandlerBase. The use case is, I want to implement a SearchHandler that by default, searches all of the local cores by automatically inserting a shards param of the form "localhost:8080/solr/core0,localhost:8080/solr/core1" into the request. I'll be dynamically creating and unloading cores and so do not want to edit solrconfig each time a core changes. Thanks! >>> SolrCore.getCoreDescriptor().getCoreContainer() >>> >>> -- >>> - Mark >>> >>> http://www.lucidimagination.com >>> >>> >>> >>> >>> > > > -- > - Mark > > http://www.lucidimagination.com > > > >
Re: q.alt matching no documents
: I've been using q.alt=-*:* because *:* is said to be the most efficient way of
: querying for every document. is -*:* the most efficient way of querying for
: no document?

I don't think so ... Solr internally rewrites pure negative queries so that they are combined with a positive MatchAllDocsQuery ... which means your final query would look like (*:* -*:*) ... there's no other query optimization that would happen, so that query would produce a DisjunctionScorer that can't ever "skipTo" past any docs. Once it gets cached it shouldn't matter -- but hey, you asked.

The most efficient way I can think of (besides a new Query class like John suggested) is along the lines of what Erik mentioned...

   q.alt=match_nothing:0

...it has to be a real field or the query parsing code will freak out, but making it indexed=false will ensure that there will *never* be any terms in that field, so the term "0" will never be there, so the entire query process will short-circuit out almost immediately (it won't even construct a Scorer, let alone iterate over any docs).

-Hoss
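For anyone wanting to wire this up, a sketch of the two pieces involved (the field name match_nothing and the handler name are illustrative, not mandated by anything):

  <!-- schema.xml: a real field that can never contain any terms -->
  <field name="match_nothing" type="string" indexed="false" stored="false"/>

  <!-- solrconfig.xml: match no documents when q is absent -->
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="q.alt">match_nothing:0</str>
    </lst>
  </requestHandler>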
Re: Writing optimized index to different storage?
: Is it possible to tell Solr or Lucene, when optimizing, to write the files
: that constitute the optimized index to somewhere other than
: SOLR_HOME/data/index or is there something about the optimize that requires
: the final segment to be created in SOLR_HOME/data/index?

For what purpose?

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all?

See Also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss
Re: Index backup with new replication?
Yep, super straightforward, thanks a bunch! Guess I missed this piece of the wiki; it looks like it's going through a lot of updates toward the Solr 1.4 release..

thanks,
-Chak

ryguasu wrote:
>
> The documentation could maybe be improved, but the basics of backup
> snapshots with the in-process (Java-based) replication handler
> actually seem pretty straightforward to me, now that I understand it:
>
> 1. You can make a snapshot whenever you want by hitting
> http://master_host:port/solr/replication?command=backup
>
> 2. You can have automatically triggered snapshots at commit time or
> optimize time by putting a backupAfter tag in the replication handler
> section of your solrconfig.xml.
>
> (See http://wiki.apache.org/solr/SolrReplication)
>
> In neither case do you need to stop Solr or stop modifying your index
> while the backup is in progress.
>
> Does anything in particular seem not straightforward? I guess there's
> no built-in way to purge old indexes from disk; that's a little
> inconvenient.
>
> If you want to use the command-line tools, I think those should be
> totally compatible with the new (Java) replication tools. I don't know
> as much about them, though.
>
> 2009/9/29 KaktuChakarabati :
>>
>> Hey,
>> I noticed that with the new in-process replication it is not as
>> straightforward to have (production-serving) solr index snapshots for
>> backup (they used to be a natural byproduct of the snapshot-taking
>> process).
>> I understand there are some command-line utilities for this (abc..)
>> Can someone please explain how to use these to take a snapshot of a
>> solr index, assuming it is being used in production? What are some
>> guidelines? Should I stop other processes that might be issuing
>> updates and/or commits while taking it, or is it atomic (e.g. a hard
>> link)?
>>
>> It would be nice to have this in the wiki too, I think, for the
>> benefit of other users; having regular backup snapshots seems
>> critical..
>>
>> Thanks,
>> -Chak

--
View this message in context: http://www.nabble.com/Index-backup-with-new-replication--tp25667145p25672927.html
Sent from the Solr - User mailing list archive at Nabble.com.
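Putting both options side by side, roughly (the trigger value below is illustrative; see the SolrReplication wiki page for the full syntax):

  <!-- solrconfig.xml on the master: snapshot automatically after each optimize -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="backupAfter">optimize</str>
    </lst>
  </requestHandler>

and for an on-demand snapshot, a plain HTTP request:

  curl 'http://master_host:8983/solr/replication?command=backup'

Either way the snapshot is written alongside the live index under the data directory, so, as noted above, there is no need to pause updates or commits while it runs.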
Re: Problem getting Solr home from JNDI in Tomcat
: Hi all, I'm having problems getting Solr to start on Tomcat 6.

Which version of Solr?

: Tomcat is installed in /opt/apache-tomcat , solr is in
: /opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr .

If "solr is in /opt/apache-tomcat/webapps/solr" means that you put the solr.war in /opt/apache-tomcat/webapps/ and Tomcat expanded it into /opt/apache-tomcat/webapps/solr, then that is your problem -- Tomcat isn't even looking at your context file (it only looks at the context files to resolve URLs that it can't resolve by looking in the webapps directory).

This is why the examples of using context files on the wiki talk about keeping the war *outside* of the webapps directory, and using docBase in your Context declaration...

http://wiki.apache.org/solr/SolrTomcat

-Hoss
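For reference, the wiki's recipe looks roughly like this, adapted to the paths from the question (treat it as a sketch and double-check against the SolrTomcat page):

  <!-- /opt/apache-tomcat/conf/Catalina/localhost/solr.xml -->
  <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/opt/solr" override="true"/>
  </Context>

The key point is that solr.war lives under /opt/solr (outside webapps/), so Tomcat has to consult the context file -- and with it the solr/home JNDI entry -- to resolve the /solr URL.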
Re: Problem getting Solr home from JNDI in Tomcat
: Now I've got a completely different error:
: "org.apache.lucene.index.CorruptIndexException: Unknown format version: -9".
: I think it might be time for a fresh install...

I've added a FAQ for this...

http://wiki.apache.org/solr/FAQ#What_does_.22CorruptIndexException:_Unknown_format_version.22_mean_.3F

What does "CorruptIndexException: Unknown format version" mean?

This happens when the Lucene code in Solr used to read the index files from disk encounters index files in a format it doesn't recognize. The most common cause is using a version of Solr+Lucene that is older than the version used to create that index.

-Hoss
Re: Questions on RandomSortField
Thanks Hoss! The approach that I explained in my subsequent email works like a charm.

Cheers
Avlesh

On Wed, Sep 30, 2009 at 3:45 AM, Chris Hostetter wrote:
>
> : The question was either non-trivial or heavily uninteresting! No replies yet
>
> It's pretty non-trivial, and pretty interesting, but I'm also pretty
> behind on my solr-user email.
>
> I don't think there's any way to do what you wanted without a custom
> plugin, so your efforts weren't in vain ... if we add the ability to sort
> by a ValueSource (aka function ... there's a Jira issue for this
> somewhere) then you could also do it with a combination of functions, so
> that anything in your category gets flattened to an extremely high
> constant, and everything else has a real score -- then a secondary sort
> on a random field would effectively only randomize the things in your
> category ... but we're not there yet.
>
> : Hoss, I have a small question (RandomSortField bears your signature) - Any
> : reason as to why RandomSortField#hash() and RandomSortField#getSeed()
> : methods are private? Having them public would have saved myself from
> : "owning" a copy in my class as well.
>
> Just a general principle of API future-proofing: keep internals private
> unless you explicitly think through how subclasses will use them.
>
> I haven't thought it through all the way, but do you really need to copy
> everything? Couldn't you get the SortField/Comparator from super and
> only delegate to it if the categories both match your specific categoryId?
>
> -Hoss
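For readers who only need plain (non-category-aware) random ordering, the stock example schema already shows the pattern; a sketch (the seed suffix below is arbitrary):

  <!-- schema.xml -->
  <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
  <dynamicField name="random_*" type="random"/>

  <!-- query: sort by a random field -->
  ...&sort=random_1234 asc

Changing the suffix (random_1234 -> random_5678) changes the seed and therefore the ordering, which is handy for getting a different shuffle per user or per session.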
Number of terms in a SOLR field
Hi all,

I am attempting to test some changes I made to my DIH-based indexing process. The changes only affect the way I describe my fields in data-config.xml; there should be no changes to the way the data is indexed or stored. As a QA check I want to compare the results from indexing the same data before/after the change, so I was looking for a way of getting counts of terms in each field. I guess Luke etc. must allow this, but how?

Regards
Fergus.
Re: Number of terms in a SOLR field
Fergus McMenemie wrote:
> I am attempting to test some changes I made to my DIH-based indexing
> process. The changes only affect the way I describe my fields in
> data-config.xml; there should be no changes to the way the data is
> indexed or stored. As a QA check I want to compare the results from
> indexing the same data before/after the change, so I was looking for a
> way of getting counts of terms in each field. I guess Luke etc. must
> allow this, but how?

Luke uses a brute-force approach - it traverses all terms and counts terms per field. This is easy to implement yourself - just get the IndexReader.terms() enumeration and traverse it.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
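A minimal sketch of that approach against the Lucene 2.9-era API that ships with Solr 1.4 (the class name and index path are illustrative):

  import java.io.File;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.TermEnum;
  import org.apache.lucene.store.FSDirectory;

  public class FieldTermCounter {
    public static void main(String[] args) throws Exception {
      // Point this at the index directory, e.g. solr/data/index
      IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
      Map<String, Integer> counts = new HashMap<String, Integer>();
      TermEnum terms = reader.terms();
      while (terms.next()) {
        // Tally each distinct term under its field name
        String field = terms.term().field();
        Integer c = counts.get(field);
        counts.put(field, c == null ? 1 : c + 1);
      }
      terms.close();
      reader.close();
      for (Map.Entry<String, Integer> e : counts.entrySet()) {
        System.out.println(e.getKey() + ": " + e.getValue());
      }
    }
  }

Running it before and after the data-config.xml change and diffing the two outputs gives exactly the per-field QA comparison Fergus described. The Luke request handler (/solr/admin/luke) exposes similar per-field information over HTTP if you'd rather not write code.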