Re: Guide to using SolrQuery object
You'll find the available parameters in the various interfaces in the package org.apache.solr.common.params. For instance:

import org.apache.solr.common.params.FacetParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.common.params.TermVectorParams;

As a side note to what Shalin said, SolrQuery extends ModifiableSolrParams (just so that you are aware of that). Hope that helps a bit.

Cheers, Aleks

On Tue, 14 Jul 2009 16:27:50 +0200, Reuben Firmin reub...@benetech.org wrote:
Also, are there enums or constants around the various param names that can be passed in, or do people tend to define those themselves? Thanks! Reuben

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.com
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail
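A short SolrJ sketch of the point above (untested; field and value names are made up, and it assumes the solr-solrj jar on the classpath): since SolrQuery extends ModifiableSolrParams, the constants from these interfaces can be passed straight to set() instead of hand-typed strings.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.params.FacetParams;

public class ParamConstantsExample {
    public static SolrQuery build() {
        // SolrQuery extends ModifiableSolrParams, so the constants from the
        // params interfaces can be used as parameter names directly.
        SolrQuery query = new SolrQuery("title:solr");
        query.setFacet(true);                            // convenience setter
        query.set(FacetParams.FACET_FIELD, "category");  // instead of the literal "facet.field"
        query.set(FacetParams.FACET_LIMIT, 10);
        return query;
    }
}
```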
Re: Configure Collection Distribution in Solr 1.3
As some people have mentioned here on this mailing list, the Solr 1.3 distribution scripts (snappuller / snapshooter etc.) do not work on Windows. Some have indicated that it might be possible to use Cygwin, but I have my doubts. So unfortunately, Windows users suffer with regard to replication (although I would recommend everyone to use Unix for running servers ;) )

That being said, you can use Solr 1.4 (one of the nightly builds), where you get built-in replication that is easily configured through the Solr server configuration, and this works on Windows as well! So, if you don't have any real reason not to upgrade, I suggest that you try out Solr 1.4 (which also gives you lots of new features and major improvements!)

Cheers, Aleksander

On Tue, 09 Jun 2009 21:00:27 +0200, MaheshR mahesh.ray...@gmail.com wrote:
Hi Aleksander, I went through the links below and successfully configured rsync using Cygwin on Windows XP. The Solr documentation mentions many script files like rsyncd-enable, snapshooter, etc. These are all Unix shell scripts. Where do I get these script files for Windows? Any help on this would be greatly appreciated. Thanks, MaheshR.

Aleksander M. Stensby wrote:
You'll find everything you need in the Wiki.
http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline
http://wiki.apache.org/solr/SolrCollectionDistributionScripts
If things are still uncertain, I've written a guide from when we used the Solr distribution scripts on our Lucene index earlier. You can read that guide here:
http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53
Cheers, Aleksander

On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR mahesh.ray...@gmail.com wrote:
Hi, we configured a multi-core Solr 1.3 server in a Tomcat 6.0.18 servlet container. It's working great. Now I need to configure collection distribution to replicate index data between a master and 2 slaves.
Please provide me step by step instructions to configure collection distribution between master and slaves would be helpful. Thanks in advance. Thanks Mahesh. -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
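For reference, the built-in Solr 1.4 replication mentioned above is configured in solrconfig.xml roughly as follows (a sketch per the SolrReplication wiki page; hostnames, file names, and the poll interval are example values):

```xml
<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```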
Re: Query on date fields
Hello, for this you can simply use the nifty date functions supplied by Solr (given that you have indexed your fields with the Solr date field type). If I understand you correctly, you can achieve what you want with the following query:

displayStartDate:[* TO NOW] AND displayEndDate:[NOW TO *]

Cheers, Aleksander

On Mon, 08 Jun 2009 09:17:26 +0200, prerna07 pkhandelw...@sapient.com wrote:
Hi, I have two date attributes in my indexes: DisplayStartDate_dt and DisplayEndDate_dt. I need to fetch results where today's date lies between displayStartDate and displayEndDate. However, I cannot send a hardcoded displayStartDate and displayEndDate in the query, as there are 1000 different dates in the indexes. Please suggest the query. Thanks, Prerna

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail
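The range endpoints above use Solr's built-in date math, so nothing has to be hardcoded. With the field names from the question, and a second variant rounded to day granularity (NOW/DAY), which keeps the query text stable across requests so the query cache can help:

```
DisplayStartDate_dt:[* TO NOW] AND DisplayEndDate_dt:[NOW TO *]

DisplayStartDate_dt:[* TO NOW/DAY+1DAY] AND DisplayEndDate_dt:[NOW/DAY TO *]
```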
Re: Search Phrase Wildcard?
Solr does not support wildcards in phrase queries, yet.

Cheers, Aleks

On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun samnang.ch...@gmail.com wrote:
Hi all, I have my document like this:
<doc><name>Solr web service</name></doc>
Is there any way that I can search like startswith: "So* We*": found, "Sol*": found, "We*": not found. Cheers, Samnang

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail
Re: Search Phrase Wildcard?
Well yes :) Solr does in fact support the entire Lucene query parser syntax :)

- Aleks

On Thu, 11 Jun 2009 13:57:23 +0200, Avlesh Singh avl...@gmail.com wrote:
In fact, Lucene does not support that. Lucene supports single and multiple character wildcard searches within single terms (*not within phrase queries*). Taken from http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Wildcard%20Searches Cheers Avlesh

On Thu, Jun 11, 2009 at 4:32 PM, Aleksander M. Stensby aleksander.sten...@integrasco.no wrote:
Solr does not support wildcards in phrase queries, yet. Cheers, Aleks

On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun samnang.ch...@gmail.com wrote:
Hi all, I have my document like this:
<doc><name>Solr web service</name></doc>
Is there any way that I can search like startswith: "So* We*": found, "Sol*": found, "We*": not found. Cheers, Samnang

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail
Re: Sharding strategy
Hi Otis, thanks for your reply! You could say I'm lucky (and I totally agree, since I've made the choice of ordering the data that way :p). What you describe is what I've thought about doing, and I'm happy to read that you approve. It is always nice to know that you are not doing things completely off track - that's what I love about this mailing list!

I've implemented a sharded yellow pages that builds up the shard parameter, and it will obviously be easy to search in two shards to overcome the beginning-of-the-year situation; I just thought it might be a bit stupid to search for 1% of the data in the latest shard and the rest in shard n-1. How much of a performance decrease do you reckon I will get from searching two shards instead of one?

Anyways, thanks for confirming things, Otis! Cheers, Aleksander

On Wed, 10 Jun 2009 07:51:16 +0200, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
Aleksander, In a sense you are lucky you have time-ordered data. That makes it very easy to shard and cheaper to search - you know exactly which shards you need to query. The beginning of the year situation should also be easy. Do start with the latest shard for the current year, and go to next shard only if you have to (e.g. if you don't get enough results from the first shard). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Aleksander M. Stensby aleksander.sten...@integrasco.no To: solr-user@lucene.apache.org Sent: Tuesday, June 9, 2009 7:07:47 AM Subject: Sharding strategy
Hi all, I'm trying to figure out how to shard our index as it is growing rapidly and we want to make our solution scalable. So, we have documents that are most commonly sorted by their date. My initial thought is to shard the index by date, but I wonder if you have any input on this and how to best solve this...
I know that the most frequent queries will be executed against the latest shard, but then let's say we shard by year, how do we best solve the situation that will occur in the beginning of a new year? (Some of the data will be in the last shard, but most of it will be on the second last shard.) Would it be stupid to have a latest shard with duplicate data (always consisting of the last 6 months or something like that) and maintain that index in addition to the regular yearly shards? Any one else facing a similar situation with a good solution? Any input would be greatly appreciated :) Cheers, Aleksander --Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
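Distributed search across the yearly shards described above is driven by Solr's shards parameter, so the "search two shards at the turn of the year" case is just a longer shards list built at query time (hostnames below are hypothetical; shard entries use host:port/path without a protocol prefix):

```
http://localhost:8983/solr/select?q=text:lucene&shards=shard2008-host:8983/solr,shard2009-host:8983/solr
```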
Re: Solr Multiple Queries?
Hi there Samnang! Please see inline for comments:

On Tue, 09 Jun 2009 08:40:02 +0200, Samnang Chhun samnang.ch...@gmail.com wrote:
Hi all, I just got started looking at using Solr as my search web service. But I don't know whether Solr has some features for multiple queries:

- Startswith
This is what we call prefix queries and wildcard queries. For instance, if you want something that starts with man, you can search for man*

- Exact Match
Exact matching is done with quotation marks: "Solr rocks"

- Contain
Hmm, what do you mean by contain? Inside a given word? That might be a bit more tricky. We have an issue open at the moment for supporting leading wildcards, and that might allow you to search for *cogn* and match "recognition" etc. If that was what you meant, you can look at the ongoing issue: http://issues.apache.org/jira/browse/SOLR-218

- Doesn't Contain
NOT or - are keywords to exclude something (Solr supports all the boolean operators that Lucene supports).

- In the range
Range queries in Solr are done by using brackets. For instance, price:[500 TO 1000] will return all results with prices ranging from 500 to 1000.

There is a lot of information on the Wiki that you should check out: http://wiki.apache.org/solr/

Could anyone guide me how to implement those features in Solr? Cheers, Samnang

Cheers, Aleks

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail
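To summarize the syntax from the answers above in one place (field names are examples):

```
name:man*                 prefix ("starts with") query
name:"Solr rocks"         exact phrase match
name:solr -type:draft     exclusion (equivalently: name:solr NOT type:draft)
price:[500 TO 1000]       inclusive range query
```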
Re: Multiple queries in one, something similar to a SQL union
I don't know if I follow you correctly, but you are saying that you want X results per type? So you could do something like rows=X with the query type:Y for each type, and merge the results?

- Aleks

On Tue, 09 Jun 2009 12:33:21 +0200, Avlesh Singh avl...@gmail.com wrote:
I have an index with two fields - name and type. I need to perform a search on the name field so that *equal number of results are fetched for each type*. Currently, I am achieving this by firing multiple queries with a different type and then merging the results. In my database driven version, I used to do a union of multiple queries (and not separate SQL queries) to achieve this. Can Solr do something similar? If not, can this be a possible enhancement? Cheers Avlesh

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail
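A SolrJ sketch of the one-query-per-type approach suggested above (a hypothetical helper, untested; it requires a running Solr instance and the solr-solrj jar):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrDocument;

public class PerTypeSearch {
    // Fires one query per type, asks for the same number of rows each time,
    // and merges the result lists client-side.
    public static List<SolrDocument> search(SolrServer server, String name,
                                            String[] types, int perType)
            throws SolrServerException {
        List<SolrDocument> merged = new ArrayList<SolrDocument>();
        for (String type : types) {
            SolrQuery q = new SolrQuery("name:(" + name + ")");
            q.addFilterQuery("type:" + type); // restrict each query to one type
            q.setRows(perType);               // equal number of rows per type
            merged.addAll(server.query(q).getResults());
        }
        return merged;
    }
}
```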
Sharding strategy
Hi all, I'm trying to figure out how to shard our index as it is growing rapidly and we want to make our solution scalable. So, we have documents that are most commonly sorted by their date. My initial thought is to shard the index by date, but I wonder if you have any input on this and how to best solve this... I know that the most frequent queries will be executed against the latest shard, but then let's say we shard by year, how do we best solve the situation that will occur in the beginning of a new year? (Some of the data will be in the last shard, but most of it will be on the second last shard.) Would it be stupid to have a latest shard with duplicate data (always consisting of the last 6 months or something like that) and maintain that index in addition to the regular yearly shards? Any one else facing a similar situation with a good solution? Any input would be greatly appreciated :) Cheers, Aleksander -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
StreamingUpdateSolrServer recommendations?
Hi all, I guess this question is mainly aimed at you, Ryan. I've been trying out your StreamingUpdateSolrServer implementation for indexing, and clearly see the improvement in indexing times compared to the CommonsHttpSolrServer :) Great work!

My question is: do you have any recommendations as to what values I should use / have you found a sweet spot? What are the trade-offs? Thread count is obviously tied to the number of CPUs available, but what about the queue size? Any thoughts? I tried 20 / 3 as you have posted in the issue thread, and get averages of about 80 documents/sec (and I have not optimized the document processing etc., which takes the larger part of the time). Anyways, I was just curious what others are using (and what times you are getting).

Keep up the good work! Aleks

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail
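For reference, the "20 / 3" values discussed above map onto the StreamingUpdateSolrServer constructor like this (a sketch; the URL is an assumption, and it requires a running Solr instance plus the solr-solrj jar):

```java
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

public class IndexerSetup {
    public static StreamingUpdateSolrServer create() throws Exception {
        // queueSize=20, threadCount=3 -- the values mentioned in the thread;
        // tune both against your CPU count and average document size.
        return new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 3);
    }
}
```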
Re: Terms Component
You can try out the nightly build of solr (which is the solr 1.4 dev version) containing all the new nice and shiny features of Solr 1.4:) To use Terms Component you simply need to configure the handler as explained in the documentation / wiki. Cheers, Aleksander On Mon, 08 Jun 2009 14:22:15 +0200, Anshuman Manur anshuman_ma...@stragure.com wrote: while on the subject, can anybody tell me when Solr 1.4 might come out? Thanks Anshuman Manur On Mon, Jun 8, 2009 at 5:37 PM, Anshuman Manur anshuman_ma...@stragure.comwrote: I'm using Solr 1.3 apparently.and Solr 1.4 is not out yet. Sorry..My mistake! On Mon, Jun 8, 2009 at 5:18 PM, Anshuman Manur anshuman_ma...@stragure.com wrote: Hello, I want to use the terms component in Solr 1.4: But http://localhost:8983/solr/terms?terms.fl=name But, I get the following error with the above query: java.lang.NullPointerException at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37) at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104) at org.apache.solr.search.QParser.getQuery(QParser.java:88) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:148) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:84) at javax.servlet.http.HttpServlet.service(HttpServlet.java:690) at javax.servlet.http.HttpServlet.service(HttpServlet.java:803) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:295) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568) at org.ofbiz.catalina.container.CrossSubdomainSessionValve.invoke(CrossSubdomainSessionValve.java:44) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Any help would be great. Thanks Anshuman Manur -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
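The handler configuration the reply refers to looks roughly like this in solrconfig.xml (following the TermsComponent wiki example; the component and handler names are conventional, not mandatory). The NullPointerException above comes from hitting a handler that expects a q parameter; with a dedicated /terms handler registered, a request like http://localhost:8983/solr/terms?terms.fl=name should work:

```xml
<searchComponent name="termsComponent" class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>
```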
Re: Configure Collection Distribution in Solr 1.3
You'll find everything you need in the Wiki.
http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline
http://wiki.apache.org/solr/SolrCollectionDistributionScripts

If things are still uncertain, I've written a guide from when we used the Solr distribution scripts on our Lucene index earlier. You can read that guide here:
http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53

Cheers, Aleksander

On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR mahesh.ray...@gmail.com wrote:
Hi, we configured a multi-core Solr 1.3 server in a Tomcat 6.0.18 servlet container. It's working great. Now I need to configure collection distribution to replicate index data between a master and 2 slaves. Please provide me step by step instructions to configure collection distribution between master and slaves. Thanks in advance. Thanks Mahesh.

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail
Re: Initialising of CommonsHttpSolrServer in Spring framwork
Out of the box, the simplest way to configure CommonsHttpSolrServer through a Spring application context is to simply define the bean for the server and inject it into whatever class you have that will use it, like Avlesh shared below.

  <bean id="httpSolrServer" class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
    <constructor-arg>
      <value>http://localhost:8080/solr/core0</value>
    </constructor-arg>
  </bean>

You can also set the connection parameters like Avlesh did with the HttpClient in the context, or directly in the init method of your implementation. Inject it with a property:

  <property name="solrServer">
    <ref bean="httpSolrServer"/>
  </property>

It is a bit more tricky with the embedded Solr server, since you need to also register cores etc. We solved that by creating a core configuration loader class.

- Aleks

On Sat, 09 May 2009 03:08:25 +0200, Avlesh Singh avl...@gmail.com wrote:
I am giving you a detailed sample of my Spring usage.

  <bean id="solrHttpClient" class="org.apache.commons.httpclient.HttpClient">
    <property name="httpConnectionManager">
      <bean class="org.apache.commons.httpclient.MultiThreadedHttpConnectionManager">
        <property name="maxConnectionsPerHost" value="10"/>
        <property name="maxTotalConnections" value="10"/>
      </bean>
    </property>
  </bean>

  <bean id="mySearchImpl" class="com.me.search.MySearchSolrImpl">
    <property name="core1">
      <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
        <constructor-arg value="http://localhost/solr/core1"/>
        <constructor-arg ref="solrHttpClient"/>
      </bean>
    </property>
    <property name="core2">
      <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
        <constructor-arg value="http://localhost/solr/core2"/>
        <constructor-arg ref="solrHttpClient"/>
      </bean>
    </property>
  </bean>

Hope this helps. Cheers Avlesh

On Sat, May 9, 2009 at 12:39 AM, sachin78 tendulkarsachi...@gmail.com wrote:
Ranjeeth, did you figure out how to do this? If yes, can you share with me how you did it? An example bean definition in XML would be helpful. --Sachin

Funtick wrote:
Use constructor and pass URL parameter.
Nothing SPRING related... Create a Spring bean with attributes 'MySolr', 'MySolrUrl', and 'init' method... 'init' will create instance of CommonsHttpSolrServer. Configure Spring... I am using Solr 1.3 and Solrj as a Java Client. I am Integarating Solrj in Spring framwork, I am facing a problem, Spring framework is not inializing CommonsHttpSolrServer class, how can I define this class to get the instance of SolrServer to invoke furthur method on this. -- View this message in context: http://www.nabble.com/Initialising-of-CommonsHttpSolrServer-in-Spring-framwork-tp18808743p23451795.html Sent from the Solr - User mailing list archive at Nabble.com. -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
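The "directly in the init method of your implementation" option mentioned above could look roughly like this (a sketch; class name and URL are hypothetical, and it needs the solr-solrj and commons-httpclient jars):

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class MySearchSolrImpl {
    private CommonsHttpSolrServer server;

    // Alternative to XML wiring: build the server yourself in a Spring
    // init-method (<bean ... init-method="init">).
    public void init() throws Exception {
        HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager());
        server = new CommonsHttpSolrServer("http://localhost:8080/solr/core0", client);
    }
}
```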
Re: How do I accomplish this (semi-)complicated setup?
and other data. All this needs to be indexed. The complication comes in when we have private repositories. Only select users have access to these, but we still need to index them. How would I go about accomplishing this? I can't think of a clean way to do it. Any pointers much appreciated. Jesper - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no Please consider the environment before printing all or any of this e-mail
Re: Solrj: Getting response attributes from QueryResponse
Hello there Mark! With SolrJ, you can simply do the following: server.query(q) returns a QueryResponse, and the QueryResponse has the method getResults(), which returns a SolrDocumentList. This is an extended list containing SolrDocuments, but it also exposes methods such as getNumFound(), which is exactly what you are looking for! So you could do something like this:

long hits = solrServer.query(q).getResults().getNumFound();

(note that getNumFound() returns a long), and you have similar methods for the other attributes, like results.getMaxScore() and results.getStart().

Hope that helps. Cheers, and merry Christmas! Aleks

On Fri, 19 Dec 2008 21:22:48 +0100, Mark Ferguson mark.a.fergu...@gmail.com wrote:
Hello, I am trying to get the numFound attribute from a returned QueryResponse object, but for the life of me I can't find where it is stored. When I view a response in XML format, it is stored as an attribute on the response node, e.g.:

<result name="response" numFound="207" start="5" maxScore="4.1191907">

However, I can't find a way to retrieve these attributes (numFound, start and maxScore). When I look at the QueryResponse itself, I can see that the attributes are being stored somewhere, because the toString method returns them. For example, queryResponse.toString() returns:

{responseHeader={status=0,QTime=139,params={wt=javabin,hl=true,rows=15,version=2.2,fl=urlmd5,start=0,q=java}},response={*numFound=1228*,start=0,maxScore=3.633028,docs=[SolrDocument[{urlmd5=...

The problem is that when I call queryResponse.get('response'), all I get is the list of SolrDocuments, I don't have any other attributes. Am I missing something or are these attributes just not publicly available? If they're not, shouldn't they be? Thanks a lot, Mark Ferguson

--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
Please consider the environment before printing all or any of this e-mail
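Put together as a compilable sketch (untested; it requires a running Solr instance and the solr-solrj jar, and the query string is an example):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class ResponseAttributes {
    public static void show(SolrServer server) throws Exception {
        QueryResponse rsp = server.query(new SolrQuery("java"));
        SolrDocumentList results = rsp.getResults();
        long numFound = results.getNumFound();  // the numFound response attribute
        long start = results.getStart();        // the start offset
        Float maxScore = results.getMaxScore(); // may be null if scores were not returned
        System.out.println(numFound + " hits, start=" + start + ", maxScore=" + maxScore);
    }
}
```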
Re: What are the scenarios when a new Searcher is created ?
When adding documents to Solr, the searcher will not be replaced, but once you do a commit, (depending on settings) a new searcher will be opened and warmed up while the old searcher is still open and used for searching. Once the new searcher has finished its warmup procedure, the old searcher will be replaced with the new, warmed searcher, which will then let you search the newest documents added to the index.

- Aleks

On Mon, 01 Dec 2008 01:32:05 +0100, souravm [EMAIL PROTECTED] wrote:
Hi All, Say I have started a new Solr server instance using the start.jar in java command. Now, for this Solr server instance, when would a new Searcher be created? I am aware of the following scenarios - 1. When the instance is started, a new Searcher is created for autowarming. But I am not sure whether this searcher will continue to be alive or will die after the autowarming is over. 2. When I do the first search in this server instance through select, a new searcher would be created, and from then on the same searcher would be used for all selects to this instance. Even if I run multiple search requests concurrently, I see that the same Searcher is used to service those requests. 3. When I try to add to the index of this instance through an update statement, a new searcher is created. Please let me know if there are any other situations when a new Searcher is created. Regards, Sourav

CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail.
You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS*** -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
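The warmup procedure described above is driven by the newSearcher event in solrconfig.xml; a listener can replay representative queries against the new searcher before it goes live (query values below are examples):

```xml
<!-- solrconfig.xml: these queries run against a newly opened searcher
     while it warms, before it replaces the current searcher -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">solr</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```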
Re: Keyword extraction
Hi again Patrick. Glad to hear that we can contribute to help you guys. That's what this mailing list is for :)

First of all, I think you used the wrong parameter to get your terms. Take a look at http://lucene.apache.org/solr/api/org/apache/solr/common/params/MoreLikeThisParams.html to see the supported params. In your string you use mlt.displayTerms=list, which I believe should be mlt.interestingTerms=list.

If that doesn't work: one thing you should know is that, from what I can tell, you are using the StandardRequestHandler in your querying. The StandardRequestHandler supports a simplified handling of more-like-this queries, namely: it returns similar documents for each document in the response set. It supports the common mlt parameters, needs mlt=true (as you have done), and supports a mlt.count parameter to specify the number of similar documents returned for each matching doc from your query.

If you want to get the top keywords etc. (and, in essence, for your mlt.interestingTerms=list parameter to have any effect at all, if I'm not completely wrong), you will need to configure a MoreLikeThisHandler in your solrconfig.xml and then map that to your query. From the sample configuration file: incoming queries will be dispatched to the correct handler based on the path or the qt (query type) param. Names starting with a '/' are accessed with a path equal to the registered name. Names without a leading '/' are accessed with: http://host/app/select?qt=name. If no qt is defined, the requestHandler that declares default=true will be used. You can read about the MoreLikeThisHandler here: http://wiki.apache.org/solr/MoreLikeThisHandler

Once you have it configured properly, your query would be something like: http://localhost:8983/solr/mlt?q=amsterdam&mlt.fl=text&mlt.interestingTerms=list&mlt=true (I don't think you need the mlt=true here, though...)
or http://localhost:8983/solr/select?qt=mlt&q=amsterdam&mlt.fl=text&mlt.interestingTerms=list&mlt=true (in the last example I use qt=mlt). Hope this helps. Regards, Aleksander

On Thu, 27 Nov 2008 11:49:30 +0100, Plaatje, Patrick [EMAIL PROTECTED] wrote:
Hi Aleksander, With all the help of you and the other comments, we're now at a point where a MoreLikeThis list is returned, and shows 10 related records. However on the query executed there are no keywords whatsoever being returned. Is the querystring still wrong or is something else required? The querystring we're currently executing is: http://suempnr3:8080/solr/select/?q=amsterdam&mlt.fl=text&mlt.displayTerms=list&mlt=true Best, Patrick

-Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 15:07 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction
Ah, yes, That is important. In lucene, the MLT will see if the term vector is stored, and if it is not it will still be able to perform the querying, but in a much much much less efficient way.. Lucene will analyze the document (and the variable DEFAULT_MAX_NUM_TOKENS_PARSED will be used to limit the number of tokens that will be parsed). (don't want to go into details on this since I haven't really dug through the code:p) But when the field isn't stored either, it is rather difficult to re-analyze the document;) On a general note, if you want to really understand how the MLT works, take a look at the wiki or read this thorough blog post: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ Regards, Aleksander

On Wed, 26 Nov 2008 14:41:52 +0100, Plaatje, Patrick [EMAIL PROTECTED] wrote:
Hi Aleksander, This was a typo on my end, the original query included a semicolon instead of an equal sign. But I think it has to do with my field not being stored and not being identified as termVectors=true. I'm recreating the index now, and see if this fixes the problem.
Best, patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 14:37 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction Hi there! Well, first of all i think you have an error in your query, if I'm not mistaken. You say http://localhost:8080/solr/select/?q=id=18477975... but since you are referring to the field called id, you must say: http://localhost:8080/solr/select/?q=id:18477975... (use colon instead of the equals sign). I think that will do the trick. If not, try adding the debugQuery=on at the end of your request url, to see debug output on how the query is parsed and if/how any documents are matched against your query. Hope this helps. Cheers, Aleksander On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick [EMAIL PROTECTED] wrote: Hi Aleksander, Thanx for clearing this up. I am confident that this is a way to explore for me as I'm just starting to grasp the matter. Do you know why I'm not getting any results with the query posted earlier
Re: Keyword extraction
Hi there! Well, first of all I think you have an error in your query, if I'm not mistaken. You say http://localhost:8080/solr/select/?q=id=18477975... but since you are referring to the field called id, you must say: http://localhost:8080/solr/select/?q=id:18477975... (use a colon instead of the equals sign). I think that will do the trick. If not, try adding debugQuery=on at the end of your request URL, to see debug output on how the query is parsed and if/how any documents are matched against your query.

Hope this helps. Cheers, Aleksander

On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick [EMAIL PROTECTED] wrote:
Hi Aleksander, Thanks for clearing this up. I am confident that this is a way to explore for me, as I'm just starting to grasp the matter. Do you know why I'm not getting any results with the query posted earlier, then? It gives me the following only:

<lst name="moreLikeThis"><result name="18477975" numFound="0" start="0"/></lst>

instead of delivering details of the interestingTerms. Thanks in advance, Patrick

-Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 13:03 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction
I do not agree with you at all. The concept of MoreLikeThis is based on the fundamental idea of TF-IDF weighting, and not term frequency alone. Please take a look at: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html As you can see, it is possible to use cut-off thresholds to significantly reduce the number of unimportant terms, and generate highly suitable queries based on the tf-idf frequency of the term, since as you point out, high frequency terms alone tend to be useless for querying, but taking the document frequency into account drastically increases the importance of the term!
In solr, use parameters to manipulate your desired results: http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c For instance: mlt.mintf - Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. mlt.mindf - Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. You can also set thresholds for term length etc. Hope this gives you a better idea of things. - Aleks On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie [EMAIL PROTECTED] wrote: Dear Patrick, I had the same problem with the MoreLikeThis function. After briefly reading and analyzing the source code of the moreLikeThis function in solr, I concluded: MoreLikeThis uses term vectors to rank all the terms from a document by their frequency. According to this ranking, it will start to generate queries, artificially, and search for documents. So, moreLikeThis will retrieve related documents by artificially generating queries based on the most frequent terms. There's a big problem with the most frequent terms from documents. The most frequent words are usually meaningless, so-called function words, or, as people from Information Retrieval like to call them, stopwords. However, ignoring technical problems of the implementation of the moreLikeThis function, this approach is very dangerous, since queries are generated artificially based on a given document. Writing queries for retrieving a document is a human task, and it assumes some knowledge (the user knows what document he wants). I advise using other approaches, depending on your expectations. For example, you can extract similar documents just by searching for documents with a similar title (more like this doesn't work in this case).
I hope it helps, Best Regards, Vitalie Scurtu --- On Wed, 11/26/08, Plaatje, Patrick [EMAIL PROTECTED] wrote: From: Plaatje, Patrick [EMAIL PROTECTED] Subject: RE: Keyword extraction To: solr-user@lucene.apache.org Date: Wednesday, November 26, 2008, 10:52 AM Hi All, as an addition to my previous post, no interestingTerms are returned when I execute the following URL: http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true I get a moreLikeThis list though, any thoughts? Best, Patrick -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
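Putting the corrected syntax and the MLT parameters from this thread together, here is a small, stdlib-only Java sketch that assembles such a request URL. The MltUrl class and its buildUrl helper are hypothetical illustrations (not part of Solr or SolrJ); the parameter names (q, mlt, mlt.fl, mlt.mintf, mlt.mindf, mlt.interestingTerms) are the ones discussed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles a Solr request URL from a parameter map.
// For brevity, values are assumed to need no URL-encoding.
public class MltUrl {
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "id:18477975");      // colon, not '=': query the id field
        p.put("mlt", "true");           // enable the MoreLikeThis component
        p.put("mlt.fl", "text");        // field(s) to pull similar terms from
        p.put("mlt.mintf", "2");        // ignore terms occurring < 2 times in the doc
        p.put("mlt.mindf", "5");        // ignore terms found in < 5 documents
        p.put("mlt.interestingTerms", "list");
        System.out.println(buildUrl("http://localhost:8080/solr/select/", p));
    }
}
```

Adding debugQuery=on, as suggested above, is then a one-line change to the map when a response needs inspecting.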
Re: Keyword extraction
I'm sure that for certain problems and cases you will need to do quite a bit of tweaking to make it work (to suit your needs), but I responded to your statement because you made it sound like the MoreLikeThis component does not work at all for its purpose, while it actually does work as intended and can be of great aid in constructing queries to retrieve same-topic documents etc. - Aleksander On Wed, 26 Nov 2008 14:10:57 +0100, Scurtu Vitalie [EMAIL PROTECTED] wrote: Yes, I totally understand, and agree. MoreLikeThis uses TF-IDF to rank terms, then it generates queries based on the top-ranked terms. In any case, I wasn't able to make it work after many attempts. Finally, I've used a different method for query generation, and it works better, or at least gives some results, while with moreLikeThis results were poor or there was no result at all. To mention that my index was composed of short documents, therefore the intersection between top-ranked terms by TF-IDF was an empty set. MoreLikeThis works better when you have long documents. Yes, I've changed the thresholds for min TFIDF and max TFIDF, and other parameters. I've also used the mlt.maxqt parameter to increase the number of terms used in query generation, but it still didn't work well, since the method of query generation based on terms with the highest TF-IDF score doesn't generate a representative query for the document. I wasn't able to tune it. For a low value such as mlt.maxqt=3,4, results were poor, while for mlt.maxqt=5,6 it gave too many and irrelevant results. Thank you, Best Wishes, Vitalie Scurtu --- On Wed, 11/26/08, Aleksander M. Stensby [EMAIL PROTECTED] wrote: From: Aleksander M. Stensby aleksander. [EMAIL PROTECTED] Subject: Re: Keyword extraction To: solr-user@lucene.apache.org Date: Wednesday, November 26, 2008, 1:03 PM I do not agree with you at all. The concept of MoreLikeThis is based on the fundamental idea of TF-IDF weighting, and not term frequency alone.
Please take a look at: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html As you can see, it is possible to use cut-off thresholds to significantly reduce the number of unimportant terms, and generate highly suitable queries based on the tf-idf frequency of the term, since as you point out, high frequency terms alone tends to be useless for querying, but taking the document frequency into account drastically increases the importance of the term! In solr, use parameters to manipulate your desired results: http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c For instance: mlt.mintf - Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. mlt.mindf - Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. You can also set thresholds for term length etc. Hope this gives you a better idea of things. - Aleks On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie [EMAIL PROTECTED] wrote: Dear Partick, I had the same problem with MoreLikeThis function. After briefly reading and analyzing the source code of moreLikeThis function in solr, I conducted: MoreLikeThis uses term vectors to ranks all the terms from a document by its frequency. According to its ranking, it will start to generate queries, artificially, and search for documents. So, moreLikeThis will retrieve related documents by artificially generating queries based on most frequent terms. There's a big problem with most frequent terms from documents. Most frequent words are usually meaningless, or so called function words, or, people from Information Retrieval like to call them stopwords. However, ignoring technical problems of implementation of moreLikeThis function, this approach is very dangerous, since queries are generated artificially based on a given document. 
Writting queries for retrieving a document is a human task, and it assumes some knowledge (user knows what document he wants). I advice to use others approaches, depending on your expectation. For example, you can extract similar documents just by searching for documents with similar title (more like this doesn't work in this case). I hope it helps, Best Regards, Vitalie Scurtu --- On Wed, 11/26/08, Plaatje, Patrick [EMAIL PROTECTED] wrote: From: Plaatje, Patrick [EMAIL PROTECTED] Subject: RE: Keyword extraction To: solr-user@lucene.apache.org Date: Wednesday, November 26, 2008, 10:52 AM Hi All, as an addition to my previous post, no interestingTerms are returned when i execute the folowing url: http://localhost:8080/solr/select/?q=id=18477975mlt.fl=textmlt.interes tingTerms=listmlt=truemlt.match.include=true I get a moreLikeThis list though, any thoughts? Best, Patrick --Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
Re: Keyword extraction
Ah, yes, That is important. In lucene, the MLT will see if the term vector is stored, and if it is not it will still be able to perform the querying, but in a much much much less efficient way.. Lucene will analyze the document (and the variable DEFAULT_MAX_NUM_TOKENS_PARSED will be used to limit the number of tokens that will be parsed). (don't want to go into details on this since I haven't really dug through the code:p) But when the field isn't stored either, it is rather difficult to re-analyze the document;) On a general note, if you want to really understand how the MLT works, take a look at the wiki or read this thorough blog post: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ Regards, Aleksander On Wed, 26 Nov 2008 14:41:52 +0100, Plaatje, Patrick [EMAIL PROTECTED] wrote: Hi Aleksander, This was a typo on my end, the original query included a semicolon instead of an equal sign. But I think it has to do with my field not being stored and not being identified as termVectors=true. I'm recreating the index now, and see if this fixes the problem. Best, patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 14:37 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction Hi there! Well, first of all i think you have an error in your query, if I'm not mistaken. You say http://localhost:8080/solr/select/?q=id=18477975... but since you are referring to the field called id, you must say: http://localhost:8080/solr/select/?q=id:18477975... (use colon instead of the equals sign). I think that will do the trick. If not, try adding the debugQuery=on at the end of your request url, to see debug output on how the query is parsed and if/how any documents are matched against your query. Hope this helps. Cheers, Aleksander On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick [EMAIL PROTECTED] wrote: Hi Aleksander, Thanx for clearing this up. 
I am confident that this is a way to explore for me as I'm just starting to grasp the matter. Do you know why I'm not getting any results with the query posted earlier then? It gives me the folowing only: lst name=moreLikeThis result name=18477975 numFound=0 start=0/ /lst Instead of delivering details of the interestingTerms. Thanks in advance Patrick -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: woensdag 26 november 2008 13:03 To: solr-user@lucene.apache.org Subject: Re: Keyword extraction I do not agree with you at all. The concept of MoreLikeThis is based on the fundamental idea of TF-IDF weighting, and not term frequency alone. Please take a look at: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/simil ar/MoreLikeThis.html As you can see, it is possible to use cut-off thresholds to significantly reduce the number of unimportant terms, and generate highly suitable queries based on the tf-idf frequency of the term, since as you point out, high frequency terms alone tends to be useless for querying, but taking the document frequency into account drastically increases the importance of the term! In solr, use parameters to manipulate your desired results: http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e2 2ec5d1519c456b2c For instance: mlt.mintf - Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. mlt.mindf - Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. You can also set thresholds for term length etc. Hope this gives you a better idea of things. - Aleks On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie [EMAIL PROTECTED] wrote: Dear Partick, I had the same problem with MoreLikeThis function. After briefly reading and analyzing the source code of moreLikeThis function in solr, I conducted: MoreLikeThis uses term vectors to ranks all the terms from a document by its frequency. 
According to its ranking, it will start to generate queries, artificially, and search for documents. So, moreLikeThis will retrieve related documents by artificially generating queries based on most frequent terms. There's a big problem with most frequent terms from documents. Most frequent words are usually meaningless, or so called function words, or, people from Information Retrieval like to call them stopwords. However, ignoring technical problems of implementation of moreLikeThis function, this approach is very dangerous, since queries are generated artificially based on a given document. Writting queries for retrieving a document is a human task, and it assumes some knowledge (user knows what document he wants). I advice to use others approaches, depending on your expectation. For example, you can extract similar documents just by searching for documents with similar title (more like this doesn't work in this case). I hope
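As the exchange above explains, MoreLikeThis is far cheaper when the source field stores term vectors, so Solr can read the terms directly instead of re-analyzing the document. A minimal schema.xml field definition for that would look like the sketch below (the field name text and type text are assumptions, matching the mlt.fl=text used earlier in the thread):

```xml
<!-- Storing term vectors lets MoreLikeThis read a document's terms
     directly instead of re-analyzing the stored content at query time. -->
<field name="text" type="text" indexed="true" stored="true"
       termVectors="true"/>
```

After changing the schema this way, the index must be rebuilt for existing documents to get term vectors.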
Re: Keyword extraction
I do not agree with you at all. The concept of MoreLikeThis is based on the fundamental idea of TF-IDF weighting, and not term frequency alone. Please take a look at: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html As you can see, it is possible to use cut-off thresholds to significantly reduce the number of unimportant terms, and generate highly suitable queries based on the tf-idf frequency of the term, since as you point out, high frequency terms alone tend to be useless for querying, but taking the document frequency into account drastically increases the importance of the term! In solr, use parameters to manipulate your desired results: http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c For instance: mlt.mintf - Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. mlt.mindf - Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. You can also set thresholds for term length etc. Hope this gives you a better idea of things. - Aleks On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie [EMAIL PROTECTED] wrote: Dear Patrick, I had the same problem with the MoreLikeThis function. After briefly reading and analyzing the source code of the moreLikeThis function in solr, I concluded: MoreLikeThis uses term vectors to rank all the terms from a document by their frequency. According to this ranking, it will start to generate queries, artificially, and search for documents. So, moreLikeThis will retrieve related documents by artificially generating queries based on the most frequent terms. There's a big problem with the most frequent terms from documents. The most frequent words are usually meaningless, so-called function words, or, as people from Information Retrieval like to call them, stopwords.
However, ignoring technical problems of the implementation of the moreLikeThis function, this approach is very dangerous, since queries are generated artificially based on a given document. Writing queries for retrieving a document is a human task, and it assumes some knowledge (the user knows what document he wants). I advise using other approaches, depending on your expectations. For example, you can extract similar documents just by searching for documents with a similar title (more like this doesn't work in this case). I hope it helps, Best Regards, Vitalie Scurtu --- On Wed, 11/26/08, Plaatje, Patrick [EMAIL PROTECTED] wrote: From: Plaatje, Patrick [EMAIL PROTECTED] Subject: RE: Keyword extraction To: solr-user@lucene.apache.org Date: Wednesday, November 26, 2008, 10:52 AM Hi All, as an addition to my previous post, no interestingTerms are returned when I execute the following URL: http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true I get a moreLikeThis list though, any thoughts? Best, Patrick -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
Re: Can a lucene document be used in solr?
Hello there, do you mean a Lucene Document, or are you asking whether it is possible to use an existing Lucene index with Solr? In the latter case, the answer is yes, since Solr is built on top of Lucene. But it requires you to configure your schema.xml to correlate to the index structure of your existing Lucene index. On the question of documents, Solr will take what is called a SolrInputDocument as input if you are using solrj, or xml if you are using http. I don't know if that answered your question or not. Regards, Aleksander On Thu, 27 Nov 2008 05:55:06 +0100, Sajith Vimukthi [EMAIL PROTECTED] wrote: Hi all, Can someone of you all tell me whether I can use a lucene document in solr? Regards, Sajith Vimukthi Weerakoon Associate Software Engineer | ZONE24X7 | Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 | http://www.zone24x7.com -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
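To illustrate the XML route mentioned above: over HTTP, a document is posted to Solr's update handler in the standard XML update format. A minimal example (the field names id and title are assumptions for illustration and must exist in your schema.xml):

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">An example document</field>
  </doc>
</add>
```

This body is POSTed to the /update handler, followed by a <commit/> to make the document searchable.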
Re: facet.sort and distributed search
This is a known issue, but take a look at the following jira issue and the patch supplied there: https://issues.apache.org/jira/browse/SOLR-764 Haven't tried it myself, but I believe it should do the trick for you. Hope that helps. Cheers, Aleksander On Wed, 26 Nov 2008 22:53:21 +0100, Grégoire Neuville [EMAIL PROTECTED] wrote: Hi, I'm working on a web application, one functionality of which consists in presenting to the user a list of terms to enter in a form field, sorted alphabetically. As long as one single index was concerned, I used solr facets to produce the list and it worked fine. But I must now deal with several indices, and thus use the distributed search capability of solr, which forbids the use of facet.sort=false. I would like to know if someone plans to, or is even working on, the implementation of the natural facet sorting in case of a distributed search. Thanks a lot, -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
Re: Unique id
Hello again. I'm getting a bit confused by your questions, and I believe it would be easier for us to help you if you could post the field definitions from your schema.xml and the structure of your two database views, e.g. table 1: (id (int), subject (string), ...) table 2: (category (string), other fields ...) So please post this and we can try to help you. - Aleks On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Thanks Erik. If I convert that to a string then the id field defined in schema.xml would fail as I have that as integer. If I change that to string then the first view would fail as it is Integer there. What to do in such scenarios? Do I need to define multiple schema.xml files or multiple unique key definitions in the same schema? How does this work? Pls explain. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2008 6:40 PM To: solr-user@lucene.apache.org Subject: Re: Unique id I'd suggest aggregating those three columns into a string that can serve as the Solr uniqueKey field value. Erik On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote: Basically, I am working on two views. First one has an ID column. The second view has no unique ID column. What to do in such situations? There are 3 other columns where I can make a composite key out of those. I have to index these two views now. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 5:24 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Technically, no, a uniqueKey field is NOT required. I've yet to run into a situation where it made sense not to use one though. As for indexing database tables - if one of your tables doesn't have a primary key, does it have an aggregate unique key of some sort? Do you plan on updating the rows in that table and reindexing them? Seems like some kind of unique key would make sense for updating documents.
But yeah, a more detailed description of your table structure and searching needs would be helpful. Erik On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote: Yes it is. You need a unique id because the add method works as an add-or-update method. When adding a document whose ID is already found in the index, the old document will be deleted and the new one will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID? - Aleksander On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Hi, Is the uniqueKey in schema.xml really required? Reason is, I am indexing two tables and I have id as unique key in schema.xml but the id field is not there in one of the tables and indexing fails. Do I really require this unique field for Solr to index it better or can I do away with this? Thanks, Rahgu -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
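Erik's suggestion of aggregating several columns into one uniqueKey string can be as simple as joining them with a separator that never occurs in the data. A minimal sketch (the class, method, and column names below are hypothetical; the '|' separator is an assumption):

```java
// Sketch: derive a composite Solr uniqueKey from several non-unique columns.
// Pick a separator that cannot appear in any of the column values.
public class CompositeKey {
    static String compositeKey(String category, String personId, String queueId) {
        return String.join("|", category, personId, queueId);
    }

    public static void main(String[] args) {
        System.out.println(compositeKey("tasks", "42", "7")); // tasks|42|7
    }
}
```

Note that this implies declaring the uniqueKey field as a string type in schema.xml, which is exactly the integer-vs-string conflict discussed above.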
Re: solrQueryParser does not take effect - nightly build
That sounds a bit strange. Did you make the changes in the schema.xml before starting the server? Because if you change it while it is running, it will by default delete and replace the file (discarding any changes you make). In other words, make sure the server is not running, make your changes and then start up the server. Apart from that, I can't really see any reason for this to not work... - Aleks On Thu, 20 Nov 2008 22:03:30 +0100, ashokc [EMAIL PROTECTED] wrote: Hi, I have set <solrQueryParser defaultOperator="AND"/> but it is not taking effect. It continues to take it as OR. I am working with the latest nightly build 11/20/2008 For a query like term1 term2 Debug shows <str name="parsedquery">content:term1 content:term2</str> Bug? Thanks - ashok
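For reference, the relevant line lives in schema.xml and should be edited while the server is stopped, as described above:

```xml
<!-- Make the default boolean operator AND instead of OR
     for queries like "term1 term2". -->
<solrQueryParser defaultOperator="AND"/>
```

With this in effect, the debug output for "term1 term2" should show the two clauses as required rather than optional.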
Re: Unique id
Ok, this brings me to the question: how are the two views connected to each other (since you are indexing partly view 1 and partly view 2 into a single index structure)? If they are not at all connected I believe you have made a fundamental mistake / misunderstood the use of your index... I assume that a Task can be assigned to a person, and your Team view displays that person, right? Maybe you are doing something like this: View 1 1, somename, sometimestamp, someothertimestamp 2, someothername, somethirdtimestamp, timetamp4 ... View 2 1, 58, 0 2, 58, 1 3, 52, 0 ... I'm really confused about your database structure... To me, it would be logical to add a team_id field to the team table, and add a third table to link tasks to a team (or to individual persons). Once you have that information (because I do assume there MUST be some link there) you would do: insert into your index: (id from the task), (name of the task), (id of the person assigned to this task), (id of the department that this person works in). I guess that you _might_ be thinking a bit wrong and trying to do something like this: Treat each view as an independent view, and insert values from each table as separate documents in the index, so you would do: insert into your index: (id from the task), (name of the task), (no value), (no value) which will be ok to do (no value), (no value), (id of the person), (id of the department) --- which makes no sense to me... So, can you clarify the relationship between the two views, and how you are thinking of inserting entries into your index? - Aleks On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: View structure is: 1. Task(id* (int), name (string), start (timestamp), end (timestamp)) 2.
Team(person_id (int), deptId (int), isManager (int)) * is primary key In schema.xml I have <field name="id" type="integer" indexed="true" stored="true" required="true"/> <field name="name" type="text" indexed="true" stored="true"/> <field name="personId" type="integer" indexed="true" stored="true"/> <field name="deptId" type="integer" indexed="true" stored="true"/> <uniqueKey>id</uniqueKey> -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 2:56 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Hello again. I'm getting a bit confused by your questions, and I believe it would be easier for us to help you if you could post the field definitions from your schema.xml and the structure of your two database views. ie. table 1: (id (int), subject (string) ...) table 2: (category (string), other fields ..) So please post this and we can try to help you. - Aleks On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Thanks Erik. If I convert that to a string then id field defined in schema.xml would fail as I have that as integer. If I change that to string then first view would fail as it is Integer there. What to do in such scenarios? Do I need to define multiple schema.xml or multiple unique key definitions in same schema. How does this work? Pls explain. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2008 6:40 PM To: solr-user@lucene.apache.org Subject: Re: Unique id I'd suggest aggregating those three columns into a string that can serve as the Solr uniqueKey field value. Erik On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote: Basically, I am working on two views. First one has an ID column. The second view has no unique ID column. What to do in such situations? There are 3 other columns where I can make a composite key out of those. I have to index these two views now.
-Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 5:24 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Technically, no, a uniqueKey field is NOT required. I've yet to run into a situation where it made sense not to use one though. As for indexing database tables - if one of your tables doesn't have a primary key, does it have an aggregate unique key of some sort? Do you plan on updating the rows in that table and reindexing them? Seems like some kind of unique key would make sense for updating documents. But yeah, a more detailed description of your table structure and searching needs would be helpful. Erik On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote: Yes it is. You need a unique id because the add method works as and add or update method. When adding a document whose ID is already found in the index, the old document will be deleted and the new will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID? - Aleksander On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Hi, Is the uniqueKey
Re: Unique id
And in case that wasn't clear, the reason for it failing then would obviously be because you define the id field with required=true, and you try inserting a document where this field is missing... - Aleks On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M. Stensby [EMAIL PROTECTED] wrote: Ok, this brings me to the question; how are the two view's connected to each other (since you are indexing partly view 1 and partly view 2 into a single index structure? If they are not at all connected I believe you have made a fundamental mistake / misunderstand the use of your index... I assume that a Task can be assigned to a person, and your Team view displays that person, right? Maybe you are doing something like this: View 1 1, somename, sometimestamp, someothertimestamp 2, someothername, somethirdtimestamp, timetamp4 ... View 2 1, 58, 0 2, 58, 1 3, 52, 0 ... I'm really confused about your database structure... To me, It would be logical to add a team_id field to the team table, and add a third table to link tasks to a team (or to individual persons). Once you have that information (because I do assume there MUST be some link there) you would do: insert into your index: (id from the task), (name of the task), (id of the person assigned to this task), (id of the departement that this person works in). I guess that you _might_ be thinking a bit wrong and trying to do something like this: Treat each view as independent views, and inserting values from each table as separate documents in the index so you would do: insert into your index: (id from the task), (name of the task), (no value), (no value) which will be ok to do (no value), (no value), (id of the person), (id of the departement) --- which makes no sense to me... So, can you clearify the relationship between the two views, and how you are thinking of inserting entries into your index? - Aleks On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: View structure is: 1. 
Task(id* (int), name (string), start (timestamp), end (timestamp)) 2. Team(person_id (int), deptId (int), isManager (int)) * is primary key In schema.xml I have field name=id type=integer indexed=true stored=true required=true/ field name=name type=text indexed=true stored=true/ field name=personId type=integer indexed=true stored=true/ field name= deptId type=integer indexed=true stored=true/ uniqueKeyid/uniqueKey -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 2:56 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Hello again. I'm getting a bit confused by your questions, and I believe it would be easier for us to help you if you could post the field definitions from your schema.xml and the structure of your two database views. ie. table 1: (id (int), subject (string) -.--) table 2: (category (string), other fields ..) So please post this and we can try to help you. - Aleks On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Thanks Erik. If I convert that to a string then id field defined in schema.xml would fail as I have that as integer. If I change that to string then first view would fail as it is Integer there. What to do in such scenarios? Do I need to define multiple schema.xml or multiple unique key definitions in same schema. How does this work? Pls explain. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2008 6:40 PM To: solr-user@lucene.apache.org Subject: Re: Unique id I'd suggest aggregating those three columns into a string that can serve as the Solr uniqueKey field value. Erik On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote: Basically, I am working on two views. First one has an ID column. The second view has no unique ID column. What to do in such situations? There are 3 other columns where I can make a composite key out of those. I have to index these two views now. 
-Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 5:24 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Technically, no, a uniqueKey field is NOT required. I've yet to run into a situation where it made sense not to use one though. As for indexing database tables - if one of your tables doesn't have a primary key, does it have an aggregate unique key of some sort? Do you plan on updating the rows in that table and reindexing them? Seems like some kind of unique key would make sense for updating documents. But yeah, a more detailed description of your table structure and searching needs would be helpful. Erik On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote: Yes it is. You need a unique id because the add method works as and add or update method. When adding a document whose ID is already found in the index, the old document will be deleted
Re: Unique id
Well, in that case, what do you want to search for? If I were you, I would make my index consist of tasks (and I assume that is what you are trying to do). So why don't you just use your schema.xml as you have right now, and do the following: Pick a person (let's say he has person_id=42 and deptId=3), get his queue of tasks, then for each task in the queue do: insert into index: (id from the task), (name of the task), (id of the person), (id of the department) an example: 3, this is a very important task, 42, 3 4, this one is also important, 42, 3 5, this one is low priority, 42, 3 And then for the next person you do the same, (person_id=58 and deptId=5) insert: 6, this is about solr, 58, 5 7, this is about lucene, 58, 5 etc. Now you can search for all tasks in department 5 by doing deptId:5. If you want to search for all the tasks assigned to a specific person you just enter the query personId:42. And you could also search for all tasks containing certain keywords by doing the query name:solr OR name:lucene. Do you understand now, or is it still unclear? - Aleks On Fri, 21 Nov 2008 10:56:38 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Ok. There is a common column in the two views called queueId. I query the second view first and get all the queueIds for a person. And having the queueIds I get all the ids from the first view. Sorry for missing that column earlier. I think it should make sense now. -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 3:18 PM To: solr-user@lucene.apache.org Subject: Re: Unique id And in case that wasn't clear, the reason for it failing then would obviously be because you define the id field with required=true, and you try inserting a document where this field is missing... - Aleks On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M.
Stensby [EMAIL PROTECTED] wrote: Ok, this brings me to the question: how are the two views connected to each other (since you are indexing partly view 1 and partly view 2 into a single index structure)? If they are not at all connected, I believe you have made a fundamental mistake / misunderstood the use of your index... I assume that a Task can be assigned to a person, and your Team view displays that person, right? Maybe you are doing something like this: View 1 1, somename, sometimestamp, someothertimestamp 2, someothername, somethirdtimestamp, timestamp4 ... View 2 1, 58, 0 2, 58, 1 3, 52, 0 ... I'm really confused about your database structure... To me, it would be logical to add a team_id field to the team table, and add a third table to link tasks to a team (or to individual persons). Once you have that information (because I do assume there MUST be some link there) you would do: insert into your index: (id from the task), (name of the task), (id of the person assigned to this task), (id of the department that this person works in). I guess that you _might_ be thinking a bit wrong and trying to do something like this: treat each view as an independent view, and insert values from each table as separate documents in the index, so you would do: insert into your index: (id from the task), (name of the task), (no value), (no value) --- which will be ok to do --- and then (no value), (no value), (id of the person), (id of the department) --- which makes no sense to me... So, can you clarify the relationship between the two views, and how you are thinking of inserting entries into your index? - Aleks On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: View structure is: 1. Task(id* (int), name (string), start (timestamp), end (timestamp)) 2.
Team(person_id (int), deptId (int), isManager (int)) * is primary key In schema.xml I have: <field name="id" type="integer" indexed="true" stored="true" required="true"/> <field name="name" type="text" indexed="true" stored="true"/> <field name="personId" type="integer" indexed="true" stored="true"/> <field name="deptId" type="integer" indexed="true" stored="true"/> <uniqueKey>id</uniqueKey> -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 2:56 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Hello again. I'm getting a bit confused by your questions, and I believe it would be easier for us to help you if you could post the field definitions from your schema.xml and the structure of your two database views, i.e. table 1: (id (int), subject (string), ...) table 2: (category (string), other fields ...) So please post this and we can try to help you. - Aleks On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Thanks Erik. If I convert that to a string then the id field defined in schema.xml would fail, as I have that as integer. If I change that to string then the first view would fail as it is Integer there. What to do in such scenarios? Do I need to define multiple schema.xml files or multiple unique keys?
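The fielded queries suggested in this thread (deptId:5, personId:42, name:solr OR name:lucene) are plain Lucene query syntax; as a small sketch, they can be assembled in Java before being handed to a SolrQuery (the class and helper names below are illustrative, not from the thread):

```java
public class TaskQueries {
    // Build a fielded query term, e.g. fieldQuery("deptId", "5") -> "deptId:5"
    static String fieldQuery(String field, String value) {
        return field + ":" + value;
    }

    // OR several fielded clauses together, e.g. "name:solr OR name:lucene"
    static String or(String... clauses) {
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // All tasks in department 5
        System.out.println(fieldQuery("deptId", "5"));
        // All tasks mentioning solr or lucene
        System.out.println(or(fieldQuery("name", "solr"), fieldQuery("name", "lucene")));
    }
}
```

The resulting string would be passed as the query, e.g. `new SolrQuery(fieldQuery("personId", "42"))`.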
Re: Unique id
I still don't understand why you want two different indexes if you want to return the linked information each time anyway... I would say the easiest way is just to index all the data (all columns from your views) into the index like this: taskid - taskname - start - end - personid - deptid - ismanager. Then you can just search like I already explained earlier. This way, you have already joined by queueId when you insert it into the index, and thus you get both results from one single search. (If you also want the ability to search on the queueId, just add a column for that.) In general, your questions don't really have anything to do with Solr, but with architecture, db design and what you want to search on. - A. 1. Task(id* (int), name (string), start (timestamp), end (timestamp)) 2. Team(person_id (int), deptId (int), isManager (int)) * is primary key In schema.xml I have: <field name="id" type="integer" indexed="true" stored="true" required="true"/> <field name="name" type="text" indexed="true" stored="true"/> <field name="personId" type="integer" indexed="true" stored="true"/> <field name="deptId" type="integer" indexed="true" stored="true"/> On Fri, 21 Nov 2008 11:59:56 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Can you also let me know how I join two search indices in one query? That means, in this case I have two different search indices and I need to join by queueId and get all the tasks in one SolrQuery. I am creating queries in Solrj. -Original Message- From: Raghunandan Rao [mailto:[EMAIL PROTECTED] Sent: Friday, November 21, 2008 3:45 PM To: solr-user@lucene.apache.org Subject: RE: Unique id Ok. I got your point. So I don't need the ID field to be required in the second view; I will hence remove required="true" in schema.xml. What I thought was that a unique ID makes indexing easier or is used to maintain documents. Thanks a lot.
Re: Unique id
Ok, but how do you map your table structure to the index? As far as I can understand, the two tables have different structures, so why/how do you map two different data structures onto a single index? Are the two tables connected in some way? If so, you could make your index structure reflect the union of both tables and just make one insertion into the index per entry of the two tables. Maybe you could post the table structure so that I can get a better understanding of your use case... - Aleks On Wed, 19 Nov 2008 11:25:56 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Ok, got it. I am indexing two tables differently. I am using Solrj to index with the @Field annotation. I make two queries initially, fetch the data from the two tables and index them separately. But what if the ids in the two tables are the same? That means documents with the same id will be deleted when doing an update. How does this work? Please explain. Thanks. -Original Message- From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 3:49 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Yes, it is. You need a unique id because the add method works as an add-or-update method. When adding a document whose ID is already found in the index, the old document will be deleted and the new will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID? - Aleksander On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Hi, Is the uniqueKey in schema.xml really required? The reason is, I am indexing two tables and I have id as the unique key in schema.xml, but the id field is not there in one of the tables and indexing fails. Do I really require this unique field for Solr to index better, or can I do away with it? Thanks, Raghu
Re: Unique id
Yes, it is. You need a unique id because the add method works as an add-or-update method. When adding a document whose ID is already found in the index, the old document will be deleted and the new will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID? - Aleksander On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao [EMAIL PROTECTED] wrote: Hi, Is the uniqueKey in schema.xml really required? The reason is, I am indexing two tables and I have id as the unique key in schema.xml, but the id field is not there in one of the tables and indexing fails. Do I really require this unique field for Solr to index better, or can I do away with it? Thanks, Raghu -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
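One conventional way to keep a single uniqueKey when two tables' numeric ids can collide (an assumption on my part, not something proposed in the thread) is to prefix each id with its source table; this also requires the uniqueKey field to be a string rather than an integer, matching Erik's earlier advice to convert the key to a string:

```java
public class DocIds {
    // Derive a collision-free uniqueKey value by prefixing the source table name.
    // "task" id 7 and "team" id 7 then map to two distinct documents, so an
    // add for one can never overwrite the other.
    static String uniqueId(String table, long id) {
        return table + "-" + id;
    }

    public static void main(String[] args) {
        System.out.println(uniqueId("task", 7));
        System.out.println(uniqueId("team", 7));
    }
}
```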
Re: Use SOLR like the MySQL LIKE
Hi there, You should use LowerCaseTokenizerFactory as you point out yourself. As far as I know, the StandardTokenizer recognizes email addresses and internet hostnames as one token. In your case, I guess you want an email, say [EMAIL PROTECTED], to be split into four tokens: average joe apache org, or something like that, which would indeed allow you to search for joe or average j* and match. To do so, you could use the WordDelimiterFilterFactory and split on intra-word delimiters (I think the defaults here are non-alphanumeric chars). Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for more info on tokenizers and filters. Cheers, Aleks On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L [EMAIL PROTECTED] wrote: Hello. The data: I have a dataset containing ~500.000 documents. In each document there is an email, a name and a user ID. The problem: I would like to be able to search in it, but it should work like the MySQL LIKE. So when a user enters the search term carsten, the query looks like: name:(carsten) OR name:(carsten*) OR email:(carsten) OR email:(carsten*) OR userid:(carsten) OR userid:(carsten*) Then it should match: carsten l carsten larsen Carsten Larsen Carsten CARSTEN etc. And when the user enters the term carsten l, the query looks like: name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*) Then it should match: carsten l carsten larsen Carsten Larsen Or written in MySQL syntax: ... WHERE `name` LIKE 'carsten%' OR `email` LIKE 'carsten%' OR `userid` LIKE 'carsten%' ... I know that I need to use solr.LowerCaseTokenizerFactory on my name and email fields to ensure case-insensitive behavior. The problem seems to be the wildcards and the whitespaces. -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
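The LIKE 'carsten%' behaviour maps to prefix (wildcard) queries in Lucene syntax; a minimal sketch of building the multi-field query Carsten describes, lowercasing the term to match a field analyzed with LowerCaseTokenizerFactory and OR-ing exact and prefix clauses (the class name and field list are illustrative):

```java
public class LikeQuery {
    static final String[] FIELDS = {"name", "email", "userid"};

    // Mimic SQL "LIKE 'term%'" across several fields:
    // name:carsten OR name:carsten* OR email:carsten OR email:carsten* OR ...
    static String likeQuery(String term) {
        String t = term.toLowerCase();
        StringBuilder sb = new StringBuilder();
        for (String f : FIELDS) {
            if (sb.length() > 0) sb.append(" OR ");
            sb.append(f).append(":").append(t)
              .append(" OR ").append(f).append(":").append(t).append("*");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(likeQuery("Carsten"));
    }
}
```

Note this only works for single-token terms: a term containing whitespace would need escaping, and the standard query parser does not support wildcards inside phrases, which is exactly the difficulty raised in this thread.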
Re: Use SOLR like the MySQL LIKE
Ah, okay! Well, then I suggest you index the field in two different ways if you want both possible ways of searching. One field where you treat the entire name as one token (in lowercase), so you can search for avera* and match, for instance, average joe etc. And another field where you tokenize on whitespace, if you want/need that possibility as well. Look at Solr copy fields and try it out, it works like a charm :) Cheers, Aleksander On Tue, 18 Nov 2008 10:40:24 +0100, Carsten L [EMAIL PROTECTED] wrote: Thanks for the quick reply! It is supposed to work a little like Google Suggest or field autocompletion. I know I mentioned email and userid, but the problem lies with the name field, because of the whitespaces in combination with the wildcard. I looked at solr.WordDelimiterFilterFactory, but it does not mention anything about whitespaces - or wildcards. A quick brushup: I would like to mimic the LIKE functionality from MySQL using wildcards at the end of the search query. In MySQL, whitespaces are treated as characters, not splitters. -- Aleksander M. Stensby Senior software developer Integrasco A/S www.integrasco.no
Re: Calculating peaks - solrj support for facet.date?
As Erik said, you can just set the parameters yourself SolrQuery query = new SolrQuery(...); query.set(FacetParams.FACET_DATE, ...); etc. You'll find all facet-related parameters in the FacetParams interface, located in the org.apache.solr.common.params package. - Aleks On Fri, 07 Nov 2008 14:26:56 +0100, Erik Hatcher [EMAIL PROTECTED] wrote: On Nov 7, 2008, at 7:23 AM, [EMAIL PROTECTED] wrote: Sorry, but I have one more question. Does the java client solrj support facet.date? Yeah, but it doesn't have explicit setters for it. A SolrQuery is also a ModifiableSolrParams - so you can call the add/set methods on it using the same keys used with HTTP requests. Erik -- Aleksander M. Stensby Senior software developer Integrasco A/S
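Since SolrQuery extends ModifiableSolrParams, setting date-facet options just sets plain HTTP request parameters. As an illustration of what ends up on the wire, the sketch below assembles the equivalent query string by hand; the field name timestamp and the range values are assumptions, and the parameter names are the standard facet.date family:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DateFacetParams {
    // Assemble the HTTP parameters that calls like
    // query.set(FacetParams.FACET_DATE, "timestamp") would produce under the hood.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append("&");
            sb.append(e.getKey()).append("=").append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("facet", "true");
        p.put("facet.date", "timestamp");           // hypothetical date field
        p.put("facet.date.start", "NOW/DAY-30DAYS");
        p.put("facet.date.end", "NOW/DAY");
        p.put("facet.date.gap", "%2B1DAY");         // "+1DAY", URL-encoded
        System.out.println(toQueryString(p));
    }
}
```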
Re: EmbeddedSolrServer and the MultiCore functionality
Okay, sounds fair. Well, the reason I would have multiple shards was based on the presumption that it would be more efficient to be able to search in single shards when needed (if each shard contains, let's say, 30 million entries) and then, when the time comes, migrate one of the shards to a different node. But I guess the gain in performance is not significant and I should rather have just one shard per node. Or? Best regards and thanks for your answer, Aleksander On Tue, 23 Sep 2008 16:57:08 +0200, Ryan McKinley [EMAIL PROTECTED] wrote: If I have Solr up and running and do something like this: query.set("shards", "localhost:8080/solr/core0,localhost:8080/solr/core1"); I will get the results from both cores, obviously... But is there a way to do this without using shards and accessing the cores through HTTP? I presume it would/should be possible to do the same thing directly against the cores, but my question is really whether this has been implemented already / is it possible? Not implemented... Check line 384 of SearchHandler.java: SolrServer server = new CommonsHttpSolrServer(url, client); It defaults to CommonsHttpSolrServer. This could easily change to EmbeddedSolrServer, but I'm not sure it is a very common use case... why would you have multiple shards on the same machine? ryan -- Aleksander M. Stensby Senior Software Developer Integrasco A/S +47 41 22 82 72 [EMAIL PROTECTED]
EmbeddedSolrServer and the MultiCore functionality
Hello everyone, I'm new to Solr (have been using Lucene for a few years now). We are looking into Solr and have heard many good things about the project :) I have a few questions regarding the EmbeddedSolrServer in Solrj and the MultiCore features... I've tried to find answers to this in the archives but have not succeeded. The thing is, I want to be able to use the embedded server to access multiple cores on one machine, and I would like to at least have the possibility to access the Lucene indexes without HTTP. In particular, I'm wondering if it is possible to do the shards (distributed search) approach using the embedded server, without using HTTP requests. Let's say I register 2 cores to a container and init my embedded server like this: CoreContainer container = new CoreContainer(); container.register("core1", core1, false); container.register("core2", core2, false); server = new EmbeddedSolrServer(container, "core1"); Then queries performed on my server will return results from core1... and if I do ... = new EmbeddedSolrServer(container, "core2"), the results will come from core2. If I have Solr up and running and do something like this: query.set("shards", "localhost:8080/solr/core0,localhost:8080/solr/core1"); I will get the results from both cores, obviously... But is there a way to do this without using shards and accessing the cores through HTTP? I presume it would/should be possible to do the same thing directly against the cores, but my question is really whether this has been implemented already / is it possible? Thanks in advance for any replies! Best regards, Aleksander -- Aleksander M. Stensby Senior Software Developer Integrasco A/S +47 41 22 82 72 [EMAIL PROTECTED]
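As a point of comparison with the embedded approach discussed in this thread, the shards request parameter is just a comma-separated list of core URLs (without the http:// prefix); building that value programmatically might look like this (a trivial sketch, names illustrative):

```java
public class ShardsParam {
    // Join core base URLs into the value for the "shards" request parameter,
    // suitable for query.set("shards", shardsValue(...)).
    static String shardsValue(String... cores) {
        return String.join(",", cores);
    }

    public static void main(String[] args) {
        System.out.println(shardsValue(
                "localhost:8080/solr/core0",
                "localhost:8080/solr/core1"));
    }
}
```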