RE: Facet? Search problem
Thanks. I'll look at that as well.

-----Original Message-----
From: Stefan Matheis [mailto:matheis.ste...@gmail.com]
Sent: Tuesday, March 14, 2017 1:20 PM
To: solr-user@lucene.apache.org
Subject: RE: Facet? Search problem

Scott,

Depending on what you're looking for, https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results might be worth a look as well.

-Stefan

On Mar 14, 2017 7:25 PM, "Scott Smith" <ssm...@mainstreamdata.com> wrote:

> Grouping appears to be exactly what I'm looking for. I added "group=true&group.field=category" to my search, and it appears that I get a list of groups, with one document in each group that matches the search, along with (bonus) the number of documents in the category that match that search. Perfect. Thank you very much.
>
> -----Original Message-----
> From: Dave [mailto:hastings.recurs...@gmail.com]
> Sent: Monday, March 13, 2017 7:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Facet? Search problem
>
> Perhaps look into grouping on that field.
RE: Facet? Search problem
Grouping appears to be exactly what I'm looking for. I added "group=true&group.field=category" to my search, and it appears that I get a list of groups, with one document in each group that matches the search, along with (bonus) the number of documents in the category that match that search. Perfect. Thank you very much.

-----Original Message-----
From: Dave [mailto:hastings.recurs...@gmail.com]
Sent: Monday, March 13, 2017 7:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Facet? Search problem

Perhaps look into grouping on that field.

> On Mar 13, 2017, at 9:08 PM, Scott Smith <ssm...@mainstreamdata.com> wrote:
>
> I'm trying to solve a search problem and wondering if facets (or something else) might solve the problem.
>
> Let's assume I have a bunch of documents (100 million+). Each document has a category (keyword) assigned to it. A single document may only have one category, but there may be multiple documents with the same category (1 to a few hundred documents may be in any one category). There are several million categories.
>
> Suppose I'm doing a search with a page size of 50. What I want to do is do a search (e.g., "dog") and get back the top 50 documents that contain the word "dog" and are all in different categories. So, there needs to be one document from each of 50 different categories.
>
> If that's not possible, then is it possible to do it if I know the 50 categories up-front and hand that off as part of the search (so "find 50 documents that match the term 'dog' and there is one document from each of 50 specified categories")?
>
> Is there a way to do this?
>
> I'm not extremely knowledgeable about facets, but thought that might be a solution. But, it doesn't have to be facets.
>
> Thanks for any help
>
> Scott
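For illustration, the grouping request described above can be sketched as a set of request parameters. This is a minimal sketch rather than a tested configuration: the field name `category` comes from the thread, while the helper names, the `rows` value, and the explicit `group.limit` are assumptions made for the example. The second helper sketches the collapse alternative from the Collapse and Expand page, assuming a Solr version that ships the collapse query parser.

```python
from urllib.parse import urlencode, parse_qs


def grouped_search_params(term, group_field="category", rows=50):
    """Parameters for a result-grouping query (hypothetical helper)."""
    return {
        "q": term,
        "group": "true",
        "group.field": group_field,
        "group.limit": 1,  # keep one document per group
        "rows": rows,      # with grouping on, rows counts groups
    }


def collapsed_search_params(term, group_field="category", rows=50):
    """Parameters for the collapse alternative (hypothetical helper)."""
    return {
        "q": term,
        "fq": "{!collapse field=%s}" % group_field,
        "rows": rows,
    }


grouped_qs = urlencode(grouped_search_params("dog"))
collapsed_qs = urlencode(collapsed_search_params("dog"))
print(grouped_qs)
print(collapsed_qs)
```

With grouping enabled, `rows` limits the number of groups rather than the number of documents, which lines up with the "one document from each of 50 different categories" requirement.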
Facet? Search problem
I'm trying to solve a search problem and wondering if facets (or something else) might solve the problem.

Let's assume I have a bunch of documents (100 million+). Each document has a category (keyword) assigned to it. A single document may only have one category, but there may be multiple documents with the same category (1 to a few hundred documents may be in any one category). There are several million categories.

Suppose I'm doing a search with a page size of 50. What I want to do is do a search (e.g., "dog") and get back the top 50 documents that contain the word "dog" and are all in different categories. So, there needs to be one document from each of 50 different categories.

If that's not possible, then is it possible to do it if I know the 50 categories up-front and hand that off as part of the search (so "find 50 documents that match the term 'dog' and there is one document from each of 50 specified categories")?

Is there a way to do this?

I'm not extremely knowledgeable about facets, but thought that might be a solution. But, it doesn't have to be facets.

Thanks for any help

Scott
Accessing document stored fields in a custom function
I'm creating a custom function (extends ValueSource). I'm generating a value that will both be returned as a value in the hit for each doc and also be used to sort. As I read the documentation, this is not difficult. To determine the value for a document, I need to access the stored fields for that document (i.e., the value that the function will generate partially depends on stored information in the document). How do I access them from the getValues() method? Is this via the FieldCache.DEFAULT? I'm using solr 4.8 if that makes a difference (which I think it does since older examples seem to have been deprecated). For example, if I have a field called Fred, how do I access that field from the document? Is accessing the stored data going to have a big impact on the time to return results? Thanks Scott
RE: Help on custom sort
I'll take a look at that. Thanks

-----Original Message-----
From: Apoorva Gaurav [mailto:apoorva.gau...@myntra.com]
Sent: Sunday, September 21, 2014 11:32 PM
To: solr-user
Subject: Re: Help on custom sort

Try using a custom value source parser and pass the formula for computing the price to Solr; something like this: http://java.dzone.com/articles/connecting-redis-solr-boosting

--
Thanks Regards,
Apoorva

On Mon, Sep 22, 2014 at 1:38 AM, Scott Smith ssm...@mainstreamdata.com wrote:

There are likely several hundred groups. Also, new groups will be added and some groups will be deleted. So, I don't think putting a field in the docs works. Having to add a new group price into 100 million+ documents doesn't seem reasonable.

Right now I'm looking at http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html. This references a much older version of Solr (the blog is from 2011), so I will need to update the classes referenced.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, September 20, 2014 11:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Help on custom sort

How many different groups are there? And can user A ever be part of more than one group? If

1. there are a reasonably small number of groups (100 or so as a place to start) and
2. a user is always part of a single group

then you could store separate prices in each document by group, thus you'd have some fields like

price_group_a: $100
price_group_b: $101

Then sorting becomes trivial: you just specify a sort on price_group_a for users in group A, etc. If the number of groups is unknown-but-not-huge, dynamic fields could be used.

If that's not the case, then you might be able to get clever with sorting by function. Here's a place to start: https://cwiki.apache.org/confluence/display/solr/Function+Queries

These can be arbitrarily complex, but I'm thinking of something where the price returned by the function respects the group the user is in, perhaps even the min/max of all the groups the user is in.

I admit I haven't really thought that through well though...

Best,
Erick

On Sat, Sep 20, 2014 at 9:26 AM, Scott Smith ssm...@mainstreamdata.com wrote:

I need to provide a custom sort option for sorting by price and I would like some suggestions. It's not the straightforward "just sort by a price field in the document" scenario or I wouldn't be asking for help. Here's the scenario I'm dealing with.

I have 100 million+ documents (so multi-sharded). Users search for documents they are interested in using a standard keyword search. They then purchase documents they are interested in. So far, nothing hard.

Here's where things get interesting. The documents come from multiple suppliers. Each supplier sets a price for his documents and different suppliers will provide different pricing. That wouldn't be difficult except that *users* are divided up into different groups and, depending on which group they are in, the supplier will charge the user a different price. So, user A may pay one price for a document and user B may pay a different price for the same document just because user A and user B are in different groups. I don't even know if the relative order of pricing is the same between different groups (e.g., if document X is more expensive than document Y for a user in group M, it may not be more expensive for a user in group N).

The one thing that may make this doable is that supplier A will likely have the same price for all of his documents for each of the user groups. So, a user in group A will pay the same price regardless of which document he buys from supplier 1. A user in group B will also pay the same price for any document from supplier 1; it's just that a user in group B will likely pay a different price than a user in group A. So, within a supplier, the price varies based on user group, not the document.

To summarize, one of the requirements for the system is that we provide the ability to sort search results based on price. This would be easy except that the price a user pays depends not only on what he wants to buy, but also on what group he is in.

I suspect there is some kind of custom Solr module I'm going to have to write. I'm thinking that the user group gets passed in as a custom Solr parameter (I'm assuming that's possible??). Then I'm thinking that there has to be some kind of in-memory database that tracks pricing based on user group and document supplier. I'm happy to go read code, documents, links, etc. if someone can point me in the right direction.

What kind of Solr module am I likely going to write (extend) and are there some examples somewhere? Maybe there's a way to do this without having to extend a Solr module??

Hope this makes sense. Any help is appreciated.

Scott
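The pricing rule described in this thread can be modeled as a lookup keyed by (supplier, user group). The sketch below only illustrates that data model and the sort order it produces for a handful of already-fetched documents; it is not a Solr plugin, and every name and price in it is hypothetical. Inside Solr, the same lookup would presumably have to live in a custom function (a ValueSource) so that the sort applies to the whole result set rather than to one page of hits.

```python
# Hypothetical in-memory price table: (supplier, user_group) -> price.
PRICES = {
    ("supplier1", "groupM"): 10.0,
    ("supplier2", "groupM"): 5.0,
    ("supplier1", "groupN"): 3.0,
    ("supplier2", "groupN"): 8.0,
}


def price_for(doc, user_group, prices=PRICES):
    """Price of a document depends only on its supplier and the user's group."""
    return prices[(doc["supplier"], user_group)]


def sort_by_group_price(docs, user_group, prices=PRICES):
    """Sort documents by the price this user group would pay, cheapest first."""
    return sorted(docs, key=lambda doc: price_for(doc, user_group, prices))


docs = [
    {"id": "X", "supplier": "supplier1"},
    {"id": "Y", "supplier": "supplier2"},
]

order_m = [d["id"] for d in sort_by_group_price(docs, "groupM")]
order_n = [d["id"] for d in sort_by_group_price(docs, "groupN")]
print(order_m, order_n)
```

The two groups get opposite orderings for the same documents, which is the case the original question worries about: relative price order is not shared between groups, so a single stored price field cannot serve every user.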
Help on custom sort
I need to provide a custom sort option for sorting by price and I would like some suggestions. It's not the straightforward "just sort by a price field in the document" scenario or I wouldn't be asking for help. Here's the scenario I'm dealing with.

I have 100 million+ documents (so multi-sharded). Users search for documents they are interested in using a standard keyword search. They then purchase documents they are interested in. So far, nothing hard.

Here's where things get interesting. The documents come from multiple suppliers. Each supplier sets a price for his documents and different suppliers will provide different pricing. That wouldn't be difficult except that *users* are divided up into different groups and, depending on which group they are in, the supplier will charge the user a different price. So, user A may pay one price for a document and user B may pay a different price for the same document just because user A and user B are in different groups. I don't even know if the relative order of pricing is the same between different groups (e.g., if document X is more expensive than document Y for a user in group M, it may not be more expensive for a user in group N).

The one thing that may make this doable is that supplier A will likely have the same price for all of his documents for each of the user groups. So, a user in group A will pay the same price regardless of which document he buys from supplier 1. A user in group B will also pay the same price for any document from supplier 1; it's just that a user in group B will likely pay a different price than a user in group A. So, within a supplier, the price varies based on user group, not the document.

To summarize, one of the requirements for the system is that we provide the ability to sort search results based on price. This would be easy except that the price a user pays depends not only on what he wants to buy, but also on what group he is in.

I suspect there is some kind of custom Solr module I'm going to have to write. I'm thinking that the user group gets passed in as a custom Solr parameter (I'm assuming that's possible??). Then I'm thinking that there has to be some kind of in-memory database that tracks pricing based on user group and document supplier. I'm happy to go read code, documents, links, etc. if someone can point me in the right direction.

What kind of Solr module am I likely going to write (extend) and are there some examples somewhere? Maybe there's a way to do this without having to extend a Solr module??

Hope this makes sense. Any help is appreciated.

Scott
Tie breakers when sorting equal items
I promised to ask this on the forum just to confirm what I assume is true.

Suppose you're returning results using a sort order based on some field (so, not relevancy). For example, suppose it's a date field which indicates when the document was loaded into the Solr index. Suppose two items have exactly the same date/time in the field. Would Solr return the two items in the order in which they were inserted? I would assume that the answer is "not necessarily".

I know that you can have secondary sort fields if something exists that would provide the desired functionality. I know that I could set up some kind of numbering scheme that would provide the same result (the customer doesn't want to pay for that).

So, I'm really just asking if Solr has any guarantees that when you sort on a field and two items have the same value, they will be sorted in the order they were inserted into the index. Again, I assume the answer is no, but I said I would ask.
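The secondary-sort option mentioned above can be sketched as follows. The field names `load_date` and `id` are hypothetical, and the helper is only an illustration of how the `sort` parameter would be composed. Note that a tie-breaker on a unique key makes the order deterministic, but it reproduces insertion order only if the key itself increases with insertion.

```python
def sort_param(primary_field, tie_breaker_field="id"):
    """Compose a Solr sort clause with an explicit tie-breaker (hypothetical helper)."""
    return "%s desc, %s asc" % (primary_field, tie_breaker_field)


params = {
    "q": "*:*",
    "sort": sort_param("load_date"),
}
print(params["sort"])
```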
Solr query processing
I just want to state a couple of things and hear someone say, "that's right."

1. In a Solr query you can have multiple fq's, but only a single q. And yes, I can simply AND the multiple q's together. I just want to avoid that if I'm wrong.

2. A subtler issue is that when a full query is executed, Solr must look at the schema to see how each field was tokenized (or not) and the various other filters applied to a field so that it can properly transform the field's data (e.g., tokenize the text, but not keywords).

As an aside, it would be nice if the queryparser could do the same thing in Lucene (I know, wrong forum :)).

Scott
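As a concrete illustration of point 1, a request carries one `q` and may repeat `fq` as often as needed. The sketch below builds such a query string with the standard library only; the field names and values are made up for the example.

```python
from urllib.parse import urlencode, parse_qs


def build_query_string(q, filters):
    """One q parameter plus one fq parameter per filter (hypothetical helper)."""
    pairs = [("q", q)] + [("fq", f) for f in filters]
    return urlencode(pairs)


query_string = build_query_string("Body:dog", ["Language:en", "Category:pets"])
parsed = parse_qs(query_string)
print(query_string)
```

Keeping restrictions in separate `fq` parameters instead of ANDing them into `q` leaves them out of relevancy scoring, and as far as I know lets Solr cache each filter independently.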
RE: Custom Solr indexer/searcher
Thanks for the suggestions. I'll take a look at these things.

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Thursday, November 15, 2012 11:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Custom Solr indexer/searcher

Scott,

It sounds like you need to look into a few samples of similar things in Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms similar to the given one in an FST for query expansion. Generic query expansion is done via MultiTermQuery. Index-time term expansion is shown in TrieField and, by the way, NumericRangeQuery (it should match your goal a lot). All these are single-dimension samples, but AFAIK a KD-tree is multidimensional, so look into GeoHashField, which puts two-dimensional points into single terms with the ability to build ranges on them; see GeoHashField.createSpatialQuery().

Happy hacking!

On Fri, Nov 16, 2012 at 10:34 AM, John Whelan whelanl...@gmail.com wrote:

> Scott,
>
> I probably have no idea as to what I'm saying, but if you're looking for finding results in an N-dimensional space, you might look at creating a field of type 'point'. Point-type fields have a dimension attribute; I believe that it can be set to a large integer value.
>
> Barring that, there is also a 'dist()' function that can be used to work with multiple numeric fields in order to sort results based on closeness to a desired coordinate. The 'dist' function takes a parameter to specify the means of calculating the distance. (For example, 2 - 'Euclidean distance'. I don't know the other options.)
>
> In the worst case, my response is worthless, but pops your question back up in the e-mails...
>
> Regards,
> John

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Custom Solr indexer/searcher
Suppose I have a special data search type (something different than a string or numeric value) that I want to integrate into the Solr server. For example, suppose I wanted to implement a KD-tree as a filter that would integrate with standard Solr filters and queries. I might want to say "find all of the documents in the index with the word 'tree' in them that are within a certain distance of a particular document in the KD-tree".

Let me add that I'm not really looking for a KD-tree implementation for Solr; I just assume that a fair number of people will know what a KD-tree is and so have some idea that I'm talking about adding a new data type (different than string, long, etc.) that Solr will need to be able to index and search with. It's important that the new data type integrate with the existing standard Solr data types for searching purposes.

First, is there a way to build and specify a plugin that provides Solr with both the indexer and search interfaces, and therefore hides the internal details of what's going on in the search from Solr so it just thinks it's another search type? Or would I have to hack Solr in a lot of places to add my custom data type in?

Second, if the interface(s) exist to add in a new data type, is there documentation (tutorial, examples, etc.) anywhere on how to do this? Or is my only option to dig into the Solr code?

Mostly, I'm looking for some links or suggestions on where to start looking. I doubt this subject is simple enough to fit into an email post (though I'd be happy to be surprised :) ). You can assume Solr 4.0 if that makes things easier. You can also assume that I have some familiarity with Lucene (though I haven't hacked that code either).

Hopefully, I've explained this well enough so that people know what I'm looking for.

Cheers

Scott
Exception in Solr server on more like this
I've been trying to get More like this running under Solr 3.5. I get the Exception below. The http request is also highlighted below. I've looked at the FieldType code and I don't understand what's going on there. So, while I know what a null pointer exception means, it isn't telling me what I did or didn't do.

FYI - the Body field has termVectors set to true, which I thought was sufficient for MLT.

What I'm trying to do is submit the phrase "country now is the time country" to MLT to determine the interesting words (which I want returned) and then return the top most relevant documents.

Any help on what might be wrong would be appreciated.

Scott

6975 [main] INFO com.mainstreamdata.MediasIndexer.mediasBrowser.SearchFactory - SearchFactory:SearchFactory: Search Factory initialized
SolrQuery:: (country now is the time country)
Filter:: (Language:en)
15274 [main] ERROR com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch - SolrSearch:getDocTier: Unable to do search: org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getTier(SolrSearch.java:309)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getFirstTier(SolrSearch.java:93)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getNextOlderTier(SolrSearch.java:175)
    at com.mainstreamdata.MediasIndexer.SolrMgrTest.testMoreLikeThis(SolrMgrTest.java:209)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at junit.framework.TestCase.runTest(TestCase.java:164)
    at junit.framework.TestCase.runBare(TestCase.java:130)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:120)
    at junit.framework.TestSuite.runTest(TestSuite.java:230)
    at junit.framework.TestSuite.run(TestSuite.java:225)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.common.SolrException: null java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:374)
    at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:320)
    at org.apache.solr.handler.component.MoreLikeThisComponent.getMoreLikeThese(MoreLikeThisComponent.java:82)
    at org.apache.solr.handler.component.MoreLikeThisComponent.process(MoreLikeThisComponent.java:57)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at
RE: Exception in Solr server on more like this
This turned out to be SOLR-2986.

-----Original Message-----
From: Scott Smith [mailto:ssm...@mainstreamdata.com]
Sent: Thursday, December 22, 2011 1:24 PM
To: solr-user@lucene.apache.org
Subject: Exception in Solr server on more like this

I've been trying to get More like this running under Solr 3.5. I get the Exception below. The http request is also highlighted below. I've looked at the FieldType code and I don't understand what's going on there. So, while I know what a null pointer exception means, it isn't telling me what I did or didn't do.

FYI - the Body field has termVectors set to true, which I thought was sufficient for MLT.

What I'm trying to do is submit the phrase "country now is the time country" to MLT to determine the interesting words (which I want returned) and then return the top most relevant documents.

Any help on what might be wrong would be appreciated.

Scott
MoreLikeThis questions
I'm implementing a MoreLikeThis search. I have a couple of questions. I'm implementing this with solrj so I would appreciate it if any code snippets reflect that. First, I want to provide the text that Solr should check for interesting words and do the search on. This means I don't want to specify a document in the collection. I think the documentation implies I can do this. However, it seems like using the q parameter would be the wrong thing since I think it would just take doc 0 of the result of searching the default field with those words. However, I don't see any other parameter that looks like it's the correct one. Second, I need to access the interesting terms. It's not clear to me how to get these. I see the parameter I need to set to have the interesting terms included in the response. I'm just not sure how to get at them with solrj once the response comes back. Can someone point me to examples of how to do this?
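A rough sketch of both questions, expressed as raw request parameters rather than SolrJ calls. It assumes a MoreLikeThis request handler is registered in solrconfig.xml (commonly at a path such as `/mlt`, which is an assumption here) and that the handler accepts the source text as a content stream via `stream.body`; depending on version and configuration that may need to be enabled, or the text may have to be POSTed as the request body instead. The field name `Body`, the helper names, and the sample response are hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def mlt_text_params(text, fields=("Body",), rows=10):
    """Parameters asking the MLT handler to analyze supplied text (hypothetical helper)."""
    return {
        "stream.body": text,             # text to mine for interesting terms
        "mlt.fl": ",".join(fields),      # fields used for similarity
        "mlt.interestingTerms": "list",  # ask for the terms in the response
        "rows": rows,
    }


def interesting_terms(response):
    """Pull the interesting terms out of a decoded JSON response, if present."""
    return list(response.get("interestingTerms", []))


mlt_qs = urlencode(mlt_text_params("country now is the time country"))

# Made-up response fragment, only to show where the terms would be read from.
sample_response = {"interestingTerms": ["country", "time"]}
terms = interesting_terms(sample_response)
print(mlt_qs)
print(terms)
```

In SolrJ the equivalent would presumably be to set the same parameter names on the query object and read the `interestingTerms` entry from the raw response, but that mapping is not confirmed anywhere in this thread.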
RE: MoreLikeThis questions
I realized I probably should have said Solr 3.5 in case that makes a difference.

-----Original Message-----
From: Scott Smith [mailto:ssm...@mainstreamdata.com]
Sent: Friday, December 09, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis questions

I'm implementing a MoreLikeThis search. I have a couple of questions. I'm implementing this with solrj so I would appreciate it if any code snippets reflect that.

First, I want to provide the text that Solr should check for interesting words and do the search on. This means I don't want to specify a document in the collection. I think the documentation implies I can do this. However, it seems like using the q parameter would be the wrong thing since I think it would just take doc 0 of the result of searching the default field with those words. However, I don't see any other parameter that looks like it's the correct one.

Second, I need to access the interesting terms. It's not clear to me how to get these. I see the parameter I need to set to have the interesting terms included in the response. I'm just not sure how to get at them with solrj once the response comes back.

Can someone point me to examples of how to do this?
RE: MoreLikeThis questions
OK. I just found Juan Grande's 7/1/2011 post. It seems like that gives me some ideas on the second question.

I still don't know what to do about the first question. Maybe if I saw the request XML, it would give me a hint about what to do with the solrj stuff.

Anybody have any thoughts?

Scott

-----Original Message-----
From: Scott Smith [mailto:ssm...@mainstreamdata.com]
Sent: Friday, December 09, 2011 3:14 PM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis questions

I realized I probably should have said Solr 3.5 in case that makes a difference.
To optimize or not - Solr vs Lucene
Wasn't sure which mailing list to send this to. I'm writing an application that can be configured to run directly with Lucene or with Solr, and I'm trying to figure out whether optimization of the index should be totally eliminated, eliminated in the Lucene case only, or what.

If I read the 3.5 Lucene javadocs, optimize() has been deprecated because it is rarely justified with the current Lucene index implementation (I started with Lucene in the 1.42 days when I think it was pretty much a necessity). However, if I read the Lucid Imagination 3.4 manual (page 176), it talks about how optimizing will merge a lot of small blocks together, making the index more efficient, which is exactly what I thought optimize did. Since Solr is based on Lucene, I'm wondering if the 3.4 manual is simply out-of-date on this point or whether there is something else going on.

Our application is indexing content in real time, so the index changes frequently during the day. Some of our indexes only contain a few hundred thousand documents. However, in one of our applications there are over 50 million documents (using Solr with multiple shards). I thought optimization was a way to keep the index segments merged and thus make the searching more efficient. I thought it was especially needed if the index was being updated frequently.

When should I optimize?

Thanks in advance for any feedback.

Scott
RE: Lucene-SOLR transition
OK. Thanks for all of the suggestions.

Cheers

Scott

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Monday, September 19, 2011 3:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Lucene-SOLR transition

On Sep 18, 2011, at 19:43 , Michael Sokolov wrote:

> On 9/15/2011 8:30 PM, Scott Smith wrote:
>> 2. Assuming that the answer to 1 is correct, then is there an easy way to take a lucene query (with nested Boolean queries, filter queries, etc.) and generate a SOLR query string with q and fq components?
>
> I believe that Query.toString() will probably get you back something that can be parsed in turn by the traditional lucene QueryParser, thus completing the circle and returning your original Query. But why would you want to do that?

No, you can't rely on Query.toString() roundtripping (think stemming, for example - but many other examples that won't work that way too).

What you can do, since you know Lucene's API well, is write a QParser(Plugin) that takes request parameters as strings and generates the Query from that like you are now with your Lucene app.

Erik
Lucene-SOLR transition
I've been using lucene for a number of years. We've now decided to move to SOLR. I have a couple of questions.

1. I'm used to creating Boolean queries, filter queries, term queries, etc. for lucene. Am I right in thinking that for SOLR my only option is creating string queries (with q and fq components) for solrj?

2. Assuming that the answer to 1 is correct, then is there an easy way to take a lucene query (with nested Boolean queries, filter queries, etc.) and generate a SOLR query string with q and fq components?

Thanks

Scott
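To make question 2 concrete, here is a minimal sketch of rendering a small nested Boolean structure into a Lucene-syntax query string, with filters kept separate for `fq`. The helper names and fields are hypothetical, and the sketch deliberately avoids Query.toString(): every term is emitted as a quoted phrase with only quotes and backslashes escaped. The string is still re-analyzed on the Solr side, so it is not guaranteed to reproduce the original Lucene Query objects exactly.

```python
def escape_phrase(text):
    """Escape backslashes and double quotes for use inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def term(field, text):
    """Render a single field:"value" clause."""
    return '%s:"%s"' % (field, escape_phrase(text))


def boolean(op, *clauses):
    """Join already-rendered clauses with AND or OR inside parentheses."""
    if op not in ("AND", "OR"):
        raise ValueError("unsupported operator: %s" % op)
    return "(" + (" %s " % op).join(clauses) + ")"


q = boolean(
    "AND",
    term("Body", "dog"),
    boolean("OR", term("Body", "cat"), term("Title", "pet food")),
)
request_params = {"q": q, "fq": [term("Language", "en")]}
print(request_params)
```

Rendering from the application's own query model, rather than from finished Lucene Query objects, sidesteps the round-tripping problem raised in the replies to this thread.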
ANTLR SOLR query/filter parser
I'm looking for an ANTLR parser that consumes solr queries and filters. Before I write my own, thought I'd ask if anyone has one they are willing to share or can point me to one? Thanks Scott