Re: Search for misspelled words in corpus
Hm, I was purposely avoiding mentioning n-grams because just n-gramming all indexed tokens would balloon the index. My assumption was that only *some* words are misspelled, in which case it may be better not to n-gram all tokens.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/


On Sun, Jun 9, 2013 at 2:30 AM, Jagdish Nomula wrote:

> Another theoretical answer for this question is an n-grams approach. You can
> index the word and its trigrams. Query the index by the string as well as
> its trigrams, with a %-match search. You then pass the exhaustive result set
> through a more expensive scoring such as Smith-Waterman.
>
> Thanks,
>
> Jagdish
>
> On Sat, Jun 8, 2013 at 11:03 PM, Shashi Kant wrote:
>
>> n-grams might help, followed by an edit distance metric such as Jaro-Winkler
>> or Smith-Waterman-Gotoh to filter further.
>>
>> On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
>>
>>> Interesting problem. The first thing that comes to mind is to do
>>> "word expansion" during indexing. Kind of like synonym expansion, but
>>> maybe a bit more dynamic. If you can have a dictionary of correctly
>>> spelled words, then for each token emitted by the tokenizer you could
>>> look up the dictionary and expand the token to all other words that
>>> are similar/close enough. This would not be super fast, and you'd
>>> likely have to add some custom heuristic for figuring out what
>>> "similar/close enough" means, but it might work.
>>>
>>> I'd love to hear other ideas...
>>>
>>> Otis
>>> --
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>>
>>> On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a problem where our text corpus on which we need to do search
>>>> contains many misspelled words. The same word could also be misspelled in
>>>> several different ways. It could also have documents that have correct
>>>> spellings. However, the search term that we give in the query would always
>>>> be the correct spelling. Now when we search on a term, we would like to get
>>>> all the documents that contain both correct and misspelled forms of the
>>>> search term.
>>>> We tried fuzzy search, but it doesn't work as per our expectations. It
>>>> returns any close match, not specifically misspelled words. For example, if
>>>> I'm searching for a word like "fight", I would like to return the documents
>>>> that have words like "figth" and "feight", not documents with words like
>>>> "sight" and "light".
>>>> Is there any suggested approach for doing this?
>>>>
>>>> regards,
>>>> Kamesh
>
> --
> *Jagdish Nomula*
> Sr. Manager Search
> Simply Hired, Inc.
> 370 San Aleso Ave., Ste 200
> Sunnyvale, CA 94085
>
> office - 408.400.4700
> cell - 408.431.2916
> email - jagd...@simplyhired.com
>
> www.simplyhired.com
Velocity / Solritas not working in Solr 4.3 and Tomcat 6
Could anyone help me see why the Solritas page fails?

I can go to http://localhost:8080/solr without problem, but fail to go to http://localhost:8080/solr/browse

Below is the status report. Any help is appreciated.

Thanks!
Andy

type: Status report

message:
{msg=lazy loading error,trace=org.apache.solr.common.SolrException: lazy loading error
    at org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.getWrappedWriter(SolrCore.java:2260)
    at org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.getContentType(SolrCore.java:2279)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:623)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:372)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:879)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:617)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1760)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error Instantiating Query Response Writer, solr.VelocityResponseWriter failed to instantiate org.apache.solr.response.QueryResponseWriter
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
    at org.apache.solr.core.SolrCore.createQueryResponseWriter(SolrCore.java:604)
    at org.apache.solr.core.SolrCore.access$200(SolrCore.java:131)
    at org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.getWrappedWriter(SolrCore.java:2255)
    ... 16 more
Caused by: java.lang.ClassCastException: class org.apache.solr.response.VelocityResponseWriter
    at java.lang.Class.asSubclass(Unknown Source)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:458)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
    ... 19 more
,code=500}

description: The server encountered an internal error that prevented it from fulfilling this request.
Re: Search for misspelled words in corpus
Another theoretical answer for this question is an n-grams approach. You can index the word and its trigrams. Query the index by the string as well as its trigrams, with a %-match search. You then pass the exhaustive result set through a more expensive scoring such as Smith-Waterman.

Thanks,

Jagdish

On Sat, Jun 8, 2013 at 11:03 PM, Shashi Kant wrote:

> n-grams might help, followed by an edit distance metric such as Jaro-Winkler
> or Smith-Waterman-Gotoh to filter further.
>
> On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
>
>> Interesting problem. The first thing that comes to mind is to do
>> "word expansion" during indexing. Kind of like synonym expansion, but
>> maybe a bit more dynamic. If you can have a dictionary of correctly
>> spelled words, then for each token emitted by the tokenizer you could
>> look up the dictionary and expand the token to all other words that
>> are similar/close enough. This would not be super fast, and you'd
>> likely have to add some custom heuristic for figuring out what
>> "similar/close enough" means, but it might work.
>>
>> I'd love to hear other ideas...
>>
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>> On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల wrote:
>>
>>> Hi,
>>>
>>> I have a problem where our text corpus on which we need to do search
>>> contains many misspelled words. The same word could also be misspelled in
>>> several different ways. It could also have documents that have correct
>>> spellings. However, the search term that we give in the query would always
>>> be the correct spelling. Now when we search on a term, we would like to get
>>> all the documents that contain both correct and misspelled forms of the
>>> search term.
>>> We tried fuzzy search, but it doesn't work as per our expectations. It
>>> returns any close match, not specifically misspelled words. For example, if
>>> I'm searching for a word like "fight", I would like to return the documents
>>> that have words like "figth" and "feight", not documents with words like
>>> "sight" and "light".
>>> Is there any suggested approach for doing this?
>>>
>>> regards,
>>> Kamesh

--
*Jagdish Nomula*
Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.com

www.simplyhired.com
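To make the trigram step concrete, here is a minimal, self-contained sketch of the cheap candidate-gathering pass (class and method names are mine, not from the thread; the expensive Smith-Waterman rescoring is left out):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TrigramCandidates {

    // Character trigrams of a word, e.g. "fight" -> [fig, igh, ght].
    static List<String> trigrams(String word) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + 3 <= word.length(); i++) {
            grams.add(word.substring(i, i + 3));
        }
        return grams;
    }

    // Fraction of the query's trigrams that occur in a candidate word:
    // the cheap "% match" used to gather candidates.
    static double overlap(String query, String candidate) {
        Set<String> candidateGrams = new HashSet<String>(trigrams(candidate));
        List<String> queryGrams = trigrams(query);
        if (queryGrams.isEmpty()) return 0.0;
        int hits = 0;
        for (String g : queryGrams) {
            if (candidateGrams.contains(g)) hits++;
        }
        return (double) hits / queryGrams.size();
    }

    public static void main(String[] args) {
        // Note: plain trigram overlap scores "light" (2/3) above the
        // transposition "figth" (1/3) for the query "fight", which is
        // exactly why the expensive second scoring pass is still needed.
        System.out.println(overlap("fight", "figth"));  // 0.333...
        System.out.println(overlap("fight", "light"));  // 0.666...
    }
}

The numbers in main() illustrate the point of the two-phase design: the trigram pass is only a recall-oriented candidate filter, and the edit-distance rescoring does the actual ranking.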
Re: load balancing internal Solr on Azure
Hi Kevin,

Would http://search-lucene.com/?q=LBHttpSolrServer work for you?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Fri, May 24, 2013 at 3:12 PM, Kevin Osborn wrote:

> We are looking to install SolrCloud on Azure. We want it to be an internal
> service. For some applications that use SolrJ, we can use ZooKeeper. But
> for other applications that don't talk to Azure, we will need to go through
> a load balancer to distribute traffic among the Solr instances (VMs, IaaS).
>
> The problem is that Azure, as far as I am aware, does not have a load
> balancer for internal services. Internal endpoints are not load balanced.
>
> This is obviously not a problem specific to Solr, but I was hoping that
> other people might have some good ideas for addressing this issue. Thanks.
>
> --
> *KEVIN OSBORN*
> LEAD SOFTWARE ENGINEER
> CNET Content Solutions
> OFFICE 949.399.8714
> CELL 949.310.4677  SKYPE osbornk
> 5 Park Plaza, Suite 600, Irvine, CA 92614
> [image: CNET Content Solutions]
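For anyone who hasn't used it: LBHttpSolrServer is SolrJ's client-side round-robin load balancer, so no external load balancer is needed for queries. A minimal sketch (host names are placeholders):

import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LbClientExample {
    public static void main(String[] args) throws MalformedURLException, SolrServerException {
        // Round-robins requests across the listed Solr instances and
        // skips dead ones until they come back.
        LBHttpSolrServer lb = new LBHttpSolrServer(
                "http://solr1:8983/solr/collection1",
                "http://solr2:8983/solr/collection1");

        QueryResponse rsp = lb.query(new SolrQuery("*:*"));
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}

As I recall, the SolrJ javadoc recommends against using this simple client for indexing; with SolrCloud, the ZooKeeper-aware CloudSolrServer covers that side.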
Re: HyperLogLog for Solr
I have not heard of anyone using HLL in Solr, but:

https://docs.google.com/presentation/d/1ESNiqd7HuIfuwXSSK81PAAu6AmEPEE0u_vyk4FU5x9o/present#slide=id.p
https://github.com/ptdavteam/elasticsearch-approx-plugin

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Tue, May 28, 2013 at 2:43 AM, J Mohamed Zahoor wrote:

> Hi
>
> Has anyone tried using HLL for finding unique values of a field in Solr?
> I am planning to use it for facet counts on certain fields to reduce the
> memory footprint.
>
> ./Zahoor
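For a feel of the trade-off HLL buys, here is a tiny standalone demo using the open-source stream-lib implementation (not wired into Solr in any way; the library choice and parameters are mine):

import com.clearspring.analytics.stream.cardinality.HyperLogLog;

public class HllDemo {
    public static void main(String[] args) {
        // log2m = 14 -> 2^14 registers, roughly 1% standard error, in
        // memory on the order of kilobytes -- regardless of how many
        // distinct values are offered.
        HyperLogLog hll = new HyperLogLog(14);
        for (int i = 0; i < 1000000; i++) {
            hll.offer("value-" + (i % 250000)); // 250k distinct values
        }
        System.out.println("estimated distinct: " + hll.cardinality());
    }
}

The appeal for faceting is exactly what Zahoor describes: the sketch size is fixed and small, instead of growing with field cardinality.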
Re: Note on The Book
It's 2013 and people suffer from ADD. Break it up into a la carte chapter books.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, May 29, 2013 at 6:23 PM, Jack Krupansky wrote:

> Markus,
>
> Okay, more pages it is!
>
> -- Jack Krupansky
>
> -----Original Message----- From: Markus Jelsma
> Sent: Wednesday, May 29, 2013 5:35 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Note on The Book
>
> Jack,
>
> I'd prefer tons of information instead of a meager 300-page book that leaves
> a lot of questions. I'm looking forward to a paperback or hardcover book, and
> price doesn't really matter; it is going to be worth it anyway.
>
> Thanks,
> Markus
>
> -----Original message-----
>> From: Jack Krupansky
>> Sent: Wed 29-May-2013 15:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Note on The Book
>>
>> Erick, your point is well taken. Although my primary interest/skill is to
>> produce a solid foundation reference (including tons of examples), the real
>> goal is to then build on top of that foundation.
>>
>> While I focus on the hard-core material - which really does include some
>> narrative and lots of examples in addition to tons of "mere" reference - my
>> co-author, Ryan Tabora, will focus almost exclusively on... narrative and
>> diagrams.
>>
>> And when I say reference, I also mean lots of examples. Even as the
>> hard-core reference stabilizes, the examples will continue to grow ("like
>> weeds!").
>>
>> Once we get the current, existing, under-review chapters packaged into the
>> new book and available for purchase and download (maybe Lulu, not decided) -
>> available in a couple of weeks - it will be updated approximately every
>> other week, both with additional reference material and additional
>> narrative and diagrams.
>>
>> One of our priorities (after we get through Stage 0 of the next few weeks)
>> is to in fact start giving each of the long Deep Dive chapters enough
>> narrative lead to basically say exactly that - why you should care.
>>
>> A longer-term priority is to improve the balance of narrative and hard-core
>> reference. Yeah, that will be a lot of pages. It already is. We were at 907
>> pages and I was about to drop in another 166 pages on update handlers when
>> O'Reilly threw up their hands and pulled the plug. I was estimating 1200
>> pages at that stage. And I'll probably have another 60-80 pages on update
>> request processors within a week or so. With more to come. That did include
>> a lot of hard-core material and example code for Lucene, which won't be in
>> the new Solr-only book. By focusing on an e-book, the raw page count alone
>> becomes moot. We haven't given up on print - the intent is eventually to
>> have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3
>> to $5 each) and slimmer print volumes for people who don't need everything
>> in print.
>>
>> In fact, we will likely offer the revamped initial chapters of the book as a
>> standalone introduction to Solr - narrative introduction ("why should you
>> care about Solr"), basic concepts of Lucene and Solr (and why you should
>> care!), brief tutorial walkthrough of the major feature areas of Solr, and a
>> case study. The intent would be both an e-book and a slim print volume (75
>> pages?).
>>
>> Another priority (beyond Stage 0) is to develop a detailed roadmap diagram
>> of Solr and how applications can use Solr, and then use that to show how
>> each of the Deep Dive sections fits in (heavy reference, but gradually
>> adding more narrative over time).
>>
>> We will probably be very open to requests - what people really wish a book
>> would actually do for them. The only request we won't be open to is to do it
>> all in only 300 pages.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Erick Erickson
>> Sent: Wednesday, May 29, 2013 7:19 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Note on The Book
>>
>> FWIW, picking up on Alexandre's point. One of my continual frustrations
>> with virtually _all_ technical books is they become endless pages of
>> details without ever mentioning why the hell I should care. Unfortunately,
>> explaining use-cases for everything would only make the book about 10,000
>> pages long. Siiigh.
>>
>> I guess you can take this as a vote for narrative.
>>
>> Erick
>>
>> On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky wrote:
>>
>> > We'll have a blog for the book. We hope to have a first
>> > raw/rough/partial/draft published as an e-book in maybe 10 days to 2
>> > weeks. As soon as we get that process under control, we'll start the
>> > blog. I'll keep your email on file and keep you posted.
>> >
>> > -- Jack Krupansky
>> >
>> > -----Original Message----- From: Swati Swoboda
>> > Sent: Tuesday, May 28, 2013 1:36 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: RE: Note on The Book
>> >
>> > I'd definitely prefer the spiral bound as well. E-books are great and
>> > your
Re: Search for misspelled words in corpus
n-grams might help, followed by an edit distance metric such as Jaro-Winkler or Smith-Waterman-Gotoh to filter further.

On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:

> Interesting problem. The first thing that comes to mind is to do
> "word expansion" during indexing. Kind of like synonym expansion, but
> maybe a bit more dynamic. If you can have a dictionary of correctly
> spelled words, then for each token emitted by the tokenizer you could
> look up the dictionary and expand the token to all other words that
> are similar/close enough. This would not be super fast, and you'd
> likely have to add some custom heuristic for figuring out what
> "similar/close enough" means, but it might work.
>
> I'd love to hear other ideas...
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల wrote:
>
>> Hi,
>>
>> I have a problem where our text corpus on which we need to do search
>> contains many misspelled words. The same word could also be misspelled in
>> several different ways. It could also have documents that have correct
>> spellings. However, the search term that we give in the query would always
>> be the correct spelling. Now when we search on a term, we would like to get
>> all the documents that contain both correct and misspelled forms of the
>> search term.
>> We tried fuzzy search, but it doesn't work as per our expectations. It
>> returns any close match, not specifically misspelled words. For example, if
>> I'm searching for a word like "fight", I would like to return the documents
>> that have words like "figth" and "feight", not documents with words like
>> "sight" and "light".
>> Is there any suggested approach for doing this?
>>
>> regards,
>> Kamesh
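Lucene's spellchecker module already ships a Jaro-Winkler implementation, so the second-pass filter can be as small as this (the 0.9 cut-off is illustrative, not from the thread, and should be tuned on real data):

import org.apache.lucene.search.spell.JaroWinklerDistance;

public class EditDistanceFilter {
    public static void main(String[] args) {
        JaroWinklerDistance jw = new JaroWinklerDistance(); // 1.0f = identical
        String query = "fight";
        String[] candidates = {"figth", "feight", "sight", "light"};
        float cutoff = 0.9f; // illustrative threshold

        for (String c : candidates) {
            float score = jw.getDistance(query, c);
            // With this cut-off the misspellings "figth" and "feight" pass,
            // while the legitimate words "sight" and "light" fall below it --
            // the behavior Kamesh asked for further down the thread.
            System.out.printf("%s -> %.3f %s%n", c, score,
                    score >= cutoff ? "(keep)" : "(drop)");
        }
    }
}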
Re: Search for misspelled words in corpus
Interesting problem. The first thing that comes to mind is to do "word expansion" during indexing. Kind of like synonym expansion, but maybe a bit more dynamic. If you can have a dictionary of correctly spelled words, then for each token emitted by the tokenizer you could look up the dictionary and expand the token to all other words that are similar/close enough. This would not be super fast, and you'd likely have to add some custom heuristic for figuring out what "similar/close enough" means, but it might work.

I'd love to hear other ideas...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల wrote:

> Hi,
>
> I have a problem where our text corpus on which we need to do search
> contains many misspelled words. The same word could also be misspelled in
> several different ways. It could also have documents that have correct
> spellings. However, the search term that we give in the query would always
> be the correct spelling. Now when we search on a term, we would like to get
> all the documents that contain both correct and misspelled forms of the
> search term.
> We tried fuzzy search, but it doesn't work as per our expectations. It
> returns any close match, not specifically misspelled words. For example, if
> I'm searching for a word like "fight", I would like to return the documents
> that have words like "figth" and "feight", not documents with words like
> "sight" and "light".
> Is there any suggested approach for doing this?
>
> regards,
> Kamesh
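A rough sketch of what such an index-time expansion could look like as a Lucene TokenFilter. The similarity dictionary is assumed to be precomputed elsewhere (that is the "custom heuristic" part); everything here is illustrative, not an existing Solr component:

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.List;
import java.util.Map;
import java.util.Queue;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class WordExpansionFilter extends TokenFilter {

    // Precomputed map: token -> "similar/close enough" variants.
    private final Map<String, List<String>> similar;
    private final Queue<String> pending = new ArrayDeque<String>();

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncAtt =
            addAttribute(PositionIncrementAttribute.class);

    public WordExpansionFilter(TokenStream input, Map<String, List<String>> similar) {
        super(input);
        this.similar = similar;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!pending.isEmpty()) {
            // Emit a variant stacked at the same position as the
            // original token, exactly like synonym expansion.
            termAtt.setEmpty().append(pending.poll());
            posIncAtt.setPositionIncrement(0);
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        List<String> variants = similar.get(termAtt.toString());
        if (variants != null) {
            pending.addAll(variants);
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        pending.clear();
    }
}

A matching TokenFilterFactory would be needed to plug this into an analyzer chain in schema.xml.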
Dataless nodes in SolrCloud?
Hi,

Is there a notion of a data node vs. a non-data node in SolrCloud? Something a la http://www.elasticsearch.org/guide/reference/modules/node/

Thanks,
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
Re: index merge question
I have noticed that when I write a doc with an id that already exists, it creates a new revision with only the fields from the second write. I gather there is a REST API in the latest Solr version which updates only selected fields. In my opinion, merge should create a doc which is a union of the fields, assuming the fields conform to the schema of the output index.

~ Sourajit

On Sun, Jun 9, 2013 at 12:06 AM, Mark Miller wrote:

> On Jun 8, 2013, at 12:52 PM, Jamie Johnson wrote:
>
>> When merging through the core admin
>> (http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for
>> conflicts during the merge? So for instance if I am merging core 1 and
>> core 2 into core 0 (first example), what happens if core 1 and core 2 both
>> have a document with the same key, say core 1 has a newer version of the
>> document than core 2? Does the merge fail, does the newer document remain?
>
> You end up with both documents, both with that ID - not generally a
> situation you want to end up in. You need to ensure unique IDs in the
> input data or replace the index rather than merging into it.
>
>> Also if using the srcCore method, if a document with key 1 is written while
>> an index also with key 1 is being merged, what happens?
>
> It depends on the order I think - if the doc is written after the merge
> and it's an update, it will update the doc that was just merged in. If the
> merge comes second, you have the doc twice and it's a problem.
>
> - Mark
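The "update only selected fields" feature Sourajit mentions is atomic updates (Solr 4.x). A minimal SolrJ sketch, assuming all fields are stored and the update log is enabled (the URL and field names are placeholders):

import java.util.Collections;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        // The {"set": value} map form updates just this field; the other
        // stored fields of doc1 are preserved instead of being wiped out.
        doc.addField("price", Collections.singletonMap("set", 42));

        server.add(doc);
        server.commit();
    }
}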
Re: Custom Data Clustering
Hello,

This sounds like a custom SearchComponent. Which clustering library you want to use, or DIY, is up to you, but go with the SearchComponent approach. You will still need to process N hits, but you won't need to first send them all over the wire.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Fri, Jun 7, 2013 at 11:48 AM, Raheel Hasan wrote:

> Hi,
>
> Can someone please tell me if there is a way to have custom *clustering
> of the data* from Solr query results? I am facing 2 issues currently:
>
> 1. The *Carrot* clustering only applies clustering to the "paged"
> results (i.e. the current pagination page's results).
>
> 2. I need to have custom clustering and classify results into certain
> classes only (i.e. only a few very specific words in the search results),
> for example "Red", "Green", "Blue" etc., and not "hello World",
> "Known World", "green world" etc. (if you know what I mean here) -
> where both the Do and Do-Not words exist in the search results.
>
> Please tell me how to achieve this. Perhaps Carrot/clustering is not needed
> here and some other classifier is needed. So what to do here?
>
> Basically, I cannot receive 1 million results and then process them via a
> PHP array to classify them as per need. The classification must be done
> in Solr only.
>
> Thanks
>
> --
> Regards,
> Raheel Hasan
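For reference, the bare bones of a custom SearchComponent against the Solr 4.x API look roughly like this (the class name and the classification logic are placeholders); it would then be registered in solrconfig.xml and added to a request handler's component list:

import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ColorClassifierComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // Validate/read custom request parameters here, if any.
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Runs server-side: walk the matching docs, bucket each one into
        // "Red"/"Green"/"Blue"/... based on the relevant stored field,
        // and attach the buckets to the response, e.g.
        // rb.rsp.add("classes", buckets);
    }

    @Override
    public String getDescription() {
        return "Classifies search results into a fixed set of classes";
    }

    @Override
    public String getSource() {
        return null; // part of the 4.x SearchComponent contract
    }
}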
Query-node+shard stickiness?
Hi,

Is there anything in SolrCloud that would support query-node/shard affinity/stickiness? What I mean by that is a mechanism that is smart enough to keep sending the same query X to the same node(s)+shard(s)... with the goal being better utilization of Solr and OS caches?

Example:
* Imagine a collection with 2 shards and 3 replicas: s1r1, s1r2, s1r3, s2r1, s2r2, s2r3
* A query for "Foo Bar" comes in and hits one of the nodes, say s1r1
* Since shard 2 needs to be queried, too, one of its 3 replicas needs to be searched. Say s2r1 gets searched
* 5 minutes later the same query for "Foo Bar" comes in; say it hits s1r1 again
* Again shard 2 needs to be searched. But which of the 3 replicas should be searched?
* Ideally that same s2r1 would be searched

Is there anything in SolrCloud that can accomplish this? Or is there a place in SolrCloud where such a "query hash ==> node/shard" mapping could be implemented?

Thanks,
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
Re: Help required with fq syntax
The syntax looks fine, but I get all the records. As per the example given above, I get all the documents, meaning the filtering did not work. I am curious to know if my indexing went fine or not. I will check and revert back.

On Sun, Jun 9, 2013 at 7:21 AM, Otis Gospodnetic wrote:

> Try:
>
> ...&q=*:*&fq=-blocked_company_ids:5
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Sat, Jun 8, 2013 at 9:37 PM, Kamal Palei wrote:
>
>> Dear All
>> I have a multi-valued field blocked_company_ids in the index.
>>
>> You can think of it like:
>>
>> 1. document1, blocked_company_ids: 1, 5, 7
>> 2. document2, blocked_company_ids: 2, 6, 7
>> 3. document3, blocked_company_ids: 4, 5, 6
>>
>> and so on.
>>
>> I want to retrieve all the documents where blocked_company_ids does not
>> contain one particular company id, say 5.
>>
>> So my search result should give me only document2, as document1 and
>> document3 both contain 5.
>>
>> To achieve this, does the fq syntax look something like below?
>>
>> &fq=blocked_company_ids:-5
>>
>> I tried the above syntax, but it gives me 0 records.
>>
>> Can somebody help me with the syntax please, and point me to where all the
>> syntax details are given.
>>
>> Thanks
>> Kamal
>> Net Cloud Systems
Re: Help required with fq syntax
Also please note that for some documents, blocked_company_ids may not be present at all. In such cases that document should be present in the search result as well.

BR,
Kamal

On Sun, Jun 9, 2013 at 7:07 AM, Kamal Palei wrote:

> Dear All
> I have a multi-valued field blocked_company_ids in the index.
>
> You can think of it like:
>
> 1. document1, blocked_company_ids: 1, 5, 7
> 2. document2, blocked_company_ids: 2, 6, 7
> 3. document3, blocked_company_ids: 4, 5, 6
>
> and so on.
>
> I want to retrieve all the documents where blocked_company_ids does not
> contain one particular company id, say 5.
>
> So my search result should give me only document2, as document1 and
> document3 both contain 5.
>
> To achieve this, does the fq syntax look something like below?
>
> &fq=blocked_company_ids:-5
>
> I tried the above syntax, but it gives me 0 records.
>
> Can somebody help me with the syntax please, and point me to where all the
> syntax details are given.
>
> Thanks
> Kamal
> Net Cloud Systems
Re: Help required with fq syntax
Try:

...&q=*:*&fq=-blocked_company_ids:5

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Sat, Jun 8, 2013 at 9:37 PM, Kamal Palei wrote:

> Dear All
> I have a multi-valued field blocked_company_ids in the index.
>
> You can think of it like:
>
> 1. document1, blocked_company_ids: 1, 5, 7
> 2. document2, blocked_company_ids: 2, 6, 7
> 3. document3, blocked_company_ids: 4, 5, 6
>
> and so on.
>
> I want to retrieve all the documents where blocked_company_ids does not
> contain one particular company id, say 5.
>
> So my search result should give me only document2, as document1 and
> document3 both contain 5.
>
> To achieve this, does the fq syntax look something like below?
>
> &fq=blocked_company_ids:-5
>
> I tried the above syntax, but it gives me 0 records.
>
> Can somebody help me with the syntax please, and point me to where all the
> syntax details are given.
>
> Thanks
> Kamal
> Net Cloud Systems
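The same filter in SolrJ terms, for anyone building the query programmatically (a sketch; the core URL is a placeholder). Note that the pure negative clause also matches documents with no blocked_company_ids value at all, which covers the case raised earlier in the thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ExcludeBlockedCompany {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        // Drops docs whose multi-valued field contains 5; docs that have
        // no blocked_company_ids value still match.
        q.addFilterQuery("-blocked_company_ids:5");

        System.out.println(server.query(q).getResults().getNumFound());
    }
}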
Help required with fq syntax
Dear All
I have a multi-valued field blocked_company_ids in the index.

You can think of it like:

1. document1, blocked_company_ids: 1, 5, 7
2. document2, blocked_company_ids: 2, 6, 7
3. document3, blocked_company_ids: 4, 5, 6

and so on.

I want to retrieve all the documents where blocked_company_ids does not contain one particular company id, say 5.

So my search result should give me only document2, as document1 and document3 both contain 5.

To achieve this, does the fq syntax look something like below?

&fq=blocked_company_ids:-5

I tried the above syntax, but it gives me 0 records.

Can somebody help me with the syntax please, and point me to where all the syntax details are given.

Thanks
Kamal
Net Cloud Systems
Re: does solr support query time only stopwords?
Maybe the returned hits match other query terms.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Jun 8, 2013 6:34 PM, "jchen2000" wrote:

> I wanted to analyze high-frequency terms using Solr's Luke request handler
> and keep updating the stopwords file for new queries from time to time.
> Obviously I have to index all terms whether they belong to the stopwords
> list or not.
>
> So I configured the query analyzer's stopwords list but disabled the index
> analyzer's stopwords list. However, it seems like queries return all records
> containing stopwords after this.
>
> Anybody have an idea why this would happen?
>
> p.s. I am using DataStax Enterprise 3.0.2 and the Solr version is 4.0
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Entire query is stopwords
Remove the StopFilter from the "index" section of your fieldType; only keep it in the "query" section. This way your stopwords will always be indexed, and edismax will be able to selectively remove stopwords from the query depending on whether all words are stopwords or not.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5 June 2013, at 21:36, Vardhan Dharnidharka wrote:

> Hi,
>
> I am using the standard edismax parser and my example query is as follows:
>
> {!edismax qf='object_description' rows=10 start=0 mm=-40% v='object'}
>
> In this case, 'object' happens to be a stopword in the StopWordsFilter in my
> datatype 'object_description'. Now, since 'object' is not indexed at all, the
> query does not return any results. In an ideal case, I would want documents
> containing the term 'object' to be returned.
>
> What is the best practice to achieve this? Index stop-words and re-query with
> 'stopwords=false'? Or can this be done without re-querying?
>
> Thanks,
> Vardhan
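In schema.xml terms this means the StopFilterFactory line appears only under the query-side analyzer of the fieldType. Expressed with plain Lucene analyzers, the index/query asymmetry looks like this (a sketch against the 4.x API; the field name, the sample text, and the stopword list are mine, not from the original mail):

import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public class IndexVsQueryAnalysis {

    static void show(String label, Analyzer analyzer, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream("object_description", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        StringBuilder sb = new StringBuilder(label + ":");
        while (ts.incrementToken()) {
            sb.append(' ').append(term.toString());
        }
        ts.end();
        ts.close();
        System.out.println(sb);
    }

    public static void main(String[] args) throws IOException {
        // A custom stopword list containing "object", as in the original mail.
        CharArraySet stops = new CharArraySet(Version.LUCENE_43, Arrays.asList("object"), true);

        // Index side: no stopword removal, so "object" gets into the index.
        Analyzer indexAnalyzer = new StandardAnalyzer(Version.LUCENE_43, CharArraySet.EMPTY_SET);
        // Query side: stopword removal still applies.
        Analyzer queryAnalyzer = new StandardAnalyzer(Version.LUCENE_43, stops);

        show("index", indexAnalyzer, "an object lesson"); // index: an object lesson
        show("query", queryAnalyzer, "an object lesson"); // query: an lesson
    }
}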
does solr support query time only stopwords?
I wanted to analyze high-frequency terms using Solr's Luke request handler and keep updating the stopwords file for new queries from time to time. Obviously I have to index all terms whether they belong to the stopwords list or not.

So I configured the query analyzer's stopwords list but disabled the index analyzer's stopwords list. However, it seems like queries return all records containing stopwords after this.

Anybody have an idea why this would happen?

p.s. I am using DataStax Enterprise 3.0.2 and the Solr version is 4.0

--
View this message in context: http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Lucene/Solr Filesystem tunings
Turning swappiness down to 0 can have a decent performance impact.

- http://en.wikipedia.org/wiki/Swappiness

In the past, I've seen better performance with ext3 over ext4 around commits/fsync. Tests were actually slower enough (lots of these operations) that I made a special ext3 partition workspace for Lucene/Solr dev. (Still use ext4 for root and home.) I have not checked that recently, and it may not be a large concern for many use cases.

- Mark

On Jun 4, 2013, at 6:48 PM, Tim Vaillancourt wrote:

> Hey all,
>
> Does anyone have any advice or special filesystem tuning to share for
> Lucene/Solr, and which file systems do they like more?
>
> Also, does Lucene/Solr care about access times if I turn them off (I think
> it doesn't care)?
>
> A bit unrelated: What are people's opinions on reducing some consistency
> things like filesystem journaling, etc. (ext2?) due to SolrCloud's
> additional HA with replicas? How about RAID 0 x 3 replicas or so?
>
> Thanks!
>
> Tim Vaillancourt
Re: index merge question
On Jun 8, 2013, at 12:52 PM, Jamie Johnson wrote:

> When merging through the core admin
> (http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for
> conflicts during the merge? So for instance if I am merging core 1 and
> core 2 into core 0 (first example), what happens if core 1 and core 2 both
> have a document with the same key, say core 1 has a newer version of the
> document than core 2? Does the merge fail, does the newer document remain?

You end up with both documents, both with that ID - not generally a situation you want to end up in. You need to ensure unique IDs in the input data or replace the index rather than merging into it.

> Also if using the srcCore method, if a document with key 1 is written while
> an index also with key 1 is being merged, what happens?

It depends on the order I think - if the doc is written after the merge and it's an update, it will update the doc that was just merged in. If the merge comes second, you have the doc twice and it's a problem.

- Mark
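For reference, the srcCore form of the merge discussed above can also be issued from SolrJ, roughly like this (a sketch; double-check the helper's exact signature for your Solr version, and note that the duplicate-ID caveat above still applies):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class MergeCoresExample {
    public static void main(String[] args) throws Exception {
        // Core admin requests target the container URL, not a specific core.
        SolrServer admin = new HttpSolrServer("http://localhost:8983/solr");

        // Merge core1 and core2 into core0, the SolrJ equivalent of
        // /admin/cores?action=mergeindexes&core=core0&srcCore=core1&srcCore=core2
        CoreAdminRequest.mergeIndexes(
                "core0",
                new String[0],                     // no indexDir sources
                new String[] {"core1", "core2"},   // srcCore sources
                admin);
    }
}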
index merge question
When merging through the core admin (http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for conflicts during the merge? So for instance if I am merging core 1 and core 2 into core 0 (first example), what happens if core 1 and core 2 both have a document with the same key, say core 1 has a newer version of the document than core 2? Does the merge fail, does the newer document remain?

Also if using the srcCore method, if a document with key 1 is written while an index also with key 1 is being merged, what happens?
Re: custom field tutorial
Usually, people want to do the opposite - store the numeric code as a numeric field for perceived efficiency and let the user query and view results with the text form. But there isn't any evidence of a great performance benefit from doing so - just store the string code in a string field.

Also, your language is confusing - you say "a single integer field that maps to the string field" - do you actually want two separate fields? Is that the case? If so, just let the user query against either field depending on their preference for numeric or string codes.

And your language seems to indicate that you want the user to query by numeric code but the field would be indexed as a string code. Is that the case?

Maybe you could clarify your intentions.

Sure, with custom code, custom fields, custom codecs, custom query parsers, etc. you can do almost anything - but... the initial challenge for any Solr app developer is to first try and see if they can make do with the existing capabilities.

-- Jack Krupansky

-----Original Message----- From: Anria Billavara
Sent: Saturday, June 08, 2013 2:54 AM
To: solr-user@lucene.apache.org
Subject: Re: custom field tutorial

You seem to know what you want the words to map to, so index the map. Have one field for the word, one field for the mapped value, and at query time, search the words and return the mapped field. If it is comma-separated, so be it; split it up in your code post-search. Otherwise, same as Wunder: in my many years in search this is an odd request.

Anria

Sent from my Samsung smartphone on AT&T

-------- Original message --------
Subject: Re: custom field tutorial
From: Walter Underwood
To: solr-user@lucene.apache.org
CC:

What are you trying to do? This seems really odd. I've been working in search for fifteen years and I've never heard this request.

You could always return all the fields to the client and ignore the ones you don't want.

wunder

On Jun 7, 2013, at 8:24 PM, geeky2 wrote:

> can someone point me to a "custom field" tutorial. i checked the wiki and
> this list - but still a little hazy on how i would do this.
>
> essentially - when the user issues a query, i want my class to interrogate
> a string field (containing several codes - example boo, baz, bar) and
> return a single integer field that maps to the string field (containing
> the code).
>
> example:
>
> boo=1
> baz=2
> bar=3
>
> thx
> mark