Re: Looking for Developers
This is the second time he has sent this shit. Kill his subscription. Is it possible? On Tue, Oct 26, 2010 at 10:38 PM, Yuchen Wang wrote: > UNSUBSCRIBE > > On Tue, Oct 26, 2010 at 10:15 PM, Igor Chudov wrote: > > > UNSUBSCRIBE > > > > On Wed, Oct 27, 2010 at 12:14 AM, ST ST wrote: > > > Looking for Developers Experienced in Solr/Lucene And/OR FAST Search > > Engines > > > from India (Pune) > > > > > > We are looking for off-shore India Based Developers who are proficient > in > > > Solr/Lucene and/or FAST search engine . > > > Developers in the cities of Pune/Bombay in India are preferred. > > Development > > > is for projects based in US for a reputed firm. > > > > > > If you are proficient in Solr/Lucene/FAST and have 5 years minimum > > industry > > > experience with atleast 3 years in Search Development, > > > please send me your resume. > > > > > > Thanks > > > > > >
Re: how well does multicore scale?
Creating a unique id for a schema is one of those design tasks: http://wiki.apache.org/solr/UniqueKey A marvelously lucid and well-written page, if I do say so. And I do. On Tue, Oct 26, 2010 at 10:16 PM, Tharindu Mathew wrote: > Really great to know you were able to fire up about 100 cores. But, > when it scales up to around 1000 or even more. I wonder how it would > perform. > > I have a question regarding ids i.e. the unique key. Since there is a > potential use case that two users might add the same document, how > would we set the id. I was thinking of appending the user id to the an > id I would use ex: "/system/bar.pdfuserid25". Otherwise, solr would > replace the document of one user, which is not what we want. > > This is also applicable to deleteById. Is there a better way to do this? > > On Tue, Oct 26, 2010 at 7:45 PM, Jonathan Rochkind wrote: >> mike anderson wrote: >>> >>> I'm really curious if there is a clever solution to the obvious problem >>> with: "So your better off using a single index and with a user id and use >>> a query filter with the user id when fetching data.", i.e.. when you have >>> hundreds of thousands of user IDs tagged on each article. That just >>> doesn't >>> sound like it scales very well.. >>> >> >> Actually, I think that design would scale pretty fine, I don't think there's >> an 'obvious' problem. You store your userIDs in a multi-valued field (or as >> multiple terms in a single value, ends up being similar). You fq on there >> with the current userID. There's one way to find out of course, but that >> doesn't seem a patently ridiculous scenario or anything, that's the kind of >> thing Solr is generally good at, it's what it's built for. The problem >> might actually be in the time it takes to add such a document to the index; >> but not in query time. >> >> Doesn't mean it's the best solution for your problem though, I can't say. 
>> >> My impression is that Solr in general isn't really designed to support the >> kind of multi-tenancy use case people are talking about lately. So trying >> to make it work anyway... if multi-cores work for you, then great, but be >> aware they weren't really designed for that (having thousands of cores) and >> may not. If a single index can work for you instead, great, but as you've >> discovered it's not neccesarily obvious how to set up the schema to do what >> you need -- really this applies to Solr in general, unlike an rdbms where >> you just third-form-normalize everything and figure it'll work for almost >> any use case that comes up, in Solr you generally need to custom fit the >> schema for your particular use cases, sometimes being kind of clever to >> figure out the optimal way to do that. >> >> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr >> index takes more intellectual work than setting up an rdbms. The trade off >> is you get speed, and flexible ways to set up relevancy (that still perform >> well). Took a couple decades for rdbms to get as brainless to use as they >> are, maybe in a couple more we'll have figured out ways to make indexing >> engines like solr equally brainless, but not yet -- but it's still pretty >> damn easy for what it is, the lucene/Solr folks have done a remarkable job. >> > > > > -- > Regards, > > Tharindu > -- Lance Norskog goks...@gmail.com
Re: Looking for Developers
UNSUBSCRIBE On Tue, Oct 26, 2010 at 10:15 PM, Igor Chudov wrote: > UNSUBSCRIBE > > On Wed, Oct 27, 2010 at 12:14 AM, ST ST wrote: > > Looking for Developers Experienced in Solr/Lucene And/OR FAST Search > Engines > > from India (Pune) > > > > We are looking for off-shore India Based Developers who are proficient in > > Solr/Lucene and/or FAST search engine . > > Developers in the cities of Pune/Bombay in India are preferred. > Development > > is for projects based in US for a reputed firm. > > > > If you are proficient in Solr/Lucene/FAST and have 5 years minimum > industry > > experience with atleast 3 years in Search Development, > > please send me your resume. > > > > Thanks > > >
Re: Looking for Developers
UNSUBSCRIBE On Wed, Oct 27, 2010 at 12:14 AM, ST ST wrote: > Looking for Developers Experienced in Solr/Lucene And/OR FAST Search Engines > from India (Pune) > > We are looking for off-shore India Based Developers who are proficient in > Solr/Lucene and/or FAST search engine . > Developers in the cities of Pune/Bombay in India are preferred. Development > is for projects based in US for a reputed firm. > > If you are proficient in Solr/Lucene/FAST and have 5 years minimum industry > experience with atleast 3 years in Search Development, > please send me your resume. > > Thanks >
Re: how well does multicore scale?
It's really great to know you were able to fire up about 100 cores. But I wonder how it would perform when it scales up to around 1000 or even more. I have a question regarding ids, i.e. the unique key. Since there is a potential use case where two users might add the same document, how would we set the id? I was thinking of appending the user id to the id I would use, e.g. "/system/bar.pdfuserid25". Otherwise, Solr would replace the document of one user, which is not what we want. This also applies to deleteById. Is there a better way to do this? On Tue, Oct 26, 2010 at 7:45 PM, Jonathan Rochkind wrote: > mike anderson wrote: >> >> I'm really curious if there is a clever solution to the obvious problem >> with: "So your better off using a single index and with a user id and use >> a query filter with the user id when fetching data.", i.e.. when you have >> hundreds of thousands of user IDs tagged on each article. That just >> doesn't >> sound like it scales very well.. >> > > Actually, I think that design would scale pretty fine, I don't think there's > an 'obvious' problem. You store your userIDs in a multi-valued field (or as > multiple terms in a single value, ends up being similar). You fq on there > with the current userID. There's one way to find out of course, but that > doesn't seem a patently ridiculous scenario or anything, that's the kind of > thing Solr is generally good at, it's what it's built for. The problem > might actually be in the time it takes to add such a document to the index; > but not in query time. > > Doesn't mean it's the best solution for your problem though, I can't say. > > My impression is that Solr in general isn't really designed to support the > kind of multi-tenancy use case people are talking about lately. So trying > to make it work anyway... if multi-cores work for you, then great, but be > aware they weren't really designed for that (having thousands of cores) and > may not. 
If a single index can work for you instead, great, but as you've > discovered it's not neccesarily obvious how to set up the schema to do what > you need -- really this applies to Solr in general, unlike an rdbms where > you just third-form-normalize everything and figure it'll work for almost > any use case that comes up, in Solr you generally need to custom fit the > schema for your particular use cases, sometimes being kind of clever to > figure out the optimal way to do that. > > This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr > index takes more intellectual work than setting up an rdbms. The trade off > is you get speed, and flexible ways to set up relevancy (that still perform > well). Took a couple decades for rdbms to get as brainless to use as they > are, maybe in a couple more we'll have figured out ways to make indexing > engines like solr equally brainless, but not yet -- but it's still pretty > damn easy for what it is, the lucene/Solr folks have done a remarkable job. > -- Regards, Tharindu
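The composite-key idea raised in this message can be sketched in Python as follows. This is only an illustration under assumptions: the `!` separator, the function names, and the `user_id` field used in the filter query are hypothetical and not something Solr prescribes. Putting the user id first with an explicit separator keeps the key reversible, so the same key can be rebuilt later for deleteById.

```python
from typing import Tuple
from urllib.parse import urlencode

SEP = "!"  # assumed separator; it must never occur inside a user id


def make_doc_id(user_id: str, path: str) -> str:
    """Build a per-user unique key such as '25!/system/bar.pdf'."""
    if SEP in user_id:
        raise ValueError("user id must not contain the separator")
    return f"{user_id}{SEP}{path}"


def split_doc_id(doc_id: str) -> Tuple[str, str]:
    """Recover (user_id, path); only the first separator is significant."""
    user_id, path = doc_id.split(SEP, 1)
    return user_id, path


def user_query_params(user_id: str, query: str) -> str:
    """Query string that restricts results to one user with a filter query.

    'user_id' is a hypothetical field name in the schema.
    """
    return urlencode({"q": query, "fq": f"user_id:{user_id}"})
```

With this layout, two users adding "/system/bar.pdf" get the keys "25!/system/bar.pdf" and "26!/system/bar.pdf", so neither document overwrites the other.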
Re: Solr sorting problem
Erick Erickson wrote: > In general, the behavior when sorting is not predictable when > sorting on a tokenized field, which "text" is. What would > it mean to sort on a field with "erick" "Moazzam" as tokens > in a single document? Should it be in the "e"s or the "m"s? Might it be possible or reasonable to have it show up under both "e" and "m"? Or if not, just at the first one it finds? I've recently been asked a similar question where we wanted to sort documents by a victim's age. I have a victim_age field, but since there can be multiple victims in an incident it wasn't a unique field. As a workaround, I added a "victim_age_min" field; but it would have been easier if I didn't need to do that. > That said, you probably want to watch out for case > > Best > Erick > > On Fri, Oct 22, 2010 at 10:02 AM, Moazzam Khan wrote: > >> For anyone who faced the same problem, changing the field to string >> from text worked! >> >> -Moazzam >> >> On Fri, Oct 22, 2010 at 8:50 AM, Moazzam Khan wrote: >>> The field type of the first name and last name is text. Could that be >>> why it's not sorting properly? I just changed it to string and started >>> a full-import. Hopefully that will work. >>> >>> Thanks, >>> Moazzam >>> >>> On Thu, Oct 21, 2010 at 7:42 PM, Jayendra Patil >>> wrote: need additional information . Sorting is easy in Solr just by passing the sort parameter However, when it comes to text sorting it depends on how you analyse and tokenize your fields Sorting does not work on fields with multiple tokens. >> http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan >> wrote: > Hey guys, > > I have a list of people indexed in Solr. I am trying to sort by their > first names but I keep getting results that are not alphabetically > sorted (I see the names starting with W before the names starting with > A). 
I have a feeling that the results are first being sorted by > relevancy then sorted by first name. > > Is there a way I can get the results to be sorted alphabetically? > > Thanks, > Moazzam > >
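The "victim_age_min" workaround described in this thread amounts to deriving a single-valued sort field from a multi-valued one before the document is sent to Solr. A minimal Python sketch, assuming documents are plain dicts on the client side (the function name and the sample ids are made up for illustration):

```python
def add_sort_fields(doc: dict) -> dict:
    """Return a copy of doc with victim_age_min derived from victim_age.

    victim_age is multi-valued (one entry per victim); victim_age_min is
    the single value that sorting can use.
    """
    enriched = dict(doc)
    ages = doc.get("victim_age") or []
    if ages:
        enriched["victim_age_min"] = min(ages)
    return enriched


incident = add_sort_fields({"id": "incident-1", "victim_age": [34, 7, 52]})
```

Documents without any victim ages are left without the derived field, so how they sort depends on the field's sortMissingLast/sortMissingFirst settings.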
Re: How do I this in Solr?
Thanks everybody for the inputs. Looks like Steven's solution is the closest one but will lead to performance issues when the query string has many terms. I will try to implement the two filters suggested by Steven and see how the performance matches up. -- Thanks Varun Gupta On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹) wrote: > I think you have to write a "yet exact match" handler yourself (I mean yet > cause it's not quite exact match we normally know). Steve's answer is quite > near your request. You can do further work based on his solution. > > At the last step, I'll suggest you eat up all blank within query string and > query result, respectively & only returns those results that has equal > string length as the query string's. > > For example, giving: > *query string = "Samsung with GPS" > *query results: > result 1 = "Samsung has lots of mobile with GPS" > result 2 = "with GPS Samsung" > result 3 = "GPS mobile with vendors, such as Sony, Samsung" > > they become: > *query string = "SamsungwithGPS" (length =14) > *query results: > result 1 = "SamsunghaslotsofmobilewithGPS" (length =29) > result 2 = "withGPSSamsung" (length =14) > result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length =39) > > so result 2 matches your request. > > In this way, you can avoid case-sensitive, word-order-rearrange load of > works. Furthermore, you can do refined work, such as remove white > characters, etc. > > Scott @ Taiwan > > > - Original Message - From: "Varun Gupta" > > To: > Sent: Tuesday, October 26, 2010 9:07 PM > > Subject: How do I this in Solr? > > > Hi, >> >> I have lot of small documents (each containing 1 to 15 words) indexed in >> Solr. 
For the search query, I want the search results to contain only >> those >> documents that satisfy this criteria "All of the words of the search >> result >> document are present in the search query" >> >> For example: >> If I have the following documents indexed: "nokia n95", "GPS", "android", >> "samsung", "samsung android", "nokia android", "mobile with GPS" >> >> If I search with the text "samsung android GPS", search results should >> only >> contain "samsung", "GPS", "android" and "samsung android". >> >> Is there a way to do this in Solr? >> >> -- >> Thanks >> Varun Gupta
how to index raw data
Hi, I wanted to use a few fields from the database, but cannot use the DIH because JDBC access to the database is not allowed. We can only go through a wrapper. As such, I would like to know how I can index the data obtained through the DB wrapper using SolrJ. I would have two fields to index: id and a text field containing the data. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-index-raw-data-tp1778033p1778033.html Sent from the Solr - User mailing list archive at Nabble.com.
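As a rough sketch of what indexing without the DIH involves, the snippet below builds the XML update message that Solr's update handler accepts, from rows that would come from the DB wrapper. Note that this is Python, not the SolrJ client the question asks about; it only illustrates the shape of the two-field documents. The function name, the field name "text", and the sample rows are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_message(rows):
    """Build a Solr <add> message from (id, text) pairs.

    rows stands in for whatever the (hypothetical) DB wrapper returns.
    ElementTree takes care of escaping characters such as & and <.
    """
    add = ET.Element("add")
    for row_id, text in rows:
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = str(row_id)
        ET.SubElement(doc, "field", name="text").text = text
    return ET.tostring(add, encoding="unicode")


message = build_add_message([(1, "first row"), (2, "salt & pepper")])
```

The resulting string would be POSTed to the core's /update handler and followed by a commit; with SolrJ the same loop would add one input document per row instead of building XML by hand.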
Re: FieldCollapsing and Stats or Sum ?!
Do you want one number, or the sum for each group? For one number, the stats component is fine. For one number per group, grouping does not (yet) support the stats component. This is the old SQL "Group By" command, right? On Tue, Oct 26, 2010 at 6:42 AM, stockiii wrote: > > Hello. > > we want to group with field collapsing and we want a sum of this groups. > > in example: > group by currency_id: EUR, CHF, ... > and for this groups, the correct sum of the documents from the field: amount > > ist this in one Request possible ? or its necessary do this in several > requests ? > maybe first grouping and then using the statsComponent to get the sum of the > group by sending a new request with the filter ? but then i dont need > grouping !?!? > > thx =) > -- > View this message in context: > http://lucene.472066.n3.nabble.com/FieldCollapsing-and-Stats-or-Sum-tp1773842p1773842.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com
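If only the per-currency sums are needed, and not the collapsed documents themselves, one option worth checking is the stats component's own faceting. The sketch below builds such a request's query string in Python; it assumes the `stats.facet` parameter is available in the Solr version in use, and the function name is made up. The field names `amount` and `currency_id` come from the question.

```python
from urllib.parse import urlencode


def per_group_sum_params(group_field, sum_field, query="*:*"):
    """Query string asking the stats component for statistics on
    sum_field, broken down by each value of group_field."""
    return urlencode([
        ("q", query),
        ("rows", "0"),               # only the statistics are wanted
        ("stats", "true"),
        ("stats.field", sum_field),
        ("stats.facet", group_field),
    ])


params = per_group_sum_params("currency_id", "amount")
```

The response would then carry one statistics block (including a sum) per currency_id value, in a single request and without field collapsing.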
Re: How do I this in Solr?
I think you have to write a "yet exact match" handler yourself (I say "yet" because it's not quite the exact match we normally know). Steve's answer is quite close to your request. You can do further work based on his solution. At the last step, I'd suggest you strip all blanks from the query string and from each query result, and return only those results that have the same string length as the query string.

For example, given:
*query string = "Samsung with GPS"
*query results:
result 1 = "Samsung has lots of mobile with GPS"
result 2 = "with GPS Samsung"
result 3 = "GPS mobile with vendors, such as Sony, Samsung"

they become:
*query string = "SamsungwithGPS" (length =14)
*query results:
result 1 = "SamsunghaslotsofmobilewithGPS" (length =29)
result 2 = "withGPSSamsung" (length =14)
result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length =39)

so result 2 matches your request. In this way, you can avoid a lot of case-sensitivity and word-reordering work. Furthermore, you can do refined work, such as removing whitespace characters, etc. Scott @ Taiwan

- Original Message - From: "Varun Gupta" To: Sent: Tuesday, October 26, 2010 9:07 PM Subject: How do I this in Solr?

Hi, I have a lot of small documents (each containing 1 to 15 words) indexed in Solr. For the search query, I want the search results to contain only those documents that satisfy this criterion: "All of the words of the search result document are present in the search query". For example: If I have the following documents indexed: "nokia n95", "GPS", "android", "samsung", "samsung android", "nokia android", "mobile with GPS". If I search with the text "samsung android GPS", search results should only contain "samsung", "GPS", "android" and "samsung android". Is there a way to do this in Solr? -- Thanks Varun Gupta
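For comparison, the requirement in the original question (every word of a returned document must appear in the query) can be written as a subset test. The Python sketch below is a client-side post-filter over candidate results, not a Solr handler; the function names are made up, tokens are split on whitespace and lowercased as a simplifying assumption, and "android" is spelled consistently.

```python
def tokens(text):
    """Lowercased, whitespace-separated terms of a string."""
    return set(text.lower().split())


def docs_covered_by_query(query, docs):
    """Keep only documents whose every term also occurs in the query."""
    query_terms = tokens(query)
    kept = []
    for doc in docs:
        doc_terms = tokens(doc)
        if doc_terms and doc_terms <= query_terms:  # subset test
            kept.append(doc)
    return kept


indexed = ["nokia n95", "GPS", "android", "samsung",
           "samsung android", "nokia android", "mobile with GPS"]
hits = docs_covered_by_query("samsung android GPS", indexed)
```

Unlike the length comparison, the subset test also keeps shorter documents such as "GPS", which is what the question asks for.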
Re: Multiple Word Facets
Facets are generated from indexed terms. Depending on your need/use case, you have two options. You can use an additional, separate string field (which is not tokenized) for facets and populate it via copyField: search on the tokenized field, facet on the non-tokenized field. Or you can add solr.ShingleFilterFactory to your index analyzer to form multiple-word terms. --- On Wed, 10/27/10, Adam Estrada wrote: > From: Adam Estrada > Subject: Multiple Word Facets > To: solr-user@lucene.apache.org > Date: Wednesday, October 27, 2010, 4:43 AM > All, > I am a new to Solr faceting and stuck on how to get > multiple-word > facets returned from a standard Solr query. See below for > what is > currently being returned. > > > > > > 89 > 87 > 87 > 87 > 84 > 60 > 32 > 22 > 19 > 15 > 15 > 14 > 12 > 11 > 10 > 9 > 7 > 7 > 7 > 6 > 6 > 6 > 6 > ...etc... > > There are many terms in there that are 2 or 3 word phrases. > For > example, Eastern Federal Lands Highway Division all gets > broken down > in to the individual words that make up the total group of > words. I've > seen quite a few websites that do what it is I am trying to > do here so > any suggestions at this point would be great. See my schema > below > (copied from the example schema). > > class="solr.TextField" positionIncrementGap="100"> > > class="solr.WhitespaceTokenizerFactory"/> > class="solr.SynonymFilterFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="false"/> > class="solr.StopFilterFactory" > > ignoreCase="true" > > words="stopwords.txt" > > enablePositionIncrements="true" > > /> > class="solr.WordDelimiterFilterFactory" > generateWordParts="1" > generateNumberParts="1" catenateWords="0" > catenateNumbers="0" > catenateAll="0" splitOnCaseChange="1"/> > class="solr.RemoveDuplicatesTokenFilterFactory"/> > > > Similar for type="query". Please advise on how to group or > cluster > document terms so that they can be used as facets. > > Many thanks in advance, > Adam Estrada >
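Why the field type matters for facets can be seen with a toy count. The Python sketch below mimics facet counting (number of documents per indexed term) over a tokenized field and over an untokenized string field. It is an illustration only, not Solr's implementation, and the sample value "Federal Highway Administration" is made up.

```python
from collections import Counter


def facet_counts(values, tokenized):
    """Count documents per indexed term.

    tokenized=True splits each value on whitespace, roughly what a
    tokenized text field does; tokenized=False keeps the whole value
    as one term, as an untokenized string field would.
    """
    counts = Counter()
    for value in values:
        terms = value.split() if tokenized else [value]
        counts.update(set(terms))  # each document counts once per term
    return counts


values = [
    "Eastern Federal Lands Highway Division",
    "Eastern Federal Lands Highway Division",
    "Federal Highway Administration",
]
by_word = facet_counts(values, tokenized=True)
by_phrase = facet_counts(values, tokenized=False)
```

Faceting on the tokenized field yields single words such as "Federal", while the string copy yields the full phrase as one facet value.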
Re: Multiple Word Facets
Use this field type - On Tue, Oct 26, 2010 at 6:43 PM, Adam Estrada wrote: > All, > I am a new to Solr faceting and stuck on how to get multiple-word > facets returned from a standard Solr query. See below for what is > currently being returned. > > > > > > 89 > 87 > 87 > 87 > 84 > 60 > 32 > 22 > 19 > 15 > 15 > 14 > 12 > 11 > 10 > 9 > 7 > 7 > 7 > 6 > 6 > 6 > 6 > ...etc... > > There are many terms in there that are 2 or 3 word phrases. For > example, Eastern Federal Lands Highway Division all gets broken down > in to the individual words that make up the total group of words. I've > seen quite a few websites that do what it is I am trying to do here so > any suggestions at this point would be great. See my schema below > (copied from the example schema). > > positionIncrementGap="100"> > > > ignoreCase="true" expand="false"/> >ignoreCase="true" >words="stopwords.txt" >enablePositionIncrements="true" >/> > generateWordParts="1" > generateNumberParts="1" catenateWords="0" catenateNumbers="0" > catenateAll="0" splitOnCaseChange="1"/> > > > > Similar for type="query". Please advise on how to group or cluster > document terms so that they can be used as facets. > > Many thanks in advance, > Adam Estrada >
Re: snapshot-4.0 and maven
You use maven-assembly-plugin's jar-with-dependencies to build a single jar with all its dependencies http://stackoverflow.com/questions/574594/how-can-i-create-an-executable-jar-with-dependencies-using-maven @tommychheng On 10/19/10 6:53 AM, Matt Mitchell wrote: Hey thanks Tommy. To be more specific, I'm trying to use SolrJ in a clojure project. When I try to use SolrJ using what you showed me, I get errors saying lucene classes can't be found etc.. Is there a way to build everything SolrJ (snapshot-4.0) needs into one jar? Matt On Mon, Oct 18, 2010 at 11:01 PM, Tommy Chheng wrote: Once you built the solr 4.0 jar, you can use mvn's install command like this: mvn install:install-file -DgroupId=org.apache -DartifactId=solr -Dpackaging=jar -Dversion=4.0-SNAPSHOT -Dfile=solr-4.0-SNAPSHOT.jar -DgeneratePom=true @tommychheng On 10/18/10 7:28 PM, Matt Mitchell wrote: I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is this possible to do? If so, could someone give me a tip or two on getting started? Thanks, Matt
Multiple Word Facets
All, I am new to Solr faceting and stuck on how to get multiple-word facets returned from a standard Solr query. See below for what is currently being returned. 89 87 87 87 84 60 32 22 19 15 15 14 12 11 10 9 7 7 7 6 6 6 6 ...etc... There are many terms in there that are 2 or 3 word phrases. For example, Eastern Federal Lands Highway Division gets broken down into the individual words that make up the phrase. I've seen quite a few websites that do what I am trying to do here, so any suggestions at this point would be great. See my schema below (copied from the example schema). Similar for type="query". Please advise on how to group or cluster document terms so that they can be used as facets. Many thanks in advance, Adam Estrada
Re: ClassCastException Issue
On Mon, Oct 25, 2010 at 2:45 AM, Alex Matviychuk wrote: > Getting this when deploying to tomcat: > > [INFO][http-4443-exec-3][solr.schema.IndexSchema] readSchema():394 > Reading Solr Schema > [INFO][http-4443-exec-3][solr.schema.IndexSchema] readSchema():408 > Schema name=tsadmin > [ERROR][http-4443-exec-3][util.plugin.AbstractPluginLoader] log():139 > java.lang.ClassCastException: org.apache.solr.schema.StrField cannot > be cast to org.apache.solr.schema.FieldType > at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:419) > at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:447) > at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141) > at > org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:456) > at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:95) > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:520) > at > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) > > > solr schema: > > > > > sortMissingLast="true" omitNorms="true"/> >... > > > > ... > > > > > Any ideas? > > Thanks, > Alex Matviychuk > Alex, I've run into this issue myself, and it was because I tried to create a fieldType called string (like you). Rename "string" to something else and the exception should go away. - Ken
RE: How do I this in Solr?
Hi Matt, I think your concern about performance is spot-on, though. The combinatorial explosion would be at query time, not at index time - my solution has a single token indexed per document. My suggested query-time filter would generate the following number of output terms, where C(n,k) is the combination of n things taken k at a time, n is the number of input query terms, and k is the number of concatenated input query terms forming one output query term: C(n,1)+C(n,2)...+C(n,n-1)+C(n,n) For small queries this would not be a problem: 1 input query term -> 1 output query term 2 input query terms -> 3 output query terms 3 input query terms -> 7 output query terms 4 input query terms -> 15 output query terms But for larger queries, it could be fairly expensive: 10 input query terms -> 1,023 output query terms ... 15 input query terms -> 32,767 output query terms This is exactly (2^n - 1) output query terms, where n is the number of input terms. 32k query terms might be too slow to be functional. Steve > -Original Message- > From: Matthew Hall [mailto:mh...@informatics.jax.org] > Sent: Tuesday, October 26, 2010 3:51 PM > To: solr-user@lucene.apache.org > Subject: Re: How do I this in Solr? > > Bah.. nope this would miss documents that only match a subset of the > given terms. > > I'm going to have to go with Steven's approach as the right choice here. > > Matt > > On 10/26/2010 3:44 PM, Matthew Hall wrote: > > Indeed, I'd missed the second part of his requirements, my and > > solution is sadly insufficient to this task. > > > > The combinatorial part of you solution worries me a bit though Steven, > > because his documents that are on the larger side of his corpus would > > likely slow down query performance a bit while the filter calculates > > all of the possibilities for a given document. > > > > I'm wondering if a slightly hybrid approach would be valid: > > > > Have a filter that calculates the total number of terms for a given > > document. 
And then add a clause into your query at runtime that would > > match what the filter would come up with: > > > > So: > > > > text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount: 3 > > > > Something like that anyhow. > > > > Matt > > > > On 10/26/2010 3:35 PM, Dennis Gearon wrote: > >> I'm the LAST person anyone will ever need to worry about flame > >> baiting. You did notice that I retracted what I said and supported > >> your point of view? > >> > >> Sorry if my cryptic comment sounded critical. I was wrong, you were > >> right :-) > >> Dennis Gearon > >> > >> Signature Warning > >> > >> It is always a good idea to learn from your own mistakes. It is > >> usually a better idea to learn from others’ mistakes, so you do not > >> have to make them yourself. from > >> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > >> > >> EARTH has a Right To Life, > >>otherwise we all die. > >> > >> > >> --- On Tue, 10/26/10, Steven A Rowe wrote: > >> > >>> From: Steven A Rowe > >>> Subject: RE: How do I this in Solr? > >>> To: "solr-user@lucene.apache.org" > >>> Date: Tuesday, October 26, 2010, 12:27 PM > >>> Hi Dennis, > >>> > >>> You wrote: > If Solr is like Google, once documents matching only > >>> the ANDed items > in the query ran out, then those that had only two of > >>> the terms, then > only 1 of the terms, and then those close to it would > >>> start showing up. > >>> [...] > Plus, if he wants terms that contain ONLY those words, > >>> and no others, an > ANDed query would not do that, right? ANDed queries > >>> return results that > must have ALL the terms listed, and could have lots of > >>> other words, right? > >>> > >>> This is *exactly* what I just said: ANDed queries (i.e., > >>> requiring all query terms) will not satisfy Varun's > >>> requirements. 
> >>> > >>> Your participation in this thread looks an awful lot like > >>> flame-bating: Someone else asks a question, I answer with a > >>> possible solution, you give a one-word "overkill" response, > >>> I say why it's not overkill. You then ask if anybody > >>> knows the answer to the original question, and then parrot > >>> my response to your "overkill" statement. Really > >>> > >>> Get your shit together or shut up. Please. > >>> > >>> Steve > >>> > -Original Message- > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > Sent: Tuesday, October 26, 2010 3:14 PM > To: solr-user@lucene.apache.org > Subject: RE: How do I this in Solr? > > > > Dennis Gearon > > Signature Warning > > It is always a good idea to learn from your own > >>> mistakes. It is usually a > better idea to learn from others’ mistakes, so you > >>> do not have to make > them yourself. from > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > EART
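The term counts Steve gives earlier in this message can be reproduced with a short Python sketch that enumerates every non-empty combination of the query terms. How the combined terms are actually formed is not shown in this part of the thread, so sorting the terms and joining them with "_" are assumptions made here purely for illustration, as is the function name.

```python
from itertools import combinations


def expand_query_terms(terms, sep="_"):
    """Return one output term per non-empty subset of the input terms.

    Terms are lowercased, de-duplicated and sorted so that each subset
    maps to a single canonical string (an assumption of this sketch).
    """
    unique = sorted({t.lower() for t in terms})
    expanded = []
    for k in range(1, len(unique) + 1):
        for combo in combinations(unique, k):
            expanded.append(sep.join(combo))
    return expanded


expanded = expand_query_terms(["samsung", "android", "GPS"])
```

For n distinct input terms this produces C(n,1) + ... + C(n,n) = 2^n - 1 output terms, which is why 15 query terms already means 32,767 of them.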
Re: Strange search
I tried making some changes, but it didn't help. In _http://localhost:8983/search/admin/schema.jsp I have, for example, the term "main" with frequency "7". But if I search for it, I don't get any results. If I use a wildcard, I get only 4 docs in the response. And if I search for the term "html" (frequency "5"), I don't get any results even with a wildcard. Where is the problem and how can I solve it? -- View this message in context: http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1774059.html Sent from the Solr - User mailing list archive at Nabble.com.
Jars required in classpath to run embedded solr server?
Hi everyone, Do we need all lucene jars in the class path for this? Seems that the solr-solrj and solr-core jars are not enough (http://wiki.apache.org/solr/Solrj). It is asking for lucene jars in the classpath. Could I know what jars are required to run this? Thanks in advance. -- Regards, Tharindu
Re: How do I this in Solr?
Bah.. nope this would miss documents that only match a subset of the given terms. I'm going to have to go with Steven's approach as the right choice here. Matt On 10/26/2010 3:44 PM, Matthew Hall wrote: Indeed, I'd missed the second part of his requirements, my and solution is sadly insufficient to this task. The combinatorial part of you solution worries me a bit though Steven, because his documents that are on the larger side of his corpus would likely slow down query performance a bit while the filter calculates all of the possibilities for a given document. I'm wondering if a slightly hybrid approach would be valid: Have a filter that calculates the total number of terms for a given document. And then add a clause into your query at runtime that would match what the filter would come up with: So: text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount: 3 Something like that anyhow. Matt On 10/26/2010 3:35 PM, Dennis Gearon wrote: I'm the LAST person anyone will ever need to worry about flame baiting. You did notice that I retracted what I said and supported your point of view? Sorry if my cryptic comment sounded critical. I was wrong, you were right :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: From: Steven A Rowe Subject: RE: How do I this in Solr? To: "solr-user@lucene.apache.org" Date: Tuesday, October 26, 2010, 12:27 PM Hi Dennis, You wrote: If Solr is like Google, once documents matching only the ANDed items in the query ran out, then those that had only two of the terms, then only 1 of the terms, and then those close to it would start showing up. [...] 
Plus, if he wants terms that contain ONLY those words, and no others, an ANDed query would not do that, right? ANDed queries return results that must have ALL the terms listed, and could have lots of other words, right? This is *exactly* what I just said: ANDed queries (i.e., requiring all query terms) will not satisfy Varun's requirements. Your participation in this thread looks an awful lot like flame-bating: Someone else asks a question, I answer with a possible solution, you give a one-word "overkill" response, I say why it's not overkill. You then ask if anybody knows the answer to the original question, and then parrot my response to your "overkill" statement. Really Get your shit together or shut up. Please. Steve -Original Message- From: Dennis Gearon [mailto:gear...@sbcglobal.net] Sent: Tuesday, October 26, 2010 3:14 PM To: solr-user@lucene.apache.org Subject: RE: How do I this in Solr? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: From: Steven A Rowe Subject: RE: How do I this in Solr? To: "solr-user@lucene.apache.org" Date: Tuesday, October 26, 2010, 12:10 PM Dennis, Do you mean to say that you read my earlier post, and disagree that it would solve the problem? Or have you simply not read it? Steve -Original Message- From: Dennis Gearon [mailto:gear...@sbcglobal.net] Sent: Tuesday, October 26, 2010 3:00 PM To: solr-user@lucene.apache.org Subject: RE: How do I this in Solr? Good point. Since I might need such a query myself someday, how *IS* that done? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: From: Steven A Rowe Subject: RE: How do I this in Solr? To: "solr-user@lucene.apache.org" Date: Tuesday, October 26, 2010, 11:46 AM Um, maybe I'm way off base, but when Varun said: If I search with the text "samsung andriod GPS", search results should only conain "samsung", "GPS", "andriod" and "samsung andriod". I interpreted that to mean that hit documents should contain terms from the query, and nothing else. Making all terms required doesn't do this. Steve -Original Message- From: Matthew Hall [mailto:mh...@informatics.jax.org] Sent: Tuesday, October 26, 2010 2:30 PM To: solr-user@lucene.apache.org Subject: Re: How do I this in Solr? Um.. you could change your default clause to AND rather than or. That shoul
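To make the retraction at the top of this message concrete, here is a small Python sketch (plain Python, not Solr code) that compares the "AND plus termCount" idea with what Varun actually asked for. The function names are made up for illustration, tokens are simply lowercased and split on whitespace, and "android" is spelled consistently, which the original examples do not do.

```python
def and_with_term_count(doc: str, query: str) -> bool:
    """Hybrid idea: every query term is required AND the document has
    exactly as many distinct terms as the query (termCount clause)."""
    doc_terms = set(doc.lower().split())
    query_terms = set(query.lower().split())
    return query_terms <= doc_terms and len(doc_terms) == len(query_terms)


def subset_match(doc: str, query: str) -> bool:
    """Varun's requirement: all words of the document appear in the query."""
    return set(doc.lower().split()) <= set(query.lower().split())


query = "samsung android GPS"
docs = ["samsung", "samsung android", "GPS android samsung", "nokia android"]

for doc in docs:
    print(f"{doc!r}: hybrid={and_with_term_count(doc, query)}, "
          f"wanted={subset_match(doc, query)}")
```

The hybrid check only accepts documents that contain all three query terms and nothing else, so "samsung" and "samsung android" are rejected even though they meet the stated requirement. That is the gap Matt points out.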
Re: How do I this in Solr?
Indeed, I'd missed the second part of his requirements, my AND solution is sadly insufficient for this task. The combinatorial part of your solution worries me a bit though Steven, because his documents that are on the larger side of his corpus would likely slow down query performance a bit while the filter calculates all of the possibilities for a given document. I'm wondering if a slightly hybrid approach would be valid: Have a filter that calculates the total number of terms for a given document. And then add a clause into your query at runtime that would match what the filter would come up with: So: text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount:3 Something like that anyhow. Matt On 10/26/2010 3:35 PM, Dennis Gearon wrote: I'm the LAST person anyone will ever need to worry about flame baiting. You did notice that I retracted what I said and supported your point of view? Sorry if my cryptic comment sounded critical. I was wrong, you were right :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: From: Steven A Rowe Subject: RE: How do I this in Solr? To: "solr-user@lucene.apache.org" Date: Tuesday, October 26, 2010, 12:27 PM Hi Dennis, You wrote: If Solr is like Google, once documents matching only the ANDed items in the query ran out, then those that had only two of the terms, then only 1 of the terms, and then those close to it would start showing up. [...] Plus, if he wants terms that contain ONLY those words, and no others, an ANDed query would not do that, right? ANDed queries return results that must have ALL the terms listed, and could have lots of other words, right?
This is *exactly* what I just said: ANDed queries (i.e., requiring all query terms) will not satisfy Varun's requirements. Your participation in this thread looks an awful lot like flame-bating: Someone else asks a question, I answer with a possible solution, you give a one-word "overkill" response, I say why it's not overkill. You then ask if anybody knows the answer to the original question, and then parrot my response to your "overkill" statement. Really Get your shit together or shut up. Please. Steve -Original Message- From: Dennis Gearon [mailto:gear...@sbcglobal.net] Sent: Tuesday, October 26, 2010 3:14 PM To: solr-user@lucene.apache.org Subject: RE: How do I this in Solr? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: From: Steven A Rowe Subject: RE: How do I this in Solr? To: "solr-user@lucene.apache.org" Date: Tuesday, October 26, 2010, 12:10 PM Dennis, Do you mean to say that you read my earlier post, and disagree that it would solve the problem? Or have you simply not read it? Steve -Original Message- From: Dennis Gearon [mailto:gear...@sbcglobal.net] Sent: Tuesday, October 26, 2010 3:00 PM To: solr-user@lucene.apache.org Subject: RE: How do I this in Solr? Good point. Since I might need such a query myself someday, how *IS* that done? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: From: Steven A Rowe Subject: RE: How do I this in Solr? 
To: "solr-user@lucene.apache.org" Date: Tuesday, October 26, 2010, 11:46 AM Um, maybe I'm way off base, but when Varun said: If I search with the text "samsung andriod GPS", search results should only conain "samsung", "GPS", "andriod" and "samsung andriod". I interpreted that to mean that hit documents should contain terms from the query, and nothing else. Making all terms required doesn't do this. Steve -Original Message- From: Matthew Hall [mailto:mh...@informatics.jax.org] Sent: Tuesday, October 26, 2010 2:30 PM To: solr-user@lucene.apache.org Subject: Re: How do I this in Solr? Um.. you could change your default clause to AND rather than or. That should do the trick. Matt On 10/26/2010 2:26 PM, Dennis Gearon wrote: Overkill? Dennis Gearon I can't think of a way to do it without writing new analysis filters. But I think you could do what you want wi
Re: Highlighting for non-stored fields
Thanks for the insight. This is definitely a feasible solution because I only need to highlight when the user opens the document. I guess the easiest way I can do this is to "reuse" the Solr code (with some modification) in my own application. On Tue, Oct 26, 2010 at 2:35 PM, Pradeep Singh wrote: > Another way you can do this is - after the search has completed, load the > field in your application, write separate code to reanalyze that > field/document, index it in RAM, and run it through highlighter classes. > All > this as part of your web application outside of Solr. Considering the size > of your data it doesn't look advisable to store it because then you would > be > almost doubling the size of your index (if you are looking to highlight on > a > field then it's probably going to be full of content). > > -Pradeep > > On Tue, Oct 26, 2010 at 8:32 AM, Phong Dais wrote: > > > Hi, > > > > I understand that I need to store the fields in order to use highlighting > > "out of the box". > > I'm looking for a way to highlight using term offsets instead of the > > actual text since the text is not stored. What I am asking is: is it > possible > > to modify the response (thru custom implementation) to contain > highlighted > > offsets instead of the actual matched text. Should I be writing my own > > DefaultHighlighter? Or overriding some of its functionality? Can this be > > done this way or am I way off? > > > > BTW, I'm using solr-1.4. > > > > Thanks, > > P. > > > > On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo > wrote: > > > > > Check out this link > > > > > > http://wiki.apache.org/solr/FieldOptionsByUseCase > > > > > > You need to store the field if you want to use the highlighting > feature. > > > > > > If you need to retrieve and display the highlighted snippets then the > > > field > > > definitely needs to be stored.
> > > > > > To use term offsets, it will be a good idea to enable the following > > > attributes for that field termVectors termPositions termOffsets > > > > > > The only issue here is that your storage costs will increase because of > > > these extra features. > > > > > > Nevertheless, you definitely need to store the field if you need to > > > retrieve > > > it for highlighting purposes. > > > > > > On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais > > wrote: > > > > > > > Hi, > > > > > > > > I've been looking thru the mailing archive for the past week and I > > > haven't > > > > found any useful info regarding this issue. > > > > > > > > My requirement is to index a few terabytes worth of data to be > > searched. > > > > Due to the size of the data, I would like to index without storing > but > > I > > > > would like to use the highlighting feature. Is this even possible? > > What > > > > are my options? > > > > > > > > I've read about termOffsets, payload that could possibly be used to > do > > > this > > > > but I have no idea how this could be done. > > > > > > > > Any pointers greatly appreciated. Someone please point me in the > right > > > > direction. > > > > > > > > I don't mind having to write some code or digging thru existing code > > to > > > > accomplish this task. > > > > > > > > Thanks, > > > > P. > > > > > > > > > > > > > > > > -- > > > °O° > > > "Good Enough" is not good enough. > > > To give anything less than your best is to sacrifice the gift. > > > Quality First. Measure Twice. Cut Once. > > > http://www.israelekpo.com/ > > > > > >
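As a rough illustration of the "highlight outside Solr" idea discussed in this thread, here is a minimal Python sketch that computes character offsets for matched terms in text loaded by the application and then wraps those spans in markers. This is not Solr's highlighter and uses no Solr or Lucene API: the helper names are hypothetical, and the simple `\w+` regex only stands in for whatever analyzer the field really uses, so offsets may differ from the index's own term offsets.

```python
import re


def term_offsets(text, terms):
    """Return (start, end) character offsets of whole-word, case-insensitive
    matches of any of the given terms, in document order."""
    wanted = {t.lower() for t in terms}
    return [(m.start(), m.end())
            for m in re.finditer(r"\w+", text)
            if m.group().lower() in wanted]


def highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) span in pre/post markers.
    Assumes offsets are sorted and do not overlap."""
    pieces, last = [], 0
    for start, end in offsets:
        pieces.append(text[last:start])
        pieces.append(pre + text[start:end] + post)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


text = "Solr highlighting needs stored fields"
offsets = term_offsets(text, ["solr", "fields"])
print(offsets)
print(highlight(text, offsets))
```

Returning only the offsets (the first function) corresponds to what Phong asks about; applying them to the text (the second function) can then happen wherever the original document is available, for example when the user opens it.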
RE: How do I this in Solr?
Dennis, I wasn't trying to force your admission of my rectitude - I was just getting frustrated that the conversation was moving in spiral fashion, and was worried that you might have intentionally engineered that. I'm glad to hear that you weren't flame baiting. Steve > -Original Message- > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > Sent: Tuesday, October 26, 2010 3:35 PM > To: solr-user@lucene.apache.org > Subject: RE: How do I this in Solr? > > I'm the LAST person anyone will ever need to worry about flame baiting. > You did notice that I retracted what I said and supported your point of > view? > > Sorry if my cryptic comment sounded critical. I was wrong, you were right > :-) > Dennis Gearon > > Signature Warning > > It is always a good idea to learn from your own mistakes. It is usually a > better idea to learn from others’ mistakes, so you do not have to make > them yourself. from > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > EARTH has a Right To Life, > otherwise we all die. > > > --- On Tue, 10/26/10, Steven A Rowe wrote: > > > From: Steven A Rowe > > Subject: RE: How do I this in Solr? > > To: "solr-user@lucene.apache.org" > > Date: Tuesday, October 26, 2010, 12:27 PM > > Hi Dennis, > > > > You wrote: > > > If Solr is like Google, once documents matching only > > the ANDed items > > > in the query ran out, then those that had only two of > > the terms, then > > > only 1 of the terms, and then those close to it would > > start showing up. > > [...] > > > Plus, if he wants terms that contain ONLY those words, > > and no others, an > > > ANDed query would not do that, right? ANDed queries > > return results that > > > must have ALL the terms listed, and could have lots of > > other words, right? > > > > This is *exactly* what I just said: ANDed queries (i.e., > > requiring all query terms) will not satisfy Varun's > > requirements. 
> > > > Your participation in this thread looks an awful lot like > > flame-bating: Someone else asks a question, I answer with a > > possible solution, you give a one-word "overkill" response, > > I say why it's not overkill. You then ask if anybody > > knows the answer to the original question, and then parrot > > my response to your "overkill" statement. Really > > > > Get your shit together or shut up. Please. > > > > Steve > > > > > -Original Message- > > > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > > > Sent: Tuesday, October 26, 2010 3:14 PM > > > To: solr-user@lucene.apache.org > > > Subject: RE: How do I this in Solr? > > > > > > > > > > > > Dennis Gearon > > > > > > Signature Warning > > > > > > It is always a good idea to learn from your own > > mistakes. It is usually a > > > better idea to learn from others’ mistakes, so you > > do not have to make > > > them yourself. from > > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > > > > > EARTH has a Right To Life, > > > otherwise we all die. > > > > > > > > > --- On Tue, 10/26/10, Steven A Rowe > > wrote: > > > > > > > From: Steven A Rowe > > > > Subject: RE: How do I this in Solr? > > > > To: "solr-user@lucene.apache.org" > > > > > > Date: Tuesday, October 26, 2010, 12:10 PM > > > > Dennis, > > > > > > > > Do you mean to say that you read my earlier post, > > and > > > > disagree that it would solve the problem? Or > > have you > > > > simply not read it? > > > > > > > > Steve > > > > > > > > > -Original Message- > > > > > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > > > > > Sent: Tuesday, October 26, 2010 3:00 PM > > > > > To: solr-user@lucene.apache.org > > > > > Subject: RE: How do I this in Solr? > > > > > > > > > > Good point. Since I might need such a query > > myself > > > > someday, how *IS* that > > > > > done? 
> > > > > > > > > > > > > > > Dennis Gearon > > > > > > > > > > Signature Warning > > > > > > > > > > It is always a good idea to learn from your > > own > > > > mistakes. It is usually a > > > > > better idea to learn from others’ > > mistakes, so you > > > > do not have to make > > > > > them yourself. from > > > > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > > > > > > > > > > > > EARTH has a Right To Life, > > > > > otherwise we all die. > > > > > > > > > > > > > > > --- On Tue, 10/26/10, Steven A Rowe > > > > wrote: > > > > > > > > > > > From: Steven A Rowe > > > > > > Subject: RE: How do I this in Solr? > > > > > > To: "solr-user@lucene.apache.org" > > > > > > > > > > Date: Tuesday, October 26, 2010, 11:46 > > AM > > > > > > Um, maybe I'm way off base, but when > > > > > > Varun said: > > > > > > > > > > > > > If I search with the text "samsung > > andriod > > > > GPS", > > > > > > > search results should only conain > > "samsung", > > > > "GPS", > > > > > > > "andriod" and "samsung andriod". > > > > > > > > > > > > I interpreted that to mean that hit > > documents > > > > shou
RE: How do I this in Solr?
I'm the LAST person anyone will ever need to worry about flame baiting. You did notice that I retracted what I said and supported your point of view? Sorry if my cryptic comment sounded critical. I was wrong, you were right :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: > From: Steven A Rowe > Subject: RE: How do I this in Solr? > To: "solr-user@lucene.apache.org" > Date: Tuesday, October 26, 2010, 12:27 PM > Hi Dennis, > > You wrote: > > If Solr is like Google, once documents matching only > the ANDed items > > in the query ran out, then those that had only two of > the terms, then > > only 1 of the terms, and then those close to it would > start showing up. > [...] > > Plus, if he wants terms that contain ONLY those words, > and no others, an > > ANDed query would not do that, right? ANDed queries > return results that > > must have ALL the terms listed, and could have lots of > other words, right? > > This is *exactly* what I just said: ANDed queries (i.e., > requiring all query terms) will not satisfy Varun's > requirements. > > Your participation in this thread looks an awful lot like > flame-bating: Someone else asks a question, I answer with a > possible solution, you give a one-word "overkill" response, > I say why it's not overkill. You then ask if anybody > knows the answer to the original question, and then parrot > my response to your "overkill" statement. Really > > Get your shit together or shut up. Please. > > Steve > > > -Original Message- > > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > > Sent: Tuesday, October 26, 2010 3:14 PM > > To: solr-user@lucene.apache.org > > Subject: RE: How do I this in Solr? 
> > > > > > > > Dennis Gearon > > > > Signature Warning > > > > It is always a good idea to learn from your own > mistakes. It is usually a > > better idea to learn from others’ mistakes, so you > do not have to make > > them yourself. from > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > > EARTH has a Right To Life, > > otherwise we all die. > > > > > > --- On Tue, 10/26/10, Steven A Rowe > wrote: > > > > > From: Steven A Rowe > > > Subject: RE: How do I this in Solr? > > > To: "solr-user@lucene.apache.org" > > > > Date: Tuesday, October 26, 2010, 12:10 PM > > > Dennis, > > > > > > Do you mean to say that you read my earlier post, > and > > > disagree that it would solve the problem? Or > have you > > > simply not read it? > > > > > > Steve > > > > > > > -Original Message- > > > > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > > > > Sent: Tuesday, October 26, 2010 3:00 PM > > > > To: solr-user@lucene.apache.org > > > > Subject: RE: How do I this in Solr? > > > > > > > > Good point. Since I might need such a query > myself > > > someday, how *IS* that > > > > done? > > > > > > > > > > > > Dennis Gearon > > > > > > > > Signature Warning > > > > > > > > It is always a good idea to learn from your > own > > > mistakes. It is usually a > > > > better idea to learn from others’ > mistakes, so you > > > do not have to make > > > > them yourself. from > > > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > > > > > > > > EARTH has a Right To Life, > > > > otherwise we all die. > > > > > > > > > > > > --- On Tue, 10/26/10, Steven A Rowe > > > wrote: > > > > > > > > > From: Steven A Rowe > > > > > Subject: RE: How do I this in Solr? 
> > > > > To: "solr-user@lucene.apache.org" > > > > > > > > Date: Tuesday, October 26, 2010, 11:46 > AM > > > > > Um, maybe I'm way off base, but when > > > > > Varun said: > > > > > > > > > > > If I search with the text "samsung > andriod > > > GPS", > > > > > > search results should only conain > "samsung", > > > "GPS", > > > > > > "andriod" and "samsung andriod". > > > > > > > > > > I interpreted that to mean that hit > documents > > > should > > > > > contain terms from the query, and > nothing else. > > > Making > > > > > all terms required doesn't do this. > > > > > > > > > > Steve > > > > > > > > > > > -Original Message- > > > > > > From: Matthew Hall [mailto:mh...@informatics.jax.org] > > > > > > Sent: Tuesday, October 26, 2010 > 2:30 PM > > > > > > To: solr-user@lucene.apache.org > > > > > > Subject: Re: How do I this in > Solr? > > > > > > > > > > > > Um.. you could change your default > clause to > > > AND > > > > > rather than or. > > > > > > > > > > > > That should do the trick. > > > > > > > > > > > > Matt > > > > > > > > > > > > On 10/26/2010 2:26 PM, Dennis > Gearon wrote: > > > > > > > Overkill? > > > > > > > > > > > > > > Dennis Gearon > > > > > > >> I can't think of
RE: How do I this in Solr?
Hi Dennis, You wrote: > If Solr is like Google, once documents matching only the ANDed items > in the query ran out, then those that had only two of the terms, then > only 1 of the terms, and then those close to it would start showing up. [...] > Plus, if he wants terms that contain ONLY those words, and no others, an > ANDed query would not do that, right? ANDed queries return results that > must have ALL the terms listed, and could have lots of other words, right? This is *exactly* what I just said: ANDed queries (i.e., requiring all query terms) will not satisfy Varun's requirements. Your participation in this thread looks an awful lot like flame-baiting: Someone else asks a question, I answer with a possible solution, you give a one-word "overkill" response, I say why it's not overkill. You then ask if anybody knows the answer to the original question, and then parrot my response to your "overkill" statement. Really Get your shit together or shut up. Please. Steve > -----Original Message----- > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > Sent: Tuesday, October 26, 2010 3:14 PM > To: solr-user@lucene.apache.org > Subject: RE: How do I this in Solr? > > > > Dennis Gearon > > Signature Warning > > It is always a good idea to learn from your own mistakes. It is usually a > better idea to learn from others’ mistakes, so you do not have to make > them yourself. from > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > EARTH has a Right To Life, > otherwise we all die. > > > --- On Tue, 10/26/10, Steven A Rowe wrote: > > > From: Steven A Rowe > > Subject: RE: How do I this in Solr? > > To: "solr-user@lucene.apache.org" > > Date: Tuesday, October 26, 2010, 12:10 PM > > Dennis, > > > > Do you mean to say that you read my earlier post, and > > disagree that it would solve the problem? Or have you > > simply not read it?
> > > > Steve > > > > > -Original Message- > > > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > > > Sent: Tuesday, October 26, 2010 3:00 PM > > > To: solr-user@lucene.apache.org > > > Subject: RE: How do I this in Solr? > > > > > > Good point. Since I might need such a query myself > > someday, how *IS* that > > > done? > > > > > > > > > Dennis Gearon > > > > > > Signature Warning > > > > > > It is always a good idea to learn from your own > > mistakes. It is usually a > > > better idea to learn from others’ mistakes, so you > > do not have to make > > > them yourself. from > > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > > > > > EARTH has a Right To Life, > > > otherwise we all die. > > > > > > > > > --- On Tue, 10/26/10, Steven A Rowe > > wrote: > > > > > > > From: Steven A Rowe > > > > Subject: RE: How do I this in Solr? > > > > To: "solr-user@lucene.apache.org" > > > > > > Date: Tuesday, October 26, 2010, 11:46 AM > > > > Um, maybe I'm way off base, but when > > > > Varun said: > > > > > > > > > If I search with the text "samsung andriod > > GPS", > > > > > search results should only conain "samsung", > > "GPS", > > > > > "andriod" and "samsung andriod". > > > > > > > > I interpreted that to mean that hit documents > > should > > > > contain terms from the query, and nothing else. > > Making > > > > all terms required doesn't do this. > > > > > > > > Steve > > > > > > > > > -Original Message- > > > > > From: Matthew Hall [mailto:mh...@informatics.jax.org] > > > > > Sent: Tuesday, October 26, 2010 2:30 PM > > > > > To: solr-user@lucene.apache.org > > > > > Subject: Re: How do I this in Solr? > > > > > > > > > > Um.. you could change your default clause to > > AND > > > > rather than or. > > > > > > > > > > That should do the trick. > > > > > > > > > > Matt > > > > > > > > > > On 10/26/2010 2:26 PM, Dennis Gearon wrote: > > > > > > Overkill? 
> > > > > > > > > > > > Dennis Gearon > > > > > >> I can't think of a way to do it > > without > > > > writing new > > > > > >> analysis filters. > > > > > >> > > > > > >> But I think you could do what you > > want with > > > > two filters > > > > > >> (this is untested): > > > > > >> > > > > > >> 1. An index-time filter that > > outputs a single > > > > token > > > > > >> consisting of all of the input > > tokens, sorted > > > > in a > > > > > >> consistent way, e.g.: > > > > > >> > > > > > >> "mobile with GPS" > > > > -> "GPS mobile > > > > > >> with" > > > > > >> "samsung android" > > > > -> "android > > > > > >> samsung" > > > > > >> > > > > > >> 2. A query-time filter that outputs > > one token > > > > per input > > > > > >> term combination, sorted in the > > same > > > > consistent way as the > > > > > >> index-time filter, e.g.: > > > > > >> > > > > > >> "samsung andriod > > > > GPS" > > > > > >> -> > > > > > >> "samsung","android","GPS", > > > > > >> "android > > > > > >> samsung","GPS samsung","android > > GPS" > > > > > >> "android > > > > GPS > > > > > >> samsung
How does DIH multithreading work?
I understand that the thread count is specified on root entities only. Does it spawn multiple threads per root entity? Or multiple threads per descendant entity? Can someone give an example of how you would make a database query in an entity with 4 threads that would select 1 row per thread? Thanks, Mark -- View this message in context: http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1776111.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: How do I this in Solr?
Plus, if he wants terms that contain ONLY those words, and no others, an ANDed query would not do that, right? ANDed queries return results that must have ALL the terms listed, and could have lots of other words, right? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: > From: Steven A Rowe > Subject: RE: How do I this in Solr? > To: "solr-user@lucene.apache.org" > Date: Tuesday, October 26, 2010, 12:10 PM > Dennis, > > Do you mean to say that you read my earlier post, and > disagree that it would solve the problem? Or have you > simply not read it? > > Steve > > > -Original Message- > > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > > Sent: Tuesday, October 26, 2010 3:00 PM > > To: solr-user@lucene.apache.org > > Subject: RE: How do I this in Solr? > > > > Good point. Since I might need such a query myself > someday, how *IS* that > > done? > > > > > > Dennis Gearon > > > > Signature Warning > > > > It is always a good idea to learn from your own > mistakes. It is usually a > > better idea to learn from others’ mistakes, so you > do not have to make > > them yourself. from > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > > EARTH has a Right To Life, > > otherwise we all die. > > > > > > --- On Tue, 10/26/10, Steven A Rowe > wrote: > > > > > From: Steven A Rowe > > > Subject: RE: How do I this in Solr? > > > To: "solr-user@lucene.apache.org" > > > > Date: Tuesday, October 26, 2010, 11:46 AM > > > Um, maybe I'm way off base, but when > > > Varun said: > > > > > > > If I search with the text "samsung andriod > GPS", > > > > search results should only conain "samsung", > "GPS", > > > > "andriod" and "samsung andriod". 
> > > > > > I interpreted that to mean that hit documents > should > > > contain terms from the query, and nothing else. > Making > > > all terms required doesn't do this. > > > > > > Steve > > > > > > > -Original Message- > > > > From: Matthew Hall [mailto:mh...@informatics.jax.org] > > > > Sent: Tuesday, October 26, 2010 2:30 PM > > > > To: solr-user@lucene.apache.org > > > > Subject: Re: How do I this in Solr? > > > > > > > > Um.. you could change your default clause to > AND > > > rather than or. > > > > > > > > That should do the trick. > > > > > > > > Matt > > > > > > > > On 10/26/2010 2:26 PM, Dennis Gearon wrote: > > > > > Overkill? > > > > > > > > > > Dennis Gearon > > > > >> I can't think of a way to do it > without > > > writing new > > > > >> analysis filters. > > > > >> > > > > >> But I think you could do what you > want with > > > two filters > > > > >> (this is untested): > > > > >> > > > > >> 1. An index-time filter that > outputs a single > > > token > > > > >> consisting of all of the input > tokens, sorted > > > in a > > > > >> consistent way, e.g.: > > > > >> > > > > >> "mobile with GPS" > > > -> "GPS mobile > > > > >> with" > > > > >> "samsung android" > > > -> "android > > > > >> samsung" > > > > >> > > > > >> 2. A query-time filter that outputs > one token > > > per input > > > > >> term combination, sorted in the > same > > > consistent way as the > > > > >> index-time filter, e.g.: > > > > >> > > > > >> "samsung andriod > > > GPS" > > > > >> -> > > > > >> "samsung","android","GPS", > > > > >> "android > > > > >> samsung","GPS samsung","android > GPS" > > > > >> "android > > > GPS > > > > >> samsung" > > > > >> > > > > >> Steve > > > > >> > > > > >>> -Original Message- > > > > >>> From: Varun Gupta [mailto:varun.vgu...@gmail.com] > > > > >>> Sent: Tuesday, October 26, 2010 > 9:08 AM > > > > >>> To: solr-user@lucene.apache.org > > > > >>> Subject: How do I this in > Solr? 
> > > > >>> > > > > >>> Hi, > > > > >>> > > > > >>> I have lot of small documents > (each > > > containing 1 to 15 > > > > >> words) indexed in > > > > >>> Solr. For the search query, I > want the > > > search results > > > > >> to contain only > > > > >>> those > > > > >>> documents that satisfy this > criteria "All > > > of the words > > > > >> of the search > > > > >>> result > > > > >>> document are present in the > search > > > query" > > > > >>> > > > > >>> For example: > > > > >>> If I have the following > documents > > > indexed: "nokia > > > > >> n95", "GPS", "android", > > > > >>> "samsung", "samsung andriod", > "nokia > > > andriod", "mobile > > > > >> with GPS" > > > > >>> If I search with the text > "samsung > > > andriod GPS", > > > > >> search results should > > > > >>> only > > > > >>> conain "samsung", "GPS", > "andriod" and > > > "samsung > > > > >> andriod". > > > > >>> Is there a way to do this in > Solr. > > > > >>> > > > > >>
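The two-filter idea quoted above (which Steven marks as untested) can be tried out in a few lines of plain Python before writing any analysis filters. This is only a sketch under stated assumptions: the function names are hypothetical stand-ins for the index-time and query-time filters, tokens are lowercased and split on whitespace, and "android" is spelled consistently, unlike the mixed "android"/"andriod" in the original examples.

```python
from itertools import combinations


def index_token(text):
    """Index-time stand-in: collapse a document into a single token made of
    its distinct terms, sorted in a consistent way."""
    return " ".join(sorted(set(text.lower().split())))


def query_tokens(text):
    """Query-time stand-in: one token per non-empty combination of query
    terms, each sorted the same way as the index-time token."""
    terms = sorted(set(text.lower().split()))
    return {" ".join(combo)
            for size in range(1, len(terms) + 1)
            for combo in combinations(terms, size)}


docs = ["nokia n95", "GPS", "android", "samsung",
        "samsung android", "nokia android", "mobile with GPS"]

wanted = query_tokens("samsung android GPS")
hits = [doc for doc in docs if index_token(doc) in wanted]
print(hits)
```

A query with n distinct terms expands to 2^n - 1 tokens (7 for the three-term query here), which is the combinatorial cost raised elsewhere in this thread. With documents of at most 15 words, combinations longer than 15 terms could never match, so a real filter could presumably cap the combination size.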
RE: How do I this in Solr?
If Solr is like Google, then once the documents matching all of the ANDed terms in the query ran out, those matching only two of the terms would start showing up, then those matching only one of the terms, and then those close to it. Is this correct? If so, it wouldn't match his requirements. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: > From: Steven A Rowe > Subject: RE: How do I this in Solr? > To: "solr-user@lucene.apache.org" > Date: Tuesday, October 26, 2010, 12:10 PM > Dennis, > > Do you mean to say that you read my earlier post, and disagree that it would solve the problem? Or have you simply not read it? > > Steve
RE: How do I this in Solr?
Dennis, Do you mean to say that you read my earlier post, and disagree that it would solve the problem? Or have you simply not read it? Steve > -Original Message- > From: Dennis Gearon [mailto:gear...@sbcglobal.net] > Sent: Tuesday, October 26, 2010 3:00 PM > To: solr-user@lucene.apache.org > Subject: RE: How do I this in Solr? > > Good point. Since I might need such a query myself someday, how *IS* that > done? > > > Dennis Gearon > > Signature Warning > > It is always a good idea to learn from your own mistakes. It is usually a > better idea to learn from others’ mistakes, so you do not have to make > them yourself. from > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > EARTH has a Right To Life, > otherwise we all die. > > > --- On Tue, 10/26/10, Steven A Rowe wrote: > > > From: Steven A Rowe > > Subject: RE: How do I this in Solr? > > To: "solr-user@lucene.apache.org" > > Date: Tuesday, October 26, 2010, 11:46 AM > > Um, maybe I'm way off base, but when > > Varun said: > > > > > If I search with the text "samsung andriod GPS", > > > search results should only conain "samsung", "GPS", > > > "andriod" and "samsung andriod". > > > > I interpreted that to mean that hit documents should > > contain terms from the query, and nothing else. Making > > all terms required doesn't do this. > > > > Steve > > > > > -Original Message- > > > From: Matthew Hall [mailto:mh...@informatics.jax.org] > > > Sent: Tuesday, October 26, 2010 2:30 PM > > > To: solr-user@lucene.apache.org > > > Subject: Re: How do I this in Solr? > > > > > > Um.. you could change your default clause to AND > > rather than or. > > > > > > That should do the trick. > > > > > > Matt > > > > > > On 10/26/2010 2:26 PM, Dennis Gearon wrote: > > > > Overkill? > > > > > > > > Dennis Gearon > > > >> I can't think of a way to do it without > > writing new > > > >> analysis filters. 
> > > >> > > > >> But I think you could do what you want with > > two filters > > > >> (this is untested): > > > >> > > > >> 1. An index-time filter that outputs a single > > token > > > >> consisting of all of the input tokens, sorted > > in a > > > >> consistent way, e.g.: > > > >> > > > >> "mobile with GPS" > > -> "GPS mobile > > > >> with" > > > >> "samsung android" > > -> "android > > > >> samsung" > > > >> > > > >> 2. A query-time filter that outputs one token > > per input > > > >> term combination, sorted in the same > > consistent way as the > > > >> index-time filter, e.g.: > > > >> > > > >> "samsung andriod > > GPS" > > > >> -> > > > >> "samsung","android","GPS", > > > >> "android > > > >> samsung","GPS samsung","android GPS" > > > >> "android > > GPS > > > >> samsung" > > > >> > > > >> Steve > > > >> > > > >>> -Original Message- > > > >>> From: Varun Gupta [mailto:varun.vgu...@gmail.com] > > > >>> Sent: Tuesday, October 26, 2010 9:08 AM > > > >>> To: solr-user@lucene.apache.org > > > >>> Subject: How do I this in Solr? > > > >>> > > > >>> Hi, > > > >>> > > > >>> I have lot of small documents (each > > containing 1 to 15 > > > >> words) indexed in > > > >>> Solr. For the search query, I want the > > search results > > > >> to contain only > > > >>> those > > > >>> documents that satisfy this criteria "All > > of the words > > > >> of the search > > > >>> result > > > >>> document are present in the search > > query" > > > >>> > > > >>> For example: > > > >>> If I have the following documents > > indexed: "nokia > > > >> n95", "GPS", "android", > > > >>> "samsung", "samsung andriod", "nokia > > andriod", "mobile > > > >> with GPS" > > > >>> If I search with the text "samsung > > andriod GPS", > > > >> search results should > > > >>> only > > > >>> conain "samsung", "GPS", "andriod" and > > "samsung > > > >> andriod". > > > >>> Is there a way to do this in Solr. > > > >>> > > > >>> -- > > > >>> Thanks > > > >>> Varun Gupta > > > >
RE: How do I this in Solr?
Good point. Since I might need such a query myself someday, how *IS* that done? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Tue, 10/26/10, Steven A Rowe wrote: > From: Steven A Rowe > Subject: RE: How do I this in Solr? > To: "solr-user@lucene.apache.org" > Date: Tuesday, October 26, 2010, 11:46 AM > Um, maybe I'm way off base, but when > Varun said: > > > If I search with the text "samsung andriod GPS", > > search results should only conain "samsung", "GPS", > > "andriod" and "samsung andriod". > > I interpreted that to mean that hit documents should > contain terms from the query, and nothing else. Making > all terms required doesn't do this. > > Steve > > > -Original Message- > > From: Matthew Hall [mailto:mh...@informatics.jax.org] > > Sent: Tuesday, October 26, 2010 2:30 PM > > To: solr-user@lucene.apache.org > > Subject: Re: How do I this in Solr? > > > > Um.. you could change your default clause to AND > rather than or. > > > > That should do the trick. > > > > Matt > > > > On 10/26/2010 2:26 PM, Dennis Gearon wrote: > > > Overkill? > > > > > > Dennis Gearon > > >> I can't think of a way to do it without > writing new > > >> analysis filters. > > >> > > >> But I think you could do what you want with > two filters > > >> (this is untested): > > >> > > >> 1. An index-time filter that outputs a single > token > > >> consisting of all of the input tokens, sorted > in a > > >> consistent way, e.g.: > > >> > > >> "mobile with GPS" > -> "GPS mobile > > >> with" > > >> "samsung android" > -> "android > > >> samsung" > > >> > > >> 2. 
A query-time filter that outputs one token > per input > > >> term combination, sorted in the same > consistent way as the > > >> index-time filter, e.g.: > > >> > > >> "samsung andriod > GPS" > > >> -> > > >> "samsung","android","GPS", > > >> "android > > >> samsung","GPS samsung","android GPS" > > >> "android > GPS > > >> samsung" > > >> > > >> Steve > > >> > > >>> -Original Message- > > >>> From: Varun Gupta [mailto:varun.vgu...@gmail.com] > > >>> Sent: Tuesday, October 26, 2010 9:08 AM > > >>> To: solr-user@lucene.apache.org > > >>> Subject: How do I this in Solr? > > >>> > > >>> Hi, > > >>> > > >>> I have lot of small documents (each > containing 1 to 15 > > >> words) indexed in > > >>> Solr. For the search query, I want the > search results > > >> to contain only > > >>> those > > >>> documents that satisfy this criteria "All > of the words > > >> of the search > > >>> result > > >>> document are present in the search > query" > > >>> > > >>> For example: > > >>> If I have the following documents > indexed: "nokia > > >> n95", "GPS", "android", > > >>> "samsung", "samsung andriod", "nokia > andriod", "mobile > > >> with GPS" > > >>> If I search with the text "samsung > andriod GPS", > > >> search results should > > >>> only > > >>> conain "samsung", "GPS", "andriod" and > "samsung > > >> andriod". > > >>> Is there a way to do this in Solr. > > >>> > > >>> -- > > >>> Thanks > > >>> Varun Gupta > >
Re: ClassCastException Issue
: [ERROR][http-4443-exec-3][util.plugin.AbstractPluginLoader] log():139 : java.lang.ClassCastException: org.apache.solr.schema.StrField cannot : be cast to org.apache.solr.schema.FieldType This almost certainly indicates a classloader issue - I suspect you have multiple Solr-related jars in various places, and the FieldType class instance found when StrField is loaded comes from a different (incompatible) jar. -Hoss
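One way to check for the duplicate-jar situation described above is to list every jar that ships the class in question. The following is only a rough sketch: the function name and the directory in the usage note are made up for illustration, and it only looks at plain .jar files on disk (it does not look inside .war archives or at jars nested in them).

```python
import os
import zipfile


def find_class_in_jars(root, class_name):
    """Return the sorted paths of all .jar files under root that contain class_name."""
    # A class such as a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        found.append(path)
            except zipfile.BadZipFile:
                # Not a readable archive; skip it rather than abort the scan
                pass
    return sorted(found)
```

Calling find_class_in_jars("/path/to/tomcat", "org.apache.solr.schema.FieldType") (the path is a placeholder) and getting back more than one jar would be consistent with the classloader explanation, though it does not prove it.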
RE: How do I this in Solr?
Um, maybe I'm way off base, but when Varun said: > If I search with the text "samsung andriod GPS", > search results should only conain "samsung", "GPS", > "andriod" and "samsung andriod". I interpreted that to mean that hit documents should contain terms from the query, and nothing else. Making all terms required doesn't do this. Steve > -Original Message- > From: Matthew Hall [mailto:mh...@informatics.jax.org] > Sent: Tuesday, October 26, 2010 2:30 PM > To: solr-user@lucene.apache.org > Subject: Re: How do I this in Solr? > > Um.. you could change your default clause to AND rather than or. > > That should do the trick. > > Matt > > On 10/26/2010 2:26 PM, Dennis Gearon wrote: > > Overkill? > > > > Dennis Gearon > >> I can't think of a way to do it without writing new > >> analysis filters. > >> > >> But I think you could do what you want with two filters > >> (this is untested): > >> > >> 1. An index-time filter that outputs a single token > >> consisting of all of the input tokens, sorted in a > >> consistent way, e.g.: > >> > >> "mobile with GPS" -> "GPS mobile > >> with" > >> "samsung android" -> "android > >> samsung" > >> > >> 2. A query-time filter that outputs one token per input > >> term combination, sorted in the same consistent way as the > >> index-time filter, e.g.: > >> > >> "samsung andriod GPS" > >> -> > >> "samsung","android","GPS", > >> "android > >> samsung","GPS samsung","android GPS" > >> "android GPS > >> samsung" > >> > >> Steve > >> > >>> -Original Message- > >>> From: Varun Gupta [mailto:varun.vgu...@gmail.com] > >>> Sent: Tuesday, October 26, 2010 9:08 AM > >>> To: solr-user@lucene.apache.org > >>> Subject: How do I this in Solr? > >>> > >>> Hi, > >>> > >>> I have lot of small documents (each containing 1 to 15 > >> words) indexed in > >>> Solr. 
For the search query, I want the search results > >> to contain only > >>> those > >>> documents that satisfy this criteria "All of the words > >> of the search > >>> result > >>> document are present in the search query" > >>> > >>> For example: > >>> If I have the following documents indexed: "nokia > >> n95", "GPS", "android", > >>> "samsung", "samsung andriod", "nokia andriod", "mobile > >> with GPS" > >>> If I search with the text "samsung andriod GPS", > >> search results should > >>> only > >>> conain "samsung", "GPS", "andriod" and "samsung > >> andriod". > >>> Is there a way to do this in Solr. > >>> > >>> -- > >>> Thanks > >>> Varun Gupta
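The two-filter idea quoted above is described as untested, and no such filters exist in this thread. The sketch below only simulates the matching logic in plain Python, outside Solr, to show what the index-time and query-time tokens would look like. The function names, the lowercasing, and the removal of duplicate words are assumptions added for the illustration, and the sample documents use the spelling "android" throughout.

```python
from itertools import combinations


def index_token(text):
    """Index time: collapse a document into ONE token of its sorted, lowercased terms."""
    return " ".join(sorted(set(text.lower().split())))


def query_tokens(query):
    """Query time: one token per non-empty combination of query terms, sorted the same way."""
    terms = sorted(set(query.lower().split()))
    return {
        " ".join(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    }


docs = ["nokia n95", "GPS", "android", "samsung", "samsung android",
        "nokia android", "mobile with GPS"]

allowed = query_tokens("samsung android GPS")
# A document is a hit only if its single index token equals one of the query tokens,
# i.e. every word of the document appears in the query.
hits = [d for d in docs if index_token(d) in allowed]
print(hits)  # ['GPS', 'android', 'samsung', 'samsung android']
```

Note that a query of n distinct terms produces 2^n - 1 tokens, so this approach is only practical for short queries.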
Re: Highlighting for non-stored fields
Another way you can do this is - after the search has completed, load the field in your application, write separate code to reanalyze that field/document, index it in RAM, and run it through highlighter classes. All this as part of your web application outside of Solr. Considering the size of your data it doesn't look advisable to store it because then you would be almost doubling the size of your index (if you are looking to highlight on a field then it's probably going to be full of content). -Pradeep On Tue, Oct 26, 2010 at 8:32 AM, Phong Dais wrote: > Hi, > > I understand that I need to store the fields in order to use highlighting > "out of the box". > I'm looking for a way to highlighting using term offsets instead of the > actual text since the text is not stored. What am asking is is it possible > to modify the response (thru custom implementation) to contain highlighted > offsets instead of the actual matched text. Should I be writing my own > DefaultHighlighter? Or overiding some of its functionality? Can this be > done this way or am I way off? > > BTW, I'm using solr-1.4. > > Thanks, > P. > > On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo wrote: > > > Check out this link > > > > http://wiki.apache.org/solr/FieldOptionsByUseCase > > > > You need to store the field if you want to use the highlighting feature. > > > > If you need to retrieve and display the highlighted snippets then the > > fields > > definitely needs to be stored. > > > > To use term offsets, it will be a good idea to enable the following > > attributes for that field termVectors termPositions termOffsets > > > > The only issue here is that your storage costs will increase because of > > these extra features. > > > > Nevertheless, you definitely need to store the field if you need to > > retrieve > > it for highlighting purposes. 
> > > > On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais > wrote: > > > > > Hi, > > > > > > I've been looking thru the mailing archive for the past week and I > > haven't > > > found any useful info regarding this issue. > > > > > > My requirement is to index a few terabytes worth of data to be > searched. > > > Due to the size of the data, I would like to index without storing but > I > > > would like to use the highlighting feature. Is this even possible? > What > > > are my options? > > > > > > I've read about termOffsets, payload that could possibly be used to do > > this > > > but I have no idea how this could be done. > > > > > > Any pointers greatly appreciated. Someone please point me in the right > > > direction. > > > > > > I don't mind having to write some code or digging thru existing code > to > > > accomplish this task. > > > > > > Thanks, > > > P. > > > > > > > > > > > -- > > °O° > > "Good Enough" is not good enough. > > To give anything less than your best is to sacrifice the gift. > > Quality First. Measure Twice. Cut Once. > > http://www.israelekpo.com/ > > >
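The suggestion at the top of this message (fetch the text in your own application and highlight it there) can be sketched as follows. This is not Solr's highlighter and uses no Solr or Lucene classes: a plain regular expression stands in for real re-analysis, and the function names and the <em> markers are assumptions chosen for the example.

```python
import re


def find_offsets(text, terms):
    """Return (start, end) character offsets of whole-word, case-insensitive term matches."""
    if not terms:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) span of text in pre/post markers, skipping overlapping spans."""
    out, pos = [], 0
    for start, end in sorted(offsets):
        if start < pos:
            continue  # overlaps the previous span
        out.append(text[pos:start])
        out.append(pre + text[start:end] + post)
        pos = end
    out.append(text[pos:])
    return "".join(out)


# The text would come from wherever the application keeps the original content.
text = "Solr stores term offsets; solr can highlight."
print(highlight(text, find_offsets(text, ["solr"])))
```

If the offsets came from the index instead (for example from term vectors), only the highlight function would be needed; how to get those offsets out of Solr is not covered by this sketch.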
Re: How do I this in Solr?
Um.. you could change your default clause to AND rather than or. That should do the trick. Matt On 10/26/2010 2:26 PM, Dennis Gearon wrote: Overkill? Dennis Gearon I can't think of a way to do it without writing new analysis filters. But I think you could do what you want with two filters (this is untested): 1. An index-time filter that outputs a single token consisting of all of the input tokens, sorted in a consistent way, e.g.: "mobile with GPS" -> "GPS mobile with" "samsung android" -> "android samsung" 2. A query-time filter that outputs one token per input term combination, sorted in the same consistent way as the index-time filter, e.g.: "samsung andriod GPS" -> "samsung","android","GPS", "android samsung","GPS samsung","android GPS" "android GPS samsung" Steve -Original Message- From: Varun Gupta [mailto:varun.vgu...@gmail.com] Sent: Tuesday, October 26, 2010 9:08 AM To: solr-user@lucene.apache.org Subject: How do I this in Solr? Hi, I have lot of small documents (each containing 1 to 15 words) indexed in Solr. For the search query, I want the search results to contain only those documents that satisfy this criteria "All of the words of the search result document are present in the search query" For example: If I have the following documents indexed: "nokia n95", "GPS", "android", "samsung", "samsung andriod", "nokia andriod", "mobile with GPS" If I search with the text "samsung andriod GPS", search results should only conain "samsung", "GPS", "andriod" and "samsung andriod". Is there a way to do this in Solr. -- Thanks Varun Gupta
Re: Modelling Access Control
"Son, don't touch that stove . . . .", "OUCH! Hey Dad, I BURNED my hand on that stove, why didn't you tell me that?!?#! You know I need to know WHY, not just DON'T!" Dennis Gearon > Very important: do not make a spelling or autosuggest index > from a > text field which some people can see and other people > can't. >
RE: How do I this in Solr?
Overkill? Dennis Gearon > > I can't think of a way to do it without writing new > analysis filters. > > But I think you could do what you want with two filters > (this is untested): > > 1. An index-time filter that outputs a single token > consisting of all of the input tokens, sorted in a > consistent way, e.g.: > > "mobile with GPS" -> "GPS mobile > with" > "samsung android" -> "android > samsung" > > 2. A query-time filter that outputs one token per input > term combination, sorted in the same consistent way as the > index-time filter, e.g.: > > "samsung andriod GPS" > -> > "samsung","android","GPS", > "android > samsung","GPS samsung","android GPS" > "android GPS > samsung" > > Steve > > > -Original Message- > > From: Varun Gupta [mailto:varun.vgu...@gmail.com] > > Sent: Tuesday, October 26, 2010 9:08 AM > > To: solr-user@lucene.apache.org > > Subject: How do I this in Solr? > > > > Hi, > > > > I have lot of small documents (each containing 1 to 15 > words) indexed in > > Solr. For the search query, I want the search results > to contain only > > those > > documents that satisfy this criteria "All of the words > of the search > > result > > document are present in the search query" > > > > For example: > > If I have the following documents indexed: "nokia > n95", "GPS", "android", > > "samsung", "samsung andriod", "nokia andriod", "mobile > with GPS" > > > > If I search with the text "samsung andriod GPS", > search results should > > only > > conain "samsung", "GPS", "andriod" and "samsung > andriod". > > > > Is there a way to do this in Solr. > > > > -- > > Thanks > > Varun Gupta >
Re: Strange search
Can anyone tell me why my search is so terrible? It works really strangely. Here are my basic configs in schema.xml: main filters: and fields: templateId text Here is the schema info for the field "typeCaption" from http://localhost:8983/search/admin/schema.jsp: html 4, page 4, template 4, text 4, main 4, seo 3, meta 2, tags 1, keywords 1. If I search for "html", I get all results, but if I search for "seo" or "text" I don't get any results. I tried using wildcards, but that doesn't help. Can anyone tell me where my problem is? Sorry for my poor English. -- View this message in context: http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1773307.html Sent from the Solr - User mailing list archive at Nabble.com.
After java replication: field not found exception on slaves
Hi, we had the following problem. We added a field to schema.xml and fed our master with the new data. After that, querying on the master was fine. But when we replicated (Solr 1.4.0) to our slaves, all slaves said they could not find the new field (the standard exception for missing fields), even though I can see the new field in the XML response and in the replicated schema.xml file!? Stranger still, after scp-ing the exact data folder to our master, everything is fine (on the master). Has anybody else hit the same strange behaviour? Regards, Peter. PS: In the end we did the following on the slaves: rm -rf data/ ./reload.sh + replicated again
Inconsistent slave performance after optimize
Hello esteemed Solr community -- I'm observing some inconsistent performance on our slave servers after recently optimizing our master server. Our configuration is as follows: - all servers are hosted at Amazon EC2, running Ubuntu 8.04 - 1 master with heavy insert/update traffic, about 125K new documents per day (m1.large, ~8GB RAM) - autocommit every 1 minute - 3 slaves (m2.xlarge instance sizes, ~16GB RAM) - replicate every 5 minutes - we have configured autowarming queries for these machines - autowarmCount = 0 - Total index size is ~7M documents We were seeing gradually increasing performance degradation across all nodes. So we decided to try optimizing our index to improve performance. In preparation for the optimize we disabled replication polling on all slaves. We also turned off all workers that were writing to the index. Then we ran optimize on the master. The optimize took 45-60 minutes to complete, and the total size went from 68GB down to 23GB. We then enabled replication on each slave one at a time. The first slave we re-enabled took about 15 minutes to copy the new files. Once the files were copied the performance of that slave plummeted. Average response time went from 0.75 sec to 45 seconds. Over the past 18 hours the average response time has gradually gone down to around 1.2 seconds now. Before re-enabling replication on the second slave, we first removed it from our load-balanced pool of available search servers. This server's average query performance also degraded quickly, and then (unlike the first slave we replicated) did not improve. It stayed at around 30 secs per query. On the theory that this is a cache-warming issue, we added this server back to the pool in hopes that additional traffic would warm the cache. But what we saw was a quick spike of much worse performance (50 sec / query on average) followed by a slow/gradual decline in average response times. 
As of now (10 hours after the initial replication) this server is still reporting an average response time of ~2 seconds. This is much worse than before the optimize and is a counter-intuitive result. We expected an index 1/3 the size would be faster, not slower. On the theory that the index files needed to be loaded into the file system cache, I used the 'dd' command to copy the contents of the data/index directory to /dev/null, but that did not result in any noticeable performance improvement. At this point, things were not going as expected. We did not expect the replication after an optimize to result in such horrid performance. So we decided to let the last slave continue to serve stale results while we waited 4 hours for the other two slaves to approach some acceptable performance level. After the 4 hour break, we removed the 3rd and last slave server from our load-balancing pool, then re-enabled replication. This time we saw a tiny blip. The average response time went up to 1 second briefly then went back to the (normal for us) 0.25 to 0.5 second range. We then added this server back to the load-balancing pool and observed no degradation in performance. While we were happy to avoid a repeat of the poor performance we saw on the previous slaves, we are at a loss to explain why this slave did not also have such poor performance. At this point we're scratching our heads trying to understand: (a) Why the performance of the first two slaves was so terrible after the optimize. We think it's cache-warming related, but we're not sure. More than 10 hours seems like a long time to wait for the cache to warm up. (b) Why the performance of the third slave was barely impacted. It should have hit the same cold-cache issues as the other servers, if that is indeed the root cause. (c) Why performance of the first 2 slaves is still much worse after the optimize than it was before the optimize, whereas the performance of the 3rd slave is pretty much unchanged. 
We expected the optimize to *improve* performance. All 3 slave servers are identically configured, and the procedure for re-enabling replication was identical for the 2nd and 3rd slaves, with the exception of a 4-hour wait period. We have confirmed that the 3rd slave did replicate, the number of documents and total index size matches the master and other slave servers. I'm writing to fish for an explanation or ideas that might explain this inconsistent performance. Obviously, we'd like to be able to reproduce the performance of the 3rd slave, and avoid the poor performance of the first two slaves the next time we decide it's time to optimize our index. thanks in advance, Mason
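For reference, the 'dd' step mentioned above (reading the index files once so the operating system may keep them in its file system cache) could be scripted roughly as below. This is only a sketch with a made-up function name; it touches the OS page cache only, does nothing for Solr's own caches, and, as reported in the message, this kind of read-through did not noticeably help in this case.

```python
import os


def read_through(index_dir, chunk_size=1 << 20):
    """Read every file under index_dir once, discarding the data; return total bytes read."""
    total = 0
    for dirpath, _dirs, files in os.walk(index_dir):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

The returned byte count can be compared with the expected index size as a sanity check that the whole directory was read.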
Re: Highlighting for non-stored fields
Hi, I understand that I need to store the fields in order to use highlighting "out of the box". I'm looking for a way to highlight using term offsets instead of the actual text, since the text is not stored. What I am asking is: is it possible to modify the response (through a custom implementation) to contain highlighted offsets instead of the actual matched text? Should I be writing my own DefaultHighlighter? Or overriding some of its functionality? Can this be done this way or am I way off? BTW, I'm using solr-1.4. Thanks, P. On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo wrote: > Check out this link > > http://wiki.apache.org/solr/FieldOptionsByUseCase > > You need to store the field if you want to use the highlighting feature. > > If you need to retrieve and display the highlighted snippets then the > fields > definitely needs to be stored. > > To use term offsets, it will be a good idea to enable the following > attributes for that field termVectors termPositions termOffsets > > The only issue here is that your storage costs will increase because of > these extra features. > > Nevertheless, you definitely need to store the field if you need to > retrieve > it for highlighting purposes. > > On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais wrote: > > > Hi, > > > > I've been looking thru the mailing archive for the past week and I > haven't > > found any useful info regarding this issue. > > > > My requirement is to index a few terabytes worth of data to be searched. > > Due to the size of the data, I would like to index without storing but I > > would like to use the highlighting feature. Is this even possible? What > > are my options? > > > > I've read about termOffsets, payload that could possibly be used to do > this > > but I have no idea how this could be done. > > > > Any pointers greatly appreciated. Someone please point me in the right > > direction. > > > > I don't mind having to write some code or digging thru existing code to > > accomplish this task. 
> > > > Thanks, > > P. > > > > > > -- > °O° > "Good Enough" is not good enough. > To give anything less than your best is to sacrifice the gift. > Quality First. Measure Twice. Cut Once. > http://www.israelekpo.com/ >
Re: Documents are deleted when Solr is restarted
The Solr home is set by the -Dsolr.solr.home Java system property. Also make sure that -Dsolr.data.dir is defined for your data directory, if it is not already defined in the solrconfig.xml file. On Tue, Oct 26, 2010 at 10:46 AM, Upayavira wrote: > You need to watch what you are setting your solr.home to. That is where > your indexes are being written. Are they getting overwritten/lost > somehow. Watch the files in that dir while doing a restart. > > That's a start at least. > > Upayavira > > On Tue, 26 Oct 2010 16:40 +0300, "Mackram Raydan" > wrote: > > Hey everyone, > > > > I apologize if this question is rudimentary but it is getting to me and > > I did not find anything reasonable about it online. > > > > So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the > > SolrTomcat wiki page to setup. The system works exactly the way I want > > it (proper search, highlighting, etc...). The problem however is when I > > restart my Tomcat server all the data in Solr (ie the index) is simply > > lost. The admin shows me the number of docs is 0 when it was before in > > the thousands. > > > > Can someone please help me understand why the above is happening and how > > can I workaround it if possible? > > > > Big thanks for any help you can send my way. > > > > Regards, > > > > Mackram > > > -- °O° "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Documents are deleted when Solr is restarted
You need to watch what you are setting your solr.home to. That is where your indexes are being written. Are they getting overwritten/lost somehow. Watch the files in that dir while doing a restart. That's a start at least. Upayavira On Tue, 26 Oct 2010 16:40 +0300, "Mackram Raydan" wrote: > Hey everyone, > > I apologize if this question is rudimentary but it is getting to me and > I did not find anything reasonable about it online. > > So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the > SolrTomcat wiki page to setup. The system works exactly the way I want > it (proper search, highlighting, etc...). The problem however is when I > restart my Tomcat server all the data in Solr (ie the index) is simply > lost. The admin shows me the number of docs is 0 when it was before in > the thousands. > > Can someone please help me understand why the above is happening and how > can I workaround it if possible? > > Big thanks for any help you can send my way. > > Regards, > > Mackram >
Re: Query only a specfic field with a specific value using Dismax Handler
Thanks Jonathan. FQ seems promising. I will give it a go. Swapnonil Mukherjee On 26-Oct-2010, at 7:29 PM, Jonathan Rochkind wrote: > So, first of all, "exact" match is hard in Solr on tokenized fields. > Tokenized fields don't really do that. So for exact match, you should > probably use a non-tokenized field (string or text with keywordtokenizer > (which should really be called the non-tokenizer)). If there's only one > token in your value anyway though, like a single number, it may not > matter and work fine. > > Secondly, I'd recommend combining a dismax query for the user-entered > phrase (like 'dog') with standard lucene queries for those other > things. There are (at least) two ways to do that. The first is just put > everything after the first AND in one or more 'fq' parameters instead of > trying to include them in 'q'. The second is to use Solr's nested query > syntax, to specify sub-queries with different query parsers. Someone can > explain the second if you need it, but the easier to understand 'fq' > approach seems right to me for your case. > > Swapnonil Mukherjee wrote: >> Hi Everybody, >> >> Let me give you a brief idea of our Solr document. We have about 6 text type >> fields, each containing IPTC data extracted from photos. Search is performed >> mostly on these 6 fields. >> We also have a mutlivalue field named group_id that contains a list of all >> the group_ids that have access to this photo. In other words we are >> storing the metadata of the photo as well as the permissions applicable for >> this photo in the Solr document itself. This group_id field by the way is of >> long type. >> >> Additionally we have certain boolean and constant type fields named >> visibleToEndUser (boolean) and entityType (a java enum between 0 to 5). >> >> The first field defaultSearch is a copyField which contains a copy of all >> the values of 6 text type fields that I have mentioned. 
>> >> The way we query presently using the default search handler is like this. >> >> defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR >> group_id:2216624 OR group_id:2216990) AND (entityType:0) AND >> (visibleToEndUser:true) >> >> We want to start using the dismax (if not dismax then edismax) query >> handler but so far I have not been able to replicate the query mentioned >> above to the equivalent dismax form. >> >> What I cannot figure out is? >> >> 1. How do I apply exact match on the group_id, visibleToEndUser and the >> entityType fields? Or How how do I query a specific field with a specific >> value rather than searching across all fields with all values. >> 2. How do I apply OR and AND conditions? >> >> >> Swapnonil Mukherjee >> >> >> >> >>
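The 'fq' approach suggested in the quoted reply could look roughly like the request built below. This is a sketch, not a tested configuration: the base URL and the function name are placeholders, and it assumes the dismax parser is selected with defType=dismax and that defaultSearch is the field to search via qf. The field names and group ids are the ones from the original query.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, user_query, group_ids, entity_type=0, visible=True):
    """Build a select URL: user text goes to dismax via q, exact constraints go to fq."""
    # fq values are parsed with the standard Lucene syntax, so field:(a OR b) works here
    group_filter = "group_id:(" + " OR ".join(str(g) for g in group_ids) + ")"
    params = [
        ("defType", "dismax"),
        ("q", user_query),
        ("qf", "defaultSearch"),
        ("fq", group_filter),
        ("fq", "entityType:%d" % entity_type),
        ("fq", "visibleToEndUser:%s" % str(visible).lower()),
    ]
    return base_url + "?" + urlencode(params)


print(build_dismax_url("http://localhost:8983/solr/select", "Dog",
                       [2181347, 2181364, 2216624, 2216990]))
```

Keeping the three constraints in separate fq parameters means each one is ANDed with the main query, which mirrors the AND structure of the original query.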
Re: how well does multicore scale?
mike anderson wrote: I'm really curious if there is a clever solution to the obvious problem with: "So your better off using a single index and with a user id and use a query filter with the user id when fetching data.", i.e.. when you have hundreds of thousands of user IDs tagged on each article. That just doesn't sound like it scales very well.. Actually, I think that design would scale pretty fine, I don't think there's an 'obvious' problem. You store your userIDs in a multi-valued field (or as multiple terms in a single value, ends up being similar). You fq on there with the current userID. There's one way to find out of course, but that doesn't seem a patently ridiculous scenario or anything, that's the kind of thing Solr is generally good at, it's what it's built for. The problem might actually be in the time it takes to add such a document to the index; but not in query time. Doesn't mean it's the best solution for your problem though, I can't say. My impression is that Solr in general isn't really designed to support the kind of multi-tenancy use case people are talking about lately. So trying to make it work anyway... if multi-cores work for you, then great, but be aware they weren't really designed for that (having thousands of cores) and may not. If a single index can work for you instead, great, but as you've discovered it's not necessarily obvious how to set up the schema to do what you need -- really this applies to Solr in general, unlike an rdbms where you just third-form-normalize everything and figure it'll work for almost any use case that comes up, in Solr you generally need to custom fit the schema for your particular use cases, sometimes being kind of clever to figure out the optimal way to do that. This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr index takes more intellectual work than setting up an rdbms. The trade off is you get speed, and flexible ways to set up relevancy (that still perform well). 
Took a couple decades for rdbms to get as brainless to use as they are, maybe in a couple more we'll have figured out ways to make indexing engines like solr equally brainless, but not yet -- but it's still pretty damn easy for what it is, the lucene/Solr folks have done a remarkable job.
Re: How do I this in Solr?
On Tue, Oct 26, 2010 at 9:15 AM, Savvas-Andreas Moysidis < savvas.andreas.moysi...@googlemail.com> wrote:
> If I get your question right, you probably want to use the AND binary
> operator as in "samsung AND android AND GPS" or "+samsung +android +GPS"

N.b. For these queries you can also pass the q.op parameter in the request to temporarily change the default operator to AND. This has the same effect without having to rewrite the query, i.e. you can just request "http://host:port/solr/select?q=samsung+android+gps&q.op=AND" (along with any other params you need).
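For anyone building that request from code, here is a minimal sketch of the q.op suggestion. The base URL and the helper name `build_select_url` are assumptions made for illustration; only the `q` and `q.op` parameters come from the message above.

```python
from urllib.parse import urlencode


def build_select_url(base_url, terms, default_operator="AND"):
    """Build a /select URL that overrides the default operator for one request.

    base_url is a placeholder for wherever your Solr instance lives.
    """
    params = {
        "q": " ".join(terms),       # plain terms, no explicit AND or + needed
        "q.op": default_operator,   # per-request default operator
    }
    # urlencode turns the spaces between terms into '+'
    return base_url + "/select?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr", ["samsung", "android", "gps"])
print(url)
```

The point of the design is that the query text stays exactly what the user typed; only the request parameters change.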
Re: how well does multicore scale?
So I fired up about 100 cores and used JMeter to fire off a few thousand queries. It looks like the memory usage isn't much worse than running a single shard. So thats good. I'm really curious if there is a clever solution to the obvious problem with: "So your better off using a single index and with a user id and use a query filter with the user id when fetching data.", i.e.. when you have hundreds of thousands of user IDs tagged on each article. That just doesn't sound like it scales very well.. Cheers, Mike On Fri, Oct 22, 2010 at 10:43 PM, Lance Norskog wrote: > http://wiki.apache.org/solr/CoreAdmin > > Since Solr 1.3 > > On Fri, Oct 22, 2010 at 1:40 PM, mike anderson > wrote: > > Thanks for the advice, everyone. I'll take a look at the API mentioned > and > > do some benchmarking over the weekend. > > > > -Mike > > > > > > On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller > wrote: > > > >> On 10/22/10 1:44 AM, Tharindu Mathew wrote: > >> > Hi Mike, > >> > > >> > I've also considered using a separate cores in a multi tenant > >> > application, ie a separate core for each tenant/domain. But the cores > >> > do not suit that purpose. > >> > > >> > If you check out documentation no real API support exists for this so > >> > it can be done dynamically through SolrJ. And all use cases I found, > >> > only had users configuring it statically and then using it. That was > >> > maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks. > >> > >> You can dynamically manage cores with solrj. See > >> org.apache.solr.client.solrj.request.CoreAdminRequest's static methods > >> for a place to start. > >> > >> You probably want to turn solr.xml's persist option on so that your > >> cores survive restarts. > >> > >> > > >> > So your better off using a single index and with a user id and use a > >> > query filter with the user id when fetching data. > >> > >> Many times this is probably the case - pro's and con's to each depending > >> on what you are up to. 
> >> > >> - Mark > >> lucidimagination.com > >> > >> > > >> > On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind > >> wrote: > >> >> No, it does not seem reasonable. Why do you think you need a > seperate > >> core > >> >> for every user? > >> >> mike anderson wrote: > >> >>> > >> >>> I'm exploring the possibility of using cores as a solution to > "bookmark > >> >>> folders" in my solr application. This would mean I'll need tens of > >> >>> thousands > >> >>> of cores... does this seem reasonable? I have plenty of CPUs > available > >> for > >> >>> scaling, but I wonder about the memory overhead of adding cores > (aside > >> >>> from > >> >>> needing to fit the new index in memory). > >> >>> > >> >>> Thoughts? > >> >>> > >> >>> -mike > >> >>> > >> >>> > >> >> > >> > > >> > > >> > > >> > >> > > > > > > -- > Lance Norskog > goks...@gmail.com >
Re: Query only a specfic field with a specific value using Dismax Handler
So, first of all, "exact" match is hard in Solr on tokenized fields. Tokenized fields don't really do that. So for exact match, you should probably use a non-tokenized field (string or text with keywordtokenizer (which should really be called the non-tokenizer)). If there's only one token in your value anyway though, like a single number, it may not matter and work fine. Secondly, I'd recommend combining a dismax query for the user-entered phrase (like 'dog') with standard lucene queries for those other things. There are (at least) two ways to do that. The first is just put everything after the first AND in one or more 'fq' parameters instead of trying to include them in 'q'. The second is to use Solr's nested query syntax, to specify sub-queries with different query parsers. Someone can explain the second if you need it, but the easier to understand 'fq' approach seems right to me for your case. Swapnonil Mukherjee wrote: Hi Everybody, Let me give you a brief idea of our Solr document. We have about 6 text type fields, each containing IPTC data extracted from photos. Search is performed mostly on these 6 fields. We also have a mutlivalue field named group_id that contains a list of all the group_ids that have access to this photo. In other words we are storing the metadata of the photo as well as the permissions applicable for this photo in the Solr document itself. This group_id field by the way is of long type. Additionally we have certain boolean and constant type fields named visibleToEndUser (boolean) and entityType (a java enum between 0 to 5). The first field defaultSearch is a copyField which contains a copy of all the values of 6 text type fields that I have mentioned. The way we query presently using the default search handler is like this. 

defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR group_id:2216624 OR group_id:2216990) AND (entityType:0) AND (visibleToEndUser:true)

We want to start using the dismax (if not dismax then edismax) query handler, but so far I have not been able to translate the query above into an equivalent dismax form. What I cannot figure out is:

1. How do I apply an exact match on the group_id, visibleToEndUser and entityType fields? In other words, how do I query a specific field for a specific value rather than searching across all fields with all values?
2. How do I apply OR and AND conditions?

Swapnonil Mukherjee
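A rough sketch of the 'fq' approach recommended in the reply, applied to the query in the question. The field names and values are taken from the original message; the helper name `dismax_params`, the use of `defType=dismax` and the choice of `qf=defaultSearch` are assumptions, since the actual handler configuration is not shown in the thread.

```python
from urllib.parse import urlencode


def dismax_params(user_text, group_ids, entity_type, visible):
    """Split the request: user text goes to dismax, restrictions go to fq."""
    group_clause = " OR ".join(str(g) for g in group_ids)
    return [
        ("defType", "dismax"),      # assumption: dismax selected per request
        ("q", user_text),           # only the user-entered phrase
        ("qf", "defaultSearch"),    # assumption: search the copyField
        # Each fq is parsed as a standard Lucene query, so field:value works
        ("fq", "group_id:(%s)" % group_clause),
        ("fq", "entityType:%d" % entity_type),
        ("fq", "visibleToEndUser:%s" % str(visible).lower()),
    ]


params = dismax_params("dog", [2181347, 2181364, 2216624, 2216990], 0, True)
query_string = urlencode(params)
print(query_string)
```

Separate fq parameters are intersected with each other and with the main query, which gives the AND between the clauses; the OR lives inside the single group_id filter.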
Re: Solr ExtractingRequestHandler with Compressed files
Hi Javendra, Thanks for the suggestion, I updated to Solr 1.4.1 and Solr Cell 1.4.1 and tried sending a zip file that contained several html documents. Unfortunately, that did not solve the problem. Here's the curl command I used: curl " http://localhost:8983/solr/update/extract?literla.id=d...@uprefix=attr_&fmap.content=attri_content&commit=true"; -F "file=data.zip" When I query for id:doc1, the attr_content lists each filename within the zip archive. It also indexed the stream_size, stream_source and content_type. It does not appear to be opening up the individual files within the zip. Did you have to make any other configuration changes to your solrconfig.xml or schema.xml to read the contents of the individual files? Would it help to pass the specific mime type on the curl line ? On Mon, Oct 25, 2010 at 3:27 PM, Jayendra Patil < jayendra.patil@gmail.com> wrote: > There was this issue with the previous version of Solr, wherein only the > file names from the zip used to get indexed. > We had faced the same issue and ended up using the Solr trunk which has the > Tika version upgraded and works fine. > > The Solr version 1.4.1 should also have the fix included. Try using it. > > Regards, > Jayendra > > On Fri, Oct 22, 2010 at 6:02 PM, Joey Hanzel >wrote: > > > Hi, > > > > Has anyone had success using ExtractingRequestHandler and Tika with any > of > > the compressed file formats (zip, tar, gz, etc) ? > > > > I am sending solr the archived.tar file using curl. curl " > > > > > http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true > > " > > -H 'Content-type:application/octet-stream' --data-binary > > "@/home/archived.tar" > > The result I get when I query the document is that the filenames inside > the > > archive are indexed as the "body_texts", but the content of those files > is > > not extracted or included. This is not the behvior I expected. 
Ref: > > > > > http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#article.tika.example > > . > > When I send 1 of the actual documents inside the archive using the same > > curl > > command the extracted content is then stored in the "body_texts" field. > Am > > I missing a step for the compressed files? > > > > I have added all the extraction depednenices as indicated by mat in > > http://outoftime.lighthouseapp.com/projects/20339/tickets/98-solr-celland > > am able to succesfully extract data from MS Word, PDF, HTML documents. > > > > I'm using the following library versions. > > Solr 1.40, Solr Cell 1.4.1, with Tika Core 0.4 > > > > Given everything I have read this version of Tika should support > extracting > > data from all files within a compressed file. Any help or suggestions > > would > > be appreciated. > > >
Solr - xmlhttprequest
I have a Solr instance on my server, and I can make requests to it from Internet Explorer. However, with other browsers I can't. The error given is:

*XMLHttpRequest cannot load http://. Origin http://... is not allowed by Access-Control-Allow-Origin.*

I changed my Apache server conf file and added these lines to allow it:

Header set Access-Control-Allow-Origin "*"
Header set Access-Control-Allow-Methods POST,GET,OPTIONS
Header set Access-Control-Allow-Headers X-PINGOTHER
Header set Access-Control-Max-Age 1728000

Still the same error. Any suggestions?

-- Yavuz Selim YILMAZ
Re: a bug of solr distributed search
Andrzej Bialecki wrote: > On 2010-10-25 11:22, Toke Eskildsen wrote: >> On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: >>> But itshows a problem of distrubted search without common idf. >>> A doc will get different score in different shard. >> Bingo. >> >> I really don't understand why this fundamental problem with sharding >> isn't mentioned more often. Every time the advice "use sharding" is >> given, it should be followed with a "but be aware that it will make >> relevance ranking unreliable". > > The reason is twofold, I think: And a third potential reason - it's arguably a feature instead of a bug for some applications. Depending on how I organize my shards, "give me the most relevant document from each shard for this search" seems like it could be useful. > * there is an exact solution to this problem, namely to make two > distributed calls instead of one (first call to collect per-shard IDFs > for given query terms, second call to submit a query rewritten with the > global IDF-s). This solution is implemented in SOLR-1632, with some > caching to reduce the cost for common queries. However, this means that > now for every query you need to make two calls instead of one, which > potentially doubles the time to return results (for simple common > queries - for rare complex queries the time will be still dominated by > the query runtime on shard servers). > > * another reason is that in many many cases the difference between using > exact global IDF and per-shard IDFs is not that significant. If shards > are more or less homogenous (e.g. you assign documents to shards by > hash(docId)) then term distributions will be also similar. So then the > question is whether you can accept an N% variance in scores across > shards, or whether you want to bear the cost of an additional > distributed RPC for every query... > > To summarize, I would qualify your statement with: "...if the > composition of your shards is drastically different". 
Otherwise the cost > of using global IDF is not worth it, IMHO. >
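To make the per-shard versus global IDF point concrete, here is a small worked sketch. It assumes the classic Lucene DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)); the shard sizes and document frequencies are invented purely for illustration.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene-style idf (assumed): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical shards: name -> (documents in shard, documents containing the term)
shards = {
    "shard_a": (1000, 10),    # term is rare here
    "shard_b": (1000, 400),   # term is common here
}

# What each shard computes on its own, without a common idf
per_shard_idf = {name: idf(df, n) for name, (n, df) in shards.items()}

# What a first distributed call would collect and a second call would apply
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_df, total_docs)

for name, value in sorted(per_shard_idf.items()):
    print(name, round(value, 3))
print("global", round(global_idf, 3))
```

With skewed shards like these, the same term weighs far more on one shard than on the other, so merged scores are not comparable. If documents are spread by hash(docId), the per-shard document frequencies converge and the per-shard values sit close to the global one.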
Documents are deleted when Solr is restarted
Hey everyone, I apologize if this question is rudimentary but it is getting to me and I did not find anything reasonable about it online. So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the SolrTomcat wiki page to setup. The system works exactly the way I want it (proper search, highlighting, etc...). The problem however is when I restart my Tomcat server all the data in Solr (ie the index) is simply lost. The admin shows me the number of docs is 0 when it was before in the thousands. Can someone please help me understand why the above is happening and how can I workaround it if possible? Big thanks for any help you can send my way. Regards, Mackram
Re: Highlighting for non-stored fields
Check out this link http://wiki.apache.org/solr/FieldOptionsByUseCase You need to store the field if you want to use the highlighting feature. If you need to retrieve and display the highlighted snippets then the fields definitely needs to be stored. To use term offsets, it will be a good idea to enable the following attributes for that field termVectors termPositions termOffsets The only issue here is that your storage costs will increase because of these extra features. Nevertheless, you definitely need to store the field if you need to retrieve it for highlighting purposes. On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais wrote: > Hi, > > I've been looking thru the mailing archive for the past week and I haven't > found any useful info regarding this issue. > > My requirement is to index a few terabytes worth of data to be searched. > Due to the size of the data, I would like to index without storing but I > would like to use the highlighting feature. Is this even possible? What > are my options? > > I've read about termOffsets, payload that could possibly be used to do this > but I have no idea how this could be done. > > Any pointers greatly appreciated. Someone please point me in the right > direction. > > I don't mind having to write some code or digging thru existing code to > accomplish this task. > > Thanks, > P. > -- °O° "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
RE: How do I this in Solr?
Hi Varun,

I can't think of a way to do it without writing new analysis filters. But I think you could do what you want with two filters (this is untested):

1. An index-time filter that outputs a single token consisting of all of the input tokens, sorted in a consistent way, e.g.:

"mobile with GPS" -> "GPS mobile with"
"samsung android" -> "android samsung"

2. A query-time filter that outputs one token per input term combination, sorted in the same consistent way as the index-time filter, e.g.:

"samsung android GPS" -> "samsung", "android", "GPS", "android samsung", "GPS samsung", "android GPS", "android GPS samsung"

Steve

> -----Original Message-----
> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> Sent: Tuesday, October 26, 2010 9:08 AM
> To: solr-user@lucene.apache.org
> Subject: How do I this in Solr?
>
> Hi,
>
> I have lot of small documents (each containing 1 to 15 words) indexed in
> Solr. For the search query, I want the search results to contain only those
> documents that satisfy this criteria "All of the words of the search result
> document are present in the search query"
>
> For example:
> If I have the following documents indexed: "nokia n95", "GPS", "android",
> "samsung", "samsung android", "nokia android", "mobile with GPS"
>
> If I search with the text "samsung android GPS", search results should only
> contain "samsung", "GPS", "android" and "samsung android".
>
> Is there a way to do this in Solr.
>
> --
> Thanks
> Varun Gupta
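The two-filter idea can be prototyped outside Solr before writing any analysis code. The sketch below is plain Python, not a Solr filter; the function names are made up for illustration, and the case-insensitive sort is an assumption chosen to match the ordering in Steve's examples.

```python
from itertools import combinations


def index_token(text):
    """Index side: one token made of all words, sorted case-insensitively."""
    return " ".join(sorted(text.split(), key=str.lower))


def query_tokens(text):
    """Query side: one token per non-empty combination of the query words."""
    terms = sorted(text.split(), key=str.lower)
    tokens = set()
    for size in range(1, len(terms) + 1):
        # combinations() keeps the input order, so each token stays sorted
        for combo in combinations(terms, size):
            tokens.add(" ".join(combo))
    return tokens


documents = ["nokia n95", "GPS", "android", "samsung",
             "samsung android", "nokia android", "mobile with GPS"]

allowed = query_tokens("samsung android GPS")
matches = [doc for doc in documents if index_token(doc) in allowed]
print(matches)
```

One thing to keep in mind: a query of n distinct words expands to 2^n - 1 tokens, so this only stays practical for short queries.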
Re: How do I this in Solr?
If I get your question right, you probably want to use the AND binary operator as in "samsung AND andriod AND GPS" or "+samsung +andriod +GPS" On 26 October 2010 14:07, Varun Gupta wrote: > Hi, > > I have lot of small documents (each containing 1 to 15 words) indexed in > Solr. For the search query, I want the search results to contain only those > documents that satisfy this criteria "All of the words of the search result > document are present in the search query" > > For example: > If I have the following documents indexed: "nokia n95", "GPS", "android", > "samsung", "samsung andriod", "nokia andriod", "mobile with GPS" > > If I search with the text "samsung andriod GPS", search results should only > conain "samsung", "GPS", "andriod" and "samsung andriod". > > Is there a way to do this in Solr. > > -- > Thanks > Varun Gupta >
How do I this in Solr?
Hi, I have lot of small documents (each containing 1 to 15 words) indexed in Solr. For the search query, I want the search results to contain only those documents that satisfy this criteria "All of the words of the search result document are present in the search query" For example: If I have the following documents indexed: "nokia n95", "GPS", "android", "samsung", "samsung andriod", "nokia andriod", "mobile with GPS" If I search with the text "samsung andriod GPS", search results should only conain "samsung", "GPS", "andriod" and "samsung andriod". Is there a way to do this in Solr. -- Thanks Varun Gupta
Next Word - Any Suggestions?
Am about to implement a custom query that is sort of a mash-up of Facets, Highlighting, and SpanQuery, but thought I'd see if anyone has done anything similar.

In simple words, I need to facet on the next word given a target word.

For example, if my index only had the following 5 documents (comprised of a sentence each):

Doc 1 - The quick brown fox jumped over the fence.
Doc 2 - The sly fox skipped over the fence.
Doc 3 - The fat fox skipped his afternoon class.
Doc 4 - A brown duck and red fox, crashed the party.
Doc 5 - Charles Brown! Fox! Crashed my damn car.

The query should give the frequency of the distinct terms after the word "fox":

skipped - 2
crashed - 2
jumped - 1

Long-term, do the opposite: frequency of the distinct terms before the word "fox":

brown - 2
sly - 1
fat - 1
red - 1

My guess is that either the FastVectorHighlighter or SpanQuery would be a reasonable starting point. I was hoping to take advantage of Vectors as I am storing termVectors, termPositions, and termOffsets for the field in question.

Grateful for any thoughts . . . reference implementations . . . words of encouragement . . . free beer - whatever you can offer.

Gracias,
Christopher
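As a reference point for the expected counts, here is a brute-force sketch over the five example sentences. It works on raw text in plain Python rather than on term vectors, and the tokenization (lowercase, punctuation stripped) is an assumption; a Solr implementation would read positions from the index instead.

```python
import re
from collections import Counter


def neighbor_counts(docs, target, offset=1):
    """Count the words found `offset` positions away from each `target` hit.

    offset=1 gives the next word, offset=-1 the previous word.
    """
    counts = Counter()
    for text in docs:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for position, token in enumerate(tokens):
            neighbor = position + offset
            if token == target and 0 <= neighbor < len(tokens):
                counts[tokens[neighbor]] += 1
    return counts


docs = [
    "The quick brown fox jumped over the fence.",
    "The sly fox skipped over the fence.",
    "The fat fox skipped his afternoon class.",
    "A brown duck and red fox, crashed the party.",
    "Charles Brown! Fox! Crashed my damn car.",
]

after_fox = neighbor_counts(docs, "fox", offset=1)
before_fox = neighbor_counts(docs, "fox", offset=-1)
print(after_fox.most_common())
print(before_fox.most_common())
```

This reproduces the counts listed in the question for both directions, which makes it a handy check for whatever the index-based version returns.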
RE: How to index on basis of a condition?
Try:

select IF(sub_cat_id=2002, DATE_FORMAT(ad_post_date, '%Y-%m-%dT00:00:00Z/DAY'), null) as 'ad_sort_field' from tcuser.ad_details where

Ephraim Ofir

-----Original Message-----
From: Pawan Darira [mailto:pawan.dar...@gmail.com]
Sent: Tuesday, October 26, 2010 1:29 PM
To: solr-user@lucene.apache.org
Subject: Re: How to index on basis of a condition?

My Sql is

select IF(sub_cat_id=2002, ad_post_date, null) as 'ad_sort_field' from tcuser.ad_details where

+---------------+
| ad_sort_field |
+---------------+
| 2010-05-30    |
| 2010-05-02    |
| 2010-10-07    |
| NULL          |
| 2010-10-15    |
| NULL          |
+---------------+

Thanks
Pawan

On Tue, Oct 26, 2010 at 4:36 PM, Gora Mohanty wrote:
> On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira wrote:
> > I am using mysql database, and, field type is "date"
> [...]
>
> Could you show us the exact SELECT statement, and some example
> values returned by running the SELECT directly at a mysql console?
>
> Regards,
> Gora

-- Thanks, Pawan Darira
Re: Does Solr reload schema.xml dynamically?
Hi Everybody, Thanks Ephraim and Peter. I think I got my answer. Swapnonil Mukherjee On 26-Oct-2010, at 4:23 PM, Ephraim Ofir wrote: > Note that usually when you change the schema.xml you have not only to > restart solr, but also rebuild the index, so the issue of how to reload > the file seems like a small problem... > > Ephraim Ofir > > -Original Message- > From: Peter Karich [mailto:peat...@yahoo.de] > Sent: Tuesday, October 26, 2010 12:29 PM > To: solr-user@lucene.apache.org > Subject: Re: Does Solr reload schema.xml dynamically? > > Hi, > > See this: > http://wiki.apache.org/solr/CoreAdmin#RELOAD > > Solr will also load the new configuration (without restart the webapp) > on the slaves when using replication: > http://wiki.apache.org/solr/SolrReplication > > Regards, > Peter. > >> Hi Everybody, >> >> If I change my schema.xml to, do I have to restart Solr. Is there some > way, I can apply the changes to schema.xml without restarting Solr? >> >> Swapnonil Mukherjee >> >> >> >> > > > -- > http://jetwick.com twitter search prototype >
Query only a specfic field with a specific value using Dismax Handler
Hi Everybody,

Let me give you a brief idea of our Solr document. We have about 6 text type fields, each containing IPTC data extracted from photos. Search is performed mostly on these 6 fields. We also have a multivalued field named group_id that contains a list of all the group_ids that have access to this photo. In other words, we are storing the metadata of the photo as well as the permissions applicable to this photo in the Solr document itself. This group_id field, by the way, is of long type. Additionally we have certain boolean and constant type fields named visibleToEndUser (boolean) and entityType (a java enum between 0 and 5).

The first field, defaultSearch, is a copyField which contains a copy of all the values of the 6 text type fields that I have mentioned. The way we query presently using the default search handler is like this:

defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR group_id:2216624 OR group_id:2216990) AND (entityType:0) AND (visibleToEndUser:true)

We want to start using the dismax (if not dismax then edismax) query handler, but so far I have not been able to translate the query above into an equivalent dismax form. What I cannot figure out is:

1. How do I apply an exact match on the group_id, visibleToEndUser and entityType fields? In other words, how do I query a specific field for a specific value rather than searching across all fields with all values?
2. How do I apply OR and AND conditions?

Swapnonil Mukherjee
Re: How to index on basis of a condition?
My Sql is

select IF(sub_cat_id=2002, ad_post_date, null) as 'ad_sort_field' from tcuser.ad_details where

+---------------+
| ad_sort_field |
+---------------+
| 2010-05-30    |
| 2010-05-02    |
| 2010-10-07    |
| NULL          |
| 2010-10-15    |
| NULL          |
+---------------+

Thanks
Pawan

On Tue, Oct 26, 2010 at 4:36 PM, Gora Mohanty wrote:
> On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira wrote:
> > I am using mysql database, and, field type is "date"
> [...]
>
> Could you show us the exact SELECT statement, and some example
> values returned by running the SELECT directly at a mysql console?
>
> Regards,
> Gora

-- Thanks, Pawan Darira
Re: How to index on basis of a condition?
On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira wrote: > I am using mysql database, and, field type is "date" [...] Could you show us the exact SELECT statement, and some example values returned by running the SELECT directly at a mysql console? Regards, Gora
RE: Does Solr reload schema.xml dynamically?
Note that usually when you change the schema.xml you have not only to restart solr, but also rebuild the index, so the issue of how to reload the file seems like a small problem... Ephraim Ofir -Original Message- From: Peter Karich [mailto:peat...@yahoo.de] Sent: Tuesday, October 26, 2010 12:29 PM To: solr-user@lucene.apache.org Subject: Re: Does Solr reload schema.xml dynamically? Hi, See this: http://wiki.apache.org/solr/CoreAdmin#RELOAD Solr will also load the new configuration (without restart the webapp) on the slaves when using replication: http://wiki.apache.org/solr/SolrReplication Regards, Peter. > Hi Everybody, > > If I change my schema.xml to, do I have to restart Solr. Is there some way, I can apply the changes to schema.xml without restarting Solr? > > Swapnonil Mukherjee > > > > -- http://jetwick.com twitter search prototype
Highlighting for non-stored fields
Hi, I've been looking thru the mailing archive for the past week and I haven't found any useful info regarding this issue. My requirement is to index a few terabytes worth of data to be searched. Due to the size of the data, I would like to index without storing but I would like to use the highlighting feature. Is this even possible? What are my options? I've read about termOffsets, payload that could possibly be used to do this but I have no idea how this could be done. Any pointers greatly appreciated. Someone please point me in the right direction. I don't mind having to write some code or digging thru existing code to accomplish this task. Thanks, P.
RE: How to index on basis of a condition?
This is probably just a date format problem, nothing to do with the IF() statement. Try applying this on your date: DATE_FORMAT(yourDate, '%Y-%m-%dT00:00:00Z') Ephraim Ofir -Original Message- From: Pawan Darira [mailto:pawan.dar...@gmail.com] Sent: Tuesday, October 26, 2010 12:26 PM To: solr-user@lucene.apache.org Subject: Re: How to index on basis of a condition? I am using mysql database, and, field type is "date" On Tue, Oct 26, 2010 at 2:56 PM, Gora Mohanty wrote: > On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira > wrote: > > Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The > > query result is correct. But when i see it in my index, the value stored > is > > something unusual bunch of characters e.g. "*...@6628ad5a"* > [...] > > Which database are you indexing from? The field type is probably > a blob in the database. Check that, and look into the ClobTransformer: > http://wiki.apache.org/solr/DataImportHandler#ClobTransformer > > Regards, > Gora > -- Thanks, Pawan Darira
Re: Does Solr reload schema.xml dynamically?
Hi, See this: http://wiki.apache.org/solr/CoreAdmin#RELOAD Solr will also load the new configuration (without restart the webapp) on the slaves when using replication: http://wiki.apache.org/solr/SolrReplication Regards, Peter. Hi Everybody, If I change my schema.xml to, do I have to restart Solr. Is there some way, I can apply the changes to schema.xml without restarting Solr? Swapnonil Mukherjee -- http://jetwick.com twitter search prototype
Re: How to index on basis of a condition?
I am using mysql database, and, field type is "date" On Tue, Oct 26, 2010 at 2:56 PM, Gora Mohanty wrote: > On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira > wrote: > > Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The > > query result is correct. But when i see it in my index, the value stored > is > > something unusual bunch of characters e.g. "*...@6628ad5a"* > [...] > > Which database are you indexing from? The field type is probably > a blob in the database. Check that, and look into the ClobTransformer: > http://wiki.apache.org/solr/DataImportHandler#ClobTransformer > > Regards, > Gora > -- Thanks, Pawan Darira
Re: command line to check if Solr is up running
Hi Xin,

from the wiki: http://wiki.apache.org/solr/SolrConfigXml

The URL of the "ping" query is */admin/ping*

You can also check (via wget) the number of documents. It might look like a rusty hack, but it works for me:

wget -T 1 -q "http://localhost:8080/solr/select?q=*:*" -O - | tr '/>' '\n' | grep numFound | tr '"' ' ' | awk '{print $5}'

Regards,
Peter.

> As we know we can use browser to check if Solr is running by going to
> http://$hostName:$portNumber/$masterName/admin, say
> http://localhost:8080/solr1/admin. My questions is: are there any ways to
> check it using command line? I used "curl http://localhost:8080" to check my
> Tomcat, it worked fine. However, no response if I try
> "curl http://localhost:8080/solr1/admin" (even when my Solr is running).
> Does anyone know any command line alternatives?
>
> Thanks,
> Xin

-- http://jetwick.com twitter search prototype
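If the tr/grep/awk pipeline feels too fragile, the same check can be sketched with an XML parser. This is only an illustration: the function names, the sample response and the one-second timeout are assumptions, and the XML shape is the standard Solr select response with a `result` element carrying `numFound`.

```python
import urllib.request
import xml.etree.ElementTree as ET


def num_found(response_xml):
    """Extract numFound from a Solr XML select response."""
    root = ET.fromstring(response_xml)
    result = root.find("result")
    return int(result.get("numFound"))


def fetch(url, timeout=1.0):
    """Return the response body, or None if Solr cannot be reached in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError:   # covers connection errors, timeouts and HTTP errors
        return None


# Invented sample in the usual response layout, so the parser can be tried offline
SAMPLE = (
    '<response><lst name="responseHeader"><int name="status">0</int></lst>'
    '<result name="response" numFound="1234" start="0"/></response>'
)
print(num_found(SAMPLE))
```

In a monitoring script you would call `fetch("http://localhost:8080/solr/select?q=*:*")`, treat `None` as "Solr is down", and pass anything else to `num_found`.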
Re: Does Solr reload schema.xml dynamically?
If you are using Solr Multicore http://wiki.apache.org/solr/CoreAdmin you can issue a Reload command http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0 On 26 Oct 2010, at 11:09, Swapnonil Mukherjee wrote: > Hi Everybody, > > If I change my schema.xml to, do I have to restart Solr. Is there some way, I > can apply the changes to schema.xml without restarting Solr? > > Swapnonil Mukherjee > > >
Does Solr reload schema.xml dynamically?
Hi Everybody, If I change my schema.xml, do I have to restart Solr? Is there some way I can apply the changes to schema.xml without restarting Solr? Swapnonil Mukherjee
Re: How to index on basis of a condition?
On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira wrote: > Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The > query result is correct. But when i see it in my index, the value stored is > something unusual bunch of characters e.g. "*...@6628ad5a"* [...] Which database are you indexing from? The field type is probably a blob in the database. Check that, and look into the ClobTransformer: http://wiki.apache.org/solr/DataImportHandler#ClobTransformer Regards, Gora
Re: How to index on basis of a condition?
Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The query result is correct. But when I look at it in my index, the value stored is an unusual bunch of characters, e.g. "*...@6628ad5a"*

Please suggest what went wrong.

- Pawan

On Mon, Oct 25, 2010 at 6:44 PM, Ephraim Ofir wrote:
> Assuming you're talking about data that comes from a DB, I find it easiest
> to do this kind of logic on the DB's side (MySQL example):
> SELECT IF(someField = someValue, desiredValue, NULL) AS desiredName from
> someTable
>
> If that's not possible, you can use RegexTransformer
> (http://wiki.apache.org/solr/DataImportHandler#RegexTransformer) or (worst
> case and worst performance) ScriptTransformer
> (http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer) and
> actually write a JS script to do your logic.
>
> Ephraim Ofir
>
> -----Original Message-----
> From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
> Sent: Monday, October 25, 2010 10:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to index on basis of a condition?
>
> Do you want to use a field's content to decide whether the document should
> be indexed or not?
> You could write an UpdateProcessor for that, simply aborting the chain for
> the docs that don't pass your test.
>
> @Override
> public void processAdd(AddUpdateCommand cmd) throws IOException {
>     SolrInputDocument doc = cmd.getSolrInputDocument();
>     String value = (String) doc.getFieldValue("myfield");
>     String condition = "foobar";
>     // Compare contents with equals(); == would compare object references.
>     if (condition.equals(value)) {
>         super.processAdd(cmd);
>     }
> }
>
> But if what you meant was to skip only that field if it does not match the
> condition, you could use doc.removeField(name) instead. Now you can feed
> your content using whatever method you like.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 25. okt. 2010, at 08.38, Pawan Darira wrote:
>
> > Hi
> >
> > I want to index a particular field on one if() condition. Can i do it
> > through DIH?
> > > > Please suggest. > > > > -- > > Thanks, > > Pawan Darira > > -- Thanks, Pawan Darira
Re: Need help for solr searching case insensative item
Hi, You need to share relevant parts of your schema for us to be able to see what's going on. Try using fieldType="text". Basically, you need a fieldType which has the lowercaseFilter included. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 25. okt. 2010, at 21.09, wu liu wrote: > Hi all, > > I just noticed a wierd thing happend to my solr search result. > if I do a search for "ecommons", it cannot get the result for "eCommons", > instead, > if i do a search for "eCommons", i can only get all the match for "eCommons", > but not "ecommons". > > I cannot figure it out why? > > please help me > > Thanks very much in advance
Re: Need help for solr searching case insensative item
Sounds like a WordDelimiterFilter config issue; please refer to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory .

It will also help if you could provide:
1) The Tokenizers/Filters config in your schema file
2) The analysis.jsp output from the admin page.

2010/10/26 wu liu

> Hi all,
>
> I just noticed a weird thing happened to my solr search result.
> if I do a search for "ecommons", it cannot get the result for "eCommons",
> instead,
> if i do a search for "eCommons", i can only get all the match for
> "eCommons", but not "ecommons".
>
> I cannot figure it out why?
>
> please help me
>
> Thanks very much in advance
>
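A rough sketch of why WordDelimiterFilter is a plausible suspect: when it is configured to split on case changes, a token such as "eCommons" can be broken at the lowercase-to-uppercase boundary, while "ecommons" stays whole, so the two no longer produce the same terms. The code below is a simplified imitation using a regular expression, not the filter's actual implementation, and `CaseChangeSplitSketch` is a hypothetical name. The real filter has further options that change which tokens are emitted.

```java
import java.util.Arrays;
import java.util.List;

// Simplified imitation of splitting a token on lowercase-to-uppercase
// transitions. Not the real WordDelimiterFilter logic.
class CaseChangeSplitSketch {

    // Splits at every position preceded by a-z and followed by A-Z.
    static List<String> splitOnCaseChange(String token) {
        return Arrays.asList(token.split("(?<=[a-z])(?=[A-Z])"));
    }

    public static void main(String[] args) {
        System.out.println(splitOnCaseChange("eCommons"));
        System.out.println(splitOnCaseChange("ecommons"));
    }
}
```

Comparing the index-time and query-time token streams in analysis.jsp would show whether this kind of split is actually happening in a given schema.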
Re: DIH wiht several Cores
Okay. How did you solve this? Did you write your own importer? We already have our "own" "importer", but only for one instance of Solr and one index. We want to split this into several cores and indexes, and we want to use DIH because we think its indexing is better than a PHP script ...

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-wiht-several-Cores-tp1767883p1772223.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Modelling Access Control
The idea of ACL-based queries is: each document carries all of the groups or roles that are allowed to see it. Each user search includes all of the groups or roles the user has. The roles are stored in a multivalued string field. Each ACL-based query passes in "roles:A OR roles:B OR roles:C", and if any of A, B, C are in the stored ACL field, you have a match. This is called "early binding".

"Late binding" is when you return everything and the app calls LDAP and asks "can she see this? or this?". This is slow and puts a monster load on the ACL server.

Very important: do not make a spelling or autosuggest index from a text field which some people can see and other people can't.

On Tue, Oct 26, 2010 at 12:06 AM, Lance Norskog wrote:
> Filter queries are a set of bits which is ANDed against query results
> at a very early stage of query processing. They are very useful. Note
> that they are stored (I think) in parsed query order, so you have to
> pass in the same filter query string each time.
>
> On Mon, Oct 25, 2010 at 8:59 AM, Dennis Gearon wrote:
>> Thanks for that insight, a lot.
>>
>> Dennis Gearon
>>
>> Signature Warning
>>
>> It is always a good idea to learn from your own mistakes. It is usually a
>> better idea to learn from others’ mistakes, so you do not have to make them
>> yourself. from
>> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>>
>> EARTH has a Right To Life,
>> otherwise we all die.
>>
>>
>> --- On Mon, 10/25/10, Jonathan Rochkind wrote:
>>
>>> From: Jonathan Rochkind
>>> Subject: Re: Modelling Access Control
>>> To: "solr-user@lucene.apache.org"
>>> Date: Monday, October 25, 2010, 8:19 AM
>>> Dennis Gearon wrote:
>>> > why use filter queries?
>>> >
>>> > Wouldn't reducing the set headed into the filters by
>>> putting it in the main query be faster? (A question to
>>> learn, since I do NOT know :-)
>>> >
>>> >
>>> No. At least as I understand it. In the best case, the
>>> filter query will be a lot faster, because filter queries
>>> are cached separately in the filter cache. So if the
>>> existing filter query can be found in the cache, it'll be a
>>> lot faster. If it's not in the cache, the performance should
>>> be pretty much the same as if you had included it as an
>>> additional clause in the main q query.
>>>
>>> The reasons to put it in a fq filter are:
>>>
>>> 1) The caching behavior. You can have that certain part of
>>> the query be cached on its own, speeding up any subsequent
>>> queries that use that same fq.
>>>
>>> 2) Simplification of client code. You can leave your 'q'
>>> however you want it, using whatever kind of query parser you
>>> want too (dismax, whatever), and just add on the 'fq'
>>> without touching the 'q'. This is a lot
>>> easier to do, and especially when you're using it for access
>>> control like this, a lot harder for a bug to creep in.
>>>
>>> Jonathan
>>>
>>>
>>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>

--
Lance Norskog
goks...@gmail.com
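The "early binding" approach described in this message comes down to building one filter string from the user's roles. Here is a minimal sketch of that step, assuming the ACL field is called `roles` as in the example query, and assuming role names are simple tokens that need no escaping. The class `AclFilterSketch` and the method `buildRolesFilter` are hypothetical names. Roles are de-duplicated and sorted so the same user always yields the same string, which fits the quoted advice about passing the same filter query string each time.

```java
import java.util.Collection;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Builds a filter string such as "roles:A OR roles:B OR roles:C".
// Assumes role names are plain tokens; no query escaping is done here.
class AclFilterSketch {

    static String buildRolesFilter(Collection<String> userRoles) {
        if (userRoles.isEmpty()) {
            // Refuse rather than emit an empty filter, which would restrict nothing.
            throw new IllegalArgumentException("user has no roles");
        }
        // TreeSet removes duplicates and sorts, giving a stable string.
        return new TreeSet<>(userRoles).stream()
                .map(role -> "roles:" + role)
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        System.out.println(buildRolesFilter(java.util.Arrays.asList("C", "A", "B", "A")));
    }
}
```

The resulting string would be sent as the `fq` parameter, leaving the user's `q` untouched.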
Re: Modelling Access Control
Filter queries are a set of bits which is ANDed against query results at a very early stage of query processing. They are very useful. Note that they are stored (I think) in parsed query order, so you have to pass in the same filter query string each time.

On Mon, Oct 25, 2010 at 8:59 AM, Dennis Gearon wrote:
> Thanks for that insight, a lot.
>
> Dennis Gearon
>
> Signature Warning
>
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make them
> yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
> --- On Mon, 10/25/10, Jonathan Rochkind wrote:
>
>> From: Jonathan Rochkind
>> Subject: Re: Modelling Access Control
>> To: "solr-user@lucene.apache.org"
>> Date: Monday, October 25, 2010, 8:19 AM
>> Dennis Gearon wrote:
>> > why use filter queries?
>> >
>> > Wouldn't reducing the set headed into the filters by
>> putting it in the main query be faster? (A question to
>> learn, since I do NOT know :-)
>> >
>> >
>> No. At least as I understand it. In the best case, the
>> filter query will be a lot faster, because filter queries
>> are cached separately in the filter cache. So if the
>> existing filter query can be found in the cache, it'll be a
>> lot faster. If it's not in the cache, the performance should
>> be pretty much the same as if you had included it as an
>> additional clause in the main q query.
>>
>> The reasons to put it in a fq filter are:
>>
>> 1) The caching behavior. You can have that certain part of
>> the query be cached on its own, speeding up any subsequent
>> queries that use that same fq.
>>
>> 2) Simplification of client code. You can leave your 'q'
>> however you want it, using whatever kind of query parser you
>> want too (dismax, whatever), and just add on the 'fq'
>> without touching the 'q'. This is a lot
>> easier to do, and especially when you're using it for access
>> control like this, a lot harder for a bug to creep in.
>>
>> Jonathan
>>
>>
>>
>

--
Lance Norskog
goks...@gmail.com
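The two ideas in this thread, a filter as a set of bits ANDed against the query results and a cache keyed by the filter query, can be pictured with a toy model. This is not how Solr implements its filter cache; `FilterCacheSketch`, `filterFor`, and `apply` are hypothetical names, and the matching document ids are passed in by hand instead of being computed from an index. The cache key is the filter string, so only an identical string reuses the stored bits.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy model: one BitSet per filter string, reused on repeat requests.
class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    int computations = 0; // counts how often a filter had to be built

    // matchingDocIds stands in for evaluating fq against a real index.
    BitSet filterFor(String fq, int[] matchingDocIds) {
        return cache.computeIfAbsent(fq, key -> {
            computations++;
            BitSet bits = new BitSet();
            for (int id : matchingDocIds) {
                bits.set(id);
            }
            return bits;
        });
    }

    // ANDs the filter bits against the main query's result bits.
    static BitSet apply(BitSet queryResults, BitSet filter) {
        BitSet out = (BitSet) queryResults.clone();
        out.and(filter);
        return out;
    }

    public static void main(String[] args) {
        FilterCacheSketch cache = new FilterCacheSketch();
        BitSet query = new BitSet();
        query.set(1);
        query.set(2);
        query.set(3);
        BitSet filter = cache.filterFor("roles:A", new int[] {1, 3, 5});
        System.out.println(apply(query, filter));
        cache.filterFor("roles:A", new int[] {1, 3, 5}); // served from the cache
        System.out.println(cache.computations);
    }
}
```

In this model a second request with the same filter string costs one map lookup, which is the effect the thread attributes to putting the access-control clause in `fq`.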