How do I add a custom field?
Hello, I want to have an additional field that appears for every document in search results. I understand that I should do this by adding the field to the schema.xml, so I add: Then I restart Solr (so that it loads the new schema.xml) and make a query specifying that it should return myField too, but it doesn't. Will it do only for newly indexed documents? Am I missing something? -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains "[LON]" or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with "X". ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: How do I add a custom field?
Hi Gabriele, Did you index any docs with your new field ? The results will just bring back docs and what fields they have. They won't bring back "null" fields just because they are in your schema. Lucene is schema-less. Solr adds the schema to make it nice to administer and very powerful to use. On 3 July 2011 11:01, Gabriele Kahlout wrote: > Hello, > > I want to have an additional field that appears for every document in > search results. I understand that I should do this by adding the field to > the schema.xml, so I add: > indexed="false"/> > Then I restart Solr (so that it loads the new schema.xml) and make a query > specifying that it should return myField too, but it doesn't. Will it do > only for newly indexed documents? Am I missing something? > > -- > Regards, > K. Gabriele
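For reference, the field declaration Gabriele's message refers to was stripped by the archive; a plausible reconstruction for schema.xml (the field name is his, the type and attribute values are my guess) would look like:

```xml
<!-- hypothetical reconstruction of the stripped snippet: a stored,
     non-indexed field that should come back in search results -->
<field name="myField" type="int" indexed="false" stored="true"/>
```

As lee's answer says, even with stored="true" the field only appears for documents that were (re)indexed after the schema change; existing documents simply don't carry it.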
Re: non-alphanumeric character searching
I'd start by removing lots of stuff, particularly WordDelimiterFilterFactory. That's splitting your input up by non-alpha characters. If you really want just the string stored, try just using KeywordTokenizer and LowerCaseFilter (although AsciiFolding... wouldn't hurt). But the best way to understand all the effects of various analysis chains is to use the admin/analysis page (be sure to turn the verbose output on). It'll show you exactly what produces what transformations. Another very useful tool is adding &debugQuery=on to your URLs and looking at the parsed output. Oh, and I expect that some of your unexpected results are a result of searching against your default field. Searching "example:stuff" will search against the field "example". Searching "stuff" will search against the default search field defined in your schema.xml (note that the &debugQuery=on will show this) Hope this helps Erick On Jul 1, 2011 11:19 AM, "Lisa Riggle" wrote: > Hi Everyone! > > I'm very new to solr and have a question that I hope you all can answer. > > My boss has me learning solr for work, with the specific goal of > improving the schema on one of the cores of our site. This core > consists of nothing but company names from our database, so I think that > makes things easier, since there's no need to worry about parsing email > addresses or URLs or anything. > > Anyway, I am running into some problems with non-alphanumeric characters > in company names causing searches to return wild results. For example, > there is 1 company in our database stored as /HPC Inter@ctive > (ApartmentGuide.com)/. In my test script, I have a couple of different > search strings that don't seem to return consistent results. For > example:/hpc inter@ctive/ returns 1 result (yay), but /hpc inter@ctive > (apartment/ and /hpc inter@ctive (apartmentguide/ both return 0 > results. /inter@ctive/ by itself returns 832 results. 
> > Among other issues, I'm having a heck of a time trying to figure out how > to make solr just search for "inter@ctive" as a whole word instead of > splitting it up at the @ and searching for "inter" and "ctive". > > How do I get solr to ignore special characters, like @, and just treat > it as part of the string? > > I've spent some time trying out different tokenizers and filters, and > rearranging the order of some of the filters. Doing that does affect > the results at times, but mostly I get the results listed above. I also > tried using the PatternReplaceFilterFactory to just remove all special > characters from the index/search strings, but I'm fantastically bad at > regex, so that didn't work either. > > I appreciate any and all advice. > Thanks! > --Lisa > > -- > > I'm running a default install of solr 3.2 with the following schema: > [schema.xml snippet stripped by the mail archive; what survives shows a WordDelimiterFilterFactory with catenateNumbers="0" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1" in both the index and query analyzers] > > I don't know what other information is needed to point me in the right > direction, but please let me know if there's something I can send that > will be of assistance.
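Erick's suggestion (KeywordTokenizer plus LowerCaseFilter) would look roughly like this in schema.xml; the fieldType name here is made up for illustration, not from Lisa's schema:

```xml
<!-- sketch of an exact-match-ish company-name type: the whole input
     becomes a single lowercase token, so "@" and "(" are preserved -->
<fieldType name="company_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, /hpc inter@ctive (apartmentguide.com)/ is indexed as one token, so only prefix/exact matching on the whole string will hit it; the admin/analysis page will show exactly what each filter produces.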
Re: upgraded from 2.9 to 3.x, problems. help?
Can you post the results of adding &debugQuery=on to your two versions? And have you re-indexed or not? Best Erick On Jul 1, 2011 12:31 PM, "dhastings" wrote: > i guess what im asking is how to set up solr/lucene to find > yale l.j. > yale l. j. > yale l j > as all the same thing. > > -- > View this message in context: http://lucene.472066.n3.nabble.com/upgraded-from-2-9-to-3-x-problems-help-tp3129348p3129520.html > Sent from the Solr - User mailing list archive at Nabble.com.
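One common way to make those three forms ("yale l.j.", "yale l. j.", "yale l j") analyze to the same tokens (not necessarily what changed between 2.9 and 3.x) is to normalize the periods away before tokenizing, e.g. with a char filter; a sketch, with a made-up fieldType name:

```xml
<!-- sketch: turn periods into spaces so "l.j.", "l. j." and "l j"
     all tokenize to the same tokens: "l", "j" -->
<fieldType name="text_citation" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\." replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Replacing with a space (rather than deleting) matters: deleting would collapse "l.j." to the single token "lj" while "l. j." still yields "l", "j".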
Re: How do I add a custom field?
Is there a way I can compute and add the field to all indexed documents without re-indexing? MyField counts the number of terms per document (unique word count). On Sun, Jul 3, 2011 at 12:24 PM, lee carroll wrote: > Hi Gabriele, > Did you index any docs with your new field ? > > The results will just bring back docs and what fields they have. They won't > bring back "null" fields just because they are in your schema. Lucene > is schema-less. > Solr adds the schema to make it nice to administer and very powerful to > use. > > On 3 July 2011 11:01, Gabriele Kahlout wrote: > > Hello, > > > > I want to have an additional field that appears for every document in > > search results. I understand that I should do this by adding the field to > > the schema.xml, so I add: > > > indexed="false"/> > > Then I restart Solr (so that it loads the new schema.xml) and make a query > > specifying that it should return myField too, but it doesn't. Will it do > > only for newly indexed documents? Am I missing something? -- Regards, K. Gabriele
Exception when using result grouping and sorting by geodist() with Solr 3.3
Hello, I just tried up(down?)grading our current Solr 4.0 trunk setup to Solr 3.3.0 as result grouping was the only reason for us to stay with the trunk. Everything worked like a charm except for one of our queries, where we group results by the owning user and sort by distance. A simplified example for my query (that still fails) looks like this: q=*:*&group=true&group.field=user.uniqueId_s&group.main=true&group.format=grouped&sfield=user.location_p&pt=48.20927,16.3728&sort=geodist() > asc The exception thrown is: Caused by: org.apache.solr.common.SolrException: Unweighted use of sort > geodist(latlon(user.location_p),48.20927,16.3728) > at > org.apache.solr.search.function.ValueSource$1.newComparator(ValueSource.java:106) > at org.apache.lucene.search.SortField.getComparator(SortField.java:413) > at > org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.(AbstractFirstPassGroupingCollector.java:81) > at > org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.(TermFirstPassGroupingCollector.java:56) > at > org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:587) > at org.apache.solr.search.Grouping.execute(Grouping.java:256) > at > org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:237) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) > at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:140) > ... 39 more Any ideas how to fix this or work around this error for now? I'd really like to move from the trunk to the stable 3.3.0 release and this is the only problem currently keeping me from doing so. Cheers, Thomas
How do I compute and store a field?
Hello, I'm trying to add a field that counts the number of terms in a document to my schema. So far I've been computing this value at query-time. Is there a way I could compute this once only and store the field?

final SolrIndexSearcher searcher = request.getSearcher();
final SolrIndexReader reader = searcher.getReader();
final String content = "content";
final byte[] norms = reader.norms(content);
final int[] docLengths;
if (norms == null) {
    docLengths = null;
} else {
    docLengths = new int[norms.length];
    int i = 0;
    for (byte b : norms) {
        float docNorm = searcher.getSimilarity().decodeNormValue(b);
        int docLength = 0;
        if (docNorm != 0) {
            docLength = (int) (1 / docNorm); // reciprocal
        }
        docLengths[i++] = docLength;
    }
}
...
final NumericField docLenNormField = new NumericField(TestQueryResponseWriter.DOC_LENGHT);
docLenNormField.setIntValue(docLengths[id]);
doc.add(docLenNormField);

-- Regards, K. Gabriele
Re: How do I add a custom field?
You'll need to index the field. I would think you would want to index/store the field along with the associated document, in which case you'll have to reindex the documents as well - there's no single-field update capability in Lucene (yet?). -Mike On 7/3/2011 1:09 PM, Gabriele Kahlout wrote: Is there a way I can compute and add the field to all indexed documents without re-indexing? MyField counts the number of terms per document (unique word count). On Sun, Jul 3, 2011 at 12:24 PM, lee carroll wrote: Hi Gabriele, Did you index any docs with your new field ? The results will just bring back docs and what fields they have. They won't bring back "null" fields just because they are in your schema. Lucene is schema-less. Solr adds the schema to make it nice to administer and very powerful to use. On 3 July 2011 11:01, Gabriele Kahlout wrote: Hello, I want to have an additional field that appears for every document in search results. I understand that I should do this by adding the field to the schema.xml, so I add: Then I restart Solr (so that it loads the new schema.xml) and make a query specifying that it should return myField too, but it doesn't. Will it do only for newly indexed documents? Am I missing something? -- Regards, K. Gabriele
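One caveat on recovering term counts from norms, as the earlier snippet does: under Lucene's DefaultSimilarity the stored length norm is 1/sqrt(numTerms), and it is additionally quantized to a single byte, so any recovered count is an approximation. A plain-Java sketch of the inversion (class and method names are mine; the byte quantization is left out):

```java
// Sketch: how a term count relates to Lucene's default length norm
// (norm = 1/sqrt(numTerms)). Lucene's real single-byte norm encoding
// loses further precision, which this sketch omits.
public class NormRoundTrip {
    // default-similarity length norm for a document with numTerms terms
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // invert the norm back to an approximate term count
    static int approxTermCount(float norm) {
        double inv = 1.0 / norm;
        return (int) Math.round(inv * inv);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 16, 100, 1024}) {
            float norm = lengthNorm(n);
            System.out.println(n + " -> norm " + norm
                    + " -> recovered " + approxTermCount(norm));
        }
    }
}
```

With the full-precision float this round-trips exactly; after Lucene's byte encoding, nearby document lengths collapse to the same norm, which is another reason to store the count as its own field rather than derive it.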
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 7/2/2011 12:34 PM, Yonik Seeley wrote: OK, I tried a quick test of 1.4.1 vs 3x on optimized indexes (unoptimized had different numbers of segments so I didn't try that). 3x (as of today) was 28% faster at a large filter query (300 terms in one big disjunction, with each term matching ~1000 docs). A lot of the terms used in my filter queries may match hundreds of thousands or even millions of documents. The largest search group (sg:stdp) matches about 1.4 million out of 9.5 million docs on each shard, and is probably present in most filter queries. Right now I have the default termIndexInterval of 128, and a setTermIndexDivisor of 8. I think this probably has the same memory footprint as a termIndexInterval of 1024, but because it can do seeks in the tii file (taking good advantage of disk cache) before it ultimately seeks in the tis file, there are probably fewer seeks. My warm time is slightly better than it was with the interval at 1024, and my average query speed hasn't changed much. I am going to try an interval of 64 and a divisor of 16. I'm interested in other performance enhancing ideas that don't involve tweaking tons of options all at the same time. I think my best bet for performance is adding more memory, of course. Shawn
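Shawn's memory-footprint observation is straightforward arithmetic: the number of term-index entries held in RAM is roughly totalTerms / (termIndexInterval * termIndexDivisor), so interval 128 with divisor 8 keeps the same number of entries as interval 1024 with divisor 1. A throwaway check (the 50M total-term figure is invented for illustration):

```java
public class TermIndexFootprint {
    // approximate number of term-index (.tii) entries kept in memory
    static long inMemoryEntries(long totalTerms, int interval, int divisor) {
        return totalTerms / ((long) interval * divisor);
    }

    public static void main(String[] args) {
        long totalTerms = 50_000_000L; // invented figure for illustration
        // 128 * 8 == 1024 * 1, so both configurations keep ~48,828 entries
        System.out.println(inMemoryEntries(totalTerms, 128, 8));
        System.out.println(inMemoryEntries(totalTerms, 1024, 1));
    }
}
```

The difference between the two, as Shawn notes, is on the disk side: with interval 128 the .tii file is denser, so a divisor-based skip can land closer to the target term before seeking into .tis.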
Custom Cache cleared after a commit?
I know the queryResultCache and such live only until a commit happens, but I'm wondering if the custom caches are like this as well? I'd actually rather have a custom cache which is not cleared at all. I want to give the elements of this cache a 6 hour TTL (or some time frame), but I never want it to clear on a commit. Is this possible using SolrCache? -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3136345.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom Cache cleared after a commit?
On Sun, Jul 3, 2011 at 10:52 PM, arian487 wrote: > I know the queryResultCache and stuff live only so long as a commit happens > but I'm wondering if the custom caches are like this as well? I'd actually > rather have a custom cache which is not cleared at all. That's not currently possible. The nature of Solr's caches is that they are completely transparent - it doesn't matter if a cache is used or not, the response should always be the same. This is analogous to caching the fact that 2*2 = 4. Put another way, Solr's caches are only for increasing request throughput, and should not affect what response a client receives. -Yonik http://www.lucidimagination.com
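For the TTL-style cache arian487 wants, one option is to keep it outside Solr's cache framework entirely, so commits never touch it; a minimal sketch (class name and clock injection are mine, not a Solr API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Sketch of a commit-independent TTL cache: entries expire on the clock,
// never on a Solr commit, because nothing here implements SolrCache.
public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable for testing

    public TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    public void put(K key, V value) {
        map.put(key, new Entry<V>(value, clock.getAsLong() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (e.expiresAt <= clock.getAsLong()) { // lazily evict on read
            map.remove(key, e);
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) throws InterruptedException {
        TtlCache<String, String> cache =
                new TtlCache<>(100, System::currentTimeMillis);
        cache.put("k", "v");
        System.out.println(cache.get("k")); // present
        Thread.sleep(150);
        System.out.println(cache.get("k")); // expired -> null
    }
}
```

A real version would also sweep expired entries periodically rather than only evicting on read; the point is just that, per Yonik's reply, anything whose answers may change across commits does not belong in SolrCache, and this lifetime is governed purely by the clock.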
how to improve query result time.
Hi All, I have complex phrase queries including wildcards (ex. q="conn* pho*"~2 OR "inter* pho*"~2 OR ...). These take a long time to return results. I tried reindexing after changing termIndexInterval to 8 to reduce query time by loading more of the term index info. I thought queries would be faster that way, but they weren't. I suspect the time is being spent searching .frq/.prx... Any ideas for improving query result time? I'm using Solr 1.4 and schema.xml is below. Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-improve-query-result-time-tp3136554p3136554.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom Cache cleared after a commit?
I guess I'll have to use something other than SolrCache to get what I want then. Or I could use SolrCache and just change the code (I've already done so much of this anyways...). Anyway, thanks for the reply. -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3136580.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: MergerFactor and MaxMergerDocs effecting num of segments created
Shawn, when I reindex data using full-import I get:

_0.fdt 3310
_0.fdx 23
_0.frq 857
_0.nrm 31
_0.prx 1748
_0.tis 350
_1.fdt 3310
_1.fdx 23
_1.fnm 1
_1.frq 857
_1.nrm 31
_1.prx 1748
_1.tii 5
_1.tis 350
segments.gen 1
segments_3 1

where all _1 files are marked as archived (A). And when I run full-import again (for testing) I get _1 and _2 files, where all _2 files are marked as archived. What does this mean? And the part I don't get is: full-import deletes the old indexes and creates new ones, so why am I getting the old ones again? - Thanks & Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/MergerFactor-and-MaxMergerDocs-effecting-num-of-segments-created-tp3128897p3136664.html Sent from the Solr - User mailing list archive at Nabble.com.