RE: Question regarding synonym
Hi, I have a question regarding the SynonymFilter. I have a one-way mapping defined: austin martin, astonmartin => aston martin ... Can anybody please explain if my observation is correct? This is a very critical aspect of my work. That is correct - the synonym filter can recognize multi-token synonyms from consecutive tokens in a stream.
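For reference, a one-way mapping like the one described lives in the synonyms file and is wired up with a filter in the analyzer chain; the file name and attribute values here are illustrative:

```xml
<!-- synonyms.txt contains the one-way rule (left-hand variants rewritten to the right-hand side):
     austin martin, astonmartin => aston martin
-->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
```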
RE: Mixed field types and boolean searching
No- there are various analyzers. StandardAnalyzer is geared toward searching bodies of text for interesting words - punctuation is ripped out. Other analyzers are more useful for concrete text. You may have to work at finding one that leaves punctuation in. My problem is not with the StandardAnalyzer per se, but more as to how dismax style queries are handled by the query parser when the different fields have different sets of ignored tokens or stop words. Say you want to use the contents of a text box in your app and query a field in Solr. The user enters A and B, so you map this to f1:A and f1:B. Now, if B is an ignored token in the f1 field for whatever reason, the query boils down to f1:A. Now imagine you want to allow the user's text to match multiple fields - as in any term can match any field, but all terms must match at least 1 field. So now you map the user's query to (f1:A OR f2:A) AND (f1:B OR f2:B). But if f2 does not ignore B, the query boils down to (f1:A OR f2:A) AND (f2:B). Now documents that could come back when you were only matching against the f1 field don't come back. This seems counter-intuitive - to be consistent, I would think the query should essentially be treated as (f1:A OR f2:A) AND (TRUE OR f2:B) - and thus a term that is a stop word or ignored token for any of the fields would be ignored across the board. So I guess what I'm asking is if there is a reason for the existing behavior, or is it just a fact-of-life of the query parser? Thanks! -Ken
RE: Alphanumeric Wild Card Search Question
Here's my question: I have some products that I want to allow people to search for with wild cards. For example, if my product is YBM354, I'd like for users to be able to search on YBM*, YBM3*, YBM35* and for any of these searches to return that product. I've found that I can search for YBM* and get the product, just not the other combinations. Are you using WordDelimiterFilterFactory? That would explain this behavior. If so, do you need it? For the queries you describe you don't need that kind of tokenization. Also, have you played with the analysis tool on the admin page? It is a great help in debugging things like this. -Ken
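For what it's worth, a minimal field type that keeps a product code as a single lower-cased token (so ybm3* and ybm35* both match YBM354) might look like this; the type name is illustrative:

```xml
<fieldType name="product_code" class="solr.TextField">
  <analyzer>
    <!-- keep the whole code as one token; no word-delimiter splitting -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since wildcard queries bypass analysis, the query term itself should be lower-cased (ybm35*, not YBM35*).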
RE: SolrJ question
You can escape the string with org.apache.lucene.queryParser.QueryParser.escape(String query) http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/queryParser/QueryParser.html#escape%28java.lang.String%29 -Original Message- From: ptomb...@gmail.com [mailto:ptomb...@gmail.com] On Behalf Of Paul Tomblin Sent: Monday, August 17, 2009 5:12 PM To: solr-user@lucene.apache.org Subject: SolrJ question If I put an object into a SolrInputDocument and store it, how do I query for it back? For instance, I stored a java.net.URI in a field called url, and I want to query for all the documents that match a particular URI. The query syntax only seems to allow Strings, and if I just try query.setQuery("url:" + uri.toString()) I get an error because of the colon after http in the URI. I'm really new to Solr, so please let me know if I'm missing something basic here. -- http://www.linkedin.com/in/paultomblin
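To make the effect concrete, here is a minimal standalone sketch of the kind of backslash-escaping QueryParser.escape performs (the special-character list is taken from the linked 2.4 javadoc; in practice just call the real method rather than this sketch):

```java
public class Escape {
    // Characters Lucene's query syntax treats as special, per the QueryParser.escape javadoc.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash, as QueryParser.escape does.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Escaping the colon is exactly what fixes the url:http://... problem described above.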
RE: SolrJ question
Does this mean I should have converted my objects to string before writing them to the server? I believe SolrJ takes care of that for you by calling toString(), but you would need to convert explicitly when you query (and then escape).
RE: Using Lucene's payload in Solr
It looks like things have changed a bit since this subject was last brought up here. I see that there is support in Solr/Lucene for indexing payload data (DelimitedPayloadTokenFilterFactory and DelimitedPayloadTokenFilter). Overriding the Similarity class is straightforward. So the last piece of the puzzle is to use a BoostingTermQuery when searching. Solr's LuceneQParserPlugin uses SolrQueryParser under the covers, so I think all I need to do is to write my own query parser plugin that uses a custom query parser, with the only difference being in the getFieldQuery() method where a BoostingTermQuery is used instead of a TermQuery. The BTQ is now deprecated in favor of the BoostingFunctionTermQuery, which gives some more flexibility in terms of how the spans in a single document are scored. Am I on the right track? Yes. Has anyone done something like this already? I wrote a QParserPlugin that seems to do the trick. This is minimally tested - we're not actually using it at the moment, but it should get you going.
Also, as Grant suggested, you may want to sub BFTQ for BTQ below:

package com.zoominfo.solr.analysis;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.payloads.BoostingTermQuery;
import org.apache.solr.common.params.*;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.*;

public class BoostingTermQParserPlugin extends QParserPlugin {
  public static String NAME = "zoom";

  public void init(NamedList args) {
  }

  public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    System.out.print("BoostingTermQParserPlugin::createParser\n");
    return new BoostingTermQParser(qstr, localParams, params, req);
  }
}

class BoostingTermQueryParser extends QueryParser {
  public BoostingTermQueryParser(String f, Analyzer a) {
    super(f, a);
    System.out.print("BoostingTermQueryParser::BoostingTermQueryParser\n");
  }

  @Override
  protected Query newTermQuery(Term term) {
    System.out.print("BoostingTermQueryParser::newTermQuery\n");
    return new BoostingTermQuery(term);
  }
}

class BoostingTermQParser extends QParser {
  String sortStr;
  QueryParser lparser;

  public BoostingTermQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    super(qstr, localParams, params, req);
    System.out.print("BoostingTermQParser::BoostingTermQParser\n");
  }

  public Query parse() throws ParseException {
    System.out.print("BoostingTermQParser::parse\n");
    String qstr = getString();

    String defaultField = getParam(CommonParams.DF);
    if (defaultField == null) {
      defaultField = getReq().getSchema().getSolrQueryParser(null).getField();
    }
    lparser = new BoostingTermQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer());

    // these could either be checked & set here, or in the SolrQueryParser constructor
    String opParam = getParam(QueryParsing.OP);
    if (opParam != null) {
      lparser.setDefaultOperator("AND".equals(opParam) ? QueryParser.Operator.AND : QueryParser.Operator.OR);
    } else {
      // try to get default operator from schema
      lparser.setDefaultOperator(getReq().getSchema().getSolrQueryParser(null).getDefaultOperator());
    }

    return lparser.parse(qstr);
  }

  public String[] getDefaultHighlightFields() {
    return new String[]{lparser.getField()};
  }
}
RE: Solr failing on y charakter in string?
Ok, still not working with the new field text_two: <str name="q">text:Har* text_two:Har*</str> == result 0

Schema updates:

<fieldType name="text_two" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="text_two" type="text_two" indexed="true" stored="false" multiValued="true"/>
<copyField source="text" dest="text_two"/>

I'm pretty sure the query string needs to be lower-case, since a wildcard query is not analyzed. I think what Avlesh was suggesting was more like this: <str name="q">text:Har text_two:har*</str> So the original field would be for a regular query containing whatever the user entered and would undergo the usual analysis for searching, and the secondary field would be used to construct a wildcard query which would strictly serve the begins-with case. -Ken
RE: Boosting ('bq') on multi-valued fields
Hey Ken, Thanks for your reply. When I wrote '5|6' I meant that this is a multiValued field with two values '5' and '6', rather than the literal string '5|6' (and any Tokenizer). Does your reply still hold? That is, are multiValued fields dependent on the notion of tokenization to such a degree that I can't use the str type with them meaningfully? If so, it seems weird to me that I should be able to define a str multiValued field to begin with... I'm pretty sure you can use multiValued string fields in the way you are describing. If you just do a query without the boost, do documents with multiple values come back? That would at least tell you whether the problem was matching on the term itself or something to do with your use of boosts. -Ken
RE: Range Query question
The problem is that the indexed form of this XML is flattened, so the car entity has 2 garage names, 2 min values and 2 max values, but the grouping between the garage name and its min and max values is lost. The danger is that we end up doing a comparison of the min-of-the-mins and the max-of-the-maxes, which tells us that a car is available in the price range. This may not be true if garage1 has all cars below our search range and garage2 has all cars above our search range, e.g. if our search range is 5000-6000 then we should get no match. You could index each garage-car pairing as a separate document, embedding all the necessary information you need for searching, e.g.:

<garage_car>
  <car_manufacturer>Ford</car_manufacturer>
  <car_model>Ka</car_model>
  <garage_name>garage1</garage_name>
  <min>2000</min>
  <max>4000</max>
</garage_car>
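Concretely, the update message for such per-pairing documents might look like this (field names illustrative). A search-range query such as q=car_model:Ka AND min:[* TO 6000] AND max:[5000 TO *] then only matches pairings whose own range overlaps 5000-6000, so neither garage below matches:

```xml
<add>
  <doc>
    <field name="car_manufacturer">Ford</field>
    <field name="car_model">Ka</field>
    <field name="garage_name">garage1</field>
    <field name="min">2000</field>
    <field name="max">4000</field>
  </doc>
  <doc>
    <field name="car_manufacturer">Ford</field>
    <field name="car_model">Ka</field>
    <field name="garage_name">garage2</field>
    <field name="min">7000</field>
    <field name="max">9000</field>
  </doc>
</add>
```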
RE: Boosting ('bq') on multi-valued fields
Hey, I have a field defined as such: <field name="site_id" type="string" indexed="true" stored="false" multiValued="true"/> with the string type defined as: <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> When I try using some query-time boost parameters via bq on values of this field, it seems to behave strangely in the case of documents actually having multiple values: if I do a boost for a particular value (site_id:5^1.1), it seems like all the cases where this field is actually populated with multiple ones (i.e. a document with field value 5|6) do not get boosted at all. I verified this using debugQuery with explainOther=doc_id:document_with_multiple_values. Is this a known issue/bug? Any workarounds? (I'm using a nightly solr build from a few months back...) There is no tokenization on 'string' fields, so a query for 5 does not match a doc with a value of 5|6 for this field. You could try using field type 'text' for this and see what you get. You may need to customize it to use the StandardAnalyzer or WordDelimiterFilterFactory to get the right behavior. Using the analysis tool in the solr admin UI to experiment will probably be helpful. -Ken
RE: multi-word synonyms with multiple matches
You haven't given us the full details on how you are using the SynonymFilterFactory (expand true or false?) but in general: yes, the SynonymFilter finds the longest match it can. Sorry - doing expansion at index time: <filter class="solr.SynonymFilterFactory" synonyms="title_synonyms.txt" ignoreCase="true" expand="true"/> If every svp is also a vp, then being explicit in your synonyms (when doing index-time expansion) should work:

vp,vice president
svp,senior vice president=>vp,svp,senior vice president

That worked - thanks!
RE: nested dismax queries
Filter queries with arbitrary text values may swamp the cache in 1.3. Are you implying this won't happen in 1.4? Can you point me to the feature that would mitigate this? Otherwise, the combinations aren't infinite. Keep the filters separate in order to limit their number. Specify two simple filters instead of one composite filter, fq=x:bla and fq=y:blub instead of fq=x:bla AND y:blub. See: filterCache/@size, queryResultCache/@size, documentCache/@size http://markmail.org/thread/tb6aanicpt43okcm Michael Ludwig That's what I was thinking would make the most sense, assuming the intersection of the cached bitmaps is efficient enough. Thanks for the reply. -Ken
RE: nested dismax queries
Filter queries with arbitrary text values may swamp the cache in 1.3. Are you implying this won't happen in 1.4? I intended to say just this, but I was on the wrong track. Can you point me to the feature that would mitigate this? What I was thinking of is the following: [#SOLR-475] multi-valued faceting via un-inverted field https://issues.apache.org/jira/browse/SOLR-475 But as you can see, this refers to faceting on multi-valued fields, not to filter queries with arbitrary text. I was off on a tangent. Sorry. To get back to your initial mail, I tend to think that drop-down boxes (the values of which you control) are a nice match for the filter query, whereas user-entered text is more likely to be a candidate for the main query. Michael Ludwig I agree, which brings me back to the issue of combining dismax with standard queries. It looks like we may need to create a custom query parser to get optimal performance. Thanks again.
RE: Question about index sizes.
That's a great question. And the answer is, of course, it depends. Mostly on the size of the documents you are indexing. 50 million rows from a database table with a handful of columns is very different from 50 million web pages, pdf documents, books, etc. We currently have about 50 million documents split across 2 servers with reasonable performance - sub-second response time in most cases. The total size of the 2 indices is about 300G. I'd say most of the size is from stored fields, though we index just about everything. This is on 64-bit ubuntu boxes with 32G of memory. We haven't pushed this into production yet, but initial load-testing results look promising. Hope this helps! -Original Message- From: Jim Adams [mailto:jasolru...@gmail.com] Sent: Tuesday, June 23, 2009 1:24 PM To: solr-user@lucene.apache.org Subject: Question about index sizes. Can anyone give me a rule of thumb for knowing when you need to go to multicore or shards? How many records can be in an index before it breaks down? Does it break down? Is it 10 million? 20 million? 50 million? Thanks, Jim
multi-word synonyms with multiple matches
We have a field with index-time synonyms called title. Among the entries in the synonyms file are:

vp,vice president
svp,senior vice president

However, a search for vp does not return results where the title is senior vice president. It appears that the term vp is not indexed when there is a longer string that matches a different synonym. Is this by design, and is there any way to make solr index all synonyms that match a term, even if it is contained in a longer synonym? Thanks! -Ken
nested dismax queries
The recent discussion of filter queries has got me thinking about other ways to improve performance of our app. We have an index with a lot of fields and we support both single-search-box style queries using DisMax and fielded search using the standard query handler. We also support using both strategies in the same search. For example, a user might enter Alabama Biotechnology in the main search box, triggering a dismax request which returns lots of different types of results. They may then want to refine their search by selecting a specific industry from a drop-down box. We handle this by adding a filter query (fq=) to the original query. We have dozens of additional fields like this - some with a finite set of discrete values, some with arbitrary text values. The combinations are infinite, and I'm worried we will overwhelm the filterCache by supporting all of these cases as filter queries. I'm investigating nested queries as an alternative way to support this type of hybrid search. It appears that this only works when the top-level request query is a standard lucene-style query and the nested query is a dismax, and not the other way around - correct me if I am wrong here. It also appears that what is specified in the {!xxx} as the nested query type must be an actual query type and not the name of a request handler defined in solrconfig.xml. Thus it would seem that the nested query string must supply all of the default parameters for a dismax request. Is this correct? Is there another approach that I am missing? I suppose I could create a new query parser class that would supply the defaults, but that seems like overkill. Any comments are welcome, I just want to know that I am not completely off track and there isn't some really simple way to achieve this that I have overlooked. Thanks all! -Ken
RE: Urgent | Query Issue with Dismax | Please help
?q=facetFormat_product_s:Pfqs ePub eBook Sfqsqt=dismaxrequest - does not return results, although field facetFormat_product_s is defined in the dismaxrequest handler of solrconfig.xml When you use the dismax handler, you don't need to specify the field in the query string. It's meant to be used as a natural language parser with minimal special syntax.
RE: fq vs. q
-Original Message- From: Fergus McMenemie [mailto:fer...@twig.me.uk] Sent: Friday, June 12, 2009 3:41 PM To: solr-user@lucene.apache.org Subject: Re: fq vs. q On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com wrote: I've summarized what I've learnt about filter queries on this page: http://wiki.apache.org/solr/FilterQueryGuidance Wow! This is great! Thanks for taking the time to write this up Michael. I've added a section on analysis, scoring and faceting aspects. +1 definitely a great article I ran into this very issue recently as we are using a freshness filter for our data that can be 6/12/18 months etc. I discovered that even though we were only indexing with day-level granularity, we were specifying the query by computing a date down to the second and thus virtually every filter was unique. It's amazing how something this simple could bring solr to its knees on a large data set. By simply changing the filter to date:[NOW-18MONTHS TO NOW] or equivalent, the problem vanishes. It does bring up an interesting question though - how is NOW treated with respect to the cache key? Does solr translate it to a date first? If so, how does it determine the granularity? If not, is there any mechanism to flush the cache when the corresponding result set changes? -Ken
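On the NOW question: date math is evaluated when the query is parsed, and the filterCache is keyed on the parsed query, so an unrounded NOW resolves to millisecond precision and makes virtually every filter unique. Rounding NOW inside the filter keeps the key stable for a whole day; a sketch (field name illustrative):

```text
fq=date:[NOW/DAY-18MONTHS TO NOW/DAY]
```

Both endpoints resolve to midnight, so every query issued on the same day shares one cache entry.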
RE: Filtering query terms
When I try testing the filter solr.LowerCaseFilterFactory I get different results calling the following urls: 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on In this case, the WordDelimiterFilterFactory is kicking in on your second search, so PaPa is split into Pa and Pa. You can double-check this by using the analysis tool in the admin UI - http://localhost:8983/solr/admin/analysis.jsp Besides, when trying to test the solr.ISOLatin1AccentFilterFactory I get different results calling the following urls: 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on Not sure what is happening here, but again I would check it with the analysis tool
RE: Incorrect sort with with function query in query parameters
A unit test would be ideal, but even if you can just provide a list of steps (i.e.: using this solrconfig+schema, index these docs, then update this one doc, then execute this search) it can help people track things down. Please open a bug and attach as much detail as you can there. -Hoss Was a bug ever opened on this? I am seeing similar behavior (though in my case it's the debug scores that look wrong). -Ken
RE: facet results in order of rank
Hello Solrites (or Solrorians) I prefer Solrdier :) Is it possible to get the average ranking score for a set of docs that would be returned for a given facet value? If not in SOLR, what about Lucene? How hard to implement? I have years of Java experience, but no Lucene coding experience. Would be happy to implement if someone could guide me. thanks Gene I don't know much about the implementation, but it seems to me it should be possible to sum up the scores as the matching facet terms are gathered and counted. According to the docs there are 2 algorithms that do this - one enumerates all the unique values of the facet field and does an intersection with the query, and the other scans the result set and sums up the unique values in the facet field for each doc. I would start by looking at the source for the FacetComponent (org.apache.solr.handler.component) and SimpleFacets (org.apache.solr.request) classes. Sorry I can't be of more help - it seems like an interesting challenge! Onward... -Ken
RE: Sorting dates with reduced precision
Yes, but dates are fairly specific, say 06:45 Nov. 2, 2009. What if I want to say Sort so that within entries for Nov. 2, you sort by relevance for example? Append /DAY to the date value you index, for example 1995-12-31T23:59:59Z/DAY will yield 1995-12-31, so that all documents with the same date will then be sorted by relevance or whatever you specify as the next criteria in the sort parameter. Thanks, this happens at indexing time? Yes
RE: storing xml - how to highlight hits in response?
Hi, I'm storing some raw xml in solr (stored and non-tokenized). I'd like to highlight hits in the response, obviously this is problematic as the highlighting elements are also xml. So if I match an attribute value or tag name, the xml response is messed up. Is there a way to highlight only text, that is not part of an xml element? As in, only the text content? You could create a custom Analyzer or Tokenizer that strips everything but the text content. -Ken
RE: modify SOLR scoring
I believe you can use a function query to do this: http://wiki.apache.org/solr/FunctionQuery if you embed the following in your query, you should get a boost for more recent date values: _val_:ord(dateField) Where dateField is the field name of the date you want to use. -Original Message- From: Bertrand DUMAS-PILHOU [mailto:bdum...@eurocortex.fr] Sent: Thursday, April 23, 2009 3:44 PM To: solr-user@lucene.apache.org Subject: modify SOLR scoring Hi everybody, I'm using SOLR with a schema (for example) like this: parutiondate, date, indexed, not stored fulltext, stemmed, indexed, not stored I know it's possible to order by a field or more, but I want to order by score and modify the score formula. I want to keep the SOLR score but add a new parameter in the formula to boost the score of the most recent document. What is the best way to do this? Thanks. Excuse my english. -- View this message in context: http://www.nabble.com/modify-SOLR- scoring-tp23198326p23198326.html Sent from the Solr - User mailing list archive at Nabble.com.
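Putting it together against the quoted schema, the request might look something like this (the search term is illustrative):

```text
q=fulltext:foo _val_:"ord(parutiondate)"
```

The _val_ clause contributes the ordinal of the date field to each document's score, so more recent documents get a larger boost while the normal relevance score is preserved.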
RE: storing xml - how to highlight hits in response?
Yeah great idea, thanks. Does anyone know if there is code out there that will do this sort of thing? Perhaps a much simpler option would be to use this: http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternReplaceFilterFactory.html with a regex of <[^>]*> or something like that - I'm no regex expert. Of course it could get tricky to handle escaped characters and the like, but it may be a good enough poor man's solution. -Ken
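As a sanity check on that pattern: a regex along the lines of <[^>]*> deletes anything between angle brackets, leaving only text content. A tiny standalone sketch of the substitution (class name mine, outside Solr):

```java
public class TagStrip {
    // Remove anything that looks like an XML/HTML tag, leaving only the text content.
    // Note: a regex like this ignores CDATA sections, comments and escaped characters --
    // good enough as the poor man's solution suggested above, not a real XML parser.
    public static String strip(String xml) {
        return xml.replaceAll("<[^>]*>", "");
    }
}
```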
RE: Highlight question
Add the following parameters to the url: hl=true&hl.fl=xhtml http://wiki.apache.org/solr/HighlightingParameters -Original Message- From: Bertrand DUMAS-PILHOU [mailto:bdum...@eurocortex.fr] Sent: Wednesday, April 22, 2009 4:43 PM To: solr-user@lucene.apache.org Subject: Highlight question Hi everybody, I have a schema like this in SOLR: title, type:string, indexed not stored body, type:string, stemmed, indexed not stored xhtml, type:string, not indexed, stored When a user makes a search on field title, body or both, I want to highlight the matched string in the xhtml field only. How can I do this? Thanks and sorry for my english. -- View this message in context: http://www.nabble.com/Highlight-question- tp23175851p23175851.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Sort by distance from location?
I've never used them personally, but I think a function query would suit you here. Function queries allow you to define a custom function as a component of the score of a result document. Define a distance function based on the user's current location and that of the search result, such that the shorter the distance, the higher the function output. This will boost results inversely proportional to the distance from the user. -Ken -Original Message- From: Development Team [mailto:dev.and...@gmail.com] Sent: Tuesday, April 14, 2009 5:32 PM To: solr-user@lucene.apache.org Subject: Sort by distance from location? Hi everybody, My index has latitude/longitude values for locations. I am required to do a search based on a set of criteria, and order the results based on how far the lat/long location is to the current user's location. Currently we are emulating such a search by adding criteria of ever-widening bounding boxes, and the more of those boxes match the document, the higher the score and thus the closer ones appear at the start of the results. The query looks something like this (newlines between each search term): +criteraOne:1 +criteriaTwo:true +latitude:[-90.0 TO 90.0] +longitude:[-180.0 TO 180.0] (latitude:[40.52 TO 40.81] longitude:[-74.17 TO -73.79]) (latitude:[40.30 TO 41.02] longitude:[-74.45 TO -73.51]) (latitude:[39.94 TO 41.38] longitude:[-74.93 TO -73.03]) [[...etc...about 10 times...]] Naturally this is quite slow (query is approximately 6x slower than normal), and... I can't help but feel that there's a more elegant way of sorting by distance. Does anybody know how to do this or have any suggestions? Sincerely, Daryl.
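If you end up computing the distance yourself (in a custom function query or in post-processing), the haversine formula is the usual choice for great-circle distance between two lat/long points. A standalone sketch, with class and method names of my own invention:

```java
public class GeoDistance {
    private static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/long points (degrees) via the haversine formula.
    public static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp-free form: atan2 handles the edge cases near a == 0 and a == 1.
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

Boosting by the inverse of this value gives the "closer ranks higher" behavior described above.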
RE: sub skus with colour and size
Every product we have comes in colour and size combinations, I need to do a faceted search on these that allows for colour and size and various other fields. A single product may have multiple colours and multiple sizes. For example a style might be available in black size 12, but also have other sizes in red. If someone searches for red and size 12, it should not bring the product as that combination is not possible. I'm no expert, but one way to do this would be to have a multi-valued field with all the possible combinations, e.g. if you have the following in your data:

<color>
  <value>red</value>
  <sizes>10,12</sizes>
</color>
<color>
  <value>black</value>
  <sizes>8,10</sizes>
</color>

you could create a solr doc with a multi-valued color field:

<color>color_red size_10 size_12</color>
<color>color_black size_8 size_10</color>

Then if you set the positionIncrementGap in your schema to a sufficiently high value (say 1000), you can use the following query to search for a color size combination:

color:"color_red size_10"~1000

which executes a phrase search with a slop factor of 1000, ensuring it won't cross the field boundary. Hope this helps! -Ken
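For reference, the schema side of this trick is just the positionIncrementGap attribute on the field type, mirroring the document's other config examples (names and analyzer choice illustrative):

```xml
<fieldType name="color_text" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="color" type="color_text" indexed="true" stored="false" multiValued="true"/>
```

The gap inserts a large position jump between successive values, which is what keeps a sloppy phrase from matching across two different color entries.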
RE: using BoostingTermQuery
I'm no QueryParser expert, but I would probably start w/ the default query parser in Solr (LuceneQParser), and then progress a bit to the DisMax one. I'd ask specific questions based on what you see there. If you get far enough along, you may consider asking for help on the java-user list as well. Thanks - I think I've got it working now. I ended up subclassing QueryParser and overriding newTermQuery() to create a BoostingTermQuery instead of a plain ol' TermQuery. Seems to work. Yup - I'm pretty sure I have that side figured out. My input contains terms marked up with a score (i.e. 'software?7') I just needed to create a TokenFilter that parses out the suffix and sets the Payload on the token. Cool. Patch? Not sure how valuable it is - all I did was create a new subclass of TokenFilter. Here's the code fwiw:

public class ScorePayloadFilter extends TokenFilter {

  protected ScorePayloadFilter(TokenStream input) {
    super(input);
  }

  public Token next(Token in) throws IOException {
    Token nextToken = input.next(in);
    if (nextToken != null) {
      char[] buf = nextToken.termBuffer();
      int termLen = nextToken.termLength();
      // find the last '?' in the term; the score suffix follows it
      int posn = -1;
      for (int i = 0; i < termLen; i++)
        if (buf[i] == '?')
          posn = i;
      if (posn > 0) {
        int scorepos = posn + 1;
        String score = new String(buf, scorepos, termLen - scorepos);
        Integer scoreInt = new Integer(score);
        Payload payload = new Payload();
        byte[] payloadBytes = new byte[4];
        payload.setData(PayloadHelper.encodeInt(scoreInt, payloadBytes, 0));
        nextToken.setPayload(payload);
        nextToken.setTermLength(posn); // strip the '?score' suffix from the indexed term
      }
    }
    return nextToken;
  }
}

Thanks again for the help! -Ken
RE: using BoostingTermQuery
At this point, it's roll your own. That's where I'm getting bogged down - I'm confused by the various queryparser classes in lucene and solr and I'm not sure exactly what I need to override. Do you know of an example of something similar to what I'm doing that I could use as a reference? I'd love to see the BTQ in Solr (and Spans!), but I wonder if it makes sense w/o better indexing side support. I assume you are rolling your own Analyzer, right? Yup - I'm pretty sure I have that side figured out. My input contains terms marked up with a score (ie 'software?7') I just needed to create a TokenFilter that parses out the suffix and sets the Payload on the token. Spans and payloads are this huge untapped area for better search! Completely agree - we do a lot with keyword searching, and we use this type of thing in our existing search implementation. Thanks for the quick response! On Sep 23, 2008, at 5:12 PM, Ensdorf Ken wrote: Hi- I'm new to Solr, and I'm trying to figure out the best way to configure it to use BoostingTermQuery in the scoring mechanism. Do I need to create a custom query parser? All I want is the default parser behavior except to get the custom term boost from the Payload data. Thanks! -Ken -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ