RE: Question regarding synonym

2009-10-02 Thread Ensdorf Ken
 Hi
 i have a question regarding synonymfilter
 i have a one way mapping defined
 austin martin, astonmartin => aston martin
 
...
 
 Can anybody please explain if my observation is correct. This is a very
 critical aspect for my work.

That is correct - the synonym filter can recognize multi-token synonyms from 
consecutive tokens in a stream.



RE: Mixed field types and boolean searching

2009-09-25 Thread Ensdorf Ken
 No- there are various analyzers. StandardAnalyzer is geared toward
 searching bodies of text for interesting words -  punctuation is
 ripped out. Other analyzers are more useful for concrete text. You
 may have to work at finding one that leaves punctuation in.
 

My problem is not with the StandardAnalyzer per se, but more as to how dismax 
style queries are handled by the query parser when the different fields have 
different sets of ignored tokens or stop words.

Say you want to use the contents of a text box in your app and query a field in 
Solr.  The user enters "A and B", so you map this to "f1:A AND f1:B".  Now, if 
B is an ignored token in the f1 field for whatever reason, the query boils 
down to "f1:A".

Now imagine you want to allow the user's text to match multiple fields - as in 
any term can match any field, but all terms must match at least 1 field.  So 
now you map the user's query to (f1:A OR f2:A) AND (f1:B OR f2:B).  But if f2 
does not ignore B, the query boils down to (f1:A OR f2:A) AND (f2:B).  Now 
documents that could come back when you were only matching against the f1 field 
don't come back.  

This seems counter-intuitive - to be consistent, I would think the query should 
essentially be treated as (f1:A OR f2:A) AND (TRUE OR f2:B)  - and thus a 
term that is a stop word or ignored token for any of the fields would be 
ignored across the board.
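
To make the two behaviors concrete, here is a minimal sketch in plain Java (no Solr classes). The class name, the method names, and the idea of passing each field's ignored terms in as a map are all hypothetical, chosen only for illustration; the real query parser works on analyzed token streams, not string sets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of how per-field stop words change a multi-field query. */
final class MultiFieldQuerySketch {

    /** Existing behavior: a term is dropped only from the fields that ignore it. */
    static String perField(List<String> terms, Map<String, Set<String>> ignoredByField) {
        return build(terms, ignoredByField, false);
    }

    /** Proposed behavior: a term ignored by any field is dropped from every field. */
    static String acrossTheBoard(List<String> terms, Map<String, Set<String>> ignoredByField) {
        return build(terms, ignoredByField, true);
    }

    private static String build(List<String> terms,
                                Map<String, Set<String>> ignoredByField,
                                boolean dropEverywhere) {
        List<String> groups = new ArrayList<>();
        for (String term : terms) {
            boolean ignoredSomewhere = false;
            List<String> clauses = new ArrayList<>();
            for (Map.Entry<String, Set<String>> field : ignoredByField.entrySet()) {
                if (field.getValue().contains(term)) {
                    ignoredSomewhere = true;   // this field's analyzer discards the term
                } else {
                    clauses.add(field.getKey() + ":" + term);
                }
            }
            if (dropEverywhere && ignoredSomewhere) {
                continue;                      // treat the whole group as TRUE
            }
            if (!clauses.isEmpty()) {
                groups.add("(" + String.join(" OR ", clauses) + ")");
            }
        }
        return String.join(" AND ", groups);
    }
}
```

With f1 ignoring B and f2 ignoring nothing, perField yields the (f1:A OR f2:A) AND (f2:B) form described above, while acrossTheBoard yields just (f1:A OR f2:A).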

So I guess what I'm asking is if there is a reason for the existing behavior, 
or is it just a fact-of-life of the query parser?  Thanks!

-Ken


RE: Alphanumeric Wild Card Search Question

2009-09-24 Thread Ensdorf Ken
 Here's my question:
 I have some products that I want to allow people to search for with
 wild cards. For example, if my product is YBM354, I'd like for users to
 be able to search on YBM*, YBM3*, YBM35* and for any of these
 searches to return that product. I've found that I can search for
 YBM* and get the product, just not the other combinations.

Are you using WordDelimiterFilterFactory?  That would explain this behavior.

If so, do you need it?  For the queries you describe you don't need that kind 
of tokenization.

Also, have you played with the analysis tool on the admin page?  It is a great 
help in debugging things like this.
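
As a rough illustration of why YBM* can match while YBM3* does not, here is a plain-Java sketch. It assumes the filter splits at letter/digit boundaries and that the original unsplit token is not kept; the real WordDelimiterFilterFactory has options that change this, so treat the class and method names as hypothetical stand-ins rather than Solr's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for splitting a token at letter/digit boundaries. */
final class WordDelimiterSketch {

    /** Splits "YBM354" into ["YBM", "354"]. */
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            boolean boundary = current.length() > 0
                    && Character.isDigit(c) != Character.isDigit(current.charAt(current.length() - 1));
            if (boundary) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    /** A prefix query matches only if some indexed term starts with the prefix. */
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions the index holds the terms YBM and 354, so the prefix YBM finds a term but YBM3 and YBM35 do not. Indexing the whole token YBM354 would let all three prefixes match.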

-Ken


RE: SolrJ question

2009-08-17 Thread Ensdorf Ken
You can escape the string with

org.apache.lucene.queryParser.QueryParser.escape(String query)

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/queryParser/QueryParser.html#escape%28java.lang.String%29
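
For readers without Lucene on the classpath, the following plain-Java sketch shows the kind of escaping involved. It is not Lucene's code: the class, the method names, and the exact list of special characters are assumptions for illustration, and in a real application QueryParser.escape is the call to use.

```java
import java.net.URI;

/** Hypothetical stand-in for backslash-escaping query syntax characters. */
final class QueryEscapeSketch {

    // Characters assumed to be special in the classic Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes every special character with a backslash. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds a query string for a field named "url", as in the question. */
    static String urlQuery(URI uri) {
        return "url:" + escape(uri.toString());
    }
}
```

The colon after http is the character that caused the parse error; once escaped, the field separator in url: is the only unescaped colon left.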



 -Original Message-
 From: ptomb...@gmail.com [mailto:ptomb...@gmail.com] On Behalf Of Paul
 Tomblin
 Sent: Monday, August 17, 2009 5:12 PM
 To: solr-user@lucene.apache.org
 Subject: SolrJ question

 If I put an object into a SolrInputDocument and store it, how do I
 query for it back?  For instance, I stored a java.net.URI in a field
 called url, and I want to query for all the documents that match a
 particular URI.  The query syntax only seems to allow Strings, and if
 I just try query.setQuery("url:" + uri.toString()) I get an error
 because of the colon after http in the URI.

 I'm really new to Solr, so please let me know if I'm missing something
 basic here.

 --
 http://www.linkedin.com/in/paultomblin


RE: SolrJ question

2009-08-17 Thread Ensdorf Ken
 Does this mean I should have converted my objects to string before
 writing them to the server?


I believe SolrJ takes care of that for you by calling toString(), but you would 
need to convert explicitly when you query (and then escape).


RE: Using Lucene's payload in Solr

2009-08-13 Thread Ensdorf Ken
  It looks like things have changed a bit since this subject was last
  brought up here.  I see that there is support in Solr/Lucene for
  indexing payload data (DelimitedPayloadTokenFilterFactory and
  DelimitedPayloadTokenFilter).  Overriding the Similarity class is
  straightforward.  So the last piece of the puzzle is to use a
  BoostingTermQuery when searching.  Solr's LuceneQParserPlugin uses
  SolrQueryParser under the cover.  I think all I need to do is to write
  my own query parser plugin that uses a custom query parser, with the
  only difference being in the getFieldQuery() method where a
  BoostingTermQuery is used instead of a TermQuery.

 The BTQ is now deprecated in favor of the BoostingFunctionTermQuery,
 which gives some more flexibility in terms of how the spans in a
 single document are scored.

 
  Am I on the right track?

 Yes.

  Has anyone done something like this already?


I wrote a QParserPlugin that seems to do the trick.  This is minimally tested - 
we're not actually using it at the moment, but should get you going.  Also, as 
Grant suggested, you may want to sub BFTQ for BTQ below:

package com.zoominfo.solr.analysis;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.payloads.BoostingTermQuery;
import org.apache.solr.common.params.*;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.*;

public class BoostingTermQParserPlugin extends QParserPlugin {
  public static String NAME = "zoom";

  public void init(NamedList args) {
  }

  public QParser createParser(String qstr, SolrParams localParams,
      SolrParams params, SolrQueryRequest req) {
    System.out.print("BoostingTermQParserPlugin::createParser\n");
    return new BoostingTermQParser(qstr, localParams, params, req);
  }
}

// Lucene query parser that emits BoostingTermQuery instead of TermQuery
class BoostingTermQueryParser extends QueryParser {

  public BoostingTermQueryParser(String f, Analyzer a) {
    super(f, a);
    System.out.print("BoostingTermQueryParser::BoostingTermQueryParser\n");
  }

  @Override
  protected Query newTermQuery(Term term) {
    System.out.print("BoostingTermQueryParser::newTermQuery\n");
    return new BoostingTermQuery(term);
  }
}

class BoostingTermQParser extends QParser {
  String sortStr;
  QueryParser lparser;

  public BoostingTermQParser(String qstr, SolrParams localParams,
      SolrParams params, SolrQueryRequest req) {
    super(qstr, localParams, params, req);
    System.out.print("BoostingTermQParser::BoostingTermQParser\n");
  }

  public Query parse() throws ParseException {
    System.out.print("BoostingTermQParser::parse\n");
    String qstr = getString();

    // fall back to the schema's default search field if df is not given
    String defaultField = getParam(CommonParams.DF);
    if (defaultField == null) {
      defaultField = getReq().getSchema().getSolrQueryParser(null).getField();
    }

    lparser = new BoostingTermQueryParser(defaultField,
        getReq().getSchema().getQueryAnalyzer());

    // these could either be checked & set here, or in the SolrQueryParser
    // constructor
    String opParam = getParam(QueryParsing.OP);
    if (opParam != null) {
      lparser.setDefaultOperator("AND".equals(opParam)
          ? QueryParser.Operator.AND : QueryParser.Operator.OR);
    } else {
      // try to get default operator from schema
      lparser.setDefaultOperator(
          getReq().getSchema().getSolrQueryParser(null).getDefaultOperator());
    }

    return lparser.parse(qstr);
  }

  public String[] getDefaultHighlightFields() {
    return new String[]{lparser.getField()};
  }

}


RE: Solr failing on y charakter in string?

2009-08-03 Thread Ensdorf Ken
 Ok still not working with new field text_two:
 <str name="q">text:Har* text_two:Har*</str>
 ==> result 0

 Schema Updates:

 <fieldType name="text_two" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 <field name="text_two" type="text_two" indexed="true" stored="false"
        multiValued="true"/>

 <copyField source="text" dest="text_two"/>
 

I'm pretty sure the query string needs to be lower-case, since a wildcard query 
is not analyzed.

I think what Avlesh was suggesting was more like this:

<str name="q">text:Har text_two:har*</str>

So the original field would be for a regular query containing whatever the user 
entered and would undergo the usual analysis for searching, and the secondary 
field would be used to construct a wildcard query which would strictly serve 
the begins-with case.
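
A small client-side sketch of building that two-clause query follows. The class and method names are hypothetical, and it assumes a single-word input with no query syntax characters in it; anything else would need escaping first.

```java
import java.util.Locale;

/** Hypothetical helper that pairs an analyzed clause with a lower-cased prefix clause. */
final class PrefixQuerySketch {

    /**
     * The text clause keeps the user's input as typed, since that field is analyzed.
     * The text_two clause is lower-cased by hand because wildcard terms skip analysis.
     */
    static String build(String userInput) {
        String term = userInput.trim();
        return "text:" + term + " text_two:" + term.toLowerCase(Locale.ROOT) + "*";
    }
}
```

For the input Har this produces text:Har text_two:har*, matching the query shown above.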

-Ken


RE: Boosting ('bq') on multi-valued fields

2009-07-30 Thread Ensdorf Ken

 Hey Ken,
 Thanks for your reply.
 When I wrote '5|6' I meant that this is a multiValued field with two
 values '5' and '6', rather than the literal string '5|6' (and any
 Tokenizer). Does your reply still hold? That is, are multiValued fields
 dependent on the notion of tokenization to such a degree that I can't
 use str type with them meaningfully? If so, it seems weird to me that I
 should be able to define a str multiValued field to begin with.

I'm pretty sure you can use multiValued string fields in the way you are 
describing.  If you just do a query without the boost, do documents with 
multiple values come back?  That would at least tell you whether the problem 
is matching on the term itself or something to do with your use of boosts.

-Ken


RE: Range Query question

2009-07-30 Thread Ensdorf Ken
 The problem is that the indexed form of this XML is flattened so the
 car entity has 2 garage names, 2 min values and 2 max values, but the
 grouping between the garage name and its min and max values is lost.
 The danger is that we end up doing a comparison of the min-of-the-mins
 and the max-of-the-maxes, which tells us that a car is available in the
 price range which may not be true if garage1 has all cars below our
 search range and garage2 has all cars above our search range, e.g. if
 our search range is 5000-6000 then we should get no match.

You could index each garage-car pairing as a separate document, embedding all 
the necessary information you need for searching.

e.g.-

<garage_car>
   <car_manufacturer>Ford</car_manufacturer>
   <car_model>Ka</car_model>
   <garage_name>garage1</garage_name>
   <min>2000</min>
   <max>4000</max>
</garage_car>
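
To see why the separate documents matter, here is a plain-Java sketch comparing the flattened check with a per-pairing check. The class name and the second garage's price range (7000-9000) are made up for illustration; only the 2000-4000 range and the 5000-6000 search come from this thread.

```java
/** Hypothetical comparison of flattened vs. per-document range matching. */
final class GarageRangeSketch {

    /** Flattened index: any min <= hi and any max >= lo, regardless of garage. */
    static boolean flattenedMatch(int[][] garages, int lo, int hi) {
        int minOfMins = Integer.MAX_VALUE;
        int maxOfMaxes = Integer.MIN_VALUE;
        for (int[] g : garages) {
            minOfMins = Math.min(minOfMins, g[0]);
            maxOfMaxes = Math.max(maxOfMaxes, g[1]);
        }
        return minOfMins <= hi && maxOfMaxes >= lo;
    }

    /** One document per garage-car pairing: min and max are checked together. */
    static boolean perDocumentMatch(int[][] garages, int lo, int hi) {
        for (int[] g : garages) {
            if (g[0] <= hi && g[1] >= lo) {
                return true;
            }
        }
        return false;
    }
}
```

With garages priced {2000, 4000} and {7000, 9000} and a search range of 5000-6000, the flattened check reports a match that no single garage can satisfy, while the per-document check correctly reports none.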


RE: Boosting ('bq') on multi-valued fields

2009-07-29 Thread Ensdorf Ken
 Hey,
 I have a field defined as such:

  <field name="site_id" type="string" indexed="true" stored="false"
         multiValued="true" />

 with the string type defined as:

  <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
             omitNorms="true"/>

 When I try using some query-time boost parameters using the bq on
 values of this field it seems to behave strangely in case of documents
 actually having multiple values: if I do a boost for a particular value
 ( site_id:5^1.1 ) it seems like all the cases where this field is
 actually populated with multiple ones ( i.e. a document with field
 value 5|6 ) do not get boosted at all. I verified this using
 debugQuery & explainOther=doc_id:document_with_multiple_values.
 Is this a known issue/bug? Any workarounds? (I'm using a nightly solr
 build from a few months back.)

There is no tokenization on 'string' fields, so a query for "5" does not match 
a doc with a value of "5|6" for this field.  You could try using field type 
'text' for this and see what you get.  You may need to customize it to use the 
StandardAnalyzer or WordDelimiterFilterFactory to get the right behavior.  
Using the analysis tool in the solr admin UI to experiment will probably be 
helpful.

-Ken




RE: multi-word synonyms with multiple matches

2009-07-20 Thread Ensdorf Ken
 You haven't given us the full details on how you are using the
 SynonymFilterFactory (expand true or false?) but in general: yes the
 SynonymFilter finds the longest match it can.

Sorry - doing expansion at index time:
<filter class="solr.SynonymFilterFactory" synonyms="title_synonyms.txt" 
ignoreCase="true" expand="true"/>


 if every svp is also a vp, then being explicit in your synonyms (when
 doing index time expansion) should work...

 vp,vice president
 svp,senior vice president=>vp,svp,senior vice president

That worked - thanks!



RE: nested dismax queries

2009-06-29 Thread Ensdorf Ken
 Filter queries with arbitrary text values may swamp the cache in 1.3.

Are you implying this won't happen in 1.4?  Can you point me to the feature 
that would mitigate this?


 Otherwise, the combinations aren't infinite. Keep the filters separate
 in order to limit their number. Specify two simple filters instead of
 one composite filter, fq=x:bla and fq=y:blub instead of fq=x:bla
 AND y:blub. See:

 filterCache/@size, queryResultCache/@size, documentCache/@size
 http://markmail.org/thread/tb6aanicpt43okcm

 Michael Ludwig

That's what I was thinking would make the most sense, assuming the intersection 
of the cached bitmaps is efficient enough.  Thanks for the reply.

-Ken


RE: nested dismax queries

2009-06-29 Thread Ensdorf Ken


  Filter queries with arbitrary text values may swamp the cache in
 1.3.
 
  Are you implying this won't happen in 1.4?

 I intended to say just this, but I was on the wrong track.

  Can you point me to the feature that would mitigate this?

 What I was thinking of is the following:

 [#SOLR-475] multi-valued faceting via un-inverted field
 https://issues.apache.org/jira/browse/SOLR-475

 But as you can see, this refers to faceting on multi-valued fields, not
 to filter queries with arbitrary text. I was off on a tangent. Sorry.

 To get back to your initial mail, I tend to think that drop-down boxes
 (the values of which you control) are a nice match for the filter
 query,
 whereas user-entered text is more likely to be a candidate for the main
 query.

 Michael Ludwig

I agree, which brings me back to the issue of combining dismax with standard 
queries.  It looks like we may need to create a custom query parser to get 
optimal performance.  Thanks again.




RE: Question about index sizes.

2009-06-23 Thread Ensdorf Ken
That's a great question.  And the answer is, of course, it depends.  Mostly on 
the size of the documents you are indexing.  50 million rows from a database 
table with a handful of columns is very different from 50 million web pages,  
pdf documents, books, etc.

We currently have about 50 million documents split across 2 servers with 
reasonable performance - sub-second response time in most cases.  The total 
size of the 2 indices is about 300G.  I'd say most of the size is from stored 
fields, though we index just about everything.  This is on 64-bit ubuntu boxes 
with 32G of memory.  We haven't pushed this into production yet, but initial 
load-testing results look promising.

Hope this helps!

 -Original Message-
 From: Jim Adams [mailto:jasolru...@gmail.com]
 Sent: Tuesday, June 23, 2009 1:24 PM
 To: solr-user@lucene.apache.org
 Subject: Question about index sizes.

 Can anyone give me a rule of thumb for knowing when you need to go to
 multicore or shards?  How many records can be in an index before it
 breaks
 down?  Does it break down?  Is it 10 million? 20 million?  50 million?

 Thanks, Jim


multi-word synonyms with multiple matches

2009-06-22 Thread Ensdorf Ken
We have a field with index-time synonyms called title.  Among the entries in 
the synonyms file are

vp,vice president
svp,senior vice president

However, a search for vp does not return results where the title is senior 
vice president.  It appears that the term vp is not indexed when there is a 
longer string that matches a different synonym.  Is this by design, and is 
there any way to make solr index all synonyms that match a term, even if it is 
contained in a longer synonym?  Thanks!

-Ken



nested dismax queries

2009-06-19 Thread Ensdorf Ken
The recent discussion of filter queries has got me thinking about other ways to 
improve performance of our app.  We have an index with a lot of fields and we 
support both single-search-box style queries using DisMax and fielded search 
using the standard query handler.  We also support using both strategies in the 
same search.

For example, a user might enter "Alabama Biotechnology" in the main search box, 
triggering a dismax request which returns lots of different types of results.  
They may then want to refine their search by selecting a specific industry from 
a drop-down box.  We handle this by adding a filterquery (fq=) to the original 
query.  We have dozens of additional fields like this - some with a finite set 
of discrete values, some with arbitrary text values.  The combinations are 
infinite, and I'm worried we will overwhelm the filterCache by supporting all 
of these cases as filter queries.

I'm investigating nested queries as an alternative way to support this type of 
hybrid-search.  It appears that this only works when the top-level request 
query is a standard lucene-style query and the nested query is a dismax, and 
not the other way around - correct me if I am wrong here.  It also appears 
that what is specified in the {!xxx} as the nested query type must be an actual 
query type and not the name of a request handler defined in solrconfig.xml.  
Thus it would seem that the nested query string must supply all of the default 
parameters for a dismax request.  Is this correct?  Is there another approach 
that I am missing?  I suppose I could create a new query parser class that 
would supply the defaults, but that seems like overkill.

Any comments are welcome, I just want to know that I am not completely off 
track and there isn't some really simple way to achieve this that I have 
overlooked.  Thanks all!

-Ken


RE: Urgent | Query Issue with Dismax | Please help

2009-06-19 Thread Ensdorf Ken

 ?q=facetFormat_product_s:Pfqs ePub eBook Sfqs&qt=dismaxrequest - does
 not return results, although field facetFormat_product_s is defined in
 the dismaxrequest handler of solrconfig.xml

When you use the dismax handler, you don't need to specify the field in the 
query string.  It's meant to be used as a natural language parser with minimal 
special syntax.


RE: fq vs. q

2009-06-12 Thread Ensdorf Ken


 -Original Message-
 From: Fergus McMenemie [mailto:fer...@twig.me.uk]
 Sent: Friday, June 12, 2009 3:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: fq vs. q

 On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com
 wrote:
 
  I've summarized what I've learnt about filter queries on this page:
 
  http://wiki.apache.org/solr/FilterQueryGuidance
 
 
 Wow! This is great! Thanks for taking the time to write this up
 Michael.
 
 I've added a section on analysis, scoring and faceting aspects.
 

+1 definitely a great article

I ran into this very issue recently as we are using a "freshness" filter for 
our data that can be 6/12/18 months etc.  I discovered that even though we 
were only indexing with day-level granularity, we were specifying the query by 
computing a date down to the second, and thus virtually every filter was 
unique.  It's amazing how something this simple could bring solr to its knees 
on a large data set.  By simply changing the filter to 
date:[NOW-18MONTHS TO NOW] or equivalent, the problem vanishes.

It does bring up an interesting question though - how is NOW treated wrt 
the cache key?  Does solr translate it to a date first?  If so, how does it 
determine the granularity?  If not, is there any mechanism to flush the cache 
when the corresponding result set changes?
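
One way to keep such a filter string stable is to round the timestamp on the client before building it, so every request made on the same day produces an identical fq value. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, it uses java.time (which postdates this thread), and it rounds to midnight UTC. Solr's date math also accepts rounding such as NOW/DAY, which avoids computing the date on the client at all.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper that builds a day-granular freshness filter. */
final class FreshnessFilterSketch {

    private static final DateTimeFormatter SOLR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    /** Returns field:[start TO end] with both ends truncated to midnight UTC. */
    static String filter(String field, Instant now, int months) {
        ZonedDateTime end = now.atZone(ZoneOffset.UTC).truncatedTo(ChronoUnit.DAYS);
        ZonedDateTime start = end.minusMonths(months);
        return field + ":[" + SOLR_DATE.format(start) + " TO " + SOLR_DATE.format(end) + "]";
    }
}
```

Two requests at different times on 2009-06-12 both yield date:[2007-12-12T00:00:00Z TO 2009-06-12T00:00:00Z], so they can share one filterCache entry.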

-Ken


RE: Filtering query terms

2009-05-22 Thread Ensdorf Ken
 When I try testing the filter solr.LowerCaseFilterFactory I get
 different results calling the following urls:

  1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
  2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on

In this case, the WordDelimiterFilterFactory is kicking in on your second 
search, so "PaPa" is split into "Pa" and "Pa".  You can double-check this by 
using the analysis tool in the admin UI - 
http://localhost:8983/solr/admin/analysis.jsp


 Besides, when trying to test the solr.ISOLatin1AccentFilterFactory I
 get different results calling the following urls:

  1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
  2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on

Not sure what is happening here, but again I would check it with the analysis 
tool.


RE: Incorrect sort with with function query in query parameters

2009-05-18 Thread Ensdorf Ken

 A Unit test would be ideal, but even if you can just provide a list of
 steps (ie: using this solrconfig+schema, index these docs, then update
 this one doc, then execute this search) it can help people track things
 down.

 Please open a bug and attach as much detail as you can there.


 -Hoss

Was a bug ever opened on this?  I am seeing similar behavior (though in my case 
it's the debug scores that look wrong).

-Ken



RE: facet results in order of rank

2009-04-30 Thread Ensdorf Ken
 Hello Solrites (or Solrorians)

I prefer Solrdier :)


 Is it possible to get the average ranking score for a set of docs that
 would be returned for a given facet value.

 If not in SOLR, what about Lucene?

 How hard to implement?

 I have years of Java experience, but no Lucene coding experience.

 Would be happy to implement if someone could guide me.

 thanks
 Gene


I don't know much about the implementation, but it seems to me it should be 
possible to sum up the scores as the matching facet terms are gathered and 
counted.  According to the docs there are 2 algorithms that do this - one 
enumerates all the unique values of the facet field and does an intersection 
with the query, and the other scans the result set and sums up the unique 
values in the facet field for each doc.  I would start by looking at the source 
for the FacetComponent (org.apache.solr.handler.component) and SimpleFacets 
(org.apache.solr.request) classes.

Sorry I can't be of more help - it seems like an interesting challenge!

Onward...
-Ken


RE: Sorting dates with reduced precision

2009-04-23 Thread Ensdorf Ken
  Yes, but dates are fairly specific, say 06:45 Nov. 2, 2009. What if
  I want to say "Sort so that within entries for Nov. 2, you sort by
  relevance", for example?

  Append /DAY to the date value you index, for example

  1995-12-31T23:59:59Z/DAY will yield 1995-12-31

  So that all documents with the same date will then be sorted by
  relevance or whatever you specify as the next criteria in the sort
  parameter.

 Thanks, this happens at indexing time?

Yes


RE: storing xml - how to highlight hits in response?

2009-04-23 Thread Ensdorf Ken
 Hi,

 I'm storing some raw xml in solr (stored and non-tokenized). I'd like
 to highlight hits in the response; obviously this is problematic as the
 highlighting elements are also xml. So if I match an attribute value or
 tag name, the xml response is messed up. Is there a way to highlight
 only text that is not part of an xml element? As in, only the text
 content?

You could create a custom Analyzer or Tokenizer that strips everything but the 
text content.

-Ken



RE: modify SOLR scoring

2009-04-23 Thread Ensdorf Ken
I believe you can use a function query to do this:

http://wiki.apache.org/solr/FunctionQuery

if you embed the following in your query, you should get a boost for more 
recent date values:

_val_:"ord(dateField)"

Where dateField is the field name of the date you want to use.

 -Original Message-
 From: Bertrand DUMAS-PILHOU [mailto:bdum...@eurocortex.fr]
 Sent: Thursday, April 23, 2009 3:44 PM
 To: solr-user@lucene.apache.org
 Subject: modify SOLR scoring


 Hi everybody,

 I'm using SOLR with a schema (for example) like this:
 parutiondate, date, indexed, not stored
 fulltext, stemmed, indexed, not stored

 I know it's possible to order by a field or more, but I want to order
 by score and modify the score formula.
 I want to keep the SOLR score but add a new parameter in the formula to
 boost the score of the most recent document.

 What is the best way to do this ?

 Thanks.

 Excuse for my english.


 --
 View this message in context: http://www.nabble.com/modify-SOLR-
 scoring-tp23198326p23198326.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: storing xml - how to highlight hits in response?

2009-04-23 Thread Ensdorf Ken
 Yeah great idea, thanks. Does anyone know if there is code out there
 that
 will do this sort of thing?


Perhaps a much simpler option would be to use this:

http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternReplaceFilterFactory.html

with a regex of <[^>]*> or something like that - I'm no regex expert.  Of 
course it could get tricky to handle escaped characters and the like, but it 
may be a good enough poor man's solution.

-Ken



RE: Sorting dates with reduced precision

2009-04-22 Thread Ensdorf Ken
 Yes, but dates are fairly specific, say 06:45 Nov. 2, 2009. What if I
 want to say "Sort so that within entries for Nov. 2, you sort by
 relevance", for example?


Append /DAY to the date value you index, for example

1995-12-31T23:59:59Z/DAY will yield 1995-12-31

So that all documents with the same date will then be sorted by relevance or 
whatever you specify as the next criteria in the sort parameter.





RE: Highlight question

2009-04-22 Thread Ensdorf Ken
Add the following parameters to the url:

hl=true&hl.fl=xhtml

http://wiki.apache.org/solr/HighlightingParameters



 -Original Message-
 From: Bertrand DUMAS-PILHOU [mailto:bdum...@eurocortex.fr]
 Sent: Wednesday, April 22, 2009 4:43 PM
 To: solr-user@lucene.apache.org
 Subject: Highlight question


 Hi everybody,

 I have an schema seems like this in SOLR:
 title, type:string , indexed not stored
 body, type:string, stemmed, indexed not stored
 xhtml, type:string, not indexed, stored

 When user make an search on field title, body or both, I want to
 highlight
 the match string in the xhtml field only.

 How I can do this ?

 Thanks and sorry for my english.
 --
 View this message in context: http://www.nabble.com/Highlight-question-
 tp23175851p23175851.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Sort by distance from location?

2009-04-21 Thread Ensdorf Ken
I've never used them personally, but I think a function query would suit you 
here.  Function queries allow you to define a custom function as a component of 
the score of a result document.  Define a distance function based on the user's 
current location and that of the search result, such that the shorter the 
distance, the higher the function output.  This will boost results in inverse 
proportion to the distance from the user.
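
To show the shape of such a function outside Solr, here is a plain-Java sketch. It is not function query syntax and not something Solr ships; the class name, the haversine great-circle formula, the 6371 km earth radius, and the 1/(1+d) boost are all assumptions picked for illustration.

```java
/** Hypothetical distance-based boost: nearer results get a higher value. */
final class DistanceBoostSketch {

    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometers between two lat/long points (haversine). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Maps distance to a boost in (0, 1]; the +1 avoids dividing by zero. */
    static double boost(double distanceKm) {
        return 1.0 / (1.0 + distanceKm);
    }
}
```

A single function like this replaces the stack of ever-widening bounding boxes with one continuous score component.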

-Ken

 -Original Message-
 From: Development Team [mailto:dev.and...@gmail.com]
 Sent: Tuesday, April 14, 2009 5:32 PM
 To: solr-user@lucene.apache.org
 Subject: Sort by distance from location?

 Hi everybody,
  My index has latitude/longitude values for locations. I am
 required to
 do a search based on a set of criteria, and order the results based on
 how
 far the lat/long location is to the current user's location. Currently
 we
 are emulating such a search by adding criteria of ever-widening
 bounding
 boxes, and the more of those boxes match the document, the higher the
 score
 and thus the closer ones appear at the start of the results. The query
 looks
 something like this (newlines between each search term):

 +criteraOne:1
 +criteriaTwo:true
 +latitude:[-90.0 TO 90.0] +longitude:[-180.0 TO 180.0]
 (latitude:[40.52 TO 40.81] longitude:[-74.17 TO -73.79])
 (latitude:[40.30 TO 41.02] longitude:[-74.45 TO -73.51])
 (latitude:[39.94 TO 41.38] longitude:[-74.93 TO -73.03])
 [[...etc...about 10 times...]]

  Naturally this is quite slow (query is approximately 6x slower
 than
 normal), and... I can't help but feel that there's a more elegant way
 of
 sorting by distance.
  Does anybody know how to do this or have any suggestions?

 Sincerely,

  Daryl.


RE: sub skus with colour and size

2008-10-09 Thread Ensdorf Ken


 Every product we have comes in colour and size combinations. I need to
 do a faceted search on these that allows for colour and size and
 various other fields. A single product may have multiple colours and
 multiple sizes.

 For example a style might be available in black size 12, but also have
 other sizes in red. If someone searches for red and size 12, it should
 not bring the product as that combination is not possible.

I'm no expert, but one way to do this would be to have a multi-valued field 
with all the possible combinations, eg if you have the following in your data:

<color>
  <value>red</value>
  <sizes>10,12</sizes>
</color>
<color>
  <value>black</value>
  <sizes>8,10</sizes>
</color>

you could create a solr doc with a multivalued color field:

<color>color_red size_10 size_12</color>
<color>color_black size_8 size_10</color>

Then if you set the positionIncrementGap in your schema to a sufficiently 
high value (say 1000), you can use the following query to search for a color 
size combination:

color:"color_red size_10"~1000

which executes a phrase search with a slop factor of 1000, ensuring it won't 
cross the field boundary
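
A small sketch of generating those field values and the matching phrase query on the client side follows. The class and method names are hypothetical; the token naming (color_ and size_ prefixes) follows the example above.

```java
import java.util.List;

/** Hypothetical helper for the color/size combination field. */
final class ColorSizeSketch {

    /** Builds one value of the multivalued field, e.g. "color_red size_10 size_12". */
    static String fieldValue(String color, List<Integer> sizes) {
        StringBuilder sb = new StringBuilder("color_").append(color);
        for (int size : sizes) {
            sb.append(" size_").append(size);
        }
        return sb.toString();
    }

    /** Builds the sloppy phrase query for one color/size combination. */
    static String query(String color, int size, int slop) {
        return "color:\"color_" + color + " size_" + size + "\"~" + slop;
    }
}
```

Each color gets its own field value, so red/12 matches the first value while black/12 matches nothing, provided the slop stays below the distance that the positionIncrementGap puts between values.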

hope this helps!
-Ken


RE: using BoostingTermQuery

2008-09-24 Thread Ensdorf Ken

 I'm no QueryParser expert, but I would probably start w/ the default
 query parser in Solr (LuceneQParser), and then progress a bit to the
 DisMax one.  I'd ask specific questions based on what you see there.
 If you get far enough along, you may consider asking for help on the
 java-user list as well.

Thanks - I think I've got it working now.  I ended up subclassing QueryParser 
and overriding newTermQuery() to create a BoostingTermQuery instead of a plain 
ol' TermQuery.  Seems to work.

  Yup - I'm pretty sure I have that side figured out.  My input
  contains terms marked up with a score (ie 'software?7')  I just
  needed to create a TokenFilter that parses out the suffix and sets
  the Payload on the token.

 Cool.  Patch?

Not sure how valuable it is - all I did was create a new subclass of 
TokenFilter.  Here's the code fwiw:

import java.io.IOException;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.Payload;

public class ScorePayloadFilter extends TokenFilter {

    protected ScorePayloadFilter(TokenStream input) {
        super(input);
    }

    public Token next(Token in) throws IOException {
        Token nextToken = input.next(in);

        if ( nextToken != null )
        {
            char[] buf = nextToken.termBuffer();
            int termLen = nextToken.termLength();
            int posn = -1;

            // find the last '?' separating the term from its score
            for ( int i=0; i < termLen; i++ )
                if ( buf[i] == '?' )
                    posn = i;

            if ( posn > 0 )
            {
                int scorepos = posn + 1;
                String score = new String(buf, scorepos, termLen - scorepos);
                Integer scoreInt = new Integer(score);
                Payload payload = new Payload();

                byte[] payloadBytes = new byte[4];

                payload.setData(PayloadHelper.encodeInt(scoreInt, payloadBytes, 0));
                nextToken.setPayload(payload);
                // drop the "?score" suffix from the indexed term
                nextToken.setTermLength(posn);
            }
        }
        return nextToken;
    }
}
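
The parsing step in that filter can be tried without Lucene. The sketch below is a stand-alone illustration: the class and method names are hypothetical, and the big-endian 4-byte encoding is an assumption about what a payload integer looks like on disk, not a copy of PayloadHelper.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-alone version of the "term?score" parsing. */
final class ScoreSuffixSketch {

    /** Returns the term without its "?score" suffix, e.g. "software?7" -> "software". */
    static String term(String token) {
        int posn = token.lastIndexOf('?');
        return posn > 0 ? token.substring(0, posn) : token;
    }

    /** Returns the numeric suffix, or defaultScore when the token has no suffix. */
    static int score(String token, int defaultScore) {
        int posn = token.lastIndexOf('?');
        return posn > 0 ? Integer.parseInt(token.substring(posn + 1)) : defaultScore;
    }

    /** Encodes the score as 4 bytes, big-endian (assumed layout). */
    static byte[] encode(int score) {
        return ByteBuffer.allocate(4).putInt(score).array();
    }
}
```

As in the filter, the last '?' wins and a '?' in the first position is treated as part of the term. A suffix that is not a number throws NumberFormatException here, which the filter would also do.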

Thanks again for the help!

-Ken


RE: using BoostingTermQuery

2008-09-23 Thread Ensdorf Ken

 At this point, it's roll your own.

That's where I'm getting bogged down - I'm confused by the various queryparser 
classes in lucene and solr and I'm not sure exactly what I need to override.  
Do you know of an example of something similar to what I'm doing that I could 
use as a reference?

 I'd love to see the BTQ in Solr
 (and Spans!), but I wonder if it makes sense w/o better indexing side
 support.  I assume you are rolling your own Analyzer, right?

Yup - I'm pretty sure I have that side figured out.  My input contains terms 
marked up with a score (ie 'software?7')  I just needed to create a TokenFilter 
that parses out the suffix and sets the Payload on the token.

  Spans and payloads are this huge untapped area for better search!

Completely agree - we do a lot with keyword searching, and we use this type of 
thing in our existing search implementation.  Thanks for the quick response!

 On Sep 23, 2008, at 5:12 PM, Ensdorf Ken wrote:

  Hi-
 
  I'm new to Solr, and I'm trying to figure out the best way to
  configure it to use BoostingTermQuery in the scoring mechanism.  Do
  I need to create a custom query parser?  All I want is the default
  parser behavior except to get the custom term boost from the Payload
  data.  Thanks!
 
  -Ken

 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ