Re: Token Counter
Faceting will do this for you. Check out: http://wiki.apache.org/solr/SimpleFacetParameters#facet.field

> This param allows you to specify a field which should be treated as a facet.
> It will iterate over each Term in the field and generate a facet count using
> that Term as the constraint.

For a text field, it actually does go over each of the indexed tokens.

On Mon, Jan 10, 2011 at 10:11 AM, supersoft wrote:
> As I understand, a faceted search would be useful if keywords is a
> multivalued field and its field value is just a token.
>
> I want to display the occurrences of the tokens which appear in an indexed
> (and stored) text field.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Token-Counter-tp2227795p2228991.html
> Sent from the Solr - User mailing list archive at Nabble.com.
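For what it's worth, here is a rough sketch of what such a faceting request could look like, built in Python. The host, port, and the field name `body_text` are assumptions for illustration only; `facet`, `facet.field`, `facet.limit`, and `facet.mincount` are the standard faceting parameters. Note that facet counts are the number of documents containing each term, not the total number of times the term occurs.

```python
from urllib.parse import urlencode


def facet_count_url(base_url, field, limit=-1, mincount=1):
    """Build a Solr select URL that asks for per-term facet counts on one field."""
    params = [
        ("q", "*:*"),                  # match every document
        ("rows", 0),                   # no document list, only the facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),        # -1 requests all terms rather than the top N
        ("facet.mincount", mincount),  # skip terms that match no documents
        ("wt", "json"),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core URL and field name.
print(facet_count_url("http://localhost:8983/solr", "body_text"))
```

On a large text field `facet.limit=-1` can return a very long list, so a positive limit may be the safer starting point.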
Re: Return Lucene DocId in Solr Results
Take this with a sizeable grain of salt, as I haven't actually tried doing this. But you might try using an IndexReader, which it looks like you can get from this class: http://lucene.apache.org/solr/api/org/apache/solr/core/StandardIndexReaderFactory.html

sasank

On Tue, Nov 30, 2010 at 6:45 AM, Lohrenz, Steven wrote:
> Hmm, I found some similar queries on stackoverflow and they did not
> recommend exposing the lucene docId.
>
> So, I guess my question becomes: What is the best way, from within my
> custom QParser, to take a list of solr primary keys (that were retrieved
> from elsewhere) and turn them into docIds? I also saw something about
> caching them using a Field Cache - how would I do that?
>
> Thanks,
> Steve
>
> -----Original Message-----
> From: Lohrenz, Steven [mailto:steven.lohr...@hmhpub.com]
> Sent: 30 November 2010 11:57
> To: solr-user@lucene.apache.org
> Subject: Return Lucene DocId in Solr Results
>
> Hi,
>
> I was wondering how I would go about getting the lucene docid included in
> the results from a solr query?
>
> I've built a QueryParser to query another solr instance and join the
> results of the two instances through the use of a Filter. The Filter needs
> the lucene docid to work. This is the only bit I'm missing right now.
>
> Thanks,
> Steve
Reading Solr Index directly
Hi,

I've been poking around the JavaDocs a bit, and it looks like it's possible to directly read the index using the Solr Java API. Hoping to clarify a couple of things:

1) Do I need to read the index with Solr APIs, or can I use Lucene directly (PyLucene is particularly attractive...)? If Lucene works, how wary should I be about the Lucene version number?
2) Is there anything I should worry about in terms of opening a read-only reader against an active Solr instance? Or will this just block?
3) Anything else that jumps out as a gotcha?

I couldn't find any pages about how to do this. I'm happy to compile any responses for inclusion on the Solr wiki.

thanks!
sasank
Re: Highlighter - multiple instances of term being combined
Ahh, this reconfirms. The analyzers are properly pulling things apart. There are two instances of the query keyword with words between them.

But from your last comment, it sounds like the system's not trying to do any sort of phrase highlighting, but is just hitting a weird edge case? I'm seeing this behavior somewhat commonly, so I thought for sure there must be some option that says if two highlighted words are sufficiently close together, highlight them as a single phrase.

On Tue, Nov 9, 2010 at 7:11 PM, Lance Norskog wrote:
> Have you looked at solr/admin/analysis.jsp? This is the 'Analysis' link
> off the main solr admin page. It will show you how text is broken up
> for both the indexing and query processes. You might get some insight
> about how these words are torn apart and assigned positions. Trying
> the different Analyzers and options might get you there.
>
> But to be frank, highlighting is a tough problem and has always had a
> lot of edge cases.
>
> On Tue, Nov 9, 2010 at 6:08 PM, Sasank Mudunuri wrote:
> > I'm finding that if a keyword appears in a field multiple times very
> > close together, it will get highlighted as a phrase even though there
> > are other terms between the two instances. So this search:
> >
> > http://localhost:8983/solr/select/?
> > hl=true&
> > hl.snippets=1&
> > q=residue&
> > hl.fragsize=0&
> > mergeContiguous=false&
> > indent=on&
> > hl.usePhraseHighlighter=false&
> > debugQuery=on&
> > hl.fragmenter=gap&
> > hl.highlightMultiTerm=false
> >
> > Highlights as:
> > What does "low-residue" mean? Like low-residue diet?
> >
> > Trying to get it to highlight as:
> > What does "low-residue" mean? Like low-residue diet?
> >
> > I've tried playing with various combinations of mergeContiguous,
> > highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
> > output.
> >
> > For reference, field type uses a StandardTokenizerFactory and
> > SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
> > SnowballFilterFactory. I've confirmed that the intermediate words don't
> > appear in either the synonym or the stop words list. I can post the full
> > definition if helpful.
> >
> > Any pointers as to how to debug this would be greatly appreciated!
> > sasank
>
> --
> Lance Norskog
> goks...@gmail.com
Highlighter - multiple instances of term being combined
I'm finding that if a keyword appears in a field multiple times very close together, it will get highlighted as a phrase even though there are other terms between the two instances. So this search:

http://localhost:8983/solr/select/?
hl=true&
hl.snippets=1&
q=residue&
hl.fragsize=0&
mergeContiguous=false&
indent=on&
hl.usePhraseHighlighter=false&
debugQuery=on&
hl.fragmenter=gap&
hl.highlightMultiTerm=false

Highlights as:
What does "low-residue" mean? Like low-residue diet?

Trying to get it to highlight as:
What does "low-residue" mean? Like low-residue diet?

I've tried playing with various combinations of mergeContiguous, highlightMultiTerm, and usePhraseHighlighter, but they all yield the same output.

For reference, field type uses a StandardTokenizerFactory and SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and SnowballFilterFactory. I've confirmed that the intermediate words don't appear in either the synonym or the stop words list. I can post the full definition if helpful.

Any pointers as to how to debug this would be greatly appreciated!
sasank
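One way to make the experiments repeatable is to build the request from a fixed set of parameters and change one highlighting option at a time. The sketch below does that in Python. It writes every highlighting option with the `hl.` prefix, including `hl.mergeContiguous`, which the URL above passes without the prefix; whether that difference affects the output here is not something this sketch establishes. The helper name and the base URL are assumptions for illustration.

```python
from urllib.parse import urlencode

# Parameters taken from the search above, with every highlighting option
# spelled using the hl. prefix.
BASE_PARAMS = {
    "q": "residue",
    "hl": "true",
    "hl.snippets": 1,
    "hl.fragsize": 0,
    "hl.fragmenter": "gap",
    "hl.mergeContiguous": "false",
    "hl.usePhraseHighlighter": "false",
    "hl.highlightMultiTerm": "false",
    "indent": "on",
    "debugQuery": "on",
}


def highlight_url(base_url, overrides=None):
    """Return a select URL using BASE_PARAMS, with optional per-run overrides."""
    params = dict(BASE_PARAMS)
    params.update(overrides or {})
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Toggle a single option per run so any change in output can be attributed to it.
for option in ("hl.mergeContiguous", "hl.usePhraseHighlighter", "hl.highlightMultiTerm"):
    print(highlight_url("http://localhost:8983/solr", {option: "true"}))
```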
Re: Searching for Documents by Indexed Term
That looks very promising based on a couple of quick queries. Any objections if I move the javadoc help into the wiki, specifically:

> Create a term query from the input value without any text analysis or
> transformation whatsoever. This is useful in debugging, or when raw terms
> are returned from the terms component (this is not the default).

Thanks Erick!
sasank

On Wed, Oct 20, 2010 at 6:00 PM, Erick Erickson wrote:
> This may be a wild herring, but have you tried "raw"? NOTE: I'm a little
> out of my depth here on what this actually does, so don't waste time by
> thinking I'm an authority on this one. See:
>
> http://lucene.apache.org/solr/api/org/apache/solr/search/RawQParserPlugin.html
>
> and
> http://wiki.apache.org/solr/SolrQuerySyntax
> (this last under "built in query parsers").
>
> HTH
> Erick
>
> On Wed, Oct 20, 2010 at 1:47 PM, Sasank Mudunuri wrote:
> > Hi Solr Users,
> >
> > I used the TermsComponent to walk through all the indexed terms and find
> > ones of particular interest (named entities). And now, I'd like to search
> > for documents that contain these particular entities. I have both
> > query-time and index-time stemming set for the field, which means I can't
> > just hit the normal search handler because, as I understand it, it will
> > stem the already-stemmed term. Any ideas about how to search directly for
> > the indexed term? Maybe something I can do at query-time to disable
> > stemming?
> >
> > Thanks!
> > sasank
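For anyone following along, a raw query is written with local-params syntax, `{!raw f=<field>}<term>`, so the term is looked up exactly as it sits in the index. Below is a minimal Python sketch of building such a request; the base URL, the field name `body_text`, and the example term `residu` are hypothetical stand-ins for a field and an already-stemmed term returned by the TermsComponent.

```python
from urllib.parse import urlencode


def raw_term_url(base_url, field, term):
    """Build a select URL that queries one indexed term via the raw query parser."""
    # {!raw f=field}term skips query-time analysis, so the term is not stemmed again.
    query = "{!raw f=%s}%s" % (field, term)
    return base_url.rstrip("/") + "/select?" + urlencode({"q": query})


# Hypothetical field and stemmed term.
print(raw_term_url("http://localhost:8983/solr", "body_text", "residu"))
```

The braces, exclamation mark, and space all need URL encoding, which is why the query string is assembled with `urlencode` rather than by hand.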
Searching for Documents by Indexed Term
Hi Solr Users,

I used the TermsComponent to walk through all the indexed terms and find ones of particular interest (named entities). And now, I'd like to search for documents that contain these particular entities. I have both query-time and index-time stemming set for the field, which means I can't just hit the normal search handler because, as I understand it, it will stem the already-stemmed term. Any ideas about how to search directly for the indexed term? Maybe something I can do at query-time to disable stemming?

Thanks!
sasank