Re: Token Counter

2011-01-10 Thread Sasank Mudunuri
Faceting will do this for you. Check out:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.field

> This param allows you to specify a field which should be treated as a facet.
> It will iterate over each Term in the field and generate a facet count using
> that Term as the constraint.
For a text field, it actually does go over each of the indexed tokens.
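A minimal sketch of such a request, assuming a Solr instance at http://localhost:8983/solr and a text field named `text` (both assumptions, not from the thread). It uses the standard facet, facet.field, facet.limit and facet.mincount parameters, and also models locally what the counts mean. One caveat: each facet count is the number of matching documents that contain the term, not the total number of times the token occurs, so a word repeated inside one document is counted once.

```python
from collections import Counter
from urllib.parse import urlencode

# Assumed Solr base URL and field name; adjust for your own schema.
SOLR_SELECT = "http://localhost:8983/solr/select"
FIELD = "text"

def facet_url(field: str) -> str:
    """Build a request that facets on every indexed token of `field`."""
    params = [
        ("q", "*:*"),             # match all documents
        ("rows", "0"),            # only the facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", "-1"),    # return all terms, not just the default top 100
        ("facet.mincount", "1"),  # skip terms with zero matching documents
    ]
    return SOLR_SELECT + "?" + urlencode(params)

def simulate_facet_counts(token_lists):
    """Toy model of facet.field on a tokenized field:
    each term's count is the number of documents containing it."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(set(tokens))  # set(): a document counts once per term
    return counts

docs = [["low", "residu", "diet"], ["residu", "residu", "test"]]
print(facet_url(FIELD))
print(simulate_facet_counts(docs)["residu"])  # 2 documents, although 3 occurrences
```

If true per-occurrence counts are needed, faceting alone will not give them; the document-count semantics above are the thing to check first.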


On Mon, Jan 10, 2011 at 10:11 AM, supersoft  wrote:

>
> As I understand it, faceted search would be useful if keywords were a
> multivalued field and each of its values were just a single token.
>
> I want to display the occurrences of the tokens which appear in an indexed
> (and stored) text field.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Token-Counter-tp2227795p2228991.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Return Lucene DocId in Solr Results

2010-12-01 Thread Sasank Mudunuri
Take this with a sizeable grain of salt, as I haven't actually tried doing
this. But you might try using an IndexReader, which it looks like you can get
from this class:

http://lucene.apache.org/solr/api/org/apache/solr/core/StandardIndexReaderFactory.html
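On the follow-up question of turning primary keys into docIds, here is a sketch of the idea behind a field-cache approach, modeled in plain Python rather than the Lucene API. It assumes the uniqueKey value of every document can be loaded into an array indexed by docId (roughly what Lucene's FieldCache provides for a single-valued field); inverting that array gives a key-to-docId lookup. The function names are hypothetical. Keep in mind that Lucene docIds are internal and can change when segments merge or documents are deleted, so a mapping like this is only valid for the reader it was built from.

```python
from typing import Dict, Iterable, List, Optional

def build_key_index(keys_by_docid: List[Optional[str]]) -> Dict[str, int]:
    """Invert a docId-indexed array of uniqueKey values into key -> docId.

    keys_by_docid[i] is the uniqueKey of internal document i, or None for
    a deleted/empty slot (an assumption of this toy model)."""
    index: Dict[str, int] = {}
    for doc_id, key in enumerate(keys_by_docid):
        if key is not None:
            index[key] = doc_id
    return index

def keys_to_docids(keys: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Map primary keys (e.g. fetched from another Solr instance) to docIds,
    skipping keys that are not in this index. Sorted output is convenient
    when building a filter bitset."""
    return sorted(index[k] for k in keys if k in index)

# Rebuild the index whenever a new searcher/reader is opened.
keys_by_docid = ["book-17", None, "book-42", "book-03"]
index = build_key_index(keys_by_docid)
print(keys_to_docids(["book-42", "book-99", "book-17"], index))  # [0, 2]
```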

sasank

On Tue, Nov 30, 2010 at 6:45 AM, Lohrenz, Steven wrote:

> Hmm, I found some similar queries on stackoverflow and they did not
> recommend exposing the lucene docId.
>
> So, I guess my question becomes: What is the best way, from within my
> custom QParser, to take a list of solr primary keys (that were retrieved
> from elsewhere) and turn them into docIds? I also saw something about
> caching them using a Field Cache - how would I do that?
>
> Thanks,
> Steve
>
> -Original Message-
> From: Lohrenz, Steven [mailto:steven.lohr...@hmhpub.com]
> Sent: 30 November 2010 11:57
> To: solr-user@lucene.apache.org
> Subject: Return Lucene DocId in Solr Results
>
> Hi,
>
> I was wondering how I would go about getting the lucene docid included in
> the results from a solr query?
>
> I've built a QueryParser to query another solr instance and join the
> results of the two instances through the use of a Filter.  The Filter needs
> the lucene docid to work. This is the only bit I'm missing right now.
>
> Thanks,
> Steve
>
>


Reading Solr Index directly

2010-11-17 Thread Sasank Mudunuri
Hi,

I've been poking around the JavaDocs a bit, and it looks like it's possible
to directly read the index using the Solr Java API. Hoping to clarify a
couple of things --

1) Do I need to read the index with Solr APIs, or can I use Lucene (PyLucene
is particularly attractive...)? If Lucene is an option, how wary should I be
about the Lucene version number?

2) Is there anything I should worry about in terms of opening a read-only
reader against an active Solr instance? Or will this just block?

3) Anything else that jumps out as a gotcha?

I couldn't find any pages about how to do this. I'm happy to compile any
responses for inclusion on the Solr wiki.

thanks!
sasank


Re: Highlighter - multiple instances of term being combined

2010-11-10 Thread Sasank Mudunuri
Ahh, this reconfirms it. The analyzers are properly pulling things apart. There
are two instances of the query keyword with words between them. But from
your last comment, it sounds like the system's not trying to do any sort of
phrase highlighting, but is just hitting a weird edge case? I'm seeing this
behavior somewhat commonly, so I thought for sure there must be some option
that says if two highlighted words are sufficiently close together,
highlight them as a single phrase.
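To make the distinction concrete, here is a toy model in plain Python; it is not Solr's highlighter code, and the function names and the `max_gap` knob are hypothetical. It tokenizes on runs of letters, finds the positions of the query term, and merges matches into one highlighted span only when at most `max_gap` tokens separate them. With `max_gap=0` only adjacent matches would merge, which gives the output wanted here; a larger gap produces output of the shape described in this thread.

```python
import re
from typing import List, Tuple

def find_matches(text: str, term: str) -> List[Tuple[int, int, int]]:
    """Return (position, start, end) for each token equal to `term`.
    Tokens are runs of letters, so "low-residue" yields "low" and "residue"."""
    matches = []
    for pos, m in enumerate(re.finditer(r"[A-Za-z]+", text)):
        if m.group().lower() == term.lower():
            matches.append((pos, m.start(), m.end()))
    return matches

def highlight(text: str, term: str, max_gap: int,
              pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap matches in pre/post, merging two matches into one span when at
    most `max_gap` tokens lie between them (toy model only)."""
    spans: List[List[int]] = []  # each entry: [last_pos, start, end]
    for pos, start, end in find_matches(text, term):
        if spans and pos - spans[-1][0] - 1 <= max_gap:
            spans[-1][0], spans[-1][2] = pos, end  # extend the current span
        else:
            spans.append([pos, start, end])
    out, last = [], 0
    for _, start, end in spans:
        out += [text[last:start], pre, text[start:end], post]
        last = end
    out.append(text[last:])
    return "".join(out)

sentence = 'What does "low-residue" mean? Like low-residue diet?'
print(highlight(sentence, "residue", max_gap=0))
# What does "low-<em>residue</em>" mean? Like low-<em>residue</em> diet?
print(highlight(sentence, "residue", max_gap=5))
# What does "low-<em>residue" mean? Like low-residue</em> diet?
```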

On Tue, Nov 9, 2010 at 7:11 PM, Lance Norskog  wrote:

> Have you looked at solr/admin/analysis.jsp? This is the 'Analysis' link
> off the main solr admin page. It will show you how text is broken up
> for both the indexing and query processes. You might get some insight
> about how these words are torn apart and assigned positions. Trying
> the different Analyzers and options might get you there.
>
> But to be frank, highlighting is a tough problem and has always had a
> lot of edge cases.
>
> On Tue, Nov 9, 2010 at 6:08 PM, Sasank Mudunuri  wrote:
> > I'm finding that if a keyword appears in a field multiple times very close
> > together, it will get highlighted as a phrase even though there are other
> > terms between the two instances. So this search:
> >
> > http://localhost:8983/solr/select/?
> >
> > hl=true&
> > hl.snippets=1&
> > q=residue&
> > hl.fragsize=0&
> > mergeContiguous=false&
> > indent=on&
> > hl.usePhraseHighlighter=false&
> > debugQuery=on&
> > hl.fragmenter=gap&
> > hl.highlightMultiTerm=false
> >
> > Highlights as:
> > What does "low-residue" mean? Like low-residue diet?
> >
> > Trying to get it to highlight as:
> > What does "low-residue" mean? Like low-residue diet?
> > I've tried playing with various combinations of mergeContiguous,
> > highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
> > output.
> >
> > For reference, the field type uses a StandardTokenizerFactory and
> > SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
> > SnowballFilterFactory. I've confirmed that the intermediate words don't
> > appear in either the synonym or the stop words list. I can post the full
> > definition if helpful.
> >
> > Any pointers as to how to debug this would be greatly appreciated!
> > sasank
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Highlighter - multiple instances of term being combined

2010-11-09 Thread Sasank Mudunuri
I'm finding that if a keyword appears in a field multiple times very close
together, it will get highlighted as a phrase even though there are other
terms between the two instances. So this search:

http://localhost:8983/solr/select/?

hl=true&
hl.snippets=1&
q=residue&
hl.fragsize=0&
mergeContiguous=false&
indent=on&
hl.usePhraseHighlighter=false&
debugQuery=on&
hl.fragmenter=gap&
hl.highlightMultiTerm=false

Highlights as:
What does "low-residue" mean? Like low-residue diet?

Trying to get it to highlight as:
What does "low-residue" mean? Like low-residue diet?
I've tried playing with various combinations of mergeContiguous,
highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
output.

For reference, the field type uses a StandardTokenizerFactory and
SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
SnowballFilterFactory. I've confirmed that the intermediate words don't
appear in either the synonym or the stop words list. I can post the full
definition if helpful.

Any pointers as to how to debug this would be greatly appreciated!
sasank
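For anyone reproducing this, a sketch that builds the same request programmatically, assuming the default localhost URL from the post. One detail worth checking: Solr's highlighting parameter is spelled `hl.mergeContiguous`, with the `hl.` prefix, so the bare `mergeContiguous=false` above is most likely ignored. Since `false` is also the default, this probably does not explain the behavior on its own.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select/"

def highlight_request(q: str) -> str:
    """Build the highlighting query from the post, using the hl.-prefixed
    spelling for mergeContiguous."""
    params = {
        "q": q,
        "hl": "true",
        "hl.snippets": "1",
        "hl.fragsize": "0",             # 0 = use the whole field value, no fragmenting
        "hl.mergeContiguous": "false",  # documented name carries the hl. prefix
        "hl.usePhraseHighlighter": "false",
        "hl.highlightMultiTerm": "false",
        "hl.fragmenter": "gap",
        "debugQuery": "on",
        "indent": "on",
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(highlight_request("residue"))
```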


Re: Searching for Documents by Indexed Term

2010-10-20 Thread Sasank Mudunuri
That looks very promising based on a couple of quick queries. Any objections
if I move the javadoc help into the wiki, specifically:

> Create a term query from the input value without any text analysis or
> transformation whatsoever. This is useful in debugging, or when raw terms
> are returned from the terms component (this is not the default).
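A sketch of invoking the raw parser through local params, so a stemmed term copied from TermsComponent output goes straight into a term query with no query-time analysis. The `{!raw f=...}` syntax is the one documented for RawQParserPlugin; the base URL, the field name `text` and the term `residu` are placeholders. Because the value is used verbatim, it has to match the indexed term exactly, including case.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"

def raw_term_query(field: str, indexed_term: str) -> str:
    """Search `field` for an exact indexed term, bypassing query-time
    analysis (and therefore stemming) via the raw query parser."""
    params = {"q": "{!raw f=%s}%s" % (field, indexed_term)}
    return SOLR_SELECT + "?" + urlencode(params)

# Placeholder values: a stemmed term as it might appear in TermsComponent output.
print(raw_term_query("text", "residu"))
# http://localhost:8983/solr/select?q=%7B%21raw+f%3Dtext%7Dresidu
```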


Thanks Erick!
sasank

On Wed, Oct 20, 2010 at 6:00 PM, Erick Erickson wrote:

> This may be a wild herring, but have you tried "raw"? NOTE: I'm a little
> out of my depth here on what this actually does, so don't waste time by
> thinking I'm an authority on this one. See:
>
> http://lucene.apache.org/solr/api/org/apache/solr/search/RawQParserPlugin.html
>
> and
> http://wiki.apache.org/solr/SolrQuerySyntax
> (this last under "built in query parsers").
>
> HTH
> Erick
>
> On Wed, Oct 20, 2010 at 1:47 PM, Sasank Mudunuri  wrote:
>
> > Hi Solr Users,
> >
> > I used the TermsComponent to walk through all the indexed terms and find
> > ones of particular interest (named entities). And now, I'd like to search
> > for documents that contain these particular entities. I have both
> > query-time and index-time stemming set for the field, which means I can't
> > just hit the normal search handler because, as I understand it, it will
> > stem the already-stemmed term. Any ideas about how to search directly for
> > the indexed term? Maybe something I can do at query-time to disable stemming?
> >
> > Thanks!
> > sasank
> >
>


Searching for Documents by Indexed Term

2010-10-20 Thread Sasank Mudunuri
Hi Solr Users,

I used the TermsComponent to walk through all the indexed terms and find
ones of particular interest (named entities). And now, I'd like to search
for documents that contain these particular entities. I have both query-time
and index-time stemming set for the field, which means I can't just hit the
normal search handler because, as I understand it, it will stem the
already-stemmed term. Any ideas about how to search directly for the indexed
term? Maybe something I can do at query-time to disable stemming?

Thanks!
sasank