Re: Token Counter

2011-01-10 Thread Sasank Mudunuri
Faceting will do this for you. Check out:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.field

This param allows you to specify a field which should be treated as a facet.
 It will iterate over each Term in the field and generate a facet count using
 that Term as the constraint.


For a text field, it actually does go over each of the indexed tokens.
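
As a rough sketch of what such a request could look like, the snippet below assembles a facet query URL. The host, port, and the field name "keywords" are assumptions taken from this thread, not from a real schema; facet.field, facet.limit, and facet.mincount are the standard faceting parameters described on the wiki page above.

```python
from urllib.parse import urlencode


def facet_count_url(base_url, field, limit=-1, mincount=1):
    """Build a select URL that asks Solr for per-term facet counts on one field."""
    params = [
        ("q", "*:*"),                   # match every document
        ("rows", 0),                    # only the facet counts are wanted, not the docs
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),         # -1 means "no limit on the number of terms"
        ("facet.mincount", mincount),   # skip terms that match no documents
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical local instance and field name.
url = facet_count_url("http://localhost:8983/solr/select", "keywords")
print(url)
```

Note that a facet count is the number of matching documents that contain the term, not the total number of times the term occurs across those documents.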


On Mon, Jan 10, 2011 at 10:11 AM, supersoft elarab...@gmail.com wrote:


 As I understand it, a faceted search would be useful if keywords were a
 multivalued field and each of its values were just a single token.

 I want to display the occurrences of the tokens which appear in an indexed
 (and stored) text field.



Re: Return Lucene DocId in Solr Results

2010-12-01 Thread Sasank Mudunuri
Take this with a sizeable grain of salt, as I haven't actually tried doing
this. But you might try using an IndexReader, which it looks like you can get
from this class:

http://lucene.apache.org/solr/api/org/apache/solr/core/StandardIndexReaderFactory.html

sasank

On Tue, Nov 30, 2010 at 6:45 AM, Lohrenz, Steven
steven.lohr...@hmhpub.com wrote:

 Hmm, I found some similar queries on stackoverflow and they did not
 recommend exposing the lucene docId.

 So, I guess my question becomes: What is the best way, from within my
 custom QParser, to take a list of solr primary keys (that were retrieved
 from elsewhere) and turn them into docIds? I also saw something about
 caching them using a Field Cache - how would I do that?

 Thanks,
 Steve

 -----Original Message-----
 From: Lohrenz, Steven [mailto:steven.lohr...@hmhpub.com]
 Sent: 30 November 2010 11:57
 To: solr-user@lucene.apache.org
 Subject: Return Lucene DocId in Solr Results

 Hi,

 I was wondering how I would go about getting the lucene docid included in
 the results from a solr query?

 I've built a QueryParser to query another solr instance and join the
 results of the two instances through the use of a Filter.  The Filter needs
 the lucene docid to work. This is the only bit I'm missing right now.

 Thanks,
 Steve




Reading Solr Index directly

2010-11-17 Thread Sasank Mudunuri
Hi,

I've been poking around the JavaDocs a bit, and it looks like it's possible
to directly read the index using the Solr Java API. Hoping to clarify a
couple of things --

1) Do I need to read the index with Solr APIs, or can I use Lucene directly
(PyLucene is particularly attractive...)? If Lucene works, how wary should I
be about the Lucene version number?

2) Is there anything I should worry about in terms of opening a read-only
reader against an active Solr instance? Or will this just block?

3) Anything else that jumps out as a gotcha?

I couldn't find any pages about how to do this. I'm happy to compile any
responses for inclusion on the Solr wiki.

thanks!
sasank


Re: Highlighter - multiple instances of term being combined

2010-11-10 Thread Sasank Mudunuri
Ahh this reconfirms. The analyzers are properly pulling things apart. There
are two instances of the query keyword with words between them. But from
your last comment, it sounds like the system's not trying to do any sort of
phrase highlighting, but is just hitting a weird edge case? I'm seeing this
behavior somewhat commonly, so I thought for sure there must be some option
that says if two highlighted words are sufficiently close together,
highlight them as a single phrase.

On Tue, Nov 9, 2010 at 7:11 PM, Lance Norskog goks...@gmail.com wrote:

 Have you looked at solr/admin/analysis.jsp? This is the 'Analysis' link
 off the main solr admin page. It will show you how text is broken up
 for both the indexing and query processes. You might get some insight
 about how these words are torn apart and assigned positions. Trying
 the different Analyzers and options might get you there.

 But to be frank, highlighting is a tough problem and has always had a
 lot of edge cases.

 On Tue, Nov 9, 2010 at 6:08 PM, Sasank Mudunuri sas...@gmail.com wrote:
  I'm finding that if a keyword appears in a field multiple times very close
  together, it will get highlighted as a phrase even though there are other
  terms between the two instances. So this search:
 
  http://localhost:8983/solr/select/?
 
  hl=true
  hl.snippets=1
  q=residue
  hl.fragsize=0
  mergeContiguous=false
  indent=on
  hl.usePhraseHighlighter=false
  debugQuery=on
  hl.fragmenter=gap
  hl.highlightMultiTerm=false
 
  Highlights as:
  What does low-<em>residue mean? Like low-residue</em> diet?
 
  Trying to get it to highlight as:
  What does low-<em>residue</em> mean? Like low-<em>residue</em> diet?
  I've tried playing with various combinations of mergeContiguous,
  highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
  output.
 
  For reference, field type uses a StandardTokenizerFactory and
  SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
  SnowballFilterFactory. I've confirmed that the intermediate words don't
  appear in either the synonym or the stop words list. I can post the full
  definition if helpful.
 
  Any pointers as to how to debug this would be greatly appreciated!
  sasank
 



 --
 Lance Norskog
 goks...@gmail.com



Highlighter - multiple instances of term being combined

2010-11-09 Thread Sasank Mudunuri
I'm finding that if a keyword appears in a field multiple times very close
together, it will get highlighted as a phrase even though there are other
terms between the two instances. So this search:

http://localhost:8983/solr/select/?

hl=true
hl.snippets=1
q=residue
hl.fragsize=0
mergeContiguous=false
indent=on
hl.usePhraseHighlighter=false
debugQuery=on
hl.fragmenter=gap
hl.highlightMultiTerm=false

Highlights as:
What does low-<em>residue mean? Like low-residue</em> diet?

Trying to get it to highlight as:
What does low-<em>residue</em> mean? Like low-<em>residue</em> diet?
I've tried playing with various combinations of mergeContiguous,
highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
output.

For reference, field type uses a StandardTokenizerFactory and
SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
SnowballFilterFactory. I've confirmed that the intermediate words don't
appear in either the synonym or the stop words list. I can post the full
definition if helpful.

Any pointers as to how to debug this would be greatly appreciated!
sasank
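
For anyone trying to reproduce this, here is a minimal sketch of how the request above could be assembled as a single URL. One thing to check: the list above has "mergeContiguous=false" without a prefix, while Solr documents that highlighting parameter as hl.mergeContiguous; whether the prefix was only dropped in this email is not known. The field name "body" passed as hl.fl is a made-up placeholder, not the real schema field.

```python
from urllib.parse import urlencode


def highlight_url(base_url, query, field):
    """Build a select URL with the highlighting options from the message above."""
    params = [
        ("q", query),
        ("hl", "true"),
        ("hl.fl", field),                      # placeholder field to highlight (assumption)
        ("hl.snippets", 1),
        ("hl.fragsize", 0),                    # 0 = use the whole field value as the fragment
        ("hl.fragmenter", "gap"),
        ("hl.mergeContiguous", "false"),       # written with the hl. prefix here
        ("hl.usePhraseHighlighter", "false"),
        ("hl.highlightMultiTerm", "false"),
        ("indent", "on"),
        ("debugQuery", "on"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical field name; the query term is the one from the message.
hl_url = highlight_url("http://localhost:8983/solr/select", "residue", "body")
print(hl_url)
```

This only builds the request; it is not a confirmed fix for the merged highlighting.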


Searching for Documents by Indexed Term

2010-10-20 Thread Sasank Mudunuri
Hi Solr Users,

I used the TermsComponent to walk through all the indexed terms and find
ones of particular interest (named entities). And now, I'd like to search
for documents that contain these particular entities. I have both query-time
and index-time stemming set for the field, which means I can't just hit the
normal search handler because as I understand, it will stem the
already-stemmed term. Any ideas about how to search directly for the indexed
term? Maybe something I can do at query-time to disable stemming?

Thanks!
sasank


Re: Searching for Documents by Indexed Term

2010-10-20 Thread Sasank Mudunuri
That looks very promising based on a couple of quick queries. Any objections
if I move the javadoc help into the wiki, specifically:

Create a term query from the input value without any text analysis or
 transformation whatsoever. This is useful in debugging, or when raw terms
 are returned from the terms component (this is not the default).


Thanks Erick!
sasank
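
A small sketch of the kind of query this enables, using the {!raw f=...} local-params syntax from the SolrQuerySyntax wiki page. The field name "body" and the stemmed term "residu" are illustrative assumptions only; substitute a term exactly as the TermsComponent returned it.

```python
from urllib.parse import urlencode


def raw_term_query(field, term):
    """Return a q value that looks up an indexed term with no query-time analysis."""
    return "{!raw f=%s}%s" % (field, term)


def raw_term_url(base_url, field, term):
    """Build a select URL for the raw term query, URL-encoding the braces and spaces."""
    return base_url + "?" + urlencode([("q", raw_term_query(field, term))])


# Hypothetical field and already-stemmed term.
raw_q = raw_term_query("body", "residu")
raw_url = raw_term_url("http://localhost:8983/solr/select", "body", "residu")
print(raw_q)
print(raw_url)
```

Because the term is not analyzed at query time, it is not stemmed a second time, which was the concern with the normal search handler.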

On Wed, Oct 20, 2010 at 6:00 PM, Erick Erickson erickerick...@gmail.com wrote:

 This may be a wild herring, but have you tried raw? NOTE: I'm a little
 out of my depth here on what this actually does, so don't waste time by
 thinking I'm an authority on this one. See:

 http://lucene.apache.org/solr/api/org/apache/solr/search/RawQParserPlugin.html

 and
 http://wiki.apache.org/solr/SolrQuerySyntax
 (this last under built in query parsers).

 HTH
 Erick

 On Wed, Oct 20, 2010 at 1:47 PM, Sasank Mudunuri sas...@gmail.com wrote:

  Hi Solr Users,
 
  I used the TermsComponent to walk through all the indexed terms and find
  ones of particular interest (named entities). And now, I'd like to search
  for documents that contain these particular entities. I have both
  query-time and index-time stemming set for the field, which means I can't
  just hit the normal search handler because as I understand, it will stem
  the already-stemmed term. Any ideas about how to search directly for the
  indexed term? Maybe something I can do at query-time to disable stemming?
 
  Thanks!
  sasank