Re: help with using ngram analyser needed

2008-02-22 Thread Christian Wittern

Otis Gospodnetic wrote:


  
Great, this works and should give me a start for further experiments.  
Thanks a lot!


Christian




Re: Newbie question about search

2008-02-22 Thread Reece
Sounds like the docs aren't committed maybe?

Go to /solr/admin/stats.jsp and look for:

docsPending : X

Where X is the number of docs that aren't committed yet.
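If docsPending is nonzero, posting an explicit commit makes those documents searchable. The update message body is just the following (posted to the `/update` handler, e.g. with post.jar or curl; paths assume the stock example setup):

```xml
<!-- POST this body to /solr/update to flush pending documents
     and open a new searcher over them -->
<commit/>
```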

-Reece



On Fri, Feb 22, 2008 at 3:07 PM, x8nnn <[EMAIL PROTECTED]> wrote:
>
>  I tried to verify the readerdir. Which is fine.
>  Inside index dir I can even see a file created _7.fdt which has all the
>  content of text.
>  Now I am surprised why I am not getting it in search?
>  Santos
>
>
>
>  x8nnn wrote:
>  >
>  > Recently I installed Solr.
>  >
>  > I made changes to schema.xml, added following entries
>  >
>  > 
>  >
>  >
>  >
>  >
>  >
>  > Now I post a document like this:
>  > 0A0A1BC3:01183F59ADDC:CBFA:008AEED0
>  > 
>  >  Interoperability Demonstration Project Report
>  > 
>  > 
>  > 
>  >
>  > 110 page of text...
>  >
>  > 
>  > 
>  >
>  > Once I post it I see following entry in my catalina.out. However when I go
>  > to solr search page and try to search any token in content sectionI do not
>  > get any thing returned. basically
>  >
>  > 
>  >
>  >
>  > am I missing something?
>  >
>  > SimplePostTool: WARNING: Make sure your XML documents are encoded in
>  > UTF-8, other encodings are not currently supported
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.handler.XmlUpdateRequestHandler
>  > update
>  > INFO: added id={0A0A1BC3:01183F59ADDC:CBFA:008AEED0} in 187ms
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore execute
>  > INFO: /update  0 202
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
>  > commit
>  > INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
>  > doDeletions
>  > INFO: DirectUpdateHandler2 deleting and removing dups for 1 ids
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher 
>  > INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
>  > doDeletions
>  > INFO: DirectUpdateHandler2 docs deleted=0
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher 
>  > INFO: Opening [EMAIL PROTECTED] main
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
>  > INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>  >
>  > 
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
>  > INFO: autowarming result for [EMAIL PROTECTED] main
>  >
>  > 
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
>  > INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>  >
>  > 
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
>  > INFO: autowarming result for [EMAIL PROTECTED] main
>  >
>  > 
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
>  > INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>  >
>  > 
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
>  > INFO: autowarming result for [EMAIL PROTECTED] main
>  >
>  > 
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore registerSearcher
>  > INFO: Registered new searcher [EMAIL PROTECTED] main
>  > Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher close
>  > INFO: Closing [EMAIL PROTECTED] main
>  >
>  > 
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  >
>  > 
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  >
>  > 
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0

Re: Newbie question about search

2008-02-22 Thread x8nnn

I tried to verify the readerdir, which is fine.
Inside the index dir I can even see a created file, _7.fdt, which has all the
text content.
So why am I not getting it in search?
Santos

x8nnn wrote:
> 
> Recently I installed Solr.
> 
> I made changes to schema.xml, added following entries
> 
> 
>
>
>
>
> 
> Now I post a document like this:
> 0A0A1BC3:01183F59ADDC:CBFA:008AEED0
> 
>  Interoperability Demonstration Project Report
> 
> 
> 
> 
> 110 page of text...
> 
> 
> 
> 
> Once I post it I see following entry in my catalina.out. However when I go
> to solr search page and try to search any token in content sectionI do not
> get any thing returned. basically
> 
> 
> 
> 
> am I missing something?
> 
> SimplePostTool: WARNING: Make sure your XML documents are encoded in
> UTF-8, other encodings are not currently supported
> Feb 21, 2008 11:14:45 PM org.apache.solr.handler.XmlUpdateRequestHandler
> update
> INFO: added id={0A0A1BC3:01183F59ADDC:CBFA:008AEED0} in 187ms
> Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore execute
> INFO: /update  0 202
> Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
> commit
> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
> Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
> doDeletions
> INFO: DirectUpdateHandler2 deleting and removing dups for 1 ids
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher 
> INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
> Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
> doDeletions
> INFO: DirectUpdateHandler2 docs deleted=0
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher 
> INFO: Opening [EMAIL PROTECTED] main
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore registerSearcher
> INFO: Registered new searcher [EMAIL PROTECTED] main
> Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher close
> INFO: Closing [EMAIL PROTECTED] main
>
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
> commit
> INFO: end_commit_flush
> Feb 21, 2008 11:14:45 PM org.apache.solr.handler.XmlUpdateRequestHandler
> update
> INFO: commit 0 56
> Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore execute
> INFO: /update  0 56
> 

-- 
View this message in context: 
http://www.nabble.com/Newbie-question-about-search-tp15640877p15641411.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexing content, storing html

2008-02-22 Thread Paul deGrandis
Thanks, this is perfect for what I'm trying to do.

Paul

On 2/22/08, Reece <[EMAIL PROTECTED]> wrote:
> Well I don't remember the specific name of it, I just wrote that
>  because it sounded close :)
>
>  There is a list of them here though:
>  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
>  -Reece
>
>
>
>  On Fri, Feb 22, 2008 at 2:10 PM, Paul deGrandis
>
> <[EMAIL PROTECTED]> wrote:
>  > Thanks!
>  >
>  >  Does Solr include an HTMLTokenFilterFactory?
>  >
>  >  Paul
>  >
>  >
>  >
>  >  On 2/22/08, Reece <[EMAIL PROTECTED]> wrote:
>  >  > I did this as well, but found problems when searching (tags in between
>  >  >  words caused searching nightmares).  I recommend stripping out all the
>  >  >  tags using the HTMLTokenFilterFactory or your own regex when indexing,
>  >  >  and storing the actual HTML in an actual database.
>  >  >
>  >  >  If you really want to store the HTML though, you can use cdata in the
>  >  >  xml like this:
>  >  >
>  >  >  
>  >  > 
>  >  > 
>  >  > 123
>  >  >  name="title">
>  >  > 
>  >  >   
>  >  >
>  >  >  The CDATA thing will basically say anything between it's tag's will be
>  >  >  rendered as the field value.  It only breaks if your html string has a
>  >  >  "]]>" in it to end the data tag.
>  >  >
>  >  >
>  >  >  -Reece
>  >  >
>  >  >
>  >  >
>  >  >
>  >  >  On Fri, Feb 22, 2008 at 12:19 PM, Paul deGrandis
>  >  >  <[EMAIL PROTECTED]> wrote:
>  >  >  > Hi all,
>  >  >  >
>  >  >  >  I'm working on a solr app that pulls HTML from an embedded 
> JavaScript
>  >  >  >  WYSIWYG editor, and I need to index on the content, but store and
>  >  >  >  reproduce the HTML.  The problem I have is when I try to add and
>  >  >  >  commit, the HTML gets interpreted as XML.  Is the way to do this
>  >  >  >  properly to create an HTMLTokenFilterFactory?  And if so, is there a
>  >  >  >  collection of plugins (like filters and such) that someone can point
>  >  >  >  me to?
>  >  >  >
>  >  >  >  Regards,
>  >  >  >  Paul
>  >  >  >
>  >  >
>  >
>


Re: Indexing content, storing html

2008-02-22 Thread Reece
Well I don't remember the specific name of it, I just wrote that
because it sounded close :)

There is a list of them here though:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

-Reece



On Fri, Feb 22, 2008 at 2:10 PM, Paul deGrandis
<[EMAIL PROTECTED]> wrote:
> Thanks!
>
>  Does Solr include an HTMLTokenFilterFactory?
>
>  Paul
>
>
>
>  On 2/22/08, Reece <[EMAIL PROTECTED]> wrote:
>  > I did this as well, but found problems when searching (tags in between
>  >  words caused searching nightmares).  I recommend stripping out all the
>  >  tags using the HTMLTokenFilterFactory or your own regex when indexing,
>  >  and storing the actual HTML in an actual database.
>  >
>  >  If you really want to store the HTML though, you can use cdata in the
>  >  xml like this:
>  >
>  >  
>  > 
>  > 
>  > 123
>  > 
>  > 
>  >   
>  >
>  >  The CDATA thing will basically say anything between it's tag's will be
>  >  rendered as the field value.  It only breaks if your html string has a
>  >  "]]>" in it to end the data tag.
>  >
>  >
>  >  -Reece
>  >
>  >
>  >
>  >
>  >  On Fri, Feb 22, 2008 at 12:19 PM, Paul deGrandis
>  >  <[EMAIL PROTECTED]> wrote:
>  >  > Hi all,
>  >  >
>  >  >  I'm working on a solr app that pulls HTML from an embedded JavaScript
>  >  >  WYSIWYG editor, and I need to index on the content, but store and
>  >  >  reproduce the HTML.  The problem I have is when I try to add and
>  >  >  commit, the HTML gets interpreted as XML.  Is the way to do this
>  >  >  properly to create an HTMLTokenFilterFactory?  And if so, is there a
>  >  >  collection of plugins (like filters and such) that someone can point
>  >  >  me to?
>  >  >
>  >  >  Regards,
>  >  >  Paul
>  >  >
>  >
>


Re: Indexing content, storing html

2008-02-22 Thread Paul deGrandis
Thanks!

Does Solr include an HTMLTokenFilterFactory?

Paul

On 2/22/08, Reece <[EMAIL PROTECTED]> wrote:
> I did this as well, but found problems when searching (tags in between
>  words caused searching nightmares).  I recommend stripping out all the
>  tags using the HTMLTokenFilterFactory or your own regex when indexing,
>  and storing the actual HTML in an actual database.
>
>  If you really want to store the HTML though, you can use cdata in the
>  xml like this:
>
>  
> 
> 
> 123
> 
> 
>   
>
>  The CDATA thing will basically say anything between it's tag's will be
>  rendered as the field value.  It only breaks if your html string has a
>  "]]>" in it to end the data tag.
>
>
>  -Reece
>
>
>
>
>  On Fri, Feb 22, 2008 at 12:19 PM, Paul deGrandis
>  <[EMAIL PROTECTED]> wrote:
>  > Hi all,
>  >
>  >  I'm working on a solr app that pulls HTML from an embedded JavaScript
>  >  WYSIWYG editor, and I need to index on the content, but store and
>  >  reproduce the HTML.  The problem I have is when I try to add and
>  >  commit, the HTML gets interpreted as XML.  Is the way to do this
>  >  properly to create an HTMLTokenFilterFactory?  And if so, is there a
>  >  collection of plugins (like filters and such) that someone can point
>  >  me to?
>  >
>  >  Regards,
>  >  Paul
>  >
>


Re: Indexing content, storing html

2008-02-22 Thread Reece
I did this as well, but found problems when searching (tags in between
words caused searching nightmares).  I recommend stripping out all the
tags using the HTMLTokenFilterFactory or your own regex when indexing,
and storing the actual HTML in an actual database.

If you really want to store the HTML though, you can use cdata in the
xml like this:




123


  

The CDATA section basically says that anything between its markers will be
rendered as the literal field value.  It only breaks if your HTML string has
a "]]>" in it, which ends the section early.
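The usual workaround for the "]]>" caveat is to split that sequence so it can never terminate the section early. A minimal sketch in Java (illustrative only — this helper is not part of Solr; the class and method names are made up):

```java
// Illustrative helper: wrap raw HTML in a CDATA section, splitting any
// embedded "]]>" so it cannot close the section prematurely.
public final class CdataUtil {
    public static String wrap(String html) {
        // "]]>" becomes "]]]]><![CDATA[>": close the section after "]]",
        // then reopen a new one before ">".
        String safe = html.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    public static void main(String[] args) {
        // No "]]>" inside, so the input passes through unchanged
        System.out.println(wrap("<b>R&amp;D</b> report"));
        // Embedded "]]>" is split across two adjacent CDATA sections
        System.out.println(wrap("a]]>b"));
    }
}
```

An XML parser concatenates adjacent CDATA sections, so the split is invisible to the consumer of the field value.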

-Reece



On Fri, Feb 22, 2008 at 12:19 PM, Paul deGrandis
<[EMAIL PROTECTED]> wrote:
> Hi all,
>
>  I'm working on a solr app that pulls HTML from an embedded JavaScript
>  WYSIWYG editor, and I need to index on the content, but store and
>  reproduce the HTML.  The problem I have is when I try to add and
>  commit, the HTML gets interpreted as XML.  Is the way to do this
>  properly to create an HTMLTokenFilterFactory?  And if so, is there a
>  collection of plugins (like filters and such) that someone can point
>  me to?
>
>  Regards,
>  Paul
>


Newbie question about search

2008-02-22 Thread x8nnn

Recently I installed Solr.

I made changes to schema.xml and added the following entries:


   
   
   
   

Now I post a document like this:
0A0A1BC3:01183F59ADDC:CBFA:008AEED0

 Interoperability Demonstration Project Report




110 page of text...




Once I post it I see the following entries in my catalina.out. However, when I
go to the Solr search page and search for any token in the content section, I
do not get anything returned.




Am I missing something?

SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
Feb 21, 2008 11:14:45 PM org.apache.solr.handler.XmlUpdateRequestHandler
update
INFO: added id={0A0A1BC3:01183F59ADDC:CBFA:008AEED0} in 187ms
Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore execute
INFO: /update  0 202
Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
doDeletions
INFO: DirectUpdateHandler2 deleting and removing dups for 1 ids
Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2
doDeletions
INFO: DirectUpdateHandler2 docs deleted=0
Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] main
Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
   
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
   
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore registerSearcher
INFO: Registered new searcher [EMAIL PROTECTED] main
Feb 21, 2008 11:14:45 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing [EMAIL PROTECTED] main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
   
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
   
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Feb 21, 2008 11:14:45 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Feb 21, 2008 11:14:45 PM org.apache.solr.handler.XmlUpdateRequestHandler
update
INFO: commit 0 56
Feb 21, 2008 11:14:45 PM org.apache.solr.core.SolrCore execute
INFO: /update  0 56
-- 
View this message in context: 
http://www.nabble.com/Newbie-question-about-search-tp15640877p15640877.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solrj or any other solr java client

2008-02-22 Thread Otis Gospodnetic
Grab a nightly build, it should be in there.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
> From: Paul Treszczotko <[EMAIL PROTECTED]>
> To: "solr-user@lucene.apache.org" 
> Sent: Friday, February 22, 2008 1:32:37 PM
> Subject: solrj or any other solr java client
> 
> Hi all,
> Where can I find the latest and the greatest copy of SOLRJ or any other http 
> java client for solr?
> 
> pt
> 
> Paul Treszczotko
> Architect, Client Systems
> INPUT
> 11720 Plaza America Drive, Suite 1200 Reston, Virginia 20190
> Direct: 703-707-3524; Fax 703-707-6201
> This email and any files transmitted with it are confidential and are 
> intended 
> solely for the use of the individual or entity to which they are addressed. 
> If 
> you are not the intended recipient or the person responsible for delivering 
> the 
> email to the intended recipient, be advised that you have received this email 
> and any such files in error and that any use, dissemination, forwarding, 
> printing or copying of this email and/or any such files is strictly 
> prohibited. 
> If you have received this email in error please immediately notify 
> [EMAIL PROTECTED] and destroy the original message and any such files.
> 
> 
> 




solrj or any other solr java client

2008-02-22 Thread Paul Treszczotko
Hi all,
Where can I find the latest and the greatest copy of SOLRJ or any other http 
java client for solr?

pt

Paul Treszczotko
Architect, Client Systems
INPUT
11720 Plaza America Drive, Suite 1200 Reston, Virginia 20190
Direct: 703-707-3524; Fax 703-707-6201




Re: help with using ngram analyser needed

2008-02-22 Thread Otis Gospodnetic
Hi,

Append &debugQuery=true to your request URLs to see what's going on.

Here is something I've used in the past.  I suggest you throw out everything 
but n-grams while you're debugging.



  

  
  

  


...
...

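The field type Otis pasted here did not survive archiving. As a stand-in, a minimal n-gram field type might look like the following — an illustrative reconstruction, not the original; the name and gram sizes are assumptions:

```xml
<!-- Illustrative only: a bare-bones n-gram analysis chain for
     substring matching (gram sizes are assumptions) -->
<fieldtype name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```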

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
> From: Christian Wittern <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, February 22, 2008 4:32:08 AM
> Subject: help with using ngram analyser needed
> 
> Hi Solr users,
> 
> This is my first posting to this list, after experimenting with Solr
> for a few days.  Please bear with me.
> 
> I am trying to set up a text field for searching CJK text.  At the
> moment, I am trying using the ngram tokenizer factory, defined in the
> schema.xml as follows:
> 
> 
>   
> 
> 
> 
>   
>   
> 
> 
> synonyms="variants.txt" ignoreCase="true" expand="true"/>
> 
>   
> 
> 
> I can test this in the administrative interface and it seems to work.
> However, when I do searches, I only get matches for single character
> searches, or for searches that match a complete text field.  What I am
> trying to achieve is a substring match that would match any sequence
> of characters in the target field.
> 
> Any help appreciated,
> 
> Christian
> 
> 
> 
> -- 
> Christian Wittern, Kyoto
> 




Re: YAML update request handler

2008-02-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Without breaking the existing stuff we can add another interface:

interface BinaryQueryResponse extends QueryResponseWriter {
    void write(OutputStream out, SolrQueryRequest request,
            SolrQueryResponse response) throws IOException;
}

and in the SolrDispatchFilter do something like this:

QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
if (responseWriter instanceof BinaryQueryResponse) {
    BinaryQueryResponse binaryResp = (BinaryQueryResponse) responseWriter;
    binaryResp.write(response.getOutputStream(), solrReq, solrRsp);
} else {
    responseWriter.write(response.getWriter(), solrReq, solrRsp);
}

--Noble
On Fri, Feb 22, 2008 at 8:05 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> The DispatchFilter could probably be modified to have the option of
>  using the ServletOutputStream instead of the Writer.  It would take
>  some doing to maintain the proper compatibility, but it can be done, I
>  think.  Maybe we could have a /binary path or something along those
>  lines and SolrJ could use that.  QueryResponseWriter could be extended
>  to have a write method that takes an OutputStream.   Caveat:  I
>  haven't fully investigated this, but I do believe it makes sense for
>  SolrJ to use a binary format by default.  The other thing it should do
>  is make sure, when sending/receiving XML is that the XML is as "tight"
>  as possible, i.e. minimal whitespace, etc.
>
>  Just thinking out loud,
>  Grant
>
>  On Feb 22, 2008, at 8:29 AM, Noble Paul നോബിള്‍
>
>
> नोब्ळ् wrote:
>
>  > The API forbids use of any non-text format.
>  >
>  > The QueryResponseWriter's write() method can take only a Writer. So we
>  > cannot write any binary stream into that.
>  >
>  > --Noble
>  >
>  > On Fri, Feb 22, 2008 at 12:30 AM, Walter Underwood
>  > <[EMAIL PROTECTED]> wrote:
>  >> Python marshal format is worth a try. It is binary and can represent
>  >> the same data as JSON. It should be a good fit to Solr.
>  >>
>  >> We benchmarked that against XML several years ago and it was 2X
>  >> faster.
>  >> Of course, XML parsers are a lot faster now.
>  >>
>  >> wunder
>  >>
>  >>
>  >>
>  >> On 2/21/08 10:50 AM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>  >>
>  >>> XML can be a problem when it is really lengthy (lots of results,
>  >>> large
>  >>> results) such that a binary format could be useful in certain cases
>  >>> where we control both ends of the pipe (i.e. SolrJ.)  I've seen apps
>  >>> that deal with really large files wrapped in XML where the XML
>  >>> parsing
>  >>> takes a significant amount of time as compared to a more compact
>  >>> binary format.
>  >>>
>  >>> I think it at least warrants profiling/testing.
>  >>>
>  >>> -Grant
>  >>>
>  >>> On Feb 21, 2008, at 12:07 PM, Noble Paul നോബിള്‍
>  >>> नोब्ळ् wrote:
>  >>>
>   hi,
>   The format over the wire is not of great significance because it
>   gets
>   unmarshalled into the corresponding language object as soon as it
>   comes out
>   of the wire. I would say XML/JSON should meet 99% of the
>   requirements
>   because all the platforms come with an unmarshaller for both of
>   these.
>  
>   But,If it can offer good performance improvement it is worth
>   trying.
>   --Noble
>  
>   On Thu, Feb 21, 2008 at 3:41 AM, alexander lind <[EMAIL PROTECTED]>
>   wrote:
>  
>  > On Feb 20, 2008, at 9:31 AM, Doug Steigerwald wrote:
>  >
>  >> A few months back I wrote a YAML update request handler to see
>  >> if we
>  >> could post documents faster than with XMl.  We did see some small
>  >> speed improvements (didn't write down the numbers), but the
>  >> hacked
>  >> together code was probably making it slower as well.  Not sure if
>  >> there are faster YAML libraries out there either.
>  >>
>  >> We're not actually using it, since it was just a small proof of
>  >> concept type of project, but is this anything people might be
>  >> interested in?
>  >>
>  >
>  > Out of simple preference I would love to see a YAML request
>  > handler
>  > just because I like the YAML format. If its also faster than XML,
>  > then
>  > all the better.
>  >
>  > Cheers
>  > Alec
>  >
>  
>  
>  
>   --
>   --Noble Paul
>  >>>
>  >>> --
>  >>> Grant Ingersoll
>  >>> http://www.lucenebootcamp.com
>  >>> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
>  >>>
>  >>> Lucene Helpful Hints:
>  >>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>  >>> http://wiki.apache.org/lucene-java/LuceneFAQ
>  >>>
>  >>>
>  >>>
>  >>>
>  >>>
>  >>
>  >>
>  >
>  >
>  >
>  > --
>  > --Noble Paul
>
>  --
>  Grant Ingersoll
>  http://www.lucenebootcamp.com
>  Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
>
>  Lucene Helpful Hints:
>  http://wiki.apache.org/lucene-java/BasicsOfPerformance
>  http://wiki.apache.org/luce

Re: Filter query cache issues

2008-02-22 Thread Yonik Seeley
On Fri, Feb 22, 2008 at 12:22 PM, Matt M. <[EMAIL PROTECTED]> wrote:
>  I'm working with an index that contains 4,447,390 documents. The response
>  time for querying using facets is pretty darn slow. I'm fairly new to more
>  advanced Solr usage and today have started looking into the solrconfig.xml.
>  In the solr admin app, I noticed that the filterCache evictions were around
>  14,194,010 - is this saying that 14,194,010 items that were supposed to be
>  cached were not? Here are the stats as they stand currently. Would someone
>  mind looking at this and giving me an analysis of sorts?

The current faceting code only works well for certain term distributions:
1) single valued fields (where the Lucene FieldCache is used)
2) multi-valued fields with the number of unique terms <1000 or so.

Do a single faceted query, and then check how the lookups in the
filterCache changed.
That will be the number of unique terms, and the filterCache size
should be set to be larger than this so everything will be cached.
Right now, the hit rate is 0.  You may or may not have enough memory
to use this method... just try it out to find out.
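Sizing the cache per that advice is a solrconfig.xml change; a sketch (the size here is illustrative and should exceed the unique-term count you measure):

```xml
<!-- solrconfig.xml: size the filterCache above the number of unique
     terms in your faceted fields (300000 is an illustrative value) -->
<filterCache
  class="solr.LRUCache"
  size="300000"
  initialSize="4096"
  autowarmCount="4096"/>
```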

Sometime in the near future, we'll have a better faceting method for
multi-valued fields with many terms, provided that each document only
has a few terms on average:
https://issues.apache.org/jira/browse/SOLR-475

-Yonik

>  Thank you - matt
>
>  filterCache - STATS:
>
>  **
>  lookups : 14576040
>  hits : 49737
>  hitratio : 0.00
>  inserts : 14526389
>  evictions : 14194010
>  size : 2048
>  cumulative_lookups : 14576040
>  cumulative_hits : 49737
>  cumulative_hitratio : 0.00
>  cumulative_inserts : 14526389
>  cumulative_evictions : 14194010
>


Filter query cache issues

2008-02-22 Thread Matt M.
Hi,

I'm working with an index that contains 4,447,390 documents. The response
time for querying using facets is pretty darn slow. I'm fairly new to more
advanced Solr usage and today have started looking into the solrconfig.xml.
In the solr admin app, I noticed that the filterCache evictions were around
14,194,010 - is this saying that 14,194,010 items that were supposed to be
cached were not? Here are the stats as they stand currently. Would someone
mind looking at this and giving me an analysis of sorts?

Thank you - matt

filterCache - STATS:

**
lookups : 14576040
hits : 49737
hitratio : 0.00
inserts : 14526389
evictions : 14194010
size : 2048
cumulative_lookups : 14576040
cumulative_hits : 49737
cumulative_hitratio : 0.00
cumulative_inserts : 14526389
cumulative_evictions : 14194010


Indexing content, storing html

2008-02-22 Thread Paul deGrandis
Hi all,

I'm working on a solr app that pulls HTML from an embedded JavaScript
WYSIWYG editor, and I need to index on the content, but store and
reproduce the HTML.  The problem I have is when I try to add and
commit, the HTML gets interpreted as XML.  Is the way to do this
properly to create an HTMLTokenFilterFactory?  And if so, is there a
collection of plugins (like filters and such) that someone can point
me to?

Regards,
Paul
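One common way to keep raw HTML from being parsed as markup in the add document is to XML-escape it before it goes into the field value. This is a hedged sketch of that general technique, not an official Solr utility; the class and method names are made up for illustration:

```java
// Hedged sketch: minimal XML escaping so stored HTML survives the trip
// through Solr's XML update format. Names here are illustrative only.
public class EscapeHtmlForSolr {
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hello & <b>world</b></p>";
        // The escaped text can be placed inside a <field> element of the
        // add document; Solr stores it, and XML-aware clients unescape
        // the query response back to the original HTML.
        System.out.println(escapeXml(html));
        // prints: &lt;p&gt;Hello &amp; &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;
    }
}
```

Wrapping the value in a CDATA section is another common approach, with the caveat that the content must not itself contain the sequence `]]>`.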


Re: Error messages in log, but everything seems fine

2008-02-22 Thread Yonik Seeley
On Fri, Feb 22, 2008 at 7:38 AM, amamare <[EMAIL PROTECTED]> wrote:
>  Hi,
>  Solr apparently writes loads of error messages with every update, commit,
>  search etc. Everything seems to be fine, searching and indexing is correct
>  and fast, but we are concerned it might affect other parts of the system if
>  they are in fact symptoms of errors internal to Solr. It seems that one
>  error message is being logged before every info message, like this:

These are logged at the INFO level (there is no error, just
informational).  You should be able to change your logging
configuration to remove all but real errors from being logged if you
want.

-Yonik


>  12:55:50,816 ERROR [STDERR] 22.feb.2008 12:55:50
>  org.apache.solr.update.DirectUpdateHandler2 commit
>  INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>  12:55:50,816 ERROR [STDERR] 22.feb.2008 12:55:50
>  org.apache.solr.update.DirectUpdateHandler2 doDeletions
>  INFO: DirectUpdateHandler2 deleting and removing dups for 1457 ids
>  12:55:51,269 ERROR [STDERR] 22.feb.2008 12:55:51
>  org.apache.solr.search.SolrIndexSearcher 
>  INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
>  12:55:51,300 ERROR [STDERR] 22.feb.2008 12:55:51
>  org.apache.solr.update.DirectUpdateHandler2 doDeletions
>  INFO: DirectUpdateHandler2 docs deleted=1457
>  12:55:51,300 ERROR [STDERR] 22.feb.2008 12:55:51
>  org.apache.solr.search.SolrIndexSearcher 
>  INFO: Opening [EMAIL PROTECTED] main
>  12:55:51,316 ERROR [STDERR] 22.feb.2008 12:55:51
>  org.apache.solr.search.SolrIndexSearcher warm
>  INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
>  
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  12:55:51,316 ERROR [STDERR] 22.feb.2008 12:55:51
>  org.apache.solr.search.SolrIndexSearcher warm
>  INFO: autowarming result for [EMAIL PROTECTED] main
>
>  
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>  12:55:51,316 ERROR [STDERR] 22.feb.2008 12:55:51
>  org.apache.solr.search.SolrIndexSearcher warm
>
>  I've looked in the source code of the relevant classes, but I can't find the
>  source of the log messages (which really don't seem to contain much
>  information).
>
>  Anyone know why this happens and how I can fix it?
>
>  Thanks,
>  Laila
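Yonik's suggestion above amounts to a logging-configuration change. A hedged sketch of a JDK logging.properties that suppresses everything below real warnings (the file path and levels are illustrative; under a container like JBoss that wraps STDERR as ERROR, the container's own appender configuration may also need adjusting):

```properties
# Illustrative JDK logging configuration; pass it to the JVM with
#   -Djava.util.logging.config.file=/path/to/logging.properties
# Levels are examples, not values from the thread.
handlers = java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = WARNING
.level = WARNING
org.apache.solr.level = WARNING
```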


RE: multiple "things" in a document

2008-02-22 Thread Will Johnson
Usually you do something like the following (assuming the data lives in an RDBMS):

SELECT sku.id as skuid, sku.name as skuname, item.name as itemname,
location.name as locationname 
FROM sku, item, location
WHERE sku.item = item.id AND sku.location = location.id

Then you can search on any part of the 'flat' record and know which field
comes from where.

Depending on the size of your corpus and the type of queries you want to be
able to serve, there are a million ways to optimize this, but this should get
you up and searching quickly enough.

- will
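The flattened row from the SELECT above might map onto schema.xml like this (a sketch: only the field names come from the SQL aliases; the types, attributes, and uniqueKey choice are assumptions):

```xml
<!-- Hedged sketch: one Solr document per row of the join above.
     Field names mirror the SQL aliases; types/attributes are illustrative. -->
<field name="skuid"        type="string" indexed="true" stored="true"/>
<field name="skuname"      type="text"   indexed="true" stored="true"/>
<field name="itemname"     type="text"   indexed="true" stored="true"/>
<field name="locationname" type="text"   indexed="true" stored="true"/>
<uniqueKey>skuid</uniqueKey>
```

Searching against skuname versus itemname then distinguishes where a match came from, which addresses the discrimination question in the original post.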



-Original Message-
From: Geoffrey Young [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 22, 2008 9:19 AM
To: solr-user@lucene.apache.org
Subject: multiple "things" in a document

hi all :)

I'm just getting up to speed with solr (and lucene, for that matter) for 
a new project.  after reading through the available docs I'm not finding 
an answer to my most basic (newbie, certainly) question.  please feel 
free to just point me to the proper doc :)

this isn't my actual use case, but it's close enough for general 
understanding... say I want to store data on a collection of SKUs which 
(for the unfamiliar :) are a combination of item + location.  so we 
might have

   sku
 id
 name
 item
 location

   item
 id
 name

   location
 id
 name

all of the schema.xml examples seem to deal with just a flat "thing" 
perhaps with multiple entries of the same field.  what I'm after is how 
to represent this kind of relationship in the schema, such that I can 
limit my result set to, say, a sku or item, but if I search on sku I can 
discriminate between the sku name and the item name in my results.

from my reading on lucene this is pretty basic stuff, but I don't see 
how the solr layer approaches this at all.  again, doc pointers much 
appreciated.

thanks for listening :)

--Geoff



Re: YAML update request handler

2008-02-22 Thread Grant Ingersoll
The DispatchFilter could probably be modified to have the option of  
using the ServletOutputStream instead of the Writer.  It would take  
some doing to maintain the proper compatibility, but it can be done, I  
think.  Maybe we could have a /binary path or something along those  
lines and SolrJ could use that.  QueryResponseWriter could be extended  
to have a write method that takes an OutputStream.   Caveat:  I  
haven't fully investigated this, but I do believe it makes sense for  
SolrJ to use a binary format by default.  The other thing it should do  
is make sure, when sending/receiving XML, that the XML is as "tight"  
as possible, i.e. minimal whitespace, etc.


Just thinking out loud,
Grant

On Feb 22, 2008, at 8:29 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



> The API forbids use of any non-text format.
>
> The QueryResponseWriter's write() method can take only a Writer. So we
> cannot write any binary stream into that.
>
> --Noble
>
> On Fri, Feb 22, 2008 at 12:30 AM, Walter Underwood
> <[EMAIL PROTECTED]> wrote:
>> Python marshal format is worth a try. It is binary and can represent
>> the same data as JSON. It should be a good fit to Solr.
>>
>> We benchmarked that against XML several years ago and it was 2X faster.
>> Of course, XML parsers are a lot faster now.
>>
>> wunder
>>
>> On 2/21/08 10:50 AM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>>
>>> XML can be a problem when it is really lengthy (lots of results, large
>>> results) such that a binary format could be useful in certain cases
>>> where we control both ends of the pipe (i.e. SolrJ.)  I've seen apps
>>> that deal with really large files wrapped in XML where the XML parsing
>>> takes a significant amount of time as compared to a more compact
>>> binary format.
>>>
>>> I think it at least warrants profiling/testing.
>>>
>>> -Grant
>>>
>>> On Feb 21, 2008, at 12:07 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>>> hi,
>>>> The format over the wire is not of great significance because it gets
>>>> unmarshalled into the corresponding language object as soon as it
>>>> comes out of the wire. I would say XML/JSON should meet 99% of the
>>>> requirements because all the platforms come with an unmarshaller for
>>>> both of these.
>>>>
>>>> But,If it can offer good performance improvement it is worth trying.
>>>> --Noble
>>>>
>>>> On Thu, Feb 21, 2008 at 3:41 AM, alexander lind <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> On Feb 20, 2008, at 9:31 AM, Doug Steigerwald wrote:
>>>>>
>>>>>> A few months back I wrote a YAML update request handler to see if we
>>>>>> could post documents faster than with XMl.  We did see some small
>>>>>> speed improvements (didn't write down the numbers), but the hacked
>>>>>> together code was probably making it slower as well.  Not sure if
>>>>>> there are faster YAML libraries out there either.
>>>>>>
>>>>>> We're not actually using it, since it was just a small proof of
>>>>>> concept type of project, but is this anything people might be
>>>>>> interested in?
>>>>>
>>>>> Out of simple preference I would love to see a YAML request handler
>>>>> just because I like the YAML format. If its also faster than XML, then
>>>>> all the better.
>>>>>
>>>>> Cheers
>>>>> Alec
>>>>
>>>> --
>>>> --Noble Paul
>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucenebootcamp.com
>>> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>
> --
> --Noble Paul


--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







multiple "things" in a document

2008-02-22 Thread Geoffrey Young

hi all :)

I'm just getting up to speed with solr (and lucene, for that matter) for 
a new project.  after reading through the available docs I'm not finding 
an answer to my most basic (newbie, certainly) question.  please feel 
free to just point me to the proper doc :)


this isn't my actual use case, but it's close enough for general 
understanding... say I want to store data on a collection of SKUs which 
(for the unfamiliar :) are a combination of item + location.  so we 
might have


  sku
id
name
item
location

  item
id
name

  location
id
name

all of the schema.xml examples seem to deal with just a flat "thing" 
perhaps with multiple entries of the same field.  what I'm after is how 
to represent this kind of relationship in the schema, such that I can 
limit my result set to, say, a sku or item, but if I search on sku I can 
discriminate between the sku name and the item name in my results.


from my reading on lucene this is pretty basic stuff, but I don't see 
how the solr layer approaches this at all.  again, doc pointers much 
appreciated.


thanks for listening :)

--Geoff


Re: YAML update request handler

2008-02-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
The API forbids use of any non-text format.

The QueryResponseWriter's write() method can take only a Writer. So we
cannot write any binary stream into that.

--Noble

On Fri, Feb 22, 2008 at 12:30 AM, Walter Underwood
<[EMAIL PROTECTED]> wrote:
> Python marshal format is worth a try. It is binary and can represent
>  the same data as JSON. It should be a good fit to Solr.
>
>  We benchmarked that against XML several years ago and it was 2X faster.
>  Of course, XML parsers are a lot faster now.
>
>  wunder
>
>
>
>  On 2/21/08 10:50 AM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>
>  > XML can be a problem when it is really lengthy (lots of results, large
>  > results) such that a binary format could be useful in certain cases
>  > where we control both ends of the pipe (i.e. SolrJ.)  I've seen apps
>  > that deal with really large files wrapped in XML where the XML parsing
>  > takes a significant amount of time as compared to a more compact
>  > binary format.
>  >
>  > I think it at least warrants profiling/testing.
>  >
>  > -Grant
>  >
>  > On Feb 21, 2008, at 12:07 PM, Noble Paul നോബിള്‍
>  > नोब्ळ् wrote:
>  >
>  >> hi,
>  >> The format over the wire is not of great significance because it gets
>  >> unmarshalled into the corresponding language object as soon as it
>  >> comes out
>  >> of the wire. I would say XML/JSON should meet 99% of the requirements
>  >> because all the platforms come with an unmarshaller for both of these.
>  >>
>  >> But,If it can offer good performance improvement it is worth trying.
>  >> --Noble
>  >>
>  >> On Thu, Feb 21, 2008 at 3:41 AM, alexander lind <[EMAIL PROTECTED]>
>  >> wrote:
>  >>
>  >>> On Feb 20, 2008, at 9:31 AM, Doug Steigerwald wrote:
>  >>>
>   A few months back I wrote a YAML update request handler to see if we
>   could post documents faster than with XMl.  We did see some small
>   speed improvements (didn't write down the numbers), but the hacked
>   together code was probably making it slower as well.  Not sure if
>   there are faster YAML libraries out there either.
>  
>   We're not actually using it, since it was just a small proof of
>   concept type of project, but is this anything people might be
>   interested in?
>  
>  >>>
>  >>> Out of simple preference I would love to see a YAML request handler
>  >>> just because I like the YAML format. If its also faster than XML,
>  >>> then
>  >>> all the better.
>  >>>
>  >>> Cheers
>  >>> Alec
>  >>>
>  >>
>  >>
>  >>
>  >> --
>  >> --Noble Paul
>  >
>  > --
>  > Grant Ingersoll
>  > http://www.lucenebootcamp.com
>  > Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
>  >
>  > Lucene Helpful Hints:
>  > http://wiki.apache.org/lucene-java/BasicsOfPerformance
>  > http://wiki.apache.org/lucene-java/LuceneFAQ
>  >
>  >
>  >
>  >
>  >
>
>



-- 
--Noble Paul


Error messages in log, but everything seems fine

2008-02-22 Thread amamare

Hi,
Solr apparently writes loads of error messages with every update, commit,
search etc. Everything seems to be fine, searching and indexing is correct
and fast, but we are concerned it might affect other parts of the system if
they are in fact symptoms of errors internal to Solr. It seems that one
error message is being logged before every info message, like this:

12:55:50,816 ERROR [STDERR] 22.feb.2008 12:55:50
org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
12:55:50,816 ERROR [STDERR] 22.feb.2008 12:55:50
org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 deleting and removing dups for 1457 ids
12:55:51,269 ERROR [STDERR] 22.feb.2008 12:55:51
org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
12:55:51,300 ERROR [STDERR] 22.feb.2008 12:55:51
org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 docs deleted=1457
12:55:51,300 ERROR [STDERR] 22.feb.2008 12:55:51
org.apache.solr.search.SolrIndexSearcher 
INFO: Opening [EMAIL PROTECTED] main
12:55:51,316 ERROR [STDERR] 22.feb.2008 12:55:51
org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
12:55:51,316 ERROR [STDERR] 22.feb.2008 12:55:51
org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
12:55:51,316 ERROR [STDERR] 22.feb.2008 12:55:51
org.apache.solr.search.SolrIndexSearcher warm

I've looked in the source code of the relevant classes, but I can't find the
source of the log messages (which really don't seem to contain much
information).

Anyone know why this happens and how I can fix it?

Thanks,
Laila
-- 
View this message in context: 
http://www.nabble.com/Error-messages-in-log%2C-but-everything-seems-fine-tp15632885p15632885.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr to work for my web application

2008-02-22 Thread Thorsten Scherler
On Fri, 2008-02-22 at 04:11 -0800, newBea wrote:
> Hi Thorsten,
> 
> Many thanks for ur replies so far...finally i set up correct environment for
> Solr. Its working:clap:

:)

Congrats, glad you got it running.

> 
> Solr Rocks!

Indeed. :)

salu2

> 
> Thorsten Scherler wrote:
> > 
> > On Thu, 2008-02-14 at 23:16 -0800, newBea wrote:
> >> Hi Thorsten...
> >> 
> >> Sorry for giving u much trouble but I need some answers regarding
> >> solr...plz
> >> help...
> >> 
> >> Question1
> >> I am using tomcat 5.5.23 so for JNDI setup of solr, adding solr.xml with
> >> context fragment as below in the tomcat5.5/...catalina/localhost.
> >> 
> >> 
> >> >> value="D:/Projects/csdb/solr" override="true" />
> >> 
> >> 
> >> Is it the correct way of doing it? 
> > 
> > Yes as I understand the wiki page.
> > 
> >> Or do I need to add context fragment in
> >> the server.xml of tomcat5.5?
> >> 
> >> Question2
> >> I am starting solr server using start.jar from another location on C:
> >> drive...whereas my home location indicated on D: drive. Is it the root
> >> coz I
> >> am not getting the search result?
> > 
> > Hmm, as I understand it you are starting two instances of Solr! One on
> > Tomcat and the other on Jetty. Why do you want that? If you have Solr on
> > Tomcat you do not need the Jetty instance anymore. It makes no sense under
> > normal circumstances to do this.
> > 
> >> 
> >> Question3
> >> I have added parameter as C:\solr\data in
> >> solrconfig.xml...
> > 
> > That seems to be wrong. It should read ${solr.data.dir:C:\solr
> > \dat} but I am not using windows so I am not sure whether you
> > may need to escape the path.
> > 
> > salu2
> > 
> >> but the indexes are not getting stored there...indexes for
> >> search are getting stored in the default dir of solr...any suggestions
> >> 
> >> Thanks in advance...
> >> 
> >> 
> >> Thorsten Scherler wrote:
> >> > 
> >> > On Wed, 2008-02-13 at 05:04 -0800, newBea wrote:
> >> >> I havnt used luke.xsl. Ya but the link provided by u gives me "Solr
> >> Luke
> >> >> Request Handler Response"...
> >> >> 
> >> >>  is simple string as: csid
> >> > 
> >> > So you have:
> >> > csid
> >> > 
> >> > and
> >> >  >> > required="true" /> 
> >> > 
> >> > 
> >> >> 
> >> >> till now I am updating docs thru command prompt as : post.jar *.xml
> >> >> http://localhost:8983/update
> >> > 
> >> > how do the docs look like? I mean since you changed the sample config
> >> > you send changed documents as well, right? How do they look?
> >> > 
> >> >> 
> >> >> I am not clear on how do I post xml docs
> >> > 
> >> > Well like you said, with the post.jar and then you will send your
> >> > modified docs but there are many ways to trigger an add command to
> >> solr.
> >> > 
> >> >>  or wud xml docs be posted while I
> >> >> request solr thru tomcat at the time of searching text...
> >> > 
> >> > To search text from tomcat you will need to have a servlet or something
> >> > similar that contacts the solr server for the search result and the
> >> > handle the response (e.g. apply custom xsl to the results).
> >> > 
> >> > 
> >> > 
> >> >> 
> >> >> This manually procedure when I update the xml docs on exampledocs
> >> folder
> >> >> inside distribution package restrict it to exampledocs itself
> >> > 
> >> > No, either copy the jar to the folder where you have your documents or
> >> > add it to the PATH.
> >> > 
> >> >> ...I am not
> >> >> getting a way where my sites text get searched by solr...Do I need to
> >> >> copy
> >> >> start.jar and relevant folders in my working directory for web
> >> >> application.
> >> > 
> >> > Hmm, it seems that you not have understood the second paragraph of 
> >> > http://wiki.apache.org/solr/mySolr
> >> > 
> >> > "Typically it's not recommended to have your front end users/clients
> >> > hitting Solr directly as part of an HTML form submit ... the more
> >> > conventional way to think of it is that Solr is a backend service,
> >> which
> >> > your application can talk to over HTTP ..."
> >> > 
> >> > Meaning you have two different server running. Alternatively you can
> >> run
> >> > solr in the same tomcat as you application. If you follow SolrTomcat
> >> > from the wiki it will be install as "solr" servlet. Your application
> >> > will then communicate with this serlvet.
> >> > 
> >> > salu2
> >> > 
> >> >> 
> >> >> any help?
> >> >> 
> >> >> Thorsten Scherler-3 wrote:
> >> >> > 
> >> >> > On Wed, 2008-02-13 at 03:42 -0800, newBea wrote:
> >> >> >> Hi Thorsten,
> >> >> >> 
> >> >> >> I have my application running on 8080 port with tomcat 5.5.23I
> >> am
> >> >> >> starting solr on port 8983 with jetty server using command "java
> >> -jar
> >> >> >> start.jar".
> >> >> >> 
> >> >> >> Both the server gets started...now any search I make on tomcat
> >> >> >> application
> >> >> >> is interacting with solr very well. The problem is "schema.xml" and
> >> >> >> "solrconfig.xml" in the conf directory are default one. But after
> >> >> adding
> >> >> >> customized sch

Re: solr to work for my web application

2008-02-22 Thread newBea

Hi Thorsten,

Many thanks for ur replies so far...finally i set up correct environment for
Solr. Its working:clap:

Solr Rocks!

Thorsten Scherler wrote:
> 
> On Thu, 2008-02-14 at 23:16 -0800, newBea wrote:
>> Hi Thorsten...
>> 
>> Sorry for giving u much trouble but I need some answers regarding
>> solr...plz
>> help...
>> 
>> Question1
>> I am using tomcat 5.5.23 so for JNDI setup of solr, adding solr.xml with
>> context fragment as below in the tomcat5.5/...catalina/localhost.
>> 
>> 
>>> value="D:/Projects/csdb/solr" override="true" />
>> 
>> 
>> Is it the correct way of doing it? 
> 
> Yes as I understand the wiki page.
> 
>> Or do I need to add context fragment in
>> the server.xml of tomcat5.5?
>> 
>> Question2
>> I am starting solr server using start.jar from another location on C:
>> drive...whereas my home location indicated on D: drive. Is it the root
>> coz I
>> am not getting the search result?
> 
> Hmm, as I understand it you are starting two instances of Solr! One on
> Tomcat and the other on Jetty. Why do you want that? If you have Solr on
> Tomcat you do not need the Jetty instance anymore. It makes no sense under
> normal circumstances to do this.
> 
>> 
>> Question3
>> I have added parameter as C:\solr\data in
>> solrconfig.xml...
> 
> That seems to be wrong. It should read ${solr.data.dir:C:\solr
> \dat} but I am not using windows so I am not sure whether you
> may need to escape the path.
> 
> salu2
> 
>> but the indexes are not getting stored there...indexes for
>> search are getting stored in the default dir of solr...any suggestions
>> 
>> Thanks in advance...
>> 
>> 
>> Thorsten Scherler wrote:
>> > 
>> > On Wed, 2008-02-13 at 05:04 -0800, newBea wrote:
>> >> I havnt used luke.xsl. Ya but the link provided by u gives me "Solr
>> Luke
>> >> Request Handler Response"...
>> >> 
>> >>  is simple string as: csid
>> > 
>> > So you have:
>> > csid
>> > 
>> > and
>> > > > required="true" /> 
>> > 
>> > 
>> >> 
>> >> till now I am updating docs thru command prompt as : post.jar *.xml
>> >> http://localhost:8983/update
>> > 
>> > how do the docs look like? I mean since you changed the sample config
>> > you send changed documents as well, right? How do they look?
>> > 
>> >> 
>> >> I am not clear on how do I post xml docs
>> > 
>> > Well like you said, with the post.jar and then you will send your
>> > modified docs but there are many ways to trigger an add command to
>> solr.
>> > 
>> >>  or wud xml docs be posted while I
>> >> request solr thru tomcat at the time of searching text...
>> > 
>> > To search text from tomcat you will need to have a servlet or something
>> > similar that contacts the solr server for the search result and the
>> > handle the response (e.g. apply custom xsl to the results).
>> > 
>> > 
>> > 
>> >> 
>> >> This manually procedure when I update the xml docs on exampledocs
>> folder
>> >> inside distribution package restrict it to exampledocs itself
>> > 
>> > No, either copy the jar to the folder where you have your documents or
>> > add it to the PATH.
>> > 
>> >> ...I am not
>> >> getting a way where my sites text get searched by solr...Do I need to
>> >> copy
>> >> start.jar and relevant folders in my working directory for web
>> >> application.
>> > 
>> > Hmm, it seems that you not have understood the second paragraph of 
>> > http://wiki.apache.org/solr/mySolr
>> > 
>> > "Typically it's not recommended to have your front end users/clients
>> > hitting Solr directly as part of an HTML form submit ... the more
>> > conventional way to think of it is that Solr is a backend service,
>> which
>> > your application can talk to over HTTP ..."
>> > 
>> > Meaning you have two different server running. Alternatively you can
>> run
>> > solr in the same tomcat as you application. If you follow SolrTomcat
>> > from the wiki it will be install as "solr" servlet. Your application
>> > will then communicate with this serlvet.
>> > 
>> > salu2
>> > 
>> >> 
>> >> any help?
>> >> 
>> >> Thorsten Scherler-3 wrote:
>> >> > 
>> >> > On Wed, 2008-02-13 at 03:42 -0800, newBea wrote:
>> >> >> Hi Thorsten,
>> >> >> 
>> >> >> I have my application running on 8080 port with tomcat 5.5.23I
>> am
>> >> >> starting solr on port 8983 with jetty server using command "java
>> -jar
>> >> >> start.jar".
>> >> >> 
>> >> >> Both the server gets started...now any search I make on tomcat
>> >> >> application
>> >> >> is interacting with solr very well. The problem is "schema.xml" and
>> >> >> "solrconfig.xml" in the conf directory are default one. But after
>> >> adding
>> >> >> customized schema name parameter and required fields, solr is not
>> >> working
>> >> >> as
>> >> >> required.
>> >> > 
>> >> > Can you post the modification you made to both files?
>> >> > 
>> >> >> 
>> >> >> Customized code for parsing the xml generated from solr is working
>> >> >> fine...but it is unable to find the uniquekey field which we set
>> for
>> >> all
>> >> >> the
>> >> >> documents in the schema document...

Re: YAML update request handler

2008-02-22 Thread Grant Ingersoll

See https://issues.apache.org/jira/browse/SOLR-476

On Feb 22, 2008, at 5:17 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



> The SolrJ client is designed with the ResponseParser as an abstract
> class (which is good). But I have no means to plugin my custom
> ResponseParser class.
> Add a setter method, setResponseParser(ResponseParser parser),
> and have a lazy initialization of the ResponseParser:
> if(_processor == null) _processor = new XMLResponseParser();
>
> in the beginning of the request method.
>
> While it is a good idea to use commons HttpClient, it is a huge ball
> and chain to put those extra jars (commons-httpclient,
> commons-logging, commons-codec) in my simple client application. It
> is too much to ask by a client API which is just supposed to parse an
> xml response.
>
> If httpclient is not available we must be able to fall back to new
> URL().openConnection();
>
> --Noble
>
> On Fri, Feb 22, 2008 at 9:46 AM, Noble Paul നോബിള്‍ नोब्ळ्
> <[EMAIL PROTECTED]> wrote:
>> For the case where we use Solrj (we control both ends) It is best to
>> resort to a custom binary format. It works fastest and with least
>> cost/bandwidth. We can use a custom object serialization/
>> deserialization mechanism (java standard serialization is verbose)
>> which is lightweight.
>>
>> I can create a patch which can be used for the same if you think it
>> is useful.
>>
>> --Noble
>>
>> On Fri, Feb 22, 2008 at 12:20 AM, Grant Ingersoll
>> <[EMAIL PROTECTED]> wrote:
>>> XML can be a problem when it is really lengthy (lots of results, large
>>> results) such that a binary format could be useful in certain cases
>>> where we control both ends of the pipe (i.e. SolrJ.)  I've seen apps
>>> that deal with really large files wrapped in XML where the XML parsing
>>> takes a significant amount of time as compared to a more compact
>>> binary format.
>>>
>>> I think it at least warrants profiling/testing.
>>>
>>> -Grant
>>>
>>> On Feb 21, 2008, at 12:07 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>>> hi,
>>>> The format over the wire is not of great significance because it gets
>>>> unmarshalled into the corresponding language object as soon as it
>>>> comes out of the wire. I would say XML/JSON should meet 99% of the
>>>> requirements because all the platforms come with an unmarshaller for
>>>> both of these.
>>>>
>>>> But,If it can offer good performance improvement it is worth trying.
>>>> --Noble
>>>>
>>>> On Thu, Feb 21, 2008 at 3:41 AM, alexander lind <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> On Feb 20, 2008, at 9:31 AM, Doug Steigerwald wrote:
>>>>>
>>>>>> A few months back I wrote a YAML update request handler to see if we
>>>>>> could post documents faster than with XMl.  We did see some small
>>>>>> speed improvements (didn't write down the numbers), but the hacked
>>>>>> together code was probably making it slower as well.  Not sure if
>>>>>> there are faster YAML libraries out there either.
>>>>>>
>>>>>> We're not actually using it, since it was just a small proof of
>>>>>> concept type of project, but is this anything people might be
>>>>>> interested in?
>>>>>
>>>>> Out of simple preference I would love to see a YAML request handler
>>>>> just because I like the YAML format. If its also faster than XML, then
>>>>> all the better.
>>>>>
>>>>> Cheers
>>>>> Alec
>>>>
>>>> --
>>>> --Noble Paul
>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucenebootcamp.com
>>> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>
> --
> --Noble Paul


--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







Re: YAML update request handler

2008-02-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
The SolrJ client is designed with the ResponseParser as an abstract
class (which is good). But I have no means to plugin my custom
ResponseParser class.
Add a setter method, setResponseParser(ResponseParser parser),
and have a lazy initialization of the ResponseParser:
if(_processor == null) _processor = new XMLResponseParser();

in the beginning of the request method.

While it is a good idea to use commons HttpClient, it is a huge ball
and chain to put those extra jars (commons-httpclient,
commons-logging, commons-codec) in my simple client application. It
is too much to ask by a client API which is just supposed to parse an
xml response.

If httpclient  is not available we must be able to fall back to new
URL().openConnection();

--Noble

On Fri, Feb 22, 2008 at 9:46 AM, Noble Paul നോബിള്‍ नोब्ळ्
<[EMAIL PROTECTED]> wrote:
> For the case where we use Solrj (we control both ends) It is best to resort 
> to a custom binary format. It works fastest and with least cost /bandwidth . 
> We can use a custom object serialization/deserialization mechanism (java 
> standard serialization is verbose ) which is lightweight .
>
> I can create a patch which can be used for the same if you think it is useful.
>
> --Noble
>
>
>
>
>
>
>
> On Fri, Feb 22, 2008 at 12:20 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> > XML can be a problem when it is really lengthy (lots of results, large
> > results) such that a binary format could be useful in certain cases
> > where we control both ends of the pipe (i.e. SolrJ.)  I've seen apps
> > that deal with really large files wrapped in XML where the XML parsing
> > takes a significant amount of time as compared to a more compact
> > binary format.
> >
> > I think it at least warrants profiling/testing.
> >
> > -Grant
> >
> > On Feb 21, 2008, at 12:07 PM, Noble Paul നോബിള്‍
> >
> >
> >
> > नोब्ळ् wrote:
> >
> > > hi,
> > > The format over the wire is not of great significance because it gets
> > > unmarshalled into the corresponding language object as soon as it
> > > comes out
> > > of the wire. I would say XML/JSON should meet 99% of the requirements
> > > because all the platforms come with an unmarshaller for both of these.
> > >
> > > But,If it can offer good performance improvement it is worth trying.
> > > --Noble
> > >
> > > On Thu, Feb 21, 2008 at 3:41 AM, alexander lind <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > >> On Feb 20, 2008, at 9:31 AM, Doug Steigerwald wrote:
> > >>
> > >>> A few months back I wrote a YAML update request handler to see if we
> > >>> could post documents faster than with XMl.  We did see some small
> > >>> speed improvements (didn't write down the numbers), but the hacked
> > >>> together code was probably making it slower as well.  Not sure if
> > >>> there are faster YAML libraries out there either.
> > >>>
> > >>> We're not actually using it, since it was just a small proof of
> > >>> concept type of project, but is this anything people might be
> > >>> interested in?
> > >>>
> > >>
> > >> Out of simple preference I would love to see a YAML request handler
> > >> just because I like the YAML format. If its also faster than XML,
> > >> then
> > >> all the better.
> > >>
> > >> Cheers
> > >> Alec
> > >>
> > >
> > >
> > >
> > > --
> > > --Noble Paul
> >
> > --
> > Grant Ingersoll
> > http://www.lucenebootcamp.com
> > Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
> >
>
>
>
> --
> --Noble Paul



-- 
--Noble Paul


help with using ngram analyser needed

2008-02-22 Thread Christian Wittern
Hi Solr users,

This is my first posting to this list, after experimenting with Solr
for a few days.  Please bear with me.

I am trying to set up a text field for searching CJK text.  At the
moment, I am trying to use the ngram tokenizer factory, defined in
schema.xml as follows:

[the fieldType definition using the ngram tokenizer factory was stripped
when this message was archived]

I can test this in the administrative interface and it seems to work.
However, when I do searches, I only get matches for single character
searches, or for searches that match a complete text field.  What I am
trying to achieve is a substring match that would match any sequence
of characters in the target field.
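Since the fieldType XML did not survive the archive, here is roughly what an n-gram field type of that era looks like; the gram sizes and the field-type name are illustrative, not recovered from the original mail:

```xml
<fieldType name="text_cjk" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping the index-time and query-time chains identical matters: if the two sides produce different grams, terms will not line up in the index.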

Any help appreciated,

Christian



-- 
Christian Wittern, Kyoto
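An aside on the behavior Christian describes: the tokenizer emits every n-gram of the input on both the index and query side, and the query parser then combines those grams. A tiny stand-alone sketch of what an n-gram tokenizer produces (plain Java mimicking the factory's output, not a call into Lucene):

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Emits all n-grams of length minGram..maxGram in reading order,
    // the way an n-gram tokenizer would.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<String>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(text.substring(i, i + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With minGram=1, maxGram=2 a query of "kyo" becomes the grams
        // [k, y, o, ky, yo]. If the parser requires every gram to match,
        // only single-character queries (one gram) or queries covering
        // the whole field tend to succeed, which matches the symptom.
        System.out.println(ngrams("kyo", 1, 2));
    }
}
```

Checking the query with debugQuery=true would show how the grams are being combined, which is usually where this kind of mismatch hides.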


Re: custom handler results don't seem to match manually entered query string

2008-02-22 Thread evol__

Hoss thanks,
Hm, it might be a problem with not using the analyzers explicitly.

But I always thought such code:

// query here is an org.apache.lucene.search.BooleanQuery
Term term = new Term("text", str);
TermQuery tq = new TermQuery(term);
query.add(tq, Occur.SHOULD);

would get the query terms through the analyzers, since those are specified
under the field type's analyzer section in schema.xml.

Is that not true?


Cheers. D.


hossman wrote:
> 
> 
> Hmmm... everything seems right here.  
> 
> This may be a silly question, but 
> you are calling rsp.add("response", docs_main.docList) in your custom 
> handler correct?
> 
> second question: how are you building up your query object?  the only 
> thing i can think of is that you are constructing the TermQueries directly 
> (without using the analyzer) so they don't match what's really in the 
> index (ie: things aren't being lowercased, not splitting on "." and "_") 
> but when you cut/paste the query string into standard request handler it 
> uses the QueryParser which does the proper analysis.
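Hoss's point can be made concrete with a toy "analyzer": a raw TermQuery term is matched verbatim against indexed tokens, so it must already look like the analyzer's output. The sketch below mimics (rather than calls) a lowercasing, split-on-"." and "_" analysis chain; the chain itself is an assumption about the field's configuration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AnalyzerMismatch {

    // Toy stand-in for a lowercase + word-delimiter style analysis chain.
    static List<String> analyze(String input) {
        return Arrays.asList(input.toLowerCase(Locale.ROOT).split("[._\\s]+"));
    }

    public static void main(String[] args) {
        // The index holds the analyzed tokens...
        System.out.println(analyze("CNN.com"));
        // ...but a hand-built Term("text", "CNN.com") searches for the
        // literal token "CNN.com", which is not in the index, so the
        // custom handler gets zero hits while the query parser (which
        // analyzes the string first) finds matches.
    }
}
```

This is exactly the asymmetry between building Term objects by hand and pasting the same string into the standard request handler.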
> 
> what does debugQuery=true say about your query when you cut/paste the 
> query string?
> 
> can you post the full code of your custom request handler?
> 
> 
> : Hi,
> : my problem is as follows: my request handler's code
> : 
> : filters = null;
> : DocListAndSet docs_main = searcher.getDocListAndSet(query, filters,
> null,
> : start, rows, flags);
> : String querystr = query.toString();
> : rsp.add("QUERY_main", querystr);
> : 
> : 
> : gives zero responses:
> : 
> :  ((text:Travel text:Home text:Online_Archives
> : text:Ireland text:Consumer_Information text:Regional text:Europe
> text:News
> : text:Complaints text:CNN.com text:February text:Transport
> : text:Airlines)^0.3)
> :   
> : 
> : 
> : While pasting the "QUERY_main" string into the Solr admin returns plenty
> : of them:
> : 
> : 
> : (text:Travel text:Home text:Online_Archives text:Ireland
> : text:Consumer_Information text:Regional text:Europe text:News
> : text:Complaints text:CNN.com text:February text:Transport
> text:Airlines)^0.3
> : 
> : 10
> : 2.2
> : 
> : 
> : 
> : 
> : 
> : 
> : Please help me understand what's going on, I'm a bit confused atm.
> Thanks
> : :-)
> 
> -Hoss
> 
> 

-- 
View this message in context: 
http://www.nabble.com/custom-handler-results-don%27t-seem-to-match-manually-entered-query-string-tp15544268p15629988.html
Sent from the Solr - User mailing list archive at Nabble.com.