Re: Bug in DismaxRequestHandler?
Thanks! Removing the entry in the config file fixed it. Could you please explain to me what the property does exactly? It is not clear to me.

On 19/06/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: When I run the following query :
: http://localhost:8666/solr/select/?q=test+lettre&version=2.2&start=0&rows=10&indent=on&qt=dismax
: HTTP Status 500 - For input string: "" java.lang.NumberFormatException: For
: input string: ""
:     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
:     at java.lang.Integer.parseInt(Integer.java:468)
:     at java.lang.Integer.<init>(Integer.java:620)
:     at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(

That exception seems to indicate that the value of your "mm" option is set to the empty string. If you had no value specified, it would default to the string "100%". Since there is no "mm" param in the URL you listed, I'm assuming your solrconfig.xml has a default "mm" param specified and that it is the empty string (or perhaps all whitespace). If you set that to a legal "minShouldMatch" string (or remove it completely), things should work fine.

If you'd like, feel free to open a bug requesting a better error message when "mm" can't be parsed cleanly. (Please note the "" causing the NumberFormatException as the motivation for the bug.)

-Hoss
Faceted Search!
Hi all,

I'm only a couple of days old with Solr, so I'm very new to this. However, I'm trying to implement a faceted search somewhat close to CNET's shopper.com. Instead of using some items (like "camera"), I want to search for documents. I'm planning to use Nutch to crawl that website and use Solr to cluster my search results. I tried integrating Nutch with Solr following FooFactory.com's blog, but I could not follow a few of the steps as I'm very new to both of them. If any of you have implemented this, can you please give me suggestions or code snippets so that I can achieve the "faceted search"? Any help would be appreciated.

Thanks,
Niraj
Re: add CJKTokenizer to solr
Sorry, it was not possible to attach it before, so I am sending it again.

> > I got the error below after adding CJKTokenizer to schema.xml. I
> > checked the constructor of CJKTokenizer, it requires a Reader parameter,
> > I guess that's why I get this error. I searched the email archive; it
> > seems to be working for other users. Does anyone know what the problem is?
>
> The CJKTokenizerFactory that I am using is appended.

--
package org.apache.solr.analysis.ja;

import java.io.Reader;
import org.apache.lucene.analysis.cjk.CJKTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenizerFactory;

/**
 * CJKTokenizer for Solr.
 * @see org.apache.lucene.analysis.cjk.CJKTokenizer
 * @author matsu
 */
public class CJKTokenizerFactory extends BaseTokenizerFactory {

  /**
   * @see org.apache.solr.analysis.TokenizerFactory#create(Reader)
   */
  public TokenStream create(Reader input) {
    return new CJKTokenizer(input);
  }
}
--

Toru Matsuzawa
Re: add CJKTokenizer to solr
> I got the error below after adding CJKTokenizer to schema.xml. I
> checked the constructor of CJKTokenizer, it requires a Reader parameter,
> I guess that's why I get this error. I searched the email archive; it
> seems to be working for other users. Does anyone know what the problem is?

The CJKTokenizerFactory that I am using is appended.

On Mon, 18 Jun 2007 21:35:37 -0700 "Xuesong Luo" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I got the error below after adding CJKTokenizer to schema.xml. I
> checked the constructor of CJKTokenizer, it requires a Reader parameter,
> I guess that's why I get this error. I searched the email archive; it
> seems to be working for other users. Does anyone know what the problem is?
>
> Thanks
> Xuesong
>
> 2007-06-18 17:09:29,369 ERROR [STDERR] Jun 18, 2007 5:09:29 PM org.apache.solr.core.SolrException log
> SEVERE: org.apache.solr.core.SolrException: Error instantiating class class org.apache.lucene.analysis.cjk.CJKTokenizer
>     at org.apache.solr.core.Config.newInstance(Config.java:229)
>     at org.apache.solr.schema.IndexSchema.readTokenizerFactory(IndexSchema.java:619)
>     at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:593)
>     at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:331)
>     at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:71)
>
> Schema.xml
> [fieldtype declaration with positionIncrementGap="100"; the XML markup was stripped by the list archiver]

--
Toru Matsuzawa
Re: add CJKTokenizer to solr
: I got the error below after adding CJKTokenizer to schema.xml. I
: checked the constructor of CJKTokenizer, it requires a Reader parameter,
: I guess that's why I get this error, I searched the email archive, it
: seems working for other users. Does anyone know what is the problem?

You can use any Lucene "Analyzer" that has a default constructor as-is, by declaring it in the analyzer declaration in your schema.xml (the example schema.xml shows this using the GreekAnalyzer), so you could use the CJKAnalyzer directly. If you want to use a Lucene "Tokenizer", you need a simple Solr "TokenizerFactory" to generate instances of it.

Writing a TokenizerFactory is easy; they can be simple -- really, REALLY simple. Most of the ones in the Solr code base have more lines of license text than they do of code...

http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/LowerCaseTokenizerFactory.java?view=markup
http://wiki.apache.org/solr/SolrPlugins#head-718653697f60b44092280c8c506077e0933e3668
http://lucene.apache.org/solr/api/org/apache/solr/analysis/TokenizerFactory.html

-Hoss
add CJKTokenizer to solr
Hi,

I got the error below after adding CJKTokenizer to schema.xml. I checked the constructor of CJKTokenizer; it requires a Reader parameter, and I guess that's why I get this error. I searched the email archive, and it seems to be working for other users. Does anyone know what the problem is?

Thanks
Xuesong

2007-06-18 17:09:29,369 ERROR [STDERR] Jun 18, 2007 5:09:29 PM org.apache.solr.core.SolrException log
SEVERE: org.apache.solr.core.SolrException: Error instantiating class class org.apache.lucene.analysis.cjk.CJKTokenizer
    at org.apache.solr.core.Config.newInstance(Config.java:229)
    at org.apache.solr.schema.IndexSchema.readTokenizerFactory(IndexSchema.java:619)
    at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:593)
    at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:331)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:71)

Schema.xml
Re: problems getting data into solr index
On 18-Jun-07, at 6:27 AM, vanderkerkoff wrote:

> Cheers Mike, read the page, it's starting to get into my brain now.
> Django was giving me unicode strings, so I did some encoding and decoding,
> and now the data is getting into Solr, and it's simply not passing the
> characters that are causing problems, which is great.

Glad to hear that it is working.

> 2 little things. I'm getting an error when it's trying to optimise the index:
>
> AttributeError: SolrConnection instance has no attribute 'optimise'
>
> You don't know what that is about, do you?

Er, it means that SolrConnection has no optimise command. Instead do:

conn.commit(optimize=True)

> I'm still on Solr 1.1 as we were having trouble getting this sort of
> interaction to work with 1.2, not sure if it's related.
>
> 2. I've used your suggestions to force the output into ascii, but if I try
> to force it into utf8, which I thought Solr would accept, it fails. I'm not
> sure why though.

Perhaps this is why: solr.py expects unicode. You can pass it ascii, and it will transparently convert to unicode fine because ascii is the default codec. If you end up with utf-8, it will try to convert to unicode using the ascii codec and fail.

So, you could completely skip the .encode('ascii', 'ignore') line. Of course, you'd then have the characters in the text. I'm not quite sure what you're after, since leaving it in utf-8 would leave the funny characters that you wanted to strip.

-Mike
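[Editor's note] The codec behaviour Mike describes is easy to demonstrate. The sketch below uses Python 3 names (solr.py itself is Python 2 era), so it only illustrates the principle: pure-ASCII bytes decode under any codec, but UTF-8 bytes fail when a client falls back to the default ascii codec, while real unicode (or a stripped ascii string) goes through cleanly.

```python
def to_unicode(data):
    """Mimic a client that assumes the default (ascii) codec for byte input."""
    if isinstance(data, bytes):
        return data.decode("ascii")  # fails on any byte >= 0x80
    return data  # already unicode: nothing to decode

print(to_unicode(b"plain text"))  # fine: pure ASCII decodes under any codec

try:
    # "cafe" with an accented e, encoded as UTF-8: the 0xC3 0xA9 bytes
    # are not valid ASCII, so the implicit ascii decode blows up.
    to_unicode("caf\u00e9".encode("utf-8"))
except UnicodeDecodeError:
    print("utf-8 bytes cannot be decoded with the ascii codec")

# Stripping non-ASCII characters up front avoids the problem entirely:
print("caf\u00e9".encode("ascii", "ignore").decode("ascii"))  # 'caf'
```

This is why passing unicode (or pre-stripped ascii) works while handing over utf-8 bytes fails.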
Re: Copying part of index directory
On 17-Jun-07, at 3:03 AM, Roopesh P Raj wrote:

> Thanks for the reply. I have one more query. My doubt is where to re-index
> (i.e. the location of the new index directory)? For this, should I run
> another instance of Solr? Is this the preferred approach?

There is no preferred approach; this is dictated entirely by your requirements. Since you want to create a new subindex, you'll have to set up another Solr instance somewhere: another machine, another webapp, etc.

-Mike
Re: Bug in DismaxRequestHandler?
: When I run the following query :
: http://localhost:8666/solr/select/?q=test+lettre&version=2.2&start=0&rows=10&indent=on&qt=dismax
: HTTP Status 500 - For input string: "" java.lang.NumberFormatException: For
: input string: ""
:     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
:     at java.lang.Integer.parseInt(Integer.java:468)
:     at java.lang.Integer.<init>(Integer.java:620)
:     at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(

That exception seems to indicate that the value of your "mm" option is set to the empty string. If you had no value specified, it would default to the string "100%". Since there is no "mm" param in the URL you listed, I'm assuming your solrconfig.xml has a default "mm" param specified and that it is the empty string (or perhaps all whitespace). If you set that to a legal "minShouldMatch" string (or remove it completely), things should work fine.

If you'd like, feel free to open a bug requesting a better error message when "mm" can't be parsed cleanly. (Please note the "" causing the NumberFormatException as the motivation for the bug.)

-Hoss
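[Editor's note] A simplified sketch of the arithmetic behind "mm" may help. This is an assumption-laden Python rendering, not Solr's actual Java: it handles only the bare "N", "-N", "N%" and "-N%" forms (the real calculateMinShouldMatch also accepts conditional expressions like "2<-25%"). Note how an empty spec raises a parse error, mirroring the NumberFormatException in this thread.

```python
def min_should_match(clause_count, spec):
    """How many optional clauses must match, for the simple mm forms.

    Positive values are absolute counts / percentages; negative values
    mean "all but that many" (or "all but that percentage").
    """
    spec = spec.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])  # int("") raises here, like Integer.parseInt("")
        calc = int(clause_count * pct / 100)  # truncate toward zero, as Java does
    else:
        calc = int(spec)
    return clause_count + calc if calc < 0 else calc

print(min_should_match(4, "100%"))  # 4 -> all clauses required (the default)
print(min_should_match(4, "-25%"))  # 3 -> all but 25% of them
print(min_should_match(4, "-1"))    # 3 -> all but one
```

With an empty "mm" in solrconfig.xml, the int("") call is exactly what blows up, which is why removing the entry (or setting a legal value such as "100%") fixes the 500 error.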
Re: All facet.fields for a given facet.query?
On 6/18/07, James Mead <[EMAIL PROTECTED]> wrote:

> Is it possible to request all facet.fields for a given facet.query instead
> of having to request specific facet.fields? e.g. is there a wildcard for
> facet.fields?

Not currently. Can you elaborate on the problem you are trying to solve? Are you using dynamic fields and hence don't know the exact names of the fields to facet on?

-Yonik
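[Editor's note] Since there is no wildcard, the client has to enumerate the fields itself; Solr accepts the facet.field parameter repeated once per field. A minimal sketch of building such a request (the field names and facet.query value are hypothetical):

```python
from urllib.parse import urlencode

# Repeat facet.field once per field -- urlencode handles a list of
# (key, value) tuples, preserving the duplicates Solr expects.
params = [
    ("q", "video"),
    ("facet", "true"),
    ("facet.query", "price:[* TO 100]"),
]
for field in ["manu", "cat", "inStock"]:  # each field listed explicitly
    params.append(("facet.field", field))

query = urlencode(params)
print(query)  # q=video&facet=true&...&facet.field=manu&facet.field=cat&facet.field=inStock
```

If the field names are dynamic, the client first needs some way to discover them (e.g. from its own schema knowledge) before constructing the request.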
All facet.fields for a given facet.query?
Thanks for a great project.

Is it possible to request all facet.fields for a given facet.query instead of having to request specific facet.fields? e.g. is there a wildcard for facet.fields?

--
James.
http://blog.floehopper.org
Re: Filtering on a 'unique key' set
On 6/18/07, Henrib <[EMAIL PROTECTED]> wrote:

> Thanks Yonik; let me twist the same question another way. I'm running Solr
> embedded; the uniqueKey set that pre-exists may be large, is per-query
> (most likely not useful to cache it), and is iterable. I'd rather avoid
> building a string for the 'fq', getting it parsed, etc. Would it be as safe
> and more efficient in a (custom) request handler to create a DocSet by
> fetching termDocs for each key used as a Term and use it as a filter?

Yes, that should work fine. Most of the savings will be in avoiding the query parsing.

-Yonik
Re: Filtering on a 'unique key' set
Thanks Yonik; let me twist the same question another way. I'm running Solr embedded; the uniqueKey set that pre-exists may be large, is per-query (most likely not useful to cache it), and is iterable. I'd rather avoid building a string for the 'fq', getting it parsed, etc. Would it be as safe and more efficient in a (custom) request handler to create a DocSet by fetching termDocs for each key used as a Term and use it as a filter? Or is this just a bad idea?

Pseudo code being:

DocSet keyFilter(org.apache.lucene.index.IndexReader reader,
                 String keyField,
                 java.util.Iterator ikeys) throws java.io.IOException {
  org.apache.solr.util.OpenBitSet bits =
      new org.apache.solr.util.OpenBitSet(reader.maxDoc());
  if (ikeys.hasNext()) {
    org.apache.lucene.index.Term term =
        new org.apache.lucene.index.Term(keyField, (String) ikeys.next());
    org.apache.lucene.index.TermDocs termDocs = reader.termDocs(term);
    if (termDocs.next())
      bits.fastSet(termDocs.doc());
    while (ikeys.hasNext()) {
      termDocs.seek(term.createTerm((String) ikeys.next()));
      if (termDocs.next())
        bits.fastSet(termDocs.doc());
    }
    termDocs.close();
  }
  return new org.apache.solr.search.BitDocSet(bits);
}

Thanks again

Yonik Seeley wrote:
>
> On 6/17/07, Henrib <[EMAIL PROTECTED]> wrote:
>> Merely an efficiency related question: is there any other way to filter
>> on a uniqueKey set than using the 'fq' parameter and building a list of
>> the uniqueKeys?
>
> I don't think so...
>
>> In 'raw' Lucene, you could use filters directly in search; is this
>> (close to) equivalent efficiency-wise?
>
> Yes, any fq params are turned into filters.
>
> -Yonik

--
View this message in context: http://www.nabble.com/Filtering-on-a-%27unique-key%27-set-tf3935694.html#a11178089
Sent from the Solr - User mailing list archive at Nabble.com.
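[Editor's note] Stripped of the Lucene API, Henrib's pseudo code is just: look up each unique key in the term index and set the bit of the (at most one) matching document. A language-neutral sketch with hypothetical names (the real code uses Lucene's TermDocs and Solr's OpenBitSet/BitDocSet):

```python
def key_filter(term_index, keys):
    """term_index maps uniqueKey value -> internal doc id.

    Returns the set of doc ids matching any key -- the analogue of the
    OpenBitSet built by iterating TermDocs per key.
    """
    bits = set()  # stands in for OpenBitSet
    for key in keys:
        doc = term_index.get(key)  # the TermDocs lookup on (keyField, key)
        if doc is not None:
            bits.add(doc)          # at most one doc per unique key
    return bits

index = {"doc-a": 0, "doc-b": 3, "doc-c": 7}
print(sorted(key_filter(index, ["doc-a", "doc-c", "missing"])))  # [0, 7]
```

Missing keys are simply skipped, just as termDocs.next() returning false skips the fastSet call in the Java version.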
Re: problems getting data into solr index
I think I've resolved this. I've edited the solr.py file to pass optimize=True on commit and moved the commit outside of the loop:

http://pastie.textmate.org/71392

The data is going in, and it's optimizing once, but it's showing as commit = 0 in the stats page of my Solr. There are no errors that I can see, and the data is definitely in the index, as I can now search for it.

vanderkerkoff wrote:
>
> 2 little things. I'm getting an error when it's trying to optimise the index:
>
> AttributeError: SolrConnection instance has no attribute 'optimise'
>
> You don't know what that is about, do you?
>
> I'm still on Solr 1.1 as we were having trouble getting this sort of
> interaction to work with 1.2, not sure if it's related.
[ANN] acts_as_solr v.0.9 has been released
It's with great pleasure that I announce this great milestone for the acts_as_solr plugin. Thanks to all who contributed with ideas, patches, etc.

= About =

This plugin adds full text search capabilities and many other nifty features from Apache's Solr to any Rails model.

= IMPORTANT: Before you Upgrade from v.0.8.5 =

If you are currently using the embedded Solr in a production environment, please make sure you back up the data directory before upgrading to version 0.9, because the directory where Solr lives is now acts_as_solr/solr instead of acts_as_solr/test/solr.

= Changes =

NEW: Added the option :scores when doing a search. If set to true, the score is returned as a 'solr_score' attribute on each of the instances found:

  books = Book.find_by_solr 'ruby OR splinter', :scores => true
  books.records.first.solr_score # => 1.21321397
  books.records.last.solr_score  # => 0.12321548

NEW: Major change in the way the results returned are accessed:

  books = Book.find_by_solr 'ruby'
  # the above returns a SearchResults instance with 4 methods:
  # docs|results|records: returns an array of the records found
  #   books.records.is_a?(Array) # => true
  # total|num_found|total_hits: returns the total number of records found
  #   books.total # => 2
  # facets: returns the facets when doing a faceted search
  # max_score|highest_score: returns the highest score found
  #   books.max_score # => 1.3213213

NEW: Integrated acts_as_solr with solr-ruby as the 'backend', based on the patch submitted by Erik Hatcher.

NEW: Re-factored rebuild_solr_index to allow adds to be done in batch; if a finder block is given, it will be called to retrieve the items to index. (thanks Daniel E.)

NEW: Added the option to specify the port Solr should start on when using rake solr:start:

  rake solr:start RAILS_ENV=your_env PORT=XX

NEW: Added a deprecation warning for the :background configuration option. It will no longer be updated.

NEW: Added support for models that use a primary key other than integer:

  class Posting < ActiveRecord::Base
    set_primary_key 'guid' # string
    # make sure you set :primary_key_field => 'pk_s' if you wish
    # to use a string field as the primary key
    acts_as_solr({}, {:primary_key_field => 'pk_s'})
  end

FIX: Disabled storing of most fields. Storage isn't useful for acts_as_solr in any field other than the pk and id fields; it just takes up space and time. (thanks Daniel E.)

FIX: Re-factored code submitted by Daniel E.

NEW: Added an :auto_commit option; the commit command is only sent to Solr if it is set to true:

  class Author < ActiveRecord::Base
    acts_as_solr :auto_commit => false
  end

FIX: Fixed bug in rake's test task

FIX: Made acts_as_solr's Post class compatible with Solr 1.2 (thanks Si)

NEW: Added Solr 1.2

FIX: Removed Solr 1.1

NEW: Added a conditional :if option to the acts_as_solr call. It behaves the same way ActiveRecord's :if argument option does:

  class Electronic < ActiveRecord::Base
    acts_as_solr :if => proc{|record| record.is_active?}
  end

NEW: Fixtures are added to the Solr index when using rake db:fixtures:load

FIX: Fixed boost warning messages

FIX: Fixed bug when adding a facet to a field that contains boost

NEW: Deprecated find_with_facet and combined its functionality with find_by_solr

NEW: Added the option to :exclude_fields when indexing a model:

  class User < ActiveRecord::Base
    acts_as_solr :exclude_fields => [:password, :login, :credit_card_number]
  end

FIX: Fixed branch bug on older ruby versions

NEW: Added boost support for fields and documents being indexed:

  class Electronic < ActiveRecord::Base
    # You can add boosting on a per-field basis or on the entire document
    acts_as_solr :fields => [{:price => {:boost => 5.0}}], :boost => 5.0
  end

FIX: Fixed the acts_as_solr limitation of only accepting test|development|production environments.

= /Changes =

For more info: http://acts_as_solr.railsfreaks.com
OR if your browser/isp can't render: http://acts-as-solr.railsfreaks.com

--
Thiago Jackiw
Re: problems getting data into solr index
Cheers Mike, read the page, it's starting to get into my brain now.

Django was giving me unicode strings, so I did some encoding and decoding, and now the data is getting into Solr; it's simply not passing the characters that were causing problems, which is great. I'm going to follow the same sort of principle in my Python code when I'm adding the items, so I can keep my Solr index up to date as and when things are entered.

Here's the code I'm using to enter the data:

http://pastie.textmate.org/71367

2 little things:

1. I'm getting an error when it's trying to optimise the index:

AttributeError: SolrConnection instance has no attribute 'optimise'

You don't know what that is about, do you? I'm still on Solr 1.1 as we were having trouble getting this sort of interaction to work with 1.2, not sure if it's related.

2. I've used your suggestions to force the output into ascii, but if I try to force it into utf8, which I thought Solr would accept, it fails. I'm not sure why though.

Mike Klaas wrote:
>
> Hi,
>
> To diagnose this properly, you're going to have to figure out if
> you're dealing with encoded bytes or unicode, and what Django does.
> See http://www.joelonsoftware.com/articles/Unicode.html.
>
> As a short-term solution, you can force things to ascii using:
>
> str(s.decode('ascii', 'ignore')) # assuming s is a bytestring
> u.encode('ascii', 'ignore') # assuming u is a unicode string
>
> -Mike
Bug in DismaxRequestHandler?
Hello,

I think I have uncovered a bug. When I run the following query:

http://localhost:8666/solr/select/?q=test+lettre&version=2.2&start=0&rows=10&indent=on&qt=dismax

I get the following exception:

HTTP Status 500 - For input string: "" java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:468)
    at java.lang.Integer.<init>(Integer.java:620)
    at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:614)
    at org.apache.solr.util.SolrPluginUtils.setMinShouldMatch(SolrPluginUtils.java:575)
    at org.apache.solr.request.DisMaxRequestHandler.handleRequestBody(DisMaxRequestHandler.java:244)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)

The exception only happens when using the DismaxRequestHandler with a combination of 2 search words (e.g. "test lettre"). Can someone please confirm this? Does anyone know a workaround? I am using the final Solr 1.2 release.

Greetings