Re: Bug in DismaxRequestHandler?

2007-06-18 Thread Thierry Collogne

Thanks! Removing the entry in the config file fixed it.
Could you please explain to me what the property does exactly? It is not clear
to me.

On 19/06/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: When I run the following query :
:
:
http://localhost:8666/solr/select/?q=test+lettre&version=2.2&start=0&rows=10&indent=on&qt=dismax

: HTTP Status 500 - For input string: "" java.lang.NumberFormatException:
For
: input string: "" at java.lang.NumberFormatException.forInputString(
: NumberFormatException.java:48) at java.lang.Integer.parseInt(
Integer.java:468)
: at java.lang.Integer.<init>(Integer.java:620) at
: org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(

That exception seems to indicate that the value of your "mm" option is set
to the empty string.  If you had no value specified, it would default
to the string "100%".

Since there is no "mm" param in the URL you listed, I'm assuming your
solrconfig.xml has a default "mm" param specified and it is the empty
string (or perhaps all whitespace).

If you set that to a legal "minShouldMatch" string (or remove it
completely), things should work fine.
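
For illustration only, a sketch of what a legal default could look like in solrconfig.xml (the handler name and surrounding layout here are hypothetical; only the "mm" value matters):

```xml
<!-- Hypothetical dismax handler entry: "mm" set to a legal
     minShouldMatch value rather than an empty string. -->
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```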


If you'd like, feel free to open a bug requesting a better error message
when "mm" can't be parsed cleanly. (Please note the empty string causing
the NumberFormatException as the motivation for the bug.)


-Hoss




Faceted Search!

2007-06-18 Thread niraj tulachan
Hi all,
I've been using Solr for only a couple of days, so I'm very new to this.  However, I'm
trying to implement a faceted search somewhat close to CNET's shopper.com.
Instead of searching items (like "camera"), I want to search for documents.
I'm planning to use Nutch to crawl that website and use Solr to cluster my
search results.  I tried integrating Nutch with Solr following FooFactory.com's
blog, but I could not follow a few of the steps as I'm very new to both of
them.  If any of you have implemented this, can you please give me suggestions or
code snippets so that I can achieve the "faceted search"?
Any help would be appreciated.
  Thanks,
  Niraj  

   

Re: add CJKTokenizer to solr

2007-06-18 Thread Toru Matsuzawa
I'm sorry. Because it was not possible to attach it,
I am sending it again.

> > I got the error below after adding CJKTokenizer to schema.xml.  I
> > checked the constructor of CJKTokenizer, it requires a Reader parameter,
> > I guess that's why I get this error, I searched the email archive, it
> > seems working for other users. Does anyone know what is the problem?
> 
> 
> CJKTokenizerFactory that I am using is appended.
> 
--
package org.apache.solr.analysis.ja;

import java.io.Reader;
import org.apache.lucene.analysis.cjk.CJKTokenizer ;

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenizerFactory;

/**
 * CJKTokenizer for Solr
 * @see org.apache.lucene.analysis.cjk.CJKTokenizer
 * @author matsu
 *
 */
public class CJKTokenizerFactory extends BaseTokenizerFactory {

  /**
   * @see org.apache.solr.analysis.TokenizerFactory#create(Reader)
   */
  public TokenStream create(Reader input) {
    return new CJKTokenizer(input);
  }

}


-- 
Toru Matsuzawa




Re: add CJKTokenizer to solr

2007-06-18 Thread Toru Matsuzawa
> I got the error below after adding CJKTokenizer to schema.xml.  I
> checked the constructor of CJKTokenizer, it requires a Reader parameter,
> I guess that's why I get this error, I searched the email archive, it
> seems working for other users. Does anyone know what is the problem?


CJKTokenizerFactory that I am using is appended.

On Mon, 18 Jun 2007 21:35:37 -0700
"Xuesong Luo" <[EMAIL PROTECTED]> wrote:

> Hi, 
> 
> I got the error below after adding CJKTokenizer to schema.xml.  I
> checked the constructor of CJKTokenizer, it requires a Reader parameter,
> I guess that's why I get this error, I searched the email archive, it
> seems working for other users. Does anyone know what is the problem?
> 
>  
> 
> Thanks
> 
> Xuesong
> 
>  
> 
>  
> 
> 2007-06-18 17:09:29,369 ERROR [STDERR] Jun 18, 2007 5:09:29 PM
> org.apache.solr.core.SolrException log
> 
> SEVERE: org.apache.solr.core.SolrException: Error instantiating class
> class org.apache.lucene.analysis.cjk.CJKTokenizer
> 
> at org.apache.solr.core.Config.newInstance(Config.java:229)
> 
> at
> org.apache.solr.schema.IndexSchema.readTokenizerFactory(IndexSchema.java
> :619)
> 
> at
> org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:593)
> 
> at
> org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:331)
> 
> at
> org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:71)
> 
>  
> 
>  
> 
>  
> 
> Schema.xml
> 
> <fieldtype ... positionIncrementGap="100" >
>   <analyzer>
>     <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" />
>   </analyzer>
> </fieldtype>
> 

-- 
Toru Matsuzawa


Re: add CJKTokenizer to solr

2007-06-18 Thread Chris Hostetter

: I got the error below after adding CJKTokenizer to schema.xml.  I
: checked the constructor of CJKTokenizer, it requires a Reader parameter,
: I guess that's why I get this error, I searched the email archive, it
: seems working for other users. Does anyone know what is the problem?

You can use any Lucene "Analyzer" that has a default constructor as-is by
declaring it in the <analyzer> declaration (the example schema.xml shows
this using the GreekAnalyzer), so you could use the CJKAnalyzer directly
... if you want to use a Lucene "Tokenizer" you need a simple Solr
"TokenizerFactory" to generate instances of it.  Writing a
TokenizerFactory is easy, they can be simple -- really, REALLY simple ...
most of the ones in the Solr code base have more lines of license text
than they do of code...

http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/LowerCaseTokenizerFactory.java?view=markup

http://wiki.apache.org/solr/SolrPlugins#head-718653697f60b44092280c8c506077e0933e3668
http://lucene.apache.org/solr/api/org/apache/solr/analysis/TokenizerFactory.html
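
Concretely, the two options described above might look like this in schema.xml (the fieldtype names here are hypothetical; the factory class is the one posted elsewhere in this thread):

```xml
<!-- Option 1: an Analyzer with a default constructor, used directly -->
<fieldtype name="text_cjk" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</fieldtype>

<!-- Option 2: a Tokenizer wrapped in a small TokenizerFactory -->
<fieldtype name="text_cjk_factory" class="solr.TextField">
  <analyzer>
    <tokenizer class="org.apache.solr.analysis.ja.CJKTokenizerFactory"/>
  </analyzer>
</fieldtype>
```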


-Hoss



add CJKTokenizer to solr

2007-06-18 Thread Xuesong Luo
Hi, 

I got the error below after adding CJKTokenizer to schema.xml.  I
checked the constructor of CJKTokenizer, it requires a Reader parameter,
I guess that's why I get this error, I searched the email archive, it
seems working for other users. Does anyone know what is the problem?

 

Thanks

Xuesong

 

 

2007-06-18 17:09:29,369 ERROR [STDERR] Jun 18, 2007 5:09:29 PM
org.apache.solr.core.SolrException log

SEVERE: org.apache.solr.core.SolrException: Error instantiating class
class org.apache.lucene.analysis.cjk.CJKTokenizer

at org.apache.solr.core.Config.newInstance(Config.java:229)

at
org.apache.solr.schema.IndexSchema.readTokenizerFactory(IndexSchema.java
:619)

at
org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:593)

at
org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:331)

at
org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:71)

 

 

 

Schema.xml

<fieldtype ... positionIncrementGap="100" >
  <analyzer>
    <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" />
  </analyzer>
</fieldtype>



Re: problems getting data into solr index

2007-06-18 Thread Mike Klaas

On 18-Jun-07, at 6:27 AM, vanderkerkoff wrote:



Cheers Mike, read the page, it's starting to get into my brain now.

Django was giving me unicode strings, so I did some encoding and
decoding, and now the data is getting into solr, and it's simply not passing the
characters that are causing problems, which is great.


Glad to hear that it is working.

2 little things, I'm getting an error when it's trying to optimise  
the index


AttributeError: SolrConnection instance has no attribute 'optimise'

You don't know what that is about do you?


Er, it means that SolrConnection has no optimise command.  Instead do

conn.commit(optimize=True)


I'm still on solr1.1 as we were having trouble getting this sort of
interaction to work with 1.2, not sure if it's related.

2.  I've used your suggestions to force the output into ascii, but if I try
to force it into utf8, which I thought solr would accept, it fails.  I'm not
sure why though.


Perhaps this is why: solr.py expects unicode.  You can pass it ascii,  
and it will transparently convert to unicode fine because that is the  
default codec.  If you end up with utf-8, it will try to convert to  
unicode using the ascii codec and fail.
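
A minimal sketch of that failure mode in plain Python (independent of solr.py itself):

```python
# A non-ASCII unicode string, e.g. as produced by Django.
text = u"caf\u00e9"

# ASCII bytes decode fine with the ascii codec (the implicit default).
ascii_bytes = text.encode("ascii", "ignore")   # non-ASCII chars are dropped
assert ascii_bytes.decode("ascii") == u"caf"

# UTF-8 bytes contain values > 127, so decoding them with the ascii
# codec fails -- which is what happens if the library is handed
# utf-8 bytes when it expects unicode.
utf8_bytes = text.encode("utf-8")
try:
    utf8_bytes.decode("ascii")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
assert decoded_ok is False
```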


So, you could completely skip the .encode('ascii', 'ignore') line.
Of course, you'd have the characters in the text.  I'm not quite sure
what you're after, since leaving it in utf-8 would leave the funny
characters that you wanted to strip.


-Mike


Re: Copying part of index directory

2007-06-18 Thread Mike Klaas

On 17-Jun-07, at 3:03 AM, Roopesh P Raj wrote:



Thanks for the reply. I have one more query: where should I re-index
(i.e., the location of the index directory)?  Should I run another
instance of Solr for this?  Is this the preferred approach?


There is no preferred approach; this is dictated entirely by your
requirements.  Since you want to create a new subindex, you'll have
to set up another Solr instance somewhere: another machine, another
webapp, etc.


-Mike


Re: Bug in DismaxRequestHandler?

2007-06-18 Thread Chris Hostetter
: When I run the following query :
:
: 
http://localhost:8666/solr/select/?q=test+lettre&version=2.2&start=0&rows=10&indent=on&qt=dismax

: HTTP Status 500 - For input string: "" java.lang.NumberFormatException: For
: input string: "" at java.lang.NumberFormatException.forInputString(
: NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:468)
: at java.lang.Integer.<init>(Integer.java:620) at
: org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(

That exception seems to indicate that the value of your "mm" option is set
to the empty string.  If you had no value specified, it would default
to the string "100%".

Since there is no "mm" param in the URL you listed, I'm assuming your
solrconfig.xml has a default "mm" param specified and it is the empty
string (or perhaps all whitespace).

If you set that to a legal "minShouldMatch" string (or remove it
completely), things should work fine.


If you'd like, feel free to open a bug requesting a better error message
when "mm" can't be parsed cleanly. (Please note the empty string causing
the NumberFormatException as the motivation for the bug.)


-Hoss



Re: All facet.fields for a given facet.query?

2007-06-18 Thread Yonik Seeley

On 6/18/07, James Mead <[EMAIL PROTECTED]> wrote:

Is it possible to request all facet.fields for a given facet.query instead
of having to request specific facet.fields? e.g. is there a wildcard for
facet.fields?


Not currently.
Can you elaborate on the problem you are trying to solve?  Are you
using dynamic fields and hence don't know the exact names of the
fields to facet on?

-Yonik


All facet.fields for a given facet.query?

2007-06-18 Thread James Mead

Thanks for a great project.

Is it possible to request all facet.fields for a given facet.query instead
of having to request specific facet.fields? e.g. is there a wildcard for
facet.fields?
--
James.
http://blog.floehopper.org


Re: Filtering on a 'unique key' set

2007-06-18 Thread Yonik Seeley

On 6/18/07, Henrib <[EMAIL PROTECTED]> wrote:

Thanks Yonik;
Let me twist the same question another way; I'm running Solr embedded, the
uniqueKey set that pre-exists  may be large, is per-query (most likely not
useful to cache it) and is iterable. I'd rather avoid making a string to
build the 'fq', get it parsed, etc.
Would it be as safe & more efficient in a (custom) request handler to create
a DocSet by fetching termDocs for each key used as a Term & use it as a
filter?


Yes, that should work fine.
Most of the savings will be avoiding the query parsing.

-Yonik


Re: Filtering on a 'unique key' set

2007-06-18 Thread Henrib

Thanks Yonik;
Let me twist the same question another way; I'm running Solr embedded, the
uniqueKey set that pre-exists  may be large, is per-query (most likely not
useful to cache it) and is iterable. I'd rather avoid making a string to
build the 'fq', get it parsed, etc.
Would it be as safe & more efficient in a (custom) request handler to create
a DocSet by fetching termDocs for each key used as a Term & use it as a
filter? Or is this just a bad idea?

Pseudo code being:

DocSet keyFilter(org.apache.lucene.index.IndexReader reader,
                 String keyField,
                 java.util.Iterator<String> ikeys) throws java.io.IOException {
    org.apache.solr.util.OpenBitSet bits =
        new org.apache.solr.util.OpenBitSet(reader.maxDoc());
    if (ikeys.hasNext()) {
        org.apache.lucene.index.Term term =
            new org.apache.lucene.index.Term(keyField, ikeys.next());
        org.apache.lucene.index.TermDocs termDocs = reader.termDocs(term);
        if (termDocs.next())
            bits.fastSet(termDocs.doc());
        while (ikeys.hasNext()) {
            // createTerm reuses the field name with a new key value
            termDocs.seek(term.createTerm(ikeys.next()));
            if (termDocs.next())
                bits.fastSet(termDocs.doc());
        }
        termDocs.close();
    }
    return new org.apache.solr.search.BitDocSet(bits);
}

Thanks again

Yonik Seeley wrote:
> 
> On 6/17/07, Henrib <[EMAIL PROTECTED]> wrote:
>> Merely an efficiency related question: is there any other way to filter
>> on a
>> uniqueKey set than using the 'fq' parameter & building a list of the
>> uniqueKeys?
> 
> I don't think so...
> 
>> In 'raw' Lucene, you could use filters directly in search; is this (close
>> to) equivalent efficiency wise?
> 
> Yes, any fq params are turned into filters.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Filtering-on-a-%27unique-key%27-set-tf3935694.html#a11178089
Sent from the Solr - User mailing list archive at Nabble.com.



Re: problems getting data into solr index

2007-06-18 Thread vanderkerkoff

I think I've resolved this.

I've edited that solr.py file to optimize=True on commit and moved the
commit outside of the loop

http://pastie.textmate.org/71392

The data is going in, it's optimizing once, but it's showing as commit = 0 in
the stats page of my solr.

There are no errors that I can see, and the data is definitely in the index as
I can now search for it.



vanderkerkoff wrote:
> 
> 
> 2 little things, I'm getting an error when it's trying to optimise the
> index
> 
> AttributeError: SolrConnection instance has no attribute 'optimise'
> 
> You don't know what that is about do you?
> 
> I'm still on solr1.1 as we were having trouble getting this sort of
> interaction to work with 1.2, not sure if it's related.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a11176732
Sent from the Solr - User mailing list archive at Nabble.com.



[ANN] acts_as_solr v.0.9 has been released

2007-06-18 Thread Thiago Jackiw

It's with great pleasure that I announce this great milestone for the
acts_as_solr plugin. Thanks to all who contributed with ideas,
patches, etc.

= About =
This plugin adds full text search capabilities and many other nifty
features from Apache's Solr to any Rails model.

= IMPORTANT: Before you Upgrade from v.0.8.5 =
If you are currently using the embedded Solr in production
environment, please make sure you backup the data directory before
upgrading to version 0.9 because the directory where Solr lives now is
under acts_as_solr/solr instead of acts_as_solr/test/solr.

= Changes 
NEW: Added the option :scores when doing a search. If set to true, this
will return the score as a 'solr_score' attribute for each one of the
instances found
 books = Book.find_by_solr 'ruby OR splinter', :scores => true
 books.records.first.solr_score
 => 1.21321397
 books.records.last.solr_score
 => 0.12321548

NEW: Major change on the way the results returned are accessed.
 books = Book.find_by_solr 'ruby'
 # the above will return a SearchResults class with 4 methods:
 # docs|results|records: will return an array of records found
 #
 #   books.records.is_a?(Array)
 #   => true
 #
 # total|num_found|total_hits: will return the total number of records found
 #
 #   books.total
 #   => 2
 #
 # facets: will return the facets when doing a faceted search
 #
 # max_score|highest_score: returns the highest score found
 #
 #   books.max_score
 #   => 1.3213213

NEW: Integrating acts_as_solr to use solr-ruby as the 'backend'.
Integration based on the patch submitted by Erik Hatcher

NEW: Re-factoring rebuild_solr_index to allow adds to be done in
batch; and if a finder block is given, it will be called to retrieve
the items to index. (thanks Daniel E.)

NEW: Adding the option to specify the port Solr should start when
using rake solr:start
 rake solr:start RAILS_ENV=your_env PORT=XX

NEW: Adding deprecation warning for the :background configuration
option. It will no longer be updated.

NEW: Adding support for models that use a primary key other than integer
 class Posting < ActiveRecord::Base
   set_primary_key 'guid' #string
   #make sure you set the :primary_key_field => 'pk_s' if you wish to
use a string field as the primary key
   acts_as_solr({},{:primary_key_field => 'pk_s'})
 end

FIX: Disabling of storing most fields. Storage isn't useful for
acts_as_solr in any field other than the pk and id fields. It just
takes up space and time. (thanks Daniel E.)

FIX: Re-factoring code submitted by Daniel E.

NEW: Adding an :auto_commit option that will only send the commit
command to Solr if it is set to true
 class Author < ActiveRecord::Base
acts_as_solr :auto_commit => false
 end

FIX: Fixing bug on rake's test task

FIX: Making acts_as_solr's Post class compatible with Solr 1.2 (thanks Si)

NEW: Adding Solr 1.2

FIX: Removing Solr 1.1

NEW: Adding a conditional :if option to the acts_as_solr call. It
behaves the same way ActiveRecord's :if argument option does.
 class Electronic < ActiveRecord::Base
   acts_as_solr :if => proc{|record| record.is_active?}
 end

NEW: Adding fixtures to Solr index when using rake db:fixtures:load

FIX: Fixing boost warning messages

FIX: Fixing bug when adding a facet to a field that contains boost

NEW: Deprecating find_with_facet and combining functionality with find_by_solr

NEW: Adding the option to :exclude_fields when indexing a model
 class User < ActiveRecord::Base
   acts_as_solr :exclude_fields => [:password, :login, :credit_card_number]
 end

FIX: Fixing branch bug on older ruby version

NEW: Adding boost support for fields and documents being indexed:
 class Electronic < ActiveRecord::Base
   # You can add boosting on a per-field basis or on the entire document
   acts_as_solr :fields => [{:price => {:boost => 5.0}}], :boost => 5.0
 end

FIX: Fixed the acts_as_solr limitation to only accept
test|development|production environments.
= /Changes 

For more info:
http://acts_as_solr.railsfreaks.com
OR if your browser/isp can't render:
http://acts-as-solr.railsfreaks.com

--
Thiago Jackiw


Re: problems getting data into solr index

2007-06-18 Thread vanderkerkoff

Cheers Mike, read the page, it's starting to get into my brain now.

Django was giving me unicode strings, so I did some encoding and decoding, and
now the data is getting into solr, and it's simply not passing the
characters that are causing problems, which is great.

I'm going to follow the same sort of principle in my python code when I'm
adding the items, so I can keep my solr index up to date as and when things
are entered.

Here's the code I'm using to enter the data.

http://pastie.textmate.org/71367

2 little things, I'm getting an error when it's trying to optimise the index

AttributeError: SolrConnection instance has no attribute 'optimise'

You don't know what that is about do you?

I'm still on solr1.1 as we were having trouble getting this sort of
interaction to work with 1.2, not sure if it's related.

2.  I've used your suggestions to force the output into ascii, but if I try
to force it into utf8, which I thought solr would accept, it fails.  I'm not
sure why though.

 



Mike Klaas wrote:
> 
> Hi,
> 
> To diagnose this properly, you're going to have to figure out if  
> you're dealing with encoded bytes or unicode, and what django does.   
> See http://www.joelonsoftware.com/articles/Unicode.html.
> 
> As a short-term solution, you can force things to ascii using:
> 
> str(s.decode('ascii', 'ignore')) # assuming s is a bytestring
> u.encode('ascii', 'ignore') # assuming u is a unicode string
> 
> -Mike
> 

-- 
View this message in context: 
http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a11174969
Sent from the Solr - User mailing list archive at Nabble.com.



Bug in DismaxRequestHandler?

2007-06-18 Thread Thierry Collogne

Hello,

I think I have uncovered a bug.

When I run the following query :

http://localhost:8666/solr/select/?q=test+lettre&version=2.2&start=0&rows=10&indent=on&qt=dismax

I get the following exception :


HTTP Status 500 - For input string: "" java.lang.NumberFormatException: For
input string: "" at java.lang.NumberFormatException.forInputString(
NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:468)
at java.lang.Integer.<init>(Integer.java:620) at
org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(
SolrPluginUtils.java:614) at
org.apache.solr.util.SolrPluginUtils.setMinShouldMatch(SolrPluginUtils.java:575)
at org.apache.solr.request.DisMaxRequestHandler.handleRequestBody(
DisMaxRequestHandler.java:244) at
org.apache.solr.handler.RequestHandlerBase.handleRequest(
RequestHandlerBase.java:77) at org.apache.solr.core.SolrCore.execute(
SolrCore.java:658) at org.apache.solr.servlet.SolrDispatchFilter.execute(
SolrDispatchFilter.java:191) at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
ApplicationFilterChain.java:202) at
org.apache.catalina.core.ApplicationFilterChain.doFilter(
ApplicationFilterChain.java:173) at
org.apache.catalina.core.StandardWrapperValve.invoke(
StandardWrapperValve.java:213) at
org.apache.catalina.core.StandardContextValve.invoke(
StandardContextValve.java:178) at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(
StandardEngineValve.java:107) at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection
(Http11BaseProtocol.java:664) at
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
PoolTcpEndpoint.java:527) at
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
LeaderFollowerWorkerThread.java:80) at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
ThreadPool.java:684) at java.lang.Thread.run(Thread.java:595)


The exception only happens when using the DismaxRequestHandler and a
combination of 2 search words (ex. test lettre)

Can someone please confirm this? Does anyone know a workaround?

I am using the final solr 1.2 release.

Greetings