Re: stopwords not working in multicore setup

2011-03-25 Thread Christopher Bottaro
Ahh, thank you for the hints Martin... German stopwords without Umlaut work
correctly.

So I'm trying to figure out where the UTF-8 characters are getting mangled.
Using the Solr admin web UI, I searched for title:für.  The XML (or JSON)
output in the browser shows the query with the proper encoding, but the Solr
logs show this:

INFO: [page_30d_de] webapp=/solr path=/select
params={explainOther=&fl=*,score&indent=on&start=0&q=title:f?r&hl.fl=&qt=standard&wt=xml&fq=&version=2.2&rows=10}
hits=76 status=0 QTime=2

Notice the title:f?r.  How do I fix that?  I'm using Jetty btw...
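
One way to narrow this down is to compare how the query text looks under different encodings. The following is a minimal Python sketch (standard library only), meant as an illustration rather than a diagnosis. It shows the UTF-8 percent-encoding of the query, what the same bytes look like when decoded as ISO-8859-1, and what a correctly decoded string looks like when written to an ASCII-only sink. That last case is only an assumption about how a "?" could end up in a log line; nothing in this thread establishes whether Jetty's request decoding or the log output encoding is responsible.

```python
from urllib.parse import quote, unquote

query = "title:für"

# Percent-encode the query as UTF-8 (the default for quote()).
encoded = quote(query, safe=":")

# Decoding those bytes as UTF-8 recovers the original text.
decoded_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 turns "ü" into two characters.
decoded_latin1 = unquote(encoded, encoding="iso-8859-1")

# A correctly decoded string written to an ASCII-only sink shows "?".
logged_ascii = decoded_utf8.encode("ascii", errors="replace").decode("ascii")

print(encoded)
print(decoded_utf8)
print(decoded_latin1)
print(logged_ascii)
```

If the log shows a single "?" where the "ü" should be, the third case (two garbled characters) is less likely than the fourth, but that is an inference, not something confirmed here.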

Thanks for the help.

On Fri, Mar 25, 2011 at 3:05 AM, Martin Rödig r...@shi-gmbh.com wrote:

 I have some questions about your config:

 Is the stopwords-de.txt in the same directory as the schema.xml?
 Is the title field of type text?
 Do you have the same problem with German stopwords without an umlaut
 (ü, ö, ä), like the word denn?

 One possible problem is that stopwords-de.txt is not saved as UTF-8, so the
 filter cannot read the umlaut ü in the file.
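
A quick way to test the encoding hypothesis is to check whether the file's bytes decode cleanly as UTF-8. The sketch below is a minimal Python illustration. The helper name is_valid_utf8 and the two throwaway files are made up for the demonstration; in practice you would point the helper at the real stopwords-de.txt.

```python
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Demonstration with two throwaway files containing the stopword "für".
tmpdir = tempfile.mkdtemp()
utf8_path = os.path.join(tmpdir, "stopwords-utf8.txt")
latin1_path = os.path.join(tmpdir, "stopwords-latin1.txt")

with open(utf8_path, "w", encoding="utf-8") as f:
    f.write("für\ndenn\n")
with open(latin1_path, "w", encoding="iso-8859-1") as f:
    f.write("für\ndenn\n")

print(is_valid_utf8(utf8_path))    # file saved as UTF-8
print(is_valid_utf8(latin1_path))  # file saved as ISO-8859-1
```

A file that contains only ASCII stopwords passes this check under either encoding, which fits the observation that stopwords without umlauts can work even when the file is not saved as UTF-8.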


 Best regards,
 M.Sc. Dipl.-Inf. (FH) Martin Rödig

 SHI Elektronische Medien GmbH

 -Original Message-
 From: Christopher Bottaro [mailto:cjbott...@onespot.com]
 Sent: Friday, March 25, 2011 05:37
 To: solr-user@lucene.apache.org
 Subject: stopwords not working in multicore setup

 Hello,

 I'm running a Solr server with 5 cores.  Three are for English content and
 two are for German content.  The default stopwords setup works fine for the
 English cores, but the German stopwords aren't working.

 The German stopwords file is stopwords-de.txt and resides in the same
 directory as stopwords.txt.  The German cores use a different schema (named
 schema.page.de.xml) which has the following text field definition:
 http://pastie.org/1711866

 The stopwords-de.txt file looks like this:  http://pastie.org/1711869

 The query I'm doing is this:  q = title:für

 And it's returning documents with für in the title.  Title is a text field
 which should use the stopwords-de.txt, as seen in the aforementioned pastie.

 Any ideas?  Thanks for the help.



stopwords not working in multicore setup

2011-03-24 Thread Christopher Bottaro
Hello,

I'm running a Solr server with 5 cores.  Three are for English content and
two are for German content.  The default stopwords setup works fine for the
English cores, but the German stopwords aren't working.

The German stopwords file is stopwords-de.txt and resides in the same
directory as stopwords.txt.  The German cores use a different schema (named
schema.page.de.xml) which has the following text field definition:
http://pastie.org/1711866

The stopwords-de.txt file looks like this:  http://pastie.org/1711869

The query I'm doing is this:  q = title:für

And it's returning documents with für in the title.  Title is a text field
which should use the stopwords-de.txt, as seen in the aforementioned pastie.

Any ideas?  Thanks for the help.


Re: multicore replication slave

2010-10-12 Thread Christopher Bottaro
Answered my own question.  Rather than naming each core in the
replication handler, you use a variable:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://solr.mydomain.com:8983/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

That will get all of your cores replicating.
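
To make the effect of the variable concrete, here is a small Python sketch that imitates the substitution for the three core names from the original question. It is only an illustration of the resulting URLs: the helper expand_master_url is made up for this example, and it uses plain string replacement rather than Solr's own property handling.

```python
MASTER_URL = "http://solr.mydomain.com:8983/solr/${solr.core.name}/replication"


def expand_master_url(template, core_name):
    """Imitate the per-core substitution with a plain string replace."""
    return template.replace("${solr.core.name}", core_name)


# Core names taken from the URLs in the original question.
cores = ["core1", "core2", "core3"]
urls = [expand_master_url(MASTER_URL, name) for name in cores]

for url in urls:
    print(url)
```

Each slave core ends up polling the replication handler of the master core with the same name, so this setup assumes the core names match on both machines.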

-- C

On Mon, Oct 11, 2010 at 6:25 PM, Christopher Bottaro
cjbott...@onespot.com wrote:
 Hello,

 I can't get my multicore slave to replicate from the master.

 The master is set up properly, and the following URLs return OK with the
 message "No command", as expected:
 http://solr.mydomain.com:8983/solr/core1/replication
 http://solr.mydomain.com:8983/solr/core2/replication
 http://solr.mydomain.com:8983/solr/core3/replication

 The following pastie shows how my slave is set up:
 http://pastie.org/1214209

 But it's not working (i.e. I see no replication attempts in the slave's log).

 Any ideas?

 Thanks for the help.



multicore replication slave

2010-10-11 Thread Christopher Bottaro
Hello,

I can't get my multicore slave to replicate from the master.

The master is set up properly, and the following URLs return OK with the
message "No command", as expected:
http://solr.mydomain.com:8983/solr/core1/replication
http://solr.mydomain.com:8983/solr/core2/replication
http://solr.mydomain.com:8983/solr/core3/replication

The following pastie shows how my slave is set up:
http://pastie.org/1214209

But it's not working (i.e. I see no replication attempts in the slave's log).

Any ideas?

Thanks for the help.


How to see the query generated by MoreLikeThisHandler?

2010-03-03 Thread Christopher Bottaro
Hello,

Is there a way to see exactly what query is generated by the
MoreLikeThisHandler?  If I send debugQuery=true, then I see a key called
"parsedquery" in the response, but it doesn't seem quite right.

What I mean by that is when I make the MoreLikeThis query, I set
mlt.fl to "title,content", but the query shown in "parsedquery" does
not query on title at all... only on content.  Furthermore, the
query looks something like "content:word1 content:word2
content:word3", but if I copy and paste that into a standard query,
nothing comes back because the default term operator is AND.

If I change that query to content:word1 OR content:word2 OR
content:word3, I get results but they are not the same as what the
MLT query returns.
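
The AND versus OR difference described above can be illustrated with a toy model. The Python sketch below uses made-up documents and plain set membership. It only shows why the same three terms can match nothing under AND and several documents under OR; it says nothing about how MoreLikeThis weights or boosts its terms.

```python
# Toy index: document id -> set of terms in its content field (made-up data).
docs = {
    1: {"word1", "word2"},
    2: {"word1"},
    3: {"word3", "word4"},
}
terms = ["word1", "word2", "word3"]


def match_all(docs, terms):
    """AND semantics: every term must be present in the document."""
    return sorted(d for d, words in docs.items() if all(t in words for t in terms))


def match_any(docs, terms):
    """OR semantics: at least one term must be present in the document."""
    return sorted(d for d, words in docs.items() if any(t in words for t in terms))


print(match_all(docs, terms))
print(match_any(docs, terms))
```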

Is there a way to see the generated query without actually running it?
As of now, I'm making an MLT query with rows=0, but I think it's still
running the query because it takes a nontrivial amount of time and it
also shows numFound in the response.
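
For reference, the request described above could be assembled roughly as follows. This is a hedged Python sketch: the host, the "/mlt" handler path, and the source document query "id:12345" are assumptions made for the example (the handler path depends on how it is registered in solrconfig.xml), while mlt.fl, rows, and debugQuery are the parameters mentioned in this message.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed base URL and handler path; adjust to the actual solrconfig.xml.
BASE = "http://localhost:8983/solr/mlt"

params = {
    "q": "id:12345",            # hypothetical source document
    "mlt.fl": "title,content",  # fields to draw "interesting terms" from
    "rows": 0,                  # return no documents
    "debugQuery": "true",       # ask for parsedquery in the response
}

url = BASE + "?" + urlencode(params)
print(url)

# Round-trip check: the parameters survive URL encoding unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["mlt.fl"])
```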

Thanks for the help,
-- Christopher


DisMaxRequestHandler questions about bf and bq

2010-03-03 Thread Christopher Bottaro
Hello,

I have a couple of questions regarding the bf and bq params to the
DisMaxRequestHandler.

1)  Can I specify them more than once?  Ex:
bf=log(popularity)&bf=log(comment_count)

2)  When using bq, how can I specify what score to use for documents
not returned by the query?  In other words, how do I mimic this
behavior using bq:
bf=query($qq, 0.1)&qq=site:news.yahoo.com
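
As an illustration of question 1, the sketch below shows how a request with a repeated bf parameter can be encoded on the client side in Python. It only demonstrates the encoding. The query term "obama" and the use of defType=dismax are assumptions for the example, and the sketch makes no claim about how the handler combines the two functions.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs (rather than a dict) allows the same key to appear twice.
params = [
    ("defType", "dismax"),  # assumed way of selecting the dismax parser
    ("q", "obama"),         # hypothetical user query
    ("bf", "log(popularity)"),
    ("bf", "log(comment_count)"),
]

query_string = urlencode(params)
print(query_string)

# Decoding shows both bf values are preserved, in order.
parsed = parse_qs(query_string)
print(parsed["bf"])
```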


Thanks for the help!


Boost a document score via query using MoreLikeThisHandler

2010-03-01 Thread Christopher Bottaro
Hello,

Is it possible to boost a document's score based on something like
fq=site(com.google*)?  In other words, I want to boost the score of
documents whose site field starts with com.google.

I'm using the MoreLikeThisHandler.

Thanks for the help,
-- Christopher


Re: Boost a document score via query using MoreLikeThisHandler

2010-03-01 Thread Christopher Bottaro
On Mon, Mar 1, 2010 at 7:36 PM, Christopher Bottaro
cjbott...@onespot.com wrote:
 Hello,

 Is it possible to boost a document's score based on something like
 fq=site(com.google*)?  In other words, I want to boost the score of
 documents whose site field starts with com.google.

 I'm using the MoreLikeThisHandler.

 Thanks for the help,
 -- Christopher


Ok, I think I need to do this with BoostQParserPlugin and nested
queries, but I can't quite figure it out.

So this works...
q={!boost b=log(popularity)}(title:barack OR title:obama)

But instead of boosting by popularity, I want to boost by site:
q={!boost b=query({ !query q='site:*.yahoo.com' })}(title:barack OR title:obama)

But that doesn't work.  This is the exception I get:
org.apache.lucene.queryParser.ParseException: Expected identifier at
pos 18 str='{!boost b=query({ !query q='site:*.yahoo.com'
})}(title:barack OR title:obama)'

Any tips?  Thanks.