Re: does Solr support distributed index storage?

2009-10-12 Thread Shalin Shekhar Mangar
On Mon, Oct 12, 2009 at 10:27 AM, Pravin Karne 
pravin_ka...@persistent.co.in wrote:

 How to set master/slave setup for solr.


Index documents only on the master. Put the slaves behind a load balancer
and query only on slaves. Setup replication between the master and slaves.
See http://wiki.apache.org/solr/SolrReplication
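For concreteness, here is a minimal sketch of that setup using the Solr 1.4 ReplicationHandler described on that wiki page (the host name, port, config file list, and poll interval below are placeholders, not values from this thread):

```
<!-- master solrconfig.xml: publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml: poll the master for new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

On Solr 1.3 the equivalent is the script-based rsync/snapshooter setup, also covered on the wiki page.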

-- 
Regards,
Shalin Shekhar Mangar.


Re: Facet query help

2009-10-12 Thread Shalin Shekhar Mangar
On Mon, Oct 12, 2009 at 6:07 AM, Tommy Chheng tommy.chh...@gmail.com wrote:

 The dummy data set is composed of 6 docs.

 My query is set for 'tommy' with the facet query of Memory_s:1+GB

http://lh:8983/solr/select/?facet=true&facet.field=CPU_s&facet.field=Memory_s&facet.field=Video+Card_s&wt=ruby&facet.query=Memory_s:1+GB&q=tommy&indent=on

 However, in the response (http://pastie.org/650932), I get two docs: one
 which has the correct field Memory_s:1 GB and the second document which has
 a Memory_s:3+GB. Why did the second document match if i set the facet.query
 to just 1+GB??


facet.query does not limit documents. It is used for finding the number of
documents matching the query. In order to filter the result set you should
use a filter query, e.g. fq=Memory_s:1 GB
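To make the distinction concrete (hypothetical URLs following this thread's example; the space in the field value is left unencoded for readability):

```
# facet.query only adds a count under facet_counts; matching and
# non-matching documents are both still returned:
http://localhost:8983/solr/select?q=tommy&facet=true&facet.query=Memory_s:"1 GB"

# fq actually restricts the result set to documents matching the filter:
http://localhost:8983/solr/select?q=tommy&fq=Memory_s:"1 GB"
```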

-- 
Regards,
Shalin Shekhar Mangar.


Re: Is negative boost possible?

2009-10-12 Thread Andrzej Bialecki

Yonik Seeley wrote:

On Sun, Oct 11, 2009 at 6:04 PM, Lance Norskog goks...@gmail.com wrote:

And the other important
thing to know about boost values is that the dynamic range is about
6-8 bits


That's an index-time boost - an 8 bit float with 5 bits of mantissa
and 3 bits of exponent.
Query time boosts are normal 32 bit floats.


To be more specific: index-time float encoding does not permit negative 
numbers (see SmallFloat), but query-time boosts can be negative, and 
they DO affect the score - see below. BTW, standard Collectors collect 
only results with positive scores, so if you want to collect results 
with negative scores as well then you need to use a custom Collector.


---
BeanShell 2.0b4 - by Pat Niemeyer (p...@pat.net)
bsh % import org.apache.lucene.search.*;
bsh % import org.apache.lucene.index.*;
bsh % import org.apache.lucene.store.*;
bsh % import org.apache.lucene.document.*;
bsh % import org.apache.lucene.analysis.*;
bsh % tq = new TermQuery(new Term("a", "b"));
bsh % print(tq);
a:b
bsh % tq.setBoost(-1);
bsh % print(tq);
a:b^-1.0
bsh % q = new BooleanQuery();
bsh % tq1 = new TermQuery(new Term("a", "c"));
bsh % tq1.setBoost(10);
bsh % q.add(tq1, BooleanClause.Occur.SHOULD);
bsh % q.add(tq, BooleanClause.Occur.SHOULD);
bsh % print(q);
a:c^10.0 a:b^-1.0
bsh % dir = new RAMDirectory();
bsh % w = new IndexWriter(dir, new WhitespaceAnalyzer());
bsh % doc = new Document();
bsh % doc.add(new Field("a", "b c d", Field.Store.YES, Field.Index.ANALYZED));

bsh % w.addDocument(doc);
bsh % w.close();
bsh % r = IndexReader.open(dir);
bsh % is = new IndexSearcher(r);
bsh % td = is.search(q, 10);
bsh % sd = td.scoreDocs;
bsh % print(sd.length);
1
bsh % print(is.explain(q, 0));
0.1373985 = (MATCH) sum of:
  0.15266499 = (MATCH) weight(a:c^10.0 in 0), product of:
0.99503726 = queryWeight(a:c^10.0), product of:
  10.0 = boost
  0.30685282 = idf(docFreq=1, numDocs=1)
  0.32427183 = queryNorm
0.15342641 = (MATCH) fieldWeight(a:c in 0), product of:
  1.0 = tf(termFreq(a:c)=1)
  0.30685282 = idf(docFreq=1, numDocs=1)
  0.5 = fieldNorm(field=a, doc=0)
  -0.0152664995 = (MATCH) weight(a:b^-1.0 in 0), product of:
-0.099503726 = queryWeight(a:b^-1.0), product of:
  -1.0 = boost
  0.30685282 = idf(docFreq=1, numDocs=1)
  0.32427183 = queryNorm
0.15342641 = (MATCH) fieldWeight(a:b in 0), product of:
  1.0 = tf(termFreq(a:b)=1)
  0.30685282 = idf(docFreq=1, numDocs=1)
  0.5 = fieldNorm(field=a, doc=0)

bsh %
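To make the precision limits of index-time boosts concrete, here is a self-contained sketch of the 3-bit-exponent/5-bit-mantissa byte encoding Yonik describes (modeled on Lucene's SmallFloat floatToByte315/byte315ToFloat; this is a reimplementation for illustration, not the library class itself):

```java
// Sketch of Lucene's "315" norm encoding: 1 byte = 3 exponent bits + 5 mantissa bits.
// Negative inputs collapse to 0 and most floats are quantized, which is why
// index-time boosts are coarse and cannot be negative.
public class SmallFloatSketch {
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow; negatives -> 0
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow -> largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(byte315ToFloat(floatToByte315(1.0f)));  // 1.0 survives
        System.out.println(byte315ToFloat(floatToByte315(0.9f)));  // quantized down to 0.875
        System.out.println(byte315ToFloat(floatToByte315(-5.0f))); // collapses to 0.0
    }
}
```

Query-time boosts, by contrast, stay ordinary 32-bit floats, which is why the negative boost in the BeanShell session above survives into the score explanation.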


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: rollback and cumulative_add

2009-10-12 Thread Koji Sekiguchi
Koji Sekiguchi wrote:
 Hello,

 I found that rollback resets adds and docsPending count,
 but doesn't reset cumulative_adds.

 $ cd example/exampledocs
 # comment out the <commit/> line to avoid committing in post.sh
 $ ./post.sh *.xml
 => docsPending=19, adds=19, cumulative_adds=19

 # do rollback
 $ curl http://localhost:8983/solr/update?rollback=true
 => rollbacks=1, docsPending=0, adds=0, cumulative_adds=19

 Is this correct behavior?

 Koji

   
(forwarded dev list)

I think this is a bug that I introduced when I contributed
the first patch for rollback, and it was inherited by
the successive patches. I'll reopen SOLR-670 and attach a fix soon:

https://issues.apache.org/jira/browse/SOLR-670

Koji
-- 

http://www.rondhuit.com/




Re: Is negative boost possible?

2009-10-12 Thread Yonik Seeley
On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki a...@getopt.org wrote:
 BTW, standard Collectors collect only results
 with positive scores, so if you want to collect results with negative scores
 as well then you need to use a custom Collector.

Solr never discarded non-positive hits, and now Lucene 2.9 no longer
does either.

-Yonik


two facet.prefix on one facet field in a single query

2009-10-12 Thread Bill Au
Is it possible to have two different facet.prefix values on the same facet field in
a single query?  I want to get facet counts for two prefixes, xx and yy.  I
tried using two facet.prefix parameters (i.e. facet.prefix=xx&facet.prefix=yy) but the
second one seems to have no effect.

Bill


Re: Facet query help

2009-10-12 Thread Tommy Chheng
ok, so fq != facet.query. I thought it was an alias. I'm trying your
suggestion fq=Memory_s:1 GB and now it's returning zero documents even
though there is one document that has tommy and Memory_s:1 GB, as
seen in the original pastie (http://pastie.org/650932). I tried the fq
query body with quotes and without quotes.


http://lh:8983/solr/select/?facet=true&facet.field=CPU_s&facet.field=Memory_s&facet.field=Video+Card_s&wt=ruby&fq=%22Memory_s:1+GB%22&q=tommy&indent=on

Any thoughts?

thanks,
tommy

On 10/12/09 1:00 AM, Shalin Shekhar Mangar wrote:

On Mon, Oct 12, 2009 at 6:07 AM, Tommy Chheng tommy.chh...@gmail.com wrote:

   

The dummy data set is composed of 6 docs.

My query is set for 'tommy' with the facet query of Memory_s:1+GB

http://lh:8983/solr/select/?facet=true&facet.field=CPU_s&facet.field=Memory_s&facet.field=Video+Card_s&wt=ruby&facet.query=Memory_s:1+GB&q=tommy&indent=on

However, in the response (http://pastie.org/650932), I get two docs: one
which has the correct field Memory_s:1 GB and the second document which has
a Memory_s:3+GB. Why did the second document match if i set the facet.query
to just 1+GB??


 

facet.query does not limit documents. It is used for finding the number of
documents matching the query. In order to filter the result set you should
use filter query e.g. fq=Memory_s:1 GB

   


Re: format of sort parameter in Solr::Request::Standard

2009-10-12 Thread Paul Rosen
I did an experiment that worked. In Solr::Request::Standard, in the 
to_hash() method, I changed the commented line below to the two lines 
following it.


sort = @params[:sort].collect do |sort|
  key = sort.keys[0]
  "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
end.join(',') if @params[:sort]

# START OF CHANGES
#hash[:q] = sort ? "#{@params[:query]};#{sort}" : @params[:query]
hash[:q] = @params[:query]
hash[:sort] = sort if sort != nil
# END OF CHANGES

hash["q.op"] = @params[:operator]
hash[:df] = @params[:default_field]

Does this make sense? Should this be changed in the next version of the 
solr-ruby gem?


Paul Rosen wrote:

Hi all,

I'm using solr-ruby 0.0.7 and am having trouble getting Sort to work.

I have the following statement:

req = Solr::Request::Standard.new(:start => start, :rows => max,
 :sort => [ :title_sort => :ascending ],
 :query => query, :filter_queries => filter_queries,
 :field_list => @field_list,
 :facets => {:fields => @facet_fields, :mincount => 1, :missing => true,
:limit => -1},
 :highlighting => {:field_list => ['text'], :fragment_size => 600},
:shards => @cores)


That produces no results, but removing the :sort parameter off does give 
results.


Here is the output from solr:

INFO: [merged] webapp=/solr path=/select 
params={wt=ruby&facet.limit=-1&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti);title_sort+asc&fl=archive,date_label,genre,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,source,image,thumbnail,text_url,title,alternative,uri,url,exhibit_type,license,title_sort,author_sort&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:8983/solr/merged}
status=0 QTime=19


It looks to me like the string should have sort=title_sort+asc 
instead of ;title_sort_asc tacked on to the query, but I'm not sure 
about that.


Any clues what I'm doing wrong?

Thanks,
Paul




Re: format of sort parameter in Solr::Request::Standard

2009-10-12 Thread Erik Hatcher

Paul-

Trunk solr-ruby has this instead:

hash[:sort] = @params[:sort].collect do |sort|
  key = sort.keys[0]
  "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
end.join(',') if @params[:sort]

The ;sort... stuff is now deprecated with Solr itself

I suppose the 0.8 gem needs to be pushed to rubyforge, eh?

Erik


On Oct 12, 2009, at 11:03 AM, Paul Rosen wrote:

I did an experiment that worked. In Solr::Request::Standard, in the  
to_hash() method, I changed the commented line below to the two  
lines following it.


   sort = @params[:sort].collect do |sort|
 key = sort.keys[0]
 "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
   end.join(',') if @params[:sort]

# START OF CHANGES
   #hash[:q] = sort ? "#{@params[:query]};#{sort}" : @params[:query]
   hash[:q] = @params[:query]
   hash[:sort] = sort if sort != nil
# END OF CHANGES

   hash["q.op"] = @params[:operator]
   hash[:df] = @params[:default_field]

Does this make sense? Should this be changed in the next version of  
the solr-ruby gem?


Paul Rosen wrote:

Hi all,
I'm using solr-ruby 0.0.7 and am having trouble getting Sort to work.
I have the following statement:
req = Solr::Request::Standard.new(:start => start, :rows => max,
:sort => [ :title_sort => :ascending ],
:query => query, :filter_queries => filter_queries,
:field_list => @field_list,
:facets => {:fields => @facet_fields, :mincount => 1, :missing =>
true, :limit => -1},
:highlighting => {:field_list => ['text'], :fragment_size =>
600}, :shards => @cores)
That produces no results, but removing the :sort parameter off does  
give results.

Here is the output from solr:
INFO: [merged] webapp=/solr path=/select
params={wt=ruby&facet.limit=-1&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti);title_sort+asc&fl=archive,date_label,genre,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,source,image,thumbnail,text_url,title,alternative,uri,url,exhibit_type,license,title_sort,author_sort&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:
8983/solr/merged} status=0 QTime=19
It looks to me like the string should have sort=title_sort+asc  
instead of ;title_sort_asc tacked on to the query, but I'm not  
sure about that.

Any clues what I'm doing wrong?
Thanks,
Paul






Solr over DRBD

2009-10-12 Thread Pieter Steyn
Hi there,

I have a 2 node cluster running apache and solr over a shared
partition ontop of DRBD.   Think of it like a SAN.

I'm curious as to how I should do load balancing / sharing with Solr in
this setup.  I'm already using DNS round robin for apache.

My Solr installation is on /cluster/Solr.  I've been starting an
instance of Solr on each server out of the same installation / working
directory.
Is this safe?  I haven't noticed any problems so far.

Does this mean they'll share the same index?  Is there a better way to
do this?  Should I perhaps only do commits on one of the servers (and
setup heartbeat to determine which server to run the commit on)?

I'm running Solr 1.3, but I'm not against upgrading if that provides
me with a better way of load balancing.

Kind regards,
Pieter


capitalization and delimiters

2009-10-12 Thread Audrey Foo


In my search docs, I have content such as 'powershot' and 'powerShot'.
I would expect 'powerShot' to be searched as 'power', 'shot' and
'powershot', so that results for all of these are returned. Instead, only results
for 'power' and 'shot' are returned.
Any suggestions?
In the schema, index analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
In the schema, query analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
Thanks, Audrey
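If the goal is for a query on 'powerShot' to also match 'powershot', one likely fix (a sketch; it assumes the rest of the fieldType is unchanged) is to generate word parts and split on case change at index time too, so the index holds all three tokens 'power', 'shot' and 'powershot':

```
<!-- index-time analyzer: emit both the split parts and the catenated form -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
```

With the current config, generateWordParts="0" at index time means 'powerShot' is only ever indexed as the catenated 'powershot', so the query-side split into 'power'/'shot' has nothing extra to match against.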
_
New! Open Messenger faster on the MSN homepage
http://go.microsoft.com/?linkid=9677405

Re: Default query parameter for one core

2009-10-12 Thread Michael
Thanks for your input, Shalin.

On Sun, Oct 11, 2009 at 12:30 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 - I can't use a variable like ${shardsParam} in a single shared
 solrconfig.xml, because the line
    <str name="shards">${shardsParam}</str>
  has to be in there, and that forces a (possibly empty) shards
 parameter onto cores that *don't* need one, causing a
 NullPointerException.


 Well, we can fix the NPE :)  Please raise an issue.

The NPE may be the correct behavior -- I'm causing an empty shards=
parameter, which doesn't have a defined behavior AFAIK.  The
deficiency I was pointing out was that using ${shardsParam} doesn't
help me achieve my real goal, which is to have the entire str tag
disappear for some shards.

 So I think my best bet is to make two mostly-identical
 solrconfig.xmls, and point core0 to the one specifying a shards=
 parameter:
    <core name="core0" config="core0_solrconfig.xml"/>

 I don't like the duplication of config, but at least it accomplishes my
 goal!


 There is another way too. Each plugin in Solr now supports a configuration
 attribute named enable which can be true or false. You can control the
 value (true/false) through a variable. So you can duplicate just the handler
 instead of the complete solrconfig.xml

I had looked into this, but thought it doesn't help because I'm not
disabling an entire plugin -- just a <str> tag specifying a default
parameter to a requestHandler.  Individual <str> tags don't have an
enable flag for me to conditionally set to false.  Maybe I'm
misunderstanding what you're suggesting?

Thanks again,
Michael


Re: Is negative boost possible?

2009-10-12 Thread Andrzej Bialecki

Yonik Seeley wrote:

On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki a...@getopt.org wrote:

BTW, standard Collectors collect only results
with positive scores, so if you want to collect results with negative scores
as well then you need to use a custom Collector.


Solr never discarded non-positive hits, and now Lucene 2.9 no longer
does either.


Hmm ... The code that I pasted in my previous email uses 
Searcher.search(Query, int), which in turn uses search(Query, Filter, 
int), and it doesn't return any results if only the first clause is 
present (the one with negative boost) even though it's a matching clause.


I think this is related to the fact that in TopScoreDocCollector:48 the 
pqTop.score is initialized to 0, and then all results that have lower 
score that this are discarded. Perhaps this should be initialized to 
Float.MIN_VALUE?



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Scoring for specific field queries

2009-10-12 Thread R. Tan
Avlesh,

I got it, finally, by doing an OR between the two fields, one with an exact
match keyword and the other is grouped.

q=suggestion:formula xxx OR tokenized_suggestion:(formula )

Thanks for all your help!

Rih


On Fri, Oct 9, 2009 at 4:26 PM, R. Tan tanrihae...@gmail.com wrote:

 I ended up with the same set of results earlier, but I don't get results such as
 'the champion', I think because of the EdgeNGram filter.

 With NGram, I'm back to the same problem:

 Result for q=ca

 <doc>
 <float name="score">0.8717008</float>
 <str name="tokenized_suggestion">Blu Jazz Cafe</str>
 </doc>

 <doc>
 <float name="score">0.8717008</float>
 <str name="tokenized_suggestion">Café in the Pond</str>
 </doc>




Letters with accent in query

2009-10-12 Thread R. Tan
Hi,
I'm querying with an accented keyword such as café but the debug info
shows that it is only searching for caf. I'm using the ISOLatin1Accent
filter as well.

Query:
http://localhost:8983/solr/select?q=%E9debugQuery=true

Params return shows this:
<lst name="params">
<str name="q"/>
<str name="debugQuery">true</str>
</lst>

What am I missing here?

Rih


Re: Default query parameter for one core

2009-10-12 Thread Michael
OK, a hacky but working solution to making one core shard to all
others: have the default parameter *name* vary, so that one core gets
shards=foo and all other cores get dummy=foo.

# solr.xml
<solr ...>
<property name="shardsKey" value="dummy" />
<property name="shardsValue" value="" />
<cores ...>
  <core name="core0" instanceDir="./">
    <property name="shardsKey" value="shards" />
    <property name="shardsValue" value="localhost:9990/solr/core1,..."/>
  </core>
  <core name="core1" instanceDir="./" dataDir="/search/1"/>
   ...
</cores>
</solr>

# solrconfig.xml
<requestHandler ...>
  <list name="defaults">
    <str name="${shardsKey}">${shardsValue}</str>
    ...

Michael

On Mon, Oct 12, 2009 at 12:00 PM, Michael solrco...@gmail.com wrote:
 Thanks for your input, Shalin.

 On Sun, Oct 11, 2009 at 12:30 AM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
 - I can't use a variable like ${shardsParam} in a single shared
 solrconfig.xml, because the line
    <str name="shards">${shardsParam}</str>
  has to be in there, and that forces a (possibly empty) shards
 parameter onto cores that *don't* need one, causing a
 NullPointerException.


 Well, we can fix the NPE :)  Please raise an issue.

 The NPE may be the correct behavior -- I'm causing an empty shards=
 parameter, which doesn't have a defined behavior AFAIK.  The
 deficiency I was pointing out was that using ${shardsParam} doesn't
 help me achieve my real goal, which is to have the entire str tag
 disappear for some shards.

 So I think my best bet is to make two mostly-identical
 solrconfig.xmls, and point core0 to the one specifying a shards=
 parameter:
    <core name="core0" config="core0_solrconfig.xml"/>

 I don't like the duplication of config, but at least it accomplishes my
 goal!


 There is another way too. Each plugin in Solr now supports a configuration
 attribute named enable which can be true or false. You can control the
 value (true/false) through a variable. So you can duplicate just the handle
 instead of the complete solrconfig.xml

 I had looked into this, but thought it doesn't help because I'm not
 disabling an entire plugin -- just a str tag specifying a default
 parameter to a requestHandler.  Individual str tags don't have an
 enable flag for me to conditionally set to false.  Maybe I'm
 misunderstanding what you're suggesting?

 Thanks again,
 Michael



Re: Letters with accent in query

2009-10-12 Thread Michael
What tokenizer and filters are you using in what order?  See schema.xml.

Also, you may wish to use ASCIIFoldingFilter, which covers more cases
than ISOLatin1AccentFilter.

Michael

On Mon, Oct 12, 2009 at 12:42 PM, R. Tan tanrihae...@gmail.com wrote:
 Hi,
 I'm querying with an accented keyword such as café but the debug info
 shows that it is only searching for caf. I'm using the ISOLatin1Accent
 filter as well.

 Query:
 http://localhost:8983/solr/select?q=%E9debugQuery=true

 Params return shows this:
 <lst name="params">
 <str name="q"/>
 <str name="debugQuery">true</str>
 </lst>

 What am I missing here?

 Rih
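One other thing worth checking besides the analyzer chain: the %E9 in the query URL is the Latin-1 byte for é, while Solr's servlet container typically expects query strings percent-encoded as UTF-8, where é is %C3%A9 -- which would explain why q arrives empty in the params. A quick self-contained check of the two encodings (plain Java, no Solr involved):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AccentEncoding {
    public static void main(String[] args) {
        // UTF-8 percent-encoding, what Solr normally expects in the URL
        System.out.println(URLEncoder.encode("café", StandardCharsets.UTF_8));      // caf%C3%A9
        // Latin-1 percent-encoding, which matches the %E9 seen in the question
        System.out.println(URLEncoder.encode("café", StandardCharsets.ISO_8859_1)); // caf%E9
    }
}
```

If the client sends UTF-8 and the container is configured for UTF-8 URIs (e.g. URIEncoding="UTF-8" in Tomcat), the full keyword should reach the analyzer.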



Search results order

2009-10-12 Thread bhaskar chandrasekar
Hi,
 
I have indexed my xml which contains the following data.
 
<add>
<doc>
  <field name="url">http://www.yahoo.com</field>
  <field name="title">yahoomail</field>
  <field name="description">yahoo has various links and gives in detail about
the all the links in it</field>
</doc>
<doc>
  <field name="url">http://www.rediff.com</field>
  <field name="title">It is a good website</field>
  <field name="description">Rediff has a interesting homepage</field>
</doc>
<doc>
  <field name="url">http://www.ndtv.com</field>
  <field name="title">Ndtv has a variety of good links</field>
  <field name="description">The homepage of Ndtv is very good</field>
</doc>
</add>
 
 
In my solr home page, when I search for “good”,
 
it displays the docs with the highest number of occurrences of “good” first by default.
 
The output comes as follows.
<doc>
  <field name="url">http://www.ndtv.com</field>
  <field name="title">Ndtv has a variety of good links</field>
  <field name="description">The homepage of Ndtv is very good</field>
</doc>
<doc>
  <field name="url">http://www.rediff.com</field>
  <field name="title">It is a good website</field>
  <field name="description">Rediff has a interesting homepage</field>
</doc>
 
I need to display the doc with the fewest occurrences of the search input “good”
as the first result.
 
What changes should I make in the solrconfig file to achieve this?
Any suggestions would be helpful.
 
 
For me the output should come as below.
 
<doc>
  <field name="url">http://www.rediff.com</field>
  <field name="title">It is a good website</field>
  <field name="description">Rediff has a interesting homepage</field>
</doc>
<doc>
  <field name="url">http://www.ndtv.com</field>
  <field name="title">Ndtv has a variety of good links</field>
  <field name="description">The homepage of Ndtv is very good</field>
</doc>
 
Regards
Bhaskar


  

Re: does Solr support distributed index storage?

2009-10-12 Thread Chaitali Gupta
Hi, 

How should we set up master and slaves in Solr? Which configuration files and
parameters do we need to change, and how?

Thanks, 
Chaitali 

--- On Mon, 10/12/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

From: Shalin Shekhar Mangar shalinman...@gmail.com
Subject: Re: does Solr support distributed index storage?
To: solr-user@lucene.apache.org
Date: Monday, October 12, 2009, 3:17 AM

On Mon, Oct 12, 2009 at 10:27 AM, Pravin Karne 
pravin_ka...@persistent.co.in wrote:

 How to set master/slave setup for solr.


Index documents only on the master. Put the slaves behind a load balancer
and query only on slaves. Setup replication between the master and slaves.
See http://wiki.apache.org/solr/SolrReplication

-- 
Regards,
Shalin Shekhar Mangar.



  

Conditional copyField

2009-10-12 Thread David Stuart

Hi,
I am pushing data to solr from two different sources: nutch and a cms.
I have a data clash in that in nutch a copyField is required to push
the url field to the id field, as it is used as the primary lookup in
the nutch solr integration update. The other cms also uses the url
field but populates the id field with a different value. Now I
can't really change either source definition, so is there a way in
solrconfig or schema to check if id is empty and only copy if true, or
is there a better way via the updateprocessor?


Thanks for your help in advance
Regards

David


Re: format of sort parameter in Solr::Request::Standard

2009-10-12 Thread Erik Hatcher
I've just pushed a new 0.0.8 gem to Rubyforge that includes the fix I  
described for the sort parameter.


Erik


On Oct 12, 2009, at 11:03 AM, Paul Rosen wrote:

I did an experiment that worked. In Solr::Request::Standard, in the  
to_hash() method, I changed the commented line below to the two  
lines following it.


   sort = @params[:sort].collect do |sort|
 key = sort.keys[0]
 "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
   end.join(',') if @params[:sort]

# START OF CHANGES
   #hash[:q] = sort ? "#{@params[:query]};#{sort}" : @params[:query]
   hash[:q] = @params[:query]
   hash[:sort] = sort if sort != nil
# END OF CHANGES

   hash["q.op"] = @params[:operator]
   hash[:df] = @params[:default_field]

Does this make sense? Should this be changed in the next version of  
the solr-ruby gem?


Paul Rosen wrote:

Hi all,
I'm using solr-ruby 0.0.7 and am having trouble getting Sort to work.
I have the following statement:
req = Solr::Request::Standard.new(:start => start, :rows => max,
:sort => [ :title_sort => :ascending ],
:query => query, :filter_queries => filter_queries,
:field_list => @field_list,
:facets => {:fields => @facet_fields, :mincount => 1, :missing =>
true, :limit => -1},
:highlighting => {:field_list => ['text'], :fragment_size =>
600}, :shards => @cores)
That produces no results, but removing the :sort parameter off does  
give results.

Here is the output from solr:
INFO: [merged] webapp=/solr path=/select
params={wt=ruby&facet.limit=-1&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti);title_sort+asc&fl=archive,date_label,genre,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,source,image,thumbnail,text_url,title,alternative,uri,url,exhibit_type,license,title_sort,author_sort&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:
8983/solr/merged} status=0 QTime=19
It looks to me like the string should have sort=title_sort+asc  
instead of ;title_sort_asc tacked on to the query, but I'm not  
sure about that.

Any clues what I'm doing wrong?
Thanks,
Paul






Re: does Solr support distributed index storage?

2009-10-12 Thread Dan Trainor

On 10/12/2009 10:49 AM, Chaitali Gupta wrote:

Hi,

How should we setup master and slaves in Solr? What configuration files and 
parameters should we need to change and how ?

Thanks,
Chaitali


Hi -

I think Shalin was pretty clear on that, it is documented very well at 
http://wiki.apache.org/solr/SolrReplication .


I am responding, however, to explain something that took me a bit of 
time to wrap my brain around in the hopes that it helps you and perhaps 
some others.


Solr in itself does not replicate.  Instead, Solr relies on an 
underlying rsync setup to keep these indices sync'd throughout the 
collective.  When you break it down, it's simply rsync with a 
configuration file making all the nodes aware that they participate in 
this configuration.  Wrap a cron job around this on all the nodes, and 
they simply replicate raw data from one master to one or more slaves.


I would suggest reading up on how snapshots are performed and how the 
log files are created/what they do.  Of course it would benefit you to 
know the ins and outs of all the elements that help Solr replicate, but 
it's been my experience that most of it has to do with those particular 
items.


Thanks
-dant



Re: does Solr support distributed index storage?

2009-10-12 Thread Pieter Steyn
Sorry for the hijack, but is replication necessary when using a cluster
file-system such as GFS2, where the files are the same for any
instance of Solr?


On Mon, Oct 12, 2009 at 8:36 PM, Dan Trainor dtrai...@toolbox.com wrote:
 On 10/12/2009 10:49 AM, Chaitali Gupta wrote:

 Hi,

 How should we setup master and slaves in Solr? What configuration files
 and parameters should we need to change and how ?

 Thanks,
 Chaitali

 Hi -

 I think Shalin was pretty clear on that, it is documented very well at
 http://wiki.apache.org/solr/SolrReplication .

 I am responding, however, to explain something that took me a bit of time to
 wrap my brain around in the hopes that it helps you and perhaps some others.

 Solr in itself does not replicate.  Instead, Solr relies on an underlying
 rsync setup to keep these indices sync'd throughout the collective.  When
 you break it down, its simply rsync with a configuration file making all the
 nodes aware that they participate in this configuration.  Wrap a cron
 around this between all the nodes, and they simply replicate raw data from
 one master to one or more slave.

 I would suggest reading up on how snapshots are preformed and how the log
 files are created/what they do.  Of course it would benefit you to know the
 ins and outs of all the elements that help Solr replicate, but its been my
 experience that most of it has to do with those particular items.

 Thanks
 -dant




Re: Search results order

2009-10-12 Thread Nicholas Clark
You can reverse the sort order. In this case, you want score ascending:

sort=score+asc

If you just want documents without that keyword, then try using the minus
sign:

q=-good

http://wiki.apache.org/solr/CommonQueryParameters

-Nick


On Mon, Oct 12, 2009 at 1:19 PM, bhaskar chandrasekar
bas_s...@yahoo.co.in wrote:

 Hi,

 I have indexed my xml which contains the following data.

 <add>
 <doc>
   <field name="url">http://www.yahoo.com</field>
   <field name="title">yahoomail</field>
   <field name="description">yahoo has various links and gives in detail
 about the all the links in it</field>
 </doc>
 <doc>
   <field name="url">http://www.rediff.com</field>
   <field name="title">It is a good website</field>
   <field name="description">Rediff has a interesting homepage</field>
 </doc>
 <doc>
   <field name="url">http://www.ndtv.com</field>
   <field name="title">Ndtv has a variety of good links</field>
   <field name="description">The homepage of Ndtv is very good</field>
 </doc>
 </add>


 In my solr home page , when I search input as “good”

 It displays the docs which has “good” as highest occurrences by default.

 The output comes as follows.
 <doc>
   <field name="url">http://www.ndtv.com</field>
   <field name="title">Ndtv has a variety of good links</field>
   <field name="description">The homepage of Ndtv is very good</field>
 </doc>
 <doc>
   <field name="url">http://www.rediff.com</field>
   <field name="title">It is a good website</field>
   <field name="description">Rediff has a interesting homepage</field>
 </doc>

 If I need to display doc which has least occurrence of search input “good”
 as first result.

 What changes should I make in solrconfig file to achieve the same?.
 Any suggestions would be helpful.


 For me the output should come as below.

 <doc>
   <field name="url">http://www.rediff.com</field>
   <field name="title">It is a good website</field>
   <field name="description">Rediff has a interesting homepage</field>
 </doc>
 <doc>
   <field name="url">http://www.ndtv.com</field>
   <field name="title">Ndtv has a variety of good links</field>
   <field name="description">The homepage of Ndtv is very good</field>
 </doc>

 Regards
 Bhaskar





Re: Boosting of words

2009-10-12 Thread Nicholas Clark
The easiest way to boost your query is to modify your query string.

q=product:red color:red^10

In the above example, I have boosted the color field. If red is found in
that field, it will get a boost of 10. If it is only found in the product
field, then there will be no boost.

Here's more information:

http://wiki.apache.org/solr/SolrRelevancyCookbook#Boosting_Ranking_Terms

Once you're comfortable with that, I suggest that you look into using the
DisMax request handler. It will allow you to easily search across multiple
fields with custom boost values.

http://wiki.apache.org/solr/DisMaxRequestHandler
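A sketch of what that can look like (the product/color field names and boost values are just the hypothetical example from above, not a recommendation; the space in qf is left unencoded for readability):

```
# dismax spreads q across the fields listed in qf, each with its own boost
http://localhost:8983/solr/select?defType=dismax&q=red&qf=product^1.0 color^10.0
```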

-Nick


On Sun, Oct 11, 2009 at 12:26 PM, bhaskar chandrasekar bas_s...@yahoo.co.in
 wrote:

 Hi,

 I would like to know how can i give boosting to search input in Solr.
 Where exactly should i make the changes?.

 Regards
 Bhaskar





Re: Is negative boost possible?

2009-10-12 Thread Yonik Seeley
On Mon, Oct 12, 2009 at 12:03 PM, Andrzej Bialecki a...@getopt.org wrote:
 Solr never discarded non-positive hits, and now Lucene 2.9 no longer
 does either.

 Hmm ... The code that I pasted in my previous email uses
 Searcher.search(Query, int), which in turn uses search(Query, Filter, int),
 and it doesn't return any results if only the first clause is present (the
 one with negative boost) even though it's a matching clause.

 I think this is related to the fact that in TopScoreDocCollector:48 the
 pqTop.score is initialized to 0, and then all results that have lower score
 that this are discarded. Perhaps this should be initialized to
 Float.MIN_VALUE?

Hmmm, You're actually seeing this with Lucene 2.9?
The HitQueue (subclass of PriorityQueue) is pre-populated with
sentinel objects with scores of -Inf, not zero.

-Yonik
http://www.lucidimagination.com


Re: Conditional copyField

2009-10-12 Thread AHMET ARSLAN
 Hi,
 I am pushing data to solr from two different sources nutch
 and a cms. I have a data clash in that in nutch a copyField
 is required to push the url field to the id field as it is
 used as  the primary lookup in the nutch solr
 intergration update. The other cms also uses the url field
 but also populates the id field with a different value. Now
 I can't really change either source definition so is there a
 way in solrconfig or schema to check if id is empty and only
 copy if true or is there a better way via the
 updateprocessor?

copyField declaration has three attributes: source, dest and maxChars.
Therefore it can be concluded that there is no way to do it in schema.xml

Luckily, Wiki [1] has a quick example that implements a conditional copyField.

[1] http://wiki.apache.org/solr/UpdateRequestProcessor
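Wiring such a processor into an update chain would look roughly like this — a sketch only; the factory class name is hypothetical (it stands for whatever implements the id-is-empty check), and solr.RunUpdateProcessorFactory must stay last so the add still executes:

```
<updateRequestProcessorChain name="conditional-copy">
  <!-- hypothetical factory implementing the "copy url to id only if id is empty" check -->
  <processor class="my.pkg.ConditionalCopyProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```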





doing searches from within an UpdateRequestProcessor

2009-10-12 Thread Bill Au
Is it possible to do searches from within an UpdateRequestProcessor?  The
documents in my index reference each other.  When a document is deleted, I
would like to update all documents containing a reference to the deleted
document.  My initial idea is to use a custom UpdateRequestProcessor.  Is
there a better way to do this?
Bill


Lucene Merge Threads

2009-10-12 Thread Giovanni Fernandez-Kincade
Hi,
I'm attempting to optimize a pretty large index, and even though the optimize 
request timed out, I watched it using a profiler and saw that the optimize 
thread continued executing. Eventually it completed, but in the background I 
still see a thread performing a merge:

Lucene Merge Thread #0 [RUNNABLE, IN_NATIVE] CPU time: 17:51
java.io.RandomAccessFile.readBytes(byte[], int, int)
java.io.RandomAccessFile.read(byte[], int, int)
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[],
 int, int)
org.apache.lucene.store.BufferedIndexInput.refill()
org.apache.lucene.store.BufferedIndexInput.readByte()
org.apache.lucene.store.IndexInput.readVInt()
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
org.apache.lucene.index.SegmentTermEnum.next()
org.apache.lucene.index.SegmentMergeInfo.next()
org.apache.lucene.index.SegmentMerger.mergeTermInfos(FormatPostingsFieldsConsumer)
org.apache.lucene.index.SegmentMerger.mergeTerms()
org.apache.lucene.index.SegmentMerger.merge(boolean)
org.apache.lucene.index.IndexWriter.mergeMiddle(MergePolicy$OneMerge)
org.apache.lucene.index.IndexWriter.merge(MergePolicy$OneMerge)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run()


This has taken quite a while, and hasn't really been fully utilizing the 
machine's resources. After looking at the Lucene source, I noticed that you can 
set a MaxThreadCount parameter in this class. Is this parameter exposed by Solr 
somehow? I see the class mentioned, commented out, in my solrconfig.xml, but 
I'm not sure of the correct way to specify the parameter:

<!--
 Expert:
 The Merge Scheduler in Lucene controls how merges are performed.  The 
ConcurrentMergeScheduler (Lucene 2.3 default)
  can perform merges in the background using separate threads.  The 
SerialMergeScheduler (Lucene 2.2 default) does not.
 -->

<!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->


Also, if I can specify this parameter, is it safe to just start/stop my servlet 
server (Tomcat) mid-merge?

Thanks in advance,
Gio.


Re: Lucene Merge Threads

2009-10-12 Thread Jason Rutherglen
Try this in solrconfig.xml:

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">1</int>
</mergeScheduler>

Yes you can stop the process mid-merge.  The partially merged files
will be deleted on restart.

We need to update the wiki?




RE: Lucene Merge Threads

2009-10-12 Thread Giovanni Fernandez-Kincade
Do you have to make a new call to optimize to make it start the merge again?

-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: Monday, October 12, 2009 7:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Lucene Merge Threads

Try this in solrconfig.xml:

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">1</int>
</mergeScheduler>

Yes you can stop the process mid-merge.  The partially merged files
will be deleted on restart.

We need to update the wiki?




Re: two facet.prefix on one facet field in a single query

2009-10-12 Thread Bill Au
It looks like there is a JIRA covering this:

https://issues.apache.org/jira/browse/SOLR-1387

On Mon, Oct 12, 2009 at 11:00 AM, Bill Au bill.w...@gmail.com wrote:

  Is it possible to have two different facet.prefix on the same facet field
  in a single query? I want to get facet counts for two prefixes, xx and
  yy. I tried using two facet.prefix parameters (i.e. facet.prefix=xx&facet.prefix=yy)
  but the second one seems to have no effect.

 Bill



XSLT Response for multivalue fields

2009-10-12 Thread blholmes

I am having trouble generating the xsl file for multivalue entries. I'm not
sure if I'm missing something, or if this is how it is supposed to function. I
have two authors and I'd like to have separate ByLine nodes in my
translation.
Here is what solr returns normally
...
<arr name="author">
<str>Crista  Souza</str>
<str>Darrell  Dunn</str>
</arr>

Here is my xsl
<xsl:for-each select="a...@name='author']::*">
   <ByLine>
<xsl:value-of select="."/>
   </ByLine>
</xsl:for-each>


And here is what it is returning:
<ByLine>Crista  SouzaDarrell  Dunn</ByLine>

I was expecting it to return 
<ByLine>Crista  Souza</ByLine>
<ByLine>Darrell  Dunn</ByLine>

I've tried other variations and using templates instead but it keeps
displaying the same thing, one ByLine field with things mushed together.

Any clues if this is an issue with xslt code, the xslt response Writer,
XALAN, or solr? I've no clues where to go from here. Any ideas to point me
in the right direction appreciated.
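(For comparison, a for-each that selects the str children directly — rather than the arr element itself — should emit one ByLine per value; a sketch, assuming the response shape above:)

```
<xsl:for-each select="arr[@name='author']/str">
  <ByLine><xsl:value-of select="."/></ByLine>
</xsl:for-each>
```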
-- 
View this message in context: 
http://www.nabble.com/XSLT-Response-for-multivalue-fields-tp25865618p25865618.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.4 Release Party

2009-10-12 Thread Israel Ekpo
It is my email signature.

It is a sort of hybrid/mashup from different sources.

On Mon, Oct 12, 2009 at 6:49 PM, Michael Masters mmast...@gmail.com wrote:

 Where does the quote come from :)

 On Sat, Oct 10, 2009 at 6:38 AM, Israel Ekpo israele...@gmail.com wrote:
  I can't wait...
 
  --
  Good Enough is not good enough.
  To give anything less than your best is to sacrifice the gift.
  Quality First. Measure Twice. Cut Once.
 




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Boosting of words

2009-10-12 Thread bhaskar chandrasekar
 
Hi Nicholas,
 
Thanks for your input. Where exactly should the query

q=product:red color:red^10

be used and defined?
Help me.
 
Regards
Bhaskar


RE: Lucene Merge Threads

2009-10-12 Thread Giovanni Fernandez-Kincade
This didn't end up working. I got the following error when I tried to commit:

Oct 12, 2009 8:36:42 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class '
5
'
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:81)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:178)
at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:172)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:400)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: 
5

at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
at java.security.AccessController.doPrivileged(Unknown Source)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.$$YJP$$forName0(Native Method)
at java.lang.Class.forName0(Unknown Source)
at java.lang.Class.forName(Unknown Source)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294)
... 28 more


I believe it's because maxThreadCount is not a public property of the 
ConcurrentMergeScheduler class. You have to call this method to set it:

public void setMaxThreadCount(int count) {
  if (count < 1)
    throw new IllegalArgumentException("count should be at least 1");
  maxThreadCount = count;
}

Is that possible through the solrconfig?

Thanks,
Gio.


SpellCheck Index not building

2009-10-12 Thread Varun Gupta
Hi,

I am using Solr 1.3 for spell checking. I am facing a strange problem of
the spell checking index not being generated. When I have a small number of
documents (less than 1000) indexed, the spell check index builds, but
when there are more documents (around 40K), the index for spell checking
does not build. I can see that the directory for spell checking is created and there
are two files under it: segments_3 & segments.gen

I am using the following query to build the spell checking index:
/select
params={spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2}

In the logs I see:
INFO: [] webapp=/solr path=/select
params={spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2}
hits=37467 status=0 QTime=44

Please help me solve this problem.

Here is my configuration:
*schema.xml:*
<fieldType name="textSpell" class="solr.TextField"
positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
   <field name="a_spell" type="textSpell" />
   <copyField source="title" dest="a_spell" />
   <copyField source="content" dest="a_spell" />

*solrconfig.xml:*
  <requestHandler name="contentsearch" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>

      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.dictionary">jarowinkler</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">a_spell</str>
      <str name="field">a_spell</str>
      <str name="spellcheckIndexDir">./spellchecker_a_spell</str>
      <str name="accuracy">0.7</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">a_spell</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker_a_spell</str>
      <str name="accuracy">0.7</str>
    </lst>
  </searchComponent>

--
Thanks
Varun Gupta


Re: SpellCheck Index not building

2009-10-12 Thread Shalin Shekhar Mangar
On Tue, Oct 13, 2009 at 8:36 AM, Varun Gupta varun.vgu...@gmail.com wrote:

 Hi,

 I am using Solr 1.3 for spell checking. I am facing a strange problem of
 spell checking index not been generated. When I have less number of
 documents (less than 1000) indexed then the spell check index builds, but
 when the documents are more (around 40K), then the index for spell checking
 does not build. I can see the directory for spell checking build and there
 are two files under it: segments_3   segments.gen


It seems that you might be running out of memory with a larger index. Can
you check the logs to see if it has any exceptions recorded?

-- 
Regards,
Shalin Shekhar Mangar.


Re: SpellCheck Index not building

2009-10-12 Thread Varun Gupta
No, there are no exceptions in the logs.

--
Thanks
Varun Gupta



Re: doing searches from within an UpdateRequestProcessor

2009-10-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
A custom UpdateRequestProcessor is the solution. You can access the
searcher in a UpdateRequestProcessor.

On Tue, Oct 13, 2009 at 4:20 AM, Bill Au bill.w...@gmail.com wrote:
 Is it possible to do searches from within an UpdateRequestProcessor?  The
 documents in my index reference each other.  When a document is deleted, I
 would like to update all documents containing a reference to the deleted
 document.  My initial idea is to use a custom UpdateRequestProcessor.  Is
 there a better way to do this?
 Bill




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: search by some functionality

2009-10-12 Thread Chris Hostetter

: Maybe I'm missing something, but function queries aren't involved in
: determining whether a document matches or not, only its score.  How is a a
: custom function / value-source going to filter?

it's not ... i didn't realize that was the context of the question, i was 
just answering the specific question about how to create custom functions.



-Hoss



Re: Weird Facet and KeywordTokenizerFactory Issue

2009-10-12 Thread Chris Hostetter

: I had to be brief as my facets are in the order of 100K over 800K documents
: and also if I give the complete schema.xml I was afraid nobody would read my
: long message :-) ..Hence I showed only relevant pieces of the result showing
: different fields having same problem

relevant is good, but you have to provide a consistent picture from start 
to finish ... you don't need to show 1,000 lines of facet field output, 
but you at least need to show the field names.

: <fieldType name="keywordText" class="solr.TextField"
: sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
:   <analyzer type="index">
: <tokenizer class="solr.KeywordTokenizerFactory"/>
: <filter class="solr.TrimFilterFactory" />
: <filter class="solr.StopFilterFactory" ignoreCase="true"
: words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
: 
: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
: ignoreCase="true" expand="false" />
: <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
:   </analyzer>

...have you used analysis.jsp to see what terms that analyzer produces 
based on the strings you are indexing for your documents?  because 
combined with synonyms like this...

: New York, N.Y., NY => New York

...it doesn't surprise me that you're getting "New" as an indexed term.  
By default SynonymFilter uses whitespace to delimit tokens in multi-token 
synonyms, so for some input like "NY" you should see it produce the tokens 
"New" and "York"

you can use the tokenizerFactory attribute on SynonymFilterFactory to 
specify a TokenizerFactory class to use when parsing synonyms.txt
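For example — the attribute values here mirror the field type above, and the key addition is the tokenizerFactory attribute, which keeps multi-word synonyms as single tokens to match the keyword-tokenized field:

```
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>
```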



-Hoss



Re: Question about PatternReplace filter and automatic Synonym generation

2009-10-12 Thread Chris Hostetter

:  There is a Solr.PatternTokenizerFactory class which likely fits the bill in
: this case. The related question I have is this - is it possible to have
: multiple Tokenizers in your analysis chain?

No .. Tokenizers consume CharReaders and produce a TokenStream ... what's 
needed here is a TokenFilter that consumes a TokenStream and produces a 
TokenStream





-Hoss



Re: De-basing / re-basing docIDs, or how to effectively pass calculated values from a Scorer or Filter up to (Solr's) QueryComponent.process

2009-10-12 Thread Chris Hostetter
: In the code I'm working with, I generate a cache of calculated values as a
: by-product within a Filter.getDocidSet implementation (and within a Query-ized
: version of the filter and its Scorer method) . These values are keyed off the
: IndexReader's docID values, since that's all that's accessible at that level.
: Ultimately, however, I need to be able to access these values much higher up
: in the stack (Solr's QueryComponent.process method), so that I can inject the

my suggestion would be to change your Filter to use the FieldCache to 
lookup the uniqueKey for your docid, and base your cache off that ... then 
other uses of your cache (higher up the chain) will have an id that 
makes sense outside the context of a segment reader.
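The rekeying idea can be sketched with plain collections — here a String[] stands in for what the FieldCache would return for the uniqueKey field of a segment reader, and the float values stand for the hypothetical by-product computed in the Filter:

```java
import java.util.HashMap;
import java.util.Map;

public class RekeyCache {

    // Re-key a docId-indexed cache by uniqueKey so the values remain
    // meaningful outside the segment reader that produced the docIds.
    static Map<String, Float> rekey(String[] uniqueKeys, float[] perDocValues) {
        Map<String, Float> byKey = new HashMap<String, Float>();
        for (int docId = 0; docId < uniqueKeys.length; docId++) {
            byKey.put(uniqueKeys[docId], perDocValues[docId]);
        }
        return byKey;
    }

    public static void main(String[] args) {
        String[] ids = {"doc-a", "doc-b"};  // stand-in for FieldCache output
        float[] values = {0.5f, 2.0f};      // per-docId by-products
        System.out.println(rekey(ids, values));
    }
}
```

The higher layers (e.g. QueryComponent.process) can then look values up by uniqueKey regardless of which segment the docId came from.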




-Hoss



Re: DIH and EmbeddedSolr

2009-10-12 Thread rohan rai
Hey
Any reason why it may be happening ??

Regards
Rohan

On Sun, Oct 11, 2009 at 9:25 PM, rohan rai hiroha...@gmail.com wrote:


 Small data set..
 <?xml version="1.0" encoding="UTF-8" ?>
 <root>
 <test>
 <id>11</id>
 <name>11</name>
 <type>11</type>
 </test>
 <test>
 <id>22</id>
 <name>22</name>
 <type>22</type>
 </test>
 <test>
 <id>33</id>
 <name>33</name>
 <type>33</type>
 </test>
 </root>

 data-config:
 <dataConfig>
 <dataSource type="FileDataSource"/>
 <document>
 <entity name="test" processor="XPathEntityProcessor"
 forEach="/root/test"
 url="/home/test/test_data.xml"
 >
 <field column="id" name="id" xpath="/root/test/id"/>
 <field column="name" name="name" xpath="/root/test/name"/>
 <field column="type" name="type" xpath="/root/test/type"/>
 </entity>
 </document>
 </dataConfig>

 schema:
 <?xml version="1.0" ?>
 <schema name="test" version="1.1">
   <types>
     <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
 omitNorms="true"/>
   </types>

  <fields>
   <field name="id"   type="string" indexed="true" stored="true"
 multiValued="false" required="true"/>
   <field name="type" type="string" indexed="true" stored="true"
 multiValued="false" />
   <field name="name" type="string" indexed="true" stored="true"
 multiValued="false" />
  </fields>

  <uniqueKey>id</uniqueKey>

  <defaultSearchField>name</defaultSearchField>

  <solrQueryParser defaultOperator="OR"/>
 </schema>

 Sometimes it creates the index, sometimes it gives a thread pool exception. It does not
 consistently create the index.

 Regards
 Rohan


 On Sun, Oct 11, 2009 at 3:56 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 On Sat, Oct 10, 2009 at 7:44 PM, rohan rai hiroha...@gmail.com wrote:

  This is pretty unstable...anyone has any clue...Sometimes it even
 creates
  index, sometimes it does not ??
 
 
 Most DataImportHandler tests run Solr in an embedded-like mode and they
 run
 fine. Can you tell us which version of Solr are you using? Also, any data
 which can help us reproduce the problem would be nice.

 --
 Regards,
 Shalin Shekhar Mangar.