Re: Replication clients logs in solr 1.4

2010-01-20 Thread Jérôme Etévé
Oops.

Ok my mistakes.

The logs are actually only for the Solr 1.3 system-scripts-based replication.

And the config files synchronize only on change ..

J.

2010/1/20 Jérôme Etévé :
> Hi All,
>
> I'm using the built-in replication with master/slave(s) Solr and the
> indices are replicating just fine.
>
> Just something troubles me:
>
> Nothing happens in my logs/ directory ..
> On the slave(s), no logs/snapshot.current file.
> And on the master, nothing either appears on logs/clients/
>
> The logs directories belong to the Tomcat user running Solr and are writable.
>
> Another thing I noticed is I've got some timesFailed=18 in the slave
> replication.properties, although I cannot see any error in my
> catalina.out :(, I just have:
> 20-Jan-2010 16:11:00 org.apache.solr.handler.SnapPuller fetchLatestIndex
> INFO: Slave in sync with master
>
> Is there any reason for this?
>
> What I also don't get is that no documents are being updated on my
> master, the index versions are the same on my slave and master and
> still timesFailed is increasing continuously.
>
> The master config files seem to fail to synchronize as well.
>
> Thanks for any help.
>
> Jerome.
>
>
>
>
> --
> Jerome Eteve.
> http://www.eteve.net
> jer...@eteve.net
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Replication clients logs in solr 1.4

2010-01-20 Thread Jérôme Etévé
Hi All,

I'm using the built-in replication with master/slave(s) Solr and the
indices are replicating just fine.

Just something troubles me:

Nothing happens in my logs/ directory ..
On the slave(s), no logs/snapshot.current file.
And on the master, nothing either appears on logs/clients/

The logs directories belong to the Tomcat user running Solr and are writable.

Another thing I noticed is I've got some timesFailed=18 in the slave
replication.properties, although I cannot see any error in my
catalina.out :(, I just have:
20-Jan-2010 16:11:00 org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master

Is there any reason for this?

What I also don't get is that no documents are being updated on my
master, the index versions are the same on my slave and master and
still timesFailed is increasing continuously.

The master config files seem to fail to synchronize as well.

Thanks for any help.

Jerome.




-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: exact match lookup

2009-11-04 Thread Jérôme Etévé
If feedClass acts as an identifier, it's better to use the string type :)

For the sorting, use sort=title asc,score desc (note it's sort=, not sort:).
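For example, something like:
http://localhost:8983/solr/select?q=feedClass:%22social%20news%22&sort=title%20asc,score%20desc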

J.

2009/11/4 Joel Nylund :
> thanks, that worked for me, changed to:
>
> http://localhost:8983/solr/select?q=feedClass:%22social%20news%22
>
> and the matches are correct, I changed the feedClass field back to type
> text.
>
> A followup question has to do with sorting these results.
>
> I have a field called title that I want the results sorted by.
>
> http://localhost:8983/solr/select?q=feedClass:%22social%20news%22&sort:title%20asc
>
> I tried this and the results are not sorted (they seem random)
>
> any ideas?
>
> thanks
> Joel
>
>
> [Solr XML response, markup stripped by the mail archive: status=0, QTime=1,
> q=feedClass:"social news"; four docs returned, each with feedClass "Social News"
> and values F/Far, D/dig, T/Tech, M/Mix]
>
>
>
> On Nov 4, 2009, at 12:15 PM, Jérôme Etévé wrote:
>
>> Hi,
>> you need to quote your phrase when you search for 'Social News':
>>
>> feedClass:"Social News" (URI encoded of course).
>>
>> otherwise your request will become (I assume you're using a standard
>> query parser) feedClass:Social defaultField:News . Well that's the
>> idea.
>>
>> It should then work using the type string.
>>
>> Cheers!
>>
>> J.
>>
>>
>> 2009/11/4 Joel Nylund :
>>>
>>> Hi,
>>>
>>> I have a field that I want to do exact match lookups using.
>>> (when I say exact match, I'm looking for the equivalent of a SQL query
>>> with no LIKE clause, i.e. WHERE feedClass = "Social News")
>>>
>>> For example the field is called feedClass and I'm doing:
>>>
>>> http://localhost:8983/solr/select?q=feedClass:Blog
>>>
>>> http://localhost:8983/solr/select?q=feedClass:Social%20News
>>>
>>> I tried using "text" and it seems to work pretty well except for classes
>>> with spaces in them.
>>>
>>> So I tried using field type string, that didn't work. Then I tried
>>> defining a
>>> new type called:
>>>
>>> [fieldType definition, markup stripped by the mail archive; only positionIncrementGap="100" survives]
>>>
>>>
>>> This didn't seem to help either.
>>>
>>> When I do these queries for this field with spaces, I seem to get random
>>> results
>>>
>>> For example:
>>>
>>> [Solr XML response, markup stripped: status=0, QTime=5, q=feedClass:Social News;
>>> the first doc returned has feedClass Blog]
>>>
>>>
>>> any ideas?
>>>
>>> thanks
>>> Joel
>>>
>>>
>>
>>
>>
>> --
>> Jerome Eteve.
>> http://www.eteve.net
>> jer...@eteve.net
>
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: character encoding issue

2009-11-04 Thread Jérôme Etévé
Hi,

 How do you post your data to solr? If it's by posting XML, then it
should be properly encoded in UTF-8 (which is the XML default).
Regardless of what's in the DB (which can be a mystery with MySQL).

At query time, if the XML writer is used, the output is encoded in UTF-8.
If the JSON writer is used, I think it's the same, since JSON is
Unicode-compliant by nature (it comes from JavaScript).

From what you say, I would bet on a PHP problem. It seems PHP takes the
correct UTF-8 octets from Solr and displays them as Latin-1 (hence the
strange characters). You need to
- either output your pages in UTF-8,
- or decode the octets given by Solr into a Unicode string and re-encode
it as Latin-1 for output (at the risk of losing characters that cannot be
encoded in Latin-1).

I hope it helps.

J.

2009/11/4 Jonathan Hendler :
> Hi Peter,
>
> I have the same set of issues and will look for a response here.
>
> Sometimes those other chars can be created at the time of input (like
> extraction from a Microsoft Office doc by a third-party tool, for example).
> But MySQL looking OK in the browser might be because the encoding of MySQL
> was not the same as the original text. Say for example that the collation of
> MySQL is Latin, and the document was UTF-8. When a browser renders, it might
> assume chars are UTF-8, but SOLR might be taking the table type literally in
> the DIH (Latin1 Swedish for example). Could also be the way PHP doesn't
> handle UTF-8 well and it depends on your client.
>
> Don't think it has anything to do with Jetty - I use Resin.
>
> Hope that helps,
>
> - Jonathan
>
>
> On Nov 4, 2009, at 8:48 AM, Peter Hedlund wrote:
>
>> I'm having a problem with character encoding.  The data that I'm indexing
>> with SOLR is being pulled from a MySQL database and then the index is being
>> integrated into a PHP application.  When I display the text from the SOLR
>> index it's full of strange characters (–, é, etc...).  However, when I
>> bypass SOLR and access the data from the MySQL table directly and write to
>> the browser I don't see any problems with em-dashes and accented characters.
>>
>> Is this a JETTY issue or a SOLR issue or something else?  (It's not simply
>> an issue of including <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> either)
>>
>> Thanks for any help.
>>
>> Peter Hedlund
>>
>>
>
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: exact match lookup

2009-11-04 Thread Jérôme Etévé
Hi,
 you need to quote your phrase when you search for 'Social News':

feedClass:"Social News" (URI encoded of course).

Otherwise your request will become (assuming you're using the standard
query parser) something like feedClass:Social defaultField:News. That's
the idea, anyway.

It should then work using the type string.
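
If you go with the string type, the schema.xml entry would be roughly (a
sketch; adjust indexed/stored to your needs):

  <field name="feedClass" type="string" indexed="true" stored="true"/>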

Cheers!

J.


2009/11/4 Joel Nylund :
> Hi,
>
> I have a field that I want to do exact match lookups using.
> (when I say exact match, I'm looking for the equivalent of a SQL query with
> no LIKE clause, i.e. WHERE feedClass = "Social News")
>
> For example the field is called feedClass and I'm doing:
>
> http://localhost:8983/solr/select?q=feedClass:Blog
>
> http://localhost:8983/solr/select?q=feedClass:Social%20News
>
> I tried using "text" and it seems to work pretty well except for classes
> with spaces in them.
>
> So I tried using field type string, that didn't work. Then I tried defining a
> new type called:
>
> [fieldType definition, markup stripped by the mail archive; only positionIncrementGap="100" survives]
>
>
>
> This didn't seem to help either.
>
> When I do these queries for this field with spaces, I seem to get random
> results
>
> For example:
>
> [Solr XML response, markup stripped: status=0, QTime=5, q=feedClass:Social News;
> the first doc returned has feedClass Blog]
>
>
> any ideas?
>
> thanks
> Joel
>
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: Lock problems: Lock obtain timed out

2009-11-04 Thread Jérôme Etévé
Hi,

It seems this situation is caused by some No space left on device exceptions:
SEVERE: java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes(Native Method)
at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
at 
org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)


I'd better try to set my maxMergeDocs and mergeFactor to more
adequate values for my app (I'm indexing ~15 GB of data on a 20 GB
device, so I guess there's a problem when Solr tries to merge the index
segments being built).

At the moment, they are set to 100 and 2147483647.
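
For reference, these live in the mainIndex/indexDefaults sections of
solrconfig.xml; assuming 100 is the mergeFactor and 2147483647 the
maxMergeDocs (the element names were stripped above), the entries would
look like:

    <mergeFactor>100</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>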

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Lock problems: Lock obtain timed out

2009-11-02 Thread Jérôme Etévé
Hi,

  I've got a few machines that post documents concurrently to a Solr
instance. They do not issue commits themselves; instead, I've got
autocommit set up on the Solr server side:
   [autoCommit block; XML element names stripped by the mail archive; surviving values: 5 and 6]
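
For reference, an autoCommit block in solrconfig.xml looks like this (the
numbers below are only placeholders, not my actual values):

    <autoCommit>
      <maxDocs>5000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>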


This usually works fine, but sometimes the server goes into a deadlock
state. Here are the errors I get from the log (they go on forever
until I delete the index and restart everything from zero):

02-Nov-2009 10:35:27 org.apache.solr.update.SolrIndexWriter finalize
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates
a bug -- POSSIBLE RESOURCE LEAK!!!
...
[ multiple messages like this ]
...
02-Nov-2009 10:35:27 org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
timed out: 
NativeFSLock@/home/solrdata/jobs/index/lucene-703db99881e56205cb910a2e5fd816d3-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1538)
at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1395)
at 
org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:190)
at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)


I'm wondering what could be the reason for this (a commit taking more
than 60 seconds, for instance?), and whether I should use better
locking or autocommit options.

Here's the locking conf I've got at the moment:
   [locking settings; element names stripped by the mail archive; surviving values: 1000, 1, native]
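
For reference, in a 1.x solrconfig.xml those settings are usually spelled
like this (I'm assuming the usual element names; the values below are
illustrative except for "native", which is clearly the lockType):

    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
    <lockType>native</lockType>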

I'm using solr trunk from 12 oct 2009 within tomcat.

Thanks for any help.

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: Slow Commits

2009-10-28 Thread Jérôme Etévé
Hi, here are two things that can slow down commits:

1) Autowarming the caches.
2) The Java old generation object garbage collection.

You can try:
- Turning autowarming off (set autowarmCount="0"  in the caches configuration)
- If you use the sun jvm, use  -XX:+UseConcMarkSweepGC to get a less
blocking garbage collection.

You may also try to:
- Not wait for the new searcher when you commit. The commit will then
be instant from your posting application's point of view (option
waitSearcher=false; see the example below).
- Leave the commits to the server (by setting autocommit in
solrconfig.xml). This is the best strategy if you've got a lot of
concurrent processes posting.
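
For example, the XML commit message with that option is just:

    <commit waitSearcher="false"/>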

Cheers.

Jerome.

2009/10/28 Jim Murphy :
>
> Hi All,
>
> We have 8 solr shards, index is ~ 90M documents 190GB.  :)
>
> 4 of the shards have acceptable commit time - 30-60 seconds.  The other 4
> have drifted over the last couple of months to be up around 2-3 minutes.  This
> is killing our write throughput as you can imagine.
>
> I've included a log dump of a typical commit.  Note the large time period
> (3:40) between the start commit log message and the OnCommit log message.
> So, I think warming issues are not relevant.
>
> Any ideas what to debug at this point?
>
> I'm about to issue an optimize and see where that goes.  Its been a while
> since I did that.
>
> Cheers,
>
> Jim
>
>
>
>
> Oct 28, 2009 11:47:02 AM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
> Oct 28, 2009 11:50:43 AM org.apache.solr.core.SolrDeletionPolicy onCommit
> INFO: SolrDeletionPolicy.onCommit: commits:num=2
>
> commit{dir=/master/data/index,segFN=segments_8us4,version=1228872482131,generation=413140,filenames=[segments_8us4,
> _alae.fnm, _ai
> lk.tis, _ala9.fnm, _ala9.fdx, _alac.fnm, _al9w_h.del, _alab.prx, _ala9.fdt,
> _a61p_b76.del, _alab.fnm, _al8x.frq, _al7i_2f.del, _akh1.tis,
> _add1.frq, _alae.tis, _alad_1.del, _alaa.fnm, _alad.nrm, _al9w.frq,
> _alae.tii, _ailk.tii, _add1.tis, _alac.tii, _akuu.tis, _add1.tii, _ail
> k.frq, _alac.tis, _7zfh.tii, _962y.tis, _ala7.frq, _ah91.prx, _akuu.tii,
> _alab_3.del, _ah91.fnm, _7zfh.tis, _ala8.frq, _962y.tii, _alae.pr
> x, _a61p.fdt, _akuu.frq, _a61p.fdx, _al7i.fdx, _al2o.tis, _al9w.tis,
> _ala7.fnm, _a61p.frq, _akzu.fnm, _9wzn.fnm, _akh1.prx, _al7i.fdt, _al
> a9_2.del, _962y.prx, _al7i.prx, _al9w.tii, _alaa_4.del, _al7i.frq,
> _ah91.tii, _ala8.nrm, _962y.fdt, _add1_62u.del, _alae.nrm, _ah91.tis, _
> 962y.fdx, _akh1.fnm, _al8x.prx, _al2o.tii, _ala7.fdx, _ala9.prx, _ala7.fdt,
> _al9w.prx, _ala8.prx, _akh1.tii, _al2o.fdx, _7zfh.frq, _alac_3
> .del, _akzu.tii, _akzu.fdt, _alad.fnm, _akzu.tis, _alab.nrm, _akzu.fdx,
> _al2o.fnm, _al2o.fdt, _alaa.prx, _alaa.nrm, _962y.fnm, _ala7.prx,
> _alaa.tis, _ailk.fdt, _akzu_8d.del, _alac.frq, _akzu.prx, _ala9.nrm,
> _ailk.prx, _ala9.tis, _alaa.tii, _alae.frq, _add1.fnm, _7zfh.prx, _al
> 9w.fnm, _ala9.tii, _ala9.frq, _962y.nrm, _alab.frq, _ala8.fdx, _al8x.fnm,
> _a61p.prx, _7zfh.fnm, _ala8.fdt, _ailk.fdx, _alaa.frq, _7zfh.fdx
> , _al7i.tis, _ah91.fdt, _ailk.fnm, _9wzn_i0m.del, _ah91.fdx, _al7i.tii,
> _ailk_24j.del, _alad.fdx, _al8x.tii, _alae.fdx, _add1.prx, _akuu.f
> nm, _al8x.tis, _ah91.frq, _ala8.fnm, _7zfh.fdt, _alad.fdt, _alae_1.del,
> _alae.fdt, _akzu.frq, _a61p.fnm, _9wzn.frq, _ala8.tii, _7zfh_1gsd.
> del, _7zfh.nrm, _ala7_6.del, _a61p.tis, _9wzn.tii, _alad.frq, _alad.tii,
> _akuu.fdt, _alab.tii, _ala8.tis, _962y_xgg.del, _akh1.frq, _akuu.
> fdx, _alab.tis, _al7i.fnm, _alad.tis, _alac.nrm, _alab.fdx, _ala8_5.del,
> _add1.fdx, _ala7.tii, _akuu_cc.del, _alab.fdt, _9wzn.prx, _alaa.f
> dx, _al9w.fdt, _al2o.frq, _akh1_nf.del, _alac.prx, _akh1.fdx, _alaa.fdt,
> _al9w.fdx, _al8x_17.del, _add1.fdt, _al2o.prx, _akh1.fdt, _alad.p
> rx, _akuu.prx, _962y.frq, _al2o_66.del, _alac.fdt, _ala7.tis, _a61p.tii,
> _alac.fdx, _al8x.fdt, _9wzn.tis, _9wzn.fdt, _al8x.fdx, _9wzn.fdx,
>  _ah91_35l.del]
>
> commit{dir=/master/data/index,segFN=segments_8us5,version=1228872482132,generation=413141,filenames=[_ala9.fnm,
> _alaa_5.del, _alab
> .fnm, _962y_xgh.del, _al8x.frq, _akh1.tis, _add1.frq, _alae.tis,
> _7zfh_1gse.del, _alad.nrm, _alae.tii, _akuu.tis, _ah91_35m.del, _ailk.frq
> , _7zfh.tii, _962y.tis, _akuu.tii, _ah91.prx, _7zfh.tis, _ala8.frq,
> _962y.tii, _ala7.fnm, _akzu.fnm, _9wzn.fnm, _ala9_2.del, _ala8.nrm, _a
> laf.fnm, _alae.nrm, _ala9.prx, _ailk_24k.del, _alaf.prx, _al9w.prx,
> _ala8.prx, _akh1.tii, _akzu.tii, _akzu.tis, _alad.fnm, _al2o.fnm, _962
> y.fnm, _al8x_18.del, _ala7_7.del, _alaa.tis, _ala9.nrm, _ala9.tis,
> _alaa.tii, _962y.nrm, _ala9.tii, _a61p.prx, _add1_62v.del, _al8x.fnm, _
> 7zfh.fnm, _al7i_2g.del, _ailk.fnm, _al8x.tii, _al8x.tis, _ala8.fnm,
> _akzu.frq, _9wzn.frq, _7zfh.nrm, _akuu.fdt, _alad.tii, _akuu.fdx, _aku
> u_cd.del, _a61p_b77.del, _alad.tis, _al2o_67.del, _add1.fdx, _9wzn.prx,
> _al9w.fdt, _add1.fdt, _al9w.fdx, _akuu.prx, _962y.frq, _9wzn.fdt,
> _alab_4.del, _9wzn.fdx, segments_8us5, _alac_4.del, _alae.fnm, _ailk

Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

2009-10-28 Thread Jérôme Etévé
Mea maxima culpa,

I had foolishly set the option omitTermFreqAndPositions="true" in an
attempt to save space.
It works when this is set to 'false'.

However, even when it's set to 'true', the highlighting of a field
continues to work even if the search doesn't.
Does the highlighter use a different strategy to match the query terms
in the fields?

Cheers!

Jerome.

2009/10/27 Jérôme Etévé :
> Actually, here is the difference between the textgen analysis pipeline and ours:
>
> For the phrase "ingenieur d'affaire senior" ,
> Our pipeline gives right after our tokenizer:
>
> term position   1   2   3   4
> term text   ingenieur   d   affaire senior
>
> 'd' and 'affaire' are separated as different tokens straight away. Our
> filters have no later effect for this phrase.
>
> * The textgen pipeline uses a whitespace tokenizer, so it gives first:
> term position   1   2   3
> term text   ingenieur   d'affaire   senior
> term type   wordwordword
> source start,end0,9 10,19   20,26
>
> * Then a word delimiter filter splits the token "d'affaire" (and
> generates the concatenation):
> term position   1   2   3   4
> term text   ingenieur   d   affaire senior
> daffaire
> term type   wordwordwordword
> word
> source start,end0,9 10,11   12,19   20,26
> 10,19
>
>
> Could you see a reason why title:"d affaire" works with textgen but
> not with our type?


Re: facet.query and fq

2009-10-27 Thread Jérôme Etévé
Hi,

 you need to 'tag' your filter and then exclude it from the faceting.

 An example here:
http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
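
Applied to your query, it would look something like this (untested; the
tag name is arbitrary, and remember to URL-encode the curly braces):

select?fl=*&start=0&q=cool&fq={!tag=st}in_stock:true&facet=true&facet.query={!ex=st}in_stock:false&qt=dismax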

J.

2009/10/27 David Giffin :
> Hi There,
>
> Is there a way to get facet.query= to ignore the fq= param? We want to
> do a query like this:
>
> select?fl=*&start=0&q=cool&fq=in_stock:true&facet=true&facet.query=in_stock:false&qt=dismax
>
> To understand the count of items not in stock, when someone has
> filtered items that are in stock. Or is there a way to combine two
> queries into one?
>
> Thanks,
> David
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

2009-10-27 Thread Jérôme Etévé
Actually, here is the difference between the textgen analysis pipeline and ours:

For the phrase "ingenieur d'affaire senior" ,
Our pipeline gives right after our tokenizer:

term position   1   2   3   4
term text   ingenieur   d   affaire senior

'd' and 'affaire' are separated as different tokens straight away. Our
filters have no later effect for this phrase.

* The textgen pipeline uses a whitespace tokenizer, so it gives first:
term position   1   2   3
term text   ingenieur   d'affaire   senior
term type   wordwordword
source start,end0,9 10,19   20,26

* Then a word delimiter filter splits the token "d'affaire" (and
generates the concatenation):
term position   1   2   3   4
term text   ingenieur   d   affaire senior
daffaire
term type   wordwordwordword
word
source start,end0,9 10,11   12,19   20,26
10,19


Could you see a reason why title:"d affaire" works with textgen but
not with our type?

Thanks!

Jerome.


2009/10/27 Jérôme Etévé :
> Hum,
>  That's probably because of our own customized types/tokenizers/filters.
>
> I tried reindexing and querying our data using the default solr type
> 'textgen' and it works fine.
>
> I need to investigate which features of the new lucene 2.9 API is not
> implemented in our own tokenizers etc...
>
> Thanks.
>
> Jerome.
>
> 2009/10/27 Yonik Seeley :
>> On Tue, Oct 27, 2009 at 8:44 AM, Jérôme Etévé  wrote:
>>> I don't really get why these two tokens are subsequently put together
>>> in a phrase query.
>>
>> That's the way the Lucene query parser has always worked... phrase
>> queries are made if multiple tokens are produced from one field query.
>>
>>> In solr 1.3, it didn't seem to be a problem though. title:"d affaire"
>>> matches document where title contains "d'affaire" and all is fine.
>>
>> This should not have changed between 1.3 and 1.4...
>> What's the fieldType and it's definition for your title field?
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
>
>
> --
> Jerome Eteve.
> http://www.eteve.net
> jer...@eteve.net
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

2009-10-27 Thread Jérôme Etévé
Hum,
 That's probably because of our own customized types/tokenizers/filters.

I tried reindexing and querying our data using the default solr type
'textgen' and it works fine.

I need to investigate which features of the new lucene 2.9 API is not
implemented in our own tokenizers etc...

Thanks.

Jerome.

2009/10/27 Yonik Seeley :
> On Tue, Oct 27, 2009 at 8:44 AM, Jérôme Etévé  wrote:
>> I don't really get why these two tokens are subsequently put together
>> in a phrase query.
>
> That's the way the Lucene query parser has always worked... phrase
> queries are made if multiple tokens are produced from one field query.
>
>> In solr 1.3, it didn't seem to be a problem though. title:"d affaire"
>> matches document where title contains "d'affaire" and all is fine.
>
> This should not have changed between 1.3 and 1.4...
> What's the fieldType and it's definition for your title field?
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Multifield query parser and phrase query behaviour from 1.3 to 1.4

2009-10-27 Thread Jérôme Etévé
Hi All,
  I'm using a multifield query parser to generate weighted queries
across different fields.

For instance, perl developer gives me:
+(title:perl^10.0 keywords:perl company:perl^3.0)
+(title:developer^10.0 keywords:developer company:developer^3.0)

Either in solr 1.3 or solr 1.4 (from 12 oct 2009), a query like
"d'affaire" gives me:
title:"d affaire"^10.0 keywords:"d affaire" company:"d affaire"^3.0

nb: "d" is not a stopword

That's the first thing I don't get: since "d'affaire" is parsed as two
separate tokens 'd' and 'affaire', why do these phrase queries appear?

When I use the analysis interface of solr, "d'affaire" gives (for
query or indexing, since the analyzer is the same):
term position   1   2
term text   d   affaire
term type   wordword
source start,end0,1 2,9

You can't see it in this email, but 'd' and 'affaire' are both purple,
indicating a match with the query tokens.

I don't really get why these two tokens are subsequently put together
in a phrase query.

In solr 1.3, it didn't seem to be a problem though. title:"d affaire"
matches document where title contains "d'affaire" and all is fine.
That's the behaviour we should expect since the title field uses
exactly the same analyzer at index and query time.

Since I'm using solr 1.4, title:"d affaire" does not give any results back.

Is there any behaviour change that could be responsible for this, and
what's the correct way to fix this?

Thanks for your help.

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Where the new replication pulls the files?

2009-10-23 Thread Jérôme Etévé
Hi all,
  I'm wondering where a slave puts the files it pulls from the master
during replication.

Does it download directly into the index/ directory, or somewhere else
until the transfer completes, and then gets copied into index/?

Cheers!

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: QTime always a multiple of 50ms ?

2009-10-23 Thread Jérôme Etévé
2009/10/23 Andrzej Bialecki :
> Jérôme Etévé wrote:
>>
>> Hi all,
>>
>>  I'm using Solr trunk from 2009-10-12 and I noticed that the QTime
>> result is always a multiple of roughly 50ms, regardless of the used
>> handler.
>>
>> For instance, for the update handler, I get :
>>
>> INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=0
>> INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=104
>> INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=52
>> ...
>>
>> Is this a known issue ?
>
> It may be an issue with System.currentTimeMillis() resolution on some
> platforms (e.g. Windows)?

I don't know, I'm using linux 2.6.22 and a jvm 1.6.0


-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


QTime always a multiple of 50ms ?

2009-10-23 Thread Jérôme Etévé
Hi all,

 I'm using Solr trunk from 2009-10-12 and I noticed that the QTime
result is always a multiple of roughly 50ms, regardless of the used
handler.

For instance, for the update handler, I get :

INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=0
INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=104
INFO: [idx1] webapp=/solr path=/update/ params={} status=0 QTime=52
...

Is this a known issue ?

Cheers!

J.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Disable replication on master while slaves are pulling

2009-10-21 Thread Jérôme Etévé
Hi there,

  I'm planning to reindex all my data on my master server every day, so
here's what I intend to do on the master:

1 - disable replication on the master
2 - Empty the index
3 - Reindex everything
4 - Optimize
5 - enable replication again
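
(For steps 1 and 5 I'd use the replication handler's HTTP commands on the
master, something like .../replication?command=disablereplication and
.../replication?command=enablereplication.)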

There's something I'm wondering about this strategy.

What would happen if a slave is not finished pulling the data when I
start step 1?

Is there a better strategy to achieve daily complete reindexing?

Thanks!

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: Is Relational Mapping (foreign key) possible in solr ??

2009-10-19 Thread Jérôme Etévé
Hi,

 here's what you could do:

* Use multivalued fields instead of 'comma separated values', so you
won't need a separator.
* Store project identifiers in the user index.
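
(Schema-wise that's just a multiValued field, e.g. something like
   <field name="projectId" type="string" indexed="true" stored="true" multiValued="true"/>
with the field name up to you.)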

Denormalised project information in a user entry will inevitably require
re-indexing a lot of user entries when project info changes.

* You could have a mixed index with user and project entries in the
same index, so if you search for a name, you'd find users and projects
matching that name.

Jerome.

2009/10/19 ashokcz :
>
> Hi, I browsed through the Solr docs and user forums, and what I infer is that we
> can't use Solr to store relational mappings (foreign keys) in Solr.
>
> But I just want to know if there is any chance of doing the same.
>
> I have two tables: a User table (with 1,00,000 entries) and a Project table
> (with 200 entries).
> User table columns : userid , name ,country , location , etc.
> Project tables Columns : project name , description , business unit ,
> project type .
> Here User Location , Country , Project  Name , Project  business unit ,
> project type are faceted
> A user can be mapped to multiple projects.
> In Solr I store the details like this:
> [
> {
> userId:1234;
> userName:ABC;
> Country:US;
> Location:NY;
> Project Name:Project1,Project2;
> Project Description:Project1,Project2;
> Project  business unit:unit1,unit2;
> Project type:Type1,Type2
> }
> ]
>
> With this structure I could get faceted details about both user data and
> project data.
>
> But here I face 2 problems.
>
> 1. A project can be mapped to many users, say 10,000 users. So if I change a
> project name then I end up re-indexing 10,000 records, which is very
> time-consuming work.
>
> 2. For fields like Project Description I could not find any proper delimiter.
> For other fields a comma (,) is okay, but for a description I could not use
> any specific delimiter. This field is not faceted, but in the search results I
> still need to take it out and show the project details in tabular format, and
> I use the delimiter to split it. For other project fields like Project Name
> and Type I could do it, but not for this Project Description field.
>
> So what I'm asking is: is there any way of storing the data as relational
> records, where the user entry would have a field called project Id whose data
> would be 1,2 (referring to the project records' primary keys in Solr), while
> still preserving the faceted approach?
>
> As far as my knowledge goes, my guess is it can't be done.
> Am I correct?
> If so, then how can we achieve a solution to my problem?
> Please, if someone could share some ideas it would be useful.
> --
> View this message in context: 
> http://www.nabble.com/Is-Relational-Mapping-%28foreign-key%29-possible-in-solrtp25955068p25955068.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Fwd: Replication filelist command failure on container restart

2009-10-16 Thread Jérôme Etévé
-- Forwarded message --
From: Jérôme Etévé 
Date: 2009/10/16
Subject: Re: Replication filelist command failure on container restart
To: yo...@lucidimagination.com


Thanks Yonik,

It works now!

J.

2009/10/16 Yonik Seeley :
> I think you may need to tell the replication handler to enable
> replication after startup too?
>
>    <str name="replicateAfter">commit</str>
>    <str name="replicateAfter">startup</str>
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Fri, Oct 16, 2009 at 12:58 PM, Jérôme Etévé  wrote:
>> Hi All,
>>  I'm facing a small problem with the replication handler:
>>
>> After restarting my master container (tomcat),
>> /admin/replication/index.jsp shows me the right information,
>> basically the same indexversion as before the restart (no
>> commits/optimize have been done after restart):
>>
>> Local Index  Index Version: 1255709893043, Generation: 8
>>
>> However, if I query the handler with the filelist command and this
>> version number :
>> /replication?command=filelist&indexversion=1255709893043 , the handler
>> gives me an error:
>>
>> invalid indexversion
>>
>> So I think my slaves will get confused if this information doesn't
>> remain consistent after a master container restart.
>>
>> Is there a way to go around this problem, for instance by triggering a
>> commit on startup (or reload) ?
>>
>>
>> Thanks!
>>
>> Jerome.
>>
>> --
>> Jerome Eteve.
>> http://www.eteve.net
>> jer...@eteve.net
>>
>



--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Replication filelist command failure on container restart

2009-10-16 Thread Jérôme Etévé
Hi All,
 I'm facing a small problem with the replication handler:

After restarting my master container (tomcat),
/admin/replication/index.jsp shows me the right information,
basically the same indexversion as before the restart (no
commits/optimize have been done after restart):

Local Index  Index Version: 1255709893043, Generation: 8

However, if I query the handler with the filelist command and this
version number :
/replication?command=filelist&indexversion=1255709893043 , the handler
gives me an error:

invalid indexversion

So I think my slaves will get confused if this information doesn't
remain consistent after a master container restart.

Is there a way to go around this problem, for instance by triggering a
commit on startup (or reload) ?


Thanks!

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Solr 1.4 Release date/ lucene 2.9 API ?

2009-10-01 Thread Jérôme Etévé
Hi all,

Is there a planned release date for Solr 1.4? If I understood correctly,
it will use the Lucene 2.9 release from last Sept. 24th, with a stable API?

Thanks.

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: Where do I need to install Solr

2009-09-30 Thread Jérôme Etévé
Solr is a separate service, in the same way an RDBMS is a separate service.

Whether you install it on the same machine as your webserver or not,
it's logically separated from your server.

Jerome.

2009/9/30 Claudio Martella :
> Kevin Miller wrote:
>> Does Solr have to be installed on the web server, or can I install Solr
>> on a different server and access it from my web server?
>>
>> Kevin Miller
>> Web Services
>>
>>
> you can access it from your webserver (or browser) via HTTP/XML requests
> and responses.
> have a look at solr tutorial: http://lucene.apache.org/solr/tutorial.html
> and this one: http://www.xml.com/lpt/a/1668
>
> --
> Claudio Martella
> Digital Technologies
> Unit Research & Development - Engineer
>
> TIS innovation park
> Via Siemens 19 | Siemensstr. 19
> 39100 Bolzano | 39100 Bozen
> Tel. +39 0471 068 123
> Fax  +39 0471 068 129
> claudio.marte...@tis.bz.it http://www.tis.bz.it
>
>
>
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: delay while adding document to solr index

2009-09-30 Thread Jérôme Etévé
Hi,

- Try to let Solr do the commits for you (set up the autocommit feature
and stop committing after each inserted document). This should greatly
improve the delays you're experiencing.

- If you never optimize, it's normal that your index size only grows.
Optimize regularly, at a time when your load is minimal.

Jerome.

2009/9/30 swapna_here :
>
> thanks again for your immediate response
>
> yes, I am running the commit after a document is indexed
>
> here I don't understand why my index size has increased to 625 MB (for the
> 10 documents) when it was previously 250 MB
> is this because I have not optimized my index at all, or because I am adding
> documents individually?
>
> I need a solution for this urgently
> thanks a lot
> --
> View this message in context: 
> http://www.nabble.com/delay-while-adding-document-to-solr-index-tp25676777p25679463.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


init parameters for queryParser

2009-09-30 Thread Jérôme Etévé
Hi all,

  I've got my own query parser plugin defined thanks to the queryParser tag:



The QParserPlugin class has got an init method like this:
public void init(NamedList args);

Where and how do I put my args to be passed to init for my query parser plugin?

I'm trying to nest the init arguments inside the queryParser element
(the XML was stripped by the mail archive; it contained two entries, each
with the value "value1").

But I'm not sure if it's the right way.
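
Spelled out, the registration I'm attempting looks roughly like this
(class and parameter names are made up for the example):

  <queryParser name="myparser" class="com.mycompany.MyQParserPlugin">
    <str name="param1">value1</str>
    <str name="param2">value1</str>
  </queryParser>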

Could we also update the wiki about this?
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


What options would you recommend for the Sun JVM?

2009-09-25 Thread Jérôme Etévé
Hi solr addicts,

I know there's no one-size-fits-all set of options for the Sun JVM, but
I think it'd be useful to everyone to share your tips on using the Sun
JVM with Solr.

For instance, I recently figured out that setting the tenured generation
garbage collector to concurrent mark and sweep (-XX:+UseConcMarkSweepGC)
has dramatically decreased the amount of time Java hangs on tenured-gen
garbage collection. With my settings, the old-gen garbage collection went
from big chunks of 1-2 seconds to multiple small slices of ~0.2 s.
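
In practice that just means adding the flag to the container's JVM
options, for example JAVA_OPTS="-Xmx2g -XX:+UseConcMarkSweepGC"; the heap
size here is only illustrative.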

As a result, the commits (hence the searcher drop/rebuild) are much
less painful from the application performance point of view.

What are the other options you would recommend?

Cheers!

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: do NOT want to stem plurals for a particular field, or words

2009-09-15 Thread Jérôme Etévé
Hi,

  You can enable/disable stemming per field type in the schema.xml, by
removing the stemming filters from the type definition.

Basically, copy your preferred type, rename it to something like
'text_nostem', remove the stemming filter from it, and use this
'text_nostem' type for your field 'type' (see the sketch below).
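
A minimal sketch of such a type (adapt the tokenizer and filters to
whatever your current text type uses):

  <fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>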

From what you say, though, I guess your field 'type' would be even
happier simply being of type 'string'.

Jerome.

2009/9/15 DHast :
>
> I have a field whose items are plurals, used as very specific locators. When I
> do a Solr search type:articles, it translates it into type:article, then into
> type:articl... Is there a way to stop it from doing this, either on the field
> "type" or for a list of words ("articles, notes, etc")?
>
> I tried entering them into the protwords.txt file and don't seem to get anywhere.
> --
> View this message in context: 
> http://www.nabble.com/do-NOT-want-to-stem-plurals-for-a-particular-field%2C-or-words-tp25455570p25455570.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Best strategy to commit often under load.

2009-09-15 Thread Jérôme Etévé
Hi all,

 I've got a solr server under significant load ( ~40/s ) and a single
process which can potentially commit as often as possible.
Typically, when it commits every 5 or 10s, my solr server slows down
quite a lot and this can lead to congestion problems on my client
side.

What would you recommend in this situation? Is it better to let Solr
perform the commits automatically, with reasonable autocommit
parameters?

What are solr's best practices concerning this point?

Thanks for your help!

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


Re: Implementing customized Scorer with solr API 1.4

2009-08-21 Thread Jérôme Etévé
Hi ,
 Thanks for your help.

So do I have to do:

public Scorer scorer(IndexReader reader) throws IOException {
 SolrIndexReader solrReader = (SolrIndexReader) reader;
 int offset = solrReader.getBase() ;

Or is it a bit more complex than that?


Jerome.

2009/8/20 Mark Miller :
> Jérôme Etévé wrote:
>> Hi all,
>>
>>  I'm kind of struggling with a customized lucene.Scorer of mine, since
>> I use solr 1.4.
>>
>>  Here's the problem:
>>
>>  I wrote a DocSetQuery which inherit from a lucene.Query. This query
>> is a decorator for a lucene.Query that filters out the documents which
>> are not in a given set of  predefined documents (a solr.DocSet which I
>> call docset ).
>>
>> So In my Weight / Scorer, I implemented the method  nextDoc like that:
>>
>> public int nextDoc() throws IOException {
>> do {
>>  if (decoScorer.nextDoc() == NO_MORE_DOCS) {
>>   return NO_MORE_DOCS;
>>  }
>> // DO THIS UNTIL the doc is in the docset
>>  } while (!docset.exists(decoScorer.docID()));
>>  return decoScorer.docID();
>> }
>>
>> The decoScorer here is the decorated scorer.
>>
>> My problem here is that in docset, there are 'absolute' documents IDs,
>> but now solr uses a number of sub readers each with a kind of offset,
>> so decoScorer.docID() gives 'relative' document ID . Because of this,
>> I happen to test relative document IDs against a set of absolute
>> docIDs.
>>
>> So my DocSetQuery does not work anymore. The solution would be I think
>> to have a way of getting the offset of the SolrReader being used in
>> the context to be able to do docset.exists(decoScorer.docID() +
>> offset) .
>>
>> But how can I get this offset?
>> The scorer is built with a lucene.IndexReader in parameter:
>> public Scorer scorer(IndexReader reader) .
>>
>> Within solr, this IndexReader happens to be an instance of
>> SolrIndexReader so I though maybe I could downcast reader to a
>> SolrIndexReader to be able to call the offset related methods on it
>> (getBase() etc...).
>>
> It may not feel super clean, but it should be fine - Solr always uses a
> SolrIndexSearcher which always wraps all of the IndexReaders in
> SolrIndexReader. I'm fairly sure anyway ;)
>
> By getting the base of the subreader wihtin the top reader, you can add
> it to the doc id to get the top reader doc id.
>> I feel quite unconfortable with this solution since my DocSetQuery
>> inherits from a lucene thing, so it would be quite odd to downcast
>> something to a solr class inside it, plus I didn't really figured out
>> how to use those offset related methods.
>>
>> Thanks for your help!
>>
>> All the best!
>>
>> Jerome Eteve.
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Implementing customized Scorer with solr API 1.4

2009-08-20 Thread Jérôme Etévé
Hi all,

 I'm kind of struggling with a customized lucene.Scorer of mine, since
I use solr 1.4.

 Here's the problem:

 I wrote a DocSetQuery which inherit from a lucene.Query. This query
is a decorator for a lucene.Query that filters out the documents which
are not in a given set of  predefined documents (a solr.DocSet which I
call docset ).

So In my Weight / Scorer, I implemented the method  nextDoc like that:

public int nextDoc() throws IOException {
do {
 if (decoScorer.nextDoc() == NO_MORE_DOCS) {
  return NO_MORE_DOCS;
 }
// DO THIS UNTIL the doc is in the docset
 } while (!docset.exists(decoScorer.docID()));
 return decoScorer.docID();
}

The decoScorer here is the decorated scorer.

My problem here is that docset contains 'absolute' document IDs, but
Solr now uses a number of sub-readers, each with a kind of offset, so
decoScorer.docID() gives a 'relative' document ID. Because of this, I
end up testing relative document IDs against a set of absolute docIDs.

So my DocSetQuery does not work anymore. The solution, I think, would be
to have a way of getting the offset of the SolrIndexReader being used in
the context, to be able to do docset.exists(decoScorer.docID() +
offset).

But how can I get this offset?
The scorer is built with a lucene.IndexReader in parameter:
public Scorer scorer(IndexReader reader) .

Within solr, this IndexReader happens to be an instance of
SolrIndexReader, so I thought maybe I could downcast reader to a
SolrIndexReader to be able to call the offset related methods on it
(getBase() etc...).

I feel quite uncomfortable with this solution, since my DocSetQuery
inherits from a Lucene class, so it would be quite odd to downcast
something to a Solr class inside it; plus I haven't really figured out
how to use those offset-related methods.

Thanks for your help!

All the best!

Jerome Eteve.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Writing and using your own Query class in solr 1.4 (trunk)

2009-08-18 Thread Jérôme Etévé
That's right. I just had another decorator which was not adapted for
the new API. My fault ..

Thanks,

Jerome.

2009/8/18 Mark Miller :
> I'm pretty sure one of them is called. In the version you have:
>
>  public void search(Query query, HitCollector results)
>   throws IOException {
>   search(createQueryWeight(query), null, new HitCollectorWrapper(results));
>  }
>
>  protected QueryWeight createQueryWeight(Query query) throws IOException {
>   return query.queryWeight(this);
>  }
>
>
> Query.queryWeight will in turn call Query.createQueryWight (either for your
> Query, or for the primitive Query
> it rewrites itself too).
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
> Jérôme Etévé wrote:
>>
>> Hi Mark,
>>
>>
>> Thanks for clarifying this. So should I keep both sets of method
>> implemented? I guess it won't hurt when solr trunk will use the
>> updated version of lucene without those methods.
>>
>> What I don't get is that neither my createWeight or createQueryWeight
>> methods seem to be called when I call
>> rb.req.getSearcher().search(limitedQuery, myCollector);
>>
>> I'll look at the code to find out.
>>
>> Thanks!
>>
>> Jerome
>>
>> 2009/8/18 Mark Miller :
>>
>>>
>>> You have run into some stuff that has been somewhat rolled back in
>>> Lucene.
>>>
>>> QueryWieght, and the methods it brought have been reverted.
>>>
>>> Shortly (when Solr trunk updates Lucene), Solr will go back to just
>>> createWeight and weight.
>>>
>>> The main change that will be left is that Weight will be an abstract
>>> class
>>> rather than an interface.
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>> Jérôme Etévé wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>> I have a custom search component which uses a query I wrote.
>>>> Basically, this Query (called DocSetQuery) is a Query decorator that
>>>> skips any document which is not in a given document set. My code used
>>>> to work perfectly in solr 1.3 but in solr 1.4, it seems that my
>>>> DocSetQuery has lost all its power.
>>>>
>>>> I noticed that to be compliant with solr 1.4 trunk and the lucene it
>>>> contains, I should implement two new methods:
>>>>
>>>> createQueryWeight
>>>> and
>>>> queryWeight
>>>>
>>>> So I did. It was very easy, because basically it's only about re-using
>>>> the deprecated Weight createWeight and wrapping the result with a
>>>> QueryWeightWrapper.
>>>>
>>>> So now I believe my DocSetQuery complies with the new
>>>> solr1.4/lucene2.9-dev api. And I've got those methods:
>>>>
>>>> public QueryWeight queryWeight(Searcher searcher) throws IOException {
>>>> return createQueryWeight(searcher);
>>>> }
>>>> public QueryWeight createQueryWeight(Searcher searcher) throws
>>>> IOException
>>>> {
>>>> log.info("[sponsoring] creating QueryWeight calling createQueryWeight
>>>> ");
>>>> return new QueryWeightWrapper(createWeight(searcher));
>>>> }
>>>> public Weight weight(Searcher searcher) throws IOException {
>>>> return createWeight(searcher);
>>>> }
>>>>
>>>> //and of course
>>>>
>>>> protected Weight createWeight(final Searcher searcher) throws
>>>> IOException
>>>> {
>>>> log.info("[sponsoring] creating weight with DoCset " + docset.size());
>>>> ...
>>>> }
>>>>
>>>> I'm then using my DocSetQuery in my custom SearchComponent like that:
>>>>
>>>> Query limitedQuery = new DocSetQuery(decoratedQuery , ... );
>>>>
>>>> Then I simply perform a search by doing
>>>>
>>>> rb.req.getSearcher().search(limitedQuery, myCollector);
>>>>
>>>> My problem is neither of createQueryWeight or createWeight is called
>>>> by the solr Searcher, and I'm wondering what I did wrong.
>>>> Should I build the Weight myself and call the search method which
>>>> accepts a Weight object?
>>>>
>>>> This is quite confusing because:
>>>> - it used to work perfectly in solr 1.3
>>>> - in the nightly build version of lucene API, those new methods
>>>> createQueryWeight and queryWeight have disappeared but with the lucene
>>>> solr1.4trunk uses, they exists plus the old ones ( createWeight and
>>>> weight) are deprecated.
>>>>
>>>>
>>>> Thanks for your help.
>>>>
>>>> Jerome Eteve.
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Writing and using your own Query class in solr 1.4 (trunk)

2009-08-18 Thread Jérôme Etévé
Hi Mark,


Thanks for clarifying this. So should I keep both sets of method
implemented? I guess it won't hurt when solr trunk will use the
updated version of lucene without those methods.

What I don't get is that neither my createWeight or createQueryWeight
methods seem to be called when I call
rb.req.getSearcher().search(limitedQuery, myCollector);

I'll look at the code to find out.

Thanks!

Jerome

2009/8/18 Mark Miller :
> You have run into some stuff that has been somewhat rolled back in Lucene.
>
> QueryWieght, and the methods it brought have been reverted.
>
> Shortly (when Solr trunk updates Lucene), Solr will go back to just
> createWeight and weight.
>
> The main change that will be left is that Weight will be an abstract class
> rather than an interface.
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> Jérôme Etévé wrote:
>>
>> Hi all,
>>
>> I have a custom search component which uses a query I wrote.
>> Basically, this Query (called DocSetQuery) is a Query decorator that
>> skips any document which is not in a given document set. My code used
>> to work perfectly in solr 1.3 but in solr 1.4, it seems that my
>> DocSetQuery has lost all its power.
>>
>> I noticed that to be compliant with solr 1.4 trunk and the lucene it
>> contains, I should implement two new methods:
>>
>> createQueryWeight
>> and
>> queryWeight
>>
>> So I did. It was very easy, because basically it's only about re-using
>> the deprecated Weight createWeight and wrapping the result with a
>> QueryWeightWrapper.
>>
>> So now I believe my DocSetQuery complies with the new
>> solr1.4/lucene2.9-dev api. And I've got those methods:
>>
>> public QueryWeight queryWeight(Searcher searcher) throws IOException {
>> return createQueryWeight(searcher);
>> }
>> public QueryWeight createQueryWeight(Searcher searcher) throws IOException
>> {
>> log.info("[sponsoring] creating QueryWeight calling createQueryWeight ");
>> return new QueryWeightWrapper(createWeight(searcher));
>> }
>> public Weight weight(Searcher searcher) throws IOException {
>> return createWeight(searcher);
>> }
>>
>> //and of course
>>
>> protected Weight createWeight(final Searcher searcher) throws IOException
>> {
>> log.info("[sponsoring] creating weight with DoCset " + docset.size());
>> ...
>> }
>>
>> I'm then using my DocSetQuery in my custom SearchComponent like that:
>>
>> Query limitedQuery = new DocSetQuery(decoratedQuery , ... );
>>
>> Then I simply perform a search by doing
>>
>> rb.req.getSearcher().search(limitedQuery, myCollector);
>>
>> My problem is neither of createQueryWeight or createWeight is called
>> by the solr Searcher, and I'm wondering what I did wrong.
>> Should I build the Weight myself and call the search method which
>> accepts a Weight object?
>>
>> This is quite confusing because:
>> - it used to work perfectly in solr 1.3
>> - in the nightly build version of lucene API, those new methods
>> createQueryWeight and queryWeight have disappeared but with the lucene
>> solr1.4trunk uses, they exists plus the old ones ( createWeight and
>> weight) are deprecated.
>>
>>
>> Thanks for your help.
>>
>> Jerome Eteve.
>>
>
>
>
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Writing and using your own Query class in solr 1.4 (trunk)

2009-08-18 Thread Jérôme Etévé
Hi all,

 I have a custom search component which uses a query I wrote.
Basically, this Query (called DocSetQuery) is a Query decorator that
skips any document which is not in a given document set. My code used
to work perfectly in solr 1.3 but in solr 1.4, it seems that my
DocSetQuery has lost all its power.

I noticed that to be compliant with solr 1.4 trunk and the lucene it
contains, I should implement two new methods:

createQueryWeight
and
queryWeight

So I did. It was very easy, because basically it's only about re-using
the deprecated Weight createWeight and wrapping the result with a
QueryWeightWrapper.

So now I believe my DocSetQuery complies with the new
solr1.4/lucene2.9-dev api. And I've got those methods:

public QueryWeight queryWeight(Searcher searcher) throws IOException {
  return createQueryWeight(searcher);
}

public QueryWeight createQueryWeight(Searcher searcher) throws IOException {
  log.info("[sponsoring] creating QueryWeight calling createQueryWeight ");
  return new QueryWeightWrapper(createWeight(searcher));
}

public Weight weight(Searcher searcher) throws IOException {
  return createWeight(searcher);
}

// and of course

protected Weight createWeight(final Searcher searcher) throws IOException {
  log.info("[sponsoring] creating weight with DoCset " + docset.size());
  ...
}

I'm then using my DocSetQuery in my custom SearchComponent like that:

Query limitedQuery = new DocSetQuery(decoratedQuery , ... );

Then I simply perform a search by doing

rb.req.getSearcher().search(limitedQuery, myCollector);

My problem is that neither createQueryWeight nor createWeight is called
by the Solr searcher, and I'm wondering what I did wrong.
Should I build the Weight myself and call the search method which
accepts a Weight object?

This is quite confusing because:
- it used to work perfectly in solr 1.3
- in the nightly build version of the Lucene API, those new methods
createQueryWeight and queryWeight have disappeared, but in the Lucene
version that solr 1.4 trunk uses they exist, and the old ones
(createWeight and weight) are deprecated.


Thanks for your help.

Jerome Eteve.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: facet performance tips

2009-08-13 Thread Jérôme Etévé
Thanks everyone for your advices.

I increased my filterCache, and the faceting performance improved greatly.

My faceted field can have at the moment ~4 different terms, so I
did set a filterCache size of 5 and it works very well.

However, I'm planning to increase the number of terms to maybe around
500 000, so I guess this approach won't work anymore, as I doubt a
500 000-entry filterCache would work.

So I guess my best move would be to upgrade to the soon-to-be-released
1.4 version of Solr to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea about
when 1.4 will be an official release?
As well, is the current trunk OK for production? Is it compatible with
1.3 configuration files?

Thanks !

Jerome.

2009/8/13 Stephen Duncan Jr :
> Note that depending on the profile of your field (full text and how many
> unique terms on average per document), the improvements from 1.4 may not
> apply, as you may exceed the limits of the new faceting technique in Solr
> 1.4.
> -Stephen
>
> On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher  wrote:
>
>> Yes, increasing the filterCache size will help with Solr 1.3 performance.
>>
>> Do note that trunk (soon Solr 1.4) has dramatically improved faceting
>> performance.
>>
>>Erik
>>
>>
>> On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
>>
>>  Hi everyone,
>>>
>>>  I'm using some faceting on a solr index containing ~ 160K documents.
>>> I perform facets on multivalued string fields. The number of possible
>>> different values is quite large.
>>>
>>> Enabling facets degrades the performance by a factor 3.
>>>
>>> Because I'm using solr 1.3, I guess the facetting makes use of the
>>> filter cache to work. My filterCache is set
>>> to a size of 2048. I also noticed in my solr stats a very small ratio
>>> of cache hit (~ 0.01%).
>>>
>>> Can it be the reason why the faceting is slow? Does it make sense to
>>> increase the filterCache size so it matches more or less the number
>>> of different possible values for the faceted fields? Would that not
>>> make the memory usage explode?
>>>
>>> Thanks for your help !
>>>
>>> --
>>> Jerome Eteve.
>>>
>>> Chat with me live at http://www.eteve.net
>>>
>>> jer...@eteve.net
>>>
>>
>>
>
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Solr 1.3 and JDK1.6

2009-08-12 Thread Jérôme Etévé
Hi,
  I'm running Solr 1.3 on Java 1.6 (java -version reports "1.6...").
No problems to report.

Cheers.

J

2009/8/12 vaibhav joshi :
>
> Hi
>
> I am using Solr 1.3 (the official released version) and JDK 1.5. My company is 
> moving towards upgrading all systems to JDK 1.6. Is it safe to upgrade to 
> JDK 1.6 with the Solr 1.3 wars? Are there any compatibility issues with JDK 1.6?
>
> Thanks
> Vaibhav
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


facet performance tips

2009-08-12 Thread Jérôme Etévé
Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
I perform facets on multivalued string fields. The number of possible
different values is quite large.

Enabling facets degrades the performance by a factor 3.

Because I'm using Solr 1.3, I guess the faceting makes use of the
filter cache. My filterCache is set to a size of 2048. I also noticed in
my Solr stats a very small cache hit ratio (~0.01%).

Can it be the reason why the faceting is slow? Does it make sense to
increase the filterCache size so it matches more or less the number
of different possible values for the faceted fields? Would that not
make the memory usage explode?

Thanks for your help !

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Synonym aware string field type

2009-08-04 Thread Jérôme Etévé
2009/8/4 Otis Gospodnetic :
> Yes, you need to specify one or the other then, index-time or query-time, 
> depending on where you want your synonyms to kick in.

Ok great. Thx !

> Eh, hitting reply to this email used your personal email instead of 
> solr-user@lucene.apache.org .  Eh eh. Making it hard for people replying to 
> keep the discussion on the list without doing extra work


It did the same for me with your message. I had to click 'reply all' .

Maybe it's a gmail problem.

J.

>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>> From: Jérôme Etévé 
>> To: Otis Gospodnetic 
>> Cc: solr-user@lucene.apache.org
>> Sent: Tuesday, August 4, 2009 12:39:33 PM
>> Subject: Re: Synonym aware string field type
>>
>> Hi Otis,
>>
>> Thanks. Yep, this synonym behaviour is the one I want.
>>
>> So if I don't want the synonyms to be applied at index time, I need
>> to specify an index time analyzer right ?
>>
>> Jerome.
>>
>>
>> 2009/8/4 Otis Gospodnetic :
>> > Hi,
>> >
>> > KeywordTokenizer will not tokenize your string.  I have a feeling that 
>> > won't
>> work with synonyms, unless your field value entirely match a synonym.  Maybe 
>> an
>> example would help:
>> >
>> > If you have:
>> >  foo canine bar
>> > Then KeywordTokenizer won't break this into 3 tokens.
>> > And then canine/dog synonym won't work.
>> >
>> >  Yes, if you define the analyzer like that, it will be used both at index 
>> > and
>> query time.
>> >
>> > Otis
>> > --
>> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>> >
>> >
>> >
>> > - Original Message 
>> >> From: Jérôme Etévé
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Tuesday, August 4, 2009 7:33:28 AM
>> >> Subject: Synonym aware string field type
>> >>
>> >> Hi all,
>> >>
>> >> I'd like to have a string type which is synonym aware at query time.
>> >> Is it ok to have something like that:
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> tokenizerFactory="solr.KeywordTokenizerFactory"
>> >> synonyms="my_synonyms.txt" ignoreCase="true"/>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> My questions are:
>> >>
>> >> - Will the index time analyzer stay the default for the type 
>> >> solr.StrField .
>> >> - Is the KeywordTokenizerFactory the right one to use for the query
>> >> time analyzer ?
>> >>
>> >> Cheers!
>> >>
>> >> Jerome.
>> >>
>> >> --
>> >> Jerome Eteve.
>> >>
>> >> Chat with me live at http://www.eteve.net
>> >>
>> >> jer...@eteve.net
>> >
>> >
>>
>>
>>
>> --
>> Jerome Eteve.
>>
>> Chat with me live at http://www.eteve.net
>>
>> jer...@eteve.net
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Synonym aware string field type

2009-08-04 Thread Jérôme Etévé
Hi Otis,

 Thanks. Yep, this synonym behaviour is the one I want.

 So if I don't want the synonyms to be applied at index time, I need
to specify an index time analyzer right ?

Jerome.


2009/8/4 Otis Gospodnetic :
> Hi,
>
> KeywordTokenizer will not tokenize your string.  I have a feeling that won't 
> work with synonyms, unless your field value entirely matches a synonym.  Maybe 
> an example would help:
>
> If you have:
>  foo canine bar
> Then KeywordTokenizer won't break this into 3 tokens.
> And then canine/dog synonym won't work.
>
>  Yes, if you define the analyzer like that, it will be used both at index and 
> query time.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>> From: Jérôme Etévé 
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, August 4, 2009 7:33:28 AM
>> Subject: Synonym aware string field type
>>
>> Hi all,
>>
>> I'd like to have a string type which is synonym aware at query time.
>> Is it ok to have something like that:
>>
>>
>>
>>
>>
>> tokenizerFactory="solr.KeywordTokenizerFactory"
>> synonyms="my_synonyms.txt" ignoreCase="true"/>
>>
>>
>>
>>
>>
>> My questions are:
>>
>> - Will the index time analyzer stay the default for the type solr.StrField .
>> - Is the KeywordTokenizerFactory the right one to use for the query
>> time analyzer ?
>>
>> Cheers!
>>
>> Jerome.
>>
>> --
>> Jerome Eteve.
>>
>> Chat with me live at http://www.eteve.net
>>
>> jer...@eteve.net
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Synonym aware string field type

2009-08-04 Thread Jérôme Etévé
Hi all,

I'd like to have a string type which is synonym aware at query time.
Is it ok to have something like that:
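(The XML below was eaten by the list archive. Reconstructed from the
fragments that survived in the replies, it was roughly the following; the
field type name is a guess:)

    <fieldType name="string_syn" class="solr.StrField">
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                tokenizerFactory="solr.KeywordTokenizerFactory"
                synonyms="my_synonyms.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>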


  
   
   
   




My questions are:

- Will the index time analyzer stay the default for the type solr.StrField .
- Is the KeywordTokenizerFactory the right one to use for the query
time analyzer ?

Cheers!

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Faceting in more like this

2009-07-31 Thread Jérôme Etévé
Hi all,

  Is there a way to enable faceting when using a more like this handler?
  I'd like to have facets from my similar documents.

  Cheers !

  J.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Reasonable number of maxWarming searchers

2009-07-30 Thread Jérôme Etévé
Hi All,

 I'm planning to have a certain number of processes posting
independently into a solr instance.
 This instance will solely act as a master instance. No client queries on it.

 Is there a problem if I set maxWarmingSearchers to something like 30 or 40?
 Also, how do I disable the cache warming? Is setting the autowarmCount
values to 0 enough?
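(For reference, these are the solrconfig.xml settings I mean; the values here
are just examples:)

    <maxWarmingSearchers>40</maxWarmingSearchers>

    <filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>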


 Regards,

 Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Mailing list: Change the reply too ?

2009-07-30 Thread Jérôme Etévé
2009/7/30 Erik Hatcher :
>
> On Jul 30, 2009, at 1:44 PM, Jérôme Etévé wrote:
>
>> Hi all,
>>
>> I don't know if it does the same from everyone, but when I use the
>> reply function of my mail agent, it sets the recipient to the user who
>> sent the message, and not the mailing list.
>>
>> So it's quite annoying cause I have to change the recipient each time
>> I reply to someone on the list.
>>
>> Can the list admins fix this issue ?
>
> All my replies go to the list.
>
> From your message, the header says:
>
>  Reply-To: solr-user@lucene.apache.org
>
>Erik

It works with your messages. It might depends on mail agents.

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Mailing list: Change the reply too ?

2009-07-30 Thread Jérôme Etévé
Hi all,

 I don't know if it does the same from everyone, but when I use the
reply function of my mail agent, it sets the recipient to the user who
sent the message, and not the mailing list.

So it's quite annoying cause I have to change the recipient each time
I reply to someone on the list.

Can the list admins fix this issue ?

Cheers !

J.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Posting data in JSON

2009-07-30 Thread Jérôme Etévé
Hi,

  Nope, I'm not using solrj (my client code is in Perl), and I'm with solr 1.3.

J.

2009/7/30 Shalin Shekhar Mangar :
> On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé 
> wrote:
>>
>> Hi All,
>>
>>  I'm wondering if it's possible to post documents to solr in JSON format.
>>
>> JSON is much faster than XML to get the queries results, so I think
>> it'd be great to be able to post data in JSON to speed up the indexing
>> and lower the network load.
>
> If you are using Java,Solrj on 1.4 (trunk), you can use the binary format
> which is extremely compact and efficient. Note that with Solr/Solrj 1.3,
> binary became the default response format for Solrj clients.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Posting data in JSON

2009-07-30 Thread Jérôme Etévé
Hi All,

 I'm wondering if it's possible to post documents to solr in JSON format.

JSON is much faster than XML for getting query results, so I think
it'd be great to be able to post data in JSON to speed up indexing
and lower the network load.

All the best !

Jerome Eteve.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Synchronisation problem with replication

2009-05-15 Thread Jérôme Etévé
Hi All,

   I've got here a small problem about replication.

  Let's say I post a document on the master server, and the slaves do
a snappuller/snapinstaller via crontab every minute.

  Then, for about 30 seconds on average, my search servers are out of
sync with each other.

  Is there a way to improve this situation ?

  Cheers !!!

  J.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Disable unique-key for Solr index

2009-05-11 Thread Jérôme Etévé
Hi !

  Is there any " primary table " in your view with a unique single key
you could use ?

J.

2009/5/11 jcott28 :
>
> I have a case where I would like a solr index created which disables the
> unique-key option.
>
> I've tried commenting out the   option and that just spits out an
> error:
>
> SEVERE: org.apache.solr.common.SolrException: QueryElevationComponent
> requires the schema to have a uniqueKeyField
>
>
> I've tried something like this : 
>
> Nothing seems to do the trick.
>
> The problem with a unique key is that the uniqueness for my results are
> actually based on all the fields in my document.  There isn't one specific
> field which is unique.  All the fields combined are unique though (they are
> taken directly from a View inside an RDBMS whose primary key is all of the
> columns).
>
> Any help would be greatly appreciated!
>
> Thanks,
>  Jeff
>
> --
> View this message in context: 
> http://www.nabble.com/Disable-unique-key-for-Solr-index-tp23487249p23487249.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Concurrent run of snapshot scripts.

2009-05-11 Thread Jérôme Etévé
Hi Everyone,

  I'm running solr 1.3 and I was wondering if there's a problem with
running the snapshot scripts concurrently.

  For instance, I have a cron job which performs a
snappuller/snapinstaller every minute on my slave servers. Sometimes
(for instance after an optimize), the snappuller can take more than
one minute.

Is that a problem if another snappuller is spawned while one started
more than a minute earlier is still running ?

Cheers !!

Jerome Eteve.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Very long commit time.

2009-03-04 Thread Jérôme Etévé
On Wed, Mar 4, 2009 at 1:21 PM, Yonik Seeley  wrote:
> On Wed, Mar 4, 2009 at 5:25 AM, Jérôme Etévé  wrote:
>> Great,
>>
>>  It went down to less than 10 secs now :)
>> What I don't really understand is that my autowarmCount were pretty
>> low ( like 128 ) and still the autowarming of the caches were very
>> slow.
>>
>> Can you explain more why it can be that slow ?
>
> One possibility is a lack of physical memory available to the OS for
> caching reads on both the old index and the new index.  This would
> cause all of the queries to be slower if they ended up doing real disk
> IO for each query/filter being warmed.

Strange, we've got plenty of memory on this box and the swap is zero.
But well, I'm happy we worked around the problem. What's your experience
with commits on ~10M docs (and autowarmCount around 128 for the caches) ?

Cheers.

Jerome.



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Very long commit time.

2009-03-04 Thread Jérôme Etévé
Great,

  It went down to less than 10 secs now :)
What I don't really understand is that my autowarmCount values were pretty
low (like 128) and still the autowarming of the caches was very
slow.

Can you explain more why it can be that slow ?

Cheers !

Jerome.

On Tue, Mar 3, 2009 at 8:00 PM, Yonik Seeley  wrote:
> Looks like cache autowarming.
> If you have statically defined warming queries in solrconfig.xml, you
> could try setting autowarmCount=0 for all the caches.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Tue, Mar 3, 2009 at 2:37 PM, Jérôme Etévé  wrote:
>> Dear solr fans,
>>
>>  I have a solr index of roughly 8M docs and I have here a little
>> problem when I commit some insertion into it.
>>
>>  The insert itself is very fast, but my commit takes 163 seconds.
>>
>>  Here's the solr trace the commit leaves:
>>
>>  INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>>  03-Mar-2009 20:20:35 org.apache.solr.search.SolrIndexSearcher 
>> INFO: Opening searc...@7de212f9 main
>> 03-Mar-2009 20:20:35 org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: end_commit_flush
>> 03-Mar-2009 20:20:35 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main
>>
>> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=71641,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0}
>> 03-Mar-2009 20:21:52 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for searc...@7de212f9 main
>>
>> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=76905,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0}
>> 03-Mar-2009 20:21:52 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main
>>
>> queryResultCache{lookups=24,hits=24,hitratio=1.00,inserts=32,evictions=0,size=32,warmupTime=82406,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522}
>> 03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for searc...@7de212f9 main
>>
>> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=32,evictions=0,size=32,warmupTime=85591,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522}
>> 03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main
>>
>> documentCache{lookups=720,hits=710,hitratio=0.98,inserts=40,evictions=0,size=40,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105}
>> 03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for searc...@7de212f9 main
>>
>> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105}
>> 03-Mar-2009 20:23:17 org.apache.solr.core.QuerySenderListener newSearcher
>> INFO: QuerySenderListener sending requests to searc...@7de212f9 main
>>
>> // Then the few warm up queries defined in solrconfig.xml
>>
>> INFO: Closing searc...@732d8b11 main
>>
>> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=71641,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0}
>>
>> queryResultCache{lookups=24,hits=24,hitratio=1.00,inserts=32,evictions=0,size=32,warmupTime=82406,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522}
>>
>> documentCache{lookups=720,hits=710,hitratio=0.98,inserts=40,evictions=0,size=40,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105}
>> 03-Mar-2009 20:23:18 org.apache.solr.update.processor.LogUpdateProcessor 
>> finish
>> INFO: {commit=} 0 163189
>> 03-Mar-2009 20:23:18 org.apache.solr.core.SolrCore execute
>> INFO: [jobs] webapp=/cjsolr path=/update/ params={} status=0 QTime=163189
>>
>>
>> I'm sure I'm doing something wrong. Does this 163 seconds commit time
>> have to do with the commit parameters :
>> (optimize=false,waitFlush=false,waitSearcher=true)  ??
>>
>> Thanks for any help.
>>
>> Cheers !!
>>
>> Jerome.
>>
>> --
>> Jerome Eteve.
>>
>> Chat with me live at http://www.eteve.net
>>
>> jer...@eteve.net
>>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Very long commit time.

2009-03-03 Thread Jérôme Etévé
Dear solr fans,

  I have a solr index of roughly 8M docs and I have here a little
problem when I commit some insertion into it.

  The insert itself is very fast, but my commit takes 163 seconds.

  Here's the solr trace the commit leaves:

  INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
  03-Mar-2009 20:20:35 org.apache.solr.search.SolrIndexSearcher 
INFO: Opening searc...@7de212f9 main
03-Mar-2009 20:20:35 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
03-Mar-2009 20:20:35 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=71641,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0}
03-Mar-2009 20:21:52 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@7de212f9 main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=76905,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0}
03-Mar-2009 20:21:52 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main

queryResultCache{lookups=24,hits=24,hitratio=1.00,inserts=32,evictions=0,size=32,warmupTime=82406,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522}
03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@7de212f9 main

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=32,evictions=0,size=32,warmupTime=85591,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522}
03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@7de212f9 main from searc...@732d8b11 main

documentCache{lookups=720,hits=710,hitratio=0.98,inserts=40,evictions=0,size=40,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105}
03-Mar-2009 20:23:17 org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@7de212f9 main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105}
03-Mar-2009 20:23:17 org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to searc...@7de212f9 main

// Then the few warm up queries defined in solrconfig.xml

INFO: Closing searc...@732d8b11 main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=16,evictions=0,size=16,warmupTime=71641,cumulative_lookups=90,cumulative_hits=68,cumulative_hitratio=0.75,cumulative_inserts=22,cumulative_evictions=0}

queryResultCache{lookups=24,hits=24,hitratio=1.00,inserts=32,evictions=0,size=32,warmupTime=82406,cumulative_lookups=6310,cumulative_hits=268,cumulative_hitratio=0.04,cumulative_inserts=6041,cumulative_evictions=5522}

documentCache{lookups=720,hits=710,hitratio=0.98,inserts=40,evictions=0,size=40,warmupTime=0,cumulative_lookups=415308,cumulative_hits=283661,cumulative_hitratio=0.68,cumulative_inserts=131647,cumulative_evictions=131105}
03-Mar-2009 20:23:18 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 163189
03-Mar-2009 20:23:18 org.apache.solr.core.SolrCore execute
INFO: [jobs] webapp=/cjsolr path=/update/ params={} status=0 QTime=163189


I'm sure I'm doing something wrong. Does this 163 seconds commit time
have to do with the commit parameters :
(optimize=false,waitFlush=false,waitSearcher=true)  ??

Thanks for any help.

Cheers !!

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Collection distribution in a multicore environment

2009-02-24 Thread Jérôme Etévé
Hi fellow Solr fans,

  I'm setting up some collection distribution along with multicore
solr. I'm using version 1.3.

  I have no problem with the snapshooter, since this can be set within
each core in solrconfig.xml.

  My question is more about rsyncd.
  The rsyncd-start script creates a rsyncd.conf in the conf directory
relative to where it lives, so what I did was copy bin/rsyncd-start
into each core directory:

  solr/
 core1/
 bin/
rsyncd-start
 conf/
rsyncd.conf
 core2/
 - same thing -

 Then for each core, I launch a rsyncd :
   /../solr/core1/bin/rsyncd-start -p 18080 -d /../solr/core1/data/

 This way, it can be stopped properly when I use the command below
(rsyncd-stop grabs its settings from the conf/rsyncd.conf of the containing core).
  /../solr/core1/bin/rsyncd-stop

 The problem is I'm not very comfortable with having one running
daemon per core (each on a different port), plus a copy of each script
inside each core.

 Is there any better way to set this up ?

 Cheers !!

 Jerome Eteve.






-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Precisions on solr.xml about cross context forwarding.

2008-12-17 Thread Jérôme Etévé
I was thinking, maybe we should write a patch to fix this issue.

For instance by making a dispatch servlet (with a "core" parameter or
request attribute) that would act the same way as the filter but
provide a cross context addressable entry point.

What do you think ?

Jerome

On Wed, Dec 17, 2008 at 6:24 PM, Jérôme Etévé  wrote:
> Maybe there's an 'internal query' concept in j2ee that could be a workaround ?
> I'm not really a j2ee expert ..
>
> Jerome.
>
> On Wed, Dec 17, 2008 at 5:09 PM, Smiley, David W.  wrote:
>> This bothers me too.  I find it really strange that Solr's entry-point is a
>> servlet filter instead of a servlet.
>>
>> ~ David
>>
>>
>> On 12/17/08 12:07 PM, "Jérôme Etévé"  wrote:
>>
>> Hi all,
>>
>>  In solr.xml ( /lucene/solr/trunk/src/webapp/web/WEB-INF/web.xml
>> ),it's written that
>>
>>  "It is unnecessary, and potentially problematic, to have the
>> SolrDispatchFilter
>>   configured to also filter on forwards.  Do not configure
>>   this dispatcher as FORWARD."
>>
>> The problem is that if filters do not have this FORWARD thing, then
>> cross context forwarding doesn't work.
>>
>> Is there a workaround to this problem ?
>>
>> Jerome.
>>
>> --
>> Jerome Eteve.
>>
>> Chat with me live at http://www.eteve.net
>>
>> jer...@eteve.net
>>
>>
>
>
>
> --
> Jerome Eteve.
>
> Chat with me live at http://www.eteve.net
>
> jer...@eteve.net
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Precisions on solr.xml about cross context forwarding.

2008-12-17 Thread Jérôme Etévé
Maybe there's an 'internal query' concept in j2ee that could be a workaround ?
I'm not really a j2ee expert ..

Jerome.

On Wed, Dec 17, 2008 at 5:09 PM, Smiley, David W.  wrote:
> This bothers me too.  I find it really strange that Solr's entry-point is a
> servlet filter instead of a servlet.
>
> ~ David
>
>
> On 12/17/08 12:07 PM, "Jérôme Etévé"  wrote:
>
> Hi all,
>
>  In solr.xml ( /lucene/solr/trunk/src/webapp/web/WEB-INF/web.xml
> ),it's written that
>
>  "It is unnecessary, and potentially problematic, to have the
> SolrDispatchFilter
>   configured to also filter on forwards.  Do not configure
>   this dispatcher as FORWARD."
>
> The problem is that if filters do not have this FORWARD thing, then
> cross context forwarding doesn't work.
>
> Is there a workaround to this problem ?
>
> Jerome.
>
> --
> Jerome Eteve.
>
> Chat with me live at http://www.eteve.net
>
> jer...@eteve.net
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Precisions on solr.xml about cross context forwarding.

2008-12-17 Thread Jérôme Etévé
Hi all,

 In solr.xml ( /lucene/solr/trunk/src/webapp/web/WEB-INF/web.xml
),it's written that

 "It is unnecessary, and potentially problematic, to have the SolrDispatchFilter
  configured to also filter on forwards.  Do not configure
  this dispatcher as FORWARD."

The problem is that if the filter is not mapped with the FORWARD dispatcher, then
cross context forwarding doesn't work.

Is there a workaround to this problem ?

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: AW: Cross-context-forward to solr-instance

2008-12-17 Thread Jérôme Etévé
Hi Lance,

 Can you tell us what this parameter is and how to set it ?

 I'm also stuck with the same problem :(

 Thanks !!

 Jerome


On Mon, Sep 8, 2008 at 6:02 PM, Lance Norskog  wrote:
> You can give a default core set by adding a default parameter to the query
> in solrconfig.xml. This is hacky, but it gives you a set of cores instead of
> just one core.
>
> -Original Message-
> From: David Smiley @MITRE.org [mailto:dsmi...@mitre.org]
> Sent: Monday, September 08, 2008 7:54 AM
> To: solr-user@lucene.apache.org
> Subject: Re: AW: Cross-context-forward to solr-instance
>
>
> FWIW, I'm also using the SolrRequestFilter for forwards, despite the
> warning.
> Solr1.3 doesn't have the concept of a default core anymore yet I want this
> feature.  I made an uber-simple JSP like this:
> "
> />
> And so now my clients don't need to update their URL just because I've
> migrated to Solr 1.3.  Oh, I needed to set up the dispatcher FORWARD as you
> mentioned and I also remapped the /select/* servlet mapping to my jsp.:
>  
>selectDefaultCore
>/selectDefaultCore.jsp
>  
>
>  
>selectDefaultCore
>/select/*
>  
>
> The only problem I've seen so far is that if I echo the params
> (echoParams=all), I see the output doubled.  Weird but inconsequential.
>
> ~ David Smiley
>
>
> Hachmann wrote:
>>
>> Hi,
>>
>> I made a mistake. At least with Tomcat 5.5.x, if you configure the
>> SolrRequestFilter with FORWARD it indeed gets
>> called even when you forward from another web-context!
>>
>> Note, that the documentation says this might be problematic!
>>
>> Sorry for the previous overhasty post.
>> Björn
>>
>>> -Ursprüngliche Nachricht-
>>> Von:
>>> solr-user-return-13537-hachmann.bjoern=guj...@lucene.apache.or
>>> g
>>> [mailto:solr-user-return-13537-hachmann.bjoern=guj...@lucene.a
>> pache.org] Im Auftrag von Hachmann, Bjoern
>>> Gesendet: Samstag, 6. September 2008 08:01
>>> An: solr-user@lucene.apache.org
>>> Betreff: Cross-context-forward to solr-instance
>>>
>>> Hi,
>>>
>>> yesterday I tried the Solr-1.3-RC2 and everything seems to work fine
>>> using the traditional single-core setup. But while troubleshooting
>>> the new multi-core feature, I realized for the first time, that I
>>> have been using the deprecated (even in 1.2) class SolrServlet. This
>>> is a huge problem for us, as we run the solr-web-app parallel to our
>>> main web-app in the same servlet-container. Using this approach we
>>> can internally forward update- and select-requests to the
>>> Solr-instance currently in use.
>>>
>>> ServletContext ctx = getServletContext().getContext("solr1");
>>> RequestDispatcher rd = ctx.getNamedDispatcher("SolrServer");
>>> rd.forward(request, response);
>>>
>>> As you can see, this approach only works for the servlet named
>>> 'SolrServer' which references the deprecated class.
>>>
>>> The attempt of using a path based dispatcher
>>> (ctx.getRequestDispatcher) was not successful, even though I
>>> configured the SolrRequestFilter in the solr-web.xml to work on
>>> forwards (FORWARD), which the documentation
>>> discourages. Maybe this is because of the cross-context-dispatch?
>>>
>>> At the moment I ran totally out of ideas, apart from completely
>>> redesigning our whole setup. Any ideas are highly appreciated.
>>>
>>> Thanks in advance,
>>> Björn
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/Cross-context-forward-to-solr-instance-tp19343349p1937
> 3757.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Accessing multiple core via cross context queries.

2008-12-16 Thread Jérôme Etévé
Hi All,

 I'm developing a webapp that needs access to the solr webapp.

 I can get my solr context like that:

 ServletContext solrContext = getServletContext().getContext("/solr");

 but when I do

 solrContext.getRequestDispatcher("/core0/select/").dispatch(request, response);

 I get a 404 error:

  HTTP Status 404 - /solr/core0/select/

type Status report

message /solr/core0/select/

description The requested resource (/solr/core0/select/) is not available.


Besides that, if I access /solr/core0/select/ directly then everything is fine.

From what I saw in the sources, solr relies on a Filter to deal with
queries involving multicore, but I cannot see why this should affect
which resources are visible to whom.

Can't a webapp see the same things as web users do ? j2ee gurus, help !

Is there something I'm missing here ? (both webapps are with crossContext=true )


Cheers!

Jerome.


-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


MoreLikeThis and boost functions

2008-12-06 Thread Jérôme Etévé
Hi everyone,

 I'm wondering if the MoreLikeThis handler takes the boost function
parameter into account for the scoring (hence the sorting I guess) of
the similar documents it finds.

Thanks for your help !

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Different tokenizing algorithms for the same stream

2008-11-07 Thread Jérôme Etévé
Hi,

  you have to keep track of the character position yourself in your
custom Tokenizer.

  See org.apache.lucene.analysis.CharTokenizer for a starting example.

  Cheers,

  J.


On Fri, Nov 7, 2008 at 3:33 PM, Yoav Caspi <[EMAIL PROTECTED]> wrote:
> Thanks, Jerome.
>
> My problem is that in Token next(Token result) there is no information about
> the location inside the stream.
> I can read characters from the input Reader, but couldn't find a way to know
> if it's the beginning of the input or not.
>
> -J
>
> On Fri, Nov 7, 2008 at 6:13 AM, Jérôme Etévé <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>>  I think you could implement your personalized tokenizer in a way it
>> changes its behaviour after it has delivered X tokens.
>>
>> This implies a new tokenizer instance is build from the factory for
>> every string analyzed, which I believe is true.
>>
>> Can this be confirmed ?
>>
>> Cheers !
>>
>> Jerome.
>>
>>
>> On Thu, Nov 6, 2008 at 11:08 PM, Yuri Jan <[EMAIL PROTECTED]> wrote:
>> > Hello all,
>> >
>> > I'm trying to implement a tokenizer that will behave differently on
>> > different parts of the incoming stream.
>> > For example, for the first X words in the stream I would like to use one
>> > tokenizing algorithm, while for the rest of the stream a different
>> > tokenizing algorithm will be used.
>> >
>> > What is the best way to implement that?
>> > Where should I store this stream-related data?
>> >
>> > Thanks,
>> > Yuri
>> >
>>
>>
>>
>> --
>> Jerome Eteve.
>>
>> Chat with me live at http://www.eteve.net
>>
>> [EMAIL PROTECTED]
>
>



--
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: Batch and Incremental mode of indexing

2008-11-07 Thread Jérôme Etévé
Hi,
 For batch indexing, what you could do is use two cores: one in
production and one used for your update.

Once your update core is built (delete *:* plus batch insert), you
can swap the cores to put it in production:
http://wiki.apache.org/solr/CoreAdmin#head-928b872300f1b66748c85cebb12a59bb574e501b
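The swap itself is a single CoreAdmin call, something like this (host, port
and core names are only examples):

    http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild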

Cheers,

J



On Fri, Nov 7, 2008 at 12:18 PM, Vaijanath N. Rao <[EMAIL PROTECTED]> wrote:
> Hi Solr-Users,
>
> I am not sure but does there exist any mechanism where-in we can specify
> solr as Batch and incremental indexing.
> What I mean by batch indexing is solr would delete all the records which
> existed in the index and will create an new index form the given data.
> For incremental I want solr to just do the operation ( add/delete/... ).
>
> This is how we currently do batch-indexing: issue a command to solr to delete
> q=*:*, commit, and then start the indexing.
> For incremental operation we just take the data and the operation specified.
>
> Kindly let me know if there exist a smarter way to get this working.
>
> --Thanks and Regards
> Vaijanath
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


delivering customized results using a SearchComponent plugin

2008-11-07 Thread Jérôme Etévé
Hi there,

  I developed a personalized SearchComponent in which I'm building a
docset from a personalized Query, and a personalized Priority Queue.

To be short, I'm doing that (in the process method) :

HitCollector hitCol = new HitCollector() {
    @Override
    public void collect(int doc, float score) {
        myQueue.insert(new ScoreDoc(doc, score));
        myNumHits[0]++;
    }
};

rb.req.getSearcher().search(myQuery, hitCol);

 After popping the ids from myQueue etc ..., I add a nice DocSlice to
the output:

 rb.rsp.add("myResponse", new DocSlice(0, mySliceLen, myIds,
myScores, myNumHits[0], myMaxScore));


 The effect of that is that the returned response automagically
(well, as far as I understand :D) contains the documents of my
DocSlice (under the key 'myResponse'), each of them containing
the fields defined in the 'fl' parameter (including the score).

What I'd like to do is add some fields to the returned documents. I
thought about doing this the way the QueryComponent adds the score
(see the returnFields method in QueryComponent), but my own
'handleResponses' method is not called, plus I can't access
rb._responseDocs (which seems to be required to have any effect on
the response that is returned).

Here's what could help me a lot:

  - Where does the solr framework transform the doc Ids (which are
just integers) into returned documents ?
  - How does the standard QueryComponent get the chance to add
the 'score' field to the returned documents ?
  - How can I hook into that process so I can add my own fields ?


Cheers !!

Jerome.


-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: Different tokenizing algorithms for the same stream

2008-11-07 Thread Jérôme Etévé
Hi,

  I think you could implement your personalized tokenizer in a way that it
changes its behaviour after it has delivered X tokens.

This implies a new tokenizer instance is built from the factory for
every string analyzed, which I believe is true.

Can this be confirmed ?

Cheers !

Jerome.


On Thu, Nov 6, 2008 at 11:08 PM, Yuri Jan <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I'm trying to implement a tokenizer that will behave differently on
> different parts of the incoming stream.
> For example, for the first X words in the stream I would like to use one
> tokenizing algorithm, while for the rest of the stream a different
> tokenizing algorithm will be used.
>
> What is the best way to implement that?
> Where should I store this stream-related data?
>
> Thanks,
> Yuri
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


DocSet: BitDocSet or HashDocSet ?

2008-10-28 Thread Jérôme Etévé
Hi all,

  In my code, I'd like to keep a subset of my 14M docs which is around
100k large.

 What is, according to you, the best option in terms of speed and memory usage ?

 Some basic thinking tells me the BitDocSet should be the fastest for
lookup, but it takes roughly one bit per document in the index (so ~14M
bits here), whereas the HashDocSet takes just ~ 100k * sizeof(int), but
has a slightly slower lookup.

 The doc of HashDocSet says "it can be a better choice if there are few
docs in the set". What does 'few' mean in this context ?

 Cheers !

 Jerome.


-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: Deadlock problem on searcher at warm up.

2008-10-24 Thread Jérôme Etévé
Great, it works now.
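For anyone hitting the same hang, the change is simply to take the searcher
from the request instead of from the core, roughly like this (a sketch;
buildMyStuff is the method from my original mail):

    public void process(ResponseBuilder rb) throws IOException {
        if (firstTime) {
            firstTime = false;
            buildMyStuff(rb);
        }
    }

    private void buildMyStuff(ResponseBuilder rb) throws IOException {
        // the searcher already bound to this request: unlike
        // rb.req.getCore().getSearcher(), it does not block waiting for a
        // warming searcher to be registered.
        SolrIndexSearcher searcher = rb.req.getSearcher();
        // ... walk the index and build the data structure ...
    }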

Thanks !

J

On Fri, Oct 24, 2008 at 4:45 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Fri, Oct 24, 2008 at 8:21 AM, Jérôme Etévé <[EMAIL PROTECTED]> wrote:
>> I though it'd be ok to trigger this the very first time the process
>> method is called by doing something like that:
>>
>>  private boolean firstTime= true ;
>>
>>  public void process(ResponseBuilder rb) throws IOException {
>>if ( firstTime ){
>>firstTime = false ;
>>buildMyStuff(rb) ;
>>}
>>  }
>>
>>
>> The problem is that my method buildMyStuff hangs when calling
>> rb.req.getCore().getSearcher() ; ,
>> and I believe this is happening when the warm up queries are executed.
>
> getSearcher() can wait for a searcher to be registered.
> getNewestSearcher() can be used from places like inform(), but if you
> are already in process()
> then the one you should use is the one bound to the request (the
> SolrQueryRequest object) - rb.req.getSearcher()
>
> -Yonik
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: One document inserted but nothing showing up ? SOLR 1.3

2008-10-24 Thread Jérôme Etévé
Hi there,

Are you sure you did a commit after your insertion ?
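If not, something like this should make the document show up (host, port and
core path guessed from your earlier mail, adjust as needed):

    curl 'http://localhost:8180/solr/video/update' --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'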

On Fri, Oct 24, 2008 at 8:11 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
>
> Even that doesn't work,
> How can I check properly, I did insert one document but I can't get it back
> ???
>
>
> Feak, Todd wrote:
>>
>> Unless "q=ALL" is a special query I don't know about, the only reason you
>> would get results is if "ALL" showed up in the default field of the single
>> document that was inserted/updated.
>>
>> You could try a query of "*:*" instead. Don't forget to URL encode if you
>> are doing this via URL.
>>
>> -Todd
>>
>>
>> -Original Message-
>> From: sunnyfr [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, October 23, 2008 9:17 AM
>> To: solr-user@lucene.apache.org
>> Subject: One document inserted but nothing showing up ? SOLR 1.3
>>
>>
>> Hi
>>
>> Can somebody help me ?
>> How can I see all my documents, I just did a full import :
>> 
>> Indexing completed. Added/Updated: 1 documents. Deleted 0 documents.
>> 
>>
>> and when I do :8180/solr/video/select/?q=ALL, I've no result ?
>> 
>> −
>> 
>> 0
>> 0
>> −
>> 
>> ALL
>> 
>> 
>> 
>> 
>>
>> Thanks a lot,
>>
>> --
>> View this message in context:
>> http://www.nabble.com/One-document-inserted-but-nothing-showing-up---SOLR-1.3-tp20134357p20134357.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/One-document-inserted-but-nothing-showing-up---SOLR-1.3-tp20134357p20145343.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Deadlock problem on searcher at warm up.

2008-10-24 Thread Jérôme Etévé
Hi everyone,

 I'm implementing a search component inherited from SearchComponent .

 This component has to build a data structure from the index. Like in
the SpellChecker, I trigger this building by giving a special argument
at query time  (from the process method) and I'm using the searcher I
get like this:

RefCounted search = rb.req.getCore()
.getSearcher();
...
search.decref();

I included this component at the end of the chain in my search handler.

What I'd like to do is to trigger this building for a first time at
solr startup so I don't need to artificially trigger it for a first
time.

I thought it'd be ok to trigger this the very first time the process
method is called by doing something like that:

 private boolean firstTime = true;

 public void process(ResponseBuilder rb) throws IOException {
     if (firstTime) {
         firstTime = false;
         buildMyStuff(rb);
     }
 }


The problem is that my method buildMyStuff hangs when calling
rb.req.getCore().getSearcher(), and I believe this happens while
the warm up queries are being executed.

Furthermore, any regular queries on a solr instance like this would
hang and wait forever.

Is there any way I can get around this problem, or is there a better
way to run buildMyStuff for the first time when solr is started up?

Cheers,

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: solr 1.3 database connection latin1/stored utf8 in mysql?

2008-10-22 Thread Jérôme Etévé
Hi,

  See
   http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html
  and
   
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String)

  Also note that you cannot transform a latin1 string into a utf-8
string. What you can do
is decode a latin1 byte array to a String (java uses its own
internal representation for Strings, which you shouldn't even need to know
about), and encode a String to a UTF-8 byte array.
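A tiny java illustration of the two steps (variable names are made up):

    // raw bytes holding latin1-encoded text  ->  java String
    String s = new String(latin1Bytes, "ISO-8859-1");
    // java String  ->  bytes encoded as UTF-8
    byte[] utf8 = s.getBytes("UTF-8");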

Cheers.

J.


On Wed, Oct 22, 2008 at 10:11 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
>
> Hi Shalin
> Thanks for your answer but it doesn't work just with Dfile.encoding
> I was hoping it could work.
>
> I definitely can't change the database so I guess I must change java code.
> I've a function to change latin-1 string to utf8  but I don't know really
> where should I put it?
>
> Thanks for your answer,
>
>
> Shalin Shekhar Mangar wrote:
>>
>> Hi,
>>
>> The best way to manage international characters is to keep everything in
>> UTF-8. Otherwise it will be difficult to figure out the source of the
>> problem.
>>
>> 1. Make sure the program which writes data into MySQL is using UTF-8
>> 2. Make sure the MySQL tables are using UTF-8.
>> 3. Make sure MySQL client connections use UTF-8 by default
>> 4. If the SQL written in your data-config has international characters,
>> start Solr with "-Dfile.encoding=UTF-8" as a command line parameter
>>
>> http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
>>
>> I don't think there is any easy way to go about this. You may need to
>> revisit all the parts of your system.
>>
>> On Wed, Oct 22, 2008 at 12:52 PM, sunnyfr <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Hi,
>>>
>>> I'm using solr1.3 mysql and tomcat55, can you please help to sort this
>>> out?
>>> How can I index data in UTF8 ? I tried to add the parameter
>>> encoding="UTF-8"
>>> in the datasource in data-config.xml.
>>>
>>> | character_set_client| latin1
>>> | character_set_connection| latin1
>>> But data are stored in UTF8 inside database, not very logic but I can't
>>> change it.
>>>
>>> But still doesn't work, Help would be more than welcome,
>>> Thanks
>>> --
>>> View this message in context:
>>> http://www.nabble.com/solr-1.3-database-connection-latin1-stored-utf8-in-mysql--tp20105301p20105301.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/solr-1.3-database-connection-latin1-stored-utf8-in-mysql--tp20105342p20106791.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: tomcat55/solr1.3 - Indexing data, doesnt take in consideration utf8!

2008-10-21 Thread Jérôme Etévé
Looks like you have a double encoding problem.

It might be because you fetch UTF-8 binary data from mysql (I know
that for instance the perl driver has an issue with that) and you then
encode it a second time in UTF-8 when you post to solr.

Make sure the strings you're getting from mysql are actually proper
unicode strings and not the raw UTF-8 encoded binary form.

You may want to have a look at
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html
for the proper option to use with your connection.
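Typically that boils down to adding the encoding options to the JDBC url in
your data-config (these are the standard Connector/J options; remember that
the & has to be written as &amp; inside the xml attribute):

    jdbc:mysql://master-spare.videos.com/videos?useUnicode=true&characterEncoding=UTF-8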

To check that you're posting actual UTF-8 data to solr, you can try
dumping your xml post to a file (don't forget to write the file with
UTF-8 encoding). Then you can check whether this file is readable with
any UTF-8 aware editor.

Cheers,

Jerome.


On Tue, Oct 21, 2008 at 10:43 AM, sunnyfr <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I've solr 1.3 and tomcat55.
> When I try to index a bit of data and I request ALL, obviously my accent and
> UTF8 encoding is not took in consideration.
> 
> 2006-12-14T15:28:27Z
> 
> Le 1er film de Goro Miyazaki (fils de Hayao)
> je suis allÃ(c)e  ...
> 
> 渡邊 å‰ å·  vs 三ç"°ä¸‹ç"° 1
>
>
> My database Mysql is well in UTF8, if I request data manually from mysql I
> will get accent even japan characters properly
>
> I index my data, my data-config is :
>driver="com.mysql.jdbc.Driver"
>  url="jdbc:mysql://master-spare.videos.com/videos"
>  user="solr"
>  password="pass"
>  batchSize="-1"
>  responseBuffering="adaptive"/>
>
> My schema config file start by : 
>
> I've add in my server.xml : because my localhost point on 8180
>   maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
>   enableLookups="false" redirectPort="8443" acceptCount="100"
>   connectionTimeout="2" disableUploadTimeout="true"
> URIEncoding="UTF-8" useBodyEncodingForURI="true" />
>
> What can I check?
> I'm using a linux server.
> If I do dpkg-reconfigure -plow locales
> Generating locales...
>  fr_BE.UTF-8... up-to-date
>  fr_CA.UTF-8... up-to-date
>  fr_CH.UTF-8... up-to-date
>  fr_FR.UTF-8... up-to-date
>  fr_LU.UTF-8... up-to-date
> Generation complete.
>
> Would that be a problem, I would say no but maybe, do I miss a package???
>
>
>
> --
> View this message in context: 
> http://www.nabble.com/tomcat55-solr1.3---Indexing-data%2C-doesnt-take-in-consideration-utf8%21-tp20086167p20086167.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: Discarding undefined fields in query

2008-10-08 Thread Jérôme Etévé
On Tue, Oct 7, 2008 at 12:56 AM, Chris Hostetter
<[EMAIL PROTECTED]> wrote:
>
> : req.getSchema().getQueryAnalyzer();
> :
> : I think it's in this analyzer that the undefined field error happens
> : (because for instance the field 'foo' doesn't exists in the schema,
> : and so it's impossible to find a specific analyzer to this field in
> : the schema).
>
> Correct.
>
> : The strange thing is that any QueryParser (Lucene API) is supposed to
> : raise a ParseException if anything wrong happens with the parsing with
> : the parse(String) method.
> :
> : But here, it seems that the Analyzer from the schema (the one we get
> : from getQueryAnalyzer()) is creating it's own error ( the undefined
> : field one, instance of SolrException) and instead of propagating it to
> : the QueryParser which could have a chance to propagate it as a
> : standard ParseException, it seems it stops solr processing the query
> : directly.
>
> Solr isn't doing anything magical here -- it's just throwing a
> SolrException, which is a RuntimeExcepttion -- the Lucene
> QueryParser.parse method only throws a ParseException in th event of
> TooManyClauses, TokenMgrError, or an inner ParseException.
>

Ok, I get it now.
Runtime exceptions don't have to be checked at compile time (and
couldn't be here, since the Analyzer could be anything throwing
anything).

I'll catch that and deal with it then (or is that bad programming ?).
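Roughly like this, I suppose (a sketch; parser and variable names are just
for illustration):

    Query query;
    try {
        query = myLuceneParser.parse(qstr);
    } catch (org.apache.solr.common.SolrException e) {
        // thrown by the schema analyzer for an undefined field; fall back
        // to a query matching nothing instead of failing the whole request
        query = new BooleanQuery();
    }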

Thanks for your help .

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: Discarding undefined fields in query

2008-10-01 Thread Jérôme Etévé
Hi,
  yes I've got the stack trace giving me the beginning of an explanation.

  One of the QueryParsers I use in my query parser plugin is a
MultiFieldQueryParser and it needs a field-aware Analyzer, which I get
from the schema like this:

req.getSchema().getQueryAnalyzer();

I think it's in this analyzer that the undefined field error happens
(because for instance the field 'foo' doesn't exists in the schema,
and so it's impossible to find a specific analyzer to this field in
the schema).

The strange thing is that any QueryParser (Lucene API) is supposed to
raise a ParseException if anything goes wrong while parsing in
the parse(String) method.

But here, it seems that the Analyzer from the schema (the one we get
from getQueryAnalyzer()) throws its own error (the undefined
field one, an instance of SolrException), and instead of it being
propagated to the QueryParser, which could have a chance to rethrow it
as a standard ParseException, it seems to stop solr processing the
query directly.


Here's the full stack (with the undefined field being 'hwss' )

   org.apache.solr.common.SolrException: undefined field hwss
at 
org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1053)
at 
org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getAnalyzer(IndexSchema.java:373)
at 
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.tokenStream(IndexSchema.java:348)
at 
org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:473)
at 
org.apache.lucene.queryParser.MultiFieldQueryParser.getFieldQuery(MultiFieldQueryParser.java:120)
at 
org.apache.lucene.queryParser.MultiFieldQueryParser.getFieldQuery(MultiFieldQueryParser.java:135)
at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1248)
at 
org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1135)
at 
org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1092)
at 
org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1052)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:168)
at my.organisation.lucene.queryParser.MyLuceneQueryParser.parse(Unknown
Source)
at my.organisation.solr.search.MyQParser.parse(Unknown Source)
at org.apache.solr.search.QParser.getQuery(QParser.java:88)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:155)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

Cheers !

Jerome.

On Tue, Sep 30, 2008 at 10:34 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Tue, Sep 30, 2008 at 2:42 PM, Jérôme Etévé <[EMAIL PROTECTED]> wrote:
>> But still I have an error from the webapp when I try to query my
>> schema with non existing fields in my query ( like foo:bar ).
>>
>> I'm wondering if the query q is parsed in a very simple way somewhere
>> else (and independently from any customized QParserPlugin) and checked
>> against the schema.
>
> It should not be.  Are you sure your QParser is being used?
> Does the error contain a stack trace that can pinpoint where it's coming from?
>
> -Yonik
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Discarding undefined fields in query

2008-09-30 Thread Jérôme Etévé
Hi All,

  I wrote a customized query parser which discards non-schema fields
from the query (I'm using the schema field names from
req.getSchema().getFields().keySet()).

This parser works fine in unit tests.

But I still get an error from the webapp when I try to query my
index with non-existing fields in the query (like foo:bar).

I'm wondering if the query q is parsed in a very simple way somewhere
else (and independently from any customized QParserPlugin) and checked
against the schema.

Is there an option to modify this behaviour so undefined fields in a
query could be simply discarded instead of throwing an error ?

Cheers !

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: Multicore and custom jars loading

2008-09-29 Thread Jérôme Etévé
My mistake,

  Using the sharedLib="lib/" attribute in the solr tag of solr.xml
solved the problem.
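For the record, it boils down to something like this in solr.xml (core names
as in my earlier mail, the other values are just the usual ones):

    <solr persistent="true" sharedLib="lib">
      <cores adminPath="/admin/cores">
        <core name="core1" instanceDir="core1" />
        <core name="core2" instanceDir="core2" />
      </cores>
    </solr>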

J.

On Mon, Sep 29, 2008 at 2:43 PM, Jérôme Etévé <[EMAIL PROTECTED]> wrote:
> Hello all.
>
>  I'm using a multicore installation and I've got a small issue with
> the loading of our customized jars.
>
>  Let's say I've got a class my.company.MyAnalyzer which is distributed
> in a jar called company-solr.jar
>
>  If I put this jar in the lib directory, at the solr home like this:
>
>  $solr_home/:
>solr.xml
>core1/
>core2/
>lib/company-solr.jar
>
> , then the solr class loader adds properly the company-solr.jar to the
> class loader, but then it's not possible to find those classes from
> the cores.
>  For instance if you have core1/conf/schema.xml which makes use of the
> my.company.MyAnalyzer class, it won't work because this class won't be
> found.
>
> At the moment, I solved the pb by duplicating the jar inside the two
> cores like that:
>
>   core1/lib/company-solr.jar
>   ...
>   core2/lib/company-solr.jar
>
> But I'm not very happy with this solution.
>
> Is there anyway to allow core shema files to references classes loaded
> as jars in the top level lib path ?
>
> I'm running solr1.3.0 in tomcat 6.0.18
>
>
> Cheers !!
>
> Jerome.
>
> --
> Jerome Eteve.
>
> Chat with me live at http://www.eteve.net
>
> [EMAIL PROTECTED]
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Multicore and custom jars loading

2008-09-29 Thread Jérôme Etévé
Hello all.

 I'm using a multicore installation and I've got a small issue with
the loading of our customized jars.

 Let's say I've got a class my.company.MyAnalyzer which is distributed
in a jar called company-solr.jar

 If I put this jar in the lib directory, at the solr home like this:

 $solr_home/:
solr.xml
core1/
core2/
lib/company-solr.jar

, then the solr class loader properly adds the company-solr.jar to the
class loader, but it is still not possible to find those classes from
the cores.
 For instance if you have core1/conf/schema.xml which makes use of the
my.company.MyAnalyzer class, it won't work because this class won't be
found.

At the moment, I solved the problem by duplicating the jar inside the two
cores like that:

   core1/lib/company-solr.jar
   ...
   core2/lib/company-solr.jar

But I'm not very happy with this solution.

Is there any way to allow core schema files to reference classes loaded
as jars in the top level lib path ?

I'm running solr1.3.0 in tomcat 6.0.18


Cheers !!

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Querying multicore

2008-09-24 Thread Jérôme Etévé
Hi everyone,

  I'm planning to use the multicore cause it seems more convenient
than having multiple instances of solr in the same container.

  I'm wondering if it's possible to query different cores (hence
different schemas / searchers ... indices !) from a customized
SolrRequestHandler to build a response ?

  If not, I'll have to build my own webapp and query solr through
crossContext requests. Has someone done that already ?

  Kind regards,

  Jerome Eteve.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: Solr deployment in tomcat

2007-10-09 Thread Jérôme Etévé
On 10/9/07, Chris Laux <[EMAIL PROTECTED]> wrote:
> Jérôme Etévé wrote:
> [...]
> > /var/solr/foo/ is the solr home for this instance (where you'll put
> > your schema.xml , solrconfig.xml etc.. ) .
>
> Thanks for the input Jérôme, I gave it another try and discovered that
> what I was doing wrong was copying the solr/example/ directory to what
> you call "/var/solr/foo/", while copying solr/example/solr/ is what
> works now.
>
> Maybe I should add a note to the Wiki...

Sounds like a good idea ! Actually I remember struggling a bit to have
multiple instances of solr in tomcat.

-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: Solr deployment in tomcat

2007-10-09 Thread Jérôme Etévé
Hi,

Here's what I've got (multiple solr instances within the same tomcat server)

In
/var/tomcat/conf/Catalina/localhost/

For an instance 'foo' :

foo.xml :

<Context docBase="/var/tomcat/solrapp/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String" value="/var/solr/foo" override="true"/>
</Context>


/var/tomcat/solrapp/solr.war is the path to the solr war file. It can
be anywhere on the disk.
/var/solr/foo/ is the solr home for this instance (where you'll put
your schema.xml , solrconfig.xml etc.. ) .


Restart tomcat and you should see your foo app appear in your deployed apps.


Jerome.

On 10/9/07, Chris Laux <[EMAIL PROTECTED]> wrote:
> > Hello Group,
> >  Is anyone able to deploy solr.war @ tomcat ? I just tried to deploy it as
> > per wiki and it gives a bunch of exceptions and I don't think those exceptions
> > have any relevance to the actual cause. I was wondering if there is any
> > special configuration needed?
>
> I had that very same problem while trying to set solr up with tomcat
> (and multiple instances). I have given up for now and am working with
> Jetty instead.
>
> Chris Laux
>
>


-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: Problem with html code inside xml

2007-09-25 Thread Jérôme Etévé
If I understand, you want to keep the raw html code verbatim inside the
field element of your posting xml file.


I think you should encode your content to protect these xml entities:
<  ->  &lt;
>  ->  &gt;
"  ->  &quot;
&  ->  &amp;

If you use perl, have a look at HTML::Entities.
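
In Java the same escaping is easy to do by hand, something like this
(just a sketch ; note that '&' has to be replaced first so the other
entities don't get double-escaped):

public final class XmlEscape {
  // Escape the four characters XML cannot contain verbatim.
  public static String escape(String html) {
    return html.replace("&", "&amp;")
               .replace("<", "&lt;")
               .replace(">", "&gt;")
               .replace("\"", "&quot;");
  }
}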


On 9/25/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I've got some problem with html code who is embedded in xml file:
>
> Sample source .
>
> 
> 
> 
>  Les débats
> 
> 
> Le premier tour des élections fédérales se déroulera 
> le 21
> octobre prochain. D'ici là, La 1ère vous propose plusieurs rendez-
> vous, dont plusieurs grands débats à l'enseigne de Forums.
> 
> 
> 
> 
> my para textehere
> 
> 
> Vous trouverez sur cette page toutes les 
> dates et les heures de
> ces différents rendez-vous ainsi que le nom et les partis des
> débatteurs. De plus, vous pourrez également écouter ou réécouter
> l'ensemble de ces émissions.
> 
> 
> 
> -
> When I make a query on solr I've got something like that in the
> source code of the xml result:
>
> http://www.w3.org/1999/xhtml";>
> <
> div
> class
> =
> "paragraph"
> >
> <
> div
> class
> =
> "paragraphTitle"
> />
> −
> <
> ...
>
> It is not exactly what I want. I want to keep the html tags, that's all,
> without formatting.
>
> So the br tags and a tags are well formed in xml and json result, but
> the div tags are not kept.
> -
> In the schema.xml I've got this for the html content
>
> 
>
>stored="true" multiValued="true"/>
>
> -
>
> Any help would be appreciate.
>
> Thanks in advance.
>
> S. Christin
>
>
>
>
>
>


-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: How to get all the search results - python

2007-09-24 Thread Jérôme Etévé
By design, it's not very efficient to ask for a large number of
results with solr/lucene. I think you will face performance and memory
problems if you do that.


On 9/24/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-09-24 at 16:29 +0530, Roopesh P Raj wrote:
> > > Hi Roopesh,
> >
> > > I am not sure whether I understand your problem.
> >
> > > Is it the limitation of rows/pagination?
> > > If so why not using a real high number (like rows=100)?
> >
> > > salu2
> >
> > Hi,
> >
> > Assigning a high number will solve my problem. (I thought that there would be
> > something like rows='all' to do it).
> >
> > Can I do pagination using the python client?
>
> I am not a python expert but I think so.
>
> > How can I specify the starting position, offset etc for
> > pagination through the python client?
>
> http://wiki.apache.org/solr/CommonQueryParameters
>
> It should work as described in the above document (with the start
> parameter.
>
> e.g.
> data = c.search(q='query', fl='id score unique_id Message-ID To From
> Subject',rows=50, wt='python',start=50)
>
> HTH
> --
> Thorsten Scherler thorsten.at.apache.org
> Open Source Java  consulting, training and solutions
>
>


-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


-field:[* TO *] doesn't seem to work

2007-09-03 Thread Jérôme Etévé
Hi all
 I've got a problem here with the '-field:[* TO *]' syntax. It doesn't
seem to work as expected (see
http://wiki.apache.org/solr/SolrQuerySyntax ).

My request is 'word -fieldD:[* TO *]' and the debugQuery=1 solr option
shows that it's properly transformed as :

+(fieldA:chef^10.0 fieldB:chef fieldC:chef^2.0) -fieldD:[* TO *]

but solr still gives back documents with a non-empty fieldD.

My fieldD is defined with the type 'text_ws', the standard solr text
field that only splits on whitespace for exact matching of words.

Did I miss something ?

I'm using solr 1.2.1-dev .


Thanks for any help !

Jerome.

-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: Index HotSwap

2007-08-24 Thread Jérôme Etévé
On 8/21/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> :  I'm wondering what's the best way to completely change a big index
> : without losing any requests.
>
> use the snapinstaller script -- or adopt the same atomic copying approach
> it uses.

I'm having a look :)

> :   - Between the two mv's, the directory dir does not exists, which can
> : cause some solr failure.
>
> this shouldn't cause any failure unless you tell Solr to try and reload
> during the move (ie: you send it a commit) ... either way an atomic copy
> in place of a mv should work much better.

Why, does the reloading of the searcher trigger a re-loading of the
files from disk ?
Thx !

>
> -Hoss
>
>


-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: Indexing HTML content... (Embed HTML into XML?)

2007-08-22 Thread Jérôme Etévé
You need to encode your html content so it can be included as a normal
'string' value in your xml element.

As far as I remember, the only unsafe characters you have to encode as
entities are:
<  ->  &lt;
>  ->  &gt;
"  ->  &quot;
&  ->  &amp;

(google xml entities to be sure).

I don't know what language you use, but for perl for instance, you can
use something like:

use HTML::Entities;
my $xmlString = encode_entities($rawHTML, '<>&"');

Also you need to make sure your html is encoded in UTF-8, to comply
with solr's need for UTF-8 encoded xml.

I hope it helps.

J.

On 8/22/07, Ravish Bhagdev <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Sorry for stupid question.  I'm trying to index html file as one of
> the fields in Solr, I've setup appropriate analyzer in schema but I'm
> not sure how to add html content to Solr.  Encapsulating HTML content
> within field tag is obviously not valid.  How do I add html content?
> Hope the query is clear
>
> Thanks,
> Ravi
>


-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Index HotSwap

2007-08-21 Thread Jérôme Etévé
Hi all,
 I'm wondering what's the best way to completely change a big index
without losing any requests.
 That's how I do at the moment:

 solr index is a soft link to a directory dir.
 When I want to install a new index (in dir.new), I do a

 mv dir dir.old ; mv dir.new dir
 Then I ask for a reload of the solr application (within tomcat).

 I can see two problems with this method:

  - Between the two mv's, the directory dir does not exist, which can
cause some solr failures.

  - Apparently it's not that safe to reload a webapp within tomcat.
   I thought it was the equivalent of the apache graceful reloading
(completing current requests and putting incoming ones into a queue
while the application restarts), but it's apparently not. I noticed
we lose a couple of queries when it happens.
 One is a 503 "This application is not currently available", and the one
just after is
 a 404 on /solr//select/ - "The requested resource (/solr//select/) is not
available".

 Does anybody know how to avoid this behaviour, and what the best way
is to swap between two big indexes ?

Thanks for any help !

Jerome.





-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Using MMapDirectory instead of FSDirectory

2007-08-21 Thread Jérôme Etévé
Hi !

  Is there a way to use an MMapDirectory instead of FSDirectory within Solr ?

Our index is quite big and it takes a long time to get loaded into the
OS cache. I'm wondering if an MMapDirectory could help get our data
into memory quicker (our index on disk is bigger than our available
memory).

Do you have tips on optimizing such a thing ?
Thanks !!!

Jerome.

-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: Solr Search any Lucene Index ?

2007-07-16 Thread Jérôme Etévé

Hi,

From my personal experience, solr is capable of searching an index
generated with CLucene.
Of course, you have to be careful about the type mappings.

J.

On 7/16/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:

Hello,

AFAIK, Solr Search is only capable of searching in a Lucene index that is 
created by Solr (at least, this seems logical to me)...or, the exact same 
fields and analyzers must have been indexed the way solr would have done it.

Ard

>
> Hi,
>   Can Solr Search any Lucene Index. If "YES" what should
> be change in
> configuration.
>
> Thanks
> Narendra
>




--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: Pluggable IndexSearcher Proposal

2007-07-05 Thread Jérôme Etévé

Actually, I implemented that feature for the 1.2.0 version of solr
(the one I use)

It allows you to specify the IndexSearcher used by solr in the schema
configuration file, with a tag like:

<searcher class="my.company.MyIndexSearcher" />

If the specified class can't be loaded, a severe message is issued in
the log and solr
falls back to the hardcoded lucene IndexSearcher.

The patch to apply is attached to this email.

I also created an issue in the solr jira:
https://issues.apache.org/jira/browse/SOLR-288
but I didn't find a way to upload the patch.

Thanks for your comments.

Jerome.

On 7/5/07, Jérôme Etévé <[EMAIL PROTECTED]> wrote:

Hi all !

I need a new feature in solr : to allow the configuration of the
IndexSearcher class in the schema configuration to override the lucene
IndexSearcher .
I noticed that there's only one point in the code where the searcher is built:


in org/apache/solr/search/SolrIndexSearcher.java:

  private SolrIndexSearcher(IndexSchema schema, String name,
IndexReader r, boolean closeReader, boolean enableCache) {
this.schema = schema;
this.name = "Searcher@" + Integer.toHexString(hashCode()) +
(name!=null ? " "+name : "");

log.info("Opening " + this.name);

reader = r;
/** HERE */
   searcher = new IndexSearcher(r);


I'd like to allow a new tag in the schema :

  <searcher class="my.company.MyIndexSearcher" />



I don't exactly know what the best way to do it is.
I was thinking of:

* In IndexSchema:

implement a method
String getLuceneIndexSearcherClassName()

* In SolrIndexSearcher
  in  private SolrIndexSearcher:

  String idxSearcherClassName = schema.getLuceneIndexSearcherClassName()
  // Then load the class itself
  // Then build a new instance of this class with the IndexReader r

 What solr special class loader and instance builder do I have to use
to do the last two operations ?

Can I use directly :

Class idxSearcherClass = Config.findClass(idxSearcherClassName)

and then build an idxSearcher by using the standard java.lang.Class methods ?

Am I on the right track, and does it fit with the solr architecture to do that ?

I'd be perfectly happy to implement that and submit a patch.

Thanks for your comments and answers.

Jerome

--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/




--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/
diff -Nurp src_old/java/org/apache/solr/schema/IndexSchema.java src/java/org/apache/solr/schema/IndexSchema.java
--- src_old/java/org/apache/solr/schema/IndexSchema.java	2007-05-30 16:51:06.0 +0100
+++ src/java/org/apache/solr/schema/IndexSchema.java	2007-07-05 16:46:11.0 +0100
@@ -125,6 +125,13 @@ public final class IndexSchema {
   public Collection getRequiredFields() { return requiredFields; }
 
   private Similarity similarity;
+  private String searcherClassName = null ;
+
+/**
+ * Returns the indexSearcherClassName to use with this index
+ */
+public String getSearcherClassName() { return searcherClassName ;}
+
 
   /**
* Returns the Similarity used for this index
@@ -449,6 +456,15 @@ public final class IndexSchema {
   similarity = (Similarity)Config.newInstance(node.getNodeValue().trim());
   log.fine("using similarity " + similarity.getClass().getName());
 }
+
+// Grab indexSearcher class
+node = (Node) xpath.evaluate("/schema/searcher/@class" , document, XPathConstants.NODE);
+if ( node != null ){
+	searcherClassName  = node.getNodeValue().trim() ;
+	log.info("will use " + searcherClassName + " for IndexSearcher class");
+}else{
+	log.info("No customized index searcher class - will use default");
+}
 
 node = (Node) xpath.evaluate("/schema/defaultSearchField/text()", document, XPathConstants.NODE);
 if (node==null) {
diff -Nurp src_old/java/org/apache/solr/search/SolrIndexSearcher.java src/java/org/apache/solr/search/SolrIndexSearcher.java
--- src_old/java/org/apache/solr/search/SolrIndexSearcher.java	2007-05-30 16:51:15.0 +0100
+++ src/java/org/apache/solr/search/SolrIndexSearcher.java	2007-07-05 17:45:18.0 +0100
@@ -41,6 +41,8 @@ import java.util.*;
 import java.util.logging.Level;
 import java.util.logging.Logger;
 
+import java.lang.reflect.Constructor ;
+
 
 /**
  * SolrIndexSearcher adds schema awareness and caching functionality
@@ -104,7 +106,33 @@ public class SolrIndexSearcher extends S
 log.info("Opening " + this.name);
 
 reader = r;
-searcher = new IndexSearcher(r);
+//searcher = new IndexSearcher(r);
+
+// Eventually build a searcher according to configuration
+String idxSearcherClassName = schema.getSearcherClassName() ;
+if ( idxSearcherClassName == null ){
+	log.info("Using hardcoded standard lucene IndexSearcher");
+	searcher = new IndexSearcher(r);
+}else{
+	log.info("Attempting to load " + idxSearcherClassName );
+	IndexSearcher customsearcher ;
+	try{
+	Class idx

Pluggable IndexSearcher Proposal

2007-07-05 Thread Jérôme Etévé

Hi all !

I need a new feature in solr : to allow the configuration of the
IndexSearcher class in the schema configuration to override the lucene
IndexSearcher .
I noticed that there's only one point in the code where the searcher is built:


in org/apache/solr/search/SolrIndexSearcher.java:

 private SolrIndexSearcher(IndexSchema schema, String name,
IndexReader r, boolean closeReader, boolean enableCache) {
   this.schema = schema;
   this.name = "Searcher@" + Integer.toHexString(hashCode()) +
(name!=null ? " "+name : "");

   log.info("Opening " + this.name);

   reader = r;
/** HERE */
  searcher = new IndexSearcher(r);


I'd like to allow a new tag in the schema :

 <searcher class="my.company.MyIndexSearcher" />



I don't exactly know what the best way to do it is.
I was thinking of:

* In IndexSchema:

implement a method
String getLuceneIndexSearcherClassName()

* In SolrIndexSearcher
 in  private SolrIndexSearcher:

 String idxSearcherClassName = schema.getLuceneIndexSearcherClassName()
 // Then load the class itself
 // Then build a new instance of this class with the IndexReader r

What solr special class loader and instance builder do I have to use
to do the last two operations ?

Can I use directly :

Class idxSearcherClass = Config.findClass(idxSearcherClassName)

and then build an idxSearcher by using the standard java.lang.Class methods ?
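
To make it concrete, this is roughly the kind of thing I have in mind
(just a sketch ; it uses plain Class.forName as a placeholder for
whatever loader solr provides, and it assumes the custom class exposes
a public (IndexReader) constructor):

import java.lang.reflect.Constructor;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public final class SearcherFactory {
  // Build the configured IndexSearcher subclass around an already-open
  // IndexReader, falling back to the stock searcher on any problem.
  public static IndexSearcher build(String className, IndexReader r) {
    if (className == null) {
      return new IndexSearcher(r);
    }
    try {
      Class<?> clazz = Class.forName(className); // or solr's own class loader
      Constructor<?> ctor = clazz.getConstructor(IndexReader.class);
      return (IndexSearcher) ctor.newInstance(r);
    } catch (Exception e) {
      // a severe log message would go here
      return new IndexSearcher(r);
    }
  }
}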

Am I on the right track, and does it fit with the solr architecture to do that ?

I'd be perfectly happy to implement that and submit a patch.

Thanks for your comments and answers.

Jerome

--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Specific fields with DisMaxQueryHandler

2007-07-02 Thread Jérôme Etévé

Hi ,
 when we use DisMaxQueryHandler, queries that include specific
fields which are not part of the boost string don't seem to work.

For instance, if the boost string (qf) is 'a^3 b^4' and
my query is 'term +c:term2', it doesn't produce any results.
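
For reference, the request looks roughly like this (assuming the handler
is registered under the name 'dismax' in solrconfig.xml, and leaving out
the URL encoding for readability):

http://localhost:8983/solr/select?qt=dismax&q=term +c:term2&qf=a^3 b^4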

Am I using this QueryHandler the wrong way ?

Thanks for your help.

Jerome.

--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


MultifieldSolrQueryParser ?

2007-06-29 Thread Jérôme Etévé

Hi,
Solr uses a default query parser which is a SolrQueryParser based on
a org.apache.lucene.queryParser.QueryParser;

I wonder what the best way is to make the IndexSchema use some kind
of MultifieldSolrQueryParser which could be based on an
org.apache.lucene.queryParser.MultiFieldQueryParser for per-field
boost factors.
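
On the Lucene side, what I have in mind is roughly this (just a sketch
with Lucene 2.x package names ; it assumes a MultiFieldQueryParser
constructor that accepts a per-field boosts map, and the field names
and boost values are only examples):

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;

public final class BoostedParserSketch {
  public static Query parse(String userQuery) throws ParseException {
    // fields to search across, with their respective boost factors
    String[] fields = { "title", "body" };
    Map<String, Float> boosts = new HashMap<String, Float>();
    boosts.put("title", 3.0f);
    boosts.put("body", 1.0f);
    MultiFieldQueryParser parser =
        new MultiFieldQueryParser(fields, new StandardAnalyzer(), boosts);
    return parser.parse(userQuery);
  }
}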

Thank you for any help !

Jerome.

--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: Log levels setting

2007-06-29 Thread Jérôme Etévé

On 6/29/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: Hi,
:  is there a way to avoid going to the web interface to set up the solr
: log level ?

The web interface for tweaking the log level is actually a miss-feature in
my opinion ... it's a handy way to quickly crank the logging level up if
something weird is happening and you want to see why, but the best way to
configure logging for Solr is via whatever configuration mechanism your
Servlet Container provides for managing JDK logging.


Thanks for that information !
I'm using tomcat 6, does somebody have a snippet of a conf file
to set up the log level for all org.apache.solr.* classes ?
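
Something like this is what I have in mind, with plain JDK
logging.properties syntax (just a sketch ; where tomcat 6 actually picks
it up, conf/logging.properties or the webapp's WEB-INF/classes, is part
of what I'm unsure about):

# send everything to the console handler
handlers = java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = ALL

# default level for all loggers
.level = INFO

# only warnings and worse from the solr classes
org.apache.solr.level = WARNING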



Resin, Tomcat, and Jetty all support different configuration mechanisms
for controlling the logging level of individual loggers (which is one way
you can say I want INFO level from these classes, but only WARNINGs from
these other classes) ... in the absolute worst case scenario if your
servlet container doesn't support any special logging configuration, you
can use the JDK system properties to specify a logging.properties file the
JDK should load on startup...

http://java.sun.com/j2se/1.5.0/docs/guide/logging/overview.html


-Hoss



--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Log levels setting

2007-06-29 Thread Jérôme Etévé

Hi,
is there a way to avoid going to the web interface to set up the solr
log level ?

I'm also a bit confused about the INFO log level. Actually it's very
nice to see some startup info about the schema, solr home setting,
customized modules loaded ..  But this INFO log level also gives two
lines for every request, which fills up the log file very quickly
with not-so-useful information.

Is there a way to isolate that request information from the INFO log level ?

Thanks for your comments and advices !

Jerome.

--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


float field indexed with clucene, access with solr

2007-06-28 Thread Jérôme Etévé

Hi,
I have an index which I generated with clucene where there is a float field.
This float field is stored as a simple verbatim character string.
The solr schema doc states that for such float fields:
 

And for sortable float fields:


What exactly does 'a string value that isn't human-readable in
its internal form' mean ?
Does that mean that such a field has to be indexed as a binary
representation of the number to allow the use of the sfloat type ?

I noticed that in the FloatField class, the method getSortField looks like this:

public SortField getSortField(SchemaField field,boolean reverse) {
   return new SortField(field.name,SortField.FLOAT, reverse);
 }

It seems to return the right type of SortField.FLOAT adapted to my field.

In SortableFloatField,
 public SortField getSortField(SchemaField field,boolean reverse) {
   return getStringSort(field,reverse);
 }

I'm not sure I understand all of this,

but what I feel is that since the type 'FloatField' gives that 'new
SortField(field.name,SortField.FLOAT)', it should suit my verbatim
float data for sorting the query results.

Is my feeling right ?

thanks for your help


--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: Problems querying Russian content

2007-06-28 Thread Jérôme Etévé

On 6/28/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 6/28/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote:
> I'm in trouble now about how to issue queries against Solr using in my "q"
> parameter content in Russian (it applies to Chinese and Arabic as well).
>
> The problem is I can't send any Russian special character in URL's because
> they don't fit in ASCII domain, so I'm doing a POST to accomplish that.

You can send unicode in URLs (it's done as the UTF-8 bytes percent encoded).
http://www.ietf.org/rfc/rfc3986.txt

But a POST should work too.  You just need to make sure the
Content-type contains the character encoding, and that it actually
matches what is being sent.

If this is a browser doing the POST, it can be a bit tricky to get it
to post UTF-8... basically, I think the browser uses the charset of
the HTML page containing the form when it does the POST (so make sure
that's UTF8).


You can also ensure the browser sends an utf8 encoded post by
http://jerome.eteve.free.fr/


Re: XML vs JSON writer performance issues

2007-06-27 Thread Jérôme Etévé


2007/6/27, Yonik Seeley <[EMAIL PROTECTED]>:
>
> It would be helpful if you could try out the patch at
> https://issues.apache.org/jira/browse/SOLR-276
>
> -Yonik


I just tried it out and it works. JSON output is now as fast as XML !
Well done :) thank you !

J.

--
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Re: XML vs JSON writer performance issues

2007-06-26 Thread Jérôme Etévé

On 6/26/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 6/26/07, Jérôme Etévé <[EMAIL PROTECTED]> wrote:
>   I'm currently running some tests with solr on a small index and I
> noticed a big difference on the response time of queries depending on
> the use of XML or json as a response format.
> In average, my test queries (including http connections open and close
> ) takes 6 ms to perform when I ask for XML and they take 30 ms when I
> ask for JSON.

Wow, that's a surprise.
The only thing I can figure is that perhaps during the string escaping
the JSON writer is writing to the stream character-by-character.
Could you try the python writer and see if there is a speed
difference?  It uses a StringBuilder when escaping the string.


I just tried the python writer and it's as fast as XML is.  I'm still
looking at the code trying to pin down the reason for that.
Thanks for any help.

J

--
Jerome Eteve.
[EMAIL PROTECTED]
http://www.eteve.net


XML vs JSON writer performance issues

2007-06-26 Thread Jérôme Etévé

Hi all.
 I'm currently running some tests with solr on a small index and I
noticed a big difference in the response time of queries depending on
the use of XML or JSON as the response format.
On average, my test queries (including http connection open and close)
take 6 ms to perform when I ask for XML and 30 ms when I
ask for JSON.
When I'm running lots of test clients at the same time, the same
factor 30/6 seems to apply.
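
For reference, both formats are requested the standard way, something
like this (stock example host and port, illustrative query):

http://localhost:8983/solr/select?q=test            (XML, the default writer)
http://localhost:8983/solr/select?q=test&wt=json    (JSON)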

I looked at the code and didn't see any major difference between the
two writers.
I'd rather use JSON instead of XML, but that performance issue prevents me from doing so.

I'm using  apache-solr-1.2.0 / apache-tomcat-6.0.13 / java version
"1.5.0_09" (Sun)

Thanks for any comments or help.

--
Jerome Eteve.
[EMAIL PROTECTED]
http://www.eteve.net