from:"antonio"

Hi erik.

What i want to said is that we have enough memory to store shards, and
furthermore, JVMs heapspaces

Machine has 400gb of RAM. I think we have enough.

We have 10 JVM running on the machine, each of one using 16gb.

Shard size is about 8gb.

When we have query or indexing peaks our problem are the CPU ussage and the
disk io, but we have a lot of unused memory.









El 5/7/2017 19:04, "Erick Erickson"  escribió:

> bq: We have enough physical RAM to store full collection and 16Gb for each
> JVM.
>
> That's not quite what I was asking for. Lucene uses MMapDirectory to
> map part of the index into the OS memory space. If you've
> over-allocated the JVM space relative to your physical memory that
> space can start swapping. Frankly I'd expect your query performance to
> die if that was happening so this is a sanity check.
>
> How much physical memory does the machine have and how much memory is
> allocated to _all_ of the JVMs running on that machine?
>
> see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-
> on-64bit.html
>
> Best,
> Erick
>
>
> On Wed, Jul 5, 2017 at 9:41 AM, Antonio De Miguel 
> wrote:
> > Hi Erik! thanks for your response!
> >
> > Our soft commit is 5 seconds. Why generates I/0 a softcommit? first
> notice.
> >
> >
> > We have enough physical RAM to store full collection and 16Gb for each
> > JVM.  The collection is relatively small.
> >
> > I've tried (for testing purposes)  disabling transactionlog (commenting
> > )... but cluster does not go up. I'll try writing into
> separated
> > drive, nice idea...
> >
> >
> >
> >
> >
> >
> >
> >
> > 2017-07-05 18:04 GMT+02:00 Erick Erickson :
> >
> >> What is your soft commit interval? That'll cause I/O as well.
> >>
> >> How much physical RAM and how much is dedicated to _all_ the JVMs on a
> >> machine? One cause here is that Lucene uses MMapDirectory which can be
> >> starved for OS memory if you use too much JVM, my rule of thumb is
> >> that _at least_ half of the physical memory should be reserved for the
> >> OS.
> >>
> >> Your transaction logs should fluctuate but even out. By that I mean
> >> they should increase in size but every hard commit should truncate
> >> some of them so I wouldn't expect them to grow indefinitely.
> >>
> >> One strategy is to put your tlogs on a separate drive exactly to
> >> reduce contention. You could disable them too at a cost of risking
> >> your data. That might be a quick experiment you could run though,
> >> disable tlogs and see what that changes. Of course I'd do this on my
> >> test system ;).
> >>
> >> But yeah, Solr will use a lot of I/O in the scenario you are outlining
> >> I'm afraid.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel 
> >> wrote:
> >> > thanks Markus!
> >> >
> >> > We already have SSD.
> >> >
> >> > About changing topology we probed yesterday with 10 shards, but
> >> system
> >> > goes more inconsistent than with the current topology (5x10). I dont
> know
> >> > why... too many traffic perhaps?
> >> >
> >> > About merge factor.. we set default configuration for some days... but
> >> when
> >> > a merge occurs system overload. We probed with mergefactor of 4 to
> >> improbe
> >> > query times and trying to have smaller merges.
> >> >
> >> > 2017-07-05 16:51 GMT+02:00 Markus Jelsma  >:
> >> >
> >> >> Try mergeFactor of 10 (default) which should be fine in most cases.
> If
> >> you
> >> >> got an extreme case, either create more shards and consider better
> >> hardware
> >> >> (SSD's)
> >> >>
> >> >> -Original message-
> >> >> > From:Antonio De Miguel 
> >> >> > Sent: Wednesday 5th July 2017 16:48
> >> >> > To: solr-user@lucene.apache.org
> >> >> > Subject: Re: High disk write usage
> >> >> >
> >> >> > Thnaks a lot alessandro!
> >> >> >
> >> >> > Yes, we have very big physical dedicated machines, with a topology
> of
> >> 5
> >> >> > shards and10 replicas each shard.
> >> >> >
> >> >> >
> >> >> > 1. transaction

Re: High disk write usage

Hi Erik! thanks for your response!

Our soft commit is 5 seconds. Why generates I/0 a softcommit? first notice.


We have enough physical RAM to store full collection and 16Gb for each
JVM.  The collection is relatively small.

I've tried (for testing purposes)  disabling transactionlog (commenting
)... but cluster does not go up. I'll try writing into separated
drive, nice idea...








2017-07-05 18:04 GMT+02:00 Erick Erickson :

> What is your soft commit interval? That'll cause I/O as well.
>
> How much physical RAM and how much is dedicated to _all_ the JVMs on a
> machine? One cause here is that Lucene uses MMapDirectory which can be
> starved for OS memory if you use too much JVM, my rule of thumb is
> that _at least_ half of the physical memory should be reserved for the
> OS.
>
> Your transaction logs should fluctuate but even out. By that I mean
> they should increase in size but every hard commit should truncate
> some of them so I wouldn't expect them to grow indefinitely.
>
> One strategy is to put your tlogs on a separate drive exactly to
> reduce contention. You could disable them too at a cost of risking
> your data. That might be a quick experiment you could run though,
> disable tlogs and see what that changes. Of course I'd do this on my
> test system ;).
>
> But yeah, Solr will use a lot of I/O in the scenario you are outlining
> I'm afraid.
>
> Best,
> Erick
>
> On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel 
> wrote:
> > thanks Markus!
> >
> > We already have SSD.
> >
> > About changing topology we probed yesterday with 10 shards, but
> system
> > goes more inconsistent than with the current topology (5x10). I dont know
> > why... too many traffic perhaps?
> >
> > About merge factor.. we set default configuration for some days... but
> when
> > a merge occurs system overload. We probed with mergefactor of 4 to
> improbe
> > query times and trying to have smaller merges.
> >
> > 2017-07-05 16:51 GMT+02:00 Markus Jelsma :
> >
> >> Try mergeFactor of 10 (default) which should be fine in most cases. If
> you
> >> got an extreme case, either create more shards and consider better
> hardware
> >> (SSD's)
> >>
> >> -Original message-
> >> > From:Antonio De Miguel 
> >> > Sent: Wednesday 5th July 2017 16:48
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Re: High disk write usage
> >> >
> >> > Thnaks a lot alessandro!
> >> >
> >> > Yes, we have very big physical dedicated machines, with a topology of
> 5
> >> > shards and10 replicas each shard.
> >> >
> >> >
> >> > 1. transaction log files are increasing but not with this rate
> >> >
> >> > 2.  we 've probed with values between 300 and 2000 MB... without any
> >> > visible results
> >> >
> >> > 3.  We don't use those features
> >> >
> >> > 4. No.
> >> >
> >> > 5. I've probed with low and high mergefacors and i think that is  the
> >> point.
> >> >
> >> > With low merge factor (over 4) we 've high write disk rate as i said
> >> > previously
> >> >
> >> > with merge factor of 20, writing disk rate is decreasing, but now,
> with
> >> > high qps rates (over 1000 qps) system is overloaded.
> >> >
> >> > i think that's the expected behaviour :(
> >> >
> >> >
> >> >
> >> >
> >> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti  >:
> >> >
> >> > > Point 2 was the ram Buffer size :
> >> > >
> >> > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
> >> > >  indexing for buffering added documents and deletions before
> >> they
> >> > > are
> >> > >  flushed to the Directory.
> >> > >  maxBufferedDocs sets a limit on the number of documents
> >> buffered
> >> > >  before flushing.
> >> > >  If both ramBufferSizeMB and maxBufferedDocs is set, then
> >> > >  Lucene will flush based on whichever limit is hit first.
> >> > >
> >> > > 100
> >> > > 1000
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > -
> >> > > ---
> >> > > Alessandro Benedetti
> >> > > Search Consultant, R&D Software Engineer, Director
> >> > > Sease Ltd. - www.sease.io
> >> > > --
> >> > > View this message in context: http://lucene.472066.n3.
> >> > > nabble.com/High-disk-write-usage-tp4344356p4344386.html
> >> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >> > >
> >> >
> >>
>

Re: High disk write usage

thanks Markus!

We already have SSD.

About changing topology we probed yesterday with 10 shards, but system
goes more inconsistent than with the current topology (5x10). I dont know
why... too many traffic perhaps?

About merge factor.. we set default configuration for some days... but when
a merge occurs system overload. We probed with mergefactor of 4 to improbe
query times and trying to have smaller merges.

2017-07-05 16:51 GMT+02:00 Markus Jelsma :

> Try mergeFactor of 10 (default) which should be fine in most cases. If you
> got an extreme case, either create more shards and consider better hardware
> (SSD's)
>
> -Original message-
> > From:Antonio De Miguel 
> > Sent: Wednesday 5th July 2017 16:48
> > To: solr-user@lucene.apache.org
> > Subject: Re: High disk write usage
> >
> > Thnaks a lot alessandro!
> >
> > Yes, we have very big physical dedicated machines, with a topology of 5
> > shards and10 replicas each shard.
> >
> >
> > 1. transaction log files are increasing but not with this rate
> >
> > 2.  we 've probed with values between 300 and 2000 MB... without any
> > visible results
> >
> > 3.  We don't use those features
> >
> > 4. No.
> >
> > 5. I've probed with low and high mergefacors and i think that is  the
> point.
> >
> > With low merge factor (over 4) we 've high write disk rate as i said
> > previously
> >
> > with merge factor of 20, writing disk rate is decreasing, but now, with
> > high qps rates (over 1000 qps) system is overloaded.
> >
> > i think that's the expected behaviour :(
> >
> >
> >
> >
> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti :
> >
> > > Point 2 was the ram Buffer size :
> > >
> > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
> > >  indexing for buffering added documents and deletions before
> they
> > > are
> > >  flushed to the Directory.
> > >  maxBufferedDocs sets a limit on the number of documents
> buffered
> > >  before flushing.
> > >  If both ramBufferSizeMB and maxBufferedDocs is set, then
> > >  Lucene will flush based on whichever limit is hit first.
> > >
> > > 100
> > > 1000
> > >
> > >
> > >
> > >
> > > -
> > > ---
> > > Alessandro Benedetti
> > > Search Consultant, R&D Software Engineer, Director
> > > Sease Ltd. - www.sease.io
> > > --
> > > View this message in context: http://lucene.472066.n3.
> > > nabble.com/High-disk-write-usage-tp4344356p4344386.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
>

Re: High disk write usage

Thnaks a lot alessandro!

Yes, we have very big physical dedicated machines, with a topology of 5
shards and10 replicas each shard.


1. transaction log files are increasing but not with this rate

2.  we 've probed with values between 300 and 2000 MB... without any
visible results

3.  We don't use those features

4. No.

5. I've probed with low and high mergefacors and i think that is  the point.

With low merge factor (over 4) we 've high write disk rate as i said
previously

with merge factor of 20, writing disk rate is decreasing, but now, with
high qps rates (over 1000 qps) system is overloaded.

i think that's the expected behaviour :(




2017-07-05 15:49 GMT+02:00 alessandro.benedetti :

> Point 2 was the ram Buffer size :
>
> *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
>  indexing for buffering added documents and deletions before they
> are
>  flushed to the Directory.
>  maxBufferedDocs sets a limit on the number of documents buffered
>  before flushing.
>  If both ramBufferSizeMB and maxBufferedDocs is set, then
>  Lucene will flush based on whichever limit is hit first.
>
> 100
> 1000
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/High-disk-write-usage-tp4344356p4344386.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

High disk write usage

Hi,

We are implementing a solrcloud cluster (6.6 version) with NRT requisites.
We are indexing 600 docs/sec with 1500 docs/sec peaks, and we are serving
about 1500qps.

Our documents has 300 fields with some doc values, about 4kb and we have 3
million of documents.

HardCommit is set to 15 minutes, but disk writing is about 15mbps all the
time (60mbps on peaks), without higher writing disk rates each 15
minutes ... ¿is this the expected behaviour?

Re: Trouble getting a solr join query done

2015-07-13 Thread Antonio David Pérez Morales

Hi again Yusnel

Just to confirm, I have tested your use case and the query which returns
what you need is this one:

http://localhost:8983/solr/category/select?q={!join from=categoryId
fromIndex=product to=id}*:*&wt=json&indent=true&fq=name:clothes&hl=false

Please, check and let us know if it works for you

Regards

2015-07-12 17:02 GMT+02:00 Antonio David Pérez Morales <
adperezmora...@gmail.com>:

> Hi Yusnel
>
> I think the query is invalid. It should be "q=clothes&fq={!join
> from=type_id to=id fromIndex=products}" or "q=*:*&fq={!join from=type_id
> to=id fromIndex=products}clothes" as long as you are using an edismax
> parser or df param for default field, where "clothes" query is matched to.
>
> Regards
>
>
>
> 2015-07-11 2:23 GMT+02:00 Yusnel Rojas García :
>
>> I have 2 indexes
>>
>> products {
>>id,
>>name,
>>type_id
>>..
>> }
>>
>> and
>>
>> categories {
>>id,
>>name
>>..
>> }
>>
>> and I want to get all categories that match a name and have products in
>> it.
>> my best guess would be:
>>
>> http://localhost:8983/solr/categories/select?q=clothes&fl=*,score&fq={!join
>> from=type_id
>> <http://localhost:8983/solr/categories/select?q=clothes&fl=*,score&fq=%7B!joinfrom=type_id>
>> to=id fromIndex=products}*:*
>>
>> but always get an empty response. help please!
>>
>> Is a better way of doing that without using another index?
>>
>
>

Re: Trouble getting a solr join query done

2015-07-12 Thread Antonio David Pérez Morales

Hi Yusnel

I think the query is invalid. It should be "q=clothes&fq={!join
from=type_id to=id fromIndex=products}" or "q=*:*&fq={!join from=type_id
to=id fromIndex=products}clothes" as long as you are using an edismax
parser or df param for default field, where "clothes" query is matched to.

Regards



2015-07-11 2:23 GMT+02:00 Yusnel Rojas García :

> I have 2 indexes
>
> products {
>id,
>name,
>type_id
>..
> }
>
> and
>
> categories {
>id,
>name
>..
> }
>
> and I want to get all categories that match a name and have products in it.
> my best guess would be:
> http://localhost:8983/solr/categories/select?q=clothes&fl=*,score&fq={!join
> from=type_id to=id fromIndex=products}*:*
>
> but always get an empty response. help please!
>
> Is a better way of doing that without using another index?
>

Re: data import

2015-03-13 Thread Antonio Jesús Sánchez Padial


Maybe you should add some info about:

- your architecture, number of servers, etc
- your schema.xml
- and the data (ammount, type, ...) you are indexing

Best.

El 13/03/2015 a las 9:37, abhishek tiwari escribió:

solr indexing taking too much time .

What should i do to reduce time . working on solr 4.0.



--
Antonio Jesús Sánchez Padial
Jefe del Servicio de Biometría
antonio.sanc...@inia.es
Tlfno: +34 91 347 6831
Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria
Ctra.m de La Coruña, km.7
28040 Madrid

alt attribute img tag

2012-04-04 Thread Manuel Antonio Novoa Proenza

Hello, 

I would like to know the method of extracting from the images that are in html 
documents Alt attribute data 

















10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

Re: PageRank

2012-04-04 Thread Manuel Antonio Novoa Proenza

hi Rav
Thank you for your answer.

In my case I use nutch for crawling the web. Using nutch am a true rookie. How 
do I configure nutch to return that information? And how do I make solr to 
index that information, or that information is being built with the score of 
the indexed documents.

thank you very much
















Saludos...














Manuel Antonio Novoa Proenza
Universidad de las Ciencias Informáticas
Email: mano...@estudiantes.uci.cu




10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

pagerank??

2012-04-03 Thread Manuel Antonio Novoa Proenza

Hello, 

I have in my Solr index , many indexed documents. 

Let me know any way or efficient function to calculate the page rank of 
websites indexed. 


s 

10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

Re: Position Solr results

2012-04-01 Thread Manuel Antonio Novoa Proenza

hi Marcelo

In that sense I think the score does not help. The score is a number that I 
determined at that position results generated are a given site.

For example :

I perform the following query : q = university

Solr generates several results among which is that of a certain website. Does 
solr some mechanism to let me know that posción is this result?

I reiterate that my English is very bad so I use a translator , anyway then 
send you what I mean in Spanish.

thank you very much

Manuel

hola Marcelo

En ese sentido creo que el score no me sirve. El score es un numero que no me 
determina en que posición de los resultados generados se encuentra un 
determinado sitio.

Por ejemplo:

Yo realizo la siguiente consulta: q= universidad

Solr genera varios resultados entre los que se encuentra el de un determinado 
sitio web. ¿Cuenta solr con algún mecanismo que me permita saber en que posción 
se encuentra este resultado?

Te reitero que mi inglés es muy malo por eso uso un traductor, de todas formas 
a continuación te envío lo que quiero decir en español.

Muchas gracias

Manuel

Saludos...

Manuel Antonio Novoa Proenza
Universidad de las Ciencias Informáticas
Email: mano...@estudiantes.uci.cu

- Mensaje original -

De: "Marcelo Carvalho Fernandes" 
Para: solr-user@lucene.apache.org
Enviados: Domingo, 1 de Abril 2012 5:14:50
Asunto: Re: Position Solr results

Try using the "score" field in the search results.

---
Marcelo Carvalho Fernandes

On Friday, March 30, 2012, Manuel Antonio Novoa Proenza <
mano...@estudiantes.uci.cu> wrote:
>
>
>
>
>
> Hi
>
> I'm not good with English, and for this reason I had to resort to a
translator.
>
> I have the following question ...
>
> How I can get the position in which there is a certain website in solr
results generated for a given search criteria ?
>
> regards
>
> ManP
>
>
>
>
>
>
> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci

--

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786

10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

Re: index the links having a certain website

2012-04-01 Thread Manuel Antonio Novoa Proenza

hi Marcelo

Certainly I want to index HTML documents, but would like to save these 
separately to text links that these have, for, say , See What are the links to 
external websites have a page ? .

I reiterate that my English is very bad so I use a translator , anyway then 
send you what I mean in Spanish.

thank you very much

Manuel

hola Marcelo

Ciertamente deseo indexar documentos HTML, pero quisiera de estos guardar por 
separado al texto los enlaces que estos posean, para, por ejemplo, Consultar 
¿Cuáles son los links a sitios web externos que posee una determinada página?.

Te reitero que mi inglés es muy malo por eso uso un traductor, de todas formas 
a continuación te envío lo que quiero decir en español.

Muchas gracias

Manuel

Saludos...

Manuel Antonio Novoa Proenza
Universidad de las Ciencias Informáticas
Email: mano...@estudiantes.uci.cu

- Mensaje original -

De: "Marcelo Carvalho Fernandes" 
Para: solr-user@lucene.apache.org
Enviados: Domingo, 1 de Abril 2012 5:12:34
Asunto: Re: index the links having a certain website

Hi Manuel,

Do you mean you need to index html files?
What kind of search do you image doing?

---
Marcelo Carvalho Fernandes

On Friday, March 30, 2012, Manuel Antonio Novoa Proenza <
mano...@estudiantes.uci.cu> wrote:
>
>
>
>
>
> Hello
>
> I'm not good with English, and therefore I had to resort to a translator.
>
> I have the following question ...
>
> How I can index the links having a certain website ?
>
> regards
>
> ManP
>
>
>
> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci

--

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786

10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

index the links having a certain website

2012-03-30 Thread Manuel Antonio Novoa Proenza






Hello 

I'm not good with English, and therefore I had to resort to a translator. 

I have the following question ... 

How I can index the links having a certain website ? 

regards 

ManP 



10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

Position Solr results

2012-03-30 Thread Manuel Antonio Novoa Proenza






Hi 

I'm not good with English, and for this reason I had to resort to a translator. 

I have the following question ... 

How I can get the position in which there is a certain website in solr results 
generated for a given search criteria ? 

regards 

ManP 






10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

Response 200 and empty response xml section

2011-10-10 Thread Antonio Pérez-Aranda

Hi all,

I'm doing search stress testing over my solr cluster and when I get
many concurrent random words dict searchers I get the follow response:





With HTTP status 200

I have reach the concurrency limit? pool limit?

It's is a Solr 1.4.1 with tomcat 6.0.32 and I set 500 as maxThreads on
Tomcat server.xml and sufficient ram to hold the full index.

I get more responses with this XML when I lowdon the java ram limit.

If I repeat the query with only one searcher, I get many results.

Re: Multilingual text analysis

2011-06-02 Thread Juan Antonio Farré Basurte

Thank you both Paul and Lee for your answer.
Luckily in my case there's no problem about knowing language at index time nor 
we have really to bother about the language of the query, as users can specify 
the language they are interested in.
So I guess our solution would be to use different optional fields, one for each 
language and that should be good enough.
I just had wondered whether it was possible to parametrize the analyzers in 
function of one field value. I think this would be a very elegant solution for 
many needs. May it could be a possible improvement for future versions of solr 
:)

Paul, what do you mean when you say it would make sense to start a page at the 
solr website?

Thanks again,

Juan

El 02/06/2011, a las 16:06, Paul Libbrecht escribió:

> Juan,
> 
> An easy way in solr, I think, is indeed to use different fields at index time 
> and expand on multiple fields at query time.
> I believe using field-names' wildcards allows you to specify a different 
> analyzer per language doing this.
> 
> There's been long discussions on the java-u...@lucene.apache.org mailing-list 
> about the best design for multilingual indexing and searching. One of the key 
> arguments was wether you were able to detect with faithfulness the language 
> of a query, this is generally very hard.
> 
> It would make sense to start a page at the solr website...
> 
> paul
> 
> 
> Le 2 juin 2011 à 12:52, lee carroll a écrit :
> 
>> Juan
>> 
>> I don't think so.
>> 
>> you can try indexing fields like myfield_en. myfield_fr, my field_xx
>> if you now what language you are dealing with at index and query time.
>> 
>> you can also have seperate cores for your documents for each language
>> if you don't want to complicate your schema
>> again you will need to know language at index and query time
>> 
>> 
>> 
>> On 2 June 2011 08:57, Juan Antonio Farré Basurte
>>  wrote:
>>> Hello,
>>> Some of the possible analyzers that can be applied to a text field, depend 
>>> on the language of the text to analyze and can be configured for a concrete 
>>> language.
>>> In my case, the text fields can be in many different languages, but each 
>>> document also includes a field containing the language of text fields.
>>> Is it possible to configure analyzers to use the suitable language for each 
>>> document, in function of the language field?
>>> Thanks,
>>> 
>>> Juan
>

Multilingual text analysis

2011-06-02 Thread Juan Antonio Farré Basurte

Hello,
Some of the possible analyzers that can be applied to a text field, depend on 
the language of the text to analyze and can be configured for a concrete 
language.
In my case, the text fields can be in many different languages, but each 
document also includes a field containing the language of text fields.
Is it possible to configure analyzers to use the suitable language for each 
document, in function of the language field?
Thanks,

Juan

Re: Nested grouping/field collapsing

2011-05-27 Thread Juan Antonio Farré Basurte

I've found the same issue.
As long as I know, the only solution is to create a copy field which combines 
both-fields values and facet on this field.
If one of the fields has a set of distinct values known in advance and its 
cardinality c is not too big, it isn't a great problem: you can do with c 
queries.

El 27/05/2011, a las 15:03, Martijn Laarman escribió:

> Hi,
> I was wondering if this issue had already been raised.
> 
> We currently have a use case where nested field collapsing would be really
> helpful
> 
> I.e Collapse on field X then Collapse on Field Y within the groups returned
> by field X
> 
> The current behavior of specifying multiple fields seem to be returning
> mutiple result sets.
> 
> Has this already been feature requested ? Does anybody know of a workaround
> ?
> 
> Many thanks,
> 
> Martijn

frange vs TrieRange

2011-05-27 Thread Juan Antonio Farré Basurte

Hello,
I have to perform range queries agains a date field. It is a TrieDateField, and 
I'm already using it for sorting. Hence, there will be already en entry in the 
FieldCache for it.
According to:

http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/

frange queries are typically faster than normal range queries when there are 
many terms between the endpoints (though it could be slower, if there's less 
than a 5% of terms between the endpoints). The cost of this speedup is the 
memory associated with a FieldCache entry for the field. In my case, there's no 
additional memory overhead, as there's already such entry.
It also states that TrieRange queries have the best space/speed tradeoff.
Now my doubt is: if I have no memory overhead, then I only care about relative 
speed between frange and trie. The good speed/space tradeoff of trie is not the 
measure I need in this case, but just a comparison at pure speed level.
Does anybody know if there's data about this? Any clue on whether to choose 
frange or trie in this case?

Thanks,

Juan

Re: Facet Query

2011-05-27 Thread Juan Antonio Farré Basurte

Are you talking about a facet query or a facet field?
If it's a facet query, I don't get what's going on.
If it's a facet field... well, if it's a fixed set of words you're interested 
in, filter the query to only those words and you'll get counts only for them. 
If you just need to filter out common words, I don't remember exactly how it 
works, but when you declare the text field (or its type) you can specify a 
processor that does exactly that: removes common words from the indexed field 
and, hence, you shouldn't get counts on them, because they just aren't there.
Sorry if my information is inexact. I haven't had to deal with this feature yet.

El 27/05/2011, a las 09:51, Jasneet Sabharwal escribió:

> Hi
> 
> When I do a facet query on my data, it shows me a list of all the words 
> present in my database with their count. Is it possible to not get the 
> results of common words like a, an, the, http and so one but only get the 
> count of stuff we need like microsoft, ipad, solr, etc.
> 
> -- 
> Thanx&  Regards
> 
> Jasneet Sabharwal
>

Re: FieldCache

2011-05-26 Thread Juan Antonio Farré Basurte

fieldCache stores one entry for each field that is used for sorting or for 
field faceting when you use the fieldCache (fc) method. Before solr 1.4 the 
method for field faceting was the enum method that executes a filter query for 
each unique value of the field and stores it in the filterCache. From solr 1.4, 
the default method is fc, except for boolean fields, that use enum method by 
default.
So, you should have an entry in fieldCache for each field that you use either 
for sorting or for field faceting with fc facet method. Does it match?
I don't know a way to configure the size of the fieldCache. I don't know how 
much memory does each entry consume, either.
Sorry not to be of further help.
Cheers

El 26/05/2011, a las 16:50, Jean-Sebastien Vachon escribió:

> 10 unique terms on 1.5M documents each with 50+ fields? I don't think so ;)
> 
> What I mean is controlling its size like the other caches. There are
> currently no options in solrconfig.xml to control this cache.
> Is Solr/Lucene managing this all by itself? 
> 
> It could be that my understanding of the FieldCache is wrong. I thought this
> was the main cache for Lucene. Is that right?
> 
> Thanks for your feedback
> 
> -Original Message-
> From: pravesh [mailto:suyalprav...@yahoo.com] 
> Sent: May-26-11 2:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: FieldCache
> 
> This is because you may be having only 10 unique terms in your indexed
> Field.
> BTW, what do you mean by controlling the FieldCache?
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/FieldCache-tp2987541p2988142.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: solr 3.1 without slf4j-jdk14-1.5.5.jar

2011-05-26 Thread Juan Antonio Farré Basurte

If I'm not wrong, solrj uses slf4j for logging. slf4j-api.jar provides the api, 
but is not capable by itself to do the actual logging.
For it to be able to log, it needs an actual implementation, usually a binding 
to some other logging library.
slf4j-jdk14 is the binding that uses the logging API in the JDK (since v 1.4) 
to do the actual logging.
Solrj needs slf4j-api and at one binding. You have to choose one and can 
exclude jars for other bindings.

The options are:
slf4j-log4j12 -> binding to log4j library version 1.2. Delegates logging to 
log4j.
slf4j-jdk14 -> binding to JDK logging library (in JDK v 1.4 or greater). 
Delegates logging to the JDK.
slf4j-nop -> is a dummy implementation that silently discards all log messages
slf4j-simple -> is itself an implementation that logs messages to System.err 
(only messages of level INFO or higher).
slf4j-jcl -> binding for Jakarta Commons Logging library. Delegates logging to 
JCL.

It's also documented a dependency to jcl-over-slf4j. This is quite the opposite 
of slf4j-jcl. While the latter implements slf4j api delegating logging to jcl, 
the former implements jcl api delegating logging to slf4j.
I don't really think that solrj is using this (not sure). I believe that solrj 
uses slf4j. Needing jcl-over-slf4j would mean that some code in solrj does not 
use slf4j api but jcl api and needs also an implementation for it. If you take 
a look to maven repositories, there is no such dependency for solrj, so I guess 
it's not really needed.

I hope I managed to explain it clearly.

Cheers,

Juan

El 26/05/2011, a las 16:36, antonio escribió:

> Reading the wiki, for use solrj i must use this lib:
> 
> From /lib 
> •slf4j-jdk14-1.5.5.jar 
> 
> But there isn't no one directory call lib, and no one jar called
> slf4j-jdk14-1.5.5.jar .
> 
> Is it necessary? When i can get it?
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-3-1-without-slf4j-jdk14-1-5-5-jar-tp2988950p2988950.html
> Sent from the Solr - User mailing list archive at Nabble.com.

solr 3.1 without slf4j-jdk14-1.5.5.jar

2011-05-26 Thread antonio

Reading the wiki, for use solrj i must use this lib:

>From /lib 
•slf4j-jdk14-1.5.5.jar 

But there isn't no one directory call lib, and no one jar called
slf4j-jdk14-1.5.5.jar .

Is it necessary? When i can get it?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-3-1-without-slf4j-jdk14-1-5-5-jar-tp2988950p2988950.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Termscomponent sort question

2011-05-26 Thread antonio

Hi Dmitry Kan, thanks for your anwser.
This is an idea, but i think that will be not so performing. Because if the
terms are 1000, i must reorder 1000 terms by own length, and i think the
time will be high for make autocomplete.

Don't you think?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Termscomponent-sort-question-tp2980683p2988872.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Termscomponent sort question

2011-05-25 Thread antonio

Help me please...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Termscomponent-sort-question-tp2980683p2986185.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Termscomponent sort question

2011-05-25 Thread antonio

No one has an idea?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Termscomponent-sort-question-tp2980683p2983776.html
Sent from the Solr - User mailing list archive at Nabble.com.

Termscomponent sort question

2011-05-24 Thread antonio

Hi, i use solr 3.1.
I implemented my autocomplete with TermsComponent. I'm finding, if there is,
a way to sort my finding terms by score.
Example, i there are two terms: "Rome" and "Near Rome", that have the same
count (that is 1), i would that "Rome" will be before "Near Rome".
Because count is the same, if i use index as sort, "Near Rome" is
"lexically" before "Rome".

Is there a way to use score like in dismax for termscomponents? Using
dismax, for example, if i search "Rome", the word "Rome" has max score than
"Near Rome". I would the same behavior with TermComponent.

Is it possible?

Thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Termscomponent-sort-question-tp2980683p2980683.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Mysql vs Postgres DIH

2011-05-22 Thread antonio

Hi, thanks for interesting.
I solve the problem.
I use concat function in MYsql that converts in some case the fields in
BLOB. So, solr has a strange behavior importing blob field as id!

I find the solution reading the DIH faq in the wiki, exactly:
"Blob values in my table are added to the Solr document as object strings
like B@1f23c5"

Thank's. 
p.s. excuse for my so bad english!
   
Antonio

 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mysql-vs-Postgres-DIH-tp2963822p2973515.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Mysql vs Postgres DIH

2011-05-19 Thread antonio

Excuse me, i wrong to write 197085, correct is 17085. But never the same
count...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mysql-vs-Postgres-DIH-tp2963822p2963824.html
Sent from the Solr - User mailing list archive at Nabble.com.

Mysql vs Postgres DIH

2011-05-19 Thread antonio

Hi, 
i make the same query to import my data with mysql and postgres.
But only postgres index all data (17090).
While Mysql index 17086, after 197085, after 17087... never 17090. But the
response tell me that it has skipped 0 documents. I don't understand!

Help me please, i woul to use Mysql for my application...

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mysql-vs-Postgres-DIH-tp2963822p2963822.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Highlighting does not work when using !boost as a nested query

By the way, I was wrong when saying that using bf instead of !boost did not 
work either. I probably hit more than one problem at the same time when I first 
tested that.
I've retested now and this works:

/select?q=+id:12345^0.01 +_query_:"{!dismax 
v=$qq}"&bf=recip(ms(NOW/DAY,published_date),3.16e-11,1,1)&qq=user_text&qf=text1^2
 text2&pf=text1^2 text2&tie=0.1&q.alt=*:*&hl=true&hl.fl=text1 
text2&hl.mergeContiguous=true

But I don't get the multiplicative boost I'd like to use...

El 19/05/2011, a las 11:31, Juan Antonio Farré Basurte escribió:

> Hi,
> 
> The query is generated dynamically and can be more or less complex depending 
> on different parameters. I'm also not free to give many details of our 
> implementation, but I'll give you the minimal query string that fails and the 
> relevant pieces of the config.
> The query string is:
> 
> /select?q=+id:12345^0.01 +_query_:"{!boost b=$dateboost v=$qq 
> deftype=dismax}"&dateboost=recip(ms(NOW/DAY,published_date),3.16e-11,1,1)&qq=user_text&qf=text1^2
>  text2&pf=text1^2 text2&tie=0.1&q.alt=*:*&hl=true&hl.fl=text1 
> text2&hl.mergeContiguous=true
> 
> where id is an int and text1 and text2 are type text. hl.fl has proven to be 
> necessary whenever I use dismax in an inner query. Ohterwise, only text2 (the 
> default field) is highlighted, and not both fields appearing in qf. For 
> example,
> q={!dismax v=$qq}&... does not require hl.fl to highlight both text1 and 
> text2.
> q=+_query_:"{!dismax v=$qq}"&... only highlights text2, unless I specify 
> hl.fl.
> 
> The given query is probably not minimal in the sense that some of the 
> dismax-related parameters can be omitted and the query still fails. But the 
> one given always fails (and adding more complexity to it does not make it 
> work, quite obviously). Unfortunately, hl.requireFieldMatch=false does not 
> help.
> 
> Request handler config is the following:
> 
> 
>   
> explicit
>   
> 
> 
> Highlighter config is the following:
> 
> 
>default="true">
> 
>   100
> 
>   
>   
> 
>   70
>   0.5
>   [-\w ,/\n\"']{20,200}
> 
>   
>default="true">
> 
>   
>   
> 
>   
> 
> 
> If there's any other information that could be useful, just ask.
> Thank you very much for your help,
> 
> Juan
> 
> El 16/05/2011, a las 23:18, Chris Hostetter escribió:
> 
>> 
>> : As I said in my previous message, if I issue:
>> : q=+field1:range +field2:value +_query_:{!dismax v=$qq}
>> : highlighting works. I've just discovered the problem is not just with 
>> {!boost...}. If I just add a bf parameter to the previous query, 
>> highlighting also fails.
>> : Anybody knows what can be happening? I'm really stuck on this problem...
>> 
>> Just a hunch, but i suspect the problem has to do with 
>> highlighter (or maybe it's the fragment generator?) trying to determine
>> matches from query types it doens't understand 
>> 
>> I thought there was a query param you could use to tell the highlighter to 
>> use an "alternate" query string (that would be simpler) instead of the 
>> real query ... but i'm not seeing it in the docs.
>> 
>> hl.requireFieldMatch=false might also help (not sure)
>> 
>> In general it would probably be helpful for folks if you could post the 
>> *entire* request you are making (full query string and all request params) 
>> along with the solrconfig.xml sections that show how your request handler 
>> and highlighter are configured.
>> 
>> 
>> 
>> -Hoss
>

Re: filter cache and negative filter query

> lookups to work with an arbitrary query, you would either need to changed 
> the cache structure from Query=>DocSet to a mapping of 
> Query=>[DocSet,inverseionBit] and store the same cache value needs needs 
> with two keys -- both the positive and the negative; or you keep the 

Well, I don't know how it's working right now, but I guess that, as the 
positive version is being stored, when you look a negative query up, you 
already have a similar lookup problem: or you store two keys for the same value 
or you just transform the negative query into a positive "canonical" one before 
looking it up. The same could be done in this case, with the difference that 
yes, you need an inversion bit stored too. The double lookup option sounds 
worse, though benchmarking should be done to know for sure.
Would this optimization influence only memory usage or also smaller sets are 
faster to intersect, for example? Well, in any case, saving memory allows to 
use the additional memory to speed up the application, for example, with bigger 
caches.

Re: Highlighting does not work when using !boost as a nested query

Hi,

The query is generated dynamically and can be more or less complex depending on 
different parameters. I'm also not free to give many details of our 
implementation, but I'll give you the minimal query string that fails and the 
relevant pieces of the config.
The query string is:

/select?q=+id:12345^0.01 +_query_:"{!boost b=$dateboost v=$qq 
deftype=dismax}"&dateboost=recip(ms(NOW/DAY,published_date),3.16e-11,1,1)&qq=user_text&qf=text1^2
 text2&pf=text1^2 text2&tie=0.1&q.alt=*:*&hl=true&hl.fl=text1 
text2&hl.mergeContiguous=true

where id is an int and text1 and text2 are type text. hl.fl has proven to be 
necessary whenever I use dismax in an inner query. Ohterwise, only text2 (the 
default field) is highlighted, and not both fields appearing in qf. For example,
q={!dismax v=$qq}&... does not require hl.fl to highlight both text1 and 
text2.
q=+_query_:"{!dismax v=$qq}"&... only highlights text2, unless I specify 
hl.fl.

The given query is probably not minimal in the sense that some of the 
dismax-related parameters can be omitted and the query still fails. But the one 
given always fails (and adding more complexity to it does not make it work, 
quite obviously). Unfortunately, hl.requireFieldMatch=false does not help.

Request handler config is the following:


  
explicit
  


Highlighter config is the following:


  

  100

  
  

  70
  0.5
  [-\w ,/\n\"']{20,200}

  
  

  
  

  


If there's any other information that could be useful, just ask.
Thank you very much for your help,

Juan

El 16/05/2011, a las 23:18, Chris Hostetter escribió:

> 
> : As I said in my previous message, if I issue:
> : q=+field1:range +field2:value +_query_:{!dismax v=$qq}
> : highlighting works. I've just discovered the problem is not just with 
> {!boost...}. If I just add a bf parameter to the previous query, highlighting 
> also fails.
> : Anybody knows what can be happening? I'm really stuck on this problem...
> 
> Just a hunch, but i suspect the problem has to do with 
> highlighter (or maybe it's the fragment generator?) trying to determine
> matches from query types it doens't understand 
> 
> I thought there was a query param you could use to tell the highlighter to 
> use an "alternate" query string (that would be simpler) instead of the 
> real query ... but i'm not seeing it in the docs.
> 
> hl.requireFieldMatch=false might also help (not sure)
> 
> In general it would probably be helpful for folks if you could post the 
> *entire* request you are making (full query string and all request params) 
> along with the solrconfig.xml sections that show how your request handler 
> and highlighter are configured.
> 
> 
> 
> -Hoss

Re: filter cache and negative filter query

> : query that in fact returns the "negative" results. As a simple example, 
> : I believe that, for a boolean field, -field:true is exactly the same as 
> : +field:false, but the former is a negative query and the latter is a 
> 
> that's not strictly true in all cases... 
> 
> * if the field is multivalued=true, a doc may contain both "false" and 
>   "true" in "field", in which case it would match +field:false but it 
>   would not match -field:true
> 
> * if the field is not multivalued-false, and required=false, a doc
>   may not contain any value, in which case it would match -field:true but 
>   it would not match +field:false

You're totally right. But it was just an example. I just didn't think about 
specifying the field to be single valued and required.

I did some testing yesterday about how are filteres cached, using the admin 
interface.
I noticed that if I perform a facet.query on a boolean field testing it to be 
true or false it always looks to add two entries to the query cache. May be it 
also adds an entry to test for unexsistence of the value?
And if I perform a facet.field on the same boolean field, three new entries are 
inserted into the filter cache. May be one for true, one for false and one for 
unexsistence? I really don't know what it's exactly doing, but doesn't look, at 
first sight, like a very optimal behaviour...
I'm testing on 1.4.1 lucidworks version of solr, using the boolean field 
inStock of its example schema, with its example data.

Re: filter cache and negative filter query

2011-05-18 Thread Juan Antonio Farré Basurte

Mmm... I had wondered whether solr reused filters this way (not having both the 
positive and negative versions) and I'm glad to see it does indeed reuse them.
What I don't like is that it systematically uses the positive version. 
Sometimes the negative version will give many less results (for example, in 
some cases I filter by documents not having a given field, and there are very 
few of them).
I think it would be much better that solr performed exactly the query requested 
and, if there's more than a 50% of documents that match the query, then it just 
stored the negated one. I think (without knowing almost at all how things are 
implemented) this shouldn't be a problem.
Is there any place where you can post a suggestion of improvement? :)
Anyway, it would be very useful to know exactly how the current versions work 
(I think the info in the message I'm answering is about version 1.1 and could 
have changed), because knowing it, one can sometimes manage to write, for 
example, a "positive" query that in fact returns the "negative" results. As a 
simple example, I believe that, for a boolean field, -field:true is exactly the 
same as +field:false, but the former is a negative query and the latter is a 
positive one.
So, knowing the exact behaviour of solr can help you write optimized filters 
when you know that one version will give many less hits than the other.

El 18/05/2011, a las 00:26, Yonik Seeley escribió:

> On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma
>  wrote:
>> I'm not sure. The filter cache uses your filter as a key and a negation is a
>> different key. You can check this easily in a controlled environment by
>> issueing these queries and watching the filter cache statistics.
> 
> Gotta hate crossing emails ;-)
> Anyway, this goes back to Solr 1.1
> 
> 5. SOLR-80: Negative queries are now allowed everywhere.  Negative queries
>are generated and cached as their positive counterpart, speeding
>generation and generally resulting in smaller sets to cache.
>Set intersections in SolrIndexSearcher are more efficient,
>starting with the smallest positive set, subtracting all negative
>sets, then intersecting with all other positive sets.  (yonik)
> 
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
> 
> 
> 
>>> If I have a query with a filter query such as : " q=art&fq=history" and
>>> then run a second query  "q=art&fq=-history", will Solr realize that it
>>> can use the cached results of the previous filter query "history"  (in the
>>> filter cache) or will it not realize this and have to actually do a second
>>> filter query against the index  for "not history"?
>>> 
>>> Tom
>>

Re: TrieIntField for "short" values

2011-05-15 Thread Juan Antonio Farré Basurte

Hi,

Thanks for your answer.

I am doing range queries on this field, yes, that's why I cared about
how all this trie thing works :)

If I use precisionStep=0 would it be equivalent to use, say, a
SortableIntField?

Would it be possible that you explained, for example, the difference in
how it would work using a precisionStep=0 or using a
precisionStep=Integer.MAX_VALUE?

May be this way I could get an idea on how it works. I've read as much
information as I've been able to find, but I didn't get a clear idea.

Thanks a lot,

Juan

El dom, 15-05-2011 a las 11:01 -0400, Erick Erickson escribió:
> Are you doing range queries on this field? Range queries are where
> Trie shines, so worrying about
> precision step if you're NOT intending to do range queries is a waste,
> just use precisionstep=0.
> 
> In fact, with only 1,000 values, I'd just go with PrecisionStep=0
> (which is the int field)
> 
> Best
> Erick
> 
> On Thu, May 12, 2011 at 11:15 AM, Juan Antonio Farré Basurte
>  wrote:
> > Hello,
> > I'm quite a beginner in solr and have many doubts while trying to learn how 
> > everything works.
> > I have only a slight idea on how TrieFields work.
> > The thing is I have an integer value that will always be in the range 
> > 0-1000. A short field would be enough for this, but there is no such 
> > TrieShortField (not even a SortableShortField). So, I used a TrieIntField.
> > My doubt is, in this case, what would be a suitable value for 
> > precisionStep. If the field had only 1000 distinct values, but they were 
> > more or less uniformly distributed in the 32-bit int range, probably a big 
> > precisionStep would be suitable. But as my values are in the range 0 to 
> > 1000, I think (without much knowledge) that a low precisionStep should be 
> > more adequate. For example, 2.
> > Can anybody, please, help me finding a good configuration for this type? 
> > And, if possible, can anybody explain in a brief and intuitive way what are 
> > the differences and tradeoffs of choosing smaller or bigger precisionSteps?
> > Thanks a lot,
> >
> > Juan

TrieIntField for "short" values

2011-05-12 Thread Juan Antonio Farré Basurte

Hello,
I'm quite a beginner in solr and have many doubts while trying to learn how 
everything works.
I have only a slight idea on how TrieFields work.
The thing is I have an integer value that will always be in the range 0-1000. A 
short field would be enough for this, but there is no such TrieShortField (not 
even a SortableShortField). So, I used a TrieIntField.
My doubt is, in this case, what would be a suitable value for precisionStep. If 
the field had only 1000 distinct values, but they were more or less uniformly 
distributed in the 32-bit int range, probably a big precisionStep would be 
suitable. But as my values are in the range 0 to 1000, I think (without much 
knowledge) that a low precisionStep should be more adequate. For example, 2.
Can anybody, please, help me finding a good configuration for this type? And, 
if possible, can anybody explain in a brief and intuitive way what are the 
differences and tradeoffs of choosing smaller or bigger precisionSteps?
Thanks a lot,

Juan

Highlighting does not work when using !boost as a nested query

2011-05-08 Thread Juan Antonio Farré Basurte

Hi,

I need to boost newer documents in my dismax queries.
As I've been able to read in the wiki, it's best to use a multiplicative
boost. The only way I know to do this with the dismax (not edismax)
query parser is via a {!boost b=$dateboost v=$qq defType=dismax} query.
To make things more complicated, I also need to add some filters to the
query (by date range, by field value...) that don't fit as filters, as
they have a huge number of possible unique values.
Hence, I added them to the main query in a form such:

q=+field1:range +field2:value +_query_:{!boost b=$dateboost v=$qq
defType=dismax}

And then I add hl=true as a top-level parameter.
The result is that the response includes some empty values in the
highlighting list and nothing else:














Using just q={!boost b=$dateboost v=$qq defType=dismax} works well.
Using something like:

q=+field1:range +field2:value +_query_:{!dismax v=$qq}

also works.

But when I try to use dismax inside boost inside a nested query,
highlighting stops working.

Am I doing anyhing wrong? Do you know any workaround? Should I post a
bug anywhere?
Is there another way of specifying a multiplicative boost (without using
edismax)?

Thanks,

Juan

Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Antonio Calo'


Hi

If I understood, you will build a kind of dictionary or ontology or 
thesauru and you will use it if Solr query results are few. At query 
time (before or after) you will perform a query on this dictionary in 
order to retrieve the suggested word.


If you  need to do this, you can try to cvreate a custom request handler 
where you can controll the querying process in a simple manner 
(http://wiki.apache.org/solr/SolrRequestHandler).


With the custom request handler, you can add custom code to check query 
results before submitting query to solr or analizing the query before 
sending result to client. I never coded one, but I think this is a good 
starting point.


Hope this can help you

Antonio



Il 27/10/2010 11.03, Pablo Recio ha scritto:

Thanks, it's not what I'm looking for.

Actually I need something like search "Ubuntu" and it will prompt "Maybe you
will like 'Debian' too" or something like that. I'm not trying to do it
automatically, manually will be ok.

Anyway, is good article you shared, maybe I will implement it, thanks!

2010/10/27 Jakub Godawa


I am a real rookie at solr, but try this:
http://solr.pl/2010/10/18/solr-and-autocomplete-part-1/?lang=en

2010/10/27 Pablo Recio


Hi,

I don't want to be annoying, but I'm looking for a way to do that.

I repeat the question: is there a way to implement Search Suggestion
manually?

Thanks in advance.
Regards,

2010/10/18 Pablo Recio Quijano


Hi!

I'm trying to implement some kind of Search Suggestion on a search

engine

I

have implemented. This search suggestions should not be automatically

like

the one described for the SpellCheckComponent [1]. I'm looking

something

like:

"SAS oppositions" =>  "Public job offers for some-company"

So I will have to define it manually. I was thinking about synonyms [2]

but

I don't know if it's the proper way to do it, because semantically

those

terms are not synonyms.

Any ideas or suggestions?

Regards,

[1] http://wiki.apache.org/solr/SpellCheckComponent
[2]


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Re: Search Interface

2010-09-28 Thread Antonio Calo'


 Hi

You could try to use the Velocity framework to build GUIs in a  quick 
and efficent manner.


Solr come with a velocity handler already integrated that could be the 
best solution in your case:


http://wiki.apache.org/solr/VelocityResponseWriter

Also take these hints on the same topic: 
http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/


there is also a webinar about rapid prototyping with solr:

http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681

Hope this help

Antonio


Il 28/09/2010 4.35, Claudio Devecchi ha scritto:

Hi everybody,

I`m implementing my first solr engine for conceptual tests, I`m crawling my
wiki intranet to make some searches, the engine is working fine already, but
I need some interface to make my searchs.
Somebody knows where can I find some search interface just for
customizations?

Tks

Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-02 Thread Antonio Calo'


 Il 02/09/2010 8.51, Lance Norskog ha scritto:

Loading a servlet creates a bunch of classes via reflection. These are
in PermGen and never go away. If you load&unload over and over again,
any PermGen setting will fill up.
I agree , taking a look to all the links suggested by Peter seems that 
this exception could be caused by the memory leak. Also, it seems that 
the CGLibe that manage the .class loading used by Spring have a big 
issue about this.


Maibe it is  just an accident that it happens while opening a anew solr 
instance.


I'll investigate about general Permgem fault, but if someone have a 
suggestion on how to close solr server in a safe manner, you are  welcome!


Many thanks for your feedbacks.

Antonio

java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-01 Thread Antonio Calo'


 Hi guys

I'm facing an error in our production environment with our search 
application based on maven with spring + solrj.


When I try to change a class, or try to redeploy/restart an application, 
I catch a java.lang.OutOfMemoryError: PermGen


I've tryed to understand the cause of this and also I've succeded in 
reproducing this issue on my local develop environment by just 
restarting the jetty several time (I'm using eclipse + maven plugin).


The logs obtained are those:

   [...]
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/: org.apache.solr.handler.admin.AdminHandlers
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/ping: PingRequestHandler
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /debug/dump: solr.DumpRequestHandler
   32656 [Finalizer] INFO org.apache.solr.core.SolrCore - []  CLOSING
   SolrCore org.apache.solr.core.solrc...@1409c28
   17:43:19 ERROR InvertedIndexEngine:124 open -
   java.lang.OutOfMemoryError: PermGen space
   java.lang.RuntimeException: java.lang.OutOfMemoryError: PermGen space
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.core.SolrCore.(SolrCore.java:579)
at
   
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at
   
com.intellisemantic.intellifacet.resource.invertedIndex.InvertedIndexEngine.open(InvertedIndexEngine.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
   sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
   
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409)
   [...]

The exception is always thrown while solr init is performed after a 
restart (this is the reason why I'm asking your support ;) )


It seems that while solr is trying to be set up (by [Timer-1]), another 
thread ([Finalizer]) is trying to close it. I can see from the Solr code 
that this exception is thrown always in the same place: SolrCore.java:1068.

Here there is a comment that say:

   // need to close the searcher here??? we shouldn't have to.
  throw new RuntimeException(th);
} finally {
  if (newestSearcher != null) {
newestSearcher.decref();
  }
}

I'm using slorj lib in a Spring container, so I'm supposing that Spring 
will manage the relase of all the singleton classes. Should I do 
something other like force closing solr?


Thanks in advance for your support.

Best regards

Antonio

java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-01 Thread Antonio Calo'


 Hi guys

I'm facing an error in our production environment with our search 
application based on maven with spring + solrj.


When I try to change a class, or try to redeploy/restart an application, 
I catch a java.lang.OutOfMemoryError: PermGen


I've tryed to understand the cause of this and also I've succeded in 
reproducing this issue on my local develop environment by just 
restarting the jetty several time (I'm using eclipse + maven plugin).


The logs obtained are those:

   [...]
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/: org.apache.solr.handler.admin.AdminHandlers
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/ping: PingRequestHandler
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /debug/dump: solr.DumpRequestHandler
   32656 [Finalizer] INFO org.apache.solr.core.SolrCore - []  CLOSING
   SolrCore org.apache.solr.core.solrc...@1409c28
   17:43:19 ERROR InvertedIndexEngine:124 open -
   java.lang.OutOfMemoryError: PermGen space
   java.lang.RuntimeException: java.lang.OutOfMemoryError: PermGen space
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.core.SolrCore.(SolrCore.java:579)
at
   
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at
   
com.intellisemantic.intellifacet.resource.invertedIndex.InvertedIndexEngine.open(InvertedIndexEngine.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
   sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
   
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409)
   [...]

The exception is always thrown while solr init is performed after a 
restart (this is the reason why I'm asking your support ;) )


It seems that while solr is trying to be set up (by [Timer-1]), another 
thread ([Finalizer]) is trying to close it. I can see from the Solr code 
that this exception is thrown always in the same place: SolrCore.java:1068.

Here there is a comment that say:

   // need to close the searcher here??? we shouldn't have to.
  throw new RuntimeException(th);
} finally {
  if (newestSearcher != null) {
newestSearcher.decref();
  }
}

I'm using slorj lib in a Spring container, so I'm supposing that Spring 
will manage the relase of all the singleton classes. Should I do 
something other like force closing solr?


Thanks in advance for your support.

Best regards

Antonio

Re: Use of EmbeddedSolrServer

2010-06-28 Thread Antonio Calò

I think that this is the best way to use Solr. I've used EmbeddedSolrServer
keeping it in a singleton manner (by using Spring framework).

Also Solr is threadsafe, so you should not have any issue by using it
directly in an Ejb.

Antonio



2010/6/27 Robert Naczinski 

> Hello,
>
> there is a recommendation (best practice) for the use of
> EmbeddedSolrServer? We use it, because we have in our application an
> EJB module with Message Drive Bean. Now, we create the
> EmbeddedSolrServer (http://wiki.apache.org/solr/Solrj #
> EmbeddedSolrServer) in a javax.servlet.ServletContextListener and keep
> the in a singletonwrapper
>
> Can we do it that way? Or should we create a whole pool of the
> servers? ( with Apache Commons Pool )
>
> Can anyone give me any advice?
>
> Regards,
>
> Robert
>



-- 
Antonio Calò
--
Software Developer Engineer
@ Intellisemantic
Mail anton.c...@gmail.com
Tel. 011-56.90.429
--

Re: Any realtime indexing plugin available for SOLR

2010-05-27 Thread Antonio Lobato

Funny enough, I've been looking for my own solution too.  The Zoie plugin does 
not work on multi-core setups, so that's bust for me.  Once you commit 
something to index, you need to "warm" a new searcher (load all the data from 
disk into memory/cache) like Erik says.  On a smaller index, this is very very 
quick, however on a larger index, not so much.

Solr 1.5 will (hopefully) have a new feature that will allow for near real time 
searching.  Check this out:

http://wiki.apache.org/solr/NearRealtimeSearch

On May 27, 2010, at 6:00 AM, Erik Hatcher wrote:

> 
> On May 26, 2010, at 11:29 AM, Dennis Gearon wrote:
> 
>> I thought that if entries were COMMITed to the index, they were immediately 
>> visible?
>> 
>> Is this true, or am I smoking Java coffee beans?
> 
> They're visible after a commit AND warming are complete, yes.   But there 
> could be a potentially substantial delay between a commit message being sent 
> and the new documents actually searchable.
> 
>   Erik
> 

---
Antonio Lobato
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8101
alob...@symplicity.com

Re: Date faceting and memory leaks

2010-05-17 Thread Antonio Lobato

I have ~50 million docs, and use the follow lines without any issues:

-XX:MaxNewSize=24m -XX:NewSize=24m -XX:+UseParNewGC 
-XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC

Perhaps try them out?

On May 17, 2010, at 2:47 PM, Ge, Yao (Y.) wrote:

> I do not have any GC specific setting in command line. I had tried to
> force GC collection via Jconsole at the end of the run but it didn't
> seems to do anything the heap size.
> -Yao 
> 
> -Original Message-
> From: Antonio Lobato [mailto:alob...@symplicity.com] 
> Sent: Monday, May 17, 2010 2:44 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Date faceting and memory leaks
> 
> What garbage collection settings are you running at the command line
> when starting Solr?
> On May 17, 2010, at 2:41 PM, Yao wrote:
> 
>> 
>> I have been running load testing using JMeter on a Solr 1.4 index with
> ~4
>> million docs. I notice a steady JVM heap size increase as I iterator
> 100
>> query terms a number of times against the index. The GC does not seems
> to
>> claim the heap after the test run is completed. It will run into
> OutOfMemory
>> as I repeat the test or increase the number of threads/users. 
>> 
>> The date facet queries are specified as following (as part of "append"
>> section in request handler):
>>   
>>   {!ex=last_modified}last_modified:[NOW-30DAY
> TO
>> *]
>> name="facet.query">{!ex=last_modified}last_modified:[NOW-90DAY TO
>> NOW-30DAY]
>> name="facet.query">{!ex=last_modified}last_modified:[NOW-180DAY TO
>> NOW-90DAY]
>> name="facet.query">{!ex=last_modified}last_modified:[NOW-365DAY TO
>> NOW-180DAY]
>> name="facet.query">{!ex=last_modified}last_modified:[NOW-730DAY TO
>> NOW-365DAY]
>>{!ex=last_modified}last_modified:[* TO
>> NOW-730DAY]
>>   
>> 
>> The last_modified field is a TrieDateField with a precisionStep of 6.
>> 
>> I have played for filterCache setting but does not have any effects as
> the
>> date field cache seems be  managed by Lucene FieldCahce.
>> 
>> Please help as I can be struggling with this for days. Thanks in
> advance.
>> -- 
>> View this message in context:
> http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp8243
> 72p824372.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> 
> ---
> Antonio Lobato
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8101
> alob...@symplicity.com
> 

---
Antonio Lobato
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8101
alob...@symplicity.com

Re: Date faceting and memory leaks

2010-05-17 Thread Antonio Lobato

What garbage collection settings are you running at the command line when 
starting Solr?
On May 17, 2010, at 2:41 PM, Yao wrote:

> 
> I have been running load testing using JMeter on a Solr 1.4 index with ~4
> million docs. I notice a steady JVM heap size increase as I iterator 100
> query terms a number of times against the index. The GC does not seems to
> claim the heap after the test run is completed. It will run into OutOfMemory
> as I repeat the test or increase the number of threads/users. 
> 
> The date facet queries are specified as following (as part of "append"
> section in request handler):
>
>{!ex=last_modified}last_modified:[NOW-30DAY TO
> *]
> {!ex=last_modified}last_modified:[NOW-90DAY TO
> NOW-30DAY]
> {!ex=last_modified}last_modified:[NOW-180DAY TO
> NOW-90DAY]
> {!ex=last_modified}last_modified:[NOW-365DAY TO
> NOW-180DAY]
> {!ex=last_modified}last_modified:[NOW-730DAY TO
> NOW-365DAY]
> {!ex=last_modified}last_modified:[* TO
> NOW-730DAY]
>
> 
> The last_modified field is a TrieDateField with a precisionStep of 6.
> 
> I have played for filterCache setting but does not have any effects as the
> date field cache seems be  managed by Lucene FieldCahce.
> 
> Please help as I can be struggling with this for days. Thanks in advance.
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

---
Antonio Lobato
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8101
alob...@symplicity.com

Re: long warmup duration

2010-02-19 Thread Antonio Lobato

You can disable warming, and a new searcher will register (almost) 
instantly, no matter the size.  However, once you run your first search, 
you will be "warming" your searcher, and it will block for a long, long 
time, giving the end user a "frozen" page.


Warming is just another word for "running a set of queries before the 
searcher is pushed to the front end."  Naturally if you disable warming, 
your searcher will register right away.  I wouldn't recommend it 
though.  If I disable warming on my documents, my new searchers would 
register instantly, but my first search on my web page would be stuck 
for 50 seconds or so.


As for the cache size, caching does a cache on entry data, not 
documents.  That's what warming is for.


On 2/19/2010 12:17 PM, Stefan Neumann wrote:

Hey,

I am quite confused with your configuration. It seems to me, that your
caches are extremly small for 30 million documents (128) and during
warmup you only put up to 20 docs in it. Please correct me if I
misunderstand anything.

In my opinion your warm up duration is not that impressiv, since we
currently disabled warmup, the new searcher is registered only in a few
seconds.

Actually, I would not drop these cache numbers. With a cache of 30k
documents we had a hitraion of 60%, decreasing this size the hitratio
decreased as well. With a hitratio of currently 30% it seems to be
better to disable caching anyway. Of course we would love to use caching
;-).

with best regards,

Stefan


Antonio Lobato wrote:
   

Drop those cache numbers.  Way down.  I warm up 30 million documents in about 2 
minutes with the following configuration:

   

   

   

   

Mind you, I also use Solr 1.4.  Also, setup a decent warming query or two, as 
so:
  date:[NOW-2DAYS TO NOW]  0  100  date desc

Don't warm facets that have a large amount of terms or you will kill your warm 
up time.

Hope this helps!

On Feb 17, 2010, at 8:55 AM, Stefan Neumann wrote:

 

Hi all,

we are facing extremly increasing warmup times the last 15 days, which
we are not able to explain, since the number of documents and their size
is stable. Before the increase we can commit our changes in nearly 20
minutes, now it is about 2 hours.

We were able to identify the warmup of the caches (queryresultCache and
filterCache) as the reason. We tried to decrease the number of warmup
elements from 3 to 1 without any impact.

What influences the runtime during the warmup? Is there any possibility
to boost the warmup?

I attach some more information and statistics.

Thanks a lot for your help.

Stefan


Solr:   1.3
Documents:  4.000.000
-Xmx12G
index size/disc 4.7G

config:

100
200

No queries configured for warming.

CACHES:
===

name:   queryResultCache
class:  org.apache.solr.search.LRUCache
version:1.0
description:LRU Cache(maxSize=20,
  initialSize=3,
  autowarmCount=1,
regenerator=org.apache.solr.search.solrindexsearche...@36eb7331)
stats:

lookups:15958
hits :  9589
hitratio:   0.60
inserts:16211
evictions:  0
size:   16169
warmupTime :1960239
cumulative_lookups: 436250
cumulative_hits:260678
cumulative_hitratio:0.59
cumulative_inserts: 174066
cumulative_evictions:   0


name:   filterCache
class:  org.apache.solr.search.LRUCache
version:1.0
description:LRU Cache(maxSize=20,
  initialSize=3,
  autowarmCount=3,  
regenerator=org.apache.solr.search.solrindexsearche...@9818f80)
stats:  
lookups:6313622
hits:   6304004
hitratio: 0.99
inserts: 42266
evictions: 0
size: 40827
warmupTime: 1268074
cumulative_lookups: 118887830
cumulative_hits: 118605224
cumulative_hitratio: 0.99
cumulative_inserts: 296134
cumulative_evictions: 0

Re: long warmup duration

2010-02-17 Thread Antonio Lobato

Drop those cache numbers.  Way down.  I warm up 30 million documents in about 2 
minutes with the following configuration:

  

  

  

  

Mind you, I also use Solr 1.4.  Also, setup a decent warming query or two, as 
so:
 date:[NOW-2DAYS TO NOW] 0 
100 date desc

Don't warm facets that have a large amount of terms or you will kill your warm 
up time.

Hope this helps!

On Feb 17, 2010, at 8:55 AM, Stefan Neumann wrote:

> Hi all,
> 
> we are facing extremly increasing warmup times the last 15 days, which
> we are not able to explain, since the number of documents and their size
> is stable. Before the increase we can commit our changes in nearly 20
> minutes, now it is about 2 hours.
> 
> We were able to identify the warmup of the caches (queryresultCache and
> filterCache) as the reason. We tried to decrease the number of warmup
> elements from 3 to 1 without any impact.
> 
> What influences the runtime during the warmup? Is there any possibility
> to boost the warmup?
> 
> I attach some more information and statistics.
> 
> Thanks a lot for your help.
> 
> Stefan
> 
> 
> Solr: 1.3
> Documents:4.000.000
> -Xmx  12G
> index size/disc 4.7G
> 
> config:
> 
> 100
> 200
> 
> No queries configured for warming.
> 
> CACHES:
> ===
> 
> name:   queryResultCache
> class:  org.apache.solr.search.LRUCache
> version:1.0
> description:LRU Cache(maxSize=20,
>  initialSize=3,
> autowarmCount=1,
>   regenerator=org.apache.solr.search.solrindexsearche...@36eb7331)
> stats:
> 
> lookups:15958
> hits :  9589
> hitratio:   0.60
> inserts:16211
> evictions:  0
> size:   16169
> warmupTime :1960239
> cumulative_lookups: 436250
> cumulative_hits:260678
> cumulative_hitratio:0.59
> cumulative_inserts: 174066
> cumulative_evictions:   0
> 
> 
> name: filterCache
> class:org.apache.solr.search.LRUCache
> version:  1.0
> description:  LRU Cache(maxSize=20,
> initialSize=3,
>  autowarmCount=3, 
>   regenerator=org.apache.solr.search.solrindexsearche...@9818f80)
> stats:
> lookups:  6313622
> hits:   6304004
> hitratio: 0.99
> inserts: 42266
> evictions: 0
> size: 40827
> warmupTime: 1268074
> cumulative_lookups: 118887830
> cumulative_hits: 118605224
> cumulative_hitratio: 0.99
> cumulative_inserts: 296134
> cumulative_evictions: 0
> 
> 
>

Re: Performance-Issues and raising numbers of "cumulative inserts"

2010-02-16 Thread Antonio Lobato

I've actually run into this issue; huge, 30 minute warm up times. I've
found that reducing the auto-warm count on caches (and the general size
of the cache) helped a -lot-, as did making sure my warm up query wasn't
something like:

q=*:*&facet=true&facet.field=somethingWithAWholeLotOfTerms

Tune your warm up queries so they are a bit more conservative, and you
should be fine. Don't warm facets on fields with a billion terms, and
make sure your application makes effective use of query and faceting. I
warm an index with ~30 /million/ records in about 50 seconds, and I'm
pretty sure I could do better!

Hope this helps!

On 2/17/2010 12:29 AM, Lance Norskog wrote:

These are some very large numbers. 700k ms is 70 seconds, 4M ms is 4k
seconds or 66 minutes. No Solr installation should take this long to
warm up.

There is something very wrong here. Have you optimized lately? What
queries do you run to warm it up? And, the basics: how many documents,
how much data per document, how much disk space is the index?

On Tue, Feb 16, 2010 at 3:02 AM, Bohnsack, Sven
wrote:

Hi Shalin!

Thanks for quick response. Sadly it tells me, that i have to look elsewhere to
fix the problem.

Anyone an idea what could cause the increasing warmup-Times? If required I can
post some stats.

Thanking you in anticipation!

Regards,

Sven

Feed: Solr-Mailing-List
Bereitgestellt am: Dienstag, 16. Februar 2010 09:05
Autor: Shalin Shekhar Mangar
Betreff: Re: Performance-Issues and raising numbers of "cumulative inserts"

On Tue, Feb 16, 2010 at 1:06 PM, Bohnsack, Sven wrote:> Hey IT-Crowd!> > I'm dealing with some
performance issues during warmup the> queryResultCache. Normally it tooks about 11 Minutes (~700.000 ms), but> now it tooks about 4
MILLION and more ms. All I can see in the solr.log> ist that the number of cumulative_inserts ascends from from ~250.000 to>
~670.000.> > I asked Google about the cumulative_inserts, but did not get an answer.> Can anyone tell me what "cumulative
inserts" are and what they stand> for? What does it mean, if the number of such inserts raises?> > cumulative_inserts are the
total number of inserts into the cache since Solr started up. The "inserts" shows the number of inserts since the last commit. --
Regards, Shalin Shekhar Mangar.

Artikel
anzeigen...

Interesting stuff; Solr as a syslog store.

2010-02-12 Thread Antonio Lobato

Hey everyone, I don't actually have a question, but I just thought I'd 
share something really cool that I did with Solr for our company.


We run a good amount of servers, well into the several hundreds, and 
naturally we need a way to centralize all of the system logs.  For a 
while we used a commercial solution to centralize and search our logs, 
but they wanted to charge us tens of thousands of dollars for just one 
gigabyte/day more of indexed data.  So I said forget it, I'll write my 
own solution!


We already use Solr for some of our other backend searching systems, so 
I came up with an idea to index all of our logs to Solr.  I wrote a 
daemon in perl that listens on the syslog port, and pointed every single 
system's syslog to forward to this single server.  From there, this 
daemon will write to a Solr indexing server after parsing them into 
fields, such as date/time, host, program, pid, text, etc.  I then wrote 
a cool javascript/ajax web front end for Solr searching, and bam.  Real 
time searching of all of our syslogs from a web interface, for no cost!


Just thought this would be a neat story to share with you all.  I've 
really grown to love Solr, it's something else!


Thanks,
-Antonio

Re: Huge Index - RAM usage?

2010-01-25 Thread Antonio Lobato

Just indexing.  If I shutdown Solr, memory usage goes down to 200MB.   
I've searched the mailing lists, but most situations are with people  
both searching and indexing.  I was under the impression that indexing  
shouldn't use up so much memory.  I'm trying to figure out where all  
the usage is coming from though.  Any ideas?


On Jan 25, 2010, at 11:03 AM, Erick Erickson wrote:


Are you also searching on this machine or just indexing?

I'll assume your certain that it's SOLR that's eating memory,
as in you stop the process and your usage drops way down.

But if you search the user list for memory, you'll see this
kind of thing discussed a bunch of times, along with
suggestions for tracking it down, whether it's just
postponed GCing, etc.

HTH
Erick

On Mon, Jan 25, 2010 at 10:47 AM, Antonio Lobato >wrote:



Hello everyone!

I have a question about indexing a large dataset in Solr and ram  
usage.  I
am currently indexing about 160 gigabytes of data to a dedicated  
indexing
server.  The data is constantly being fed to Solr, 24/7.  The index  
grows as
I prune away old data that is not needed, so the index size stays  
in the
150-170 gigabyte range.  However, RAM usage on this machine is off  
the wall.
The usage grows to about 27 gigabytes of RAM over 2 days or so.  Is  
this

normal behavior for Solr?

Thanks!
-Antonio

Huge Index - RAM usage?

2010-01-25 Thread Antonio Lobato


Hello everyone!

I have a question about indexing a large dataset in Solr and ram  
usage.  I am currently indexing about 160 gigabytes of data to a  
dedicated indexing server.  The data is constantly being fed to Solr,  
24/7.  The index grows as I prune away old data that is not needed, so  
the index size stays in the 150-170 gigabyte range.  However, RAM  
usage on this machine is off the wall. The usage grows to about 27  
gigabytes of RAM over 2 days or so.  Is this normal behavior for Solr?


Thanks!
-Antonio

Re: Custom Field sample?

2009-12-11 Thread Antonio Zippo

I need to add theese "features" to each document

Document1
---
Argument1, positive
Argument2, positive
Argument3, neutral
Argument4, positive
Argument5, negative
Argument6, negative

Document2
---
Argument1, negative
Argument2, positive
Argument3, negative
Argument6, negative
Argument7, neutral

where the argument name is dynamic
using a relational database I could use a master detail structure, but in solr?
I thought about a Map or Pair field

Da: Grant Ingersoll 
A: solr-user@lucene.apache.org
Inviato: Gio 10 dicembre 2009, 19:47:55
Oggetto: Re: Custom Field sample?

Can you perhaps give a little more info on what problem you are trying to 
solve?  FWIW, there are a lot of examples of custom FieldTypes in the Solr code.

On Dec 10, 2009, at 11:46 AM, Antonio Zippo wrote:

> Hi all,
> 
> could you help me to create a custom field?
> 
> I need to create a field structured like a Map
> is it possible? how to define if the search string is on key or value (or 
> both)?
> 
> A way could be to create a char separated multivalued string field... but it 
> isn't the best way. and with facets is the worst way
> 
> could you give me a custom field sample?
> 
> 
> Thanks in advance,  
>  Revenge
> 
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search

Custom Field sample?

2009-12-10 Thread Antonio Zippo

Hi all,

could you help me to create a custom field?

I need to create a field structured like a Map
is it possible? how to define if the search string is on key or value (or both)?

A way could be to create a char separated multivalued string field... but it 
isn't the best way. and with facets is the worst way

could you give me a custom field sample?


Thanks in advance,  
  Revenge

Re: Tika trouble

2009-11-16 Thread Antonio Calò

What I could try to say is that if you want to index a Pdf, then you should
use a Pdf extractor. A Pdf Extractor is able to extract the text content and
the metadata of the files. I suppose you have just opened and indexed the
pdf as is. So you stored bynary data and stop. For my applciation I've used
PdfExtractor, but also pdfBox project could be used.

Antonio

2009/11/16 Markus Jelsma - Buyways B.V. 

> Anyone has a clue?
>
>
>
> > List,
> >
> >
> > I somehow fail to index certain pdf files using the
> > ExtractingRequestHandler in Solr 1.4 with default solrconfig.xml but
> > modified schema. I have a very simple schema for this case using only
> > and ID field, a timestamp field and two dynamic fields; ignored_* and
> > attr_* both indexed, stored and multivalued strings. They are
> > multivalued simple because some HTML files fail when storing multiple
> > hyperlinks.
> >
> > I have posted multiple files to
> > http://.../update/extract?literal.id=doc1 including:
> > 1. the whitepaper at
> > http://www.lucidimagination.com/whitepaper/whats-new-in-lucene-2-9?sc=AP
> > 2. the html file of the frontpage of http://nu.nl/
> > 3. another pdf at
> >
> http://www.google.nl/url?sa=t&source=web&ct=res&cd=1&ved=0CAcQFjAA&url=http%3A%2F%2Fcsl.stanford.edu%2F~christos%2Fpublications%2F2007.cmp_mapreduce.hpca.pdf&rct=j&q=2007.cmp_mapreduce.hpca.pdf&ei=PPz7SpiiOM6l4QbZjKjRAw&usg=AFQjCNHs-olxbUQrGCXpNMHfcZvY8aMk8A<http://www.google.nl/url?sa=t&source=web&ct=res&cd=1&ved=0CAcQFjAA&url=http%3A%2F%2Fcsl.stanford.edu%2F%7Echristos%2Fpublications%2F2007.cmp_mapreduce.hpca.pdf&rct=j&q=2007.cmp_mapreduce.hpca.pdf&ei=PPz7SpiiOM6l4QbZjKjRAw&usg=AFQjCNHs-olxbUQrGCXpNMHfcZvY8aMk8A>
> >
> > For each document i have a corresponding select/?q=*:*:
> >
> >
> > 1. No text? Should i see something?
> >
> > doc1
> > 
> > application/octet-stream
> > 
> > 
> > 
> > text/xml; charset=UTF-8;
> > boundary=cf57b4ad644d
> > 
> > 
> > 
> > 491238
> > 
> > 
> > 
> > 
> > 2009-11-12T12:17:23.016Z
> > 
> >
> >
> > 2. Plenty of data, this seems to be ok
> >
> > 
> > doc1
> > 
> > application/xhtml+xml
> > 
> > 
> > http://www.nu.nl/
> > http://www.nu.nl/
> > http://www.nu.nl/algemeen/
> > http://www.nu.nl/economie/
> > 
> > 
> > 
> > text/xml; charset=UTF-8;
> > boundary=b6e44d087bdd
> > 
> > 
> > 
> > 36991
> > 
> > 
> > 
> > A LOT OF TEXT HERE
> > 
> > 
> > 2009-11-12T12:19:15.415Z
> > 
> >
> >
> > 3. a lot of garbage
> >
> > 
> > doc1
> > 
> > windows-1252
> > 
> > 
> > fr
> > 
> > 
> > text/plain
> > 
> > 
> > fr
> > 
> > 
> > 
> > text/xml; charset=UTF-8;
> > boundary=83df0fd4d358
> > 
> > 
> > 
> > 361458
> > 
> > 
> > 
> > A LOT OF GARBAGE HERE including
> >
> > ió½·Þp™ó 40›
> > š©xÓ ^ CøùI3ëžŒš³î¨V ÚÜ¡yS4 ¹£ ² ›H 6õÉ¨5¤ ÅÜç£©bädÒøŸ\ �s%OîÐÙIÑYRäŠ ;4
> > ¢9"r "—!rEôˆÌ {SìûD²à £©ïœ«{‘ínÆ N÷ô¥F»�™ ±¡Ë'ú\³=·m„Þ »ý)³Å=j¶B¢)`  Ñ
> > „Ï™hjCu{£É5{¢¯ç6½Ñhr¢ºÃ=J M- AqsøtÜì ÿ^Rl S?¿óšM‰—lv‘Ø›Qüãý´ þžŽ
> > $S;¾¦wze³Ù)qÉú§ ‰› ãqó…Ó ‰ª"U:šBÝ‘GuŠ"ë
> > MM±Òv �~ ‚N‹t¢ä§~Ì ÞŒS—Êòö¼ÊÄQaº¸¿7tñ ¾Áç œãØŒ58$O 3Å~�8¿L  ‡ëŽó©pk _
> > Ša Â=u×; (ä<�...@.œ÷ä ù° µk+ÿ PP~ ¨*Ý¤¿Œ™¡D»   @fI$0°�Î Ù·p“Œ,Øâ  †¶v
> > ¤v1#8¼0 ›  èð€-†šZ 6¾  ! ñb ˆbˆ¤v)LS)T X² ¬ l...@€  6E$Q
> > endstream
> > endobj
> > 137 0
> >
> obj<>
> > endobj
> > 138 0 obj< > 728]/FontName/WQHWKD+TTE31911E0t00/Flags 4/MissingWidth 750/StemV
> > 141/CapHeight 728/Ascent 728/Descent -210/ItalicAngle 0>>
> > endobj
> > 139 0 obj<>
> > endobj
> > 140 0 obj< > R]/Type/Pages/Parent 139 0 R>>
> > endobj
> > 141 0 obj< > R]/Type/Pages/Parent
> >
> > 
> >
> > 
> > 
> > 2009-11-12T12:21:28.306Z
> > 
> >
> >
> > Any ideas? Why doesn't the whitepaper produce any results and why is the
> > next whitepaper full of garbage? At least i'm happy that HTML works
> > fine.
> >
> >
> >
> > Regards,
> >
> > -
> > Markus Jelsma  Buyways B.V.
> > Technisch ArchitectFriesestraatweg 215c
> > http://www.buyways.nl  9743 AD Groningen
> >
> >
> > Alg. 050-853 6600  KvK  01074105
> > Tel. 050-853 6620  Fax. 050-3118124
> > Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17
> >
>



-- 
Antonio Calò
--
Software Developer Engineer
@ Intellisemantic
Mail anton.c...@gmail.com
Tel. 011-56.90.429
--

Re: solrjs

2009-10-28 Thread Antonio Eggberg

I fully understand it was not working probably in production or on other data 
sets. But it did serve a purpose for me.. i.e. show a demo to anyone out of my 
box.. and I update my local repo with trunk all the time..

I could do ant reuters-start using my laptop and it would work.. my point is 
remove it when you have something to replace it with.. active development won't 
help my demo to customers.. and will not promote solr to larger audience..

well whats done is done. i will revert to older repo revision.

thanks

--- Den ons 2009-10-28 skrev Colin Hynes :

> Från: Colin Hynes 
> Ämne: Re: solrjs
> Till: solr-user@lucene.apache.org
> Datum: onsdag 28 oktober 2009 15.18
> 
> Actually, it wasn't quite working. It also replicated a lot
> of stuff that's in ajax solr, which is being more actively
> developed. Hence the removal.
> 
> 
> On Oct 28, 2009, at 10:16 AM, Antonio Eggberg wrote:
> 
> > I am all for new stuff.
> > 
> > It would be nice to see a working example of ajax-solr
> before killing completely solrjs from trunk... at least it
> was working .. ajax-solr has no how to, nor any working
> example..
> > 
> > http://github.com/evolvingweb/ajax-solr
> > 
> > Well why not just remove the javascript folder too and
> just have one liner mention in CHANGES.txt??...
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >     
> __
> > Låna pengar utan säkerhet. Jämför vilkor online
> hos Kelkoo.
> > http://www.kelkoo.se/c-100390123-lan-utan-sakerhet.html?partnerId=96915014
> > 
> 
> Active Media Architects, Inc.
> World Class Design, Programming & Strategy - Since
> 1998
> http://www.activema.com
> 
> 1-888-392-4567 toll free
> 1-586-445-1000 local
> 1-586-445-2247 fax
> 
> 

  ___
Sök efter kärleken!
Hitta din tvillingsjäl på Yahoo! Dejting: 
http://ad.doubleclick.net/clk;185753627;24584539;x?http://se.meetic.yahoo.net/index.php?mtcmk=148783

solrjs

2009-10-28 Thread Antonio Eggberg

I am all for new stuff. 

It would be nice to see a working example of ajax-solr before killing 
completely solrjs from trunk... at least it was working .. ajax-solr has no how 
to, nor any working example.. 

http://github.com/evolvingweb/ajax-solr

Well why not just remove the javascript folder too and just have one liner 
mention in CHANGES.txt??...







  __
Låna pengar utan säkerhet. Jämför vilkor online hos Kelkoo.
http://www.kelkoo.se/c-100390123-lan-utan-sakerhet.html?partnerId=96915014

Re: HighLithing exact phrases with solr

2009-10-20 Thread Antonio Calò

Hi Kaji, many thanks for your suggestion.

Sorry for delay in my feedback.

I've tried to set hl.usePhraseHighlighter=true, but it still not working.

Here my setup:


   
   
   

 100
 true

  true

   

   
   

  
  100
  
  0.5
  
  [-\w ,/\n\"']{20,200}

  true

  true


   

   
   

true

  true
 
 

   


  


Any help from other user is really appreciated.

2009/10/6 Koji Sekiguchi 

> Please try hl.usePhraseHighlighter=true parameter.
> (It should be true by default if you use the latest nightly, but I think
> you don't)
>
> Koji
>
>
> Antonio Calň wrote:
>
>> Hi Guys
>>
>> I'm getting crazy with the highlighting in solr. The problem is the
>> follow:
>> when I submit an exact phrase query, I get the related results and the
>> related snippets with highlight. But I've noticed that the *single term of
>> the phrase are highlighted too*. Here an example:
>>
>> If I start a search for "quick brown fox", I obtain the correct result
>> with
>> the doc wich contains the phrase, but the snippets came to me like this:
>>
>> 
>> 
>>
>>
>> The quick brown fox jump over the lazy dog. The fox is a
>> nice animal.
>>
>> 
>>  
>> 
>>
>>
>> Also with some documents, only single terms are highlighted insteand of
>> exact sentence even if the exact phrase is contained into the document i.
>> e.:
>> 
>> 
>>
>>
>> The fox is a nice animal.
>>
>> 
>>  
>> 
>>
>>
>> My understanding of highlighting is that if I search for exact phrase,
>> only
>> the exact phrase is should be highlighted.
>>
>> Here an extract of my solrconfig.xml & schema.xml
>>
>> solrconfig.xml:
>>
>> 
>>   
>>   
>>   
>>
>> 500
>>
>>   
>>
>>   
>>   > class="org.apache.solr.highlight.RegexFragmenter" default="true">
>>
>>  
>>  700
>>  
>>  0.5
>>  
>>  [-\w ,/\n\"']{20,200}
>>
>>  true
>>
>>  true
>>
>>   
>>
>>   
>>   
>>
>> 
>> 
>>
>>   
>>
>>
>> schema.xml:
>>
>> 
>>
>> > words="stop_italiano.txt"/>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0"/>
>>
>>  
>>
>>
>>
>>
>>
>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>> words="stop_italiano.txt"/>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="1"/>
>>
>>
>>
>>
>>
>>
>> Maybe I'm missing something, or my understanding of the highlighting
>> feature
>> is not correct. Any Idea?
>>
>> As always, thanks for your support!
>>
>> Regards, Antonio
>>
>>
>>
>
>


-- 
Antonio Calò
--
Software Developer Engineer
@ Intellisemantic
Mail anton.c...@gmail.com
Tel. 011-56.90.429
--

Re: Solr Porting to .Net

2009-10-05 Thread Antonio Calò

Hi Mauricio, thanks for your feedback.

I suppose we will move to a mixed solution Solr on Tomcat and a .Net client
(maybe SolrNet)

But the Solr on KVM could be interesting. If I've time I'll try It and I'll
let you know in success case.

Antonio

2009/9/30 Mauricio Scheffer 

> Solr is a server that runs on Java and it exposes a http interface.SolrNet
> is a client library for .Net that connects to a Solr instance via its http
> interface.
> My experiment (let's call it SolrIKVM) is an attempt to run Solr on .Net.
>
> Hope that clear things up.
>
> On Wed, Sep 30, 2009 at 11:50 AM, Antonio Calò 
> wrote:
>
> > I guys, thanks for your prompt feedback.
> >
> >
> > So, you are saying that SolrNet is just a wrapper written in C#, that
> > connnect the Solr (still written in Java that run on the IKVM) ?
> >
> > Is my understanding correct?
> >
> > Regards
> >
> > Antonio
> >
> > 2009/9/30 Mauricio Scheffer 
> >
> > > SolrNet is only a http client to Solr.
> > > I've been experimenting with IKVM but wasn't very successful... There
> > seem
> > > to be some issues with class loading, but unfortunately I don't have
> much
> > > time to continue these experiments right now. In case you're interested
> > in
> > > continuing this, here's the repository:
> > > http://code.google.com/p/mausch/source/browse/trunk/SolrIKVM
> > >
> > > Also recently someone registered a project on google code with the same
> > > intentions, but no commits yet: http://code.google.com/p/solrwin/
> > >
> > > <http://code.google.com/p/mausch/source/browse/trunk/SolrIKVM>Cheers,
> > > Mauricio
> > >
> > > On Wed, Sep 30, 2009 at 7:09 AM, Pravin Paratey 
> > wrote:
> > >
> > > > You may want to check out - http://code.google.com/p/solrnet/
> > > >
> > > > 2009/9/30 Antonio Calò :
> > > > > Hi All
> > > > >
> > > > > I'm wondering if is already available a Solr version for .Net or if
> > it
> > > is
> > > > > still under development/planning. I've searched on Solr website but
> > > I've
> > > > > found only info on Lucene .Net project.
> > > > >
> > > > > Best Regards
> > > > >
> > > > > Antonio
> > > > >
> > > > > --
> > > > > Antonio Calò
> > > > > --
> > > > > Software Developer Engineer
> > > > > @ Intellisemantic
> > > > > Mail anton.c...@gmail.com
> > > > > Tel. 011-56.90.429
> > > > > --
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Antonio Calò
> > --
> > Software Developer Engineer
> > @ Intellisemantic
> > Mail anton.c...@gmail.com
> > Tel. 011-56.90.429
> > --
> >
>



-- 
Antonio Calò
--
Software Developer Engineer
@ Intellisemantic
Mail anton.c...@gmail.com
Tel. 011-56.90.429
--

HighLithing exact phrases with solr

2009-10-05 Thread Antonio Calò

Hi Guys

I'm getting crazy with the highlighting in solr. The problem is the follow:
when I submit an exact phrase query, I get the related results and the
related snippets with highlight. But I've noticed that the *single term of
the phrase are highlighted too*. Here an example:

If I start a search for "quick brown fox", I obtain the correct result with
the doc wich contains the phrase, but the snippets came to me like this:


 


The quick brown fox jump over the lazy dog. The fox is a
nice animal.

 
  



Also with some documents, only single terms are highlighted insteand of
exact sentence even if the exact phrase is contained into the document i.
e.:

 


The fox is a nice animal.

 
  



My understanding of highlighting is that if I search for exact phrase, only
the exact phrase is should be highlighted.

Here an extract of my solrconfig.xml & schema.xml

solrconfig.xml:


   
   
   

 500

   

   
   

  
  700
  
  0.5
  
  [-\w ,/\n\"']{20,200}

  true

  true

   

   
   

 
 

   


schema.xml:



 


  














Maybe I'm missing something, or my understanding of the highlighting feature
is not correct. Any Idea?

As always, thanks for your support!

Regards, Antonio

DIH help needed

2009-10-01 Thread Antonio Eggberg

Hello,

I am working with some XML/JSON feed as well as Database and using transformer 
to create the final index. I am no expert and I would like to get some help on 
a hourly/daily rate basis. It might be also the this part of the job can be 
outsourced to you completely, however I need to understand this as I will end 
up operating it.

Please send me mail directly with your rates etc.

Thanks
Anton




  __
Ta semester! - sök efter resor hos Kelkoo.
Jämför pris på flygbiljetter och hotellrum här:
http://www.kelkoo.se/c-169901-resor-biljetter.html?partnerId=96914052

Re: Solr Porting to .Net

2009-09-30 Thread Antonio Calò

I guys, thanks for your prompt feedback.


So, you are saying that SolrNet is just a wrapper written in C#, that
connnect the Solr (still written in Java that run on the IKVM) ?

Is my understanding correct?

Regards

Antonio

2009/9/30 Mauricio Scheffer 

> SolrNet is only a http client to Solr.
> I've been experimenting with IKVM but wasn't very successful... There seem
> to be some issues with class loading, but unfortunately I don't have much
> time to continue these experiments right now. In case you're interested in
> continuing this, here's the repository:
> http://code.google.com/p/mausch/source/browse/trunk/SolrIKVM
>
> Also recently someone registered a project on google code with the same
> intentions, but no commits yet: http://code.google.com/p/solrwin/
>
> <http://code.google.com/p/mausch/source/browse/trunk/SolrIKVM>Cheers,
> Mauricio
>
> On Wed, Sep 30, 2009 at 7:09 AM, Pravin Paratey  wrote:
>
> > You may want to check out - http://code.google.com/p/solrnet/
> >
> > 2009/9/30 Antonio Calò :
> > > Hi All
> > >
> > > I'm wondering if is already available a Solr version for .Net or if it
> is
> > > still under development/planning. I've searched on Solr website but
> I've
> > > found only info on Lucene .Net project.
> > >
> > > Best Regards
> > >
> > > Antonio
> > >
> > > --
> > > Antonio Calò
> > > --
> > > Software Developer Engineer
> > > @ Intellisemantic
> > > Mail anton.c...@gmail.com
> > > Tel. 011-56.90.429
> > > --
> > >
> >
>



-- 
Antonio Calò
--
Software Developer Engineer
@ Intellisemantic
Mail anton.c...@gmail.com
Tel. 011-56.90.429
--

Solr Porting to .Net

2009-09-30 Thread Antonio Calò

Hi All

I'm wondering if is already available a Solr version for .Net or if it is
still under development/planning. I've searched on Solr website but I've
found only info on Lucene .Net project.

Best Regards

Antonio

-- 
Antonio Calò
--
Software Developer Engineer
@ Intellisemantic
Mail anton.c...@gmail.com
Tel. 011-56.90.429
--

Re: DIH example explanation

2009-07-22 Thread Antonio Eggberg

:)

thank you paul! and it works! I have one more stupid question about the wiki.

"url (required) : The url used to invoke the REST API. (Can be templatized)."

How do you templatize the URL? My URL's are being updated all the time by an 
external program. i.e. list of atom sites it's a text file. So I should use 
some form of transformer to process it? any hint..

Thanks.
Anton

--- Den ons 2009-07-22 skrev Noble Paul നോബിള്‍  नोब्ळ् 
:

> Från: Noble Paul നോബിള്‍  नोब्ळ् 
> Ämne: Re: DIH example explanation
> Till: solr-user@lucene.apache.org
> Datum: onsdag 22 juli 2009 10.52
> The point is that namespace is
> ignored while DIH reads the xml. So
> just use the part after the colon (:) in your xpath
> expressions and it
> should just work.
> 
> 
> 
> 
> 
> On Wed, Jul 22, 2009 at 2:16 PM, Antonio
> Eggberg
> wrote:
> > Hi,
> >
> > I am looking at the slashdot example and I am having
> hard time understanding the following, from the wiki
> >
> > ==
> >
> > "You can use this feature for indexing from REST API's
> such as rss/atom feeds, XML data feeds , other Solr servers
> or even well formed xhtml documents . Our XPath support has
> its limitations (no wildcards , only fullpath etc) but we
> have tried to make sure that common use-cases are covered
> and since it's based on a streaming parser, it is extremely
> fast and consumes constant amount of memory even for large
> XMLs. It does not support namespaces , but it can handle
> xmls with namespaces . When you provide the xpath, just drop
> the namespace and give the rest (eg if the tag is
> '' the mapping should just contain
> 'subject').Easy, isn't it? And you didn't need to write one
> line of code! Enjoy"
> > ==
> >
> > How does  becomes field subject and
> why it's mapping xpath="/RDF/item/subject".. what is the
> secret?
> >
> > I am trying to index atom files and I need to
> understand the above cos I have namespace, not sure how to
> proceed. are there any atom example anywhere?
> >
> > Thanks again for clarification.
> > Anton
> >
> >
> >    
>  __
> > Ta semester! - sök efter resor hos Kelkoo.
> > Jämför pris på flygbiljetter och hotellrum här:
> > http://www.kelkoo.se/c-169901-resor-biljetter.html?partnerId=96914052
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 


  __
Ta semester! - sök efter resor hos Kelkoo.
Jämför pris på flygbiljetter och hotellrum här:
http://www.kelkoo..se/c-169901-resor-biljetter.html?partnerId=96914052

DIH example explanation

2009-07-22 Thread Antonio Eggberg

Hi, 

I am looking at the slashdot example and I am having hard time understanding 
the following, from the wiki

==

"You can use this feature for indexing from REST API's such as rss/atom feeds, 
XML data feeds , other Solr servers or even well formed xhtml documents . Our 
XPath support has its limitations (no wildcards , only fullpath etc) but we 
have tried to make sure that common use-cases are covered and since it's based 
on a streaming parser, it is extremely fast and consumes constant amount of 
memory even for large XMLs. It does not support namespaces , but it can handle 
xmls with namespaces . When you provide the xpath, just drop the namespace and 
give the rest (eg if the tag is '' the mapping should just contain 
'subject').Easy, isn't it? And you didn't need to write one line of code! Enjoy"
==

How does  becomes field subject and why it's mapping 
xpath="/RDF/item/subject".. what is the secret? 

I am trying to index atom files and I need to understand the above cos I have 
namespace, not sure how to proceed. are there any atom example anywhere?

Thanks again for clarification.
Anton


  __
Ta semester! - sök efter resor hos Kelkoo.
Jämför pris på flygbiljetter och hotellrum här:
http://www.kelkoo.se/c-169901-resor-biljetter.html?partnerId=96914052

Re: can i use solr to do this

2009-07-17 Thread Antonio Eggberg



--- Den fre 2009-07-17 skrev Shalin Shekhar Mangar :

> Från: Shalin Shekhar Mangar 
> Ämne: Re: can i use solr to do this
> Till: solr-user@lucene.apache.org
> Datum: fredag 17 juli 2009 09.32
> On Fri, Jul 17, 2009 at 5:53 AM,
> Antonio Eggberg
> wrote:
> 
> >
> > Hi,
> >
> > every solr document I have a creation date which is
> the default time stamp
> > "NOW". What I like to know how can I have facets like
> the following:
> >
> > Past 24 Hours (3)
> > Past 7 days (23)
> > Past 15 days (33)
> > Past 30 days (59)
> >
> > Is this possible? i.e. range query as facet?
> >
> 
> Yes, you can use facet.query and pass a range
> 
> facet.query=created:[NOW/DAY -
> NOW/DAY-1DAY]&facet.query=created:[NOW/DAY -
> NOW/DAY-7DAYS] and so on
> 


Thank you Shalin works like a charm!.

Anton


  ___
Sök efter kärleken!
Hitta din tvillingsjäl på Yahoo! Dejting: 
http://ad.doubleclick.net/clk;185753627;24584539;x?http://se.meetic.yahoo.net/index.php?mtcmk=148783

can i use solr to do this

2009-07-16 Thread Antonio Eggberg


Hi,

every solr document I have a creation date which is the default time stamp 
"NOW". What I like to know how can I have facets like the following:

Past 24 Hours (3)
Past 7 days (23)
Past 15 days (33)
Past 30 days (59)

Is this possible? i.e. range query as facet?

Regards
Anton




  __
Ta semester! - sök efter resor hos Kelkoo.
Jämför pris på flygbiljetter och hotellrum här:
http://www.kelkoo.se/c-169901-resor-biljetter.html?partnerId=96914052

lucene document via JSON

2009-05-30 Thread Antonio Eggberg


Hi,

Is adding/updating/deleting in JSON format possible? actually my need is mostly 
update I like to let user update certain fields of an existing results?

Another solution is I let user save it in DB and then server convert/post XML 
to Solr.. but not so fancy :)

Thanks
Anton


  __
Ta semester! - sök efter resor hos Kelkoo.
Jämför pris på flygbiljetter och hotellrum här:
http://www.kelkoo.se/c-169901-resor-biljetter.html?partnerId=96914052

Re: SOLR-769 clustering

2009-04-22 Thread Antonio Eggberg

Thanks Grant and Stanislaw.

To answer your question in terms of minimum term is, I am working with "joke 
text" very short in length so the clusters are not so meaning full.. I mean lot 
of adverbs and nouns, I thought increasing it might give me less cluster but 
bit more meaningful (maybe not).

--- Den ons 2009-04-22 skrev Grant Ingersoll :

> Från: Grant Ingersoll 
> Ämne: Re: SOLR-769 clustering
> Till: solr-user@lucene.apache.org
> Datum: onsdag 22 april 2009 14.44
> 
> On Apr 21, 2009, at 3:46 AM, Antonio Eggberg wrote:
> 
> > 
> > Hello:
> > 
> > I have got the clustering working i.e SOLR-769. I am
> wondering
> > 
> > - why there is a filed called "body", does it have
> special purpose?
> > 
> >    indexed="true" stored="true" multiValued="true"/>
> > 
> 
> That's just used in the test schema and there isn't any
> need for you to use it.
> 
> 
> > - can my clustering field be a copyField? basically I
> like to remove the urls and html?
> 
> As long as it is stored, a copyField should be fine.
> 
> > 
> > 
> > - is there anyway to have minimum number of labels per
> cluster?
> 
> See Stanislaw's answer.
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem
> (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 

  __
Går det långsamt? Skaffa dig en snabbare bredbandsuppkoppling. 
Sök och jämför priser hos Kelkoo.
http://www.kelkoo.se/c-100015813-bredband.html?partnerId=96914325

SOLR-769 clustering

2009-04-21 Thread Antonio Eggberg


Hello:

I have got the clustering working i.e SOLR-769. I am wondering 

- why there is a filed called "body", does it have special purpose?

   

- can my clustering field be a copyField? basically I like to remove the urls 
and html?

- is there anyway to have minimum number of labels per cluster? 

Thanks.
Antonio


  __
Ta semester! - sök efter resor hos Kelkoo.
Jämför pris på flygbiljetter och hotellrum här:
http://www.kelkoo.se/c-169901-resor-biljetter.html?partnerId=96914052

Re: CollapseFilter with the latest Solr in trunk

2009-04-19 Thread Antonio Eggberg


I wish it would be planned for 1.4 :)) 


--- Den sön 2009-04-19 skrev Otis Gospodnetic :

> Från: Otis Gospodnetic 
> Ämne: Re: CollapseFilter with the latest Solr in trunk
> Till: solr-user@lucene.apache.org
> Datum: söndag 19 april 2009 15.06
> 
> Thanks for sharing!
> It would be good if you (of Jeff from Zappos or anyone
> making changes to this) could put up a new patch for this
> most-voted-JIRA-issue.
> 
> 
> Thanks,
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
> > From: climbingrose 
> > To: solr-user@lucene.apache.org
> > Sent: Sunday, April 19, 2009 8:12:11 AM
> > Subject: Re: CollapseFilter with the latest Solr in
> trunk
> > 
> > Ok, here is how I fixed this problem:
> > 
> >   public DocListAndSet
> getDocListAndSet(Query query, ListfilterList,
> > DocSet docSet, Sort lsort, int offset, int len, int
> flags) throwsIOException {
> > 
> >     //DocListAndSet ret = new
> DocListAndSet();
> > 
> > 
>    //getDocListC(ret,query,filterList,docSet,lsort,offset,len,
> flags |=
> > GET_DOCSET);
> > 
> > DocSet theFilt = getDocSet(filterList);
> > 
> > if (docSet != null) theFilt = (theFilt != null) ?
> > theFilt.intersection(docSet) : docSet;
> > 
> >     QueryCommand qc = new
> QueryCommand();
> > 
> > 
>    qc.setQuery(query).setFilter(theFilt);
> > 
> > 
>    qc.setSort(lsort).setOffset(offset).setLen(len).setFlags(flags
> |=
> > GET_DOCSET);
> > 
> >     QueryResult result = new
> QueryResult();
> > 
> >     getDocListC(result,qc);
> > 
> > 
> > 
> >     return
> result.getDocListAndSet();
> > 
> >   }
> > 
> > 
> > There is also one-off error in CollapseFilter which
> you can find solution on
> > Jira.
> > 
> > Cheers,
> > Cuong
> > 
> > On Sat, Apr 18, 2009 at 4:41 AM, Jeff Newburn wrote:
> > 
> > > We are currently trying to do the same
> thing.  With the patch unaltered we
> > > can use fq as long as collapsing is turned
> on.  If we just send a normal
> > > document level query with an fq parameter it
> blows up.
> > >
> > > Additionally, it does not appear that the
> collapse.facet option works at
> > > all.
> > >
> > > --
> > > Jeff Newburn
> > > Software Engineer, Zappos.com
> > > jnewb...@zappos.com
> - 702-943-7562
> > >
> > >
> > > > From: climbingrose 
> > > > Reply-To: 
> > > > Date: Fri, 17 Apr 2009 16:53:00 +1000
> > > > To: solr-user 
> > > > Subject: CollapseFilter with the latest Solr
> in trunk
> > > >
> > > > Hi all,
> > > >
> > > > Have any one try to use CollapseFilter with
> the latest version of Solr in
> > > > trunk? However, it looks like Solr 1.4
> doesn't allow calling
> > > setFilterList()
> > > > and setFilter() on one instance of the
> QueryCommand. I modified the code
> > > in
> > > > QueryCommand to allow this:
> > > >
> > > >     public QueryCommand
> setFilterList(Query f) {
> > > > //      if( filter != null )
> {
> > > > //        throw new
> IllegalArgumentException( "Either filter or
> > > filterList
> > > > may be set in the QueryCommand, but not
> both." );
> > > > //      }
> > > >       filterList =
> null;
> > > >       if (f !=
> null) {
> > > >     
>    filterList = new ArrayList(2);
> > > >     
>    filterList.add(f);
> > > >       }
> > > >       return this;
> > > >     }
> > > >
> > > > However, I still have a problem which
> prevent query filters from working
> > > > when used in conjunction with
> CollapseFilter. In other words, query
> > > filters
> > > > doesn't seem to have any effects on the
> result set when CollapseFilter is
> > > > used.
> > > >
> > > > The other problem is related to OpenBitSet:
> > > >
> > > > java.lang.ArrayIndexOutOfBoundsException:
> 2183
> > > > at
> org.apache.lucene.util.OpenBitSet.fastSet(OpenBitSet.java:242)
> > > > at
> org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:202)
> > > >
> > > > at
> > > >
> > >
> > > 
> >
> org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:161>
> > > )
> > > > at
> > >
> org.apache.solr.search.CollapseFilter.(CollapseFilter.java:141)
> > > >
> > > > at
> > > >
> > >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:2
> > > > 17)
> > > > at
> > > >
> > >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandle
> > > > r.java:195)
> > > > at
> > > >
> > >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.ja
> > > > va:131)
> > > >
> > > > at
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
> > > > at
> > > >
> > >
> > > 
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303>
> > > )
> > > > at
> > > >
> > >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:23
> > > > 2)
> > > >
> > > > at
> > > >
> > >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFi
> > > > lterChain.java:202)
> > > > at
> > > >
> > >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChai
> > > > n.java:173)

DIH XML/update

2009-04-11 Thread Antonio Eggberg


Hi,

Wondering if there is a solution to this. I am using DIH to index an XML feed, 
I update the feed every 10 min and now I have an index of more then 10 million 
docs, I do DIH update/clean = false. Now more and more I update my indexing 
time is increasing and its coming to a point where I can see indexing time will 
be greater then the update time. 

Do you have any suggestion? I am thinking of creating a large index and then 
every X hours merge the small index with the large ones -- small index is the 
one which gets updated by the feed. any other thoughts? How are you folks doing 
it?

Regards
Antonio


  __
Låna pengar utan säkerhet. Jämför vilkor online hos Kelkoo.
http://www.kelkoo.se/c-100390123-lan-utan-sakerhet.html?partnerId=96915014

questions regarding custom search component??

2009-04-08 Thread Antonio Eggberg


Hi,

Currently I am using the DIH to create my index, all works fine. Now what I 
would like to do is to add some "additional filed/value" (which is not in my 
database that i am indexing) example adding categories from another file. I 
also have situation that this category might change over time and when updating 
the index, I will need to re-index the whole db which could take lot of time. 
Option as I understand 

- Use custom transformer (not sure what it means if re-index and category keeps 
changing)
- Post processing with some sort of search component.. basically my index will 
stay the same -- I am still trying to get my head around to this one, I am 
guessing "more like this" and "highlighter" is something I can look and learn. 

Question? 

Which option is and are there any specific code I should look for example?

Are there any real world example to see how it works?

Maybe there is other options? 

Regards
Antonio.


  __
Låna pengar utan säkerhet. Jämför vilkor online hos Kelkoo.
http://www.kelkoo.se/c-100390123-lan-utan-sakerhet.html?partnerId=96915014

Re: Multiple HttpDataSource

2009-03-02 Thread Antonio Eggberg





--- Den mån 2009-03-02 skrev Noble Paul നോബിള്‍  नोब्ळ् :

> Från: Noble Paul നോബിള്‍  नोब्ळ् 
> Ämne: Re: Multiple HttpDataSource
> Till: solr-user@lucene.apache.org, antonio_eggb...@yahoo.se
> Datum: måndag 2 mars 2009 09.39
> you do not need to setup multiple  entries
> unless you wish
> to have separate configuration params for each (say
> readTimeOut ,
> connTimeOut etc). Each entity will get a unique instance of
> the
> DataSource instance.


Paul, 

Thanks for your reply, I am still trying to get my head around DIH :) So only 
thing I need to change is the (entity name and entity url etc in general) for 
each RSS feeds? correct? 

Also, if my schema is rather "loose" i.e. using dynamic field I could 
experiment indexing wikipedia data plus 5 more other RSS fields .. correct?

Thanks.

> On Mon, Mar 2, 2009 at 2:01 PM, Antonio Eggberg
>  wrote:
> >
> > Hello all,
> >
> > It was not very clear to me If I can have more then 1
> HTTPDataSource in the config file also this means I will
> have more then 1
> >
> > 
> > 
> > 
> > 
> >
> > correct?
> >
> > I am trying to index multiple RSS feeds also are there
> any limits? Also is it possible to mix and match i.e. my
> internat database + external RSS feeds as data source?
> >
> > Thanks again for clarification.
> > Antonio
> >
> >
> >    
>  __
> > Går det långsamt? Skaffa dig en snabbare
> bredbandsuppkoppling.
> > Sök och jämför priser hos Kelkoo.
> >
> http://www.kelkoo.se/c-100015813-bredband.html?partnerId=96914325
> >
> 
> 
> 
> -- 
> --Noble Paul


  __
Går det långsamt? Skaffa dig en snabbare bredbandsuppkoppling. 
Sök och jämför priser hos Kelkoo.
http://www.kelkoo.se/c-100015813-bredband.html?partnerId=96914325

Multiple HttpDataSource

2009-03-02 Thread Antonio Eggberg


Hello all,

It was not very clear to me If I can have more then 1 HTTPDataSource in the 
config file also this means I will have more then 1 

 


 

correct? 

I am trying to index multiple RSS feeds also are there any limits? Also is it 
possible to mix and match i.e. my internat database + external RSS feeds as 
data source? 

Thanks again for clarification.
Antonio


  __
Går det långsamt? Skaffa dig en snabbare bredbandsuppkoppling. 
Sök och jämför priser hos Kelkoo.
http://www.kelkoo.se/c-100015813-bredband.html?partnerId=96914325

Re: exact field match

2009-01-26 Thread Antonio Zippo

it works...

thanks for your help

bye

Da: Erick Erickson 
A: solr-user@lucene.apache.org
Inviato: Lunedì 26 gennaio 2009, 20:29:17
Oggetto: Re: exact field match

You need to index and search using something like
KeywordAnalyzer. That analyzer does no tokenizing/
data transformation or such. For instance, it
doesn't fold case.

You will be unable to search for "bond" and get a hit
in this case, so one solution is to use two fields, and
search one or the other depending upon your needs.
e.g.
myField
myFieldTokenized

Each field gets a complete copy of the data, and you search
"myField" in the case you're describing and
myFieldTokenized when you want to match on "bond".

Of course, if you never want a hit on "bond", you don't need the
Tokenized field.

Best
Erick

On Mon, Jan 26, 2009 at 2:15 PM, Antonio Zippo  wrote:

> Hi all,
>
> i'm using a string field named "myField"
> and 2 documents containing:
> 1. myField="my name is james bond"
> 2. myField="james bond"
>
> if i use a query like this:
> myField:"james bond" it returns 2 documents
>
> how can i get only the second document using a string or text field? I need
> to search the document with the exact valuenor documents containing the
> exact phrase in value
>
> thanks in advance
>
>
>

exact field match

2009-01-26 Thread Antonio Zippo

Hi all,

i'm using a string field named "myField"
and 2 documents containing:
1. myField="my name is james bond"
2. myField="james bond"

if i use a query like this: 
myField:"james bond" it returns 2 documents

how can i get only the second document using a string or text field? I need to 
search the document with the exact valuenor documents containing the exact 
phrase in value

thanks in advance

Re: how large can the index be?

2008-12-29 Thread Antonio Eggberg

Thanks you very much for your answer.

I was afraid of that the each document has about 20 fields.. As you pointed out 
it will slow down. Anyway I am thinking is it not possible to do the following:

Load Balancer 
 |
Solr A, Solr B, ...
 |
  one index

So I send 50% query to Solr A, 50% to Solr B and so forth.. is this not good? 
Also to add The index will be like a mounted drive to the solr boxes... On the 
above do I really need to worry about Solr Master, Solr Slave? It probably 
solve my load but I think query speed will be slow...

Just curious anyone using distributed search in production?

Cheers



--- Den mån 2008-12-29 skrev Otis Gospodnetic :

> Från: Otis Gospodnetic 
> Ämne: Re: how large can the index be?
> Till: solr-user@lucene.apache.org
> Datum: måndag 29 december 2008 21.53
> Hi Antonio,
> 
> Besides thinking in terms of documents, you also need to
> think in terms of index size on the file system vs. the
> amount of RAM your search application/server can use.  50M
> documents may be doable on a single server if those
> documents are not too large and you have sufficient RAM.  It
> gets even better if your index doesn't change very often
> and if you can get decent hit ratios on the various Solr
> caches.
> 
> If you are indexing largish documents, or even something as
> small as an average web page, 50M docs may be too much on a
> "commodity box" (say dual core 8 GB RAM box)
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
> > From: Antonio Eggberg 
> > To: solr-user@lucene.apache.org
> > Sent: Monday, December 29, 2008 3:41:48 PM
> > Subject: how large can the index be?
> > 
> > Hi,
> > 
> > We are running successfully a solr index of 3 million
> docs. I have just been 
> > informed that our index size will increase to 50
> million. I been going through 
> > the doc 
> > 
> > http://wiki.apache.org/solr/DistributedSearch
> > 
> > Seems like we will loose out on the date facet and
> some more other stuff that we 
> > use. which is important to us. So far we been using 1
> index and 1 machine. 
> > 
> > Can I still stick with my 1 index but have many query
> servers? We don't update 
> > our index so often this are rather static data. Over
> the past year we have 
> > updated the index data a total of 3 times and about
> 300 records :)
> > 
> > Can someone provide some idea how/what should I do to
> deal with new datasets?.
> > 
> > Thanks for your help.
> > 
> > 
> >  
> __
> > Går det långsamt? Skaffa dig en snabbare
> bredbandsuppkoppling. 
> > Sök och jämför priser hos Kelkoo.
> >
> http://www.kelkoo.se/c-100015813-bredband.html?partnerId=96914325


  __
Låna pengar utan säkerhet. Jämför vilkor online hos Kelkoo.
http://www.kelkoo.se/c-100390123-lan-utan-sakerhet.html?partnerId=96915014

how large can the index be?

2008-12-29 Thread Antonio Eggberg

Hi,

We are running successfully a solr index of 3 million docs. I have just been 
informed that our index size will increase to 50 million. I been going through 
the doc 

http://wiki.apache.org/solr/DistributedSearch

Seems like we will loose out on the date facet and some more other stuff that 
we use. which is important to us. So far we been using 1 index and 1 machine. 

Can I still stick with my 1 index but have many query servers? We don't update 
our index so often this are rather static data. Over the past year we have 
updated the index data a total of 3 times and about 300 records :)

Can someone provide some idea how/what should I do to deal with new datasets?.

Thanks for your help.


  __
Går det långsamt? Skaffa dig en snabbare bredbandsuppkoppling. 
Sök och jämför priser hos Kelkoo.
http://www.kelkoo.se/c-100015813-bredband.html?partnerId=96914325

Re: TextField size limit

>
> No need to re-index with this change.
> But you will have to re-index any documents that got cut off of course.
> 
> -Yonik
> 

Ok, thanks...
I hoped to reindex the documents over the existent index (with incremental 
update...while solr is running) ...and without delete the index folder

But the important is to solve the problem ;-)

Thanks...
  Antonio

Re: TextField size limit

> 
> Check your solrconfig.xml:
> 
>  1
> 
> That's probably the truncating factor.  That's the maximum number of terms, 
> not bytes or characters.
> 
> Erik
> 


Thanks... I think it could be the problem.
i tried to count whitespace in a single text and it's over 55.000 ... but solr 
truncates to 10.000

do you know if I can change the value to 100.000 without recreate the index? 
(when I modify schema.xml I need to create the index again but with 
solrconfig.xml?)

Thanks,
  Antonio

TextField size limit

Hi all,

i have a TextField containing over 400k of text

when i try to search a word solr doesn't return any result but if I search 
for a single document, I can see that the word exists there

So I suppose that solr has a textfield size limit (the field is indexed 
using a tokenizer and some filters)

Could anyone help me to undestand the problem? and if is it possible to solve?

Thanks in advance,
  Antonio

Re: Multi tokenizer