Re: Sorting performance

2008-10-18 Thread christophe
It is slow each time I run it. (I test it from the Solr admin console or 
from a Java program using the HTTP client.)

I do not get the OOM each time.

Thx
Christophe

Otis Gospodnetic wrote:

Is the sorted query slow only the first time or every time you run it?

You got an OOM?  What -Xmx value are you using?  Try increasing it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message -----

From: christophe [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, October 17, 2008 1:28:52 PM
Subject: Sorting performance 


Hi,

I'm doing some tests with Solr 1.3.
I have loaded around 7M documents, each with a few stored and indexed 
fields.


This query: text:sometext returns the results, sorted by score, in a few 
milliseconds. (I display 10 out of 8747 matched documents.)
This one: text:sometext;id desc   takes something like 60s or more to 
return the data (when it doesn't fail with an out of memory error). (id 
is a string type.)

I have tried displaying only id, with the same results.

Any ideas? I'm sure I'm doing something wrong.

My schema is based on the sample, with the following fields:

  
/ 
  
  
  
  
  
multiValued=true /
  
default=NOW multiValued=false/
  



Thanks
Christophe



  




Re: Sorting performance

2008-10-18 Thread christophe

Here are the memory parameters I'm using now (Tomcat): -Xms2024m -Xmx2024m
With those values, the second run of the query is much faster. Only the 
first run is very slow.

Thanks for the tip.
However, I'm wondering whether this will be enough, or whether I will hit 
the same issues when many users are searching at the same time. I will run 
a stress test to check this.


Thanks
Christophe




  






solr 1.3 multi language?

2008-10-18 Thread sunnyfr

Hi everybody,

I would like some help with managing multiple languages in Solr 1.3;
a working example would be excellent.
I have put several languages into one index in a single core, and I would
like to know what you think of the way I've set it up. Are there other
parameters that I don't know about? Any help or an example would be greatly
appreciated.

Thanks a lot,

I need to manage these languages:
French (FR)
English (EN)
German (DE)
Spanish (ES)
Russian (RU)
Portuguese (Brazilian) (PT)
Polish (PO)
Dutch (NL)
Greek (GR)
Japanese (JA)
Turkish (TR)

My schema looks like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- languages -->

<fieldtype name="text_fr" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" />
  </analyzer>
</fieldtype>

<fieldtype name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldtype>

<fieldtype name="text_de" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German" />
  </analyzer>
</fieldtype>

<fieldtype name="text_es" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
  </analyzer>
</fieldtype>

<fieldType name="text_ru" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.ru.RussianAnalyzer"/>
  <filter class="solr.SnowballPorterFilterFactory" language="Russian" />
</fieldType>

<fieldtype name="text_pt" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Portuguese" />
  </analyzer>
</fieldtype>

<fieldtype name="text_it" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian" />
  </analyzer>
</fieldtype>
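
For illustration only: one common way to put per-language field types like these to use is to declare one field per language and send each document's text to the field that matches its language. The field names below (`title_fr`, `title_en`, `title_de`), the dynamic-field patterns, and the `text_nl` type are hypothetical and not taken from the schema above; treat this as a sketch rather than a complete schema.

```xml
<!-- Hypothetical fields, one per language, each bound to the matching field type -->
<field name="title_fr" type="text_fr" indexed="true" stored="true"/>
<field name="title_en" type="text_en" indexed="true" stored="true"/>
<field name="title_de" type="text_de" indexed="true" stored="true"/>

<!-- Alternative: dynamic fields, so any field name ending in a language
     suffix picks up that language's analyzer -->
<dynamicField name="*_es" type="text_es" indexed="true" stored="true"/>
<dynamicField name="*_pt" type="text_pt" indexed="true" stored="true"/>

<!-- Dutch is in the language list but has no type yet; this assumed type
     follows the same pattern with the Snowball "Dutch" stemmer -->
<fieldtype name="text_nl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch" />
  </analyzer>
</fieldtype>
```

With a layout like this, a query would target the field for the user's language, for example q=title_fr:maison.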






Re: Tree Faceting Component

2008-10-18 Thread Erik Hatcher

Jeremy,

Great troubleshooting!  You were spot on.

I've posted a new patch that fixes the issue.

Erik


On Oct 16, 2008, at 9:53 PM, Jeremy Hinegardner wrote:

After a bit more investigating, it appears that any facet tree where the first
item is numerical or boolean or some non-textual type does not produce any
secondary facets.  This includes sint, sfloat, boolean and such.

For instance, on the sample index:

 facet.tree=sku,cat => works
 facet.tree=cat,sku => works
 facet.tree=manu_exact,cat => works
 facet.tree=cat,manu_exact => works
 facet.tree=popularity,inStock => fails
 facet.tree=inStock,popularity => fails
 facet.tree=manu_exact,weight => works
 facet.tree=weight,manu_exact => fails

I'm not very familiar with the Solr / Lucene Java API, so this is slow going
here.  Maybe I'm barking up the wrong tree, but is the TermQuery for the
secondary SimpleFacet messing up somehow?  I tried to dig into the code, but
was unsuccessful.

It appears to me that the searcher never returns a docSet for any TermQuery
where the field being searched has a type that is non-textual.

As a final test, I changed the schema and made the inStock field a 'text' field
instead of 'boolean'.  When I did that and reindexed the sample data, the
tree facet worked correctly as either facet.tree=cat,inStock or
facet.tree=inStock,cat, whereas before it would only work in the former.
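
The schema change described in this final test would look roughly like the following in the example schema.xml. The `indexed` and `stored` attribute values are assumed from the stock Solr 1.3 example schema and are not quoted from this thread.

```xml
<!-- Before (assumed stock definition): facet.tree=inStock,cat returns all zeros -->
<!-- <field name="inStock" type="boolean" indexed="true" stored="true"/> -->

<!-- After: the same field declared with the 'text' type.
     The sample data must be reindexed for the change to take effect. -->
<field name="inStock" type="text" indexed="true" stored="true"/>
```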


enjoy,

-jeremy

On Thu, Oct 16, 2008 at 10:55:49AM -0600, Jeremy Hinegardner wrote:

Erik,

After some more experiments, I can get it to perform incorrectly using the
sample solr data.

The example query from SOLR-792 ticket:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=cat,inStock&wt=json&indent=on

Make a few alterations to the query:

1) swap the tree order - all tree facets are 0

http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=inStock,cat&wt=json&indent=on

2) swap tree order and change facet.field to be the primary (inStock)

http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=inStock&facet.tree=inStock,cat&wt=json&indent=on

Also, can tree faceting work distributed?

enjoy,

-jeremy

On Wed, Oct 15, 2008 at 05:41:21PM -0700, Erik Hatcher wrote:

Jeremy,

What's the full request you're making to Solr?

Do you get values when you facet normally on date_id and type?
facet.field=date_id&facet.field=type

Erik

p.s. this e-mail is not on the list (on a hotel net connection blocking
outgoing mail) - feel free to reply to this back on the list though.

On Oct 15, 2008, at 5:29 PM, Jeremy Hinegardner wrote:


Hi all,

I'm testing out using the Tree Faceting Component (SOLR-792) on top of
Solr 1.3.

It looks like it would do exactly what I want, but something is not working
correctly with my schema.  When I use the example schema, it works just fine,
but when I swap out the example schema and example index and put in my own
index and schema, tree facet does not work.

Both of the fields I want to facet can be faceted individually, but when I say
facet.tree=date_id,type then all of the values are 0.

Does anyone have any ideas on where I should start looking?

enjoy,

-jeremy

--
Jeremy Hinegardner  [EMAIL PROTECTED]








Re: Tree Faceting Component

2008-10-18 Thread Jeremy Hinegardner
Erik,

Thanks, it's working great.  Next is to make it distributed.  I was thinking of
working on this; is the FacetComponent a good model to work from to make the
TreeFacet distributed?  I should probably join solr-dev for that conversation, I
assume :-).

-jeremy

On Thu, Oct 16, 2008 at 11:12:45PM -0700, Erik Hatcher wrote:
 Jeremy,

 Great troubleshooting!  You were spot on.

 I've posted a new patch that fixes the issue.

   Erik



-- 

 Jeremy Hinegardner  [EMAIL PROTECTED] 



Re: Sorting performance

2008-10-18 Thread Mark Miller
You need to set up a warming query that sorts, so that the initial long 
query is done behind the scenes. Users' first query will then be fast. 
This is configured in solrconfig.xml.


- Mark
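
A minimal sketch of such a warming query in solrconfig.xml, assuming the query and sort field from earlier in this thread (`text:sometext`, `id desc`); the `rows` value is an arbitrary choice.

```xml
<!-- Goes inside the <query> section of solrconfig.xml. -->

<!-- Runs each time a new searcher is opened (for example after a commit),
     before that searcher starts serving requests. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">text:sometext</str>
      <str name="sort">id desc</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

<!-- Same warming query for the very first searcher created at startup. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">text:sometext</str>
      <str name="sort">id desc</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

Sorting on id during warming forces the sort data for that field to be loaded, so the cost is paid while the searcher warms up rather than by the first user request.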


On Oct 18, 2008, at 1:34 AM, christophe [EMAIL PROTECTED] wrote:

Here are the memory parameters I'm using now (Tomcat): -Xms2024m -Xmx2024m
With those values, the second run of the query is much faster. Only the 
first run is very slow.

Thanks for the tip.
However, I'm wondering whether this will be enough, or whether I will hit 
the same issues when many users are searching at the same time. I will run 
a stress test to check this.


Thanks
Christophe
