Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-08-01 Thread Yonik Seeley
On Fri, Jul 31, 2009 at 3:19 PM, Yao Ge wrote:
> Having a large number of fields is not the same as having a large number of
> facets. Facets are something you would display to users as an aid for query
> refinement or navigation. There is no way for a user to use 3700 facets at
> the same time.

Indeed... it may just be a terminology issue.  Likely it's one field
with 3700 possible values.

-Yonik
http://www.lucidimagination.com


Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Yao Ge

Having a large number of fields is not the same as having a large number of
facets. Facets are something you would display to users as an aid for query
refinement or navigation. There is no way for a user to use 3700 facets at
the same time. So it is more a question of how to determine which facets to
fetch at search time, based on the user's actions or on certain predefined
configurations.

I have written an application with 30-some facetable fields on millions of
records. I also ran into trouble calculating all the facets, as the server
resources are limited by the number of caches available and the CPU cycles
available for facet calculations. I then realized: why display all these
facets regardless of whether the user wants to see them? I changed the
approach to fetch only a minimum set of facets by default and to make the
rest of the facet fields open on demand (using AJAX). I was able to
dramatically improve the response time by spreading the facet loading over
time.

There are still issues with the total facet caches when you have a large
number of available facets, but you need to realistically evaluate what it
means to a user to have a large number of facets. I don't think that on a
typical user interface, showing more than 10 filters at the same time is any
more effective than starting with a small number of filters and
progressively showing more on demand (hierarchical facets?).
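
A minimal sketch of the on-demand approach described above, assuming a Python
client that only builds the query strings. The field names and both helper
names are hypothetical; q, rows, fq, facet, facet.field and facet.mincount
are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical small set of facets shown on every result page.
DEFAULT_FACETS = ["category_s", "manufacturer_s", "material_s"]


def initial_search_params(query):
    """Query string for the first page load: results plus default facets."""
    params = [
        ("q", query),
        ("rows", "10"),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    params += [("facet.field", name) for name in DEFAULT_FACETS]
    return urlencode(params)


def on_demand_facet_params(query, field, filters=()):
    """Query string for one extra facet, requested when the user opens it.

    rows=0 skips the document list, so only the facet counts are returned.
    The filters are the fq clauses already applied to the main search.
    """
    params = [
        ("q", query),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.mincount", "1"),
        ("facet.field", field),
    ]
    params += [("fq", clause) for clause in filters]
    return urlencode(params)


first_page = initial_search_params("bolt")
extra_facet = on_demand_facet_params("bolt", "diameter_s", ["material_s:steel"])
```

The AJAX call would send the second query string to the same select handler
and render only the returned counts for that one field.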


Rahul R wrote:
> 
> Hello,
> We are trying to get Solr to work for a really huge parts database.
> Details of the database:
> - 55 million parts
> - A total of 3700 properties (facets). But each record will not have a
> value for all properties.
> - Most of these facets are defined as dynamic fields within the Solr index
> 
> We were getting really unacceptable timings while doing faceting/searches
> on an index created with this database. With only one user using the
> system, query times are in excess of 1 minute. With more users concurrently
> using the system, the response times are even higher.
> 
> We thought that by limiting the number of properties that are available
> for faceting, the performance could be improved. To test this, we enabled
> only 6 properties for faceting by setting indexed=true (in schema.xml) for
> only these properties. All other properties, which are defined as dynamic
> properties, had indexed=false. The observations after this change:
> 
> - Index size reduced by a meagre 5% only
> - Performance did not improve. In fact, during the PSR run we observed
> that it degraded.
> 
> My questions:
> - Will reducing the number of facets improve faceting and search
> performance?
> - Is there a better way to reduce the number of facets?
> - Will having a large number of properties defined as dynamic fields
> reduce performance?
> 
> Thank you.
> 
> Regards
> Rahul
> 
> 




Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Rahul R
We are using 1.3.0. Thanks for the suggestion. I will see if I can try one of
the nightly builds.

On Fri, Jul 31, 2009 at 7:49 PM, Erik Hatcher wrote:

> What version of Solr?   Try a nightly build if you're at Solr 1.3 or
> earlier and you'll be amazed at the difference.
>
> Erik
>
> On Jul 31, 2009, at 10:00 AM, Rahul R wrote:
>
>> In a production environment, having the caches enabled makes a lot of
>> sense. And most definitely we will be enabling them. However, the primary
>> idea of this exercise is to verify if limiting the number of facets will
>> actually improve the performance.
>>
>> An update on this. I did verify, and it looks like although I set
>> indexed=false for most of the properties, I had not blocked them from
>> participating in the query. I have now enabled only 7 properties for
>> faceting, so at any given time a maximum of 7 facets will participate in
>> the query. Performance has improved from the earlier 60 seconds to around
>> 10 seconds.
>>
>> This really helped. Thanks a lot !
>>
>> Regards
>> Rahul
>>
>> On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher wrote:
>>
>>> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>>>
>>>> Erik,
>>>> I understand that caching is going to improve performance. In fact we
>>>> did a PSR run with caches enabled and we got awesome results. But these
>>>> wouldn't be really representative because the PSR scripts will be doing
>>>> the same searches again and again. These would be cached and there
>>>> would be virtually no evictions. This is not a practical case.
>>>
>>> I don't understand how this is not practical.  Why wouldn't having the
>>> caches warmed and filled with the facets be practical for your needs?
>>>
>>> Erik


Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Erik Hatcher
What version of Solr?   Try a nightly build if you're at Solr 1.3 or
earlier and you'll be amazed at the difference.

Erik

On Jul 31, 2009, at 10:00 AM, Rahul R wrote:

> In a production environment, having the caches enabled makes a lot of
> sense. And most definitely we will be enabling them. However, the primary
> idea of this exercise is to verify if limiting the number of facets will
> actually improve the performance.
>
> An update on this. I did verify, and it looks like although I set
> indexed=false for most of the properties, I had not blocked them from
> participating in the query. I have now enabled only 7 properties for
> faceting, so at any given time a maximum of 7 facets will participate in
> the query. Performance has improved from the earlier 60 seconds to around
> 10 seconds.
>
> This really helped. Thanks a lot !
>
> Regards
> Rahul
>
> On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher wrote:
>
>> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>>
>>> Erik,
>>> I understand that caching is going to improve performance. In fact we
>>> did a PSR run with caches enabled and we got awesome results. But these
>>> wouldn't be really representative because the PSR scripts will be doing
>>> the same searches again and again. These would be cached and there
>>> would be virtually no evictions. This is not a practical case.
>>
>> I don't understand how this is not practical.  Why wouldn't having the
>> caches warmed and filled with the facets be practical for your needs?
>>
>> Erik






Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Rahul R
In a production environment, having the caches enabled makes a lot of sense.
And most definitely we will be enabling them. However, the primary idea of
this exercise is to verify if limiting the number of facets will actually
improve the performance.

An update on this. I did verify, and it looks like although I set
indexed=false for most of the properties, I had not blocked them from
participating in the query. I have now enabled only 7 properties for
faceting, so at any given time a maximum of 7 facets will participate in the
query. Performance has improved from the earlier 60 seconds to around 10
seconds.
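
One way to keep the other properties out of the query is to filter the
requested facet fields in the application before the request is built. A
sketch in Python, not the code used here: the field names and the
select_facets helper are hypothetical, and the limit of 7 is simply the
figure mentioned above.

```python
# Hypothetical allow-list: the only properties enabled for faceting.
ALLOWED_FACETS = (
    "category_s",
    "manufacturer_s",
    "material_s",
    "diameter_s",
    "length_s",
    "finish_s",
    "voltage_s",
)
MAX_FACETS = 7


def select_facets(requested):
    """Return the requested facet fields that may be sent to Solr.

    Unknown fields and duplicates are dropped, the caller's order is kept,
    and the result is capped at MAX_FACETS entries.
    """
    allowed = set(ALLOWED_FACETS)
    chosen = []
    for name in requested:
        if name in allowed and name not in chosen:
            chosen.append(name)
    return chosen[:MAX_FACETS]


facets = select_facets(["category_s", "weight_s", "category_s", "material_s"])
```

Each name in the returned list would then become one facet.field parameter
on the Solr request.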

This really helped. Thanks a lot !

Regards
Rahul

On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher wrote:

>
> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>
>> Erik,
>> I understand that caching is going to improve performance. In fact we
>> did a PSR run with caches enabled and we got awesome results. But these
>> wouldn't be really representative because the PSR scripts will be doing
>> the same searches again and again. These would be cached and there would
>> be virtually no evictions. This is not a practical case.
>
> I don't understand how this is not practical.  Why wouldn't having the
> caches warmed and filled with the facets be practical for your needs?
>
> Erik
>
>


Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Erik Hatcher


On Jul 31, 2009, at 7:17 AM, Rahul R wrote:


> Erik,
> I understand that caching is going to improve performance. In fact we
> did a PSR run with caches enabled and we got awesome results. But these
> wouldn't be really representative because the PSR scripts will be doing
> the same searches again and again. These would be cached and there would
> be virtually no evictions. This is not a practical case.


I don't understand how this is not practical.  Why wouldn't having the  
caches warmed and filled with the facets be practical for your needs?


Erik



Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Rahul R
Erik,
I understand that caching is going to improve performance. In fact we did a
PSR run with caches enabled and we got awesome results. But these wouldn't
be really representative because the PSR scripts will be doing the same
searches again and again. These would be cached and there would be virtually
no evictions. This is not a practical case.

My hardware (in the PSR environment where I am testing) is pretty good: 12
CPUs, 24 GB RAM, UltraSPARC III 1.2 GHz processors, Solaris 10. We have
allocated 3.2 GB RAM for WebLogic (JVM). This is the maximum that I am able
to allocate for one JVM.
I think I need to go back and check that I am not still using all the fields
in the query. I understand that setting indexed=false alone will not keep
the fields from participating in the query.

Thanks a lot for your response.

Regards
Rahul
On Fri, Jul 31, 2009 at 3:33 PM, Erik Hatcher wrote:

>
> On Jul 31, 2009, at 2:35 AM, Rahul R wrote:
>
>> Hello,
>> We are trying to get Solr to work for a really huge parts database.
>> Details of the database:
>> - 55 million parts
>> - A total of 3700 properties (facets). But each record will not have a
>> value for all properties.
>> - Most of these facets are defined as dynamic fields within the Solr index
>>
>> We were getting really unacceptable timings while doing faceting/searches
>> on an index created with this database.
>
> Were you accounting for cache warming?  Were your caches sized
> appropriately?  What kind of hardware and RAM were you using?  What were the
> JVM settings?
>
> And certainly not least important - what version of Solr are you running?
> The difference in faceting performance and scalability between Solr 1.3 and
> what will be Solr 1.4 is quite dramatic.
>
>> We thought that by limiting the number of properties that are available
>> for faceting, the performance could be improved. To test this, we enabled
>> only 6 properties for faceting by setting indexed=true (in schema.xml) for
>> only these properties. All other properties, which are defined as dynamic
>> properties, had indexed=false.
>
> These settings won't matter - what matters in this case is what facets you
> request, not what is actually in the index.
>
>> My questions:
>> - Will reducing the number of facets improve faceting and search
>> performance?
>
> Reducing what fields you request will, of course.  But what you actually
> index has no effect on performance until you request it.
>
>> - Is there a better way to reduce the number of facets?
>
> Hard to say without doing a deeper analysis of your needs.
>
>> - Will having a large number of properties defined as dynamic fields
>> reduce performance?
>
> Dynamic fields versus statically named fields have no effect on
> performance.
>
> Erik
>
>


Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Erik Hatcher


On Jul 31, 2009, at 2:35 AM, Rahul R wrote:


> Hello,
> We are trying to get Solr to work for a really huge parts database.
> Details of the database:
> - 55 million parts
> - A total of 3700 properties (facets). But each record will not have a
> value for all properties.
> - Most of these facets are defined as dynamic fields within the Solr index
>
> We were getting really unacceptable timings while doing faceting/searches
> on an index created with this database.

Were you accounting for cache warming?  Were your caches sized
appropriately?  What kind of hardware and RAM were you using?  What
were the JVM settings?

And certainly not least important - what version of Solr are you
running?   The difference in faceting performance and scalability
between Solr 1.3 and what will be Solr 1.4 is quite dramatic.

> We thought that by limiting the number of properties that are available
> for faceting, the performance could be improved. To test this, we enabled
> only 6 properties for faceting by setting indexed=true (in schema.xml) for
> only these properties. All other properties, which are defined as dynamic
> properties, had indexed=false.

These settings won't matter - what matters in this case is what facets
you request, not what is actually in the index.

> My questions:
> - Will reducing the number of facets improve faceting and search
> performance?

Reducing what fields you request will, of course.  But what you
actually index has no effect on performance until you request it.
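
To illustrate the point about requested fields: the facets Solr computes
for a query are the ones named in facet.field parameters on the request
itself. A minimal Python sketch that builds such a request URL; the
host, port, path and field names are assumptions for illustration, while
facet, facet.field, facet.limit and facet.mincount are standard Solr
parameters.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; adjust for a real install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(query, facet_fields, facet_limit=10):
    """Build a select URL that facets only on the fields passed in."""
    params = [
        ("q", query),
        ("rows", "10"),
        ("facet", "true"),
        ("facet.limit", str(facet_limit)),
        ("facet.mincount", "1"),
    ]
    # One facet.field parameter per requested field, nothing else.
    params.extend(("facet.field", name) for name in facet_fields)
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_facet_query("bolt", ["material_s", "diameter_s"])
```

With two fields passed in, the URL carries exactly two facet.field
parameters, however many fields the schema defines.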



> - Is there a better way to reduce the number of facets?

Hard to say without doing a deeper analysis of your needs.

> - Will having a large number of properties defined as dynamic fields
> reduce performance?

Dynamic fields versus statically named fields have no effect on
performance.


Erik



Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-30 Thread Rahul R
Hello,
We are trying to get Solr to work for a really huge parts database. Details
of the database:
- 55 million parts
- A total of 3700 properties (facets). But each record will not have a value
for all properties.
- Most of these facets are defined as dynamic fields within the Solr index

We were getting really unacceptable timings while doing faceting/searches on
an index created with this database. With only one user using the system,
query times are in excess of 1 minute. With more users concurrently using
the system, the response times are even higher.

We thought that by limiting the number of properties that are available for
faceting, the performance could be improved. To test this, we enabled only 6
properties for faceting by setting indexed=true (in schema.xml) for only
these properties. All other properties, which are defined as dynamic
properties, had indexed=false. The observations after this change:

- Index size reduced by a meagre 5% only
- Performance did not improve. In fact, during the PSR run we observed that
it degraded.

My questions:
- Will reducing the number of facets improve faceting and search
performance?
- Is there a better way to reduce the number of facets?
- Will having a large number of properties defined as dynamic fields reduce
performance?

Thank you.

Regards
Rahul