Re: Limiting facets for huge data - setting indexed=false in schema.xml
On Fri, Jul 31, 2009 at 3:19 PM, Yao Ge wrote:
> Having a large number of fields is not the same as having a large number of
> facets. Facets are something you would display to users as an aid for query
> refinement or navigation. There is no way for a user to use 3700 facets at
> the same time.

Indeed... it may just be a terminology issue. Likely it's one field with 3700 possible values.

-Yonik
http://www.lucidimagination.com
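If it really is one field with ~3700 distinct values, the request itself can cap how many values come back. A minimal sketch (Python, only builds the URL without executing it; the Solr URL and field name `property_name` are illustrative assumptions, not from this thread):

```python
from urllib.parse import urlencode

# Cap a high-cardinality facet: ask for only the top 20 values with at
# least one matching document, instead of all ~3700 values.
params = urlencode([
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "property_name"),  # hypothetical field name
    ("facet.limit", 20),
    ("facet.mincount", 1),
])
url = "http://localhost:8983/solr/select?" + params  # assumed local Solr
print(url)
```

`facet.limit` and `facet.mincount` are per-request parameters, so the index itself does not have to change to try this.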
Re: Limiting facets for huge data - setting indexed=false in schema.xml
Having a large number of fields is not the same as having a large number of facets. Facets are something you would display to users as an aid for query refinement or navigation. There is no way for a user to use 3700 facets at the same time. So it is more a question of how to determine which facets to fetch at search time, based on the user's actions or on certain predefined configurations.

I have written an application with some 30 facetable fields on millions of records. I also ran into the problem of calculating all facets at once, since server resources limit the number of caches available and the CPU cycles available for facet calculations. I then realized: why display all these facets regardless of whether the user wants to see them? I changed the approach to fetch only a minimal set of facets by default and make the rest of the facet fields open on demand (using AJAX). I was able to dramatically improve response time by spreading the facet loading over time. There are still issues with the total facet caches when you have a large number of available facets, but you need to realistically evaluate what a large number of facets means to a user. I don't think a typical user interface showing more than 10 filters at the same time will be any more effective than starting with a small number of filters and progressively showing more on demand (hierarchical facets?).

Rahul R wrote:
> Hello,
> We are trying to get Solr to work for a really huge parts database. Details of the database:
> - 55 million parts
> - 3700 total properties (facets). But each record will not have a value for all properties.
> - Most of these facets are defined as dynamic fields within the Solr index
>
> We were getting really unacceptable timing while doing faceting/searches on an index created with this database. With only one user on the system, query times are in excess of 1 minute. With more users concurrently using the system, the response times are even higher.
>
> We thought that by limiting the number of properties that are available for faceting, the performance could be improved. To test this, we enabled only 6 properties for faceting by setting indexed=true (in schema.xml) for only these properties. All other properties, which are defined as dynamic properties, had indexed=false. The observations after this change:
>
> - Index size reduced by a meagre 5% only
> - Performance did not improve. In fact, during the PSR run we observed that it degraded.
>
> My questions:
> - Will reducing the number of facets improve faceting and search performance?
> - Is there a better way to reduce the number of facets?
> - Will having a large number of properties defined as dynamic fields reduce performance?
>
> Thank you.
>
> Regards
> Rahul

--
View this message in context: http://www.nabble.com/Limiting-facets-for-huge-data---setting-indexed%3Dfalse-in-schema.xml-tp24751763p24761778.html
Sent from the Solr - User mailing list archive at Nabble.com.
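The default-plus-on-demand approach described above can be sketched as follows (Python; the `facet_query` helper, field names, and Solr URL are illustrative assumptions, not code from this thread, and the URLs are only built, not executed):

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr/select"  # assumed local Solr instance

def facet_query(q, facet_fields, rows=10):
    """Build a Solr select URL that requests facet counts only for the
    fields the UI actually needs right now."""
    params = [("q", q), ("rows", rows), ("facet", "true"),
              ("facet.mincount", 1), ("wt", "json")]
    params += [("facet.field", f) for f in facet_fields]
    return SOLR_URL + "?" + urlencode(params)

# Initial page load: only a minimal default set of facets.
default_url = facet_query("connector", ["manufacturer", "category"])

# Later, an AJAX handler fetches one more facet on demand; rows=0 because
# the matching documents were already rendered on the first request.
on_demand_url = facet_query("connector", ["voltage_rating"], rows=0)
```

Because each `facet.field` parameter adds work per request, deferring the rarely-opened facets to separate AJAX calls spreads that cost over time exactly as described above.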
Re: Limiting facets for huge data - setting indexed=false in schema.xml
We are using 1.3.0. Thanks for the suggestion. Will see if I can try one of the nightly builds.

On Fri, Jul 31, 2009 at 7:49 PM, Erik Hatcher wrote:
> What version of Solr? Try a nightly build if you're at Solr 1.3 or earlier and you'll be amazed at the difference.
>
> Erik
>
> On Jul 31, 2009, at 10:00 AM, Rahul R wrote:
>
>> In a production environment, having the caches enabled makes a lot of sense, and most definitely we will be enabling them. However, the primary idea of this exercise is to verify whether limiting the number of facets will actually improve performance.
>>
>> An update on this. I did verify, and it looks like although I set indexed=false for most of the properties, I had not blocked them from participating in the query. I have now enabled only 7 properties for faceting, so at any given time a maximum of 7 facets will participate in the query. Performance has improved from around 60 seconds to around 10 seconds.
>>
>> This really helped. Thanks a lot!
>>
>> Regards
>> Rahul
>>
>> On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher wrote:
>>
>>> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>>>
>>>> Erik,
>>>> I understand that caching is going to improve performance. In fact we did a PSR run with caches enabled and we got awesome results. But these wouldn't be really representative, because the PSR scripts will be doing the same searches again and again. These would be cached and there would be virtually no evictions. This is not a practical case.
>>>
>>> I don't understand how this is not practical. Why wouldn't having the caches warmed and filled with the facets be practical for your needs?
>>>
>>> Erik
Re: Limiting facets for huge data - setting indexed=false in schema.xml
What version of Solr? Try a nightly build if you're at Solr 1.3 or earlier and you'll be amazed at the difference.

Erik

On Jul 31, 2009, at 10:00 AM, Rahul R wrote:
> In a production environment, having the caches enabled makes a lot of sense, and most definitely we will be enabling them. However, the primary idea of this exercise is to verify whether limiting the number of facets will actually improve performance.
>
> An update on this. I did verify, and it looks like although I set indexed=false for most of the properties, I had not blocked them from participating in the query. I have now enabled only 7 properties for faceting, so at any given time a maximum of 7 facets will participate in the query. Performance has improved from around 60 seconds to around 10 seconds.
>
> This really helped. Thanks a lot!
>
> Regards
> Rahul
>
> On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher wrote:
>
>> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>>
>>> Erik,
>>> I understand that caching is going to improve performance. In fact we did a PSR run with caches enabled and we got awesome results. But these wouldn't be really representative, because the PSR scripts will be doing the same searches again and again. These would be cached and there would be virtually no evictions. This is not a practical case.
>>
>> I don't understand how this is not practical. Why wouldn't having the caches warmed and filled with the facets be practical for your needs?
>>
>> Erik
Re: Limiting facets for huge data - setting indexed=false in schema.xml
In a production environment, having the caches enabled makes a lot of sense, and most definitely we will be enabling them. However, the primary idea of this exercise is to verify whether limiting the number of facets will actually improve performance.

An update on this. I did verify, and it looks like although I set indexed=false for most of the properties, I had not blocked them from participating in the query. I have now enabled only 7 properties for faceting, so at any given time a maximum of 7 facets will participate in the query. Performance has improved from around 60 seconds to around 10 seconds.

This really helped. Thanks a lot!

Regards
Rahul

On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher wrote:
>
> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>
>> Erik,
>> I understand that caching is going to improve performance. In fact we did a PSR run with caches enabled and we got awesome results. But these wouldn't be really representative, because the PSR scripts will be doing the same searches again and again. These would be cached and there would be virtually no evictions. This is not a practical case.
>
> I don't understand how this is not practical. Why wouldn't having the caches warmed and filled with the facets be practical for your needs?
>
> Erik
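The fix described above, blocking all but a handful of fields from ever being requested as facets, can be sketched as a small server-side guard (Python; the field names and the `restrict_facets` helper are made up for illustration, not from this thread):

```python
# Only a small whitelist of fields is ever passed to Solr as facet.field,
# regardless of what the client asks for. Seven hypothetical field names,
# mirroring the "7 properties" mentioned above.
ALLOWED_FACETS = {"manufacturer", "category", "package", "voltage_rating",
                  "tolerance", "lead_free", "temperature_grade"}

def restrict_facets(requested):
    """Drop any requested facet field not in the whitelist, preserving
    the order of the client's request."""
    return [f for f in requested if f in ALLOWED_FACETS]
```

Filtering the request is what actually changes the query cost; the schema's indexed setting alone does not stop a field from being faceted on if it is still named in `facet.field`.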
Re: Limiting facets for huge data - setting indexed=false in schema.xml
On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
> Erik,
> I understand that caching is going to improve performance. In fact we did a PSR run with caches enabled and we got awesome results. But these wouldn't be really representative, because the PSR scripts will be doing the same searches again and again. These would be cached and there would be virtually no evictions. This is not a practical case.

I don't understand how this is not practical. Why wouldn't having the caches warmed and filled with the facets be practical for your needs?

Erik
Re: Limiting facets for huge data - setting indexed=false in schema.xml
Erik,
I understand that caching is going to improve performance. In fact we did a PSR run with caches enabled and we got awesome results. But these wouldn't be really representative, because the PSR scripts will be doing the same searches again and again. These would be cached and there would be virtually no evictions. This is not a practical case.

My hardware (in the PSR environment where I am testing) is pretty good - 12 CPUs, 24 GB RAM, UltraSPARC III 1.2 GHz processors, Solaris 10. We have allocated 3.2 GB RAM for WebLogic (JVM). This is the maximum that I am able to allocate for one JVM.

I think I need to go back and check whether I am still using all the fields in the query. I understand that setting indexed=false alone will not ensure that those fields don't participate in the query.

Thanks a lot for your response.

Regards
Rahul

On Fri, Jul 31, 2009 at 3:33 PM, Erik Hatcher wrote:
>
> On Jul 31, 2009, at 2:35 AM, Rahul R wrote:
>
>> Hello,
>> We are trying to get Solr to work for a really huge parts database. Details of the database:
>> - 55 million parts
>> - 3700 total properties (facets). But each record will not have a value for all properties.
>> - Most of these facets are defined as dynamic fields within the Solr index
>>
>> We were getting really unacceptable timing while doing faceting/searches on an index created with this database.
>
> Were you accounting for cache warming? Were your caches sized appropriately? What kind of hardware and RAM were you using? What were the JVM settings?
>
> And certainly not least important - what version of Solr are you running? The difference in faceting performance and scalability between Solr 1.3 and what will be Solr 1.4 is quite dramatic.
>
>> We thought that by limiting the number of properties that are available for faceting, the performance could be improved. To test this, we enabled only 6 properties for faceting by setting indexed=true (in schema.xml) for only these properties. All other properties, which are defined as dynamic properties, had indexed=false.
>
> These settings won't matter - what matters in this case is what facets you request, not what is actually in the index.
>
>> My questions:
>> - Will reducing the number of facets improve faceting and search performance?
>
> Reducing what fields you request will, of course. But what you actually index has no effect on performance until you request it.
>
>> - Is there a better way to reduce the number of facets?
>
> Hard to say without doing a deeper analysis of your needs.
>
>> - Will having a large number of properties defined as dynamic fields reduce performance?
>
> Dynamic fields versus statically named fields have no effect on performance.
>
> Erik
Re: Limiting facets for huge data - setting indexed=false in schema.xml
On Jul 31, 2009, at 2:35 AM, Rahul R wrote:
> Hello,
> We are trying to get Solr to work for a really huge parts database. Details of the database:
> - 55 million parts
> - 3700 total properties (facets). But each record will not have a value for all properties.
> - Most of these facets are defined as dynamic fields within the Solr index
>
> We were getting really unacceptable timing while doing faceting/searches on an index created with this database.

Were you accounting for cache warming? Were your caches sized appropriately? What kind of hardware and RAM were you using? What were the JVM settings?

And certainly not least important - what version of Solr are you running? The difference in faceting performance and scalability between Solr 1.3 and what will be Solr 1.4 is quite dramatic.

> We thought that by limiting the number of properties that are available for faceting, the performance could be improved. To test this, we enabled only 6 properties for faceting by setting indexed=true (in schema.xml) for only these properties. All other properties, which are defined as dynamic properties, had indexed=false.

These settings won't matter - what matters in this case is what facets you request, not what is actually in the index.

> My questions:
> - Will reducing the number of facets improve faceting and search performance?

Reducing what fields you request will, of course. But what you actually index has no effect on performance until you request it.

> - Is there a better way to reduce the number of facets?

Hard to say without doing a deeper analysis of your needs.

> - Will having a large number of properties defined as dynamic fields reduce performance?

Dynamic fields versus statically named fields have no effect on performance.

Erik
Limiting facets for huge data - setting indexed=false in schema.xml
Hello,
We are trying to get Solr to work for a really huge parts database. Details of the database:
- 55 million parts
- 3700 total properties (facets). But each record will not have a value for all properties.
- Most of these facets are defined as dynamic fields within the Solr index

We were getting really unacceptable timing while doing faceting/searches on an index created with this database. With only one user on the system, query times are in excess of 1 minute. With more users concurrently using the system, the response times are even higher.

We thought that by limiting the number of properties available for faceting, the performance could be improved. To test this, we enabled only 6 properties for faceting by setting indexed=true (in schema.xml) for only these properties. All other properties, which are defined as dynamic properties, had indexed=false. The observations after this change:

- Index size reduced by a meagre 5% only
- Performance did not improve. In fact, during the PSR run we observed that it degraded.

My questions:
- Will reducing the number of facets improve faceting and search performance?
- Is there a better way to reduce the number of facets?
- Will having a large number of properties defined as dynamic fields reduce performance?

Thank you.

Regards
Rahul
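For reference, dynamic-field declarations of the kind described above might look like this in schema.xml (the name patterns and field type here are hypothetical, not from the original schema):

```xml
<!-- Most properties: stored for display but not indexed,
     so they cannot be searched or faceted on. -->
<dynamicField name="prop_*" type="string" indexed="false" stored="true"/>

<!-- The handful of properties actually enabled for faceting. -->
<dynamicField name="facet_prop_*" type="string" indexed="true" stored="true"/>
```

Note, per the replies above, that this schema change by itself only affects index size; query cost depends on which fields the requests actually name in facet.field.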