Re: any changes about limitations on huge number of fields lately?
Thanks Toke for the input. I think the plan is to facet only on class_u1, class_u2 for queries from user1, etc., so faceting would not happen on all fields in a single query. But still. I did not design the schema; I only found out about the number of fields and advised against it when they asked for a second opinion. We did not get to discuss a different schema, but if we get to that point I will certainly take your plan into consideration.

xavi

On Sat, May 30, 2015 at 10:17 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> Switch to a scheme where you control the names of the fields outside of
> Solr, but share the fields internally. [...] When User 2 queries field
> u2_horses, you rewrite the query to use class1 instead.
>
> If you are faceting on all of them and you are not using DocValues, this
> will explode your memory requirements with vanilla Solr. [...] Even if
> you only had 10 million documents and even if your 1 million facet
> fields all had just 1 value, represented by 1 bit, it would still
> require 10M * 1M * 1 bits in memory.
>
> - Toke Eskildsen
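As a quick sanity check on the memory estimate quoted above, the arithmetic is easy to reproduce. This assumes the stated best case of one bit per document per facet field (real UnInverted structures are larger):

```python
docs = 10_000_000         # 10M documents
fields = 1_000_000        # 1M sparsely populated facet fields
bits = docs * fields      # best case: one bit per doc per field

terabytes = bits / 8 / 1e12
print(f"{bits:.0e} bits ~= {terabytes:.2f} TB")  # 1e+13 bits ~= 1.25 TB
```

Ten terabits works out to roughly 1.25 terabytes of resident data for the facet maps alone, so even this unrealistically optimistic best case is far beyond any sensible heap.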
Re: any changes about limitations on huge number of fields lately?
Nothing's really changed in that area lately. Your co-worker is perhaps confusing the statement that Solr has no a-priori limit on the number of distinct fields in a corpus with Solr actually supporting an infinite number of fields. Not having a built-in limit is very different from supporting unlimited fields in practice.

Whether Solr breaks with thousands and thousands of fields depends heavily on what you _do_ with those fields. Simply doing keyword searches isn't going to put the same memory pressure on the system as, say, faceting on them all (even if in different queries).

I'd really ask why so many fields are necessary, though.

Best,
Erick

On Sat, May 30, 2015 at 6:18 AM, xavi jmlucjav <jmluc...@gmail.com> wrote:
> Hi guys,
>
> someone I work with has been advised that Solr can currently support an
> 'infinite' number of fields. I thought there was a practical limit of,
> say, thousands of fields (certainly less than a million) before things
> start to break (I remember seeing memory issues reported on the mailing
> list by several people). Was there any change I missed lately that makes
> having, say, 1M fields in Solr practical?
>
> thanks
Re: any changes about limitations on huge number of fields lately?
xavi jmlucjav <jmluc...@gmail.com> wrote:
> The reason for such a large number of fields:
> - users dynamically create 'classes' of documents; say one user creates
>   10 classes on average
> - for each 'class', the fields are created like this: unique_id_+fieldname
> - there are potentially hundreds of thousands of users.

Switch to a scheme where you control the names of the fields outside of Solr, but share the fields internally:

User 1 has 10 custom classes: u1_a, u1_b, u1_c, ... u1_j
Internally they are mapped to class1, class2, class3, ... class10

User 2 uses 2 classes: u2_horses, u2_elephants
Internally they are mapped to class1, class2

When User 2 queries field u2_horses, you rewrite the query to use class1 instead.

> There is faceting on each user's fields. So this will result in 1M
> fields, very sparsely populated.

If you are faceting on all of them and you are not using DocValues, this will explode your memory requirements with vanilla Solr: UnInverted faceting maintains a separate map from all document IDs to field values (ordinals for Strings) for _all_ the facet fields. Even if you only had 10 million documents, and even if each of your 1 million facet fields had just one value represented by a single bit, it would still require 10M * 1M * 1 bits in memory, which is 10 terabits (roughly 1.25 terabytes of RAM).

- Toke Eskildsen
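Toke's mapping scheme can be sketched in a few lines. This is a minimal illustration of the idea, not Solr code: the `FieldMapper` class and its method names are invented for this example, and a real deployment would keep the mapping in the service layer in front of Solr, with proper query parsing instead of string replacement:

```python
class FieldMapper:
    """Maps per-user field names (e.g. 'u2_horses') to shared internal
    Solr fields ('class1', 'class2', ...), so the index holds a bounded
    number of fields regardless of how many users exist."""

    def __init__(self):
        # per-user mapping: user_id -> {external_name: internal_name}
        self.mappings = {}

    def register(self, user_id, external_name):
        user_map = self.mappings.setdefault(user_id, {})
        if external_name not in user_map:
            # hand out the next free shared field slot for this user
            user_map[external_name] = "class%d" % (len(user_map) + 1)
        return user_map[external_name]

    def rewrite_query(self, user_id, query):
        # naive rewrite: swap each known external field name for its
        # internal one before the query is sent to Solr
        for ext, internal in self.mappings.get(user_id, {}).items():
            query = query.replace(ext + ":", internal + ":")
        return query


mapper = FieldMapper()
mapper.register("u2", "u2_horses")     # mapped to class1
mapper.register("u2", "u2_elephants")  # mapped to class2
print(mapper.rewrite_query("u2", "u2_horses:arabian"))  # class1:arabian
```

Because every user's fields are folded onto the same small set of internal names, the index needs only as many fields as the single busiest user has classes.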
Re: any changes about limitations on huge number of fields lately?
Anything more than a few hundred fields seems very suspicious, and even more than a few dozen (50 or 75, say) raises questions. The point should not be how crazy you can get with Solr; that craziness should be avoided altogether! Solr's design is optimized for a large number of relatively small documents, not for large documents.

-- Jack Krupansky

On Sat, May 30, 2015 at 3:05 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Nothing's really changed in that area lately. [...] Whether Solr breaks
> with thousands and thousands of fields is pretty dependent on what you
> _do_ with those fields. Simply doing keyword searches isn't going to put
> the same memory pressure on as, say, faceting on them all (even if in
> different queries). I'd really ask why so many fields are necessary
> though.
>
> Best,
> Erick
> [...]
Re: any changes about limitations on huge number of fields lately?
xavi jmlucjav <jmluc...@gmail.com> wrote:
> I think the plan is to facet only on class_u1, class_u2 for queries from
> user1, etc. So faceting would not happen on all fields in a single
> query.

I understand that, but most of the created structures stay in memory between calls (DocValues helps here). Your heap will slowly fill up as more and more users perform faceted queries on their content.

- Toke Eskildsen
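The heap-growth behaviour Toke describes can be pictured as a per-field cache that is built on first use and then retained. This is a deliberately simplified model for illustration, not Solr's actual UnInverted/FieldCache code:

```python
# Simplified model of a field cache: each faceted field gets an entry
# built on first use and kept for all subsequent requests.
field_cache = {}

def facet(field, num_docs):
    if field not in field_cache:
        # stand-in for building the doc -> value map for this field
        # (one slot per document, as UnInverted faceting does)
        field_cache[field] = [0] * num_docs
    return field_cache[field]

# Each user who facets on their own fields adds entries that never go
# away, so resident memory grows with the number of distinct fields
# that have ever been faceted on, not with per-query usage.
for user in range(3):
    facet(f"class_u{user}", num_docs=1000)
print(len(field_cache))  # 3
```

Even though each individual query touches only one field, the cache (and hence the heap) keeps growing as more users run their first faceted query.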
Re: any changes about limitations on huge number of fields lately?
The reason for such a large number of fields:
- users dynamically create 'classes' of documents; say one user creates 10 classes on average
- for each 'class', the fields are created like this: unique_id_+fieldname
- there are potentially hundreds of thousands of users.

There is faceting on each user's fields. So this will result in 1M fields, very sparsely populated.

I warned them this did not sound like a good design to me, but apparently someone very knowledgeable in Solr said it would work out fine. That is why I wanted to double-check...

On Sat, May 30, 2015 at 9:22 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> Anything more than a few hundred seems very suspicious. [...] The point
> should not be how crazy can you get with Solr, but that craziness should
> be avoided altogether! Solr's design is optimal for a large number of
> relatively small documents, not large documents.
>
> -- Jack Krupansky
> [...]
Re: any changes about limitations on huge number of fields lately?
On Sat, May 30, 2015 at 11:15 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> xavi jmlucjav <jmluc...@gmail.com> wrote:
>> I think the plan is to facet only on class_u1, class_u2 for queries
>> from user1, etc. So faceting would not happen on all fields in a
>> single query.
>
> I understand that, but most of the created structures stay in memory
> between calls (DocValues helps here). Your heap will slowly fill up as
> more and more users perform faceted queries on their content.
>
> - Toke Eskildsen

got it... priceless info, thanks!