Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread xavi jmlucjav
Thanks Toke for the input.

I think the plan is to facet only on class_u1, class_u2 for queries from
user1, etc. So faceting would not happen on all fields on a single query.
But still.

I did not design the schema, I just found out about the number of fields and
advised against it when they asked for a second opinion. We did not get to
discuss a different schema, but if we get to that point I will take that
plan into consideration for sure.

xavi

On Sat, May 30, 2015 at 10:17 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 xavi jmlucjav jmluc...@gmail.com wrote:
  The reason for such a large number of fields:
  - users create dynamically 'classes' of documents, say one user creates
 10
  classes on average
  - for each 'class', the fields are created like this:
 unique_id_+fieldname
  - there are potentially hundreds of thousands of users.

 Switch to a scheme where you control the names of fields outside of Solr,
 but share the fields internally:

 User 1 has 10 custom classes: u1_a, u1_b, u1_c, ... u1_j
 Internally they are mapped to class1, class2, class3, ... class10

 User 2 uses 2 classes: u2_horses, u2_elephants
 Internally they are mapped to class1, class2

 When User 2 queries field u2_horses, you rewrite the query to use class1
 instead.

  There is faceting in each users' fields.
  So this will result in 1M fields, very sparsely populated.

 If you are faceting on all of them and if you are not using DocValues,
 this will explode your memory requirements with vanilla Solr: UnInverted
 faceting maintains a separate map from all documentIDs to field values
 (ordinals for Strings) for _all_ the facet fields. Even if you only had 10
 million documents and even if your 1 million facet fields all had just 1
 value, represented by 1 bit, it would still require 10M * 1M * 1 bits in
 memory, which is 10 trillion bits, roughly 1.25 terabytes of RAM.

 - Toke Eskildsen



Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread Erick Erickson
Nothing's really changed in that area lately. Your co-worker is
perhaps confusing the statement that Solr has no a-priori limit on
the number of distinct fields that can be in a corpus with supporting
an infinite number of fields. Not having a built-in limit is much
different than supporting one in practice.

Whether Solr breaks with thousands and thousands of fields is pretty
dependent on what you _do_ with those fields. Simply doing keyword
searches isn't going to put the same memory pressure on as, say,
faceting on them all (even if in different queries).

I'd really ask why so many fields are necessary though.

Best,
Erick

On Sat, May 30, 2015 at 6:18 AM, xavi jmlucjav jmluc...@gmail.com wrote:
 Hi guys,

 someone I work with has been advised that currently Solr can support
 'infinite' number of fields.

 I thought there was a practical limitation of say thousands of fields (for
 sure less than a million), or things can start to break (I think I
 remember seeing memory issues reported on the mailing list by several
 people).


 Was there any change I missed lately that makes having say 1M fields in
 Solr practical??

 thanks


Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread Toke Eskildsen
xavi jmlucjav jmluc...@gmail.com wrote:
 The reason for such a large number of fields:
 - users create dynamically 'classes' of documents, say one user creates 10
 classes on average
 - for each 'class', the fields are created like this: unique_id_+fieldname
 - there are potentially hundreds of thousands of users.

Switch to a scheme where you control the names of fields outside of Solr, but 
share the fields internally:

User 1 has 10 custom classes: u1_a, u1_b, u1_c, ... u1_j
Internally they are mapped to class1, class2, class3, ... class10

User 2 uses 2 classes: u2_horses, u2_elephants
Internally they are mapped to class1, class2

When User 2 queries field u2_horses, you rewrite the query to use class1 
instead.
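The aliasing scheme above could be sketched roughly like this (a hypothetical illustration in Python; the `FieldAliaser` class, its dict-based mapping store, and the naive query rewriting are all assumptions for the sketch, not anything Solr provides):

```python
# Sketch of per-user field aliasing: each user's custom class names are
# mapped onto a small shared pool of internal field names (class1, class2,
# ...), so Solr only ever sees the shared names. The in-memory dict stands
# in for whatever real mapping store you would use.

class FieldAliaser:
    def __init__(self):
        self.mappings = {}  # user_id -> {user_field: internal_field}

    def internal_field(self, user_id, user_field):
        """Return the shared internal field for a user's custom field,
        allocating the next free slot (class1, class2, ...) on first use."""
        per_user = self.mappings.setdefault(user_id, {})
        if user_field not in per_user:
            per_user[user_field] = "class%d" % (len(per_user) + 1)
        return per_user[user_field]

    def rewrite_query(self, user_id, query):
        """Rewrite 'field:value' clauses to use the internal field names
        (naive whitespace split; real query rewriting needs a parser)."""
        parts = []
        for clause in query.split():
            if ":" in clause:
                field, value = clause.split(":", 1)
                clause = "%s:%s" % (self.internal_field(user_id, field), value)
            parts.append(clause)
        return " ".join(parts)

aliaser = FieldAliaser()
aliaser.internal_field("u1", "a")       # user 1's first class -> "class1"
aliaser.internal_field("u2", "horses")  # user 2 reuses the same "class1" slot
print(aliaser.rewrite_query("u2", "horses:arabian"))  # class1:arabian
```

The key property is that the total number of distinct fields in the index is bounded by the largest number of classes any single user has, not by users × classes.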

 There is faceting in each users' fields.
 So this will result in 1M fields, very sparsely populated.

If you are faceting on all of them and if you are not using DocValues, this 
will explode your memory requirements with vanilla Solr: UnInverted faceting 
maintains a separate map from all documentIDs to field values (ordinals for 
Strings) for _all_ the facet fields. Even if you only had 10 million documents 
and even if your 1 million facet fields all had just 1 value, represented by 1 
bit, it would still require 10M * 1M * 1 bits in memory, which is 10 trillion 
bits, roughly 1.25 terabytes of RAM.
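For scale, the raw arithmetic behind that estimate (10M documents × 1M facet fields × 1 bit each) works out as follows, sketched in plain Python:

```python
# Back-of-the-envelope check of the UnInverted-faceting memory estimate:
# one map entry of 1 bit per (document, facet field) pair.
docs = 10_000_000        # 10 million documents
fields = 1_000_000       # 1 million facet fields
bits = docs * fields     # 1 bit per pair -> 10 trillion bits
terabytes = bits / 8 / 1e12
print(terabytes)  # -> 1.25 (terabytes of RAM)
```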

- Toke Eskildsen


Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread Jack Krupansky
Anything more than a few hundred seems very suspicious.

Anything more than a few dozen or 50 or 75 seems suspicious as well.

The point should not be how crazy you can get with Solr, but that craziness
should be avoided altogether!

Solr's design is optimal for a large number of relatively small documents,
not large documents.


-- Jack Krupansky

On Sat, May 30, 2015 at 3:05 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Nothing's really changed in that area lately. Your co-worker is
 perhaps confusing the statement that Solr has no a-priori limit on
 the number of distinct fields that can be in a corpus with supporting
 an infinite number of fields. Not having a built-in limit is much
 different than supporting one in practice.

 Whether Solr breaks with thousands and thousands of fields is pretty
 dependent on what you _do_ with those fields. Simply doing keyword
 searches isn't going to put the same memory pressure on as, say,
 faceting on them all (even if in different queries).

 I'd really ask why so many fields are necessary though.

 Best,
 Erick

 On Sat, May 30, 2015 at 6:18 AM, xavi jmlucjav jmluc...@gmail.com wrote:
  Hi guys,
 
  someone I work with has been advised that currently Solr can support
  'infinite' number of fields.
 
  I thought there was a practical limitation of say thousands of fields (for
  sure less than a million), or things can start to break (I think I
  remember seeing memory issues reported on the mailing list by several
  people).
 
 
  Was there any change I missed lately that makes having say 1M fields in
  Solr practical??
 
  thanks



Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread Toke Eskildsen
xavi jmlucjav jmluc...@gmail.com wrote:
 I think the plan is to facet only on class_u1, class_u2 for queries from
 user1, etc. So faceting would not happen on all fields on a single query.

I understand that, but most of the created structures stay in memory between 
calls (DocValues helps here). Your heap will slowly fill up as more and more 
users perform faceted queries on their content.
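That accumulation can be illustrated with a toy model (plain Python, not Solr internals; the 1-bit-per-document cost per field is the simplification carried over from the earlier estimate):

```python
# Toy model of the accumulation: each distinct field ever faceted on adds a
# persistent per-field structure (~1 bit per document here), so heap use
# grows with the number of distinct fields touched, not with query volume.
docs = 10_000_000
bytes_per_field = docs // 8      # ~1.25 MB per faceted field in this model
heap = {}                        # field -> size of its resident structure

def facet(field):
    """Simulate a faceted query: allocate the field's structure on first
    use, then return the total resident size."""
    heap.setdefault(field, bytes_per_field)
    return sum(heap.values())

facet("class1_u1")
facet("class1_u1")               # repeated query: no extra memory
total = facet("class1_u2")       # new field: another ~1.25 MB stays resident
print(total / 1e6)  # -> 2.5 (MB resident after two distinct fields)
```

With a shared pool of internal field names, the number of distinct fields (and thus resident structures) stays small no matter how many users facet.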

- Toke Eskildsen


Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread xavi jmlucjav
The reason for such a large number of fields:
- users create dynamically 'classes' of documents, say one user creates 10
classes on average
- for each 'class', the fields are created like this: unique_id_+fieldname
- there are potentially hundreds of thousands of users.

There is faceting in each users' fields.

So this will result in 1M fields, very sparsely populated. I warned them
this did not sound like a good design to me, but apparently someone very
knowledgeable in Solr said it would work out fine. That is why I wanted to
double-check...

On Sat, May 30, 2015 at 9:22 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 Anything more than a few hundred seems very suspicious.

 Anything more than a few dozen or 50 or 75 seems suspicious as well.

 The point should not be how crazy you can get with Solr, but that craziness
 should be avoided altogether!

 Solr's design is optimal for a large number of relatively small documents,
 not large documents.


 -- Jack Krupansky

 On Sat, May 30, 2015 at 3:05 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  Nothing's really changed in that area lately. Your co-worker is
  perhaps confusing the statement that Solr has no a-priori limit on
  the number of distinct fields that can be in a corpus with supporting
  an infinite number of fields. Not having a built-in limit is much
  different than supporting one in practice.
 
  Whether Solr breaks with thousands and thousands of fields is pretty
  dependent on what you _do_ with those fields. Simply doing keyword
  searches isn't going to put the same memory pressure on as, say,
  faceting on them all (even if in different queries).
 
  I'd really ask why so many fields are necessary though.
 
  Best,
  Erick
 
  On Sat, May 30, 2015 at 6:18 AM, xavi jmlucjav jmluc...@gmail.com
 wrote:
   Hi guys,
  
   someone I work with has been advised that currently Solr can support
   'infinite' number of fields.
  
   I thought there was a practical limitation of say thousands of fields (for
   sure less than a million), or things can start to break (I think I
   remember seeing memory issues reported on the mailing list by several
   people).
  
  
   Was there any change I missed lately that makes having say 1M fields in
   Solr practical??
  
   thanks
 



Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread xavi jmlucjav
On Sat, May 30, 2015 at 11:15 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 xavi jmlucjav jmluc...@gmail.com wrote:
  I think the plan is to facet only on class_u1, class_u2 for queries from
  user1, etc. So faceting would not happen on all fields on a single query.

 I understand that, but most of the created structures stay in memory
 between calls (DocValues helps here). Your heap will slowly fill up as more
 and more users perform faceted queries on their content.

got it...priceless info, thanks!



 - Toke Eskildsen