Re: RE: Internals about "Too many values for UnInvertedField faceting on field xxx"
On Mon, May 26, 2014 at 9:21 PM, 张月祥 wrote:
> Thanks a lot.
>
>> There are only 256 byte arrays to hold all of the ord data, and the
>> pointers into those arrays are only 24 bits long. That gets you back
>> to 32 bits, or 4GB of ord data max. It's practically less since you
>> only have to overflow one array before the exception is thrown.
>
> What does the ord data mean? Term ID, term-document relation, or
> document-term relation?

Every document has a list of term numbers (term ords) associated with it. The deltas between sorted term numbers are vInt-encoded.

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache
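A minimal Python sketch of the encoding described above: sort a document's term ords, take the gaps between neighbours, and write each gap as a Lucene-style vInt (7 payload bits per byte, continuation bit set on all but the last byte). The function names are hypothetical, and the sketch deliberately ignores details of the real UnInvertedField layout such as any list terminator, offsets added to the deltas, or inline storage of short lists.

```python
def write_vint(value: int, out: bytearray) -> None:
    """Append a non-negative int as a vInt: 7 payload bits per byte,
    low-order group first, continuation bit (0x80) on all but the last byte."""
    if value < 0:
        raise ValueError("vInt values must be non-negative")
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def encode_doc_ords(ords) -> bytes:
    """Encode one document's term ords as vInt gaps between sorted, de-duplicated ords.
    Simplified: the first ord is stored as a gap from 0; no terminator or offset."""
    out = bytearray()
    prev = 0
    for ordinal in sorted(set(ords)):
        write_vint(ordinal - prev, out)
        prev = ordinal
    return bytes(out)


def decode_doc_ords(data: bytes) -> list:
    """Inverse of encode_doc_ords: read vInts and accumulate the gaps back into ords."""
    ords, prev, value, shift = [], 0, 0, 0
    for b in data:
        value |= (b & 0x7F) << shift
        if b & 0x80:          # continuation bit set: more bytes follow
            shift += 7
        else:                 # last byte of this vInt
            prev += value
            ords.append(prev)
            value, shift = 0, 0
    return ords


if __name__ == "__main__":
    encoded = encode_doc_ords([5, 300, 2])
    print(encoded.hex(), decode_doc_ords(encoded))  # prints: 0203a702 [2, 5, 300]
```

With only 2,600 distinct terms, as on Solr A in the original question, every gap is below 2^14 = 16,384 and therefore takes one or two bytes. The size of the ord data is thus driven mainly by how many distinct values each document carries, not by the size of the vocabulary.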
RE: RE: Internals about "Too many values for UnInvertedField faceting on field xxx"
Thanks a lot.

> There are only 256 byte arrays to hold all of the ord data, and the pointers into those arrays are only 24 bits long. That gets you back to 32 bits, or 4GB of ord data max. It's practically less since you only have to overflow one array before the exception is thrown.

What does the ord data mean? Term ID, term-document relation, or document-term relation?
Re: RE: Internals about "Too many values for UnInvertedField faceting on field xxx"
On Sat, May 24, 2014 at 9:50 PM, 张月祥 wrote:
> Thanks for your reply. I'll try it.
>
> We're still interested in the real limitation behind "Too many values for
> UnInvertedField faceting on field xxx".
>
> Could anybody tell us some internals about "Too many values for
> UnInvertedField faceting on field xxx"?

There are only 256 byte arrays to hold all of the ord data, and the pointers into those arrays are only 24 bits long. That gets you back to 32 bits, or 4GB of ord data max. It's practically less since you only have to overflow one array before the exception is thrown.

This faceting method is best for high numbers of unique values, but a relatively low number of unique values per document.

I've been considering making an off-heap version for Heliosearch, and maybe bumping the limits a little at the same time...

-Yonik
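The arithmetic behind the 4GB figure, and why a single full array can trigger the exception long before that, can be sketched in Python. The constants 256 and 24 come from the message above. How documents are mapped to arrays (bits 16-23 of the doc id, i.e. blocks of 65,536 docs rotating through the arrays) is our reading of the Lucene/Solr source and should be treated as an assumption; `array_for_doc` and `first_overflow` are hypothetical names.

```python
NUM_ARRAYS = 256                         # byte arrays holding all ord data (per the thread)
POINTER_BITS = 24                        # width of an offset into one array (per the thread)
MAX_BYTES_PER_ARRAY = 1 << POINTER_BITS  # 16,777,216 bytes (16 MiB) per array
DOCS_PER_BLOCK = 1 << 16                 # ASSUMPTION: doc ids striped in 65,536-doc blocks


def total_capacity() -> int:
    """Theoretical ceiling: 2**8 arrays * 2**24 bytes each = 2**32 bytes (4 GiB)."""
    return NUM_ARRAYS * MAX_BYTES_PER_ARRAY


def array_for_doc(doc_id: int) -> int:
    """ASSUMPTION: pick the array from bits 16-23 of the doc id, so consecutive
    blocks of 65,536 docs rotate through the 256 arrays."""
    return (doc_id // DOCS_PER_BLOCK) % NUM_ARRAYS


def first_overflow(bytes_per_doc):
    """Walk per-document ord-data sizes in doc-id order and return
    (array_index, doc_id) for the first document that pushes its array past
    2**24 bytes, or None if everything fits. Simplified: the real code may test
    the starting offset rather than the end, and may store very short lists inline."""
    used = [0] * NUM_ARRAYS
    for doc_id, size in enumerate(bytes_per_doc):
        idx = array_for_doc(doc_id)
        used[idx] += size
        if used[idx] > MAX_BYTES_PER_ARRAY:
            return idx, doc_id
    return None
```

For example, `first_overflow([MAX_BYTES_PER_ARRAY, 1])` returns `(0, 1)`: the second document already overflows array 0 although the total is only about 16 MiB, far below the 4 GiB ceiling. That is the "practically less" effect: the limit is hit by whichever array fills first.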
RE: Internals about "Too many values for UnInvertedField faceting on field xxx"
Thanks for your reply. I'll try it.

We're still interested in the real limitation behind "Too many values for UnInvertedField faceting on field xxx".

Could anybody tell us some internals about "Too many values for UnInvertedField faceting on field xxx"?

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: May 24, 2014 0:26
To: solr-user@lucene.apache.org
Subject: RE: Internals about "Too many values for UnInvertedField faceting on field xxx"

张月祥 [zhan...@calis.edu.cn] wrote:
> Could anybody tell us some internals about "Too many values for
> UnInvertedField faceting on field xxx"?

I must admit I do not fully understand it in detail, but it is a known problem with Field Cache (facet.method=fc) faceting. The remedy is to use DocValues, which does not have the same limitation. This should also result in lower heap usage. You will have to re-index everything, though.

We have successfully used DocValues on an index with 400M documents and 300M unique values on a single facet field.

- Toke Eskildsen
RE: Internals about "Too many values for UnInvertedField faceting on field xxx"
张月祥 [zhan...@calis.edu.cn] wrote:
> Could anybody tell us some internals about "Too many values for
> UnInvertedField faceting on field xxx"?

I must admit I do not fully understand it in detail, but it is a known problem with Field Cache (facet.method=fc) faceting. The remedy is to use DocValues, which does not have the same limitation. This should also result in lower heap usage. You will have to re-index everything, though.

We have successfully used DocValues on an index with 400M documents and 300M unique values on a single facet field.

- Toke Eskildsen
Internals about "Too many values for UnInvertedField faceting on field xxx"
Could anybody tell us some internals about "Too many values for UnInvertedField faceting on field xxx"?

We have two Solr servers.

Solr A: 128G RAM, 60M docs, 2,600 distinct terms in field "code"; every term of field "code" has a fixed length of 6. The total token count of field "code" is 9 billion. The total space used by field "code" is 50 billion.

Solr B: 128G RAM, 140M docs, 1,600 distinct terms in field "code"; every term of field "code" has a fixed length of 6. The total token count of field "code" is 18 billion. The total space used by field "code" is 90 billion.

When we run the facet query "q=*:*&wt=xml&indent=true&facet=true&facet.field=code", Solr B is OK, BUT Solr A throws an exception with the message "Too many values for UnInvertedField faceting on field code".

We currently think the limitation of UnInvertedField is related to the number of distinct terms in one field. Could anybody tell us some internals about this problem?

We don't want to use facet.method=enum because it's too slow.

Thanks!
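For comparing the faceting methods discussed in this thread, the request from the question can be rebuilt with an explicit facet.method. Below is a standard-library Python sketch; `facet_query` is a hypothetical helper, and actually sending the request to a Solr host is left out. Per Toke's reply, the remedy is to switch the field to DocValues (`docValues="true"` on the field definition in schema.xml, followed by a full re-index). Our assumption, to be checked for your Solr version, is that fc faceting then reads DocValues instead of building an UnInvertedField.

```python
from typing import Optional
from urllib.parse import urlencode


def facet_query(field: str, method: Optional[str] = None) -> str:
    """Build the facet request from the question as a URL query string.

    Hypothetical helper. `method` is passed through as facet.method;
    "fc" and "enum" are the values discussed in this thread.
    """
    params = [
        ("q", "*:*"),          # match all documents
        ("wt", "xml"),
        ("indent", "true"),
        ("facet", "true"),
        ("facet.field", field),
    ]
    if method is not None:
        params.append(("facet.method", method))
    return urlencode(params)


if __name__ == "__main__":
    # The default request, then the same request pinned to each method.
    print(facet_query("code"))
    print(facet_query("code", "fc"))
    print(facet_query("code", "enum"))
```

Leaving `method` unset reproduces the original request exactly, so Solr picks its default method for the field.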