Re: Solr document routing using composite key
Thanks Shawn and Erick. This is what I also ended up finding: I noticed the issue as the number of buckets increased.

Zheng: I am using Solr 7, but this was only an experiment on the hash, i.e., to see what distribution I should expect from it (as the gist above shows). I didn't actually index into Solr 7, but I would expect it to behave much the same way if I had indexed into Solr with these partitions and IDs.
Re: Solr document routing using composite key
What Shawn said. 117 shards and 116 docs tells you absolutely nothing useful. I've never seen the number of docs on various shards be off by more than 2-3% when enough docs are indexed to be statistically valid.

Best,
Erick
Re: Solr document routing using composite key
On 3/6/2018 11:53 AM, Nawab Zada Asad Iqbal wrote:
> I have 117 shards and i tried to use document ids from zero to 116. I find
> that the distribution is very uneven, e.g., the largest bucket receives
> total 5 documents; and around 38 shards will be empty. Is it expected?

With such a small data set, this fits what I would expect.

Choosing buckets by hashing (which is what compositeId does) is not perfect, but if you send it thousands or millions of documents, it will be *generally* balanced.

Thanks,
Shawn
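To illustrate the point about small versus large data sets, here is a minimal Python sketch that hashes sequential document IDs into 117 buckets and compares 117 documents against 500,000. It is not Solr's routing code: MD5 stands in for the hash function, and the helper names (`bucket_for`, `distribution`, `summarize`) and the even split of the 32-bit hash range across shards are assumptions made for the example.

```python
import hashlib
from collections import Counter


def bucket_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a bucket by hashing (MD5 as a stand-in hash)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")  # 32-bit unsigned hash value
    # Assumption: the 32-bit hash range is split into equal contiguous ranges.
    return (h * num_shards) >> 32


def distribution(num_docs: int, num_shards: int) -> list:
    """Return the number of documents landing in each bucket."""
    counts = Counter(bucket_for(str(i), num_shards) for i in range(num_docs))
    return [counts.get(s, 0) for s in range(num_shards)]


def summarize(num_docs: int, num_shards: int = 117) -> dict:
    """Summarize how uneven the distribution is for a given document count."""
    per_shard = distribution(num_docs, num_shards)
    mean = num_docs / num_shards
    return {
        "empty": per_shard.count(0),
        "largest": max(per_shard),
        "max_relative_deviation": max(abs(c - mean) for c in per_shard) / mean,
    }


if __name__ == "__main__":
    for n in (117, 500_000):
        print(n, summarize(n))
```

With only 117 documents, many buckets should come out empty and a few should hold several documents; with 500,000, every bucket should sit within a few percent of the mean.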
Re: Solr document routing using composite key
Hi,

What version of Solr are you running? How did you configure your shards in Solr?

Regards,
Edwin
Solr document routing using composite key
Hi Solr community,

I have been thinking of using a composite key for my next project iteration, and I tried it today to see how it distributes documents.

Here is a gist of my code:
https://gist.github.com/niqbal/3e293e2bcb800d6912a250d914c9d478

I have 117 shards and I tried document IDs from 0 to 116. I find that the distribution is very uneven: the largest bucket receives 5 documents, and around 38 shards are empty. Is this expected?

In the following result, the first value is the shard number and the second is the list of documents that shard received.

List(98:List(29)
, 34:List(36)
, 8:List(54)
, 73:List(31)
, 19:List(77)
, 23:List(59)
, 62:List(86)
, 77:List(105)
, 11:List(11)
, 104:List(23)
, 44:List(4)
, 37:List(0)
, 61:List(71)
, 107:List(37)
, 46:List(34)
, 99:List(19)
, 24:List(32)
, 94:List(90)
, 103:List(106)
, 72:List(97)
, 59:List(2)
, 76:List(6)
, 54:List(20)
, 65:List(3)
, 71:List(26)
, 108:List(17)
, 106:List(57)
, 17:List(108)
, 25:List(13)
, 60:List(56)
, 102:List(87)
, 69:List(60)
, 64:List(53)
, 53:List(85)
, 42:List(35)
, 115:List(82)
, 0:List(28)
, 20:List(27)
, 81:List(39)
, 101:List(92)
, 30:List(16)
, 41:List(63)
, 3:List(10)
, 91:List(21)
, 85:List(18)
, 28:List(8)
, 113:List(76, 95)
, 51:List(47, 102)
, 78:List(30, 67)
, 4:List(52, 84)
, 110:List(112, 116)
, 9:List(1, 40)
, 50:List(22, 101)
, 13:List(72, 83)
, 35:List(73, 100)
, 16:List(48, 64)
, 112:List(69, 103)
, 10:List(14, 66)
, 87:List(68, 104)
, 57:List(49, 114)
, 36:List(79, 99)
, 1:List(24, 70)
, 96:List(5, 98)
, 95:List(45, 89)
, 75:List(9, 91)
, 70:List(62, 78)
, 2:List(74, 75)
, 114:List(81, 88)
, 74:List(7, 115)
, 52:List(46, 111)
, 55:List(12, 50, 113)
, 47:List(43, 44, 96)
, 92:List(25, 33, 58)
, 39:List(15, 41, 61, 107)
, 21:List(38, 51, 55, 93, 110)
, 27:List(42, 65, 80, 94, 109)
)
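For reference, the number of empty shards can be estimated without running anything against Solr. The sketch below assumes an idealized hash that sends each document to a uniformly random shard, independently of the others (an assumption, not a description of compositeId), and uses the binomial formula for the expected number of shards holding exactly k documents. The function names are hypothetical. Under that assumption, 117 documents over 117 shards should leave roughly 43 shards empty, which is in line with the listing above (76 non-empty shards, so 41 empty).

```python
from math import comb


def expected_shards_with_k_docs(k: int, num_docs: int, num_shards: int) -> float:
    """Expected number of shards holding exactly k documents.

    Assumes each document independently picks a shard uniformly at random,
    so a given shard's document count is Binomial(num_docs, 1/num_shards).
    """
    p = 1.0 / num_shards
    return num_shards * comb(num_docs, k) * p**k * (1 - p) ** (num_docs - k)


def expected_empty_shards(num_docs: int, num_shards: int) -> float:
    """Expected number of shards that receive no documents at all."""
    return expected_shards_with_k_docs(0, num_docs, num_shards)


if __name__ == "__main__":
    n = 117
    print(f"expected empty shards: {expected_empty_shards(n, n):.1f}")
    for k in range(1, 6):
        print(f"expected shards with {k} docs: "
              f"{expected_shards_with_k_docs(k, n, n):.1f}")
```

When the number of documents equals the number of shards, the expected empty fraction approaches 1/e (about 37%) as both grow, so a result like this says little about how balanced the shards would be with a realistic document count.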