I'm not sure what is going on, but remember that post_ids in the script is a list, not a set, so nothing stops the same ID from being appended more than once. You might be growing it without bound.

On Dec 8, 2014 2:49 PM, "Christophe Verbinnen" <djp...@gmail.com> wrote:
> Hello,
>
> We have a small cluster with 3 nodes running 1.3.6.
>
> I have an index set up with only two fields.
>
> {
>   index: index_name,
>   body: {
>     settings: {
>       number_of_shards: 3,
>       store: {
>         type: :mmapfs
>       }
>     },
>     mappings: {
>       mapping_name => {
>         properties: {
>           :value => {type: 'string', analyzer: 'keyword'},
>           :post_ids => {type: 'long', index: 'not_analyzed'}
>         }
>       }
>     }
>   }
> }
>
> We are basically storing strings and all the posts they are related to.
>
> The problem is that this data is not stored this way in the database, so I
> don't have an id to represent each string, nor do I have all the post_ids
> from the start.
>
> So I use the SHA1 of the string value as the id, and I use a script to
> append to the post_ids.
>
> Here is the code I use to index through the bulk API endpoint.
>
> def index!
>   post_ids = Post.where...
>   bulk_data = []
>   strings.uniq.each do |string|
>     string_id = Digest::SHA1.hexdigest string
>     bulk_data <<
>       {
>         update:
>         {
>           _index: 'post_strings',
>           _type: 'post_string',
>           _id: string_id,
>           data: {
>             script: "ctx._source.post_ids += additional_post_ids",
>             params: {
>               additional_post_ids: post_ids
>             },
>             upsert: {
>               value: string,
>               post_ids: post_ids
>             }
>           }
>         }
>       }
>     if bulk_data.count == 100
>       $elasticsearch.bulk :body => bulk_data
>       bulk_data = []
>     end
>   end
>   $elasticsearch.bulk :body => bulk_data if bulk_data.any?
> end
>
> So this worked fine for the first 75 million strings, but it was getting
> slower and slower until it reached an indexing rate of only 50 docs per
> second.
>
> After that the cluster just killed itself because the nodes couldn't talk
> to each other.
>
> I'm guessing all the threads were blocked trying to index and the nodes had
> no available threads to respond.
>
> At first I thought it was related to the SHA1 id not being very efficient,
> but in my test with sequential ids it was not getting better.
>
> I'm out of ideas right now. Any help would be greatly appreciated.
>
> Cheers.
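
To make the list-versus-set point concrete, here is a rough sketch in Ruby. The first two helpers only simulate locally what happens to a list-valued field when the same ids are appended repeatedly. The update_action helper and the DEDUP_SCRIPT string are my own illustration, not something from your code: the script assumes Groovy syntax (a `lang: 'groovy'` setting and Groovy's `unique()` on a list), which I have not tested against 1.3.6, so treat it as a starting point rather than a drop-in fix.

```ruby
require 'digest'

# Simulates "ctx._source.post_ids += additional_post_ids" on a list-valued
# field: every update appends, duplicates included.
def append_as_list(existing, additional)
  existing + additional
end

# Same update, but keeping each id only once (set-like behaviour).
def append_as_set(existing, additional)
  (existing + additional).uniq
end

# ASSUMPTION: Groovy syntax, relying on Groovy's unique() for lists.
# Untested against Elasticsearch 1.3.6; adjust to the script language you run.
DEDUP_SCRIPT =
  'ctx._source.post_ids = (ctx._source.post_ids + additional_post_ids).unique()'

# Hypothetical helper: builds one bulk "update" action in the same shape as
# the original code, with the deduplicating script swapped in.
def update_action(string, post_ids)
  ids = post_ids.uniq
  {
    update: {
      _index: 'post_strings',
      _type:  'post_string',
      _id:    Digest::SHA1.hexdigest(string),
      data: {
        script: DEDUP_SCRIPT,
        lang:   'groovy', # assumption, see above
        params: { additional_post_ids: ids },
        upsert: { value: string, post_ids: ids }
      }
    }
  }
end

# Re-running the same batch of ids three times against one document:
list_ids = []
set_ids  = []
3.times do
  list_ids = append_as_list(list_ids, [1, 2, 3])
  set_ids  = append_as_set(set_ids, [1, 2, 3])
end
puts list_ids.length # 9
puts set_ids.length  # 3
```

If the same string shows up in many batches, the list version keeps rewriting an ever larger document on each update, which would fit the gradual slowdown you describe. Whether that is actually the cause here is a guess until you check the size of post_ids on a few of the existing documents.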

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd0_qr%2B-jU%2BYgPiN-hA283aGgoy-UtH3j5-0wEJBCuP2Mg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.