I'm not sure what's going on, but remember that post_ids in the script is a
list, not a set: every update appends, so you might be growing it without bounds.
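
If that's the culprit, one option is to de-duplicate inside the update script
itself. Here is a minimal sketch of the bulk entry, assuming Groovy scripting
is available on your 1.3.6 cluster (the unique() call is Groovy's, and I
haven't run this against your setup):

  # Same upsert as before, but the script keeps only distinct ids, so
  # re-running index! over the same posts no longer grows the array.
  # Assumes the Groovy script language is enabled (lang: 'groovy').
  bulk_data <<
    {
      update: {
        _index: 'post_strings',
        _type: 'post_string',
        _id: string_id,
        data: {
          script: "ctx._source.post_ids = (ctx._source.post_ids + additional_post_ids).unique()",
          lang: 'groovy',
          params: { additional_post_ids: post_ids },
          upsert: { value: string, post_ids: post_ids }
        }
      }
    }

Either way, it's worth fetching a few of your hottest documents and checking
the length of post_ids: every update rewrites the whole document, so if some
arrays are already huge, that alone would explain the slowdown.
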
On Dec 8, 2014 2:49 PM, "Christophe Verbinnen" <djp...@gmail.com> wrote:

> Hello,
>
> We have a small cluster with 3 nodes running 1.3.6.
>
> I have an index set up with only two fields.
>
>           {
>             index: index_name,
>             body: {
>               settings: {
>                 number_of_shards: 3,
>                 store: {
>                   type: 'mmapfs'
>                 }
>               },
>               mappings: {
>                 mapping_name => {
>                   properties: {
>                     value: { type: 'string', analyzer: 'keyword' },
>                     post_ids: { type: 'long', index: 'not_analyzed' }
>                   }
>                 }
>               }
>             }
>           }
>
>
> We are basically storing strings and all the posts they are related to.
>
> The problem is that this data is not stored this way in the database, so I
> don't have an id to represent each string, nor do I have all the post_ids
> from the start.
>
> So I use the SHA1 of the string value as the id, and I use a script to
> append to post_ids.
>
> Here is the code I use to index via the bulk API endpoint.
>
> def index!
>   post_ids = Post.where...
>   bulk_data = []
>   strings.uniq.each do |string|
>     # The SHA1 of the string doubles as a stable document id.
>     string_id = Digest::SHA1.hexdigest string
>     bulk_data <<
>       {
>         update: {
>           _index: 'post_strings',
>           _type: 'post_string',
>           _id: string_id,
>           data: {
>             script: "ctx._source.post_ids += additional_post_ids",
>             params: {
>               additional_post_ids: post_ids
>             },
>             upsert: {
>               value: string,
>               post_ids: post_ids
>             }
>           }
>         }
>       }
>     # Flush every 100 actions.
>     if bulk_data.count == 100
>       $elasticsearch.bulk body: bulk_data
>       bulk_data = []
>     end
>   end
>   # Flush whatever is left.
>   $elasticsearch.bulk body: bulk_data if bulk_data.any?
> end
>
> So this worked fine for the first 75 million strings, but it kept getting
> slower until the indexing rate dropped to only 50 docs per second.
>
> After that the cluster just killed itself because the nodes couldn't talk
> to each other.
>
> I'm guessing all the threads were blocked on indexing, so the nodes had no
> threads available to respond.
>
> At first I thought it might be related to the SHA1 ids being inefficient,
> but my test with sequential ids was no better.
>
> I'm out of ideas right now. Any help would be greatly appreciated.
>
> Cheers.
>
>
