[ 
https://issues.apache.org/jira/browse/METRON-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449914#comment-16449914
 ] 

Simon Elliston Ball commented on METRON-1538:
---------------------------------------------

In ES's defence, it makes sense to do the uniqueness check, because it can't 
know that the id you give it is a GUID, or that it hasn't already seen that 
one. ES would have to check even if we did say it was a GUID, because we can't 
guarantee we're not indexing a duplicate or updating an existing doc, so fair 
play. 

I'm not sure this is something that should vary environment by environment, to 
be honest. I know my argument was about a performance tradeoff, but it does 
feel like something we should be opinionated about. I would lean towards 
having our guid just be a field, and letting Elasticsearch generate its own 
ids, which we would pay no attention to. 
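
To make that concrete, here is a rough sketch of what the two options look 
like in a bulk request body. It only builds the newline-delimited JSON payload 
and does not talk to a cluster. The function name, index name and sample 
document are made up for illustration and are not Metron code, and the sketch 
leaves out the `_type` that older Elasticsearch versions also expect.

```python
import json


def bulk_index_lines(index_name, docs, use_guid_as_id=False):
    """Build a bulk-API body (NDJSON) for the given docs.

    Hypothetical helper: when use_guid_as_id is False the action line
    carries no "_id", so Elasticsearch generates one itself. The guid
    stays in the document source as an ordinary field either way.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name}}
        if use_guid_as_id:
            # Explicit id: Elasticsearch must check for an existing doc.
            action["index"]["_id"] = doc["guid"]
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative document; field names other than "guid" are assumptions.
docs = [{"guid": "example-guid-1", "source_type": "bro"}]
print(bulk_index_lines("example_index", docs))
```

With the default arguments the guid is still searchable as a field, but any 
lookup or update by guid becomes a query rather than a get by id.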

> Don't use GUIDS for Elastic document id, but autogenerated ID's for 
> performance
> -------------------------------------------------------------------------------
>
>                 Key: METRON-1538
>                 URL: https://issues.apache.org/jira/browse/METRON-1538
>             Project: Metron
>          Issue Type: Improvement
>    Affects Versions: 0.4.3
>            Reporter: Ward Bekker
>            Priority: Major
>              Labels: performance
>
> Metron currently uses GUIDs for ES document ids, which goes against the best 
> practice:
> "When indexing a document that has an explicit id, Elasticsearch needs to 
> check whether a document with the same id already exists within the same 
> shard, which is a costly operation and gets even more costly as the index 
> grows. By using auto-generated ids, Elasticsearch can skip this check, which 
> makes indexing faster."
> https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html#_use_auto_generated_ids



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
