Thank you for the suggestions. A couple of clarifications and some additional 
information:

- We do use bulk indexing, as you inferred, using the Java TransportClient 
directly from our Java application.

- This index, however, does not rely on dynamic mapping. I included the 
mapping in my first post, and we don't send any other fields to ES for this 
index. It acts as a kind of object store for us, and we do not query against 
it. (I understand this is an atypical use case and not exactly what ES is 
designed for, but we were so impressed by the feature set of ES that we are 
trying to use it for more than just searching. Outside of this one issue, it 
has performed excellently. And, as stated, we did not see this issue in 
earlier versions.)

- We have 20 nodes in our cluster with 2 replicas. When we first experienced 
this issue, we had only 2 or 3 clients doing bulk indexing into the cluster. 
Each client is single-threaded and waits for each bulk operation to finish 
before issuing the next one (a simplified sketch of this pattern follows this 
list).
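
For reference, this is roughly the pattern our bulk sender follows. It is a 
minimal sketch rather than our production code; the cluster name, address, 
index name, and documents below are placeholders:

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class SyncBulkIndexer {
    public static void main(String[] args) {
        // Placeholder cluster name and address.
        Client client = new TransportClient(ImmutableSettings.settingsBuilder()
                .put("cluster.name", "our-cluster").build())
                .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));

        // Build one bulk request per batch and block until it completes
        // before issuing the next one (single-threaded, synchronous).
        BulkRequestBuilder bulk = client.prepareBulk();
        bulk.add(client.prepareIndex("data-2014.06.06", "key", "test1")
                .setSource("{\"sData\":\"test data 1\"}"));
        bulk.add(client.prepareIndex("data-2014.06.06", "key", "test2")
                .setSource("{\"sData\":\"test data 2\"}"));

        BulkResponse response = bulk.execute().actionGet();  // wait for this bulk to finish
        if (response.hasFailures()) {
            System.err.println(response.buildFailureMessage());
        }

        client.close();
    }
}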

I will try making the following changes to see what effect they may have:

- Reduce the number of replicas from 2 to 1.

- Disable dynamic mapping for this index. (This *should* have no net effect, 
but it shouldn't hurt either, since we don't require this functionality. A 
sketch of how I plan to pre-create the index this way follows below.)
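
For the dynamic mapping change, the idea is to pre-create each daily index 
with the complete mapping and dynamic mapping turned off, so indexing never 
triggers a mapping update. This is a sketch of what I have in mind, assuming 
the per-type "dynamic" setting is the right lever here ("strict" rejects 
unmapped fields, false simply ignores them); the index name and connection 
settings are placeholders:

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class CreateIndexWithFixedMapping {
    public static void main(String[] args) {
        // Placeholder cluster name and address.
        Client client = new TransportClient(ImmutableSettings.settingsBuilder()
                .put("cluster.name", "our-cluster").build())
                .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));

        // Create the index up front with the full mapping; "dynamic": "strict"
        // means unmapped fields are rejected instead of being added.
        client.admin().indices().prepareCreate("data-2014.06.07")  // hypothetical next index
                .addMapping("key",
                        "{\"key\": {"
                      + "  \"dynamic\": \"strict\","
                      + "  \"_all\": {\"enabled\": false},"
                      + "  \"properties\": {"
                      + "    \"sData\": {\"type\": \"string\", \"index\": \"no\"}"
                      + "  }"
                      + "}}")
                .execute().actionGet();

        client.close();
    }
}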

Lastly, as a test, I will reduce to a single client doing bulk indexing to 
see if that helps narrow down the problem. But it is not a long-term solution 
for us: our steady flow of new data is so high that we would not be able to 
keep up in production with only one indexing thread.


On Monday, June 16, 2014 4:15:20 PM UTC-5, Jörg Prante wrote:
>
> I guess you hit the following condition:
>
> - you insert data with bulk indexing
>
> - your index has dynamic mapping and already has huge field mappings
>
> - bulk requests span over many nodes / shards / replicas and introduce 
> tons of new fields into the dynamic mapping
>
> - you do not wait for bulk responses before sending new bulk requests
>
> That is, ES tries hard to create the new field mappings, but the updated 
> mapping does not reach the other nodes in time before new bulk requests 
> arrive there. A node sees that there must be a mapping for a new field, but 
> the cluster state it holds has none to present, although the field was 
> already being mapped.
>
> Maybe the cluster state is not sent at all, or it could not be read fully 
> from disk, or it is "stuck" somewhere else.
>
> ES tries hard to prevent such conditions by assigning high priority to the 
> cluster state messages that are sent throughout the cluster. It also avoids 
> flooding the cluster with such messages.
>
> Your observation is correct: as you keep bulk indexing the same type of 
> data (as opposed to random data), the number of new field mappings 
> decreases over time, and so does the number of new ES cluster state updates.
>
> You can try the following to tackle this challenge:
>
> - pre-create the field mappings for your indexes, or even better, 
> pre-create the indices and disable dynamic mapping, so that no cluster 
> state changes have to be propagated
>
> - switch to synchronous bulk requests, or reduce the concurrency of your 
> bulk requests, so that the bulk indexing routine waits for the cluster 
> state changes to become consistent on all nodes.
>
> - reduce the (perhaps huge) number of field mappings (more a question 
> about the type of data you index)
>
> - reduce the number of nodes (obviously an anti-pattern) 
>
> - or reduce the replica level (always a good thing for efficiency while 
> bulk indexing), to give the cluster some breathing room to broadcast the 
> new cluster states to the corresponding nodes more quickly
>
> Jörg
>
>
>
> On Mon, Jun 16, 2014 at 10:34 PM, Brooke Babcock <brooke...@gmail.com> 
> wrote:
>
>> Thanks for the reply. 
>> We've checked the log files on all the nodes - no errors or warnings. 
>> Disks were practically empty - it was a fresh cluster, fresh index.
>>
>> We have noticed that the problem occurs less frequently the more data we 
>> send to the cluster. Our latest theory is that it "corrects itself" 
>> (meaning, we are able to get by _id again) once a flush occurs. So by 
>> sending it more data, we are ensuring that flushes happen more often.
>>
>>
>> On Monday, June 16, 2014 8:05:15 AM UTC-5, Alexander Reelsen wrote:
>>
>>> Hey,
>>>
>>> It seems as if writing into the translog fails at some stage (from a 
>>> bird's-eye view). Can you check your log files for any unusual exceptions 
>>> before this happens? Also, did you run out of disk space at any time when 
>>> this happened?
>>>
>>>
>>> --Alex
>>>
>>>
>>> On Fri, Jun 6, 2014 at 8:39 PM, Brooke Babcock <brooke...@gmail.com> 
>>> wrote:
>>>
>>>> In one part of our application we use Elasticsearch as an object store. 
>>>> Therefore, when indexing, we supply our own _id. Likewise, when 
>>>> accessing a document we use the simple GET method to fetch by _id. This 
>>>> has worked well for us, up until recently. Normally, this is what we get:
>>>>
>>>> curl -XGET 'http://127.0.0.1:9200/data-2014.06.06/key/test1?pretty=true'
>>>> {
>>>>   "_index" : "data-2014.06.06",
>>>>   "_type" : "key",
>>>>   "_id" : "test1",
>>>>   "_version" : 1,
>>>>   "found" : true,
>>>>   "_source":{"sData":"test data 1"}
>>>> }
>>>>
>>>>
>>>> Now, we often encounter a recently indexed document that throws the 
>>>> following error when we try to fetch it:
>>>>
>>>> curl -XGET 'http://127.0.0.1:9200/data-2014.06.06/key/test2?pretty=true'
>>>> {
>>>>   "error":"IllegalArgumentException[No type mapped for [43]]",
>>>>   "status":500
>>>> }
>>>>
>>>>
>>>>
>>>> This condition persists anywhere from 1 to 25 minutes or so, at which 
>>>> point we no longer receive the error for that document and the GET 
>>>> succeeds as normal. From that point on, we are able to consistently 
>>>> retrieve that document by _id without issue. But, soon after, we will 
>>>> find a different newly indexed document caught in the same bad state.
>>>>
>>>> We know the documents are successfully indexed. Our bulk sender (which 
>>>> uses the Java transport client) indicates no error during indexing and 
>>>> we are still able to locate the document by doing an ids query, such as:
>>>>
>>>> curl -XPOST "http://127.0.0.1:9200/data-2014.06.06/key/_search?pretty=true" -d '
>>>> {
>>>>   "query": {
>>>>     "ids": {
>>>>       "values": ["test2"]
>>>>     }
>>>>   }
>>>> }'
>>>>
>>>> Which responds:
>>>> {
>>>>    "took": 543,
>>>>    "timed_out": false,
>>>>    "_shards": {
>>>>       "total": 10,
>>>>       "successful": 10,
>>>>       "failed": 0
>>>>    },
>>>>    "hits": {
>>>>       "total": 1,
>>>>       "max_score": 1.0,
>>>>       "hits": [ {
>>>>          "_index": "data-2014.06.06",
>>>>          "_type": "key",
>>>>          "_id": "test2",
>>>>          "_score": 1.0,
>>>>          "_source":{"sData": "test data 2"}
>>>>       } ]
>>>>    }
>>>> }
>>>>
>>>>
>>>> We first noticed this behavior in version 1.2.0. When we upgraded to 
>>>> 1.2.1, we deleted all indexes and started with a fresh cluster. We hoped 
>>>> our problem would be solved by the bug fix that came in 1.2.1, but we 
>>>> are still regularly seeing it. Although our situation may sound like the 
>>>> routing bug introduced in 1.2.0, we are certain that it is not. This 
>>>> appears to be a significant issue with the translog - we hope the 
>>>> developers will be able to look at what may have changed. We did not 
>>>> notice this problem in version 1.1.1.
>>>>
>>>> Just in case, here is the mapping being used:
>>>> curl -XGET 'http://127.0.0.1:9200/data-2014.06.06/key/_mapping?pretty=true'
>>>> {
>>>>   "data-2014.06.06" : {
>>>>     "mappings" : {
>>>>       "key" : {
>>>>         "_all" : {
>>>>           "enabled" : false
>>>>         },
>>>>         "properties" : {
>>>>           "sData" : {
>>>>             "type" : "string",
>>>>             "index" : "no"
>>>>           }
>>>>         }
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>>
>>>>
>>>> Thanks for your help.
>>>>
>>>>
>>>>
>>>
>
>
