subject:"Elasticsearch _bulk periodically unavailable through Python API, but seems fine otherwise"

Re: Elasticsearch _bulk periodically unavailable through Python API, but seems fine otherwise

2014-09-23 Thread Henrik Ossipoff Hansen

Hi Honza,

Hoped you would be lurking around here :)

I almost feel ashamed for not thinking about that myself - it makes totally 
sense. The default is 10 seconds as I recall, and we haven't tampered with 
this in this particular one of our projects. It also makes sense that we 
are seeing this type of error more on our test cluster, since it has much 
worse hardware than our production nodes.

Moreover, your comment has likely put a point on another issue we see in a 
different application, where we have changed the default timeout, but to 
something I'm now realising is likely way too low.

Thank you -- I'll be monitoring the issue for sure.

I'll report back once we get our test set up again in a few hours.

Regards,
Henrik

Den tirsdag den 23. september 2014 22.59.32 UTC+2 skrev Honza Král:
>
> Hello Henrik, 
>
> My guess would be that the timeouts you are seeing in python are 
> causing this - when the python client encounters a timeout, it retries 
> the request, thinking something went wrong. Thus if your timeout is 
> too small it can actually lead to spamming the cluster - when the 
> python client encounters a timeout the request is still running in the 
> server so each retry will spawn another bulk request being processed. 
>
> The solution should be to make the timeout for your python client higher. 
>
> I am starting to consider this a bug so I created a ticket to fix it, 
> you can track it on github: 
>
> https://github.com/elasticsearch/elasticsearch-py/issues/128 
>
>
> Hope this helps, 
> Honza 
>
> On Tue, Sep 23, 2014 at 9:48 PM, Henrik Ossipoff Hansen 
> > wrote: 
> > We're currently working with re-indexing one of our indexes on one of 
> our 
> > test servers, and we appear to be having issues. The operation should be 
> > fairly obvious, and there's not a lot of data -- around 320.000 total 
> > documents in the index, at around 1.15GB of data. We scan from the old 
> > index, and bulk index 1000 documents at a time (we've also tried 500 
> > documents at a time to no avail). Since our production indexes are more 
> > around the 30GB size, we're a bit discouraged by hitting this wall. 
> > 
> > Once in a while, we get a random "resource not available" error, but 
> they 
> > usually resolves themselves and go away -- however, after indexing 
> around 
> > 260.000 documents, the Python script starts behaving weirdly, throwing a 
> lot 
> > of resource not available errors. A full gist is here: 
> > https://gist.github.com/HenrikOssipoff/ac63060d4c18182719dd (note that 
> this 
> > is Python 3.4, so chained exceptions and everything -- the top exception 
> is 
> > not actually the important one). 
> > 
> > Now, thing is. During this time, Elasticsearch itself seems fine, and 
> the 
> > system is generally available for querying. The JVM heap is fine, and 
> the 
> > Elasticsearch log itself has absolutely no warnings or errors (log level 
> > INFO enabled). 
> > 
> > Our next course of action will be try to doing this by just cURLs 
> outside of 
> > Python, or even enabling debug logging in Elasticsearch, but we wanted 
> to 
> > see if anyone has some ideas for what could be the course. 
> > 
> > It seems logical, that since it only happens every one in a while, and 
> gets 
> > worse the more documents we index, it has something to do with the 
> amount of 
> > _bulk indexes we do, but we can't seem to find anything in logs or stats 
> > that support this. 
> > 
> > Any help appreciated, or any pointers in the right direction. 
> > 
> > Thank you! 
> > 
> > Regards, 
> > Henrik Ossipoff Hansen 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups 
> > "elasticsearch" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an 
> > email to elasticsearc...@googlegroups.com . 
> > To view this discussion on the web visit 
> > 
> https://groups.google.com/d/msgid/elasticsearch/2afa6d99-b9d4-42a2-9f4e-b717a3a41251%40googlegroups.com.
>  
>
> > For more options, visit https://groups.google.com/d/optout. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4e98937d-9c5c-4ffb-8e90-95796deb6293%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch _bulk periodically unavailable through Python API, but seems fine otherwise

2014-09-23 Thread Honza Král

Hello Henrik,

My guess would be that the timeouts you are seeing in python are
causing this - when the python client encounters a timeout, it retries
the request, thinking something went wrong. Thus if your timeout is
too small it can actually lead to spamming the cluster - when the
python client encounters a timeout the request is still running in the
server so each retry will spawn another bulk request being processed.

The solution should be to make the timeout for your python client higher.

I am starting to consider this a bug so I created a ticket to fix it,
you can track it on github:

https://github.com/elasticsearch/elasticsearch-py/issues/128


Hope this helps,
Honza

On Tue, Sep 23, 2014 at 9:48 PM, Henrik Ossipoff Hansen
 wrote:
> We're currently working with re-indexing one of our indexes on one of our
> test servers, and we appear to be having issues. The operation should be
> fairly obvious, and there's not a lot of data -- around 320.000 total
> documents in the index, at around 1.15GB of data. We scan from the old
> index, and bulk index 1000 documents at a time (we've also tried 500
> documents at a time to no avail). Since our production indexes are more
> around the 30GB size, we're a bit discouraged by hitting this wall.
>
> Once in a while, we get a random "resource not available" error, but they
> usually resolves themselves and go away -- however, after indexing around
> 260.000 documents, the Python script starts behaving weirdly, throwing a lot
> of resource not available errors. A full gist is here:
> https://gist.github.com/HenrikOssipoff/ac63060d4c18182719dd (note that this
> is Python 3.4, so chained exceptions and everything -- the top exception is
> not actually the important one).
>
> Now, thing is. During this time, Elasticsearch itself seems fine, and the
> system is generally available for querying. The JVM heap is fine, and the
> Elasticsearch log itself has absolutely no warnings or errors (log level
> INFO enabled).
>
> Our next course of action will be try to doing this by just cURLs outside of
> Python, or even enabling debug logging in Elasticsearch, but we wanted to
> see if anyone has some ideas for what could be the course.
>
> It seems logical, that since it only happens every one in a while, and gets
> worse the more documents we index, it has something to do with the amount of
> _bulk indexes we do, but we can't seem to find anything in logs or stats
> that support this.
>
> Any help appreciated, or any pointers in the right direction.
>
> Thank you!
>
> Regards,
> Henrik Ossipoff Hansen
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/2afa6d99-b9d4-42a2-9f4e-b717a3a41251%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CABfdDioPUGyaJQmOXPv9SUoTfynDbTfnvfAQnrQGGhE_NVr1GA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Elasticsearch _bulk periodically unavailable through Python API, but seems fine otherwise

2014-09-23 Thread Henrik Ossipoff Hansen

We're currently working with re-indexing one of our indexes on one of our
test servers, and we appear to be having issues. The operation should be
fairly obvious, and there's not a lot of data -- around 320.000 total
documents in the index, at around 1.15GB of data. We scan from the old
index, and bulk index 1000 documents at a time (we've also tried 500
documents at a time to no avail). Since our production indexes are more
around the 30GB size, we're a bit discouraged by hitting this wall.

Once in a while, we get a random "resource not available" error, but they
usually resolves themselves and go away -- however, after indexing around
260.000 documents, the Python script starts behaving weirdly, throwing *a
lot* of resource not available errors. A full gist is
here: https://gist.github.com/HenrikOssipoff/ac63060d4c18182719dd (note
that this is Python 3.4, so chained exceptions and everything -- the top
exception is not actually the important one).

Now, thing is. During this time, Elasticsearch itself seems fine, and the
system is generally available for querying. The JVM heap is fine, and the
Elasticsearch log itself has absolutely no warnings or errors (log level
INFO enabled).

Our next course of action will be try to doing this by just cURLs outside
of Python, or even enabling debug logging in Elasticsearch, but we wanted
to see if anyone has some ideas for what could be the course.

It seems logical, that since it only happens every one in a while, and gets
worse the more documents we index, it has something to do with the amount
of _bulk indexes we do, but we can't seem to find anything in logs or stats
that support this.

Any help appreciated, or any pointers in the right direction.

Thank you!

Regards,
Henrik Ossipoff Hansen

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/2afa6d99-b9d4-42a2-9f4e-b717a3a41251%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch _bulk periodically unavailable through Python API, but seems fine otherwise

Re: Elasticsearch _bulk periodically unavailable through Python API, but seems fine otherwise

Elasticsearch _bulk periodically unavailable through Python API, but seems fine otherwise

3 matches

Site Navigation

Mail list logo

Footer information