Re: UnavailableShardsException after loading 1.5M documents
Well, I hit the same error again after 6.7M documents and the increased heap size. After a restart of the cluster, it's taking a very long time to bring up the replica shards. Until about 50% of my shards (primary + replica) are initialized, I'm unable to load any more documents. When I try to bulk load, I fail after about another 1000 documents.

Here are my current cluster stats from Marvel:

Nodes: 6
Indices: 106
Shards: 2342
Data: 1.14 TB
CPU: 238%
Memory: 44.21 GB / 719.90 GB
Up time: 3.4 h
Version: 1.1.1

My total number of shards is actually 3124; I'm still waiting on several hundred to come back after three and a half hours.

I have the following questions:

- Is there an index size to heap size ratio we should be adhering to?
- What would prevent us from indexing additional documents? We seem to be nowhere close to filling the memory allocated to Elasticsearch, and the CPU usage has remained low.
- I just changed es.logger.level to DEBUG, restarted the cluster, waited for all the primary shards to initialize, and submitted a document for indexing. It failed with the UnavailableShardsException, and nothing appeared in the logs on the node where I submitted the request. Is there somewhere else I should be looking?
- I've given no tuning parameters to the indexes or the system as a whole. Is there something I may be missing?

I'm sure this is something that could be solved on my side with some change in parameters. Any ideas what I can try?

Thanks,
-Chris

On Friday, June 6, 2014 4:59:49 PM UTC-7, Chris Mildebrandt wrote:
>
> It looks like the problem is on my end. I misplaced the HEAP size
> parameter and was only running with 1GB. After bumping it up to a more
> respectable amount, the loading is humming along again.
>
> -Chris
>
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f2e4833c-7fc0-4ca6-8a07-81ee63a0cceb%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
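[Editor's note: the questions above about indexing during recovery come down to what the cluster health API reports. A minimal sketch, not from the thread itself: the field names match the 1.x GET /_cluster/health response, but the sample values below are illustrative, not real output from this cluster.]

```python
import json

def safe_to_bulk_index(health):
    # 'red' means some primary shards are unassigned, so writes to those
    # shards time out with UnavailableShardsException; 'yellow' only means
    # replicas are still missing, and indexing should normally succeed.
    return health["status"] in ("yellow", "green")

# Illustrative health response while replicas are still initializing.
sample = json.loads('{"cluster_name": "example", "status": "yellow", '
                    '"active_primary_shards": 1040, "initializing_shards": 782}')
print(safe_to_bulk_index(sample))  # True
```

[A loader could poll this before each bulk flush, or ask the server to block with the health API's wait_for_status parameter instead of polling.]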
Re: UnavailableShardsException after loading 1.5M documents
It looks like the problem is on my end. I misplaced the HEAP size parameter and was only running with 1GB. After bumping it up to a more respectable amount, the loading is humming along again.

-Chris
UnavailableShardsException after loading 1.5M documents
Hi all,

I'm using the Python API (pyes) to perform the bulk loading of our data. Here's the important part of the code:

    import os
    from pyes import ES

    max_docs = 1
    es = ES(server='hadoop42.robinsystems.com:9200')
    for prefix in xrange(1, 105):
        f_name = os.path.join('data', str(prefix) + '.json')
        with open(f_name, 'rb') as f:
            for line in f:
                es.index(line, str(prefix), 'my_type', bulk=True)

It loops through files (1.json, 2.json, 3.json, etc.) and loads them into indexes ('1', '2', '3', etc.). The API does 400 documents at a time. It hums along until about 1.5M documents, then the process fails with the following error:

    Traceback (most recent call last):
      File "load_data.py", line 24, in <module>
        es.index(line, str(prefix), 'my_type', bulk=True)
      File "/usr/local/lib/python2.7/site-packages/pyes/es.py", line 729, in index
        return self.flush_bulk()
      File "/usr/local/lib/python2.7/site-packages/pyes/es.py", line 763, in flush_bulk
        return self.bulker.flush_bulk(forced)
      File "/usr/local/lib/python2.7/site-packages/pyes/models.py", line 204, in flush_bulk
        "\n".join(batch) + "\n")
      File "/usr/local/lib/python2.7/site-packages/pyes/es.py", line 441, in _send_request
        response = self.connection.execute(request)
      File "/usr/local/lib/python2.7/site-packages/pyes/connection_http.py", line 109, in execute
        self._local.server = server = self._get_server()
      File "/usr/local/lib/python2.7/site-packages/pyes/connection_http.py", line 145, in _get_server
        raise NoServerAvailable(ex)
    pyes.exceptions.NoServerAvailable: list index out of range

After that, I can't even load one document into the system:

    curl -XPOST http://hadoop42.robinsystems.com:9200/_bulk --data-binary @t.json
    {"took":60001,"errors":true,"items":[{"create":{"_index":"21","_type":"my_type","_id":"unj0OWVgQZCNXYqfChaOVg","status":503,"error":"UnavailableShardsException[[21][5] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5e27693e]"}}]}

The t.json file has one document in it.
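[Editor's note: for context on what pyes flushes when bulk=True, the _bulk endpoint expects newline-delimited JSON: one action/metadata line per document, then the document source, with a trailing newline. A hedged sketch of that payload shape; the index and type names follow the loader above, and the sample document is made up.]

```python
import json

def build_bulk_body(index, doc_type, docs):
    # Each document contributes two lines: an action line naming the target
    # index/type, then the document source itself. The body must end with
    # a newline or Elasticsearch rejects the final action.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Hypothetical single-document batch, mirroring the t.json case above.
body = build_bulk_body("21", "my_type", [{"field": "value"}])
print(body)
```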
I restarted the cluster and I get the same error. All my primary shards are active, and the replicas are coming up slowly. The current state of the cluster is yellow; I would expect to still be able to load documents in this state.

Here are some more details of our setup:

- 6 node cluster with 256GB RAM, 120GB set as ES_HEAP
- 104 indexes with 10 shards each and 2 replicas
- Each index holds 80,000 documents, and each document is about 20KB

Any idea why I'd be unable to load documents into my cluster after this point?

Thanks,
-Chris
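[Editor's note: a quick back-of-the-envelope calculation of the shard count implied by the setup above. The small gap to the 3124 total reported elsewhere in the thread would plausibly be other indexes, such as Marvel's own.]

```python
# Shard math for: 104 indexes x 10 primary shards, 2 replicas, 6 nodes.
indexes = 104
primaries_per_index = 10
replicas = 2
nodes = 6

primaries = indexes * primaries_per_index   # 1040 primary shards
total_shards = primaries * (1 + replicas)   # 3120 shard copies in all
per_node = total_shards // nodes            # ~520 shards per node

print(primaries, total_shards, per_node)  # 1040 3120 520
```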