New question #201494 on Graphite:
https://answers.launchpad.net/graphite/+question/201494

I am currently evaluating Graphite's performance when handling 100k metrics
per minute. I've created two identical setups, one on a local VM and one on a
medium EC2 instance, and written a script that posts metrics of the form
"systemN.loadavg_1min {rand} {now}", with N ranging from 1 to 50k and a random
metric value. The script sleeps for 0.0006s after each metric, which works out
to 100k metrics per minute.
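
A minimal sketch of such a load generator, assuming carbon's plaintext
listener is reachable on localhost port 2003 (the default line receiver port).
The host, the function names and the constants are illustrative and are not
taken from the original script:

```python
import random
import socket
import time

CARBON_HOST = "localhost"   # assumption: carbon-cache runs on this machine
CARBON_PORT = 2003          # carbon's default plaintext (line receiver) port
METRIC_COUNT = 50000        # N ranges from 1 to 50k
SLEEP_SECONDS = 0.0006      # 60 s divided by 100k metrics


def format_line(n, value, timestamp):
    """Build one plaintext-protocol line: path, value, timestamp, newline."""
    return "system%d.loadavg_1min %s %d\n" % (n, value, timestamp)


def run_once(sock):
    """Send one random data point for each of the METRIC_COUNT metrics."""
    for n in range(1, METRIC_COUNT + 1):
        line = format_line(n, random.random(), int(time.time()))
        sock.sendall(line.encode("ascii"))
        time.sleep(SLEEP_SECONDS)


def main():
    """Connect to carbon and keep posting metrics until interrupted."""
    sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
    try:
        while True:
            run_once(sock)
    finally:
        sock.close()
```

Calling main() starts the loop. One pass over all 50k names takes at least 30
seconds of sleep time, so each metric receives roughly two points per minute.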

After a while I tried counting the number of directories in the storage 
location locally:

me@ubuntu:~/graphite-dev$ ls /opt/graphite/storage/whisper/ | wc
50000   50000  588889

and on EC2 (whisper dir is symlinked to /mnt):

ubuntu@ip-x-x-x-x:/opt/graphite$ ls /mnt/whisper/ | wc
31998   31998  372865

The number 31998 does not grow and the strangest thing is that when I delete 
/mnt/whisper completely, create it back and restart the script, the directory 
count stops at 31998 again.
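
To check whether the directory itself refuses new entries, independently of
carbon, it can be probed directly. This is only a sketch: the helper names and
the probe directory name "mkdir_probe" are made up for illustration, and it
assumes the script runs as the same user as carbon:

```python
import errno
import os


def count_subdirs(parent):
    """Count the immediate subdirectories of parent."""
    return sum(1 for name in os.listdir(parent)
               if os.path.isdir(os.path.join(parent, name)))


def try_mkdir(parent, name="mkdir_probe"):
    """Create and remove a probe directory under parent.

    Returns None on success, or the symbolic errno name (for example
    "ENOENT") if the directory could not be created.
    """
    probe = os.path.join(parent, name)
    try:
        os.mkdir(probe)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.rmdir(probe)
    return None
```

For example, print(count_subdirs("/mnt/whisper"), try_mkdir("/mnt/whisper"))
shows the current count and whether one more directory can still be created.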

console.log contains entries like this:

26/06/2012 12:27:37 :: Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 167, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/opt/graphite/lib/carbon/writer.py", line 158, in writeForever
    writeCachedDataPoints()
  File "/opt/graphite/lib/carbon/writer.py", line 118, in writeCachedDataPoints
    whisper.create(dbFilePath, archiveConfig, xFilesFactor, aggregationMethod, settings.WHISPER_SPARSE_CREATE)
  File "/usr/local/lib/python2.7/dist-packages/whisper.py", line 327, in create
    fh = open(path,'wb')
exceptions.IOError: [Errno 2] No such file or directory: '/opt/graphite/storage/whisper/system31851/loadavg_1min.wsp'
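
For context, Errno 2 (ENOENT) from open(path, 'wb') generally means that a
directory component of the path is missing, since the file itself would be
created by the call. A small sketch showing that behaviour in isolation, using
a temporary directory and an illustrative function name:

```python
import errno
import os
import tempfile


def open_in_missing_dir():
    """Open a file for writing inside a directory that does not exist.

    Returns the errno of the resulting error, or None if the open succeeded.
    """
    base = tempfile.mkdtemp()
    # "missing_dir" is never created, mirroring a metric directory that
    # is absent when the .wsp file is opened.
    path = os.path.join(base, "missing_dir", "loadavg_1min.wsp")
    try:
        with open(path, "wb"):
            pass
    except IOError as exc:  # IOError is an alias of OSError on Python 3
        return exc.errno
    finally:
        os.rmdir(base)
    return None
```

Comparing the result with errno.ENOENT confirms that the error comes from the
missing parent directory rather than from the file.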

The permissions are evidently fine, since most of the directories are created
and only some are not. The box has 1 CPU and 4G of memory, and the /mnt
filesystem has 300GB+ of free space.

I have set MAX_CACHE_SIZE to 100000 to force carbon to write the data to disk
sooner; MAX_UPDATES_PER_SECOND and MAX_CREATES_PER_SECOND are "inf". However,
the disk utilization is not high:

ubuntu@ip-x-x-x-x:/opt/graphite$ iostat -dxk 10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.32    0.48    0.55     5.89     5.98    23.12     0.01    6.35    8.14    4.77   2.45   0.25
xvdb              0.00   304.39    4.97   60.72    38.24  1460.47    45.63    10.81  164.52    5.54  177.54   1.26   8.28

Since the logs show "Unhandled Error", my guess is that the Python threads are
dying and taking part of the metrics with them.

How can I fix that?
