Stack traces are here: https://gist.github.com/kbzod/e6e21ea15cf5670ba534
This time something showed up in the monitor; often there is no stack trace
there. The thread dump is from setting ACCUMULO_KILL_CMD to "kill -3 %p".

Thanks again
Bill

On Tue, May 27, 2014 at 5:09 PM, Bill Havanki <[email protected]> wrote:

> I left the default key size constraint in place. I had set the tserver
> message size up from 1 GB to 1.5 GB, but it didn't help. (I forgot that
> config item.)
>
> Stack trace(s) coming up! I got tired of failures all day so I'm running a
> different test that will hopefully work. I'll re-break it shortly :D
>
>
> On Tue, May 27, 2014 at 5:04 PM, Josh Elser <[email protected]> wrote:
>
>> Stack traces would definitely be helpful, IMO.
>>
>> (or interesting if nothing else :D)
>>
>>
>> On 5/27/14, 4:55 PM, Bill Havanki wrote:
>>
>>> No sir. I am seeing general out-of-heap-space messages, nothing about
>>> direct buffers. One specific example would be while Thrift is writing
>>> to a ByteArrayOutputStream to send off scan results. (I can get an
>>> exact stack trace - easily :} - if it would be helpful.) It seems as if
>>> there just isn't enough heap left, after controlling for what I have so
>>> far.
>>>
>>> As a clarification of my original email: each row has 100 cells, and
>>> each cell has a 100 MB value. So, one row would occupy just over 10 GB.
>>>
>>>
>>> On Tue, May 27, 2014 at 4:49 PM, <[email protected]> wrote:
>>>
>>>> Are you seeing something similar to the error in
>>>> https://issues.apache.org/jira/browse/ACCUMULO-2495?
>>>>
>>>> ----- Original Message -----
>>>>
>>>> From: "Bill Havanki" <[email protected]>
>>>> To: "Accumulo Dev List" <[email protected]>
>>>> Sent: Tuesday, May 27, 2014 4:30:59 PM
>>>> Subject: Supporting large values
>>>>
>>>> I'm trying to run a stress test where each row in a table has 100
>>>> cells, each with a value of 100 MB of random data. (This is using Bill
>>>> Slacum's memory stress test tool.) Despite fiddling with the cluster
>>>> configuration, I always run out of tablet server heap space before too
>>>> long.
>>>>
>>>> Here are the configurations I've tried so far, with valuable guidance
>>>> from Busbey and madrob:
>>>>
>>>> - native maps are enabled, tserver.memory.maps.max = 8G
>>>> - table.compaction.minor.logs.threshold = 8
>>>> - tserver.walog.max.size = 1G
>>>> - tablet server has a 4G heap (-Xmx4g)
>>>> - table is pre-split into 8 tablets (split points 0x20, 0x40, 0x60,
>>>>   ...); 5 tablet servers are available
>>>> - tserver.cache.data.size = 256M
>>>> - tserver.cache.index.size = 40M (keys are small - 4 bytes - in this
>>>>   test)
>>>> - table.scan.max.memory = 256M
>>>> - tserver.readahead.concurrent.max = 4 (default is 16)
>>>>
>>>> It's often hard to tell where the OOM error comes from, but I have
>>>> frequently seen it coming from Thrift as it is writing out scan
>>>> results.
>>>>
>>>> Does anyone have any good conventions for supporting large values?
>>>> (Warning: I'll want to work on large keys (and tiny values) next! :) )
>>>>
>>>> Thanks very much
>>>> Bill
>>>>
>>>> --
>>>> // Bill Havanki
>>>> // Solutions Architect, Cloudera Govt Solutions
>>>> // 443.686.9283
>>>
>>
>
> --
> // Bill Havanki
> // Solutions Architect, Cloudera Govt Solutions
> // 443.686.9283

--
// Bill Havanki
// Solutions Architect, Cloudera Govt Solutions
// 443.686.9283
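[Editor's note: for readers reproducing this setup, here is a sketch of how the tserver-side settings listed in the original message would look in accumulo-site.xml. The values simply restate the thread, not recommendations; per-table properties such as table.scan.max.memory and table.compaction.minor.logs.threshold would instead be set at runtime, e.g. with the shell's `config -t <table> -s <property>=<value>` command.]

```xml
<!-- accumulo-site.xml sketch restating the thread's tserver settings -->
<property>
  <name>tserver.memory.maps.max</name>
  <value>8G</value>
</property>
<property>
  <name>tserver.walog.max.size</name>
  <value>1G</value>
</property>
<property>
  <name>tserver.cache.data.size</name>
  <value>256M</value>
</property>
<property>
  <name>tserver.cache.index.size</name>
  <value>40M</value>
</property>
<property>
  <name>tserver.readahead.concurrent.max</name>
  <value>4</value>
</property>
```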

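[Editor's note: the numbers quoted in the thread make the heap pressure easy to check. A back-of-the-envelope sketch follows; the 2x factor for Thrift serialization is an assumption (the batch held once in the scan buffer plus one copy in the ByteArrayOutputStream), not a measured figure.]

```python
# Rough heap arithmetic for the test described in the thread.
MB = 1024 * 1024
GB = 1024 * MB

cells_per_row = 100
value_size = 100 * MB                 # 100 MB per cell
row_size = cells_per_row * value_size # one full row, ~10 GB

scan_batch = 256 * MB                 # table.scan.max.memory
readahead = 4                         # tserver.readahead.concurrent.max
# ASSUMPTION: each in-flight scan roughly doubles its batch in heap
# while Thrift copies it into a ByteArrayOutputStream.
inflight = readahead * scan_batch * 2

heap = 4 * GB                         # -Xmx4g

print(f"one row:         {row_size / GB:.1f} GiB")
print(f"in-flight scans: {inflight / GB:.1f} GiB of a {heap / GB:.0f} GiB heap")
```

Under these assumptions, concurrent scans alone can claim half of the 4 GB heap before caches, the in-memory map overflow path, or anything else is counted, which is consistent with the OOMs appearing in Thrift's scan-result serialization.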