I'm trying to run a stress test in which each row in a table has 100 cells, each holding a 100 MB value of random data. (This is using Bill Slacum's memory stress test tool.) Despite fiddling with the cluster configuration, I always run out of tablet server heap space before long.
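For context, here is a rough sketch of the shape of the writes. This is only illustrative, not the actual stress test tool; the instance, table, and credential names are made up:

    import java.util.Random;
    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;

    public class BigValueWriteSketch {
      public static void main(String[] args) throws Exception {
        // Hypothetical connection details; substitute your own.
        Connector conn = new ZooKeeperInstance("stress-instance", "zk1:2181")
            .getConnector("root", new PasswordToken("secret"));

        BatchWriter bw = conn.createBatchWriter("stress_table",
            new BatchWriterConfig().setMaxMemory(256L * 1024 * 1024));

        Random rand = new Random();
        byte[] big = new byte[100 * 1024 * 1024]; // 100 MB of random data per cell

        // One row with 100 cells, each carrying a 100 MB value.
        for (int cq = 0; cq < 100; cq++) {
          rand.nextBytes(big);
          Mutation m = new Mutation("row_0000");
          m.put("cf", String.format("cq_%03d", cq), new Value(big));
          bw.addMutation(m);
          bw.flush(); // flush per cell so the client itself doesn't buffer gigabytes
        }
        bw.close();
      }
    }

Each flushed mutation is roughly 100 MB on its own, so even the client side has to be careful about how much it buffers.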
Here are the configurations I've tried so far, with valuable guidance from Busbey and madrob (applied roughly as sketched at the end of this message):

- native maps are enabled, tserver.memory.maps.max = 8G
- table.compaction.minor.logs.threshold = 8
- tserver.walog.max.size = 1G
- tablet server has a 4G heap (-Xmx4g)
- table is pre-split into 8 tablets (split points 0x20, 0x40, 0x60, ...); 5 tablet servers are available
- tserver.cache.data.size = 256M
- tserver.cache.index.size = 40M (keys are small - 4 bytes - in this test)
- table.scan.max.memory = 256M
- tserver.readahead.concurrent.max = 4 (default is 16)

It's often hard to tell where the OOM error comes from, but I have frequently seen it coming from Thrift as it writes out scan results.

Does anyone have any good conventions for supporting large values? (Warning: I'll want to work on large keys (and tiny values) next! :) )

Thanks very much
Bill

--
// Bill Havanki
// Solutions Architect, Cloudera Govt Solutions
// 443.686.9283
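P.S. In case it helps anyone reproduce the setup, here is a rough sketch (again just illustrative, with made-up connection details) of one way to apply the table-level settings and the pre-splits through the Java API. The tserver.* properties would live in accumulo-site.xml, and some of them only take effect after a tablet server restart.

    import java.util.TreeSet;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.hadoop.io.Text;

    public class StressTableSetupSketch {
      public static void main(String[] args) throws Exception {
        // Hypothetical connection details; substitute your own.
        Connector conn = new ZooKeeperInstance("stress-instance", "zk1:2181")
            .getConnector("root", new PasswordToken("secret"));

        String table = "stress_table";
        if (!conn.tableOperations().exists(table)) {
          conn.tableOperations().create(table);
        }

        // Table-scoped settings from the list above.
        conn.tableOperations().setProperty(table, "table.compaction.minor.logs.threshold", "8");
        conn.tableOperations().setProperty(table, "table.scan.max.memory", "256M");

        // Pre-split into 8 tablets on single-byte boundaries 0x20, 0x40, ..., 0xE0.
        TreeSet<Text> splits = new TreeSet<Text>();
        for (int b = 0x20; b <= 0xE0; b += 0x20) {
          splits.add(new Text(new byte[] {(byte) b}));
        }
        conn.tableOperations().addSplits(table, splits);

        // The tserver.* properties (native map size, walog size, caches, readahead)
        // would typically be set in accumulo-site.xml rather than here.
      }
    }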
