Keith, what do you think about the throughput achieved? It was around 15k messages per second, right?
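For reference, here is the back-of-the-envelope arithmetic behind that 15k figure. It is only a rough sketch: it assumes all ~1.3 billion transactions from the report quoted below ran inside the 24-hour window, that load was spread evenly over the 10 nodes, and that one transaction is comparable to one "message". The variable names are mine, not anything from Fluo.

```python
# Rough average throughput, using the approximate figures from the quoted report.
transactions = 1.3e9           # ~1.3 billion transactions executed (approximate)
duration_s = 24 * 60 * 60      # 24-hour run, expressed in seconds
nodes = 10                     # 10 node EC2 cluster

# Assumption: transactions were spread evenly over the whole run and all nodes.
per_second = transactions / duration_s
per_node = per_second / nodes

print(f"{per_second:,.0f} tx/s overall, {per_node:,.0f} tx/s per node")
```

This works out to roughly 15k transactions per second for the cluster, or about 1.5k per node. It is an average over the whole run, so it says nothing about peaks or about the periods when CPU utilization was low.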
*Dummy questions:*

I've noticed that the more processes (threads/applications) I use to load data, the better the throughput, as expected. But I didn't reach Fluo's maximum.

Is MapReduce the best way to load data into Fluo? Is there any difference between using "fluo exec" to start a Fluo client and using "java -jar" or some other way? Are there other clients to load data into or transact with Fluo (Python, Go...)? What's the difference between using a Loader and a client transaction? Is it asynchronous versus synchronous?

*(sorry for the disconnected questions!)*

Thanks!

Alan Camillo
*BlueShift* | IT Director
Cel.: +55 11 98283-6358
Tel.: +55 11 4605-5082

2018-01-10 13:19 GMT-02:00 Keith Turner <ke...@deenlo.com>:

> I completed a successful 24hr run of the Fluo stress test on a 10 node
> EC2 cluster. For the test 1 billion random integers were loaded via
> map reduce and then 370 million were loaded by Fluo. This resulted in
> ~1.3 billion transactions executing and ~13 million collisions. Fluo
> commit dbad51d was used for the test. Below is the final output from
> the test.
>
> *****Verifying Fluo & MapReduce results match*****
> Success! Fluo & MapReduce both calculated 1369064132 unique integers
>
> During the test CPU utilization was not uniformly high. Looking at
> the Accumulo monitor some nodes would have lots of queued scans.
> Running jstack on those nodes showed lots of threads trying to reserve
> open files. However there were only a few threads actually running
> scans. This seemed very odd and I plan to investigate further. I had
> set the max open files to 1000 and all tablets had only 3 to 4 files.
> Therefore if 1000 files were reserved I would have expected to see
> lots of scans running, however this was not what I saw.
>
> Below is a gist with info about config used for the test.
>
> https://gist.github.com/keith-turner/e28ee6cd4941210f34e5cd0e6a6b3106