Adar Dembo created KUDU-1970:
--------------------------------

             Summary: Integration test for data scalability
                 Key: KUDU-1970
                 URL: https://issues.apache.org/jira/browse/KUDU-1970
             Project: Kudu
          Issue Type: Sub-task
          Components: master, tserver
    Affects Versions: 1.4.0
            Reporter: Adar Dembo
            Assignee: Adar Dembo


To help test data scalability fixes, we need a way to easily produce an 
environment that exhibits our current scalability issues. I'm sure one of our 
long-running workloads would be up to the task, but aside from taking a long 
time, it'd also fill up the disk, which makes it unusable on most developer 
machines. Ultimately, data isn't really the root cause of our scalability woes; 
it's the metadata necessary to maintain the data that hurts us. So an idealized 
environment would be heavy on the metadata. Here's a not-so-exhaustive list:
* Many tablets.
* Many columns per tablet.
* Many rowsets per tablet.
* Many data blocks.
* Many tables (tservers don't care about this, but maybe the master does?)

Let's write an integration test that swamps the machine with the above. It 
should use an external mini cluster to simplify isolating master and tserver 
performance characteristics, but it needn't have more than one instance of each.
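
As a rough illustration of how the dimensions listed above multiply, here is a 
small back-of-the-envelope sketch in Python. It is not Kudu code. The class 
name, its fields, and the block-count formula (one block per column per rowset, 
plus a configurable number of extra blocks) are assumptions made for 
illustration only; the real per-rowset block count depends on Kudu internals 
that this ticket does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataWorkload:
    """Hypothetical knobs, roughly one per bullet in the list above."""

    num_tables: int
    tablets_per_table: int
    columns_per_tablet: int
    rowsets_per_tablet: int
    # Assumption: blocks per rowset beyond one per column. This value is a
    # placeholder and is not taken from the Kudu source.
    extra_blocks_per_rowset: int = 0

    def total_tablets(self) -> int:
        return self.num_tables * self.tablets_per_table

    def total_rowsets(self) -> int:
        return self.total_tablets() * self.rowsets_per_tablet

    def estimated_blocks(self) -> int:
        # Assumed model: each rowset holds one block per column plus extras.
        per_rowset = self.columns_per_tablet + self.extra_blocks_per_rowset
        return self.total_rowsets() * per_rowset


if __name__ == "__main__":
    workload = MetadataWorkload(
        num_tables=10,
        tablets_per_table=100,
        columns_per_tablet=50,
        rowsets_per_tablet=20,
        extra_blocks_per_rowset=2,
    )
    print("tablets:", workload.total_tablets())
    print("rowsets:", workload.total_rowsets())
    print("estimated blocks:", workload.estimated_blocks())
```

Under this assumed model, even modest per-dimension values produce a large 
number of blocks, which is why the test can stress metadata handling without 
writing much actual data.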



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)