Grant Henke created KUDU-2785:
---------------------------------

             Summary: Support more parallel scanners in the backup job
                 Key: KUDU-2785
                 URL: https://issues.apache.org/jira/browse/KUDU-2785
             Project: Kudu
          Issue Type: Improvement
    Affects Versions: 1.9.0
            Reporter: Grant Henke


Currently the KuduBackup job uses one scanner, and therefore one Spark task, per
Kudu partition. When KUDU-2670 is complete, we should consider and test using
more than one scanner per partition, with the number of scanners driven by a
configurable target data size for each scanner. That should result in faster and
more reliable/predictable backup jobs regardless of partition count.
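
To make the idea concrete, here is a rough sketch of how a target data size
could translate into a scanner count per partition. It is plain arithmetic and
does not use the Kudu client API. The function names, the partition names, and
the sizes are all hypothetical, and it assumes a per-partition size estimate is
available from somewhere.

```python
def scanners_for_partition(estimated_bytes: int, target_bytes_per_scanner: int) -> int:
    """Return how many scanners to use for one partition.

    Hypothetical helper: rounds up so that no scanner is expected to read
    more than the target, and always returns at least one scanner so that
    empty or tiny partitions are still scanned.
    """
    if target_bytes_per_scanner <= 0:
        raise ValueError("target_bytes_per_scanner must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    needed = (estimated_bytes + target_bytes_per_scanner - 1) // target_bytes_per_scanner
    return max(1, needed)


def plan_scanners(partition_sizes: dict, target_bytes_per_scanner: int) -> dict:
    """Map each partition name to its scanner count (one Spark task each)."""
    return {
        name: scanners_for_partition(size, target_bytes_per_scanner)
        for name, size in partition_sizes.items()
    }


GIB = 2**30

# Made-up size estimates for three partitions of very different sizes.
example_sizes = {"p0": 10 * GIB, "p1": GIB // 2, "p2": 0}
plan = plan_scanners(example_sizes, target_bytes_per_scanner=GIB)
print(plan)
```

With a fixed target, the total task count follows the amount of data rather
than the partition count, which is the property the proposal is after. The
result is only as good as the size estimates fed into it.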

It may, however, make restoring more difficult because it could cause
compactions. Restore-side testing and improvements may also be required.

The estimation of key range sizes may also need to be improved, so this should
be well tested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
