[ https://issues.apache.org/jira/browse/ACCUMULO-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Owens resolved ACCUMULO-4355.
----------------------------------
    Resolution: Resolved

The remaining sub-tasks of this ticket applied to the legacy bulk import, which has been superseded by the newer bulk import API.

> Provide more granular control for bulk import operations
> --------------------------------------------------------
>
>                 Key: ACCUMULO-4355
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4355
>             Project: Accumulo
>          Issue Type: Wish
>          Components: master, tserver
>            Reporter: Shawn Walker
>            Assignee: Shawn Walker
>            Priority: Major
>             Fix For: 2.0.0
>
>
> Accumulo currently provides mechanisms to initiate bulk imports and to list
> bulk imports in progress. Scheduling of bulk import requests is not entirely
> deterministic, and most of the execution of a bulk-import request is done in
> a non-preemptable manner. As such, any bulk import which takes very long to
> complete can block bulk imports with higher operational priority for
> significant periods.
> To better support bulk-import-heavy applications, it would be nice if
> Accumulo would offer additional mechanisms for controlling the scheduling and
> execution of bulk imports, such as the abilities to:
> * Pause/resume bulk import in progress.
> * Prioritize/reprioritize bulk import requests.
> * Cancel bulk import in progress. If possible, cancelling a partially
> completed bulk import request should result in a rollback of changes. That
> is, a bulk import should either succeed or make no changes.
> Additionally, for multitenant situations, it would be nice if Accumulo would:
> * Provide multiple queues for bulk import requests. Each queue would have
> its requests worked serially in priority order. Requests in separate queues
> should be worked in parallel, or have time distributed among the queues in
> some manner as to make work appear roughly parallel.
> ----
> Implementation-wise, I'm thinking of rewriting much of the current
> bulk-loading logic. While the current logic depends upon multiple threads
> executing (potentially long-duration) blocking RPC calls, I'd like to move to
> a more event-driven/message-passing model backed by a persistent state
> machine.
> Current ideas I'm playing around with (very tentative)
> * Creating a new table {{accumulo.bulk_load_queues}} to keep track of bulk
> load progress.
> * Distributing bulk load orchestration via a mechanism similar to tablet
> assignment instead of the current blocking RPC calls (LoadFiles.java:156).
> * Implementing something akin to a two-phase commit to achieve rollback
> behavior on failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
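
For illustration only, the multi-queue scheduling wished for in the issue (each queue worked serially in priority order, with time distributed among queues so work appears roughly parallel) could be sketched as below. This is a minimal in-memory sketch under stated assumptions, not Accumulo code: the class `BulkImportQueues`, its `submit` and `drainRoundRobin` methods, and the convention that a larger priority value is served first are all hypothetical. It does not cover pause/resume, cancellation, rollback, or the persistent state machine mentioned in the issue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these names exist in Accumulo.
class BulkImportQueues {

    // One queued request. Assumption: a larger priority value is served first;
    // seq records submission order and breaks ties first-in, first-out.
    static final class Request {
        final String id;
        final int priority;
        final long seq;

        Request(String id, int priority, long seq) {
            this.id = id;
            this.priority = priority;
            this.seq = seq;
        }
    }

    // Highest priority first, then oldest submission first.
    private static final Comparator<Request> ORDER =
        Comparator.comparingInt((Request r) -> r.priority)
            .reversed()
            .thenComparingLong(r -> r.seq);

    // Queues are kept in creation order so the round-robin pass is deterministic.
    private final Map<String, PriorityQueue<Request>> queues = new LinkedHashMap<>();
    private long nextSeq = 0;

    // Adds a request to the named queue, creating the queue on first use.
    void submit(String queueName, String requestId, int priority) {
        queues.computeIfAbsent(queueName, k -> new PriorityQueue<>(ORDER))
            .add(new Request(requestId, priority, nextSeq++));
    }

    // Takes one request from each non-empty queue per pass until all queues
    // are empty, and returns the request ids in the order they were taken.
    List<String> drainRoundRobin() {
        List<String> order = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (PriorityQueue<Request> q : queues.values()) {
                Request r = q.poll();
                if (r != null) {
                    order.add(r.id);
                    progressed = true;
                }
            }
        }
        return order;
    }
}
```

In this sketch a long backlog in one queue delays other queues by at most one request per pass, which is one simple way to approximate the "roughly parallel" behavior the issue asks for; a real scheduler would also need to weigh request size and persist queue state.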