[jira] [Resolved] (ACCUMULO-4355) Provide more granular control for bulk import operations

Mark Owens (JIRA) Fri, 26 Apr 2019 12:14:48 -0700


     [ https://issues.apache.org/jira/browse/ACCUMULO-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mark Owens resolved ACCUMULO-4355.
----------------------------------
    Resolution: Resolved

The remaining sub-tasks of this ticket applied to the legacy bulk import, which 
has been superseded by the newer bulk import API.

> Provide more granular control for bulk import operations
> --------------------------------------------------------
>
>                 Key: ACCUMULO-4355
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4355
>             Project: Accumulo
>          Issue Type: Wish
>          Components: master, tserver
>            Reporter: Shawn Walker
>            Assignee: Shawn Walker
>            Priority: Major
>             Fix For: 2.0.0
>
>
> Accumulo currently provides mechanisms to initiate bulk imports and to list 
> bulk imports in progress.  Scheduling of bulk import requests is not entirely 
> deterministic, and most of the execution of a bulk import request is 
> non-preemptible.  As a result, any bulk import that takes a long time to 
> complete can block bulk imports with higher operational priority for 
> significant periods.
> To better support applications that rely heavily on bulk import, it would be 
> nice if Accumulo offered additional mechanisms for controlling the scheduling 
> and execution of bulk imports, such as the ability to:
> * Pause/resume a bulk import in progress.
> * Prioritize/reprioritize bulk import requests.
> * Cancel a bulk import in progress.  If possible, cancelling a partially 
> completed bulk import request should roll back its changes.  That is, a 
> bulk import should either succeed or make no changes.
> Additionally, for multitenant situations, it would be nice if Accumulo could:
> * Provide multiple queues for bulk import requests.  Requests within each 
> queue would be processed serially in priority order.  Requests in separate 
> queues should be processed in parallel, or time should be shared among the 
> queues so that the work appears roughly parallel.
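As an illustration of the multi-queue scheduling described above, here is a minimal Java sketch of one possible policy: each queue orders its requests by priority (FIFO among equal priorities), and the scheduler rotates round-robin across queues so that work from different tenants is interleaved. Round-robin is only one way to share time among the queues; the class and method names (`BulkImportScheduler`, `submit`, `reprioritize`, `next`) are hypothetical and not part of Accumulo's API, and dispatch, execution, and persistence are left out.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of multi-queue bulk import scheduling; not Accumulo code.
// Requires Java 16+ for records.
public class BulkImportScheduler {

    // A pending request; seq breaks priority ties in submission (FIFO) order.
    public record BulkRequest(String id, String queue, int priority, long seq) {}

    // Higher priority first; among equal priorities, earlier submission first.
    private static final Comparator<BulkRequest> ORDER =
        Comparator.comparingInt(BulkRequest::priority).reversed()
            .thenComparingLong(BulkRequest::seq);

    private final Map<String, PriorityQueue<BulkRequest>> queues = new HashMap<>();
    private final Map<String, BulkRequest> pending = new HashMap<>();
    private final Deque<String> rotation = new ArrayDeque<>(); // round-robin order
    private long nextSeq = 0;

    public void submit(String id, String queue, int priority) {
        BulkRequest r = new BulkRequest(id, queue, priority, nextSeq++);
        queues.computeIfAbsent(queue, q -> {
            rotation.addLast(q); // a new queue joins the rotation once
            return new PriorityQueue<>(ORDER);
        }).add(r);
        pending.put(id, r);
    }

    // Changes the priority of a request that has not been dispatched yet.
    public boolean reprioritize(String id, int newPriority) {
        BulkRequest old = pending.get(id);
        if (old == null) {
            return false;
        }
        PriorityQueue<BulkRequest> q = queues.get(old.queue());
        q.remove(old);
        BulkRequest updated = new BulkRequest(id, old.queue(), newPriority, old.seq());
        q.add(updated);
        pending.put(id, updated);
        return true;
    }

    // Serves queues round-robin: each call moves to the next non-empty queue
    // and returns its highest-priority request, or null if all are empty.
    public BulkRequest next() {
        for (int i = 0; i < rotation.size(); i++) {
            String name = rotation.pollFirst();
            rotation.addLast(name);
            BulkRequest r = queues.get(name).poll();
            if (r != null) {
                pending.remove(r.id());
                return r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        BulkImportScheduler s = new BulkImportScheduler();
        s.submit("a1", "tenantA", 1);
        s.submit("a2", "tenantA", 5);
        s.submit("b1", "tenantB", 1);
        s.reprioritize("a1", 9); // a1 now jumps ahead of a2 within tenantA
        for (BulkRequest r = s.next(); r != null; r = s.next()) {
            System.out.println(r.queue() + ": " + r.id());
        }
    }
}
```

Running `main` prints `tenantA: a1`, `tenantB: b1`, then `tenantA: a2`: reprioritizing `a1` moves it ahead of `a2` within its queue, while the rotation alternates between tenants so neither queue starves the other.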
> ----
> Implementation-wise, I'm thinking of rewriting much of the current 
> bulk-loading logic.  While the current logic depends upon multiple threads 
> executing (potentially long-duration) blocking RPC calls, I'd like to move to 
> a more event-driven/message-passing model backed by a persistent state 
> machine.
> Current ideas I'm playing around with (very tentative):
> * Creating a new table {{accumulo.bulk_load_queues}} to keep track of bulk 
> load progress.
> * Distributing bulk load orchestration via a mechanism similar to tablet 
> assignment instead of the current blocking RPC calls (LoadFiles.java:156).
> * Implementing something akin to a two-phase commit to achieve rollback 
> behavior on failure.
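The two-phase commit and state-machine ideas can be sketched in Java as follows, assuming each participating tablet can stage files without making them visible (prepare), publish them (commit), or discard them (abort). The load advances in small `step()` calls rather than one long blocking RPC, which is what allows pause, resume, and cancel between steps; if any prepare fails, every staged participant is aborted, so the load either succeeds everywhere or changes nothing. All names (`TwoPhaseBulkLoad`, `Participant`, `State`) are hypothetical, not Accumulo classes. The ticket calls for a persistent state machine; this sketch keeps state in memory and omits retrying failed commits, which a real implementation would need.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a two-phase bulk load; not Accumulo code.
// State lives in memory here; the ticket envisions persisting it.
public class TwoPhaseBulkLoad {

    public enum State { QUEUED, PREPARING, PAUSED, COMMITTED, ABORTED }

    // A tablet (or tablet server) that receives some of the load's files.
    public interface Participant {
        boolean prepare(String loadId); // stage files without making them visible
        void commit(String loadId);     // make the staged files visible
        void abort(String loadId);      // discard the staged files
    }

    // Legal transitions of the per-load state machine.
    private static final Map<State, Set<State>> ALLOWED = Map.of(
        State.QUEUED, EnumSet.of(State.PREPARING, State.ABORTED),
        State.PREPARING, EnumSet.of(State.PAUSED, State.COMMITTED, State.ABORTED),
        State.PAUSED, EnumSet.of(State.PREPARING, State.ABORTED),
        State.COMMITTED, EnumSet.noneOf(State.class),
        State.ABORTED, EnumSet.noneOf(State.class));

    private final String loadId;
    private final List<Participant> participants;
    private final List<Participant> prepared = new ArrayList<>();
    private int nextIndex = 0;
    private State state = State.QUEUED;

    public TwoPhaseBulkLoad(String loadId, List<Participant> participants) {
        this.loadId = loadId;
        this.participants = List.copyOf(participants);
    }

    public State state() {
        return state;
    }

    private void transition(State to) {
        if (!ALLOWED.get(state).contains(to)) {
            throw new IllegalStateException(state + " -> " + to);
        }
        state = to; // a persistent version would record this durably first
    }

    // Does one small unit of work, so the load can be paused or cancelled
    // between steps instead of blocking until every tablet has finished.
    public State step() {
        if (state == State.QUEUED) {
            transition(State.PREPARING);
        }
        if (state != State.PREPARING) {
            return state; // paused or finished: nothing to do
        }
        if (nextIndex < participants.size()) {
            // Phase 1: ask the next participant to stage its files.
            Participant p = participants.get(nextIndex++);
            if (p.prepare(loadId)) {
                prepared.add(p);
            } else {
                cancel(); // a failed prepare rolls back everything staged so far
            }
        } else {
            // Phase 2: every participant prepared successfully, so commit.
            for (Participant p : prepared) {
                p.commit(loadId);
            }
            transition(State.COMMITTED);
        }
        return state;
    }

    public void pause() {
        transition(State.PAUSED);
    }

    public void resume() {
        transition(State.PREPARING);
    }

    // Aborts the load; only staged (never visible) files are discarded.
    public void cancel() {
        transition(State.ABORTED);
        for (Participant p : prepared) {
            p.abort(loadId);
        }
        prepared.clear();
    }
}
```

Because `COMMITTED` has no outgoing transitions, calling `cancel()` after the commit phase throws `IllegalStateException` instead of silently "rolling back" files that are already visible.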



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
