[jira] [Commented] (PHOENIX-1590) Add an Asynchronous/Deferred Delete Option

James Taylor (JIRA) Sat, 24 Jan 2015 11:44:14 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290794#comment-14290794
 ]


James Taylor commented on PHOENIX-1590:
---------------------------------------

Thinking about this a bit more, I think I like your original idea better, 
[~jfernando_sfdc]: supporting a deferred delete of the data on a DROP VIEW 
command (if the view is updatable). Supporting it in general on DELETE makes 
DELETE behave in an eventually consistent way and is very non-standard. Though 
still non-standard on the DROP statement, it matters less there, as DDL 
commands frequently have vendor-specific options.

Not sure of the best syntax, maybe one of these?
{code}
DROP VIEW foo DEFERRED DELETE
DROP VIEW foo DELETE ALL
DROP VIEW foo INCLUDE DATA
{code}

Not sure how best to handle corner cases. In particular, what happens if you 
create another VIEW with the same or overlapping WHERE clause before the data 
has actually been deleted? If the VIEW is tenant-specific and we, by 
convention, name the VIEW the same as the TABLE, then attempts to CREATE a new 
VIEW would fail until the data is actually deleted. That might be ok for our 
usage, but there's a lot of ifs there. :-)

Also, implementation-wise, not sure how best to track this. Maybe one way would 
be to mark the status of the view and its indexes with a new value of 
DEFERRED_DELETE. Then, at compaction time, we'd query the SYSTEM.CATALOG table 
to see if the table being compacted has views in a DEFERRED_DELETE state, 
collect up the view WHERE clauses, and generate a filter that could be used to 
evaluate if the row is included. We'd take advantage of HBASE-12859 to know 
when we could remove the VIEW from the SYSTEM.CATALOG.
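
To make the compaction-time idea concrete, here is a rough sketch in plain 
Java. It is only a model under stated assumptions: it uses no Phoenix or HBase 
APIs, and the names ViewEntry, ViewState, buildPurgeFilter, and compact are 
hypothetical. A view's WHERE clause is represented as a predicate over a row, 
and the catalog lookup is a simple in-memory list standing in for a 
SYSTEM.CATALOG query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model only: no Phoenix or HBase classes are used.
class DeferredDeleteSketch {

    // Hypothetical view states; DEFERRED_DELETE is the new value proposed above.
    enum ViewState { ACTIVE, DEFERRED_DELETE }

    // Hypothetical stand-in for a view row read from SYSTEM.CATALOG.
    static final class ViewEntry {
        final String baseTable;
        final ViewState state;
        final Predicate<Map<String, String>> where; // models the view's WHERE clause

        ViewEntry(String baseTable, ViewState state, Predicate<Map<String, String>> where) {
            this.baseTable = baseTable;
            this.state = state;
            this.where = where;
        }
    }

    // OR together the WHERE clauses of all DEFERRED_DELETE views on the table.
    static Predicate<Map<String, String>> buildPurgeFilter(List<ViewEntry> catalog, String table) {
        Predicate<Map<String, String>> purge = row -> false; // matches nothing by default
        for (ViewEntry view : catalog) {
            if (view.baseTable.equals(table) && view.state == ViewState.DEFERRED_DELETE) {
                purge = purge.or(view.where);
            }
        }
        return purge;
    }

    // Model of a compaction pass: rows matching the purge filter are not rewritten.
    static List<Map<String, String>> compact(List<Map<String, String>> rows,
                                             Predicate<Map<String, String>> purge) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (!purge.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Helper that builds a two-column row for the example.
    static Map<String, String> row(String tenantId, String type) {
        Map<String, String> r = new HashMap<>();
        r.put("TENANT_ID", tenantId);
        r.put("TYPE", type);
        return r;
    }

    public static void main(String[] args) {
        List<ViewEntry> catalog = Arrays.asList(
            new ViewEntry("EVENTS", ViewState.DEFERRED_DELETE, r -> "t1".equals(r.get("TENANT_ID"))),
            new ViewEntry("EVENTS", ViewState.ACTIVE, r -> "t2".equals(r.get("TENANT_ID"))));
        List<Map<String, String>> rows =
            Arrays.asList(row("t1", "a"), row("t2", "a"), row("t1", "b"));

        List<Map<String, String>> kept = compact(rows, buildPurgeFilter(catalog, "EVENTS"));
        System.out.println(kept.size() + " row(s) kept");
    }
}
```

In this model the two rows for tenant t1 are dropped and the t2 row survives. 
The sketch also shows the corner case mentioned earlier: a new view whose WHERE 
clause overlaps a DEFERRED_DELETE view would have its rows purged as well, 
unless something prevents that view from being created in the meantime.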

Thoughts, [~lhofhansl], [~jfernando_sfdc]? 

> Add an Asynchronous/Deferred Delete Option
> ------------------------------------------
>
>                 Key: PHOENIX-1590
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1590
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Jan Fernando
>
> For use cases where we need to delete very large amounts of data from Phoenix 
> tables, running a synchronous delete can be problematic. In order to guarantee 
> that the delete completes, handle failure scenarios, and ensure it doesn't 
> put too much load on the HBase cluster and crowd out other running queries, we 
> need to build tooling around the longer-running delete operations to chunk 
> them up, provide retries in the event of failures, and have ways to throttle 
> delete load if the Region Servers get hot.
> It would be really great if Phoenix offered a way to invoke a resilient 
> delete that was processed asynchronously and had minimal load on the cluster. 
> An idea mentioned to implement this is to introduce a DEFERRED keyword to the 
> DELETE operation and for such a delete to remove the data at compaction time.
> For our use cases, ideally, we would like to set delete filters that are 
> based on the first 2 elements of the row key (a multi-tenant id and the next 
> item).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
