[ 
https://issues.apache.org/jira/browse/PHOENIX-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236759#comment-16236759
 ] 

Geoffrey Jacoby commented on PHOENIX-4344:
------------------------------------------

Some thoughts, [~jamestaylor]

I want this to be usable for generic DELETE queries without the need for 
hand-written DBWritable subclasses.

MapReduce goes line by line, rather than by Mapper Task/Scan, so while the 
client would be issuing a broad DELETE query, the mapper itself would either be:

1. Issuing point DELETE Phoenix queries by the complete primary key derived 
from a SELECT the MapReduce is iterating over 
(Mapper<NullWritable, DBWritable, NullWritable, NullWritable>)
OR
2. Issuing DELETE mutations down to several HTables via MultiHFileOutputFormat 
from a SELECT the MapReduce is iterating over
(Mapper<NullWritable, DBWritable, ImmutableBytesWritable, Delete>)
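To make Option 1 concrete, here is a minimal sketch of how the mapper might build its point DELETE from the primary key columns. PointDeleteSql is a hypothetical helper, not an existing Phoenix class, and it assumes the table and column names are already normalized or quoted the way Phoenix expects. The values would be bound per row through a PreparedStatement.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper (not part of Phoenix): builds a point DELETE on the full primary key. */
final class PointDeleteSql {
    private PointDeleteSql() {}

    /**
     * Assumes tableName and pkColumns are already normalized/quoted as needed;
     * this sketch does no escaping of its own.
     */
    static String build(String tableName, List<String> pkColumns) {
        if (pkColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one primary key column is required");
        }
        StringJoiner where = new StringJoiner(" AND ");
        for (String column : pkColumns) {
            // One bind parameter per key column, in primary key order.
            where.add(column + " = ?");
        }
        return "DELETE FROM " + tableName + " WHERE " + where;
    }
}
```

For a table EVENTS keyed on (TENANT_ID, EVENT_ID), this yields "DELETE FROM EVENTS WHERE TENANT_ID = ? AND EVENT_ID = ?", which the mapper could prepare once and re-execute for every row.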

FormatToBytesWritableMapper relies heavily on a LineParser interface, and the 
only choices appear to be CsvLineParser, JsonLineParser, and RegexLineParser. 
That means that in either case the complete row key would have to be built by a 
new ResultSetLineParser that can take in a ResultSet and parse it into an 
intermediate form suitable for making either DELETE DML (Option 1) or Delete 
Mutations (Option 2). The former would just need to grab the row key 
components, while the latter would potentially need everything, because an 
index can be on any column. 
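A rough sketch of what such a parser could look like, using only java.sql. ResultSetRowParser and its method names are hypothetical; nothing like it exists in Phoenix today, and it does not implement the real LineParser interface. It assumes the SELECT returns the primary key columns under the labels passed to the constructor.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a ResultSet-based parser; not an existing Phoenix class. */
final class ResultSetRowParser {
    private final List<String> pkColumns;

    ResultSetRowParser(List<String> pkColumns) {
        this.pkColumns = pkColumns;
    }

    /** Reads every column of the current row, in SELECT order (what Option 2 may need). */
    Map<String, Object> parseAll(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            row.put(meta.getColumnLabel(i), rs.getObject(i));
        }
        return row;
    }

    /** Keeps only the primary key columns, in key order (all that Option 1 needs). */
    Map<String, Object> parseKey(ResultSet rs) throws SQLException {
        Map<String, Object> all = parseAll(rs);
        Map<String, Object> key = new LinkedHashMap<>();
        for (String column : pkColumns) {
            if (!all.containsKey(column)) {
                throw new SQLException("SELECT is missing primary key column " + column);
            }
            key.put(column, all.get(column));
        }
        return key;
    }
}
```

The map returned by parseKey lines up with the bind parameters of a point DELETE, while parseAll keeps every selected column available for computing index row deletes.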

Also, either way, we need a concrete, general-purpose implementation of the 
DBWritable interface. 

Option 1 seems considerably simpler and higher-level, while Option 2 seems more 
efficient.

> MapReduce Delete Support
> ------------------------
>
>                 Key: PHOENIX-4344
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4344
>             Project: Phoenix
>          Issue Type: New Feature
>    Affects Versions: 4.12.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Major
>
> Phoenix already has the ability to use MapReduce for asynchronous handling of 
> long-running SELECTs. It would be really useful to have this capability for 
> long-running DELETEs, particularly of tables with indexes where using HBase's 
> own MapReduce integration would be prohibitively complicated. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
