[jira] [Updated] (PHOENIX-5521) Phoenix-level HBase Replication sink (Endpoint coproc)

Viraj Jasani (Jira) Fri, 10 Feb 2023 12:27:15 -0800


     [ 
https://issues.apache.org/jira/browse/PHOENIX-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Viraj Jasani updated PHOENIX-5521:
----------------------------------
    Description: 
An HBase coprocessor Endpoint hook that takes in a request from a remote 
cluster (containing both the WALEdit's data and the WALKey's annotated metadata 
telling the remote cluster what tenant_id, logical tablename, and timestamp the 
data is associated with).

Ideally the API's message format should be configurable / pluggable, and could 
be either a protobuf or an Avro schema similar to the WALEdit-like one 
described by PHOENIX-5443. Endpoints in HBase are structured to work with 
protobufs, so some conversion may be necessary in an Avro-compatible version. 
Future work may also extend this to any conforming schema given by a schema 
service such as the one in PHOENIX-5443, which would be useful in allowing 
PHOENIX-5442's CDC service to be used as a backup / migration tool. 

The endpoint hook would take the metadata + data and regenerate a complete set 
of Phoenix mutations, both data and indexes, just as the Phoenix client did for 
the original SQL statement that generated the source-side edits. These 
mutations would be written to the remote cluster by the normal Phoenix write 
path. 

 

HBASE-27529 provides a regionserver coproc hook to attach WAL extended 
attributes to mutations at the replication sink. We can use this hook to 
provide an end-to-end flow for Phoenix metadata attributes (tenant id, schema 
name, logical table name, table type, etc.). The source cluster can attach the 
metadata attributes to source mutations. When "phoenix.append.metadata.to.wal" 
is enabled, the attributes are appended to the WAL as extended attributes. A 
new regionserver coproc in Phoenix can then use HBASE-27529 to let the sink 
cluster attach the WAL extended attributes to Mutations. This way, 
IndexRegionObserver and other coproc endpoints would be able to get Phoenix 
metadata attributes in both source and sink clusters.
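
The flow described above can be sketched roughly as follows. This is only an illustration with plain Java stand-ins: FakeMutation, FakeWalEntry, appendToWal, attachAtSink and the attribute names "tenant_id" and "logical_table_name" are all hypothetical and are not the HBase or Phoenix API. It only shows the intended round trip: attributes set on the source mutation travel in the WAL entry and are re-attached to the mutation at the sink.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class WalAttributeFlowSketch {

    /** Hypothetical stand-in for a mutation: it only carries named byte[] attributes. */
    static final class FakeMutation {
        final Map<String, byte[]> attributes = new LinkedHashMap<>();
        void setAttribute(String name, byte[] value) { attributes.put(name, value); }
        byte[] getAttribute(String name) { return attributes.get(name); }
    }

    /** Hypothetical stand-in for a WAL entry with extended attributes. */
    static final class FakeWalEntry {
        final Map<String, byte[]> extendedAttributes = new LinkedHashMap<>();
    }

    /** Source side: copy the mutation's metadata attributes into the WAL entry. */
    static FakeWalEntry appendToWal(FakeMutation source) {
        FakeWalEntry entry = new FakeWalEntry();
        entry.extendedAttributes.putAll(source.attributes);
        return entry;
    }

    /** Sink side: re-attach the WAL extended attributes to the replayed mutation. */
    static FakeMutation attachAtSink(FakeWalEntry entry) {
        FakeMutation replayed = new FakeMutation();
        entry.extendedAttributes.forEach(replayed::setAttribute);
        return replayed;
    }

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        FakeMutation m = new FakeMutation();
        m.setAttribute("tenant_id", bytes("tenant1"));        // hypothetical attribute name
        m.setAttribute("logical_table_name", bytes("T1"));    // hypothetical attribute name
        FakeMutation atSink = attachAtSink(appendToWal(m));
        // The sink-side mutation now carries the same metadata as the source one.
        System.out.println(new String(atSink.getAttribute("tenant_id"), StandardCharsets.UTF_8));
    }
}
```

In the real change, the sink-side step would be done by the regionserver coproc through the HBASE-27529 hook rather than by a helper like attachAtSink.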

 

The changes required to enable the replication sink coproc and allow it to 
attach Phoenix metadata as Mutation attributes at the sink cluster:
 # Add "org.apache.phoenix.coprocessor.ReplicationSinkEndpoint" to the 
hbase.coprocessor.regionserver.classes config
 # Set phoenix.append.metadata.to.wal = true
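
As a rough illustration, the two settings above could look like this once written in the usual hbase-site.xml property layout. The small Java program below only renders them; the class and method names (SinkConfigSketch, sinkProperties, render) are hypothetical, and placing both properties in a single file is an assumption. If hbase.coprocessor.regionserver.classes already lists other coprocessors, the new class would be added to that comma-separated list rather than replacing it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Renders the two settings from the list above as hbase-site.xml properties. */
class SinkConfigSketch {

    /** The two properties named in the description, in the order listed. */
    static Map<String, String> sinkProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hbase.coprocessor.regionserver.classes",
                "org.apache.phoenix.coprocessor.ReplicationSinkEndpoint");
        props.put("phoenix.append.metadata.to.wal", "true");
        return props;
    }

    /** Formats the properties in the standard Hadoop <configuration> layout. */
    static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(render(sinkProperties()));
    }
}
```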

  was:
An HBase coprocessor Endpoint hook that takes in a request from a remote 
cluster (containing both the WALEdit's data and the WALKey's annotated metadata 
telling the remote cluster what tenant_id, logical tablename, and timestamp the 
data is associated with).

Ideally the API's message format should be configurable / pluggable, and could 
be either a protobuf or an Avro schema similar to the WALEdit-like one 
described by PHOENIX-5443. Endpoints in HBase are structured to work with 
protobufs, so some conversion may be necessary in an Avro-compatible version. 
Future work may also extend this to any conforming schema given by a schema 
service such as the one in PHOENIX-5443, which would be useful in allowing 
PHOENIX-5442's CDC service to be used as a backup / migration tool. 

The endpoint hook would take the metadata + data and regenerate a complete set 
of Phoenix mutations, both data and indexes, just as the phoenix client did for 
the original SQL statement that generated the source-side edits. These 
mutations would be written to the remote cluster by the normal Phoenix write 
path. 

 

HBASE-27529 provides regionserver coproc hook to attach WAL extended attributes 
to mutations at replication sink. We can utilize this hook and provide 
end-to-end flow for Phoenix metadata attributes (tenant id, schema name, 
logical table name, table type etc). The source cluster can attach the metadata 
attributes to source mutations. By using "phoenix.append.metadata.to.wal", the 
attributes can be appended to WAL in the form of extended attributes. By using 
a new regionserver coproc in Phoenix, we can utilize HBASE-27529 and allow the 
sink cluster to attach the WAL extended attributes to Mutations. This way, 
IndexRegionObserver and other coproc endpoints would be able to get Phoenix 
metadata attributes in both source and sink clusters.

 

The changes required to enable replication sink coproc, and allow it to attach 
phoenix metadata as Mutation attributes at the Sink cluster:
 # Add "org.apache.phoenix.coprocessor.ReplicationSinkEndpoint" to 
hbase.coprocessor.regionserver.classes config
 # phoenix.append.metadata.to.wal = true
 # Use "CHANGE_DETECTION_ENABLED = true" for the given table


> Phoenix-level HBase Replication sink (Endpoint coproc)
> ------------------------------------------------------
>
>                 Key: PHOENIX-5521
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5521
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Geoffrey Jacoby
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 5.2.0, 5.1.4
>
>
> An HBase coprocessor Endpoint hook that takes in a request from a remote 
> cluster (containing both the WALEdit's data and the WALKey's annotated 
> metadata telling the remote cluster what tenant_id, logical tablename, and 
> timestamp the data is associated with).
> Ideally the API's message format should be configurable / pluggable, and 
> could be either a protobuf or an Avro schema similar to the WALEdit-like one 
> described by PHOENIX-5443. Endpoints in HBase are structured to work with 
> protobufs, so some conversion may be necessary in an Avro-compatible version. 
> Future work may also extend this to any conforming schema given by a schema 
> service such as the one in PHOENIX-5443, which would be useful in allowing 
> PHOENIX-5442's CDC service to be used as a backup / migration tool. 
> The endpoint hook would take the metadata + data and regenerate a complete 
> set of Phoenix mutations, both data and indexes, just as the Phoenix client 
> did for the original SQL statement that generated the source-side edits. 
> These mutations would be written to the remote cluster by the normal Phoenix 
> write path. 
>  
> HBASE-27529 provides a regionserver coproc hook to attach WAL extended 
> attributes to mutations at the replication sink. We can use this hook to 
> provide an end-to-end flow for Phoenix metadata attributes (tenant id, schema 
> name, logical table name, table type, etc.). The source cluster can attach 
> the metadata attributes to source mutations. When 
> "phoenix.append.metadata.to.wal" is enabled, the attributes are appended to 
> the WAL as extended attributes. A new regionserver coproc in Phoenix can then 
> use HBASE-27529 to let the sink cluster attach the WAL extended attributes to 
> Mutations. This way, IndexRegionObserver and other coproc endpoints would be 
> able to get Phoenix metadata attributes in both source and sink clusters.
>  
> The changes required to enable the replication sink coproc and allow it to 
> attach Phoenix metadata as Mutation attributes at the sink cluster:
>  # Add "org.apache.phoenix.coprocessor.ReplicationSinkEndpoint" to the 
> hbase.coprocessor.regionserver.classes config
>  # Set phoenix.append.metadata.to.wal = true



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
