[ https://issues.apache.org/jira/browse/PHOENIX-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani updated PHOENIX-5521:
----------------------------------
    Description:
An HBase coprocessor Endpoint hook that takes in a request from a remote cluster (containing both the WALEdit's data and the WALKey's annotated metadata telling the remote cluster what tenant_id, logical tablename, and timestamp the data is associated with).
Ideally the API's message format should be configurable / pluggable, and could be either a protobuf or an Avro schema similar to the WALEdit-like one described by PHOENIX-5443. Endpoints in HBase are structured to work with protobufs, so some conversion may be necessary in an Avro-compatible version. Future work may also extend this to any conforming schema given by a schema service such as the one in PHOENIX-5443, which would be useful in allowing PHOENIX-5442's CDC service to be used as a backup / migration tool.
The endpoint hook would take the metadata + data and regenerate a complete set of Phoenix mutations, both data and indexes, just as the phoenix client did for the original SQL statement that generated the source-side edits. These mutations would be written to the remote cluster by the normal Phoenix write path.

HBASE-27529 provides regionserver coproc hook to attach WAL extended attributes to mutations at replication sink. We can utilize this hook and provide end-to-end flow for Phoenix metadata attributes (tenant id, schema name, logical table name, table type etc). The source cluster can attach the metadata attributes to source mutations. By using "phoenix.append.metadata.to.wal", the attributes can be appended to WAL in the form of extended attributes. By using a new regionserver coproc in Phoenix, we can utilize HBASE-27529 and allow the sink cluster to attach the WAL extended attributes to Mutations. This way, IndexRegionObserver and other coproc endpoints would be able to get Phoenix metadata attributes in both source and sink clusters.

The changes required to enable replication sink coproc, and allow it to attach phoenix metadata as Mutation attributes at the Sink cluster:
# Add "org.apache.phoenix.coprocessor.ReplicationSinkEndpoint" to hbase.coprocessor.regionserver.classes config
# phoenix.append.metadata.to.wal = true

  was:
An HBase coprocessor Endpoint hook that takes in a request from a remote cluster (containing both the WALEdit's data and the WALKey's annotated metadata telling the remote cluster what tenant_id, logical tablename, and timestamp the data is associated with).
Ideally the API's message format should be configurable / pluggable, and could be either a protobuf or an Avro schema similar to the WALEdit-like one described by PHOENIX-5443. Endpoints in HBase are structured to work with protobufs, so some conversion may be necessary in an Avro-compatible version. Future work may also extend this to any conforming schema given by a schema service such as the one in PHOENIX-5443, which would be useful in allowing PHOENIX-5442's CDC service to be used as a backup / migration tool.
The endpoint hook would take the metadata + data and regenerate a complete set of Phoenix mutations, both data and indexes, just as the phoenix client did for the original SQL statement that generated the source-side edits. These mutations would be written to the remote cluster by the normal Phoenix write path.

HBASE-27529 provides regionserver coproc hook to attach WAL extended attributes to mutations at replication sink. We can utilize this hook and provide end-to-end flow for Phoenix metadata attributes (tenant id, schema name, logical table name, table type etc). The source cluster can attach the metadata attributes to source mutations. By using "phoenix.append.metadata.to.wal", the attributes can be appended to WAL in the form of extended attributes. By using a new regionserver coproc in Phoenix, we can utilize HBASE-27529 and allow the sink cluster to attach the WAL extended attributes to Mutations.
This way, IndexRegionObserver and other coproc endpoints would be able to get Phoenix metadata attributes in both source and sink clusters.

The changes required to enable replication sink coproc, and allow it to attach phoenix metadata as Mutation attributes at the Sink cluster:
# Add "org.apache.phoenix.coprocessor.ReplicationSinkEndpoint" to hbase.coprocessor.regionserver.classes config
# phoenix.append.metadata.to.wal = true
# Use "CHANGE_DETECTION_ENABLED = true" for the given table

> Phoenix-level HBase Replication sink (Endpoint coproc)
> ------------------------------------------------------
>
>                 Key: PHOENIX-5521
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5521
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Geoffrey Jacoby
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 5.2.0, 5.1.4
>
>
> An HBase coprocessor Endpoint hook that takes in a request from a remote
> cluster (containing both the WALEdit's data and the WALKey's annotated
> metadata telling the remote cluster what tenant_id, logical tablename, and
> timestamp the data is associated with).
> Ideally the API's message format should be configurable / pluggable, and
> could be either a protobuf or an Avro schema similar to the WALEdit-like one
> described by PHOENIX-5443. Endpoints in HBase are structured to work with
> protobufs, so some conversion may be necessary in an Avro-compatible version.
> Future work may also extend this to any conforming schema given by a schema
> service such as the one in PHOENIX-5443, which would be useful in allowing
> PHOENIX-5442's CDC service to be used as a backup / migration tool.
> The endpoint hook would take the metadata + data and regenerate a complete
> set of Phoenix mutations, both data and indexes, just as the phoenix client
> did for the original SQL statement that generated the source-side edits.
> These mutations would be written to the remote cluster by the normal Phoenix
> write path.
>
> HBASE-27529 provides regionserver coproc hook to attach WAL extended
> attributes to mutations at replication sink. We can utilize this hook and
> provide end-to-end flow for Phoenix metadata attributes (tenant id, schema
> name, logical table name, table type etc). The source cluster can attach the
> metadata attributes to source mutations. By using
> "phoenix.append.metadata.to.wal", the attributes can be appended to WAL in
> the form of extended attributes. By using a new regionserver coproc in
> Phoenix, we can utilize HBASE-27529 and allow the sink cluster to attach the
> WAL extended attributes to Mutations. This way, IndexRegionObserver and other
> coproc endpoints would be able to get Phoenix metadata attributes in both
> source and sink clusters.
>
> The changes required to enable replication sink coproc, and allow it to
> attach phoenix metadata as Mutation attributes at the Sink cluster:
> # Add "org.apache.phoenix.coprocessor.ReplicationSinkEndpoint" to
> hbase.coprocessor.regionserver.classes config
> # phoenix.append.metadata.to.wal = true

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
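
For illustration, the two settings listed in the current description could be written out as hbase-site.xml entries. This is a minimal sketch, assuming the settings are applied through hbase-site.xml on the sink cluster's regionservers; the issue names the keys and values but not the file or mechanism. The class SinkConfigSketch and its helper methods are hypothetical and only render text; they do not talk to HBase or Phoenix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: renders the two settings named in the issue as
// hbase-site.xml <property> entries. Nothing here touches a real cluster.
class SinkConfigSketch {

    // The settings from the issue description, in the order listed there.
    static Map<String, String> sinkSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hbase.coprocessor.regionserver.classes",
                "org.apache.phoenix.coprocessor.ReplicationSinkEndpoint");
        settings.put("phoenix.append.metadata.to.wal", "true");
        return settings;
    }

    // Formats one key/value pair as an hbase-site.xml property element.
    static String toProperty(String name, String value) {
        return "  <property>\n"
             + "    <name>" + name + "</name>\n"
             + "    <value>" + value + "</value>\n"
             + "  </property>\n";
    }

    // Builds a <configuration> fragment from the settings map.
    // Assumption: placing these keys in hbase-site.xml is how they are applied.
    static String render(Map<String, String> settings) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            xml.append(toProperty(e.getKey(), e.getValue()));
        }
        return xml.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(render(sinkSettings()));
    }
}
```

If hbase.coprocessor.regionserver.classes already lists other coprocessors, the Phoenix class would presumably be appended to the existing value rather than replacing it.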