[ https://issues.apache.org/jira/browse/HBASE-23678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Kyle Purtell updated HBASE-23678:
----------------------------------------
    Description:
Lars designed the combination of VERSIONS, TTL, MIN_VERSIONS, and KEEP_DELETED_CELLS with a maximum of flexibility. There is a lot of nuance regarding their usage. Almost all combinations of these four settings make sense for some use cases (exceptions are MIN_VERSIONS > 0 without TTL, and KEEP_DELETED_CELLS=TTL without TTL). There should be a way to make the behavior with TTL easier to conceive when creating the schema. This could take the form of a literate builder API for ColumnDescriptor or an extension to an existing one.
Let me give you a motivating example: We may want to retain all versions for a given TTL, and then only a specific number of versions after that interval elapses. This can be achieved with VERSIONS=INT_MAX, TTL=_retention_interval_, KEEP_DELETED_CELLS=TTL, MIN_VERSIONS=_num_versions_ .
This is not intuitive though because VERSIONS has been used to specify the number of versions to retain (_num_versions_ in this example) since HBase version 0.1, so this is going to be a source of confusion - I've seen it in practice.
A literate builder API, by way of the way we design its method names, could let a user describe more or less in speaking language how they want version retention to work, and internally the builder API could set the low level schema attributes.

  was:
Lars designed the combination of VERSIONS, TTL, MIN_VERSIONS, and KEEP_DELETED_CELLS with a maximum of flexibility. There is a lot of nuance regarding their usage. Almost all combinations of these four settings make sense for some use cases (exceptions are MIN_VERSIONS > 0 without TTL, and KEEP_DELETED_CELLS=TTL without TTL). There should be a way to make the behavior with TTL easier to conceive when creating the schema. This could take the form of a literate builder API for ColumnDescriptor or an extension to an existing one.
Let me give you a motivating example: We may want to retain all versions for a given TTL, and then only a specific number of versions after that interval elapses. This can be achieved with VERSIONS=INT_MAX, TTL=_retention_interval_, KEEP_DELETED_CELLS=TTL, MIN_VERSION=_num_versions_ .
This is not intuitive though because VERSIONS has been used to specify the number of versions to retain (_num_versions_ in this example) since HBase version 0.1, so this is going to be a source of confusion - I've seen it in practice.
A literate builder API, by way of the way we design its method names, could let a user describe more or less in speaking language how they want version retention to work, and internally the builder API could set the low level schema attributes.

> Literate builder API for version management in schema
> -----------------------------------------------------
>
> Key: HBASE-23678
> URL: https://issues.apache.org/jira/browse/HBASE-23678
> Project: HBase
> Issue Type: Improvement
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> Lars designed the combination of VERSIONS, TTL, MIN_VERSIONS, and
> KEEP_DELETED_CELLS with a maximum of flexibility. There is a lot of nuance
> regarding their usage. Almost all combinations of these four settings make
> sense for some use cases (exceptions are MIN_VERSIONS > 0 without TTL, and
> KEEP_DELETED_CELLS=TTL without TTL). There should be a way to make the
> behavior with TTL easier to conceive when creating the schema. This could
> take the form of a literate builder API for ColumnDescriptor or an extension
> to an existing one.
> Let me give you a motivating example: We may want to retain all versions for
> a given TTL, and then only a specific number of versions after that interval
> elapses. This can be achieved with VERSIONS=INT_MAX,
> TTL=_retention_interval_, KEEP_DELETED_CELLS=TTL, MIN_VERSIONS=_num_versions_
> .
> This is not intuitive though because VERSIONS has been used to specify the
> number of versions to retain (_num_versions_ in this example) since HBase
> version 0.1, so this is going to be a source of confusion - I've seen it in
> practice.
> A literate builder API, by way of the way we design its method names, could
> let a user describe more or less in speaking language how they want version
> retention to work, and internally the builder API could set the low level
> schema attributes.
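
To make the motivating example concrete, here is a minimal sketch of what such a literate builder might look like. Everything in it is hypothetical: the class `RetentionPolicy` and the methods `retainAllVersionsFor`, `thenKeepNewest`, and `toAttributes` are illustrative names only and do not exist in HBase. The sketch emits the four attribute names from the description as plain strings rather than touching a real descriptor, and it assumes TTL is expressed in seconds.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only: this class and its method names are NOT part of HBase.
final class RetentionPolicy {
    // Low-level schema attributes, keyed by the names used in this issue.
    private final Map<String, String> attrs = new LinkedHashMap<>();

    private RetentionPolicy() {}

    // Reads as: "retain all versions for the given interval".
    static RetentionPolicy retainAllVersionsFor(long duration, TimeUnit unit) {
        if (duration <= 0) {
            throw new IllegalArgumentException("retention interval must be positive");
        }
        RetentionPolicy policy = new RetentionPolicy();
        // VERSIONS=INT_MAX so that the version limit never trims cells before the TTL does.
        policy.attrs.put("VERSIONS", Integer.toString(Integer.MAX_VALUE));
        // Assumption: TTL is stored in seconds.
        policy.attrs.put("TTL", Long.toString(unit.toSeconds(duration)));
        policy.attrs.put("KEEP_DELETED_CELLS", "TTL");
        return policy;
    }

    // Reads as: "after the interval elapses, keep only this many versions".
    RetentionPolicy thenKeepNewest(int numVersions) {
        if (numVersions < 1) {
            throw new IllegalArgumentException("numVersions must be at least 1");
        }
        attrs.put("MIN_VERSIONS", Integer.toString(numVersions));
        return this;
    }

    // Exposes the attributes the builder decided on, in insertion order.
    Map<String, String> toAttributes() {
        return Collections.unmodifiableMap(attrs);
    }
}
```

With this sketch, `RetentionPolicy.retainAllVersionsFor(7, TimeUnit.DAYS).thenKeepNewest(3).toAttributes()` yields VERSIONS=2147483647, TTL=604800, KEEP_DELETED_CELLS=TTL, MIN_VERSIONS=3. The caller never has to know that the number of versions they care about maps to MIN_VERSIONS rather than VERSIONS. A real implementation would presumably set these values on the column family descriptor instead of returning a map.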