Re: [DISCUSS] CEP-20: Dynamic Data Masking
On Fri, 19 Aug 2022 at 17:30, Jeff Jirsa wrote:
> This type of feature is very useful, but it may be easier to analyze this
> proposal if it’s compared with other DDM implementations from other
> databases? Would it be reasonable to add a table to the proposal comparing
> syntax and output from eg Azure SQL vs Cassandra vs whatever ?

Good idea. I have added a section at the end of the document briefly describing how some other databases deal with data masking, with links to their documentation on the topic. I am not an expert in any of those databases, so please take my comments there with a grain of salt.
Re: [DISCUSS] CEP-20: Dynamic Data Masking
This type of feature is very useful, but it may be easier to analyze this proposal if it’s compared with other DDM implementations from other databases. Would it be reasonable to add a table to the proposal comparing syntax and output from, e.g., Azure SQL vs Cassandra vs whatever?

On Aug 19, 2022, at 4:50 AM, Andrés de la Peña wrote:
> Hi everyone,
>
> I'd like to start a discussion about this proposal for dynamic data masking:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking
>
> Dynamic data masking allows to obscure sensitive information without changing
> the stored data. It would be based on a set of native CQL functions providing
> different types of masking, such as replacing the column value by "".
> These functions could be used as regular functions or attached to table
> columns with CREATE/ALTER table. There would be a new UNMASK permission, so
> only the users with this permissions would be able to see the unmasked column
> values. It would be possible to customize masking by using UDFs as masking
> functions.
>
> Thanks,
Re: [DISCUSS] CEP-20: Dynamic Data Masking
Sounds interesting. I would like to understand a couple of things here.

If the column names are the same for masked and unmasked data, it would impact existing applications. I am curious what the transition plan looks like for applications that expect unmasked data.

For example, let’s say you store SSNs and birth dates. Upon enabling this feature, let’s say the app user is not given the UNMASK permission. Now the app is receiving masked values for these columns. This is fine for most read-only applications. However, these columns are often used as primary keys, or as part of primary keys, in other tables. This would break existing applications.

How would this work in mixed mode, when some nodes in the cluster are masking data and others aren’t? How would it impact the driver?

How would the application learn that the column values are masked? This is important in case a user has the UNMASK permission and it is later taken away. Again, this would break a lot of applications.

Dinesh

On Aug 19, 2022, at 4:50 AM, Andrés de la Peña wrote:
> Hi everyone,
>
> I'd like to start a discussion about this proposal for dynamic data masking:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking
>
> Dynamic data masking allows to obscure sensitive information without changing
> the stored data. It would be based on a set of native CQL functions providing
> different types of masking, such as replacing the column value by "".
> These functions could be used as regular functions or attached to table
> columns with CREATE/ALTER table. There would be a new UNMASK permission, so
> only the users with this permissions would be able to see the unmasked column
> values. It would be possible to customize masking by using UDFs as masking
> functions.
>
> Thanks,
Re: Is this an MV bug?
You mean entirely distinct CQL statements issued by the same client “concurrently”?

If they’re submitted to the same coordinator then M2 will have a higher timestamp than M1, so if M2 applies first then M1 will be a no-op and should not generate any view update.

If submitted to different coordinators with server-issued timestamps then, unless timestamps clash, one of them will win, but it may not be M2.

On 19 Aug 2022, at 11:14, Claude Warren, Jr via dev wrote:
> Perhaps my diagram was not clear. I am starting with mutations on the base
> table. I assume they are not bundled together so from separate CQL
> statements.
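To make the timestamp argument above concrete, here is a minimal sketch of last-write-wins reconciliation. It is a toy model, not Cassandra's code: the `apply` helper and the integer timestamps are hypothetical, and it assumes each mutation writes both D and E, which is one possible reading of the diagram.

```python
def apply(row, mutation):
    """Merge a mutation into a row cell by cell; the highest timestamp wins.

    Returns the set of columns whose visible value changed. An empty set
    means the mutation was a no-op, so no view update would be needed.
    """
    changed = set()
    for col, (value, ts) in mutation.items():
        old_value, old_ts = row.get(col, (None, -1))
        if ts > old_ts:
            row[col] = (value, ts)
            if value != old_value:
                changed.add(col)
    return changed


# Base row [ a b c ] d e, written at (hypothetical) timestamp 0.
base = {"D": ("d", 0), "E": ("e", 0)}

# Assumption: same coordinator, so M2 carries a higher timestamp than M1.
M1 = {"D": ("d", 1), "E": ("e'", 1)}
M2 = {"D": ("d'", 2), "E": ("e", 2)}

changed_by_m2 = apply(base, M2)  # M2 wins the lock and applies first
changed_by_m1 = apply(base, M1)  # M1 arrives late with an older timestamp

print(changed_by_m2)
print(changed_by_m1)
```

In this model the late M1 changes nothing, so the base row stays at d' e and there is no second view row to create. If M1 only wrote E, the outcome would differ, so the result depends on the assumption stated above.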
[DISCUSS] CEP-20: Dynamic Data Masking
Hi everyone,

I'd like to start a discussion about this proposal for dynamic data masking:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking

Dynamic data masking allows obscuring sensitive information without changing the stored data. It would be based on a set of native CQL functions providing different types of masking, such as replacing the column value by "". These functions could be used as regular functions or attached to table columns with CREATE/ALTER TABLE. There would be a new UNMASK permission, so only users with this permission would be able to see the unmasked column values. It would be possible to customize masking by using UDFs as masking functions.

Thanks,
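As a rough illustration of the idea in this message (not of the CQL syntax, which is defined in the CEP), the sketch below models masking at read time. Everything in it is hypothetical: the `hide_all` and `hide_prefix` functions, the `Table` class, and the way permissions are passed are stand-ins chosen for the example.

```python
from dataclasses import dataclass, field


def hide_all(value, replacement="****"):
    """Hypothetical masking function: replace the whole value."""
    return replacement


def hide_prefix(value, keep_end, pad="*"):
    """Hypothetical masking function: keep only the last keep_end characters."""
    if keep_end >= len(value):
        return value
    hidden = len(value) - keep_end
    return pad * hidden + value[hidden:]


@dataclass
class Table:
    rows: list
    # Column name -> masking function attached to that column.
    masks: dict = field(default_factory=dict)

    def select(self, permissions):
        """Return copies of the rows, masked unless the caller holds UNMASK."""
        if "UNMASK" in permissions:
            return [dict(r) for r in self.rows]
        return [
            {col: self.masks[col](val) if col in self.masks else val
             for col, val in r.items()}
            for r in self.rows
        ]


patients = Table(
    rows=[{"id": 1, "name": "Alice", "ssn": "123-45-6789"}],
    masks={"ssn": lambda v: hide_prefix(v, 4)},
)

print(patients.select({"SELECT"}))            # ssn is masked
print(patients.select({"SELECT", "UNMASK"}))  # ssn is clear
print(patients.rows[0]["ssn"])                # stored data is unchanged
```

The point of the sketch is that masking happens on the read path only: the stored value is never rewritten, and the same query returns different output depending on whether the caller holds the UNMASK permission.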
Re: Is this an MV bug?
Perhaps my diagram was not clear. I am starting with mutations on the base table. I assume they are not bundled together, so they come from separate CQL statements.

On Fri, Aug 19, 2022 at 11:11 AM Claude Warren, Jr wrote:
> If each mutation comes from a separate CQL they would be separate, no?
Re: Is this an MV bug?
If each mutation comes from a separate CQL statement they would be separate, no?

On Fri, Aug 19, 2022 at 10:17 AM Benedict wrote:
> If M1 and M2 both operate over the same partition key they won’t be
> separate mutations, they should be combined into a single mutation before
> submission to SP.mutate
Re: Is this an MV bug?
If M1 and M2 both operate over the same partition key they won’t be separate mutations; they should be combined into a single mutation before submission to SP.mutate.

On 19 Aug 2022, at 10:05, Claude Warren, Jr via dev wrote:
> # Table definitions
>
> Table   [ Primary key ]   other data
> base    [ A B C ]         D E
> MV      [ D C ]           A B E
>
>
> # Initial data
> base               ->  MV
> [ a  b c ] d e     ->  [ d c ] a  b e
> [ a' b c ] d e     ->  [ d c ] a' b e
>
>
> ## Mutations -> expected outcome
>
> M1: base [ a b c ] d  e'  ->  MV [ d  c ] a b e'
> M2: base [ a b c ] d' e   ->  MV [ d' c ] a b e
>
> ## processing bug
> Assume lock can not be obtained during processing of M1.
>
> The mutation M1 sleeps to wait for lock. (Trunk Keyspace.java : 601 )
>
> Assume M2 obtains the lock and executes.
>
> MV is now
> [ d' c ] a b e
>
> M1 then obtains the lock and executes
>
> MV is now
> [ d  c ] a b e'
> [ d' c ] a b e
>
> base is
> [ a b c ] d e'
>
> MV entry "[ d' c ] a b e" is orphaned
Is this an MV bug?
# Table definitions

Table   [ Primary key ]   other data
base    [ A B C ]         D E
MV      [ D C ]           A B E


# Initial data
base               ->  MV
[ a  b c ] d e     ->  [ d c ] a  b e
[ a' b c ] d e     ->  [ d c ] a' b e


## Mutations -> expected outcome

M1: base [ a b c ] d  e'  ->  MV [ d  c ] a b e'
M2: base [ a b c ] d' e   ->  MV [ d' c ] a b e

## processing bug
Assume lock can not be obtained during processing of M1.

The mutation M1 sleeps to wait for lock. (Trunk Keyspace.java : 601 )

Assume M2 obtains the lock and executes.

MV is now
[ d' c ] a b e

M1 then obtains the lock and executes

MV is now
[ d  c ] a b e'
[ d' c ] a b e

base is
[ a b c ] d e'

MV entry "[ d' c ] a b e" is orphaned
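The scenario above can be reproduced in a toy model. This is only a sketch under strong assumptions and says nothing about what Keyspace.java actually does: it assumes writes are blind overwrites with no timestamps, and that M1 derives its view update from the base row it read before it went to sleep. The `view_ops` and `run` helpers are hypothetical, and the second base row ( a' b c ) is left out for brevity.

```python
def view_ops(key, before, after):
    """Derive view operations from a base row's before and after images."""
    a, b, c = key
    ops = []
    if before["D"] != after["D"]:
        # The view key changed, so the old view row has to go.
        ops.append(("delete", (before["D"], c), None))
    ops.append(("put", (after["D"], c), {"A": a, "B": b, "E": after["E"]}))
    return ops


def run(stale_pre_image):
    key = ("a", "b", "c")
    base = {key: {"D": "d", "E": "e"}}
    view = {("d", "c"): {"A": "a", "B": "b", "E": "e"}}

    def execute(after, before):
        for op, view_key, row in view_ops(key, before, after):
            if op == "delete":
                view.pop(view_key, None)
            else:
                view[view_key] = row
        base[key] = after  # assumption: blind overwrite, no timestamps

    m1 = {"D": "d", "E": "e'"}
    m2 = {"D": "d'", "E": "e"}

    m1_before = dict(base[key])    # M1 reads the row, then waits for the lock
    execute(m2, dict(base[key]))   # M2 obtains the lock and executes
    # M1 obtains the lock; it uses either its stale read or a fresh one.
    execute(m1, m1_before if stale_pre_image else dict(base[key]))
    return base[key], view


base_row, view = run(stale_pre_image=True)
print(base_row, sorted(view))

base_row_fresh, view_fresh = run(stale_pre_image=False)
print(base_row_fresh, sorted(view_fresh))
```

With the stale read the model ends with both [ d c ] and [ d' c ] in the view while the base row holds d, which matches the orphan described above. When M1 reads the row after obtaining the lock, it deletes [ d' c ] itself and no orphan remains, so in this model the outcome hinges on when the before image is taken.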
Re: [Proposal] add pull request template
Since there seems to be agreement, I opened a ticket (CASSANDRA-17837) and a pull request (https://github.com/apache/cassandra/pull/1799) so that the final text can be hashed out and accepted. I also used the proposed pull request template as the text of the pull request so that it can be seen in all its glory.

On Thu, Aug 18, 2022 at 9:10 PM Josh McKenzie wrote:
> I have never seen this
> kind of git merging strategy elsewhere, I am not sure if I am not
> experienced enough or we are truly unique the way we do things.
>
> I am very fond of this project and this community. THAT SAID ;) you could
> replace "kind of git merging strategy" with a lot of different things and
> have it equally apply on this project.
>
> Perils of being a mature long-lived project I suspect. I'm all for us
> doing the hard work of introspecting on how we do things and changing them
> to improve or match industry standards where applicable.
>
> On Thu, Aug 18, 2022, at 3:33 PM, Stefan Miklosovic wrote:
>
> Interesting, thanks for explicitly writing that down. I humbly think
> the CI and the convenience of the GitHub workflow is ultimately
> secondary when it comes to the code-base as such. Indeed, nice to
> have, but if it turns out to be uncomfortable in other ways, I guess
> we just have to live with what we have. TBH I have never seen this
> kind of git merging strategy elsewhere, I am not sure if I am not
> experienced enough or we are truly unique the way we do things.
> However, it does make sense.
>
> On Thu, 18 Aug 2022 at 21:28, Benedict wrote:
> >
> > The benefits being extolled involve people setting up GitHub bots to
> > integrate with PRs to run CI etc, which will require some non-trivial
> > investment by somebody to put together
> >
> > The alternative merge strategy being discussed is not to merge, but to
> > instead cherry-pick or rebase. This means we can produce separate PRs for
> > each branch, that can be merged independently via the GitHub API. The
> > downside of this is that there are no merge commits, while one upside of
> > this is that there are no merge commits.
> >
> > On 18 Aug 2022, at 20:20, Stefan Miklosovic
> > <stefan.mikloso...@instaclustr.com> wrote:
> >
> > No chicken-egg to me. All it takes is ctrl+c & ctrl+v on your merging
> > commits. How would new merging strategy actually look like? I am all
> > ears. This seems to be quite nice as is if we stick to be more verbose
> > what we did.
> >
> > On Thu, 18 Aug 2022 at 20:27, Benedict wrote:
> > >
> > > Was it?
> > >
> > > I mean, we’ve all (or most) I think worked on projects with those
> > > things, so we all know what the benefits are?
> > >
> > > It’s fair to point out that we don’t have it even running for any
> > > branch yet. However there’s perhaps a chicken-and-egg situation, where
> > > I’m unsure the investment to develop can be justified by those who are
> > > able, if there’s a chance it will be discarded? I can’t see us
> > > maintaining a bifurcated process, where some patches go through
> > > automation and others don’t, so if we don’t change the merge strategy
> > > that work would presumably end up wasted.
> > >
> > > On 18 Aug 2022, at 18:53, Mick Semb Wever wrote:
> > >
> > > That debatable benefit aside, not doing merge commits would also open
> > > up options for us to use PR's for merges and integrate running CI, and
> > > blocking on clean CI, pre-merge. Which has some other pretty big
> > > benefits. :)
> > >
> > > The past agreement IIRC was to start doing those things on trunk-only
> > > so we can evaluate them for real.