[jira] [Updated] (CASSANDRA-13315) Consistency is confusing for new users

2017-03-09 Thread Ryan Svihla (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Svihla updated CASSANDRA-13315:

Description: 
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.
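
The question in point 1 can be worked through with the usual replica-overlap rule: a read is guaranteed to touch at least one replica that acknowledged the latest write when the replicas read plus the replicas written exceed the replication factor. Below is a minimal sketch of that arithmetic, assuming a single datacenter and a replication factor at least as large as the fixed level requested; the function names are illustrative and not part of any driver.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (single DC assumed)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def reads_overlap_writes(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf


# Writing at ALL already overlaps with a read at ONE, so reading at TWO
# adds no further overlap guarantee.
print(reads_overlap_writes("ALL", "ONE", rf=3))   # True
print(reads_overlap_writes("ALL", "TWO", rf=3))   # True
# ONE writes and ONE reads do not overlap at RF 3.
print(reads_overlap_writes("ONE", "ONE", rf=3))   # False
# At RF 2 a QUORUM is 2 replicas, so QUORUM writes need every replica up.
print(replicas_required("QUORUM", rf=2))          # 2
```

The last line is the "RF2, QUORUM writes and ONE reads" case: the reads do overlap the writes, but only because QUORUM at RF 2 means both replicas.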

So I propose the following steps (EDIT based on Jonathan's comment):

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket to set, conditions still need to be 
required for SERIAL/LOCAL_SERIAL
2. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:
EDIT: better names based on comments.

   * EVENTUALLY = LOCAL_ONE reads and writes
   * STRONG = LOCAL_QUORUM reads and writes
   * SERIAL = LOCAL_SERIAL reads and writes (though a ton of folks don't know 
what SERIAL means, so this is why I suggested TRANSACTIONAL even if it's not as 
correct as I'd like)
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.
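
The proposed names are plain aliases for existing levels, which could be sketched roughly as below. This is only an illustration of the mapping listed above; `IntentLevel` and `resolve` are hypothetical names, not anything that exists in Cassandra or its drivers.

```python
from enum import Enum


class IntentLevel(Enum):
    """Proposed intent-based names mapped to the existing level they point at."""
    EVENTUALLY = "LOCAL_ONE"
    STRONG = "LOCAL_QUORUM"
    # Note: under the proposal this name shadows the existing global SERIAL.
    SERIAL = "LOCAL_SERIAL"


def resolve(level: str) -> str:
    """Translate an alias to its existing level; pass other names through."""
    try:
        return IntentLevel[level].value
    except KeyError:
        return level


print(resolve("STRONG"))       # LOCAL_QUORUM
print(resolve("EACH_QUORUM"))  # EACH_QUORUM (old levels stay available)
```

Passing unknown names through unchanged mirrors the idea of keeping the old levels around for advanced users.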

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to write data and be able to read it back.

  was:
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.

So I propose the following steps (EDIT based on Jonathan's comment):

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket to set, conditions still need to be 
required for SERIAL/LOCAL_SERIAL
2. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:

   * EVENTUALLY_CONSISTENT = LOCAL_ONE reads and writes
   * HIGHLY_CONSISTENT = LOCAL_QUORUM reads and writes
   * TRANSACTIONALLY_CONSISTENT = LOCAL_SERIAL reads and writes
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to w

[jira] [Updated] (CASSANDRA-13315) Consistency is confusing for new users

2017-03-09 Thread Ryan Svihla (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Svihla updated CASSANDRA-13315:

Description: 
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.

So I propose the following steps (EDIT based on Jonathan's comment):

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket to set, conditions still need to be 
required for SERIAL/LOCAL_SERIAL
2. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:

   * EVENTUALLY_CONSISTENT = LOCAL_ONE reads and writes
   * HIGHLY_CONSISTENT = LOCAL_QUORUM reads and writes
   * TRANSACTIONALLY_CONSISTENT = LOCAL_SERIAL reads and writes
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to write data and be able to read it back.

  was:
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.

So I propose the following steps (EDIT based on Jonathan's comment):

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket to set, conditional updates still need to 
be required for SERIAL/LOCAL_SERIAL
2. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:

   * EVENTUALLY_CONSISTENT = LOCAL_ONE reads and writes
   * HIGHLY_CONSISTENT = LOCAL_QUORUM reads and writes
   * TRANSACTIONALLY_CONSISTENT = LOCAL_SERIAL reads and writes
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to write data and be able to read it back.


> Consistency is confusing for new users
> 

[jira] [Updated] (CASSANDRA-13315) Consistency is confusing for new users

2017-03-09 Thread Ryan Svihla (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Svihla updated CASSANDRA-13315:

Description: 
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.

So I propose the following steps (EDIT based on Jonathan's comment):

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket to set conditional updates still need to 
be required for SERIAL/LOCAL_SERIAL
2. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:

   * EVENTUALLY_CONSISTENT = LOCAL_ONE reads and writes
   * HIGHLY_CONSISTENT = LOCAL_QUORUM reads and writes
   * TRANSACTIONALLY_CONSISTENT = LOCAL_SERIAL reads and writes
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to write data and be able to read it back.

  was:
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.

So I propose the following steps:

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket at the protocol level.
2. To enable #1 just reject writes or updates done without a condition when 
SERIAL/LOCAL_SERIAL is specified in the primary CL.
3. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:

   * EVENTUALLY_CONSISTENT = LOCAL_ONE reads and writes
   * HIGHLY_CONSISTENT = LOCAL_QUORUM reads and writes
   * TRANSACTIONALLY_CONSISTENT = LOCAL_SERIAL reads and writes
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.
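
Step 2 above (rejecting unconditional writes when SERIAL/LOCAL_SERIAL is the primary CL) might look roughly like the check below. This is a sketch under stated assumptions, not how Cassandra validates statements: `check_write` is a hypothetical helper, and the regular expression is a crude stand-in for inspecting a parsed statement.

```python
import re

# Serial levels named in the proposal.
SERIAL_LEVELS = {"SERIAL", "LOCAL_SERIAL"}

# Crude, illustrative test for a CQL condition such as IF NOT EXISTS or
# IF col = value; a real implementation would look at the parsed statement.
_HAS_CONDITION = re.compile(r"\bIF\b", re.IGNORECASE)


def check_write(cql: str, consistency: str) -> None:
    """Raise ValueError for an unconditional write at a serial level."""
    if consistency in SERIAL_LEVELS and not _HAS_CONDITION.search(cql):
        raise ValueError(
            f"{consistency} requires a conditional write (an IF clause)"
        )


# Conditional write at a serial level: accepted.
check_write("INSERT INTO users (id) VALUES (1) IF NOT EXISTS", "LOCAL_SERIAL")
# Unconditional write at a non-serial level: accepted.
check_write("INSERT INTO users (id) VALUES (1)", "LOCAL_QUORUM")
# Unconditional write at a serial level: rejected.
try:
    check_write("INSERT INTO users (id) VALUES (1)", "SERIAL")
except ValueError as err:
    print(err)
```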

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to write data and be able to read it back.


> Consistency is confusing for ne

[jira] [Updated] (CASSANDRA-13315) Consistency is confusing for new users

2017-03-09 Thread Ryan Svihla (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Svihla updated CASSANDRA-13315:

Description: 
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.

So I propose the following steps (EDIT based on Jonathan's comment):

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket to set, conditional updates still need to 
be required for SERIAL/LOCAL_SERIAL
2. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:

   * EVENTUALLY_CONSISTENT = LOCAL_ONE reads and writes
   * HIGHLY_CONSISTENT = LOCAL_QUORUM reads and writes
   * TRANSACTIONALLY_CONSISTENT = LOCAL_SERIAL reads and writes
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to write data and be able to read it back.

  was:
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.

So I propose the following steps (EDIT based on Jonathan's comment):

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket to set conditional updates still need to 
be required for SERIAL/LOCAL_SERIAL
2. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:

   * EVENTUALLY_CONSISTENT = LOCAL_ONE reads and writes
   * HIGHLY_CONSISTENT = LOCAL_QUORUM reads and writes
   * TRANSACTIONALLY_CONSISTENT = LOCAL_SERIAL reads and writes
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to write data and be able to read it back.


> Consistency is confusing for new users
> 

[jira] [Updated] (CASSANDRA-13315) Consistency is confusing for new users

2017-03-09 Thread Ryan Svihla (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Svihla updated CASSANDRA-13315:

Description: 
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.

So I propose the following steps:

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket at the protocol level.
2. To enable #1 just reject writes or updates done without a condition when 
SERIAL/LOCAL_SERIAL is specified in the primary CL.
3. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:

   * EVENTUALLY_CONSISTENT = LOCAL_ONE reads and writes
   * HIGHLY_CONSISTENT = LOCAL_QUORUM reads and writes
   * TRANSACTIONALLY_CONSISTENT = LOCAL_SERIAL reads and writes
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to write data and be able to read it back.

  was:
New users really struggle with consistency level and fall into a large number 
of tarpits trying to decide on the right one.

1. There are a LOT of consistency levels and it's up to the end user to reason 
about which combinations are valid and which one really matches their intent. 
Is there any reason why write at ALL and read at CL TWO is better than read at 
CL ONE? 
2. They require a good understanding of failure modes to do well. It's not 
uncommon for people to use CL ONE and wonder why their data is missing.
3. The serial consistency level "bucket" is confusing to even write about and 
easy to get wrong even for experienced users.

So I propose the following steps:

1. Remove the "serial consistency" level of consistency levels and just have 
all consistency levels in one bucket at the protocol level.
2. To enable #1 just reject writes or updates done without a condition when 
SERIAL/LOCAL_SERIAL is specified.
3. add 3 new consistency levels pointing to existing ones but that infer intent 
much more cleanly:

   * EVENTUALLY_CONSISTENT = LOCAL_ONE reads and writes
   * HIGHLY_CONSISTENT = LOCAL_QUORUM reads and writes
   * TRANSACTIONALLY_CONSISTENT = LOCAL_SERIAL reads and writes
for global levels of this I propose keeping the old ones around, they're rarely 
used in the field except by accident or particularly opinionated and advanced 
users.

Drivers should put the new consistency levels in a new package and docs should 
be updated to suggest their use. Likewise setting default CL should only 
provide those three settings and applying it for reads and writes at the same 
time.

CQLSH I'm gonna suggest should default to HIGHLY_CONSISTENT. New sysadmins get 
surprised by this frequently and I can think of a couple of very major 
escalations because people were confused about what the default behavior was.

The benefit of all this change is that we greatly shrink the surface area one 
has to understand when learning Cassandra, and we have far fewer bad initial 
experiences and surprises. New users will be able to wrap their brains around 
those 3 ideas more readily than they can "what happens when I have RF2, QUORUM 
writes and ONE reads". Advanced users still get access to everything, while 
new users don't have to learn all the ins and outs of distributed theory just 
to write data and be able to read it back.


> Consistenc