[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-04-06 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959033#comment-15959033
 ] 

Benjamin Roth commented on CASSANDRA-13065:
---

Looks good. I like putting the requiresViewBuild property into the 
StreamOperation!

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-04-06 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959002#comment-15959002
 ] 

Paulo Motta commented on CASSANDRA-13065:
-

Thanks, when I was about to commit I noticed that this will change the behavior 
of bulk loading (sstableloader) of base tables: before we rebuilt the views 
when base tables were bulk loaded but right now the user will either need to 
manually load the view sstables or trigger a manual view rebuild.

While we can provide this as a bulk loader option in the future, we should keep 
the behavior for bulk loading since this potentially inserts new data into the 
cluster, so I did a minor change to the patch to also not skip the write path 
for bulk loading and others (since the behavior is undefined in this case) - 
[~brstgt] can you double check this minor change, the core of the change is 
[this 
line|https://github.com/pauloricardomg/cassandra/commit/1344f9ba7b620d31c07487973c6545514ef40476#diff-1a464da4a62ac4a734c725059cbc918bR161].
 I also updated the ticket description to {{Skip building views during base 
table streams on range movements}} to better reflect what was done here.

This is nearly there, hopefully the final round of CI:
||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13064]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13064-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13064-dtest/lastCompletedBuild/testReport/]|


> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-04-03 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954589#comment-15954589
 ] 

Benjamin Roth commented on CASSANDRA-13065:
---

[~pauloricardomg]

There you go. My first complete patch. I hope its ok and works :)
https://github.com/Jaumo/cassandra/commit/88699700feb6b9a504df88ff063b82641d7939f7

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-03-30 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949693#comment-15949693
 ] 

Paulo Motta commented on CASSANDRA-13065:
-

Thanks [~brstgt], this looks mostly good, except the following nits:
- Add header to {{StreamOperation}} test
- CDC should always go through write path on streaming, not only repair

After this can you squash and format your patch according to these 
[guidelines|https://cassandra.apache.org/doc/latest/development/patches.html#creating-a-patch]
 (including CHANGES.txt entry).

I submitted CI tests for the current version of the patch:
||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13065]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13065-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-13065-dtest/lastCompletedBuild/testReport/]|



> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-03-28 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945381#comment-15945381
 ] 

Benjamin Roth commented on CASSANDRA-13065:
---

[~pauloricardomg]

Gnaaa. Stupid search+replace errors.
Made changes according to your feedback: 
https://github.com/Jaumo/cassandra/commit/7ba773e901bcdb3bf830417ebd07ad4786a5b179
CDC uses write path again. Hope this time everything's ok.

Thanks for your patience :)

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-03-27 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943405#comment-15943405
 ] 

Paulo Motta commented on CASSANDRA-13065:
-

Thanks for the ping, I didn't get the JIRA notification for some reason. :(

bq. According to the feedback of Joshua McKenzie, I didn't add anything for 
CDC. Hope this is ok.

Right now the commit-log segment discarding is tied to memtable flushing, so 
I'm not sure how bypassing the memtable will affect commitlog segment 
discarding for mutations originated from streaming so it would be nice to have 
a test for that. Furthermore, right now there's no test for CDC behavior on 
streaming so this is a good opportunity to add a dtest verifying that behavior 
(leaving the house cleaner than you found it).

Since this may be considerable effort, I'm fine if we split the cdc component 
of this ticket into another ticket.

bq. So now you have MV data on C that belonged to a base table on A which is 
very likely to have an inconsistent relation not to it's base table data on D.

That's right, but after inconsistent range movements you are expected to run 
repair in both the base table and the view, which should make the rebuilt view 
consistent with the base again. So, in this scenario you would repair the bases 
A and D first, making them consistent, and then repair the views B and C, thus 
making base D consistent with view C.

I looked the updated patch and looks good, except the following nits:
* reuse {{requiresWritepath}} variable on before {{sendThroughWritePath}}
* no need to have a special {{UNIT_TEST}} {{StreamOperation}}, can just reuse 
one of the existing types
* I think you renamed mbean names ({{StorageProxy}}, {{StorageService}}, 
{{StreamManager}}) from {{type}} to {{streamOperation}} by mistake
* There are still a few places using {{streamOperation.toString()}} instead of 
{{streamOperation.getDescription()}}

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-03-26 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942722#comment-15942722
 ] 

Benjamin Roth commented on CASSANDRA-13065:
---

[~pauloricardomg] Did you notice my update?

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-03-20 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932912#comment-15932912
 ] 

Joshua McKenzie commented on CASSANDRA-13065:
-

It looks like we can always just use {{appendToCommitLog}} for CDC-enabled cfs 
+ the standard streaming code rather than using the write path the way MV's do. 
CDC has no requirement for the data to make it back into a memtable or to go 
through the entire write path; what we really need is to ensure that data makes 
it to the CL and the segments get flagged as containing CDC-enabled cf's which 
this code should do.

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-03-19 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932012#comment-15932012
 ] 

Paulo Motta commented on CASSANDRA-13065:
-

Thanks for your patch!  Overall the patch and approach looks good. Since this 
changes the behavior of streaming for both MVs and CDC, which have different 
implications, I will split review into those 2 parts.

MVs
- I like the extraction of stream descriptions into an enumeration, can you 
just add an {{OTHER}} type for backward compatibility in case another tool or 
system uses another stream description to avoid having  a null type. This can 
be probably be used on tests instead of creating custom {{TEST_TRANSFER}} or 
{{TEST_LEGACY_STREAMING}}.
- Can we also perhaps make the enum name a bit more descriptive, such as 
{{StreamingOperation}} or {{StreamingPurpose}}?
- It would be nice if you could also add a simple unit test to make sure 
descriptions are being converted correctly to {{StreamType}}.
- I don't like tying logic to "toString()" since that can change, can you add a 
{{getDescription}} method to {{StreamType}} and use that instead?
- I think we only need to run mutations through the write path on repair, since 
this is done to make base table consistent with the view which is the objective 
of repair, but not general streaming.
- Minor typo: {{sendThroughWritePatch}} -> {{sendThroughWritePath}}

CDC
- Add unit test showing that commit log segments are properly discarded when 
mutations do not go to memtable.
- It seems there are no dtests to verify CDC behavior on streaming (repair, 
bootstrap, etc), so I'd like to see some dtests showing that CDC behavior with 
streaming will be maintained with this. In order not to block the fix for MVs, 
if you want we can split the CDC part of this into another ticket.

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-03-02 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892040#comment-15892040
 ] 

Benjamin Roth commented on CASSANDRA-13065:
---

[~pauloricardomg] Please also look at follow-up commits on review. I added 2 
more commits (13064+13065) with tiny fixes. Depending on your feedback, I can 
rearrange them if required.

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-02-28 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889675#comment-15889675
 ] 

Benjamin Roth commented on CASSANDRA-13065:
---

[~pauloricardomg] This is the follow-up to CASSANDRA-13064.

I also optimized behaviour for CDC if no writepath is required due to MVs. This 
will allow incremental repairs for CFs with CDC without MVs.

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2016-12-21 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766478#comment-15766478
 ] 

Benjamin Roth commented on CASSANDRA-13065:
---

I marked it as critical as it severly affects cluster maintainability when 
using MVs. Maybe it's also worth to be considered as a bug.

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Priority: Critical
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2016-12-21 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766474#comment-15766474
 ] 

Benjamin Roth commented on CASSANDRA-13065:
---

Required for decommissioning

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>
> Booting or decommisioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> The makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)