[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2017-06-14 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049505#comment-16049505
 ] 

Simon Zhou commented on CASSANDRA-10862:


[~chenshen] Are you still working on this?

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>Assignee: Chen Shen
>  Labels: lcs
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-06-17 Thread Chen Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336712#comment-15336712
 ] 

Chen Shen commented on CASSANDRA-10862:
---

Thank you [~pauloricardomg]!, I will address your comments with a new patch 
soon.

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>Assignee: Chen Shen
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-06-16 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334128#comment-15334128
 ] 

Paulo Motta commented on CASSANDRA-10862:
-

Thanks for the patch and sorry for the delay [~scv...@gmail.com]. Overall I 
like your approach, because it mitigates the impact without so many changes. 
See some suggestions for improvement below:
* I'm a bit uncomfortable with the unbounded busy wait so we should probably 
add a time bound to the loop in order to avoid hanging indefinitely if there is 
a problem with compactions catching up
* While the synchronization block on {{CFS}} would guarantee only one producer 
would add new sstables at a time, I think this might be a premature 
optimization that could be a source of problems later, so I'd prefer to take a 
best effort approach initially since I think that we're trying to protect from 
an abysmal number of sstables, and we don't allow concurrent repairs on the 
same tables anyway, so there shouldn't me many concurrent 
{{OnCompletionRunnable}} running for the same {{CFS}}
* With that said, I think we could have something like 
{{CompactionManager.waitForL0Leveling(ColumnFamilyStore cfs, int 
maxSStableCount, long maxWaitTime}}), similar to {{waitForCessation}} method 
but waits for L0 leveling instead and without taking a {{Callable}} as argument
* I think waiting for leveling on validation will probably cause overstreaming 
during repair, since different replicas will flush on different times, causing 
digest mismatches, so we should probably avoid that
* {{compaction_max_l0_sstable_count}} is quite an advanced lever so I don't 
think it should be exposed as a {{cassandra.yaml}} attribute, but instead as a 
system property (similar to {{cassandra.disable_stcs_in_l0}}). We could also 
maybe add this as a dynamic JMX attribute to facilitate tuning.
* We could also probably add another property for the {{max_wait_time}} for the 
L0 leveling, and maybe even provide a conservative default to both properties 
on trunk that could already bring some benefits to the average user and still 
allow more advanced users to tune it according to usage, something like: 
{{streaming.max_L0_count=1000}} and {{streaming.max_L0_wait_time=1min}}. I'm 
not really sure about these values, so it would be nice if you have any 
suggestions based on your tests so far.

Anything else to add here or any caveat I might be missing [~krummas]?

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>Assignee: Chen Shen
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-06-08 Thread Chen Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321703#comment-15321703
 ] 

Chen Shen commented on CASSANDRA-10862:
---

[~pauloricardomg]
I've done some investigation and I find it might not so easy to schedule a 
compaction on L0 table on reception as the only straightforward way to trigger 
a compaction is by submitting a task to CompactionManager.submitBackground, and 
1) it's not guaranteed to be executed according to my knowledge 2) 
submitBackground need a `ColumnFamilyStore` as input, so we need either create 
a new CFS, or split the compaction strategy out of CompactionManager, each of 
which might need lots of work.
So instead I am doing a different tricky approach: Don't add tables to CFS 
until the number of L0 sstables is smaller than a threshold. And subscribe to 
`SSTableListChangedNotification` so that the `OnCompletionRunnable` could sleep 
and wait on notification. 
Is this a right direction? I have a commit here 
https://github.com/scv119/cassandra/commit/5e0c5b1da83ae7f2d2ccc382fd69c438637b2772
 if you want to take a look. I'm also planing to apply this patch to our 
production tier to see if this helps.
 

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>Assignee: Chen Shen
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-06-02 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313225#comment-15313225
 ] 

Paulo Motta commented on CASSANDRA-10862:
-

bq. Also, l'm willing to jump on this if it's not your top priority 

Definitely, go for it! I'm happy to review it.

You cannot yet be assigned to this issue but I will assign you when some admin 
add you to the assignable list.

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-06-02 Thread Chen Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313089#comment-15313089
 ] 

Chen Shen commented on CASSANDRA-10862:
---

[~pauloricardomg] This is happening during streaming stage and we are on 2.2.5. 
So it seems CASSANDRA-6851 should have been included in this version and we 
should perform STCS on received sstables right? 

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-05-25 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300686#comment-15300686
 ] 

Paulo Motta commented on CASSANDRA-10862:
-

Is this incremental or non-incremental repair? What version? Do you know if 
these 4000 sstables were created from streaming alone or from anti-compaction? 
CASSANDRA-6851 reduced the amount of sstables generated after anti-compaction, 
so maybe that will already help mitigate this.

If the problem is with streaming, we could perhaps add a configurable threshold 
and perform STCS on received sstables before adding them to the datatracker 
({{OnCompletionRunnable}}). We should probably disable this during bootstrap, 
as people generally want their nodes to bootstrap faster. We should also make 
sure the sstables are not added to the data-tracker when they are compacted, 
but only after all of them are compacted in order to be able to abort/rollback 
the transaction if the node fails before that.

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-05-25 Thread Chen Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300498#comment-15300498
 ] 

Chen Shen commented on CASSANDRA-10862:
---

Hi [~pauloricardomg], I am with [~dikanggu] on this issue CASSANDRA-11432. As 
we investigated that the each node has over 900,000,000 keys, so that when node 
repairing it created over 4000 sstables and made huge performance degradation. 
So I'm thinking this issue should be able to solve the problem, what do you 
think?

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-04-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230399#comment-15230399
 ] 

Paulo Motta commented on CASSANDRA-10862:
-

Seems plausible to me. Since we already have an anti-compaction step after 
repair, perhaps we could have {{SizeTieredAntiCompaction}} that performs both 
anti-compaction and size-tiering in one shot before making sstables available?

While potentially increasing repair time, this could probably soften the impact 
of having many small sstables after repair. Since there are other tasks already 
addressing this I'm not sure this would a big win over other approaches. If 
not, we should probably close this, given CASSANDRA-10979 already mitigated the 
problem that originated this ticket.

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-04-01 Thread Jeff Ferland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221843#comment-15221843
 ] 

Jeff Ferland commented on CASSANDRA-10862:
--

Note that CASSANDRA-10979 has helped immensely with the performance issues 
outlines here.

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0

2016-04-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221799#comment-15221799
 ] 

Sylvain Lebresne commented on CASSANDRA-10862:
--

That sounds like an easy enough idea to me. Any opinions [~krummas], 
[~pauloricardomg] or [~yukim]?

> LCS repair: compact tables before making available in L0
> 
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Jeff Ferland
>
> When doing repair on a system with lots of mismatched ranges, the number of 
> tables in L0 goes up dramatically, as correspondingly goes the number of 
> tables referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into 
> L1 (which may be a very large copy), finally reducing the number of SSTables 
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then 
> mark tables available rather than marking available when the file itself is 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)