[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-11 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681352#comment-14681352
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

LGTM!

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Marcus Olsson
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.
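The description above attributes much of the slowdown to running one session per range, one after another. Below is a minimal sketch of the alternative. It assumes that ranges with identical replica sets can be validated and streamed in a single session. `SimpleRange`, `RepairPlanner` and `groupByReplicas` are hypothetical names used only for illustration; they are not Cassandra classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a token range (left, right]; not Cassandra's Range<Token>.
final class SimpleRange {
    final long left;
    final long right;

    SimpleRange(long left, long right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public String toString() {
        return "(" + left + ", " + right + "]";
    }
}

final class RepairPlanner {
    // Groups ranges by their replica set: each key becomes one repair session
    // covering all of its ranges, instead of one session per range.
    // Replica sets are used as map keys, so they must not be mutated afterwards.
    static Map<Set<String>, List<SimpleRange>> groupByReplicas(Map<SimpleRange, Set<String>> replicasByRange) {
        Map<Set<String>, List<SimpleRange>> sessions = new LinkedHashMap<>();
        for (Map.Entry<SimpleRange, Set<String>> e : replicasByRange.entrySet())
            sessions.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        return sessions;
    }
}
```

As an example, three ranges replicated on {a, b, c}, {b, c, d} and {a, b, c} would produce two sessions instead of three. How much this helps in a real cluster depends on how many ranges share a replica set.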



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-10 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680959#comment-14680959
 ] 

Stefania commented on CASSANDRA-5220:
-

[~molsson] could you quickly review the Coverity patch I linked in my comment 
above?

Then, if all good, [~jbellis] could you commit it and resolve this ticket?


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-06 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661132#comment-14661132
 ] 

Stefania commented on CASSANDRA-5220:
-

Attaching information on the Coverity defects reported against this patch. I 
propose to handle them as follows:

* CID 1315416: TokenRangeComparator should be serializable. Ignore, since we 
have several other comparators that are not serializable.
* CID 1315412: RESOURCE_LEAK in CompactionStrategyManager line 368. False 
positive: even though the lists aren't closed, their iterators are either closed 
or returned for future use, and the same pattern was already present before.
* CID 1315410: possible NPE in MerkleTrees line 172. This code is for testing 
only, so I propose to convert the possible NPE to an AssertionError and add 
@VisibleForTesting.
* CID 1315409: possible NPE in MerkleTrees line 130. This code is for testing 
only, so I propose to convert the possible NPE to an AssertionError and add 
@VisibleForTesting.
* CID 1315407: possible NPE in MerkleTrees line 162. I verified it cannot 
happen, so I propose to convert the possible NPE to an AssertionError.
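The NPE-to-AssertionError conversion proposed for CIDs 1315410, 1315409 and 1315407 means checking a lookup result and failing with a descriptive error before any null can be dereferenced. The sketch below shows that pattern in a hedged form. `TreeLookup` and its map are hypothetical stand-ins and do not reproduce the actual `MerkleTrees` code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a token -> tree lookup; not the actual MerkleTrees code.
final class TreeLookup {
    private final Map<Long, String> treesByToken = new HashMap<>();

    void put(long token, String tree) {
        treesByToken.put(token, tree);
    }

    // Fails with a descriptive AssertionError when the invariant "a tree exists
    // for this token" is broken, instead of returning null and letting a caller
    // hit a NullPointerException later. An explicit throw is used because a Java
    // `assert` statement only fires when the JVM runs with -ea.
    String getOrFail(long token) {
        String tree = treesByToken.get(token);
        if (tree == null)
            throw new AssertionError("no tree found for token " + token);
        return tree;
    }
}
```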

Proposed patch on the [3.0 
branch|https://github.com/stef1927/cassandra/tree/5220-3.0] : 
[here|https://github.com/stef1927/cassandra/commit/aa419e331783e78c9aafe79eaeb0362e2338a6b6].

Here are the defects details:

{code}
** CID 1315416:  FindBugs: Bad practice  (FB.SE_COMPARATOR_SHOULD_BE_SERIALIZABLE)
/src/java/org/apache/cassandra/utils/MerkleTrees.java: 423 in ()


*** CID 1315416:  FindBugs: Bad practice  (FB.SE_COMPARATOR_SHOULD_BE_SERIALIZABLE)
/src/java/org/apache/cassandra/utils/MerkleTrees.java: 423 in ()
417 }
418 return size;
419 }
420
421 }
422
>>> CID 1315416:  FindBugs: Bad practice  
>>> (FB.SE_COMPARATOR_SHOULD_BE_SERIALIZABLE)
>>> org.apache.cassandra.utils.MerkleTrees$TokenRangeComparator implements 
>>> Comparator but not Serializable.
423 private static class TokenRangeComparator implements Comparator<Range<Token>>
424 {
425 @Override
426 public int compare(Range<Token> rt1, Range<Token> rt2)
427 {
428 if (rt1.left.compareTo(rt2.left) == 0)

** CID 1315412:(RESOURCE_LEAK)
/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java: 368 in org.apache.cassandra.db.compaction.CompactionStrategyManager.getScanners(java.util.Collection, java.util.Collection)()
/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java: 368 in org.apache.cassandra.db.compaction.CompactionStrategyManager.getScanners(java.util.Collection, java.util.Collection)()



*** CID 1315412:(RESOURCE_LEAK)
/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java: 368 in org.apache.cassandra.db.compaction.CompactionStrategyManager.getScanners(java.util.Collection, java.util.Collection)()
362
363 for (ISSTableScanner scanner : Iterables.concat(repairedScanners.scanners, unrepairedScanners.scanners))
364 {
365 if (!scanners.add(scanner))
366 scanner.close();
367 }
>>> CID 1315412:(RESOURCE_LEAK)
>>> Variable "repairedScanners" going out of scope leaks the resource it 
>>> refers to.
368 }
369
370 return new AbstractCompactionStrategy.ScannerList(new ArrayList<>(scanners));
371 }
372
373 public synchronized AbstractCompactionStrategy.ScannerList getScanners(Collection sstables)
/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java: 368 in org.apache.cassandra.db.compaction.CompactionStrategyManager.getScanners(java.util.Collection, java.util.Collection)()
362
363 for (ISSTableScanner scanner : Iterables.concat(repairedScanners.scanners, unrepairedScanners.scanners))
364 {
365 if (!scanners.add(scanner))
366 scanner.close();
367 }
>>> CID 1315412:(RESOURCE_LEAK)
>>> Variable "unrepairedScanners" going out of scope leaks the resource it 
>>> refers to.
368 }
369
370 return new AbstractCompactionStrategy.ScannerList(new ArrayList<>(scanners));
371 }
372
373 public synchronized AbstractCompactionStrategy.ScannerList getScanners(Collection sstables)

** CID 1315410:  Null pointer dereferences  (NULL_RETURNS)
/src/java/org/apache/cassandra/utils/MerkleTrees.java: 172 in org.apache.cassandra.utils.MerkleTrees.invalidate(org.apache.cassandra.dht.Token)()



*** C

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-06 Thread Yuki Morishita (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660750#comment-14660750
 ] 

Yuki Morishita commented on CASSANDRA-5220:
---

Unfortunately no.
This fix involves a message format change, so backporting it would break 
compatibility within a major version.
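Here is why a message format change is hard to backport. A versioned serializer must keep writing the old format to older peers. If the old format can carry only one range, a message that covers several ranges cannot be sent to those peers at all. The sketch below illustrates this under that assumption. `RangesMessageSerializer`, `NEW_VERSION` and the `long[]` range encoding are hypothetical and do not match Cassandra's real messaging classes.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical versioned serializer; a range is encoded as long[] {left, right}.
final class RangesMessageSerializer {
    // Hypothetical messaging version that introduced multi-range messages.
    static final int NEW_VERSION = 2;

    static void serialize(List<long[]> ranges, DataOutput out, int version) throws IOException {
        if (version < NEW_VERSION) {
            // The old format has room for exactly one range, so a multi-range
            // message cannot be expressed for an older peer.
            if (ranges.size() != 1)
                throw new IllegalArgumentException("peer at version " + version + " accepts one range per message");
            out.writeLong(ranges.get(0)[0]);
            out.writeLong(ranges.get(0)[1]);
            return;
        }
        out.writeInt(ranges.size());
        for (long[] range : ranges) {
            out.writeLong(range[0]);
            out.writeLong(range[1]);
        }
    }

    static List<long[]> deserialize(DataInput in, int version) throws IOException {
        // Old-format messages carry no count: they always hold a single range.
        int count = version < NEW_VERSION ? 1 : in.readInt();
        List<long[]> ranges = new ArrayList<>(count);
        for (int i = 0; i < count; i++)
            ranges.add(new long[] { in.readLong(), in.readLong() });
        return ranges;
    }
}
```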


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-06 Thread Kenneth Failbus (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660740#comment-14660740
 ] 

Kenneth Failbus commented on CASSANDRA-5220:


Will this fix be back-ported to the 2.0.x or 2.1.x releases? It would be a big 
help, since it would make the product stable on those releases. Vnodes are very 
good functionality for scaling purposes, but only if repairs keep up.


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-06 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660003#comment-14660003
 ] 

Jonathan Ellis commented on CASSANDRA-5220:
---

Committed. Thanks, Marcus and Stefania!


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-06 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659640#comment-14659640
 ] 

Stefania commented on CASSANDRA-5220:
-

Continuous integration results are comparable to the unpatched cassandra-3.0 
results; the [3.0 patch|https://github.com/stef1927/cassandra/commits/5220-3.0] 
can be committed. [~jbellis] can you take care of this?


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-06 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659593#comment-14659593
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

LGTM! :)


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-05 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659424#comment-14659424
 ] 

Jonathan Ellis commented on CASSANDRA-5220:
---

I'm okay with adding this to 3.0, since otherwise we'll need to wait for either 
CASSANDRA-8110 or 4.0, and I don't think that's fair to Marcus, since he had the 
first version written months ago.


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-05 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659416#comment-14659416
 ] 

Stefania commented on CASSANDRA-5220:
-

Thanks, I made a couple more really tiny changes 
[here|https://github.com/stef1927/cassandra/commit/dbd5c88c6f89ff303f4fece9bb8c5ffa6c3825a1].
 The TODO comment above was misplaced, sorry; I meant it for {{MerkleTrees}}. 
You're quite right that we don't need to change the existing trunk behavior. 

About _repair_history_, I verified that the new format would result in an 
exception when upgrading from 2.2 with some sstables already on disk. Although I 
believe we could ask people to wipe this data on a major upgrade, I don't see why 
we should inconvenience them, so I went ahead, reverted to the old format and 
inserted one row per range; see commit 
[here|https://github.com/stef1927/cassandra/commit/92bd923a8b2d9976dc711f1b7007d25db30d06f9].
 Thanks for spotting this.

If you confirm these final changes are OK, then I am +1 to commit once CI 
completes.

[~jbellis] I assume we want this on 3.0? If so I ported the patch to 
_cassandra-3.0_ [here|https://github.com/stef1927/cassandra/commits/5220-3.0]. 
It is identical to the [trunk 
patch|https://github.com/stef1927/cassandra/commits/5220] as it applied with no 
conflicts. You can pick whichever you need depending on where you want to 
commit to and discard the other one.

CI results for trunk will appear here:

http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-5220-testall/
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-5220-dtest/

CI results for 3.0 are instead here:

http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-5220-3.0-testall/
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-5220-3.0-dtest/



[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-05 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658349#comment-14658349
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

Created a pull request [here|https://github.com/stef1927/cassandra/pull/2] to 
your branch.

Most comments should've been fixed but there was one in particular I wasn't 
100% sure about. In _RepairJobDesc.java_ in the _deserialize()_ method:
{quote}
// CR-TODO is it safe to use the MS.globalPartitioner() here?
range = (Range) AbstractBounds.tokenSerializer.deserialize(in,
MessagingService.globalPartitioner(), version);
{quote}
Not sure what to use instead, but I guess it should be safe since the trunk 
version uses it.


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-05 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658262#comment-14658262
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

While looking at CASSANDRA-5839 I realized that this might break something 
during an upgrade from 2.2 to 3.0: with this patch, the table _repair_history_ 
changes to hold a set of ranges instead of a single range start and end. (This 
patch was first done when

Should I change the table back and do one insert per range instead?


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-05 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658225#comment-14658225
 ] 

Stefania commented on CASSANDRA-5220:
-

Sounds great, thanks! :)


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-05 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655293#comment-14655293
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

I'm happy to implement it!


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-05 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655230#comment-14655230
 ] 

Stefania commented on CASSANDRA-5220:
-

As you prefer. Ideally you would implement them, but if you are busy I can 
also implement them myself and you can review afterwards. Just let me know.





[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-05 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655181#comment-14655181
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

Nice, would you like me to take care of the main points/nits/comments as well 
or would you rather fix them yourself?

Regarding the main points:

#2 For MerkleTrees serialization I guess we could remove the range, just 
serialize the MerkleTree instances and use their fullRange.

#3 I guess I missed that option; it should be possible to use a TreeMap 
instead.

#4 I don't think the token ranges should overlap, so a few assertions could be 
useful.
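Point #2 can be sketched as follows. The sketch assumes each tree already records the full range it covers. The serializer then writes only the trees and rebuilds the range-to-tree association when deserializing. `SimpleTree` and `TreesSerializer` are hypothetical simplifications, not the real `MerkleTree`/`MerkleTrees` serializers; for one thing, the real `MerkleTree` serializer also writes the partitioner name.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for MerkleTree: only what this sketch needs.
final class SimpleTree {
    final long left;   // exclusive start of the tree's full range
    final long right;  // inclusive end of the tree's full range
    final byte[] hash; // stand-in for the tree's contents

    SimpleTree(long left, long right, byte[] hash) {
        this.left = left;
        this.right = right;
        this.hash = hash;
    }
}

final class TreesSerializer {
    // Writes each tree once; its range travels as part of the tree, so no
    // separate range is written alongside it.
    static void serialize(Collection<SimpleTree> trees, DataOutput out) throws IOException {
        out.writeInt(trees.size());
        for (SimpleTree t : trees) {
            out.writeLong(t.left);
            out.writeLong(t.right);
            out.writeInt(t.hash.length);
            out.write(t.hash);
        }
    }

    // Rebuilds the map from each tree's own full range, keyed by its left token.
    static SortedMap<Long, SimpleTree> deserialize(DataInput in) throws IOException {
        int count = in.readInt();
        SortedMap<Long, SimpleTree> byLeft = new TreeMap<>();
        for (int i = 0; i < count; i++) {
            long left = in.readLong();
            long right = in.readLong();
            byte[] hash = new byte[in.readInt()];
            in.readFully(hash);
            byLeft.put(left, new SimpleTree(left, right, hash));
        }
        return byLeft;
    }
}
```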


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-04 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654768#comment-14654768
 ] 

Stefania commented on CASSANDRA-5220:
-

Great to hear this! Then we should be able to commit this soon; the remaining 
points won't take long at all.


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-04 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654755#comment-14654755
 ] 

Jonathan Ellis commented on CASSANDRA-5220:
---

We can't support repair anyway with older-version nodes until we have 
CASSANDRA-8110, so don't worry about it here.


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-04 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654745#comment-14654745
 ] 

Stefania commented on CASSANDRA-5220:
-

Quite impressive gain indeed! Thanks for fixing those rebase errors too.

I've merged your branch into mine and rebased so that we can more easily 
compare the CI results. As you've noticed some test failures are not related to 
this patch, so keeping it up-to-date with trunk makes it easier to compare the 
test results with trunk ([here|http://cassci.datastax.com/job/trunk_testall] 
and [here|http://cassci.datastax.com/job/trunk_dtest]).

I also pushed [another 
commit|https://github.com/stef1927/cassandra/commit/27615434aec0ce05c2bfa689020b0e00a6409590]
 with some very minor changes, mostly nits or comments. There are also a couple 
of trivial things to do marked as {{// CR-TODO}}. I prefer not to clutter the 
discussion with these trivial matters and instead focus on the main points, 
but if something concerns you when checking the changes, don't hesitate to 
raise it.

Here are the main points:

* Do we need to support repair with older replicas? Normally we do support 
older nodes in a cluster when changing message formats, that's why we have a 
version in the serializers. So unless repair is different we need to make sure 
we still send the old message format to the old nodes, which I'm afraid could 
be a bit of a pain to implement. cc [~jbellis] to confirm.

* In {{MerkleTrees.deserialize()}}: is it safe to use 
{{MessagingService.globalPartitioner()}}? {{MerkleTree}} currently serializes 
the partitioner name, so I would have thought we need to do the same. In fact, 
why send the range over the wire at all? Can we not just take it from the tree's 
{{fullRange}}?

* In {{MerkleTrees}}: why do we need a separate list of {{Range}}, isn't 
a sorted map like a tree map sufficient? 

* From what I understand the token ranges should not overlap, so should we add a 
couple of assertions in {{MerkleTrees}} to make sure this is the case? (I'm not 
sure about this one.)

* While reading the code documentation of {{RepairSession}} I found an old ticket, 
CASSANDRA-2816. I believe the proposed implementation should be fine, as we 
scan multiple ranges at the same time in the validation compaction, but I did 
not read the entire discussion on that ticket, so I thought I'd mention it 
here.
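
On the {{TreeMap}} and overlap-assertion points, here is a minimal sketch of what I had in mind: one sorted map keyed by each range's left token, which gives token-order iteration without a separate list and lets {{add}} reject overlapping ranges up front. It is only an illustration under simplifying assumptions: {{RangeMapSketch}} and {{TokenRange}} are hypothetical names, tokens are plain {{long}}s, ranges are (left, right] like Cassandra's but never wrap around, and nothing here is the patch's actual {{MerkleTrees}} code.

```java
import java.util.Map;
import java.util.TreeMap;

public class RangeMapSketch<T> {
    // Hypothetical non-wrapping token range (left, right], mirroring Cassandra's left-exclusive bounds.
    public record TokenRange(long left, long right) {
        public TokenRange {
            if (left >= right) {
                throw new IllegalArgumentException("empty or wrapping range: (" + left + ", " + right + "]");
            }
        }
    }

    // Holds a range together with the tree (or any value) built for it.
    private record Slot<V>(TokenRange range, V value) {}

    // Keyed by left token, so iteration is already in token order; no separate List is needed.
    private final TreeMap<Long, Slot<T>> byLeft = new TreeMap<>();

    /** Adds a range, failing fast if it overlaps a range that is already present. */
    public void add(TokenRange range, T value) {
        // The closest range starting at or before ours must end at or before our left bound.
        Map.Entry<Long, Slot<T>> before = byLeft.floorEntry(range.left());
        if (before != null && before.getValue().range().right() > range.left()) {
            throw new IllegalArgumentException(range + " overlaps " + before.getValue().range());
        }
        // The closest range starting after ours must start at or after our right bound.
        Map.Entry<Long, Slot<T>> after = byLeft.higherEntry(range.left());
        if (after != null && after.getKey() < range.right()) {
            throw new IllegalArgumentException(range + " overlaps " + after.getValue().range());
        }
        byLeft.put(range.left(), new Slot<>(range, value));
    }

    /** Returns the value whose range contains the token, or null if no range does. */
    public T get(long token) {
        Map.Entry<Long, Slot<T>> candidate = byLeft.lowerEntry(token); // candidate has left < token
        if (candidate == null || token > candidate.getValue().range().right()) {
            return null;
        }
        return candidate.getValue().value();
    }

    public int size() {
        return byLeft.size();
    }
}
```

Adjacent ranges such as (0, 10] and (10, 20] are accepted, while adding (5, 15] afterwards fails, which is roughly the assertion I was thinking of.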


> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Marcus Olsson
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-04 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654147#comment-14654147
 ] 

Jonathan Ellis commented on CASSANDRA-5220:
---

Very substantial.  Excited to get this in!

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Marcus Olsson
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-04 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653834#comment-14653834
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

Thanks, I've managed to get the dtests up and running now!

I had some problems running the dtests, but I think there were two 
small misses in the rebase. The dtests now work properly with 
[this|https://github.com/emolsson/cassandra/commits/5220] patch, and when running
{noformat}
PRINT_DEBUG=True nosetests -s -v repair_test.py:TestRepair.simple_parallel_repair_test
{noformat}
on both trunk and the patched trunk, I see the repair time improve from ~14.2s 
to ~6.7s; without vnodes (but with the patch) it was ~2s.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Marcus Olsson
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-04 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653407#comment-14653407
 ] 

Stefania commented on CASSANDRA-5220:
-

Thanks for resuming work at such short notice. 

The dtest problem is a compatibility issue with the Python driver. If you 
have a copy of the driver git repository, then the "cassandra-test" branch 
should work.

Otherwise, unzip the driver zip file bundled with the Cassandra source 
(lib/cassandra-driver-internal-only-2.6.0c2.zip) and either install that 
version (python setup.py install) or make sure the cassandra folder is 
reachable by the dtests, e.g. by putting it in the same directory as the 
dtests. If you use a local folder, you probably also have to uninstall the 
official driver, if it is installed.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Marcus Olsson
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-04 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653376#comment-14653376
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

Sorry about the unit tests. I had some problems with ant and JUnit when I 
uploaded the patch, so I ran those tests through Eclipse, and it seems that I 
missed the -ea flag. The tests in SerializationsTest also seem to be 
broken (due to the changes for validations). The other two failing tests seem 
to fail on trunk as well, so I'm assuming they are not caused by this patch.
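
For anyone else running the tests from an IDE: Java {{assert}} statements are skipped entirely unless the JVM is started with {{-ea}} (or {{-enableassertions}}), so a failing assertion simply never shows up. A small, self-contained illustration (the class name is made up for this comment):

```java
public class AssertionCheck {
    /** Returns true only when assertions are enabled for this class (e.g. the JVM was started with -ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert is only evaluated when assertions are enabled.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        try {
            assert 1 + 1 == 3 : "only checked with -ea";
            System.out.println("assert skipped: a broken invariant went unnoticed");
        } catch (AssertionError e) {
            System.out.println("assert fired: " + e.getMessage());
        }
    }
}
```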

I based my work on your rebased 5220 branch and have fixed the unit tests 
[here|https://github.com/emolsson/cassandra/commits/5220].

I also seem to be having some problems running the dtests. Is there something 
special that needs to be done to run the dtests on trunk? I get the following 
error message:
{noformat}
NoHostAvailable: ('Unable to connect to any servers', {'127.0.0.1': 
InvalidRequest(u'code=2200 [Invalid query] message="unconfigured table 
schema_keyspaces"',)})
{noformat}


> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Marcus Olsson
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-03 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651640#comment-14651640
 ] 

Stefania commented on CASSANDRA-5220:
-

I've rebased _cassandra-3.0-5220-2.patch_ onto the latest trunk 
[here|https://github.com/stef1927/cassandra/commits/5220].

I haven't had the time to look at the code in depth yet; I plan to do so in the 
next few days.

Meanwhile, these unit tests are failing:

* MerkleTreesTest.testHashRandom
* LeveledCompactionStrategyTest.testValidationMultipleSSTablePerLevel

Initially I thought it was because of the rebase, but when I applied the patch 
onto trunk as of April 2015 
[here|https://github.com/stef1927/cassandra/commits/5220-old], with no 
conflicts, they were also failing.

There may be more broken unit tests; I've only checked the ones that were 
modified by the patch. Eventually the full CI results will appear here:

http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-5220-testall/
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-5220-dtest/




> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Marcus Olsson
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-04-24 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511202#comment-14511202
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

Yes, I ran the dtest and I see these exceptions as well.

The tests I ran before were very basic: three nodes, using the stress 
tool with the cqlstress-example.yaml profile (with the replication factor changed 
to two), run with n=100. Then I stopped a node, removed the 
inserted data and all commitlog entries, started it again and ran a full repair 
on that node using `repair -full -- stresscql`.


The main problem seems to be that it runs out of TreeRanges to iterate over 
while doing the validation compaction. I have probably made a faulty assumption 
somewhere, and the first thing that comes to mind is that the wrapping iterator 
sorts the ranges in a different order from the one in which the validation 
compaction reads them. Unfortunately I don't have time to debug this 
further until Monday.
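
To illustrate the suspected failure mode, a minimal sketch (hypothetical names, {{long}} tokens, non-wrapping (left, right] ranges; not the patch's actual iterator): if keys are consumed in token order while a forward-only cursor walks the list of ranges, the cursor runs off the end as soon as that list is not sorted the same way as the keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeCursorSketch {
    // Hypothetical non-wrapping token range (left, right] over long tokens.
    record TokenRange(long left, long right) {
        boolean contains(long token) {
            return token > left && token <= right;
        }
    }

    /**
     * Hands each token, in the order it is read, to the "current" range, advancing a
     * forward-only cursor as needed. Returns how many tokens were assigned, or throws
     * if the cursor runs past the last range.
     */
    static int assign(List<TokenRange> ranges, List<Long> tokensInReadOrder) {
        int cursor = 0;
        int assigned = 0;
        for (long token : tokensInReadOrder) {
            // The cursor never moves backwards, so a range that was skipped is gone for good.
            while (cursor < ranges.size() && !ranges.get(cursor).contains(token)) {
                cursor++;
            }
            if (cursor == ranges.size()) {
                throw new IllegalStateException("ran out of ranges at token " + token);
            }
            assigned++;
        }
        return assigned;
    }

    static List<TokenRange> sortedByLeft(List<TokenRange> ranges) {
        List<TokenRange> copy = new ArrayList<>(ranges);
        copy.sort(Comparator.comparingLong(TokenRange::left));
        return copy;
    }

    public static void main(String[] args) {
        List<TokenRange> unsorted = List.of(new TokenRange(50, 60), new TokenRange(10, 20));
        List<Long> tokens = List.of(12L, 55L); // keys arrive in ascending token order
        System.out.println(assign(sortedByLeft(unsorted), tokens)); // 2
        try {
            assign(unsorted, tokens);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // ran out of ranges at token 55
        }
    }
}
```

With the ranges sorted by left token both keys are assigned; with the unsorted list the second key finds no remaining range, which would be consistent with the "runs out of TreeRanges" symptom.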

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-04-24 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510971#comment-14510971
 ] 

Ryan McGuire commented on CASSANDRA-5220:
-

Thanks [~molsson], the patch applies correctly now :)

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-04-24 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510776#comment-14510776
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

Hi, yes, I had a mix-up with my branches, so I'm afraid this wasn't the latest 
patch. I will try to upload the new patch ASAP.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-04-23 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509113#comment-14509113
 ] 

Ryan McGuire commented on CASSANDRA-5220:
-

Hi [~molsson], I'm happy to run some of my own tests on this, but I'm having 
trouble applying your patch. Can you rebase it or let me know which git SHA your 
patch applies to?

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-04-23 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508997#comment-14508997
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

I've done some smaller tests to verify that it works, but I haven't had the 
chance to run any performance tests on it.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-04-23 Thread Jeremiah Jordan (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508973#comment-14508973
 ] 

Jeremiah Jordan commented on CASSANDRA-5220:


Did you run any tests to see how this improved things?

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2015-04-23 Thread Marcus Olsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508815#comment-14508815
 ] 

Marcus Olsson commented on CASSANDRA-5220:
--

I've done some work on this to make repair handle multiple ranges at the 
same time (patch attached). Essentially, it finds the 
common ranges for a set of nodes and repairs them all at the same time.

Assume we have three nodes A, B and C with RF=2, holding the following ranges:
A -> 1, 2, 3, 4
B -> 3, 4, 5, 6
C -> 1, 2, 5, 6
Then, if we issue a repair -pr on A, it would create two repair sessions:
(A, B) -> (3, 4)
and
(A, C) -> (1, 2)
instead of one for each range:
(A, B) -> 3
(A, B) -> 4
(A, C) -> 1
(A, C) -> 2
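
The grouping step in this example can be sketched roughly as follows. This is a simplified illustration, not the patch's code: ranges are plain integer ids, ownership is given as a map from node to ranges, and {{groupByReplicas}} is a hypothetical helper that groups the ranges of one node by the exact set of replicas holding them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class CommonRangeGrouping {
    /**
     * Groups the ranges held by {@code node} by the exact set of replicas holding each range,
     * so every group can be repaired in one session instead of one session per range.
     */
    static Map<Set<String>, List<Integer>> groupByReplicas(String node, Map<String, List<Integer>> rangesByNode) {
        Map<Set<String>, List<Integer>> sessions = new LinkedHashMap<>();
        for (int range : rangesByNode.get(node)) {
            Set<String> replicas = new TreeSet<>(); // sorted, so replica sets print in a stable order
            for (Map.Entry<String, List<Integer>> entry : rangesByNode.entrySet()) {
                if (entry.getValue().contains(range)) {
                    replicas.add(entry.getKey());
                }
            }
            sessions.computeIfAbsent(replicas, k -> new ArrayList<>()).add(range);
        }
        return sessions;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> rangesByNode = new TreeMap<>();
        rangesByNode.put("A", List.of(1, 2, 3, 4));
        rangesByNode.put("B", List.of(3, 4, 5, 6));
        rangesByNode.put("C", List.of(1, 2, 5, 6));
        System.out.println(groupByReplicas("A", rangesByNode)); // {[A, C]=[1, 2], [A, B]=[3, 4]}
    }
}
```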


The change is mostly centered around the new utility class MerkleTrees, which is 
a wrapper for multiple MerkleTree instances and their associated ranges. This utility 
class replaces the occurrences of the MerkleTree class in the validator phase 
and the repair messages. The changes are not backwards compatible, since the 
repair job sends multiple ranges and the validation-complete message sends a 
MerkleTrees instead of a MerkleTree.
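
On compatibility, the usual way around this is to branch on the peer's messaging version in the serializer. A minimal sketch, assuming hypothetical version numbers and a simplified {{TokenRange}} of two {{long}}s (none of these names are Cassandra's): new peers get a range count followed by the ranges, while old peers only ever receive exactly one range, so the sender would have to fall back to one request per range for them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class VersionedRangesSerializer {
    // Hypothetical protocol versions, for illustration only.
    static final int VERSION_SINGLE_RANGE = 1;
    static final int VERSION_MULTI_RANGE = 2;

    // Hypothetical token range (left, right] over long tokens.
    record TokenRange(long left, long right) {}

    static void serialize(List<TokenRange> ranges, DataOutput out, int peerVersion) throws IOException {
        if (peerVersion >= VERSION_MULTI_RANGE) {
            out.writeInt(ranges.size()); // new format: range count, then each range
        } else if (ranges.size() != 1) {
            // Old format carries exactly one range; the caller must send one message per range.
            throw new IllegalArgumentException("peer version " + peerVersion + " accepts one range per message");
        }
        for (TokenRange r : ranges) {
            out.writeLong(r.left());
            out.writeLong(r.right());
        }
    }

    static List<TokenRange> deserialize(DataInput in, int peerVersion) throws IOException {
        int count = peerVersion >= VERSION_MULTI_RANGE ? in.readInt() : 1; // old format has no count
        List<TokenRange> ranges = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            long left = in.readLong();
            long right = in.readLong();
            ranges.add(new TokenRange(left, right));
        }
        return ranges;
    }

    // Byte-array helpers that wrap the checked IOException.
    static byte[] toBytes(List<TokenRange> ranges, int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            serialize(ranges, out, peerVersion);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<TokenRange> fromBytes(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return deserialize(in, peerVersion);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```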

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-12-30 Thread Jeremy Hanna (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261376#comment-14261376
 ] 

Jeremy Hanna commented on CASSANDRA-5220:
-

I think it's important to reiterate that the project devs recognize that these 
inefficiencies are impacting many users. However, a lot of parallel work is 
being done on repair. As Yuki pointed out, with incremental repair 
(CASSANDRA-5351) already in 2.1 and improved concurrency of the repair 
process (CASSANDRA-6455) coming in 3.0, many of the problems seen in this 
ticket will be resolved.

Until 2.1/3.0, sub-range repair (CASSANDRA-5280) helps to parallelize repair and 
make it more efficient with virtual nodes. See 
http://www.datastax.com/dev/blog/advanced-repair-techniques for details about 
the efficiency gains with sub-range repair. It's just more tedious to track. 
Saving repair data to a system table (CASSANDRA-5839) will help track that in 
Cassandra itself.
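
For the sub-range approach, a minimal sketch of splitting one token range into evenly sized pieces; each piece could then be repaired separately via nodetool repair's start/end token options (-st/-et). It assumes a single non-wrapping range (start, end] with numeric tokens such as Murmur3's, and {{SubrangeSplitter}} is a hypothetical helper, not part of Cassandra.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class SubrangeSplitter {
    // A sub-range (start, end], matching Cassandra's left-exclusive, right-inclusive ranges.
    record Subrange(BigInteger start, BigInteger end) {
        @Override
        public String toString() {
            return "(" + start + ", " + end + "]";
        }
    }

    /** Splits the non-wrapping range (start, end] into at most `parts` contiguous, non-empty sub-ranges. */
    static List<Subrange> split(BigInteger start, BigInteger end, int parts) {
        if (parts < 1 || start.compareTo(end) >= 0) {
            throw new IllegalArgumentException("need parts >= 1 and start < end");
        }
        BigInteger width = end.subtract(start); // BigInteger avoids overflow across the full Murmur3 token space
        List<Subrange> result = new ArrayList<>();
        BigInteger prev = start;
        for (int i = 1; i <= parts; i++) {
            // Floor division; for i == parts this is exactly `end`, so the pieces cover the whole range.
            BigInteger next = start.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(parts)));
            if (next.compareTo(prev) > 0) { // skip empty pieces when parts > width
                result.add(new Subrange(prev, next));
                prev = next;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Subrange s : split(BigInteger.ZERO, BigInteger.valueOf(100), 4)) {
            System.out.println(s); // (0, 25], then (25, 50], (50, 75], (75, 100]
        }
    }
}
```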

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-09-22 Thread Yuki Morishita (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143627#comment-14143627
 ] 

Yuki Morishita commented on CASSANDRA-5220:
---

I'm inclined to mark this 'later' in favor of incremental repair and internal 
refactoring such as CASSANDRA-6455.
In particular, incremental repair should decrease the time needed to validate 
data, which is one of the major heavy-lifting processes of repair.


> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 3.0
>
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-08-25 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110288#comment-14110288
 ] 

Jonathan Ellis commented on CASSANDRA-5220:
---

bq. I just talked to some people who were seeing an 8 node (256 vnodes each) 
repair with about 1GB/node take two days.

I'm still not sure we have a good handle on this.  Is this reproducible?  I'm 
not convinced "spending more time in messaging" is an adequate explanation.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 3.0
>
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-06-05 Thread Jeffery Schnick (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019031#comment-14019031
 ] 

Jeffery Schnick commented on CASSANDRA-5220:


[~cscetbon] Thank you for the info.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1.1
>
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-06-04 Thread Cyril Scetbon (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018504#comment-14018504
 ] 

Cyril Scetbon commented on CASSANDRA-5220:
--

[~SchnickDaddy] It's not fixed yet. We just hope it will be fixed in version 
2.1.1; people are currently digging to find where the overhead that slows 
down repair is located.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1.1
>
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-06-04 Thread Jeffery Schnick (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018331#comment-14018331
 ] 

Jeffery Schnick commented on CASSANDRA-5220:


I see this is fixed in 2.1 rc1, but is there a patch, or could someone point me to 
the Git commit in which this was addressed?

Thanks

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1.1
>
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-05-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005843#comment-14005843
 ] 

Juho Mäkinen commented on CASSANDRA-5220:
-

In addition, the repair operation reports very little about its progress, so it would 
be nice if some additional logging of repair progress were added, both 
to log4j and to JMX.
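
For the JMX part, a minimal sketch using only the JDK's {{javax.management}} API; the interface, attribute names and {{ObjectName}} are hypothetical and not Cassandra's actual repair MBeans. A repair loop would call {{rangeCompleted()}} as each range finishes, and any JMX client (e.g. jconsole) could then read the attributes.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class RepairProgressSketch {
    // Management interface: each getter becomes a read-only JMX attribute (e.g. "RangesCompleted").
    public interface RepairProgressMBean {
        int getRangesTotal();
        int getRangesCompleted();
        double getPercentComplete();
    }

    public static final class RepairProgress implements RepairProgressMBean {
        private final int total;
        private final AtomicInteger completed = new AtomicInteger();

        public RepairProgress(int total) { this.total = total; }

        // Called by the (hypothetical) repair loop each time a range finishes.
        public void rangeCompleted() { completed.incrementAndGet(); }

        @Override public int getRangesTotal() { return total; }
        @Override public int getRangesCompleted() { return completed.get(); }
        @Override public double getPercentComplete() {
            return total == 0 ? 100.0 : 100.0 * completed.get() / total;
        }
    }

    /** Registers the progress object on the platform MBeanServer under the given name. */
    static ObjectName register(RepairProgress progress, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // StandardMBean names the interface explicitly, so no class-naming convention is required.
            server.registerMBean(new StandardMBean(progress, RepairProgressMBean.class), objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object readAttribute(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RepairProgress progress = new RepairProgress(4);
        ObjectName name = register(progress, "org.example.repair:type=RepairProgress");
        progress.rangeCompleted();
        System.out.println(readAttribute(name, "PercentComplete")); // 25.0
    }
}
```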

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1.1
>
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.




[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-21 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976385#comment-13976385
 ] 

Jonathan Ellis commented on CASSANDRA-5220:
---

bq. Send validation request once for all ranges, replica node builds MT for 
each range one by one, and sent back MT as it is built.

This is a fairly straightforward extension, isn't it?  I'd favor that approach.
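
A minimal sketch of that flow, with a {{long}} digest standing in for each Merkle tree and all names hypothetical: each replica receives one request covering every range and reports one result per range as soon as it is built, and the coordinator compares a range as soon as all replicas have reported it, rather than waiting for the whole request to finish.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PipelinedValidationSketch {
    /** Coordinator side: compares a range as soon as every replica has reported a digest for it. */
    static final class Coordinator {
        private final int replicaCount;
        private final Map<String, Map<String, Long>> pending = new HashMap<>(); // range -> replica -> digest
        final List<String> compared = new ArrayList<>();
        final List<String> mismatched = new ArrayList<>();

        Coordinator(int replicaCount) {
            this.replicaCount = replicaCount;
        }

        void onDigest(String replica, String range, long digest) {
            Map<String, Long> byReplica = pending.computeIfAbsent(range, r -> new HashMap<>());
            byReplica.put(replica, digest);
            if (byReplica.size() == replicaCount) {
                pending.remove(range);
                compared.add(range);
                // More than one distinct digest means the replicas disagree on this range.
                if (new HashSet<>(byReplica.values()).size() > 1) {
                    mismatched.add(range);
                }
            }
        }
    }

    /** Replica side: one request lists all ranges; one response per range is sent as soon as it is built. */
    static void validate(String replica, List<String> ranges, Function<String, Long> buildDigest, Coordinator coordinator) {
        for (String range : ranges) {
            coordinator.onDigest(replica, range, buildDigest.apply(range));
        }
    }

    public static void main(String[] args) {
        List<String> ranges = List.of("r1", "r2", "r3");
        Coordinator coordinator = new Coordinator(2);
        validate("A", ranges, range -> (long) range.hashCode(), coordinator);
        // Replica B disagrees on r2 only.
        validate("B", ranges, range -> range.equals("r2") ? -1L : (long) range.hashCode(), coordinator);
        System.out.println(coordinator.compared + " " + coordinator.mismatched); // [r1, r2, r3] [r2]
    }
}
```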

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1 beta2
>
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-16 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971607#comment-13971607
 ] 

Ryan McGuire commented on CASSANDRA-5220:
-

YourKit also listed some potential deadlocks, which it apparently doesn't save 
to the snapshot:

{code}
Frozen threads found (potential deadlock)
 
It seems that the following threads have not changed their stack for more than 
10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.
 
Thread-10 <--- Frozen for at least 48 sec
sun.nio.ch.FileDispatcherImpl.read0(FileDescriptor, long, int)
sun.nio.ch.SocketDispatcher.read(FileDescriptor, long, int)
sun.nio.ch.IOUtil.readIntoNativeBuffer(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.IOUtil.read(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.SocketChannelImpl.read(ByteBuffer)
sun.nio.ch.SocketAdaptor$SocketInputStream.read(ByteBuffer)
sun.nio.ch.ChannelInputStream.read(byte[], int, int)
org.xerial.snappy.SnappyInputStream.hasNextChunk()
org.xerial.snappy.SnappyInputStream.read()
java.io.DataInputStream.readInt()
org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion()
org.apache.cassandra.net.IncomingTcpConnection.run()



Thread-11 <--- Frozen for at least 1m 17 sec
sun.nio.ch.FileDispatcherImpl.read0(FileDescriptor, long, int)
sun.nio.ch.SocketDispatcher.read(FileDescriptor, long, int)
sun.nio.ch.IOUtil.readIntoNativeBuffer(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.IOUtil.read(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.SocketChannelImpl.read(ByteBuffer)
sun.nio.ch.SocketAdaptor$SocketInputStream.read(ByteBuffer)
sun.nio.ch.ChannelInputStream.read(byte[], int, int)
org.xerial.snappy.SnappyInputStream.hasNextChunk()
org.xerial.snappy.SnappyInputStream.read()
java.io.DataInputStream.readInt()
org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion()
org.apache.cassandra.net.IncomingTcpConnection.run()



Thread-12 <--- Frozen for at least 48 sec
sun.nio.ch.FileDispatcherImpl.read0(FileDescriptor, long, int)
sun.nio.ch.SocketDispatcher.read(FileDescriptor, long, int)
sun.nio.ch.IOUtil.readIntoNativeBuffer(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.IOUtil.read(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.SocketChannelImpl.read(ByteBuffer)
sun.nio.ch.SocketAdaptor$SocketInputStream.read(ByteBuffer)
sun.nio.ch.ChannelInputStream.read(byte[], int, int)
org.xerial.snappy.SnappyInputStream.hasNextChunk()
org.xerial.snappy.SnappyInputStream.read()
java.io.DataInputStream.readInt()
org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion()
org.apache.cassandra.net.IncomingTcpConnection.run()



Thread-13 <--- Frozen for at least 48 sec
sun.nio.ch.FileDispatcherImpl.read0(FileDescriptor, long, int)
sun.nio.ch.SocketDispatcher.read(FileDescriptor, long, int)
sun.nio.ch.IOUtil.readIntoNativeBuffer(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.IOUtil.read(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.SocketChannelImpl.read(ByteBuffer)
sun.nio.ch.SocketAdaptor$SocketInputStream.read(ByteBuffer)
sun.nio.ch.ChannelInputStream.read(byte[], int, int)
org.xerial.snappy.SnappyInputStream.hasNextChunk()
org.xerial.snappy.SnappyInputStream.read()
java.io.DataInputStream.readInt()
org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion()
org.apache.cassandra.net.IncomingTcpConnection.run()



Thread-3 <--- Frozen for at least 1m 21 sec
sun.nio.ch.FileDispatcherImpl.read0(FileDescriptor, long, int)
sun.nio.ch.SocketDispatcher.read(FileDescriptor, long, int)
sun.nio.ch.IOUtil.readIntoNativeBuffer(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.IOUtil.read(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.SocketChannelImpl.read(ByteBuffer)
sun.nio.ch.SocketAdaptor$SocketInputStream.read(ByteBuffer)
sun.nio.ch.ChannelInputStream.read(byte[], int, int)
org.xerial.snappy.SnappyInputStream.hasNextChunk()
org.xerial.snappy.SnappyInputStream.read()
java.io.DataInputStream.readInt()
org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion()
org.apache.cassandra.net.IncomingTcpConnection.run()



Thread-7 <--- Frozen for at least 1m 21 sec
sun.nio.ch.FileDispatcherImpl.read0(FileDescriptor, long, int)
sun.nio.ch.SocketDispatcher.read(FileDescriptor, long, int)
sun.nio.ch.IOUtil.readIntoNativeBuffer(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.IOUtil.read(FileDescriptor, ByteBuffer, long, NativeDispatcher)
sun.nio.ch.SocketChannelImpl.read(ByteBuffer)
sun.nio.ch.SocketAdaptor$SocketInputStream.read(ByteBuffer)
sun.nio.ch.ChannelInputStream.read(byte[], int, int)
org.xerial.snappy.SnappyInputStream.hasNextChunk()
org.xerial.snappy.SnappyInputStream.read()
java.io.DataInputStream.readInt()
org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion()
org.

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-16 Thread Yuki Morishita (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971595#comment-13971595
 ] 

Yuki Morishita commented on CASSANDRA-5220:
---

Thanks, Ryan.
The increased time in Incoming/OutboundTcpConnection indicates that repair is 
spending more time in messaging.
It is understandable that messaging takes more than 200x as long when repairing 
256x as many ranges.

One possible solution is to repair multiple ranges at once.
I have two ideas in mind:

# Build a two-level MerkleTree over multiple ranges. The lower level holds the 
regular, per-range MTs, and the upper level is an MT whose leaves are the root 
hashes of the lower MTs. That way we can carry multiple MTs in one message round 
trip.
# Send the validation request once for all ranges; the replica node builds the MT 
for each range one by one and sends each MT back as it is built.
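Idea 1 can be sketched as follows. The sketch assumes SHA-256 leaf hashes and plain binary trees. `TwoLevelMerkle` and its methods are hypothetical names used only for illustration; Cassandra's real `MerkleTree` class works differently. The lower level is one tree per range, and the upper level is a tree whose leaves are those per-range roots.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of idea 1; Cassandra's actual MerkleTree is structured differently.
final class TwoLevelMerkle {
    private TwoLevelMerkle() {}

    /** SHA-256 over the concatenation of the given byte arrays. */
    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) {
                md.update(p);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", e);
        }
    }

    /** Root of a binary Merkle tree over leaf hashes; an unpaired node is promoted unchanged. */
    static byte[] root(List<byte[]> leaves) {
        if (leaves.isEmpty()) {
            return sha256(); // hash of nothing, for an empty range
        }
        List<byte[]> level = leaves;
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                next.add(i + 1 < level.size()
                        ? sha256(level.get(i), level.get(i + 1))
                        : level.get(i));
            }
            level = next;
        }
        return level.get(0);
    }

    /** Lower level: one root per range. */
    static List<byte[]> lowerRoots(List<List<byte[]>> perRangeLeaves) {
        List<byte[]> roots = new ArrayList<>();
        for (List<byte[]> leaves : perRangeLeaves) {
            roots.add(root(leaves));
        }
        return roots;
    }

    /** Upper level: a tree whose leaves are the lower-level roots. */
    static byte[] upperRoot(List<List<byte[]>> perRangeLeaves) {
        return root(lowerRoots(perRangeLeaves));
    }

    /** Indices of ranges whose lower roots differ (both sides list ranges in the same order). */
    static List<Integer> mismatchedRanges(List<List<byte[]>> local, List<List<byte[]>> remote) {
        List<byte[]> a = lowerRoots(local);
        List<byte[]> b = lowerRoots(remote);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            if (!MessageDigest.isEqual(a.get(i), b.get(i))) {
                out.add(i);
            }
        }
        return out;
    }
}
```

When the two upper roots match, every range in the batch is consistent after a single exchange. Otherwise, only the lower roots need to be compared to find which ranges differ.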



> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1 beta2
>
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-16 Thread Lyuben Todorov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971505#comment-13971505
 ] 

Lyuben Todorov commented on CASSANDRA-5220:
---

I'll have a shot at adding some logging to the repair process to see if we can 
get a better idea of how much time is being spent in the different repair 
stages.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1 beta2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-16 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971465#comment-13971465
 ] 

Jonathan Ellis commented on CASSANDRA-5220:
---

I just talked to some people who were seeing an 8-node (256 vnodes each) repair 
with about 1GB/node take *two days*.

I would suggest doing some more digging to see where all the overhead is coming 
from before guessing at solutions.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1 beta2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-15 Thread Richard Low (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969613#comment-13969613
 ] 

Richard Low commented on CASSANDRA-5220:


It's going to be a lot slower when there's little data, because there is 
num_tokens times as much work to do. But when there is lots of data, the times 
should be pretty much independent of num_tokens, because most of repair is spent 
reading data and hashing. I ran some tests when we were developing vnodes 
(sorry, I no longer have the data) and this was the case. Something might have 
regressed, though.
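The argument above can be written as a rough cost model. This is an illustrative assumption, not a measured formula. Here $n$ is `num_tokens`, $c$ is a hypothetical fixed per-range overhead (session setup, messaging, tree exchange), $D$ is the data per node, and $r$ is the read-and-hash throughput:

```latex
T(n) \approx n\,c + \frac{D}{r},
\qquad
\frac{T(256)}{T(1)} = \frac{256\,c + D/r}{c + D/r}
\;\longrightarrow\;
\begin{cases}
256, & D/r \ll c,\\
1, & D/r \gg 256\,c.
\end{cases}
```

Under this model, a test with very little data mostly measures the per-range overhead term.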

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-03-27 Thread Brandon Williams (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950268#comment-13950268
 ] 

Brandon Williams commented on CASSANDRA-5220:
-

After talking with Ryan, I'm convinced that I just didn't have an accurate 
measure of actual repair time when I filed this, and the problem is even worse 
than I initially thought :(

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-03-27 Thread Brandon Williams (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950239#comment-13950239
 ] 

Brandon Williams commented on CASSANDRA-5220:
-

bq. without vnodes: Repair time: 5.10s

That honestly sounds too fast to be believable to me; when I was tracking dtest 
times, repair was always one of the longest tests, if not the longest.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-03-27 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950222#comment-13950222
 ] 

Jonathan Ellis commented on CASSANDRA-5220:
---

Worth bisecting?

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-03-27 Thread Ryan McGuire (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949909#comment-13949909
 ] 

Ryan McGuire commented on CASSANDRA-5220:
-

As of today, on cassandra-2.0 HEAD:

repair_test.TestRepair.simple_repair_test:

bq. without vnodes: Repair time: 5.10s
bq. with vnodes: Repair time: 562.97s

That is more than 100x slower than without vnodes (562.97s / 5.10s ≈ 110x), so 
I'm not sure what happened here since @driftx ran this in November.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-03-06 Thread Robert Coli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923229#comment-13923229
 ] 

Robert Coli commented on CASSANDRA-5220:


{quote}So we're 3-3.5x slower in the simple case.{quote}
So, if:

1) the default for gc_grace_seconds is how frequently we want people to repair,
2) and vnodes make repair 3-3.5x slower in the simple case,
3) and vnodes are enabled by default,
4) then why has the default for gc_grace_seconds not been increased by 3-3.5x? 
(CASSANDRA-5850)
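For scale, assume the stock default `gc_grace_seconds` of 864000 (10 days). The increase asked about in point 4 then works out to:

```latex
864000\,\text{s} \times 3 = 2592000\,\text{s} = 30\ \text{days},
\qquad
864000\,\text{s} \times 3.5 = 3024000\,\text{s} = 35\ \text{days}.
```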

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2013-12-23 Thread Donald Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855774#comment-13855774
 ] 

Donald Smith commented on CASSANDRA-5220:
-

We ran "nodetool repair" on a 3-node Cassandra cluster with production-quality 
hardware, using version 2.0.3. Each node had about 1TB of data. This cluster is 
still in testing. After 5 days, the repair job still hasn't finished; I can see 
it's still running.

Here's the process:
{noformat}
root 30835 30774  0 Dec17 pts/0    00:03:53 /usr/bin/java -cp 
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.3.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/thrift-server-0.3.2.jar
 -Xmx32m -Dlog4j.configuration=log4j-tools.properties 
-Dstorage-config=/etc/cassandra/conf org.apache.cassandra.tools.NodeCmd -p 7199 
repair -pr as_reports
{noformat}

The log output has just:
{noformat}
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[2013-12-17 23:26:48,144] Starting repair command #1, repairing 256 ranges for 
keyspace as_reports
{noformat}

Here's the output of "nodetool tpstats":
{noformat}
cass3 /tmp> nodetool tpstats
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         0       38083403         0                 0
RequestResponseStage              0         0     1951200451         0                 0
MutationStage                     0         0     2853354069         0                 0
ReadRepairStage                   0         0        3794926         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        4880147         0                 0
AntiEntropyStage                  1         3              9         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0            115         0                 0
MemtablePostFlusher               0         0          75121         0                 0
FlushWriter                       0         0          49934         0                52
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              7         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               1         1              1         0                 0
InternalResponseStage             0         0              9         0                 0
HintedHandoff                     0         0           1141         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                       884
MUTATION               1407711
_TRACE                       0
REQUEST_RESPONSE             0
{noformat}
The cluster has some write traffic to it. We decided to test it under load.
This is the busiest column family, as reported by "nodetool cfstats":
{n

[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2013-02-04 Thread Yuki Morishita (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570751#comment-13570751
 ] 

Yuki Morishita commented on CASSANDRA-5220:
---

The reason repair is done almost sequentially is to synchronize Merkle tree 
creation across the nodes (CASSANDRA-2816). If we could form groups of nodes 
that do not overlap across several ranges, we would be able to parallelize Merkle 
tree creation and validation.
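One greedy way to form such groups is sketched below, under stated assumptions: each range is described only by its replica set, and node names are plain strings. `RepairBatching.nonOverlappingBatches` is a hypothetical helper, not part of Cassandra. Ranges in the same batch share no replica, so their Merkle trees could be built in parallel without any node working on two ranges of that batch at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a greedy grouping, not Cassandra's actual repair scheduler.
final class RepairBatching {
    private RepairBatching() {}

    /**
     * Greedy first-fit grouping. Each range, described by its replica set, joins the
     * first batch that shares no node with it; otherwise it starts a new batch.
     * Returns batches of range indices, in input order.
     */
    static List<List<Integer>> nonOverlappingBatches(List<Set<String>> replicasPerRange) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Set<String>> nodesInBatch = new ArrayList<>(); // nodes already used by each batch
        for (int i = 0; i < replicasPerRange.size(); i++) {
            Set<String> replicas = replicasPerRange.get(i);
            int target = -1;
            for (int b = 0; b < batches.size(); b++) {
                if (Collections.disjoint(nodesInBatch.get(b), replicas)) {
                    target = b;
                    break;
                }
            }
            if (target < 0) { // no compatible batch: open a new one
                batches.add(new ArrayList<>());
                nodesInBatch.add(new HashSet<>());
                target = batches.size() - 1;
            }
            batches.get(target).add(i);
            nodesInBatch.get(target).addAll(replicas);
        }
        return batches;
    }
}
```

As an example, take four nodes A-D with RF=2 and ranges replicated on {A,B}, {B,C}, {C,D} and {D,A}. These fall into two batches, [0, 2] and [1, 3]. First-fit is simple but not optimal in general. Finding the fewest batches is a graph-coloring problem over ranges that share a replica.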

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 1.2.2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
