[jira] [Updated] (CASSANDRA-12871) Debian package no longer exports EXTRA_CLASSPATH

2016-11-29 Thread Rob Emery (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Emery updated CASSANDRA-12871:
--
Since Version: 2.1.0
  Component/s: Tools

> Debian package no longer exports EXTRA_CLASSPATH
> 
>
> Key: CASSANDRA-12871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12871
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging, Tools
> Environment: Debian Jessie/Ubuntu 12.04+; Upgrading 2.0.17 to 2.1.16
>Reporter: Rob Emery
>
> We use mx4j to monitor Cassandra. In 2.0.17 we uncomment EXTRA_CLASSPATH 
> in /etc/default/cassandra, Cassandra loads the mx4j jar, and it provides the 
> web interface on port 8081.
> In 2.1.16 the export of EXTRA_CLASSPATH has been removed 
> (https://github.com/apache/cassandra/commit/2ba394676dc673b6d66a07247dccd122b64b0578),
>  meaning that the mx4j jar is never discovered.
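
For context, a minimal, illustrative sketch of how an mx4j HTTP adaptor is started when the jar is discoverable on the classpath. This is not Cassandra's own loader (that lives in org.apache.cassandra.utils.Mx4jTool); the class name below is a hypothetical stand-in, and port 8081 matches the report. If EXTRA_CLASSPATH is not exported, the Class.forName lookup fails and no web interface is started.

{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class Mx4jProbe
{
    // Returns true only when the mx4j jar is on the classpath.
    public static boolean maybeLoad()
    {
        try
        {
            // Fails with ClassNotFoundException when EXTRA_CLASSPATH is not exported.
            Class<?> adaptorClass = Class.forName("mx4j.tools.adaptor.http.HttpAdaptor");
            Object adaptor = adaptorClass.newInstance();
            adaptorClass.getMethod("setPort", int.class).invoke(adaptor, 8081);
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            mbs.registerMBean(adaptor, new ObjectName("Server:name=HttpAdaptor"));
            adaptorClass.getMethod("start").invoke(adaptor);
            return true;
        }
        catch (ClassNotFoundException e)
        {
            return false; // mx4j jar never discovered
        }
        catch (Exception e)
        {
            return false;
        }
    }
}
{code}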



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12956) CL is not replayed on custom 2i exception

2016-11-29 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705324#comment-15705324
 ] 

Alex Petrov commented on CASSANDRA-12956:
-

The problem is only present in Cassandra 3.0 and later. Versions before 
that will replay the commit log despite the exception, possibly generating 
multiple identical sstables.

> CL is not replayed on custom 2i exception
> -
>
> Key: CASSANDRA-12956
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12956
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Priority: Critical
>
> If the custom (non-cf) 2i throws an exception during node shutdown / drain, 
> the CommitLog gets correctly preserved (segments won't get 
> discarded because segment tracking is correct). 
> However, when it gets replayed on node startup, we make a decision 
> whether or not to replay the commit log. CL segments start getting replayed, 
> since there are non-discarded segments, and during this process we check 
> whether every [individual 
> mutation|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L215]
>  in the commit log is already committed or not. Information about the sstables is 
> taken from [live sstables on 
> disk|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L250-L256].
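
As an editorial illustration of the replay decision described above, a self-contained sketch with hypothetical names (not the actual CommitLogReplayer code): a mutation should only be skipped when live sstables already cover its commit log position, so if the flush never happened because of the 2i exception, skipping replay loses data.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class ReplayFilterSketch
{
    // Highest commit log position already covered by live sstables, per table id.
    private final Map<UUID, Long> persistedThrough = new HashMap<>();

    public void recordFlushed(UUID tableId, long position)
    {
        persistedThrough.merge(tableId, position, Math::max);
    }

    public boolean shouldReplay(UUID tableId, long mutationPosition)
    {
        Long covered = persistedThrough.get(tableId);
        // If the failing 2i prevented a flush, nothing is recorded as covered,
        // so the mutation must be replayed rather than silently skipped.
        return covered == null || mutationPosition > covered;
    }
}
{code}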



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9143) Improving consistency of repairAt field across replicas

2016-11-29 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9143:
---
Status: Awaiting Feedback  (was: Open)

> Improving consistency of repairAt field across replicas 
> 
>
> Key: CASSANDRA-9143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9143
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Blake Eggleston
>
> We currently send an anticompaction request to all replicas. During this, a 
> node will split sstables and mark the appropriate ones repaired. 
> The problem is that this could fail on some replicas for many reasons, 
> leading to problems in the next repair. 
> This is what I am suggesting to improve it: 
> 1) Send the anticompaction request to all replicas. This can be done at the 
> session level. 
> 2) During anticompaction, sstables are split but not marked repaired. 
> 3) When we get a positive ack from all replicas, the coordinator will send 
> another message called markRepaired. 
> 4) On getting this message, replicas will mark the appropriate sstables as 
> repaired. 
> This will reduce the window of failure. We can also think of "hinting" the 
> markRepaired message if required. 
> Also the sstables which are streamed can be marked as repaired as is done 
> now. 
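
A minimal, self-contained sketch of the two-phase flow proposed above (all names are illustrative, not Cassandra internals): sstables are only marked repaired after every replica has acknowledged anticompaction.

{code}
import java.util.Collection;

public class MarkRepairedCoordinatorSketch
{
    public interface Replica
    {
        boolean anticompact(long sessionId);   // split sstables, do NOT mark repaired yet
        void markRepaired(long sessionId);     // flip repairedAt on the split sstables
    }

    // Returns true only if every replica acked anticompaction and was told to mark repaired.
    public static boolean finishRepair(long sessionId, Collection<Replica> replicas)
    {
        for (Replica replica : replicas)
            if (!replica.anticompact(sessionId))
                return false;                  // failure before any sstable is marked repaired

        for (Replica replica : replicas)
            replica.markRepaired(sessionId);
        return true;
    }
}
{code}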



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9143) Improving consistency of repairAt field across replicas

2016-11-29 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9143:
---
Status: Open  (was: Patch Available)

> Improving consistency of repairAt field across replicas 
> 
>
> Key: CASSANDRA-9143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9143
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Blake Eggleston
>
> We currently send an anticompaction request to all replicas. During this, a 
> node will split sstables and mark the appropriate ones repaired. 
> The problem is that this could fail on some replicas for many reasons, 
> leading to problems in the next repair. 
> This is what I am suggesting to improve it: 
> 1) Send the anticompaction request to all replicas. This can be done at the 
> session level. 
> 2) During anticompaction, sstables are split but not marked repaired. 
> 3) When we get a positive ack from all replicas, the coordinator will send 
> another message called markRepaired. 
> 4) On getting this message, replicas will mark the appropriate sstables as 
> repaired. 
> This will reduce the window of failure. We can also think of "hinting" the 
> markRepaired message if required. 
> Also the sstables which are streamed can be marked as repaired as is done 
> now. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas

2016-11-29 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705274#comment-15705274
 ] 

Marcus Eriksson commented on CASSANDRA-9143:


Looks good in general - comments:

* Rename the cleanup compaction task, very confusing wrt the current cleanup 
compactions
* Should we prioritize the pending-repair-cleanup compactions?
** If we don't, we might compare different datasets - a repair fails half way 
through, one node happens to move the pending data to unrepaired, the operator 
retriggers repair, and we end up comparing different datasets. If we instead 
move the data back as quickly as possible we minimize this window
** It would also help the next normal compactions as we might be able to 
include more sstables in the repaired/unrepaired strategies
* Is there any point in doing anticompaction after repair with -full repairs? 
Can we always do consistent repairs? We would need to anticompact already 
repaired sstables into pending, but that should not be a big problem?
* In CompactionManager#getSSTablesToValidate we still mark all unrepaired 
sstables as repairing - we don't need to do that for consistent repairs. And if 
we can do consistent repair for -full as well, all that code can be removed
* In handleStatusRequest - if we don't have the local session, we should 
probably return that the session is failed?
* Fixed some minor nits here: 
https://github.com/krummas/cassandra/commit/24ef8b2f6df98431d66519ee12452df3db84fd7d


> Improving consistency of repairAt field across replicas 
> 
>
> Key: CASSANDRA-9143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9143
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Blake Eggleston
>
> We currently send an anticompaction request to all replicas. During this, a 
> node will split sstables and mark the appropriate ones repaired. 
> The problem is that this could fail on some replicas for many reasons, 
> leading to problems in the next repair. 
> This is what I am suggesting to improve it: 
> 1) Send the anticompaction request to all replicas. This can be done at the 
> session level. 
> 2) During anticompaction, sstables are split but not marked repaired. 
> 3) When we get a positive ack from all replicas, the coordinator will send 
> another message called markRepaired. 
> 4) On getting this message, replicas will mark the appropriate sstables as 
> repaired. 
> This will reduce the window of failure. We can also think of "hinting" the 
> markRepaired message if required. 
> Also the sstables which are streamed can be marked as repaired as is done 
> now. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-29 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705225#comment-15705225
 ] 

Sam Tunnicliffe commented on CASSANDRA-12796:
-

The CI looks reasonable: 3 dtest failures on the 3.0 branch, which all have 
corresponding failures upstream, plus a couple of failures on the original 2.2 
branch which have since been addressed by other tickets. 

The internal paging will aim to read rows in chunks of ~4MB, and I've used the 
default CQL page size of 10k rows as a floor, as that seems as good a place 
to start as any. 
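
A rough, editorial sketch of one literal reading of that heuristic (the ~4MB target and 10k-row floor come from the comment above; the method name and the mean-row-size input are assumptions, not the committed code):

{code}
public final class PagingHeuristicSketch
{
    private static final long TARGET_BYTES_PER_PAGE = 4L * 1024 * 1024; // ~4MB chunks
    private static final int FLOOR_ROWS = 10_000;                       // default CQL page size

    // Rows per internal page: enough rows to reach the byte target, never below the floor.
    public static int pageSizeInRows(long meanRowSizeBytes)
    {
        if (meanRowSizeBytes <= 0)
            return FLOOR_ROWS;
        long rows = TARGET_BYTES_PER_PAGE / meanRowSizeBytes;
        return (int) Math.max(FLOOR_ROWS, Math.min(rows, Integer.MAX_VALUE));
    }
}
{code}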

[~mmajercik], [~anmols] how does this latest 3.0 version look with your 
testing? 

> Heap exhaustion when rebuilding secondary index over a table with wide 
> partitions
> -
>
> Key: CASSANDRA-12796
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12796
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Milan Majercik
>Priority: Critical
>
> We have a table with a rather wide partition and a secondary index defined 
> over it. As soon as we try to rebuild the index we observe exhaustion of the 
> Java heap and an eventual OOM error. After a lengthy investigation we managed 
> to find the culprit, which appears to be the wrong granularity of barrier 
> issuance in method {{org.apache.cassandra.db.Keyspace.indexRow}}:
> {code}
> try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
> {
>     Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);
>     Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
>     while (pager.hasNext())
>     {
>         ColumnFamily cf = pager.next();
>         ColumnFamily cf2 = cf.cloneMeShallow();
>         for (Cell cell : cf)
>         {
>             if (cfs.indexManager.indexes(cell.name(), indexes))
>                 cf2.addColumn(cell);
>         }
>         cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
>     }
> }
> {code}
> Please note that the operation group granule is a whole partition of the 
> source table, which poses a problem for wide-partition tables: the flush 
> runnable ({{org.apache.cassandra.db.ColumnFamilyStore.Flush.run()}}) won't 
> proceed with flushing the secondary index memtable until all operations 
> started before the most recent barrier issue have completed. In our situation 
> the flush runnable waits until the whole wide partition has been indexed into 
> the secondary index memtable before flushing it. This causes exhaustion of 
> the heap and an eventual OOM error.
> After we changed the granule of barrier issue in method 
> {{org.apache.cassandra.db.Keyspace.indexRow}} to a query page as opposed to 
> a table partition (see 
> [https://github.com/mmajercik/cassandra/commit/7e10e5aa97f1de483c2a5faf867315ecbf65f3d6?diff=unified]),
> the secondary index rebuild started to work without heap exhaustion. 
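
A sketch of the fix as described above (derived from the description, not the linked commit verbatim; it reuses {{cfs}}, {{key}}, {{idxNames}} and {{DEFAULT_PAGE_SIZE}} from the snippet above): issue a new barrier group per query page instead of one group spanning the whole partition, so the 2i memtable flush is never blocked behind an entire wide partition.

{code}
Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);
Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
while (pager.hasNext())
{
    ColumnFamily cf = pager.next();
    // New OpOrder group per page: the flush runnable only waits for the
    // operations of the current page, not for the whole wide partition.
    try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
    {
        ColumnFamily cf2 = cf.cloneMeShallow();
        for (Cell cell : cf)
        {
            if (cfs.indexManager.indexes(cell.name(), indexes))
                cf2.addColumn(cell);
        }
        cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
    }
}
{code}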



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12900) Resurrect or remove HeapPool (unslabbed_heap_buffers)

2016-11-29 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704955#comment-15704955
 ] 

Branimir Lambov commented on CASSANDRA-12900:
-

Patch looks good.

Could you also remove the {{DataReclaimer}} type? It is never used ({{NO_OP}} 
is its only instance) and misleadingly suggests that we do (or could do) 
something to recover space.



> Resurrect or remove HeapPool (unslabbed_heap_buffers)
> -
>
> Key: CASSANDRA-12900
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12900
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.0.x, 3.x
>
>
> Seems this code has been commented out since CASSANDRA-8099 - we should 
> either remove the option or fix the code



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12969) Index: index can significantly slow down boot

2016-11-29 Thread Corentin Chary (JIRA)
Corentin Chary created CASSANDRA-12969:
--

 Summary: Index: index can significantly slow down boot
 Key: CASSANDRA-12969
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12969
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Corentin Chary
 Attachments: 0004-index-do-not-re-insert-values-in-IndexInfo.patch

During startup, each existing index is opened and marked as built by adding an 
entry in "IndexInfo" and forcing a flush. Because of that we end up flushing 
one sstable per index. On systems with HDDs this can take minutes for nothing.

The following patch avoids creating useless new sstables if the index was 
already marked as built, which greatly reduces startup time (and improves 
availability during restarts).
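
A minimal sketch of the optimization the patch describes (the names below are illustrative, not the attached patch): only write the IndexInfo marker, and thus force a flush, when the index is not already recorded as built.

{code}
public final class IndexBuiltMarkerSketch
{
    public interface BuiltIndexStore
    {
        boolean isBuilt(String keyspace, String indexName);
        void markBuilt(String keyspace, String indexName); // writes IndexInfo and triggers a flush
    }

    // Skip the write (and the flush of one sstable per index) when the marker already exists.
    public static void markBuiltIfNeeded(BuiltIndexStore store, String keyspace, String indexName)
    {
        if (store.isBuilt(keyspace, indexName))
            return;
        store.markBuilt(keyspace, indexName);
    }
}
{code}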




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12910) SASI: calculatePrimary() always returns null

2016-11-29 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-12910:
---
Reproduced In: 3.x
   Status: Patch Available  (was: Open)

Patch attached.

> SASI: calculatePrimary() always returns null
> 
>
> Key: CASSANDRA-12910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12910
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Corentin Chary
>Priority: Minor
> Attachments: 0002-sasi-fix-calculatePrimary.patch
>
>
> While investigating performance issues with SASI 
> (https://github.com/criteo/biggraphite/issues/174 if you want to know more) I 
> ended up finding calculatePrimary() in QueryController.java, which apparently 
> should return the "primary index".
> It lacks documentation, and I'm unsure what the "primary index" should be, 
> but apparently this function never returns one because primaryIndexes.size() 
> is always 0.
> https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java#L237
> I'm unsure whether the proper fix is checking if the collection is empty or 
> reversing the operator (selecting the index with the higher cardinality versus 
> the one with the lower cardinality).
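
For illustration, a self-contained sketch of one of the two fixes the reporter mentions (guard against an empty candidate set and pick the most selective index). The names and the selection criterion are assumptions, not the attached patch:

{code}
import java.util.Collection;
import java.util.function.ToLongFunction;

public final class PrimaryIndexSketch
{
    // Returns the candidate with the smallest estimated result set, or null only
    // when there are no candidates at all (instead of always returning null).
    public static <T> T calculatePrimary(Collection<T> candidates, ToLongFunction<T> estimatedResults)
    {
        T primary = null;
        long best = Long.MAX_VALUE;
        for (T index : candidates)
        {
            long estimate = estimatedResults.applyAsLong(index);
            if (estimate < best)
            {
                best = estimate;
                primary = index;
            }
        }
        return primary;
    }
}
{code}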



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12910) SASI: calculatePrimary() always returns null

2016-11-29 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-12910:
---
Attachment: 0002-sasi-fix-calculatePrimary.patch

> SASI: calculatePrimary() always returns null
> 
>
> Key: CASSANDRA-12910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12910
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Corentin Chary
>Priority: Minor
> Attachments: 0002-sasi-fix-calculatePrimary.patch
>
>
> While investigating performance issues with SASI 
> (https://github.com/criteo/biggraphite/issues/174 if you want to know more) I 
> ended up finding calculatePrimary() in QueryController.java, which apparently 
> should return the "primary index".
> It lacks documentation, and I'm unsure what the "primary index" should be, 
> but apparently this function never returns one because primaryIndexes.size() 
> is always 0.
> https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java#L237
> I'm unsure whether the proper fix is checking if the collection is empty or 
> reversing the operator (selecting the index with the higher cardinality versus 
> the one with the lower cardinality).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12910) SASI: calculatePrimary() always returns null

2016-11-29 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-12910:
---
Priority: Minor  (was: Major)

> SASI: calculatePrimary() always returns null
> 
>
> Key: CASSANDRA-12910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12910
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Corentin Chary
>Priority: Minor
>
> While investigating performance issues with SASI 
> (https://github.com/criteo/biggraphite/issues/174 if you want to know more) I 
> ended up finding calculatePrimary() in QueryController.java, which apparently 
> should return the "primary index".
> It lacks documentation, and I'm unsure what the "primary index" should be, 
> but apparently this function never returns one because primaryIndexes.size() 
> is always 0.
> https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java#L237
> I'm unsure whether the proper fix is checking if the collection is empty or 
> reversing the operator (selecting the index with the higher cardinality versus 
> the one with the lower cardinality).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12905) Retry acquire MV lock on failure instead of throwing WTE on streaming

2016-11-29 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704715#comment-15704715
 ] 

Benjamin Roth edited comment on CASSANDRA-12905 at 11/29/16 8:58 AM:
-

Problem 3.
All MV updates that happen during bootstrap are sent to the batchlog (see 
StorageProxy.mutateMV). This puts so much pressure on the BatchlogManager and 
causes zillions of compactions of system.batches during bootstraps. The fact 
that the batchlog implementation is an antipattern (don't use Cassandra as a 
queue) does not improve the situation: the batchlog has to deal with more and 
more tombstones the larger the log gets. I have observed batchlogs of 60 GB.

Not sending tables with MVs through the regular write path on bootstrap would 
solve problems (1.) and (3.). (2.) still persists but can easily be handled by 
disabling the timeout for mutations from hints.

I would even go so far as to say that sending streams through the regular write 
path for MVs is never a good idea. This would also alleviate other problems 
like incremental repairs for MVs (CASSANDRA-12888). But that is maybe a 
different story - IMHO still worth a discussion.


was (Author: brstgt):
Problem 3.
All MV updates that happen during bootstrap are sent to the batchlog (see 
StorageProxy.mutateMV). This puts so much pressure on the BatchlogManager and 
causes zillions of compactions of system.batches during bootstraps. The fact 
that the batchlog implementation is an antipattern (don't use Cassandra as a 
queue) does not improve the situation: the batchlog has to deal with more and 
more tombstones the larger the log gets. I have observed batchlogs of 60 GB.

Not sending tables with MVs through the regular write path on bootstrap would 
solve problems (1.) and (3.). (2.) still persists but can easily be handled by 
disabling the timeout for mutations from hints.

> Retry acquire MV lock on failure instead of throwing WTE on streaming
> -
>
> Key: CASSANDRA-12905
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12905
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: centos 6.7 x86_64
>Reporter: Nir Zilka
>Priority: Critical
> Fix For: 3.9
>
>
> Hello,
> I performed two upgrades on the current cluster (currently 15 nodes, 1 DC, 
> private VLAN):
> first it was 2.2.5.1 and repair worked flawlessly;
> the second upgrade was to 3.0.9 (with upgradesstables) and repair also worked 
> well;
> then I upgraded 2 weeks ago to 3.9 - and the repair problems started.
> There are several error types in the system.log (from different nodes):
> - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
> - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation 
> timed out - received only 0 responses
> - Remote peer xxx.xxx.xxx.xxx failed stream session
> - Session completed with the following error
> org.apache.cassandra.streaming.StreamException: Stream failed
> 
> I use the 3.9 default configuration with cluster settings adjustments (3 
> seeds, GossipingPropertyFileSnitch).
> streaming_socket_timeout_in_ms is the default (8640).
> I'm afraid of consistency problems while I'm not performing repair.
> Any ideas?
> Thanks,
> Nir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12905) Retry acquire MV lock on failure instead of throwing WTE on streaming

2016-11-29 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704715#comment-15704715
 ] 

Benjamin Roth commented on CASSANDRA-12905:
---

Problem 3.
All MV updates that happen during bootstrap are sent to the batchlog (see 
StorageProxy.mutateMV). This puts so much pressure on the BatchlogManager and 
causes zillions of compactions of system.batches during bootstraps. The fact 
that the batchlog implementation is an antipattern (don't use Cassandra as a 
queue) does not improve the situation: the batchlog has to deal with more and 
more tombstones the larger the log gets. I have observed batchlogs of 60 GB.

Not sending tables with MVs through the regular write path on bootstrap would 
solve problems (1.) and (3.). (2.) still persists but can easily be handled by 
disabling the timeout for mutations from hints.
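
An editorial sketch of the decision being argued about (all names are illustrative, not Cassandra's streaming code): today streamed data for MV base tables is re-applied through the normal write path, which is what drags the batchlog in; the proposal is to skip that, at least during bootstrap.

{code}
public final class StreamApplySketch
{
    // Current behaviour (as described above): any base table with views is
    // re-applied through the normal write path, involving the batchlog.
    public static boolean requiresWritePathToday(boolean tableHasViews)
    {
        return tableHasViews;
    }

    // Sketch of the proposal: during bootstrap, add streamed sstables directly
    // and let repair / view rebuild restore consistency, avoiding batchlog pressure.
    public static boolean requiresWritePathProposed(boolean tableHasViews, boolean isBootstrap)
    {
        return tableHasViews && !isBootstrap;
    }
}
{code}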

> Retry acquire MV lock on failure instead of throwing WTE on streaming
> -
>
> Key: CASSANDRA-12905
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12905
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: centos 6.7 x86_64
>Reporter: Nir Zilka
>Priority: Critical
> Fix For: 3.9
>
>
> Hello,
> I performed two upgrades on the current cluster (currently 15 nodes, 1 DC, 
> private VLAN):
> first it was 2.2.5.1 and repair worked flawlessly;
> the second upgrade was to 3.0.9 (with upgradesstables) and repair also worked 
> well;
> then I upgraded 2 weeks ago to 3.9 - and the repair problems started.
> There are several error types in the system.log (from different nodes):
> - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
> - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation 
> timed out - received only 0 responses
> - Remote peer xxx.xxx.xxx.xxx failed stream session
> - Session completed with the following error
> org.apache.cassandra.streaming.StreamException: Stream failed
> 
> I use the 3.9 default configuration with cluster settings adjustments (3 
> seeds, GossipingPropertyFileSnitch).
> streaming_socket_timeout_in_ms is the default (8640).
> I'm afraid of consistency problems while I'm not performing repair.
> Any ideas?
> Thanks,
> Nir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

