[jira] [Created] (CASSANDRA-5384) SSTables are evicted from the page cache during compaction even if populate_io_cache_on_flush is true

2013-03-26 Thread Jouni Hartikainen (JIRA)
Jouni Hartikainen created CASSANDRA-5384:


 Summary: SSTables are evicted from the page cache during compaction even if populate_io_cache_on_flush is true
 Key: CASSANDRA-5384
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5384
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.3
Reporter: Jouni Hartikainen
Priority: Minor


AbstractCompactionStrategy acquires direct scanners on the SSTables to be 
compacted. These scanners are always created with skipIOCache set to true. 
Because of this, compactions evict source SSTables from the page cache after 
128 MB (CACHE_FLUSH_INTERVAL_IN_BYTES in RandomAccessReader) have been read 
from them, even for CFs that have populate_io_cache_on_flush set to true.

This leads to disk reads even in cases where the dataset completely fits into 
memory and unnecessarily limits compaction throughput on nodes that have lots 
of RAM.

Maybe the compaction strategy should avoid skipping the IO cache if the CF has 
populate_io_cache_on_flush set to true?
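
A minimal sketch of that suggestion, not Cassandra's actual code. The class and method names below are hypothetical; only the 128 MB interval and the populate_io_cache_on_flush setting come from the report. It also assumes, as a simplification, that a cache-skipping reader drops pages once per full interval read.

```java
// Hypothetical sketch; these names do not mirror Cassandra's real classes.
final class CompactionScannerPolicy {
    /** Interval after which a cache-skipping reader drops pages (128 MB, per the report). */
    static final long CACHE_FLUSH_INTERVAL_IN_BYTES = 128L * 1024 * 1024;

    /** Skip the page cache only when the CF has not asked to keep its SSTables cached. */
    static boolean shouldSkipIOCache(boolean populateIoCacheOnFlush) {
        return !populateIoCacheOnFlush;
    }

    /** Simplified model: how often cached pages are dropped while scanning bytesToRead bytes. */
    static long evictions(long bytesToRead, boolean skipIOCache) {
        return skipIOCache ? bytesToRead / CACHE_FLUSH_INTERVAL_IN_BYTES : 0;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        // Reported behaviour: scanners always skip the IO cache.
        System.out.println(evictions(oneGiB, true));
        // Suggested behaviour for a CF with populate_io_cache_on_flush = true.
        System.out.println(evictions(oneGiB, shouldSkipIOCache(true)));
    }
}
```

Under this model a 1 GiB source SSTable is dropped from the cache eight times with the current behaviour and not at all once the CF setting is honoured.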

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-5244) Compactions don't work while node is bootstrapping

2013-02-12 Thread Jouni Hartikainen (JIRA)
Jouni Hartikainen created CASSANDRA-5244:


 Summary: Compactions don't work while node is bootstrapping
 Key: CASSANDRA-5244
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5244
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Jouni Hartikainen
Priority: Critical


It seems that there is a race condition in StorageService that prevents 
compactions from completing while a node is bootstrapping.

I have been able to reproduce this multiple times by throttling streaming 
throughput to extend the bootstrap time while simultaneously inserting data to 
the cluster.

The problem lies in the synchronization of the initServer(int delay) and 
reportSeverity(double incr) methods, as both try to acquire the instance 
lock of StorageService through the synchronized keyword. As initServer 
does not return until the bootstrap has completed, all calls to reportSeverity 
block until then. However, reportSeverity is called when starting 
compactions in CompactionInfo, and thus all compactions block until bootstrap 
completes. 

This might severely degrade the node's performance after bootstrap, as it might 
have lots of compactions pending while simultaneously starting to serve reads.

I have been able to solve the issue by adding a separate lock for 
reportSeverity and removing its method-level synchronization. This of course is 
not a valid approach if we must assume that any of Gossiper's 
IEndpointStateChangeSubscribers could potentially end up calling back to 
StorageService's synchronized methods. However, at least at the moment, that 
does not seem to be the case.
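
The locking pattern described above can be sketched roughly as follows. This is a hypothetical stand-in for StorageService, not the attached change: the class name, the severityLock field and the latch-based harness are assumptions made for illustration. Only the method names initServer and reportSeverity come from the report.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for StorageService; only the locking pattern matters here.
final class SeverityLockSketch {
    private final Object severityLock = new Object(); // dedicated lock instead of `this`
    private double severity;

    /** Holds the instance monitor until bootstrapDone is released, like a long bootstrap. */
    synchronized void initServer(CountDownLatch started, CountDownLatch bootstrapDone) {
        started.countDown();
        try {
            bootstrapDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Guarded by severityLock, so it never waits for the instance monitor. */
    void reportSeverity(double incr) {
        synchronized (severityLock) {
            severity += incr;
        }
    }

    /** Returns true if reportSeverity completed while initServer still held the instance lock. */
    static boolean reportsDuringBootstrap() {
        SeverityLockSketch service = new SeverityLockSketch();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch bootstrapDone = new CountDownLatch(1);
        CountDownLatch reported = new CountDownLatch(1);
        Thread bootstrap = new Thread(() -> service.initServer(started, bootstrapDone));
        Thread compaction = new Thread(() -> {
            service.reportSeverity(1.0);
            reported.countDown();
        });
        try {
            bootstrap.start();
            started.await();            // initServer now owns the instance monitor
            compaction.start();
            boolean finished = reported.await(2, TimeUnit.SECONDS);
            bootstrapDone.countDown();  // let the simulated bootstrap finish
            bootstrap.join();
            compaction.join();
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reportsDuringBootstrap());
    }
}
```

If reportSeverity were declared synchronized instead, the simulated compaction thread would stay blocked until the bootstrap latch is released, which is the behaviour reported above.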

Maybe somebody with more experience about the codebase comes up with a better 
solution?

(This might affect DynamicEndpointSnitch as well, as it also calls 
reportSeverity in its setSeverity method.)



[jira] [Commented] (CASSANDRA-4784) Create separate sstables for each token range handled by a node

2013-02-09 Thread Jouni Hartikainen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575153#comment-13575153
 ] 

Jouni Hartikainen commented on CASSANDRA-4784:
--

I'm not really sure if I understood this correctly, but wouldn't this change 
lead to memtable flushes creating much more random I/O than previously? 
Especially when using vnodes, wouldn't the incoming data be spread to num_tokens 
files per CF instead of one per CF? Wouldn't this affect compactions as well? 
E.g. for the default size-tiered strategy, instead of compacting 4 larger 
SSTables into one even larger one per CF, we would be compacting num_tokens * 4 
smaller files into num_tokens larger ones per CF.

Am I missing something here?
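
To put rough numbers on that concern, here is a minimal sketch. It assumes num_tokens = 256 purely as an illustrative value (it is not taken from the ticket) and uses the 4-SSTable threshold mentioned above. The class and method names are hypothetical.

```java
// Hypothetical helper for the arithmetic only; not part of Cassandra.
final class FlushFanOut {
    /** SSTables written by one memtable flush of one CF. */
    static int filesPerFlush(int numTokens, boolean perRangeSSTables) {
        return perRangeSSTables ? numTokens : 1;
    }

    /** SSTables read by one round of size-tiered compaction for one CF. */
    static int compactionInputs(int numTokens, int threshold, boolean perRangeSSTables) {
        return filesPerFlush(numTokens, perRangeSSTables) * threshold;
    }

    public static void main(String[] args) {
        int numTokens = 256; // assumed value, for illustration only
        int threshold = 4;   // SSTables per size-tiered compaction, as in the comment
        System.out.println(filesPerFlush(numTokens, false) + " vs " + filesPerFlush(numTokens, true));
        System.out.println(compactionInputs(numTokens, threshold, false) + " vs "
                + compactionInputs(numTokens, threshold, true));
    }
}
```

With these assumed values a flush would write 256 files instead of 1 per CF, and a compaction round would read 1024 files instead of 4.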

> Create separate sstables for each token range handled by a node
> ---
>
> Key: CASSANDRA-4784
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4784
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: sankalp kohli
>Assignee: Benjamin Coverston
>Priority: Minor
>  Labels: perfomance
> Fix For: 2.0
>
> Attachments: 4784.patch
>
>
> Currently, each sstable has data for all the ranges that node is handling. If 
> we change that and rather have separate sstables for each range that node is 
> handling, it can lead to some improvements.
> Improvements
> 1) Node rebuild will be very fast as sstables can be directly copied over to 
> the bootstrapping node. It will minimize any application level logic. We can 
> directly use Linux native methods to transfer sstables without using CPU and 
> putting less pressure on the serving node. I think in theory it will be the 
> fastest way to transfer data. 
> 2) Backup can transfer only the sstables of a node that belong to its primary 
> key range. 
> 3) An ETL process can copy only one replica of the data and will be much faster. 
> Changes:
> We can split the writes into multiple memtables for each range it is 
> handling. The sstables being flushed from these can have details of which 
> range of data it is handling.
> I think there will be no change for reads, as they work with interleaved 
> data anyway. But maybe we can improve there as well? 
> Complexities:
> The change does not look very complicated. I am not taking into account how 
> it will work when ranges are being changed for nodes. 
> Vnodes might make this work more complicated. We can also have a bit on each 
> sstable which says whether it is primary data or not. 



[jira] [Updated] (CASSANDRA-5097) cassandra-shuffle fails as system keyspace is not user-modifiable

2013-01-02 Thread Jouni Hartikainen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jouni Hartikainen updated CASSANDRA-5097:
-

Attachment: CASSANDRA-5097.patch

> cassandra-shuffle fails as system keyspace is not user-modifiable
> -
>
> Key: CASSANDRA-5097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, Tools
>Affects Versions: 1.2.0 rc2, 1.2.0
>Reporter: Jouni Hartikainen
>Assignee: Aleksey Yeschenko
> Fix For: 1.2.1
>
> Attachments: CASSANDRA-5097.patch
>
>
> cassandra-shuffle tool fails to insert calculated relocations into the system 
> keyspace as it is not user-modifiable. When run, the following exception is 
> thrown after printing out the list of relocations for the first node in ring:
> Exception in thread "main" java.lang.RuntimeException: InvalidRequestException(why:system keyspace is not user-modifiable.)
>     at org.apache.cassandra.tools.Shuffle.executeCqlQuery(Shuffle.java:516)
>     at org.apache.cassandra.tools.Shuffle.shuffle(Shuffle.java:359)
>     at org.apache.cassandra.tools.Shuffle.main(Shuffle.java:678)
> Caused by: InvalidRequestException(why:system keyspace is not user-modifiable.)
>     at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:37849)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1562)
>     at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1547)
>     at org.apache.cassandra.tools.CassandraClient.execute_cql_query(Shuffle.java:733)
>     at org.apache.cassandra.tools.Shuffle.executeCqlQuery(Shuffle.java:502)
>     ... 2 more
> By quickly checking the code it seems that the patch set for CASSANDRA-4874 
> disallows modifications to system keyspace again (they were previously 
> allowed by CASSANDRA-4664) thus rendering cassandra-shuffle unable to do its 
> job.



[jira] [Created] (CASSANDRA-5097) cassandra-shuffle fails as system keyspace is not user-modifiable

2013-01-02 Thread Jouni Hartikainen (JIRA)
Jouni Hartikainen created CASSANDRA-5097:


 Summary: cassandra-shuffle fails as system keyspace is not user-modifiable
 Key: CASSANDRA-5097
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5097
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.0 rc2
Reporter: Jouni Hartikainen


cassandra-shuffle tool fails to insert calculated relocations into the system 
keyspace as it is not user-modifiable. When run, the following exception is 
thrown after printing out the list of relocations for the first node in ring:

Exception in thread "main" java.lang.RuntimeException: InvalidRequestException(why:system keyspace is not user-modifiable.)
    at org.apache.cassandra.tools.Shuffle.executeCqlQuery(Shuffle.java:516)
    at org.apache.cassandra.tools.Shuffle.shuffle(Shuffle.java:359)
    at org.apache.cassandra.tools.Shuffle.main(Shuffle.java:678)
Caused by: InvalidRequestException(why:system keyspace is not user-modifiable.)
    at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:37849)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1562)
    at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1547)
    at org.apache.cassandra.tools.CassandraClient.execute_cql_query(Shuffle.java:733)
    at org.apache.cassandra.tools.Shuffle.executeCqlQuery(Shuffle.java:502)
    ... 2 more

By quickly checking the code it seems that the patch set for CASSANDRA-4874 
disallows modifications to system keyspace again (they were previously allowed 
by CASSANDRA-4664) thus rendering cassandra-shuffle unable to do its job.



[jira] [Updated] (CASSANDRA-5097) cassandra-shuffle fails as system keyspace is not user-modifiable

2013-01-02 Thread Jouni Hartikainen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jouni Hartikainen updated CASSANDRA-5097:
-

Affects Version/s: 1.2.0
