[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-10-31 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192869#comment-14192869
 ] 

Jonathan Ellis commented on CASSANDRA-5863:
---

bq. The tricky part is tracking the "hotness" of these chunks. It needs to 
track the number of times the chunk was decompressed in the last X seconds. 

Backing up a bit -- why not just take an LRU approach?  When we uncompress a 
chunk, we cache it.  Add metrics so users can monitor cache churn and disable 
it if it's not useful.  (But since our chunks are fairly large, and thus 
decompressing is relatively expensive, I think we could tolerate relatively 
high churn.)
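A minimal sketch of what that could look like (illustrative names only, not the actual Cassandra classes): an access-ordered LinkedHashMap bounds the number of cached uncompressed chunks and evicts the least recently used one on overflow.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache of uncompressed chunks, keyed here by chunk offset.
// Hypothetical sketch only; a real implementation would also expose
// hit/miss metrics so users can monitor churn and disable the cache.
public class ChunkLruCache extends LinkedHashMap<Long, ByteBuffer> {
    private final int maxChunks;

    public ChunkLruCache(int maxChunks) {
        super(16, 0.75f, true);      // accessOrder = true gives LRU iteration order
        this.maxChunks = maxChunks;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, ByteBuffer> eldest) {
        return size() > maxChunks;   // evict the least recently used chunk
    }
}
```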

> In process (uncompressed) page cache
> 
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: T Jake Luciani
>  Labels: performance
> Fix For: 3.0
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[] and verifies a 
> checksum.  
> This process is where the majority of time is spent in a read request.  
> Before compression, we would have zero-copy of data and could respond 
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this 
> process for hot data, possibly off heap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-11-02 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193874#comment-14193874
 ] 

T Jake Luciani commented on CASSANDRA-5863:
---

bq. why not just take an LRU approach?

This is mostly what my initial attempt does, but the performance impact of 
always putting data into the cache is high (since it uses an off-heap memcpy).

I can resurrect this code and show how it looks.  Perhaps the new cache impl 
Vijay is working on will improve this.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-11-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195362#comment-14195362
 ] 

Jonathan Ellis commented on CASSANDRA-5863:
---

If "copy a chunk" cost is so high, does that mean we should be using smaller 
compression chunks?



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-15 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969831#comment-13969831
 ] 

T Jake Luciani commented on CASSANDRA-5863:
---

I do think having a set of fast disks for hot data that doesn't fit into memory 
is key, because in a large per-node deployment you want:

1.  Memory (really hot data)
2.  SSD (hot data that doesn't fit in memory)
3.  Spinning disk (historic cold data) 

[~benedict] you are describing building a custom page cache impl off heap, which 
is pretty ambitious.  Don't you think a baby step would be to rely on the OS 
page cache to start and build a custom one as phase II?

What would the page size be for uncompressed data?  For compressed data, the 
chunk size (conceptually) fits nicely. 

> In process (uncompressed) page cache
> 
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: T Jake Luciani
>Assignee: Pavel Yaskevich
>  Labels: performance
> Fix For: 2.1 beta2
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[] and verifies a 
> checksum.  
> This process is where the majority of time is spent in a read request.  
> Before compression, we would have zero-copy of data and could respond 
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this 
> process for hot data. Initially this could be a off heap cache but it would 
> be great to put these decompressed chunks onto a SSD so the hot data lives on 
> a fast disk similar to https://github.com/facebook/flashcache.





[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969927#comment-13969927
 ] 

Benedict commented on CASSANDRA-5863:
-

I think there are at least three issues we're contending with here, and each 
needs its own ticket (eventually). Putting historic data on slow drives is, I 
think, a different problem to putting a cache on some fast disks. Both will be 
helpful. Ideally I think we want the following tiers:

# Uncompressed Memory Cache
# Compressed Memory Cache (disjoint set from 1)
# Compressed SSD cache
# Regular Data
# Archived/Cold/Historic Data

The main distinction is the added "regular data" tier: any special "fast 
disk" cache should not store the full sstable hierarchy and its related files; 
it should just store the most popular blocks (or portions of blocks).

bq. Benedict you are describing building a custom page cache impl off heap 
which is pretty ambitious. Don't you think a baby step would be to rely on the 
OS page cache to start and build a custom one as a phase II?

People get very worried when they think they're competing with the kernel 
developers. Often for good reason, but since we don't have to be all things to 
all people we get the opportunity to make economies that aren't always as 
easily available to them. But also we only need to get roughly the same 
performance so we can build on this to make inroads elsewhere. What we're 
talking about here is pretty straightforward - it's one of the less 
challenging problems. A compressed page cache is more challenging, since we 
don't have a uniform size, but it is still probably not too difficult. Take a 
look at my suggestion for a key cache in CASSANDRA-6709 for a detailed 
description of how I would build the offheap structure.

The basic approach I would probably take is this: deal with 4Kb blocks. Any 
blocks we read from disk larger than this we split up into 4Kb chunks and 
insert each into the cache separately*. The cache itself is 8- or 16-way 
associative, with 3 components: a long storing the LRU information for the 
bucket, 16 longs storing identity information for the lookup within the bucket, 
and corresponding positions in a large address space storing each of the 4Kb 
data chunks. Readers always hit the cache, and if they miss they populate the 
cache using the appropriate reader before continuing. Regrettably we don't have 
access to SIMD instructions or we could do a lot of this stuff tremendously 
efficiently, but even without that it should be pretty nippy.

*This allows us to have a greater granularity for eviction and keeps cpu-cache 
traffic when reading from the cache to a minimum. It's also a pretty optimal 
size for reading/writing to SSD if we overflow to disk, and is a sufficiently 
large amount to get good compression for an in-memory compressed cache, whilst 
still being small enough to stream&decompress from main-memory without a major 
penalty to lookup a small part of it.
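For concreteness, here is a toy sketch of the bucket bookkeeping described above (plain on-heap Java with hypothetical names, not a proposal for the actual off-heap layout): each bucket packs its recency order into a single long, four bits per way, next to 16 identity longs; the 4Kb data slots would live in a matching off-heap region indexed by (bucket * 16 + way).

```java
// Illustrative 16-way set-associative cache skeleton. Identity 0 marks an
// empty way; `lruOrder` holds a permutation of way indices, one nibble each,
// with the most recently used way in the lowest nibble.
public class SetAssociativeSketch {
    static final int WAYS = 16;
    final int buckets;       // must be a power of two
    final long[] lruOrder;   // packed recency per bucket: low nibble = MRU way
    final long[] ids;        // WAYS identity longs per bucket

    SetAssociativeSketch(int buckets) {
        this.buckets = buckets;
        this.lruOrder = new long[buckets];
        this.ids = new long[buckets * WAYS];
        // Initial order: way 0 most recent .. way 15 least recent.
        java.util.Arrays.fill(lruOrder, 0xFEDCBA9876543210L);
    }

    int bucketOf(long id) {
        return (int) ((id * 0x9E3779B97F4A7C15L) >>> 32) & (buckets - 1);
    }

    /** Returns the way holding id, or -1 on a miss; a hit refreshes recency. */
    int lookup(long id) {
        int b = bucketOf(id);
        for (int w = 0; w < WAYS; w++) {
            if (ids[b * WAYS + w] == id) {
                touch(b, w);
                return w;
            }
        }
        return -1;
    }

    /** Caches id, evicting the least recently used way of its bucket. */
    int insert(long id) {
        int b = bucketOf(id);
        int victim = (int) ((lruOrder[b] >>> 60) & 0xF);   // least recent way
        ids[b * WAYS + victim] = id;
        touch(b, victim);
        return victim;
    }

    // Move `way` into the most-recent (low) nibble, preserving relative order.
    private void touch(int b, int way) {
        long order = lruOrder[b];
        long rebuilt = way;
        int shift = 4;
        for (int i = 0; i < WAYS; i++) {
            long w = (order >>> (4 * i)) & 0xF;
            if (w == way) continue;
            rebuilt |= w << shift;
            shift += 4;
        }
        lruOrder[b] = rebuilt;
    }
}
```

Readers that miss would then populate the corresponding data slot before continuing, as described above.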

As to having a fast disk cache, I also think this is a great idea. But I think 
it fits in as an extension of this and any compressed in-memory cache, as we 
build a tiered-cache architecture.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976356#comment-13976356
 ] 

Jonathan Ellis commented on CASSANDRA-5863:
---

bq. [~xedin] I have one more idea how to make it work, will keep you posted...

What was that?

bq. [~benedict] As to having a fast disk cache, I also think this is a great 
idea. But I think it fits in as an extension of this and any compressed 
in-memory cache

+1 separate ticket (and a harder one IMO)



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-21 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976380#comment-13976380
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


[~jbellis] I tried to directly replace blocks of the compressed file with 
uncompressed content (align all of the blocks to a 64KB boundary, effectively 
creating file holes; mprotect some of the blocks to be writable; write the 
uncompressed contents), keeping a per-file global block heat map based on the 
key cache, but that didn't work out.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-23 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255189#comment-15255189
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


[~blambov] Sorry for the delay, I'm planning to look at the code shortly. While 
I'm on it, do you think it would be possible (if it hasn't been done already) 
to simulate the situation where a single key read touches multiple SSTables 
(aka the multi-collation case)? That, I think, might be one of the interesting 
cases for cache performance even without writes present, since it closely 
reflects some of the most common real-world situations, which require multiple 
index/data reads per request, generating different eviction patterns. 

> In process (uncompressed) page cache
> 
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: T Jake Luciani
>Assignee: Branimir Lambov
>  Labels: performance
> Fix For: 3.x
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[] and verifies a 
> checksum.  
> This process is where the majority of time is spent in a read request.  
> Before compression, we would have zero-copy of data and could respond 
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this 
> process for hot data, possibly off heap.





[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-24 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255531#comment-15255531
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


*General Notes:*
* I think we should try to come up with better names for the rebufferer 
classes, so their function is more obvious... Maybe something like {File, FS, 
Data}Reader with a load or read method instead of rebuffer.
* Maybe we should try to implement the idea I mentioned when I originally 
worked on the compression support about 5 years ago in CASSANDRA-47. It wasn't 
worth it at that time but might be more relevant now :), and it might make 
caching of compressed files a lot simpler. Here it is: make compressors always 
return a buffer size aligned on PAGE_SIZE (default 512 bytes) and leave "holes" 
in the file by seeking to the next alignment. Over the years I've double-checked 
with multiple familiar people that most of the modern/popular filesystems 
(NTFS, ext*, xfs, etc.) already support this: they will not allocate the unused 
blocks, and they place all of the allocated ones close together. This is going 
to help here in the following ways:
-- caches don't have to worry about size/alignment of the 
compressed/decompressed chunks;
-- the compressed reader is very simple, since it just has to align requested 
offsets (which allows us to remove the CompressionInfo segment);
-- there is no need to keep uncompressed size information around, since data 
size is the same for the compressed/uncompressed cases (everything is already 
aligned);
-- CRAR/CSF and all of the supporting classes are no longer required;
-- and more, e.g. we could potentially just re-map compressed pages into 
decompressed ones on the fly, and the cache doesn't even have to know.
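The slot arithmetic behind that layout can be sketched as follows (a hypothetical helper using Java NIO; whether the unwritten tail of a slot is actually left unallocated depends on the filesystem): every compressed chunk is written at a fixed 512-byte-aligned slot, so a reader locates chunk i by pure arithmetic and no CompressionInfo-style index is needed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch: chunk i always occupies the slot [i * SLOT, (i+1) * SLOT);
// the unwritten tail of each slot is a "hole" that a sparse-file-aware
// filesystem need not allocate.
public class AlignedChunkFile {
    static final int SLOT = 512;   // the PAGE_SIZE alignment from the comment

    static void writeChunk(FileChannel ch, long chunkIndex, ByteBuffer compressed)
            throws IOException {
        if (compressed.remaining() > SLOT)
            throw new IllegalArgumentException("chunk does not fit in its slot");
        // Writing at an absolute position past EOF leaves a hole; no padding written.
        ch.write(compressed, chunkIndex * SLOT);
    }

    static ByteBuffer readChunk(FileChannel ch, long chunkIndex) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(SLOT);
        ch.read(buf, chunkIndex * SLOT);   // offset is arithmetic only, no lookup
        buf.flip();
        return buf;
    }
}
```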

*Code Notes:*
* why does ReaderCache only account for buffer size instead of key + buffer 
size in the weigher? This means the cache size is underestimated.
* a couple of instanceof checks kind of signal that we want to re-evaluate the 
rebuffer class hierarchy.
* ReaderCache#invalidateFile is not very efficient - O\(n\) in the size of the 
cache - and it is used by cleanup of the mmap'ed files, which might be a 
problem.
* (potential safety improvement) ChecksummedDataInput#readBuffer - should do 
buffer vs. read length validation for the -1 situation, because otherwise this 
might cause corruption.
* HintsReader - adds an unused import and a commented-out seek, both of which 
should be removed.
* since CRAR no longer extends RAR, the header comment about that should be 
removed as well; as a matter of fact, since CRAR is now just a container of 
rebuffer implementations, maybe it makes sense to remove it altogether and just 
use RAR from CompressedSegmentedFile with different "rebuffer" backends - in 
other words, put all of the rebuffers from CRAR into CSF?
* BufferlessRebufferer#rebuffer(long position, ByteBuffer buffer) at least 
requires better clarification of its parameters and return value, because in 
e.g. CRAR it's not exactly clear why an uncompressed buffer would be provided 
only to also be returned. Why can't the argument just be filled, and the return 
type changed to a long which is the (aligned) offset into the file? That would 
allow us to remove the Rebufferer#rebuffer(long) method and always let callers 
provide the buffer to fill, since I only see it used in RAR#reBufferAt and 
LimitingRebufferer, and both could be made to hold the actual buffer. Such a 
change would converge everything under BufferlessRebufferer, have ReaderCache 
and RAR handle buffers, and divide the responsibilities of buffer management 
and actual block storage handling between RAR and Rebufferer.




[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-24 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255538#comment-15255538
 ] 

Benedict commented on CASSANDRA-5863:
-

bq. I've double checked with multiple familiar people that most of the 
modern/popular filesystems (NTFS, ext*, xfs etc.) already have support for that

It's worth double checking what that support entails - in XFS (since I happen 
to have recently read the spec), such a gap would be represented by introducing 
a b+-tree, rather than a single contiguous allocation (on the assumption that 
contiguous space was available on disk at the location of the first inode).  
This could result in multiple levels of inodes, such that a random seek into 
the file (our usual modus operandi) could incur many more disk accesses than 
was previously the case.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-24 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255913#comment-15255913
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


Indeed, it comes at the expense of utilizing the inode cache more, but not 
having to worry about keeping CompressionInfo in memory - and all the other 
effects and complexities that come from the current state of compression - 
might be worth it. We'll definitely need some experimental results to back it 
up, but even if it doesn't work out the way I expect, it would still be an 
interesting experiment to make.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-25 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256531#comment-15256531
 ] 

Branimir Lambov commented on CASSANDRA-5863:


Thanks for the review and ideas. Updated the branch to apply your comments.

bq. why does ReaderCache only account for buffer size instead of key + buffer 
size in the weigher? This means the cache size is underestimated.

The key itself is a small and fixed part of the overhead (all objects it 
references are already found elsewhere); there are also on-heap support 
structures within the implementing cache which are bigger. Though it's not 
trivial, we could also account for those, but I don't know how that would help 
cache management and sizing for the user.

My point of view is that the user requests a certain amount of cached _data_ 
which is what the weigher currently measures. That enables a simple message to 
the user (_x_ bytes of data off heap with _y_% typical on-heap overhead); 
measuring (or accounting for) both would create difficulties communicating how 
space is allocated and used.
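A toy illustration of that policy (hypothetical code, not the patch's): the configured bound counts only the capacity of the cached buffers, so the user reasons about "x bytes of data" while key and node overhead stay outside the weigher.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Weight-bounded cache where weight = sum of buffer capacities only.
// Key objects and map-node overhead are deliberately not weighed, matching
// the "cache x bytes of data" contract described above.
public class DataWeighedCache {
    private final long maxWeight;
    private long weight;
    private final LinkedHashMap<Long, ByteBuffer> map =
            new LinkedHashMap<>(16, 0.75f, true);   // access order = LRU eviction

    public DataWeighedCache(long maxWeight) { this.maxWeight = maxWeight; }

    public void put(long key, ByteBuffer buf) {
        ByteBuffer old = map.put(key, buf);
        if (old != null) weight -= old.capacity();
        weight += buf.capacity();                   // weigh the data only
        // Evict eldest entries until under the bound (a single oversized
        // buffer would evict itself; acceptable for a sketch).
        Iterator<Map.Entry<Long, ByteBuffer>> it = map.entrySet().iterator();
        while (weight > maxWeight && it.hasNext()) {
            ByteBuffer evicted = it.next().getValue();
            it.remove();
            weight -= evicted.capacity();
        }
    }

    public ByteBuffer get(long key) { return map.get(key); }
    public long weight() { return weight; }
}
```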

bq. a couple of instanceof checks kind of signal that we want to re-evaluate 
the rebuffer class hierarchy.

You are right, there's no need for this to be implemented outside the 
Rebufferer hierarchy. Reorganized.

bq. ReaderCache#invalidateFile is not very efficient O\(n\) from the size of 
the cache, which is used by cleanup of the mmap'ed files, which might be a 
problem.

This needs to be evaluated in more detail to figure out if there's any point 
in calling it at all. It is necessary for testing; otherwise we could just as 
well skip the call.

bq. (potential safety improvement) ChecksummedDataInput#readBuffer - should do 
buffer vs. read length validation for -1 situation because otherwise this might 
cause corruption

I'm sorry, I do not understand the problem -- the code only relies on the 
position of the buffer and since buffer is cleared before the read, an end of 
stream (and only that) will result in an empty buffer; both {{read()}} and 
{{readByte()}} interpret this correctly.

bq. HintsReader - adds unused import and commented out seek which should be 
removed

Fixed.

bq. since CRAR no longer extends RAR, the header comment about that should be 
removed as well; as a matter of fact, since CRAR is now just a container of 
rebuffer implementations, maybe it makes sense to remove it altogether and just 
use RAR from CompressedSegmentedFile with different "rebuffer" backends - in 
other words, put all of the rebuffers from CRAR into CSF?

Done, with a little extra clean-up around RAR.Builder.

bq. BufferlessRebufferer#rebuffer(long position, ByteBuffer buffer) at least 
requires better clarification of the parameters and return value, because in 
e.g. CRAR it's not exactly clear why would uncompressed buffer be provided to 
also be returned, why can't argument just be filled and return type changed to 
be long which is an (aligned) offset of the file?

I had added a return of the passed buffer for convenience but it also adds 
possibility for error -- changed the return of the method to void. On the other 
point, it does not make sense for the callee to return an (aligned) offset as 
the caller may need to have a better control over positioning before allocating 
the buffer -- caching rebufferers, specifically, do.

bq. Which allows to remove Rebufferer#rebuffer(long) method and always let 
callers provide the buffer to fill, since I only see it used in RAR#reBufferAt 
and LimitingRebufferer where both could be made to hold the actual buffer.

This wasn't the case even before this ticket. When RAR requests rebuffering at 
a certain position, it can either have its buffer filled (direct case), or 
receive a view of a shared buffer that holds the data (mem-mapped case). There 
was a lot of clumsiness in RAR to handle the question of which of these is the 
case, does it own its buffer, should it be allocated or freed. The patch 
addresses this clumsiness as well as allowing for another type of advantageous 
buffer management.

bq. Such a change would converge everything under BufferlessRebufferer, have 
ReaderCache and RAR handle buffers, and divide the responsibilities of buffer 
management and actual block storage handling between RAR and Rebufferer.

Unless you want the cache to copy data into a buffer provided by the RAR rather 
than just provide access to a shared buffer, buffer management is integral to 
caching and mem-mapped access and thus belongs in the rebufferer.

bq. make compressors always return a size of the buffer aligned on PAGE_SIZE 
(default 512 bytes) and leave "holes" in the file

Interesting. Another possibility mentioned before is to implement compression 
in such a way that the _compressed_ size matches the chunk size. Both are 
orthogonal and outside the scope of this ticket -- let's open a new issue for 
that?


[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-25 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257210#comment-15257210
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


bq. The key itself is a small and fixed part of the overhead (all objects it 
references are already found elsewhere); there are also on-heap support 
structures within the implementing cache which are bigger. Though that's not 
trivial, we could also account for those, but I don't know how that helps cache 
management and sizing for the user.

The problem I see with this is the same as for any other data structure on the 
JVM - if we don't account for the additional overhead, at some point it will 
blow up, and it won't be pretty; especially so if we don't account for the 
internal size of the data structure that holds the cache, and for other 
overhead like keys and their containers. Can we claim with certainty that at 
some capacity its actual size in memory is not going to be 2x or 3x? If yes, 
then let's leave it as it is today; otherwise we need to do something about it 
right away.

bq. I'm sorry, I do not understand the problem – the code only relies on the 
position of the buffer and since buffer is cleared before the read, an end of 
stream (and only that) will result in an empty buffer; both read() and 
readByte() interpret this correctly.

Sorry, what I meant is - we might want to be more conservative and indicate 
early that the requested length is bigger than the number of available bytes; 
we have already had a couple of bugs that were hard to debug because 
EOFException doesn't provide any useful information...

bq. I had added a return of the passed buffer for convenience but it also adds 
possibility for error – changed the return of the method to void. On the other 
point, it does not make sense for the callee to return an (aligned) offset as 
the caller may need to have a better control over positioning before allocating 
the buffer – caching rebufferers, specifically, do.

and 

bq. This wasn't the case even before this ticket. When RAR requests rebuffering 
at a certain position, it can either have its buffer filled (direct case), or 
receive a view of a shared buffer that holds the data (mem-mapped case). There 
was a lot of clumsiness in RAR to handle the question of which of these is the 
case, does it own its buffer, should it be allocated or freed. The patch 
addresses this clumsiness as well as allowing for another type of advantageous 
buffer management.

I understand. I actually started with the proposition to return "void", but I 
changed it later on because I saw a possibility to unify bufferless with the 
other implementations. Essentially the question is where the original data 
comes from - directly from the channel, or from an already mmap'ed buffer - so 
maybe if we had a common interface to both cases and used it as a backend for 
the rebufferer, it would simplify things instead of putting that logic into the 
rebufferer itself? Just something to think about...

bq. Interesting. Another possibility mentioned before is to implement 
compression in such a way that the compressed size matches the chunk size. Both 
are orthogonal and outside the scope of this ticket – lets open a new issue for 
that?

I'm fine with making it a separate ticket, but I think we will have to tackle 
it first, since it would directly affect the rebufferer/cache logic.




[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-25 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257648#comment-15257648
 ] 

Branimir Lambov commented on CASSANDRA-5863:


bq. we might want to be more conservative and indicate early that requested 
length is bigger than number of available bytes

This is not that trivial for rebuffering -- the requested length would normally 
be bigger than the number of available bytes during the last {{reBuffer()}} 
call on the file, thus the changes required to implement this check are too 
substantial to be within the scope of this ticket.

bq. I saw a possibility to unify bufferless with other implementations because 
essentially the question is - where original data comes from - directly from 
the channel or already mmap'ed buffer

I started with a single implementation and quickly found out it is not 
sufficient. There are other things to take into account:
- Does the buffer hold all the data you need for decompression?
- Is it properly aligned so it can be cached?
- How do you signal that you are done with it so it can be reclaimed?
- Do you need to copy anything from one buffer to another to make this work?

If you still think a single interface / removing {{Rebufferer}} could work 
efficiently, would you elaborate and, possibly, provide some code?

bq. I think we will have to tackle it first since it would directly affect 
rebufferer/cache logic

The logic is clearly defined for this round. This patch is targeted at 3.x 
where we can't change sstable format in any way, and support for this format 
will be required long in the future.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-26 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257687#comment-15257687
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


bq. This is not that trivial for rebuffering – the requested length would 
normally be bigger than the number of available bytes during the last 
{{reBuffer()}} call on the file, thus the changes required to implement this 
check are too substantial to be within the scope of this ticket.

Ok, fair enough.

bq. There are other things to take into account:

Let me address the things you mentioned. All of the data is written into the 
SSTable file in fixed-size chunks, and most of the rebuffers are done (or have 
been done) at a granularity of 64k or whatever the compression buffer size was. 
Since the SSTable has the compression parameters, we might want to have the 
cache work at the level of sstables instead of individual files; that way we 
can get access to some essential metadata. So what I was saying is that the 
cache could hold already-decompressed 64k (or other power-of-2 size) aligned 
buffers - either raw file data or decompressed data, depending on the file. The 
backend implementation, plugged into the rebufferer, would mmap or use a 
regular channel read to fetch buffer-size-aligned chunks based on the position 
given to it; in compressed mode the cache would hold decompressed buffers so it 
doesn't have to share mmap'ed buffers. ReaderCache can rely on LIRS or LRU as a 
replacement mechanism for the aligned buffers, so each buffer is reclaimed when 
the sstable is removed (another reason to work closely with sstables), when the 
replacement mechanism indicates the buffer is no longer viable, or when it is 
invalidated manually. Sorry I can't provide code, so if you think that 
rethinking this is not worth it, I'm fine with that.
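A rough sketch of the scheme described above, for illustration only: all names 
are hypothetical, and the actual patch uses Guava's {{LoadingCache}} (and later 
Caffeine) rather than a hand-rolled map. The key points it shows are the 
(sstable, aligned position) key, plain LRU replacement, and manual 
per-sstable invalidation.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical shared LRU cache from (sstable, aligned position) to an
// already-decompressed chunk buffer, with manual invalidation when an
// sstable is removed.
public class AlignedChunkCache
{
    static final class Key
    {
        final String sstable;
        final long position; // aligned to the chunk size

        Key(String sstable, long position)
        {
            this.sstable = sstable;
            this.position = position;
        }

        @Override
        public boolean equals(Object o)
        {
            return o instanceof Key
                   && position == ((Key) o).position
                   && sstable.equals(((Key) o).sstable);
        }

        @Override
        public int hashCode()
        {
            return sstable.hashCode() * 31 + Long.hashCode(position);
        }
    }

    private final LinkedHashMap<Key, ByteBuffer> map;

    AlignedChunkCache(int maxChunks)
    {
        // accessOrder=true makes iteration order least-recently-used first,
        // so evicting the eldest entry implements plain LRU replacement.
        map = new LinkedHashMap<Key, ByteBuffer>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Key, ByteBuffer> eldest)
            {
                return size() > maxChunks;
            }
        };
    }

    synchronized ByteBuffer get(Key key, Function<Key, ByteBuffer> loader)
    {
        ByteBuffer chunk = map.get(key);
        if (chunk == null)
        {
            chunk = loader.apply(key); // read + decompress on a miss
            map.put(key, chunk);
        }
        return chunk;
    }

    // Called when the sstable is deleted or its data is otherwise stale.
    synchronized void invalidateSSTable(String sstable)
    {
        map.keySet().removeIf(k -> k.sstable.equals(sstable));
    }

    synchronized int size()
    {
        return map.size();
    }

    public static void main(String[] args)
    {
        AlignedChunkCache cache = new AlignedChunkCache(2);
        Function<Key, ByteBuffer> loader = k -> ByteBuffer.allocate(4);
        cache.get(new Key("a-Data.db", 0), loader);
        cache.get(new Key("a-Data.db", 65536), loader);
        cache.get(new Key("b-Data.db", 0), loader); // evicts the LRU entry
        System.out.println(cache.size());           // prints 2
        cache.invalidateSSTable("a-Data.db");
        System.out.println(cache.size());           // prints 1
    }
}
```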

bq. The logic is clearly defined for this round. This patch is targeted at 3.x 
where we can't change sstable format in any way, and support for this format 
will be required long in the future.

I wasn't aware of that. [~jbellis] We can't modify the SSTable format at all 
while in the 3.x phase, not even backward-compatible changes in the even 
(feature) releases?



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-26 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257722#comment-15257722
 ] 

Branimir Lambov commented on CASSANDRA-5863:


bq. Let me address the things you mentioned - ...

You seem to be describing exactly what is currently implemented. The sstable 
metadata is part of the data rebufferers work with, in that sense they do work 
on the sstable level. {{BufferlessRebufferer}} is the backend, {{Rebufferer}} 
is the front. What fills the cache and what uses data from it need different 
interfaces, and the primary difference is the buffer management. In the
compressed case the cache provides shared decompressed buffers and does not 
give anyone access to the underlying (mmapped or not) file or buffers. RAR does 
not know anything or care about the underlying sstable format, and apart from 
the chunk size neither does {{ReaderCache}}.

Perhaps the only not-yet-addressed point is the granularity of the cache. If I 
understand you correctly, you are describing per-file/sstable caches: do you 
mean a specific space allocation for each file? If so, how do you propose to 
manage splitting the space among the individual caches? If not (i.e. per-file 
maps with a shared eviction strategy), this is a sensible option that I started 
pursuing as part of CASSANDRA-11452 within the context of this infrastructure. 
I decided to forego it at this point because the benefit it would provide over 
just using Caffeine would not be substantial enough for the amount of new code, 
complexity, testing and risk it requires.

The latter is a decision that can be very easily changed in the future.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-26 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258961#comment-15258961
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


[~blambov] What I meant is similar to a per-file map with a shared eviction 
strategy, so if you already have that in mind - perfect :) What you are saying 
regarding rebufferers makes sense to me; what I was trying to advocate for is a 
better distinction between BufferlessRebufferer and Rebufferer, at least via 
naming, so that the "rebufferer" or "buffer processor" is the thing which holds 
the actual processing logic and BufferlessRebufferer is essentially the "data 
source" or "data provider/producer" for it. I'm asking for this because, for me 
as an observer of the changes, the distinction wasn't clear at first glimpse, 
and what made it especially confusing is the rebuffer method itself.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-27 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260127#comment-15260127
 ] 

Branimir Lambov commented on CASSANDRA-5863:


In the latest couple of updates I did some renaming:
- {{BufferlessRebufferer}} to {{ChunkReader}} with {{rebuffer}} to {{readChunk}}
- {{BaseRebufferer}} to {{ReaderFileProxy}}
- {{SharedRebufferer}} to {{RebuffererFactory}} with factory method
- {{ReaderCache}} to {{ChunkCache}}

and updated some of the documentation. Hopefully this reads better now?

Switched to Caffeine as planned in CASSANDRA-11452:
- [better cache 
efficiency|https://docs.google.com/spreadsheets/d/11VcYh8wiCbpVmeix10onalAS4phfREWcxE-RMPTM7cc/edit#gid=0]
 on CachingBench which includes compaction, scans and collation from multiple 
sstables
- [cstar_perf with everything served off 
cache|http://cstar.datastax.com/tests/id/b5963866-0b9a-11e6-a761-0256e416528f] 
shows equivalent performance, i.e. it does not degrade on heavy load
- [cstar_perf on a smaller 
cache|http://cstar.datastax.com/tests/id/41b4c650-0c6d-11e6-bf41-0256e416528f] 
shows a better hit rate even with uniformly random access patterns (48.8% vs 
45.4% as reported by nodetool info)
- unlike LIRS, memory overheads are very controlled and specified 
[here|https://github.com/ben-manes/caffeine/wiki/Memory-overhead]: at most 112 
bytes per chunk including key, i.e. 0.2% for 64k chunks to 3% for 4k chunks.
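As a quick sanity check of the overhead percentages quoted above (112 bytes of 
per-entry cost against the cached chunk size):

```java
// Per-entry cache overhead as a fraction of the cached chunk, using the
// 112-byte worst-case figure quoted above.
public class CacheOverhead
{
    public static void main(String[] args)
    {
        int perEntryBytes = 112;
        for (int chunkSize : new int[] { 65536, 4096 })
        {
            double percent = 100.0 * perEntryBytes / chunkSize;
            System.out.printf("%5d-byte chunks: %.2f%% overhead%n", chunkSize, percent);
        }
        // 65536-byte chunks: 0.17% overhead (~0.2%)
        //  4096-byte chunks: 2.73% overhead (~3%)
    }
}
```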

And finally rebased to get dtest in sync:
|[code|https://github.com/blambov/cassandra/tree/5863-page-cache-caffeine-rebased]|[utest|http://cassci.datastax.com/job/blambov-5863-page-cache-caffeine-rebased-testall/]|[dtest|http://cassci.datastax.com/job/blambov-5863-page-cache-caffeine-rebased-dtest/]|



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-27 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261121#comment-15261121
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


+1 on the changes, much more readable now. Maybe one more nit from my original 
comments - is there any way we can change ChunkCache#invalidatePosition so that 
instead of doing instance-of checks and redirects to CachedRebufferer it simply 
does invalidate(new Key(...))? Also, since ChunkReader is effectively 
stateless, maybe we could drop RebuffererFactory and use ChunkReader as the 
source of all Rebufferers? This way IMHO it's clearer that ChunkReader is the 
source of the data and doesn't do any buffering; if buffering/caching is 
needed, it can produce a Rebufferer which manages the memory. WDYT?

Also, how do you want to proceed with this? After all of the changes, can you 
squash/rebase so I can push?





[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-28 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262132#comment-15262132
 ] 

Branimir Lambov commented on CASSANDRA-5863:


bq. change ChunkCache#invalidatePosition so instead of doing instance-of checks 
and redirects to CachedRebufferer it simply does invalidate(new Key(...))

Key needs the file's chunk reader's type (otherwise it may confuse uncompressed 
and compressed readings of the same file, which is a problem for some tests), 
and to get to that it still needs to cast. If you prefer, I can declare 
{{invalidate}} in {{RebuffererFactory}} and only implement it in 
{{CachingRebufferer}} to avoid the cast/instanceof?
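For illustration, the kind of key described here might look like the sketch 
below. The names ({{ChunkKey}}, the marker reader classes) are hypothetical and 
not taken from the patch; the point is only that including the reader's type in 
equality keeps compressed and uncompressed readings of the same file from 
colliding in the cache.

```java
import java.util.Objects;

// Hypothetical marker types standing in for the two chunk reader kinds.
class CompressedReader {}
class SimpleReader {}

// Hypothetical cache key: the reader's type participates in equality so that
// compressed and uncompressed readings of the same file get distinct entries.
public final class ChunkKey
{
    final Class<?> readerType;
    final String path;
    final long position;

    ChunkKey(Class<?> readerType, String path, long position)
    {
        this.readerType = readerType;
        this.path = path;
        this.position = position;
    }

    @Override
    public boolean equals(Object o)
    {
        if (!(o instanceof ChunkKey))
            return false;
        ChunkKey k = (ChunkKey) o;
        return readerType == k.readerType
               && position == k.position
               && path.equals(k.path);
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(readerType, path, position);
    }

    public static void main(String[] args)
    {
        // Same file and position, different reader types: distinct cache entries.
        ChunkKey compressed = new ChunkKey(CompressedReader.class, "x-Data.db", 0);
        ChunkKey uncompressed = new ChunkKey(SimpleReader.class, "x-Data.db", 0);
        System.out.println(compressed.equals(uncompressed)); // false
        System.out.println(compressed.equals(new ChunkKey(CompressedReader.class, "x-Data.db", 0))); // true
    }
}
```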

bq. since ChunkReader is effectively stateless maybe we could drop 
RebuffererFactory and use ChunkReader as a source of all Rebufferers?

We want the {{CachingRebufferer}} to be shared and not re-instantiated for each 
reader; such a change would mean {{ChunkReader}} must hold a reference to a 
{{CachingRebufferer}} layered over itself. I would rather not do that, because 
it is too tight a coupling and inverts the dependency. It would make extensions 
harder (e.g. multiple cache types, as played with in 11452).


Squashed the commits, rebased again, expanded the comment on 
{{file_cache_size_in_mb}} in {{cassandra.yaml}} and added the {{CHANGES.txt}} 
entry and commit description:
|[trunk 
patch|https://github.com/blambov/cassandra/tree/5863-page-cache-squashed]|[utest|http://cassci.datastax.com/job/blambov-5863-page-cache-squashed-testall/]|[dtest|http://cassci.datastax.com/job/blambov-5863-page-cache-squashed-dtest/]|




[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-28 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262159#comment-15262159
 ] 

Branimir Lambov commented on CASSANDRA-5863:


Created CASSANDRA-11681 to optimize the {{file_cache_size_in_mb}} default.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-05-03 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268305#comment-15268305
 ] 

Branimir Lambov commented on CASSANDRA-5863:


Uploaded {{build.xml}} patch 
[here|https://github.com/blambov/cassandra/tree/5863-build]. [~jjordan], could 
you review and maybe commit?



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-05-03 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268655#comment-15268655
 ] 

T Jake Luciani commented on CASSANDRA-5863:
---

I committed it (my own patch) in {{7c559def3422fe9e0edb161eeff85ae9ca853952}}



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-02-28 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171263#comment-15171263
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


I've been working on Windmill [1], recently, and it contains a page cache 
implementation that may be useful here or could be used as a reference. The 
implementation is similar to the Linux Kernel’s; It is based on a static, fixed 
maximum length Radix Tree [2] and supports asynchronous reads and writes using 
a dedicated thread pool. I’ve recently added DMA support, via O_DIRECT, and I 
am working on something similar to posix_fadvise to provide even more granular 
file access controls, as well. 

[1] https://github.com/xedin/windmill
[2] 
https://0xax.gitbooks.io/linux-insides/content/DataStructures/radix-tree.html



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-03-11 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191284#comment-15191284
 ] 

Branimir Lambov commented on CASSANDRA-5863:


Work-in-progress patch here:
|[code|https://github.com/blambov/cassandra/tree/5863-page-cache]|[utest|http://cassci.datastax.com/job/blambov-5863-page-cache-testall/]|[dtest|http://cassci.datastax.com/job/blambov-5863-page-cache-dtest/]|[cstar_perf|http://cstar.datastax.com/tests/id/aff1cbdc-e6bf-11e5-9473-0256e416528f]|

The patch mainly refactors things around {{\[Compressed\]RandomAccessReader}} 
to allow caching to be inserted between the RAR and decompressor, and adds a 
basic cache built around guava's {{LoadingCache}}. Perf results show this to be 
viable even in this basic form, as there's a clear performance benefit. It is 
more pronounced looking at the local read latency (taken from {{nodetool 
tablestats}}):
|| ||node 1||node 2||node 3||
|baseline|0.197 ms|0.194 ms|0.196 ms|
|with uncompressed chunk cache|0.172 ms|0.170 ms|0.158 ms|
|chunk cache and buffer pool off|0.217 ms|0.210 ms|0.206 ms|
as well as in single-node runs on my local machine where cache on takes reads 



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-03-22 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206459#comment-15206459
 ] 

Branimir Lambov commented on CASSANDRA-5863:


Updated patch here:
|[code|https://github.com/blambov/cassandra/tree/5863-page-cache-4-rebase]|[utest|http://cassci.datastax.com/job/blambov-5863-page-cache-4-rebase-testall/]|[dtest|http://cassci.datastax.com/job/blambov-5863-page-cache-4-rebase-dtest/]|[cstar_perf|http://cstar.datastax.com/tests/id/9d46f64a-f003-11e5-9527-0256e416528f]|

The new version takes care of the eviction issue, adds metrics (misses, 
requests, hit ratio, latency of read after a miss), refactors a bit more to 
reduce the number of extra objects and clarify the code, and adds a 
single-threaded fixed-seed benchmark (mostly taken from CASSANDRA-7019) that 
clearly shows the effects, includes scans and compactions to highlight LRU 
weaknesses, and should be more responsive to smaller changes in how the actual 
caching is done.

The code still uses Guava's {{LoadingCache}}, experiments with other solutions 
and a custom implementation will come next.

{{cstar_perf}} above clearly shows benefits. Data from a run of the included 
{{CachingBench}} is shown below; it also demonstrates significant benefits, 
also for uncompressed data:
{code}
Reader RandomAccessReader:CachingRebufferer:MmapRebufferer(... LZ4Compressor, 
chunk length 65536)
Cache size 480 MB requests 16,434,534 hit ratio 0.846705
Operations completed in 442.415s

Reader RandomAccessReader:CachingRebufferer:StandardRebufferer(... 
LZ4Compressor, chunk length 65536)
Cache size 480 MB requests 16,439,112 hit ratio 0.846770
Operations completed in 471.663s

Reader RandomAccessReader:BufferManagingRebufferer.Aligned:MmapRebufferer(... 
LZ4Compressor, chunk length 65536)
Cache disabled
Operations completed in 703.370s

Reader 
RandomAccessReader:BufferManagingRebufferer.Aligned:StandardRebufferer(... 
LZ4Compressor, chunk length 65536)
Cache disabled
Operations completed in 847.063s

Reader RandomAccessReader:CachingRebufferer:SimpleReadRebufferer(... chunk 
length 12288)
Cache size 479.88 MB requests 17,125,696 hit ratio 0.851649
Operations completed in 450.076s

Reader 
RandomAccessReader:BufferManagingRebufferer.Unaligned:SimpleReadRebufferer(... 
chunk length 12288)
Cache disabled
Operations completed in 564.559s

Reader RandomAccessReader:MmapRebufferer(...)
Cache disabled
Operations completed in 403.994s
{code}




[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-03-22 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206490#comment-15206490
 ] 

T Jake Luciani commented on CASSANDRA-5863:
---

Can you re-run the stress with 500 threads vs 50 to see what it looks like 
under some load?



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-03-22 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206596#comment-15206596
 ] 

Branimir Lambov commented on CASSANDRA-5863:


bq. Can you re-run the stress with 500 threads vs 50 to see what it looks like 
under some load?

The run is 
[here|http://cstar.datastax.com/tests/id/2324a078-f040-11e5-8634-0256e416528f]. 
This time the 8G run is only as fast as the 480M one; this may be because this 
many threads need more memory and we reserve 8G but only use 0.85 of it (this 
can be found in the nodetool info output). Room for improvement there.
