RE: [DISCUSSION] Next release roadmap

2021-04-14 Thread Kokoori, Shylaja
RE: Pluggability improvements - this is a great idea. It will be good for 
persistent memory support:

https://issues.apache.org/jira/browse/CASSANDRA-13981 - Enable Cassandra for 
Persistent Memory - can easily be refactored into a pluggable memtable.



> On Apr 8, 2021, at 8:22 AM, Benjamin Lerer  wrote:
> 
> On our side, the list of improvements we plan to deliver for the next 
> release are:
> 
> Query side improvements:
> 
>  * Storage Attached Index or SAI. The CEP can be found at 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage
> +Attached+Index
>  * Add support for OR predicates in the CQL WHERE clause
>  * Allow aggregation by time intervals (CASSANDRA-11871) and allow 
> UDFs in the GROUP BY clause
>  * Materialized views hardening: Addressing the different Materialized 
> Views issues (see CASSANDRA-15921 and [1] for some of the work 
> involved)
> 
> Security improvements:
> 
>  * Add support for Dynamic Data Masking (CEP pending)
>  * Allow the creation of roles that can assign arbitrary or scoped 
> privileges without also granting those roles access to database 
> objects
>  * Filter rows from the system and system_schema tables based on user 
> permissions
> 
> Performance improvements:
> 
>  * Trie-based index format (CEP pending)
>  * Trie-based memtables (CEP pending)
> 
> Safety/Usability improvements:
> 
>  * Guardrails. The CEP can be found at 
> https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CE
> P-3%3A+Guardrails
> 
> Pluggability improvements:
> 
>  * Pluggable schema manager (CEP pending)
>  * Pluggable filesystem (CEP pending)
>  * Memtable API (CEP pending). The goal is to allow improvements 
> such as CASSANDRA-13981 to be easily plugged into Cassandra
> 
> Feedback is welcome :-).
> 
> 
> [1]
> https://lists.apache.org/thread.html/r908b5397dd803132822cabe5ba075586
> 1d97bb5d8603a523591d55c9%40%3Cdev.cassandra.apache.org%3E
> 
> On Thu, 8 Apr 2021 at 17:21, Benjamin Lerer  wrote:
> 
>> Hi Everybody,
>> Please speak up and tell us what you plan to contribute in the next year.
>> 
>> The goal of this discussion is to allow people to present the 
>> contributions that they have planned for the next Cassandra release.
>> 
>> This discussion has several benefits:
>> 
>>   - It will give greater visibility into who is planning to contribute
>>   and what their contributions will be, allowing other contributors to
>>   join the effort or ask questions if they wish to.
>>   - It will also allow us to synchronize our efforts when features
>>   impact the same parts of the code.
>>   - For users, it will provide an idea of what to expect from the next
>>   release.
>> 
>> 
>> Thanks in advance for all your inputs.
>> 



RE: [DISCUSSION] Next release roadmap

2021-04-14 Thread Kokoori, Shylaja
Thanks, Stefan, for bringing up the JIRA 
https://issues.apache.org/jira/browse/CASSANDRA-9633 - Add ability to encrypt 
sstables.
I have been working on it and would appreciate some feedback.

I would also like to add this one to the list: 
https://issues.apache.org/jira/browse/CASSANDRA-14466 - Enable Direct I/O

-Original Message-
From: Stefan Miklosovic  
Sent: Wednesday, April 14, 2021 12:04 AM
To: dev@cassandra.apache.org
Subject: Re: [DISCUSSION] Next release roadmap

Hi,

For me, definitely https://issues.apache.org/jira/browse/CASSANDRA-9633.

I am surprised nobody mentioned this in the previous answers; there are
~50 people waiting for it to happen, and multiple people have been working on 
it seriously, wanting that feature for a very long time.

We will come up with a more detailed list of things, but this one came to my 
mind instantly as I read this thread.

Regards

On Wed, 14 Apr 2021 at 00:53, Sumanth Pasupuleti 
 wrote:
>
> I plan to work on the following
>
>    1. CASSANDRA-12106 - Blacklisting bad partitions: rework the patch,
>       solicit feedback/review, and have it committed
>    2. CASSANDRA-14557 - Default and required keyspace RF: patch available;
>       solicit feedback/review and have it committed
>    3. CASSANDRA-15433 - Pending ranges are not recalculated on keyspace
>       creation: patch available; work on jvm-dtests, solicit
>       feedback/review, have it committed
>    4. CASSANDRA-8877 - Querying TTL and writetime for collections
>    5. CASSANDRA-15472 - Read failure due to an exception from the
>       metrics-core dependency
>
>
> On Sun, Apr 11, 2021 at 7:15 PM guo Maxwell  wrote:
>
> > Besides, do we need a table-level backup and restore solution for
> > Cassandra? https://issues.apache.org/jira/browse/CASSANDRA-15402
> >

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



RE: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-18 Thread Kokoori, Shylaja
To address Joey's concern, the OpenJDK JVM and its derivatives optimize Java 
crypto based on the underlying HW capabilities. For example, if the underlying 
HW supports AES-NI, JVM intrinsics will use those for crypto operations. 
Likewise, the new vector AES available on the latest Intel platform is utilized 
by the JVM while running on that platform to make crypto operations faster.
 
From our internal experiments, we see a single-digit % regression when 
transparent data encryption is enabled.
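As a concrete illustration of the kind of operation those intrinsics
accelerate, here is a minimal AES-GCM round trip using the standard
javax.crypto API (the class name and data are illustrative only; on
AES-NI/VAES hardware, HotSpot compiles these cipher loops down to the
hardware instructions):

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesGcmDemo {
    // Encrypt and then decrypt a buffer, returning the decrypted bytes.
    static byte[] roundTrip(byte[] plaintext) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);                              // 256-bit key
        SecretKey key = kg.generateKey();

        byte[] iv = new byte[12];                  // 96-bit GCM nonce
        new SecureRandom().nextBytes(iv);

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = enc.doFinal(plaintext);

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return dec.doFinal(ciphertext);
    }

    public static void main(String[] args) throws Exception {
        byte[] block = "sstable block".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(block, roundTrip(block))); // prints true
    }
}
```

The same Java code runs everywhere; whether the hot loops execute as
intrinsics depends only on what the JVM detects at startup.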

-Original Message-
From: bened...@apache.org  
Sent: Thursday, November 18, 2021 1:23 AM
To: dev@cassandra.apache.org
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption

I agree with Joey that most users may be better served by OS level encryption, 
but I also think this ticket can likely be delivered fairly easily. If we have 
a new contributor willing to produce a patch then the overhead for the project 
in shepherding it shouldn't be that onerous. If we also have known use cases in 
the community then on balance there's a good chance it will be a net positive 
investment for the project to enable users that desire in-database encryption. 
It might even spur further improvements to e.g. streaming performance.

I would scope the work to the minimum viable (but efficient) solution. So, in 
my view, that would mean encrypting per-sstable encryption keys with per-node 
master keys that can be rotated cheaply, requiring authentication to receive a 
stream containing both the unencrypted sstable encryption key and the encrypted 
sstable, and the receiving node encrypting the encryption key before 
serializing it to disk.

Since there are already compression hooks, this means only a little bit of 
special handling, and I _anticipate_ the patch should be quite modest for such 
a notable feature.
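A minimal sketch of that scheme - a per-sstable data key wrapped under a
per-node master key - using the JDK's standard AESWrap cipher (RFC 3394 key
wrapping). The class and method names here are illustrative, not from any
actual patch:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class KeyWrapSketch {
    // Wrap a per-sstable key under the node's master key, so only the
    // wrapped form is ever serialized to disk.
    static byte[] wrap(SecretKey masterKey, SecretKey sstableKey) throws Exception {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.WRAP_MODE, masterKey);
        return c.wrap(sstableKey);
    }

    static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) throws Exception {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey master  = kg.generateKey();  // per-node master key
        SecretKey sstable = kg.generateKey();  // per-sstable data key

        byte[] wrapped = wrap(master, sstable);
        SecretKey recovered = unwrap(master, wrapped);
        System.out.println(java.util.Arrays.equals(
                sstable.getEncoded(), recovered.getEncoded())); // prints true
    }
}
```

Under this shape, rotating the master key only requires re-wrapping the small
per-sstable keys, not re-encrypting the sstables themselves - which is what
makes the rotation cheap.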


From: Ben Slater 
Date: Thursday, 18 November 2021 at 09:07
To: dev@cassandra.apache.org 
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption

I wanted to provide a bit of background on the interest we've seen in this 
ticket/feature (at Instaclustr). Essentially, it comes down to in-database 
encryption at rest being a feature that compliance people are used to seeing 
in databases, and they have a very hard time believing that operating system 
level encryption is an equivalent control (whatever the reality may be). I've 
seen this be a significant obstacle for people who want to adopt Apache 
Cassandra many times, and an insurmountable obstacle on multiple occasions. 
From what I've seen, this is one of the most-watched tickets with the most 
"is this coming soon" comments in the project backlog, and it's something we 
pretty regularly get asked about - whether we know if/when it's coming.

That said, I completely agree that we don't want to be engaging in security 
theatre or "introducing something that is either insecure or too slow to be 
useful", and I think there are some really good suggestions in this thread to 
come up with a strong solution for what will undoubtedly be a pretty complex 
and major change.

Cheers
Ben




On Wed, 17 Nov 2021 at 03:34, Joseph Lynch  wrote:

> For FDE you'd probably have  the key file in a tmpfs pulled from a 
> remote secret manager and when the machine boots it mounts the 
> encrypted partition that contains your data files. I'm not aware of 
> anyone doing FDE with a password in production. If you wanted 
> selective encryption it would make sense to me to support placing 
> keyspaces on different data directories (this may already be possible) 
> but since crypto in the kernel is so cheap I don't know why you'd do 
> selective encryption. Also I think it's worth noting many hosting 
> providers (e.g. AWS) just encrypt the disks for you so you can check 
> the "data is encrypted at rest" box.
>
> I think Cassandra will be pretty handicapped by being in the JVM which 
> generally has very slow crypto. I'm slightly concerned that we're 
> already slow at streaming and compaction, and adding slow JVM crypto 
> will make C* even less competitive. For example, if we have to disable 
> full sstable streaming (zero copy or otherwise) I think that would be 
> very unfortunate (although Bowen's approach of sharing one secret 
> across the cluster and then having files use a key derivation function 
> may avoid that). Maybe if we did something like CASSANDRA-15294 [1] to 
> try to offload to native crypto like how internode networking did with 
> tcnative to fix the perf issues with netty TLS with JVM crypto I'd 
> feel a little less concerned but ... crypto that is both secure and 
> performant in the JVM is a hard problem ...
>
> I guess I'm just concerned we're going to introduce something that is 
> either insecure or too slow to be useful.
>
> -Joey
>
> On Tue, Nov 16, 2021 at 8:10 AM Bowen Song  wrote:
> >
> > I don't like the idea of FDE (Full Disk Encryption) as an
> > alternative to application-managed encryption at rest. Each has
> > their own advantages

RE: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-19 Thread Kokoori, Shylaja
I agree with Joey; the kernel should also be able to take advantage of crypto 
acceleration.

I also want to add, since JDK performance is a concern here, that newer Intel 
Ice Lake server platforms support VAES and SHA-NI, which further accelerate 
AES-GCM performance by 2x and SHA-1 performance by ~6x using JDK 11.
 
Some configuration information for the tests I ran:

- The JDK version used was JDK 14 (it should behave similarly with JDK 11).
- Since the tests were done before 4.0 GA'd, the Cassandra version used was 
4.0-beta3. The dataset size was ~500G.
- The workloads tested were 100% reads, 100% updates, and an 80:20 mix with 
cassandra-stress. I have not tested streaming yet.

I would be happy to provide additional data points or make necessary code 
changes based on recommendations from folks here.

Thanks,
Shylaja

-Original Message-
From: Joshua McKenzie  
Sent: Friday, November 19, 2021 4:53 AM
To: dev@cassandra.apache.org
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption

>
> setting performance requirements in this regard is nonsense. As long 
> as it's reasonably usable in the real world, and Cassandra makes the 
> estimated effects on performance available, it will be up to the 
> operators to decide whether to turn on the feature

I think Joey's argument, and correct me if I'm wrong, is that implementing 
and then maintaining a complex feature in Cassandra that's essentially worse 
in every way than a built-in full-disk encryption option via LUKS+LVM etc. 
is a poor use of our time and energy.

i.e. we'd be better off investing our time into documenting how to do full disk 
encryption in a variety of scenarios + explaining why that is our recommended 
approach instead of taking the time and energy to design, implement, debug, and 
then maintain an inferior solution.

On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie  wrote:

> Are you for real here?
>
> Please keep things cordial. Statements like this don't help move the 
> conversation along.
>
>
> On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic < 
> stefan.mikloso...@instaclustr.com> wrote:
>
>> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch  wrote:
>> >
>> > On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <
>> shylaja.koko...@intel.com>
>> > wrote:
>> >
>> > > To address Joey's concern, the OpenJDK JVM and its derivatives
>> optimize
>> > > Java crypto based on the underlying HW capabilities. For example, 
>> > > if
>> the
>> > > underlying HW supports AES-NI, JVM intrinsics will use those for
>> crypto
>> > > operations. Likewise, the new vector AES available on the latest 
>> > > Intel platform is utilized by the JVM while running on that 
>> > > platform to make crypto operations faster.
>> > >
>> >
>> > Which JDK version were you running? We have had a number of issues 
>> > with
>> the
>> > JVM being 2-10x slower than native crypto on Java 8 (especially 
>> > MD5,
>> SHA1,
>> > and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). 
>> > Again
>> I
>> > think we could get the JVM crypto penalty down to ~2x native if we
>> linked
>> > in e.g. ACCP by default [1, 2] but even the very best Java crypto 
>> > I've
>> seen
>> > (fully utilizing hardware instructions) is still ~2x slower than 
>> > native code. The operating system has a number of advantages here 
>> > in that they don't pay JVM allocation costs or the JNI barrier (in 
>> > the case of ACCP)
>> and
>> > the kernel also takes advantage of hardware instructions.
>> >
>> >
>> > > From our internal experiments, we see single digit % regression 
>> > > when transparent data encryption is enabled.
>> > >
>> >
>> > Which workloads are you testing and how are you measuring the
>> regression? I
>> > suspect that compaction, repair (validation compaction), streaming, 
>> > and quorum reads are probably much slower (probably ~10x slower for 
>> > the throughput bound operations and ~2x slower on the read path). 
>> > As compaction/repair/streaming usually take up between 10-20% of 
>> > available
>> CPU
>> > cycles making them 2x slower might show up as <10% overall 
>> > utilization increase when you've really regressed 100% or more on 
>> > key metrics (compaction throughput, streaming throughput, memory 
>> > allocation rate,
>> etc
>> > ...). For example, if compaction was able to achieve 2 MiBps of
>> throughput
>> > before

CASSANDRA-19268: Improve Cassandra compression performance using hardware accelerators

2024-01-19 Thread Kokoori, Shylaja
Hi,
The latest processors have integrated hardware accelerators that can speed up 
operations like compression/decompression, crypto, and analytics. Here are 
some links to the details:
1) https://cdrdv2.intel.com/v1/dl/getContent/721858
2) 
https://www.intel.com/content/www/us/en/content-details/780887/intel-in-memory-analytics-accelerator-intel-iaa.html

We would like to add a new compressor which can accelerate compress/decompress 
when hardware is available and which will default to software otherwise.

Thanks,
Shylaja


RE: CASSANDRA-19268: Improve Cassandra compression performance using hardware accelerators

2024-01-22 Thread Kokoori, Shylaja
Dinesh & Abe,
Thank you very much for your feedback.

The algorithm used by this HW compressor is compatible with Deflate, but 
there is a constraint of a 4K window size. The concern, therefore, is that 
existing data may not decompress correctly as-is; that is why we chose the 
path of adding a new compressor.
Another reason is that there are some additional features available in the 
hardware which are not compatible with zlib; with this approach we could 
enable those features as well.

We are also planning to accelerate the existing compressors; if that is the 
preferred approach, we will try to come up with a solution to work around the 
4K window limitation.
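As a rough sketch of the hardware-with-software-fallback shape such a
compressor could take - a hypothetical outline, not the actual QPL
integration; the availability probe and hardware path are stand-ins - using
the JDK's Deflater/Inflater as the software path:

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FallbackDeflateCompressor {
    // Stand-in for a real accelerator availability probe (e.g. IAA detection).
    static boolean hardwareAvailable() { return false; }

    public static byte[] compress(byte[] input) {
        if (hardwareAvailable()) {
            // Hypothetical: dispatch to the accelerator (e.g. via qpl-java).
            throw new UnsupportedOperationException("hardware path not shown");
        }
        // Software fallback: plain JDK Deflate.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[Math.max(64, input.length * 2)];
        int n = deflater.deflate(buf);
        deflater.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    public static byte[] decompress(byte[] compressed, int originalLength)
            throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[originalLength];
        int n = inflater.inflate(out);
        inflater.end();
        if (n != originalLength) throw new DataFormatException("length mismatch");
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello hello hello hello".getBytes();
        byte[] packed = compress(data);
        System.out.println(new String(decompress(packed, data.length)));
    }
}
```

Because both paths emit Deflate-compatible streams (modulo the 4K window
constraint noted above), data written by one path could in principle be read
by the other.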

Thank you,
Shylaja

From: Dinesh Joshi 
Sent: Monday, January 22, 2024 11:18 AM
To: dev@cassandra.apache.org
Subject: Re: CASSANDRA-19268: Improve Cassandra compression performance using 
hardware accelerators

Shylaja,

Cassandra uses ZStd, LZ4 and other compression libraries via JNI to compress 
data. The intel hardware accelerator support is integrated into those libraries 
and we can benefit from it. If there are special parameters that need to be 
passed in to these libraries we can make those changes on the database but as 
such Cassandra does not directly implement the compression algorithms itself.

Dinesh


RE: CASSANDRA-19268: Improve Cassandra compression performance using hardware accelerators

2024-01-25 Thread Kokoori, Shylaja
Thank you Dinesh.

To answer your questions:

> 1. QPL Java Library[1] (JNI bindings to Intel's QPL) does not have any
> license information on the repo. This needs to be corrected. Please see
> the types of licenses we can use[2] for further information.

Will address this; thank you for pointing it out.

> 2. Can you describe how the compressor will behave when the cluster is
> made up of heterogeneous hardware? For example, let's say we have a mix
> of machines where some support Intel's IAA and some don't?

If the hardware is not available, all supported functionalities are executed 
by a software library on the CPU.

> 3. Does QPL have checksumming built in?

Yes, QPL does calculate checksums. Here is some more information:
https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_decompression.html#checksums
Will this work?
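For what it's worth, the Deflate-family checksums described there (CRC-32 and
Adler-32) can be reproduced on the Java side with JDK built-ins, so the
accelerator's checksum could be cross-checked against a software-computed one.
A small illustration (the class name is mine, not from QPL):

```java
import java.util.zip.Adler32;
import java.util.zip.CRC32;

public class ChecksumDemo {
    public static void main(String[] args) {
        byte[] data = "123456789".getBytes();

        // The standard CRC-32 check value for "123456789" is 0xCBF43926.
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        System.out.println(Long.toHexString(crc.getValue())); // prints cbf43926

        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        System.out.println(Long.toHexString(adler.getValue()));
    }
}
```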

Thanks,
Shylaja


From: Dinesh Joshi 
Sent: Tuesday, January 23, 2024 10:36 PM
To: dev@cassandra.apache.org
Subject: Re: CASSANDRA-19268: Improve Cassandra compression performance using 
hardware accelerators

Hi Shylaja,

If you'd like we can continue this on the ticket you opened. Here are my 
concerns -

1. QPL Java Library[1] (JNI bindings to Intel's QPL) does not have any license 
information on the repo. This needs to be corrected. Please see the types of 
licenses we can use[2] for further information.

2. Can you describe how the compressor will behave when the cluster is made up 
of heterogeneous hardware? For example, let's say we have a mix of machines 
where some support Intel's IAA and some don't?

3. Does QPL have checksumming built in?

thanks,

Dinesh

[1] https://github.com/intel/qpl-java
[2] https://www.apache.org/legal/resolved.html#category-a
