Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-12-15 Thread Jon Haddad
At a high level, I really like the idea of being able to better leverage
cheaper storage, especially object stores like S3.

One important thing though: I feel pretty strongly that there's a big,
deal-breaking downside.  Backups, disk failure policies, snapshots and
possibly repairs, none of which have been particularly great in the past,
would get more complicated.  There's also the issue of failure recovery
being only partially possible if you pair a durable block store with an
ephemeral one and some of your data is not replicated to the cold side.
That introduces a failure case that's unacceptable for most teams, and it
potentially means implementing two different backup solutions.  This is
operationally complex, with a lot of surface area for headaches.  I think a
lot of teams would have an issue with the big question mark around
durability, and I would probably avoid it myself.

On the other hand, I'm +1 if we approach it slightly differently, where
_all_ the data is located on the cold storage, with the local hot storage
used as a cache.  This means we can use the cold directories for the
complete dataset, simplifying backups and node replacements.
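
The "cold storage holds everything, hot storage is only a cache" idea can
be sketched roughly as below.  This is a minimal illustration, not code
from Cassandra or from the CEP: the class name ColdBackedCache and its
methods are hypothetical, and both tiers are assumed to be reachable as
ordinary java.nio.file paths.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical read-through cache: cold storage is the source of truth. */
final class ColdBackedCache {
    private final Path coldRoot; // the complete dataset lives here
    private final Path hotRoot;  // disposable local cache

    ColdBackedCache(Path coldRoot, Path hotRoot) {
        this.coldRoot = coldRoot;
        this.hotRoot = hotRoot;
    }

    /** Writes always land on cold storage, so losing the hot tier loses no data. */
    void write(String relativeName, byte[] contents) throws IOException {
        Path cold = coldRoot.resolve(relativeName);
        Files.createDirectories(cold.getParent());
        Files.write(cold, contents);
    }

    /** Reads are served from the hot tier, filling it from cold storage on a miss. */
    byte[] read(String relativeName) throws IOException {
        Path hot = hotRoot.resolve(relativeName);
        if (!Files.exists(hot)) {
            Files.createDirectories(hot.getParent());
            Files.copy(coldRoot.resolve(relativeName), hot,
                       StandardCopyOption.REPLACE_EXISTING);
        }
        return Files.readAllBytes(hot);
    }

    /** True if a local copy of the file is already present in the hot tier. */
    boolean isCached(String relativeName) {
        return Files.exists(hotRoot.resolve(relativeName));
    }
}
```

Because every file exists on the cold side, a backup or a node replacement
in this model only has to look at the cold directories; the hot directory
can be wiped at any time.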

For a little background, we had a ticket several years ago where I pointed
out it was possible to do this *today* at the operating system level as
long as you're using block devices (vs an object store) and LVM [1].  For
example, this works well with gp3 EBS volumes with low IOPS provisioning
plus local NVMe, giving a nice balance of great read performance without
going nuts on the cost for IOPS.  I also wrote about this in a little more
detail on my blog [2].  There's also the new Mountpoint for Amazon S3
client from AWS [3], which pretty much does what I've suggested above and
is probably worth evaluating just to get a feel for it.

I'm not insisting we require LVM or the AWS S3 fs, since that would rule
out other cloud providers, but I am pretty confident that the entire
dataset should reside in the "cold" side of things for the practical and
technical reasons I listed above.  I don't think it massively changes the
proposal, and should simplify things for everyone.

Jon

[1] https://issues.apache.org/jira/browse/CASSANDRA-8460
[2] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
[3] https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/




Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-12-14 Thread Claude Warren
Is there still interest in this?  Can we get some points down on electrons so 
that we all understand the issues?

While it is fairly simple to redirect reads and writes to something other
than the local system for a single node, this will not solve the problem
for tiered storage.

Tiered storage will require that, on each read/write, the primary key be
assessed to determine whether the read/write should be redirected.  My
reasoning is that in a cluster with a replication factor greater than 1, a
node stores data for the keys that would be allocated to it in a cluster
with a replication factor of 1, as well as some keys from nodes earlier in
the ring.
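
The replication argument above can be illustrated with a simplified model
in which replicas are placed on the next nodes clockwise around the token
ring.  The class RingSketch and its methods are hypothetical and only meant
to show the shape of the problem; they are not Cassandra's replication
strategy code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Simplified token ring: an illustration, not Cassandra's implementation. */
final class RingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** Walks clockwise from the key's token and collects up to rf distinct nodes. */
    List<String> replicasFor(long keyToken, int rf) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // Primary owner: first node whose token is >= the key's token, wrapping around.
        Long start = ring.ceilingKey(keyToken);
        if (start == null) {
            start = ring.firstKey();
        }
        Long token = start;
        do {
            String node = ring.get(token);
            if (!replicas.contains(node)) {
                replicas.add(node);
            }
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        } while (replicas.size() < rf && !token.equals(start));
        return replicas;
    }
}
```

With nodes A, B and C at tokens 100, 200 and 300, a key with token 150 is
owned by B alone at a replication factor of 1, but at a replication factor
of 2 it is also stored on C.  Any per-node redirect on B therefore does not
cover the copy held by C.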

Even if we can get the primary keys for all the data we want to write to
"cold storage" to map to a single node, a replication factor > 1 means that
the data will also be placed in "normal storage" on subsequent nodes.

To overcome this, we have to explore ways to route data to different
storage based on the keys, and that different storage may have to be
available on _all_ the nodes.

Have any of the partial solutions mentioned in this email chain (or others) 
solved this problem?

Claude


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-31 Thread Claude Warren, Jr via dev
@henrik,  Have you made any progress on this?  I would like to help drive
it forward but I am waiting to see what your code looks like and figure out
what I need to do.  Any update on timeline would be appreciated.



Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-23 Thread Jon Haddad
I think this is a great idea, and more generally useful than the two
scenarios you've outlined.  I think it could / should be possible to use an
object store as the primary storage for sstables and rely on local disk as
a cache for reads.

I don't know the roadmap for TCM, but imo if it allowed for more stable, 
pre-allocated ranges that compaction will always be aware of (plus a bunch of 
plumbing I'm deliberately avoiding the details on), then you could bootstrap a 
new node by copying s3 directories around rather than streaming data between 
nodes.  That's how we get to 20TB / node, easy scale up / down, etc, and 
always-ZCS for non-object store deployments.

Jon

On 2023/09/25 06:48:06 "Claude Warren, Jr via dev" wrote:
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of
> the standard storage space.
> 
> There are two desires driving this change:
>
>    1. The ability to temporarily move some keyspaces/tables to storage
>       outside the normal directory tree, on another disk, so that
>       compaction can occur in situations where there is not enough disk
>       space for compaction and the processing of the moved data cannot be
>       suspended.
>    2. The ability to store infrequently used data on slower, cheaper
>       storage layers.
> 
> I have a working POC implementation [2] though there are some issues still
> to be solved and much logging to be reduced.
> 
> I look forward to productive discussions,
> Claude
> 
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> 


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-19 Thread Claude Warren, Jr via dev

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-18 Thread guo Maxwell
If it is OK for Henrik to rebase the Astra implementation of this
functionality (FileSystemProvider) onto Cassandra trunk, then we can create
a Jira ticket to move this forward by a small step.


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-18 Thread Claude Warren, Jr via dev
Henrik and Guo,

Have you moved forward on this topic?  I have not seen anything recently.
I have posted a solution that intercepts calls for directories and injects
directories from different FileSystems.  This means that a node can have
keyspaces both on the local file system and one or more other FileSystem
implementations.

I look forward to hearing from you,
Claude



Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-18 Thread Claude Warren, Jr via dev
After a bit more analysis and some testing I have a new branch that I think
solves the problem. [1]  I have also created a pull request internal to my
clone so that it is easy to see the changes. [2]

The strategy change is to move the insertion of the proxy from the
Cassandra File class to the Directories class.  This means that all
interaction with the table is captured (this solves a problem encountered
in the earlier strategy).
The strategy is to create a path on a different FileSystem and return
that.  The example code only moves the data for the table to another
directory on the same FileSystem, but using a different FileSystem
implementation should be a trivial change.
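
A rough sketch of redirecting at the directory level is shown below.  It is
not the code from the branch: KeyspaceDirectoryResolver and its methods are
hypothetical names, and the only assumption is that the caller asks one
place for a table's directory and receives a java.nio.file.Path.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not the Directories API from the branch. */
final class KeyspaceDirectoryResolver {
    private final Path defaultDataDir;
    private final Map<String, Path> redirects = new HashMap<>();

    KeyspaceDirectoryResolver(Path defaultDataDir) {
        this.defaultDataDir = defaultDataDir;
    }

    /** Registers an alternate root for a keyspace; the Path may come from any FileSystem. */
    void redirect(String keyspace, Path alternateRoot) {
        redirects.put(keyspace, alternateRoot);
    }

    /** Returns the table directory, honouring a keyspace-level redirect if one exists. */
    Path tableDirectory(String keyspace, String table) {
        Path root = redirects.getOrDefault(keyspace, defaultDataDir);
        return root.resolve(keyspace).resolve(table);
    }
}
```

Since callers only ever see the returned Path, swapping the alternate root
for a Path obtained from another FileSystem implementation would not change
this class.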

The current code works on an entire keyspace.  While code exists to limit
the redirect to a single table, I have not tested that branch yet and am
not certain that it will work.  There is also some code (i.e. the
PathParser) that may no longer be needed but has not been removed yet.

Please take a look and let me know if you see any issues with this solution.

Claude

[1] https://github.com/Claudenw/cassandra/tree/FileSystemProxy
[2] https://github.com/Claudenw/cassandra/pull/5/files




Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-10 Thread Claude Warren, Jr via dev
I have been exploring adding a second Path to the Cassandra File object.
The original path is the path within the standard Cassandra directory
tree, and the second is a translated path used when there is what was
called a ChannelProxy in place.

A problem arises when Directories.getLocationForDisk() is called.  It
seems to look for locations that start with the data directory's absolute
path.  I can change it to look for the original path rather than the
translated path, but in other cases the translated path is the one that is
needed.
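
The prefix problem described above can be reproduced with plain
java.nio.file paths.  The sketch below is an assumption about the shape of
the lookup, not the body of Directories.getLocationForDisk(); the class
LocationLookupSketch and the example paths are hypothetical.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical prefix-based lookup of the data directory that contains a file. */
final class LocationLookupSketch {
    /** Returns the first configured data directory that is a prefix of the file path. */
    static Optional<Path> locationFor(Path file, List<Path> dataDirectories) {
        Path absolute = file.toAbsolutePath().normalize();
        return dataDirectories.stream()
                .filter(dir -> absolute.startsWith(dir.toAbsolutePath().normalize()))
                .findFirst();
    }
}
```

With a single data directory of /var/lib/cassandra/data, a file whose
original path is under that directory is found, while the same file's
translated path under, say, /mnt/remote is not.  Path.startsWith also
returns false when the two paths belong to different FileSystem instances,
so a translated path on another file system can never match.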

I notice that there is a concept of multiple file locations in the code
base, particularly in the Directories.DataDirectories class where there are
"locationsForNonSystemKeyspaces" and "locationsForSystemKeyspace" in the
constructor, and in the
DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() method
which returns an array of String and is populated from the cassandra.yaml
file.

The DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()  only
ever seems to return an array of one item.

Why does DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()
return an array?

Should the system set the path to the root of the ColumnFamilyStore in the
ColumnFamilyStore directories instance?
Should the Directories.getLocationForDisk() do the proxy to the other file
system?

Where is the proper location to change from the standard internal
representation to the remote location?



Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-28 Thread Claude Warren, Jr via dev
Sorry I was out sick and did not respond yesterday.

Henrik, how does your system work?  What is the design strategy?  Also, is
your code available somewhere?

After looking at the code some more, I think that the best solution is not
a FileChannelProxy but to modify the Cassandra File class to get a
FileSystem object from a Factory to build the Path that is used within that
object.  I think that this makes it a very small change that will pick up
90+% of the cases.  We then just need to find the edge cases.
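
As a rough illustration of building the Path through a FileSystem obtained
from a factory: FileSketch below is a hypothetical stand-in, not the
Cassandra File class, and the factory is modelled as a plain
Supplier<FileSystem>, which is an assumption made for brevity.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.util.function.Supplier;

/** Hypothetical stand-in for a File wrapper; not the Cassandra File class. */
final class FileSketch {
    private final Path path;

    /** Builds the Path through whichever FileSystem the factory supplies. */
    FileSketch(Supplier<FileSystem> fileSystemFactory, String first, String... more) {
        this.path = fileSystemFactory.get().getPath(first, more);
    }

    Path toPath() {
        return path;
    }

    /** Default factory: the local file system, which keeps existing behaviour. */
    static Supplier<FileSystem> localFactory() {
        return FileSystems::getDefault;
    }
}
```

Everything that later uses toPath() with java.nio.file.Files would then go
through the provider of the supplied FileSystem, which is why the change to
the wrapper itself can stay small.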





On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> Super excited about this as well. Happy to help test with Azure and any
> other way needed.
>
> Thanks,
> German
> --
> *From:* guo Maxwell 
> *Sent:* Wednesday, September 27, 2023 7:38 PM
> *To:* dev@cassandra.apache.org 
> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable ChannelProxy
> to alias external storage locations
>
> Thanks , So I think a jira can be created now. And I'd be happy to provide
> some help with this as well if needed.
>
> Henrik Ingo  于2023年9月28日周四 00:21写道:
>
> It seems I was volunteered to rebase the Astra implementation of this
> functionality (FileSystemProvider) onto Cassandra trunk. (And publish it,
> of course) I'll try to get going today or tomorrow, so that this
> discussion can then benefit from having that code available for inspection.
> And potentially using it as a solution to this use case.
>
> On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani  wrote:
>
> We (DataStax) have a FileSystemProvider for Astra we can provide.
> Works with S3/GCS/Azure.
>
> I'll ask someone on our end to make it accessible.
>
> This would work by having a bucket prefix per node. But there are lots
> of details needed to support things like out of bound compaction
> (mentioned in CEP).
>
> Jake
>
> On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
> >
> > I agree with Ariel, the more suitable insertion point is probably the
> JDK level FileSystemProvider and FileSystem abstraction.
> >
> > It might also be that we can reuse existing work here in some cases?
> >
> > On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
> >
> > 
> > Hi,
> >
> > Support for multiple storage backends including remote storage backends
> is a pretty high value piece of functionality. I am happy to see there is
> interest in that.
> >
> > I think that `ChannelProxyFactory` as an integration point is going to
> quickly turn into a dead end as we get into really using multiple storage
> backends. We need to be able to list files and really the full range of
> filesystem interactions that Java supports should work with any backend to
> make development, testing, and using existing code straightforward.
> >
> > It's a little more work to get C* to create paths for alternate
> backends where appropriate, but that work is probably necessary even with
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
> filesystems). There will probably also be backend-specific behaviors that
> show up above the `ChannelProxy` layer that will depend on the backend.
> >
> > Ideally there would be some config to specify several backend
> filesystems and their individual configuration that can be used, as well as
> configuration and support for a "backend file router" for file creation
> (and opening) that can be used to route files to the backend most
> appropriate.
> >
> > Regards,
> > Ariel
> >
> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> >
> > I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
> >
> > There are two desires  driving this change:
> >
> > The ability to temporarily move some keyspaces/tables to storage outside
> the normal directory tree to other disk so that compaction can occur in
> situations where there is not enough disk space for compaction and the
> processing to the moved data can not be suspended.
> > The ability to store infrequently used data on slower cheaper storage
> layers.
> >
> > I have a working POC implementation [2] though there are some issues
> still to be solved and much logging to be reduced.
> >
> > I look forward to productive discussions,
> > Claude
> >
> > [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> >
> >
> >
>
>
> --
> http://twitter.com/tjake
>
>
>
> --
>
> Henrik Ingo
>
> c. +358 40 569 7354
>
> w. www.datastax.com
>
> <https://www.facebook.com/datastax>  <https://twitter.com/datastax>
> <https://www.linkedin.com/company/datastax/>
> <https://github.com/datastax/>
>
>
>
> --
> you are the apple of my eye !
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-28 Thread German Eichberger via dev
Super excited about this as well. Happy to help test with Azure and any other 
way needed.

Thanks,
German

From: guo Maxwell 
Sent: Wednesday, September 27, 2023 7:38 PM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias 
external storage locations

Thanks. So I think a Jira ticket can be created now. And I'd be happy to provide some 
help with this as well if needed.

On Thu, Sep 28, 2023 at 00:21, Henrik Ingo <henrik.i...@datastax.com> wrote:
It seems I was volunteered to rebase the Astra implementation of this 
functionality (FileSystemProvider) onto Cassandra trunk. (And publish it, of 
course) I'll try to get going today or tomorrow, so that this  discussion can 
then benefit from having that code available for inspection. And potentially 
using it as a solution to this use case.

On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani <jak...@gmail.com> wrote:
We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict <bened...@apache.org> wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK 
> level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a 
> pretty high value piece of functionality. I am happy to see there is interest 
> in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to 
> quickly turn into a dead end as we get into really using multiple storage 
> backends. We need to be able to list files and really the full range of 
> filesystem interactions that Java supports should work with any backend to 
> make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to create paths for alternate backends 
> where appropriate, but that work is probably necessary even with 
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
> filesystems). There will probably also be backend-specific behaviors that show 
> up above the `ChannelProxy` layer that will depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and 
> their individual configuration that can be used, as well as configuration and 
> support for a "backend file router" for file creation (and opening) that can 
> be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.
>
> There are two desires  driving this change:
>
> The ability to temporarily move some keyspaces/tables to storage outside the 
> normal directory tree to other disk so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing to the moved data can not be suspended.
> The ability to store infrequently used data on slower cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


--
http://twitter.com/tjake


--

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-27 Thread Jeff Jirsa
Claude if you’re still at POC phase does it make sense for you to perhaps help validate / qualify the work that Henrik seems willing to rebase rather than reinventing the wheel?

On Sep 26, 2023, at 11:23 PM, Claude Warren, Jr via dev wrote:

I spent a little (very little) time building an S3 implementation using an Apache licensed S3 filesystem package.  I have not yet tested it but if anyone is interested it is at https://github.com/Aiven-Labs/S3-Cassandra-ChannelProxy

In looking at some of the code I think the Cassandra File class needs to be modified to ask the ChannelProxy for the default file system for the file in question.  This should resolve some of the issues my original demo has with some files being created in the data tree.  It may also handle many of the cases for offline tools as well.

On Tue, Sep 26, 2023 at 7:33 PM Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:

Would it be possible to make Jimfs integration production-ready then? I see we are using it in the tests already.

It might be one of the reference implementations of this CEP. If there is a type of workload / type of nodes with plenty of RAM but no disk, some kind of compute nodes, it would just hold it all in memory and we might "flush" it to a cloud-based storage if rendered to be not necessary anymore (whatever that means).

We could then completely bypass the memtables, as fetching data from an SSTable in memory would be roughly the same?

On the other hand, that might be achieved by creating a ramdisk so I am not sure what exactly we would gain here. However, if it was eventually storing these SSTables in a cloud storage, we might "compact" "TWCS tables" automatically after so-and-so period by moving them there.


From: Jake Luciani <jak...@gmail.com>
Sent: Tuesday, September 26, 2023 19:03
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict <bened...@apache.org> wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a pretty high value piece of functionality. I am happy to see there is interest in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to quickly turn into a dead end as we get into really using multiple storage backends. We need to be able to list files and really the full range of filesystem interactions that Java supports should work with any backend to make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to create paths for alternate backends where appropriate, but that work is probably necessary even with `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple filesystems). There will probably also be backend-specific behaviors that show up above the `ChannelProxy` layer that will depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and their individual configuration that can be used, as well as configuration and support for a "backend file router" for file creation (and opening) that can be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of the standard storage space.
>
> There are two desires  driving this change:
>
> The ability to temporarily move some keyspaces/tables to storage outside the normal directory tree to other disk so that compaction can occur in situations where there is not enough disk space for compaction and the processing to the moved data can not be suspended.
> The ability to store infrequently used data on slower cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] https://cwiki.apache.org/conflue

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-27 Thread guo Maxwell
Thanks. So I think a Jira ticket can be created now. And I'd be happy to provide
some help with this as well if needed.

On Thu, Sep 28, 2023 at 00:21, Henrik Ingo wrote:

> It seems I was volunteered to rebase the Astra implementation of this
> functionality (FileSystemProvider) onto Cassandra trunk. (And publish it,
> of course) I'll try to get going today or tomorrow, so that this
> discussion can then benefit from having that code available for inspection.
> And potentially using it as a solution to this use case.
>
> On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani  wrote:
>
>> We (DataStax) have a FileSystemProvider for Astra we can provide.
>> Works with S3/GCS/Azure.
>>
>> I'll ask someone on our end to make it accessible.
>>
>> This would work by having a bucket prefix per node. But there are lots
>> of details needed to support things like out of bound compaction
>> (mentioned in CEP).
>>
>> Jake
>>
>> On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
>> >
>> > I agree with Ariel, the more suitable insertion point is probably the
>> JDK level FileSystemProvider and FileSystem abstraction.
>> >
>> > It might also be that we can reuse existing work here in some cases?
>> >
>> > On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
>> >
>> > 
>> > Hi,
>> >
>> > Support for multiple storage backends including remote storage backends
>> is a pretty high value piece of functionality. I am happy to see there is
>> interest in that.
>> >
>> > I think that `ChannelProxyFactory` as an integration point is going to
>> quickly turn into a dead end as we get into really using multiple storage
>> backends. We need to be able to list files and really the full range of
>> filesystem interactions that Java supports should work with any backend to
>> make development, testing, and using existing code straightforward.
>> >
>> > It's a little more work to get C* to create paths for alternate
>> backends where appropriate, but that work is probably necessary even with
>> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
>> filesystems). There will probably also be backend-specific behaviors that
>> show up above the `ChannelProxy` layer that will depend on the backend.
>> >
>> > Ideally there would be some config to specify several backend
>> filesystems and their individual configuration that can be used, as well as
>> configuration and support for a "backend file router" for file creation
>> (and opening) that can be used to route files to the backend most
>> appropriate.
>> >
>> > Regards,
>> > Ariel
>> >
>> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>> >
>> > I have just filed CEP-36 [1] to allow for keyspace/table storage
>> outside of the standard storage space.
>> >
>> > There are two desires  driving this change:
>> >
>> > The ability to temporarily move some keyspaces/tables to storage
>> outside the normal directory tree to other disk so that compaction can
>> occur in situations where there is not enough disk space for compaction and
>> the processing to the moved data can not be suspended.
>> > The ability to store infrequently used data on slower cheaper storage
>> layers.
>> >
>> > I have a working POC implementation [2] though there are some issues
>> still to be solved and much logging to be reduced.
>> >
>> > I look forward to productive discussions,
>> > Claude
>> >
>> > [1]
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>> >
>> >
>> >
>>
>>
>> --
>> http://twitter.com/tjake
>>
>
>
> --
>
> Henrik Ingo
>
> c. +358 40 569 7354
>
> w. www.datastax.com
>
>   
> 
> 
>
>

-- 
you are the apple of my eye !


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-27 Thread Henrik Ingo
It seems I was volunteered to rebase the Astra implementation of this
functionality (FileSystemProvider) onto Cassandra trunk (and publish it,
of course). I'll try to get going today or tomorrow, so that this
discussion can then benefit from having that code available for inspection,
and potentially use it as a solution to this use case.

On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani  wrote:

> We (DataStax) have a FileSystemProvider for Astra we can provide.
> Works with S3/GCS/Azure.
>
> I'll ask someone on our end to make it accessible.
>
> This would work by having a bucket prefix per node. But there are lots
> of details needed to support things like out of bound compaction
> (mentioned in CEP).
>
> Jake
>
> On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
> >
> > I agree with Ariel, the more suitable insertion point is probably the
> JDK level FileSystemProvider and FileSystem abstraction.
> >
> > It might also be that we can reuse existing work here in some cases?
> >
> > On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
> >
> > 
> > Hi,
> >
> > Support for multiple storage backends including remote storage backends
> is a pretty high value piece of functionality. I am happy to see there is
> interest in that.
> >
> > I think that `ChannelProxyFactory` as an integration point is going to
> quickly turn into a dead end as we get into really using multiple storage
> backends. We need to be able to list files and really the full range of
> filesystem interactions that Java supports should work with any backend to
> make development, testing, and using existing code straightforward.
> >
> > It's a little more work to get C* to create paths for alternate
> backends where appropriate, but that work is probably necessary even with
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
> filesystems). There will probably also be backend-specific behaviors that
> show up above the `ChannelProxy` layer that will depend on the backend.
> >
> > Ideally there would be some config to specify several backend
> filesystems and their individual configuration that can be used, as well as
> configuration and support for a "backend file router" for file creation
> (and opening) that can be used to route files to the backend most
> appropriate.
> >
> > Regards,
> > Ariel
> >
> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> >
> > I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
> >
> > There are two desires  driving this change:
> >
> > The ability to temporarily move some keyspaces/tables to storage outside
> the normal directory tree to other disk so that compaction can occur in
> situations where there is not enough disk space for compaction and the
> processing to the moved data can not be suspended.
> > The ability to store infrequently used data on slower cheaper storage
> layers.
> >
> > I have a working POC implementation [2] though there are some issues
> still to be solved and much logging to be reduced.
> >
> > I look forward to productive discussions,
> > Claude
> >
> > [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> >
> >
> >
>
>
> --
> http://twitter.com/tjake
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

  
  


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Claude Warren, Jr via dev
I spent a little (very little) time building an S3 implementation using an
Apache-licensed S3 filesystem package.  I have not yet tested it, but if
anyone is interested it is at
https://github.com/Aiven-Labs/S3-Cassandra-ChannelProxy

In looking at some of the code I think the Cassandra File class needs to be
modified to ask the ChannelProxy for the default file system for the file
in question.  This should resolve some of the issues my original demo has
with some files being created in the data tree.  It may also handle many of
the cases for offline tools as well.


On Tue, Sep 26, 2023 at 7:33 PM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Would it be possible to make Jimfs integration production-ready then? I
> see we are using it in the tests already.
>
> It might be one of the reference implementations of this CEP. If there is
> a type of workload / type of nodes with plenty of RAM but no disk, some
> kind of compute nodes, it would just hold it all in memory and we might
> "flush" it to a cloud-based storage if rendered to be not necessary anymore
> (whatever that means).
>
> We could then completely bypass the memtables, as fetching data from an
> SSTable in memory would be roughly the same?
>
> On the other hand, that might be achieved by creating a ramdisk so I am
> not sure what exactly we would gain here. However, if it was eventually
> storing these SSTables in a cloud storage, we might "compact" "TWCS tables"
> automatically after so-and-so period by moving them there.
>
> 
> From: Jake Luciani 
> Sent: Tuesday, September 26, 2023 19:03
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias
> external storage locations
>
> We (DataStax) have a FileSystemProvider for Astra we can provide.
> Works with S3/GCS/Azure.
>
> I'll ask someone on our end to make it accessible.
>
> This would work by having a bucket prefix per node. But there are lots
> of details needed to support things like out of bound compaction
> (mentioned in CEP).
>
> Jake
>
> On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
> >
> > I agree with Ariel, the more suitable insertion point is probably the
> JDK level FileSystemProvider and FileSystem abstraction.
> >
> > It might also be that we can reuse existing work here in some cases?
> >
> > On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
> >
> > 
> > Hi,
> >
> > Support for multiple storage backends including remote storage backends
> is a pretty high value piece of functionality. I am happy to see there is
> interest in that.
> >
> > I think that `ChannelProxyFactory` as an integration point is going to
> quickly turn into a dead end as we get into really using multiple storage
> backends. We need to be able to list files and really the full range of
> filesystem interactions that Java supports should work with any backend to
> make development, testing, and using existing code straightforward.
> >
> > It's a little more work to get C* to create paths for alternate
> backends where appropriate, but that work is probably necessary even with
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
> filesystems). There will probably also be backend-specific behaviors that
> show up above the `ChannelProxy` layer that will depend on the backend.
> >
> > Ideally there would be some config to specify several backend
> filesystems and their individual configuration that can be used, as well as
> configuration and support for a "backend file router" for file creation
> (and opening) that can be used to route files to the backend most
> appropriate.
> >
> > Regards,
> > Ariel
> >
> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> >
> > I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
> >
> > There are two desires  driving this change:
> >
> > The ability to temporarily move some keyspaces/tables to storage outside
> the normal directory tree to other disk so that compaction can occur in
> situations where there is not enough disk space for compaction and the
> processing to the moved data can not be suspended.
> > The ability to store infrequently used data on slower cheaper storage
> layers.
> >
> > I have a working POC implementation [2] though there are some issues
> still to be solved and much logging to be reduced.
> >
> > I look forward to productive discussions,
> > Claude
> >
> > [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> >
> >
> >
>
>
> --
> http://twitter.com/tjake
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Miklosovic, Stefan
Would it be possible to make Jimfs integration production-ready then? I see we 
are using it in the tests already.

It might be one of the reference implementations of this CEP. If there is a 
type of workload / type of nodes with plenty of RAM but no disk, some kind of 
compute nodes, it would just hold it all in memory and we might "flush" it to 
cloud-based storage once it is deemed no longer necessary (whatever that 
means).

We could then completely bypass the memtables, as fetching data from an SSTable 
in memory would be roughly the same?

On the other hand, that might be achieved by creating a ramdisk so I am not 
sure what exactly we would gain here. However, if it was eventually storing 
these SSTables in cloud storage, we might "compact" "TWCS tables" 
automatically after so-and-so period by moving them there.
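
As a rough illustration of the "move them there after so-and-so period" idea,
the sketch below moves files older than a cutoff from a hot directory to a
cold one. `ColdTierMover` and `moveOlderThan` are hypothetical names, and the
sketch deliberately ignores that a real SSTable is a set of component files
that Cassandra tracks together; it only shows that `Files.move` does not care
which FileSystem the target Path belongs to.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: ages files out of a hot directory into a cold one. */
final class ColdTierMover {

    /** Moves regular files last modified before (now - maxAge) and returns their new locations. */
    static List<Path> moveOlderThan(Path hotDir, Path coldDir, Duration maxAge, Instant now)
            throws IOException {
        Instant cutoff = now.minus(maxAge);

        // Collect candidates first so the directory stream is closed before files are moved.
        List<Path> candidates;
        try (Stream<Path> entries = Files.list(hotDir)) {
            candidates = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        Files.createDirectories(coldDir);
        List<Path> moved = new ArrayList<>();
        for (Path file : candidates) {
            if (Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)) {
                // Resolve by name so the target is built on coldDir's FileSystem.
                Path target = coldDir.resolve(file.getFileName().toString());
                // Files.move falls back to copy-then-delete when the providers differ.
                Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
                moved.add(target);
            }
        }
        return moved;
    }
}
```

With a cloud-backed FileSystemProvider, `coldDir` would simply be a Path
obtained from that provider; nothing in the method itself would change.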


From: Jake Luciani 
Sent: Tuesday, September 26, 2023 19:03
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external 
storage locations

We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK 
> level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a 
> pretty high value piece of functionality. I am happy to see there is interest 
> in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to 
> quickly turn into a dead end as we get into really using multiple storage 
> backends. We need to be able to list files and really the full range of 
> filesystem interactions that Java supports should work with any backend to 
> make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to create paths for alternate backends 
> where appropriate, but that work is probably necessary even with 
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
> filesystems). There will probably also be backend-specific behaviors that show 
> up above the `ChannelProxy` layer that will depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and 
> their individual configuration that can be used, as well as configuration and 
> support for a "backend file router" for file creation (and opening) that can 
> be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.
>
> There are two desires  driving this change:
>
> The ability to temporarily move some keyspaces/tables to storage outside the 
> normal directory tree to other disk so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing to the moved data can not be suspended.
> The ability to store infrequently used data on slower cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


--
http://twitter.com/tjake


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Jake Luciani
We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).
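
For readers unfamiliar with the "bucket prefix per node" layout, one way to
picture it is below. This is only an assumed illustration, not the Astra
implementation: the `nodes/<host-id>/` key scheme and the `NodeObjectKeys`
class are invented for this sketch.

```java
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical mapping from a node-local file to an object key under a per-node prefix. */
final class NodeObjectKeys {
    private final String nodePrefix;
    private final Path dataRoot;

    NodeObjectKeys(UUID hostId, Path dataRoot) {
        // Assumed layout: every node writes under its own prefix in a shared bucket.
        this.nodePrefix = "nodes/" + hostId + "/";
        this.dataRoot = dataRoot.toAbsolutePath().normalize();
    }

    /** Returns the object key for a file that lives under the node's data root. */
    String keyFor(Path localFile) {
        Path absolute = localFile.toAbsolutePath().normalize();
        if (!absolute.startsWith(dataRoot)) {
            throw new IllegalArgumentException(localFile + " is not under " + dataRoot);
        }
        Path relative = dataRoot.relativize(absolute);
        // Join name elements with '/' so keys look the same on every operating system.
        StringBuilder key = new StringBuilder(nodePrefix);
        for (int i = 0; i < relative.getNameCount(); i++) {
            if (i > 0) {
                key.append('/');
            }
            key.append(relative.getName(i));
        }
        return key.toString();
    }
}
```

Because each node only ever lists and writes under its own prefix, nodes
cannot collide in the bucket; anything that has to read another node's files
would need to know that node's prefix as well.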

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK 
> level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a 
> pretty high value piece of functionality. I am happy to see there is interest 
> in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to 
> quickly turn into a dead end as we get into really using multiple storage 
> backends. We need to be able to list files and really the full range of 
> filesystem interactions that Java supports should work with any backend to 
> make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to create paths for alternate backends 
> where appropriate, but that work is probably necessary even with 
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
> filesystems). There will probably also be backend-specific behaviors that show 
> up above the `ChannelProxy` layer that will depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and 
> their individual configuration that can be used, as well as configuration and 
> support for a "backend file router" for file creation (and opening) that can 
> be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.
>
> There are two desires  driving this change:
>
> The ability to temporarily move some keyspaces/tables to storage outside the 
> normal directory tree to other disk so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing to the moved data can not be suspended.
> The ability to store infrequently used data on slower cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


-- 
http://twitter.com/tjake


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Benedict
I agree with Ariel, the more suitable insertion point is probably the JDK level 
FileSystemProvider and FileSystem abstraction.

It might also be that we can reuse existing work here in some cases?

> On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
> 
> 
> Hi,
> 
> Support for multiple storage backends including remote storage backends is a 
> pretty high value piece of functionality. I am happy to see there is interest 
> in that.
> 
> I think that `ChannelProxyFactory` as an integration point is going to 
> quickly turn into a dead end as we get into really using multiple storage 
> backends. We need to be able to list files and really the full range of 
> filesystem interactions that Java supports should work with any backend to 
> make development, testing, and using existing code straightforward.
> 
> It's a little more work to get C* to create paths for alternate backends
> where appropriate, but that work is probably necessary even with
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
> Filesystems). There will probably also be backend specific behaviors that show
> up above the `ChannelProxy` layer that will depend on the backend.
> 
> Ideally there would be some config to specify several backend filesystems and 
> their individual configuration that can be used, as well as configuration and 
> support for a "backend file router" for file creation (and opening) that can 
> be used to route files to the backend most appropriate.
> 
> Regards,
> Ariel
> 
>> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
>> the standard storage space.  
>> 
>> There are two desires  driving this change:
>> The ability to temporarily move some keyspaces/tables to storage outside the 
>> normal directory tree to other disk so that compaction can occur in 
>> situations where there is not enough disk space for compaction and the 
>> processing to the moved data can not be suspended.
>> The ability to store infrequently used data on slower cheaper storage layers.
>> I have a working POC implementation [2] though there are some issues still 
>> to be solved and much logging to be reduced.
>> 
>> I look forward to productive discussions,
>> Claude
>> 
>> [1] 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory 
>> 
>> 
> 


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Ariel Weisberg
Hi,

Support for multiple storage backends including remote storage backends is a 
pretty high value piece of functionality. I am happy to see there is interest 
in that.

I think that `ChannelProxyFactory` as an integration point is going to quickly 
turn into a dead end as we get into really using multiple storage backends. We 
need to be able to list files and really the full range of filesystem 
interactions that Java supports should work with any backend to make 
development, testing, and using existing code straightforward.

It's a little more work to get C* to create paths for alternate backends where
appropriate, but that work is probably necessary even with
`ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
Filesystems). There will probably also be backend specific behaviors that show
up above the `ChannelProxy` layer that will depend on the backend.

Ideally there would be some config to specify several backend filesystems and 
their individual configuration that can be used, as well as configuration and 
support for a "backend file router" for file creation (and opening) that can be 
used to route files to the backend most appropriate.
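
As a rough illustration of the "backend file router" idea above, the sketch below picks a backend for a data file from per-keyspace and per-table rules. It is written in Python only for brevity. The `Backend` and `BackendRouter` names, the URIs and the rule format are all hypothetical and do not correspond to any existing Cassandra configuration or API. The only assumption about Cassandra itself is the usual `<data_dir>/<keyspace>/<table>-<id>/<file>` directory layout.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Backend:
    """A hypothetical storage backend: a name plus the root URI files live under."""
    name: str
    root_uri: str


class BackendRouter:
    """Chooses a backend for a file path using (keyspace, table) rules."""

    def __init__(self, data_dir, default, rules):
        self.data_dir = PurePosixPath(data_dir)
        self.default = default
        # rules maps (keyspace, table) to a Backend; table=None covers a whole keyspace.
        self.rules = dict(rules)

    def route(self, path):
        """Return the backend that should hold `path` (must be under data_dir)."""
        rel = PurePosixPath(path).relative_to(self.data_dir)
        if len(rel.parts) < 3:
            return self.default
        keyspace = rel.parts[0]
        # Table directories are named <table>-<id>; strip the id suffix.
        table = rel.parts[1].rsplit("-", 1)[0]
        for key in ((keyspace, table), (keyspace, None)):
            if key in self.rules:
                return self.rules[key]
        return self.default

    def resolve(self, path):
        """Return the URI of `path` on the backend chosen by route()."""
        rel = PurePosixPath(path).relative_to(self.data_dir)
        return f"{self.route(path).root_uri.rstrip('/')}/{rel}"
```

A table-specific rule wins over a keyspace-wide rule, and anything unmatched stays on the default backend, so existing deployments would be unaffected until rules are added.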

Regards,
Ariel

On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.  
> 
> There are two desires  driving this change:
>  1. The ability to temporarily move some keyspaces/tables to storage outside 
> the normal directory tree to other disk so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing to the moved data can not be suspended.
>  2. The ability to store infrequently used data on slower cheaper storage 
> layers.
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
> 
> I look forward to productive discussions,
> Claude
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory 
> 
> 


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread guo Maxwell
Yeah, there are so many things to do, as Cassandra (shared-nothing) is
different from some other systems like HBase. So I think we can break the
final goal into multiple steps; the first is what Claude proposed. But I
suggest that the design make the interface more extensible, and that we
consider the implementation of cloud storage, so that someone can extend
the interface in the future.

On Tue, Sep 26, 2023 at 18:40, Josh McKenzie wrote:

> it may be better to support most cloud storage
> It simply only supports S3, which feels a bit customized for a certain
> user and is not universal enough.Am I right ?
>
> I agree w/the eventual goal (and constraint on design now) of supporting
> most popular cloud storage vendors, but if we have someone with an itch to
> scratch and at the end of that we end up with first steps in a compatible
> direction to ultimately supporting decoupled / abstracted storage systems,
> that's fantastic.
>
> To Jeff's point - so long as we can think about and chart a general path
> of where we want to go, if Claude has the time and inclination to handle
> abstracting out the API in that direction and one implementation, that's
> fantastic IMO.
>
> I know there's some other folks out there who've done some interception /
> refactoring of the FileChannel stuff to support disaggregated storage;
> curious what their experiences were like.
>
>
> On Tue, Sep 26, 2023, at 4:20 AM, Claude Warren, Jr via dev wrote:
>
> The intention of the CEP is to lay the groundwork to allow development of
> ChannelProxyFactories that are pluggable in Cassandra.  In this way any
> storage system can be a candidate for Cassandra storage provided
> FileChannels can be created for the system.
>
> As I stated before I think that there may be a need for a
> java.nio.FileSystem implementation for  the proxies but I have not had the
> time to dig into it yet.
>
> Claude
>
>
> On Tue, Sep 26, 2023 at 9:01 AM guo Maxwell  wrote:
>
> In my mind , it may be better to support most cloud storage : aws,
> azure,gcp,aliyun and so on . We may make it a plugable. But in that way, it
> seems there may need a filesystem interface layer for object storage. And
> should we support ,distributed system like hdfs ,or something else. We
> should first discuss what should be done and what should not be done. It
> simply only supports S3, which feels a bit customized for a certain user
> and is not universal enough.Am I right ?
>
> On Tue, Sep 26, 2023 at 14:36, Claude Warren, Jr wrote:
>
> My intention is to develop an S3 storage system using
> https://github.com/carlspring/s3fs-nio
>
> There are several issues yet to be solved:
>
>1. There are some internal calls that create files in the table
>directory that do not use the channel proxy.  I believe that these are
>making calls on File objects.  I think those File objects are Cassandra
>File objects not Java I/O File objects, but am unsure.
>2. Determine if the carlspring s3fs-nio library will be performant
>enough to work in the long run.  There may be issues with it:
>1. Downloading entire files before using them rather than using views
>   into larger remotely stored files.
>   2. Requiring a complete file to upload rather than using the
>   partial upload capability of the S3 interface.
>
>
>
> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:
>
> "Rather than building this piece by piece, I think it'd be awesome if
> someone drew up an end-to-end plan to implement tiered storage, so we can
> make sure we're discussing the whole final state, and not an implementation
> detail of one part of the final state?"
>
> Do agree with jeff for this ~~~ If these feature can be supported in oss
> cassandra , I think it will be very popular, whether in  a private
> deployment environment or a public cloud service (our experience can prove
> it). In addition, it is also a cost-cutting option for users too
>
> On Tue, Sep 26, 2023 at 00:11, Jeff Jirsa wrote:
>
>
> - I think this is a great step forward.
> - Being able to move sstables around between tiers of storage is a feature
> Cassandra desperately needs, especially if one of those tiers is some sort
> of object storage
> - This looks like it's a foundational piece that enables that. Perhaps by
> a team that's already implemented this end to end?
> - Rather than building this piece by piece, I think it'd be awesome if
> someone drew up an end-to-end plan to implement tiered storage, so we can
> make sure we're discussing the whole final state, and not an implementation
> detail of one part of the final state?
>
>
>
>
>
>
> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
>
> There are two desires  driving this change:
>
>1. The ability to temporarily move some keyspaces/tables to storage
>outside the normal directory tree to other disk so that compaction can
>occur in situati

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Josh McKenzie
> it may be better to support most cloud storage
> It simply only supports S3, which feels a bit customized for a certain user 
> and is not universal enough.Am I right ?
I agree w/the eventual goal (and constraint on design now) of supporting most 
popular cloud storage vendors, but if we have someone with an itch to scratch 
and at the end of that we end up with first steps in a compatible direction to 
ultimately supporting decoupled / abstracted storage systems, that's fantastic.

To Jeff's point - so long as we can think about and chart a general path of 
where we want to go, if Claude has the time and inclination to handle 
abstracting out the API in that direction and one implementation, that's 
fantastic IMO.

I know there's some other folks out there who've done some interception / 
refactoring of the FileChannel stuff to support disaggregated storage; curious 
what their experiences were like.


On Tue, Sep 26, 2023, at 4:20 AM, Claude Warren, Jr via dev wrote:
> The intention of the CEP is to lay the groundwork to allow development of 
> ChannelProxyFactories that are pluggable in Cassandra.  In this way any 
> storage system can be a candidate for Cassandra storage provided FileChannels 
> can be created for the system. 
> 
> As I stated before I think that there may be a need for a java.nio.FileSystem 
> implementation for  the proxies but I have not had the time to dig into it 
> yet.
> 
> Claude
> 
> 
> On Tue, Sep 26, 2023 at 9:01 AM guo Maxwell  wrote:
>> In my mind , it may be better to support most cloud storage : aws, 
>> azure,gcp,aliyun and so on . We may make it a plugable. But in that way, it 
>> seems there may need a filesystem interface layer for object storage. And 
>> should we support ,distributed system like hdfs ,or something else. We 
>> should first discuss what should be done and what should not be done. It 
>> simply only supports S3, which feels a bit customized for a certain user and 
>> is not universal enough.Am I right ?
>> 
>> On Tue, Sep 26, 2023 at 14:36, Claude Warren, Jr wrote:
>>> My intention is to develop an S3 storage system using  
>>> https://github.com/carlspring/s3fs-nio 
>>> 
>>> There are several issues yet to be solved:
>>>  1. There are some internal calls that create files in the table directory 
>>> that do not use the channel proxy.  I believe that these are making calls 
>>> on File objects.  I think those File objects are Cassandra File objects not 
>>> Java I/O File objects, but am unsure.
>>>  2. Determine if the carlspring s3fs-nio library will be performant enough 
>>> to work in the long run.  There may be issues with it:
>>>1. Downloading entire files before using them rather than using views 
>>> into larger remotely stored files.
>>>2. Requiring a complete file to upload rather than using the partial 
>>> upload capability of the S3 interface.
>>> 
>>> 
>>> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:
>>>> "Rather than building this piece by piece, I think it'd be awesome if
>>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>>> make sure we're discussing the whole final state, and not an
>>>> implementation detail of one part of the final state?"
>>>>
>>>> Do agree with jeff for this ~~~ If these feature can be supported in oss
>>>> cassandra , I think it will be very popular, whether in  a private
>>>> deployment environment or a public cloud service (our experience can prove
>>>> it). In addition, it is also a cost-cutting option for users too
>>>>
>>>> On Tue, Sep 26, 2023 at 00:11, Jeff Jirsa wrote:
>>>>>
>>>>> - I think this is a great step forward.
>>>>> - Being able to move sstables around between tiers of storage is a
>>>>> feature Cassandra desperately needs, especially if one of those tiers is
>>>>> some sort of object storage
>>>>> - This looks like it's a foundational piece that enables that. Perhaps by
>>>>> a team that's already implemented this end to end?
>>>>> - Rather than building this piece by piece, I think it'd be awesome if
>>>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>>>> make sure we're discussing the whole final state, and not an
>>>>> implementation detail of one part of the final state?
>>>>>
>>>>> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev wrote:
>>>>>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
>>>>>> of the standard storage space.
>>>>>>
>>>>>> There are two desires  driving this change:
>>>>>>  1. The ability to temporarily move some keyspaces/tables to storage
>>>>>> outside the normal directory tree to other disk so that compaction can
>>>>>> occur in situations where there is not enough disk space for compaction
>>>>>> and the processing to the moved data can not be suspended.
>>>>>>  2. The ability to store infrequently used data on slower cheaper
>>>>>> storage layers.
>>>>>> I have a working POC implementation [2] though there are som

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Claude Warren, Jr via dev
The intention of the CEP is to lay the groundwork to allow development of
ChannelProxyFactories that are pluggable in Cassandra.  In this way any
storage system can be a candidate for Cassandra storage provided
FileChannels can be created for the system.

As I stated before I think that there may be a need for a
java.nio.FileSystem implementation for  the proxies but I have not had the
time to dig into it yet.

Claude


On Tue, Sep 26, 2023 at 9:01 AM guo Maxwell  wrote:

> In my mind , it may be better to support most cloud storage : aws,
> azure,gcp,aliyun and so on . We may make it a plugable. But in that way, it
> seems there may need a filesystem interface layer for object storage. And
> should we support ,distributed system like hdfs ,or something else. We
> should first discuss what should be done and what should not be done. It
> simply only supports S3, which feels a bit customized for a certain user
> and is not universal enough.Am I right ?
>
> On Tue, Sep 26, 2023 at 14:36, Claude Warren, Jr wrote:
>
>> My intention is to develop an S3 storage system using
>> https://github.com/carlspring/s3fs-nio
>>
>> There are several issues yet to be solved:
>>
>>1. There are some internal calls that create files in the table
>>directory that do not use the channel proxy.  I believe that these are
>>making calls on File objects.  I think those File objects are Cassandra
>>File objects not Java I/O File objects, but am unsure.
>>2. Determine if the carlspring s3fs-nio library will be performant
>>enough to work in the long run.  There may be issues with it:
>>   1. Downloading entire files before using them rather than using
>>   views into larger remotely stored files.
>>   2. Requiring a complete file to upload rather than using the
>>   partial upload capability of the S3 interface.
>>
>>
>>
>> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:
>>
>>> "Rather than building this piece by piece, I think it'd be awesome if
>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>> make sure we're discussing the whole final state, and not an implementation
>>> detail of one part of the final state?"
>>>
>>> Do agree with jeff for this ~~~ If these feature can be supported in oss
>>> cassandra , I think it will be very popular, whether in  a private
>>> deployment environment or a public cloud service (our experience can prove
>>> it). In addition, it is also a cost-cutting option for users too
>>>
>>> On Tue, Sep 26, 2023 at 00:11, Jeff Jirsa wrote:
>>>

>>>> - I think this is a great step forward.
>>>> - Being able to move sstables around between tiers of storage is a
>>>> feature Cassandra desperately needs, especially if one of those tiers is
>>>> some sort of object storage
>>>> - This looks like it's a foundational piece that enables that. Perhaps
>>>> by a team that's already implemented this end to end?
>>>> - Rather than building this piece by piece, I think it'd be awesome if
>>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>>> make sure we're discussing the whole final state, and not an implementation
>>>> detail of one part of the final state?
>>>>
>>>> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
>>>> dev@cassandra.apache.org> wrote:
>>>>
>>>>> I have just filed CEP-36 [1] to allow for keyspace/table storage
>>>>> outside of the standard storage space.
>>>>>
>>>>> There are two desires  driving this change:
>>>>>
>>>>>    1. The ability to temporarily move some keyspaces/tables to
>>>>>    storage outside the normal directory tree to other disk so that
>>>>>    compaction can occur in situations where there is not enough disk
>>>>>    space for compaction and the processing to the moved data can not be
>>>>>    suspended.
>>>>>    2. The ability to store infrequently used data on slower cheaper
>>>>>    storage layers.
>>>>>
>>>>> I have a working POC implementation [2] though there are some issues
>>>>> still to be solved and much logging to be reduced.
>>>>>
>>>>> I look forward to productive discussions,
>>>>> Claude
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>>>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>>>>
>>>
>>> --
>>> you are the apple of my eye !
>>>
>>
>
> --
> you are the apple of my eye !
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread guo Maxwell
In my mind, it may be better to support most cloud storage: AWS, Azure,
GCP, Aliyun and so on. We may make it pluggable. But in that case, it
seems there may need to be a filesystem interface layer for object storage.
And should we support distributed systems like HDFS, or something else? We
should first discuss what should be done and what should not be done.
Supporting only S3 feels a bit customized for a certain user and is not
universal enough. Am I right?

On Tue, Sep 26, 2023 at 14:36, Claude Warren, Jr wrote:

> My intention is to develop an S3 storage system using
> https://github.com/carlspring/s3fs-nio
>
> There are several issues yet to be solved:
>
>1. There are some internal calls that create files in the table
>directory that do not use the channel proxy.  I believe that these are
>making calls on File objects.  I think those File objects are Cassandra
>File objects not Java I/O File objects, but am unsure.
>2. Determine if the carlspring s3fs-nio library will be performant
>enough to work in the long run.  There may be issues with it:
>   1. Downloading entire files before using them rather than using
>   views into larger remotely stored files.
>   2. Requiring a complete file to upload rather than using the
>   partial upload capability of the S3 interface.
>
>
>
> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:
>
>> "Rather than building this piece by piece, I think it'd be awesome if
>> someone drew up an end-to-end plan to implement tiered storage, so we can
>> make sure we're discussing the whole final state, and not an implementation
>> detail of one part of the final state?"
>>
>> Do agree with jeff for this ~~~ If these feature can be supported in oss
>> cassandra , I think it will be very popular, whether in  a private
>> deployment environment or a public cloud service (our experience can prove
>> it). In addition, it is also a cost-cutting option for users too
>>
>> On Tue, Sep 26, 2023 at 00:11, Jeff Jirsa wrote:
>>
>>>
>>> - I think this is a great step forward.
>>> - Being able to move sstables around between tiers of storage is a
>>> feature Cassandra desperately needs, especially if one of those tiers is
>>> some sort of object storage
>>> - This looks like it's a foundational piece that enables that. Perhaps
>>> by a team that's already implemented this end to end?
>>> - Rather than building this piece by piece, I think it'd be awesome if
>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>> make sure we're discussing the whole final state, and not an implementation
>>> detail of one part of the final state?
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
>>>> I have just filed CEP-36 [1] to allow for keyspace/table storage
>>>> outside of the standard storage space.
>>>>
>>>> There are two desires  driving this change:
>>>>
>>>>    1. The ability to temporarily move some keyspaces/tables to storage
>>>>    outside the normal directory tree to other disk so that compaction can
>>>>    occur in situations where there is not enough disk space for compaction
>>>>    and the processing to the moved data can not be suspended.
>>>>    2. The ability to store infrequently used data on slower cheaper
>>>>    storage layers.
>>>>
>>>> I have a working POC implementation [2] though there are some issues
>>>> still to be solved and much logging to be reduced.
>>>>
>>>> I look forward to productive discussions,
>>>> Claude
>>>>
>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory



>>
>> --
>> you are the apple of my eye !
>>
>

-- 
you are the apple of my eye !


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread Claude Warren, Jr via dev
My intention is to develop an S3 storage system using
https://github.com/carlspring/s3fs-nio

There are several issues yet to be solved:

   1. There are some internal calls that create files in the table
   directory that do not use the channel proxy.  I believe that these are
   making calls on File objects.  I think those File objects are Cassandra
   File objects not Java I/O File objects, but am unsure.
   2. Determine if the carlspring s3fs-nio library will be performant
   enough to work in the long run.  There may be issues with it:
      1. Downloading entire files before using them rather than using views
      into larger remotely stored files.
      2. Requiring a complete file to upload rather than using the partial
      upload capability of the S3 interface.
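
To illustrate the two sub-issues above, here is a minimal sketch (in Python, purely for illustration) of what "views" and "partial upload" mean at the byte level. A view into a remote object is typically an HTTP range request, and a partial upload is typically an S3 multipart upload. The function names are hypothetical, and nothing here reflects how s3fs-nio actually behaves. The 5 MiB minimum part size (for all parts but the last) and the 10,000 part limit are the documented S3 multipart constraints.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # every part except the last must be at least 5 MiB
MAX_PARTS = 10_000               # S3 allows at most 10,000 parts per upload


def range_header(offset, length):
    """HTTP Range header value for `length` bytes starting at `offset`.

    The end position in a byte range is inclusive.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def multipart_ranges(total_size, part_size=MIN_PART_SIZE):
    """Split `total_size` bytes into (part_number, offset, length) tuples."""
    if total_size <= 0:
        raise ValueError("total_size must be > 0")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum")
    count = -(-total_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return [
        (i + 1, i * part_size, min(part_size, total_size - i * part_size))
        for i in range(count)
    ]
```

With ranges like these, a reader could fetch only the blocks it needs and a writer could push completed chunks as they are produced, instead of moving whole files in either direction.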



On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:

> "Rather than building this piece by piece, I think it'd be awesome if
> someone drew up an end-to-end plan to implement tiered storage, so we can
> make sure we're discussing the whole final state, and not an implementation
> detail of one part of the final state?"
>
> Do agree with jeff for this ~~~ If these feature can be supported in oss
> cassandra , I think it will be very popular, whether in  a private
> deployment environment or a public cloud service (our experience can prove
> it). In addition, it is also a cost-cutting option for users too
>
> On Tue, Sep 26, 2023 at 00:11, Jeff Jirsa wrote:
>
>>
>> - I think this is a great step forward.
>> - Being able to move sstables around between tiers of storage is a
>> feature Cassandra desperately needs, especially if one of those tiers is
>> some sort of object storage
>> - This looks like it's a foundational piece that enables that. Perhaps by
>> a team that's already implemented this end to end?
>> - Rather than building this piece by piece, I think it'd be awesome if
>> someone drew up an end-to-end plan to implement tiered storage, so we can
>> make sure we're discussing the whole final state, and not an implementation
>> detail of one part of the final state?
>>
>>
>>
>>
>>
>>
>> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
>>> of the standard storage space.
>>>
>>> There are two desires  driving this change:
>>>
>>>1. The ability to temporarily move some keyspaces/tables to storage
>>>outside the normal directory tree to other disk so that compaction can
>>>occur in situations where there is not enough disk space for compaction 
>>> and
>>>the processing to the moved data can not be suspended.
>>>2. The ability to store infrequently used data on slower cheaper
>>>storage layers.
>>>
>>> I have a working POC implementation [2] though there are some issues
>>> still to be solved and much logging to be reduced.
>>>
>>> I look forward to productive discussions,
>>> Claude
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>>
>>>
>>>
>
> --
> you are the apple of my eye !
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread guo Maxwell
"Rather than building this piece by piece, I think it'd be awesome if
someone drew up an end-to-end plan to implement tiered storage, so we can
make sure we're discussing the whole final state, and not an implementation
detail of one part of the final state?"

I agree with Jeff on this. If this feature can be supported in OSS
Cassandra, I think it will be very popular, whether in a private
deployment environment or a public cloud service (our experience bears
this out). In addition, it is a cost-cutting option for users.

On Tue, Sep 26, 2023 at 00:11, Jeff Jirsa wrote:

>
> - I think this is a great step forward.
> - Being able to move sstables around between tiers of storage is a feature
> Cassandra desperately needs, especially if one of those tiers is some sort
> of object storage
> - This looks like it's a foundational piece that enables that. Perhaps by
> a team that's already implemented this end to end?
> - Rather than building this piece by piece, I think it'd be awesome if
> someone drew up an end-to-end plan to implement tiered storage, so we can
> make sure we're discussing the whole final state, and not an implementation
> detail of one part of the final state?
>
>
>
>
>
>
> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
>> of the standard storage space.
>>
>> There are two desires  driving this change:
>>
>>1. The ability to temporarily move some keyspaces/tables to storage
>>outside the normal directory tree to other disk so that compaction can
>>occur in situations where there is not enough disk space for compaction 
>> and
>>the processing to the moved data can not be suspended.
>>2. The ability to store infrequently used data on slower cheaper
>>storage layers.
>>
>> I have a working POC implementation [2] though there are some issues
>> still to be solved and much logging to be reduced.
>>
>> I look forward to productive discussions,
>> Claude
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>
>>
>>

-- 
you are the apple of my eye !


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread Jeff Jirsa
- I think this is a great step forward.
- Being able to move sstables around between tiers of storage is a feature
Cassandra desperately needs, especially if one of those tiers is some sort
of object storage
- This looks like it's a foundational piece that enables that. Perhaps by a
team that's already implemented this end to end?
- Rather than building this piece by piece, I think it'd be awesome if
someone drew up an end-to-end plan to implement tiered storage, so we can
make sure we're discussing the whole final state, and not an implementation
detail of one part of the final state?






On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
dev@cassandra.apache.org> wrote:

> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
>
> There are two desires  driving this change:
>
>1. The ability to temporarily move some keyspaces/tables to storage
>outside the normal directory tree to other disk so that compaction can
>occur in situations where there is not enough disk space for compaction and
>the processing to the moved data can not be suspended.
>2. The ability to store infrequently used data on slower cheaper
>storage layers.
>
> I have a working POC implementation [2] though there are some issues still
> to be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread Claude Warren, Jr via dev
External storage can be any storage that you can produce a FileChannel
for.  There is an S3 library that does this, so S3 is a definite
possibility for storage in this solution.  My example code only writes to a
different directory on the same system.  There are also a couple of places
where I did not catch the file creation; those have to be found and
redirected to the proxy location.  I think that it may be necessary to have
a Java FileSystem object to make the whole thing work.  The S3 library that
I found also has an S3 FileSystem class.

This solution uses the internal file name (for example, an sstable name).
The proxy factory can examine the entire path and determine where to
read/write the file.  So any determination that can be made based on the
information in the file path can be implemented with this approach.
There is no direct inspection of the data being written to determine
routing.  The only routing data are in the file name.
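
A minimal sketch of this kind of path-based routing, for illustration only.
It is not the POC code: the class PathRoutingSketch and its methods are
hypothetical, and it assumes relative paths shaped like
<keyspace>/<table>-<id>/<file>.  The routing decision uses only the path,
never the file contents.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the CEP-36 POC.
final class PathRoutingSketch {
    private final Path defaultRoot;
    // "keyspace/table" -> alternate root directory for that table's files.
    private final Map<String, Path> routes = new HashMap<>();

    public PathRoutingSketch(Path defaultRoot) {
        this.defaultRoot = defaultRoot;
    }

    public void addRoute(String keyspace, String table, Path root) {
        routes.put(keyspace + "/" + table, root);
    }

    // Decide the real location from the path alone.
    // Assumed layout (an assumption of this sketch): <keyspace>/<table>-<id>/<file>.
    public Path resolve(Path relative) {
        if (relative.getNameCount() >= 2) {
            String keyspace = relative.getName(0).toString();
            String tableDir = relative.getName(1).toString();
            int dash = tableDir.lastIndexOf('-');
            String table = dash > 0 ? tableDir.substring(0, dash) : tableDir;
            Path root = routes.get(keyspace + "/" + table);
            if (root != null) {
                // Resolve via String so the root may belong to another FileSystem provider.
                return root.resolve(relative.toString());
            }
        }
        return defaultRoot.resolve(relative.toString());
    }

    // Open a channel at the routed location, creating parent directories first.
    public FileChannel open(Path relative, OpenOption... options) throws IOException {
        Path target = resolve(relative);
        Files.createDirectories(target.getParent());
        return FileChannel.open(target, options);
    }

    public static void main(String[] args) throws IOException {
        Path hot = Files.createTempDirectory("hot");
        Path cold = Files.createTempDirectory("cold");
        PathRoutingSketch router = new PathRoutingSketch(hot);
        router.addRoute("ks1", "events", cold); // reroute a single table

        Path rerouted = Paths.get("ks1", "events-abc123", "nb-1-big-Data.db");
        Path untouched = Paths.get("ks1", "users-def456", "nb-1-big-Data.db");

        try (FileChannel ch = router.open(rerouted,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("demo".getBytes(StandardCharsets.UTF_8)));
        }
        System.out.println(router.resolve(rerouted).startsWith(cold));
        System.out.println(router.resolve(untouched).startsWith(hot));
    }
}
```

Here only the "events" table in keyspace "ks1" is sent to the second root;
every other table in the keyspace stays under the default root.  Whether a
non-local provider (such as an S3 FileSystem) supports FileChannel.open
depends on that provider.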

I ran an in-house demo where I showed that we could reroute a single table
to different storage while leaving the rest of the tables in the same
keyspace alone.

In discussing this with a colleague, we hit upon the term "tiered nodes".
If you can spread your data across the nodes so that some nodes get the
infrequently used (cold) data and other nodes receive the frequently used
(hot) data, then the cold data nodes can use this process to store the data
on S3 or similar systems.

On Mon, Sep 25, 2023 at 10:45 AM guo Maxwell wrote:

> Great suggestion. Can external storage only be local storage media, or
> can it be any storage medium, such as object storage like S3?
> We have previously implemented a tiered storage capability, that is,
> multiple storage media on one node (SSD, HDD), with data placement based
> on requests. After briefly browsing the proposal, it seems that there are
> some differences. Could you explain them? Thanks.
>
>
> On Mon, Sep 25, 2023 at 14:49, Claude Warren, Jr via dev wrote:
>
>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
>> of the standard storage space.
>>
>> There are two desires driving this change:
>>
>>    1. The ability to temporarily move some keyspaces/tables to storage
>>    outside the normal directory tree, on another disk, so that compaction
>>    can occur when there is not enough disk space for compaction and
>>    processing of the moved data cannot be suspended.
>>    2. The ability to store infrequently used data on slower, cheaper
>>    storage layers.
>>
>> I have a working POC implementation [2] though there are some issues
>> still to be solved and much logging to be reduced.
>>
>> I look forward to productive discussions,
>> Claude
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>
>>
>>
>
> --
> you are the apple of my eye !
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread guo Maxwell
Great suggestion. Can external storage only be local storage media, or can
it be any storage medium, such as object storage like S3?
We have previously implemented a tiered storage capability, that is,
multiple storage media on one node (SSD, HDD), with data placement based
on requests. After briefly browsing the proposal, it seems that there are
some differences. Could you explain them? Thanks.


On Mon, Sep 25, 2023 at 14:49, Claude Warren, Jr via dev wrote:

> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
>
> There are two desires driving this change:
>
>    1. The ability to temporarily move some keyspaces/tables to storage
>    outside the normal directory tree, on another disk, so that compaction
>    can occur when there is not enough disk space for compaction and
>    processing of the moved data cannot be suspended.
>    2. The ability to store infrequently used data on slower, cheaper
>    storage layers.
>
> I have a working POC implementation [2] though there are some issues still
> to be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>

-- 
you are the apple of my eye !