Re: [DISCUSS] CommitLog default disk access mode

2024-06-15 Thread Sam
*"Glad you brought up compaction here - I think there would be a
significant benefit to moving compaction to direct i/o."*

Support Direct I/O for SSTable writing

https://issues.apache.org/jira/browse/CASSANDRA-19707

On Mon, 16 Oct 2023 at 17:38, Jon Haddad  wrote:

> Glad you brought up compaction here - I think there would be a significant
> benefit to moving compaction to direct i/o.
>
>
> On 2023/10/16 16:14:28 Benedict wrote:
> > I have some plans to (eventually) use the commit log as memtable payload
> storage (ie memtables would reference the commit log entries directly,
> storing only indexing info), and to back first level of sstables by
> reference to commit log entries. This will permit us to deliver not only
> much bigger memtables (cutting compaction throughput requirements by the
> ratio of size increase - so pretty dramatically), and faster flushing (so
> better behaviour during write bursts), but also a fairly cheap and simple way
> to support MVCC - which will be helpful for transaction throughput.
> >
> > There is also a new commit log (“journal”) coming with Accord, that the
> rest of C* may or may not transition to.
> >
> > I only say this because this makes the utility of direct IO for commit
> log suspect, as we will be reading from the files as a matter of course
> should this go ahead; and we may end up relying on a different commit log
> implementation before long anyway.
> >
> > This is obviously a big suggestion and is not guaranteed to transpire,
> and probably won’t within the next year or so, but it should perhaps form
> some minimal part of any calculus. If the patch is otherwise simple and
> beneficial I don’t have anything against it, and the use of direct IO could
> well be of benefit eg in compaction - and also in future if we manage to
> bring a page management in process. So laying foundations there could be of
> benefit, even if the commit log eventually does not use it.
> >
> > > On 16 Oct 2023, at 17:00, Jon Haddad 
> wrote:
> > >
> > > I haven't looked at the patch, but at a high level, defaulting to
> direct I/O for commit logs makes a lot of sense to me.
> > >
> > >> On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
> > >> [Public]
> > >>
> > >> Hi,
> > >>
> > >> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO
> feature is proposed through new PR[1] to improve the CommitLog IO speed.
> Enabling this by default could be useful feature to address IO bottleneck
> seen during peak load.
> > >>
> > >> Need your input regarding changing this default. Please suggest.
> > >>
> > >> https://issues.apache.org/jira/browse/CASSANDRA-18464
> > >>
> > >> thanks,
> > >> Amit Pawar
> > >>
> > >> [1] - https://github.com/apache/cassandra/pull/2777
> > >>
> >
>


Re: [DISCUSS] CommitLog default disk access mode

2023-10-26 Thread guo Maxwell
Thanks for your contribution.

Pawar, Amit wrote on Thu, Oct 26, 2023 at 11:41 PM:

> [Public]
>
> The default behavior is not changed. Thank you, Josh, for your appreciation.
> This is my first patch, and it means a lot to me.
>
>
>
> Thanks again,
>
> Amit
>
>
>
> +1 to adding the feature, clear and easy configurability, and if after a
> major cycle we can say with confidence it's beating the status quo in the
> vast majority of general cases, flip default. I mean, logically it
> *should* be, but infra software at the scale we do requires great care. :)
>
>
>
> This is great work Amit - well done.
>
>
>
> On Mon, Oct 16, 2023, at 4:28 PM, Dinesh Joshi wrote:
>
> I haven't looked at the patch yet so take whatever I say here with a pinch
> of salt.
>
>
>
> Philosophically, defaults should not change unless there is a clear
> demonstrable benefit in majority cases for our users. In this case DirectIO
> should have clear benefits. That said, this is a new feature and I would
> personally default it to off. We should document it and allow for our users
> to enable it. This derisks the project in case there is an inadvertent
> change in behavior.
>
>
>
> Dinesh
>
>
>
> On Oct 15, 2023, at 11:34 PM, Pawar, Amit  wrote:
>
>
>
> [Public]
>
>
>
> Hi,
>
>
>
> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO
> feature is proposed through new PR[1] to improve the CommitLog IO speed.
> Enabling this by default could be useful feature to address IO bottleneck
> seen during peak load.
>
>
>
> Need your input regarding changing this default. Please suggest.
>
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-18464
>
>
>
> thanks,
>
> Amit Pawar
>
>
>
> [1] - https://github.com/apache/cassandra/pull/2777
>
>
>


RE: [DISCUSS] CommitLog default disk access mode

2023-10-26 Thread Pawar, Amit
[Public]

The default behavior is not changed. Thank you, Josh, for your appreciation. This is
my first patch, and it means a lot to me.

Thanks again,
Amit

+1 to adding the feature, clear and easy configurability, and if after a major 
cycle we can say with confidence it's beating the status quo in the vast 
majority of general cases, flip default. I mean, logically it should be, but 
infra software at the scale we do requires great care. :)

This is great work Amit - well done.

On Mon, Oct 16, 2023, at 4:28 PM, Dinesh Joshi wrote:
I haven't looked at the patch yet so take whatever I say here with a pinch of 
salt.

Philosophically, defaults should not change unless there is a clear 
demonstrable benefit in majority cases for our users. In this case DirectIO 
should have clear benefits. That said, this is a new feature and I would 
personally default it to off. We should document it and allow for our users to 
enable it. This derisks the project in case there is an inadvertent change in 
behavior.

Dinesh

On Oct 15, 2023, at 11:34 PM, Pawar, Amit <amit.pa...@amd.com> wrote:


[Public]

Hi,

CommitLog uses mmap (memory-mapped) segments by default. A Direct-IO feature is
proposed through a new PR[1] to improve the CommitLog IO speed. Enabling it by
default could be a useful way to address the IO bottleneck seen during peak load.

Need your input regarding changing this default. Please suggest.

https://issues.apache.org/jira/browse/CASSANDRA-18464

thanks,
Amit Pawar

[1] - https://github.com/apache/cassandra/pull/2777



RE: [DISCUSS] CommitLog default disk access mode

2023-10-26 Thread Pawar, Amit
[Public]

Per your feedback, the default behavior won’t be changed; a ‘direct’ mode can
be set to enable this feature. Thank you all again.

--
Amit

I think introducing the feature is a good idea.
I also think that it should _NOT_ be enabled by default for all the reasons 
stated above.
Finding a cohort of users who are interested in turning it on would provide a 
nice testbed to shake out any issues without affecting everyone.

On Tue, Oct 17, 2023 at 3:58 PM C. Scott Andreas <sc...@paradoxica.net> wrote:
Let’s please not change the default at the same time the feature is introduced.

Making the capability available will allow users to evaluate and quantify the 
benefit of the work, as well as to call out any unintended consequences. As 
users and the project gain confidence in the results, we can evaluate changing 
the default.

– Scott


On Oct 17, 2023, at 4:25 AM, guo Maxwell <cclive1...@gmail.com> wrote:

-1

I still think we should keep it as it is until the  direct io  for commitlog 
(read and write) is ready and relatively stable. And then we may change the 
default value to direct io from mmap in a future version, such as 5.2, or 6.0.

Pawar, Amit <amit.pa...@amd.com> wrote on Tue, Oct 17, 2023 at 19:03:

[AMD Official Use Only - General]

Thank you all for your input. Received total 6 replies and below is the summary.

1. Mmap   : 2/6
2. Direct-I/O : 4/6

Default should be changed to Direct-IO then ? please confirm.

Thanks,
Amit

Strongly agree with this point of view that direct IO  can bring great benefits.

I have reviewed part of the code, and my preliminary judgment is that it is not 
very common and limited in some situations, for example, it  works for 
commitlog's write path only for this patch.So I suggest that the default value 
should not be modified until the entire function is comprehensive and stable, 
and then modified in a future version.

Sam <samueldlightf...@gmail.com> wrote on Tue, Oct 17, 2023 at 05:39:
Glad you brought up compaction here - I think there would be a significant 
benefit to moving compaction to direct i/o.

+1. Would love to see this get traction.

On Mon, 16 Oct 2023 at 19:38, Jon Haddad <rustyrazorbl...@apache.org> wrote:
Glad you brought up compaction here - I think there would be a significant 
benefit to moving compaction to direct i/o.


On 2023/10/16 16:14:28 Benedict wrote:
> I have some plans to (eventually) use the commit log as memtable payload 
> storage (ie memtables would reference the commit log entries directly, 
> storing only indexing info), and to back first level of sstables by reference 
> to commit log entries. This will permit us to deliver not only much bigger 
> memtables (cutting compaction throughput requirements by the ratio of size 
> increase - so pretty dramatically), and faster flushing (so better behaviour 
> during write bursts), but also a fairly cheap and simple way to support MVCC -
> which will be helpful for transaction throughput.
>
> There is also a new commit log (“journal”) coming with Accord, that the rest 
> of C* may or may not transition to.
>
> I only say this because this makes the utility of direct IO for commit log 
> suspect, as we will be reading from the files as a matter of course should 
> this go ahead; and we may end up relying on a different commit log 
> implementation before long anyway.
>
> This is obviously a big suggestion and is not guaranteed to transpire, and 
> probably won’t within the next year or so, but it should perhaps form some 
> minimal part of any calculus. If the patch is otherwise simple and beneficial 
> I don’t have anything against it, and the use of direct IO could well be of 
> benefit eg in compaction - and also in future if we manage to bring a page 
> management in process. So laying foundations there could be of benefit, even 
> if the commit log eventually does not use it.
>
> > On 16 Oct 2023, at 17:00, Jon Haddad <rustyrazorbl...@apache.org> wrote:
> >
> > I haven't looked at the patch, but at a high level, defaulting to direct 
> > I/O for commit logs makes a lot of sense to me.
> >
> >> On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
> >> [Public]
> >>
> >> Hi,
> >>
> >> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO 
> >> feature is proposed through new PR[1] to improve the CommitLog IO speed. 
> >> Enabling this by default could be useful feature to address IO bottleneck 
> >> seen during peak load.
> >>
> >> Need your input regarding changing this default. Please suggest.
> >>
> >> https://issues.apache.org/jira/browse/CASSANDRA-18464
> >>
> >> thanks,
> >> Amit Pawar
> >>
> >> [1] - https://github.com/apache/cassandra/pull/2777
> >>
>


--
you are the apple of my eye !


--
you are the apple of my eye !


Re: [DISCUSS] CommitLog default disk access mode

2023-10-18 Thread Josh McKenzie
+1 to adding the feature, clear and easy configurability, and if after a major 
cycle we can say with confidence it's beating the status quo in the vast 
majority of general cases, flip default. I mean, logically it *should* be, but 
infra software at the scale we do requires great care. :)

This is great work Amit - well done.

On Mon, Oct 16, 2023, at 4:28 PM, Dinesh Joshi wrote:
> I haven't looked at the patch yet so take whatever I say here with a pinch of 
> salt.
> 
> Philosophically, defaults should not change unless there is a clear 
> demonstrable benefit in majority cases for our users. In this case DirectIO 
> should have clear benefits. That said, this is a new feature and I would 
> personally default it to off. We should document it and allow for our users 
> to enable it. This derisks the project in case there is an inadvertent change 
> in behavior.
> 
> Dinesh
> 
>> On Oct 15, 2023, at 11:34 PM, Pawar, Amit  wrote:
>> 
>> [Public]
>> 
>> 
>> Hi,
>>  
>> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO feature 
>> is proposed through new PR[1] to improve the CommitLog IO speed. Enabling 
>> this by default could be useful feature to address IO bottleneck seen during 
>> peak load.
>>  
>> Need your input regarding changing this default. Please suggest.
>>  
>> https://issues.apache.org/jira/browse/CASSANDRA-18464
>>  
>> thanks,
>> Amit Pawar
>>  
>> [1] - https://github.com/apache/cassandra/pull/2777


Re: [DISCUSS] CommitLog default disk access mode

2023-10-18 Thread Claude Warren, Jr via dev
I think introducing the feature is a good idea.
I also think that it should _NOT_ be enabled by default for all the reasons
stated above.
Finding a cohort of users who are interested in turning it on would provide
a nice testbed to shake out any issues without affecting everyone.

On Tue, Oct 17, 2023 at 3:58 PM C. Scott Andreas 
wrote:

> Let’s please not change the default at the same time the feature is
> introduced.
>
> Making the capability available will allow users to evaluate and quantify
> the benefit of the work, as well as to call out any unintended
> consequences. As users and the project gain confidence in the results, we
> can evaluate changing the default.
>
> – Scott
>
> On Oct 17, 2023, at 4:25 AM, guo Maxwell  wrote:
>
> 
> -1
>
> I still think we should keep it as it is until the  direct io
> for commitlog (read and write) is ready and relatively stable. And then we
> may change the default value to direct io from mmap in a future version,
> such as 5.2, or 6.0.
>
> Pawar, Amit wrote on Tue, Oct 17, 2023 at 19:03:
>
>> [AMD Official Use Only - General]
>>
>> Thank you all for your input. Received total 6 replies and below is the
>> summary.
>>
>>
>>
>> 1. Mmap   : 2/6
>>
>> 2. Direct-I/O : 4/6
>>
>>
>>
>> Default should be changed to Direct-IO then ? please confirm.
>>
>>
>>
>> Thanks,
>>
>> Amit
>>
>>
>>
>> Strongly agree with this point of view that direct IO  can bring great
>> benefits.
>>
>>
>>
>> I have reviewed part of the code, and my preliminary judgment is that it
>> is not very common and limited in some situations, for example, it  works
>> for commitlog's write path only for this patch.So I suggest that the
>> default value should not be modified until the entire function is
>> comprehensive and stable, and then modified in a future version.
>>
>>
>>
>> Sam wrote on Tue, Oct 17, 2023 at 05:39:
>>
>> *Glad you brought up compaction here - I think there would be a
>> significant benefit to moving compaction to direct i/o.*
>>
>>
>>
>> +1. Would love to see this get traction.
>>
>>
>>
>> On Mon, 16 Oct 2023 at 19:38, Jon Haddad 
>> wrote:
>>
>> Glad you brought up compaction here - I think there would be a
>> significant benefit to moving compaction to direct i/o.
>>
>>
>> On 2023/10/16 16:14:28 Benedict wrote:
>> > I have some plans to (eventually) use the commit log as memtable
>> payload storage (ie memtables would reference the commit log entries
>> directly, storing only indexing info), and to back first level of sstables
>> by reference to commit log entries. This will permit us to deliver not only
>> much bigger memtables (cutting compaction throughput requirements by the
>> ratio of size increase - so pretty dramatically), and faster flushing (so
>> better behaviour during write bursts), but also a fairly cheap and simple way
>> to support MVCC - which will be helpful for transaction throughput.
>> >
>> > There is also a new commit log (“journal”) coming with Accord, that the
>> rest of C* may or may not transition to.
>> >
>> > I only say this because this makes the utility of direct IO for commit
>> log suspect, as we will be reading from the files as a matter of course
>> should this go ahead; and we may end up relying on a different commit log
>> implementation before long anyway.
>> >
>> > This is obviously a big suggestion and is not guaranteed to transpire,
>> and probably won’t within the next year or so, but it should perhaps form
>> some minimal part of any calculus. If the patch is otherwise simple and
>> beneficial I don’t have anything against it, and the use of direct IO could
>> well be of benefit eg in compaction - and also in future if we manage to
>> bring a page management in process. So laying foundations there could be of
>> benefit, even if the commit log eventually does not use it.
>> >
>> > > On 16 Oct 2023, at 17:00, Jon Haddad 
>> wrote:
>> > >
>> > > I haven't looked at the patch, but at a high level, defaulting to
>> direct I/O for commit logs makes a lot of sense to me.
>> > >
>> > >> On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
>> > >> [Public]
>> > >>
>> > >> Hi,
>> > >>
>> > >> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO
>> feature is proposed through new PR[1] to improve the CommitLog IO speed.
>> Enabling this by default could be useful feature to address IO bottleneck
>> seen during peak load.
>> > >>
>> > >> Need your input regarding changing this default. Please suggest.
>> > >>
>> > >> https://issues.apache.org/jira/browse/CASSANDRA-18464
>> > >>
>> > >> thanks,
>> > >> Amit Pawar
>> > >>
>> > >> [1] - https://github.com/apache/cassandra/pull/2777
>> > >>
>> >
>>
>>
>>
>>
>> --
>>
>> you are the apple of my eye !
>>
>
>
> --
> you are the apple of my eye !
>
>


Re: [DISCUSS] CommitLog default disk access mode

2023-10-17 Thread C. Scott Andreas
Let’s please not change the default at the same time the feature is introduced.

Making the capability available will allow users to evaluate and quantify the benefit of the work, as well as to call out any unintended consequences. As users and the project gain confidence in the results, we can evaluate changing the default.

– Scott

On Oct 17, 2023, at 4:25 AM, guo Maxwell wrote:

-1

I still think we should keep it as it is until the direct io for commitlog (read and write) is ready and relatively stable. And then we may change the default value to direct io from mmap in a future version, such as 5.2, or 6.0.

Pawar, Amit wrote on Tue, Oct 17, 2023 at 19:03:

[AMD Official Use Only - General]

Thank you all for your input. Received total 6 replies and below is the summary.

1. Mmap   : 2/6
2. Direct-I/O : 4/6

Default should be changed to Direct-IO then ? please confirm.

Thanks,
Amit

Strongly agree with this point of view that direct IO can bring great benefits.

I have reviewed part of the code, and my preliminary judgment is that it is not very common and limited in some situations, for example, it works for commitlog's write path only for this patch. So I suggest that the default value should not be modified until the entire function is comprehensive and stable, and then modified in a future version.

Sam wrote on Tue, Oct 17, 2023 at 05:39:

Glad you brought up compaction here - I think there would be a significant benefit to moving compaction to direct i/o.

+1. Would love to see this get traction.

On Mon, 16 Oct 2023 at 19:38, Jon Haddad wrote:


Glad you brought up compaction here - I think there would be a significant benefit to moving compaction to direct i/o.


On 2023/10/16 16:14:28 Benedict wrote:
> I have some plans to (eventually) use the commit log as memtable payload storage (ie memtables would reference the commit log entries directly, storing only indexing info), and to back first level of sstables by reference to commit log entries. This will permit us to deliver not only much bigger memtables (cutting compaction throughput requirements by the ratio of size increase - so pretty dramatically), and faster flushing (so better behaviour during write bursts), but also a fairly cheap and simple way to support MVCC - which will be helpful for transaction throughput.
> 
> There is also a new commit log (“journal”) coming with Accord, that the rest of C* may or may not transition to.
> 
> I only say this because this makes the utility of direct IO for commit log suspect, as we will be reading from the files as a matter of course should this go ahead; and we may end up relying on a different commit log implementation before long anyway.
> 
> This is obviously a big suggestion and is not guaranteed to transpire, and probably won’t within the next year or so, but it should perhaps form some minimal part of any calculus. If the patch is otherwise simple and beneficial I don’t have anything against it, and the use of direct IO could well be of benefit eg in compaction - and also in future if we manage to bring a page management in process. So laying foundations there could be of benefit, even if the commit log eventually does not use it.
> 
> > On 16 Oct 2023, at 17:00, Jon Haddad  wrote:
> > 
> > I haven't looked at the patch, but at a high level, defaulting to direct I/O for commit logs makes a lot of sense to me.
> > 
> >> On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
> >> [Public]
> >> 
> >> Hi,
> >> 
> >> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO feature is proposed through new PR[1] to improve the CommitLog IO speed. Enabling this by default could be useful feature to address IO bottleneck seen during peak load.
> >> 
> >> Need your input regarding changing this default. Please suggest.
> >> 
> >> https://issues.apache.org/jira/browse/CASSANDRA-18464
> >> 
> >> thanks,
> >> Amit Pawar
> >> 
> >> [1] - https://github.com/apache/cassandra/pull/2777
> >> 
> 









 

--


you are the apple of my eye !






-- you are the apple of my eye !


Re: [DISCUSS] CommitLog default disk access mode

2023-10-17 Thread guo Maxwell
-1

I still think we should keep it as it is until direct I/O
for the commitlog (read and write) is ready and relatively stable. Then we
may change the default value from mmap to direct I/O in a future version,
such as 5.2 or 6.0.

Pawar, Amit wrote on Tue, Oct 17, 2023 at 19:03:

> [AMD Official Use Only - General]
>
> Thank you all for your input. Received total 6 replies and below is the
> summary.
>
>
>
> 1. Mmap   : 2/6
>
> 2. Direct-I/O : 4/6
>
>
>
> Default should be changed to Direct-IO then ? please confirm.
>
>
>
> Thanks,
>
> Amit
>
>
>
> Strongly agree with this point of view that direct IO  can bring great
> benefits.
>
>
>
> I have reviewed part of the code, and my preliminary judgment is that it
> is not very common and limited in some situations, for example, it  works
> for commitlog's write path only for this patch.So I suggest that the
> default value should not be modified until the entire function is
> comprehensive and stable, and then modified in a future version.
>
>
>
> Sam wrote on Tue, Oct 17, 2023 at 05:39:
>
> *Glad you brought up compaction here - I think there would be a
> significant benefit to moving compaction to direct i/o.*
>
>
>
> +1. Would love to see this get traction.
>
>
>
> On Mon, 16 Oct 2023 at 19:38, Jon Haddad 
> wrote:
>
> Glad you brought up compaction here - I think there would be a significant
> benefit to moving compaction to direct i/o.
>
>
> On 2023/10/16 16:14:28 Benedict wrote:
> > I have some plans to (eventually) use the commit log as memtable payload
> storage (ie memtables would reference the commit log entries directly,
> storing only indexing info), and to back first level of sstables by
> reference to commit log entries. This will permit us to deliver not only
> much bigger memtables (cutting compaction throughput requirements by the
> ratio of size increase - so pretty dramatically), and faster flushing (so
> better behaviour during write bursts), but also a fairly cheap and simple way
> to support MVCC - which will be helpful for transaction throughput.
> >
> > There is also a new commit log (“journal”) coming with Accord, that the
> rest of C* may or may not transition to.
> >
> > I only say this because this makes the utility of direct IO for commit
> log suspect, as we will be reading from the files as a matter of course
> should this go ahead; and we may end up relying on a different commit log
> implementation before long anyway.
> >
> > This is obviously a big suggestion and is not guaranteed to transpire,
> and probably won’t within the next year or so, but it should perhaps form
> some minimal part of any calculus. If the patch is otherwise simple and
> beneficial I don’t have anything against it, and the use of direct IO could
> well be of benefit eg in compaction - and also in future if we manage to
> bring a page management in process. So laying foundations there could be of
> benefit, even if the commit log eventually does not use it.
> >
> > > On 16 Oct 2023, at 17:00, Jon Haddad 
> wrote:
> > >
> > > I haven't looked at the patch, but at a high level, defaulting to
> direct I/O for commit logs makes a lot of sense to me.
> > >
> > >> On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
> > >> [Public]
> > >>
> > >> Hi,
> > >>
> > >> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO
> feature is proposed through new PR[1] to improve the CommitLog IO speed.
> Enabling this by default could be useful feature to address IO bottleneck
> seen during peak load.
> > >>
> > >> Need your input regarding changing this default. Please suggest.
> > >>
> > >> https://issues.apache.org/jira/browse/CASSANDRA-18464
> > >>
> > >> thanks,
> > >> Amit Pawar
> > >>
> > >> [1] - https://github.com/apache/cassandra/pull/2777
> > >>
> >
>
>
>
>
> --
>
> you are the apple of my eye !
>


-- 
you are the apple of my eye !


RE: [DISCUSS] CommitLog default disk access mode

2023-10-17 Thread Pawar, Amit
[AMD Official Use Only - General]

Thank you all for your input. I received 6 replies in total; the summary is below.

1. Mmap       : 2/6
2. Direct-I/O : 4/6

Should the default be changed to Direct-IO, then? Please confirm.

Thanks,
Amit

Strongly agree with this point of view that direct IO  can bring great benefits.

I have reviewed part of the code, and my preliminary judgment is that it is not 
very common and limited in some situations, for example, it  works for 
commitlog's write path only for this patch.So I suggest that the default value 
should not be modified until the entire function is comprehensive and stable, 
and then modified in a future version.

Sam <samueldlightf...@gmail.com> wrote on Tue, Oct 17, 2023 at 05:39:
Glad you brought up compaction here - I think there would be a significant 
benefit to moving compaction to direct i/o.

+1. Would love to see this get traction.

On Mon, 16 Oct 2023 at 19:38, Jon Haddad <rustyrazorbl...@apache.org> wrote:
Glad you brought up compaction here - I think there would be a significant 
benefit to moving compaction to direct i/o.


On 2023/10/16 16:14:28 Benedict wrote:
> I have some plans to (eventually) use the commit log as memtable payload 
> storage (ie memtables would reference the commit log entries directly, 
> storing only indexing info), and to back first level of sstables by reference 
> to commit log entries. This will permit us to deliver not only much bigger 
> memtables (cutting compaction throughput requirements by the ratio of size 
> increase - so pretty dramatically), and faster flushing (so better behaviour 
> during write bursts), but also a fairly cheap and simple way to support MVCC -
> which will be helpful for transaction throughput.
>
> There is also a new commit log (“journal”) coming with Accord, that the rest 
> of C* may or may not transition to.
>
> I only say this because this makes the utility of direct IO for commit log 
> suspect, as we will be reading from the files as a matter of course should 
> this go ahead; and we may end up relying on a different commit log 
> implementation before long anyway.
>
> This is obviously a big suggestion and is not guaranteed to transpire, and 
> probably won’t within the next year or so, but it should perhaps form some 
> minimal part of any calculus. If the patch is otherwise simple and beneficial 
> I don’t have anything against it, and the use of direct IO could well be of 
> benefit eg in compaction - and also in future if we manage to bring a page 
> management in process. So laying foundations there could be of benefit, even 
> if the commit log eventually does not use it.
>
> > On 16 Oct 2023, at 17:00, Jon Haddad <rustyrazorbl...@apache.org> wrote:
> >
> > I haven't looked at the patch, but at a high level, defaulting to direct 
> > I/O for commit logs makes a lot of sense to me.
> >
> >> On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
> >> [Public]
> >>
> >> Hi,
> >>
> >> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO 
> >> feature is proposed through new PR[1] to improve the CommitLog IO speed. 
> >> Enabling this by default could be useful feature to address IO bottleneck 
> >> seen during peak load.
> >>
> >> Need your input regarding changing this default. Please suggest.
> >>
> >> https://issues.apache.org/jira/browse/CASSANDRA-18464
> >>
> >> thanks,
> >> Amit Pawar
> >>
> >> [1] - https://github.com/apache/cassandra/pull/2777
> >>
>


--
you are the apple of my eye !


Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread guo Maxwell
I strongly agree with the view that direct I/O can bring great
benefits.

I have reviewed part of the code, and my preliminary judgment is that it is
not yet general and is limited in some situations; for example, in this
patch it works only for the commitlog's write path. So I suggest that the
default value should not be modified until the entire function is
comprehensive and stable, and then modified in a future version.
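
For readers following the write-path point, here is a minimal Java sketch contrasting the two access modes discussed in this thread. It is not taken from the patch: the class and method names (CommitLogIoSketch, alignUp, padToBlock, writeMapped, writeDirect) are hypothetical, and it assumes Java 10 or later for ExtendedOpenOption.DIRECT and FileStore.getBlockSize(). Direct I/O generally requires the buffer address and the write length to be multiples of the device block size, which is why the payload is padded before it is written.

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; none of these names come from the Cassandra patch.
final class CommitLogIoSketch {

    // Round size up to the next multiple of blockSize (blockSize must be a power of two).
    public static long alignUp(long size, long blockSize) {
        return (size + blockSize - 1) & ~(blockSize - 1);
    }

    // Copy the payload into a zero-padded direct buffer whose start address
    // and readable length are both multiples of blockSize.
    public static ByteBuffer padToBlock(byte[] payload, int blockSize) {
        int padded = (int) alignUp(payload.length, blockSize);
        // Over-allocate by one block so an aligned slice of 'padded' bytes always fits.
        ByteBuffer buf = ByteBuffer.allocateDirect(padded + blockSize).alignedSlice(blockSize);
        buf.limit(padded);
        buf.put(payload);
        buf.position(0); // bytes after the payload stay zero-filled
        return buf;
    }

    // mmap-style write: map a region of the segment file and copy the payload into it.
    public static void writeMapped(Path segment, byte[] payload, int segmentSize) throws IOException {
        try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, segmentSize);
            map.put(payload);
            map.force(); // ask the OS to flush the dirty pages to the device
        }
    }

    // Direct-I/O-style write: bypass the page cache with block-aligned writes.
    // May throw IOException on filesystems that do not support direct I/O.
    public static void writeDirect(Path segment, byte[] payload) throws IOException {
        int blockSize = (int) Files.getFileStore(segment.toAbsolutePath().getParent()).getBlockSize();
        ByteBuffer buf = padToBlock(payload, blockSize);
        try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, ExtendedOpenOption.DIRECT)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "mutation".getBytes(StandardCharsets.UTF_8);
        Path segment = Files.createTempFile("segment", ".log");
        segment.toFile().deleteOnExit();
        writeMapped(segment, payload, 4096);
        System.out.println("mapped segment size: " + Files.size(segment));
        System.out.println("direct write length: " + padToBlock(payload, 4096).remaining());
    }
}
```

The main method exercises only the mmap path and the padding helper; writeDirect is left uncalled because whether the direct open succeeds depends on the filesystem under the temporary directory.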

Sam wrote on Tue, Oct 17, 2023 at 05:39:

>
> *Glad you brought up compaction here - I think there would be a
> significant benefit to moving compaction to direct i/o.*
>
> +1. Would love to see this get traction.
>
> On Mon, 16 Oct 2023 at 19:38, Jon Haddad 
> wrote:
>
>> Glad you brought up compaction here - I think there would be a
>> significant benefit to moving compaction to direct i/o.
>>
>>
>> On 2023/10/16 16:14:28 Benedict wrote:
>> > I have some plans to (eventually) use the commit log as memtable
>> payload storage (ie memtables would reference the commit log entries
>> directly, storing only indexing info), and to back first level of sstables
>> by reference to commit log entries. This will permit us to deliver not only
>> much bigger memtables (cutting compaction throughput requirements by the
>> ratio of size increase - so pretty dramatically), and faster flushing (so
>> better behaviour during write bursts), but also a fairly cheap and simple way
>> to support MVCC - which will be helpful for transaction throughput.
>> >
>> > There is also a new commit log (“journal”) coming with Accord, that the
>> rest of C* may or may not transition to.
>> >
>> > I only say this because this makes the utility of direct IO for commit
>> log suspect, as we will be reading from the files as a matter of course
>> should this go ahead; and we may end up relying on a different commit log
>> implementation before long anyway.
>> >
>> > This is obviously a big suggestion and is not guaranteed to transpire,
>> and probably won’t within the next year or so, but it should perhaps form
>> some minimal part of any calculus. If the patch is otherwise simple and
>> beneficial I don’t have anything against it, and the use of direct IO could
>> well be of benefit eg in compaction - and also in future if we manage to
>> bring a page management in process. So laying foundations there could be of
>> benefit, even if the commit log eventually does not use it.
>> >
>> > > On 16 Oct 2023, at 17:00, Jon Haddad 
>> wrote:
>> > >
>> > > I haven't looked at the patch, but at a high level, defaulting to
>> direct I/O for commit logs makes a lot of sense to me.
>> > >
>> > >> On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
>> > >> [Public]
>> > >>
>> > >> Hi,
>> > >>
>> > >> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO
>> feature is proposed through new PR[1] to improve the CommitLog IO speed.
>> Enabling this by default could be useful feature to address IO bottleneck
>> seen during peak load.
>> > >>
>> > >> Need your input regarding changing this default. Please suggest.
>> > >>
>> > >> https://issues.apache.org/jira/browse/CASSANDRA-18464
>> > >>
>> > >> thanks,
>> > >> Amit Pawar
>> > >>
>> > >> [1] - https://github.com/apache/cassandra/pull/2777
>> > >>
>> >
>>
>

-- 
you are the apple of my eye !


Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread Sam
*Glad you brought up compaction here - I think there would be a significant
benefit to moving compaction to direct i/o.*

+1. Would love to see this get traction.

On Mon, 16 Oct 2023 at 19:38, Jon Haddad  wrote:

> Glad you brought up compaction here - I think there would be a significant
> benefit to moving compaction to direct i/o.


Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread Dinesh Joshi
I haven't looked at the patch yet so take whatever I say here with a pinch of 
salt.

Philosophically, defaults should not change unless there is a clear, 
demonstrable benefit for our users in the majority of cases. In this case 
DirectIO should have clear benefits. That said, this is a new feature and I 
would personally default it to off. We should document it and allow our users 
to enable it. This de-risks the project in case there is an inadvertent change 
in behavior.

Dinesh

> On Oct 15, 2023, at 11:34 PM, Pawar, Amit  wrote:
> 
> [Public]
> 
> 
> Hi,
>  
> CommitLog uses mmap (memory-mapped) segments by default. A Direct-IO feature 
> is proposed through a new PR[1] to improve CommitLog IO speed. Enabling 
> this by default could be useful for addressing the IO bottleneck seen during 
> peak load.
>  
> Need your input regarding changing this default. Please suggest.
>  
> https://issues.apache.org/jira/browse/CASSANDRA-18464
>  
> thanks,
> Amit Pawar
>  
> [1] - https://github.com/apache/cassandra/pull/2777
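
The opt-in approach suggested in this message (ship the feature, document it, leave the default unchanged) could be sketched roughly as below. This is only an illustration: the class name, the setting key `commitlog_access_mode`, and the mode names `MMAP` and `DIRECT` are hypothetical and are not taken from the patch or from Cassandra's configuration.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: an opt-in access mode whose default preserves existing behavior. */
public final class CommitLogAccessModeSketch {

    /** Hypothetical mode names, chosen for this illustration only. */
    public enum AccessMode { MMAP, DIRECT }

    /** Hypothetical setting key, not a real cassandra.yaml option. */
    public static final String KEY = "commitlog_access_mode";

    /** The default stays on the existing behavior (mmap) until a user opts in. */
    public static final AccessMode DEFAULT_MODE = AccessMode.MMAP;

    /** Returns the configured mode, or the unchanged default when the key is absent or blank. */
    public static AccessMode resolve(Map<String, String> settings) {
        String raw = settings.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MODE;
        }
        try {
            return AccessMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Fail loudly on typos instead of silently falling back to the default.
            throw new IllegalArgumentException("Unknown " + KEY + " value: " + raw, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of()));              // prints MMAP
        System.out.println(resolve(Map.of(KEY, "direct"))); // prints DIRECT
    }
}
```

With this shape, changing the default in a later release is a one-line change to `DEFAULT_MODE`, while users who set the key explicitly are unaffected.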



Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread Jon Haddad
Glad you brought up compaction here - I think there would be a significant 
benefit to moving compaction to direct i/o.


On 2023/10/16 16:14:28 Benedict wrote:
> I have some plans to (eventually) use the commit log as memtable payload 
> storage (ie memtables would reference the commit log entries directly, 
> storing only indexing info), and to back first level of sstables by reference 
> to commit log entries. This will permit us to deliver not only much bigger 
> memtables (cutting compaction throughput requirements by the ratio of size 
> increase - so pretty dramatically), and faster flushing (so better behaviour 
> during write bursts), but also a fairly cheap and simple way to support MVCC - 
> which will be helpful for transaction throughput.
> 
> There is also a new commit log (“journal”) coming with Accord, that the rest 
> of C* may or may not transition to.
> 
> I only say this because this makes the utility of direct IO for commit log 
> suspect, as we will be reading from the files as a matter of course should 
> this go ahead; and we may end up relying on a different commit log 
> implementation before long anyway.
> 
> This is obviously a big suggestion and is not guaranteed to transpire, and 
> probably won’t within the next year or so, but it should perhaps form some 
> minimal part of any calculus. If the patch is otherwise simple and beneficial 
> I don’t have anything against it, and the use of direct IO could well be of 
> benefit e.g. in compaction - and also in future if we manage to bring page 
> management in-process. So laying foundations there could be of benefit, even 
> if the commit log eventually does not use it.


Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread Benedict
I have some plans to (eventually) use the commit log as memtable payload 
storage (i.e. memtables would reference the commit log entries directly, 
storing only indexing info), and to back the first level of sstables by 
reference to commit log entries. This will permit us to deliver not only much 
bigger memtables (cutting compaction throughput requirements by the ratio of 
size increase - so pretty dramatically) and faster flushing (so better 
behaviour during write bursts), but also a fairly cheap and simple way to 
support MVCC - which will be helpful for transaction throughput.

There is also a new commit log (“journal”) coming with Accord, that the rest of 
C* may or may not transition to.

I only say this because this makes the utility of direct IO for commit log 
suspect, as we will be reading from the files as a matter of course should this 
go ahead; and we may end up relying on a different commit log implementation 
before long anyway.

This is obviously a big suggestion and is not guaranteed to transpire, and 
probably won’t within the next year or so, but it should perhaps form some 
minimal part of any calculus. If the patch is otherwise simple and beneficial I 
don’t have anything against it, and the use of direct IO could well be of 
benefit e.g. in compaction - and also in future if we manage to bring page 
management in-process. So laying foundations there could be of benefit, even if 
the commit log eventually does not use it.
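
To make the idea above concrete, here is a minimal sketch of a memtable that keeps only indexing info and reads payloads back from the log. It is a toy under stated assumptions, not the planned design: the class and method names are hypothetical, a heap `ByteBuffer` stands in for a commit log segment, and values are plain strings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

/** Hypothetical sketch: a memtable that stores only positions into an append-only log. */
public final class LogBackedMemtableSketch {

    /** Location of one value inside the log buffer. */
    private static final class Entry {
        final int offset;
        final int length;

        Entry(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Stand-in for a commit log segment; a real one would be a file.
    private final ByteBuffer log;

    // Sorted index from key to log position; no payload bytes are stored here.
    private final TreeMap<String, Entry> index = new TreeMap<>();

    public LogBackedMemtableSketch(int capacityBytes) {
        this.log = ByteBuffer.allocate(capacityBytes);
    }

    /** Appends the value to the log once and records only its position in the index. */
    public void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > log.remaining()) {
            throw new IllegalStateException("log segment full");
        }
        int offset = log.position();
        log.put(bytes);
        index.put(key, new Entry(offset, bytes.length));
    }

    /** Resolves the key through the index, then reads the payload back from the log. */
    public String get(String key) {
        Entry e = index.get(key);
        if (e == null) {
            return null;
        }
        byte[] bytes = new byte[e.length];
        ByteBuffer view = log.duplicate(); // independent position, shared content
        view.position(e.offset);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Bytes appended to the log so far, including overwritten versions. */
    public int logBytesUsed() {
        return log.position();
    }

    public static void main(String[] args) {
        LogBackedMemtableSketch m = new LogBackedMemtableSketch(64);
        m.put("k1", "v1");
        m.put("k2", "value-two");
        m.put("k1", "v1b");                   // overwrite appends; the old bytes stay in the log
        System.out.println(m.get("k1"));      // prints v1b
        System.out.println(m.logBytesUsed()); // prints 14
    }
}
```

Every `get` reads from the log, which illustrates the concern raised in this message: once the log is on the read path, the case for writing it with direct IO is less clear.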

> On 16 Oct 2023, at 17:00, Jon Haddad  wrote:
> 
> I haven't looked at the patch, but at a high level, defaulting to direct I/O 
> for commit logs makes a lot of sense to me.  


Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread Jon Haddad
I haven't looked at the patch, but at a high level, defaulting to direct I/O 
for commit logs makes a lot of sense to me.  

On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
> [Public]
> 
> Hi,
> 
> CommitLog uses mmap (memory-mapped) segments by default. A Direct-IO feature 
> is proposed through a new PR[1] to improve CommitLog IO speed. Enabling 
> this by default could be useful for addressing the IO bottleneck seen during 
> peak load.
> 
> Need your input regarding changing this default. Please suggest.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-18464
> 
> thanks,
> Amit Pawar
> 
> [1] - https://github.com/apache/cassandra/pull/2777
> 


Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread guo Maxwell
I think we should provide options and let users make their own decisions.

The default behavior should not be changed until a future release.


On Mon, 16 Oct 2023 at 15:51, Pawar, Amit wrote:

> [Public]
>
> Hi,
>
> CommitLog uses mmap (memory-mapped) segments by default. A Direct-IO
> feature is proposed through a new PR[1] to improve CommitLog IO speed.
> Enabling this by default could be useful for addressing the IO bottleneck
> seen during peak load.
>
> Need your input regarding changing this default. Please suggest.
>
> https://issues.apache.org/jira/browse/CASSANDRA-18464
>
> thanks,
> Amit Pawar
>
> [1] - https://github.com/apache/cassandra/pull/2777
>


-- 
you are the apple of my eye !