Re: discuss: add to_human_size function

2024-04-29 Thread Štefan Miklošovič
FYI, I added both to_human_size and to_human_duration (1), (2).

I will try my luck and ask for a reviewer. The change is also tested and
documented.

(1) https://github.com/apache/cassandra/pull/3239/files
(2) https://github.com/apache/cassandra/blob/f35ed228145fae3edb4325d29464f0d950d13511/doc/modules/cassandra/pages/developing/cql/functions.adoc#human-helper-functions

On Thu, Apr 25, 2024 at 6:20 PM Ekaterina Dimitrova wrote:

> All I say is we should be careful not to open the door for someone to be
> able to set for a parameter in cassandra.yaml 512MiB and convert it to 0
> GiB internally while changing those classes. Loss of precision and weird
> settings. As long as that pandora box stays closed, all good 
>
>  I do support this new function addition proposed here, thank you!
>
> On Thu, 25 Apr 2024 at 7:31, Jon Haddad  wrote:
>
>> I can’t see a good reason not to support it. Seems like extra work to
>> avoid with no benefit.
>>
>> —
>>
>> Jon Haddad
>> Rustyrazorblade Consulting
>> rustyrazorblade.com
>>
>>
>> On Thu, Apr 25, 2024 at 7:16 AM Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>>> Can you elaborate on intentionally not supporting some conversions? Are
>>> we safe to base these conversions on DataStorageUnit? We have set of units
>>> from BYTES to GIBIBYTES and respective methods on them which convert from
>>> that unit to whatever else. Is this OK to be used for the purposes of this
>>> feature? I would expect that once we have units like these and methods on
>>> them to convert from-to, it can be reused in wherever else.
>>>
>>> On Thu, Apr 25, 2024 at 4:06 PM Ekaterina Dimitrova <
>>> e.dimitr...@gmail.com> wrote:
>>>
 All I am saying is be careful with adding those conversions not to end
 up used while setting our configuration. Thanks 

 On Thu, 25 Apr 2024 at 6:53, Štefan Miklošovič <
 stefan.mikloso...@gmail.com> wrote:

> Well, technically I do not need DataStorageSpec at all. All I need is
> DataStorageUnit for that matter. That can convert from one unit to another
> easily.
>
> We can omit tebibytes, that's just fine. People would need to live
> with gibibytes at most in cqlsh output. They would not get 5 TiB but 5120
> GiB, I guess that is just enough to have a picture of what magnitude that
> value looks like.
>
> On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova <
> e.dimitr...@gmail.com> wrote:
>
>> Quick comment:
>>
>> DataRateSpec, DataStorageSpec, or DurationSpec
>> - we intentionally do not support going smaller to bigger size in
>> those classes which are specific for cassandra.yaml - precision issues.
>> Please keep it that way. That is why the notion of min unit was added in
>> cassandra.yaml for parameters that are internally represented in a bigger
>> unit.
>>
>> I am not sure that people want to add TiB. There was explicit
>> agreement what units we will allow in cassandra.yaml. I suspect any new
>> units should be approved on the ML
>>
>> Hope this helps
>>
>>
>>
>> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> TiB is not yet in DataStorageSpec (perhaps we should add it).
>>>
>>> A quick review tells me that all the units are unique across the 3
>>> specs.  As long as we guarantee that in the future the method you propose
>>> should be easily expandable to the other specs.
>>>
>>> +1 to this idea.
>>>
>>> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
>>> stefan.mikloso...@gmail.com> wrote:
>>>
 That is a very interesting point, Claude. My so-far implementation
 is using FileUtils.stringifyFileSize which is just dividing a value by a
 respective divisor based on how big a value is. While this works, it will
 prevent you from specifying what unit you want that value to be converted
 to as well as it will prevent you from specifying what unit a value you
 provided is of. So, for example, if a column is known to be in kibibytes
 and we want that to be converted into gibibytes, that won't be possible
 because that function will think that a value is in bytes.

 It would be more appropriate to have something like this:

 to_human_size(val) -> alias to FileUtils.stringifyFileSize, without any
 source nor target unit, it will consider it to be in bytes and it will
 convert it like in FileUtils.stringifyFileSize

 to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
 to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')

 the first argument is the source unit, the second argument is target unit

 to_human_size(val, 'B', 

Re: discuss: add to_human_size function

2024-04-25 Thread Ekaterina Dimitrova
All I am saying is that we should be careful not to open the door to someone
setting a cassandra.yaml parameter to 512MiB and having it converted to 0 GiB
internally when those classes are changed. That means loss of precision and
weird settings. As long as that Pandora's box stays closed, all good.

 I do support this new function addition proposed here, thank you!
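
To make the precision concern concrete, here is a minimal Java sketch. The
class and method names are made up for illustration and are not Cassandra's
DataStorageSpec API; the sketch only shows that integer conversion to a bigger
unit truncates, while conversion to a smaller unit stays exact.

```java
// Illustration only: hypothetical helper, not Cassandra's DataStorageSpec.
class PrecisionLossDemo {
    static final long MIB_PER_GIB = 1024L;

    // Integer division truncates toward zero, so any value below 1024 MiB
    // silently becomes 0 GiB.
    static long mebibytesToGibibytesTruncating(long mebibytes) {
        return mebibytes / MIB_PER_GIB;
    }

    // Going to a smaller unit is a multiplication, so no precision is lost.
    // Math.multiplyExact throws ArithmeticException on overflow.
    static long gibibytesToMebibytes(long gibibytes) {
        return Math.multiplyExact(gibibytes, MIB_PER_GIB);
    }

    public static void main(String[] args) {
        System.out.println(mebibytesToGibibytesTruncating(512)); // prints 0
        System.out.println(gibibytesToMebibytes(5));              // prints 5120
    }
}
```

A display-only function can afford a fractional result for the first case; a
configuration value stored as a whole number of GiB cannot.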

On Thu, 25 Apr 2024 at 7:31, Jon Haddad  wrote:

> I can’t see a good reason not to support it. Seems like extra work to
> avoid with no benefit.
>
> —
>
> Jon Haddad
> Rustyrazorblade Consulting
> rustyrazorblade.com
>
>
> On Thu, Apr 25, 2024 at 7:16 AM Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
>> Can you elaborate on intentionally not supporting some conversions? Are
>> we safe to base these conversions on DataStorageUnit? We have set of units
>> from BYTES to GIBIBYTES and respective methods on them which convert from
>> that unit to whatever else. Is this OK to be used for the purposes of this
>> feature? I would expect that once we have units like these and methods on
>> them to convert from-to, it can be reused in wherever else.
>>
>> On Thu, Apr 25, 2024 at 4:06 PM Ekaterina Dimitrova <
>> e.dimitr...@gmail.com> wrote:
>>
>>> All I am saying is be careful with adding those conversions not to end
>>> up used while setting our configuration. Thanks 
>>>
>>> On Thu, 25 Apr 2024 at 6:53, Štefan Miklošovič <
>>> stefan.mikloso...@gmail.com> wrote:
>>>
 Well, technically I do not need DataStorageSpec at all. All I need is
 DataStorageUnit for that matter. That can convert from one unit to another
 easily.

 We can omit tebibytes, that's just fine. People would need to live with
 gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB, I
 guess that is just enough to have a picture of what magnitude that value
 looks like.

 On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova <
 e.dimitr...@gmail.com> wrote:

> Quick comment:
>
> DataRateSpec, DataStorageSpec, or DurationSpec
> - we intentionally do not support going smaller to bigger size in
> those classes which are specific for cassandra.yaml - precision issues.
> Please keep it that way. That is why the notion of min unit was added in
> cassandra.yaml for parameters that are internally represented in a bigger
> unit.
>
> I am not sure that people want to add TiB. There was explicit
> agreement what units we will allow in cassandra.yaml. I suspect any new
> units should be approved on the ML
>
> Hope this helps
>
>
>
> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>> TiB is not yet in DataStorageSpec (perhaps we should add it).
>>
>> A quick review tells me that all the units are unique across the 3
>> specs.  As long as we guarantee that in the future the method you propose
>> should be easily expandable to the other specs.
>>
>> +1 to this idea.
>>
>> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>>> That is a very interesting point, Claude. My so-far implementation
>>> is using FileUtils.stringifyFileSize which is just dividing a value by a
>>> respective divisor based on how big a value is. While this works, it will
>>> prevent you from specifying what unit you want that value to be converted
>>> to as well as it will prevent you from specifying what unit a value you
>>> provided is of. So, for example, if a column is known to be in kibibytes
>>> and we want that to be converted into gibibytes, that won't be possible
>>> because that function will think that a value is in bytes.
>>>
>>> It would be more appropriate to have something like this:
>>>
>>> to_human_size(val) -> alias to FileUtils.stringifyFileSize, without
>>> any source nor target unit, it will consider it to be in bytes and it will
>>> convert it like in FileUtils.stringifyFileSize
>>>
>>> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
>>> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>>>
>>> the first argument is the source unit, the second argument is target
>>> unit
>>>
>>> to_human_size(val, 'B', 'MiB')
>>> to_human_size(val, 'B', 'GiB')
>>> to_human_size(val, 'KiB', 'GiB')
>>> to_human_size(val, 'KiB', 'TiB')
>>>
>>> I think this is more flexible and we should funnel this via
>>> DataStorageSpec and similar as you mentioned.
>>>
>>> In the future, we might also add to_human_duration which would be
>>> implemented against DurationSpec so similar conversions are possible.
>>>
>>> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
 I like the idea.  Is the intention to 

Re: discuss: add to_human_size function

2024-04-25 Thread Jon Haddad
I can’t see a good reason not to support it. Seems like extra work to avoid
with no benefit.

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com


On Thu, Apr 25, 2024 at 7:16 AM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> Can you elaborate on intentionally not supporting some conversions? Are we
> safe to base these conversions on DataStorageUnit? We have set of units
> from BYTES to GIBIBYTES and respective methods on them which convert from
> that unit to whatever else. Is this OK to be used for the purposes of this
> feature? I would expect that once we have units like these and methods on
> them to convert from-to, it can be reused in wherever else.
>
> On Thu, Apr 25, 2024 at 4:06 PM Ekaterina Dimitrova 
> wrote:
>
>> All I am saying is be careful with adding those conversions not to end up
>> used while setting our configuration. Thanks 
>>
>> On Thu, 25 Apr 2024 at 6:53, Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>>> Well, technically I do not need DataStorageSpec at all. All I need is
>>> DataStorageUnit for that matter. That can convert from one unit to another
>>> easily.
>>>
>>> We can omit tebibytes, that's just fine. People would need to live with
>>> gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB, I
>>> guess that is just enough to have a picture of what magnitude that value
>>> looks like.
>>>
>>> On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova <
>>> e.dimitr...@gmail.com> wrote:
>>>
 Quick comment:

 DataRateSpec, DataStorageSpec, or DurationSpec
 - we intentionally do not support going smaller to bigger size in those
 classes which are specific for cassandra.yaml - precision issues. Please
 keep it that way. That is why the notion of min unit was added in
 cassandra.yaml for parameters that are internally represented in a bigger
 unit.

 I am not sure that people want to add TiB. There was explicit agreement
 what units we will allow in cassandra.yaml. I suspect any new units should
 be approved on the ML

 Hope this helps



 On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
 dev@cassandra.apache.org> wrote:

> TiB is not yet in DataStorageSpec (perhaps we should add it).
>
> A quick review tells me that all the units are unique across the 3
> specs.  As long as we guarantee that in the future the method you propose
> should be easily expandable to the other specs.
>
> +1 to this idea.
>
> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
>> That is a very interesting point, Claude. My so-far implementation is
>> using FileUtils.stringifyFileSize which is just dividing a value by a
>> respective divisor based on how big a value is. While this works, it will
>> prevent you from specifying what unit you want that value to be converted
>> to as well as it will prevent you from specifying what unit a value you
>> provided is of. So, for example, if a column is known to be in kibibytes
>> and we want that to be converted into gibibytes, that won't be possible
>> because that function will think that a value is in bytes.
>>
>> It would be more appropriate to have something like this:
>>
>> to_human_size(val) -> alias to FileUtils.stringifyFileSize, without
>> any source nor target unit, it will consider it to be in bytes and it will
>> convert it like in FileUtils.stringifyFileSize
>>
>> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
>> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>>
>> the first argument is the source unit, the second argument is target
>> unit
>>
>> to_human_size(val, 'B', 'MiB')
>> to_human_size(val, 'B', 'GiB')
>> to_human_size(val, 'KiB', 'GiB')
>> to_human_size(val, 'KiB', 'TiB')
>>
>> I think this is more flexible and we should funnel this via
>> DataStorageSpec and similar as you mentioned.
>>
>> In the future, we might also add to_human_duration which would be
>> implemented against DurationSpec so similar conversions are possible.
>>
>> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> I like the idea.  Is the intention to have the output of the function be
>>> parsable by the config parsers like DataRateSpec, DataStorageSpec, or
>>> DurationSpec?
>>>
>>> Claude
>>>
>>> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg 
>>> wrote:
>>>
 Hi,

 I think it’s a good quality of life improvement, but I am someone
 who believes in a rich set of built-in functions being a good thing.

 A format function is a bit more scope and kind of orthogonal. It
 would still be good to have shorthand functions for 

Re: discuss: add to_human_size function

2024-04-25 Thread Štefan Miklošovič
Can you elaborate on intentionally not supporting some conversions? Are we
safe to base these conversions on DataStorageUnit? We have a set of units
from BYTES to GIBIBYTES, with methods on them that convert from that unit
to any other. Is it OK to use this for the purposes of this feature? I would
expect that once we have units like these, with methods to convert between
them, they can be reused wherever else they are needed.
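
As a rough sketch of what I mean by units with from-to conversion methods:
the SizeUnit enum below is a hypothetical stand-in written for this mail, not
the actual DataStorageUnit, and its method name and double return type are
assumptions. It returns a fractional result so that going to a bigger unit
does not truncate, which is fine for display but is not what the
cassandra.yaml classes do.

```java
// Hypothetical stand-in for illustration; this is not Cassandra's DataStorageUnit.
enum SizeUnit {
    BYTES(1L),
    KIBIBYTES(1L << 10),
    MEBIBYTES(1L << 20),
    GIBIBYTES(1L << 30);

    private final long bytesPerUnit;

    SizeUnit(long bytesPerUnit) {
        this.bytesPerUnit = bytesPerUnit;
    }

    // Converts a value expressed in this unit into the target unit.
    // The result is a double so that small-to-big conversions keep the fraction.
    double convert(long value, SizeUnit target) {
        return (double) value * bytesPerUnit / target.bytesPerUnit;
    }
}

class SizeUnitDemo {
    public static void main(String[] args) {
        // 1048576 KiB expressed in GiB
        System.out.println(SizeUnit.KIBIBYTES.convert(1_048_576L, SizeUnit.GIBIBYTES)); // prints 1.0
        // 512 MiB expressed in GiB keeps its fraction instead of becoming 0
        System.out.println(SizeUnit.MEBIBYTES.convert(512L, SizeUnit.GIBIBYTES)); // prints 0.5
    }
}
```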

On Thu, Apr 25, 2024 at 4:06 PM Ekaterina Dimitrova wrote:

> All I am saying is be careful with adding those conversions not to end up
> used while setting our configuration. Thanks 
>
> On Thu, 25 Apr 2024 at 6:53, Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
>> Well, technically I do not need DataStorageSpec at all. All I need is
>> DataStorageUnit for that matter. That can convert from one unit to another
>> easily.
>>
>> We can omit tebibytes, that's just fine. People would need to live with
>> gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB, I
>> guess that is just enough to have a picture of what magnitude that value
>> looks like.
>>
>> On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova <
>> e.dimitr...@gmail.com> wrote:
>>
>>> Quick comment:
>>>
>>> DataRateSpec, DataStorageSpec, or DurationSpec
>>> - we intentionally do not support going smaller to bigger size in those
>>> classes which are specific for cassandra.yaml - precision issues. Please
>>> keep it that way. That is why the notion of min unit was added in
>>> cassandra.yaml for parameters that are internally represented in a bigger
>>> unit.
>>>
>>> I am not sure that people want to add TiB. There was explicit agreement
>>> what units we will allow in cassandra.yaml. I suspect any new units should
>>> be approved on the ML
>>>
>>> Hope this helps
>>>
>>>
>>>
>>> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
 TiB is not yet in DataStorageSpec (perhaps we should add it).

 A quick review tells me that all the units are unique across the 3
 specs.  As long as we guarantee that in the future the method you propose
 should be easily expandable to the other specs.

 +1 to this idea.

 On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
 stefan.mikloso...@gmail.com> wrote:

> That is a very interesting point, Claude. My so-far implementation is
> using FileUtils.stringifyFileSize which is just dividing a value by a
> respective divisor based on how big a value is. While this works, it will
> prevent you from specifying what unit you want that value to be converted
> to as well as it will prevent you from specifying what unit a value you
> provided is of. So, for example, if a column is known to be in kibibytes
> and we want that to be converted into gibibytes, that won't be possible
> because that function will think that a value is in bytes.
>
> It would be more appropriate to have something like this:
>
> to_human_size(val) -> alias to FileUtils.stringifyFileSize, without
> any source nor target unit, it will consider it to be in bytes and it will
> convert it like in FileUtils.stringifyFileSize
>
> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>
> the first argument is the source unit, the second argument is target
> unit
>
> to_human_size(val, 'B', 'MiB')
> to_human_size(val, 'B', 'GiB')
> to_human_size(val, 'KiB', 'GiB')
> to_human_size(val, 'KiB', 'TiB')
>
> I think this is more flexible and we should funnel this via
> DataStorageSpec and similar as you mentioned.
>
> In the future, we might also add to_human_duration which would be
> implemented against DurationSpec so similar conversions are possible.
>
> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>> I like the idea.  Is the intention to have the output of the function be
>> parsable by the config parsers like DataRateSpec, DataStorageSpec, or
>> DurationSpec?
>>
>> Claude
>>
>> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg 
>> wrote:
>>
>>> Hi,
>>>
>>> I think it’s a good quality of life improvement, but I am someone
>>> who believes in a rich set of built-in functions being a good thing.
>>>
>>> A format function is a bit more scope and kind of orthogonal. It
>>> would still be good to have shorthand functions for things like size.
>>>
>>> Ariel
>>>
>>> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>>>
>>> Hi,
>>>
>>> I want to propose CASSANDRA-19546. It would be possible to convert
>>> raw numbers to something human-friendly.
>>> There are cases when we write just a number of bytes in our system
>>> tables but these numbers 

Re: discuss: add to_human_size function

2024-04-25 Thread Ekaterina Dimitrova
All I am saying is: be careful that those conversions do not end up being
used when setting our configuration. Thanks.

On Thu, 25 Apr 2024 at 6:53, Štefan Miklošovič wrote:

> Well, technically I do not need DataStorageSpec at all. All I need is
> DataStorageUnit for that matter. That can convert from one unit to another
> easily.
>
> We can omit tebibytes, that's just fine. People would need to live with
> gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB, I
> guess that is just enough to have a picture of what magnitude that value
> looks like.
>
> On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova 
> wrote:
>
>> Quick comment:
>>
>> DataRateSpec, DataStorageSpec, or DurationSpec
>> - we intentionally do not support going smaller to bigger size in those
>> classes which are specific for cassandra.yaml - precision issues. Please
>> keep it that way. That is why the notion of min unit was added in
>> cassandra.yaml for parameters that are internally represented in a bigger
>> unit.
>>
>> I am not sure that people want to add TiB. There was explicit agreement
>> what units we will allow in cassandra.yaml. I suspect any new units should
>> be approved on the ML
>>
>> Hope this helps
>>
>>
>>
>> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> TiB is not yet in DataStorageSpec (perhaps we should add it).
>>>
>>> A quick review tells me that all the units are unique across the 3
>>> specs.  As long as we guarantee that in the future the method you propose
>>> should be easily expandable to the other specs.
>>>
>>> +1 to this idea.
>>>
>>> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
>>> stefan.mikloso...@gmail.com> wrote:
>>>
 That is a very interesting point, Claude. My so-far implementation is
 using FileUtils.stringifyFileSize which is just dividing a value by a
 respective divisor based on how big a value is. While this works, it will
 prevent you from specifying what unit you want that value to be converted
 to as well as it will prevent you from specifying what unit a value you
 provided is of. So, for example, if a column is known to be in kibibytes
 and we want that to be converted into gibibytes, that won't be possible
 because that function will think that a value is in bytes.

 It would be more appropriate to have something like this:

 to_human_size(val) -> alias to FileUtils.stringifyFileSize, without any
 source nor target unit, it will consider it to be in bytes and it will
 convert it like in FileUtils.stringifyFileSize

 to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
 to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')

 the first argument is the source unit, the second argument is target
 unit

 to_human_size(val, 'B', 'MiB')
 to_human_size(val, 'B', 'GiB')
 to_human_size(val, 'KiB', 'GiB')
 to_human_size(val, 'KiB', 'TiB')

 I think this is more flexible and we should funnel this via
 DataStorageSpec and similar as you mentioned.

 In the future, we might also add to_human_duration which would be
 implemented against DurationSpec so similar conversions are possible.

 On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
 dev@cassandra.apache.org> wrote:

> I like the idea.  Is the intention to have the output of the function be
> parsable by the config parsers like DataRateSpec, DataStorageSpec, or
> DurationSpec?
>
> Claude
>
> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg 
> wrote:
>
>> Hi,
>>
>> I think it’s a good quality of life improvement, but I am someone who
>> believes in a rich set of built-in functions being a good thing.
>>
>> A format function is a bit more scope and kind of orthogonal. It
>> would still be good to have shorthand functions for things like size.
>>
>> Ariel
>>
>> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>>
>> Hi,
>>
>> I want to propose CASSANDRA-19546. It would be possible to convert
>> raw numbers to something human-friendly.
>> There are cases when we write just a number of bytes in our system
>> tables but these numbers are just hard to parse visually. Users can indeed
>> use this for their tables too if they find it useful.
>>
>> Also, a user can indeed write a UDF for this but I would prefer if we
>> had something baked in.
>>
>> Does this make sense to people? Are there any other approaches to do
>> this?
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-19546
>> https://github.com/apache/cassandra/pull/3239/files
>>
>> Regards
>>
>>
>>


Re: discuss: add to_human_size function

2024-04-25 Thread Štefan Miklošovič
Well, technically I do not need DataStorageSpec at all. All I need is
DataStorageUnit, which can easily convert from one unit to another.

We can omit tebibytes; that's just fine. People would need to live with
gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB,
which I guess is enough to give a picture of the value's magnitude.
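
To illustrate the effect of capping the output at gibibytes, here is a small
sketch. The toHumanSize method, the unit labels, and the two-decimal output
format are my assumptions for this example; it is not the code in the PR and
not FileUtils.stringifyFileSize.

```java
import java.util.Locale;

// Illustration only: hypothetical formatter with GiB as the largest unit.
class HumanSizeDemo {
    private static final String[] UNITS = {"B", "KiB", "MiB", "GiB"};

    // Divides by 1024 until the value drops below 1024 or the largest
    // supported unit (GiB) is reached, then formats with two decimals.
    static String toHumanSize(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative size: " + bytes);
        }
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.2f %s", value, UNITS[unit]);
    }

    public static void main(String[] args) {
        System.out.println(toHumanSize(1536L));    // prints 1.50 KiB
        System.out.println(toHumanSize(5L << 40)); // 5 TiB in bytes, prints 5120.00 GiB
    }
}
```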

On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova wrote:

> Quick comment:
>
> DataRateSpec, DataStorageSpec, or DurationSpec
> - we intentionally do not support going smaller to bigger size in those
> classes which are specific for cassandra.yaml - precision issues. Please
> keep it that way. That is why the notion of min unit was added in
> cassandra.yaml for parameters that are internally represented in a bigger
> unit.
>
> I am not sure that people want to add TiB. There was explicit agreement
> what units we will allow in cassandra.yaml. I suspect any new units should
> be approved on the ML
>
> Hope this helps
>
>
>
> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>> TiB is not yet in DataStorageSpec (perhaps we should add it).
>>
>> A quick review tells me that all the units are unique across the 3
>> specs.  As long as we guarantee that in the future the method you propose
>> should be easily expandable to the other specs.
>>
>> +1 to this idea.
>>
>> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>>> That is a very interesting point, Claude. My so-far implementation is
>>> using FileUtils.stringifyFileSize which is just dividing a value by a
>>> respective divisor based on how big a value is. While this works, it will
>>> prevent you from specifying what unit you want that value to be converted
>>> to as well as it will prevent you from specifying what unit a value you
>>> provided is of. So, for example, if a column is known to be in kibibytes
>>> and we want that to be converted into gibibytes, that won't be possible
>>> because that function will think that a value is in bytes.
>>>
>>> It would be more appropriate to have something like this:
>>>
>>> to_human_size(val) -> alias to FileUtils.stringifyFileSize, without any
>>> source nor target unit, it will consider it to be in bytes and it will
>>> convert it like in FileUtils.stringifyFileSize
>>>
>>> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
>>> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>>>
>>> the first argument is the source unit, the second argument is target unit
>>>
>>> to_human_size(val, 'B', 'MiB')
>>> to_human_size(val, 'B', 'GiB')
>>> to_human_size(val, 'KiB', 'GiB')
>>> to_human_size(val, 'KiB', 'TiB')
>>>
>>> I think this is more flexible and we should funnel this via
>>> DataStorageSpec and similar as you mentioned.
>>>
>>> In the future, we might also add to_human_duration which would be
>>> implemented against DurationSpec so similar conversions are possible.
>>>
>>> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
 I like the idea.  Is the intention to have the output of the function be
 parsable by the config parsers like DataRateSpec, DataStorageSpec, or
 DurationSpec?

 Claude

 On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg 
 wrote:

> Hi,
>
> I think it’s a good quality of life improvement, but I am someone who
> believes in a rich set of built-in functions being a good thing.
>
> A format function is a bit more scope and kind of orthogonal. It would
> still be good to have shorthand functions for things like size.
>
> Ariel
>
> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>
> Hi,
>
> I want to propose CASSANDRA-19546. It would be possible to convert raw
> numbers to something human-friendly.
> There are cases when we write just a number of bytes in our system
> tables but these numbers are just hard to parse visually. Users can indeed
> use this for their tables too if they find it useful.
>
> Also, a user can indeed write a UDF for this but I would prefer if we
> had something baked in.
>
> Does this make sense to people? Are there any other approaches to do
> this?
>
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
>
> Regards
>
>
>


Re: discuss: add to_human_size function

2024-04-25 Thread Ekaterina Dimitrova
Edit: I meant to say smaller to bigger unit, not size, sorry

On Thu, 25 Apr 2024 at 6:35, Ekaterina Dimitrova wrote:

> Quick comment:
>
> DataRateSpec, DataStorageSpec, or DurationSpec
> - we intentionally do not support going smaller to bigger size in those
> classes which are specific for cassandra.yaml - precision issues. Please
> keep it that way. That is why the notion of min unit was added in
> cassandra.yaml for parameters that are internally represented in a bigger
> unit.
>
> I am not sure that people want to add TiB. There was explicit agreement
> what units we will allow in cassandra.yaml. I suspect any new units should
> be approved on the ML
>
> Hope this helps
>
>
>
> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>> TiB is not yet in DataStorageSpec (perhaps we should add it).
>>
>> A quick review tells me that all the units are unique across the 3
>> specs.  As long as we guarantee that in the future the method you propose
>> should be easily expandable to the other specs.
>>
>> +1 to this idea.
>>
>> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>>> That is a very interesting point, Claude. My so-far implementation is
>>> using FileUtils.stringifyFileSize which is just dividing a value by a
>>> respective divisor based on how big a value is. While this works, it will
>>> prevent you from specifying what unit you want that value to be converted
>>> to as well as it will prevent you from specifying what unit a value you
>>> provided is of. So, for example, if a column is known to be in kibibytes
>>> and we want that to be converted into gibibytes, that won't be possible
>>> because that function will think that a value is in bytes.
>>>
>>> It would be more appropriate to have something like this:
>>>
>>> to_human_size(val) -> alias to FileUtils.stringifyFileSize, without any
>>> source nor target unit, it will consider it to be in bytes and it will
>>> convert it like in FileUtils.stringifyFileSize
>>>
>>> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
>>> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>>>
>>> the first argument is the source unit, the second argument is target unit
>>>
>>> to_human_size(val, 'B', 'MiB')
>>> to_human_size(val, 'B', 'GiB')
>>> to_human_size(val, 'KiB', 'GiB')
>>> to_human_size(val, 'KiB', 'TiB')
>>>
>>> I think this is more flexible and we should funnel this via
>>> DataStorageSpec and similar as you mentioned.
>>>
>>> In the future, we might also add to_human_duration which would be
>>> implemented against DurationSpec so similar conversions are possible.
>>>
>>> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
 I like the idea.  Is the intention to have the output of the function be
 parsable by the config parsers like DataRateSpec, DataStorageSpec, or
 DurationSpec?

 Claude

 On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg 
 wrote:

> Hi,
>
> I think it’s a good quality of life improvement, but I am someone who
> believes in a rich set of built-in functions being a good thing.
>
> A format function is a bit more scope and kind of orthogonal. It would
> still be good to have shorthand functions for things like size.
>
> Ariel
>
> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>
> Hi,
>
> I want to propose CASSANDRA-19546. It would be possible to convert raw
> numbers to something human-friendly.
> There are cases when we write just a number of bytes in our system
> tables but these numbers are just hard to parse visually. Users can indeed
> use this for their tables too if they find it useful.
>
> Also, a user can indeed write a UDF for this but I would prefer if we
> had something baked in.
>
> Does this make sense to people? Are there any other approaches to do
> this?
>
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
>
> Regards
>
>
>


Re: discuss: add to_human_size function

2024-04-25 Thread Ekaterina Dimitrova
Quick comment:

DataRateSpec, DataStorageSpec, or DurationSpec
- we intentionally do not support converting from a smaller unit to a bigger
one in those classes, which are specific to cassandra.yaml, because of
precision issues. Please keep it that way. That is why the notion of a min
unit was added in cassandra.yaml for parameters that are internally
represented in a bigger unit.
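
To make the precision concern concrete, here is a minimal Python sketch. It is
not the DataStorageSpec code; the helper names are hypothetical and it only
assumes that sizes are held as whole numbers of a unit. Converting to a bigger
unit with integer arithmetic discards the remainder, while converting to a
smaller unit is exact.

```python
# Hypothetical helpers for illustration only; not Cassandra's API.
KIB_PER_MIB = 1024
MIB_PER_GIB = 1024


def mib_to_gib_truncating(mib: int) -> int:
    """Smaller -> bigger unit with integer division: the remainder is lost."""
    return mib // MIB_PER_GIB


def mib_to_kib(mib: int) -> int:
    """Bigger -> smaller unit: always exact for whole numbers."""
    return mib * KIB_PER_MIB


print(mib_to_gib_truncating(512))   # 0  (512 MiB silently becomes 0 GiB)
print(mib_to_gib_truncating(1536))  # 1  (the extra 512 MiB is dropped)
print(mib_to_kib(512))              # 524288
```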

I am not sure that people want to add TiB. There was explicit agreement on
which units we will allow in cassandra.yaml. I suspect any new units should
be approved on the ML.

Hope this helps



On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
dev@cassandra.apache.org> wrote:

> TiB is not yet in DataStorageSpec (perhaps we should add it).
>
> A quick review tells me that all the units are unique across the 3 specs.
> As long as we guarantee that in the future, the method you propose should be
> easily expandable to the other specs.
>
> +1 to this idea.
>
> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
>> That is a very interesting point, Claude. My so-far implementation is
>> using FileUtils.stringifyFileSize which is just dividing a value by a
>> respective divisor based on how big a value is. While this works, it will
>> prevent you from specifying what unit you want that value to be converted
>> to as well as it will prevent you from specifying what unit a value you
>> provided is of. So, for example, if a column is known to be in kibibytes
>> and we want that to be converted into gibibytes, that won't be possible
>> because that function will think that a value is in bytes.
>>
>> It would be more appropriate to have something like this:
>>
>> to_human_size(val) -> alias for FileUtils.stringifyFileSize; without a
>> source or target unit, it will treat the value as bytes and convert it
>> as FileUtils.stringifyFileSize does
>>
>> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
>> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>>
>> the first unit argument is the source unit, the second is the target unit
>>
>> to_human_size(val, 'B', 'MiB')
>> to_human_size(val, 'B', 'GiB')
>> to_human_size(val, 'KiB', 'GiB')
>> to_human_size(val, 'KiB', 'TiB')
>>
>> I think this is more flexible and we should funnel this via
>> DataStorageSpec and similar as you mentioned.
>>
>> In the future, we might also add to_human_duration which would be
>> implemented against DurationSpec so similar conversions are possible.
>>
>> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> I like the idea. Is the intention to have the output of the function be
>>> parsable by the config parsers like DataRateSpec, DataStorageSpec, or
>>> DurationSpec?
>>>
>>> Claude
>>>
>>> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I think it’s a good quality of life improvement, but I am someone who
>>>> believes in a rich set of built-in functions being a good thing.
>>>>
>>>> A format function is a bit more scope and kind of orthogonal. It would
>>>> still be good to have shorthand functions for things like size.
>>>>
>>>> Ariel
>>>>
>>>> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>>>>
>>>> Hi,
>>>>
>>>> I want to propose CASSANDRA-19546. It would be possible to convert raw
>>>> numbers to something human-friendly.
>>>> There are cases when we write just a number of bytes in our system
>>>> tables but these numbers are just hard to parse visually. Users can indeed
>>>> use this for their tables too if they find it useful.
>>>>
>>>> Also, a user can indeed write a UDF for this but I would prefer if we
>>>> had something baked in.
>>>>
>>>> Does this make sense to people? Are there any other approaches to do
>>>> this?
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-19546
>>>> https://github.com/apache/cassandra/pull/3239/files
>>>>
>>>> Regards





Re: discuss: add to_human_size function

2024-04-25 Thread Claude Warren, Jr via dev
TiB is not yet in DataStorageSpec (perhaps we should add it).

A quick review tells me that all the units are unique across the 3 specs.
As long as we guarantee that in the future, the method you propose should be
easily expandable to the other specs.

+1 to this idea.

On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> That is a very interesting point, Claude. My so-far implementation is
> using FileUtils.stringifyFileSize which is just dividing a value by a
> respective divisor based on how big a value is. While this works, it will
> prevent you from specifying what unit you want that value to be converted
> to as well as it will prevent you from specifying what unit a value you
> provided is of. So, for example, if a column is known to be in kibibytes
> and we want that to be converted into gibibytes, that won't be possible
> because that function will think that a value is in bytes.
>
> It would be more appropriate to have something like this:
>
> to_human_size(val) -> alias for FileUtils.stringifyFileSize; without a
> source or target unit, it will treat the value as bytes and convert it
> as FileUtils.stringifyFileSize does
>
> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>
> the first unit argument is the source unit, the second is the target unit
>
> to_human_size(val, 'B', 'MiB')
> to_human_size(val, 'B', 'GiB')
> to_human_size(val, 'KiB', 'GiB')
> to_human_size(val, 'KiB', 'TiB')
>
> I think this is more flexible and we should funnel this via
> DataStorageSpec and similar as you mentioned.
>
> In the future, we might also add to_human_duration which would be
> implemented against DurationSpec so similar conversions are possible.
>
> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>> I like the idea. Is the intention to have the output of the function be
>> parsable by the config parsers like DataRateSpec, DataStorageSpec, or
>> DurationSpec?
>>
>> Claude
>>
>> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg  wrote:
>>
>>> Hi,
>>>
>>> I think it’s a good quality of life improvement, but I am someone who
>>> believes in a rich set of built-in functions being a good thing.
>>>
>>> A format function is a bit more scope and kind of orthogonal. It would
>>> still be good to have shorthand functions for things like size.
>>>
>>> Ariel
>>>
>>> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>>>
>>> Hi,
>>>
>>> I want to propose CASSANDRA-19546. It would be possible to convert raw
>>> numbers to something human-friendly.
>>> There are cases when we write just a number of bytes in our system
>>> tables but these numbers are just hard to parse visually. Users can indeed
>>> use this for their tables too if they find it useful.
>>>
>>> Also, a user can indeed write a UDF for this but I would prefer if we
>>> had something baked in.
>>>
>>> Does this make sense to people? Are there any other approaches to do
>>> this?
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-19546
>>> https://github.com/apache/cassandra/pull/3239/files
>>>
>>> Regards
>>>
>>>
>>>


Re: discuss: add to_human_size function

2024-04-25 Thread Štefan Miklošovič
That is a very interesting point, Claude. My implementation so far uses
FileUtils.stringifyFileSize, which just divides a value by a divisor chosen
by how big the value is. While this works, it does not let you specify the
unit you want the value converted to, nor the unit the provided value is in.
So, for example, if a column is known to be in kibibytes and we want that
converted into gibibytes, that won't be possible because that function will
assume the value is in bytes.

It would be more appropriate to have something like this:

to_human_size(val) -> alias for FileUtils.stringifyFileSize; without a
source or target unit, it will treat the value as bytes and convert it
as FileUtils.stringifyFileSize does

to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')

the first unit argument is the source unit, the second is the target unit

to_human_size(val, 'B', 'MiB')
to_human_size(val, 'B', 'GiB')
to_human_size(val, 'KiB', 'GiB')
to_human_size(val, 'KiB', 'TiB')

I think this is more flexible and we should funnel this via DataStorageSpec
and similar as you mentioned.

In the future, we might also add to_human_duration which would be
implemented against DurationSpec so similar conversions are possible.
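
The three call shapes proposed above can be sketched as follows. This is a
rough Python illustration under stated assumptions, not the implementation in
the PR: the unit table, the two-decimal output format and the automatic unit
selection are my own choices, and the real FileUtils.stringifyFileSize may
format its result differently. 'TiB' is included only because the examples
mention it.

```python
from typing import Optional

# Size of each unit in bytes (binary multiples). Assumed for illustration.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def to_human_size(val: int,
                  unit_a: Optional[str] = None,
                  unit_b: Optional[str] = None) -> str:
    """Sketch of to_human_size(val), (val, target) and (val, source, target)."""
    if unit_a is None:
        source, target = "B", None        # one-argument form: bytes in, unit chosen automatically
    elif unit_b is None:
        source, target = "B", unit_a      # two-argument form: bytes in, explicit target
    else:
        source, target = unit_a, unit_b   # three-argument form: explicit source and target

    size_in_bytes = val * UNITS[source]

    if target is None:
        # Pick the largest unit that the value reaches; fall back to bytes.
        target = "B"
        for name, factor in UNITS.items():
            if size_in_bytes >= factor:
                target = name

    return f"{size_in_bytes / UNITS[target]:.2f} {target}"


print(to_human_size(1536))                     # 1.50 KiB
print(to_human_size(1048576, "MiB"))           # 1.00 MiB
print(to_human_size(2097152, "KiB", "GiB"))    # 2.00 GiB
```

Because the result is a display string computed with floating-point division,
a value such as 512 MiB shown in GiB comes out as 0.50 GiB instead of being
truncated.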

On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
dev@cassandra.apache.org> wrote:

> I like the idea. Is the intention to have the output of the function be
> parsable by the config parsers like DataRateSpec, DataStorageSpec, or DurationSpec?
>
> Claude
>
> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg  wrote:
>
>> Hi,
>>
>> I think it’s a good quality of life improvement, but I am someone who
>> believes in a rich set of built-in functions being a good thing.
>>
>> A format function is a bit more scope and kind of orthogonal. It would
>> still be good to have shorthand functions for things like size.
>>
>> Ariel
>>
>> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>>
>> Hi,
>>
>> I want to propose CASSANDRA-19546. It would be possible to convert raw
>> numbers to something human-friendly.
>> There are cases when we write just a number of bytes in our system tables
>> but these numbers are just hard to parse visually. Users can indeed use
>> this for their tables too if they find it useful.
>>
>> Also, a user can indeed write a UDF for this but I would prefer if we had
>> something baked in.
>>
>> Does this make sense to people? Are there any other approaches to do
>> this?
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-19546
>> https://github.com/apache/cassandra/pull/3239/files
>>
>> Regards
>>
>>
>>


Re: discuss: add to_human_size function

2024-04-19 Thread Claude Warren, Jr via dev
I like the idea. Is the intention to have the output of the function be parsable
by the config parsers like DataRateSpec, DataStorageSpec, or DurationSpec?

Claude

On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg  wrote:

> Hi,
>
> I think it’s a good quality of life improvement, but I am someone who
> believes in a rich set of built-in functions being a good thing.
>
> A format function is a bit more scope and kind of orthogonal. It would
> still be good to have shorthand functions for things like size.
>
> Ariel
>
> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>
> Hi,
>
> I want to propose CASSANDRA-19546. It would be possible to convert raw
> numbers to something human-friendly.
> There are cases when we write just a number of bytes in our system tables
> but these numbers are just hard to parse visually. Users can indeed use
> this for their tables too if they find it useful.
>
> Also, a user can indeed write a UDF for this but I would prefer if we had
> something baked in.
>
> Does this make sense to people? Are there any other approaches to do this?
>
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
>
> Regards
>
>
>


Re: discuss: add to_human_size function

2024-04-18 Thread Ariel Weisberg
Hi,

I think it’s a good quality-of-life improvement, but I am someone who believes
in a rich set of built-in functions being a good thing.

A format function has a somewhat bigger scope and is kind of orthogonal. It
would still be good to have shorthand functions for things like size.

Ariel

On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
> Hi,
> 
> I want to propose CASSANDRA-19546. It would be possible to convert raw 
> numbers to something human-friendly. 
> There are cases when we write just a number of bytes in our system tables but 
> these numbers are just hard to parse visually. Users can indeed use this for 
> their tables too if they find it useful.
> 
> Also, a user can indeed write a UDF for this but I would prefer if we had 
> something baked in.
> 
> Does this make sense to people? Are there any other approaches to do this? 
> 
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
> 
> Regards


Re: discuss: add to_human_size function

2024-04-15 Thread Štefan Miklošovič
I think we might still have two functions.

The first one, format, as you just showed, which would copy the behaviour
in MySQL as closely as possible.

The second one would deal with sizes, like "format_size", which would
append a size unit, as shown in the branch I posted.

WDYT?

Regards

On Thu, Apr 11, 2024 at 6:09 AM Brad  wrote:

> It's a useful idea and something supported in other databases.
>
> MySQL has a FORMAT function:
>
> FORMAT(X,D[,locale])
>
>
> Formats the number X to a format like '#,###,###.##', rounded to D decimal
> places, and returns the result as a string. If D is 0, the result has no
> decimal point or fractional part. If X or D is NULL, the function returns
> NULL.
>
>
>
> ex:
>
>
> SELECT FORMAT(250500.5634, 2);
>
> 250,500.56
>
>
> SELECT FORMAT(250500.5634,0);
>
> 250,501
>
>
>
> https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_format
>
>
> On Tue, Apr 9, 2024 at 8:10 AM Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
>> Hi,
>>
>> I want to propose CASSANDRA-19546. It would be possible to convert raw
>> numbers to something human-friendly.
>> There are cases when we write just a number of bytes in our system tables
>> but these numbers are just hard to parse visually. Users can indeed use
>> this for their tables too if they find it useful.
>>
>> Also, a user can indeed write a UDF for this but I would prefer if we had
>> something baked in.
>>
>> Does this make sense to people? Are there any other approaches to do
>> this?
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-19546
>> https://github.com/apache/cassandra/pull/3239/files
>>
>> Regards
>>
>


Re: discuss: add to_human_size function

2024-04-10 Thread Brad
It's a useful idea and something supported in other databases.

MySQL has a FORMAT function:

FORMAT(X,D[,locale])


Formats the number X to a format like '#,###,###.##', rounded to D decimal
places, and returns the result as a string. If D is 0, the result has no
decimal point or fractional part. If X or D is NULL, the function returns
NULL.



ex:


SELECT FORMAT(250500.5634, 2);

250,500.56


SELECT FORMAT(250500.5634,0);

250,501


https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_format
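
For comparison, the FORMAT behaviour described above can be approximated in a
few lines of Python. This is only a sketch: format_number is a hypothetical
name, the optional locale argument is ignored, and it relies on Python's own
number formatting, so edge cases may differ from MySQL.

```python
from typing import Optional


def format_number(x: Optional[float], d: Optional[int]) -> Optional[str]:
    """Group thousands with commas and round to d decimal places."""
    if x is None or d is None:
        return None                  # mirrors "If X or D is NULL, the function returns NULL"
    return f"{x:,.{max(d, 0)}f}"     # d == 0 gives no decimal point or fractional part


print(format_number(250500.5634, 2))  # 250,500.56
```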


On Tue, Apr 9, 2024 at 8:10 AM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> Hi,
>
> I want to propose CASSANDRA-19546. It would be possible to convert raw
> numbers to something human-friendly.
> There are cases when we write just a number of bytes in our system tables
> but these numbers are just hard to parse visually. Users can indeed use
> this for their tables too if they find it useful.
>
> Also, a user can indeed write a UDF for this but I would prefer if we had
> something baked in.
>
> Does this make sense to people? Are there any other approaches to do this?
>
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
>
> Regards
>