Re: discuss: add to_human_size function
FYI, I added both to_human_size and to_human_duration (1), (2). I will try my luck asking for a reviewer. Both are tested and documented.

(1) https://github.com/apache/cassandra/pull/3239/files
(2) https://github.com/apache/cassandra/blob/f35ed228145fae3edb4325d29464f0d950d13511/doc/modules/cassandra/pages/developing/cql/functions.adoc#human-helper-functions

On Thu, Apr 25, 2024 at 6:20 PM Ekaterina Dimitrova wrote:

> All I am saying is that we should be careful, while changing those classes, not to open the door to someone setting a parameter in cassandra.yaml to 512MiB and having it converted to 0 GiB internally. That means loss of precision and weird settings. As long as that Pandora's box stays closed, all good 👍🏻
>
> I do support the new function proposed here, thank you!
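To make the quoted precision concern concrete, here is a minimal sketch of how a smaller-to-bigger conversion loses information when the result is kept as a whole number. The helper names are hypothetical and are not part of Cassandra's DataStorageSpec; the sketch only assumes binary units (1 GiB = 1024 MiB).

```python
# Hypothetical helpers for illustration; not Cassandra's DataStorageSpec API.
MIB = 1024 ** 2  # bytes per mebibyte
GIB = 1024 ** 3  # bytes per gibibyte


def mib_to_gib_truncating(mib: int) -> int:
    """Convert MiB to whole GiB with integer division, dropping the remainder."""
    return (mib * MIB) // GIB


def mib_to_gib_display(mib: int) -> str:
    """Convert MiB to GiB for display only, keeping the fractional part."""
    return f"{mib * MIB / GIB:.2f} GiB"


print(mib_to_gib_truncating(512))  # 0
print(mib_to_gib_display(512))     # 0.50 GiB
```

The truncating variant is the kind of conversion that should stay out of configuration handling; the display variant is harmless because the stored value is never replaced by the formatted string.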
Re: discuss: add to_human_size function
All I am saying is that we should be careful, while changing those classes, not to open the door to someone setting a parameter in cassandra.yaml to 512MiB and having it converted to 0 GiB internally. That means loss of precision and weird settings. As long as that Pandora's box stays closed, all good 👍🏻

I do support the new function proposed here, thank you!

On Thu, 25 Apr 2024 at 7:31, Jon Haddad wrote:

> I can't see a good reason not to support it. Seems like extra work to avoid with no benefit.
>
> --
> Jon Haddad
> Rustyrazorblade Consulting
> rustyrazorblade.com
Re: discuss: add to_human_size function
I can't see a good reason not to support it. Seems like extra work to avoid with no benefit.

--
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com

On Thu, Apr 25, 2024 at 7:16 AM Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:

> Can you elaborate on intentionally not supporting some conversions? Are we safe to base these conversions on DataStorageUnit? We have a set of units from BYTES to GIBIBYTES and respective methods on them which convert from that unit to any other. Is it OK to use this for the purposes of this feature? I would expect that once we have units like these and methods on them to convert between them, they can be reused elsewhere.
Re: discuss: add to_human_size function
Can you elaborate on intentionally not supporting some conversions? Are we safe to base these conversions on DataStorageUnit? We have a set of units from BYTES to GIBIBYTES and respective methods on them which convert from that unit to any other. Is it OK to use this for the purposes of this feature? I would expect that once we have units like these and methods on them to convert between them, they can be reused elsewhere.

On Thu, Apr 25, 2024 at 4:06 PM Ekaterina Dimitrova wrote:

> All I am saying is: be careful that the conversions you add do not end up being used while setting our configuration. Thanks 🙏
Re: discuss: add to_human_size function
All I am saying is: be careful that the conversions you add do not end up being used while setting our configuration. Thanks 🙏

On Thu, 25 Apr 2024 at 6:53, Štefan Miklošovič wrote:

> Well, technically I do not need DataStorageSpec at all. All I need is DataStorageUnit, which can easily convert from one unit to another.
>
> We can omit tebibytes, that's just fine. People would need to live with gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB; I guess that is enough to get a picture of the magnitude of the value.
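The quoted point about capping output at gibibytes can be illustrated with a small unit table. This is only a sketch under the assumption of binary units from bytes to gibibytes; the table and function names are hypothetical and do not reflect the actual DataStorageUnit methods.

```python
# Hypothetical unit table capped at GiB, mirroring the BYTES..GIBIBYTES range
# mentioned in the thread. Not the real DataStorageUnit API.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def convert(value: int, source: str, target: str) -> float:
    """Convert a value between two units from the table, for display purposes."""
    return value * UNITS[source] / UNITS[target]


five_tib_in_bytes = 5 * 1024 ** 4
print(f"{convert(five_tib_in_bytes, 'B', 'GiB'):g} GiB")  # 5120 GiB
```

With no TiB entry in the table, a 5 TiB value is simply reported in the largest available unit, which still conveys its magnitude.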
Re: discuss: add to_human_size function
Well, technically I do not need DataStorageSpec at all. All I need is DataStorageUnit, which can easily convert from one unit to another.

We can omit tebibytes, that's just fine. People would need to live with gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB; I guess that is enough to get a picture of the magnitude of the value.

On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova wrote:

> Quick comment:
>
> DataRateSpec, DataStorageSpec, or DurationSpec: we intentionally do not support going from a smaller to a bigger size in those classes, which are specific to cassandra.yaml, because of precision issues. Please keep it that way. That is why the notion of a min unit was added in cassandra.yaml for parameters that are internally represented in a bigger unit.
>
> I am not sure that people want to add TiB. There was explicit agreement on which units we allow in cassandra.yaml. I suspect any new units should be approved on the ML.
>
> Hope this helps
Re: discuss: add to_human_size function
Edit: I meant to say smaller to bigger unit, not size, sorry

On Thu, 25 Apr 2024 at 6:35, Ekaterina Dimitrova wrote:

> Quick comment:
>
> DataRateSpec, DataStorageSpec, or DurationSpec: we intentionally do not support going from a smaller to a bigger size in those classes, which are specific to cassandra.yaml, because of precision issues. Please keep it that way. That is why the notion of a min unit was added in cassandra.yaml for parameters that are internally represented in a bigger unit.
>
> I am not sure that people want to add TiB. There was explicit agreement on which units we allow in cassandra.yaml. I suspect any new units should be approved on the ML.
>
> Hope this helps
Re: discuss: add to_human_size function
Quick comment: DataRateSpec, DataStorageSpec, and DurationSpec are specific to cassandra.yaml, and in those classes we intentionally do not support converting from a smaller unit to a bigger one, because of precision issues. Please keep it that way. That is why the notion of a minimum unit was added in cassandra.yaml for parameters that are internally represented in a bigger unit.

I am not sure that people want to add TiB. There was explicit agreement on which units we allow in cassandra.yaml. I suspect any new units should be approved on the ML.

Hope this helps

On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:

> TiB is not yet in DataStorageSpec (perhaps we should add it).
>
> A quick review tells me that all the units are unique across the 3 specs. As long as we guarantee that in the future, the method you propose should be easily expandable to the other specs.
>
> +1 to this idea.
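The precision concern raised in this message can be illustrated with a small sketch. It is written in Python rather than Cassandra's Java, and both the `UNIT_BYTES` table and the `convert_truncating` function are hypothetical names that do not come from the Cassandra code base. It only shows why converting an integer quantity from a smaller unit to a bigger one is lossy, using the 512 MiB to GiB case mentioned elsewhere in the thread.

```python
# Binary unit multipliers in bytes.
# Illustrative only; this is not Cassandra's DataStorageUnit.
UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def convert_truncating(value: int, source: str, target: str) -> int:
    """Convert an integer quantity between units using integer division."""
    return value * UNIT_BYTES[source] // UNIT_BYTES[target]


# Smaller unit to bigger unit: the fractional part is dropped.
print(convert_truncating(512, "MiB", "GiB"))  # prints 0

# Bigger unit to smaller unit: the result is exact.
print(convert_truncating(1, "GiB", "MiB"))  # prints 1024
```

A display-only function can avoid this by producing a decimal string instead of an integer, which is a different situation from storing a configuration value internally.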
Re: discuss: add to_human_size function
TiB is not yet in DataStorageSpec (perhaps we should add it).

A quick review tells me that all the units are unique across the 3 specs. As long as we guarantee that in the future, the method you propose should be easily expandable to the other specs.

+1 to this idea.

On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:

> That is a very interesting point, Claude. My implementation so far uses FileUtils.stringifyFileSize, which just divides a value by a divisor chosen according to how big the value is. While this works, it prevents you from specifying which unit the value should be converted to, and it also prevents you from specifying which unit the provided value is in. So, for example, if a column is known to be in kibibytes and we want it converted to gibibytes, that won't be possible because the function will assume the value is in bytes.
>
> It would be more appropriate to have something like this:
>
> to_human_size(val) -> alias to FileUtils.stringifyFileSize; with neither a source nor a target unit, it treats the value as bytes and converts it as FileUtils.stringifyFileSize does
>
> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>
> In the three-argument form, the first unit argument is the source unit and the second is the target unit:
>
> to_human_size(val, 'B', 'MiB')
> to_human_size(val, 'B', 'GiB')
> to_human_size(val, 'KiB', 'GiB')
> to_human_size(val, 'KiB', 'TiB')
>
> I think this is more flexible and we should funnel this via DataStorageSpec and similar, as you mentioned.
>
> In the future, we might also add to_human_duration, which would be implemented against DurationSpec so that similar conversions are possible.
Re: discuss: add to_human_size function
That is a very interesting point, Claude. My implementation so far uses FileUtils.stringifyFileSize, which just divides a value by a divisor chosen according to how big the value is. While this works, it prevents you from specifying which unit the value should be converted to, and it also prevents you from specifying which unit the provided value is in. So, for example, if a column is known to be in kibibytes and we want it converted to gibibytes, that won't be possible because the function will assume the value is in bytes.

It would be more appropriate to have something like this:

to_human_size(val) -> alias to FileUtils.stringifyFileSize; with neither a source nor a target unit, it treats the value as bytes and converts it as FileUtils.stringifyFileSize does

to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')

In the three-argument form, the first unit argument is the source unit and the second is the target unit:

to_human_size(val, 'B', 'MiB')
to_human_size(val, 'B', 'GiB')
to_human_size(val, 'KiB', 'GiB')
to_human_size(val, 'KiB', 'TiB')

I think this is more flexible and we should funnel this via DataStorageSpec and similar, as you mentioned.

In the future, we might also add to_human_duration, which would be implemented against DurationSpec so that similar conversions are possible.

On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:

> I like the idea. Is the intention to have the output of the function be parsable by the config parsers like DataRateSpec, DataStorageSpec, or DurationSpec?
>
> Claude
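The overloads proposed in this message can be sketched roughly as follows. This is a Python illustration, not the Java code in the pull request: the `UNIT_BYTES` table, the two-decimal output format, and the rule for picking a unit automatically are all assumptions made for the example. The unit list stops at GiB because the thread notes that TiB is not in DataStorageSpec.

```python
# Binary unit multipliers in bytes, in ascending order (illustrative only).
UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def to_human_size(value, source=None, target=None):
    """Sketch of the one-, two- and three-argument forms from the proposal."""
    # Two-argument form: to_human_size(val, 'MiB') is treated as
    # to_human_size(val, 'B', 'MiB'), so the single unit is the target.
    if source is not None and target is None:
        source, target = "B", source
    # One-argument form: the value is assumed to be in bytes.
    if source is None:
        source = "B"

    size_in_bytes = value * UNIT_BYTES[source]

    if target is None:
        # Assumed auto-scaling rule: the largest unit that does not
        # exceed the value, falling back to bytes.
        target = "B"
        for unit, factor in UNIT_BYTES.items():
            if size_in_bytes >= factor:
                target = unit

    # Floating-point division keeps the fractional part for display.
    return f"{size_in_bytes / UNIT_BYTES[target]:.2f} {target}"


print(to_human_size(1024))                   # 1.00 KiB
print(to_human_size(3 * 1024**2, "MiB"))     # 3.00 MiB
print(to_human_size(1572864, "KiB", "GiB"))  # 1.50 GiB
```

Because the result is a formatted string rather than an integer, a value such as 1.5 GiB keeps its fractional part when displayed.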
Re: discuss: add to_human_size function
I like the idea. Is the intention to have the output of the function be parsable by the config parsers like DataRateSpec, DataStorageSpec, or DurationSpec?

Claude

On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg wrote:

> Hi,
>
> I think it’s a good quality of life improvement, but I am someone who believes in a rich set of built-in functions being a good thing.
>
> A format function is a bit more scope and kind of orthogonal. It would still be good to have shorthand functions for things like size.
>
> Ariel
Re: discuss: add to_human_size function
Hi,

I think it’s a good quality of life improvement, but I am someone who believes in a rich set of built-in functions being a good thing.

A format function is a bit more scope and kind of orthogonal. It would still be good to have shorthand functions for things like size.

Ariel

On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:

> Hi,
>
> I want to propose CASSANDRA-19546. It would be possible to convert raw numbers to something human-friendly.
> There are cases when we write just a number of bytes in our system tables but these numbers are just hard to parse visually. Users can indeed use this for their tables too if they find it useful.
>
> Also, a user can indeed write a UDF for this but I would prefer if we had something baked in.
>
> Does this make sense to people? Are there any other approaches to do this?
>
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
>
> Regards
Re: discuss: add to_human_size function
I think we might still have two functions. The first one, format, as you just showed, would copy the behaviour of the MySQL function as closely as possible. The second one would deal with sizes, like "format_size", and would append a size unit, as shown in the branch I posted.

WDYT?

Regards

On Thu, Apr 11, 2024 at 6:09 AM Brad wrote:

> It's a useful idea and something supported in other databases.
>
> MySQL has a FORMAT function:
>
> FORMAT(X,D[,locale])
>
> Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part. If X or D is NULL, the function returns NULL.
>
> ex:
>
> SELECT FORMAT(250500.5634, 2);
> 250,500.56
>
> SELECT FORMAT(250500.5634, 0);
> 250,501
>
> https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_format
Re: discuss: add to_human_size function
It's a useful idea and something supported in other databases.

MySQL has a FORMAT function:

FORMAT(X,D[,locale])

Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part. If X or D is NULL, the function returns NULL.

ex:

SELECT FORMAT(250500.5634, 2);
250,500.56

SELECT FORMAT(250500.5634, 0);
250,501

https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_format

On Tue, Apr 9, 2024 at 8:10 AM Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:

> Hi,
>
> I want to propose CASSANDRA-19546. It would be possible to convert raw numbers to something human-friendly.
> There are cases when we write just a number of bytes in our system tables but these numbers are just hard to parse visually. Users can indeed use this for their tables too if they find it useful.
>
> Also, a user can indeed write a UDF for this but I would prefer if we had something baked in.
>
> Does this make sense to people? Are there any other approaches to do this?
>
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
>
> Regards
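The FORMAT behaviour described in this message can be approximated in a few lines. This is a Python sketch, not MySQL or Cassandra code; the name `format_number` is hypothetical, the optional locale argument is not handled, and rounding of exact ties may differ from MySQL because Python formats binary floating-point values.

```python
def format_number(x, d):
    """Rough analogue of FORMAT(X, D): thousands separators, D decimal places."""
    # Mirror the documented NULL handling: a missing input gives a missing result.
    if x is None or d is None:
        return None
    # ',' inserts thousands separators; '.{d}f' rounds to d decimal places.
    return f"{x:,.{d}f}"


print(format_number(250500.5634, 2))  # 250,500.56
print(format_number(1234567.891, 0))  # 1,234,568
```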
discuss: add to_human_size function
Hi,

I want to propose CASSANDRA-19546. It would make it possible to convert raw numbers to something human-friendly. There are cases when we write just a number of bytes in our system tables, but these numbers are hard to parse visually. Users can also use this for their own tables if they find it useful.

Also, a user can indeed write a UDF for this, but I would prefer if we had something baked in.

Does this make sense to people? Are there any other approaches to do this?

https://issues.apache.org/jira/browse/CASSANDRA-19546
https://github.com/apache/cassandra/pull/3239/files

Regards