Re: Compatibility guidelines for toString overrides
It sounds like what we're coming around to is that we're comfortable with this proposal at the Java API layer, but the potential for impact at the shell interface makes it overall uncomfortable to declare anything in the compatibility guidelines. I think that's a fair assessment. Steve's example of AclEntry#toString is very relevant here. Here I've been arguing that we should -1 patches that rely on toString output for any externalized format (serialization, UI, shell, etc.). I wrote or code reviewed all of the HDFS ACL code, and yet this got past me. Diligence during code review is not infallible. I have assigned HADOOP-13150 to myself to change the HDFS ACL commands to avoid dependencies on toString. Let's try to avoid this in new bits of code and update existing code as we find unfortunate dependencies on toString. --Chris Nauroth From: Steve Loughran mailto:ste...@hortonworks.com>> Date: Saturday, May 14, 2016 at 1:01 PM To: Allen Wittenauer mailto:a...@apache.org>> Cc: Chris Nauroth mailto:cnaur...@hortonworks.com>>, "common-dev@hadoop.apache.org<mailto:common-dev@hadoop.apache.org>" mailto:common-dev@hadoop.apache.org>> Subject: Re: Compatibility guidelines for toString overrides On 14 May 2016, at 18:39, Allen Wittenauer mailto:a...@apache.org>> wrote: On May 12, 2016, at 12:20 PM, Chris Nauroth mailto:cnaur...@hortonworks.com>> wrote: Hello Allen, The intent is not for this rule to override other compatibility rules, such as the important CLI output rule. It's also not intended to give us free reign to change existing toString() implementations without due diligence. If a patch changes an existing toString() implementation that already goes out to the shell or any other form of external serialization, then the patch needs to be declined. (I concur that relying on toString() like this should never be done, and I'd encourage us to fix any occurrences we find.) The problem is that we as a community do a terrible job on following the compatibility guidelines when it comes to CLI output. Because while this maybe true: Instead, the intent is to advertise to Java API consumers that toString() output may evolve freely, and therefore we recommend against writing Java code that depends on that output format. ... what's going to happen is people are going to assume that because toString is now considered Unstable, all output will be considered Unstable by way of mental short cut. There have been many, many instances over the past year or two where even long time PMC members have claimed CLI output was already Unstable/outside the scope of the compatibility guidelines and I just see this re-inforcing that (incorrect) world view. HDFS-9732 is a good example of how to handle this. I didn't explicitly -1 it, but I did call out the CLI compatibility problem and recommend how to change the patch so that it preserves compatibility. Does this help address your concerns, or is the full code audit the only thing that would convince you to lift your -1? No, because "perception is reality". If we want to make toString Unstable, then we need to be very clear as to which toStrings are Unstable and which aren't. This isn't a universal rule without doing some legwork. I am painfully coming round to this point of view *on existing classes* 1. We need some interface "StableString" which we can tag to say "this is stable"; new CLI-ready code can use it. 2. We need to see where we use stuff today., by backchaining from uses of System.out & System.err, and from TableListing uses especially (which should implement StableString itself obviously). 3. Or: look at where toString() is subclassed and go from there...there's less than 800 such overrides. But here you can't tell where the use of a string is going to be. There are obvious ones (DU, DF, others which appear to be useful, AclEntry.toString(), but many which look like IDE-generated patterns for debug. There's another tactic, which would be to add some explicit log diagnostics method which we offered no guarantees on ... but the challenge then becomes one of "how to use efficiently in logging". That could be addressed, equally painfully, by having our own log class (extending/mimicing slfj4j and which handles that message specially. That would be an absolute PITA to work with. I think I'd rather look at our command line use of toString values Now, what about new bits of code?
Re: Compatibility guidelines for toString overrides
On 14 May 2016, at 18:39, Allen Wittenauer mailto:a...@apache.org>> wrote: On May 12, 2016, at 12:20 PM, Chris Nauroth mailto:cnaur...@hortonworks.com>> wrote: Hello Allen, The intent is not for this rule to override other compatibility rules, such as the important CLI output rule. It's also not intended to give us free reign to change existing toString() implementations without due diligence. If a patch changes an existing toString() implementation that already goes out to the shell or any other form of external serialization, then the patch needs to be declined. (I concur that relying on toString() like this should never be done, and I'd encourage us to fix any occurrences we find.) The problem is that we as a community do a terrible job on following the compatibility guidelines when it comes to CLI output. Because while this maybe true: Instead, the intent is to advertise to Java API consumers that toString() output may evolve freely, and therefore we recommend against writing Java code that depends on that output format. … what’s going to happen is people are going to assume that because toString is now considered Unstable, all output will be considered Unstable by way of mental short cut. There have been many, many instances over the past year or two where even long time PMC members have claimed CLI output was already Unstable/outside the scope of the compatibility guidelines and I just see this re-inforcing that (incorrect) world view. HDFS-9732 is a good example of how to handle this. I didn't explicitly -1 it, but I did call out the CLI compatibility problem and recommend how to change the patch so that it preserves compatibility. Does this help address your concerns, or is the full code audit the only thing that would convince you to lift your -1? No, because “perception is reality”. If we want to make toString Unstable, then we need to be very clear as to which toStrings are Unstable and which aren’t. This isn’t a universal rule without doing some legwork. I am painfully coming round to this point of view *on existing classes* 1. We need some interface "StableString" which we can tag to say "this is stable"; new CLI-ready code can use it. 2. We need to see where we use stuff today., by backchaining from uses of System.out & System.err, and from TableListing uses especially (which should implement StableString itself obviously). 3. Or: look at where toString() is subclassed and go from there...there's less than 800 such overrides. But here you can't tell where the use of a string is going to be. There are obvious ones (DU, DF, others which appear to be useful, AclEntry.toString(), but many which look like IDE-generated patterns for debug. There's another tactic, which would be to add some explicit log diagnostics method which we offered no guarantees on ... but the challenge then becomes one of "how to use efficiently in logging". That could be addressed, equally painfully, by having our own log class (extending/mimicing slfj4j and which handles that message specially. That would be an absolute PITA to work with. I think I'd rather look at our command line use of toString values Now, what about new bits of code?
Re: Compatibility guidelines for toString overrides
> On May 12, 2016, at 12:20 PM, Chris Nauroth wrote: > > Hello Allen, > > The intent is not for this rule to override other compatibility rules, > such as the important CLI output rule. It's also not intended to give us > free reign to change existing toString() implementations without due > diligence. If a patch changes an existing toString() implementation that > already goes out to the shell or any other form of external serialization, > then the patch needs to be declined. (I concur that relying on toString() > like this should never be done, and I'd encourage us to fix any > occurrences we find.) The problem is that we as a community do a terrible job on following the compatibility guidelines when it comes to CLI output. Because while this maybe true: > Instead, the intent is to advertise to Java API consumers that toString() > output may evolve freely, and therefore we recommend against writing Java > code that depends on that output format. … what’s going to happen is people are going to assume that because toString is now considered Unstable, all output will be considered Unstable by way of mental short cut. There have been many, many instances over the past year or two where even long time PMC members have claimed CLI output was already Unstable/outside the scope of the compatibility guidelines and I just see this re-inforcing that (incorrect) world view. > HDFS-9732 is a good example of how to handle this. I didn't explicitly -1 > it, but I did call out the CLI compatibility problem and recommend how to > change the patch so that it preserves compatibility. > > Does this help address your concerns, or is the full code audit the only > thing that would convince you to lift your -1? No, because “perception is reality”. If we want to make toString Unstable, then we need to be very clear as to which toStrings are Unstable and which aren’t. This isn’t a universal rule without doing some legwork. - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Re: Compatibility guidelines for toString overrides
How about this idea: Define a new annotation "StableImplUnstableInterface" which means consumers can't assume stability but producers can't change things. Mark all toStrings with this annotation. Then, in a lazy fashion, as the need arises to change various toString methods, diligence can be done first to see if any legacy code depends on a method in a compatibility-breaking manner, those dependencies can be fixed, then the method is changed and remarked as unstable. Conversely, there might be circumstances where a toString method might be marked as stable. (Certainly it's reasonable to assume the Integer.toString returns a parsable result, for example, the point being that for some classes it makes sense to have a stable spec for toString). Over the years one would hope that the StableImplUnstableSpec annotations would disappear. Sent from my iPhone > On May 12, 2016, at 1:40 PM, Sean Busbey wrote: > > As a downstream user of Hadoop, it would be much clearer if the > toString functions included the appropriate annotations to say they're > non-public, evolving, or whatever. > > Most downstream users of Hadoop aren't going to remember in-detail > exceptions to the java API compatibility rules, once they see that a > class is labeled Public/Stable, they're going to presume that applies > to all non-private members. > >> On Thu, May 12, 2016 at 9:32 AM, Colin McCabe wrote: >> Hi all, >> >> Recently a discussion came up on HADOOP-13028 about the wisdom of >> overloading S3AInputStream#toString to output statistics information. >> It's a difficult judgement for me to make, since I'm not aware of any >> compatibility guidelines for InputStream#toString. Do we have >> compatibility guidelines for toString functions? >> >> It seems like the output of toString functions is usually used as a >> debugging aid, rather than as a stable format suitable for UI display or >> object serialization. Clearly, there are a few cases where we might >> want to specifically declare that a toString method is a stable API. >> However, I think if we attempt to treat the toString output of all >> public classes as stable, we will have greatly increased the API >> surface. Should we formalize this and declare that toString functions >> are @Unstable, Evolving unless declared otherwise? >> >> best, >> Colin >> >> - >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org > > > > -- > busbey > > - > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > For additional commands, e-mail: common-dev-h...@hadoop.apache.org > - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Re: Compatibility guidelines for toString overrides
As a downstream user of Hadoop, it would be much clearer if the toString functions included the appropriate annotations to say they're non-public, evolving, or whatever. Most downstream users of Hadoop aren't going to remember in-detail exceptions to the java API compatibility rules, once they see that a class is labeled Public/Stable, they're going to presume that applies to all non-private members. On Thu, May 12, 2016 at 9:32 AM, Colin McCabe wrote: > Hi all, > > Recently a discussion came up on HADOOP-13028 about the wisdom of > overloading S3AInputStream#toString to output statistics information. > It's a difficult judgement for me to make, since I'm not aware of any > compatibility guidelines for InputStream#toString. Do we have > compatibility guidelines for toString functions? > > It seems like the output of toString functions is usually used as a > debugging aid, rather than as a stable format suitable for UI display or > object serialization. Clearly, there are a few cases where we might > want to specifically declare that a toString method is a stable API. > However, I think if we attempt to treat the toString output of all > public classes as stable, we will have greatly increased the API > surface. Should we formalize this and declare that toString functions > are @Unstable, Evolving unless declared otherwise? > > best, > Colin > > - > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > For additional commands, e-mail: common-dev-h...@hadoop.apache.org > -- busbey - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Re: Compatibility guidelines for toString overrides
> On 12 May 2016, at 17:57, Chris Nauroth wrote: > > I'm in favor of making a statement in the compatibility guidelines that > there is no guarantee of stability or backwards-compatibility for > toString() output. If a proposed patch introduces new dependencies on a > stable toString() output, such as for UI display or object serialization, > then I'd consider -1'ing it and instead asking that the logic move to a > different method that can provide that guarantee, i.e. toStableString(). > There are further comments justifying this on HADOOP-13028 and HDFS-9732. > > --Chris Nauroth > +1 - We now need to be rigorous about using a specific method for those use cases where we offer guarantees of stability. Maybe making this an interface "StableString" with the relevant method can help manage this - we're going to need to see where there is existing tool code which logs toString() values and move them to explicitly stable code. How best to do that? - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Re: Compatibility guidelines for toString overrides
Hello Allen, The intent is not for this rule to override other compatibility rules, such as the important CLI output rule. It's also not intended to give us free reign to change existing toString() implementations without due diligence. If a patch changes an existing toString() implementation that already goes out to the shell or any other form of external serialization, then the patch needs to be declined. (I concur that relying on toString() like this should never be done, and I'd encourage us to fix any occurrences we find.) Instead, the intent is to advertise to Java API consumers that toString() output may evolve freely, and therefore we recommend against writing Java code that depends on that output format. HDFS-9732 is a good example of how to handle this. I didn't explicitly -1 it, but I did call out the CLI compatibility problem and recommend how to change the patch so that it preserves compatibility. Does this help address your concerns, or is the full code audit the only thing that would convince you to lift your -1? --Chris Nauroth On 5/12/16, 11:55 AM, "Allen Wittenauer" wrote: > > >-1 without an audit of every toString usage. We have recent examples >(e.g., HDFS-9732) where toString was used for output from CLI utilities. >(which, IMO, should never have been done, but it¹s too late now...) >Output of CLI utilities is most definitely covered by the compatibility >guidelines. Therefore, changing toString output indiscriminately may >lead to compatibility breaks. > > > >> On May 12, 2016, at 10:23 AM, Sangjin Lee wrote: >> >> +1. >> >> Thanks, >> Sangjin >> >> On Thu, May 12, 2016 at 10:05 AM, Ravi Prakash >>wrote: >> >>> Thanks sounds reasonable Colin. +1 to not using toString() as an API >>> >>> On Thu, May 12, 2016 at 9:57 AM, Chris Nauroth >>> >>> wrote: >>> I'm in favor of making a statement in the compatibility guidelines that there is no guarantee of stability or backwards-compatibility for toString() output. If a proposed patch introduces new dependencies on a stable toString() output, such as for UI display or object serialization, then I'd consider -1'ing it and instead asking that the logic move to a different method that can provide that guarantee, i.e. toStableString(). There are further comments justifying this on HADOOP-13028 and HDFS-9732. --Chris Nauroth On 5/12/16, 9:32 AM, "Colin McCabe" wrote: > Hi all, > > Recently a discussion came up on HADOOP-13028 about the wisdom of > overloading S3AInputStream#toString to output statistics information. > It's a difficult judgement for me to make, since I'm not aware of any > compatibility guidelines for InputStream#toString. Do we have > compatibility guidelines for toString functions? > > It seems like the output of toString functions is usually used as a > debugging aid, rather than as a stable format suitable for UI >display or > object serialization. Clearly, there are a few cases where we might > want to specifically declare that a toString method is a stable API. > However, I think if we attempt to treat the toString output of all > public classes as stable, we will have greatly increased the API > surface. Should we formalize this and declare that toString >functions > are @Unstable, Evolving unless declared otherwise? > > best, > Colin > > - > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > For additional commands, e-mail: common-dev-h...@hadoop.apache.org > > - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org >>> > > >- >To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org >For additional commands, e-mail: common-dev-h...@hadoop.apache.org > > - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Re: Compatibility guidelines for toString overrides
-1 without an audit of every toString usage. We have recent examples (e.g., HDFS-9732) where toString was used for output from CLI utilities. (which, IMO, should never have been done, but it’s too late now...) Output of CLI utilities is most definitely covered by the compatibility guidelines. Therefore, changing toString output indiscriminately may lead to compatibility breaks. > On May 12, 2016, at 10:23 AM, Sangjin Lee wrote: > > +1. > > Thanks, > Sangjin > > On Thu, May 12, 2016 at 10:05 AM, Ravi Prakash wrote: > >> Thanks sounds reasonable Colin. +1 to not using toString() as an API >> >> On Thu, May 12, 2016 at 9:57 AM, Chris Nauroth >> wrote: >> >>> I'm in favor of making a statement in the compatibility guidelines that >>> there is no guarantee of stability or backwards-compatibility for >>> toString() output. If a proposed patch introduces new dependencies on a >>> stable toString() output, such as for UI display or object serialization, >>> then I'd consider -1'ing it and instead asking that the logic move to a >>> different method that can provide that guarantee, i.e. toStableString(). >>> There are further comments justifying this on HADOOP-13028 and HDFS-9732. >>> >>> --Chris Nauroth >>> >>> >>> >>> >>> On 5/12/16, 9:32 AM, "Colin McCabe" wrote: >>> Hi all, Recently a discussion came up on HADOOP-13028 about the wisdom of overloading S3AInputStream#toString to output statistics information. It's a difficult judgement for me to make, since I'm not aware of any compatibility guidelines for InputStream#toString. Do we have compatibility guidelines for toString functions? It seems like the output of toString functions is usually used as a debugging aid, rather than as a stable format suitable for UI display or object serialization. Clearly, there are a few cases where we might want to specifically declare that a toString method is a stable API. However, I think if we attempt to treat the toString output of all public classes as stable, we will have greatly increased the API surface. Should we formalize this and declare that toString functions are @Unstable, Evolving unless declared otherwise? best, Colin - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org >>> >>> >>> - >>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org >>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org >>> >>> >> - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Re: Compatibility guidelines for toString overrides
+1. Thanks, Sangjin On Thu, May 12, 2016 at 10:05 AM, Ravi Prakash wrote: > Thanks sounds reasonable Colin. +1 to not using toString() as an API > > On Thu, May 12, 2016 at 9:57 AM, Chris Nauroth > wrote: > > > I'm in favor of making a statement in the compatibility guidelines that > > there is no guarantee of stability or backwards-compatibility for > > toString() output. If a proposed patch introduces new dependencies on a > > stable toString() output, such as for UI display or object serialization, > > then I'd consider -1'ing it and instead asking that the logic move to a > > different method that can provide that guarantee, i.e. toStableString(). > > There are further comments justifying this on HADOOP-13028 and HDFS-9732. > > > > --Chris Nauroth > > > > > > > > > > On 5/12/16, 9:32 AM, "Colin McCabe" wrote: > > > > >Hi all, > > > > > >Recently a discussion came up on HADOOP-13028 about the wisdom of > > >overloading S3AInputStream#toString to output statistics information. > > >It's a difficult judgement for me to make, since I'm not aware of any > > >compatibility guidelines for InputStream#toString. Do we have > > >compatibility guidelines for toString functions? > > > > > >It seems like the output of toString functions is usually used as a > > >debugging aid, rather than as a stable format suitable for UI display or > > >object serialization. Clearly, there are a few cases where we might > > >want to specifically declare that a toString method is a stable API. > > >However, I think if we attempt to treat the toString output of all > > >public classes as stable, we will have greatly increased the API > > >surface. Should we formalize this and declare that toString functions > > >are @Unstable, Evolving unless declared otherwise? > > > > > >best, > > >Colin > > > > > >- > > >To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > > >For additional commands, e-mail: common-dev-h...@hadoop.apache.org > > > > > > > > > > > > - > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org > > > > >
Re: Compatibility guidelines for toString overrides
Thanks sounds reasonable Colin. +1 to not using toString() as an API On Thu, May 12, 2016 at 9:57 AM, Chris Nauroth wrote: > I'm in favor of making a statement in the compatibility guidelines that > there is no guarantee of stability or backwards-compatibility for > toString() output. If a proposed patch introduces new dependencies on a > stable toString() output, such as for UI display or object serialization, > then I'd consider -1'ing it and instead asking that the logic move to a > different method that can provide that guarantee, i.e. toStableString(). > There are further comments justifying this on HADOOP-13028 and HDFS-9732. > > --Chris Nauroth > > > > > On 5/12/16, 9:32 AM, "Colin McCabe" wrote: > > >Hi all, > > > >Recently a discussion came up on HADOOP-13028 about the wisdom of > >overloading S3AInputStream#toString to output statistics information. > >It's a difficult judgement for me to make, since I'm not aware of any > >compatibility guidelines for InputStream#toString. Do we have > >compatibility guidelines for toString functions? > > > >It seems like the output of toString functions is usually used as a > >debugging aid, rather than as a stable format suitable for UI display or > >object serialization. Clearly, there are a few cases where we might > >want to specifically declare that a toString method is a stable API. > >However, I think if we attempt to treat the toString output of all > >public classes as stable, we will have greatly increased the API > >surface. Should we formalize this and declare that toString functions > >are @Unstable, Evolving unless declared otherwise? > > > >best, > >Colin > > > >- > >To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > >For additional commands, e-mail: common-dev-h...@hadoop.apache.org > > > > > > > - > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org > For additional commands, e-mail: common-dev-h...@hadoop.apache.org > >
Re: Compatibility guidelines for toString overrides
I'm in favor of making a statement in the compatibility guidelines that there is no guarantee of stability or backwards-compatibility for toString() output. If a proposed patch introduces new dependencies on a stable toString() output, such as for UI display or object serialization, then I'd consider -1'ing it and instead asking that the logic move to a different method that can provide that guarantee, i.e. toStableString(). There are further comments justifying this on HADOOP-13028 and HDFS-9732. --Chris Nauroth On 5/12/16, 9:32 AM, "Colin McCabe" wrote: >Hi all, > >Recently a discussion came up on HADOOP-13028 about the wisdom of >overloading S3AInputStream#toString to output statistics information. >It's a difficult judgement for me to make, since I'm not aware of any >compatibility guidelines for InputStream#toString. Do we have >compatibility guidelines for toString functions? > >It seems like the output of toString functions is usually used as a >debugging aid, rather than as a stable format suitable for UI display or >object serialization. Clearly, there are a few cases where we might >want to specifically declare that a toString method is a stable API. >However, I think if we attempt to treat the toString output of all >public classes as stable, we will have greatly increased the API >surface. Should we formalize this and declare that toString functions >are @Unstable, Evolving unless declared otherwise? > >best, >Colin > >- >To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org >For additional commands, e-mail: common-dev-h...@hadoop.apache.org > > - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org