On Tue, Feb 21, 2017 at 7:55 AM, sebb <seb...@gmail.com> wrote:
> On 21 February 2017 at 12:40, Rob Tompkins <chtom...@apache.org> wrote:
>>
>>> On Feb 21, 2017, at 6:02 AM, sebb <seb...@gmail.com> wrote:
>>>
>>> On 21 February 2017 at 04:40, Sampanna Kahu <sampy...@gmail.com> wrote:
>>>> Hi Guys,
>>>> Very good points are being made above. Please allow me to add my two cents :-)
>>>>
>>>> What if the string contains syntactically valid HTML characters/tags and our aim is to prevent rendering these tags in the browser when this string is being served via a web application? Or to prevent the execution of harmful embedded scripts when serving it? The 'escapeOnce' method could be useful here, right?
>>>
>>> I don't think so.
>>>
>>>> To explain better, let's consider an example of the specific use case that I had in mind when building the 'escapeOnce' method:
>>>> Consider the scenario of a simple RESTful web application where users can manipulate their text using simple CRUD operations. Let's assume that we do not have the 'escapeOnce' method yet.
>>>> 1. A user comes and submits his string. We escape it and store it in our database. If the string had any HTML characters, they would have been escaped.
>>>>
>>>> 2. After some time, the same user fetches his string, adds some more HTML characters and submits it. At this point, although the escape method would correctly escape the freshly added HTML characters, it would escape the older escaped HTML characters again! (For example, &gt; would become &amp;gt;.)
>>>> And this effect gets magnified if step 2 above is repeated.
>>>
>>> Of course, that is my point.
>>>
>>> Also remember that you want to show the original string to the user.
>>> That's not possible in general if you use this approach.
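The compounding effect in step 2 can be made concrete with a minimal Java sketch. The class DoubleEscapeDemo and its escape method are hypothetical. They cover only &, <, > and ", and they stand in for a real escaper such as the one in Commons Text without being its implementation. The first half re-escapes the stored (already escaped) text after an edit. The second half stores the raw text and escapes it once, which is the alternative sebb suggests further down the thread.

```java
// Hypothetical, simplified escaper for illustration only: it handles & < > "
// and is not the Commons Text StringEscapeUtils implementation.
final class DoubleEscapeDemo {

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Store-escaped approach (steps 1 and 2): the user edits the escaped text.
        String stored = escape("a > b");        // a &gt; b
        stored = escape(stored + " & c < d");   // the old &gt; is escaped again
        System.out.println(stored);             // a &amp;gt; b &amp; c &lt; d

        // Store-raw approach: keep the raw text, escape exactly once for display.
        String raw = "a > b" + " & c < d";      // the user edits the raw text
        System.out.println(escape(raw));        // a &gt; b &amp; c &lt; d
    }
}
```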
>>>
>>> Suppose they originally entered
>>>
>>> "To code ampersand (&) in HTML, use '&amp;'"
>>>
>>> Using escapeOnce, this would become:
>>>
>>> "To code ampersand (&amp;) in HTML, use '&amp;'"
>>>
>>> You can either show that directly to the user, or use an unescapeOnce and show them:
>>>
>>> "To code ampersand (&) in HTML, use '&'"
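sebb's example can be reproduced with a hypothetical escapeOnce that skips any '&' which already starts a known entity. This is a sketch under that assumption, limited to four entities. It is not the code proposed in TEXT-40. The round trip shows that the user's literal text '&amp;' cannot be recovered.

```java
// Hypothetical "escapeOnce" sketch: an '&' that already starts one of a few
// known entities is left alone. Not the actual TEXT-40 code.
final class EscapeOnceDemo {

    private static final String[] ENTITIES = {"&amp;", "&lt;", "&gt;", "&quot;"};
    private static final String CHARS = "&<>\"";   // CHARS.charAt(k) <-> ENTITIES[k]

    /** Returns the index k of the known entity starting at position i, or -1. */
    private static int entityAt(String s, int i) {
        for (int k = 0; k < ENTITIES.length; k++) {
            if (s.startsWith(ENTITIES[k], i)) {
                return k;
            }
        }
        return -1;
    }

    static String escapeOnce(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int k = CHARS.indexOf(c);
            if (k < 0 || (c == '&' && entityAt(s, i) >= 0)) {
                out.append(c);            // ordinary char, or the '&' of an existing entity
            } else {
                out.append(ENTITIES[k]);  // escape the special character
            }
        }
        return out.toString();
    }

    /** Single left-to-right pass; unknown or incomplete sequences are copied as-is. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int k = entityAt(s, i);
            if (k >= 0) {
                out.append(CHARS.charAt(k));
                i += ENTITIES[k].length();
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "To code ampersand (&) in HTML, use '&amp;'";
        String once = escapeOnce(original);
        System.out.println(once);            // To code ampersand (&amp;) in HTML, use '&amp;'
        System.out.println(unescape(once));  // To code ampersand (&) in HTML, use '&'
    }
}
```

This escapeOnce is idempotent, because applying it to its own output changes nothing. The round trip still does not return the original string.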

I have had this use case in a project (enclosing XML/HTML content in an XML stream), and the expected output of escapeOnce in this case would be:

"To code ampersand (&amp;) in HTML, use '&amp;amp;'"

And similarly, unescapeOnce would give back:

"To code ampersand (&) in HTML, use '&amp;'"

Just my two cents, as I have had to write this code.

>>>
>>> Neither makes any sense.
>>>
>>>> How do we solve the above problem without the 'escapeOnce' method?
>>>
>>> Store the raw string in the database and escape it just before display.
>>>
>>> If you are using JavaScript, then use an approach such as this to escape it:
>>>
>>> document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));
>>>
>>> See:
>>>
>>> http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/
>>>
>>> This has a good discussion of some of the problems.
>>>
>>> ==
>>>
>>> Sorry, but it's not possible in general to do what you want, because one cannot reliably determine if a string has been escaped just from looking at the string.
>>
>> Another thought occurred to me (again, despite its potential lack of value).
>>
>> We should be able to quickly verify whether there are any escape sequences in the string in question. A single application of unescape, followed by a check of string equality with the original input, would yield a predicate on the presence of escapes in the input in question.
>
> Again, what does unescape mean in this context?
> Does it ignore incomplete escape sequences, or throw an error?
>
>> From there we could: (1) escape if no escapes were present in the original, or (2) throw an exception if there were escapes present in the original string.
>> Again, this feels contrived, so I’m not really suggesting that we add it. I’m just playing with ideas here that could accomplish what Sampanna is going for.
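Rob's predicate can be sketched in Java under one assumption: unescape decodes only &amp;, &lt;, &gt; and &quot;, and copies any other '&' unchanged. This is one possible answer to sebb's question. EscapeCheckDemo, containsEscapes and escapeOrThrow are hypothetical names, not Commons Text API.

```java
// Hypothetical helpers; simplified entity set (&amp; &lt; &gt; &quot;) only.
final class EscapeCheckDemo {

    private static final String[] ENTITIES = {"&amp;", "&lt;", "&gt;", "&quot;"};
    private static final String CHARS = "&<>\"";   // CHARS.charAt(k) <-> ENTITIES[k]

    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            int k = CHARS.indexOf(c);
            if (k >= 0) {
                out.append(ENTITIES[k]);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Assumption: incomplete or unknown sequences such as "&foo" are copied
    // unchanged rather than rejected.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        outer:
        while (i < s.length()) {
            for (int k = 0; k < ENTITIES.length; k++) {
                if (s.startsWith(ENTITIES[k], i)) {
                    out.append(CHARS.charAt(k));
                    i += ENTITIES[k].length();
                    continue outer;
                }
            }
            out.append(s.charAt(i++));
        }
        return out.toString();
    }

    /** One unescape pass changes the string iff it contains a known escape sequence. */
    static boolean containsEscapes(String s) {
        return !unescape(s).equals(s);
    }

    /** Option (1): escape when no escapes are present; option (2): otherwise throw. */
    static String escapeOrThrow(String s) {
        if (containsEscapes(s)) {
            throw new IllegalArgumentException("input already contains escape sequences: " + s);
        }
        return escape(s);
    }
}
```

As sebb points out, the predicate detects escape sequences, not whether the string was escaped. Raw input such as "This &amp; that" (for example, someone asking how to write an ampersand) would be rejected by escapeOrThrow.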
>
> The request is impossible to fulfill reliably, and does not deserve to be added to a Commons library.
>
> I don't know why this is still being discussed.
>
>> -Rob
>>
>>>
>>> The most one can do is to sanitise the string by escaping anything that is unescaped.
>>> However, that process corrupts the input - a browser won't display the proper output.
>>>
>>>> On 20 February 2017 at 21:40, sebb <seb...@gmail.com> wrote:
>>>>
>>>>> On 20 February 2017 at 15:36, Rob Tompkins <chtom...@apache.org> wrote:
>>>>>>
>>>>>>> On Feb 20, 2017, at 10:30 AM, sebb <seb...@gmail.com> wrote:
>>>>>>>
>>>>>>> On 20 February 2017 at 14:55, Rob Tompkins <chtom...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 4:31 AM, sebb <seb...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On 19 February 2017 at 14:29, Raymond DeCampo <r...@decampo.org> wrote:
>>>>>>>>>> I am trying to see how having the proposed unescape() method leads to a useful escape method.
>>>>>>>>>>
>>>>>>>>>> E.g. clearly unescape("&amp;") would evaluate to "&". So would unescape("&amp;amp;"). That means the proposed escape() method would also have the same output for "&amp;" and "&amp;amp;".
>>>>>>>>>>
>>>>>>>>>> I think a better approach for an idempotent escape would be to just unescape the string once, and then run the traditional escape.
>>>>>>>>>
>>>>>>>>> That does not eliminate the problems, as you state below.
>>>>>>>>>
>>>>>>>>>> You will still have issues if the user intended to escape the string "&amp;", but you are never going to crack that without some kind of state saving.
>>>>>>>>>
>>>>>>>>> That is my exact point.
>>>>>>>>>
>>>>>>>>> Since it's not possible for the function to work reliably, we should not mislead users by pretending that there is a magic method that works.
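Raymond's two observations can be checked with the same kind of simplified, hypothetical helpers (four entities only; not Commons Text). unescapeFully repeats unescape until nothing changes, as in Rob's original proposal. unescapeOnceThenEscape is Raymond's alternative.

```java
// Hypothetical simplified helpers (entities &amp; &lt; &gt; &quot; only).
final class UnescapeThenEscapeDemo {

    private static final String[] ENTITIES = {"&amp;", "&lt;", "&gt;", "&quot;"};
    private static final String CHARS = "&<>\"";   // CHARS.charAt(k) <-> ENTITIES[k]

    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            int k = CHARS.indexOf(c);
            if (k >= 0) {
                out.append(ENTITIES[k]);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Single pass; an '&' that starts no known entity is copied unchanged. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        outer:
        while (i < s.length()) {
            for (int k = 0; k < ENTITIES.length; k++) {
                if (s.startsWith(ENTITIES[k], i)) {
                    out.append(CHARS.charAt(k));
                    i += ENTITIES[k].length();
                    continue outer;
                }
            }
            out.append(s.charAt(i++));
        }
        return out.toString();
    }

    /** Rob's proposal: unescape until the string stops changing (each change shortens it). */
    static String unescapeFully(String s) {
        String cur = s;
        String next = unescape(cur);
        while (!next.equals(cur)) {
            cur = next;
            next = unescape(cur);
        }
        return cur;
    }

    /** Raymond's suggestion: unescape once, then run the ordinary escape. */
    static String unescapeOnceThenEscape(String s) {
        return escape(unescape(s));
    }

    public static void main(String[] args) {
        System.out.println(unescapeFully("&amp;"));          // &
        System.out.println(unescapeFully("&amp;amp;"));      // &
        String once = unescapeOnceThenEscape("a &lt; b & c");
        System.out.println(once.equals(unescapeOnceThenEscape(once)));  // true
        System.out.println(unescapeOnceThenEscape("&"));     // &amp;
        System.out.println(unescapeOnceThenEscape("&amp;")); // &amp;
    }
}
```

The idempotence of unescapeOnceThenEscape relies on unescape(escape(x)) returning x for every x, which holds for this pair of helpers. It still maps both "&" and "&amp;" to "&amp;", which is the state-saving problem Raymond and sebb describe.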
>>>>>>>>>
>>>>>>>>>> Then, given that the functionality is available via two consecutive calls to existing methods, I would probably be disinclined to include it in the library.
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>
>>>>>>>> I’m a (+1) for removal as well.
>>>>>>>>
>>>>>>>> Also, I didn’t mean for my example to sound like a proposal. I was merely trying to get to a potentially valuable stateless idempotent string escape function. Its contrivance is quite clear.
>>>>>>>>
>>>>>>>> Any other comments out there?
>>>>>>>>
>>>>>>>> We could provide a stateful escaper (that figures out how many levels of escaping a string is in), or a method that returns the number of escapes in a string. Again, I’m not all that sure of the value of such methods.
>>>>>>>
>>>>>>> I don't think it's possible to work out the number of times a string has been escaped.
>>>>>>
>>>>>> That may indeed be true, but it is possible to return the number of times unescape needs to be run before subsequent unescapes yield the same result.
>>>>>
>>>>> That in itself is potentially ambiguous.
>>>>> Does the unescaper keep going until there are no valid escape sequences left, or does it stop when there is at least one ampersand which is not part of a valid escape sequence?
>>>>>
>>>>>> Again, I’m not sure if this is a valuable measure to concern ourselves with.
>>>>>
>>>>> I don't think it provides anything useful.
>>>>>
>>>>>>>
>>>>>>> The most one can do is to determine if a string has not been escaped.
>>>>>>> That would be the case where a string has one or more unescaped characters in it.
>>>>>>> For example, "This & that" has obviously not been escaped.
>>>>>>>
>>>>>>> However, if a string has no unescaped characters in it, that does not necessarily mean that it has already been escaped.
>>>>>>> For example: "This &amp; that".
>>>>>>> This might have been escaped - or it might not.
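Both measures discussed here can be sketched with hypothetical helpers. They are simplified to four entities, and unescape leaves a bare '&' unchanged, which is one answer to sebb's ambiguity question. unescapePasses counts the passes Rob describes. definitelyNotEscaped implements sebb's one-sided test. EscapeStateDemo and both method names are illustrative assumptions.

```java
// Hypothetical, simplified helpers (entities &amp; &lt; &gt; &quot; only).
final class EscapeStateDemo {

    private static final String[] ENTITIES = {"&amp;", "&lt;", "&gt;", "&quot;"};
    private static final String CHARS = "&<>\"";   // CHARS.charAt(k) <-> ENTITIES[k]

    /** Returns the index k of the known entity starting at position i, or -1. */
    private static int entityAt(String s, int i) {
        for (int k = 0; k < ENTITIES.length; k++) {
            if (s.startsWith(ENTITIES[k], i)) {
                return k;
            }
        }
        return -1;
    }

    // Assumption: a bare '&' that starts no known entity is copied unchanged.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int k = entityAt(s, i);
            if (k >= 0) {
                out.append(CHARS.charAt(k));
                i += ENTITIES[k].length();
            } else {
                out.append(s.charAt(i++));
            }
        }
        return out.toString();
    }

    /** Number of unescape passes until the string stops changing (Rob's measure). */
    static int unescapePasses(String s) {
        int passes = 0;
        String cur = s;
        String next = unescape(cur);
        while (!next.equals(cur)) {   // terminates: every changing pass shortens the string
            passes++;
            cur = next;
            next = unescape(cur);
        }
        return passes;
    }

    /** true: cannot be escape() output; false only means it MIGHT have been escaped. */
    static boolean definitelyNotEscaped(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '<' || c == '>' || c == '"') {
                return true;
            }
            if (c == '&' && entityAt(s, i) < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(unescapePasses("&amp;amp;lt;"));           // 3
        System.out.println(definitelyNotEscaped("This & that"));      // true
        System.out.println(definitelyNotEscaped("This &amp; that"));  // false (ambiguous)
    }
}
```

The pass count says nothing about how many times the string was escaped. For example, "&amp;amp;lt;" needs three passes, yet it is also the once-escaped form of the literal text "&amp;lt;".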
>>>>>>
>>>>>> Ah, I was using the definition of “having been escaped” to be that the string contains escape sequences.
>>>>>>
>>>>>>> For example, it could be the answer to: "How does one code 'This & that' in HTML?"
>>>>>>>
>>>>>>> The application has to keep track of the escape state of the string.
>>>>>>
>>>>>> Definitely agreed with your definition of “having been escaped.”
>>>>>>
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> -Rob
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 18, 2017 at 12:04 PM, Rob Tompkins <chtom...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> In preparation for the 1.0 release, I think we should address Sebb's concern in TEXT-40 about the attempt to create "idempotent" string escape methods. By idempotent I mean someMethod("some string") = someMethod(someMethod(someMethod(...someMethod("some string")))): a single application of a method is equal to any number of applications of the method on the same input.
>>>>>>>>>>>
>>>>>>>>>>> Below I lay out a mechanism by which it is possible to write such methods, but I don’t know the value in writing such methods. I'm merely expressing that idempotency is a possibility.
>>>>>>>>>>>
>>>>>>>>>>> For string "un-escaping", I believe that we can write a method that is, indeed, idempotent by simply running the un-escape method the finite number of times needed to reach the point at which the string remains unchanged between applications of the un-escaping method. (I believe that I can write a proof that all un-escape methods have such a point, if that is needed for the sake of discussion.)
>>>>>>>>>>>
>>>>>>>>>>> If indeed we can create an idempotent un-escape method, then we can simply take that method, run it, and then run the escaping method one time.
>>>>>>>>>>> If we always completely unescape and then escape once, then we do have an idempotent method.
>>>>>>>>>>>
>>>>>>>>>>> Such a method might not be all that valuable to the user, though. Furthermore, this just explains one way to create such an idempotent method. Whether more, or more valuable, methods exist would take some more thought.
>>>>>>>>>>>
>>>>>>>>>>> Anyone have any thoughts? My feeling is that it might be more effort than it's worth to ensure that any string is only “singly encoded.” Further, we probably should take a look at the “escape_once” methods in StringEscapeUtils.
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>> -Rob
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>>>>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org