Re: [text] On the value of idempotent string escape methods?

Bhowmik, Bindul Tue, 21 Feb 2017 13:32:24 -0800

On Tue, Feb 21, 2017 at 7:55 AM, sebb <[email protected]> wrote:
> On 21 February 2017 at 12:40, Rob Tompkins <[email protected]> wrote:
>>
>>> On Feb 21, 2017, at 6:02 AM, sebb <[email protected]> wrote:
>>>
>>> On 21 February 2017 at 04:40, Sampanna Kahu <[email protected]> wrote:
>>>> Hi Guys,
>>>> Very good points are being made above. Please allow me to add my two cents
>>>> :-)
>>>>
>>>> What if the string contains syntactically valid HTML characters/tags and
>>>> our aim is to prevent rendering these tags in the browser when this string
>>>> is being served via a web application? Or prevent the execution of harmful
>>>> embedded scripts when serving it? The 'escapeOnce' method could be useful
>>>> here, right?
>>>
>>> I don't think so.
>>>
>>>> To explain better, let's consider an example of the specific use-case that
>>>> I had in mind when building the 'escapeOnce' method:
>>>> Consider the scenario of a simple RESTful web application where users can
>>>> manipulate their text using simple CRUD operations. Let's assume that we do
>>>> not have the 'escapeOnce' method yet.
>>>> 1. A user comes and submits his string. We escape it and store it in our
>>>> database. If the string had any HTML characters, they would have gotten
>>>> escaped.
>>>>
>>>> 2. After some time, the same user fetches his string, adds some more HTML
>>>> characters and submits it. At this point, although the escape method would
>>>> correctly escape the freshly added HTML characters, it would escape the
>>>> older escaped HTML characters again! (for example &gt; would become
>>>> &amp;gt;)
>>>> And this effect gets magnified if step number 2 above is repeated.
>>>
>>> Of course, that is my point.
>>>
>>> Also remember that you want to show the original string to the user.
>>> That's not possible in general if you use this approach.
>>>
>>> Suppose they originally entered
>>>
>>> "To code ampersand (&) in HTML, use '&amp;'"
>>>
>>> Using escapeOnce, this would become:
>>>
>>> "To code ampersand (&amp;) in HTML, use '&amp;'"
>>>
>>> You can either show that directly to the user, or use an unescapeOnce
>>> and show them:
>>>
>>> "To code ampersand (&) in HTML, use '&'"


I have had this use case in a project (enclosing XML/HTML content in an
XML stream) and the expected output for escapeOnce in this case would
be:
"To code ampersand (&amp;) in HTML, use '&amp;amp;'"

And similarly, unescapeOnce would give back:
"To code ampersand (&) in HTML, use '&amp;'"

Just my two cents, as I have had to write this code.
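
To make the expected behaviour concrete, here is a minimal Java sketch of a plain escape/unescape pair that round-trips the example above. The class and method names are made up for illustration and are not the Commons Text API; it only handles &, <, > and the double quote, and it assumes the caller knows the string is raw before escaping it.

```java
final class EscapeRoundTrip {

    // Escape one level. '&' must be replaced first, otherwise the '&' in a
    // freshly written "&lt;" would itself be escaped again.
    static String escape(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;")
                  .replace("\"", "&quot;");
    }

    // Undo exactly one level. "&amp;" must be replaced last, otherwise
    // "&amp;lt;" would wrongly collapse all the way to "<".
    static String unescape(String escaped) {
        return escaped.replace("&lt;", "<")
                      .replace("&gt;", ">")
                      .replace("&quot;", "\"")
                      .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String raw = "To code ampersand (&) in HTML, use '&amp;'";
        String stored = escape(raw);
        System.out.println(stored);           // To code ampersand (&amp;) in HTML, use '&amp;amp;'
        System.out.println(unescape(stored)); // To code ampersand (&) in HTML, use '&amp;'
    }
}
```

The round trip only works because every '&' in the input is escaped unconditionally, including the one that already starts '&amp;'.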

>>>
>>> Neither makes any sense.
>>>
>>>> How do we solve the above problem without the 'escapeOnce' method?
>>>
>>> Store the raw string in the database and escape it just before display.
>>>
>>> If you are using JavaScript, then use an approach such as this to escape it:
>>>
>>> document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));
>>>
>>> See:
>>>
>>> http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/
>>>
>>> This has a good discussion of some of the problems.
>>>
>>> ==
>>>
>>> Sorry, but it's not possible in general to do what you want, because
>>> one cannot reliably determine if a string has been escaped just from
>>> looking at the string.
>>
>> Another thought occurred to me (again despite potential lack of value).
>>
>> We should be able to quickly verify if there are any escape sequences in the
>> string in question. A single application of unescape followed by checking
>> string equality with the original input would yield a predicate on the
>> presence of escapes in the input in question.
>
> Again, what does unescape mean in this context?
> Does it ignore incomplete escape sequences, or throw an error?
>
>> From there we could: (1) escape if no escapes were present in the original, 
>> or (2) throw an exception if there were escapes present in the original 
>> string.
>> Again, this feels contrived, so I’m not really suggesting that we add it. 
>> I’m just playing with ideas here that could accomplish what Sampanna is 
>> going for.
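
For illustration only, the check described above (unescape once, compare with the input, then either escape or throw) could be sketched in Java roughly as follows. The class and method names are hypothetical and not part of Commons Text. The hand-rolled escape/unescape cover only &, <, > and the double quote, and incomplete sequences such as "&amp" without the semicolon are simply left alone, which is one possible answer to the question raised above, not the only one.

```java
final class EscapePredicate {

    // One level of escaping; '&' first so new entities are not re-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One level of unescaping; "&amp;" last so "&amp;lt;" becomes "&lt;".
    static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    // True if a single unescape changes the string, i.e. the string
    // contains at least one complete escape sequence.
    static boolean containsEscapes(String s) {
        return !unescape(s).equals(s);
    }

    // (1) escape if no escapes are present, (2) throw otherwise.
    static String escapeIfClean(String s) {
        if (containsEscapes(s)) {
            throw new IllegalArgumentException("input already contains escape sequences");
        }
        return escape(s);
    }
}
```

Note that containsEscapes("This &amp; that") is true even when that text is exactly what the user typed, so the predicate tells you that escape sequences are present, not that the string has been escaped.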
>
> The request is impossible to fulfill reliably, and does not deserve to
> be added to a Commons library.
>
> I don't know why this is still being discussed.
>
>> -Rob
>>
>>>
>>> The most one can do is to sanitise the string by escaping anything
>>> that is unescaped.
>>> However that process corrupts the input - a browser won't display the
>>> proper output.
>>>
>>>> On 20 February 2017 at 21:40, sebb <[email protected]> wrote:
>>>>
>>>>> On 20 February 2017 at 15:36, Rob Tompkins <[email protected]> wrote:
>>>>>>
>>>>>>> On Feb 20, 2017, at 10:30 AM, sebb <[email protected]> wrote:
>>>>>>>
>>>>>>> On 20 February 2017 at 14:55, Rob Tompkins <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 4:31 AM, sebb <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On 19 February 2017 at 14:29, Raymond DeCampo <[email protected]> wrote:
>>>>>>>>>> I am trying to see how having the proposed unescape() method leads to a
>>>>>>>>>> useful escape method.
>>>>>>>>>>
>>>>>>>>>> E.g. clearly unescape("&amp;") would evaluate to "&".  So would
>>>>>>>>>> unescape("&amp;amp;").  That means the proposed escape() method would
>>>>>>>>>> also have the same output for "&amp;" and "&amp;amp;".
>>>>>>>>>>
>>>>>>>>>> I think a better approach for an idempotent escape would be to just
>>>>>>>>>> unescape the string once, and then run the traditional escape.
>>>>>>>>>
>>>>>>>>> That does not eliminate the problems, as you state below.
>>>>>>>>>
>>>>>>>>>> You will
>>>>>>>>>> still have issues if the user intended to escape the string "&amp;", but
>>>>>>>>>> you are never going to crack that without some kind of state saving.
>>>>>>>>>
>>>>>>>>> That is my exact point.
>>>>>>>>>
>>>>>>>>> Since it's not possible for the function to work reliably, we should
>>>>>>>>> not mislead users by pretending that there is a magic method that
>>>>>>>>> works.
>>>>>>>>>
>>>>>>>>>> Then, given that the functionality is available via two consecutive calls
>>>>>>>>>> to existing methods, I would probably be disinclined to include it in the
>>>>>>>>>> library.
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>
>>>>>>>> I’m a (+1) for removal as well.
>>>>>>>>
>>>>>>>> Also, I didn’t mean for my example to sound like a proposal. I was merely
>>>>>>>> trying to get to a potentially valuable stateless idempotent string
>>>>>>>> escape function. Its contrivance is quite clear.
>>>>>>>>
>>>>>>>> Any other comments out there?
>>>>>>>>
>>>>>>>> We could provide a stateful escaper (that figures out how many escapes
>>>>>>>> a string is in), or a method that returns the number of escapes in a
>>>>>>>> string. Again, I’m not all that sure of the value of such methods.
>>>>>>>
>>>>>>> I don't think it's possible to work out the number of times a string
>>>>>>> has been escaped.
>>>>>>
>>>>>> That may indeed be true, but it is possible to return the number of
>>>>>> times unescape needs to be run before subsequent unescapes yield the same
>>>>>> result.
>>>>>
>>>>> That in itself is potentially ambiguous.
>>>>> Does the unescaper keep going until there are no valid escape
>>>>> sequences left, or does it stop when there is at least one ampersand
>>>>> which is not part of a valid escape sequence?
>>>>>
>>>>>> Again, I’m not sure if this is a valuable measure to concern ourselves
>>>>>> with.
>>>>>
>>>>> I don't think it provides anything useful.
>>>>>
>>>>>>>
>>>>>>> The most one can do is to determine if a string has not been escaped.
>>>>>>> That would be the case where a string has one or more unescaped
>>>>>>> characters in it.
>>>>>>> For example "This & that" has obviously not been escaped.
>>>>>>>
>>>>>>> However, if a string has no un-escaped characters in it, that does not
>>>>>>> necessarily mean that it has already been escaped.
>>>>>>> For example: "This &amp; that".
>>>>>>> This might have been escaped - or it might not.
>>>>>>
>>>>>> Ah, I was using the definition of “having been escaped” to be that the
>>>>>> string contains escape sequences.
>>>>>>
>>>>>>> For example it could be the answer to: "How does one code 'This &
>>>>>>> that' in HTML?"
>>>>>>>
>>>>>>> The application has to keep track of the escape-state of the string.
>>>>>>
>>>>>> Definitely agreed with your definition of “having been escaped.”
>>>>>>
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> -Rob
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 18, 2017 at 12:04 PM, Rob Tompkins <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> In preparation for the 1.0 release, I think we should address Sebb's
>>>>>>>>>>> concern in TEXT-40 about the attempt to create "idempotent" string
>>>>>>>>>>> escape methods. By idempotent I mean someMethod("some string") =
>>>>>>>>>>> someMethod(someMethod(someMethod(...someMethod("some string")))): a
>>>>>>>>>>> single application of a method is equal to any number of applications
>>>>>>>>>>> of the method on the same input.
>>>>>>>>>>>
>>>>>>>>>>> Below I lay out a mechanism by which it is possible to write such
>>>>>>>>>>> methods, but I don’t know the value in writing such methods. I'm merely
>>>>>>>>>>> expressing that idempotency is a possibility.
>>>>>>>>>>>
>>>>>>>>>>> For string "un-escaping", I believe that we can write a method that,
>>>>>>>>>>> indeed, is idempotent by simply running the un-escape method the finite
>>>>>>>>>>> number of times needed to reach the point at which the string remains
>>>>>>>>>>> unchanged between applications of the un-escaping method. (I believe
>>>>>>>>>>> that I can write a proof that all un-escape methods have such a point,
>>>>>>>>>>> if that is needed for the sake of discussion).
>>>>>>>>>>>
>>>>>>>>>>> If indeed we can create an idempotent un-escape method, then we can
>>>>>>>>>>> simply take that method, run it, and then run the escaping method one
>>>>>>>>>>> time. If we always completely unescape and then escape once, then we do
>>>>>>>>>>> have an idempotent method.
>>>>>>>>>>>
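
A minimal Java sketch of the mechanism described above, under stated assumptions: the class and method names are hypothetical (this is not the escapeOnce code from TEXT-40 or any Commons Text API), and only &, <, > and the double quote are handled. The loop terminates because every unescape pass that changes the string also shortens it.

```java
final class IdempotentEscape {

    // One level of escaping; '&' first so new entities are not re-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One level of unescaping; "&amp;" last so "&amp;lt;" becomes "&lt;".
    static String unescapeOnce(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    // Keep unescaping until the string stops changing (the fixed point).
    static String unescapeFully(String s) {
        String prev = s;
        String next = unescapeOnce(prev);
        while (!next.equals(prev)) {
            prev = next;
            next = unescapeOnce(prev);
        }
        return next;
    }

    // Fully unescape, then escape exactly once.
    static String escapeOnce(String s) {
        return escape(unescapeFully(s));
    }
}
```

With this sketch, escapeOnce("a &gt; b") and escapeOnce("a &amp;gt; b") both return "a &gt; b", so repeated application is stable. The cost is that "&", "&amp;" and "&amp;amp;" all map to the same output, so text that legitimately contains an escape sequence cannot be recovered.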
>>>>>>>>>>> Such a method might not be all that valuable to the user though.
>>>>>>>>>>> Furthermore, this just explains one way to create such an idempotent
>>>>>>>>>>> method. Whether or not other, more valuable methods exist would take
>>>>>>>>>>> some more thought.
>>>>>>>>>>>
>>>>>>>>>>> Anyone have any thoughts? My feeling is that it might be more effort
>>>>>>>>>>> than it's worth to ensure that any string is only "singly encoded."
>>>>>>>>>>> Further, we probably should take a look at the “escape_once” methods in
>>>>>>>>>>> StringEscapeUtils.
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>> -Rob
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>> For additional commands, e-mail: [email protected]
>>>>>>>>>>>
>>>>>>>>>>>