Ah, this is excellent. Low impact, yet flexible.

The technique you mention is quite useful for two reasons:

1) It works well when someone eats your cookies.  Quite a few gateways do this.
2) Since state is transported in all requests, it makes the HTTP requests 
stateless, which means that this approach scales really, really well. Though of 
course it needs to be applied consistently everywhere...

/Janne

On 7 Jan 2010, at 16:53, Andrew Jaquith wrote:

> Janne,
> 
> I picked the really nice option.  :) The solution is that when a post
> contains spam, we redirect to the editor page, but request a CAPTCHA
> be displayed. Re-editing is allowed.
> 
> Here is how it works. There are two collaborating parts: the
> SpamProtectTag and the SpamInterceptor. This is where we do a little
> magic. :)
> 
> Let's say you've loaded the editor for the first time (i.e., you
> haven't submitted). What we do is write out a special parameter, a
> "challenge request," when SpamProtectTag executes. The contents, for
> the FIRST GET, contain the string value of the enum
> Challenge.Request.CHALLENGE_ON_DEMAND. This means "no CAPTCHA is
> required, but when we interpret the post, get ready to generate one
> after redirect if there's spam in it." Then, we encrypt the parameter
> using CryptoUtil.
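
A minimal sketch of the write-and-encrypt step described above, for readers who want to see the shape of it. It is not JSPWiki's code: `ChallengeCodec` is a hypothetical stand-in for CryptoUtil (whose real API is not shown in this thread), the AES-GCM choice and the URL-safe Base64 encoding are assumptions, and the enum simply mirrors the names used in the message (the thread spells the first value both CHALLENGE_ON_DEMAND and CAPTCHA_ON_DEMAND).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical stand-in for the thread's CryptoUtil; not JSPWiki's real API. */
final class ChallengeCodec {
    /** Mirrors the enum values named in the thread. */
    enum Request { CHALLENGE_ON_DEMAND, CAPTCHA, PASSWORD }

    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();
    private final SecretKey key;

    ChallengeCodec(SecretKey key) { this.key = key; }

    /** Encrypts the enum name; the result is URL-safe Base64 of IV + ciphertext. */
    String encode(Request state) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(state.name().getBytes(StandardCharsets.UTF_8));
        byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    /** Returns the decoded state, or null if the parameter is missing or was altered. */
    Request decode(String param) {
        if (param == null) return null;
        try {
            byte[] in = Base64.getUrlDecoder().decode(param);
            if (in.length < IV_BYTES + TAG_BITS / 8) return null;
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return Request.valueOf(new String(pt, StandardCharsets.UTF_8));
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return null; // bad Base64, failed authentication, or unknown enum name
        }
    }

    /** Generates a fresh AES key; a real deployment would keep one per server. */
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        ChallengeCodec codec = new ChallengeCodec(newKey());
        String hidden = codec.encode(Request.CHALLENGE_ON_DEMAND);
        System.out.println("hidden field value: " + hidden);
        System.out.println("decoded on POST: " + codec.decode(hidden));
    }
}
```

Because the value is authenticated as well as encrypted, a client that edits or invents the hidden field ends up in the same place as one that omits it: `decode` returns null.
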
> 
> When SpamInterceptor intercepts the POST, we then look for the special
> challenge-request parameter. Two things can happen: a normal user
> submits (in which case the challenge-request parameter will be there),
> or a spammer submits (in which case it will not be).
> 
> In the normal case, we extract the challenge-request parameter,
> decrypt the contents and figure out that its value was
> CHALLENGE_ON_DEMAND. Because it has this value, we do NOT run the
> CAPTCHA validator. We always run the content inspection. If it
> contains spam, we add a ValidationError. If not, we return a null
> Resolution, the "save" event method executes further down the chain,
> and we are done.
> 
> Now, let's look at the spammer case.
> 
> If the challenge-request parameter is not present in the request, we
> KNOW that the user has been naughty, or that it is a spammer. So we
> add a ValidationError and redirect to the editor again.
> 
> On the second GET (i.e., after the POST and redirect back to the
> editor page), the SpamProtectTag executes again. This time, it knows
> there was spam because of the ValidationError, and this time will
> write out the enum Challenge.Request.CAPTCHA, which means "I just
> rendered a CAPTCHA, and when SpamInterceptor intercepts the post,
> validate it." Thus, when SpamInterceptor handles the post next time
> around, when it sees the CAPTCHA value it knows that it should do the
> CAPTCHA check.
> 
> (and then we lather, rinse, repeat until the user submits a correct
> CAPTCHA value)
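
The POST-side decision described above can be sketched as a small pure function. This is only an illustration, not the actual SpamInterceptor: the class and method names are hypothetical, errors are plain strings rather than Stripes ValidationError objects, and it assumes that a correct challenge answer lets the post through even if the content was flagged, which is one reading of "repeat until the user submits a correct CAPTCHA value" and is not spelled out in the thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the interceptor's decision; not JSPWiki's SpamInterceptor. */
final class InterceptorSketch {
    /** Mirrors the enum values named in the thread. */
    enum Request { CHALLENGE_ON_DEMAND, CAPTCHA, PASSWORD }

    /**
     * @param state             decrypted challenge-request parameter, or null if absent
     * @param contentIsSpam     verdict of the content inspection (always computed)
     * @param challengeAnswerOk whether the submitted CAPTCHA or password checked out
     * @return validation errors; an empty list means the "save" event may proceed
     */
    static List<String> intercept(Request state, boolean contentIsSpam, boolean challengeAnswerOk) {
        List<String> errors = new ArrayList<>();
        if (state == null) {
            // No parameter at all: a spammer, or a user who bypassed the form.
            errors.add("missing challenge-request parameter");
        } else if (state == Request.CHALLENGE_ON_DEMAND) {
            // No challenge was rendered, so only the content verdict matters.
            if (contentIsSpam) errors.add("content looks like spam");
        } else if (!challengeAnswerOk) {
            // A CAPTCHA or password was rendered and must be answered correctly.
            errors.add("challenge failed: " + state);
        }
        // Assumption: a correct answer overrides the spam verdict.
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(intercept(null, false, false));
        System.out.println(intercept(Request.CHALLENGE_ON_DEMAND, true, false));
        System.out.println(intercept(Request.CAPTCHA, true, true));
    }
}
```

Any non-empty result corresponds to the "add a ValidationError and redirect to the editor again" branch; an empty result corresponds to returning a null Resolution.
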
> 
> That might sound complicated, but it's not -- the code is dead simple.
> The key is that the SpamProtectTag writes the current state out to the
> challenge-request parameter: CAPTCHA_ON_DEMAND is written out for the
> first-time GET, and on subsequent GETs, CAPTCHA will be written out if
> the contents are spam. All SpamInterceptor needs to do is obtain what
> the state was by retrieving and decrypting the challenge-request
> param.
> 
> There is one other wrinkle here: the JSP author can set the
> SpamProtectTag attribute "challenge" to force a password check or a
> CAPTCHA in all cases. In that case, we will write out the value
> Challenge.Request.CAPTCHA or Challenge.Request.PASSWORD and render the
> Challenge right away, even on that first post.
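
The tag-side rule (which state gets written on each GET) can be sketched the same way. Again this is a hypothetical illustration rather than the real SpamProtectTag: the method name and the boolean flag are invented for the example, while the attribute values "captcha" and "password" are the ones mentioned elsewhere in this thread.

```java
/** Hypothetical sketch of the state the tag writes out; not JSPWiki's SpamProtectTag. */
final class TagStateSketch {
    /** Mirrors the enum values named in the thread. */
    enum Request { CHALLENGE_ON_DEMAND, CAPTCHA, PASSWORD }

    /**
     * @param challengeAttr       value of the tag's "challenge" attribute, or null if unset
     * @param previousPostHadSpam true if the redirect back to the editor carried a
     *                            spam-related validation error
     */
    static Request stateToWrite(String challengeAttr, boolean previousPostHadSpam) {
        // A forced challenge wins, even on the very first GET.
        if ("captcha".equals(challengeAttr)) return Request.CAPTCHA;
        if ("password".equals(challengeAttr)) return Request.PASSWORD;
        if (challengeAttr != null) {
            throw new IllegalArgumentException("unknown challenge: " + challengeAttr);
        }
        // Default: render a CAPTCHA only after a post that was flagged as spam.
        return previousPostHadSpam ? Request.CAPTCHA : Request.CHALLENGE_ON_DEMAND;
    }

    public static void main(String[] args) {
        System.out.println(stateToWrite(null, false));
        System.out.println(stateToWrite(null, true));
        System.out.println(stateToWrite("password", false));
    }
}
```
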
> 
> Naming-wise, I've gone back and forth about what the right names for
> everything should be. At the moment, I think Challenge.Request might
> better be called Challenge.State. :) Maybe CAPTCHA_ON_DEMAND becomes
> CHALLENGE_NOT_RENDERED, CAPTCHA becomes CAPTCHA_RENDERED, PASSWORD
> becomes PASSWORD_RENDERED? Not sure.
> 
> Oh, and one more thing. This basic technique -- encrypt some sort of
> state object, write it out as a hidden parameter to the form, then
> extract/decrypt on POST -- is something I gleaned from looking through
> the Stripes code. They do a lot of "state smuggling" as an alternative
> to storing server-side session attributes. I think it's a nice,
> low-overhead technique for situations like forms, which are
> essentially stateful. I use this technique also for smuggling the
> parameter names used for the spam tokens, for example.
> 
> Long post! Hope it made sense.
> 
> Andrew
> 
> 
> On Thu, Jan 7, 2010 at 3:26 AM, Janne Jalkanen <[email protected]> 
> wrote:
>> 
>> Errr... How do we determine what is a previous post? Spambots tend to make
>> each request from a different address and ignore cookies. Or is it so that
>> if the post is determined to contain spam, you get a redirect to the editor
>> page, but this time with a captcha? 'cos that would be really nice, since it
>> allows you to re-edit the content.
>> 
>> /Janne
>> 
>> On Jan 5, 2010, at 18:10 , Andrew Jaquith wrote:
>> 
>>> Small correction (this is what happens when you type too quickly) --
>>> 
>>> CAPTCHAs are rendered, by default, ONLY if the previous post contains
>>> spam. The missing "only" makes all the difference. :)
>>> 
>>> The important point is that we are treating spam, essentially, as a
>>> form validation error.
>>> 
>>> If you don't submit spam, it won't produce a validation error, so you
>>> won't see a CAPTCHA. (Unless the JSP requires it, for example, when
>>> creating a user account).
>>> 
>>> Andrew
>>> 
>>> On Tue, Jan 5, 2010 at 10:46 AM, Andrew Jaquith
>>> <[email protected]> wrote:
>>>> 
>>>> Hi all --
>>>> 
>>>> Just thought I'd send a quick update on CAPTCHA. Janne and I have had
>>>> some back-channel conversations about enhancements that I needed to
>>>> make.
>>>> 
>>>> Functionally, here's how the revised system will work:
>>>> 
>>>> - CAPTCHAs will be rendered on the same page as the submitting form,
>>>> but by default if the previous post contains spam (this is in line
>>>> with Janne's comments)
>>>> - CAPTCHA-rendering will be the responsibility of the wiki:SpamProtect
>>>> tag (as before)
>>>> - wiki:SpamProtect must be added as a child of a form or stripes:form
>>>> element (as before)
>>>> - If the JSP author wishes, they may require a CAPTCHA by adding an
>>>> attribute challenge="captcha" to the SpamProtect tag (new)
>>>> - In addition, a form can require password confirmation by adding
>>>> attribute challenge="password" to the SpamProtect tag (new)
>>>> - All of the back-end processing will be done by SpamInterceptor, in
>>>> collaboration with the content-inspection system (as before)
>>>> - Stripes ActionBeans that require spam protection need only add a
>>>> @SpamProtect annotation to the target event methods (as before)
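
To illustrate the last bullet, here is a rough sketch of how an interceptor could tell which event methods opted in. Everything in it is an assumption made for the example: the annotation is redeclared locally (the real @SpamProtect lives in JSPWiki and may carry attributes), `EditBean` is a made-up bean, and the lookup uses plain reflection rather than Stripes' own event-handler resolution.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical sketch of annotation-driven opt-in; not JSPWiki or Stripes code. */
final class AnnotationSketch {
    /** Local stand-in for the @SpamProtect annotation mentioned in the thread. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface SpamProtect {}

    /** Made-up bean with one protected and one unprotected event method. */
    static class EditBean {
        @SpamProtect
        public void save() {}

        public void cancel() {}
    }

    /** Returns true if the named no-argument event method carries @SpamProtect. */
    static boolean needsSpamCheck(Class<?> beanClass, String eventName) throws NoSuchMethodException {
        Method handler = beanClass.getMethod(eventName);
        return handler.isAnnotationPresent(SpamProtect.class);
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println("save: " + needsSpamCheck(EditBean.class, "save"));
        System.out.println("cancel: " + needsSpamCheck(EditBean.class, "cancel"));
    }
}
```
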
>>>> 
>>>> We will add the SpamProtect tag to the page-edit form, comment form,
>>>> new user registration form, and user profile form. For new user
>>>> registration, a CAPTCHA will likely be required (challenge=captcha).
>>>> For user profile changes and post-install wiki configuration (coming
>>>> soon!), the user's password will be required to confirm
>>>> (challenge=password).
>>>> 
>>>> So, that's the functional design -- nice and simple. And we knock out
>>>> some JIRA bugs while we're at it (e.g., confirm password for account
>>>> changes)...
>>>> 
>>>> Andrew
>>>> 
>> 
>> 
