Re: [Ldsoss] Digitizing handwritten records by stopping spammers (or vice versa)

2007-10-02 Thread Jacob Sorensen
I've seen this idea before, and the main problem is that digitizing scanned
words and CAPTCHA are at cross-purposes.  The problem in digitizing is that
the computer doesn't know the word.  In CAPTCHA, the computer knows the
word, and it needs to in order to validate the user.  If you don't know for
sure that the word was typed in correctly, you can't validate the user.

CAPTCHA words can be used to validate once they're known, but that kind of
defeats the purpose.  You could just take the majority answer, but in
order to gather a strong majority you would have to let some minority
answers through, some of which may be invalid users who should not be
allowed access.

I suspect using digitized text for CAPTCHA would not be as useful on
the digitization side as one might think.

Jake

On 10/2/07, Jon D. [EMAIL PROTECTED] wrote:

 Here's an idea...
 Some of you may have seen today's (and previous)
 Slashdot links on reCaptcha, a cool idea
 that's starting to be more commonly used:

 http://news.bbc.co.uk/2/hi/technology/7023627.stm
 http://recaptcha.net/learnmore.html

 Basically they're using a CAPTCHA to digitize old
 scanned books.[1]

 This could be applied to handwritten historic records.
 However, it might be hard to trust regular schmoes to
 correctly transcribe handwritten historic texts.  One
 way to address this might be to just ask more people
 the same word, and if they all (or mostly) match, we
 can be fairly certain it's transcribed correctly.
 Or this could just be used to verify a previous manual
 transcription.
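
The "ask more people the same word" check can be sketched in a few lines of Python. This is only an illustration, not how reCaptcha itself works: the function name `consensus` and the thresholds `min_votes` and `min_agreement` are assumptions made up for the example.

```python
from collections import Counter

def consensus(answers, min_votes=3, min_agreement=0.75):
    """Return the agreed transcription, or None if there is no strong majority.

    min_votes and min_agreement are illustrative thresholds, not values
    taken from any real system.
    """
    # Normalize lightly so "Smith " and "smith" count as the same answer.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough independent answers yet
    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return word
    return None  # answers disagree too much; keep asking or flag for review
```

A word that comes back as None would stay in the pool, or be handed to a manual transcriber.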

 Thoughts?

 -Jon



 [1] FYI, a CAPTCHA is where you have to type
 a distorted word - to stop spammers & hackers.  For
 example, when you mistype your password to enter gmail
 or yahoo mail enough times, it'll require you to type
 in a word that's blurred.  The new application of this
 anti-spam technique is to use scanned books as the
 source of words.





 
 ___
 Ldsoss mailing list
 Ldsoss@lists.ldsoss.org
 http://lists.ldsoss.org/mailman/listinfo/ldsoss



Re: [Ldsoss] Digitizing handwritten records by stopping spammers (or vice versa)

2007-10-02 Thread Jesse Stay
On 10/2/07, Jon D. [EMAIL PROTECTED] wrote:
 Here's an idea...
 Some of you may have seen today's (and previous)
 Slashdot links on reCaptcha, a cool idea
 that's starting to be more commonly used:

 http://news.bbc.co.uk/2/hi/technology/7023627.stm
 http://recaptcha.net/learnmore.html
 Basically they're using a CAPTCHA to digitize old
 scanned books.[1]

I blogged about this several months ago.  I think it's awesome technology
and a great way to use something intended for another purpose.  I've
implemented it on my blog, and strongly encourage others to use it as
well (I have had issues with it on mobile phones, however; I'm not sure
whether they've worked around that).

Jesse


-- 

#!/usr/bin/perl
$^=q;@!~|{krwyn{u$$Sn||n|}j=$$Yn{uQjltn{  0gFzD gD, 00Fz, 0,,( 0hF
0g)F/=, 0 L$/GEIFewe{,$/ 0C$~ @=,m,|,(e 0.), 01,pnn,y{ rw}
;,$0=q,$,,($_=$^)=~y,$/ C-~@=\n\r,-~$:-u/ #y,d,s,(\$.),$1,gee,print


Re: [Ldsoss] Digitizing handwritten records by stopping spammers (or vice versa)

2007-10-02 Thread Jon D.
But most of these points are in fact addressed by
reCaptcha.  The idea I proposed was simply to use
handwritten texts, instead of printed books, as input,
which would require just a little more
verification of accuracy.

-Jon


Re: [Ldsoss] Digitizing handwritten records by stopping spammers (or vice versa)

2007-10-02 Thread Jacob Sorensen
Pairing a valid CAPTCHA with a separate digitization problem is okay, but
recognize that this doesn't mean CAPTCHA can validly be used for
digitization, or vice versa.  It just means that you've added a service
element onto the CAPTCHA so people can do some useful work at the same time
they are validating themselves (on a different problem).
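
A minimal sketch of that pairing, in Python: the user is validated only against a word whose answer is already known, and the answer to the second, unknown word is just recorded as a vote.  The function name `check_paired_challenge` and its parameters are hypothetical, chosen for this illustration, and are not taken from any real CAPTCHA library.

```python
def check_paired_challenge(control_answer, control_truth, unknown_answer, votes):
    """Validate on the known word; record the unknown word only if the user passed.

    votes is a plain list that collects candidate transcriptions for the
    unknown word (a hypothetical store for this sketch).
    """
    # The pass/fail decision depends only on the word we already know.
    passed = control_answer.strip().lower() == control_truth.strip().lower()
    # Only users who passed the control word contribute a transcription vote.
    if passed and unknown_answer.strip():
        votes.append(unknown_answer.strip().lower())
    return passed
```

The unknown word never decides whether the user gets in, so a wrong transcription costs a vote rather than letting an invalid user through.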

Using handwriting for CAPTCHA can be good for some things, but you have to
be careful because certain texts can contain writing that regular people
will misread.  For example: a 200-year-old American text with a letter that
the majority will enter as B.

Jake



Re: [Ldsoss] Digitizing handwritten records by stopping spammers (or vice versa)

2007-10-02 Thread Bryan Murdock
The inventor of the captcha calls this Human Computation.  He gave an
interesting talk at Google on the subject that you can watch here:

http://video.google.com/videoplay?docid=-8246463980976635143

He presents it very well and even non-techies (like my wife) enjoyed
watching this when I showed them.  OK, it was just my wife, but I'm sure
others would like it too.

Bryan


Re: [Ldsoss] Digitizing handwritten records by stopping spammers (or vice versa)

2007-10-02 Thread m h
Sounds like a good way to do genealogical indexing. Someone should
tell the church ;)
Also sounds like an interesting business idea.  Farm out captchas to
blogs, and pay people for using the captcha.



Re: [Ldsoss] Digitizing handwritten records by stopping spammers (or vice versa)

2007-10-02 Thread Bryan Murdock
On 10/2/07, Bryan Murdock [EMAIL PROTECTED] wrote:
 The inventor of the captcha calls this Human Computation.  He gave an
 interesting talk at Google on the subject that you can watch here:

 http://video.google.com/videoplay?docid=-8246463980976635143

 He presents it very well and even non-techies (like my wife) enjoyed
 watching this when I showed them.  OK, it was just my wife, but I'm sure
 others would like it too.

 Bryan

OK, I just read the fine article and it's the same guy doing the
reCAPTCHA thing, Luis von Ahn.  This is cool stuff.  If you don't have
time to watch the video above (which you really should) I'll just tell
you.  He does a similar thing for image recognition using some online
games:

http://www.espgame.org/
http://www.peekaboom.org/

Genius.  Using it for genealogy indexing seems like a great idea too.

Bryan


Re: [Ldsoss] Digitizing handwritten records by stopping spammers (or vice versa)

2007-10-02 Thread Jesse Stay
On 10/2/07, m h [EMAIL PROTECTED] wrote:

 Sounds like a good way to do genealogical indexing. Someone should
 tell the church ;)
 Also sounds like an interesting business idea.  Farm out captchas to
 blogs, and pay people for using the captcha


Seth Godin actually already proposed this idea - it's open for anyone to
try!:

http://sethgodin.typepad.com/seths_blog/2006/12/commercializing.html

Another thing to look into is Amazon's Mechanical Turk:

http://www.mturk.com/mturk/welcome

Jesse



Re: [Ldsoss] Digitizing handwritten records by stopping spammers (or vice versa)

2007-10-02 Thread m h
On 10/2/07, Jesse Stay [EMAIL PROTECTED] wrote:
 On 10/2/07, m h [EMAIL PROTECTED] wrote:
  Sounds like a good way to do genealogical indexing. Someone should
  tell the church ;)
  Also sounds like an interesting business idea.  Farm out captchas to
  blogs, and pay people for using the captcha
 

 Seth Godin actually already proposed this idea - it's open for anyone to
 try!:

 http://sethgodin.typepad.com/seths_blog/2006/12/commercializing.html

Great minds think alike (or think the same thing a year later).  One
issue with his idea is that he doesn't use the results for anything
useful...  (Admittedly, if someone made me translate immigration records
in order to reply to a blog, I probably wouldn't do it much.)


 Another thing to look into is Amazon's Mechanical Turk:


Yes, this is a specialized version of the turk.  As I understand it,
genealogical indexing is currently done by volunteer Turks.

-matt