Re: Scientists Work on Software to Scan Arabic

2005-01-28 Thread Justin
On 2005-01-28T20:03:22-0500, R.A. Hettinga wrote:
> <http://www.nytimes.com/aponline/technology/AP-Arabic-Software.html?oref=login&pagewanted=print&position=>
> The New York Times
> January 27, 2005
> Scientists Work on Software to Scan Arabic
>  By THE ASSOCIATED PRESS
> 
> ``The whole Internet is skewed toward people who speak English,'' said Venu
> Govindaraju, director of the Center for Unified Biometrics and Sensors at
> the University at Buffalo, where the software is being developed.

Someone give that man a brain, and a cookie.  I don't live near NY.

The internet has nothing to do with scanning written/printed Arabic
texts.

He obviously intended to squeeze a complaint about the internet into an
article about scanning printed/written documents.  The reason the
internet is "skewed" is that these idiots want others to "fix" the
internet to accommodate their languages.  As a result, much of the
non-Western-language support in software is done by Westerners, and so
doesn't work.

-- 
"War is the father and king of all, and some he shows as gods, others as
men; some he makes slaves, others free."  --Heraclitus (Kahn.83/D-K.53) 



Scientists Work on Software to Scan Arabic

2005-01-28 Thread R.A. Hettinga
<http://www.nytimes.com/aponline/technology/AP-Arabic-Software.html?oref=login&pagewanted=print&position=>

The New York Times

January 27, 2005

Scientists Work on Software to Scan Arabic
 By THE ASSOCIATED PRESS


 Filed at 8:09 a.m. ET

BUFFALO, N.Y. (AP) -- Computer scientists are developing software to scan
Arabic documents, including handwritten ones, for specific words and
phrases, filling a void that became apparent following the Sept. 11
attacks.

Besides helping with intelligence gathering, the software should expand
access to modern and ancient Arabic manuscripts. It will allow Arabic
writings to be digitized and posted on the Web.

``The whole Internet is skewed toward people who speak English,'' said Venu
Govindaraju, director of the Center for Unified Biometrics and Sensors at
the University at Buffalo, where the software is being developed.

Govindaraju fears that if optical character recognition software isn't
developed for a particular language, ``then all the classic texts in that
language will disappear into oblivion.''

Bill Young, an Arabic-language specialist at the University of Maryland, said
the software could help scan through masses of typed pages for specific
names or words, though he cautioned that handwritten Arabic presents
serious challenges for computers.

For instance, the word mas'uul, meaning responsible, can be written in more
than one way, he said. So the software would have to be given instructions
about possible variations.
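To make the "possible variations" point concrete: mas'uul is commonly spelled both مسؤول (hamza on a waw seat) and مسئول (hamza on a yeh seat). Here is a minimal Python sketch of one way to give software such "instructions." It assumes the text has already been recognized into Unicode, which is the hard part the article is about, and folds the hamza-carrying letters onto one canonical form before comparing. The folding table and the helpers search_key and find_matches are hypothetical illustrations, not anything described for the Buffalo project.

```python
# Hypothetical folding table (an illustrative assumption, not from the article):
# map hamza "seat" letters to a common form so spelling variants compare equal.
HAMZA_FOLD = str.maketrans({
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0624": "\u0621",  # WAW WITH HAMZA ABOVE -> HAMZA
    "\u0626": "\u0621",  # YEH WITH HAMZA ABOVE -> HAMZA
})


def search_key(word: str) -> str:
    """Return a variant-insensitive key for matching (hypothetical helper)."""
    return word.translate(HAMZA_FOLD)


def find_matches(query: str, tokens: list[str]) -> list[int]:
    """Return indices of tokens whose folded key equals the query's key."""
    key = search_key(query)
    return [i for i, tok in enumerate(tokens) if search_key(tok) == key]


# Two common spellings of mas'uul ("responsible")
SPELLING_A = "\u0645\u0633\u0624\u0648\u0644"  # with waw-hamza
SPELLING_B = "\u0645\u0633\u0626\u0648\u0644"  # with yeh-hamza

if __name__ == "__main__":
    tokens = [SPELLING_B, "\u0643\u062a\u0627\u0628"]  # second token: kitab, "book"
    print(find_matches(SPELLING_A, tokens))  # [0]
```

Folding to a single key keeps the search index small. The alternative, expanding each query into every known spelling, needs a list of variants per word, which is closer to what Young describes but harder to keep complete.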

Govindaraju, who helped develop software to recognize handwritten addresses
in English, said the Arabic software would take into account the fact that
characters may take different forms depending on where within a word they
appear, and that Arabic vowels are pronounced but often not written.
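Both points (positional letter shapes and optional vowel marks) have a text-level counterpart in Unicode, which a rough Python sketch can show. Unicode encodes positional glyphs as "presentation forms" (e.g. U+FEDB, KAF INITIAL FORM) that NFKC normalization maps back to the base letter, and short vowels are separate combining marks that can be stripped. This works on recognized text only, not on scanned images. Treating U+064B..U+0652 plus U+0670 as the "vowel" set, and the helper name normalize_arabic, are assumptions for illustration.

```python
import unicodedata

# Arabic short-vowel and related marks (tashkeel): FATHATAN..SUKUN plus
# SUPERSCRIPT ALEF. Treating exactly this set as "vowels" is an assumption.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def normalize_arabic(text: str) -> str:
    """Fold positional forms to base letters and drop vowel marks (hypothetical helper)."""
    # NFKC maps presentation forms such as U+FEDB (KAF INITIAL FORM)
    # to their base letters, so each letter has one code point.
    folded = unicodedata.normalize("NFKC", text)
    # Remove optional vowel marks so vowelled and unvowelled text compare equal.
    return "".join(ch for ch in folded if ch not in TASHKEEL)


if __name__ == "__main__":
    positional = "\uFEDB\uFE98\uFE90"  # kaf initial, teh medial, beh final
    vowelled = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba with fatha marks
    plain = "\u0643\u062A\u0628"  # ktb, unvowelled
    print(normalize_arabic(positional) == plain)  # True
    print(normalize_arabic(vowelled) == plain)  # True
```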

-- 
-
R. A. Hettinga 
The Internet Bearer Underwriting Corporation <http://www.ibuc.com/>
44 Farquhar Street, Boston, MA 02131 USA
"... however it may deserve respect for its usefulness and antiquity,
[predicting the end of the world] has not been found agreeable to
experience." -- Edward Gibbon, 'Decline and Fall of the Roman Empire'