
CIA using data mining to keep smart

By Tabassum Zakaria
March 3, 2001 7:39 AM PT

LANGLEY, Va.--The CIA, faced with a daily avalanche of information, is using
new ``data mining'' technology to find useful nuggets within thousands of
documents and broadcasts in different languages.

The spy agency must sift through a barrage of information from both
classified and unclassified sources in varied formats such as hard text,
digital text, imagery, and audio in more than 35 languages. The Office of
Advanced Information Technology (AIT), part of the CIA's Directorate of
Science and Technology, is focused on finding solutions to the ``volume
challenge.''

``We're not growing at a fast rate, but the amount of
information that comes into this place is growing by leaps and bounds,''
Larry Fairchild, AIT director, said in an interview this week in a basement
demonstration room at Central Intelligence Agency headquarters.

``How do we give folks technologies so that they are able to handle the big
increase in information they're going to have to deal with on a day-to-day
basis?'' he said.

One computer tool called ``Oasis'' can convert audio signals from television
and radio broadcasts into text. It can recognize accented English for greater
accuracy in the transcription, tell whether the speaker is male or female,
and distinguish one male or female voice from another of the same gender. At
the left of the screen of a transcribed broadcast are labels such as
``Male 1,'' ``Female 1,'' and ``Male 2'' next to sentences. If one voice is
labeled with a name, the computer from then on will put that name on anything
else with that same voice. So, for example, if a broadcast by Saudi exile
Osama bin Laden, whom the CIA considers a major threat to Americans, were
transcribed and labeled, the computer would automatically label his voice
every time it was detected.

Machine translator
If the machine transcription appears off, the user can hear the actual
broadcast with a mouse click. For example, the demonstration showed a
transcription that read ``latest danger from hell'' when the audio said
``latest danger from El Nino.'' The computer cuts the time it takes a person
to transcribe a half-hour broadcast from as much as 90 minutes to 10 minutes,
a CIA employee conducting the demonstration said.

The CIA is planning to have Oasis developed for other languages such as
Arabic and Chinese. Oasis also finds terms related in meaning to the words
being searched. For example, a broadcast might not mention ``terrorism'' but
might say ``car bombing,'' which the computer would tag as ``terrorism'' so
that anyone searching for that category would find it. Currently the CIA's
Foreign Broadcast Information Service is using Oasis in one Asian city and
intends to have it in other regions such as the Middle East this year.

Another computer tool, ``FLUENT,'' enables a user to conduct
computer searches of documents that are in a language the user does not
understand. The user can put English words into the search field, such as
``nuclear weapons,'' and documents in languages such as Russian, Chinese and
Arabic pop up. The system will then translate the document, and if it is seen
as useful, the analyst can send it to a human translator for more precision.
Languages that FLUENT can translate into English include Chinese, Korean,
Portuguese, Russian, Serbo-Croatian and Ukrainian.

``Data mining'' tools are used to extract key pieces of information from a
variety of intelligence traffic, such as that on the flow of illegal drugs,
and also to keep track of illicit financial transactions.
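
The category tagging described for Oasis, in which a broadcast that says
``car bombing'' is filed under ``terrorism,'' can be pictured with a small
sketch. The article does not say how Oasis does this; the lookup table, the
function names and the simple substring matching below are all assumptions
made for illustration, not the agency's method.

```python
# Hypothetical category table: each category lists phrases that should
# trigger it. These entries are illustrative only.
CONCEPT_TERMS = {
    "terrorism": ("terrorism", "car bombing"),
    "weather": ("el nino",),
}


def tag_concepts(transcript):
    """Return the set of categories whose phrases appear in the transcript.

    Matching is naive, case-insensitive substring search (an assumption).
    """
    text = transcript.lower()
    return {
        concept
        for concept, terms in CONCEPT_TERMS.items()
        if any(term in text for term in terms)
    }


def search_by_concept(transcripts, concept):
    """Return the transcripts tagged with the given category."""
    return [t for t in transcripts if concept in tag_concepts(t)]
```

With this table, a transcript that never uses the word ``terrorism'' but
mentions a car bombing would still be returned by a search on that category.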

Analyzed documents on Iraq
Tools were developed to help CIA analysts on Iraq, who were asked to analyze
the agency's holdings on Iraqi war crimes, about 1.2 million documents going
back to 1979. The Text Data Mining tool extracted and indexed all words in
the data. So, for example, if an analyst were asked whether Iraq ever used
anthrax as a weapon, the analyst could open the tool and find ``anthrax'' in
the automatically generated index. That tool also counts the frequency of
word use and can handle various spellings of the same Iraqi names or
locations. There is also ``gisting technology,'' which gives the flavor of
the key information of a document in a short paragraph, Fairchild said.

With the latest spy furor in the nation's capital, would any of the
tools help catch a spy? ``Yes, some of the things we're doing can,''
Fairchild said, without giving details. ``We're looking at better technologies to put
in that area,'' he added. Another intelligence official, on condition of
anonymity, said: ``If they have this kind of technology to plumb the depths
of open sources, you can imagine what kind of technologies they have to track
down spies.''
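
The word index described for the Text Data Mining tool, one that records
where each word occurs, counts how often it is used and treats variant
spellings of a name as the same term, can be sketched roughly as follows.
This is a minimal illustration under stated assumptions: the variant table,
the function names and the sample spellings are hypothetical, and the
article gives no detail on how the CIA's tool is actually built.

```python
import re
from collections import Counter, defaultdict

# Hypothetical table mapping variant spellings to one canonical form.
SPELLING_VARIANTS = {"basrah": "basra", "bassora": "basra"}


def normalize(word):
    """Lowercase a word and fold known variant spellings together."""
    lowered = word.lower()
    return SPELLING_VARIANTS.get(lowered, lowered)


def build_index(documents):
    """Index every word in a {doc_id: text} mapping.

    Returns (index, counts): index maps each term to the set of document
    ids containing it; counts holds each term's total frequency.
    """
    index = defaultdict(set)
    counts = Counter()
    for doc_id, text in documents.items():
        for word in re.findall(r"[A-Za-z]+", text):
            term = normalize(word)
            index[term].add(doc_id)
            counts[term] += 1
    return index, counts
```

An analyst's question then becomes a lookup: `index["anthrax"]` gives the
documents that mention the word, and `counts["anthrax"]` gives how often it
appears across the holdings.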

