[Medianews] Data-mining pioneer joins Microsoft

Rob Mon, 15 May 2006 14:47:13 -0700

Data-mining pioneer joins Microsoft
By Michael Kanellos
Staff Writer, CNET News.com
Published: May 15, 2006, 12:06 PM PDT

http://news.com.com/Data-mining+pioneer+joins+Microsoft/2100-1022_3-6072321.html?tag=nefd.top

Not every big name in search is going to Google.

Rakesh Agrawal, who is credited with creating data mining, or the
science of extracting trends from large and often disparate databases,
has left IBM to become a Microsoft technical fellow in the company's
Search Labs.

Large tech companies for years have tried to woo each other's top
scientists, and in the search and computer science field, Google has
lately been getting most of them. Google pulled away Microsoft's Kai-Fu
Lee to run its China labs, which led to a lawsuit. Google also snagged
search expert Udi Manber from Amazon.com. (Vint Cerf and his ceremonial
hat of many colors joined Google as well.)

Agrawal--who had been an IBM fellow, the company's highest title for
researchers--is one of the better-known scientists in data extraction
and databases. Data mining is a hot topic because it has emerged that
the federal government has begun to use the method to examine millions
of phone records. Corporations, though, have exploited it for years as a
way to understand customer behavior and enhance their own Web traffic.

Although not associated in the public mind with search, IBM is one of
the major forces in the field. The company was one of the first to
devise a video search engine, and in March, it bought Language Analysis
Systems, a search company focused on finding references to individuals,
even if their names are spelled differently in different databases.

Agrawal joined Microsoft a few weeks ago, but it was not publicly
announced. Microsoft created its Search Labs in January.

The idea behind data mining came up during a lunchtime conversation in
the early 1990s between Agrawal and an executive from the British
department store chain Marks & Spencer. The store chain had been
collecting all sorts of data but didn't know what to do with it.

Agrawal and his team began devising algorithms for asking open-ended
queries, eventually authoring a 1993 paper describing data mining. The
paper has been cited in more than 650 other studies, making it one of
the most widely cited papers of its kind.

"We were not even sure we should send it because we thought people might
think it was too simple-minded," Agrawal said.

More recently, Agrawal has been working on randomization. In this
technique, data is scrambled before it gets entered into a database.
Nonetheless, mathematicians, by applying probabilistic computing
techniques to the scrambled data, can come up with patterns that are
similar to what the actual data would have shown.

Thus, a corporation can get a handle on its 18- to 24-year-old buyers
while privacy is ensured; the original data is never entered in the
database.
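
As a rough illustration of the idea only, and not Agrawal's actual
algorithm, the sketch below uses classic randomized response, a simpler
technique swapped in here for clarity. Each customer's "is aged 18 to 24"
flag is flipped with a known probability before it is stored, and the
true share is then estimated from the scrambled flags alone. The function
names, the 30 percent share and the 0.75 truth probability are all
assumptions made up for this example.

```python
import random

def randomize(is_young, p_truth, rng):
    """Store the true flag with probability p_truth, otherwise its opposite.

    Hypothetical helper: only the scrambled flag ever reaches the database.
    """
    return is_young if rng.random() < p_truth else not is_young

def estimate_fraction(reports, p_truth):
    """Estimate the true share of True flags from the scrambled reports.

    If pi is the true share, the expected observed share is
        p_truth * pi + (1 - p_truth) * (1 - pi),
    which can be solved for pi because p_truth is known (and not 0.5).
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

# Illustrative data: about 30% of customers are in the 18-24 group (assumed).
rng = random.Random(0)
true_flags = [rng.random() < 0.30 for _ in range(100_000)]

# Scramble every flag before it is "entered into the database".
reports = [randomize(flag, 0.75, rng) for flag in true_flags]

true_share = sum(true_flags) / len(true_flags)
estimate = estimate_fraction(reports, 0.75)
print(f"true share: {true_share:.3f}, estimated from scrambled data: {estimate:.3f}")
```

No individual stored flag can be trusted, yet the aggregate estimate lands
close to the true share when the sample is large. A truth probability
nearer 0.5 hides each person better but makes the estimate noisier.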

The scientific underpinnings of randomization have been the subject of a
few academic papers. The short explanation is as follows: "It's the
beauty of math," Agrawal said in an interview a few years ago.
