Just saw this announcement go by:


Constructing ImageNet

Data sets are essential in computer vision and content-based image
retrieval research. We present the work in progress on constructing
ImageNet, a large-scale image data set based on the Princeton WordNet.
The goal is to associate more than 1000 clean images with each node of
WordNet, which consists of ~30,000 (estimated) imageable nodes. We built
a prototype system for constructing ImageNet as a first step toward
large-scale deployment. For each node of WordNet, which is a synonym set
(synset) for a single concept, we collect candidate images from the
Internet and clean them up with semi-automatic labeling. We train
classifiers from human-labeled data and use active learning to
substantially speed up the labeling process. We also developed a web
interface for massive online human labeling. We demonstrate the
effectiveness of our system with results from a subset of
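
As a rough illustration of the active-learning step the abstract mentions (this is not the authors' system), here is a minimal Python sketch of uncertainty sampling. It assumes 1-D features, a toy midpoint classifier, and a hypothetical `oracle` callback standing in for the human labeler; every name in it is made up for illustration.

```python
import math


def fit(labeled):
    """Toy classifier: threshold at the midpoint between the two class means.

    Assumes `labeled` holds (feature, label) pairs with at least one
    example of each label (0 and 1).
    """
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def prob_positive(x, threshold):
    """Logistic score: 0.5 at the threshold, approaching 1 above it."""
    return 1.0 / (1.0 + math.exp(-(x - threshold)))


def most_uncertain(pool, threshold):
    """Pick the unlabeled item whose predicted probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(prob_positive(x, threshold) - 0.5))


def active_learning(labeled, pool, oracle, budget):
    """Query the (hypothetical) human `oracle` for up to `budget` labels.

    Each round refits the classifier, then asks for the label of the
    item the current model is least sure about.
    """
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        threshold = fit(labeled)
        query = most_uncertain(pool, threshold)
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return fit(labeled), labeled
```

The point of the design is that labeling effort goes to items near the current decision boundary, where a human answer changes the model most, rather than to items the classifier already handles confidently.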

Reading list:

Textbooks:

Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006. Chapters 1, 2, 8, 14.
Modern Operating Systems, Tanenbaum.

Animals on the Web, Berg, Forsyth, CVPR06
OPTIMOL: automatic Online Picture collecTion via Incremental MOdel Learning, Li, Wang, Fei-Fei, CVPR07
Learning Object Categories from Google's Image Search, Fergus, Perona, Zisserman, ICCV05
Harvesting Image Databases from the Web, Schroff,
From Aardvark to Zorro: A Benchmark of Mammal Images, Fink, Ullman,
Tiny Images, Torralba, Fergus, Freeman, TechReport MIT, 2007
Labeling Images with a Computer Game, Luis von Ahn and Laura Dabbish, CHI04
LabelMe: a database and web-based tool for image annotation, Russell, Torralba, IJCV07
Introduction to a large scale general purpose groundtruth dataset: methodology, annotation tool, and benchmarks, Z.Y. Yao, X. Yang, and S.C. Zhu,
Combining active and semi-supervised learning for spoken language understanding, Tur, Hakkani-Tur, Schapire, Speech Communication, 05
Online boosting and vision,

