Hi Josh,

I briefly looked at the ImageNet description at the Princeton WordNet site.  It 
does not say whether the images are licensed openly enough for this new data 
to be linked and distributed with WordNet, which has a very permissive 


----- Original Message ----
From: "J Storrs Hall, PhD"
To: agi@v2.listbox.com
Sent: Friday, May 2, 2008
Subject: [agi] upcoming oral at Princeton

Just saw this announcement go by:


Constructing ImageNet

Data sets are essential in computer vision and content-based image retrieval
research. We present the work in progress for constructing ImageNet, a
large-scale image data set based on the Princeton WordNet.

The goal is to associate more than 1000 clean images with each node of
WordNet, which consists of ~30,000 (estimated) imageable nodes. We build a
prototype system for constructing ImageNet, as a first step toward
large-scale deployment. For each node of WordNet, which is a synonym set
(synset) for a single concept, we collect candidate images from the Internet
and clean them up with semi-automatic labeling. We train classifiers from
human-labeled data and use active learning to substantially speed up the
labeling process. We also developed a web interface for massive online human
labeling. We demonstrate the effectiveness of our system with results from a
subset of 
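
The announcement does not say which classifier or query strategy the authors use, so the following is only a toy sketch of the general idea of using active learning to cut down on human labeling. It assumes a nearest-centroid classifier and uncertainty sampling (ask the human about the item the current classifier is least sure of). The names `active_label`, `ask_human`, and the one-dimensional "features" are hypothetical stand-ins for illustration, not part of the ImageNet system.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(features, labels):
    """Mean feature vector per class, using only the human-labeled items."""
    groups = {}
    for i, label in labels.items():
        groups.setdefault(label, []).append(features[i])
    return {
        label: tuple(sum(col) / len(pts) for col in zip(*pts))
        for label, pts in groups.items()
    }


def margin(x, centroids):
    """Gap between the two nearest centroids; a small gap means 'uncertain'."""
    d = sorted(sq_dist(x, c) for c in centroids.values())
    return d[1] - d[0]


def predict(x, centroids):
    """Label of the nearest class centroid."""
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))


def active_label(features, ask_human, seed_indices, budget):
    """Label every item while asking the human about only a few of them.

    Assumption: the seed items cover at least two different classes.
    """
    labels = {i: ask_human(i) for i in seed_indices}
    for _ in range(budget):
        unlabeled = [i for i in range(len(features)) if i not in labels]
        if not unlabeled:
            break
        centroids = fit_centroids(features, labels)
        # Uncertainty sampling: query the item closest to the decision boundary.
        query = min(unlabeled, key=lambda i: margin(features[i], centroids))
        labels[query] = ask_human(query)
    centroids = fit_centroids(features, labels)
    # Human labels where we have them, classifier predictions elsewhere.
    return [labels.get(i, predict(features[i], centroids))
            for i in range(len(features))]


# Toy data: 20 "images" described by one made-up feature each.
features = [(float(i),) for i in range(20)]
truth = ["cat" if i < 10 else "other" for i in range(20)]
asked = []  # records which items the simulated human was shown


def ask_human(i):
    asked.append(i)
    return truth[i]


result = active_label(features, ask_human, seed_indices=[0, 19], budget=4)
correct = sum(r == t for r, t in zip(result, truth))
print(len(asked), "human labels;", correct, "of", len(truth), "correct")
```

On this toy data the queries cluster around the true class boundary, which is why a handful of human answers is enough to label the rest automatically. Real image features are far noisier, so the savings there would depend on the classifier and the data.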

Reading list:

Textbooks:

Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006.
Chapters 1, 2, 8, 14.
Modern Operating Systems, Tanenbaum.

Animals on the Web, Berg, Forsyth, CVPR06
OPTIMOL: automatic Online Picture collecTion via Incremental MOdel Learning,
  Li, Wang, Fei-Fei, CVPR07
Learning Object Categories from Google's Image Search, Fergus, Perona,
  Zisserman, ICCV05
Harvesting Image Databases from the Web, Schroff, 
From Aardvark to Zorro: A Benchmark of Mammal Images, Fink, Ullman, 
Tiny Images, Torralba, Fergus, Freeman, TechReport MIT, 2007
Labeling Images with a Computer Game, Luis von Ahn and Laura Dabbish, CHI04
LabelMe: a database and web-based tool for image annotation, Russell,
  Torralba, IJCV07
Introduction to a large scale general purpose groundtruth dataset:
  methodology, annotation tool, and benchmarks, Z.Y. Yao, X. Yang, and
  S.C. Zhu, 
Combining active and semi-supervised learning for spoken language
  understanding, Tur, Hakkani-Tur, Schapire, Speech Communication, 05
Online boosting and vision, 

