Re: [CODE4LIB] Easiest way to tag thousands of images

Shaun Ellis Tue, 20 Nov 2012 13:05:34 -0800

Kyle,
I think we need to break down your question...

It seems that the thematic folders and the file names may be OK descriptive tag sources to start with. Perhaps you could try to identify patterns to extract information for tags (e.g., "hall", "committee", "holiday", etc.). You could traverse the file system and use the Google Data API for Picasa to do an initial upload with tags generated from those folder and file name patterns. You will need *some* kind of user input to get to more detail.
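Here is a rough sketch of the "traverse and extract" step, just to show the idea. The stopword list, the three-letter minimum, and the function names (tags_from_path, walk_collection) are all made up for illustration, and the upload step is left out entirely since it depends on whichever API client you end up using.

```python
import os
import re

# Hypothetical list of camera/filename noise words to ignore; adjust to taste.
STOPWORDS = {"img", "dsc", "image", "photo", "copy"}


def tags_from_path(rel_path):
    """Derive candidate tags from the folder and file name parts of a path."""
    stem, _ext = os.path.splitext(rel_path)
    # Split on anything that is not a letter: separators, digits, underscores.
    tokens = re.split(r"[^A-Za-z]+", stem.lower())
    tags = []
    for token in tokens:
        # Keep words of three or more letters, skipping noise and repeats.
        if len(token) >= 3 and token not in STOPWORDS and token not in tags:
            tags.append(token)
    return tags


def walk_collection(root, extensions=(".jpg", ".jpeg", ".png", ".tif", ".tiff")):
    """Yield (relative_path, tags) for every image file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                yield rel, tags_from_path(rel)
```

Something like `for path, tags in walk_collection("/path/to/photos"): print(path, tags)` would give you a list to eyeball before uploading anything.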

So, are you asking whether it's possible for software to generate metadata for what is pictured in a photograph without user input?

Are you intending for facial recognition software to identify who is in a picture or just that there are people in it?

I think you're on the right track by using Picasa. I think it may be easier for people in those departmental units to help tag people and add descriptions through that interface. Then you can write some kind of script that slurps them into your repository.


Are you using the Google Data API for Picasa?
https://developers.google.com/picasa-web/

For batching jobs, you might look at GoogleCL:
http://code.google.com/p/googlecl/wiki/ExampleScripts#Picasa

If you don't want to or are not allowed to upload all your photos to the web first, you will probably have to look at Pyfaces [http://code.google.com/p/pyfaces/] or Open Source Computer Vision [http://opencv.willowgarage.com/wiki/FaceDetection], but they smell of pain to me.


-Shaun

On 11/20/12 2:54 PM, Kyle Banerjee wrote:

I am in the process of examining how photo collections maintained by campus
units can be incorporated into the library's repository. In all cases that
I've had to deal with so far, they're just using the file system -- i.e.
traversing folders that arrange images thematically to file names that
indicate the content.

Each of these collections contains many thousands of images. This means
that it's a hassle for them to find images, but also that there's no way
library staff alone will be able to handle all the metadata creation.

I'd like to use something slick like picasa to help them out (facial
recognition is an especially big deal for us). But I'm finding the metadata
to be both minimalist and clunky to work with so I wanted a reality check
to see if I'm not doing this the dumb way. Things I've noticed:


    1. Picasa appears to store info in xmp rather than exif which is great
    given the limitations of exif. However, I haven't yet found a way to use
    more than a couple fields. The caption shows up in a description field, and
    the tags show up in subjects. But aside from that, I'm at a loss as to how to
    populate other DC fields through the interface.

    2. Facial recognition metadata doesn't show up in xmp at all. However, I
    can get that by parsing .picasa.ini and contacts.xml (clunky, but doable).
    I'm kind of tempted to tell people to go into albums and batch tag the
    people albums since it's going to be fun explaining how to locate these
    hidden files.
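
The parsing described in item 2 might look roughly like the sketch below. It assumes that .picasa.ini has one section per image file with a `faces=rect64(...),contactid;...` line, and that contacts.xml holds `<contact id="..." name="..."/>` elements; both layouts are assumptions about Picasa's undocumented files and should be checked against real ones. The function names are made up for illustration.

```python
import configparser
import xml.etree.ElementTree as ET


def load_contacts(contacts_xml_text):
    """Map contact id -> name (assumed layout: <contact id=... name=.../>)."""
    root = ET.fromstring(contacts_xml_text)
    return {c.get("id"): c.get("name") for c in root.iter("contact")}


def faces_by_file(picasa_ini_text, contacts):
    """Map image file name -> list of names found in its 'faces' entry."""
    # interpolation=None so stray '%' characters are left alone;
    # strict=False so repeated sections or keys do not raise.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names as written
    parser.read_string(picasa_ini_text)
    result = {}
    for section in parser.sections():
        raw = parser.get(section, "faces", fallback="")
        names = []
        # Assumed entry form: rect64(<hex>),<contact id>, separated by ';'
        for entry in raw.split(";"):
            _rect, _sep, contact_id = entry.partition(",")
            name = contacts.get(contact_id.strip())
            if name and name not in names:
                names.append(name)
        if names:
            result[section] = names
    return result
```

Faces whose contact id is not in contacts.xml are skipped here rather than reported, which may or may not be what you want.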

My real question is whether anyone has come up with a really good way to
assign metadata to thousands of photos, preferably in batch fashion? Thanks,

kyle


--
Shaun D. Ellis
Digital Library Interface Developer
Firestone Library, Princeton University
voice: 609.258.1698 | sha...@princeton.edu
