On Thursday, April 25, 2013 at 12:52 PM, Siddharth Kothari wrote:
> Hi everyone,

Hello!

> I am interested in a couple of projects - CC Web Content API, and Media
> Fingerprinting Library. I wanted to see if I understand how these projects
> fit in the OpenHome project before starting some contributions.
>
> The way I envision OpenHome is as a central system where the CC licensed
> contents will be indexed by their hashes. The CC Web Content API could be
> used by sites that aggregate user content, let's say: github, youtube,
> slideshare to find out remixing of an existing CC licensed content. The Media
> Fingerprinting library helps in determining deduplication of content (it
> should also work when a content is cropped, clipped, blurred or quoted in
> parts). Am I understanding this correctly?

Roughly, yes. The DB needs to be laid out such that it can be queried from fingerprint alone (which is not like an MD5/SHA-1 hash). The Fingerprinting project should aim to catch cropped, distorted, resized, etc. files.

> I find the Fingerprinting project fascinating, but delving more into the idea
> and looking at pHash.org (http://pHash.org), I realized it already implements
> fingerprinting for image, audio, and video content and provides this as a
> nice API - http://www.phash.org/docs/howto.html. Unless we find GPLv3 too
> restrictive, I can't think of a good reason to not use this. Perhaps, pHash
> can be extended to support for text and compound media types (ppt, pdf). But
> I think starting with pHash and supporting text using w-shingling can be a
> pretty good start for the fingerprinting library. I would like to hear more
> thoughts on this.

I have heard mixed reviews of pHash. I think a first step in the project should be to come up with a set of tests and metrics, and try out pHash as well as other solutions.

> The CC Web Content API project sounds appealing, since it is the glue that
> binds other parts, and perhaps crucial to the successful implementation of
> OpenHome project. Imo, this could perhaps be meshed with the Fingerprinting
> project (if pHash is used as a base). Essentially, the current Fingerprinting
> task is reduced to exposing the pHash library via a nice API. And over the
> time, pHash/Fingerprinting algorithms can be added/improved.

Yes, if you'd like to focus on the Web content API, then you can abstract away the fingerprinting portion. Even straight-up SHA-1 would work for a demo of the Web content API (it wouldn't catch modified images, but it would catch the same file in other webpages).

> Let me know if I am making sense. Sorry if it's difficult to follow, we can
> carry this conversation on IRC. My nick is sids_aquarius.

Sounds good! I'm traveling until Sunday, but will try to drop by when possible.

Dan
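
To make the "abstract away the fingerprinting portion" suggestion concrete, here is a minimal Python sketch. It is an illustration only, not an agreed design for OpenHome or the Web content API: the names ContentIndex, Fingerprinter, register, and lookup are hypothetical, and the example.org URLs are placeholders. The index is keyed by whatever the fingerprint function returns, so the SHA-1 stand-in could later be swapped for a perceptual fingerprint (exact-key lookup would then likely need to become a nearest-match search, which this sketch does not attempt).

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical type: any function that maps raw file bytes to a lookup key.
Fingerprinter = Callable[[bytes], str]


def sha1_fingerprint(data: bytes) -> str:
    """Demo stand-in: an exact-match hash, not a perceptual fingerprint."""
    return hashlib.sha1(data).hexdigest()


class ContentIndex:
    """Hypothetical in-memory index that can be queried from a fingerprint alone."""

    def __init__(self, fingerprinter: Fingerprinter = sha1_fingerprint) -> None:
        self._fingerprinter = fingerprinter
        self._urls: Dict[str, List[str]] = defaultdict(list)

    def register(self, data: bytes, url: str) -> str:
        """Record that `url` hosts this content; return the key it was stored under."""
        key = self._fingerprinter(data)
        if url not in self._urls[key]:
            self._urls[key].append(url)
        return key

    def lookup(self, data: bytes) -> List[str]:
        """Return every URL registered under the same fingerprint as `data`."""
        return list(self._urls.get(self._fingerprinter(data), []))


if __name__ == "__main__":
    index = ContentIndex()
    original = b"bytes of some CC-licensed image"
    index.register(original, "http://example.org/original.png")
    index.register(original, "http://example.org/reposted.png")

    # The same file found on another page matches both registrations.
    print(index.lookup(original))
    # A file changed by even one byte does not match with SHA-1.
    print(index.lookup(original + b"!"))
```

With SHA-1 as the fingerprint function this shows the behaviour described above: identical files on different pages are found, while modified copies are missed until a real fingerprinting function is plugged in.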
_______________________________________________
cc-devel mailing list
[email protected]
http://lists.ibiblio.org/mailman/listinfo/cc-devel
