On Thursday, April 25, 2013 at 12:52 PM, Siddharth Kothari wrote:
> Hi everyone,
> 
> 


Hello!

> I am interested in a couple of projects - CC Web Content API, and Media 
> Fingerprinting Library. I wanted to see if I understand how these projects 
> fit in the OpenHome project before starting some contributions. 
> 
> The way I envision OpenHome is as a central system where CC-licensed 
> content is indexed by its hashes. The CC Web Content API could be 
> used by sites that aggregate user content (say GitHub, YouTube, 
> SlideShare) to detect remixes of existing CC-licensed content. The Media 
> Fingerprinting library helps detect duplicate content (it 
> should also work when content is cropped, clipped, blurred, or quoted in 
> part). Am I understanding this correctly?

Roughly, yes. The DB needs to be laid out so that it can be queried by 
fingerprint alone (a fingerprint is not like an MD5/SHA-1 hash, where any 
change to the file changes the hash completely). The Fingerprinting project 
should aim to catch cropped, distorted, resized, etc. files.

> I find the Fingerprinting project fascinating, but after delving into the idea 
> and looking at pHash (http://pHash.org), I realized it already implements 
> fingerprinting for image, audio, and video content and provides this as a 
> nice API - http://www.phash.org/docs/howto.html. Unless we find GPLv3 too 
> restrictive, I can't think of a good reason not to use it. Perhaps pHash 
> can be extended to support text and compound media types (ppt, pdf). But 
> I think pHash, plus w-shingling for text, would be a pretty good starting 
> point for the fingerprinting library. I would like to hear more 
> thoughts on this.
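
For reference, the w-shingling idea mentioned above for text can be sketched roughly as follows. This is a minimal illustration, not a proposal for the library's API: the function names, the word-level tokenization, and the default w=3 are all assumptions.

```python
from typing import Set, Tuple


def shingles(text: str, w: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of all runs of w consecutive lowercased words."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 3) -> float:
    """Jaccard similarity of the two shingle sets (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 0.0  # both texts shorter than w words: nothing to compare
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog"
quoted = "he said the quick brown fox jumps over it"
print(resemblance(original, quoted))  # 0.4 (4 shared shingles out of 10 distinct)
```

Because the score is based on overlapping word runs, a text quoted in part still scores well above zero, which is the behaviour wanted for partial quotes.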

I have heard mixed reviews of pHash. I think a first step in the project should 
be to come up with a set of tests and metrics, and try out pHash as well as 
other solutions.
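
As a rough idea of what such tests and metrics might look like, the sketch below measures how often a candidate solution matches pairs that should match (modified copies) and pairs that should not (unrelated files). The fingerprints here are made-up 8-bit placeholders and the Hamming distance is only a stand-in; in practice both would come from whichever library is being evaluated.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def match_rate(pairs: Iterable[Tuple[T, T]],
               distance: Callable[[T, T], float],
               threshold: float) -> float:
    """Fraction of pairs whose distance is at or below the threshold."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(distance(a, b) <= threshold for a, b in pairs) / len(pairs)


def hamming(a: int, b: int) -> int:
    """Stand-in distance: number of differing bits."""
    return bin(a ^ b).count("1")


# Placeholder fingerprints (assumed, not real pHash output).
modified_pairs = [(0b11110000, 0b11110001), (0b10101010, 0b10101110), (0b00001111, 0b11111111)]
unrelated_pairs = [(0b11110000, 0b00001111), (0b10101010, 0b10101000)]

recall = match_rate(modified_pairs, hamming, threshold=2)
false_positive_rate = match_rate(unrelated_pairs, hamming, threshold=2)
print(recall, false_positive_rate)
```

Running the same two numbers for each candidate, over the same test set and a range of thresholds, would give a like-for-like comparison.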

> The CC Web Content API project sounds appealing, since it is the glue that 
> binds the other parts, and is perhaps crucial to the successful 
> implementation of the OpenHome project. IMO, this could perhaps be meshed 
> with the Fingerprinting project (if pHash is used as a base). Essentially, 
> the current Fingerprinting task would be reduced to exposing the pHash 
> library via a nice API. Over time, pHash/Fingerprinting algorithms could be 
> added or improved.

Yes, if you'd like to focus on the Web content API, then you can abstract away 
the fingerprinting portion. Even straight-up SHA-1 would work for a demo of the 
Web content API (it wouldn't catch modified images, but it would catch the same 
file in other webpages).
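
A minimal sketch of that SHA-1 stand-in, assuming an in-memory index keyed by digest; the class name, method names, and example URLs are hypothetical. The fingerprint() method is the piece a real fingerprinting library would replace later.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, List


class ExactMatchIndex:
    """Maps the SHA-1 digest of a file's bytes to the URLs where it was seen."""

    def __init__(self) -> None:
        self._urls: DefaultDict[str, List[str]] = defaultdict(list)

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # Placeholder: swap in a perceptual fingerprint to catch modified files.
        return hashlib.sha1(data).hexdigest()

    def add(self, data: bytes, url: str) -> None:
        self._urls[self.fingerprint(data)].append(url)

    def lookup(self, data: bytes) -> List[str]:
        return list(self._urls.get(self.fingerprint(data), []))


demo = ExactMatchIndex()
image = b"pretend these are image bytes"
demo.add(image, "http://example.org/gallery")
demo.add(image, "http://example.net/blog-post")

print(demo.lookup(image))            # both URLs: same file on two pages
print(demo.lookup(image + b"\x00"))  # []: any byte change defeats SHA-1
```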

> Let me know if I am making sense. Sorry if it's difficult to follow; we can 
> continue this conversation on IRC. My nick is sids_aquarius.

Sounds good! I'm traveling until Sunday, but will try to drop by when possible.

Dan 

_______________________________________________
cc-devel mailing list
[email protected]
http://lists.ibiblio.org/mailman/listinfo/cc-devel