Re: [cc-devel] Queries about OpenHome projects Web Content API, and Fingerprinting Library

Siddharth Kothari Tue, 30 Apr 2013 05:52:44 -0700

I built a basic WebContent library that scans the media on a page and
adds a solid border to any item the OpenHome service reports as
CC-licensed content. Readme and demo here -
http://sids-aquarius.github.io/webcontent/.
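
For anyone who wants the gist without opening the repo, the
scan-and-mark flow looks roughly like the sketch below. This is not the
code in lib/webcontent.js: the function names, the request and response
JSON shapes, and the plain-object stand-ins for DOM nodes are all
assumptions, chosen so the example also runs outside a browser.

```javascript
// Hypothetical sketch of the scan-and-mark flow; not the actual webcontent.js.
// Nodes are plain objects shaped like DOM elements:
// { tagName, src, style, children }.

const MEDIA_TAGS = ['IMG', 'VIDEO', 'AUDIO'];

// Walk the tree and return every media node that has a src.
function collectMediaNodes(root) {
  const found = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (MEDIA_TAGS.indexOf(node.tagName) !== -1 && node.src) {
      found.push(node);
    }
    const children = node.children || [];
    // Push in reverse so nodes come out in document order.
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return found;
}

// Build the JSON body sent to the server (assumed shape: { sources: [...] }).
function buildQuery(nodes) {
  return JSON.stringify({
    sources: nodes.map(function (n) { return n.src; })
  });
}

// Apply the server's answer (assumed shape: { licensed: [...] }) by adding
// a solid border to each matching node. Returns the number of nodes marked.
function markLicensed(nodes, response) {
  const licensed = new Set(response.licensed || []);
  let marked = 0;
  nodes.forEach(function (n) {
    if (licensed.has(n.src)) {
      n.style.border = '2px solid green';
      marked += 1;
    }
  });
  return marked;
}
```

In a browser, collectMediaNodes(document.body) should work on real
elements as well, since they expose the same tagName, src, style and
children properties.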


The most glaring missing parts are -
* It doesn't track changes in the DOM. Ideally it should rescan the
content in the nodes that have changed.
* It doesn't support scanning text yet.
* The fingerprinting is currently an identity function, but that is
independent of the WebContent library.
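
One way to close the first gap would be a MutationObserver that hands
only the changed nodes back to the scanner. The sketch below is an
assumption about how that could look, not code from the repo:
nodesToRescan and watch are hypothetical names, and the records are
treated as plain objects with the same fields as a MutationRecord so
the core logic can run without a browser.

```javascript
// Hypothetical helper: given MutationRecord-like objects, return the
// distinct element nodes that should be rescanned for media.
function nodesToRescan(records) {
  const seen = new Set();
  const result = [];
  function add(node) {
    // nodeType 1 is an element; skip text and comment nodes.
    if (node && node.nodeType === 1 && !seen.has(node)) {
      seen.add(node);
      result.push(node);
    }
  }
  records.forEach(function (record) {
    if (record.type === 'childList') {
      // Newly inserted subtrees may contain <img>/<video>/<audio>.
      Array.prototype.forEach.call(record.addedNodes, function (n) {
        add(n);
      });
    } else if (record.type === 'attributes' &&
               record.attributeName === 'src') {
      // An existing media element now points at different content.
      add(record.target);
    }
  });
  return result;
}

// Browser-only wiring; `rescan` is whatever function scans a subtree.
function watch(root, rescan) {
  if (typeof MutationObserver === 'undefined') {
    return null; // no MutationObserver available: nothing to observe
  }
  const observer = new MutationObserver(function (records) {
    nodesToRescan(records).forEach(function (node) {
      rescan(node);
    });
  });
  observer.observe(root, {
    childList: true,
    subtree: true,
    attributes: true,
    attributeFilter: ['src']
  });
  return observer;
}
```

The attributeFilter keeps the observer quiet for style or class
changes, including the border the library itself adds.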

A code-review would be helpful. It's a small lib (< 50 lines) -
https://github.com/sids-aquarius/webcontent/blob/master/lib/webcontent.js.
I would like to know if I am thinking in the correct direction.

I would also like to know about challenges I should anticipate over
time. Imho, the WebContent project sounds a bit under-scoped. The only
challenge I can think of is making it robust enough to work in a
variety of settings.

Dan - Would you be on IRC sometime today? I wanted to chat about the
fingerprinting library.

Thanks,
Siddharth

On Mon, Apr 29, 2013 at 3:05 AM, Siddharth Kothari wrote:
> Hi Dan,
>
> I have hosted my fork of OpenHome here -
> http://106.187.50.124:50124/. Is the schema for the DB laid out? I had
> to create one and change one of the arguments in the getItem() and
> updateItem() calls, replacing HashKeyElement with the actual attribute
> name, to get it working. Also, look out for my pull request (it
> contains minor fixes).
>
> About the task -
> I will wait till we discuss more about the Fingerprinting library.
> Meanwhile, I will get started with the Web Content library.
> 1. For a start, I am planning to build/use a basic HTML parser that
> can fetch all the sources from the <img>/<video>/<audio> tags and send
> them over as a JSON object to the OpenHome server.
> 2. The OpenHome server returns a JSON object indicating which of the
> contents are CC-licensed.
> 3. Inject some markup to indicate the contents that are CC-licensed.
> As far as fingerprinting is concerned, I will index SHA-1 hashes of a
> few static content files (or I can hash the files I retrieve from a
> user's Dropbox account).
>
> Let me know if you would like to amend any of the above.
>
> Thanks,
> Siddharth
>
> On Fri, Apr 26, 2013 at 7:31 AM, Dan Mills wrote:
>> On Thursday, April 25, 2013 at 12:52 PM, Siddharth Kothari wrote:
>>
>> Hi everyone,
>>
>>
>> Hello!
>>
>> I am interested in a couple of projects - CC Web Content API, and Media
>> Fingerprinting Library. I wanted to see if I understand how these projects
>> fit in the OpenHome project before starting some contributions.
>>
>> The way I envision OpenHome is as a central system where CC-licensed
>> content is indexed by its hashes. The CC Web Content API could be
>> used by sites that aggregate user content (say github, youtube, or
>> slideshare) to detect remixes of existing CC-licensed content. The
>> Media Fingerprinting library helps detect duplicate content (it
>> should also work when content is cropped, clipped, blurred, or quoted
>> in parts). Am I understanding this correctly?
>>
>>
>> Roughly, yes. The DB needs to be laid out such that it can be queried by
>> fingerprint alone (which is not like an MD5/SHA-1 hash). The Fingerprinting
>> project should aim to catch cropped, distorted, resized, etc. files.
>>
>> I find the Fingerprinting project fascinating, but after delving more into
>> the idea and looking at pHash.org, I realized it already implements
>> fingerprinting for image, audio, and video content and provides this as a
>> nice API - http://www.phash.org/docs/howto.html. Unless we find GPLv3 too
>> restrictive, I can't think of a good reason not to use it. Perhaps pHash
>> can be extended to support text and compound media types (ppt, pdf). But
>> I think starting with pHash and supporting text using w-shingling would be
>> a pretty good start for the fingerprinting library. I would like to hear
>> more thoughts on this.
>>
>>
>> I have heard mixed reviews of pHash. I think a first step in the project
>> should be to come up with a set of tests and metrics, and try out pHash as
>> well as other solutions.
>>
>> The CC Web Content API project sounds appealing, since it is the glue that
>> binds the other parts, and is perhaps crucial to the successful
>> implementation of the OpenHome project. Imo, this could perhaps be meshed
>> with the Fingerprinting project (if pHash is used as a base). Essentially,
>> the current Fingerprinting task is then reduced to exposing the pHash
>> library via a nice API. Over time, pHash/Fingerprinting algorithms can be
>> added/improved.
>>
>>
>> Yes, if you'd like to focus on the Web content API, then you can abstract
>> away the fingerprinting portion. Even straight-up SHA-1 would work for a
>> demo of the Web content API (it wouldn't catch modified images, but it would
>> catch the same file in other webpages).
>>
>> Let me know if I am making sense. Sorry if it's difficult to follow; we can
>> continue this conversation on IRC. My nick is sids_aquarius.
>>
>>
>> Sounds good! I'm traveling until Sunday, but will try to drop by when
>> possible.
>>
>> Dan
>>
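
For the text case raised in the quoted thread, w-shingling amounts to
comparing the sets of w-word windows in two documents. A minimal
sketch, assuming whitespace tokenization, lowercasing, and Jaccard
similarity as the score; the function names and the choice of w are
illustrative and not part of pHash or OpenHome.

```javascript
// Return the set of w-word shingles (contiguous word windows) in a text.
function shingles(text, w) {
  const words = text.toLowerCase().split(/\s+/).filter(function (s) {
    return s.length > 0;
  });
  const set = new Set();
  for (let i = 0; i + w <= words.length; i++) {
    set.add(words.slice(i, i + w).join(' '));
  }
  return set;
}

// Jaccard similarity: size of the intersection over size of the union.
// Two empty sets are treated as identical (score 1).
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) {
    return 1;
  }
  let shared = 0;
  a.forEach(function (s) {
    if (b.has(s)) {
      shared += 1;
    }
  });
  return shared / (a.size + b.size - shared);
}

// Convenience wrapper: resemblance of two texts using w-word shingles.
function resemblance(textA, textB, w) {
  return jaccard(shingles(textA, w), shingles(textB, w));
}
```

Jaccard drops when a short quote is compared against a long source, so
for the "quoted in parts" case a containment score (shared shingles
divided by the size of the smaller set) may be the better fit.
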
