On Fri, May 7, 2010 at 7:03 AM, petero <[email protected]> wrote:
>> > What are the thoughts on adding an optional attribute to the hash
>> > element so that each piece can express its own length?
>>
>> hi Peter,
>>
>> I had thought something like this would be nice for things like music,
>> where if you edit the ID3 tags of an mp3, changing the artist or song,
>> you change the whole file's checksum, while not really changing the
>> important data at all.
>
> Hi Anthony
>
> Thanks! Interesting idea. If the apps creating the metalink pieces
> further agreed on where to make those piece boundaries, in common
> types of content (e.g. mp3): other apps could identify content that is
> similar apart from its header and or footer. They could do this very
> efficiently by just comparing piece info from the metalinks, rather
> than by re-chunking and hashing each file's content themselves.
>
> Once pieces have been identified as being the same across different
> files, apps could identify more potential sources for particular
> pieces, identify duplication within a distributed collection, find the
> richest metadata/tags for particular content etc.
>
> The pieces in the particular app I was originally referring to are
> more similar to this:
> http://www.hpl.hp.com/techreports/2005/HPL-2005-42R1.pdf
> Finding Similar Files in Large Document Repositories
> See 2.2 Chunking
>
> "Content-based chunking, as introduced in [7], is a way of breaking a
> file into a sequence of chunks so that chunk boundaries are determined
> by the local contents of the file. This is in contrast to using fixed
> size chunks, where chunk boundaries are determined by the distance
> from the beginning of the file; inserting a single byte at the
> beginning would change every chunk."
>
> As the chunks could be small and many, it would be good if each of the
> hashed pieces could express their own length in a space efficient
> way...
>
> I didn't quite follow the extension elements spec. Would you lean
> towards extending the hash element to have an optional length
> attribute? Or have a new element that is an alternative to pieces,
> e.g. chunks, which has a list of hashes + lengths? It may be good if
> examples of potential extensions esp variable-length pieces or chunks
> were hinted at in the spec to gain interest in their standardization
> and adoption?
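To make the content-based chunking quoted above concrete, here is a minimal sketch of how variable-length pieces, each carrying its own length and hash, might be produced. It uses a simple Gear-style rolling hash as a stand-in, not the specific scheme from the HP report, and every name and parameter in it (GEAR, chunk_boundaries, describe_chunks, avg_bits, min_len, max_len) is a hypothetical choice for illustration, not part of any Metalink spec.

```python
import hashlib

# Deterministic 64-bit value per byte value (an illustrative choice of table).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]
MASK64 = (1 << 64) - 1


def chunk_boundaries(data, avg_bits=6, min_len=16, max_len=1024):
    """Yield (offset, length) pairs whose cut points depend on local content."""
    cut_mask = (1 << avg_bits) - 1
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Older bytes are shifted upward, so the low avg_bits bits of h
        # depend only on the most recent avg_bits bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if length < min_len:
            continue  # never emit a chunk shorter than min_len mid-file
        if (h & cut_mask) == 0 or length >= max_len:
            yield start, length
            start = i + 1
            h = 0
    if start < len(data):
        # Trailing chunk; this one may be shorter than min_len.
        yield start, len(data) - start


def describe_chunks(data):
    """Return one (length, sha1_hex) record per chunk, in file order."""
    return [(length, hashlib.sha1(data[off:off + length]).hexdigest())
            for off, length in chunk_boundaries(data)]
```

Because the cut test only looks at the last few bytes, prepending a header to a file would normally change the first record or two and leave the later (length, hash) records intact, which is what lets two apps compare piece lists without re-hashing the content. The records are plain pairs here; how they would be serialized in a metalink is left open.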
this doesn't sound like exactly the same thing, but there's
http://en.wikipedia.org/wiki/Similarity_Enhanced_Transfer

a new element would probably be better, so as not to confuse clients
that don't understand the extension. since metalink4's extensions are
based on Atom, you might follow an Atom extension like RFC 4685.

for historical purposes, here's an example of extensions for metalink3:
http://groups.google.com/group/metalink-discussion/web/extending-metalink

--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads
