> > What are the thoughts on adding an optional attribute to the hash
> > element so that each piece can express its own length?
>
> hi Peter,
>
> I had thought something like this would be nice for things like music,
> where if you edit the ID3 tags of an mp3, changing the artist or song,
> you change the whole file's checksum, while not really changing the
> important data at all.

Hi Anthony,

Thanks! Interesting idea.

If the apps creating the metalink pieces further agreed on where to place those piece boundaries in common types of content (e.g. mp3), other apps could identify content that is similar apart from its header and/or footer. They could do this very efficiently by just comparing piece info from the metalinks, rather than by re-chunking and hashing each file's content themselves.

Once pieces have been identified as being the same across different files, apps could identify more potential sources for particular pieces, identify duplication within a distributed collection, find the richest metadata/tags for particular content, etc.

The pieces in the particular app I was originally referring to are more similar to this:

http://www.hpl.hp.com/techreports/2005/HPL-2005-42R1.pdf
Finding Similar Files in Large Document Repositories

See 2.2 Chunking:

"Content-based chunking, as introduced in [7], is a way of breaking a file into a sequence of chunks so that chunk boundaries are determined by the local contents of the file. This is in contrast to using fixed size chunks, where chunk boundaries are determined by the distance from the beginning of the file; inserting a single byte at the beginning would change every chunk."

As the chunks could be small and many, it would be good if each of the hashed pieces could express its own length in a space-efficient way...

I didn't quite follow the extension elements spec. Would you lean towards extending the hash element to have an optional length attribute? Or towards a new element that is an alternative to pieces, e.g. chunks, which has a list of hashes + lengths?

It may be good if examples of potential extensions, especially variable-length pieces or chunks, were hinted at in the spec, to gain interest in their standardization and adoption.
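
To make the content-based chunking idea quoted above a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not anything from the metalink spec or the HP paper: the names `chunk_lengths` and `pieces`, the 16-byte window, the 6-bit mask and the use of CRC32 and SHA-1 are all hypothetical choices. It recomputes the window checksum at every position for clarity; a real tool would presumably use a proper rolling hash and enforce minimum and maximum chunk sizes.

```python
import hashlib
import zlib

WINDOW = 16          # bytes examined when deciding on a boundary (assumed value)
MASK = (1 << 6) - 1  # gives a boundary roughly every 64 bytes (assumed value)


def chunk_lengths(data: bytes) -> list:
    """Split data at content-defined boundaries and return the chunk lengths."""
    lengths = []
    start = 0
    for end in range(WINDOW, len(data) + 1):
        # The decision depends only on the WINDOW bytes just before `end`,
        # never on the distance from the beginning of the file.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            lengths.append(end - start)
            start = end
    if start < len(data):
        lengths.append(len(data) - start)  # trailing chunk
    return lengths


def pieces(data: bytes) -> list:
    """Return one (length, SHA-1 hex digest) pair per variable-length chunk."""
    out = []
    pos = 0
    for n in chunk_lengths(data):
        out.append((n, hashlib.sha1(data[pos:pos + n]).hexdigest()))
        pos += n
    return out


if __name__ == "__main__":
    # Deterministic pseudo-random test data built from SHA-256 digests.
    original = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(200))
    edited = b"X" + original  # simulate a one-byte change at the start of the file

    before, after = pieces(original), pieces(edited)
    shared = set(before) & set(after)
    print(len(before), "pieces before the edit;", len(shared), "of them still match after it")
```

Because each boundary is decided from local content only, every piece after the first one in the original file reappears unchanged in the edited file, which is what would let apps match files by comparing (length, hash) lists alone. It also shows why each piece needs to carry its own length: the lengths differ from piece to piece, so a single length on the enclosing element is not enough.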
