> > What are the thoughts on adding an optional attribute to the hash
> > element so that each piece can express its own length?
>
> hi Peter,
>
> I had thought something like this would be nice for things like music,
> where if you edit the ID3 tags of an mp3, changing the artist or song,
> you change the whole file's checksum, while not really changing the
> important data at all.

Hi Anthony

Thanks! Interesting idea. If the apps creating the metalink pieces
further agreed on where to make those piece boundaries in common
types of content (e.g. mp3), other apps could identify content that is
similar apart from its header and/or footer. They could do this very
efficiently by just comparing piece info from the metalinks, rather
than by re-chunking and hashing each file's content themselves.

Once pieces have been identified as being the same across different
files, apps could identify more potential sources for particular
pieces, identify duplication within a distributed collection, find the
richest metadata/tags for particular content etc.
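
To make that concrete, here is a rough sketch of how an app might spot
shared pieces purely from metalink piece info. It assumes each file's
pieces have already been read out of its metalink as (hash, length)
pairs; the function names and the sample file names are made up for
illustration and are not part of any spec.

```python
from collections import defaultdict


def index_pieces(files):
    """Map each (hash, length) piece to the names of the files containing it.

    `files` is a dict of file name -> list of (hash, length) pairs, as they
    might be read from each file's metalink (an assumed input shape).
    """
    index = defaultdict(set)
    for name, pieces in files.items():
        for piece in pieces:
            index[piece].add(name)
    return index


def shared_pieces(files):
    """Return only the pieces that occur in more than one file."""
    return {
        piece: names
        for piece, names in index_pieces(files).items()
        if len(names) > 1
    }


# Two hypothetical copies of one track that differ only in their tag header.
example_files = {
    "song-v1.mp3": [("h1", 128), ("aa", 4096), ("bb", 3900)],
    "song-v2.mp3": [("h2", 160), ("aa", 4096), ("bb", 3900)],
}
```

Calling shared_pieces(example_files) keeps the two audio pieces and
drops the two differing header pieces, without touching file content.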

The pieces in the particular app I was originally referring to are
more similar to this:
http://www.hpl.hp.com/techreports/2005/HPL-2005-42R1.pdf
Finding Similar Files in Large Document Repositories
See 2.2 Chunking

"Content-based chunking, as introduced in [7], is a way of breaking a
file into a sequence of chunks so that chunk boundaries are determined
by the local contents of the file. This is in contrast to using fixed
size chunks, where chunk boundaries are determined by the distance
from the beginning of the file; inserting a single byte at the
beginning would change every chunk."
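
A minimal sketch of the idea, not the algorithm from the paper: it
swaps in a simple "gear"-style rolling hash and cuts a chunk wherever
the top bits of the hash are zero. The table, the `bits` parameter and
the function names are my own assumptions. Because the hash shifts
left one bit per byte and is kept to 64 bits, each boundary decision
depends only on the preceding 64 bytes.

```python
import hashlib

# Assumed table: one pseudo-random 64-bit value per possible byte value.
GEAR = [
    int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big")
    for b in range(256)
]
MASK64 = (1 << 64) - 1


def chunk_boundaries(data, bits=10):
    """Return the end offset of every chunk in `data`.

    A boundary is placed after a byte when the top `bits` bits of the
    rolling hash are all zero, so chunks average roughly 2**bits bytes.
    """
    h = 0
    ends = []
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the 64-bit window one bit at a time.
        h = ((h << 1) + GEAR[byte]) & MASK64
        if h >> (64 - bits) == 0:
            ends.append(i + 1)
    # The tail after the last boundary forms the final chunk.
    if data and (not ends or ends[-1] != len(data)):
        ends.append(len(data))
    return ends


def chunk_pieces(data, bits=10):
    """Return a (sha1 hex digest, length) pair for each chunk of `data`."""
    pieces = []
    start = 0
    for end in chunk_boundaries(data, bits):
        chunk = data[start:end]
        pieces.append((hashlib.sha1(chunk).hexdigest(), len(chunk)))
        start = end
    return pieces
```

With this scheme, prepending a byte to a file should only disturb the
chunks near the start; later boundaries land on the same content, so
their (hash, length) pairs are unchanged.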

As the chunks could be small and many, it would be good if each of the
hashed pieces could express its own length in a space-efficient
way...

I didn't quite follow the extension elements spec. Would you lean
towards extending the hash element to have an optional length
attribute? Or towards a new element that is an alternative to pieces,
e.g. chunks, which has a list of hashes + lengths? It might help if
examples of potential extensions, especially variable-length pieces or
chunks, were hinted at in the spec to gain interest in their
standardization and adoption.
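
To illustrate the two shapes I have in mind, here is a small sketch
that builds each one with Python's ElementTree. Both layouts are purely
hypothetical: the `length` attribute on hash, and the `chunks` / `chunk`
element names, are my guesses and not anything defined in the spec.

```python
import xml.etree.ElementTree as ET


def pieces_with_lengths(pieces, hash_type="sha-1"):
    """Option 1 (hypothetical): <hash> gains an optional length attribute."""
    root = ET.Element("pieces", {"type": hash_type})
    for digest, length in pieces:
        h = ET.SubElement(root, "hash", {"length": str(length)})
        h.text = digest
    return root


def chunks_element(pieces, hash_type="sha-1"):
    """Option 2 (hypothetical): a separate <chunks> list of hash + length."""
    root = ET.Element("chunks", {"type": hash_type})
    for digest, length in pieces:
        c = ET.SubElement(root, "chunk", {"length": str(length)})
        c.text = digest
    return root


# Placeholder digests, not real hashes.
example = [("aa", 4096), ("bb", 3900)]
option1_xml = ET.tostring(pieces_with_lengths(example), encoding="unicode")
option2_xml = ET.tostring(chunks_element(example), encoding="unicode")
```

Either way each piece carries its own length as a short decimal
attribute; the difference is only whether existing pieces parsers
would need to tolerate the new attribute or could ignore a new element.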
