subject:"\[Standards\] \[XEP\-0234\] File checking in JingleFT"

Re: [Standards] [XEP-0234] File checking in JingleFT

2012-02-08 Thread Peter Saint-Andre

On 12/12/11 11:32 AM, Jefry Lagrange wrote:
 I was thinking that maybe we propose a separate XEP for file checking.
 Still, some changes should have to be done to JingleFT such as
 allowing range transfer reuse the session and add a limit attribute
 in the range file transfer. So that it can send chunks of data (with
 offset and limit).

Well, XEP-0234 is not finished, so should have been done is a bit
pessimistic. I have just updated the spec to track the latest changes to
XEP-0300 and I'd be happy to make further changes so that we can support
future checking algorithms. I agree that we also need to provide clearer
documentation of ranged transfers (e.g., reusing the Jingle session ID).

Peter

-- 
Peter Saint-Andre
https://stpeter.im/

Re: [Standards] [XEP-0234] File checking in JingleFT

2011-12-12 Thread Jefry Lagrange

I was thinking that maybe we propose a separate XEP for file checking.
Still, some changes should have to be done to JingleFT such as
allowing range transfer reuse the session and add a limit attribute
in the range file transfer. So that it can send chunks of data (with
offset and limit).

On Thu, Dec 8, 2011 at 3:10 PM, Waqas Hussain waqa...@gmail.com wrote:
On Thu, Dec 8, 2011 at 6:48 AM, Jefry Lagrange jefry.re...@gmail.com wrote:
So far what I've read in the XEP-0234 doesn't say anything about file
checking. I suppose, it is implied that once the comparing of the
hashes takes place, the program has to inform the user of a possible
file corruption and then restart the file transfer manually.

This is very inefficient, and awkward (for the user). Instead of
resending the whole file, we should be able to detect in which part of
the file there is corruption and resend that part using ranged
transfers (as defined in
http://xmpp.org/extensions/xep-0234.html#range).

There are various issues with this. In the current implementation of
jingleFT, range file transfer have to initiate a new session in order
to transfer the file from a given offset. Ideally, we would want to
use the current session.

Example:

* initiator sends file
* responder receives it
* initiator communicates the hash
* responder checks, and notices that the file is corrupt. Finds where
the corruption took place (using techniques explained later)
* responder ask for ranged file transfer over the same session (note:
we haven't terminated the session).
* initiator resends the requested part of the file

I'm not sure that only providing an offset for as a range can
accomplish this, it would be probably be a better idea to provide a
range as a range. I might be wrong about this, I'm not sure.

Now, the question is, how do we know which part of the file is corrupted?

One way to go about this is using hash-lists. Which is the way
bittorrent makes sure there isn't corruption in the files being
transfered.

Basically, what we do with this is that we send the file in parts. For
each part the initiator will send an IQ with the hash of that file
part. The responder will respond with a result IQ or an error. In case
there is an error from part of the responder, the initiator will
resend that part.

Easy right? The problem with this algorithm is that it is too
expensive. It makes sense using it in p2p programs, because a file may
be share to other peers before it is completed. It doesn't make very
much sense using this in XMPP, where there is usually just one sender
and one receiver. But this technique is very illustrative and easy to
understand.

Another alternative technique that we can use is binary-hashing (I
made the name up, I don't know what this is called). Using this
technique we will wait until the file is transfered as usual, then
when we check the hashes if there is a mismatch we will splice the
file in two parts, take the hash of both parts and ask the initiator
for confirmation on those hashes, if one mismatches, that should tell
us that the corrupt part is, and we continue repeating this process
until the parts are small enough for them to be reasonable resent. If
the file is small enough to being with, then we can skip this.

The cost (in terms of computation) of this technique might be greater
than hash-list. But the overall cost is many times smaller than
hash-list, because we only need to use this if we find that the file
is corrupt, which is an unlikely event.

I would love to know what you guys think of this.

--
Jefry Lagrange

I have always been interested in ranged transfers, and verification.
I'm not sure how much interest there is in general XMPP community
though.

The way bittorrent and other file sharing protocols have been doing
this for ages is quite interesting, and efficient.

Here's is the specification of data contained in a bittorrent file:
http://www.bittorrent.org/beps/bep_0003.html#metainfo-files-are-bencoded-dictionaries-with-the-following-keys

And what you called 'binary-hashing' are Merkel trees, or Hash trees.
http://en.wikipedia.org/wiki/Hash_tree

A bittorrent extension for merkel trees was defined two years ago:
http://www.bittorrent.org/beps/bep_0030.html

In bittorrent, the size of a block of data is defined in the torrent
file, specific to that torrent. A common size is 512KB.

For a 1GB transfer, this would be 2048 blocks of 512KB each. Given
SHA-1 as hash, this would be 20*2048 = 40KB of hashes. 53.3KB with
base-64 encoding. Your suggested hash tree may be twice that size, at
106.67KB, or it can be just 53.3KB by keeping hashes in all levels of
the tree, not just leaves. See how BEP-0030 does it.

Re: [Standards] [XEP-0234] File checking in JingleFT

2011-12-08 Thread Waqas Hussain

On Thu, Dec 8, 2011 at 6:48 AM, Jefry Lagrange jefry.re...@gmail.com wrote:
So far what I've read in the XEP-0234 doesn't say anything about file
checking. I suppose, it is implied that once the comparing of the
hashes takes place, the program has to inform the user of a possible
file corruption and then restart the file transfer manually.

Example:

I'm not sure that only providing an offset for as a range can
accomplish this, it would be probably be a better idea to provide a
range as a range. I might be wrong about this, I'm not sure.

Now, the question is, how do we know which part of the file is corrupted?

One way to go about this is using hash-lists. Which is the way
bittorrent makes sure there isn't corruption in the files being
transfered.

I would love to know what you guys think of this.

--
Jefry Lagrange

I have always been interested in ranged transfers, and verification.
I'm not sure how much interest there is in general XMPP community
though.

The way bittorrent and other file sharing protocols have been doing
this for ages is quite interesting, and efficient.

Here's is the specification of data contained in a bittorrent file:

http://www.bittorrent.org/beps/bep_0003.html#metainfo-files-are-bencoded-dictionaries-with-the-following-keys

And what you called 'binary-hashing' are Merkel trees, or Hash trees.
http://en.wikipedia.org/wiki/Hash_tree

A bittorrent extension for merkel trees was defined two years ago:
http://www.bittorrent.org/beps/bep_0030.html

In bittorrent, the size of a block of data is defined in the torrent
file, specific to that torrent. A common size is 512KB.

A 1TB file though would have 53.3MB of hashes.. and just think how
many IQs it would take if you are requesting subsection hashes, and
how much CPU and I/O time it would take if you are calculating hashes
of parts independently, and not the way bittorrent is doing it...
given that we are defining a protocol here, we have to keep scaling in
mind.

The simplest and most scalable solution would be to simply make a
merkel hash file available as a separate file transfer, and only
transfer the root hash in the jingle negotiation. This also allows
ranged transfer of only necessary parts of the hash file, and

Re: [Standards] [XEP-0234] File checking in JingleFT

Re: [Standards] [XEP-0234] File checking in JingleFT

Re: [Standards] [XEP-0234] File checking in JingleFT

3 matches

Site Navigation

Mail list logo

Footer information