Re: [Standards] Binary data over XMPP

2007-11-27 Thread Jesus Cea

Rachel Blackman wrote:
> Alternatively, if we have decided that sending 100k custom emoticons
> over mobile phones generates 33k of 'needless' traffic which is a
> deal-breaker to the point that solution is throwing out XMPP 1.0 and
> starting over with 2.0, I would say that the more practical solution
> here is not to support custom emoticons on mobile phones.

I remember, years ago, talking about an xmpp<->binary proxy designed for
pay-per-byte environments.

> I just think we may be overthinking this.

I agree.

- --
Jesus Cea Avion _/_/  _/_/_/_/_/_/
[EMAIL PROTECTED] http://www.argo.es/~jcea/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:[EMAIL PROTECTED] _/_/_/_/  _/_/_/_/_/
   _/_/  _/_/_/_/  _/_/  _/_/
"Things are not so easy"  _/_/  _/_/_/_/  _/_/_/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/_/_/_/  _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz


Re: [Standards] Binary data over XMPP

2007-11-27 Thread Jesus Cea

Dave Cridland wrote:
> Note that compressing first, then base64 encoding, then compressing
> *again* actually gave better results than base64 *then* compressing,
> meaning that almost every file transfer we do under base64 should be
> compressed first.

But if you compress the output before sending, you again have 8-bit data
that you must encode to send inside XMPP.

The question is: if you compress before base64, you send less data.
Sure. But that is orthogonal to the baseX encoding.
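
The two orderings can be compared with a small experiment. This is only a sketch under assumptions: zlib stands in for whatever compression is actually in use, and the payload is made up, so the printed sizes say nothing general about real traffic.

```python
import base64
import zlib

# Made-up sample payload; real results depend heavily on the data.
payload = b"<message>custom emoticon bytes would go here</message>" * 200

# Compress first: the result is arbitrary 8-bit data again...
compressed = zlib.compress(payload)
# ...so it still has to be text-encoded before it can travel inside XML.
compressed_then_b64 = base64.b64encode(compressed)

# The other order: encode first, then compress the base64 text.
b64_then_compressed = zlib.compress(base64.b64encode(payload))

print("raw payload:        ", len(payload))
print("compress, then b64: ", len(compressed_then_b64))
print("b64, then compress: ", len(b64_then_compressed))
print("compressed is 8-bit:", any(byte > 0x7F for byte in compressed))
```

Whatever the input, the base64 step always costs 4 output bytes per 3 input bytes, which is why the encoding overhead is independent of whether the data was compressed beforehand. The second ordering only makes sense when the compression happens below the XML layer.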

- --
Jesus Cea Avion _/_/  _/_/_/_/_/_/
[EMAIL PROTECTED] http://www.argo.es/~jcea/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:[EMAIL PROTECTED] _/_/_/_/  _/_/_/_/_/
   _/_/  _/_/_/_/  _/_/  _/_/
"Things are not so easy"  _/_/  _/_/_/_/  _/_/_/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/_/_/_/  _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz


Re: [Standards] Binary data over XMPP

2007-11-12 Thread Dave Cridland

On Sat Nov 10 02:07:08 2007, Justin Karneges wrote:
> On Friday 09 November 2007 3:35 pm, Dave Cridland wrote:
> > ubiquitous encryption
>
> Best laugh of the day!

Oh, I'm not laughing.

> Other protocols have been fighting this battle for years.  Is XMPP
> so much different?  I can see the headlines: "XMPP finally gets
> everyone in the world to use encryption.  Email working group
> wasted their lives."


To understand why those efforts failed, it's worth looking at what's  
changed over the years.


When Internet Mail started, it was purely an interoperability  
facility between heterogeneous systems - as were pretty well all  
protocols back then. You can see this in the way that an email  
address is specified - there's no specification at all for the  
local-part - it can contain pretty much anything at all, it may or  
may not be case-sensitive, etc.


As I said, most protocols of the time were similar. FTP exposes the  
host's filesystem semantics, so using FTP requires that you know the  
remote host's filesystem layout. IMAP, similarly, exposes the host's  
mailbox layout and hierarchy - giving endless fun for client  
developers who usually expect all IMAP servers to look the same.


So providing any end-to-end service over email is tricky, because the  
majority of email servers - still - are not "Internet" mailservers,  
but LAN mail systems that have a gateway. (Exchange is, now, finally  
dealing with Internet Mail internally, but until very recently it was  
X.400 internally, and was much happier talking X.400 P1 rather than  
ESMTP). Hence most ESMTP extensions assume that somewhere, the  
Internet Mail system stops, and gets gatewayed into something local.


A sea-change (or paradigm shift, if you like playing buzzword bingo)  
in protocol design happened around the early 90's, when protocol  
designers shifted from exposing local semantics into providing a  
homogeneous model. XMPP is a late protocol, by this metric, as is  
HTTP. Many protocols have shifted toward this style, too - FTP now  
has TVFS, IMAP servers increasingly provide a fairly homogeneous  
layout, etc. This makes deploying end to end services significantly  
easier.


The other factor is that email isn't a close-knit community. At the  
SDO level it is - the majority of email standards developers know  
each other to some degree. However the vast majority of client - and  
even server - developers don't participate. This contrasts heavily  
with XMPP, where the vast majority of client and server developers  
are active on this list.


Finally, we're a much younger protocol. Email is thoroughly ancient,  
and encryption is a comparatively new issue, and even there, multiple  
paths have been explored, and problems discovered. We've got the  
benefit of hindsight here - we know which bits have proven difficult  
to deploy, and which bits have proven easy. We know what end-users  
actually want, as well. All of this knowledge has effectively come  
from email.


I strongly suspect that we're in a much better position to achieve  
ubiquitous (or near ubiquitous) encryption than email ever was, and I  
certainly don't think that it's worth giving up before we've started.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-10 Thread Fabio Forno
On Nov 11, 2007 12:09 AM, Fabio Forno <[EMAIL PROTECTED]> wrote:
>
> But if the ids are independently chosen by the clients there may be
> the risk of colliding acks, so how can the server choose
> correctly?

Answering myself: when I first read the XEP, I mistook the id used for
reconnecting for the ids used for acks. Since the "ack_id" for
reconnecting is introduced at the beginning of the examples, while the
example that uses it for reconnecting appears only at the end, I suggest
giving it a more meaningful name to avoid misunderstandings like
mine ;)

-- 
Fabio Forno, PhD
Istituto Superiore Mario Boella
Jabber ID: xmpp:[EMAIL PROTECTED]
** Try Jabber http://www.jabber.org


Re: [Standards] Binary data over XMPP

2007-11-10 Thread Fabio Forno
On Nov 9, 2007 7:37 PM, Rachel Blackman <[EMAIL PROTECTED]> wrote:

> Facetious comments aside, my point is that if we're talking about
> modifying how the XMPP parser works, why bother doing things halfway
> with little workarounds?  Throw out XMPP 1.0 entirely and come up with
> an extensible 2.0 binary protocol.
> If we like to chant the 'XMPP is not really XML' mantra and the 'we
> must shave off every byte we can to spare the poor mobile users'
> mantras, that's great.  But considering we only have 3 actual main
> stanza types, a purely binary (and not necessarily XML-related)
> protocol would be more efficient.

That's exactly my point: XMPP 1.0 is good for desktop clients and, for
reasons I've already discussed, I currently prefer BOSH for mobiles,
but an extensible binary XML protocol would be the best of both worlds.

> I think we've lost sight of whatever the original problem we were
> trying to solve was (inline images?  Size of binary blobs to mobiles?)
> and have become caught up in hypothetical solutions which may no
> longer be directly connected to the issue.  :)

One more good reason for using BOSH with mobiles: you can fix the binary
data issue very quickly by offering the decoded, more compact data on
the same channel, accessed through a different path in the request. The
change would be almost trivial, leaving time to design a decent binary
XMPP 2.0.

-- 
Fabio Forno, PhD
Istituto Superiore Mario Boella
Jabber ID: xmpp:[EMAIL PROTECTED]
** Try Jabber http://www.jabber.org


Re: [Standards] Binary data over XMPP

2007-11-10 Thread Fabio Forno
On Nov 8, 2007 8:11 PM, Justin Karneges
<[EMAIL PROTECTED]> wrote:

> When you connect again, you specify the ack session id of the disconnected
> connection, so that the server knows which session you are trying to recover.
>

But if the ids are independently chosen by the clients there may be
the risk of colliding acks, so how can the server choose
correctly?

> According to the XEP, you then would do resource binding.  At the very least,
> the XEP should be updated to state that the client must bind to the same
> resource as before, and if it doesn't then the server must assert the correct
> resource in the bind iq reply.  However, I'm tempted to say that when you
> resume an ack session, the resource binding step should just be skipped.
> After all, both parties already know what the resource is supposed to be.

I think resource binding MUST be skipped and the resource kept: in
servers, a session object is usually associated with a given resource
that cannot change during the session's lifetime. Moreover, if the
resource changed, other clients would see a presence of type unavailable
from the former resource and a new presence from the new one. I don't
think that is the behavior we want.

-- 
Fabio Forno, PhD
Istituto Superiore Mario Boella
Jabber ID: xmpp:[EMAIL PROTECTED]
** Try Jabber http://www.jabber.org


Re: [Standards] Binary data over XMPP

2007-11-09 Thread Justin Karneges
On Friday 09 November 2007 3:35 pm, Dave Cridland wrote:
> ubiquitous encryption

Best laugh of the day!

Other protocols have been fighting this battle for years.  Is XMPP so much 
different?  I can see the headlines: "XMPP finally gets everyone in the world 
to use encryption.  Email working group wasted their lives."

-Justin


Re: [Standards] Binary data over XMPP

2007-11-09 Thread Dave Cridland

On Fri Nov  9 18:37:08 2007, Rachel Blackman wrote:
> If we like to chant the 'XMPP is not really XML' mantra and the 'we
> must shave off every byte we can to spare the poor mobile users'
> mantras, that's great.


I'm not chanting any mantras, sorry. If encrypted sessions become the  
rule, rather than the rare exception - and I do want this to happen -  
then 25% of a server owner's bandwidth bill is going to be down to  
base64. If you're okay with that, please send me the cash instead. :-)


> But considering we only have 3 actual main stanza types, a
> purely binary (and not necessarily XML-related) protocol would be
> more efficient.


Much harder to code and debug, though - we need a middle ground here.  
An escape mechanism makes sense to me, but I'm easy to persuade  
otherwise.


> And if we're going to break the world by changing how XMPP
> parsing works, then why on earth would we go through the pain of
> breaking our protocol to glue the ability to include a few extra
> characters in just to go ASCII85 or BASE91 instead of BASE64?



This I definitely agree with, not least because it still doesn't gain  
us anything particularly useful in terms of bandwidth improvements.  
We might drop that overhead from 33% to 10% with a serious amount of  
work, but that's as good as it gets, and means introducing  
tricky-to-write untested codecs everywhere. Fun and games.
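
The 25% and 33% figures describe the same base64 cost from two angles: the encoded form is a third larger than the raw data, so a quarter of the bytes on the wire are overhead. A small sketch, using only Python's standard `base64` module, makes the arithmetic concrete. The helper names are made up for this illustration, and the "best case" figure is simply the information-theoretic bound for an alphabet of N symbols, not a measurement of any particular codec.

```python
import base64
import math

def expansion(raw: bytes, encoded: bytes) -> float:
    """Extra bytes added by the encoding, relative to the raw size."""
    return len(encoded) / len(raw) - 1

def wire_share(raw: bytes, encoded: bytes) -> float:
    """Fraction of the bytes on the wire that are pure encoding overhead."""
    return 1 - len(raw) / len(encoded)

def theoretical_expansion(alphabet_size: int) -> float:
    """Lower bound on expansion when each output byte is one of N symbols."""
    return 8 / math.log2(alphabet_size) - 1

raw = bytes(range(256)) * 12   # 3072 bytes, a multiple of both 3 and 4
b64 = base64.b64encode(raw)
a85 = base64.a85encode(raw)

print(f"base64 : +{expansion(raw, b64):.1%} size, "
      f"{wire_share(raw, b64):.1%} of wire bytes")
print(f"ascii85: +{expansion(raw, a85):.1%} size, "
      f"{wire_share(raw, a85):.1%} of wire bytes")
for n in (64, 85, 91):
    print(f"{n}-symbol alphabet, best case: +{theoretical_expansion(n):.1%}")
```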



> I think we've lost sight of whatever the original problem we were
> trying to solve was (inline images?  Size of binary blobs to
> mobiles?) and have become caught up in hypothetical solutions
> which may no longer be directly connected to the issue.  :)


The problem is hypothetical, which makes solutions also hypothetical.

The hypothesis is:

XMPP will display a tendency toward being used increasingly for  
binary data, in particular via encryption, but also for various other  
things (including file transfer). As this trend continues, the issue  
of base64 encoding will play a significant role in bandwidth figures  
for both servers and clients. This trend is desirable, because it  
indicates an uptake of encryption, and therefore is to be encouraged  
by support within the protocol.


Inlined images aren't driving this for me at all. At best it seems  
that addressing these if we can has merit. I'm really thinking in  
terms of IBB, and leveraging that for use in encrypted session  
support et al ready for the future when we'll actually need this.


I'm entirely cool with agreeing it's not needed now, but the sooner  
we start thinking about this the better - I think you're clearly  
stating that if we choose to address this, it'll be a major bit of  
work.


Please don't consider this in terms of inlined images and fringe  
users - think of it in terms of ubiquitous encryption and servers.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-09 Thread Tomasz Sterna
On Fri, 09-11-2007, at 10:23 -0800, Justin Karneges wrote:
> Each session is given a unique id.  There is no "guessing" for the
> server to 
> do, because no two sessions would be given the same id.

Right.  I've reread the XEP and there is no confusion for me anymore.


> Why is this thread still going? :)

I think the amount of text in XEP-0198, and its formatting, is
overwhelming. Everyone I have discussed it with said that it is very
hard to grasp and that we almost need to decipher it methodically. ;-)


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-09 Thread Kevin Smith

On 9 Nov 2007, at 20:49, Peter Saint-Andre wrote:

> So let's bite the bullet and say that In-Band Bytestreams is perfectly
> fine for small bits of data (and maybe even larger blobs of data). If we
> need something that's good for including really tiny bits of data in a
> stanza (e.g., via data: URL) then let's define that too so that we can
> do small inline icons or rasterized images for whiteboards or whatever
> all else people want to build. All this hypothetical stuff is well and
> good but it's a tangent.


That all sounds eminently sensible.

/K



Re: [Standards] Binary data over XMPP

2007-11-09 Thread Peter Saint-Andre
Rachel Blackman wrote:

> I think we've lost sight of whatever the original problem we were trying
> to solve was (inline images?  Size of binary blobs to mobiles?) and have
> become caught up in hypothetical solutions which may no longer be
> directly connected to the issue.  :)

Right!

So let's bite the bullet and say that In-Band Bytestreams is perfectly
fine for small bits of data (and maybe even larger blobs of data). If we
need something that's good for including really tiny bits of data in a
stanza (e.g., via data: URL) then let's define that too so that we can
do small inline icons or rasterized images for whiteboards or whatever
all else people want to build. All this hypothetical stuff is well and
good but it's a tangent.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/





Re: [Standards] Binary data over XMPP

2007-11-09 Thread Tobias Markmann
Exactly, it doesn't matter which characters the existing
methods/implementations output, as long as they don't output more than
101 different characters. If they output one we can't use, we just
replace it with one we can use.
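
As a rough illustration of the substitution idea, the sketch below takes Python's standard Ascii85 encoder and swaps the two problematic characters for ones the encoder never emits. The replacement characters 'v' and 'w' and the function names are choices made for this example only, not part of any proposal in this thread.

```python
import base64

# Ascii85 as produced by a85encode uses '!'..'u' plus 'z', so 'v' and 'w'
# are unused and can stand in for the two XML-hostile characters.
TO_SAFE = bytes.maketrans(b"<&", b"vw")
FROM_SAFE = bytes.maketrans(b"vw", b"<&")

def encode_xml_safe(data: bytes) -> bytes:
    """Ascii85-encode, then replace '<' and '&' with unused characters."""
    return base64.a85encode(data).translate(TO_SAFE)

def decode_xml_safe(text: bytes) -> bytes:
    """Undo the substitution, then Ascii85-decode."""
    return base64.a85decode(text.translate(FROM_SAFE))

sample = bytes(range(256)) * 4
encoded = encode_xml_safe(sample)
print(b"<" in encoded, b"&" in encoded)
print(decode_xml_safe(encoded) == sample)
```

Swapping these two characters is not quite the whole story for XML text content: the sequence `]]>` is also disallowed there, and both `]` and `>` are in the Ascii85 alphabet, so a real design would have to handle that as well.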

cheers
Tobias

On Nov 9, 2007 7:45 PM, Michal 'vorner' Vaner <[EMAIL PROTECTED]> wrote:
> Hello
>
> On Fri, Nov 09, 2007 at 10:01:39AM -0700, Joe Hildebrand wrote:
> >
>
> > On Nov 9, 2007, at 8:47 AM, Tobias Markmann wrote:
> >
> >> There are already several binary-to-text encodings which perform a bit
> >> better than Base64, two of them are:
> >>
> >> 1. http://en.wikipedia.org/wiki/ASCII85 invented by Adobe
> >> 2. http://base91.sourceforge.net/
> >
> > Both of those seem to allow < and &, which make them less than ideal for
> > embedding in XML.
>
> Why not replace these 2 with something else?
>
> --
> If you are over 80 years old and accompanied by your parents, we will
> cash your check.
>
> Michal 'vorner' Vaner
>


Re: [Standards] Binary data over XMPP

2007-11-09 Thread Michal 'vorner' Vaner
Hello

On Fri, Nov 09, 2007 at 10:01:39AM -0700, Joe Hildebrand wrote:
>
> On Nov 9, 2007, at 8:47 AM, Tobias Markmann wrote:
>
>> There are already several binary-to-text encodings which perform a bit
>> better than Base64, two of them are:
>>
>> 1. http://en.wikipedia.org/wiki/ASCII85 invented by Adobe
>> 2. http://base91.sourceforge.net/
>
> Both of those seem to allow < and &, which make them less than ideal for 
> embedding in XML.

Why not replace these 2 with something else?

-- 
If you are over 80 years old and accompanied by your parents, we will
cash your check.

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-09 Thread Rachel Blackman


On Nov 9, 2007, at 10:27 AM, Rachel Blackman wrote:

> > > On Nov 9, 2007, at 8:47 AM, Tobias Markmann wrote:
> > >
> > > > There are already several binary-to-text encodings which perform
> > > > a bit better than Base64, two of them are:
> > > >
> > > > 1. http://en.wikipedia.org/wiki/ASCII85 invented by Adobe
> > > > 2. http://base91.sourceforge.net/
> > >
> > > Both of those seem to allow < and &, which make them less than ideal
> > > for embedding in XML.
> >
> > "XMPP is not XML" :-)))
>
> No.  But just because a is not b does not imply that b is not a.
> XMPP is a /subset/ of XML: all XML is not valid XMPP, but all XMPP
> is (or should be) valid XML when the session is taken as a
> document.  :)
>
> Both from a design standpoint, and a practical standpoint (re-using
> existing XML parsers for XMPP is easy given that XMPP obeys a subset
> of the XML rules).  So one would think that < and & are still
> equally important not to have appearing raw in an XMPP stream.


On top of which, if you modify the XMPP stream/parser rules to allow  
raw & and < in a stream you really have to roll your own parser  
anyway.  So at that point, why the hell not just send the raw binary  
blob rather than trying to needlessly encode it?


I mean, if you are completely throwing out the idea and redoing how  
streams work, why do it halfway?  Why change it so that you can allow  
< and & raw in a stream, just so that you can shave a few bytes off by  
replacing BASE64?  Let's just go to a completely-binary protocol like  
AIM's OSCAR; it opens up a lot of doors without having to worry about  
parsing rules.  Just define a binary packet format with a header and a  
length field and hey, we're good to go on whatever!


Facetious comments aside, my point is that if we're talking about  
modifying how the XMPP parser works, why bother doing things halfway  
with little workarounds?  Throw out XMPP 1.0 entirely and come up with  
an extensible 2.0 binary protocol.


If we like to chant the 'XMPP is not really XML' mantra and the 'we  
must shave off every byte we can to spare the poor mobile users'  
mantras, that's great.  But considering we only have 3 actual main  
stanza types, a purely binary (and not necessarily XML-related)  
protocol would be more efficient.  And if we're going to break the  
world by changing how XMPP parsing works, then why on earth would we  
go through the pain of breaking our protocol to glue the ability to  
include a few extra characters in just to go ASCII85 or BASE91 instead  
of BASE64?


I think we've lost sight of whatever the original problem we were  
trying to solve was (inline images?  Size of binary blobs to mobiles?)  
and have become caught up in hypothetical solutions which may no  
longer be directly connected to the issue.  :)


--
Rachel Blackman <[EMAIL PROTECTED]>
Trillian Messenger - http://www.trillianastra.com/




Re: [Standards] Binary data over XMPP

2007-11-09 Thread Rachel Blackman

> > On Nov 9, 2007, at 8:47 AM, Tobias Markmann wrote:
> >
> > > There are already several binary-to-text encodings which perform a
> > > bit better than Base64, two of them are:
> > >
> > > 1. http://en.wikipedia.org/wiki/ASCII85 invented by Adobe
> > > 2. http://base91.sourceforge.net/
> >
> > Both of those seem to allow < and &, which make them less than ideal
> > for embedding in XML.
>
> "XMPP is not XML" :-)))


No.  But just because a is not b does not imply that b is not a.  XMPP  
is a /subset/ of XML: all XML is not valid XMPP, but all XMPP is (or  
should be) valid XML when the session is taken as a document.  :)


Both from a design standpoint, and a practical standpoint (re-using  
existing XML parsers for XMPP is easy given that XMPP obeys a subset  
of the XML rules).  So one would think that < and & are still equally  
important not to have appearing raw in an XMPP stream.
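
The practical side of this is easy to demonstrate with a stock parser. The following sketch uses Python's standard `xml.etree.ElementTree` purely as an example of an off-the-shelf XML parser; the `<body>` element and the sample payload are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(document: str) -> bool:
    """Return True if a standard XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

raw_payload = "a<b&c"

# Raw '<' and '&' in character data break a standard parser...
print(is_well_formed(f"<body>{raw_payload}</body>"))          # False
# ...while the escaped form parses and round-trips the original text.
print(is_well_formed(f"<body>{escape(raw_payload)}</body>"))  # True
print(ET.fromstring(f"<body>{escape(raw_payload)}</body>").text == raw_payload)
```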


--
Rachel Blackman <[EMAIL PROTECTED]>
Trillian Messenger - http://www.trillianastra.com/




Re: [Standards] Binary data over XMPP

2007-11-09 Thread Justin Karneges
On Friday 09 November 2007 1:10 am, Tomasz Sterna wrote:
> On Thu, 08-11-2007, at 11:11 -0800, Justin Karneges wrote:
> > I mean: in the
> >
> > > unlucky case of an entity having two open sessions and losing both
> >
> > of
> >
> > > them, how can the server decide which is the session to recover, but
> > > adding some semantics to the ids?
> >
> > When you connect again, you specify the ack session id of the
> > disconnected
> > connection, so that the server knows which session you are trying to
> > recover.
>
> The problem described is what happens with two lost sessions at the same
> time.
> Without semantics encoded in the ID server has no way of guessing which
> session is trying to reconnect.

Each session is given a unique id.  There is no "guessing" for the server to 
do, because no two sessions would be given the same id.  Why is this thread 
still going? :)

-Justin


Re: [Standards] Binary data over XMPP

2007-11-09 Thread Robin Redeker
On Fri, Nov 09, 2007 at 10:01:39AM -0700, Joe Hildebrand wrote:
> 
> On Nov 9, 2007, at 8:47 AM, Tobias Markmann wrote:
> 
> >There are already several binary-to-text encodings which perform a bit
> >better than Base64, two of them are:
> >
> >1. http://en.wikipedia.org/wiki/ASCII85 invented by Adobe
> >2. http://base91.sourceforge.net/
> 
> Both of those seem to allow < and &, which make them less than ideal  
> for embedding in XML.

"XMPP is not XML" :-)))

R


Re: [Standards] Binary data over XMPP

2007-11-09 Thread Joe Hildebrand


On Nov 9, 2007, at 8:47 AM, Tobias Markmann wrote:

> There are already several binary-to-text encodings which perform a bit
> better than Base64, two of them are:
>
> 1. http://en.wikipedia.org/wiki/ASCII85 invented by Adobe
> 2. http://base91.sourceforge.net/

Both of those seem to allow < and &, which make them less than ideal
for embedding in XML.


--
Joe Hildebrand



Re: [Standards] Binary data over XMPP

2007-11-09 Thread Tobias Markmann
There are already several binary-to-text encodings which perform a bit
better than Base64, two of them are:

1. http://en.wikipedia.org/wiki/ASCII85 invented by Adobe
2. http://base91.sourceforge.net/

cheers
Tobias Markmann

On Nov 9, 2007 10:29 AM, Michal 'vorner' Vaner <[EMAIL PROTECTED]> wrote:
> Hello
>
> On Fri, Nov 09, 2007 at 10:18:51AM +0100, Matthias Wimmer wrote:
> > Hi Thomasz!
> >
> > Tomasz Sterna wrote:
> > > Simplest that comes to mind:
> > > Let's take first 256 allowable UTF-8 characters and assign them to 256
> > > values of a single byte.
> >
> > It is not possible to send the complete set of the first 256 Unicode
> > code points within XML. E.g. U+ cannot be present in an XML document.
>
> That's why there was 'allowable' -- the ones which you are allowed to
> send -- put characters in line, strike out all the ones you can't send
> and take the first 256.
>
> --
> Hallowed be the zeroes and ones
>
> Michal 'vorner' Vaner
>


Re: [Standards] Binary data over XMPP

2007-11-09 Thread Tomasz Sterna
On Fri, 09-11-2007, at 10:29 +0100, Michal 'vorner' Vaner wrote:
> put characters in line, strike out all the ones you can't send
> and take the first 256.

Thanks Michal. I couldn't have worded it better. :-)
English is not my native language... still.

-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-09 Thread Michal 'vorner' Vaner
Hello

On Fri, Nov 09, 2007 at 10:18:51AM +0100, Matthias Wimmer wrote:
> Hi Thomasz!
> 
> Tomasz Sterna wrote:
> > Simplest that comes to mind:
> > Let's take first 256 allowable UTF-8 characters and assign them to 256
> > values of a single byte.
> 
> It is not possible to send the complete set of the first 256 Unicode
> code points within XML. E.g. U+ cannot be present in an XML document.

That's why there was 'allowable' -- the ones which you are allowed to
send -- put characters in line, strike out all the ones you can't send
and take the first 256.
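
The "strike out and take the first 256" procedure can be sketched in a few lines. The filter below is an assumption made for this example: it keeps code points matching the XML 1.0 `Char` production and additionally strikes out '<', '&' and carriage return (which XML parsers normalize away). A real proposal might well strike out more.

```python
def xml_allowed(cp: int) -> bool:
    """Assumed filter: XML 1.0 Char production, minus CR, '<' and '&'."""
    if cp in (0x0D, ord("<"), ord("&")):
        return False
    return cp in (0x09, 0x0A) or 0x20 <= cp <= 0xD7FF

# Put the characters in line, skip the ones we can't send, take the first 256.
TABLE = []
code_point = 0
while len(TABLE) < 256:
    if xml_allowed(code_point):
        TABLE.append(chr(code_point))
    code_point += 1
REVERSE = {char: value for value, char in enumerate(TABLE)}

def encode(data: bytes) -> str:
    """Map every byte value to its assigned character."""
    return "".join(TABLE[byte] for byte in data)

def decode(text: str) -> bytes:
    """Map every character back to its byte value."""
    return bytes(REVERSE[char] for char in text)

sample = bytes(range(256))
text = encode(sample)
print(decode(text) == sample)
print(len(sample), "bytes become", len(text.encode("utf-8")), "bytes of UTF-8")
```

One character per byte does not mean one byte on the wire per byte: code points above U+007F occupy two bytes in UTF-8, so the encoded size depends on which characters end up in the table.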

-- 
Hallowed be the zeroes and ones

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-09 Thread Matthias Wimmer
Hi Thomasz!

Tomasz Sterna wrote:
> Simplest that comes to mind:
> Let's take first 256 allowable UTF-8 characters and assign them to 256
> values of a single byte.

It is not possible to send the complete set of the first 256 Unicode
code points within XML. E.g. U+ cannot be present in an XML document.


Matthias

-- 
Matthias Wimmer  Fon +49-700 77 00 77 70
Züricher Str. 243Fax +49-89 95 89 91 56
81476 Münchenhttp://ma.tthias.eu/



Re: [Standards] Binary data over XMPP

2007-11-09 Thread Tomasz Sterna
On Thu, 08-11-2007, at 11:11 -0800, Justin Karneges wrote:
> I mean: in the
> > unlucky case of an entity having two open sessions and losing both
> of
> > them, how can the server decide which is the session to recover, but
> > adding some semantics to the ids?
> 
> When you connect again, you specify the ack session id of the
> disconnected 
> connection, so that the server knows which session you are trying to
> recover.

The problem described is what happens with two lost sessions at the same
time.
Without semantics encoded in the ID server has no way of guessing which
session is trying to reconnect.


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-08 Thread Justin Karneges
On Thursday 08 November 2007 1:26 am, Fabio Forno wrote:
> On Nov 8, 2007 9:52 AM, Tomasz Sterna <[EMAIL PROTECTED]> wrote:
> > On Thu, 08-11-2007, at 00:29 +0100, Fabio Forno wrote:
> > > One of the reasons I tend to use id bosh is the ability of keeping the
> > >  session open when the client temporary disconnects
> >
> > XEP-0198: Stanza Acknowledgements already supports session recovery
> > after disconnection with  element.
> > It also ensures, that no packets get lost with the connection failure.
>
> Too many XEPs, sometimes you get lost ;) Yes, it seems to work for TCP
> connections too, but there is one thing I can't understand: when the
> initiating entity tries to recover from a packet of a previous
> session, how does the server choose the correct session if the 
> of the resources and stanzas acks are offered together? I mean: in the
> unlucky case of an entity having two open sessions and losing both of
> them, how can the server decide which session to recover, other than by
> adding some semantics to the ids?

When you connect again, you specify the ack session id of the disconnected 
connection, so that the server knows which session you are trying to recover.

According to the XEP, you then would do resource binding.  At the very least, 
the XEP should be updated to state that the client must bind to the same 
resource as before, and if it doesn't then the server must assert the correct 
resource in the bind iq reply.  However, I'm tempted to say that when you 
resume an ack session, the resource binding step should just be skipped.  
After all, both parties already know what the resource is supposed to be.
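
A minimal sketch of the server-side bookkeeping this implies is shown below. It is not taken from XEP-0198: the class and method names are hypothetical, and it assumes the server generates each session id itself. It only illustrates why a unique id per session removes any ambiguity, even when one entity loses two sessions at once, and why the resource need not be bound again on resumption.

```python
import secrets
from dataclasses import dataclass

@dataclass
class Session:
    """Hypothetical per-session state kept by the server."""
    jid: str
    resource: str
    connected: bool = True

class SessionRegistry:
    """Hypothetical registry keyed by a server-generated unique id."""

    def __init__(self):
        self._sessions = {}

    def open(self, jid: str, resource: str) -> str:
        session_id = secrets.token_hex(16)   # unique per session
        self._sessions[session_id] = Session(jid, resource)
        return session_id

    def connection_lost(self, session_id: str) -> None:
        self._sessions[session_id].connected = False

    def resume(self, session_id: str) -> Session:
        # The id alone identifies the session; the resource is already known,
        # so no second resource binding step is needed.
        session = self._sessions[session_id]   # KeyError for an unknown id
        session.connected = True
        return session

registry = SessionRegistry()
phone = registry.open("user@example.com", "phone")
laptop = registry.open("user@example.com", "laptop")
registry.connection_lost(phone)
registry.connection_lost(laptop)
print(registry.resume(laptop).resource)
print(registry.resume(phone).resource)
```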

-Justin


Re: [Standards] Binary data over XMPP

2007-11-08 Thread Fabio Forno
On Nov 8, 2007 9:52 AM, Tomasz Sterna <[EMAIL PROTECTED]> wrote:
> On Thu, 08-11-2007, at 00:29 +0100, Fabio Forno wrote:
> > One of the reasons I tend to use id bosh is the ability of keeping the
> >  session open when the client temporary disconnects
>
> XEP-0198: Stanza Acknowledgements already supports session recovery
> after disconnection with  element.
> It also ensures, that no packets get lost with the connection failure.

Too many XEPs, sometimes you get lost ;) Yes, it seems to work for TCP
connections too, but there is one thing I can't understand: when the
initiating entity tries to recover from a packet of a previous
session, how does the server choose the correct session if the 
of the resources and stanzas acks are offered together? I mean: in the
unlucky case of an entity having two open sessions and losing both of
them, how can the server decide which session to recover, other than by
adding some semantics to the ids?
-- 
Fabio Forno, PhD
Istituto Superiore Mario Boella
Jabber ID: xmpp:[EMAIL PROTECTED]
** Try Jabber http://www.jabber.org


Re: [Standards] Binary data over XMPP

2007-11-08 Thread Tomasz Sterna
On Thu, 08-11-2007, at 00:29 +0100, Fabio Forno wrote:
> One of the reasons I tend to use id bosh is the ability of keeping the
>  session open when the client temporary disconnects

XEP-0198: Stanza Acknowledgements already supports session recovery
after disconnection with  element.
It also ensures, that no packets get lost with the connection failure.


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-08 Thread Tomasz Sterna
On Tue, 06-11-2007, at 11:09 +0100, Michal 'vorner' Vaner wrote:
> Because the FTP data channel (not to mention it offers passive
> transfer,
> too) is _inbound_. If you opened not one TCP connection to the server,
> but two, one for XML and one for blobs, how it would be different from
> single TCP connection?

So, what you're suggesting is opening a second connection to port
5222 and negotiating a second, binary stream?

Well... I like the idea.
But this would in reality create a second network for binary packet
routing, based on XMPP-Core and its servers but effectively unrelated
to XMPP-IM. This network could be used for, e.g., Ethernet over XMPP
with JIDs in place of MAC addresses, or any other wild idea. :-)


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-07 Thread Dawid Toton

Robin Redeker wrote:
> On Mon, Nov 05, 2007 at 09:56:12AM -0800, Justin Karneges wrote:
> > 1) XML element to indicate binary mode: this is probably the least
> > destructive approach.  Keep in mind that we already have an XML to binary
> > protocol change in XMPP: the TLS and SASL encryption layers.  Your XML parser
> > needs to be able to stop on a dime when it sees that final '>' character, so
> > asking for that in this discussion should not be a big deal.
>
> Just keep in mind that we don't have a way to change "back".
> The current change is a very drastic one, like "flush the whole
> parser state and begin from start".
  
We can do exactly the same with the switching to binary mode: (I put 
comments in [] brackets)


[We are in XMPP stream. Here the new stanza begins:]

[at this point the parser drops its state; stops precisely after 
the closing '>' ]random-bytes-as-under-TLS-layer[we know which byte is 
the last - the length could be written as a prefix of the blob or as an 
attribute of opening XML tags]


[These two closing tags mark the end of a stanza and have 
always to be the same - we can merely look for '' string or 
whatever. Here one doesn't need XML parser's intervention.]


The XML parser has lost its state (probably just removing 
 openings from the stack), but the XMPP layer 
still remembers that the stream is open and is able to receive the next stanzas.


Suppose client and server agreed to use such a protocol as a replacement
for base64. Since we can efficiently send binary data only as a top-level
XML chunk, additional identifiers are needed to indicate which blob
goes where. I mean, instead of:


large-base64-data

we could send two stanzas:

arbitrary-bytes


The overhead is roughly a few hundred bytes, so for blobs under 1kB base64
works better. That doesn't matter, since we are looking for a way to send
mid-size blocks.
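
A rough way to see where that crossover sits, as a minimal sketch only: the
300-byte fixed cost below is an assumed stand-in for "a few hundred bytes",
and the function names are made up for this illustration.

```python
def base64_len(n: int) -> int:
    """Length of n octets after base64 encoding (padded, no newlines)."""
    return 4 * ((n + 2) // 3)


def framed_len(n: int, overhead: int = 300) -> int:
    """Length of n raw octets plus a fixed per-blob cost.

    The default of 300 is an assumption standing in for the extra stanza
    and identifiers; it is not a measured value.
    """
    return n + overhead


def break_even(overhead: int = 300) -> int:
    """Smallest payload size at which base64 becomes strictly larger."""
    n = 0
    while base64_len(n) <= framed_len(n, overhead):
        n += 1
    return n


if __name__ == "__main__":
    for size in (100, 1_000, 10_000, 100_000):
        print(size, base64_len(size), framed_len(size))
    print("break-even:", break_even())
```

With these assumed numbers the crossover lands a little under 1kB, and the
gap grows linearly with blob size after that.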


If we didn't care about breaking current implementations, a good
solution would be to require everyone to parse in the natural multilayer
way - as AFAIK some XMPP software already does. I mean: when TLS starts,

* suspend the outer XML parser (don't flush)
* intercept next bytes to feed them into new inner TLS/XML stack
* continue with the outer parser - may parse the closing 
This way we could do the binary mode switching just in places where 
base64 data would otherwise appear.


Dawid



Re: [Standards] Binary data over XMPP

2007-11-07 Thread Fabio Forno
On Nov 7, 2007 11:53 PM, Dave Cridland <[EMAIL PROTECTED]> wrote:
>
> (Hmm, this reminds me, I need to get around to finishing and
> publishing an I-D before the deadline on fast reauth).

Perhaps I'm missing something... Fast reauth? You mean just a speedup
in the login process (e.g. a token for rebinding a session) or also
some optimizations such as avoiding the initial presence burst when
going online?
One of the reasons I tend to use BOSH is the ability to keep the
session open when the client temporarily disconnects.

> >  and therefore we prefere bosh based
> > connections.
>
> I do think that BOSH is an exceedingly good design. But, FWIW, I use
> long-lived TCP connections over mobile networks quite a lot, and I
> find they work fine, even when moving between cells. (I use XMPP, but
> also IMAP and ACAP, all of which have server initiated data
> transfers, or "push" as the media calls it).

I'm aware of this. We also support long-lived TCP connections and they
work fine (well, the main reason is that we're not ready, yet, to proxy
all the possible traffic through BOSH, and public servers, at present,
support almost only TCP connections). Also BOSH, when implemented on
pipelined HTTP 1.1, exploits long-lived TCP sockets, and the connections
are pretty robust. The real advantage of BOSH is that packets are
implicitly framed by HTTP requests/responses, so it's very easy to
recover from any error, and it's also easy to interleave other types of
packets. Unfortunately TCP streams don't have this kind of framing, and
any attempt at inserting something between the raw socket and the XML
stanzas is dangerous...

> As far as I know, the OMA are increasingly interested in long-lived
> TCP based protocols, too, so the stability of mobile networks will
> hopefully improve.

The problem is that 100% reliable connections are impossible, and a
framed binding such as BOSH helps in recovering ;) Basically my
rationale is: why put a lot of effort and resources into recovering
from a very small failure rate, knowing that fixing everything is
impossible, when a smarter data protocol does all the job?

>
> You're absolutely right - right now, exchanging large amounts of
> binary data over long thin pipes is a very unlikely state of affairs.
>
> I think this will change, primarily due to encryption, and - as a
> much more minor issue - due to increased "rich messaging". (I'm
> thinking about radio stations showing you pictures of the band now
> playing, and such things, which certainly some mobile companies are
> very keen on).

Yeah, but in these cases too, apart from encryption, I don't think that
in-band binary stuff is really necessary.
Perhaps there is only one practical reason: many platforms (I'm
thinking of J2ME) don't allow opening new connections without user
authorization, so receiving OOB data may be annoying as a user
experience.


-- 
Fabio Forno, PhD
Istituto Superiore Mario Boella
Jabber ID: xmpp:[EMAIL PROTECTED]
** Try Jabber http://www.jabber.org


Re: [Standards] Binary data over XMPP

2007-11-07 Thread Dave Cridland

On Wed Nov  7 22:27:34 2007, Fabio Forno wrote:
> We're developing a mobile client and we think that kind of information
> should be treated in a different manner. In mobile networks regular
> socket connections have many problems (mainly disconnection handling
> that forces a new login)


(Hmm, this reminds me, I need to get around to finishing and  
publishing an I-D before the deadline on fast reauth).



> and therefore we prefer BOSH-based
> connections.


I do think that BOSH is an exceedingly good design. But, FWIW, I use  
long-lived TCP connections over mobile networks quite a lot, and I  
find they work fine, even when moving between cells. (I use XMPP, but  
also IMAP and ACAP, all of which have server initiated data  
transfers, or "push" as the media calls it).


As far as I know, the OMA are increasingly interested in long-lived  
TCP based protocols, too, so the stability of mobile networks will  
hopefully improve.



> as rosters, avatars...). Moreover large amounts of binary data
> exchanged with mobiles are very unlikely, so I don't see the
> necessity of making xml streams more complex for use cases that are
> not well defined, if not improbable.


Right - I'll skip your discussion of binary XML formats (although  
important), but I do want to pick up on this.


You're absolutely right - right now, exchanging large amounts of  
binary data over long thin pipes is a very unlikely state of affairs.


I think this will change, primarily due to encryption, and - as a  
much more minor issue - due to increased "rich messaging". (I'm  
thinking about radio stations showing you pictures of the band now  
playing, and such things, which certainly some mobile companies are  
very keen on).


If the rich messaging doesn't happen, I won't be too bothered. If  
encryption doesn't happen on mobile devices because it's too  
expensive, I'll be very troubled indeed.


Dave
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-07 Thread Fabio Forno
On Nov 7, 2007 1:56 PM, Dave Cridland <[EMAIL PROTECTED]> wrote:
> >
> Yes, base64 is acceptable here, although bear in mind that over a
> charged-by-transfer medium - such as many mobile phone tariffs - that
> 100k image is transferred as 133k, and 33k that you didn't really
> need to transfer sounds like an additional cost we could drop if we
> had the technology to do so. It's not a driver for it, though, I
> agree.

We're developing a mobile client and we think that kind of information
should be treated in a different manner. In mobile networks regular
socket connections have many problems (mainly disconnection handling
that forces a new login) and therefore we prefer BOSH-based
connections. BOSH also has the advantage that it may act more
intelligently than a simple proxy: the connector could be an agent
shaping information in a more suitable way for the mobile client
(optimized compression, caching, sending only the diffs of some data
such as rosters, avatars...). Moreover large amounts of binary data
exchanged with mobiles are very unlikely, so I don't see the
necessity of making XML streams more complex for use cases that are
not well defined, if not improbable.
What would be nice (and we're giving it some thought) is a binary
BOSH binding, with binary XML and binary data if necessary. Binary XML
+ compression is by far the most bandwidth-efficient way of
exchanging XML, and it may have the not negligible advantage of making
it possible to implement parsers in very small clients (e.g. PIC-based
nodes in sensor networks). IMHO this is the only approach allowing full
compatibility with existing installations and IBB binary data for the
few clients that really need it. For regular socket-based clients, as
others have already pointed out, there are always alternatives, and IBB
is the fallback for the few clients / applications that cannot do
otherwise.

-- 
Fabio Forno, PhD
Istituto Superiore Mario Boella
Jabber ID: xmpp:[EMAIL PROTECTED]
** Try Jabber http://www.jabber.org


Re: [Standards] Binary data over XMPP

2007-11-07 Thread Justin Karneges
On Wednesday 07 November 2007 4:56 am, Dave Cridland wrote:
> I personally feel that if we're to say that XMPP truly supports end
> to end encryption, we need to ensure it's of near-equal cost to the
> current way of doing things.

Totally disagree.  Encrypted e2e communication is almost universally 
ASCII-encoded (see PGP/MIME, S/MIME, OTR, PGP over IM, XEP-27).  It would be 
nice to transmit this more efficiently, but I don't buy encryption as a 
motivator.

-Justin


Re: [Standards] Binary data over XMPP

2007-11-07 Thread Rachel Blackman


On Nov 7, 2007, at 10:00 AM, Michal 'vorner' Vaner wrote:


> Sure it is not optimal. I was just wondering, how far we want to go in
> solving it - how big the problem is. I still think redefining XMPP
> from the ground because of this is not the right way.


+1.  Throwing out the underlying XMPP-ness of XMPP seems the wrong  
approach here to me.  Or if we do, we need to immediately go, okay,  
this is no longer XMPP as you know it and make a concerted effort.


Alternatively, if we have decided that sending 100k custom emoticons  
over mobile phones generates 33k of 'needless' traffic which is a deal- 
breaker to the point that solution is throwing out XMPP 1.0 and  
starting over with 2.0, I would say that the more practical solution  
here is not to support custom emoticons on mobile phones.


About the only situation I can see for a mobile phone to be sending a  
binary file inline is if you have just taken a picture with your  
cellphone camera and want to send it to a contact in an  tag.   
Which is a useful situation, but not one where I expect BASE64 use  
would be a deal-breaker, as it is not like you will be getting 14  
inline images per IM session, generally.  Unlike the 'custom emoticon'  
use case mentioned earlier.


I just think we may be overthinking this.  If things go through the  
server, they're safely XML enclosed.  If you want to send raw binary  
data without escaping it, we have at least two methods to negotiate  
client-to-client streams: Jingle and the somewhat more retro stream  
initiation.  Yes, that leaves you a bit out in the cold if you are  
behind some firewall and XMPP is your only tunnel to the outside  
world, but then you have IBB (and 33k should not be a deal-breaker in  
that case, I would think, since you are not on a cellular bandwidth  
plan).


I'm just not convinced this is a problem which doesn't already have  
multiple available solutions, much less one severe enough to require  
throwing out the underlying stream definition and starting over. :)


--
Rachel Blackman <[EMAIL PROTECTED]>
Trillian Messenger - http://www.trillianastra.com/




Re: [Standards] Binary data over XMPP

2007-11-07 Thread Michal 'vorner' Vaner
Hello

On Wed, Nov 07, 2007 at 04:11:21PM +, Dave Cridland wrote:
> On Wed Nov  7 15:02:57 2007, Michal 'vorner' Vaner wrote:
>> Can't compression solve this? Does anyone know, how the base64 encoded
>> data grow/shrink, if they are put trough zlib? Would be nice to know,
>> how far it is worth going with the blob transfers & modifications to
>> protocol.
>
> I've been accused - on this list - of treating compression as a panacea. 
> But it's not a substitute for efficiency. Base64 encoding is recovered to a 
> degree by a good minimal redundancy algorithm, but it tends to shield 
> patterns from a dictionary algorithm. DEFLATE uses a Lempel-Ziv dictionary 
> algorithm first, then Huffman, a minimal redundancy algorithm.

Sure it is not optimal. I was just wondering, how far we want to go in
solving it - how big the problem is. I still think redefining XMPP from
the ground because of this is not the right way.

-- 
The problem with graduate students, in general, is that they have
to sleep every few days.

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-07 Thread Peter Saint-Andre
Tomasz Sterna wrote:
> On Wed, 07-11-2007 at 02:27 -0700, Peter Saint-Andre wrote:
>> 2. Attach a larger color sketch -- a file, the image for which a
>> thumbnail is a representation, or whatever (50k to 1M?). I think we
>> use
>> HTTP-PUT (perhaps via WebDAV) and jabber:x:oob, with IBB as a
>> fallback.
> 
> The OOB approach does not work in cases, where XMPP is the only window
> to the world - either as a BOSH, SSL tunnel to a server listening on 443
> (https) port, firewall rule allowing traffic to 5222 port...

In that situation is it acceptable to fall back to IBB?

Peter

-- 
Peter Saint-Andre
https://stpeter.im/






Re: [Standards] Binary data over XMPP

2007-11-07 Thread Michal 'vorner' Vaner
Hello

On Wed, Nov 07, 2007 at 12:56:31PM +, Dave Cridland wrote:
> Yes, base64 is acceptable here, although bear in mind that over a 
> charged-by-transfer medium - such as many mobile phone tariffs - that 100k 
> image is transferred as 133k, and 33k that you didn't really need to 
> transfer sounds like an additional cost we could drop if we had the 
> technology to do so. It's not a driver for it, though, I agree.

Can't compression solve this? Does anyone know how base64-encoded
data grows or shrinks when it is put through zlib? It would be nice to
know how far it is worth going with blob transfers and modifications to
the protocol.

-- 
Wait few minutes before opening this email. The temperature difference 
could lead to vapour condensation.

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-07 Thread Dave Cridland

On Wed Nov  7 15:02:57 2007, Michal 'vorner' Vaner wrote:
> Can't compression solve this? Does anyone know, how the base64 encoded
> data grow/shrink, if they are put through zlib? Would be nice to know,
> how far it is worth going with the blob transfers & modifications to
> protocol.


I've been accused - on this list - of treating compression as a  
panacea. But it's not a substitute for efficiency. Base64 encoding is  
recovered to a degree by a good minimal redundancy algorithm, but it  
tends to shield patterns from a dictionary algorithm. DEFLATE uses a  
Lempel-Ziv dictionary algorithm first, then Huffman, a minimal  
redundancy algorithm.


Luckily, practice is easier than theory. Grab some suitable data,
compress it, base64+compress it, and compare all the sizes. Gzip is a
useful tool to do this - the results aren't 100% accurate due to gzip
overhead, but are close to the zlib compression we use in the
application layer of XMPP, and are pretty close to DEFLATE (as we
should be using, and as TLS uses).


I took a C source file, and found this:

-rwxr-xr-x 1 dwd dwd  36K 2007-11-07 15:43 connection.c
The original file. (100%)
-rw-r--r-- 1 dwd dwd  49K 2007-11-07 15:44 connection.c.b64
Base64 encoded, traditionally, with newlines. (135%)
-rw-r--r-- 1 dwd dwd  15K 2007-11-07 15:44 connection.c.b64.gz
Base64, then gzipped. (40%)
-rw-r--r-- 1 dwd dwd 8.1K 2007-11-07 15:44 connection.c.gz
Just gzipped. Note it's nearly half the size. We'll use this as an  
uncompressible object. (22% / 100%)

-rw-r--r-- 1 dwd dwd  11K 2007-11-07 15:45 connection.c.gz.b64
Gzipped, then base64. (30% / 135%)
-rw-r--r-- 1 dwd dwd 8.4K 2007-11-07 15:45 connection.c.gz.b64.gz
Now gzip it again. In principle, this should have recovered the  
base64 encoding, but note that it hasn't. (23% / 103%)


This suggests to me that not only does gzip not recover the base64  
encoding fully - although close - but base64 encoding prior to  
compression really hurts the compressor.


Note that compressing first, then base64 encoding, then compressing  
*again* actually gave better results than base64 *then* compressing,  
meaning that almost every file transfer we do under base64 should be  
compressed first.
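
The same comparison can be repeated without the command line. What follows
is only a sketch using Python's standard zlib and base64 modules; the
function name and the dictionary keys are made up for this illustration.
Note that base64.b64encode emits no newlines, so its figure comes out at
about 133% rather than the 135% of the "traditional" encoding above, and
zlib framing differs slightly from gzip, so absolute numbers will not
match the listing exactly.

```python
import base64
import zlib


def sizes(data: bytes) -> dict:
    """Return the size in octets of data after each processing order."""
    b64 = base64.b64encode(data)          # base64 only
    z = zlib.compress(data, 9)            # compress only
    z_b64 = base64.b64encode(z)           # compress, then base64
    return {
        "raw": len(data),
        "b64": len(b64),
        "b64.z": len(zlib.compress(b64, 9)),      # base64, then compress
        "z": len(z),
        "z.b64": len(z_b64),
        "z.b64.z": len(zlib.compress(z_b64, 9)),  # compress, base64, compress
    }


if __name__ == "__main__":
    # Stand-in sample; pass the bytes of a real file to sizes() instead.
    sample = b"static int connection_open(struct connection *c);\n" * 500
    result = sizes(sample)
    for name, size in result.items():
        print(f"{name:8} {size:8d} {100 * size / max(result['raw'], 1):6.1f}%")
```

Feeding it the bytes of a real source file or image is more telling than
the repetitive stand-in sample used in the demo.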


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-07 Thread Dave Cridland

On Wed Nov  7 09:27:49 2007, Peter Saint-Andre wrote:

> As always, what are the use cases?
>
> If XML is black-and-white, I see:
>
> 1. Include a little dab of color -- an emoticon, a PNG avatar, a
> thumbnail for a file, a small inline image for whiteboarding, or
> whatever (something less than 50k and perhaps less than 10k). Here
> Base64 might be all we need, via data: or cid: URLs perhaps.


Yes, base64 is acceptable here, although bear in mind that over a  
charged-by-transfer medium - such as many mobile phone tariffs - that  
100k image is transferred as 133k, and 33k that you didn't really  
need to transfer sounds like an additional cost we could drop if we  
had the technology to do so. It's not a driver for it, though, I  
agree.




> 2. Attach a larger color sketch -- a file, the image for which a
> thumbnail is a representation, or whatever (50k to 1M?). I think we use
> HTTP-PUT (perhaps via WebDAV) and jabber:x:oob, with IBB as a fallback.



Right, we're into "would be very nice" territory here. 333k (or
thereabouts) is a noticeable chunk on my mobile bill, and it's around
11 seconds of time on my (256k uplink) DSL.
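
For anyone wanting to redo that arithmetic with other sizes or links, here
is a small sketch. It assumes "k" means 1000 octets and that "256k uplink"
means 256,000 bits per second; both are readings of the figures above, not
something stated in the thread, and the function names are illustrative.

```python
def base64_overhead(n: int) -> int:
    """Extra octets added by base64-encoding n octets (padded, no newlines)."""
    return 4 * ((n + 2) // 3) - n


def transfer_seconds(octets: int, bits_per_second: int) -> float:
    """Time to push the given number of octets through a link, ignoring
    protocol overhead and latency."""
    return octets * 8 / bits_per_second


if __name__ == "__main__":
    for size in (100_000, 1_000_000):
        extra = base64_overhead(size)
        seconds = transfer_seconds(extra, 256_000)
        print(f"{size} octets: {extra} extra, {seconds:.1f}s at 256 kbit/s")
```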




> 3. Send a huge color canvas -- a music file, a podcast, a video, or
> whatever (1M+?). I don't know what we use for this.


Once we're into filesharing, then yes, we need either a binary  
streaming protocol (A binary variant of IBB), or else we want to ship  
the data out of band.


There's also a use-case you seem to have forgotten, which is the  
reason I raised this now:


4. XTLS and similar encrypted server-mediated client-client streams.

To send these via a peer-to-peer session negotiated via XMPP - like  
Jingle - strikes me as losing a fundamental benefit of XMPP, but it's  
also much cheaper in terms of bandwidth than sending them via the  
server right now.


I personally feel that if we're to say that XMPP truly supports end  
to end encryption, we need to ensure it's of near-equal cost to the  
current way of doing things.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-07 Thread Tomasz Sterna
On Wed, 07-11-2007 at 02:27 -0700, Peter Saint-Andre wrote:
> 2. Attach a larger color sketch -- a file, the image for which a
> thumbnail is a representation, or whatever (50k to 1M?). I think we
> use
> HTTP-PUT (perhaps via WebDAV) and jabber:x:oob, with IBB as a
> fallback.

The OOB approach does not work in cases where XMPP is the only window
to the world - whether via BOSH, an SSL tunnel to a server listening on
port 443 (https), or a firewall rule allowing traffic to port 5222...

I know this usage is very common, and XMPP is known to be "the way" to
get IM running in networks where Internet == HTTP.

And the decision about what is too large for the XMPP server to put
through should be a deployment decision - some server administrators do
not like 10kB transfers and some are just fine with 100MB transfers.

-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-07 Thread Peter Saint-Andre
Kevin Smith wrote:
> On 7 Nov 2007, at 09:27, Peter Saint-Andre wrote:
>> 2. Attach a larger color sketch -- a file, the image for which a
>> thumbnail is a representation, or whatever (50k to 1M?). I think we use
>> HTTP-PUT (perhaps via WebDAV) and jabber:x:oob, with IBB as a fallback.
>> 3. Send a huge color canvas -- a music file, a podcast, a video, or
>> whatever (1M+?). I don't know what we use for this.
> 
> The Jabber Disk method seems to work rather well for these scenarios...

Yes it does:

http://dev.jabbim.cz/jdisk

I would be perfectly happy to standardize on that approach for "larger"
blobs (64k+ or whatever), with IBB as a fallback.

For "smaller" blobs (I think of this as less than 64k since that's the
upper stanza size limit on the jabber.org service, but it might even be
smaller) it seems just fine to include the blob "inline" via some method
yet to be worked out. Adam Nemeth was working on something like this for
emoticons.

It's funny, I was chatting with Jeremie Miller the other day (he's not
on this list AFAIK) and he said "If I had known that someday people
would choose their IM technology based on emoticons, I would have
designed a simple binary-inclusion technology into Jabber from the
beginning." So now we have the chance to remedy the oversight. But
please let's keep it simple, shall we? This is for small stuff like
emoticons and thumbnails.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/





Re: [Standards] Binary data over XMPP

2007-11-07 Thread Kevin Smith

On 7 Nov 2007, at 09:27, Peter Saint-Andre wrote:

> 2. Attach a larger color sketch -- a file, the image for which a
> thumbnail is a representation, or whatever (50k to 1M?). I think we use
> HTTP-PUT (perhaps via WebDAV) and jabber:x:oob, with IBB as a fallback.
>
> 3. Send a huge color canvas -- a music file, a podcast, a video, or
> whatever (1M+?). I don't know what we use for this.


The Jabber Disk method seems to work rather well for these scenarios...

/K 


Re: [Standards] Binary data over XMPP

2007-11-07 Thread Peter Saint-Andre
Dave Cridland wrote:
> On Mon Nov  5 15:11:33 2007, Thomas Charron wrote:
>> On 11/5/07, Michal 'vorner' Vaner <[EMAIL PROTECTED]> wrote:
>> > Hello
>> > On Mon, Nov 05, 2007 at 02:45:05PM +, Dave Cridland wrote:
>> > > Another option would be to setup a distinct connection (and
>> protocol) for
>> > > routing blobs, and so send them through the server, yet not
>> in-band. I'm
>> > > not comfortable with this, because it means essentially
>> duplicating all
>> > > security information, and maintaining synchronization between two
>> distinct
>> > > streams.
>> > Or make the connection blobs by default, and some blobs could contain
>> > complete XML documents, like this:
>> > lenght of first block
>> > 
>> > length of second block
>> > 
>> > length of third block
>> > some binary data.
>> > It is as much drastic approach as the blobs, it changes the protocol
>> > from the very basic ground. Furthermore, you can extract the stanza and
>> > feed it to any XML parser.
>>
>>   Not to mention the documentation would be much easier.  We could
>> just refer to the BEEP standards instead of having to write our own.
>> Of course, one could argue, just use BEEP at that point.
> 
> Way ahead of you. See the first paragraph of the mail quoted above. :-)
> 
> The essential principle is much the same, but I'm not advocating
> bringing the whole of BEEP into play, here. That has flow-control and
> all sorts, and supports the splitting of a message into multiple frames,
> which brings in a lot of complexity.
> 
> This complexity is unwarranted, in my opinion, in the context of XMPP.
> The one thing we might want - and I stress might - is the framing of
> arbitrary data by framing everything.
> 
> We've always relied, in XMPP, on the implicit framing that XML can give
> us, but that's not always the best option, as we've seen. Base64 doesn't
> - in my opinion - grant us sufficient efficiency in a number of
> circumstances.
> 
> So we need something else, and our two options boil down to either
> framing everything - the BEEP method - or an escape mechanism which is
> used to frame non-XML data - we can call this the IMAP method, since
> it's pretty similar.
> 
> I strongly suspect, given the way the discussion is going, that we
> either have to consider framing everything - and that's a huge break
> from XMPP - or else we need an escape mechanism that works. Or, of
> course, we decide to give up and frame using XML as now, and use base64
> to cope.
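
To make the "frame everything" option from the quoted text concrete, here
is a minimal sketch of length-prefixed framing. It is not BEEP and not any
proposed XMPP wire format: the 5-byte header layout, the type codes and
the function names are all assumptions made up for this illustration.

```python
import struct
from typing import Iterator, Tuple

# Hypothetical frame types; nothing in XMPP defines these values.
FRAME_XML = 0
FRAME_BLOB = 1

# Assumed header: 1-byte frame type, 4-byte big-endian payload length.
_HEADER = struct.Struct("!BI")


def encode_frame(kind: int, payload: bytes) -> bytes:
    """Prefix the payload with its type and length."""
    return _HEADER.pack(kind, len(payload)) + payload


def decode_frames(buffer: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, payload) for every complete frame in the buffer.

    A trailing incomplete frame is left alone; a real reader would keep
    those octets and wait for more data.
    """
    offset = 0
    while offset + _HEADER.size <= len(buffer):
        kind, length = _HEADER.unpack_from(buffer, offset)
        start = offset + _HEADER.size
        end = start + length
        if end > len(buffer):
            break
        yield kind, buffer[start:end]
        offset = end


if __name__ == "__main__":
    wire = encode_frame(FRAME_XML, b"<message/>")
    wire += encode_frame(FRAME_BLOB, bytes(range(256)))
    for kind, payload in decode_frames(wire):
        print(kind, len(payload))
```

The XML frames can be handed to any XML parser as they are, and the blob
frames never touch the parser at all, which is the property both the
framing and the escape approaches are after.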

As always, what are the use cases?

If XML is black-and-white, I see:

1. Include a little dab of color -- an emoticon, a PNG avatar, a
thumbnail for a file, a small inline image for whiteboarding, or
whatever (something less than 50k and perhaps less than 10k). Here
Base64 might be all we need, via data: or cid: URLs perhaps.

2. Attach a larger color sketch -- a file, the image for which a
thumbnail is a representation, or whatever (50k to 1M?). I think we use
HTTP-PUT (perhaps via WebDAV) and jabber:x:oob, with IBB as a fallback.

3. Send a huge color canvas -- a music file, a podcast, a video, or
whatever (1M+?). I don't know what we use for this.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/





Re: [Standards] Binary data over XMPP

2007-11-06 Thread Dave Cridland

On Tue Nov  6 15:25:44 2007, Tomasz Sterna wrote:

On Tue, 06-11-2007 at 14:56 +, Dave Cridland wrote:
> I'm not following something. So encode the octets #x00 #x01 #x02 #x5D
> #x5D #x3E, and tell me what you get.

Like this:

Binary <-> Encoded
0x00 <-> 0xC4, 0x80
0x01 <-> 0xC4, 0x81
...


Ah, okay - so you're adding 0x100 to these. I thought this would  
yield 3-octet characters, hence my confusion.




0x20 <-> 0x20
0x21 <-> 0x21
..
0x7F <-> 0x7F
0x80 <-> 0xC2, 0x80
..
0xFF <-> 0xC3, 0xBF



Right.



> I get three bytes that are not legal in a CDATA section, followed by
> a sequence of bytes which decode (via UTF-8) to "]]>", which in turn
> would end the CDATA section.

Good point.
We either transfer this chunk in &...; escaping, or just transcode 0x3E
or 0x5D bytes to 2byte UTF-8 character. (Maybe '>' to '»' :)



Or add 0x100 again. (I checked this time, 0x5D encodes to 0xC5 0x9D).

However, using this technique, truly random data will expand by -  
roughly - 60.5%. Base64 beats this, at only 33%. There's only 101  
octets that are legal single-byte UTF-8 octets that we can allow  
safely in CDATA sections, by my count, so that leaves 155 that are  
double-byte.
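
The mapping discussed above can be sketched in a few lines. This follows
the table as quoted (every octet below 0x20 is shifted by 0x100, 0x20 to
0x7F map to themselves, 0x80 to 0xFF become the two-byte characters
U+0080 to U+00FF) plus the "add 0x100 again" rule for 0x3E and 0x5D. The
function names are made up, and because this variant also shifts tab, LF
and CR, it keeps only 94 octets single-byte rather than the 101 counted
above, so its average expansion comes out somewhat higher.

```python
# Octets that could spell "]]>" and so end a CDATA section early.
UNSAFE = {0x3E, 0x5D}


def encode(data: bytes) -> bytes:
    """Map each octet to one code point, then UTF-8 encode the result."""
    out = []
    for octet in data:
        if octet < 0x20 or octet in UNSAFE:
            out.append(chr(octet + 0x100))  # shifted into U+0100..U+015D
        else:
            out.append(chr(octet))          # U+0020..U+00FF, unchanged
    return "".join(out).encode("utf-8")


def decode(encoded: bytes) -> bytes:
    """Invert encode(): the low 8 bits of each code point are the octet."""
    return bytes(ord(ch) & 0xFF for ch in encoded.decode("utf-8"))


def expected_expansion() -> float:
    """Average encoded octets per input octet for uniformly random data."""
    total = sum(len(encode(bytes([b]))) for b in range(256))
    return total / 256


if __name__ == "__main__":
    probe = bytes([0x00, 0x01, 0x02, 0x5D, 0x5D, 0x3E])
    print(encode(probe).hex(" "))
    print(f"expansion on random data: {expected_expansion():.3f}x")
```

Either way the conclusion is the same: on random or encrypted data a
per-octet mapping of this kind does worse than base64's 33%.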


Base64 operates by encoding 6 bits into an alphabet of 64 symbols;  
encoding 7 bits needs an alphabet of 2^7, or 128 symbols, and would  
give us growth of 14.2% - we don't have 128 symbols to play with,  
though. We could choose an additional 17 double-octet symbols, in  
which case we'd see growth of 20.5% overall. Slightly better than  
base64.


So we'd encode each 7 bits using an alphabet of #x9 | #xA | #xD |  
[#x20-#x3D] | [#x3F-#x5C] | [#x5E-#x111], which would then be UTF-8  
encoded, and be roughly 90% of the size of base64.


However, I think you need to factor in the overhead that no  
encoder/decoder library exists for this, and each individual  
implementation would have to code one, (or wait for someone else to  
do so).


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-06 Thread Tomasz Sterna
On Tue, 06-11-2007 at 14:56 +, Dave Cridland wrote:
> I'm not following something. So encode the octets #x00 #x01 #x02 #x5D
> #x5D #x3E, and tell me what you get.

Like this:

Binary <-> Encoded
0x00 <-> 0xC4, 0x80
0x01 <-> 0xC4, 0x81
...
0x20 <-> 0x20
0x21 <-> 0x21
..
0x7F <-> 0x7F
0x80 <-> 0xC2, 0x80
..
0xFF <-> 0xC3, 0xBF


> I get three bytes that are not legal in a CDATA section, followed by  
> a sequence of bytes which decode (via UTF-8) to "]]>", which in turn  
> would end the CDATA section.

Good point.
We either transfer this chunk in &...; escaping, or just transcode the
0x3E or 0x5D bytes to a 2-byte UTF-8 character. (Maybe '>' to '»' :)


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-06 Thread Dave Cridland

On Tue Nov  6 14:46:32 2007, Tomasz Sterna wrote:

On Tue, 06-11-2007 at 14:35 +, Dave Cridland wrote:
> > Let's take first 256 allowable UTF-8 characters [...]

> Can't do that, because many of those characters are going to be   
> illegal even in CDATA sections.


First _allowable_ 256 UTF-8 characters are for sure legal in CDATA
section.


I'm not following something. So encode the octets #x00 #x01 #x02 #x5D  
#x5D #x3E, and tell me what you get.


I get three bytes that are not legal in a CDATA section, followed by  
a sequence of bytes which decode (via UTF-8) to "]]>", which in turn  
would end the CDATA section.


As far as I can tell, all those octet values would need to be further  
escaped.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-06 Thread Christoph Schmidt

> First _allowable_ 256 UTF-8 characters are for sure legal in CDATA
> section.


What about 0x0..0x1F? These characters are invalid in CDATA sections,
except 0x9, 0xA and 0xD.




Re: [Standards] Binary data over XMPP

2007-11-06 Thread Niklas Höglund
Michal 'vorner' Vaner wrote:
> But a different question - is binary XML able to transfer binary data?
> And is it possible to map normal XML <-> binary XML one to one? If so,
> we could have a stream feature "use binary XML instead and transfer blob
> elements not-base64-encoded" or something like that. If the server
> needed to push it to a non-binary stream, it would have to base64 it (or
> something like that).
>
> Does it make sense? (Just an crazy idea, I do not know, if it could be
> of any use).
>   


EXI seems to encode binary data as a length-prefixed blob, and I think
all EXI files can be converted to normal XML, so that may work nicely.

The downside of not doing framing or a separate connection is that a
large image in a chat message will stall all subsequent messages until
it is done. And how should a file transfer be handled if NAT boxes or
firewalls prevent Jingle connections?

Wouldn't it be a pragmatic solution to negotiate a framing protocol like
BEEP (or maybe simpler) when opening the connection, or fall back to
base 64 if one of the parties doesn't support that?
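
The stalling problem and the framing answer can be illustrated with a
small sketch. It assumes a hypothetical framing layer that may split a
message into chunks tagged with a channel number; the chunk size, the
channel numbers and the function names are invented for this example and
are not taken from BEEP or any XEP.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

Frame = Tuple[int, bytes, bool]  # (channel, piece, is_last)


def chunk(channel: int, payload: bytes, size: int) -> Iterator[Frame]:
    """Split one payload into frames of at most `size` octets."""
    if not payload:
        yield channel, b"", True
        return
    for start in range(0, len(payload), size):
        piece = payload[start:start + size]
        yield channel, piece, start + size >= len(payload)


def interleave(messages: Iterable[Tuple[int, bytes]], size: int) -> Iterator[Frame]:
    """Round-robin between channels so a big blob cannot block small ones."""
    queues = deque(chunk(channel, payload, size) for channel, payload in messages)
    while queues:
        current = queues.popleft()
        try:
            frame = next(current)
        except StopIteration:
            continue  # this channel has nothing left to send
        yield frame
        queues.append(current)


if __name__ == "__main__":
    image = b"\x89PNG" + bytes(4000)          # stand-in for a large image
    chat = b"<message><body>hi</body></message>"
    for channel, piece, last in interleave([(1, image), (2, chat)], 1024):
        print(channel, len(piece), last)
```

In the demo the short chat stanza goes out right after the first image
chunk instead of waiting behind the whole image.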

-- 
Niklas




Re: [Standards] Binary data over XMPP

2007-11-06 Thread Tomasz Sterna
On Tue, 06-11-2007 at 14:35 +, Dave Cridland wrote:
> > Let's take first 256 allowable UTF-8 characters [...]

> Can't do that, because many of those characters are going to be  
> illegal even in CDATA sections.

First _allowable_ 256 UTF-8 characters are for sure legal in CDATA
section.


> But bear in mind that even then, to encode a single octet will yield  
> between 1 and 3 characters.

I would only use those UTF-8 characters that map to at most 2 bytes,
leaving out the 3-byte ones and longer.


And a better mapping:
Bytes that are valid UTF-8 characters are mapped 1 to 1.
Only the invalid ones are mapped to 2-byte characters.

This way if the "binary" data is ASCII text, it stays human readable.

This is a simple 256-row translation table that could be defined
verbatim.


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-06 Thread Dave Cridland

On Tue Nov  6 13:00:44 2007, Tomasz Sterna wrote:
> On 05-11-2007, Mon at 16:23 +0100, Tomasz Sterna wrote:
> > Alternatively we could invent binary-2-utf mapping which has less
> > overhead than BASE64.
>
> Simplest that comes to mind:
> Let's take first 256 allowable UTF-8 characters and assign them to 256
> values of a single byte.
> That would be less than 33% BASE64 overhead.

Can't do that, because many of those characters are going to be
illegal even in CDATA sections.


You could take all those ones, though, and add 256 to the codepoint  
value before encoding - that would - I think - be sufficient.


But bear in mind that even then, to encode a single octet will yield  
between 1 and 3 characters. Encoding essentially random data - which  
includes the output of any decent encryption algorithm - will encode  
half the octets using 2-byte characters, yielding - on average - a  
50% inflation. That's higher than base64, of course.
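
To put numbers on that estimate, here is a minimal Python sketch. It assumes one particular reading of the scheme: every octet maps to the code point of the same value, except that C0 controls other than TAB, LF and CR, plus '<' and '&', are shifted up by 256. The `UNSAFE` set and the helper names are illustrative assumptions, not part of any proposal in this thread.

```python
import base64
import os

# Assumption: octets that cannot appear literally in XML 1.0 character data
# (C0 controls except TAB, LF, CR), plus '<' and '&', get shifted by 256.
UNSAFE = {b for b in range(0x20) if b not in (0x09, 0x0A, 0x0D)}
UNSAFE |= {ord("<"), ord("&")}

def encode(data: bytes) -> str:
    """Map each octet to one character, shifting unsafe octets by 256."""
    return "".join(chr(b + 256 if b in UNSAFE else b) for b in data)

def decode(text: str) -> bytes:
    """Invert encode(): code points >= 256 were shifted, the rest were not."""
    return bytes(ord(c) - 256 if ord(c) >= 256 else ord(c) for c in text)

def utf8_size(data: bytes) -> int:
    """Bytes on the wire once the mapped text is serialized as UTF-8."""
    return len(encode(data).encode("utf-8"))

sample = os.urandom(100_000)  # stand-in for encrypted or compressed data
ratio_mapped = utf8_size(sample) / len(sample)
ratio_base64 = len(base64.b64encode(sample)) / len(sample)
print(f"mapped: {ratio_mapped:.3f}x  base64: {ratio_base64:.3f}x")
```

Under these assumptions 97 octet values encode as one UTF-8 byte and 159 as two, so uniformly random input grows by a factor of 415/256, roughly 1.62, against roughly 1.33 for base64. Pure ASCII text, on the other hand, passes through almost unchanged.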


It's possible that a modified UTF-7 might be better. (And UTF-7,  
modified or not, is acceptable UTF-8).


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-06 Thread Michal 'vorner' Vaner
Hello

On Tue, Nov 06, 2007 at 02:00:44PM +0100, Tomasz Sterna wrote:
> On 05-11-2007, Mon at 16:23 +0100, Tomasz Sterna wrote:
> > Alternatively we could invent binary-2-utf mapping which has less
> > overhead than BASE64.
> 
> Simplest that comes to mind:
> Let's take first 256 allowable UTF-8 characters and assign them to 256
> values of a single byte.
> That would be less than 33% BASE64 overhead.
> 
> But I'm sure one of the more knowledgeable in the UTF internals would
> come up with better mapping.

If you want to map every byte to a char (for simplicity), then you cannot
come up with anything better, since the chars at the beginning are the
shortest ones and their size grows with their position.
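
A small sketch of that growth, using nothing but Python's built-in UTF-8 encoder. The helper name `utf8_len` is made up for the example; the code points are the boundaries where the encoded length steps up.

```python
def utf8_len(code_point: int) -> int:
    """Number of bytes UTF-8 needs for a single code point."""
    return len(chr(code_point).encode("utf-8"))

# Last code point of each length class, followed by the first of the next.
BOUNDARIES = (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000)

for cp in BOUNDARIES:
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
```

Only the first 128 code points fit in one byte, so any byte-to-char mapping over 256 values has to spend two bytes on at least half of them.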

But how would the amount of data transferred change if the stream were
UTF-7? Most of it is namespaces, which contain only ASCII. Then you
have base64 data, and most of the text transferred is usually ASCII too.
This could be quite simple to add as a stream feature.

-- 
Anyone who goes to a psychiatrist ought to have his head examined.
-- Samuel Goldwyn

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-06 Thread Tomasz Sterna
On 05-11-2007, Mon at 16:23 +0100, Tomasz Sterna wrote:
> Alternatively we could invent binary-2-utf mapping which has less
> overhead than BASE64.

Simplest that comes to mind:
Let's take first 256 allowable UTF-8 characters and assign them to 256
values of a single byte.
That would be less than 33% BASE64 overhead.

But I'm sure someone more knowledgeable about UTF internals could
come up with a better mapping.


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-06 Thread Robin Redeker
On Tue, Nov 06, 2007 at 10:16:19AM +, Dave Cridland wrote:
[.snip.]
> >  You would no longer be  able to do that with binary blobs; you  
> >would have to special-case blob  stanzas fairly heavily, since I  
> >guarantee you that if the characters  '<' or '>' appear un-escaped  
> >in the binary blob, Expat will choke and  die.
> >
> >
> Sure, but there's two options with an escaping mechanism - either  
> synchronized or non-synchronized - and they can be negotiated easily.
> 
> With a non-synch mechanism, the sender just sends out the   
> element, then sends out the binary data, then continues with XML. It  
> can be done in a single TCP packet, but it requires that the receiver  
> processes the data into stanzas prior to processing through the XML  
> parser. Some receivers already do this, so it seems reasonable that  
> this can be an option.

Also take into account that the sender also has to customize the XML
writer to allow writing raw octets, which breaks multiple layers of nice
cosy XML abstractions. (Of course with current XMPP you already need an
XML writer that allows some customisations).

> With a synch mechanism, the sender sends out a  element, and  
> then waits. The receiver then says it's ready for binary data  
> (sending a stanza to indicate this), and the sender then sends the  
> binary data - followed immediately by more XML as required, since a  
> "binary parser" is going to be octet counting anyway. For people who  
> parse all the network traffic at once through a SAX-like parser, this  
> should work fine, at the expense of some efficiency.
> 
> Note that anyone can send non-synchronized blobs, but not everyone  
> can receive them, so a client (for instance) which is built to stream  
> network data directly into a SAX parser can still *send* blobs  
> efficiently.

How do you propose the receiver determines the end of the binary data?
Is it going to be prefixed by a length?

Generally: pumping binary crap through XMPP is another big step _away_
from XML compatibility.

Also transforming a stream (TLS) into packets (stanzas) which are used to
again emulate a stream (TLS) sounds crazy to me anyway :)

To encrypt stanzas (packets) with TLS (stream encryption)... *shudder* :)

Robin


Re: [Standards] Binary data over XMPP

2007-11-06 Thread Richard Dobson



> Ever tried to get FTP protocol through FW/NAT?
> It requires protocol level command channel tracking, to find out related
> data channels and let them in.
> Special handling, special modules, special setup - ergo: nobody bothers.

Well, as Dave has already pointed out, what you are talking about
(PORT FTP) is completely different from what I suggested, in that it's
the client opening the port and not the server. What I was suggesting
was the server having an extra port open (or even the normal XMPP C2S
port with a special negotiation turning it into a framed binary
connection) and the client maintaining two connections to the server:
one that carries the normal XMPP traffic and one that carries the
binary frames. You could even use a single framed binary connection
(rather than two) and have a special XMPP XML frame type to denote that
it contains XMPP stanzas. This is what I do in my server
implementation, which supports framed as well as normal XMPP streams;
among other things, I find it makes implementing a low-overhead
keepalive/ping-pong protocol a whole lot easier.

> This is one of the reasons why HTTP (one connection) is omnipresent,
> even for file archives, and FTP is becoming forgotten.

Sorry, but that is not the actual reason by any means, and there are
plenty of FTP archives around. Have you never downloaded a Linux ISO?
Most of the Linux ISO download servers I've come across have been FTP
servers, but anyway this is getting rather off topic.


Richard




Re: [Standards] Binary data over XMPP

2007-11-06 Thread Dave Cridland

On Tue Nov  6 10:09:41 2007, Michal 'vorner' Vaner wrote:
> Because the FTP data channel (not to mention it offers passive transfer,
> too) is _inbound_.

Well, PASV initiated connections are client->server, whereas PORT
initiated are server->client callbacks. PORT is *almost* dead, now, as
a result of the complexities of running an ALG in the firewall.



> If you opened not one TCP connection to the server,
> but two, one for XML and one for blobs, how it would be different from
> single TCP connection?


Well, to state the obvious, it's not a *single* TCP connection.  
There's still a distinct increase in attack surface by trying to  
ensure that two connections are assuredly the same client. In  
addition, you've got to synchronize the blobs on one session with the  
XML on the other. I think this would get complicated fast.



> But a different question - is binary XML able to transfer binary data?
> And is it possible to map normal XML <-> binary XML one to one? If so,
> we could have a stream feature "use binary XML instead and transfer blob
> elements not-base64-encoded" or something like that. If the server
> needed to push it to a non-binary stream, it would have to base64 it (or
> something like that).
>
> Does it make sense? (Just a crazy idea, I do not know, if it could be
> of any use).


I don't know the binary XML representations very well, but it's  
certainly something I'd be curious about.


One thing of note, though - the bulk of XMPP traffic *now* is not  
binary. We want this to change - or at least, we want this to be able  
to change - without penalty. So a binary XML format would have to  
maintain near-equal efficiency when used for traditional XMPP  
traffic, and in addition be a simple upgrade for implementors.


Dave
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-06 Thread Dave Cridland
Forgive me for sounding like an idiot, but I seem to be missing the  
point here:


On Mon Nov  5 17:45:53 2007, Rachel Blackman wrote:
> Expat (a fairly common XML parser out there) will do the job just
> fine.  Your network engine has to separate each stanza out, sure, but
> that's not hard.  And then you can pass each stanza unaltered through
> expat and get back your usual XML structures.


Is this saying that given a string containing multiple stanzas, you
need to separate them out into one stanza per string, before feeding
them in? I thought that with a SAX-like XML parser, you needn't
bother doing that.
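
As a rough illustration of that point, the sketch below feeds arbitrarily split chunks to a single expat parser through Python's `xml.parsers.expat` and records each top-level child of the stream as it closes. The element names and chunk boundaries are made up for the example, and the last chunk closes the stream so that the final `Parse` call flushes everything.

```python
import xml.parsers.expat

def collect_stanzas(chunks):
    """Feed raw chunks to one parser; return names of completed stanzas."""
    parser = xml.parsers.expat.ParserCreate()
    depth = 0
    completed = []

    def start(name, attrs):
        nonlocal depth
        depth += 1

    def end(name):
        nonlocal depth
        depth -= 1
        if depth == 1:  # a direct child of <stream> just closed
            completed.append(name)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    for index, chunk in enumerate(chunks):
        # Chunk boundaries fall mid-tag on purpose; expat buffers the rest.
        parser.Parse(chunk, index == len(chunks) - 1)
    return completed

chunks = [b"<stream><message><bo", b"dy>hi</body></message><pre",
          b"sence/><iq type='get'/></stream>"]
print(collect_stanzas(chunks))  # ['message', 'presence', 'iq']
```

No pre-splitting into one stanza per string is needed here; tracking element depth in the handlers is enough to tell where each stanza ends.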


> You would no longer be able to do that with binary blobs; you would
> have to special-case blob stanzas fairly heavily, since I guarantee
> you that if the characters '<' or '>' appear un-escaped in the binary
> blob, Expat will choke and die.



Sure, but there's two options with an escaping mechanism - either  
synchronized or non-synchronized - and they can be negotiated easily.


With a non-synch mechanism, the sender just sends out the   
element, then sends out the binary data, then continues with XML. It  
can be done in a single TCP packet, but it requires that the receiver  
processes the data into stanzas prior to processing through the XML  
parser. Some receivers already do this, so it seems reasonable that  
this can be an option.


With a synch mechanism, the sender sends out a  element, and  
then waits. The receiver then says it's ready for binary data  
(sending a stanza to indicate this), and the sender then sends the  
binary data - followed immediately by more XML as required, since a  
"binary parser" is going to be octet counting anyway. For people who  
parse all the network traffic at once through a SAX-like parser, this  
should work fine, at the expense of some efficiency.


Note that anyone can send non-synchronized blobs, but not everyone  
can receive them, so a client (for instance) which is built to stream  
network data directly into a SAX parser can still *send* blobs  
efficiently.


> If we really need a non-BASE64 method of sending binary data between
> clients, I suggest we re-use Jingle.  That already is a mechanism for
> negotiation of 'I want to send you this type of data, how do I get it
> to you?'  There's very few cases I can think of where we would want to
> be sending binary blobs in a server-cached manner anyway.


Server-proxied, not cached.

This implies that encrypted chat sessions don't go via the server,  
for example, meaning that a client intending to encrypt all  
conversations by default is going to use XMPP purely as a session  
initiation protocol, and lose all efficiency (and a degree of  
privacy) as a result. Or else it'll be base64 encoding the entire  
conversation, and lose efficiency that way.


Either way, it will directly impact the usage of encryption - and  
that's ignoring the other ways that binary data is commonly used  
within XMPP.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-06 Thread Michal 'vorner' Vaner
Hello

On Tue, Nov 06, 2007 at 10:44:15AM +0100, Tomasz Sterna wrote:
> On 06-11-2007, Tue at 09:17 +, Richard Dobson wrote:
> > > And repeat the FTP + stateful firewall nightmare?
> > Sorry but what??? Can you explain exactly what you mean by this.
> 
> Ever tried to get FTP protocol through FW/NAT?
> It requires protocol level command channel tracking, to find out related
> data channels and let them in.
> Special handling, special modules, special setup - ergo: nobody bothers.

Because the FTP data channel (not to mention it offers passive transfer,
too) is _inbound_. If you opened not one TCP connection to the server,
but two, one for XML and one for blobs, how it would be different from
single TCP connection?

But a different question - is binary XML able to transfer binary data?
And is it possible to map normal XML <-> binary XML one to one? If so,
we could have a stream feature "use binary XML instead and transfer blob
elements not-base64-encoded" or something like that. If the server
needed to push it to a non-binary stream, it would have to base64 it (or
something like that).

Does it make sense? (Just a crazy idea, I do not know, if it could be
of any use).

-- 
Q:  Why was Stonehenge abandoned?
A:  It wasn't IBM compatible.

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-06 Thread Tomasz Sterna
On 06-11-2007, Tue at 09:17 +, Richard Dobson wrote:
> > And repeat the FTP + stateful firewall nightmare?
> Sorry but what??? Can you explain exactly what you mean by this.

Ever tried to get FTP protocol through FW/NAT?
It requires protocol level command channel tracking, to find out related
data channels and let them in.
Special handling, special modules, special setup - ergo: nobody bothers.

This is one of the reasons why HTTP (one connection) is omnipresent,
even for file archives, and FTP is becoming forgotten.

-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-06 Thread Richard Dobson

Tomasz Sterna wrote:
> On 05-11-2007, Mon at 16:24 +, Richard Dobson wrote:
> > Personally I think it would be better to do as someone already
> > suggested and have a separate connection for framed blobs that you
> > maintain or establish when needed to send those
>
> And repeat the FTP + stateful firewall nightmare?

Sorry but what??? Can you explain exactly what you mean by this.



Re: [Standards] Binary data over XMPP

2007-11-05 Thread Tomasz Sterna
On 05-11-2007, Mon at 16:24 +, Richard Dobson wrote:
> Personally I think it would be better to do as someone already
> suggested
> and have a separate connection for framed blobs that you maintain or
> establish when needed to send those

And repeat the FTP + stateful firewall nightmare?


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-05 Thread Robin Redeker
On Mon, Nov 05, 2007 at 09:56:12AM -0800, Justin Karneges wrote:
> On Monday 05 November 2007 3:40 am, Dave Cridland wrote:
> > Now, we can't expect that the entire Internet will bend to our will
> > and instantly upgrade, so we need a sane fallback - probably to IBB,
> > or something fairly similar. The interesting question is whether we
> > choose to have this negotiated end to end (which means we'll need to
> > have each hop along the route tested), or whether we say that this
> > down-conversion happens within servers.
> 
> Binary over XMPP has been on my TODO for awhile now, and I have some notes 
> written up about it but nothing publicized.  I think a hop-by-hop approach is 
> best, if we want to have any hope for compatibility.
> 
> Comments on the two formatting approaches:
> 
>  1) XML element to indicate binary mode: this is probably the least 
> destructive approach.  Keep in mind that we already have an XML to binary 
> protocol change in XMPP: the TLS and SASL encryption layers.  Your XML parser 
> needs to be able to stop on a dime when it sees that final '>' character, so 
> asking for that in this discussion should not be a big deal.

Just keep in mind that we don't have a way to change "back".
The current change is a very drastic one, like "flush the whole
parser state and begin from start".


Robin


Re: [Standards] Binary data over XMPP

2007-11-05 Thread Robin Redeker
On Mon, Nov 05, 2007 at 06:47:40PM +0100, Michal 'vorner' Vaner wrote:
> Hello
> 
> On Mon, Nov 05, 2007 at 06:27:33PM +0100, Robin Redeker wrote:
> > On Mon, Nov 05, 2007 at 04:04:10PM +0100, Michal 'vorner' Vaner wrote:
> > > 
> > > It is as much drastic approach as the blobs, it changes the protocol
> > > from the very basic ground. Furthermore, you can extract the stanza and
> > > feed it to any XML parser.
> > 
> > +1 for "real" protocol frames!
> 
> Actually, I was just showing how deep a change the blobs thing was. I'm
> against turning the whole infrastructure inside out. I didn't mean to
> promote the frames, I just took them as an example.

Heh, ok. Just wanted to take the chance to promote the idea once more :)


R


Re: [Standards] Binary data over XMPP

2007-11-05 Thread Richard Dobson

> You do not want to use 65 too much. If I skip the fact it is going to
> get deprecated by jingle, probably, it is really heavy for small blobs,
> like an icon or a funny image in a message. (Of course, it is the right
> way for 1GB file you want to send).

Jingle doesn't deprecate XEP-0065 as far as I am aware (just
XEP-0095/96), given that Peter was working on writing a spec so you can
use XEP-0065 proxies with Jingle. Also, if all you are using it for is
small things like icons or little images, then I fail to see what real
benefit this has over just using IBB, given the complexity it will take
to implement framing in XMPP. If they are small, the overall overhead is
negligible compared to the effort we would all need to go through to
implement framing.




Re: [Standards] Binary data over XMPP

2007-11-05 Thread Justin Karneges
On Monday 05 November 2007 3:40 am, Dave Cridland wrote:
> Now, we can't expect that the entire Internet will bend to our will
> and instantly upgrade, so we need a sane fallback - probably to IBB,
> or something fairly similar. The interesting question is whether we
> choose to have this negotiated end to end (which means we'll need to
> have each hop along the route tested), or whether we say that this
> down-conversion happens within servers.

Binary over XMPP has been on my TODO for awhile now, and I have some notes 
written up about it but nothing publicized.  I think a hop-by-hop approach is 
best, if we want to have any hope for compatibility.

Comments on the two formatting approaches:

 1) XML element to indicate binary mode: this is probably the least 
destructive approach.  Keep in mind that we already have an XML to binary 
protocol change in XMPP: the TLS and SASL encryption layers.  Your XML parser 
needs to be able to stop on a dime when it sees that final '>' character, so 
asking for that in this discussion should not be a big deal.

2) Framing mode: this is probably the most optimized approach, but then the 
protocol becomes very unlike XMPP, and yes it may be worth using BEEP then 
(although honestly I haven't read the BEEP RFC in awhile, it probably does 
more than we need).

For framing, I came up with two approaches: "interleaved binary" and "stream 
multiplexing".  Either way you have your TLV framing, and a very tight 
binding to what we're trying to accomplish.

For the interleaved binary, there are two types: XML (0) and binary (1). :)  
Either packet type can contain arbitrary amounts of data.  It would not be 
required for the XML type to contain a complete element, for example.

The following two transmissions would be equivalent (whitespace added for 
clarity).

C0: 
C0:   
C0: SGVsbG8gd29ybGQ=
C0:   
C0: 

C0: 
C0:   
C1: Hello world
C0:   
C0: 

The binary type could be converted to and from Base64 by any hop.  Thus, it is 
important to consider with this protocol that you're not sending a random 
blob of binary, you're sending Base64'd CDATA just in a more optimized 
format.  This simplifies integration into existing XMPP applications.  Stanza 
input and output would look exactly as they do today (containing binary that 
is Base64 encoded).  Only the transport layer would worry about converting 
back and forth.  Indeed, this means that if binary data is received on the 
network, it would probably be Base64 encoded and plugged into the stanza as 
CDATA before passing upwards to the application (to then be decoded 
again :) ).

The advantage of the interleaved approach is that anywhere there is Base64 we 
could do a binary transfer.  So not just IBB, but a presence signature, a 
vcard avatar, etc.
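
A minimal sketch of the interleaved idea in Python, under stated assumptions: the header layout (one type byte plus a four-byte big-endian length) and the `<data>` wrapper element are invented for the example, since the only fixed points are TLV framing and the two types, XML (0) and binary (1).

```python
import base64
import struct

XML, BINARY = 0, 1
HEADER = struct.Struct("!BI")  # hypothetical layout: type byte, 32-bit length

def pack(frame_type: int, payload: bytes) -> bytes:
    """Build one TLV frame."""
    return HEADER.pack(frame_type, len(payload)) + payload

def unpack_all(buffer: bytes):
    """Yield (type, payload) for every frame in a complete buffer."""
    offset = 0
    while offset < len(buffer):
        frame_type, length = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        yield frame_type, buffer[offset:offset + length]
        offset += length

def downconvert(buffer: bytes) -> bytes:
    """What a hop could forward to a peer that only speaks plain XML."""
    return b"".join(
        base64.b64encode(payload) if frame_type == BINARY else payload
        for frame_type, payload in unpack_all(buffer)
    )

wire = pack(XML, b"<data>") + pack(BINARY, b"Hello world") + pack(XML, b"</data>")
print(downconvert(wire))  # b'<data>SGVsbG8gd29ybGQ=</data>'
```

The down-converted output carries the same Base64 text as the first transmission above, which is the sense in which a binary frame is only an optimized spelling of Base64'd CDATA.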

For the stream multiplexing approach, there would be a number of "channels".  
Channel 0 would be the XML stream, and would operate like normal.  Channel 1 
would be an IBB packet.  This gives us a very tight binding to IBB, but that 
may be fine since that's the main way you'd want to transfer binary anyway.

Typical IBB handshake:

C0: 
C0:   
C0: 

S0: 

Client sets channel 1 to be used for this IBB stream:

C0: 

Client sends some IBB packets:

C1: Hello world
C1: Data sent on this channel is not Base64 encoded

Server replies also using a channel:

S0: 

S1: You're right, and neither is this data!

If the next hop does not support ibbbind, then you would transmit as a regular 
IBB packet.  Yes, this means a server supporting ibbbind would have to know 
the IBB protocol (it would not be enough to expand the binary back into 
Base64 and send, it would truly have to reconstruct the ibb iq packet with 
the right sequence number, etc).  However, this intimate binding would end up 
being very optimized.

-Justin


Re: [Standards] Binary data over XMPP

2007-11-05 Thread Michal 'vorner' Vaner
Hello

On Mon, Nov 05, 2007 at 06:27:33PM +0100, Robin Redeker wrote:
> On Mon, Nov 05, 2007 at 04:04:10PM +0100, Michal 'vorner' Vaner wrote:
> > 
> > It is as much drastic approach as the blobs, it changes the protocol
> > from the very basic ground. Furthermore, you can extract the stanza and
> > feed it to any XML parser.
> 
> +1 for "real" protocol frames!

Actually, I was just showing how deep a change the blobs thing was. I'm
against turning the whole infrastructure inside out. I didn't mean to
promote the frames, I just took them as an example.

-- 
Einstein argued that there must be simplified explanations of nature, because
God is not capricious or arbitrary.  No such faith comforts the software
engineer.
-- Fred Brooks

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-05 Thread Rachel Blackman


On Nov 5, 2007, at 6:35 AM, Tomasz Sterna wrote:

> On 05-11-2007, Mon at 12:51 +0100, Michal 'vorner' Vaner wrote:
> > You probably can not do that with any reasonably out-of-the-box XML
> > parser.
>
> You cannot use out-of-the-box XML parser anyway.


You can too.

Expat (a fairly common XML parser out there) will do the job just  
fine.  Your network engine has to separate each stanza out, sure, but  
that's not hard.  And then you can pass each stanza unaltered through  
expat and get back your usual XML structures.  You would no longer be  
able to do that with binary blobs; you would have to special-case blob  
stanzas fairly heavily, since I guarantee you that if the characters  
'<' or '>' appear un-escaped in the binary blob, Expat will choke and  
die.


I'm reasonably sure the same could be said of most other off-the-shelf  
XML parsers.
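
A quick way to see this with expat itself, via Python's `xml.parsers.expat`. The `<blob>` element and the payload bytes are made up for the illustration.

```python
import base64
import xml.parsers.expat

def parses(document: bytes) -> bool:
    """Return True if expat accepts the document as well-formed."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False

payload = b"\x00\x01<\xff>&"  # arbitrary octets, including '<' and '>'
raw = b"<blob>" + payload + b"</blob>"
safe = b"<blob>" + base64.b64encode(payload) + b"</blob>"

print(parses(raw), parses(safe))  # False True
```

The raw variant is rejected at the first illegal octet, while the Base64 variant goes through untouched, at the cost of the extra third in size.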


Sorry, but I'm with vorner on this one; the blob mechanism is neat,  
but too much of a departure from what we have to make it a smooth  
upgrade.  Changing stanza types and so on is one thing, but changing  
the entire parser -- and requiring people to literally re-invent the  
wheel and roll their own XML parsers -- is much less likely to be a  
friendly upgrade, or receive any sort of wide adoption.


If we really need a non-BASE64 method of sending binary data between  
clients, I suggest we re-use Jingle.  That already is a mechanism for  
negotiation of 'I want to send you this type of data, how do I get it  
to you?'  There's very few cases I can think of where we would want to  
be sending binary blobs in a server-cached manner anyway.


--
Rachel Blackman <[EMAIL PROTECTED]>
Trillian Messenger - http://www.trillianastra.com/




Re: [Standards] Binary data over XMPP

2007-11-05 Thread Robin Redeker
On Mon, Nov 05, 2007 at 04:04:10PM +0100, Michal 'vorner' Vaner wrote:
> Hello
> 
> On Mon, Nov 05, 2007 at 02:45:05PM +, Dave Cridland wrote:
> > Another option would be to setup a distinct connection (and protocol) for 
> > routing blobs, and so send them through the server, yet not in-band. I'm 
> > not comfortable with this, because it means essentially duplicating all 
> > security information, and maintaining synchronization between two distinct 
> > streams.
> 
> Or make the connection blobs by default, and some blobs could contain
> complete XML documents, like this:
> length of first block
> 
> length of second block
> 
> length of third block
> some binary data.
> 
> It is as much drastic approach as the blobs, it changes the protocol
> from the very basic ground. Furthermore, you can extract the stanza and
> feed it to any XML parser.

+1 for "real" protocol frames!

R


Re: [Standards] Binary data over XMPP

2007-11-05 Thread Michal 'vorner' Vaner
Hello

On Mon, Nov 05, 2007 at 04:24:17PM +, Richard Dobson wrote:
>> I strongly suspect, given the way the discussion is going, that we either 
>> have to consider framing everything - and that's a huge break from XMPP - 
>> or else we need an escape mechanism that works. Or, of course, we decide 
>> to give up and frame using XML as now, and use base64 to cope.
> Personally I think it would be better to do as someone already suggested 
> and have a separate connection for framed blobs that you maintain or 
> establish when needed to send those, sort of like XEP-0065, or why not just 
> use XEP-0065 itself??, and if the server you are using doesn't have a 
> XEP-0065 proxy then you can safely assume that the server administrators 
> don't want you sending lots of data through their server infrastructure.

You do not want to lean on XEP-0065 too much. Leaving aside that it will
probably be deprecated by Jingle, it is really heavy for small blobs,
like an icon or a funny image in a message. (Of course, it is the right
way to send a 1GB file.)

-- 
Einstein argued that there must be simplified explanations of nature, because
God is not capricious or arbitrary.  No such faith comforts the software
engineer.
-- Fred Brooks

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-05 Thread Richard Dobson


> I strongly suspect, given the way the discussion is going, that we
> either have to consider framing everything - and that's a huge break
> from XMPP - or else we need an escape mechanism that works. Or, of
> course, we decide to give up and frame using XML as now, and use
> base64 to cope.

Personally I think it would be better to do as someone already suggested 
and have a separate connection for framed blobs that you maintain or 
establish when needed to send those, sort of like XEP-0065, or why not 
just use XEP-0065 itself??, and if the server you are using doesn't have 
a XEP-0065 proxy then you can safely assume that the server 
administrators don't want you sending lots of data through their server 
infrastructure.


Richard




Re: [Standards] Binary data over XMPP

2007-11-05 Thread Dave Cridland

On Mon Nov  5 15:11:33 2007, Thomas Charron wrote:

> On 11/5/07, Michal 'vorner' Vaner <[EMAIL PROTECTED]> wrote:
> > Hello
> > On Mon, Nov 05, 2007 at 02:45:05PM +, Dave Cridland wrote:
> > > Another option would be to setup a distinct connection (and protocol) for
> > > routing blobs, and so send them through the server, yet not in-band. I'm
> > > not comfortable with this, because it means essentially duplicating all
> > > security information, and maintaining synchronization between two distinct
> > > streams.
> > Or make the connection blobs by default, and some blobs could contain
> > complete XML documents, like this:
> > length of first block
> >
> > length of second block
> >
> > length of third block
> > some binary data.
> > It is as much drastic approach as the blobs, it changes the protocol
> > from the very basic ground. Furthermore, you can extract the stanza and
> > feed it to any XML parser.
>
>   Not to mention the documentation would be much easier.  We could
> just refer to the BEEP standards instead of having to write our own.
> Of course, one could argue, just use BEEP at that point.


Way ahead of you. See the first paragraph of the mail quoted above.  
:-)


The essential principle is much the same, but I'm not advocating  
bringing the whole of BEEP into play, here. That has flow-control and  
all sorts, and supports the splitting of a message into multiple  
frames, which brings in a lot of complexity.


This complexity is unwarranted, in my opinion, in the context of  
XMPP. The one thing we might want - and I stress might - is the  
framing of arbitrary data by framing everything.


We've always relied, in XMPP, on the implicit framing that XML can  
give us, but that's not always the best option, as we've seen. Base64  
doesn't - in my opinion - grant us sufficient efficiency in a number  
of circumstances.


So we need something else, and our two options boil down to either  
framing everything - the BEEP method - or an escape mechanism which  
is used to frame non-XML data - we can call this the IMAP method,  
since it's pretty similar.
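
The escape option can be sketched roughly as follows, in Python. The `<blob length='N'/>` marker is a made-up element used only for illustration (no syntax has been fixed anywhere in this discussion), and the splitter works on a complete buffer rather than a live socket.

```python
import re

# Hypothetical escape element announcing how many raw octets follow it.
MARKER = re.compile(rb"<blob length='(\d+)'/>")

def split_stream(data: bytes):
    """Yield ('xml', bytes) and ('blob', bytes) segments in order."""
    pos = 0
    while pos < len(data):
        match = MARKER.search(data, pos)
        if match is None:
            yield "xml", data[pos:]  # no more blobs, the rest is XML
            return
        yield "xml", data[pos:match.end()]  # XML up to and including marker
        length = int(match.group(1))
        # Octet counting: the blob is skipped without inspecting its content.
        yield "blob", data[match.end():match.end() + length]
        pos = match.end() + length

stream = b"<message><blob length='4'/>" + b"\x00<\xff>" + b"</message>"
for kind, chunk in split_stream(stream):
    print(kind, chunk)
```

Only the `xml` segments would ever reach the XML parser, so the raw octets can contain '<' or anything else. A real receiver would also have to cope with a marker or a blob split across reads, which this sketch ignores.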


I strongly suspect, given the way the discussion is going, that we  
either have to consider framing everything - and that's a huge break  
from XMPP - or else we need an escape mechanism that works. Or, of  
course, we decide to give up and frame using XML as now, and use  
base64 to cope.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-05 Thread Tomasz Sterna
On 05-11-2007, Mon at 16:36 +0100, Michal 'vorner' Vaner wrote:
> Now I can use SAX. If I had to care about the blobs, I couldn't, or I
> couldn't in an easy way.

Now I see your point. Taken.


> So you start splitting the data with one more run trough them? That is
> nasty

Matter of taste. ;-)


>  and slow.

I would guess that you would need a _very_ targeted benchmark to
actually see a slowdown.

Let's keep in mind, we're I/O bound, so processor time in our context is
cheap.


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-05 Thread Michal 'vorner' Vaner
Hello

On Mon, Nov 05, 2007 at 04:21:21PM +0100, Tomasz Sterna wrote:
> On 05-11-2007, Mon at 15:59 +0100, Michal 'vorner' Vaner wrote:
> > > On 05-11-2007, Mon at 12:51 +0100, Michal 'vorner' Vaner wrote:
> > > > You probably can not do that with any reasonably out-of-the-box XML
> > > > parser.
> > > 
> > > You cannot use out-of-the-box XML parser anyway.
> > > You need one that parses and returns every  subelement
> > > separately.
> > 
> > Sax.
> 
> So you can use out-of-the-box parser, or you cannot?
> Please make up your mind. ;-)

Now I can use SAX. If I had to care about the blobs, I couldn't, or I
couldn't in an easy way.

> > > you stop feeding the data read from socket to parser, and fetch it
> > > directly for routing.
> > 
> > Unless you work like:
> > Got something on network, read all or a full buffer (let's say max 4kB),
> > push it through utf-8->internal strings and take the whole lot and feed
> > it to the parser.
> 
> So you read until '>' is spotted, as Greg suggested.

So you start splitting the data with one more run through them? That is
nasty and slow.

-- 
I left the ssh key under the doormat

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-05 Thread Tomasz Sterna
On Mon, 05-11-2007 at 11:40 +, Dave Cridland wrote:
> It seems to me that there's a number of cases where shipping binary  
> blobs over XMPP is useful, and we don't want to be resorting to  
> base64 every time.

Alternatively, we could invent a binary-to-UTF mapping that has less
overhead than Base64.
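
For a rough sense of the overhead in question, the sketch below compares
Base64 with Base85 from Python's standard library. Base85 is used here only
as an illustration of a denser text encoding, not as something proposed in
this thread. Note that its alphabet includes characters such as `<` and `&`,
which would need escaping inside XML and so eat into the gain.

```python
import base64


def overhead(encoded_len: int, raw_len: int) -> float:
    """Return the size increase of an encoding as a fraction of the raw size."""
    return (encoded_len - raw_len) / raw_len


raw = bytes(range(256)) + bytes(44)   # 300 arbitrary octets
b64 = base64.b64encode(raw)           # 4 output characters per 3 input octets
b85 = base64.b85encode(raw)           # 5 output characters per 4 input octets

print(len(raw), len(b64), len(b85))
print(f"base64 overhead: {overhead(len(b64), len(raw)):.0%}")
print(f"base85 overhead: {overhead(len(b85), len(raw)):.0%}")
```

Any mapping that stays within XML-safe characters cannot reach zero overhead,
so the gain over Base64 is bounded.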

-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-05 Thread Tomasz Sterna
On Mon, 05-11-2007 at 15:59 +0100, Michal 'vorner' Vaner wrote:
> > On Mon, 05-11-2007 at 12:51 +0100, Michal 'vorner' Vaner wrote:
> > > You probably can not do that with any reasonably out-of-the-box XML
> > > parser.
> > 
> > You cannot use out-of-the-box XML parser anyway.
> > You need one that parses and returns every  subelement
> > separately.
> 
> Sax.

So you can use out-of-the-box parser, or you cannot?
Please make up your mind. ;-)


> > you stop feeding the data read from socket to parser, and fetch it
> > directly for routing.
> 
> Unless you work like:
> Got something on network, read all or a full buffer (let's say max 4kB),
> push it through utf-8->internal strings and take the whole lot and feed
> it to the parser.

So you read until '>' is spotted, as Greg suggested.


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-05 Thread Thomas Charron
On 11/5/07, Michal 'vorner' Vaner <[EMAIL PROTECTED]> wrote:
> Hello
> On Mon, Nov 05, 2007 at 02:45:05PM +, Dave Cridland wrote:
> > Another option would be to setup a distinct connection (and protocol) for
> > routing blobs, and so send them through the server, yet not in-band. I'm
> > not comfortable with this, because it means essentially duplicating all
> > security information, and maintaining synchronization between two distinct
> > streams.
> Or make the connection blobs by default, and some blobs could contain
> complete XML documents, like this:
> length of first block
> 
> length of second block
> 
> length of third block
> some binary data.
> It is as drastic an approach as the blobs; it changes the protocol
> from the ground up. Furthermore, you can extract the stanza and
> feed it to any XML parser.

  Not to mention the documentation would be much easier.  We could
just refer to the BEEP standards instead of having to write our own.
Of course, one could argue, just use BEEP at that point.

:-D

-- 
-- Thomas


Re: [Standards] Binary data over XMPP

2007-11-05 Thread Michal 'vorner' Vaner
Hello

On Mon, Nov 05, 2007 at 02:45:05PM +, Dave Cridland wrote:
> Another option would be to setup a distinct connection (and protocol) for 
> routing blobs, and so send them through the server, yet not in-band. I'm 
> not comfortable with this, because it means essentially duplicating all 
> security information, and maintaining synchronization between two distinct 
> streams.

Or make the connection blobs by default, and some blobs could contain
complete XML documents, like this:
length of first block

length of second block

length of third block
some binary data.

It is as drastic an approach as the blobs; it changes the protocol
from the ground up. Furthermore, you can extract the stanza and
feed it to any XML parser.
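
A minimal sketch of such length-prefixed blocks follows. The message above
does not specify a wire format, so the 4-byte big-endian length prefix and
the helper names (`write_block`, `read_exact`, `read_blocks`) are assumptions
made for illustration only.

```python
import io
import struct


def write_block(out, payload: bytes) -> None:
    """Write one block: an assumed 4-byte big-endian length, then the payload."""
    out.write(struct.pack("!I", len(payload)))
    out.write(payload)


def read_exact(inp, n: int) -> bytes:
    """Read exactly n octets, looping because a read may return fewer."""
    chunks = []
    while n > 0:
        chunk = inp.read(n)
        if not chunk:
            raise EOFError("stream ended inside a block")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_blocks(inp):
    """Yield the payload of each block until the stream ends cleanly."""
    while True:
        first = inp.read(1)
        if not first:
            return                      # clean end of stream between blocks
        header = first + read_exact(inp, 3)
        (length,) = struct.unpack("!I", header)
        yield read_exact(inp, length)


wire = io.BytesIO()
write_block(wire, b"<message to='a@example.org'><body>hi</body></message>")
write_block(wire, b"\x00\xff\x10>binary<")   # octets that are not valid XML
wire.seek(0)

blocks = list(read_blocks(wire))
for block in blocks:
    print(len(block), block[:20])
```

Each XML block can be handed to an ordinary parser as a complete document,
while the binary block never touches the parser or a charset decoder.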

-- 
When eating an elephant take one bite at a time.
-- Gen. C. Abrams

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-05 Thread Michal 'vorner' Vaner
Hello

On Mon, Nov 05, 2007 at 03:35:19PM +0100, Tomasz Sterna wrote:
> On Mon, 05-11-2007 at 12:51 +0100, Michal 'vorner' Vaner wrote:
> > You probably can not do that with any reasonably out-of-the-box XML
> > parser.
> 
> You cannot use out-of-the-box XML parser anyway.
> You need one that parses and returns every  subelement
> separately.

Sax.

> you stop feeding the data read from socket to parser, and fetch it
> directly for routing.

Unless you work like:
Got something on network, read all or a full buffer (let's say max 4kB),
push it through utf-8->internal strings and take the whole lot and feed
it to the parser.

Now you have a blob somewhere in the middle that you dragged through the
codepage converter (destroying it, and potentially the rest of the data
too) and had already pushed down the throat of the poor parser by the
time it reported the blob start.
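
The damage described above can be reproduced in a few lines. This is only a
sketch: the element name `blob` and the `octet-count` attribute follow the
proposal under discussion, and the five payload octets are made up.

```python
# Raw octets that are not valid UTF-8 (0xFF, 0xFE and a lone 0x80).
blob = bytes([0x00, 0xFF, 0xFE, 0x80, 0x41])
chunk = b"<blob octet-count='5'/>" + blob

# A lenient decoder silently substitutes U+FFFD for each invalid octet.
text = chunk.decode("utf-8", errors="replace")
round_trip = text.encode("utf-8")
print("survived lenient decoding:", round_trip == chunk)

# A strict decoder refuses the whole chunk instead.
try:
    chunk.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict decoder rejected the chunk:", exc.reason)
```

Either way the original octets are gone once the chunk has been through the
decoder, so the blob has to be split off before that step.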

> > Furthermore, you may need to pass the stream through a charset
> > decoder to get some internal stringish representation.
> 
> What for?
> Does your language-of-choice not have an effective binary blob
> representation?

But I want to feed my parser with strings. I cannot even feed it
characters one by one, because I do not know where each UTF-8 character ends.

-- 
This message has optimized support for formatting.
Please choose green font and black background so it looks like it should.

Michal 'vorner' Vaner




Re: [Standards] Binary data over XMPP

2007-11-05 Thread Greg Hudson
On Mon, 2007-11-05 at 15:35 +0100, Tomasz Sterna wrote:
> You cannot use out-of-the-box XML parser anyway.

SAX-model XML parsers still qualify as "out of the box."

> So, once  is extracted from the stream and reported,
> you stop feeding the data read from socket to parser, and fetch it
> directly for routing.

By the time you have received an event reporting the blob element, you
have potentially already fed it a chunk containing the  and some
of your binary data.

(Unless you're handing it characters one by one, or being careful to
never feed chunks which contain a > character except at the end.  Either
is inefficient and hackish.)
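
A small sketch of the problem, using Python's `xml.parsers.expat` as a
stand-in for any SAX-style parser. The `stream` wrapper, the `blob` element
name and the four payload octets are assumptions for illustration.

```python
import xml.parsers.expat

seen = []  # element names reported by the parser, in order

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: seen.append(name)

# One network read that happens to hold the header *and* the raw octets.
chunk = b"<stream><blob octet-count='4'/>\x00\xff\x01\x02"

error = None
try:
    parser.Parse(chunk, False)   # False: more data may follow
except xml.parsers.expat.ExpatError as exc:
    error = exc

print("elements reported:", seen)
print("parser error:", error)
```

The start-element event for the blob does fire, but the same call has
already handed the binary octets to the parser, which rejects them. Avoiding
that requires splitting the chunk before the parser sees it.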




Re: [Standards] Binary data over XMPP

2007-11-05 Thread Dave Cridland

On Mon Nov  5 11:51:16 2007, Michal 'vorner' Vaner wrote:
> On Mon, Nov 05, 2007 at 11:40:18AM +, Dave Cridland wrote:
> > A new top-level stanza of (say) , with much the same attributes as
> > any other routable stanza, but also has an octet count. Upon receipt, the
> > XML processing is suspended, and the following octets are handled verbatim:
> >
> > to='[EMAIL PROTECTED]/court' octet-count='4'/>1234
>
> You probably can not do that with any reasonably out-of-the-box XML
> parser. Furthermore, you may need to pass the stream through a charset
> decoder to get some internal stringish representation. This will make it
> mad. So, in short, I strongly disagree here.


An alternative would be to encapsulate both XML and blobs, which'd be
an even more radical departure. (And look impressively like BEEP). So  
for each chunk, you'd predefine how long it was, and whether it was  
blob or XML. (Yes, there are defined formats for doing so, which have  
been mentioned on this list before).


Or - just a thought - we could pinch IMAP's synchronizing literals:

C: 
S: 
C: [4096 octets of blob]
C: [... more XML ...]

This adds a round-trip to all blob-stanza transfers, of course.  
(Although it's a hop-by-hop RTT, not an end-to-end RTT). No reason  
that couldn't be an option, too, so implementations which can cope  
with non-synchronizing blobs can say so. (I personally suspect many  
will be able to).
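
The exchange can be sketched as below. The exact syntax of the example above
is not preserved in this archive, so the line-based `BLOB n` announcement and
`GO` reply are placeholders, as are the helper names. The two ends run in one
process over a socket pair purely for demonstration.

```python
import socket
import threading


def recv_line(sock) -> bytes:
    """Read up to and including a newline, one octet at a time (simple, slow)."""
    line = b""
    while not line.endswith(b"\n"):
        octet = sock.recv(1)
        if not octet:
            raise EOFError("peer closed the connection")
        line += octet
    return line


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n octets from the socket."""
    data = b""
    while len(data) < n:
        part = sock.recv(n - len(data))
        if not part:
            raise EOFError("peer closed the connection mid-blob")
        data += part
    return data


def receiver(sock, result: list) -> None:
    announce = recv_line(sock)               # placeholder: b"BLOB 4096\n"
    count = int(announce.split()[1])
    sock.sendall(b"GO\n")                    # the synchronizing go-ahead
    result.append(recv_exact(sock, count))   # raw octets, no XML parsing


def send_blob(sock, blob: bytes) -> None:
    sock.sendall(b"BLOB %d\n" % len(blob))
    if recv_line(sock) != b"GO\n":           # the extra hop-by-hop round trip
        raise RuntimeError("peer refused the blob")
    sock.sendall(blob)


client, server = socket.socketpair()
received = []
worker = threading.Thread(target=receiver, args=(server, received))
worker.start()

payload = bytes(range(256)) * 16             # 4096 octets, NULs included
send_blob(client, payload)
worker.join()
client.close()
server.close()

print("blob arrived intact:", received[0] == payload)
```

Because the receiver knows the octet count before any raw data arrives, it
can stop feeding its XML parser at a clean boundary. A non-synchronizing
variant would skip the wait for `GO` and save the round trip.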



> But you may like SCTP or how's the protocol called and push the blobs
> out-of-the stream.


Yes, but I doubt that'd get much traction. SCTP stacks are rare  
enough, and especially so in those areas where the base64 encoding  
overhead of (say) IBB makes a serious difference. (Yes, you could  
encourage all XMPP clients to include a SCTP/UDP implementation, but  
that's a heavy requirement, I'd have thought).



> Or another "blobby" TCP connection to the server. (if
> you really want to send these things through the server).


Well, I think increasingly we need to send these things via the  
server. In fact, we're doing so quite a bit - the question is, do we  
care about the base64 overhead enough that we want to address this.


Another option would be to setup a distinct connection (and protocol)  
for routing blobs, and so send them through the server, yet not  
in-band. I'm not comfortable with this, because it means essentially  
duplicating all security information, and maintaining synchronization  
between two distinct streams.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Binary data over XMPP

2007-11-05 Thread Tomasz Sterna
On Mon, 05-11-2007 at 12:51 +0100, Michal 'vorner' Vaner wrote:
> You probably can not do that with any reasonably out-of-the-box XML
> parser.

You cannot use out-of-the-box XML parser anyway.
You need one that parses and returns every  subelement
separately. So, once  is extracted from the stream and reported,
you stop feeding the data read from socket to parser, and fetch it
directly for routing.

> Furthermore, you may need to pass the stream through a charset
> decoder to get some internal stringish representation.

What for?
Does your language-of-choice not have an effective binary blob
representation?


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Binary data over XMPP

2007-11-05 Thread Michal 'vorner' Vaner
Hi

On Mon, Nov 05, 2007 at 11:40:18AM +, Dave Cridland wrote:
> A new top-level stanza of (say) , with much the same attributes as 
> any other routable stanza, but also has an octet count. Upon receipt, the 
> XML processing is suspended, and the following octets are handled verbatim:
>
>  octet-count='4'/>1234

You probably cannot do that with any reasonably out-of-the-box XML
parser. Furthermore, you may need to pass the stream through a charset
decoder to get some internal string representation. This will make it
mad. So, in short, I strongly disagree here.

But you might prefer SCTP, or whatever the protocol is called, and push the
blobs out of the stream. Or use another "blobby" TCP connection to the
server (if you really want to send these things through the server).

-- 
"Don't worry about people stealing your ideas.   If your ideas are any good, 
you'll have to ram them down people's throats."
-- Howard Aiken

Michal 'vorner' Vaner




[Standards] Binary data over XMPP

2007-11-05 Thread Dave Cridland
It seems to me that there's a number of cases where shipping binary  
blobs over XMPP is useful, and we don't want to be resorting to  
base64 every time.


I'm thinking, in particular, that this is needed for encrypted  
stanzas, images, and file transfers.


Is it worth our while to consider a single standardized mechanism for  
doing so? There's a number of ways this might work, here's one as a  
basis for discussion:


A new top-level stanza of (say) , with much the same
attributes as any other routable stanza, plus an octet count.
Upon receipt, the XML processing is suspended, and the following
octets are handled verbatim:


octet-count='4'/>1234


I'm using characters here instead of octets for clarity, but the  
"contents" of the blob element could contain NUL octets, non-UTF-8  
data, etc. Note that I've chosen to express it as an empty element  
followed by the contents - this is primarily because I strongly  
suspect that this is simpler to process for many implementations,  
although it is distinctly un-XML-ish.
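

On the receiving side, the empty-element-plus-octets form could be split off
roughly as follows. This is a sketch under assumptions: the element is taken
to be named `blob` (as the paragraph above refers to it), a regular
expression stands in for real XML tokenizing, and `split_blob` is a
hypothetical helper.

```python
import re

# Matches an empty blob element carrying a single-quoted octet-count.
HEADER = re.compile(rb"<blob\b[^>]*\boctet-count='(\d+)'[^>]*/>")


def split_blob(buffer: bytes):
    """Return (blob, rest) if buffer starts with a complete blob, else None."""
    match = HEADER.match(buffer)
    if match is None:
        return None
    count = int(match.group(1))
    start = match.end()
    if len(buffer) < start + count:
        return None                     # header seen, still waiting for octets
    return buffer[start:start + count], buffer[start + count:]


wire = b"<blob to='x@example.org' octet-count='4'/>1234<presence/>"
blob, rest = split_blob(wire)
print(blob, rest)

# A short read yields None until the remaining octets have arrived.
print(split_blob(wire[:-13]))
```

The octets after the header are sliced out by count alone, so they may hold
NULs or non-UTF-8 data, and XML processing resumes at `rest`.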


The above won't handle imagery, and other blobs that need  
referencing. There's two ways of tackling this - we either allow for  
blobs to be sent inlined with other elements (which I think would be  
difficult to handle), or else we define a new URI scheme - or reuse  
cid - and stick id and content-type attributes on , so:


to='[EMAIL PROTECTED]/court'>


Yo, Shylock, here's a pound of flesh.



 Yo, Shylock, here's a pound of flesh: 



id='foo' octet-count='426'  
content-type='matter-transport/flesh'/>[426 octets of, presumably,  
image]


(See RFC1437 for the top-level MIME type used).

Alternately, we might prefer that the blobs are carried on demand in  
this instance.


Finally, we should probably consider blocking and flow-control - at  
this point, I'll either suggest we examine BEEP, or else we just  
reuse what we have in IBB.


Now, we can't expect that the entire Internet will bend to our will  
and instantly upgrade, so we need a sane fallback - probably to IBB,  
or something fairly similar. The interesting question is whether we  
choose to have this negotiated end to end (which means we'll need to  
have each hop along the route tested), or whether we say that this  
down-conversion happens within servers.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade