About sending the image, I would recommend using something like Stream Initiation (JEP-0095), since you can open an outbound stream, which consumes much less data, and use an inbound one only when necessary.
Stream Initiation provides a way to negotiate streams "between any two entities". That leaves two options for the multi-user case:
1. The sender negotiates a stream with each of the other participants in the session and sends the image to each one separately.
2. Server-side support: the sender negotiates a stream with the server, and the server negotiates a stream with each of the other participants.
Option 1 doesn't seem feasible, and option 2, well, requires server-side support... Additionally, a central idea in the whiteboard protocol I'm using is that each participant has an equivalent copy of the whiteboard's contents. That would break if the stream negotiation failed with one of the participants.
Because, if you send that 200kB image, you will probably not only exceed the message size limit (which you can solve by splitting it), but also hit the karma limit, and you will be blocked for a while before the rest is sent out. Transferring such an image will take several seconds on most servers out there. For example, the default setup of ejabberd allows something like 7kB/s for a single user.
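To put rough numbers on the splitting and rate-limit point, here is a minimal sketch, not a definitive implementation. It assumes the image is base64-encoded for in-band transfer, that 200kB means 200,000 bytes, and that the 7kB/s figure quoted above applies to the encoded text. The 4096-character chunk size and the function names are hypothetical, picked only for illustration.

```python
import base64


def split_into_chunks(data: bytes, chunk_size: int) -> list[str]:
    """Base64-encode data and cut the text into pieces of at most chunk_size characters."""
    encoded = base64.b64encode(data).decode("ascii")
    return [encoded[i:i + chunk_size] for i in range(0, len(encoded), chunk_size)]


def estimated_seconds(total_chars: int, rate_bytes_per_s: float) -> float:
    """Lower bound on transfer time: payload only, ignoring stanza overhead."""
    return total_chars / rate_bytes_per_s


image = bytes(200 * 1000)                 # stand-in for a 200kB image
chunks = split_into_chunks(image, 4096)   # hypothetical per-message payload limit
total_chars = sum(len(c) for c in chunks)
seconds = estimated_seconds(total_chars, 7000)  # 7kB/s, the figure quoted above

print(len(chunks), total_chars, round(seconds, 1))
```

Under these assumptions the image becomes 66 messages carrying 266,668 characters in total, and the payload alone needs about 38 seconds at 7kB/s, before counting any XML overhead or time spent blocked by the limiter.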
That's too bad, but if the messages do eventually get delivered, it's tolerable. I haven't figured out a better way to do it without server-side support. Like Peter said, it would be interesting if server developers could propose some recommended sizes.

Joonas