Absolutely,

I wouldn't recommend that any industrial-strength server application parse large
XML documents into large DOM objects and then go through the trouble of
serializing them all back down to bytes just so they travel over the network
a little faster.  The process is far too taxing on system resources to be
scalable.

You might want to explore java.util.zip.ZipOutputStream a bit, although
I don't know firsthand how well that approach scales.  The only way to
find out would be to test each approach under a simulated heavy workload.
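Here is a minimal sketch of the ZipOutputStream idea. It assumes the XML is already in memory as a String. ZipOutputStream needs at least one entry, so the entry name `payload.xml` is hypothetical, and the receiving side would unpack it with ZipInputStream. The sketch only shows the mechanics. It says nothing about scalability, which still has to be measured under load.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipPayload {

    // Wraps the XML text in a single ZIP entry. The entry name is an
    // arbitrary placeholder; ZipOutputStream just needs some entry.
    static byte[] zip(String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("payload.xml"));
            zos.write(xml.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads the first entry back out and decodes it as UTF-8.
    static String unzip(byte[] data) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(data))) {
            if (zis.getNextEntry() == null) {
                throw new IOException("archive contains no entries");
            }
            return new String(zis.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up repetitive sample, not real application data.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("<person>\n  <name>Joe</name>\n  <age>28</age>\n</person>\n");
        }
        String xml = sb.toString();
        byte[] zipped = zip(xml);
        System.out.println("raw bytes:     " + xml.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("zipped bytes:  " + zipped.length);
        System.out.println("round trip ok: " + unzip(zipped).equals(xml));
    }
}
```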

On a non-software level, there is some exploration going on about this very
dilemma.  You might find it interesting.
http://news.cnet.com/news/0-1003-200-1821557.html?tag=st.ne.1002.tgif.ni


-David Blevins


> -----Original Message-----
> From: A mailing list for Enterprise JavaBeans development
> [mailto:[EMAIL PROTECTED]]On Behalf Of David Regan
> Sent: Tuesday, May 09, 2000 8:14 PM
> To: [EMAIL PROTECTED]
> Subject: Re: RMI/HTTP?
>
>
> This could well be what is happening. I had not
> considered that serialization could be shrinking the
> files vs their plaintext counterparts.
>
> In fact, the test program I wrote that parsed a
> text file (in the same XML format I was using for
> the application) showed a 5k difference in a 60k file,
> or about 9%: the serialized file was 5k smaller.
>
> Since my earlier run used approximately 480k of
> data, I can see how this might be significant, though
> at first glance the cost of serializing and
> deserializing it seems at least comparable to the
> time saved over a fast wire (the test was run over
> 100 Mbit Ethernet).
>
> This suggests that an even faster alternative might be
> to send XML through a lightweight compression routine:
> compression could shrink the file much more than 9%,
> and it probably would not add much compute overhead
> compared with serialization.
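Here is a sketch of the compression idea using GZIPOutputStream, which comes from the same java.util.zip package. Unlike ZipOutputStream, it compresses a single stream without entries. The sample document built in main() is a made-up stand-in modeled on the <person> records used later in this thread. It is not the application's real data, so the ratio it prints should not be read as a prediction for real files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipXml {

    // Compresses the UTF-8 bytes of the XML text. Closing the stream
    // writes the GZIP trailer (CRC-32 and uncompressed length).
    static byte[] compress(String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Inflates the data and decodes it back into a String.
    static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] names = {"Joe", "Jon", "Jan", "Jim", "Jed", "Jen"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 600; i++) {
            sb.append("<person>\n  <name>").append(names[i % names.length])
              .append("</name>\n  <age>").append(20 + i % 40)
              .append("</age>\n</person>\n");
        }
        String xml = sb.toString();
        int raw = xml.getBytes(StandardCharsets.UTF_8).length;
        byte[] packed = compress(xml);
        System.out.printf("raw %d bytes, gzip %d bytes (%.1f%% of original)%n",
                raw, packed.length, 100.0 * packed.length / raw);
        System.out.println("round trip ok: " + decompress(packed).equals(xml));
    }
}
```

To compare this fairly with RMI serialization, you would need to time compress() and decompress() alongside the byte counts, since the thread's question is total elapsed time rather than size alone.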
>
> Food for thought,
> -David
>
> -----Original Message-----
> From: A mailing list for Enterprise JavaBeans development
> [mailto:[EMAIL PROTECTED]]On Behalf Of David Blevins
> Sent: Tuesday, May 09, 2000 4:09 AM
> To: [EMAIL PROTECTED]
> Subject: Re: RMI/HTTP?
>
>
> This is because the RMI code uses object serialization, which, in
> your case, is actually writing fewer bytes to the stream.
>
> Object serialization never writes the same object to the stream twice
> (unless the stream is reset).  Instead it writes a handle referring to the
> object.  When the receiving stream encounters a handle, it simply looks up
> the object in the list of objects it has already read.  This applies to
> Strings as well, as long as it is the same String instance being written.
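The following small sketch uses Java's standard ObjectOutputStream to make the handle behavior visible by comparing stream sizes. Handles are assigned by object identity, not by equals(). Writing the same String instance repeatedly produces short back-references, while writing equal but distinct String instances writes each copy in full. Whether a particular DOM parser reuses String instances for repeated tag names depends on the implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class HandleDemo {

    // Writes the same String instance n times. After the first write,
    // ObjectOutputStream emits only a back-reference (handle) to it.
    static int sizeSharedInstance(int n) throws IOException {
        String tag = "<person>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < n; i++) {
                out.writeObject(tag);
            }
        }
        return bytes.size();
    }

    // Writes n equal but distinct String instances. Because handles are
    // keyed by identity, every copy is written out in full.
    static int sizeDistinctInstances(int n) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < n; i++) {
                out.writeObject(new String("<person>"));
            }
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        int n = 100;
        System.out.println("same instance x" + n + ":      " + sizeSharedInstance(n) + " bytes");
        System.out.println("distinct instances x" + n + ": " + sizeDistinctInstances(n) + " bytes");
    }
}
```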
>
> XML documents contain a lot of redundant tags, as in:
>
> <person>
>   <name>Joe</name>
>   <age>28</age>
> </person>
> <person>
>   <name>Jon</name>
>   <age>45</age>
> </person>
> <person>
>   <name>Jan</name>
>   <age>33</age>
> </person>
> <person>
>   <name>Jim</name>
>   <age>25</age>
> </person>
> <person>
>   <name>Jed</name>
>   <age>51</age>
> </person>
> <person>
>   <name>Jen</name>
>   <age>24</age>
> </person>
>
> If we were to treat each tag and each piece of PCDATA as an object and
> write them to an object output stream, we would get the following output:
>
> ---Stream output-----
> <person>|<name>|Joe|</name>|<age>|28|</age>|</person>|
> ~0|~1|Jon|~3|~4|45|~6|~7|
> ~0|~1|Jan|~3|~4|33|~6|~7|
> ~0|~1|Jim|~3|~4|25|~6|~7|
> ~0|~1|Jed|~3|~4|51|~6|~7|
> ~0|~1|Jen|~3|~4|24|~6|~7
> ---Stream output-----
>
> NOTE: This is a simplified representation of the output of object
> serialization. In the example, '|' is used to separate the objects in the
> stream.  '~' denotes a handle.
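You can reproduce the simplified stream with a toy encoder. This simulates the idea only and is not Java's real wire format. It splits the XML into tag and text tokens, writes a token literally the first time it appears, and writes `~` plus the token's index after that. It matches tokens by equality to mirror the example, whereas real serialization matches by object identity. It also does not escape literal `~` or `|` characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HandleEncoder {

    // A token is either a whole tag or a run of text between tags.
    private static final Pattern TOKEN = Pattern.compile("<[^>]+>|[^<]+");

    // Splits XML into tag and text tokens, dropping whitespace-only text.
    static List<String> tokenize(String xml) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(xml);
        while (m.find()) {
            String t = m.group().trim();
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // The first occurrence of a token is written literally and assigned the
    // next handle number; later occurrences are written as "~" + handle.
    static String encode(List<String> tokens) {
        Map<String, Integer> handles = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            Integer h = handles.get(t);
            if (h != null) {
                out.add("~" + h);
            } else {
                handles.put(t, handles.size());
                out.add(t);
            }
        }
        return String.join("|", out);
    }

    public static void main(String[] args) {
        String xml = "<person>\n  <name>Joe</name>\n  <age>28</age>\n</person>\n"
                   + "<person>\n  <name>Jon</name>\n  <age>45</age>\n</person>\n";
        // Prints the first two records in the simplified stream notation.
        System.out.println(encode(tokenize(xml)));
    }
}
```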
>
>
> Every time the object input stream reads in an object it doesn't already
> have, it gives it a new number (handle) and puts it in a table.  When the
> object input stream finds a handle in the stream, it goes to the table and
> uses the object at that index.
>
> Here is the table that would be generated from the above stream.
>
> index| object
> ---------------
>    0 | <person>
>    1 | <name>
>    2 | Joe
>    3 | </name>
>    4 | <age>
>    5 | 28
>    6 | </age>
>    7 | </person>
>    8 | Jon
>    9 | 45
>   10 | Jan
>   11 | 33
>   12 | Jim
>   13 | 25
>   14 | Jed
>   15 | 51
>   16 | Jen
>   17 | 24
> ---------------
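The reading side of the same toy format can be sketched under the same assumptions. Each literal token is appended to a table, so its position becomes its handle, and each `~n` is resolved by looking up index n. Run on the first two records, it rebuilds them and reproduces the first ten rows of the table above.

```java
import java.util.ArrayList;
import java.util.List;

public class HandleDecoder {

    // Rebuilds the token sequence from the simplified stream. Literal tokens
    // are appended to the table (index == handle); "~n" looks up index n.
    static List<String> decode(String stream, List<String> table) {
        List<String> tokens = new ArrayList<>();
        for (String part : stream.split("\\|")) {
            if (part.startsWith("~")) {
                tokens.add(table.get(Integer.parseInt(part.substring(1))));
            } else {
                table.add(part);
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String stream = "<person>|<name>|Joe|</name>|<age>|28|</age>|</person>|"
                      + "~0|~1|Jon|~3|~4|45|~6|~7";
        List<String> table = new ArrayList<>();
        List<String> tokens = decode(stream, table);
        System.out.println(String.join("", tokens));
        for (int i = 0; i < table.size(); i++) {
            System.out.printf("%4d | %s%n", i, table.get(i));
        }
    }
}
```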
>
> This is the essence of one part of object serialization.  As you can
> guess, more happens than I have described here, and more needs to be
> written to the stream.  Also, much of the output for an object depends on
> how its class was defined.  But in all cases the effect described above
> takes place to some degree.
>
> Additionally, some DOM parsers can be configured to leave ignorable
> white space out of the DOMs they generate.  If that is true in your case,
> you get a further reduction in byte size, whereas HTTP has to send the
> document "as is", white space and all.
>
> To test this for yourself, serialize your DOM object to a file and compare
> its byte size with that of the XML file itself.  I would be interested to
> see how big the difference is.
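Here is a sketch of that test. It assumes a DOM produced by the JDK's default DocumentBuilderFactory and a hypothetical file name, `dirlist.xml`; pass a real path as the first argument. It serializes into memory rather than to a file, which gives the same byte count. Whether a DOM implementation is Serializable at all depends on the parser, so the code reports that case instead of assuming it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomSizeCheck {

    // Number of bytes ObjectOutputStream produces for the given object,
    // including the 4-byte stream header.
    static int serializedSize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws Exception {
        // "dirlist.xml" is a placeholder name, not a file from the thread.
        Path xmlFile = Paths.get(args.length > 0 ? args[0] : "dirlist.xml");
        long rawBytes = Files.size(xmlFile);

        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xmlFile.toFile());
        try {
            int domBytes = serializedSize(dom);
            System.out.printf("XML file: %d bytes, serialized DOM: %d bytes%n",
                    rawBytes, domBytes);
        } catch (NotSerializableException e) {
            // Serializability of DOM nodes is parser-specific.
            System.out.println("This parser's DOM is not serializable: " + e.getMessage());
        }
    }
}
```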
>
>
> -David Blevins
>
>
> > From: A mailing list for Enterprise JavaBeans development
> > Subject: RMI/HTTP?
> >
> >
> > This is not really the right forum for this, but I
> > have been playing with WebLogic's custom RMI libraries
> > in my EJB app and coming up with some very odd results.
> >
> > For some reason, sending XML data encapsulated in an RMI
> > object over HTTP is faster than sending raw XML over
> > plain HTTP.
> >
> > This really doesn't make sense to me since using RMI
> > (even an optimized version) should generate some
> > additional overhead to the communication vs a raw
> > URLConnection.
> >
> > Since I'm doing XML->DOM translations in both cases,
> > that's not the issue; it really seems to be
> > communication-related.
> >
> > RMI code
> >         xml = obj.getDirList(name);
> >         parseXmlFileList(xml, 0, this);
> >
> > non-RMI code
> >     URLConnection urlc = httpServletUrl.openConnection();
> >     urlc.setDoOutput(true);
> >     urlc.setDoInput(true);
> >
> >     // Send the request parameter; autoflush pushes it out on println.
> >     PrintWriter pw = new PrintWriter(
> >         new OutputStreamWriter(urlc.getOutputStream()), true);
> >     pw.println(name);
> >     pw.close();
> >
> >     // Read until end of stream or a blank line. readLine() strips the
> >     // line terminator, so a check for "\n" can never match.
> >     StringBuilder sb = new StringBuilder();
> >     try (BufferedReader r = new BufferedReader(
> >             new InputStreamReader(urlc.getInputStream()))) {
> >         String s;
> >         while ((s = r.readLine()) != null && !s.isEmpty()) {
> >             sb.append(s).append('\n');
> >         }
> >     }
> >     xml = sb.toString();
> >     parseXmlFileList(xml, 0, this);
> >
> > Any opinions on this? I'm at a loss on this one.
> > -David
> >
> > ==================================================================
> > =========
> > To unsubscribe, send email to [EMAIL PROTECTED] and include
> > in the body
> > of the message "signoff EJB-INTEREST".  For general help, send email to
> > [EMAIL PROTECTED] and include in the body of the message "help".
> >
>
