Hi Greg #N+1,


That Silverlight weekend in Docklands was a great event, thank you to the
guys that organised it!



I would not say that I implemented my own data compression, more that I
avoided any extra fat in the data.  I agree that I would not have been getting
much extra mileage out of pushing it as far as I did.  To be truthful, I had
such bad performance before doing this that I just threw every realistic idea
at reducing the size of the data.



If I did it again, I would start with a CSV file with two record types,
identified by the first character on the line.  This would avoid the cost of
an XML/JSON format and the extra complexity of Greg format numbers.  So my
detail line would change from 3413340017010 to D,34.1,334000,701, which is
just 4 more characters in this example.
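
To illustrate (just a rough sketch in Python; the 'H'/'D' record types, the
field names amount/account/code and the way the example line splits up are my
own guesses, not the real layout):

# Rough sketch only: record types, field names and the split of the example
# detail line are illustrative guesses, not the real format.
import csv
import io

sample = "H,2013-08-01,sample-account\nD,34.1,334000,701\nD,12.5,334001,702\n"

def parse(text):
    headers, details = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0] == "H":            # header record
            headers.append(row[1:])
        elif row[0] == "D":          # detail record
            amount, account, code = row[1], row[2], row[3]
            details.append((float(amount), account, code))
    return headers, details

print(parse(sample))

The single leading character keeps the record-type test trivial while the
file stays plain CSV that any tool can read.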



The 430KB for ~32K rows is for the uncompressed data; it compresses to 168KB.
I am not going to do an experiment to show how much more space it would take
up without Greg format numbers and with the extra commas, but my gut tells me
you are right that zip would make the difference very small.
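
If anyone is curious enough to try the experiment, a quick sketch along these
lines would do (Python, with made-up random rows standing in for the real
data and zlib deflate standing in for zip):

# Sketch only: the rows are random made-up values, and zlib deflate is a
# stand-in for zip, so the numbers are only indicative.
import random
import zlib

random.seed(1)
rows = [(round(random.uniform(0, 100), 1),
         random.randrange(100000, 999999),
         random.randrange(100, 999)) for _ in range(32000)]

# "Greg format": digits run together with no separators (made-up packing)
packed = "\n".join("%d%d%d" % (round(a * 10), b, c) for a, b, c in rows).encode()
# plain CSV with a record-type character
csvish = "\n".join("D,%.1f,%d,%d" % (a, b, c) for a, b, c in rows).encode()

for name, data in (("packed", packed), ("csv", csvish)):
    print(name, len(data), "->", len(zlib.compress(data, 9)))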



When I did the original work on the Optus soft copy bills, we were getting
~90% zip data compression, but there was a lot of white space in that data,
which would have compressed down to almost nothing.



Regards

Greg Harris

On Wed, Aug 7, 2013 at 8:22 AM, Greg Keogh <g...@mira.net> wrote:

> Howdy Greg #2 (or 3?)
>
> Haven't seen you since the Silverlight weekend in Docklands a few years
> ago.
>
> Very interesting! You have implemented your own data compression, and we
> used to do very similar things back in the late 70s and 80s when mainframe
> disk space was precious. Compression algorithms and software were not
> available or widely known then. In fact, Wikipedia says the LZ algorithms
> were only published in 1977/78 (not long ago in coding years).
>
> However, I have this uneasy feeling that all of your manual work is made
> mostly redundant by what zipping does for transmission. Zip will
> aggressively remove redundancy from your data, so well in fact that I
> suspect it might reduce the benefits of your pre-processing to a hair's
> width. Although your pre-processing will save space for the raw data if
> that's a problem on the client side.
>
> I was quite amazed that ~6000 of my entities as XML took 6.06MB, but
> deflated down to 492KB which is 8% of the original size and quite suitable
> for transmission as a single blob. I reckon your data as plain CSV would
> also reduce incredibly well.
>
> Given that I also deflate for transmission as a blob, I think my problem
> is now reduced to a pure coding problem: What format is easiest to
> round-trip my entities?
>
> Importantly, I'm looking for a general purpose way of transforming (most
> of) the entity class properties. XML needs manual coding, Json I'm not
> sure about. I can't use pure binary serialization because it's not
> supported in Silverlight clients.
>
> Greg K
>
