On Thu, 6 Oct 2016, 2:25 AM Alok Gandhi <alok.gandhi2...@gmail.com> wrote:

> Generally speaking, binary compressed data would help in faster io.
>

Yes agreed.

Using something like zlib, as Marcus suggested, would actually mean more work,
because first you have to serialize your Python objects to strings and then
zlib-compress them. While you may save disk space on the resulting string
data, you now need to zlib-decompress before marshaling the plain strings
back into objects again.
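To make the two-step cost concrete, here is a minimal sketch of that round trip, using json as a stand-in serializer (the records variable is just an illustrative payload):

```python
import json
import zlib

# Hypothetical payload: any JSON-serializable Python structure.
records = [{"id": i, "name": "item%d" % i} for i in range(1000)]

# Writing pays two costs: serialize to a string, then compress.
serialized = json.dumps(records).encode("utf-8")
compressed = zlib.compress(serialized)

# Reading pays both costs in reverse: decompress, then parse.
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))
assert restored == records
```

Both directions do strictly more work than serializing alone; the compression only pays off if disk or network I/O is your bottleneck.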

If you can serialize to a binary format in the first place, you can
describe your data in a way that is fast to read. This also means you may
have the ability to stream the data back in as opposed to reading the
entire blob at once, depending on which format you choose. JSON, for
instance, is capable of streaming when you have one object per line.
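The one-object-per-line idea (often called "JSON Lines") can be sketched like this, using an in-memory buffer to stand in for a file:

```python
import io
import json

# Hypothetical per-frame records to illustrate the idea.
records = [{"frame": i, "value": i * 0.5} for i in range(5)]

# Write one JSON object per line -- the "JSON Lines" convention.
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Read it back one object at a time, never loading the whole blob.
buf.seek(0)
streamed = [json.loads(line) for line in buf]
assert streamed == records
```

With a real file you would iterate the file object the same way, keeping only one record in memory at a time.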
A simple example of a binary format is the cPickle module, using protocol
2. It is decently fast for a built-in standard library option and is a
binary, self-describing format. ujson may beat its encoding performance,
however. A while back I did a post comparing serialization options:
http://justinfx.com/2012/07/25/python-2-7-3-serializer-speed-comparisons/
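A minimal pickle-protocol-2 round trip looks like this (the data dict is just a made-up example; the try/except lets the same snippet run on Python 3, where cPickle was folded into pickle):

```python
try:
    import cPickle as pickle  # Python 2: the fast C implementation
except ImportError:
    import pickle  # Python 3: cPickle is built into pickle

# Hypothetical rig data, purely for illustration.
data = {"joints": ["hip", "knee"], "weights": [0.25, 0.75]}

# Protocol 2 is a compact, self-describing binary format.
blob = pickle.dumps(data, 2)
assert pickle.loads(blob) == data
```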

Some newer options that are not on that list are protocol buffers and
flatbuffers. These are also fast binary formats.

But also, as Marcus suggested, you will have to test both encoding and
decoding performance, as each format has its own strengths. It depends on
what your needs are.
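Measuring the two directions separately is straightforward with timeit; this sketch compares json against pickle on an arbitrary made-up payload (your real data will give different numbers, so substitute it in):

```python
import json
import timeit

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Arbitrary test payload; swap in something shaped like your real data.
data = {"values": list(range(1000))}
as_json = json.dumps(data)
as_pickle = pickle.dumps(data, 2)

# Time encode and decode separately; a format can win one and lose the other.
print("json   encode: %.4fs" % timeit.timeit(lambda: json.dumps(data), number=1000))
print("json   decode: %.4fs" % timeit.timeit(lambda: json.loads(as_json), number=1000))
print("pickle encode: %.4fs" % timeit.timeit(lambda: pickle.dumps(data, 2), number=1000))
print("pickle decode: %.4fs" % timeit.timeit(lambda: pickle.loads(as_pickle), number=1000))
```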

Justin


> On Wed, Oct 5, 2016 at 2:13 PM, Marcus Ottosson <konstrukt...@gmail.com>
> wrote:
>
> Before getting into how to make something fast, I must point out that your
> journey towards performance is meaningless without having something to run.
> Your first port of call should be to build your tool in the simplest and
> most straightforward and obvious way possible, and then start to look at
> how to make it faster.
> ------------------------------
>
> Now, when it comes to speed, reading and writing are generally in conflict
> with each other. The faster it writes, the slower it reads.
>
> The fastest thing to write is also the smallest - that could mean
> compressing your data for example.
>
> import zlib
>
> data = b"My very long string, "
> compressed = zlib.compress(data)
>
> At this point, writing will be at a peak speed, only dependent on the
> quality of your chosen compression method and amount of content you
> compress.
>
> But reading suffers.
>
> The fastest thing to read is also the smallest - that could mean
> decompressing your data.
>
> import zlib
>
> data = b"My very long string, "
> compressed = zlib.compress(data)
>
> decompressed = zlib.decompress(compressed)
> assert decompressed == data
>
> If you measure the results of compressed against decompressed, you’ll
> find that it’s actually larger than the original.
>
> import zlib
>
> data = b"My very long string, "
> compressed = zlib.compress(data)
> decompressed = zlib.decompress(compressed)
> assert len(compressed) > len(decompressed)
>
> What gives? Because the data is so small, the added overhead of the
> compression method outweighs the benefits. Compression, with zlib and
> likely other algorithms, is most effective on large, repetitive data
> structures.
>
> import zlib
>
> data = b"My very long string, " * 10000
> compressed = zlib.compress(data)
> decompressed = zlib.decompress(compressed)
> assert len(compressed) < len(decompressed)  # Notice the flipped comparison sign
>
> To get a sense of the savings made, you could try something like this.
>
> import sys
> import zlib
>
> data = b"My very long string, " * 10000
> compressed = zlib.compress(data)
> decompressed = zlib.decompress(compressed)
> assert decompressed == data
>
> original_size = sys.getsizeof(data)
> compressed_size = sys.getsizeof(compressed)
>
> print("original: %.2f kb\ncompressed: %.2f kb\n= %i times smaller" % (
>     original_size / 1024.0,
>     compressed_size / 1024.0,
>     original_size / compressed_size)
> )
>
> Which on my system (Windows 10, Python 3.3 x64) prints:
>
> original: 205.11 kb
> compressed: 0.58 kb
> = 356 times smaller
>
> Now which is it, do you need it to *read* fast, or *write* fast? :)
>
> Best,
> Marcus
> ​
>
> --
> You received this message because you are subscribed to the Google Groups
> "Python Programming for Autodesk Maya" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to python_inside_maya+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/python_inside_maya/CAFRtmOCAjZUST94WFyWE-JjmzN1z9m9hkHbpXtJyLWWv14X-LA%40mail.gmail.com
> <https://groups.google.com/d/msgid/python_inside_maya/CAFRtmOCAjZUST94WFyWE-JjmzN1z9m9hkHbpXtJyLWWv14X-LA%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
