OK, good to know.
But this still does not help with the 1 MB entity size limit: even after
compressing some of my JSON objects I would still be over that size.
I think the only solution here is to use the Blobstore with the Files API.
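
Roughly, what I have in mind (an untested sketch: compress the serialized data,
write it through the Files API, and read it back later with a BlobReader):

import marshal
import zlib

from google.appengine.api import files
from google.appengine.ext import blobstore

MARSHAL_VERSION = 2

def write_json_blob(obj):
  # serialize + compress (level 1, as recommended below), then write to the Blobstore
  data = zlib.compress(marshal.dumps(obj, MARSHAL_VERSION), 1)
  file_name = files.blobstore.create(mime_type='application/octet-stream')
  with files.open(file_name, 'a') as f:
    f.write(data)
  files.finalize(file_name)
  return files.blobstore.get_blob_key(file_name)

def read_json_blob(blob_key):
  data = blobstore.BlobReader(blob_key).read()
  return marshal.loads(zlib.decompress(data))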

On Jun 4, 2012, at 3:24 PM, Bryce Cutt wrote:

> aschmid: The ndb BlobProperty has optional compression built in (see 
> ndb.model.BlobProperty). You could implement the MarshalProperty like this:
> 
> import marshal
> from google.appengine.ext import ndb
> 
> MARSHAL_VERSION = 2  # marshal format version (2 on Python 2.7)
> 
> class MarshalProperty(ndb.BlobProperty):
>   def _to_base_type(self, value):
>     return marshal.dumps(value, MARSHAL_VERSION)
>   def _from_base_type(self, value):
>     return marshal.loads(value)
> 
> Then, when you create the property on your model, specify the compressed 
> option to enable compression:
> prop = MarshalProperty(compressed=True)
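> 
> For instance (a minimal sketch; the Payload model is only illustrative):
> 
> class Payload(ndb.Model):
>   data = MarshalProperty(compressed=True)
> 
> entity = Payload(data={'answer': 42})
> entity.put()
> assert entity.key.get().data == {'answer': 42}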
> 
> The compressed option in BlobProperty is implemented in such a way that you 
> can turn it on and off and old values will still be read properly, because 
> _from_base_type() in BlobProperty only decompresses a stored value if it 
> actually was compressed.
> 
> The BlobProperty uses the default compression level (there is no option to 
> change it), so if you want to use level 1 (as Andrin recommends) you would 
> need to implement the compression yourself in your own subclass.
> 
> 
> On Monday, June 4, 2012 7:41:14 AM UTC-7, aschmid wrote:
> Is this a valid implementation?
> 
> class JsonMarshalZipProperty(ndb.BlobProperty):
> 
>     def _to_base_type(self, value):
>         return zlib.compress(marshal.dumps(value, MARSHAL_VERSION))
> 
>     def _from_base_type(self, value):
>         return marshal.loads(zlib.decompress(value))
> 
> 
> On Jun 4, 2012, at 9:49 AM, Andreas wrote:
> 
>> Great. How would this look for the ndb package?
>> 
>> On Jun 1, 2012, at 2:40 PM, Andrin von Rechenberg wrote:
>> 
>>> Hey there
>>> 
>>> If you want to store megabytes of JSON in the datastore
>>> and get it back from the datastore into Python already parsed, 
>>> this post is for you.
>>> 
>>> I ran a couple of performance tests in which I store
>>> a 4 MB JSON object in the datastore and then get it back at
>>> a later point and process it.
>>> 
>>> There are several ways to do this.
>>> 
>>> Challenge 1) Serialization
>>> You need to serialize your data.
>>> For this you can use several different libraries.
>>> JSON objects can be serialized using
>>> the json lib, the cPickle lib, or the marshal lib
>>> (these are the libraries I'm aware of at the moment).
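>>> 
>>> (A minimal round trip with each of them, using a made-up sample object:)
>>> 
>>> import cPickle
>>> import json
>>> import marshal
>>> 
>>> obj = {"users": [{"id": 1, "name": "alice"}]}
>>> assert json.loads(json.dumps(obj)) == obj
>>> assert cPickle.loads(cPickle.dumps(obj, 2)) == obj
>>> assert marshal.loads(marshal.dumps(obj, 2)) == obj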
>>> 
>>> Challenge 2) Compression
>>> If your serialized data doesn't fit into 1 MB you need
>>> to shard it over multiple datastore entities and
>>> manually reassemble it when loading the entities back.
>>> If you compress your serialized data before storing it,
>>> you pay the cost of compression and decompression,
>>> but you have to fetch fewer datastore entities when you
>>> load your data and write fewer datastore entities when
>>> you update it if it is sharded.
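>>> 
>>> (A rough sketch of what such manual sharding could look like with ndb; the
>>> JsonShard model name and the shard size are only illustrative:)
>>> 
>>> import marshal
>>> import zlib
>>> 
>>> from google.appengine.ext import ndb
>>> 
>>> SHARD_BYTES = 1000000  # stay below the 1 MB entity limit with some headroom
>>> 
>>> class JsonShard(ndb.Model):
>>>   data = ndb.BlobProperty()
>>> 
>>> def save_sharded(name, obj):
>>>   blob = zlib.compress(marshal.dumps(obj, 2), 1)
>>>   parts = [blob[i:i + SHARD_BYTES] for i in xrange(0, len(blob), SHARD_BYTES)]
>>>   ndb.put_multi([JsonShard(id='%s:%d' % (name, i), data=part)
>>>                  for i, part in enumerate(parts)])
>>>   return len(parts)  # the caller keeps track of the shard count
>>> 
>>> def load_sharded(name, num_shards):
>>>   keys = [ndb.Key(JsonShard, '%s:%d' % (name, i)) for i in xrange(num_shards)]
>>>   blob = ''.join(shard.data for shard in ndb.get_multi(keys))
>>>   return marshal.loads(zlib.decompress(blob))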
>>> 
>>> Solution for 1) Serialization:
>>> cPickle is very slow. It's meant to serialize real
>>> Python objects, not just JSON. The json lib is much faster,
>>> but compared to marshal it has no chance.
>>> The Python marshal library is definitely the
>>> way to serialize JSON. It has the best performance.
>>> 
>>> Solution for 2) Compression:
>>> For my use case it absolutely makes sense to
>>> compress the data the marshal lib produces
>>> before storing it in the datastore. I have gigabytes
>>> of JSON data. Compressing the data makes
>>> it about 5x smaller. Doing 5x fewer datastore
>>> operations definitely pays for the time it
>>> takes to compress and decompress the data.
>>> There are several compression levels you
>>> can use with Python's zlib,
>>> from 1 (lowest compression, but fastest)
>>> to 9 (highest compression, but slowest).
>>> During my tests I found that the optimum
>>> is to compress your serialized data using
>>> zlib with level 1 compression. Higher
>>> compression takes too much CPU and
>>> the result is only marginally smaller.
>>> 
>>> Here are my test results:
>>> 
>>> serializer  ziplvl  dump (s)  load (s)  size (bytes)
>>> cPickle     0       1.671010  0.764567       3297275
>>> cPickle     1       2.033570  0.874783        935327
>>> json        0       0.595903  0.698307       2321719
>>> json        1       0.667103  0.795470        458030
>>> marshal     0       0.118067  0.314645       2311342
>>> marshal     1       0.315362  0.335677        470956
>>> marshal     2       0.318787  0.380117        457196
>>> marshal     3       0.350247  0.364908        446085
>>> marshal     4       0.414658  0.318973        437764
>>> marshal     5       0.448890  0.350013        418712
>>> marshal     6       0.516882  0.367595        409947
>>> marshal     7       0.617210  0.315827        398354
>>> marshal     8       1.117032  0.346452        392332
>>> marshal     9       1.366547  0.368925        391921
>>> 
>>> The results do not include datastore operations;
>>> they only cover creating a blob that can be stored
>>> in the datastore and getting the parsed data back.
>>> The "dump" and "load" times are the seconds it takes
>>> to do this on a Google App Engine F1 instance
>>> (600 MHz, 128 MB RAM).
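>>> 
>>> (The exact test harness isn't included here; roughly, each marshal
>>> measurement could be reproduced like this:)
>>> 
>>> import marshal
>>> import time
>>> import zlib
>>> 
>>> def measure(obj, ziplvl):
>>>   start = time.time()
>>>   blob = marshal.dumps(obj, 2)
>>>   if ziplvl:
>>>     blob = zlib.compress(blob, ziplvl)
>>>   dump = time.time() - start
>>> 
>>>   start = time.time()
>>>   marshal.loads(zlib.decompress(blob) if ziplvl else blob)
>>>   load = time.time() - start
>>> 
>>>   print 'marshal ziplvl: %d' % ziplvl
>>>   print 'dump: %fs' % dump
>>>   print 'load: %fs' % load
>>>   print 'size: %d' % len(blob)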
>>> 
>>> I posted this email on my blog: 
>>> http://devblog.miumeet.com/2012/06/storing-json-efficiently-in-python-on.html
>>> You can also comment there or on this email thread.
>>> 
>>> Enjoy,
>>> -Andrin
>>> 
>>> Here is the library I created and use:
>>> 
>>> #!/usr/bin/env python
>>> #
>>> # Copyright 2012 MiuMeet AG
>>> #
>>> # Licensed under the Apache License, Version 2.0 (the "License");
>>> # you may not use this file except in compliance with the License.
>>> # You may obtain a copy of the License at
>>> #
>>> #     http://www.apache.org/licenses/LICENSE-2.0
>>> #
>>> # Unless required by applicable law or agreed to in writing, software
>>> # distributed under the License is distributed on an "AS IS" BASIS,
>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>> # See the License for the specific language governing permissions and
>>> # limitations under the License.
>>> #
>>> 
>>> from google.appengine.api import datastore_types
>>> from google.appengine.ext import db
>>> 
>>> import zlib
>>> import marshal
>>> 
>>> MARSHAL_VERSION = 2
>>> COMPRESSION_LEVEL = 1
>>> 
>>> class JsonMarshalZipProperty(db.BlobProperty):
>>>   """Stores a JSON serializable object using zlib and marshal in a 
>>> db.Blob"""
>>> 
>>>   def default_value(self):
>>>     return None
>>>   
>>>   def get_value_for_datastore(self, model_instance):
>>>     value = self.__get__(model_instance, model_instance.__class__)
>>>     if value is None:
>>>       return None
>>>     return db.Blob(zlib.compress(marshal.dumps(value, MARSHAL_VERSION),
>>>                                  COMPRESSION_LEVEL))
>>> 
>>>   def make_value_from_datastore(self, value):
>>>     if value is not None:
>>>       return marshal.loads(zlib.decompress(value))
>>>     return value
>>> 
>>>   data_type = datastore_types.Blob
>>>   
>>>   def validate(self, value):
>>>     return value
>>> 
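>>> (A usage sketch; the JsonDocument model is only an example, not part of the
>>> library:)
>>> 
>>> class JsonDocument(db.Model):
>>>   payload = JsonMarshalZipProperty()
>>> 
>>> doc = JsonDocument(payload={'scores': [1, 2, 3]})
>>> doc.put()
>>> assert JsonDocument.get(doc.key()).payload == {'scores': [1, 2, 3]}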
>>> 
>>> 
>> 
> 
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
