OK, good to know. But this still does not help with the 1 MB entity size limit... even after compressing them, some of my JSON objects would still be over that size. I think the only solution here is to use the Blobstore with the Files API.
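(An alternative worth noting before reaching for the Blobstore: the sharding approach Andrin describes further down in this thread can keep everything in the datastore. A minimal plain-Python sketch of it, with hypothetical helper names `to_chunks`/`from_chunks` and an assumed per-entity byte budget; each chunk would be stored in its own entity, keyed by parent key plus index:)

```python
import marshal
import zlib

MARSHAL_VERSION = 2
COMPRESSION_LEVEL = 1
# Assumed budget to stay under the 1 MB entity limit; the exact
# headroom needed for entity overhead is a guess, not a spec value.
MAX_CHUNK = 1000 * 1000

def to_chunks(obj):
    """Serialize, compress, and split a JSON-like object into
    blobs small enough to store one per datastore entity."""
    blob = zlib.compress(marshal.dumps(obj, MARSHAL_VERSION),
                         COMPRESSION_LEVEL)
    return [blob[i:i + MAX_CHUNK] for i in range(0, len(blob), MAX_CHUNK)]

def from_chunks(chunks):
    """Reassemble the chunk list (in order), decompress, and parse."""
    return marshal.loads(zlib.decompress(b''.join(chunks)))
```

Loading then costs one get per chunk instead of one Blobstore read, which is why compressing first (fewer chunks) pays off.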
On Jun 4, 2012, at 3:24 PM, Bryce Cutt wrote:

> aschmid: The ndb BlobProperty has optional compression built in (see
> ndb.model.BlobProperty). You could implement the MarshalProperty like this:
>
>     class MarshalProperty(BlobProperty):
>         def _to_base_type(self, value):
>             return marshal.dumps(value, MARSHAL_VERSION)
>
>         def _from_base_type(self, value):
>             return marshal.loads(value)
>
> Then, when you instantiate a property instance, you would specify the
> compressed option to enable compression:
>
>     prop = MarshalProperty(compressed=True)
>
> The compressed option in BlobProperty is implemented in such a way that you
> can turn it on and off and old values will still be read properly, as the
> _from_base_type() method in BlobProperty only decompresses a stored value
> if it actually was compressed.
>
> The BlobProperty uses the default compression level (and does not have an
> option to change the compression level), so if you want to use level 1 (as
> Andrin recommends) you would need to implement that in your own subclass.
>
> On Monday, June 4, 2012 7:41:14 AM UTC-7, aschmid wrote:
> is this a valid implementation?
>
>     class JsonMarshalZipProperty(ndb.BlobProperty):
>
>         def _to_base_type(self, value):
>             return zlib.compress(marshal.dumps(value, MARSHAL_VERSION))
>
>         def _from_base_type(self, value):
>             return marshal.loads(zlib.decompress(value))
>
> On Jun 4, 2012, at 9:49 AM, Andreas wrote:
>> great. how would this look for the ndb package?
>>
>> On Jun 1, 2012, at 2:40 PM, Andrin von Rechenberg wrote:
>>
>>> Hey there,
>>>
>>> If you want to store megabytes of JSON in the datastore
>>> and get it back from the datastore into Python already parsed,
>>> this post is for you.
>>>
>>> I ran a couple of performance tests where I store
>>> a 4 MB JSON object in the datastore and then get it back at
>>> a later point and process it.
>>>
>>> There are several ways to do this.
>>>
>>> Challenge 1) Serialization
>>> You need to serialize your data.
>>> For this you can use several different libraries.
>>> JSON objects can be serialized using
>>> the json lib, the cPickle lib, or the marshal lib.
>>> (These are the libraries I'm aware of at the moment.)
>>>
>>> Challenge 2) Compression
>>> If your serialized data doesn't fit into 1 MB, you need
>>> to shard your data over multiple datastore entities and
>>> manually reassemble it when loading the entities back.
>>> If you compress your serialized data before storing it,
>>> you pay the cost of compression and decompression,
>>> but you have to fetch fewer datastore entities when you
>>> want to load your data, and you have to write fewer
>>> datastore entities when you update your data if it is
>>> sharded.
>>>
>>> Solution for 1) Serialization:
>>> cPickle is very slow. It's meant to serialize real
>>> objects, not just JSON. The json lib is much faster,
>>> but compared to marshal it has no chance.
>>> The Python marshal library is definitely the
>>> way to serialize JSON. It has the best performance.
>>>
>>> Solution for 2) Compression:
>>> For my use case it makes absolute sense to
>>> compress the data the marshal lib produces
>>> before storing it in the datastore. I have gigabytes
>>> of JSON data. Compressing the data makes
>>> it about 5x smaller. Doing 5x fewer datastore
>>> operations definitely pays for the time it
>>> takes to compress and decompress the data.
>>> There are several compression levels you
>>> can choose from when using Python's zlib,
>>> from 1 (lowest compression, but fastest)
>>> to 9 (highest compression, but slowest).
>>> During my tests I found that the optimum
>>> is to compress your serialized data using
>>> zlib with level 1 compression. Higher
>>> compression takes too much CPU and
>>> the result is only marginally smaller.
>>>
>>> Here are my test results:
>>>
>>>     format   ziplvl   dump (s)   load (s)   size (bytes)
>>>     cPickle  0        1.671010   0.764567   3297275
>>>     cPickle  1        2.033570   0.874783    935327
>>>     json     0        0.595903   0.698307   2321719
>>>     json     1        0.667103   0.795470    458030
>>>     marshal  0        0.118067   0.314645   2311342
>>>     marshal  1        0.315362   0.335677    470956
>>>     marshal  2        0.318787   0.380117    457196
>>>     marshal  3        0.350247   0.364908    446085
>>>     marshal  4        0.414658   0.318973    437764
>>>     marshal  5        0.448890   0.350013    418712
>>>     marshal  6        0.516882   0.367595    409947
>>>     marshal  7        0.617210   0.315827    398354
>>>     marshal  8        1.117032   0.346452    392332
>>>     marshal  9        1.366547   0.368925    391921
>>>
>>> The results do not include datastore operations;
>>> it's just about creating a blob that can be stored
>>> in the datastore and getting the parsed data back.
>>> The "dump" and "load" times are the seconds it takes
>>> to do this on a Google App Engine F1 instance
>>> (600 MHz, 128 MB RAM).
>>>
>>> I posted this email on my blog:
>>> http://devblog.miumeet.com/2012/06/storing-json-efficiently-in-python-on.html
>>> You can also comment there or on this email thread.
>>>
>>> Enjoy,
>>> -Andrin
>>>
>>> Here is the library I created and use:
>>>
>>> #!/usr/bin/env python
>>> #
>>> # Copyright 2012 MiuMeet AG
>>> #
>>> # Licensed under the Apache License, Version 2.0 (the "License");
>>> # you may not use this file except in compliance with the License.
>>> # You may obtain a copy of the License at
>>> #
>>> #     http://www.apache.org/licenses/LICENSE-2.0
>>> #
>>> # Unless required by applicable law or agreed to in writing, software
>>> # distributed under the License is distributed on an "AS IS" BASIS,
>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>> # See the License for the specific language governing permissions and
>>> # limitations under the License.
>>> #
>>>
>>> from google.appengine.api import datastore_types
>>> from google.appengine.ext import db
>>>
>>> import zlib
>>> import marshal
>>>
>>> MARSHAL_VERSION = 2
>>> COMPRESSION_LEVEL = 1
>>>
>>> class JsonMarshalZipProperty(db.BlobProperty):
>>>   """Stores a JSON serializable object using zlib and marshal in a
>>>   db.Blob."""
>>>
>>>   def default_value(self):
>>>     return None
>>>
>>>   def get_value_for_datastore(self, model_instance):
>>>     value = self.__get__(model_instance, model_instance.__class__)
>>>     if value is None:
>>>       return None
>>>     return db.Blob(zlib.compress(marshal.dumps(value, MARSHAL_VERSION),
>>>                                  COMPRESSION_LEVEL))
>>>
>>>   def make_value_from_datastore(self, value):
>>>     if value is not None:
>>>       return marshal.loads(zlib.decompress(value))
>>>     return value
>>>
>>>   data_type = datastore_types.Blob
>>>
>>>   def validate(self, value):
>>>     return value
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Google App Engine" group.
>>> To post to this group, send email to google-appengine@googlegroups.com.
>>> To unsubscribe from this group, send email to
>>> google-appengine+unsubscr...@googlegroups.com.
>>> For more options, visit this group at
>>> http://groups.google.com/group/google-appengine?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/qKSg7YkFW5YJ.
> To post to this group, send email to google-appengine@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengine+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
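(For anyone who wants to reproduce the comparison locally: a small self-contained sketch in the spirit of Andrin's benchmark. The sample data, compression levels, and the `measure` helper are mine, and absolute numbers will not match his 4 MB / F1-instance figures; only the relative ordering should be similar.)

```python
import json
import marshal
import time
import zlib

MARSHAL_VERSION = 2

def measure(name, dumps, loads, obj, ziplvl):
    """Time one serialize(+compress) / (decompress+)parse round trip
    and return the resulting blob size."""
    start = time.time()
    blob = dumps(obj)
    if ziplvl:
        blob = zlib.compress(blob, ziplvl)
    dump_t = time.time() - start

    start = time.time()
    data = zlib.decompress(blob) if ziplvl else blob
    assert loads(data) == obj  # round trip must be lossless
    load_t = time.time() - start

    print('%-8s ziplvl %d  dump %.4fs  load %.4fs  size %d'
          % (name, ziplvl, dump_t, load_t, len(blob)))
    return len(blob)

# Repetitive JSON-like sample data, standing in for the 4 MB object.
obj = {'rows': [{'id': i, 'tag': 'item-%d' % i} for i in range(5000)]}

for lvl in (0, 1, 9):
    measure('json', lambda o: json.dumps(o).encode(),
            lambda b: json.loads(b.decode()), obj, lvl)
    measure('marshal', lambda o: marshal.dumps(o, MARSHAL_VERSION),
            marshal.loads, obj, lvl)
```

On data like this, level 1 already captures most of the size win, which is the trade-off the thread settles on.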