[google-appengine] Re: Pickling entities got faster?

2010-12-07 Thread Piotr Jaroszyński
2010/12/8 Piotr Jaroszyński :
> 12-07 04:56PM 36.559 Running pickle 100 times took 3221 megacycles
> 12-07 04:56PM 37.152 Running depickle 100 times took 1631 megacycles
>
> 12-07 04:56PM 39.778 Running protobufy 100 times took 7339 megacycles
> 12-07 04:56PM 40.424 Running pickle_protobufed 100 times took 1742 megacycles
> 12-07 04:56PM 41.446 Running depickle_protobufed 100 times took 2843 
> megacycles
> 12-07 04:56PM 43.823 Running deprotobufy 100 times took 6634 megacycles

Sorry, the results above were from a version without the .Encode() call
in protobufy(). Updated:

12-07 05:28PM 34.018 Running pickle 100 times took 3513 megacycles
12-07 05:28PM 34.679 Running depickle 100 times took 1816 megacycles

12-07 05:28PM 37.710 Running protobufy 100 times took 8471 megacycles
12-07 05:28PM 37.878 Running pickle_protobufed 100 times took 391 megacycles
12-07 05:28PM 37.958 Running depickle_protobufed 100 times took 222 megacycles
12-07 05:28PM 41.111 Running deprotobufy 100 times took 8825 megacycles
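
For readers not used to App Engine's CPU accounting: these numbers are "megacycles", and the quota API documents a conversion of roughly 1,200 megacycles per CPU second (production exposes quota.megacycles_to_cpu_seconds() for this). A minimal standalone helper, assuming that documented rate, for turning the figures above into seconds:

```python
# Convert App Engine "megacycles" into approximate CPU seconds.
# Assumes App Engine's documented rate of 1,200 megacycles per CPU
# second (what quota.megacycles_to_cpu_seconds() uses in the SDK).

def megacycles_to_cpu_seconds(mcycles):
    """Approximate CPU seconds for a given megacycle count."""
    return mcycles / 1200.0

# The 8825-megacycle deprotobufy run above works out to about 7.35
# CPU seconds per 100 entities.
```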

-- 
Best Regards
Piotr Jaroszyński

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: Pickling entities got faster?

2010-12-07 Thread Piotr Jaroszyński
2010/12/8 Piotr Jaroszyński :
> 2010/12/8 Piotr Jaroszyński :
>> 12-07 04:56PM 36.559 Running pickle 100 times took 3221 megacycles
>> 12-07 04:56PM 37.152 Running depickle 100 times took 1631 megacycles
>>
>> 12-07 04:56PM 39.778 Running protobufy 100 times took 7339 megacycles
>> 12-07 04:56PM 40.424 Running pickle_protobufed 100 times took 1742 megacycles
>> 12-07 04:56PM 41.446 Running depickle_protobufed 100 times took 2843 
>> megacycles
>> 12-07 04:56PM 43.823 Running deprotobufy 100 times took 6634 megacycles
>
> Sorry but the above results were from a version w/o the .Encode() in
> protobufy(): Updated:
>
> 12-07 05:28PM 34.018 Running pickle 100 times took 3513 megacycles
> 12-07 05:28PM 34.679 Running depickle 100 times took 1816 megacycles
>
> 12-07 05:28PM 37.710 Running protobufy 100 times took 8471 megacycles
> 12-07 05:28PM 37.878 Running pickle_protobufed 100 times took 391 megacycles
> 12-07 05:28PM 37.958 Running depickle_protobufed 100 times took 222 megacycles
> 12-07 05:28PM 41.111 Running deprotobufy 100 times took 8825 megacycles

To be clear, these results are from production (dev server doesn't
even seem to emulate the quota api).
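
If you want to run the same handler locally anyway, one option is a guarded timer that falls back to wall-clock time when the quota API can't be imported. This is a sketch under that assumption (the fallback's megacycle figure is a rough wall-time approximation at the nominal 1,200 megacycles per CPU second, not what production would report):

```python
import time

try:
    # Available in production; the dev server doesn't emulate it.
    from google.appengine.api import quota
    get_cpu = quota.get_request_cpu_usage  # returns megacycles
except ImportError:
    # Local fallback: approximate megacycles from wall-clock time at
    # App Engine's nominal 1,200 megacycles per CPU second.
    def get_cpu():
        return int(time.time() * 1200)

def time_it(func, n=100):
    """Return (megacycles, wall_seconds) spent running func n times."""
    t0, c0 = time.time(), get_cpu()
    for _ in range(n):
        func()
    return get_cpu() - c0, time.time() - t0
```

This mirrors the shape of the time_it() method in the benchmark handler further down the thread, minus the response-writing.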

-- 
Best Regards
Piotr Jaroszyński




[google-appengine] Re: Pickling entities got faster?

2010-12-08 Thread Piotr Jaroszyński
Hello again,

(CCing appengine-python as it seems to be a more appropriate forum)

I have extended the benchmark a little and I am starting to suspect
that protobuf in fact got slower, rather than pickle getting faster.

Data:
1) a list of 100 randomly generated new entities (not saved in datastore)
2) a list of 100 entities fetched from the datastore

Two approaches:

1) pickle only
Serialization:
- pickle the list of entities (called pickle in the results)
Deserialization:
- depickle the list (called depickle)

2) protobuf + pickle.
This one is more complex.
Serialization:
- run db.model_to_protobuf() on the data to get a PB (called protobufy
in the results)
- run Encode() on the PBs to get a string from each PB (called protobufy_encode)
- run pickle on the list of strings (called pickle_protobufed)
Deserialization:
- run depickle to get a list of strings encoding the PBs (called
depickle_protobufed)
- run pb.FromString() to decode the strings (called deprotobufy_decode)
- run db.model_from_protobuf() to get the entities (called deprotobufy)
Note that the last two steps can be done by db.model_from_protobuf()
alone (it has an isinstance check for str), but by separating them we
get more data.
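
The two pipelines above can be sketched with plain pickle and a stand-in entity class. To be clear about what is illustrative here: on App Engine the encode step would be db.model_to_protobuf(e).Encode() and the decode step db.model_from_protobuf(...), while the Entity class and its encode()/decode() below are made up purely so the shape of the round-trips is runnable anywhere:

```python
import pickle

class Entity(object):
    """Stand-in for a db.Model instance (illustrative only)."""
    def __init__(self, name, lo, hi):
        self.name, self.lo, self.hi = name, lo, hi

    def encode(self):
        # Plays the role of db.model_to_protobuf(self).Encode().
        return pickle.dumps((self.name, self.lo, self.hi))

    @classmethod
    def decode(cls, s):
        # Plays the role of db.model_from_protobuf() on an encoded string.
        return cls(*pickle.loads(s))

entities = [Entity("testdata_%02d" % i, i, i * 2) for i in range(100)]

# Approach 1: pickle the entity objects directly.
blob1 = pickle.dumps(entities)
restored1 = pickle.loads(blob1)

# Approach 2: encode each entity to a string first, then pickle the
# (cheap-to-pickle) list of strings; decoding reverses both steps.
blob2 = pickle.dumps([e.encode() for e in entities])
restored2 = [Entity.decode(s) for s in pickle.loads(blob2)]
```

The benchmark's point is where the cycles go inside each of these steps, not the round-trip itself, which both approaches complete identically.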

Results:

1) Using data from datastore

Pickle only:
Running pickle 100 times took 10776 megacycles (3.848261 secs real time)
Running depickle 100 times took 7673 megacycles (2.740252 secs real time)

Protobuf + pickle:
Running protobufy 100 times took 7920 megacycles (2.828590 secs real time)
Running protobufy_encode 100 times took 1227 megacycles (0.438058 secs
real time)
Running pickle_protobufed 100 times took 507 megacycles (0.181028 secs
real time)
Running depickle_protobufed 100 times took 118 megacycles (0.042228
secs real time)
Running deprotobufy_decode 100 times took 2679 megacycles (0.956619
secs real time)
Running deprotobufy 100 times took 9154 megacycles (3.268975 secs real time)


2) Using new random data

Pickle only:
Running pickle 100 times took 3768 megacycles (1.345945 secs real time)
Running depickle 100 times took 1616 megacycles (0.577406 secs real time)

Protobuf + pickle:
Running protobufy 100 times took 9328 megacycles (3.331145 secs real time)
Running protobufy_encode 100 times took 1106 megacycles (0.394689 secs
real time)
Running pickle_protobufed 100 times took 413 megacycles (0.147686 secs
real time)
Running depickle_protobufed 100 times took 90 megacycles (0.032271
secs real time)
Running deprotobufy_decode 100 times took 2300 megacycles (0.821574
secs real time)
Running deprotobufy 100 times took 8120 megacycles (2.899931 secs real time)

We can see a few things here:

1) Pickling/depickling data from the datastore is much slower than new
data - probably because the fetched entities carry more saved state
2) Protobuf's Encode and Decode are pretty fast and probably
implemented in native code
3) The slow part of the protobuf approach seems to be the entity <->
ProtocolBuffer conversion, which appears to be done solely in Python

Has it always been like this? Currently using protobufs seems to be
slower than pickle, at least for deserialization - and that is the more
important part when you use serialization for memcache, for example.

==

Benchmark handler:

import logging
import pickle
import time
import random

from google.appengine.ext import db
from google.appengine.api import quota
from google.appengine.datastore import entity_pb
from tipfy import RequestHandler
from tipfy import Response

class TestData(db.Model):
    name = db.StringProperty(required=True)
    min = db.IntegerProperty(required=True)
    max = db.IntegerProperty(required=True)

    def __unicode__(self):
        return self.name

class CachingTestHandler(RequestHandler):
    def random_data(self):
        for i in xrange(0, 100):
            name = "testdata_%02d" % random.randint(1, 99)
            td = TestData(name=name, min=random.randint(1, 100),
                          max=random.randint(1, 100))
            yield td

    def add_data(self):
        for d in self.random_data():
            d.put()

        return self.redirect_to('test/caching/benchmark')

    def writeandlog(self, msg, *args):
        msg = msg % args
        logging.debug(msg)
        self.response.data += msg + '\n'

    def time_it(self, name, func, n=100):
        tt = -time.time()
        t = -quota.get_request_cpu_usage()
        for i in xrange(0, n):
            func()
        t += quota.get_request_cpu_usage()
        tt += time.time()

        self.writeandlog("Running %s %d times took %d megacycles "
                         "(%f secs real time)", name, n, t, tt)

    def benchmark(self, from_ds=0):
        self.response = Response("")

        if from_ds:
            self.writeandlog("Using data from datastore")
            data = list(TestData.all())
        else:
            self.writeandlog("Using new random data")
            data = list(self.random_data())

        def protobufy(data):
            return [db.model_to_protobuf(x) for x in data]

        def protobufy_encode(data):
            return [x.Encode() for x in data]