My patch to improve string encoding performance is now available for review here. The result is a 13%-27% improvement on the ProtoBench files included in SVN. It is faster than the JDK's encoder because it allocates significantly less memory (JDK best case: 5X the string length; my best case: the string length + 64 bytes). It also eliminates one copy but adds another (a copy of the String's character data), so those probably cancel out.

http://codereview.appspot.com/949044
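The patch itself is in the code review above; the sketch below is not its code. It only illustrates the general idea of encoding UTF-8 by hand into a small, reusable buffer instead of calling `String.getBytes`, which allocates a fresh `byte[]` on every call. The class name `Utf8Sketch`, the 64-byte starting buffer, and writing `'?'` for unpaired surrogates (the same replacement the JDK's UTF-8 encoder uses) are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch (not the patch's FastStringEncoder): UTF-8 into a caller-owned byte[]. */
public final class Utf8Sketch {
    private Utf8Sketch() {}

    /** Number of bytes the UTF-8 encoding of s occupies, computed without allocating. */
    public static int encodedLength(String s) {
        int n = s.length();
        int len = 0;
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                len += 1;
            } else if (c < 0x800) {
                len += 2;
            } else if (Character.isSurrogate(c)) {
                if (Character.isHighSurrogate(c) && i + 1 < n
                        && Character.isLowSurrogate(s.charAt(i + 1))) {
                    len += 4; // a surrogate pair is one supplementary code point
                    i++;
                } else {
                    len += 1; // unpaired surrogate is written as '?'
                }
            } else {
                len += 3;
            }
        }
        return len;
    }

    /** Writes s into buf at offset; returns the index just past the last byte written. */
    public static int encode(String s, byte[] buf, int offset) {
        int n = s.length();
        int p = offset;
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                buf[p++] = (byte) c;
            } else if (c < 0x800) {
                buf[p++] = (byte) (0xC0 | (c >>> 6));
                buf[p++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isSurrogate(c)) {
                if (Character.isHighSurrogate(c) && i + 1 < n
                        && Character.isLowSurrogate(s.charAt(i + 1))) {
                    int cp = Character.toCodePoint(c, s.charAt(++i));
                    buf[p++] = (byte) (0xF0 | (cp >>> 18));
                    buf[p++] = (byte) (0x80 | ((cp >>> 12) & 0x3F));
                    buf[p++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
                    buf[p++] = (byte) (0x80 | (cp & 0x3F));
                } else {
                    buf[p++] = (byte) '?';
                }
            } else {
                buf[p++] = (byte) (0xE0 | (c >>> 12));
                buf[p++] = (byte) (0x80 | ((c >>> 6) & 0x3F));
                buf[p++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return p;
    }

    public static void main(String[] args) {
        byte[] scratch = new byte[64]; // reused across calls; grown only when a string needs more
        String[] samples = {"hello", "caf\u00e9", "\u20ac100", "\ud83d\ude00 ok"};
        for (String s : samples) {
            int len = encodedLength(s);
            if (len > scratch.length) {
                scratch = new byte[len];
            }
            int end = encode(s, scratch, 0);
            // Verification only: compare against the JDK's own encoder.
            byte[] expected = s.getBytes(StandardCharsets.UTF_8);
            boolean same = end == len && Arrays.equals(Arrays.copyOf(scratch, end), expected);
            System.out.println(s.length() + " chars -> " + len + " bytes, matches JDK: " + same);
        }
    }
}
```

Comparing against `getBytes(StandardCharsets.UTF_8)`, as `main` does, is also a cheap way to look for the kind of encoding bugs mentioned below.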

The patch was designed not to change the lite runtime at all, so it includes a somewhat hacky class called FastStringEncoder containing methods that really belong in CodedOutputStream.
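For context on why such helpers sit naturally next to CodedOutputStream: in the protobuf wire format a string field is written as a tag (field number shifted left 3, OR'd with wire type 2), then the UTF-8 length in bytes as a varint, then the bytes themselves. Since the length comes first, an encoder that skips the temporary `byte[]` must know the UTF-8 length before writing the data. The minimal sketch below shows only that layout; `StringFieldSketch` and its methods are hypothetical and are not CodedOutputStream's API, and it uses `String.getBytes` for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the wire layout of a protobuf string field. */
public final class StringFieldSketch {
    private static final int WIRETYPE_LENGTH_DELIMITED = 2;

    private StringFieldSketch() {}

    /** Base-128 varint: 7 bits per byte, low bits first, high bit set on all but the last byte.
     *  Intended for non-negative values such as tags and lengths. */
    public static void writeVarint32(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Tag, then UTF-8 byte length, then the UTF-8 bytes. */
    public static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8); // the per-call allocation a faster encoder avoids
        writeVarint32(out, (fieldNumber << 3) | WIRETYPE_LENGTH_DELIMITED);
        writeVarint32(out, utf8.length); // length in bytes, not chars
        out.write(utf8, 0, utf8.length);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeStringField(out, 1, "caf\u00e9");
        for (byte b : out.toByteArray()) {
            System.out.printf("%02x ", b & 0xFF);
        }
        System.out.println(); // prints: 0a 05 63 61 66 c3 a9
    }
}
```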

I think this patch would be a good addition to the protocol buffer library, although my UTF-8 encoding code may contain bugs. So I won't be disappointed if it is rejected for the protocol buffer distribution, but I will try to maintain the patch.

I have more detailed performance results, if anyone cares.

Evan



Detailed results for speed messages:

ORIGINAL
Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat
Serialize to byte string: 21006530 iterations in 32.088s; 142.34642MB/s
Serialize to byte array: 19310791 iterations in 29.529s; 142.19565MB/s
Serialize to memory stream: 19679249 iterations in 32.203s; 132.87619MB/s
Serialize to /dev/null with FileOutputStream: 15728640 iterations in 29.929s; 114.27044MB/s
Serialize to /dev/null reusing FileOutputStream: 14796462 iterations in 27.534s; 116.848595MB/s
Serialize to /dev/null with FileChannel: 18961591 iterations in 31.51s; 130.84625MB/s
Serialize to /dev/null reusing FileChannel: 19157904 iterations in 30.755s; 135.44632MB/s

Benchmarking benchmarks.GoogleSpeed$SpeedMessage2 with file google_message2.dat
Serialize to byte string: 46108 iterations in 26.724s; 139.15257MB/s
Serialize to byte array: 50547 iterations in 28.874s; 141.19029MB/s
Serialize to memory stream: 48282 iterations in 29.776s; 130.77818MB/s
Serialize to /dev/null with FileOutputStream: 50505 iterations in 28.799s; 141.44037MB/s
Serialize to /dev/null reusing FileOutputStream: 51478 iterations in 30.064s; 138.09926MB/s
Serialize to /dev/null with FileChannel: 51328 iterations in 29.668s; 139.53477MB/s
Serialize to /dev/null reusing FileChannel: 48454 iterations in 27.46s; 142.31332MB/s


OPTIMIZED
Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat
Serialize to byte string: 24207218 iterations in 29.098s; 180.89088MB/s
Serialize to byte array: 24480373 iterations in 29.937s; 177.8053MB/s
Serialize to memory stream: 22928046 iterations in 30.515s; 163.37613MB/s
Serialize to /dev/null with FileOutputStream: 20242779 iterations in 29.626s; 148.57033MB/s
Serialize to /dev/null reusing FileOutputStream: 19803135 iterations in 27.7s; 155.44943MB/s
Serialize to /dev/null with FileChannel: 25135661 iterations in 34.242s; 159.61221MB/s
Serialize to /dev/null reusing FileChannel: 22421439 iterations in 29.61s; 164.64934MB/s

Benchmarking benchmarks.GoogleSpeed$SpeedMessage2 with file google_message2.dat
Serialize to byte string: 58071 iterations in 29.694s; 157.72736MB/s
Serialize to byte array: 56888 iterations in 29.112s; 157.60321MB/s
Serialize to memory stream: 53171 iterations in 29.709s; 144.34547MB/s
Serialize to /dev/null with FileOutputStream: 58154 iterations in 29.968s; 156.5086MB/s
Serialize to /dev/null reusing FileOutputStream: 57880 iterations in 29.779s; 156.75984MB/s
Serialize to /dev/null with FileChannel: 55803 iterations in 28.881s; 155.83382MB/s
Serialize to /dev/null reusing FileChannel: 59563 iterations in 30.668s; 156.64175MB/s


Detailed results for size messages:

ORIGINAL
Benchmarking benchmarks.GoogleSize$SizeMessage1 with file google_message1.dat
Serialize to byte string: 2789755 iterations in 29.686s; 20.433807MB/s
Serialize to byte array: 2748801 iterations in 29.597s; 20.194382MB/s
Serialize to memory stream: 2702515 iterations in 28.65s; 20.510603MB/s
Serialize to /dev/null with FileOutputStream: 2716518 iterations in 29.376s; 20.107351MB/s
Serialize to /dev/null reusing FileOutputStream: 2507755 iterations in 28.299s; 19.268545MB/s
Serialize to /dev/null with FileChannel: 2809689 iterations in 31.171s; 19.599386MB/s
Serialize to /dev/null reusing FileChannel: 2764260 iterations in 29.827s; 20.151354MB/s

Benchmarking benchmarks.GoogleSize$SizeMessage2 with file google_message2.dat
Serialize to byte string: 6530 iterations in 27.688s; 19.021206MB/s
Serialize to byte array: 7303 iterations in 30.9s; 19.061596MB/s
Serialize to memory stream: 6918 iterations in 30.389s; 18.360332MB/s
Serialize to /dev/null with FileOutputStream: 7154 iterations in 31.094s; 18.556187MB/s
Serialize to /dev/null reusing FileOutputStream: 6707 iterations in 28.757s; 18.810535MB/s
Serialize to /dev/null with FileChannel: 6887 iterations in 28.743s; 19.324774MB/s
Serialize to /dev/null reusing FileChannel: 7373 iterations in 31.919s; 18.629936MB/s



OPTIMIZED
Benchmarking benchmarks.GoogleSize$SizeMessage1 with file google_message1.dat
Serialize to byte string: 3432701 iterations in 29.986s; 24.891575MB/s
Serialize to byte array: 3455325 iterations in 30.373s; 24.73638MB/s
Serialize to memory stream: 3398582 iterations in 30.742s; 24.038122MB/s
Serialize to /dev/null with FileOutputStream: 2932259 iterations in 28.331s; 22.504812MB/s
Serialize to /dev/null reusing FileOutputStream: 2779893 iterations in 26.785s; 22.566872MB/s
Serialize to /dev/null with FileChannel: 3129454 iterations in 28.526s; 23.854078MB/s
Serialize to /dev/null reusing FileChannel: 3183935 iterations in 28.779s; 24.056MB/s

Benchmarking benchmarks.GoogleSize$SizeMessage2 with file google_message2.dat
Serialize to byte string: 6497 iterations in 26.656s; 19.657772MB/s
Serialize to byte array: 7231 iterations in 29.827s; 19.552631MB/s
Serialize to memory stream: 6643 iterations in 27.582s; 19.424726MB/s
Serialize to /dev/null with FileOutputStream: 7078 iterations in 27.844s; 20.501957MB/s
Serialize to /dev/null reusing FileOutputStream: 7434 iterations in 30.969s; 19.360287MB/s
Serialize to /dev/null with FileChannel: 6988 iterations in 29.144s; 19.338385MB/s
Serialize to /dev/null reusing FileChannel: 7279 iterations in 30.338s; 19.3509MB/s
Deserialize from byte string: 5254 iterations in 29.942s; 14.152257MB/s
Deserialize from byte array: 5429 iterations in 30.481s; 14.3650465MB/s
Deserialize from memory stream: 6156 iterations in 32.337s; 15.353779MB/s
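The MB/s figures above can be turned into relative changes with simple arithmetic, (optimized / original - 1) x 100. The sketch below applies that to the "Serialize to byte string" line of each message. Which rows the 13%-27% summary was based on is not stated, so treat this as a way to check individual rows, not as a reproduction of that summary; `SpeedupFromResults` is a hypothetical name.

```java
/** Hypothetical helper: relative throughput change between two benchmark runs. */
public final class SpeedupFromResults {
    private SpeedupFromResults() {}

    /** Percentage change from before to after; positive means the optimized run was faster. */
    public static double percentChange(double beforeMBps, double afterMBps) {
        return (afterMBps / beforeMBps - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {"SpeedMessage1", "SpeedMessage2", "SizeMessage1", "SizeMessage2"};
        // {ORIGINAL MB/s, OPTIMIZED MB/s}, copied from the "Serialize to byte string" lines
        double[][] mbps = {
            {142.34642, 180.89088},
            {139.15257, 157.72736},
            {20.433807, 24.891575},
            {19.021206, 19.657772},
        };
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: %+.1f%%%n", labels[i], percentChange(mbps[i][0], mbps[i][1]));
        }
    }
}
```

The same function applies to any other pair of rows, such as the "Serialize to byte array" or /dev/null lines.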

--
Evan Jones
http://evanjones.ca/

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
