Recursive #validate() for union'ed schemas in Ruby cripples performance
-----------------------------------------------------------------------

                 Key: AVRO-654
                 URL: https://issues.apache.org/jira/browse/AVRO-654
             Project: Avro
          Issue Type: Bug
          Components: ruby
    Affects Versions: 1.3.3
            Reporter: Philip (flip) Kromer


The Ruby DatumWriter calls #validate() on each #write(). For a schema with
many nested unions (cf. Cassandra's*), this requires a recursive depth-first
search to determine which branch of each union to take. In Ruby these
operations are very expensive: enough to limit write speed to about 2k
writes/sec on a machine of moderate size.
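To make the cost concrete, here is a simplified model of validate-then-write in plain Ruby. It is a hypothetical sketch, not the avro gem's DatumWriter. Schemas are bare Ruby values (strings for primitives, arrays for unions, hashes for records), the output is a list of tokens rather than Avro binary, and the `NaiveWriter` name and the example schema are made up for illustration.

```ruby
# Simplified model of per-write validation (hypothetical; not the avro gem's
# code). Schemas are plain Ruby values:
#   "null" / "long" / "string"            -> primitives
#   Array of schemas                      -> union
#   { "fields" => [[name, schema], ...] } -> record
module NaiveWriter
  @validate_calls = 0

  class << self
    attr_accessor :validate_calls

    # Recursive depth-first validation; a union is valid if any branch is.
    def valid?(schema, datum)
      self.validate_calls += 1
      case schema
      when "null"   then datum.nil?
      when "long"   then datum.is_a?(Integer)
      when "string" then datum.is_a?(String)
      when Array    then schema.any? { |branch| valid?(branch, datum) }
      when Hash
        datum.is_a?(Hash) &&
          schema["fields"].all? { |name, field_schema| valid?(field_schema, datum[name]) }
      else raise ArgumentError, "unknown schema #{schema.inspect}"
      end
    end

    # Mirrors "validate on each write": check the whole datum, then write it.
    def write_datum(schema, datum, out = [])
      raise ArgumentError, "datum does not match schema" unless valid?(schema, datum)
      write(schema, datum, out)
    end

    private

    # Emits tokens rather than real Avro bytes, to keep the sketch small.
    def write(schema, datum, out)
      case schema
      when Array
        # Choosing a union branch re-runs full validation on each candidate.
        index = schema.index { |branch| valid?(branch, datum) }
        out << index
        write(schema[index], datum, out)
      when Hash
        schema["fields"].each { |name, field_schema| write(field_schema, datum[name], out) }
      else
        out << datum
      end
      out
    end
  end
end

ADDRESS = { "fields" => [["street", ["null", "string"]], ["zip", ["null", "long"]]] }
PERSON  = { "fields" => [["id", "long"], ["name", ["null", "string"]], ["address", ["null", ADDRESS]]] }

datum  = { "id" => 1, "name" => "flip", "address" => { "street" => "Main", "zip" => 78701 } }
tokens = NaiveWriter.write_datum(PERSON, datum)
# tokens                    => [1, 1, "flip", 1, 1, "Main", 1, 78701]
# NaiveWriter.validate_calls => 28
```

In this model, writing one small record with three unions costs 28 validation calls, and the `address` record is fully validated twice: once by the top-level check and once more while choosing the `address` union branch. Each additional level of union nesting re-validates the subtree beneath it yet again.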

For repeated writing of the same data structure, one idea would be to create a
CompiledDatumWriter. It would walk through the validation once and assemble a
tree of the methods to apply to each schema element in turn, e.g.:
  [ [:write_long, 'id'], [:write_bytes, 'name'],
    [:write_record, 'address', [:write_long, 'street']] ]
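A minimal sketch of the CompiledDatumWriter idea, under stated assumptions: the schema is walked once and turned into nested Ruby lambdas, which play the role of the `[:write_long, 'id']` tree above, so each later write only runs the closures. The class is hypothetical and not part of the avro gem. Schemas use a made-up simplified form (strings for primitives, arrays for unions, hashes with a `fields` list for records); only null, long, string, unions and records are handled; and union branches are picked with a shallow type check (nil, Integer, String, Hash) instead of full recursive validation, which assumes at most one branch per Ruby type. The bytes follow the Avro binary encoding for these types: zig-zag varint longs, length-prefixed strings, the zero-based union branch index before the value, and record fields in declaration order.

```ruby
# Hypothetical sketch of a "compiled" writer; not part of the avro gem.
# Simplified schema form (an assumption of this sketch):
#   "null" / "long" / "string"            -> primitives
#   Array of schemas                      -> union
#   { "fields" => [[name, schema], ...] } -> record
class CompiledDatumWriter
  def initialize(schema)
    @writer = compile(schema)  # walk the schema once, up front
  end

  # Encode one datum using the precompiled closures; returns a binary String.
  def write(datum)
    bytes = []
    @writer.call(datum, bytes)
    bytes.pack("C*")
  end

  private

  # Turn a schema into a lambda ->(datum, bytes) that appends the datum's
  # Avro binary encoding to the byte array.
  def compile(schema)
    case schema
    when "null" then ->(_datum, _bytes) {}  # null encodes to zero bytes
    when "long" then ->(datum, bytes) { write_long(datum, bytes) }
    when "string"
      ->(datum, bytes) do
        raw = datum.b                     # raw bytes of the string
        write_long(raw.bytesize, bytes)   # length prefix
        bytes.concat(raw.bytes)
      end
    when Array
      # Union: pair each branch with a cheap, shallow type check instead of
      # re-validating the whole subtree on every write.
      branches = schema.map { |branch| [shallow_matcher(branch), compile(branch)] }
      ->(datum, bytes) do
        index = branches.index { |matches, _| matches.call(datum) }
        raise ArgumentError, "no union branch for #{datum.inspect}" if index.nil?
        write_long(index, bytes)          # zero-based branch position
        branches[index][1].call(datum, bytes)
      end
    when Hash
      fields = schema["fields"].map { |name, field_schema| [name, compile(field_schema)] }
      ->(datum, bytes) { fields.each { |name, writer| writer.call(datum[name], bytes) } }
    else
      raise ArgumentError, "unsupported schema #{schema.inspect}"
    end
  end

  # Assumes at most one union branch per Ruby type (e.g. a single record branch).
  def shallow_matcher(schema)
    case schema
    when "null"   then ->(datum) { datum.nil? }
    when "long"   then ->(datum) { datum.is_a?(Integer) }
    when "string" then ->(datum) { datum.is_a?(String) }
    when Hash     then ->(datum) { datum.is_a?(Hash) }
    else raise ArgumentError, "unsupported union branch #{schema.inspect}"
    end
  end

  # Avro long: zig-zag, then little-endian base-128 varint.
  def write_long(n, bytes)
    n = (n << 1) ^ (n >> 63)
    while n > 0x7F
      bytes << ((n & 0x7F) | 0x80)
      n >>= 7
    end
    bytes << n
  end
end

ADDRESS = { "fields" => [["street", ["null", "string"]], ["zip", ["null", "long"]]] }
PERSON  = { "fields" => [["id", "long"], ["name", ["null", "string"]], ["address", ["null", ADDRESS]]] }

writer  = CompiledDatumWriter.new(PERSON)  # schema walked once
datum   = { "id" => 1, "name" => "flip", "address" => { "street" => "Main", "zip" => nil } }
encoded = writer.write(datum)
# encoded.bytes => [2, 2, 8, 102, 108, 105, 112, 2, 2, 8, 77, 97, 105, 110, 0]
```

The trade-off is that a shallow check no longer proves the datum matches the whole schema up front; a malformed nested value is only caught, if at all, when its closure tries to encode it.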

---
* http://github.com/infochimps/cassandra/blob/beta1_plus_patches/interface/avro/cassandra.avpr
