[ 
https://issues.apache.org/jira/browse/CRUNCH-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Ruppert updated CRUNCH-603:
----------------------------------
    Attachment: 0001-TupleWritable-reuse-Writable-instances-where-possibl.patch

> Cache constituent Writables inside TupleWritable `readField` call
> -----------------------------------------------------------------
>
>                 Key: CRUNCH-603
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-603
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.13.0
>            Reporter: Steven Ruppert
>            Assignee: Josh Wills
>            Priority: Minor
>         Attachments: 
> 0001-TupleWritable-reuse-Writable-instances-where-possibl.patch
>
>
> Currently, `TupleWritable.readFields` will, for every field in the tuple, 
> create a new Writable of that field type using reflection 
> (`WritableFactories.newInstance`), through `TupleWritable.getWritable`, in 
> order to deserialize that field. This burns up an unfortunate amount of CPU 
> time.
> I've got a patch for this that caches the writables to be reused (just as the 
> TupleWritable itself is reused throughout Hadoop). It appears to work, at 
> least for our cases. I think it will break if you ever have heterogeneous 
> tuple types, but that seems like a bad idea, if not already proscribed in the 
> documentation somewhere.
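
The caching idea described above can be sketched roughly as follows. This is not the attached patch: `Field`, `IntField`, `CachingTuple`, and `CachingTupleDemo` are hypothetical names, `Field` stands in for Hadoop's `Writable` so the example compiles without Hadoop, and a per-slot `Supplier` stands in for the reflective `WritableFactories.newInstance` call. It also assumes each slot always holds the same field type, which is the homogeneity assumption mentioned in the report.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (read side only).
interface Field {
    void readFields(DataInput in) throws IOException;
}

// Hypothetical field type holding a single int.
final class IntField implements Field {
    int value;

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }
}

// Hypothetical tuple that allocates each slot's instance once and reuses it.
final class CachingTuple {
    private final List<Supplier<? extends Field>> factories;
    private final Field[] cached;
    private int allocations;

    CachingTuple(List<Supplier<? extends Field>> factories) {
        this.factories = factories;
        this.cached = new Field[factories.size()];
    }

    void readFields(DataInput in) throws IOException {
        for (int i = 0; i < cached.length; i++) {
            if (cached[i] == null) {
                // Only the first record pays the instantiation cost for this slot.
                cached[i] = factories.get(i).get();
                allocations++;
            }
            // Later records overwrite the cached instance in place.
            cached[i].readFields(in);
        }
    }

    Field get(int i) {
        return cached[i];
    }

    int allocations() {
        return allocations;
    }
}

final class CachingTupleDemo {
    // Reads two records, (1, 2) then (3, 4), through the same tuple.
    // Returns {slot 0 value, slot 1 value, allocation count, 1 if slot 0 was reused else 0}.
    static int[] run() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int v = 1; v <= 4; v++) {
                out.writeInt(v);
            }
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));

            List<Supplier<? extends Field>> factories = new ArrayList<>();
            factories.add(IntField::new);
            factories.add(IntField::new);
            CachingTuple tuple = new CachingTuple(factories);

            tuple.readFields(in);
            Field firstSeen = tuple.get(0);
            tuple.readFields(in);

            int reused = (tuple.get(0) == firstSeen) ? 1 : 0;
            return new int[] {
                ((IntField) tuple.get(0)).value,
                ((IntField) tuple.get(1)).value,
                tuple.allocations(),
                reused
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, two records are read with only two allocations in total, one per slot. The trade-off is the usual one for object reuse: a value obtained from the tuple is overwritten by the next `readFields` call, so a caller that needs to keep it must copy it first.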



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
