[ https://issues.apache.org/jira/browse/AVRO-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962954#comment-13962954 ]

Tom White commented on AVRO-1402:
---------------------------------

There has been some further (offline) discussion about whether it would be 
possible to store the scale in the Avro schema rather than in the data, for 
efficiency reasons. Something like:

{code}
{
  "type":"record",
  "name":"org.apache.avro.FixedDecimal",
  "fields": [{
    "name":"value",
    "type":"bytes"
  }],
  "scale":"2",
  "precision":"4"
}
{code}
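
For concreteness, here is a minimal sketch (not the committed patch; the class 
and method names are illustrative) of how a reader could decode such a record, 
taking the unscaled value from the data and the scale from the schema property:

{code}
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

public class FixedDecimalSketch {
  // Combine the unscaled bytes stored in the record with the scale
  // declared on the schema itself.
  static BigDecimal fromRecord(GenericRecord record) {
    Schema schema = record.getSchema();
    int scale = Integer.parseInt(schema.getProp("scale"));
    ByteBuffer buf = ((ByteBuffer) record.get("value")).duplicate();
    byte[] bytes = new byte[buf.remaining()];
    buf.get(bytes);
    return new BigDecimal(new BigInteger(bytes), scale);
  }
}
{code}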

In the implementation committed here, the name does not uniquely determine the 
RecordMapping, so a FixedDecimal(4, 2) has a different RecordMapping from a 
FixedDecimal(3, 0). GenericData has a map of name to RecordMapping, so 
org.apache.avro.FixedDecimal would map to either FixedDecimalRecordMapping(4, 
2) or FixedDecimalRecordMapping(3, 0), but not both.
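
A hypothetical illustration of the collision (the map here stands in for 
GenericData's internal one, and FixedDecimalRecordMapping is a stand-in for 
the class from the patch):

{code}
import java.util.HashMap;
import java.util.Map;

public class MappingCollision {
  // Stand-in for the patch's FixedDecimalRecordMapping, holding
  // (precision, scale) as in the examples above.
  static class FixedDecimalRecordMapping {
    final int precision, scale;
    FixedDecimalRecordMapping(int precision, int scale) {
      this.precision = precision;
      this.scale = scale;
    }
  }

  public static void main(String[] args) {
    // Stand-in for GenericData's map of full name to RecordMapping.
    Map<String, FixedDecimalRecordMapping> mappings =
        new HashMap<String, FixedDecimalRecordMapping>();
    mappings.put("org.apache.avro.FixedDecimal", new FixedDecimalRecordMapping(4, 2));
    // A second put under the same full name replaces the first, so the
    // (4, 2) and (3, 0) mappings cannot both be registered:
    mappings.put("org.apache.avro.FixedDecimal", new FixedDecimalRecordMapping(3, 0));
    System.out.println(mappings.get("org.apache.avro.FixedDecimal").scale); // prints 0
  }
}
{code}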

We could solve this problem by making FixedDecimalRecordMapping stateless and 
having the read and write methods obtain the scale from the record schema they 
are passed. However, consider the case where there are multiple decimals (with 
different scales) in a single schema. Since you can't redefine a type multiple 
times (http://avro.apache.org/docs/1.7.6/spec.html#Names), the first occurrence 
serves as the definition, and later ones are just references:

{code}
{"type":"record","name":"rec","fields":[
  
{"name":"dec1","type":{"type":"record","name":"org.apache.avro.FixedDecimal","fields":[{"name":"value","type":"bytes"}],"scale":"2","precision":"4"}},
  
{"name":"dec2","type":"org.apache.avro.FixedDecimal","precision":"3","scale":"0"}
]} 
{code}

When GenericDatumReader/Writer is processing dec2, the scale it sees is 2, not 
0, since the read/write method sees the shared record schema, and dec2's scale 
exists only as a field-level property. I can't see a simple way around this.
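
The effect can be demonstrated with a small self-contained sketch using the 
standard Schema.Parser (the prop lookups are illustrative of what a read/write 
method would see):

{code}
import org.apache.avro.Schema;

public class ScaleLookup {
  public static void main(String[] args) {
    String json = "{\"type\":\"record\",\"name\":\"rec\",\"fields\":["
        + "{\"name\":\"dec1\",\"type\":{\"type\":\"record\","
        + "\"name\":\"org.apache.avro.FixedDecimal\","
        + "\"fields\":[{\"name\":\"value\",\"type\":\"bytes\"}],"
        + "\"scale\":\"2\",\"precision\":\"4\"}},"
        + "{\"name\":\"dec2\",\"type\":\"org.apache.avro.FixedDecimal\","
        + "\"precision\":\"3\",\"scale\":\"0\"}]}";
    Schema rec = new Schema.Parser().parse(json);

    // dec2's field schema is the shared FixedDecimal definition, so its
    // "scale" prop comes from dec1's definition:
    System.out.println(rec.getField("dec2").schema().getProp("scale")); // prints 2

    // dec2's own scale survives only as a field-level property, which the
    // read/write methods never see:
    System.out.println(rec.getField("dec2").getProp("scale"));          // prints 0
  }
}
{code}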

Note that in the Decimal schema committed in this JIRA, we allow maxPrecision 
and maxScale values to be specified as JSON properties that are not interpreted 
by Avro. E.g.:

{code}
{"type":"record","name":"rec","fields":[
  
{"name":"dec1","type":{"type":"record","name":”org.apache.avro.Decimal","fields":[{"name":"scale","type":"int"},{"name":"value","type":"bytes"}],"maxPrecision":"4","maxScale":"2"}},
  
{"name":"dec2","type":"org.apache.avro.Decimal","maxPrecision":"3","maxScale":"0"}
]}
{code}

As it stands, an application using this extra metadata would have to be careful 
to read the JSON properties either from the field (if they are present there) 
or from the org.apache.avro.Decimal record type. This might be something we 
improve, e.g. by only having the metadata as field-level properties, not as 
part of the record definition. That would work for Hive.
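
A sketch of that careful lookup against the schema above (assuming the 
maxPrecision/maxScale property names as committed; the method name is 
illustrative):

{code}
import org.apache.avro.Schema;

public class DecimalProps {
  // Prefer the field-level property (dec2 above); fall back to the
  // org.apache.avro.Decimal record definition (dec1 above).
  static String maxScale(Schema.Field field) {
    String fromField = field.getProp("maxScale");
    return fromField != null ? fromField : field.schema().getProp("maxScale");
  }
}
{code}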

> Support for DECIMAL type
> ------------------------
>
>                 Key: AVRO-1402
>                 URL: https://issues.apache.org/jira/browse/AVRO-1402
>             Project: Avro
>          Issue Type: New Feature
>    Affects Versions: 1.7.5
>            Reporter: Mariano Dominguez
>            Assignee: Tom White
>            Priority: Minor
>              Labels: Hive
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1402.patch, AVRO-1402.patch, AVRO-1402.patch, 
> AVRO-1402.patch, UnixEpochRecordMapping.patch
>
>
> Currently, Avro does not seem to support a DECIMAL type or equivalent.
> http://avro.apache.org/docs/1.7.5/spec.html#schema_primitive
> Adding DECIMAL support would be particularly interesting when converting 
> types from Avro to Hive, since DECIMAL is already a supported data type in 
> Hive (0.11.0).


