[ 
https://issues.apache.org/jira/browse/AVRO-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842650#comment-17842650
 ] 

Oscar Westra van Holthe - Kind commented on AVRO-3980:
------------------------------------------------------

I'm sorry, but I have not been able to reproduce the error with this 
information. I'll reply with some observations instead.

As I understand it, you're using the following schema (probably with a 
different top-level name) with the raw binary encoding:
{code:json}
{"type": "record", "name": "anonymous", "fields": [
  {"name": "statuses", "type": { "type": "array", "items": {
    "type": "record", "name": "com.entity.avro.StatusAvro", "fields": [
      {"name": "status", "type": ["null", "string"]},
      {"name": "reason", "type": ["null", "string"]},
      {"name": "validFor", "type": {
        "type": "record", "name": "com.entity.avro.ValidForAvro", "fields": [
          {"name": "start", "type": "long"},
          {"name": "end", "type": "long"}]
      }}
    ]
  }}}
]}
{code}
Am I correct in assuming you're using the exact same schema for both writing 
with Avro 1.11.1, and reading with Avro 1.11.3?

That aside, you should also have (somewhere) a conversion from the timestamps 
in the JSON (or {{Instant}}s in the object) to {{long}}. The reason is that 
there's no logical type specified to do this for you. Is this the case?
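To illustrate, a conversion along these lines would be needed somewhere before assigning the {{start}}/{{end}} fields (a minimal sketch; the {{toEpochMillis}} helper and the class name are hypothetical, not taken from your code):

{code:java}
import java.time.Instant;

public class TimestampConversion {
    // Hypothetical helper: because the schema declares plain longs without a
    // logical type, the application itself must turn ISO-8601 timestamps
    // (or Instants) into epoch milliseconds before building the record.
    static long toEpochMillis(String isoTimestamp) {
        return Instant.parse(isoTimestamp).toEpochMilli();
    }
}
{code}
With a {{timestamp-millis}} logical type on those fields, the generated code would accept {{Instant}} directly and do this conversion for you.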


Perhaps it's easier to read & write not the raw encoding, but the 
single-message encoding. Although this adds a 10-byte header, it is also safer 
for long-lived data, such as data stored in a database. The reason is that it 
can resolve conflicts caused by different, but compatible, schemas. Also, it'll 
fail fast when the correct writer schema is not present, instead of giving 
weird errors like the one you encountered.
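For reference, the Avro specification defines the single-object encoding as a 2-byte marker ({{0xC3 0x01}}) followed by an 8-byte schema fingerprint; that is the 10-byte header mentioned above. A minimal sketch to recognise such a payload (this helper is illustrative, not part of the Avro API):

{code:java}
public class SingleObjectHeader {
    // Per the Avro spec, single-object payloads start with the marker bytes
    // 0xC3 0x01, followed by an 8-byte schema fingerprint (10 bytes total).
    static boolean hasSingleObjectHeader(byte[] bytes) {
        return bytes.length >= 10
                && (bytes[0] & 0xFF) == 0xC3
                && (bytes[1] & 0xFF) == 0x01;
    }
}
{code}
The fingerprint is what lets the decoder look up the exact writer schema, which is why it fails fast instead of misreading the bytes.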

This could look something like this:
{code:java}
public static <T extends SpecificRecord> ByteBuffer serialize(T obj)
        throws IOException {
    // Cache the encoder created in the next two lines by object type
    Schema schema = obj.getSchema();
    BinaryMessageEncoder<T> encoder =
            new BinaryMessageEncoder<>(SpecificData.getForSchema(schema), schema);

    return encoder.encode(obj);
}

public static <T extends SpecificRecord> T deserialize(ByteBuffer byteBuffer, Class<T> type)
        throws IOException, NoSuchMethodException, InvocationTargetException,
        IllegalAccessException {
    // Cache the next lines in global definitions, or implement a resolver
    // for all schemas for all types stored anywhere as Avro binary data
    SchemaStore.Cache schemaStore = new SchemaStore.Cache();
    // Call with all schema versions for which data is stored in the database:
    // schemaStore.addSchema(oldSchema);

    // Cache the decoder created in the next two lines by target type
    Method createDecoder = type.getDeclaredMethod("createDecoder", SchemaStore.class);
    BinaryMessageDecoder<T> decoder =
            (BinaryMessageDecoder<T>) createDecoder.invoke(null, schemaStore);

    return decoder.decode(byteBuffer);
}
{code}

> Error to deserialize field of type Long after upgrade from 1.11.1 to 1.11.3
> ---------------------------------------------------------------------------
>
>                 Key: AVRO-3980
>                 URL: https://issues.apache.org/jira/browse/AVRO-3980
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java, logical types
>    Affects Versions: 1.11.3
>            Reporter: Jari Louvem
>            Priority: Critical
>
> After we upgraded the Avro library and avro-maven-plugin from version 1.11.1 
> to 1.11.3, we started to get the error "cannot read collections larger than 
> 2147483639 items in java library".
>  
> This error is generated by SystemLimitException.checkMaxCollectionLength.
> The data that we are trying to deserialize (using avro 1.11.3) was serialized 
> using avro 1.11.1.
> The object that we are trying to deserialize is:
> {
>     "name": "statuses",
>     "type": {
>         "type": "array",
>         "items": "com.entity.avro.StatusAvro"
>     }
> }
> {
>     "name": "statuses",
>     "type": {
>         "type": "array",
>         "items": {
>             "name": "StatusAvro",
>             "type": "record",
>             "namespace": "com.entity.avro",
>             "fields": [
>                 {
>                     "name": "status",
>                     "type": [
>                         "null",
>                         "string"
>                     ]
>                 },
>                 {
>                     "name": "reason",
>                     "type": [
>                         "null",
>                         "string"
>                     ]
>                 },
>                 {
>                     "name": "validFor",
>                     "type": "com.entity.avro.ValidForAvro"
>                 }
>             ]
>         }
>     }
> }
> {
>     "name": "validFor",
>     "type": {
>         "name": "ValidForAvro",
>         "type": "record",
>         "namespace": "com.entity.avro",
>         "fields": [
>             {
>                 "name": "start",
>                 "type": "long"
>             },
>             {
>                 "name": "end",
>                 "type": "long"
>             }
>         ]
>     }
> }
> This is an example of the objects listed above:
> "statuses": [
>     {
>         "status": "INIT",
>         "reason": "Final_New_Reason",
>         "validFor": {
>             "start": "2020-01-30T11:45:00.839Z",
>             "end": "2030-01-23T06:58:21.563Z"
>         }
>     }
> ]
> The problem is that the array has only one item, as shown above, so why is it 
> throwing an error that the collection is too long?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
