[ 
https://issues.apache.org/jira/browse/FLINK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay updated FLINK-2447:
-------------------------------
    Description: 
Consider the following code:

DataSet<FooBarPojo> d1 = env.fromElements(new FooBarPojo());
DataSet<Tuple2<FooBarPojo, FooBarPojo>> d2 = d1.map(
        new MapFunction<FooBarPojo, Tuple2<FooBarPojo, FooBarPojo>>() {
            @Override
            public Tuple2<FooBarPojo, FooBarPojo> map(FooBarPojo value) throws Exception {
                return null;
            }
        });

where FooBarPojo is the following type:
public class FooBarPojo {
        public int foo, bar;
        public FooBarPojo() {}
}

Printing the type of d2 should show a tuple type with two identical fields:
Java Tuple2<PojoType<FooBarPojo, fields = [bar: Integer, foo: Integer]>, 
PojoType<FooBarPojo, fields = [bar: Integer, foo: Integer]>>

But it prints the following instead:
Java Tuple2<PojoType<FooBarPojo, fields = [bar: Integer, foo: Integer]>, 
GenericType<FooBarPojo>>

Note that this problem causes some co-groups in Gelly to crash with 
"org.apache.flink.api.common.InvalidProgramException: The pair of co-group keys 
are not compatible with each other" when the vertex ID type is a POJO: the 
second field of the Edge type becomes a generic type while the POJO is 
recognized in the Vertex type, so getNumberOfKeyFields returns different 
numbers for the POJO and the generic type.

The source of the problem is the mechanism in TypeExtractor that detects 
recursive types (see the "alreadySeen" field in TypeExtractor): it mistakes 
the second appearance of FooBarPojo for a recursive field.

Specifically, the following happens: createTypeInfoWithTypeHierarchy starts to 
process the Tuple2<FooBarPojo, FooBarPojo> type, and in line 434 it calls 
itself for the first field. That call proceeds into the privateGetForClass 
case, which correctly detects that the type is a POJO and correctly returns a 
PojoTypeInfo; in the meantime, however, privateGetForClass adds PojoTypeInfo 
to "alreadySeen" in line 1191. Then the outer createTypeInfoWithTypeHierarchy 
reaches the second field and goes into privateGetForClass, which mistakenly 
returns a GenericTypeInfo because the check in line 1187 concludes that a 
recursive type is being processed.

(Note that if we comment out the recursive type detection, i.e. the lines that 
use the alreadySeen field, then the output is correct.)
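
To make the mechanism easier to see, here is a minimal, self-contained sketch 
of the same pattern. It is a toy model and not the TypeExtractor code: the 
names RecursionDetectionSketch, ToyType, describe and removeOnExit are all 
hypothetical. With removeOnExit = false the "seen" set behaves like the 
alreadySeen field described above (entries are never removed), so a sibling 
field of the same type is mistaken for recursion. With removeOnExit = true the 
set only tracks the types on the current path, which is one possible remedy 
and not necessarily the fix that Flink should adopt.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: a "type" is a name plus the types of its fields.
public class RecursionDetectionSketch {

    public static final class ToyType {
        public final String name;
        public final List<ToyType> fields = new ArrayList<>();

        public ToyType(String name) {
            this.name = name;
        }
    }

    // Describes a type, falling back to "Generic<...>" when the type is
    // already in the seen set. If removeOnExit is false, a type stays in the
    // set after it has been fully processed (the behavior described above).
    public static String describe(ToyType t, Set<ToyType> seen, boolean removeOnExit) {
        if (!seen.add(t)) {
            // Already in the set: treated as a recursive occurrence.
            return "Generic<" + t.name + ">";
        }
        StringBuilder sb = new StringBuilder(t.name);
        if (!t.fields.isEmpty()) {
            sb.append('<');
            for (int i = 0; i < t.fields.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(describe(t.fields.get(i), seen, removeOnExit));
            }
            sb.append('>');
        }
        if (removeOnExit) {
            // Leave the current path, so siblings of the same type are fine.
            seen.remove(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyType pojo = new ToyType("FooBarPojo");
        ToyType tuple = new ToyType("Tuple2");
        tuple.fields.add(pojo);
        tuple.fields.add(pojo);

        // Second field is wrongly reported as generic.
        System.out.println(describe(tuple, new HashSet<>(), false));
        // Both fields are reported identically.
        System.out.println(describe(tuple, new HashSet<>(), true));

        // A genuinely recursive type is still detected in the path-based variant.
        ToyType node = new ToyType("Node");
        node.fields.add(node);
        System.out.println(describe(node, new HashSet<>(), true));
    }
}
```

In this sketch the path-based variant still stops at a real self-reference, 
because the type is in the set for as long as its own fields are being 
processed.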


> TypeExtractor returns wrong type info when a Tuple has two fields of the same 
> POJO type
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-2447
>                 URL: https://issues.apache.org/jira/browse/FLINK-2447
>             Project: Flink
>          Issue Type: Bug
>          Components: Java API
>            Reporter: Gabor Gevay



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
