[ https://issues.apache.org/jira/browse/BEAM-10265?focusedWorklogId=764250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-764250 ]
ASF GitHub Bot logged work on BEAM-10265: ----------------------------------------- Author: ASF GitHub Bot Created on: 29/Apr/22 14:01 Start Date: 29/Apr/22 14:01 Worklog Time Spent: 10m Work Description: zhoufek commented on code in PR #17477: URL: https://github.com/apache/beam/pull/17477#discussion_r861812937 ########## sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/StaticSchemaInference.java: ########## @@ -88,21 +89,48 @@ enum MethodType { */ public static Schema schemaFromClass( Class<?> clazz, FieldValueTypeSupplier fieldValueTypeSupplier) { + return schemaFromClass(clazz, fieldValueTypeSupplier, new HashMap<Class, Schema>()); + } + + private static Schema schemaFromClass( + Class<?> clazz, + FieldValueTypeSupplier fieldValueTypeSupplier, + HashMap<Class, Schema> alreadyVisitedSchemas) { + if (alreadyVisitedSchemas.containsKey(clazz)) { + Schema existingSchema = alreadyVisitedSchemas.get(clazz); + if (existingSchema == null) { + throw new IllegalArgumentException( + "Cannot infer schema with a circular reference. Class: " + clazz.getTypeName()); + } + return existingSchema; + } + alreadyVisitedSchemas.put(clazz, null); Schema.Builder builder = Schema.builder(); for (FieldValueTypeInformation type : fieldValueTypeSupplier.get(clazz)) { - Schema.FieldType fieldType = fieldFromType(type.getType(), fieldValueTypeSupplier); + Schema.FieldType fieldType = + fieldFromType(type.getType(), fieldValueTypeSupplier, alreadyVisitedSchemas); if (type.isNullable()) { builder.addNullableField(type.getName(), fieldType); } else { builder.addField(type.getName(), fieldType); } } - return builder.build(); + alreadyVisitedSchemas.remove(clazz); Review Comment: Sorry, missed this. Calling `remove` is unnecessary. `put` and `replace` will both replace an existing value with the new one. ########## sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/StaticSchemaInference.java: ########## @@ -88,21 +89,48 @@ enum MethodType { */ public static Schema schemaFromClass( Class<?> clazz, FieldValueTypeSupplier fieldValueTypeSupplier) { + return schemaFromClass(clazz, fieldValueTypeSupplier, new HashMap<Class, Schema>()); + } + + private static Schema schemaFromClass( + Class<?> clazz, + FieldValueTypeSupplier fieldValueTypeSupplier, + HashMap<Class, Schema> alreadyVisitedSchemas) { Review Comment: Sorry, missed this previously: Inputs should accept the most generic type possible to do what it needs to do. In this case `Map` is preferable, since the input doesn't need to be a `HashMap`. Main concern, since this is an output parameter, is if an `ImmutableMap` is passed, but that'll be caught in tests. With that said, in line with my comment below about using `synchronized` to make the method thread-safe, you could make the `Map` a static field and use `synchronized` methods to keep it thread-safe. That way the computations are preserved across multiple calls. However, I don't know the general preference in Beam, so it might be good to get another opinion. ########## sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoSchemaTranslator.java: ########## @@ -156,6 +165,16 @@ static Schema getSchema(Class<? extends Message> clazz) { } static Schema getSchema(Descriptors.Descriptor descriptor) { Review Comment: Couple stylistic pointers on this: 1. The first line should start with a verb that fits into the sentence: "This method X..." In this case, "This method returns..." This should just cover the basic behavior of the class. 2. The comment on not being thread safe should be in its own paragraph. There would be a blank link, and this new paragraph starts with `<p>`. IntelliJ will probably add the `</p>`. Remove this if it does. More importantly, the comment on it not being thread-safe should just be something like, "This method is not thread safe." Javadoc comments should focus on what behavior the method exhibits. Not being thread-safe is an example, since it impacts how someone may use this method. However, the details of *why* that behavior exists should not be included. As far as a user is concerned, the reason it isn't thread-safe doesn't matter. The end behavior is the same. > I get why it may not be multi-thread safe, but why are you saying it is ok to leave it as is? I will say that this change doesn't seem to fail any existing tests Changing a public thread-safe method to not being thread-safe could negatively impact users who've been using it from a multi-threaded context and under the assumption that it is thread safe. Since this isn't public, there's less risk. With that said, I think all methods in Beam need to be thread-safe, but I might be misremembering. That's easy enough to fix, though, by marking the method as `synchronized`. Issue Time Tracking ------------------- Worklog Id: (was: 764250) Time Spent: 2h 50m (was: 2h 40m) > GetterBasedSchemaProvider#schemaFor stack overflows when given a recursive > schema > --------------------------------------------------------------------------------- > > Key: BEAM-10265 > URL: https://issues.apache.org/jira/browse/BEAM-10265 > Project: Beam > Issue Type: Bug > Components: sdk-java-core > Affects Versions: 2.21.0, 2.22.0 > Reporter: Reza ardeshir rokni > Assignee: Andrei Gurau > Priority: P3 > Labels: Clarified, starter > Time Spent: 2h 50m > Remaining Estimate: 0h > > Proto: > message TSFoo { > string a = 1; > string b = 2; > TSFoo theOlderMe = 3; > } > new ProtoMessageSchema().schemaFor(TypeDescriptor.of(Foo.TSFoo.class)); > Causes a stackoverflow. > > -- This message was sent by Atlassian Jira (v8.20.7#820007)