pcgrenier commented on a change in pull request #4282: URL: https://github.com/apache/nifi/pull/4282#discussion_r427542274
########## File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/queryrecord/FlowFileTable.java ########## @@ -223,12 +225,69 @@ private RelDataType getRelDataType(final DataType fieldType, final JavaTypeFacto case BIGINT: return typeFactory.createJavaType(BigInteger.class); case CHOICE: + final ChoiceDataType choiceDataType = (ChoiceDataType) fieldType; + DataType widestDataType = choiceDataType.getPossibleSubTypes().get(0); + for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) { + if (possibleType == widestDataType) { + continue; + } + if (possibleType.getFieldType().isWiderThan(widestDataType.getFieldType())) { + widestDataType = possibleType; + continue; + } + if (widestDataType.getFieldType().isWiderThan(possibleType.getFieldType())) { + continue; + } + + // Neither is wider than the other. + widestDataType = null; + break; + } + + // If one of the CHOICE data types is the widest, use it. + if (widestDataType != null) { + return getRelDataType(widestDataType, typeFactory); + } + + // None of the data types is strictly the widest. Check if all data types are numeric. + // This would happen, for instance, if the data type is a choice between float and integer. + // If that is the case, we can use a String type for the table schema because all values will fit + // into a String. This will still allow for casting, etc. if the query requires it. + boolean allNumeric = true; + for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) { + if (!isNumeric(possibleType)) { + allNumeric = false; + break; + } + } + + if (allNumeric) { Review comment: While I do agree, but given the example if you have split the records to sort out all the "number" values, the schema and the data could disagree. So logically you could know the data is all numbers but the schema disagrees. So if we are hacking to a string anyway, I don't think it would hurt to just allow it to pass on to let Calcite try and process it. This is just allowing a "best case" situation where Calcite isn't passed something it has no idea what to do with. So I do see a benefit in allowing the string to pass through, since if it is not a number it will still fail at query time but will at least attempt it. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org