pcgrenier commented on pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#issuecomment-631588369


   I do agree, but there are other functions like concat, that do make more
   sense. I also think taking this choice out of the hands of the flow
   developer is bad. I think you are correct in that it is a bad idea to allow
   the sum of strings and ints but also not allowing it at all if the
   developer needs to do it is worse. Really I'm cool with it either way and
   happy to see the fix.
   
   On Wed, May 20, 2020, 12:16 PM markap14 <notificati...@github.com> wrote:
   
   > *@markap14* commented on this pull request.
   > ------------------------------
   >
   > In
   > 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/queryrecord/FlowFileTable.java
   > <https://github.com/apache/nifi/pull/4282#discussion_r428138798>:
   >
   > > +                    return getRelDataType(widestDataType, typeFactory);
   > +                }
   > +
   > +                // None of the data types is strictly the widest. Check 
if all data types are numeric.
   > +                // This would happen, for instance, if the data type is a 
choice between float and integer.
   > +                // If that is the case, we can use a String type for the 
table schema because all values will fit
   > +                // into a String. This will still allow for casting, etc. 
if the query requires it.
   > +                boolean allNumeric = true;
   > +                for (final DataType possibleType : 
choiceDataType.getPossibleSubTypes()) {
   > +                    if (!isNumeric(possibleType)) {
   > +                        allNumeric = false;
   > +                        break;
   > +                    }
   > +                }
   > +
   > +                if (allNumeric) {
   >
   > @pcgrenier <https://github.com/pcgrenier> I do see your argument. But I
   > think I disagree. Let's say that we have the following CSV then:
   >
   > name, other
   > markap14, 48
   > pcgrenier, 19
   >
   > And the data's schema indicates that "other" is a CHOICE between STRING or
   > INT. Fundamentally, it comes down to one question: in this case, should we
   > allow the user to run a query like SELECT SUM(other) as total FROM
   > FLOWFILE.
   >
   > Your argument is yes, because I happen to know that for this particular
   > FlowFile it will succeed.
   >
   > My argument, though, is that we should not allow it. Yes, it would succeed
   > for this FlowFile, but what will happen is that it will fail for other
   > FlowFiles. Other FlowFiles that have the same schema and other FlowFiles
   > for which the data is exactly correct according to the schema. This makes
   > things more confusing because some FlowFiles succeed and others fail. All
   > the while, the schema clearly indicates that the data may be a STRING, and
   > you shouldn't really be able to sum together STRING data. If you do sort
   > out all the number values, it makes sense to use a schema that reflects
   > that. Otherwise, if we allow summing together a CHOICE[int, string] we're
   > not really honoring the schema, in my point of view.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/nifi/pull/4282#discussion_r428138798>, or
   > unsubscribe
   > 
<https://github.com/notifications/unsubscribe-auth/AA7ZC6TSH6KKCGHPIA4BY63RSP65RANCNFSM4NEKANRA>
   > .
   >
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to