[ https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254657#comment-14254657 ]
Sylvain Lebresne commented on CASSANDRA-8374: --------------------------------------------- bq. but I think that perhaps forcing users to make that choice explicitly is a good thing. I disagree. My point is that in practice, there is no choice. If you make your function throw, then that function just cannot be used on a column name in general, and that's not acceptable imo. Typically, I cannot imagine us having any of our harcoded function throw for that very reason, even if/when we add many more functions (math functions, string manipulation ones, etc...). Therefore, I claim that 99% of functions will, whether we force the user to do it manually or not, return null on nulls input. Some user may start by having throw, but the first time they'll try to use their function in a select clause and it blows up because some row didn't had a value for that column, they will replace their function. bq. Perhaps we can consider having no default and require one of the two options? The paragraph above should explain why I dislike that idea just as much. bq. that four or five words for the sake of explicit correctness is a bad tradeoff. Note that my claim is that there is zero correctness to win. If I though it made sense to return null for roughtly 50% of function and to throw for the remaining 50%, then yes, I'd certainly be amenable to a more verbose default. bq. IMO, returning null (potentially without knowing it) is more dangerous than getting an error that indicates your functions are broken. I think that argument is somewhat contradicatory. If you admit that throwing an error means the function is broken then surely that means people should unbroke their function by returning null when they realize it is broken. Again, unless we make different based on context, which we agree we shouldn't do, then a choice must be made between returning null and an exception, and I just don't think which choice to make really depends on the function itself. But maybe we're touching our disagreement. I don't think returning null on null inputs is so dangerous that it justify writing broken functions (even though I agree neither is perfect). bq. Most of our existing functions are essentially casts, where returning null on null input makes sense. Then let me maybe ask this another way. Can you give me a few example of functions that might make sense to add to our hardcoded functions and for which throwing an exception on null would be reasonable, knowing that it would basically mean the function can't be used in select clauses? I honestly cannot come with any. bq. I'm also not sure how ommitting {{RETURNS NULL ON NULL INPUT}} from the syntax changes anything? We haven't talked about aggregate functions, but I think the proper default behavior for aggregate function is to ignore rows that have nulls. To the best of my knowledge, that's what most aggregate functions do in most RDBMS. That's also what postgresql does if the "state" function is strict (i.e. {{RETURNS NULL ON NULL INPUTS}}). Robert's argument is simply that "ignoring rows" is not properly described by {{RETURNS NULL ON NULL INPUTS}} and omitting it from the syntax avoid that disrepancy. But I don't think it's a big deal: the option is on the state function anyway, not the aggregate ([~snazy] check the postgres doc for aggregate, you'll see that it's not particularly confusing). > Better support of null for UDF > ------------------------------ > > Key: CASSANDRA-8374 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8374 > Project: Cassandra > Issue Type: Bug > Reporter: Sylvain Lebresne > Assignee: Robert Stupp > Fix For: 3.0 > > Attachments: 8473-1.txt, 8473-2.txt > > > Currently, every function needs to deal with it's argument potentially being > {{null}}. There is very many case where that's just annoying, users should be > able to define a function like: > {noformat} > CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;' > {noformat} > without having this crashing as soon as a column it's applied to doesn't a > value for some rows (I'll note that this definition apparently cannot be > compiled currently, which should be looked into). > In fact, I think that by default methods shouldn't have to care about > {{null}} values: if the value is {{null}}, we should not call the method at > all and return {{null}}. There is still methods that may explicitely want to > handle {{null}} (to return a default value for instance), so maybe we can add > an {{ALLOW NULLS}} to the creation syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)