[ 
https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254657#comment-14254657
 ] 

Sylvain Lebresne commented on CASSANDRA-8374:
---------------------------------------------

bq. but I think that perhaps forcing users to make that choice explicitly is a 
good thing.

I disagree. My point is that in practice, there is no choice. If you make your 
function throw, then that function just cannot be used on a column name in 
general, and that's not acceptable imo. Typically, I cannot imagine us having 
any of our harcoded function throw for that very reason, even if/when we add 
many more functions (math functions, string manipulation ones, etc...).  
Therefore, I claim that 99% of functions will, whether we force the user to do 
it manually or not, return null on nulls input. Some user may start by having 
throw, but the first time they'll try to use their function in a select clause 
and it blows up because some row didn't had a value for that column, they will 
replace their function.

bq. Perhaps we can consider having no default and require one of the two 
options?

The paragraph above should explain why I dislike that idea just as much.

bq. that four or five words for the sake of explicit correctness is a bad 
tradeoff.

Note that my claim is that there is zero correctness to win. If I though it 
made sense to return null for roughtly 50% of function and to throw for the 
remaining 50%, then yes, I'd certainly be amenable to a more verbose default.

bq. IMO, returning null (potentially without knowing it) is more dangerous than 
getting an error that indicates your functions are broken.

I think that argument is somewhat contradicatory. If you admit that throwing an 
error means the function is broken then surely that means people should unbroke 
their function by returning null when they realize it is broken. Again, unless 
we make different based on context, which we agree we shouldn't do, then a 
choice must be made between returning null and an exception, and I just don't 
think which choice to make really depends on the function itself.

But maybe we're touching our disagreement. I don't think returning null on null 
inputs is so dangerous that it justify writing broken functions (even though I 
agree neither is perfect).

bq. Most of our existing functions are essentially casts, where returning null 
on null input makes sense.

Then let me maybe ask this another way. Can you give me a few example of 
functions that might make sense to add to our hardcoded functions and for which 
throwing an exception on null would be reasonable, knowing that it would 
basically mean the function can't be used in select clauses? I honestly cannot 
come with any.

bq. I'm also not sure how ommitting {{RETURNS NULL ON NULL INPUT}} from the 
syntax changes anything?

We haven't talked about aggregate functions, but I think the proper default 
behavior for aggregate function is to ignore rows that have nulls. To the best 
of my knowledge, that's what most aggregate functions do in most RDBMS. That's 
also what postgresql does if the "state" function is strict (i.e. {{RETURNS 
NULL ON NULL INPUTS}}). Robert's argument is simply that "ignoring rows" is not 
properly described by {{RETURNS NULL ON NULL INPUTS}} and omitting it from the 
syntax avoid that disrepancy. But I don't think it's a big deal: the option is 
on the state function anyway, not the aggregate ([~snazy] check the postgres 
doc for aggregate, you'll see that it's not particularly confusing).

> Better support of null for UDF
> ------------------------------
>
>                 Key: CASSANDRA-8374
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8374
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Robert Stupp
>             Fix For: 3.0
>
>         Attachments: 8473-1.txt, 8473-2.txt
>
>
> Currently, every function needs to deal with it's argument potentially being 
> {{null}}. There is very many case where that's just annoying, users should be 
> able to define a function like:
> {noformat}
> CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;'
> {noformat}
> without having this crashing as soon as a column it's applied to doesn't a 
> value for some rows (I'll note that this definition apparently cannot be 
> compiled currently, which should be looked into).  
> In fact, I think that by default methods shouldn't have to care about 
> {{null}} values: if the value is {{null}}, we should not call the method at 
> all and return {{null}}. There is still methods that may explicitely want to 
> handle {{null}} (to return a default value for instance), so maybe we can add 
> an {{ALLOW NULLS}} to the creation syntax.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to