I see. We're having problems with code like this (forgive my noob Scala):

// assuming a Spark-shell-style 1.5 setup; the implicits supply toDF and $
import sqlContext.implicits._
import org.apache.spark.sql.functions.callUDF
import org.apache.spark.sql.types.BooleanType

val df = Seq(("moose", "ice"), (null, "fire")).toDF("animals", "elements")
df
  .filter($"animals".rlike(".*"))
  .filter(callUDF({ (value: String) => value.length > 2 }, BooleanType, $"animals"))
  .collect()

This code throws an NPE because:
* Catalyst combines the two filters into a single AND
* the first filter evaluates to null (not false) for the row where animals is null
* the second filter then tries to read the length of that null (see the sketch below)
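
To spell out what I think is going on, here's a rough standalone mirror of the two
predicates (plain Scala, not Spark internals; Option's None plays the role of SQL NULL):

def firstFilter(animals: String): Option[Boolean] =
  Option(animals).map(_.matches(".*"))   // None when animals is null, like NULL RLIKE '.*'

def secondFilter(animals: String): Boolean =
  animals.length > 2                     // blows up with an NPE when animals is null

firstFilter(null)    // None -- i.e. NULL, not false, so the merged AND keeps evaluating
secondFilter(null)   // throws NullPointerException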

This feels weird. Reading that code, I wouldn't expect null to ever be passed to the 
second filter. Even weirder: if you call collect() after the first filter, you won't 
see any nulls, and if you write the data to disk and read it back, the NPE doesn't 
happen.
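
To be concrete about the collect() case (as far as I can tell, a predicate that
evaluates to NULL drops the row just like false does, so the null row is already gone):

df.filter($"animals".rlike(".*")).collect()
// only ("moose", "ice") comes back -- no nulls in sight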

It's bewildering! Is this the intended behavior?
________________________________
From: Reynold Xin [r...@databricks.com]
Sent: Monday, September 14, 2015 10:14 PM
To: Zack Sampson
Cc: dev@spark.apache.org
Subject: Re: And.eval short circuiting

rxin=# select null and true;
 ?column?
----------

(1 row)

rxin=# select null and false;
 ?column?
----------
 f
(1 row)


null and false should return false.
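
Same thing sketched in Scala, with None standing in for SQL NULL (just an illustration
of the three-valued logic, not Spark code):

def and3(l: Option[Boolean], r: Option[Boolean]): Option[Boolean] = (l, r) match {
  case (Some(false), _) | (_, Some(false)) => Some(false) // a single false decides the result
  case (Some(true), Some(true))            => Some(true)
  case _                                   => None        // otherwise any NULL is contagious
}

and3(None, Some(true))   // None        -- matches the first query
and3(None, Some(false))  // Some(false) -- matches the second query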


On Mon, Sep 14, 2015 at 9:12 PM, Zack Sampson <zsamp...@palantir.com> wrote:
It seems like And.eval could avoid calculating right.eval when left.eval returns 
null. Is there a reason it's written the way it is?


override def eval(input: Row): Any = {
  val l = left.eval(input)
  if (l == false) {
    false
  } else {
    val r = right.eval(input)
    if (r == false) {
      false
    } else {
      if (l != null && r != null) {
        true
      } else {
        null
      }
    }
  }
}
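
Concretely, here's a toy rendering of that control flow (not Spark code; right is
by-name so you can see when it gets evaluated) -- right.eval is still invoked when
left.eval is null:

def andEval(left: java.lang.Boolean, right: => java.lang.Boolean): java.lang.Boolean = {
  if (left == java.lang.Boolean.FALSE) java.lang.Boolean.FALSE
  else {
    val r = right                                   // evaluated even when left is null
    if (r == java.lang.Boolean.FALSE) java.lang.Boolean.FALSE
    else if (left != null && r != null) java.lang.Boolean.TRUE
    else null
  }
}

andEval(false, sys.error("right is skipped"))             // false: short-circuits only on false
andEval(null, { println("right still evaluated"); true }) // null: right ran anyway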
