[GitHub] spark pull request #23153: [SPARK-26147][SQL] only pull out unevaluable pyth...

xuanyuanking Tue, 27 Nov 2018 08:30:14 -0800

GitHub user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23153#discussion_r236647128
  
    --- Diff: python/pyspark/sql/tests/test_udf.py ---
    @@ -209,6 +209,18 @@ def test_udf_in_join_condition(self):
             with self.sql_conf({"spark.sql.crossJoin.enabled": True}):
                 self.assertEqual(df.collect(), [Row(a=1, b=1)])
     
    +    def test_udf_in_left_outer_join_condition(self):
    +        # regression test for SPARK-26147
    +        from pyspark.sql.functions import udf, col
    +        left = self.spark.createDataFrame([Row(a=1)])
    +        right = self.spark.createDataFrame([Row(b=1)])
    +        f = udf(lambda a: str(a), StringType())
    +        # The join condition can't be pushed down, as it refers to attributes from both sides.
    +        # The Python UDF only refer to attributes from one side, so it's evaluable.
    +        df = left.join(right, f("a") == col("b").cast("string"), how = "left_outer")
    --- End diff --
    
    style nit: how="left_outer"
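
    For context, the nit matches the PEP 8 convention of no spaces around `=` in keyword arguments. A minimal sketch in plain Python, assuming a hypothetical `join` stand-in (not the real PySpark `DataFrame.join`), suggesting that both spellings parse identically and the change is purely stylistic:

```python
import ast


# Hypothetical stand-in for illustration only; this is NOT the PySpark API.
def join(other, on=None, how="inner"):
    return (other, on, how)


# The spelling from the diff and the spelling requested by the nit.
spaced = 'join("right", "cond", how = "left_outer")'
pep8 = 'join("right", "cond", how="left_outer")'

# ast.dump omits line/column attributes by default, so two sources that
# differ only in whitespace produce the same dump.
same_ast = ast.dump(ast.parse(spaced)) == ast.dump(ast.parse(pep8))

# The preferred spelling in use.
result = join("right", "cond", how="left_outer")
```

    Since only whitespace differs, the test's behavior is unaffected either way.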



