[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-34002:
----------------------------------
    Fix Version/s: (was: 3.1.0)
                   3.1.1

> Broken UDF Encoding
> -------------------
>
>                 Key: SPARK-34002
>                 URL: https://issues.apache.org/jira/browse/SPARK-34002
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0, 3.2.0
>         Environment: Windows 10, Surface Book 3
>                      Local Spark
>                      IntelliJ IDEA
>            Reporter: Mark Hamilton
>            Assignee: L. C. Hsieh
>            Priority: Blocker
>             Fix For: 3.1.1
>
> UDFs can behave differently depending on whether a DataFrame is cached, despite the DataFrame being identical.
>
> Repro:
>
> {code:java}
> import org.apache.spark.sql.expressions.UserDefinedFunction
> import org.apache.spark.sql.functions.{col, udf}
>
> case class Bar(a: Int)
>
> import spark.implicits._
>
> def f1(bar: Bar): Option[Bar] = {
>   None
> }
>
> def f2(bar: Bar): Option[Bar] = {
>   Option(bar)
> }
>
> val udf1: UserDefinedFunction = udf(f1 _)
> val udf2: UserDefinedFunction = udf(f2 _)
>
> // Commenting in the cache will make this example work
> val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0") //.cache()
> val newDf = df
>   .withColumn("c1", udf1(col("c0")))
>   .withColumn("c2", udf2(col("c1")))
> newDf.show()
> {code}
>
> Error:
>
> [truncated IntelliJ launch-command and classpath output elided]
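The repro's own comment names the workaround: materializing the DataFrame with `.cache()` before applying the UDFs makes the example pass. Below is a minimal, self-contained sketch of that workaround; the `SparkSession` setup, the `CachedUdfRepro` wrapper object, and the app name are assumptions added for illustration, and this mirrors what the report describes rather than a verified fix:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

// Top-level case class so Spark can derive an encoder for it.
case class Bar(a: Int)

object CachedUdfRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-34002-workaround") // hypothetical app name
      .getOrCreate()
    import spark.implicits._

    def f1(bar: Bar): Option[Bar] = None
    def f2(bar: Bar): Option[Bar] = Option(bar)

    val udf1: UserDefinedFunction = udf(f1 _)
    val udf2: UserDefinedFunction = udf(f2 _)

    // Per the report, adding .cache() here is what lets the chained
    // struct-typed UDFs evaluate instead of failing in ScalaUDF.eval.
    val df = (1 to 10).map(_ => Tuple1(Bar(1))).toDF("c0").cache()
    df.withColumn("c1", udf1(col("c0")))
      .withColumn("c2", udf2(col("c1")))
      .show()

    spark.stop()
  }
}
```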
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-34002:
----------------------------------
    Target Version/s: 3.1.1
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-34002:
----------------------------------
    Priority: Blocker  (was: Major)
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh updated SPARK-34002:
--------------------------------
    Affects Version/s: (was: 3.0.1)
                       3.1.0
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh updated SPARK-34002:
--------------------------------
    Affects Version/s: 3.2.0
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro updated SPARK-34002:
-------------------------------------
    Component/s: (was: Spark Core)
                 SQL
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-34002:
---------------------------------
    Priority: Major  (was: Critical)
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Hamilton updated SPARK-34002:
----------------------------------
    Description: (edited; the quoted body duplicates the repro and truncated classpath output above)
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Hamilton updated SPARK-34002:
----------------------------------
    Description: (repro unchanged; the error output was replaced with the actual Spark run log, the relevant tail of which is)

{code:java}
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/01/05 00:52:57 INFO SparkContext: Running Spark version 3.0.1
21/01/05 00:52:57 INFO SparkContext: Submitted application: JsonOutputParserSuite
...
21/01/05 00:53:00 WARN SharedState: Not allowing to set spark.sql.warehouse.dir or hive.metastore.warehouse.dir in SparkSession's options, it should be set statically for cross-session usages
org.apache.spark.SparkException: Failed to execute user defined function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => struct)
  at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1130)
  at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:156)
  at
{code}
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Hamilton updated SPARK-34002:
----------------------------------
    Description: (minor edit to the repro snippet; the quoted body otherwise duplicates the preceding update)