Hi guys,
Apologies for the long mail.

I am running below code snippet

import org.apache.spark.sql.SparkSession
object ColumnNameWithDot {
 def main(args: Array[String]): Unit = {

 val spark = SparkSession.builder.appName("Simple Application")
 .config("spark.master", "local").getOrCreate()

 spark.sparkContext.setLogLevel("OFF")

 import spark.implicits._
 val df = Seq(("abc", 23), ("def", 44), (null, 9)).toDF("ColWith.Dot",
"Col")
 df.na.fill(Map("`ColWith.Dot`" -> "n/a")).show()

 }
}

and it is failing with error
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot
resolve column name "ColWith.Dot" among (ColWith.Dot, Col);

I checked that code-fix were made for the similar issue, found
https://issues.apache.org/jira/browse/SPARK-19473; but none of them fixed
all cases.

I debugged the code below are the observations

   1. In org.apache.spark.sql.DataFrameNaFunctions.fillMap(values:
   Seq[(String, Any)]) the df.resolve(colName) call succeeds, since column
   name is quoted with back tick it resolves the column
   2. val projections = df.schema.fields.map {
       ...
       ...
   }.getOrElse(df.col(f.name)) fails since resolved column name is not
   quoted with backtick

Problem lies in the
org.apache.spark.sql.catalyst.expressions
resolve(nameParts: Seq[String], resolver: Resolver): Option[NamedExpression]

where the comment says we try to resolve it as a column.

// If none of attributes match database.table.column pattern or
// `table.column` pattern, we try to resolve it as a column.
val (candidates, nestedFields) = matches match {
    case (Seq(), _) =>
        val name = nameParts.head
        val attributes = collectMatches(name,
direct.get(name.toLowerCase(Locale.ROOT)))
        (attributes, nameParts.tail)
    case _ => matches
}

should be changed to

// If none of attributes match database.table.column pattern or
// `table.column` pattern, we try to resolve it as a column.
val (candidates, nestedFields) = matches match {
    case (Seq(), _) =>
        val name = nameParts.mkString(".")
        val attributes = collectMatches(name,
direct.get(name.toLowerCase(Locale.ROOT)))
        (attributes, Seq.empty)
    case _ => matches
}
git diff is as below

-          val name = nameParts.head
+          val name = nameParts.mkString(".")
           val attributes = collectMatches(name,
direct.get(name.toLowerCase(Locale.ROOT)))
-          (attributes, nameParts.tail)
+          (attributes, Seq.empty)

I tested this change, there is no longer need to use backtick with columns
having dot in the name.
Can this change be merged?

Regards,
Amandeep

Reply via email to