Rex Xiong created SPARK-31660:
---------------------------------

             Summary: Dataset.joinWith supports JoinType object as input parameter
                 Key: SPARK-31660
                 URL: https://issues.apache.org/jira/browse/SPARK-31660
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.5
            Reporter: Rex Xiong


The current Dataset.joinWith API accepts joinType only as a String; it does not 
accept a JoinType object.
I would prefer a JoinType object (similar to an enum) over a String: it leaves 
less room for typos and is more readable.
{code:scala}
def joinWith[U](other: Dataset[U], condition: Column, joinType: String): Dataset[(T, U)]
{code}
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala


If I pass LeftOuter.sql as joinType, JoinType.apply throws an 
IllegalArgumentException, because LeftOuter.sql contains a space:
{code:scala}
case object LeftOuter extends JoinType {
  override def sql: String = "LEFT OUTER"
}
{code}
However, JoinType.apply only strips underscores from its input; it does not 
handle whitespace:
{code:scala}
object JoinType {
  def apply(typ: String): JoinType = typ.toLowerCase(Locale.ROOT).replace("_", "") match {
    case "inner" => Inner
    case "outer" | "full" | "fullouter" => FullOuter
    case "leftouter" | "left" => LeftOuter
    case "rightouter" | "right" => RightOuter
    case "leftsemi" | "semi" => LeftSemi
    case "leftanti" | "anti" => LeftAnti
    case "cross" => Cross
    case _ =>
      val supported = Seq(
        "inner",
        "outer", "full", "fullouter", "full_outer",
        "leftouter", "left", "left_outer",
        "rightouter", "right", "right_outer",
        "leftsemi", "left_semi", "semi",
        "leftanti", "left_anti", "anti",
        "cross")

      throw new IllegalArgumentException(s"Unsupported join type '$typ'. " +
        "Supported join types include: " + supported.mkString("'", "', '", "'") + ".")
  }
}
{code}
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala
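
To make the failure concrete, here is a minimal sketch that applies the same 
normalization as the JoinType.apply snippet above, using plain strings only (no 
Spark dependency). NormalizeDemo and normalize are hypothetical names used for 
illustration; they are not part of Spark.

```scala
import java.util.Locale

// Hypothetical helper that mirrors the normalization step in JoinType.apply.
object NormalizeDemo {
  // Lower-case the input and drop underscores, exactly as the quoted code does.
  def normalize(typ: String): String =
    typ.toLowerCase(Locale.ROOT).replace("_", "")

  def main(args: Array[String]): Unit = {
    // Underscore form collapses to a string the match recognizes.
    println(normalize("left_outer")) // prints "leftouter"
    // The value of LeftOuter.sql keeps its space, so no case matches it.
    println(normalize("LEFT OUTER")) // prints "left outer"
  }
}
```

Because "left outer" is not one of the matched strings, the call falls through 
to the default case and the exception is thrown.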

I suggest we either add overloads that accept a JoinType instead of a String, 
or change JoinType.apply to strip whitespace as well.
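
Both options can be sketched roughly as follows. This is only an illustration 
under stated assumptions: JoinKind, InnerKind, LeftOuterKind, JoinKind.parse, 
and JoinApi.describeJoin are simplified, hypothetical stand-ins for JoinType, 
its case objects, JoinType.apply, and Dataset.joinWith. It is not the actual 
Spark code, and only two join types are modelled.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for Spark's JoinType hierarchy.
sealed trait JoinKind { def sql: String }
case object InnerKind extends JoinKind { val sql: String = "INNER" }
case object LeftOuterKind extends JoinKind { val sql: String = "LEFT OUTER" }

object JoinKind {
  // Option 2: strip whitespace as well as underscores before matching,
  // so that the value of `sql` round-trips through the parser.
  def parse(typ: String): JoinKind =
    typ.toLowerCase(Locale.ROOT).replaceAll("[_\\s]", "") match {
      case "inner"              => InnerKind
      case "leftouter" | "left" => LeftOuterKind
      case _ =>
        throw new IllegalArgumentException(s"Unsupported join type '$typ'.")
    }
}

// Hypothetical stand-in for an API such as Dataset.joinWith.
object JoinApi {
  // Option 1: a typed overload, so callers never spell the join type by hand.
  def describeJoin(kind: JoinKind): String = s"${kind.sql} JOIN"

  // The existing String-based entry point delegates to the typed one.
  def describeJoin(kind: String): String = describeJoin(JoinKind.parse(kind))
}
```

With the typed overload, a caller passes LeftOuterKind directly and a typo 
becomes a compile error; with the relaxed parser, "LEFT OUTER", "left_outer", 
and "leftouter" all resolve to the same join type.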
