[ 
https://issues.apache.org/jira/browse/SPARK-34192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-34192:
-----------------------------
    Description: 
On the read side, the char length check and padding cause issues for CBO and predicate 
pushdown (PPD), and other problems in Catalyst.

It's more reasonable to do this on the write side, as Spark doesn't take full 
control of the storage layer.
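As a rough illustration of the write-side approach, here is a minimal sketch in plain Scala. It is not Spark's actual code path, and padOnWrite is a hypothetical helper: the declared length is checked once at write time, and the value is stored fixed-length so the read path needs no per-row padding.

```scala
// Minimal sketch of write-side CHAR(n) handling.
// `padOnWrite` is a hypothetical helper, not an actual Spark API.
object CharWritePadding {
  def padOnWrite(value: String, length: Int): String = {
    // Length check happens once, at write time, instead of on every read.
    require(value.length <= length,
      s"input string '$value' exceeds char($length) type length limitation")
    // Pad with trailing spaces so stored values are always fixed-length.
    value + " " * (length - value.length)
  }
}
```

With values already fixed-length on disk, CBO statistics and pushed-down predicates can work directly against the stored bytes.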



https://issues.apache.org/jira/browse/HIVE-13618

For varchar and string, the case below still exists due to a limitation of the Hive 
metastore.
For char, we now write fixed-length values, so the issue should be fixed.

{code:java}
  test("SPARK-34192: Known issue of Hive for trailing spaces") {
    // https://issues.apache.org/jira/browse/HIVE-13618
    // Trailing spaces in a partition column are treated differently.
    // This is because MySQL and Derby (used in tests) consider 'a' = 'a ',
    // whereas others (PostgreSQL, Oracle) don't exhibit this problem.
    Seq("char(5)", "string", "VARCHAR(5)").foreach { typ =>
      withTable("t") {
        sql(s"CREATE TABLE t(i STRING, c $typ) USING $format PARTITIONED BY (c)")
        sql(s"INSERT INTO t VALUES ('1', 'a ')")
        val e = intercept[AnalysisException](
          sql(s"INSERT INTO t VALUES ('1', 'a  ')"))
        assert(e.getMessage.contains("Expecting a partition with name c=a  ,"))
      }
    }
  }

{code}
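To see why write-side padding resolves this for char but not for varchar/string: under char(5), both 'a ' and 'a  ' are padded to the same fixed-length value, so they can never name two different partitions, whereas varchar and string keep the raw values, which differ only by the trailing spaces that a MySQL- or Derby-backed metastore conflates. A standalone plain-Scala sketch, where padChar is a hypothetical stand-in for the write-side padding:

```scala
object TrailingSpacePartitions {
  // Hypothetical stand-in for write-side char padding.
  def padChar(v: String, n: Int): String = v + " " * (n - v.length)

  def main(args: Array[String]): Unit = {
    // char(5): both inputs collapse to one partition value.
    val asChar5 = Seq("a ", "a  ").map(padChar(_, 5))
    println(asChar5.distinct.size)           // 1
    // string/varchar: values stay distinct, differing only in trailing
    // spaces -- exactly what HIVE-13618's metastore backends conflate.
    println(Seq("a ", "a  ").distinct.size)  // 2
  }
}
```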



  was:
On the read side, the char length check and padding bring issues to CBO and PPD 
and other issues to the catalyst.

It's more reasonable to do it on the write side,  as Spark doesn't take fully 
control of the storage layer.


  test("SPARK-34192: Know issue of hive for tailing spaces") {
    // https://issues.apache.org/jira/browse/HIVE-13618
    // Trailing spaces in partition column will be treated differently
    // This is because Mysql and Derby(used in tests) considers 'a' = 'a '
    // whereas others like (Postgres, Oracle) doesn't exhibit this problem.
    Seq("char(5)", "string", "VARCHAR(5)").foreach { typ =>
      withTable("t") {
        sql(s"CREATE TABLE t(i STRING, c $typ) USING $format PARTITIONED BY (c)")
        sql(s"INSERT INTO t VALUES ('1', 'a ')")
        val e = intercept[AnalysisException](
          sql(s"INSERT INTO t VALUES ('1', 'a  ')"))
        assert(e.getMessage.contains("Expecting a partition with name c=a  ,"))
      }
    }
  }


> Move char padding to write side
> -------------------------------
>
>                 Key: SPARK-34192
>                 URL: https://issues.apache.org/jira/browse/SPARK-34192
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Kent Yao
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
