[ 
https://issues.apache.org/jira/browse/SPARK-53504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-53504:
-----------------------------
    Description: 
h3. Issues with adding new Catalyst types:
Based on experience adding ANSI intervals and the TIME type, and migrating 
TIMESTAMP_LTZ to the Proleptic Gregorian calendar.

* Public data type classes like DateType and TimestampType contain minimal 
information and no implementation, while classes like StructType, DecimalType, 
and StringType contain many more operations over the types.
* Type operations are spread across the entire codebase, so there is a high 
chance of missing a place where a new type must be handled.
* Adding a type is error prone because all errors are caught only by tests at 
runtime, with no help from the compiler.
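
The last two points can be illustrated with a minimal sketch. It does not use Spark: DataType, DateType, TimeType, NameOps, DateOps, TimeOps and typeNameScattered are all hypothetical stand-ins, assumed only for illustration. The first function shows how a scattered match over an open type fails only when it runs, while the trait-based variant makes a missing implementation a compile error.

```scala
// Hypothetical stand-ins for Catalyst types; these are NOT Spark's classes.
trait DataType
case object DateType extends DataType
case object TimeType extends DataType // plays the role of a newly added type

// Scattered style: DataType is open (not sealed), so the compiler cannot
// check exhaustiveness, and a forgotten case only fails when this code runs.
def typeNameScattered(dt: DataType): String = dt match {
  case DateType => "date"
  case other => throw new UnsupportedOperationException(s"Unsupported type: $other")
}

// Interface style: each ops object must implement every abstract member,
// otherwise its definition does not compile.
trait NameOps { def typeName: String }
case object DateOps extends NameOps { def typeName: String = "date" }
case object TimeOps extends NameOps { def typeName: String = "time" }
```

Here typeNameScattered(TimeType) throws at runtime because nobody added a case for the new type, whereas removing typeName from TimeOps would be rejected by the compiler.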

h3. Examples of the current implementation:
{code}
find . -name "*.scala" -print0 | xargs -0 grep case | grep '=>' | grep DayTimeIntervalType | grep -v test | wc -l
     133
{code}

h3. The goal is to add a set of interfaces and ops objects.
The interfaces define internal operations over the Catalyst types, and the 
ops objects implement those interfaces. For instance:

{code:scala}
case class TimeTypeOps(t: TimeType)
  extends TypeApiOps
  with EncodeTypeOps
  with FormatTypeOps
  with TypeOps
  with PhyTypeOps
  with LiteralTypeOps {
  // Implementations of the operations declared by the mixed-in traits go here.
}
{code}
where *LiteralTypeOps* is

{code:scala}
trait LiteralTypeOps {
  // Gets a literal with the default value of the type
  def getDefaultLiteral: Literal
  // Gets a Java literal as a string, which can be used in codegen
  def getJavaLiteral(v: Any): String
}
{code}
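
As a rough sketch of what an implementation of this interface could look like, the block below is self-contained and does not depend on Spark. The TimeType and Literal case classes are simplified stand-ins, TimeLiteralOps is a hypothetical name, and storing a TIME value as a Long count of nanoseconds since midnight is an assumption made for the example, not something stated in this issue.

```scala
// Simplified stand-ins so the sketch compiles on its own; NOT Spark's classes.
case class TimeType(precision: Int)
case class Literal(value: Any, dataType: Any)

// Same shape as the trait quoted above.
trait LiteralTypeOps {
  def getDefaultLiteral: Literal
  def getJavaLiteral(v: Any): String
}

// Assumption: a TIME value is held as a Long (nanoseconds since midnight).
case class TimeLiteralOps(t: TimeType) extends LiteralTypeOps {
  // Default value of the type: midnight.
  def getDefaultLiteral: Literal = Literal(0L, t)

  // Renders the value as Java source text, e.g. for use in generated code.
  def getJavaLiteral(v: Any): String = v match {
    case nanos: Long => s"${nanos}L"
    case other => throw new IllegalArgumentException(s"Expected a Long, got: $other")
  }
}
```

With these definitions, TimeLiteralOps(TimeType(6)).getJavaLiteral(42L) yields the string 42L, and passing a non-Long value throws an IllegalArgumentException.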



> Type framework
> --------------
>
>                 Key: SPARK-53504
>                 URL: https://issues.apache.org/jira/browse/SPARK-53504
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major


