[ 
https://issues.apache.org/jira/browse/SPARK-38796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel updated SPARK-38796:
---------------------------
    Description: 
This tracks implementing the 'to_number' and 'try_to_number' SQL function 
expressions according to new semantics described below. The former is 
equivalent to the latter except that it throws an exception instead of 
returning NULL for cases where the input string does not match the format 
string.

 

-----------

 

try_to_number function (expr, fmt):

Returns 'expr' cast to DECIMAL using formatting 'fmt', or 'NULL' if 'expr' is 
not a valid match for the given format.

 

Syntax: 

[ S ] [ L | $ ]

[ 0 | 9 | G | , ] [...]

[ . | D ] 

[ 0 | 9 ] [...]       

[ L | $ ] [ PR | MI | S ] ' }

 

Arguments:

'expr': A STRING expression representing a number. 'expr' may include leading 
or trailing spaces.
'fmt': An STRING literal, specifying the expected format of 'expr'.

 

Returns:

A DECIMAL(p, s) where 'p' is the total number of digits ('0' or '9') and 's' is 
the number of digits after the decimal point, or 0 if there is none.

 

Format elements allowed (case insensitive):
 * 0 or 9

  Specifies an expected digit between '0' and '9'. 
  A '0' to the left of the decimal points indicates that 'expr' must have at 
least as many digits. A leading '9' indicates that 'expr' may omit these digits.

  'expr' must not be larger than the number of digits to the left of the 
decimal point allowed by the format string.

  Digits to the right of the decimal point in the format string indicate the 
most digits that 'expr' may have to the right of the decimal point.
 * . or D

  Specifies the position of the decimal point.

  'expr' does not need to include a decimal point.
 * , or G

  Specifies the position of the ',' grouping (thousands) separator.
  There must be a '0' or '9' to the left of the rightmost grouping separator. 
  'expr' must match the grouping separator relevant for the size of the number. 
 * $

  Specifies the location of the '$' currency sign. This character may only be 
specified once.
 * S 

  Specifies the position of an option '+' or '-' sign. This character may only 
be specified once.
 * MI

  Specifies that 'expr' has an optional '-' sign at the end, but no '+'.
 * PR

  Specifies that 'expr' indicates a negative number with wrapping angled 
brackets ('<1>'). If 'expr' contains any characters other than '0' through '9' 
and those permitted in 'fmt' a 'NULL' is returned.

 

Examples:

– The format expects:
–  * an optional sign at the beginning,
–  * followed by a dollar sign,
–  * followed by a number between 3 and 6 digits long,
–  * thousands separators,
–  * up to two dights beyond the decimal point. 
> SELECT try_to_number('-$12,345.67', 'S$999,099.99');
 -12345.67
– The plus sign is optional, and so are fractional digits.
> SELECT try_to_number('$345', 'S$999,099.99');
 345.00
– The format requires at least three digits.
> SELECT try_to_number('$45', 'S$999,099.99');
 NULL
– The format requires at least three digits.
> SELECT try_to_number('$045', 'S$999,099.99');
 45.00
– Using brackets to denote negative values
> SELECT try_to_number('<1234>', '999999PR');
 -1234

  was:
This tracks implementing the 'to_number' and 'try_to_number' SQL function 
expressions according to new semantics described below. The former is 
equivalent to the latter except that it throws an exception instead of 
returning NULL for cases where the input string does not match the format 
string.

-----------

 

try_to_number function (expr, fmt):

Returns 'expr' cast to DECIMAL using formatting 'fmt', or 'NULL' if 'expr' is 
not a valid match for the given format.

 


Syntax: 

[ S ] [ L | $ ]

[ 0 | 9 | G | , ] [...]

[ . | D ] 

[ 0 | 9 ] [...]       

[ L | $ ] [ PR | MI | S ] ' }


Arguments:

'expr': A STRING expression representing a number. 'expr' may include leading 
or trailing spaces.
'fmt': An STRING literal, specifying the expected format of 'expr'.

 

Returns:

A DECIMAL(p, s) where 'p' is the total number of digits ('0' or '9') and 's' is 
the number of digits after the decimal point, or 0 if there is none.

 

Format elements allowed (case insensitive):
 * 0 or 9

  Specifies an expected digit between '0' and '9'. 
  A '0' to the left of the decimal points indicates that 'expr' must have at 
least as many digits. A leading '9' indicates that 'expr' may omit these digits.

  'expr' must not be larger than the number of digits to the left of the 
decimal point allowed by the format string.

  Digits to the right of the decimal point in the format string indicate the 
most digits that 'expr' may have to the right of the decimal point.
 * . or D

  Specifies the position of the decimal point.

  'expr' does not need to include a decimal point.
 * , or G

  Specifies the position of the ',' grouping (thousands) separator.
  There must be a '0' or '9' to the left of the rightmost grouping separator. 
  'expr' must match the grouping separator relevant for the size of the number. 
 * $

  Specifies the location of the '$' currency sign. This character may only be 
specified once.
 * S 

  Specifies the position of an option '+' or '-' sign. This character may only 
be specified once.
 * MI

  Specifies that 'expr' has an optional '-' sign at the end, but no '+'.
 * PR

  Specifies that 'expr' indicates a negative number with wrapping angled 
brackets ('<1>'). If 'expr' contains any characters other than '0' through '9' 
and those permitted in 'fmt' a 'NULL' is returned.

 


Examples:

– The format expects:
–  * an optional sign at the beginning,
–  * followed by a dollar sign,
–  * followed by a number between 3 and 6 digits long,
–  * thousands separators,
–  * up to two dights beyond the decimal point. 
> SELECT try_to_number('-$12,345.67', 'S$999,099.99');
 -12345.67
– The plus sign is optional, and so are fractional digits.
> SELECT try_to_number('$345', 'S$999,099.99');
 345.00
– The format requires at least three digits.
> SELECT try_to_number('$45', 'S$999,099.99');
 NULL
– The format requires at least three digits.
> SELECT try_to_number('$045', 'S$999,099.99');
 45.00
– Using brackets to denote negative values
> SELECT try_to_number('<1234>', '999999PR');
 -1234


> Implement the to_number and try_to_number SQL functions according to a new 
> specification
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-38796
>                 URL: https://issues.apache.org/jira/browse/SPARK-38796
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Daniel
>            Priority: Major
>
> This tracks implementing the 'to_number' and 'try_to_number' SQL function 
> expressions according to new semantics described below. The former is 
> equivalent to the latter except that it throws an exception instead of 
> returning NULL for cases where the input string does not match the format 
> string.
>  
> -----------
>  
> try_to_number function (expr, fmt):
> Returns 'expr' cast to DECIMAL using formatting 'fmt', or 'NULL' if 'expr' is 
> not a valid match for the given format.
>  
> Syntax: 
> [ S ] [ L | $ ]
> [ 0 | 9 | G | , ] [...]
> [ . | D ] 
> [ 0 | 9 ] [...]       
> [ L | $ ] [ PR | MI | S ] ' }
>  
> Arguments:
> 'expr': A STRING expression representing a number. 'expr' may include leading 
> or trailing spaces.
> 'fmt': An STRING literal, specifying the expected format of 'expr'.
>  
> Returns:
> A DECIMAL(p, s) where 'p' is the total number of digits ('0' or '9') and 's' 
> is the number of digits after the decimal point, or 0 if there is none.
>  
> Format elements allowed (case insensitive):
>  * 0 or 9
>   Specifies an expected digit between '0' and '9'. 
>   A '0' to the left of the decimal points indicates that 'expr' must have at 
> least as many digits. A leading '9' indicates that 'expr' may omit these 
> digits.
>   'expr' must not be larger than the number of digits to the left of the 
> decimal point allowed by the format string.
>   Digits to the right of the decimal point in the format string indicate the 
> most digits that 'expr' may have to the right of the decimal point.
>  * . or D
>   Specifies the position of the decimal point.
>   'expr' does not need to include a decimal point.
>  * , or G
>   Specifies the position of the ',' grouping (thousands) separator.
>   There must be a '0' or '9' to the left of the rightmost grouping separator. 
>   'expr' must match the grouping separator relevant for the size of the 
> number. 
>  * $
>   Specifies the location of the '$' currency sign. This character may only be 
> specified once.
>  * S 
>   Specifies the position of an option '+' or '-' sign. This character may 
> only be specified once.
>  * MI
>   Specifies that 'expr' has an optional '-' sign at the end, but no '+'.
>  * PR
>   Specifies that 'expr' indicates a negative number with wrapping angled 
> brackets ('<1>'). If 'expr' contains any characters other than '0' through 
> '9' and those permitted in 'fmt' a 'NULL' is returned.
>  
> Examples:
> – The format expects:
> –  * an optional sign at the beginning,
> –  * followed by a dollar sign,
> –  * followed by a number between 3 and 6 digits long,
> –  * thousands separators,
> –  * up to two dights beyond the decimal point. 
> > SELECT try_to_number('-$12,345.67', 'S$999,099.99');
>  -12345.67
> – The plus sign is optional, and so are fractional digits.
> > SELECT try_to_number('$345', 'S$999,099.99');
>  345.00
> – The format requires at least three digits.
> > SELECT try_to_number('$45', 'S$999,099.99');
>  NULL
> – The format requires at least three digits.
> > SELECT try_to_number('$045', 'S$999,099.99');
>  45.00
> – Using brackets to denote negative values
> > SELECT try_to_number('<1234>', '999999PR');
>  -1234



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to