Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-06-05 Thread via GitHub


comphead merged PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-06-04 Thread via GitHub


codecov-commenter commented on PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#issuecomment-2148675725

   ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/455?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 34.23%. Comparing base [(`9ca63a2`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/9ca63a23edf67033e4f4eba5a9d004aa472743d2?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) to head [(`e8f3b77`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/e8f3b77596ebe1617c28f812cc63d4206ed064a1?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 27 commits behind head on main.
   
   
   Additional details and impacted files
   
   
   ```diff
   @@             Coverage Diff              @@
   ##               main     #455      +/-   ##

   + Coverage     34.18%   34.23%   +0.04%
   + Complexity      851      806      -45

     Files           116      105      -11
     Lines         38570    38488      -82
     Branches       8531     8562      +31

   - Hits          13187    13175      -12
   + Misses        22612    22554      -58
   + Partials       2771     2759      -12
   ```
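
The figures in the coverage table can be cross-checked with a little arithmetic. The sketch below assumes that `Lines` is the sum of `Hits`, `Misses`, and `Partials`, and that the percentage is hits divided by lines, truncated (not rounded) to two decimals. The truncation is an inference from the numbers in this report, not documented Codecov behaviour, and the helper name `coverage_pct` is made up for illustration.

```python
import math

# Figures copied from the coverage diff above (base = main, head = PR #455).
base = {"hits": 13187, "misses": 22612, "partials": 2771, "lines": 38570}
head = {"hits": 13175, "misses": 22554, "partials": 2759, "lines": 38488}


def coverage_pct(hits: int, lines: int) -> float:
    """Hits as a percentage of lines, truncated to two decimal places.

    Truncation is an assumption that happens to reproduce the reported values.
    """
    return math.floor(hits / lines * 10000) / 100


for name, row in (("base", base), ("head", head)):
    # Every counted line is a hit, a miss, or a partial.
    total = row["hits"] + row["misses"] + row["partials"]
    print(name, total == row["lines"], coverage_pct(row["hits"], row["lines"]))
```

With plain rounding the base value would come out as 34.19 instead of the reported 34.18, which is why truncation is assumed here.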
   
   
   
   
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/455?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   





Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-06-04 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1626637845


##
docs/source/user-guide/overview.md:
##
@@ -29,7 +29,7 @@ Comet aims to support:
 - a native Parquet implementation, including both reader and writer
 - full implementation of Spark operators, including
   Filter/Project/Aggregation/Join/Exchange etc.
-- full implementation of Spark built-in expressions
+- [full implementation](../../../docs/spark_expressions_support.md) of Spark built-in expressions.

Review Comment:
   ```suggestion
   - full implementation of Spark built-in expressions.
   ```






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-06-04 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1626637123


##
docs/source/user-guide/overview.md:
##
@@ -29,7 +29,7 @@ Comet aims to support:
 - a native Parquet implementation, including both reader and writer
 - full implementation of Spark operators, including
   Filter/Project/Aggregation/Join/Exchange etc.
-- full implementation of Spark built-in expressions
+- [full implementation](../../../docs/spark_expressions_support.md) of Spark built-in expressions.

Review Comment:
   This won't build correctly:
   
   
   ```
   /Users/andy/git/apache/datafusion-comet/docs/temp/user-guide/overview.md:32: WARNING: Unknown source document '../spark_expressions_support' [myst.xref_missing]
   ```
   
   Let's revert this change for this PR and handle where we publish (user guide 
vs contributor guide) in a follow-up PR.
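
A broken relative link like this one can be caught before running the docs build. The following is a minimal sketch, not how Sphinx or MyST resolves cross-references: it only checks whether each relative Markdown link target exists on disk relative to the file that contains it. The function name `missing_relative_links` and the regular expression are illustrative assumptions.

```python
import re
from pathlib import Path
from typing import List

# Matches inline Markdown links of the form [text](target); a simplification
# that ignores reference-style links and targets containing spaces.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def missing_relative_links(md_file: Path) -> List[str]:
    """Return relative link targets in md_file that do not exist on disk."""
    text = md_file.read_text(encoding="utf-8")
    missing = []
    for target in LINK_RE.findall(text):
        # Skip absolute URLs, in-page anchors, and mail links.
        if "://" in target or target.startswith(("#", "mailto:")):
            continue
        path = target.split("#", 1)[0]  # drop any trailing #anchor
        if not (md_file.parent / path).exists():
            missing.append(target)
    return missing
```

Because the docs build appears to work from a copied tree (`docs/temp` in the warning above), such a check would need to be run against that tree, not against `docs/source`, to reproduce the same result.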
   
   






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-06-04 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1626633782


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-06-04 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1626623804


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-30 Thread via GitHub


comphead commented on PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#issuecomment-2141027572

   @andygrove I have addressed all the comments. You are right, though: 
sometimes we support a function only partially, meaning that part of its 
syntax or some range of values is not supported.
   
   That suggests an idea for a follow-up PR: introduce a "partially supported" 
status (or similar), along with the reason why the function is only partially supported.
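
One way the proposed status could look is sketched below. This is a hypothetical illustration, not the generator used in this PR: the names `Support`, `Expr`, and `render`, and the `(partial: ...)` suffix, are all assumptions. It renders the same `### group` / ` - [x] name` checklist layout as `docs/spark_expressions_support.md`, with a third state that carries a reason.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Support(Enum):
    NO = "no"
    PARTIAL = "partial"
    FULL = "full"


@dataclass(frozen=True)
class Expr:
    name: str
    support: Support
    note: str = ""  # reason, only meaningful for PARTIAL


def render(groups: Dict[str, List[Expr]]) -> str:
    """Render grouped expressions as a Markdown checklist."""
    lines = ["# Supported Spark Expressions", ""]
    for group in sorted(groups):
        lines.append(f"### {group}")
        for e in sorted(groups[group], key=lambda e: e.name):
            # Partially supported entries are ticked but carry a reason.
            box = "[ ]" if e.support is Support.NO else "[x]"
            suffix = f" (partial: {e.note})" if e.support is Support.PARTIAL else ""
            lines.append(f" - {box} {e.name}{suffix}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # `example_fn` and its note are placeholders, not a real Spark function.
    print(render({
        "agg_funcs": [
            Expr("avg", Support.FULL),
            Expr("median", Support.NO),
            Expr("example_fn", Support.PARTIAL, "hypothetical reason"),
        ]
    }))
```

Keeping the checkbox ticked for partial entries leaves the existing two-state list readable as before; an alternative would be a separate marker, which is a design choice for the follow-up PR.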





Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619544584


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619538503


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count

Review Comment:
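   `regr_avgx` is supported by DataFusion (DF):
   ```
   > SELECT regr_avgx(1, 2);
   +------------------------------+
   | REGR_AVGX(Int64(1),Int64(2)) |
   +------------------------------+
   | 2.0                          |
   +------------------------------+
   ```
   so I think everything here is fair.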






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619533113


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619530851


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619529446


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count

Review Comment:
   ```
   test("regr_avgx") {
     Seq(false, true).foreach { dictionary =>
       withSQLConf(
         "parquet.enable.dictionary" -> dictionary.toString,
         "spark.comet.exec.shuffle.enabled" -> "true",
         CometConf.COMET_ENABLED.key -> "true",
         CometConf.COMET_EXEC_ENABLED.key -> "true",
         CometConf.COMET_SHUFFLE_ENFORCE_MODE_ENABLED.key -> "true",
         CometConf.COMET_EXEC_ALL_OPERATOR_ENABLED.key -> "true",
       ) {
         val table = "test"
         withTable(table) {
           sql(s"create table $table(a int, b int) using parquet")
           sql(s"insert into $table VALUES (1, 2), (2, 2), (2, 3), (2, 4)")
           checkSparkAnswerAndOperator(s"SELECT regr_avgx(a, b) FROM $table")
         }
       }
     }
   }
   ```
   
   The `regr_avgx` test passes.






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619525532


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619523325


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count

Review Comment:
   The test covers exactly `YEAR`. But if only `YEAR` is supported, what
   should we show to the user? Not supported?
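
One possible answer, sketched under assumptions: the generated list could keep the box checked and append a short note describing the limitation. Everything below (`SupportListSketch`, `Support`, `Partial`, `render`) is hypothetical and is not taken from the Comet code base; it only illustrates how a partially supported entry such as `extract` might be rendered.

```scala
// Hypothetical sketch: none of these names exist in Comet.
object SupportListSketch {
  // Support level of one Spark builtin expression.
  sealed trait Support
  case object Unsupported extends Support
  case object Supported extends Support
  // Partially supported, with a short note describing the limitation.
  final case class Partial(note: String) extends Support

  // Render one checklist line in the " - [x] name" style of the generated file.
  def render(name: String, support: Support): String = support match {
    case Unsupported   => s" - [ ] $name"
    case Supported     => s" - [x] $name"
    case Partial(note) => s" - [x] $name ($note)"
  }

  def main(args: Array[String]): Unit = {
    val entries: Seq[(String, Support)] = Seq(
      "dayofyear" -> Unsupported,
      "extract" -> Partial("YEAR only"),
      "md5" -> Supported)
    entries.foreach { case (name, support) => println(render(name, support)) }
  }
}
```

With this shape the checkbox still tells the reader the function runs natively, while the note makes the restriction visible. Whether a note, a separate marker, or an unchecked box is the right choice is left open here.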






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619275529


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract

Review Comment:
   We only support `extract` for `YEAR`






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619275258


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part

Review Comment:
   We only support `date_part` for `YEAR`






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


viirya commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619233974


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ ] 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619211189


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619210130


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619208014


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [x] some
+ - [x] std
+ - [x] stddev
+ - [x] stddev_pop
+ - [x] stddev_samp
+ - [x] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [x] var_pop
+ - [x] var_samp
+ - [x] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [ ] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [x] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [x] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [x] nvl
+ - [x] nvl2
+ - [ ] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [ ] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [ ] to_date
+ - [ ] to_timestamp
+ - [ ] to_timestamp_ltz
+ - [ ] to_timestamp_ntz
+ - [ ] to_unix_timestamp
+ - [ ] to_utc_timestamp
+ - [ ] trunc
+ - [ ] try_to_timestamp
+ - [ ] unix_date
+ - [ ] unix_micros
+ - [ ] unix_millis
+ - [ ] unix_seconds
+ - [ ] unix_timestamp
+ - [ ] weekday
+ - [ ] weekofyear
+ - [ ] year
+
+### generator_funcs
+ - [ ] explode
+ - [ ] explode_outer
+ - [ ] inline
+ - [ ] inline_outer
+ - [ ] posexplode
+ - [ ] posexplode_outer
+ - [ ] stack
+
+### hash_funcs
+ - [ ] crc32
+ - [ ] hash
+ - [x] md5
+ - [ ] sha
+ - [ ] sha1
+ - [ ] sha2
+ - [ ] xxhash64
+
+### json_funcs
+ - [ ] from_json
+ - [ ] get_json_object
+ - [ ] json_array_length
+ - [ ] json_object_keys
+ - [ ] json_tuple
+ - [ ] schema_of_json
+ - [ ] to_json
+
+### lambda_funcs
+ - [ ] aggregate
+ - [ ] array_sort
+ - [ ] exists
+ - [ ] filter
+ - [ ] forall
+ - [ ] map_filter
+ - [ ] map_zip_with
+ - [ ] reduce
+ - [ ] transform
+ - [ ] transform_keys
+ - [ ] transform_values
+ - [ ] zip_with
+
+### map_funcs
+ - [ ] element_at
+ - [ ] map
+ - [ ] map_concat
+ - [ ] map_contains_key
+ - [ ] map_entries
+ - [ ] map_from_arrays
+ - [ ] map_from_entries
+ - [ ] map_keys
+ - [ ] map_values
+ - [ ] str_to_map
+ - [ ] try_element_at
+
+### math_funcs
+ - [x] %
+ - [x] *
+ - [x] +
+ - [x] -
+ - [x] /
+ - [x] abs
+ - [x] acos
+ - [ ] acosh
+ - [x] asin
+ - [ ] asinh
+ - [x] atan
+ - [x] atan2
+ - [ ] atanh
+ - [ ] bin
+ - [ ] bround
+ - [ ] cbrt
+ - [x] ceil
+ - [x] ceiling
+ - [ ] conv
+ - [x] cos
+ - [ ] cosh
+ - [ ] cot
+ - [ ] csc
+ - [ ] degrees
+ - [ ] div
+ - [ ] e
+ - [x] exp
+ - [ ] expm1
+ - [ ] factorial
+ - [x] floor
+ - [ ] greatest
+ - [ ] hex
+ - [ ] hypot
+ - [ ] least
+ - [x] ln
+ - [ 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619197931


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,475 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [x] any
+ - [x] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [x] avg
+ - [x] bit_and
+ - [x] bit_or
+ - [x] bit_xor
+ - [x] bool_and
+ - [x] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [x] count
+ - [x] count_if
+ - [ ] count_min_sketch
+ - [x] covar_pop
+ - [x] covar_samp
+ - [x] every
+ - [x] first
+ - [x] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [x] last
+ - [x] last_value
+ - [x] max
+ - [ ] max_by
+ - [x] mean
+ - [ ] median
+ - [x] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [x] regr_avgx
+ - [x] regr_avgy
+ - [x] regr_count

Review Comment:
   I don't think that we support these expressions






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-28 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1617434431


##
spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala:
##
@@ -217,6 +325,25 @@ class CometExpressionCoverageSuite extends CometTestBase with AdaptiveSparkPlanH
 str shouldBe s"${getLicenseHeader()}\n# Supported Spark Expressions\n\n### group1\n - [x] f1\n - [ ] f2\n\n### group2\n - [x] f3\n - [ ] f4\n\n### group3\n - [x] f5"
   }
 
+  test("get sql function arguments") {
+// getSqlFunctionArguments("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'))") shouldBe Seq("TIMESTAMP('1970-01-01 00:00:01Z')")
+// getSqlFunctionArguments("SELECT decode(unhex('537061726B2053514C'), 'UTF-8')") shouldBe Seq("unhex('537061726B2053514C')", "'UTF-8'")
+// getSqlFunctionArguments("SELECT extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456')") shouldBe Seq("'YEAR'", "TIMESTAMP '2019-08-12 01:00:00.123456'")
+// getSqlFunctionArguments("SELECT exists(array(1, 2, 3), x -> x % 2 == 0)") shouldBe Seq("array(1, 2, 3)")
+getSqlFunctionArguments("select to_char(454, '999')") shouldBe Seq("array(1, 2, 3)")

Review Comment:
   Correct. The test is ignored for now, which is why it passed. Annoying. Fixed.
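
For reference, the expected string in the hunk above pins down the layout of the generated Markdown file: the license header, a title, then one `###` section per group with a checkbox per function. A minimal sketch of a renderer that produces that layout, assuming a hypothetical `CoverageMarkdownSketch.render` helper and input shape (neither is taken from the PR):

```scala
object CoverageMarkdownSketch {
  // Hypothetical input shape: (group name, (function name, supported?) pairs), in display order.
  def render(header: String, groups: Seq[(String, Seq[(String, Boolean)])]): String = {
    val sections = groups.map { case (group, funcs) =>
      val items = funcs.map { case (name, supported) =>
        val mark = if (supported) "x" else " "
        s" - [$mark] $name"
      }
      // One "### group" heading followed by its checklist items.
      (s"### $group" +: items).mkString("\n")
    }
    // Sections are separated by a blank line; the title follows the header directly.
    s"$header\n# Supported Spark Expressions\n\n${sections.mkString("\n\n")}"
  }
}
```

Calling `render` with three groups shaped like the ones in the test reproduces the asserted string, with `header` standing in for `getLicenseHeader()`.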






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-27 Thread via GitHub


advancedxy commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1616589501


##
spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala:
##
@@ -217,6 +325,25 @@ class CometExpressionCoverageSuite extends CometTestBase with AdaptiveSparkPlanH
 str shouldBe s"${getLicenseHeader()}\n# Supported Spark Expressions\n\n### group1\n - [x] f1\n - [ ] f2\n\n### group2\n - [x] f3\n - [ ] f4\n\n### group3\n - [x] f5"
   }
 
+  test("get sql function arguments") {
+// getSqlFunctionArguments("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'))") shouldBe Seq("TIMESTAMP('1970-01-01 00:00:01Z')")
+// getSqlFunctionArguments("SELECT decode(unhex('537061726B2053514C'), 'UTF-8')") shouldBe Seq("unhex('537061726B2053514C')", "'UTF-8'")
+// getSqlFunctionArguments("SELECT extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456')") shouldBe Seq("'YEAR'", "TIMESTAMP '2019-08-12 01:00:00.123456'")
+// getSqlFunctionArguments("SELECT exists(array(1, 2, 3), x -> x % 2 == 0)") shouldBe Seq("array(1, 2, 3)")
+getSqlFunctionArguments("select to_char(454, '999')") shouldBe Seq("array(1, 2, 3)")

Review Comment:
   Hmm, I think it should be updated to this?
   ```scala
   getSqlFunctionArguments("select to_char(454, '999')") shouldBe Seq(454, "999")
   ```
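
To make the expected values concrete, here is a minimal sketch of an argument splitter. It is not the PR's `getSqlFunctionArguments`; the object and method names are hypothetical. It assumes the arguments are returned as SQL text, so `to_char(454, '999')` yields `"454"` and `"'999'"`, which is the convention the commented-out expectations in the hunk follow (for example `"'UTF-8'"`). It splits on top-level commas only, tracking parentheses and single-quoted strings:

```scala
object SqlArgsSketch {
  // Returns the arguments of the first function call in `sql` as SQL text.
  // Assumption: plain comma-separated arguments; special syntax such as
  // extract(YEAR FROM ...) or lambdas is out of scope for this sketch.
  def functionArguments(sql: String): Seq[String] = {
    val args = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0        // nesting level inside the outer call
    var inQuote = false  // inside a single-quoted SQL string
    var i = sql.indexOf('(') + 1
    var done = i == 0    // no opening parenthesis means no call to split
    while (i < sql.length && !done) {
      val c = sql.charAt(i)
      if (c == '\'') { inQuote = !inQuote; current.append(c) }
      else if (inQuote) current.append(c)
      else if (c == '(') { depth += 1; current.append(c) }
      else if (c == ')') {
        if (depth == 0) done = true // closing parenthesis of the outer call
        else { depth -= 1; current.append(c) }
      }
      else if (c == ',' && depth == 0) { args += current.toString.trim; current.clear() }
      else current.append(c)
      i += 1
    }
    val last = current.toString.trim
    if (last.nonEmpty) args += last
    args.toSeq
  }
}
```

Nested calls stay intact because commas inside inner parentheses or quotes are never treated as separators.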






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-27 Thread via GitHub


advancedxy commented on PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#issuecomment-2134246796

   > Thanks @advancedxy, I fixed the flaws you mentioned. However, I'd like to do the refactoring you recommended in a follow-up PR, since this PR is getting too large to review.
   
   Of course, sounds good to me.





Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-27 Thread via GitHub


comphead commented on PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#issuecomment-2134236220

   Thanks @advancedxy, I fixed the flaws you mentioned. However, I'd like to do the refactoring you recommended in a follow-up PR, since this PR is getting too large to review.





Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-27 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1616497338


##
spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala:
##
@@ -54,16 +57,79 @@ class CometExpressionCoverageSuite extends CometTestBase with AdaptiveSparkPlanH
   private val valuesPattern = """(?i)FROM VALUES(.+?);""".r
   private val selectPattern = """(i?)SELECT(.+?)FROM""".r
 
+  // exclude funcs Comet has no plans to support streaming in near future
+  // like spark streaming functions, java calls
+  private val outofRoadmapFuncs =
+List("window", "session_window", "window_time", "java_method", "reflect")
+  private val sqlConf = Seq(
+"spark.comet.exec.shuffle.enabled" -> "true",
+"spark.sql.optimizer.excludedRules" -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding",
+"spark.sql.adaptive.optimizer.excludedRules" -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
+
+  // Tests to run manually as its syntax is different from usual or nested
+  val manualTests: Map[String, (String, String)] = Map(
+"!" -> ("select true a", "select ! true from tbl"),
+"%" -> ("select 1 a, 2 b", "select a + b from tbl"),

Review Comment:
   Corrected






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-27 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1616497081


##
spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala:
##
@@ -217,6 +325,25 @@ class CometExpressionCoverageSuite extends CometTestBase with AdaptiveSparkPlanH
 str shouldBe s"${getLicenseHeader()}\n# Supported Spark Expressions\n\n### group1\n - [x] f1\n - [ ] f2\n\n### group2\n - [x] f3\n - [ ] f4\n\n### group3\n - [x] f5"
   }
 
+  test("get sql function arguments") {
+// getSqlFunctionArguments("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'))") shouldBe Seq("TIMESTAMP('1970-01-01 00:00:01Z')")
+// getSqlFunctionArguments("SELECT decode(unhex('537061726B2053514C'), 'UTF-8')") shouldBe Seq("unhex('537061726B2053514C')", "'UTF-8'")
+// getSqlFunctionArguments("SELECT extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456')") shouldBe Seq("'YEAR'", "TIMESTAMP '2019-08-12 01:00:00.123456'")
+// getSqlFunctionArguments("SELECT exists(array(1, 2, 3), x -> x % 2 == 0)") shouldBe Seq("array(1, 2, 3)")
+getSqlFunctionArguments("select to_char(454, '999')") shouldBe Seq("array(1, 2, 3)")

Review Comment:
   Oops, uncommented






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-25 Thread via GitHub


advancedxy commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1614532699


##
spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala:
##
@@ -217,6 +325,25 @@ class CometExpressionCoverageSuite extends CometTestBase with AdaptiveSparkPlanH
 str shouldBe s"${getLicenseHeader()}\n# Supported Spark Expressions\n\n### group1\n - [x] f1\n - [ ] f2\n\n### group2\n - [x] f3\n - [ ] f4\n\n### group3\n - [x] f5"
   }
 
+  test("get sql function arguments") {
+// getSqlFunctionArguments("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'))") shouldBe Seq("TIMESTAMP('1970-01-01 00:00:01Z')")
+// getSqlFunctionArguments("SELECT decode(unhex('537061726B2053514C'), 'UTF-8')") shouldBe Seq("unhex('537061726B2053514C')", "'UTF-8'")
+// getSqlFunctionArguments("SELECT extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456')") shouldBe Seq("'YEAR'", "TIMESTAMP '2019-08-12 01:00:00.123456'")
+// getSqlFunctionArguments("SELECT exists(array(1, 2, 3), x -> x % 2 == 0)") shouldBe Seq("array(1, 2, 3)")
+getSqlFunctionArguments("select to_char(454, '999')") shouldBe Seq("array(1, 2, 3)")

Review Comment:
   Is this test wrong? The expected arguments are not correct.



##
spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala:
##
@@ -54,16 +57,79 @@ class CometExpressionCoverageSuite extends CometTestBase with AdaptiveSparkPlanH
   private val valuesPattern = """(?i)FROM VALUES(.+?);""".r
   private val selectPattern = """(i?)SELECT(.+?)FROM""".r
 
+  // exclude funcs Comet has no plans to support streaming in near future
+  // like spark streaming functions, java calls
+  private val outofRoadmapFuncs =
+List("window", "session_window", "window_time", "java_method", "reflect")
+  private val sqlConf = Seq(
+"spark.comet.exec.shuffle.enabled" -> "true",
+"spark.sql.optimizer.excludedRules" -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding",
+"spark.sql.adaptive.optimizer.excludedRules" -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
+
+  // Tests to run manually as its syntax is different from usual or nested
+  val manualTests: Map[String, (String, String)] = Map(
+"!" -> ("select true a", "select ! true from tbl"),
+"%" -> ("select 1 a, 2 b", "select a + b from tbl"),

Review Comment:
   Shouldn't the mapped query be `select a % b from tbl`?



##
spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala:
##
@@ -54,16 +57,79 @@ class CometExpressionCoverageSuite extends CometTestBase with AdaptiveSparkPlanH
   private val valuesPattern = """(?i)FROM VALUES(.+?);""".r
   private val selectPattern = """(i?)SELECT(.+?)FROM""".r
 
+  // exclude funcs Comet has no plans to support streaming in near future
+  // like spark streaming functions, java calls
+  private val outofRoadmapFuncs =
+List("window", "session_window", "window_time", "java_method", "reflect")
+  private val sqlConf = Seq(
+"spark.comet.exec.shuffle.enabled" -> "true",
+"spark.sql.optimizer.excludedRules" -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding",
+"spark.sql.adaptive.optimizer.excludedRules" -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
+
+  // Tests to run manually as its syntax is different from usual or nested
+  val manualTests: Map[String, (String, String)] = Map(
+"!" -> ("select true a", "select ! true from tbl"),
+"%" -> ("select 1 a, 2 b", "select a + b from tbl"),

Review Comment:
   Or maybe you can just generate the binary operators and their mappings programmatically? For example:
   ```scala
   Seq("%", "&", ..., "|").map(x => x -> ("select 1 a, 2 b", s"select a $x b from tbl"))
   ```
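
A self-contained sketch of that idea follows. The operator list is an assumption chosen for illustration (the suggestion above elides it), and `BinaryOperatorTests` is a hypothetical name, not something from the PR. Each operator maps to the same table query and an infix expression over columns `a` and `b`:

```scala
object BinaryOperatorTests {
  // Assumed operator list for illustration only; the review comment does not spell it out.
  val operators: Seq[String] = Seq("%", "&", "*", "+", "-", "/", "^", "|")

  // operator -> (query that builds the table, query under test)
  val tests: Map[String, (String, String)] =
    operators.map { op =>
      (op, ("select 1 a, 2 b", s"select a $op b from tbl"))
    }.toMap
}
```

Generating the entries this way rules out copy-paste slips such as mapping `%` to `select a + b from tbl`, because the key and the operator in the query come from the same value.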



##
spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala:
##
@@ -116,20 +182,62 @@ class CometExpressionCoverageSuite extends CometTestBase with AdaptiveSparkPlanH
   // ConstantFolding is a operator optimization rule in Catalyst that replaces expressions
   // that can be statically evaluated with their equivalent literal values.
   dfMessage = runDatafusionCli(q)
-  testSingleLineQuery(
-"select 'dummy' x",
-s"${q.dropRight(1)}, x from tbl",
-excludedOptimizerRules =
-  Some("org.apache.spark.sql.catalyst.optimizer.ConstantFolding"))
+
+  manualTests.get(func.name) match {
+// the test is manual query
+case Some(test) => testSingleLineQuery(test._1, test._2, sqlConf = sqlConf)
+case None =>
+  // extract function arguments as a sql text
+  // example:
+  // cos(0) -> 0
+  // explode_outer(array(10, 20)) -> array(10, 20)
+  val args = getSqlFunctionArguments(q.dropRight(1))
+  val 

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-24 Thread via GitHub


comphead commented on PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#issuecomment-2130409303

   @andygrove @advancedxy I fixed the test by adding extra parsing, plus small manual tests for cases where the parsing is complicated. I hope we now have a better picture.





Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-23 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1612223812


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp
+ - [ ] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [ ] var_pop
+ - [ ] var_samp
+ - [ ] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [x] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [ ] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [ ] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [ ] nvl
+ - [x] nvl2
+ - [x] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [x] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [x] to_date

Review Comment:
   I think the problem here is that when Spark evaluates a function of a literal, it skips Comet, whereas the test above applies the function to a column with Comet enabled. I'm thinking about how to fix it.
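
One possible direction, sketched under assumptions: move the literal arguments into a one-row table and call the function on the resulting columns, so the call is no longer a constant expression. The helper below is hypothetical (`LiteralToColumnSketch.toColumnQueries` is not part of the PR); it only builds the two SQL strings, a table query and a query under test that reads from `tbl`:

```scala
object LiteralToColumnSketch {
  // Hypothetical helper: turns func(arg0, arg1, ...) over literals into
  // (table query exposing the literals as columns, query calling func on those columns).
  def toColumnQueries(func: String, args: Seq[String]): (String, String) = {
    if (args.isEmpty) {
      // No arguments to move; keep a dummy column so the table query stays valid.
      ("select 'dummy' x", s"select $func() from tbl")
    } else {
      val aliases = args.indices.map(i => s"c$i")
      val columns = args.zip(aliases).map { case (arg, alias) => s"$arg $alias" }
      val tableSql = columns.mkString("select ", ", ", "")
      val querySql = s"select $func(${aliases.mkString(", ")}) from tbl"
      (tableSql, querySql)
    }
  }
}
```

For `cos` with the single argument `0`, this gives the table query `select 0 c0` and the test query `select cos(c0) from tbl`.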






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-23 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1612087281


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp
+ - [ ] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [ ] var_pop
+ - [ ] var_samp
+ - [ ] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [x] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [ ] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [ ] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [ ] nvl
+ - [x] nvl2
+ - [x] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [x] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [x] to_date

Review Comment:
   I'm checking that. I just ran the test manually and it failed, as you 
mentioned:
   
   ```
   test("to_date") {
     Seq(false, true).foreach { dictionary =>
       withSQLConf(
         "parquet.enable.dictionary" -> dictionary.toString,
         "spark.comet.exec.shuffle.enabled" -> "true",
       ) {
         val table = "test"
         withTable(table) {
           sql(s"create table $table(col string) using parquet")
           sql(s"insert into $table VALUES ('2009-07-30 04:17:52')")
           checkSparkAnswerAndOperator(s"SELECT to_date(col) FROM $table")
         }
       }
     }
   }
   ```






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-23 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1612070732


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp
+ - [ ] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [ ] var_pop
+ - [ ] var_samp
+ - [ ] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [x] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [ ] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [ ] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [ ] nvl
+ - [x] nvl2
+ - [x] when
+
+### conversion_funcs
+ - [ ] bigint
+ - [ ] binary
+ - [ ] boolean
+ - [x] cast
+ - [ ] date
+ - [ ] decimal
+ - [ ] double
+ - [ ] float
+ - [ ] int
+ - [ ] smallint
+ - [ ] string
+ - [ ] timestamp
+ - [ ] tinyint
+
+### csv_funcs
+ - [ ] from_csv
+ - [ ] schema_of_csv
+ - [ ] to_csv
+
+### datetime_funcs
+ - [ ] add_months
+ - [ ] convert_timezone
+ - [x] curdate
+ - [x] current_date
+ - [ ] current_timestamp
+ - [x] current_timezone
+ - [ ] date_add
+ - [ ] date_diff
+ - [ ] date_format
+ - [ ] date_from_unix_date
+ - [x] date_part
+ - [ ] date_sub
+ - [ ] date_trunc
+ - [ ] dateadd
+ - [ ] datediff
+ - [x] datepart
+ - [ ] day
+ - [ ] dayofmonth
+ - [ ] dayofweek
+ - [ ] dayofyear
+ - [x] extract
+ - [ ] from_unixtime
+ - [ ] from_utc_timestamp
+ - [ ] hour
+ - [ ] last_day
+ - [ ] localtimestamp
+ - [ ] make_date
+ - [ ] make_dt_interval
+ - [ ] make_interval
+ - [ ] make_timestamp
+ - [ ] make_timestamp_ltz
+ - [ ] make_timestamp_ntz
+ - [ ] make_ym_interval
+ - [ ] minute
+ - [ ] month
+ - [ ] months_between
+ - [ ] next_day
+ - [ ] now
+ - [ ] quarter
+ - [ ] second
+ - [ ] timestamp_micros
+ - [ ] timestamp_millis
+ - [ ] timestamp_seconds
+ - [x] to_date

Review Comment:
   With Spark 3.4, `to_date` with no format arg translates to `cast(expr as 
date)`, which we do not currently support (but will soon; there is a PR 
pending), so Comet cannot run it natively because `Unsupported cast from 
StringType to DateType`.
   
   When a format arg is supplied, Comet cannot run natively because 
`gettimestamp is not supported`.
   
   Do you know why this doc says that it is supported?






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


advancedxy commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1609204351


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp

Review Comment:
   Yeah. Maybe we need to enable Comet shuffle and re-run the 
CometExpressionCoverageSuite.



##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp
+ - [ ] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [ ] var_pop
+ - [ ] var_samp
+ - [ ] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [x] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice
+ - [ ] sort_array
+
+### bitwise_funcs
+ - [x] &
+ - [x] ^
+ - [ ] bit_count
+ - [ ] bit_get
+ - [ ] getbit
+ - [x] shiftright
+ - [ ] shiftrightunsigned
+ - [x] |
+ - [ ] ~
+
+### collection_funcs
+ - [ ] array_size
+ - [ ] cardinality
+ - [ ] concat
+ - [x] reverse
+ - [ ] size
+
+### conditional_funcs
+ - [x] coalesce
+ - [x] if
+ - [ ] ifnull
+ - [ ] nanvl
+ - [x] nullif
+ - [ ] nvl

Review Comment:
   Hmm, shouldn't it be supported? It's essentially the same as `coalesce`, 
and gets replaced by it during the analysis phase.
   
   Maybe we should file an issue to track this kind of problem.
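
A toy sketch of the replacement described above, assuming a two-argument `nvl` is rewritten to `coalesce` before execution. `NvlRewrite`, `Expr`, `Column`, `Literal`, `Call` and `rewrite` are hypothetical names, not Spark's Catalyst classes.

```scala
// Toy model only: these types are NOT Spark's Catalyst expressions.
object NvlRewrite {
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Literal(value: Any) extends Expr
  final case class Call(fn: String, args: List[Expr]) extends Expr

  // Rewrite every two-argument nvl(a, b) in the tree to coalesce(a, b).
  def rewrite(e: Expr): Expr = e match {
    case Call("nvl", List(a, b)) => Call("coalesce", List(rewrite(a), rewrite(b)))
    case Call(fn, args)          => Call(fn, args.map(rewrite))
    case other                   => other
  }
}
```

If the rewrite happens before Comet sees the plan, a check that passes for `coalesce` would be expected to pass for `nvl` too, so a mismatch points at the coverage test rather than at the expression.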



##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp
+ - [ ] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [ ] var_pop
+ - [ ] var_samp
+ - [ ] variance
+
+### array_funcs
+ - [ ] array
+ - [ ] array_append
+ - [ ] array_compact
+ - [ ] array_contains
+ - [ ] array_distinct
+ - [ ] array_except
+ - [ ] array_insert
+ - [ ] array_intersect
+ - [ ] array_join
+ - [ ] array_max
+ - [ ] array_min
+ - [ ] array_position
+ - [ ] array_remove
+ - [ ] array_repeat
+ - [ ] array_union
+ - [ ] arrays_overlap
+ - [ ] arrays_zip
+ - [ ] flatten
+ - [x] get
+ - [ ] sequence
+ - [ ] shuffle
+ - [ ] slice

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608634043


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp

Review Comment:
   good point






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


viirya commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608605897


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp

Review Comment:
   Have you enabled Comet shuffle? The upper `HashAggregate` cannot be 
translated to `CometHashAggregate` because Comet shuffle is not enabled.






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


viirya commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608605897


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp

Review Comment:
   Have you enabled Comet shuffle?






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608598651


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp

Review Comment:
   @viirya @andygrove @kazuyukitanimura @advancedxy do you think this is a 
sign that the expression is not natively supported?






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608597228


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp

Review Comment:
   I ran the test manually 
   
   ```
   test("stddev") {
     Seq(false, true).foreach { dictionary =>
       withSQLConf("parquet.enable.dictionary" -> dictionary.toString) {
         val table = "test"
         withTable(table) {
           sql(s"create table $table(col int) using parquet")
           sql(s"insert into $table VALUES (1), (2), (3)")
           checkSparkAnswerAndOperator(s"SELECT stddev_pop(col) FROM $table")
         }
       }
     }
   }
   ```
   and it fails with `Expected only Comet native operators, but found HashAggregate.`
   
   the physical plan is 
   ```
   == Physical Plan ==
   AdaptiveSparkPlan isFinalPlan=false
   +- HashAggregate(keys=[], functions=[stddev_pop(cast(col#0 as double))])
      +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=118]
         +- CometHashAggregate [col#0], Partial, [partial_stddev_pop(cast(col#0 as double))]
            +- CometScan parquet [col#0] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/stddev], PartitionFilters: [], PushedFilters: [], ReadSchema: struct
   ```
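
A toy sketch of the kind of check that fails here, assuming the plan is available as text with one operator per line (not wrapped as in an email). This is not Comet's actual implementation of `checkSparkAnswerAndOperator`; `PlanCheck`, `operatorName`, `nonCometOperators` and the ignore list are hypothetical.

```scala
// Toy check: list operators in a textual physical plan whose names do not
// start with "Comet". NOT how Comet itself inspects plans.
object PlanCheck {
  // Wrapper nodes that this sketch does not count as operators (assumed list).
  private val ignored = Set("AdaptiveSparkPlan")

  // "   +- HashAggregate(keys=[], ...)" -> Some("HashAggregate");
  // blank lines and "== Physical Plan ==" style headers -> None.
  def operatorName(line: String): Option[String] = {
    val trimmed = line.trim
    if (trimmed.isEmpty || trimmed.startsWith("==")) None
    else {
      val name = trimmed.dropWhile(c => !c.isLetter).takeWhile(_.isLetterOrDigit)
      if (name.isEmpty) None else Some(name)
    }
  }

  // Operators, in plan order, that are neither ignored nor Comet-prefixed.
  def nonCometOperators(plan: String): List[String] =
    plan.split("\n").toList
      .flatMap(line => operatorName(line).toList)
      .filterNot(name => ignored.contains(name) || name.startsWith("Comet"))
}
```

Applied to the plan above, the sketch flags the upper `HashAggregate` and the plain `Exchange`, while the partial `CometHashAggregate` and `CometScan` pass.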






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


comphead commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608510461


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp

Review Comment:
   I'll double check that but the `spark_builtin_expr_coverage.txt` shows 
   ```
   Unsupported: Expected only Comet native operators but found Spark fallback
   ```
   for both of them. I'll verify whether it's a test problem or not.
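
A rough sketch of how per-expression coverage results could be rendered into the checklist layout used in `docs/spark_expressions_support.md`. It assumes each result reduces to a group, a name and a supported flag; `ChecklistDoc`, `Coverage` and `render` are hypothetical names and this is not the PR's actual generator.

```scala
// Sketch only: renders "### group" sections with " - [x] name" entries.
object ChecklistDoc {
  // Hypothetical record: one row per Spark builtin expression.
  final case class Coverage(group: String, name: String, supported: Boolean)

  def render(rows: Seq[Coverage]): String = {
    // One section per group, groups and entries sorted by name.
    val sections = rows.groupBy(_.group).toSeq.sortBy(_._1).map { case (group, items) =>
      val lines = items.sortBy(_.name).map { r =>
        val mark = if (r.supported) "x" else " "
        s" - [$mark] ${r.name}"
      }
      (s"### $group" +: lines).mkString("\n")
    }
    ("# Supported Spark Expressions" +: sections).mkString("\n\n")
  }
}
```

With this shape, a result recorded as `Unsupported` in the coverage file simply becomes an unchecked box, so a false negative from the test run shows up directly in the generated doc.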



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


andygrove commented on PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#issuecomment-2122853631

   This is very cool @comphead but it looks like it is not detecting any of the 
aggregate functions that we support?





Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608498362


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp

Review Comment:
   These are supported according to 
https://datafusion.apache.org/comet/user-guide/expressions.html






Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub


andygrove commented on code in PR #455:
URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608493460


##
docs/spark_expressions_support.md:
##
@@ -0,0 +1,477 @@
+
+
+# Supported Spark Expressions
+
+### agg_funcs
+ - [ ] any
+ - [ ] any_value
+ - [ ] approx_count_distinct
+ - [ ] approx_percentile
+ - [ ] array_agg
+ - [ ] avg
+ - [ ] bit_and
+ - [ ] bit_or
+ - [ ] bit_xor
+ - [ ] bool_and
+ - [ ] bool_or
+ - [ ] collect_list
+ - [ ] collect_set
+ - [ ] corr
+ - [ ] count
+ - [ ] count_if
+ - [ ] count_min_sketch
+ - [ ] covar_pop
+ - [ ] covar_samp
+ - [ ] every
+ - [ ] first
+ - [ ] first_value
+ - [ ] grouping
+ - [ ] grouping_id
+ - [ ] histogram_numeric
+ - [ ] kurtosis
+ - [ ] last
+ - [ ] last_value
+ - [ ] max
+ - [ ] max_by
+ - [ ] mean
+ - [ ] median
+ - [ ] min
+ - [ ] min_by
+ - [ ] mode
+ - [ ] percentile
+ - [ ] percentile_approx
+ - [ ] regr_avgx
+ - [ ] regr_avgy
+ - [ ] regr_count
+ - [ ] regr_intercept
+ - [ ] regr_r2
+ - [ ] regr_slope
+ - [ ] regr_sxx
+ - [ ] regr_sxy
+ - [ ] regr_syy
+ - [ ] skewness
+ - [ ] some
+ - [ ] std
+ - [ ] stddev
+ - [ ] stddev_pop
+ - [ ] stddev_samp
+ - [ ] sum
+ - [ ] try_avg
+ - [ ] try_sum
+ - [ ] var_pop
+ - [ ] var_samp

Review Comment:
   According to 
https://datafusion.apache.org/comet/user-guide/expressions.html, we do support 
`VariancePop` and `VarianceSamp`






[PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-20 Thread via GitHub


comphead opened a new pull request, #455:
URL: https://github.com/apache/datafusion-comet/pull/455

   ## Which issue does this PR close?
   
   
   
   Closes #.
   
   ## Rationale for this change
   
   
   
   ## What changes are included in this PR?
   
   
   
   ## How are these changes tested?
   
   
   

