This is an automated email from the ASF dual-hosted git repository.
alamb pushed a change to branch dependabot/cargo/main/itertools-0.13
in repository https://gitbox.apache.org/repos/asf/datafusion.git
from 0bb0f3508d Avoid deprecated API
add 70744d59d5 Convert variance sample to udaf (#10713)
add ece7ae5eca Improve docs and fix a typo (#10798)
add 9b1a6f34ab fix: `array_slice` and `array_element` panicked on empty
args (#10804)
add 9845e6eb58 Avoid the usage of intermediate ScalarValue to improve
performance of extracting statistics from parquet files (#10711)
add df5dab77c2 SMJ: Add more tests and improve comments (#10784)
add 70256baefa feat: Update Parquet row filtering to handle type coercion
(#10716)
add c5cefa8eb1 Handle EmptyRelation during SQL unparsing (#10803)
add c580ef4e69 Document Committer and PMC process (#10778)
add 053b53ef7b Int64 as default type for make_array function empty or null
case (#10790)
add 82b4d1bcc4 Split `SessionState` into its own module (#10794)
add fca1df945e Add `StreamProvider` for configuring `StreamTable` (#10600)
add 7c91627c88 Bench: Add `PREFER_HASH_JOIN` env variable (#10809)
add 1a26eca25a Add `ParquetAccessPlan`, unify RowGroup selection and
PagePruning selection (#10738)
add d97ac8d65c Fix `ScalarUDFImpl::propagate_constraints` doc (#10810)
add 089b232304 Extract Parquet statistics from `Interval` column (#10801)
add f05458678b build(deps): upgrade sqlparser to 0.47.0 (#10392)
add 586241f06c refactor and simplify unparser (#10811)
add 8fcb3e4b8a Minor: Remove code duplication in `memory_limit` derivation
for datafusion-cli (#10814)
add cb9068ce56 build(deps): update Arrow/Parquet to `52.0`, object-store
to `0.10` (#10765)
add 5bb6b35627 fix: use total ordering in the min & max accumulator for
floats (#10627)
add c012e9ccd6 chore: Prepare 39.0.0-rc1 (#10828)
add e3af174efe Remove expr_fn::sum and replace them with function stub
(#10816)
add cfbfc03719 Debug print as many fields as possible for `SessionState`
(#10818)
add 90f89e08ab Optimize Parquet RowGroup pruning, update
StatisticsExtractor API (#10802)
add 6b7021479c Remove Built-in sum and Rename to lowercase `sum` (#10831)
add 8b1f06b3e7 Convert `stddev` and `stddev_pop` to UDAF (#10834)
add 24a08465e1 Introduce expr builder for aggregate function (#10560)
add ad0dc2fb43 chore: Improve change log generator (#10841)
add 9503456388 Support user defined `ParquetAccessPlan` in `ParquetExec`,
validation to `ParquetAccessPlan::select` (#10813)
add e8fdc09c6e Convert `VariancePopulation` to UDAF (#10836)
add 3773fb7fb5 Convert `approx_median` to UDAF (#10840)
add f6450e247c MINOR: use workspace deps in proto-common (#10848)
add e094f94d2a fix: Support double quotes in `date_part` (#10833)
add 0bd84e1178 Add Window::try_new_from_schema (#10850)
add 29fda881ae Minor: Clarify `SessionContext::state` docs (#10847)
add 5912025591 Add support for reading CSV files with comments (#10467)
add 2f860cb220 Minor: Update SIGMOD paper reference url (#10860)
add f37f1a5b96 docs(variance): Correct typos in comments (#10844)
add 11e143c391 Convert approx_distinct to UDAF (#10851)
add 351600fd85 add proto-common crate (#10858)
add 1e37066793 Add missing code close tick in LiteralGuarantee docs
(#10859)
add 76f5110366 Implement TPCH substrait integration teset, support tpch_1
(#10842)
add 47026a2a3d Remove unecessary passing around of `suffix: &str` in
`pruning.rs`'s `RequiredColumns` (#10863)
add 99063ca33c chore: Reuse DFSSchema::datatype_is_logically_equal method
(#10867)
add f554c9fdf1 Bump braces in /datafusion/wasmtest/datafusion-wasm-app
(#10865)
add d84d75a23e Docs: Add `unnest` to SQL Reference (#10839)
add 1b89da4455 Support correct output column names and struct field names
when consuming/producing Substrait (#10829)
add 0ec292f454 Make Logical Plans more readable by removing extra aliases
(#10832)
add c50f0dc6ef Minor: Improve `ListingTable` documentation (#10854)
add 97ea05c0f6 Extending join fuzz tests to support join filtering (#10728)
add 9b3b80510e replace and(.., not(...)) with and_not(..) (#10885)
add 7f6fc07577 Disabling test for semi join with filters (#10887)
add 73381fe357 Minor: Update `min_statistics` and `max_statistics` to be
helpers, update docs (#10866)
add 87d826703b chore: remove interval test (#10888)
add dfdda7cb04 fix: Ignore nullability of list elements when consuming
Substrait (#10874)
add 908a3a1d2f Minor: SMJ fuzz tests fix for rowcounts (#10891)
add 8f718dd3ce Move `Count` to `functions-aggregate`, update MSRV to rust
1.75 (#10484)
add ea21b08e47 refactor: fetch statistics for a given ParquetMetaData
(#10880)
add 1aa205d06b Move FileSinkExec::metrics to the correct place (#239)
(#10901)
add 1fc5f915b9 Refine ParquetAccessPlan comments and tests (#10896)
add 2d2685914d ci: fix clippy failures on main (#10903)
add b7d2aea1dd Minor: disable flaky fuzz test (#10904)
add b627ca3e78 Remove builtin count (#10893)
add cc60278f50 Move Regr_* functions to use UDAF (#10898)
add 8f76ac553a Docs: clarify when the reader will read from object store
when using cached metadata (#10909)
add 4dd4121901 Minor: Fix `bench.sh tpch data` (#10905)
add 38bd8932fd Minor: use venv in benchmark compare (#10894)
add 0203a1a21d Support explicit type and name during table creation
(#10273)
add 9ab597b251 Simplify Join Partition Rules (#10911)
add e711775f08 Move Literal to physical-expr-common (#10910)
add ebca68109d chore: update some error messages for clarity (#10916)
add 2f43476471 Initial Extract parquet data page statistics API (#10852)
add 87aea14309 Add contains function, and support in datafusion substrait
consumer (#10879)
add 648c20c388 Minor: Improve arrow_statistics tests (#10927)
add d175163ef6 rm env (#10933)
add c884bdb692 Convert ApproxPercentileCont and
ApproxPercentileContWithWeight to UDAF (#10917)
add d4228feca3 refactor: remove extra default in max rows (#10941)
add 378b9eecd4 chore: Improve performance of Parquet statistics conversion
(#10932)
add c4fd7545ba Add catalog::resolve_table_references (#10876)
add a923c659cf feat: Add support for Int8 and Int16 data types in data
page statistics (#10931)
add 2daadb7523 Convert BitAnd, BitOr, BitXor to UDAF (#10930)
add 9b1bb68e37 refactor: improve PoolType argument handling for CLI
(#10940)
add 861a2364bd feat: add CliSessionContext trait for cli (#10890)
add e1cfb48215 Minor: remove string copy from Column::from_qualified_name
(#10947)
add 1cb0057b99 Fix: StatisticsConverter `counts` for missing columns
(#10946)
add f373a866ce Add initial support for Utf8View and BinaryView types
(#10925)
add a8847e1a82 fix: Support `NOT <field> IN (<subquery>)` via anti join
(#10936)
add ac161bba33 fix: CTEs defined in a subquery can escape their scope
(#10954)
add 0c177d18dc Use shorter aliases in CSE (#10939)
add b26c1b819d fix: Fix the incorrect null joined rows for SMJ outer join
with join filter (#10892)
add 500b73f996 Substrait support for ParquetExec round trip for simple
select (#10949)
add 41a788238c Support to unparse `ScalarValue::IntervalMonthDayNano` to
String (#10956)
add a873f51563 Convert `StringAgg` to UDAF (#10945)
add a2c9d1a8ba Minor: Return option from row_group_row_count (#10973)
add e9f9a239ae Minor: Add routine to debug join fuzz tests (#10970)
add fbf793434e Support to unparse `ScalarValue::TimestampNanosecond` to
String (#10984)
add 2a49d61658 build(deps-dev): bump ws in
/datafusion/wasmtest/datafusion-wasm-app (#10988)
add 80f4322429 Minor: reuse Rows buffer in GroupValuesRows (#10980)
add 5cb1917f8c Add example for writing SQL analysis using DataFusion
structures (#10938)
add 8fda4a6163 feat(optimizer): handle partial anchored regex cases and
improve doc (#10977)
add 4109f581ce Push down filter for Unnest plan (#10974)
add 268f648db9 Minor: add parquet page stats for float{16, 32, 64} (#10982)
add ea0ba99d94 Fix `file_stream_provider` example compilation failure on
windows (#10975)
add c6b2efccf6 Stop copying LogicalPlan and Exprs in
`CommonSubexprEliminate` (2-3% planning speed improvement) (#10835)
add 0f80b9261f feat: Update documentation link in physical optimizer
module (#11002)
add 61e2ddbf29 fix push down logic for unnest (#10991)
add 1e7c38b4f0 Minor: add test for pushdown past unnest (#11017)
add 5bfc11ba4a Update docs for `protoc` minimum installed version (#11006)
add 89def2c6e5 Convert `bool_and` & `bool_or` to UDAF (#11009)
add 58d23c5c05 feat: support uint data page extraction (#11018)
add 5316278cea propagate error instead of panicking on out of bounds in
physical-expr/src/analysis.rs (#10992)
add 1155b0b15e Minor: Add more docs and examples for `Transformed` and
`TransformedResult` (#11003)
add 18042fd691 feat: propagate EmptyRelation for more join types (#10963)
add 1f3ba116a4 doc: Update links in the documantation (#11044)
add 5498a02853 Add drop_columns to dataframe api (#11010)
add fd5a68f802 Push down filter plan for non-unnest column (#11019)
add 4a0c7f35a0 Consider timezones with `UTC` and `+00:00` to be the same
(#10960)
add 6dffc53e76 Deprecate `OptimizerRule::try_optimize` (#11022)
add 098ba30ce5 Relax combine partial final rule (#10913)
add 8aad936e3b Compute gcd with u64 instead of i64 because of overflows
(#11036)
add 30a6ed557d Add distinct_on to dataframe api (#11012)
add ce4940d0c8 chore: add test to show current behavior of string to
timezone vs. timestamp to timezone (#11056)
add 4916e891c2 Boolean parquet get datapage stat (#11054)
add a4799c093c Using display_name for Expr::Aggregation (#11020)
add 569be9eb1b Minor: Convert `Count`'s name to lowercase (#11028)
add accd75b49a Minor: Move `function::Hint` to `datafusion-expr` crate to
avoid physical-expr dependency for `datafusion-function` crate (#11061)
add 81611ad2c5 Support to unparse ScalarValue::TimestampMillisecond to
String (#11046)
add 8a98307f2a support to unparse interval to string (#11065)
add a22423d526 feat: Add method to add analyzer rules to SessionContext
(#10849)
add 6c0e4fb5d9 SMJ: fix streaming row concurrency issue for LEFT SEMI
filtered join (#11041)
add ea46e82088 Add `advanced_parquet_index.rs` example of index in into
parquet files (#10701)
add 98373ab5af Add Expr::column_refs to find column references without
copying (#10948)
add 9f8b731827 Give `OptimizerRule::try_optimize` default implementation
and cleanup duplicated custom implementations (#11059)
add 08e4e6ad02 Fix `FormatOptions::CSV` propagation (#10912)
add 6f10dbc124 Support parsing SQL strings to Exprs (#10995)
add 8d8dd9075d Support dictionary data type in array_to_string (#10908)
add c2ea6b34aa Implement min/max for interval types (#11015)
add fdd1e3db71 Convert Average to UDAF #10942 (#10964)
add a19fc621d1 fix: remove the Sized requirement on ExecutionPlan::name()
(#11047)
add b872080989 Improve LIKE performance for Dictionary arrays (#11058)
add 8aad208299 handle overflow in gcd and return this as an error (#11057)
add d32747d09a Convert Correlation to UDAF (#11064)
add 3051d1928e Migrate more code from `Expr::to_columns` to
`Expr::column_refs` (#11067)
add c7ac8b8221 Minor: Examples cleanup + more docs in pruning example
(#11086)
add 3ff0bfe35d feat: Support duplicate column names in Joins in Substrait
consumer (#11049)
add f0ef0e6f6c decimal support for unparser (#11092)
add ede5598d55 Improve `CommonSubexprEliminate` identifier management
(10% faster planning) (#10473)
add a202a0170a Change wildcard qualifier type from `String` to
`TableReference` (#11073)
add 528c4ab6ad Allow access to UDTF in `SessionContext` (#11071)
add ed7c884d64 Strip table qualifiers from schema in `UNION ALL` for
unparser (#11082)
add 459afbb3a1 Update ListingTable to use StatisticsConverter (#11068)
add e26601811f to_timestamp functions should preserve timezone (#11038)
add 31daf25576 Rewrite array operator to function in parser (#11101)
add 8b244eee3a Resolve empty relation opt for join types (#11066)
add d44c7f2ac3 Add composed extension codec example (#11095)
add 58e2904dde Minor: Avoid some repetition in to_timestamp (#11116)
add 49aee89796 : (#11110)
add 83c80a9d6b fix: ScalarValue::new_ten error cites one not ten (#11126)
add db64743db1 Deprecate Expr::column_refs (#11115)
add 5f02c8a065 Return `&Arc` reference to inner trait object (#11103)
add ec3c71dc6c Overflow in negate operator (#11084)
add 26b646f3dd Minor: Add Architectural Goals to the docs (#11109)
add 1daf007462 fix: gcd returns negative results (#11099)
add 7adc940245 Fix overflow in pow (#11124)
add dd56dbe67c feat: Add support for Timestamp data types in data page
statistics. (#11123)
add 82f7bf40ad Support to unparse Time scalar value to String (#11121)
add 550c936de9 Support to unparse `TimestampSecond` and
`TimestampMicrosecond` to String (#11120)
add aff777b668 Add standalone example for `OptimizerRule` (#11087)
add f6f63b97cd Fix overflow in factorial (#11134)
add 7e49ccf3dd Temporary Fix: Query error when grouping by case
expressions (#11133)
add e9e2951abb Fix nullability of array_agg. (#11093)
add 2d1e8505ea fix: LCM panicked due to overflow (#11131)
add ff116c3da6 Support filter for List (#11091)
add 8216e32e87 [MINOR]: Fix some minor silent bugs (#11127)
add b468ba7883 remove `derive(Copy)` from `Operator` (#11132)
add 0c4e4a1795 Minor Fix for Logical and Physical Expr Conversions (#11142)
add ad56b7ef6f Support Date Parquet Data Page Statistics (#11135)
add 7dcef2251f fix flaky array query slt test (#11140)
add 4d1665550f Fix running in Docker instructions (#11141)
add 64b8eeafde feat: Add support for
`Binary`/`LargeBinary`/`Utf8`/`LargeUtf8` data types in data page statistics
(#11136)
add 10948ca2f0 Support Decimal and Decimal256 Parquet Data Page Statistics
(#11138)
add f58df32753 feat: Support Map type in Substrait conversions (#11129)
add d2ff2189df Implement comparisons on nested data types such that
distinct/except would work (#11117)
add acadfbf25f Minor: dont panic with bad arguments to round (#10899)
add 5501e8eb32 Support COPY TO Externally Defined File Formats, add
FileType trait (#11060)
add 57280e42dc Minor: reduce replication for nested comparison (#11149)
add 838e0f7504 Initial commit (#11158)
add 3bd720033e expose table name in proto extension codec (#11139)
add ca9c322ab3 adding config to control Varchar behavior (#11090)
add 47db63fb02 minor: consolidate `gcd` related tests (#11164)
add f1360b8486 Minor: move batch spilling methods to `lib.rs` to make it
reusable (#11154)
add 09b3c7342c Move schema projection to where it's used in ListingTable
(#11167)
add 330ece8e43 feat: Conditionally allow to keep partition_by columns when
using PARTITIONED BY enhancement (#11107)
add 7a7797c891 Make running in docker instruction be copy-pastable (#11148)
add c80da91e0f fix(typo): unqualifed to unqualified (#11159)
add 14d39734dc Rewrite `array @> array` and `array <@ array` in
sql_expr_to_logical_expr (#11155)
add 27d3aa6a62 Minor: make some physical_optimizer rules public (#11171)
add 1164a37262 Remove pr_benchmarks.yml (#11165)
add 90145dfc6c Optionally display schema in explain plan (#11177)
add 61ba6550e5 fix: Support dictionary type in parquet metadata
statistics. (#11169)
add e52b5e581b fix: Ignore nullability in Substrait structs (#11130)
add d19487c968 Minor: Add more support for ScalarValue::Float16 (#11156)
add 78055fe13b Minor: fix SQLOptions::with_allow_ddl comments (#11166)
add a64df83502 fix doc (#11181)
add 14696a0500 Update sqllogictest requirement from 0.20.0 to 0.21.0
(#11189)
add d624c0d03e Support Time Parquet Data Page Statistics (#11187)
add e40c8a8915 fix: Support Substrait's compound names also for window
functions (#11163)
add 65006b2a81 Adds support for Dictionary statistics from parquet data
pages. (#11195)
add 9fc53121a3 Make function public (#11191)
add 48a1754b33 Introduce user defined SQL planner API (#11180)
add 09cdb7834d Consolidate `Filter::remove_aliases` into
`Expr::unalias_nested` (#11001)
add ab8761d8b9 docs: add example for custom file format with `COPY TO`
(#11174)
add 4bc322819f Covert grouping to udaf (#11147)
add 75b9c9bea2 Make statistics_from_parquet_meta a sync function (#11205)
add 43ea68208b Allow user defined SQL planners to be registered (#11208)
add 58f79e143e Recursive `unnest` (#11062)
add 1840ab5331 Document how to test examples in user guide, add some more
coverage (#11178)
add a4796fa078 Minor: Move MemoryCatalog*Provider into a module, improve
comments (#11183)
add 4f4cd81de7 Fix docs wordings (#11226)
add 3421b52605 Add standalone example of using the SQL frontend (#11088)
add b76c1b7050 Add Optimizer Sanity Checker, improve sortedness
equivalence properties (#11196)
add 03c8db0a98 fix: Incorrect LEFT JOIN evaluation result on OR conditions
(#11203)
add 9d48045654 Implement user defined planner for extract (#11215)
add 03848c52f0 Move basic SQL query examples to user guide (#11217)
add dc7535a598 Support FixedSizedBinaryArray Parquet Data Page Statistics
(#11200)
add c049a94d0e Implement ScalarValue::Map (#11224)
add 699356178c fix: Be more lenient in interpreting input args for builtin
window functions (#11199)
add 4aa584c677 Remove unmaintained python pre-commit configuration (#11255)
add 4615a2d56c Enable clone_on_ref_ptr clippy lint on execution crate
(#11239)
add c6eee61918 Minor: Improve documentation about pushdown join predicates
(#11209)
add 0922d4a624 Minor: clean up data page statistics tests and fix bugs
(#11236)
add 1ffe0535ad Replacing pattern matching through downcast with trait
method (#11257)
add a753c373ee Update substrait requirement from 0.34.0 to 0.35.0 (#11206)
add 5bdc7454d9 Enhance short circuit handling in `CommonSubexprEliminate`
(#11197)
add b4afa182e7 Add bench for data page statistics parquet extraction
(#10950)
add ecc1c01f76 Register SQL planners in `SessionState` constructor (#11253)
add fe66daaa81 Support DuckDB style struct syntax (#11214)
add 6e63748818 Enable clone_on_ref_ptr clippy lint on expr crate (#11238)
add b46d5b7fc6 Optimize PushDownFilter to avoid recreating schema columns
(#11211)
add 351e5f9564 Remove outdated `rewrite_expr.rs` example (#11085)
add 0d2525e6ea Implement TPCH substrait integration teset, support tpch_2
(#11234)
add 9355f4a5c7 Enable `clone_on_ref_ptr` clippy lint on physical-expr
crate (#11240)
add dce77db316 Add standalone `AnalyzerRule` example that implements row
level access control (#11089)
add df999d6675 Replace println! with assert! if possible in DataFusion
examples (#11237)
add a0dd0a1442 minor: format `Expr::get_type()` (#11267)
add 7df000a333 Fix hash join for nested types (#11232)
add 13cb65e441 Infer count() aggregation is not null (#11256)
add 1b3a7af673 Fix count() docs around including null values (#11293)
add 5657886121 Remove unnecessary qualified names (#11292)
add 682fc05452 Fix running examples readme (#11225)
add 6f86bfad2f feat: enable "substring" as a UDF in addition to "substr"
(#11277)
add 00bbb42c96 fix: correctly handle Substrait windows with rows bounds
(and validate executability of test plans) (#11278)
add b9fdc53ac8 Minor: Add `ConstExpr`::from, use in places (#11283)
add a3e1c3d055 Implement TPCH substrait integration teset, support tpch_3
(#11298)
add 2af3d3a55b Implement user defined planner for position (#11243)
add 08c5345e93 Upgrade to arrow 52.1.0 (and fix clippy issues on main)
(#11302)
add e693ed7a3c AggregateExec: Take grouping sets into account for
InputOrderMode (#11301)
add 9f8ba6ab68 Add user_defined_sql_planners(..) to FunctionRegistry
(#11296)
add 45599ce310 use safe cast in propagate_constraints (#11297)
add 5aa7c4ae79 Minor: Remove clone in optimizer (#11315)
add 229c1398d6 minor: Add `PhysicalSortExpr::new` (#11310)
add 6f330c98b3 Fix data page statistics when all rows are null in a data
page (#11295)
add 940efd3b42 Made userDefinedFunctionPlanner to uniform the usages
(#11318)
add 1e39a8575b Implement user defined planner for `create_struct` &
`create_named_struct` (#11273)
add c254b8bb88 Improve stats convert performance for Binary/String/Boolean
arrays (#11319)
add 2f02c43b62 Fix typos in datafusion-examples/datafusion-cli/docs
(#11259)
add 1ea6545868 Minor: Fix Failing TPC-DS Test (#11331)
add 8b8452298d HashJoin can preserve the right ordering when join type is
Right (#11276)
add 8891be4694 Update substrait requirement from 0.35.0 to 0.36.0 (#11328)
add 2bd14cf62c Improve timestamp predicates support (#11326)
add f5e114e4f7 Implement user defined planner for sql_substring_to_expr
(#11327)
add b6281b54bb Improve volatile expression handling in
`CommonSubexprEliminate` (#11265)
add 0c02cad2b5 Support `IS NULL` and `IS NOT NULL` on Unions (#11321)
add 4ac1428025 Convert `nth_value` to UDAF (#11287)
add e4b54f6eae Implement TPCH substrait integration test, support tpch_4
and tpch_5 (#11311)
add 37428bb203 Enable `clone_on_ref_ptr` clippy lint on physical-plan
crate (#11241)
add 894a8794d1 fix: When consuming Substrait, temporarily rename clashing
duplicate columns (#11329)
add 99911449bc Remove any aliases in `Filter::try_new` rather than
erroring (#11307)
add 8ae56fc2b8 Improve `DataFrame` Users Guide (#11324)
add 782df39007 chore: Rename UserDefinedSQLPlanner to ExprPlanner (#11338)
add a6898d36c4 Revert "remove `derive(Copy)` from `Operator` (#11132)"
(#11341)
add fbe3270bd8 Merge remote-tracking branch 'apache/main' into
dependabot/cargo/main/itertools-0.13
No new revisions were added by this update.
Summary of changes:
.github/workflows/pr_benchmarks.yml | 101 --
.github/workflows/rust.yml | 2 +-
.github_changelog_generator | 28 -
.pre-commit-config.yaml | 69 -
Cargo.toml | 78 +-
benchmarks/.gitignore | 3 +-
benchmarks/README.md | 10 +-
benchmarks/bench.sh | 64 +-
benchmarks/compare.py | 2 +-
docs/.gitignore => benchmarks/requirements.txt | 5 +-
ci/scripts/rust_example.sh | 2 +-
datafusion-cli/Cargo.lock | 714 +++++----
datafusion-cli/Cargo.toml | 12 +-
datafusion-cli/Dockerfile | 13 +-
datafusion-cli/examples/cli-session-context.rs | 97 ++
datafusion-cli/src/catalog.rs | 9 +-
datafusion-cli/src/cli_context.rs | 98 ++
datafusion-cli/src/command.rs | 4 +-
datafusion-cli/src/exec.rs | 59 +-
datafusion-cli/src/helper.rs | 4 +-
datafusion-cli/src/lib.rs | 2 +
datafusion-cli/src/main.rs | 45 +-
datafusion-cli/src/object_storage.rs | 55 +-
.../src/pool_type.rs | 58 +-
datafusion-cli/tests/cli_integration.rs | 2 +
datafusion-examples/Cargo.toml | 5 +
datafusion-examples/README.md | 30 +-
.../examples/advanced_parquet_index.rs | 664 ++++++++
datafusion-examples/examples/advanced_udaf.rs | 4 +-
datafusion-examples/examples/advanced_udwf.rs | 2 +-
datafusion-examples/examples/analyzer_rule.rs | 200 +++
datafusion-examples/examples/avro_sql.rs | 51 -
datafusion-examples/examples/catalog.rs | 2 +-
.../examples/composed_extension_codec.rs | 291 ++++
datafusion-examples/examples/csv_opener.rs | 1 +
datafusion-examples/examples/csv_sql.rs | 70 -
datafusion-examples/examples/custom_file_format.rs | 234 +++
datafusion-examples/examples/dataframe_subquery.rs | 1 +
datafusion-examples/examples/expr_api.rs | 50 +-
.../external_dependency/dataframe-to-s3.rs | 6 +-
.../examples/file_stream_provider.rs | 202 +++
datafusion-examples/examples/function_factory.rs | 2 +-
datafusion-examples/examples/json_opener.rs | 2 +-
datafusion-examples/examples/optimizer_rule.rs | 220 +++
datafusion-examples/examples/parquet_index.rs | 24 +-
datafusion-examples/examples/parquet_sql.rs | 51 -
datafusion-examples/examples/parse_sql_expr.rs | 157 ++
datafusion-examples/examples/pruning.rs | 5 +
datafusion-examples/examples/rewrite_expr.rs | 255 ---
datafusion-examples/examples/simple_udwf.rs | 2 +-
.../examples/simplify_udaf_expression.rs | 38 +-
.../examples/simplify_udwf_expression.rs | 10 +-
datafusion-examples/examples/sql_analysis.rs | 309 ++++
datafusion-examples/examples/sql_frontend.rs | 207 +++
datafusion/CHANGELOG.md | 39 +-
datafusion/common/Cargo.toml | 2 +-
datafusion/common/src/column.rs | 19 +-
datafusion/common/src/config.rs | 109 +-
datafusion/common/src/dfschema.rs | 17 +-
datafusion/common/src/display/mod.rs | 14 +-
datafusion/common/src/file_options/csv_writer.rs | 6 +
datafusion/common/src/file_options/file_type.rs | 116 +-
datafusion/common/src/file_options/mod.rs | 12 +-
datafusion/common/src/hash_utils.rs | 6 +-
datafusion/common/src/lib.rs | 4 +-
datafusion/common/src/pyarrow.rs | 37 +-
datafusion/common/src/scalar/mod.rs | 656 +++++++-
datafusion/common/src/tree_node.rs | 88 +-
datafusion/common/src/utils/mod.rs | 25 +-
datafusion/core/Cargo.toml | 4 +-
datafusion/core/benches/parquet_statistic.rs | 186 ++-
datafusion/core/benches/sql_planner.rs | 2 +-
datafusion/core/benches/sql_query_with_io.rs | 2 +-
datafusion/core/src/catalog/information_schema.rs | 2 +-
datafusion/core/src/catalog/listing_schema.rs | 2 +-
datafusion/core/src/catalog/memory.rs | 352 +++++
datafusion/core/src/catalog/mod.rs | 384 +++--
datafusion/core/src/catalog/schema.rs | 155 +-
datafusion/core/src/dataframe/mod.rs | 455 +++++-
datafusion/core/src/dataframe/parquet.rs | 19 +-
.../core/src/datasource/file_format/arrow.rs | 57 +-
datafusion/core/src/datasource/file_format/avro.rs | 56 +
datafusion/core/src/datasource/file_format/csv.rs | 113 +-
.../file_format/file_compression_type.rs | 83 +-
datafusion/core/src/datasource/file_format/json.rs | 80 +-
datafusion/core/src/datasource/file_format/mod.rs | 114 +-
.../core/src/datasource/file_format/options.rs | 10 +
.../core/src/datasource/file_format/parquet.rs | 493 ++++--
.../core/src/datasource/file_format/write/demux.rs | 13 +-
.../datasource/file_format/write/orchestration.rs | 1 +
datafusion/core/src/datasource/function.rs | 5 +
datafusion/core/src/datasource/listing/helpers.rs | 2 +-
datafusion/core/src/datasource/listing/mod.rs | 11 +
datafusion/core/src/datasource/listing/table.rs | 152 +-
.../core/src/datasource/listing_table_factory.rs | 41 +-
.../core/src/datasource/physical_plan/csv.rs | 48 +-
.../core/src/datasource/physical_plan/json.rs | 14 +-
.../core/src/datasource/physical_plan/mod.rs | 2 +
.../physical_plan/parquet/access_plan.rs | 555 +++++++
.../datasource/physical_plan/parquet/metrics.rs | 4 +-
.../src/datasource/physical_plan/parquet/mod.rs | 70 +-
.../src/datasource/physical_plan/parquet/opener.rs | 76 +-
.../physical_plan/parquet/page_filter.rs | 288 ++--
.../src/datasource/physical_plan/parquet/reader.rs | 11 +-
.../datasource/physical_plan/parquet/row_filter.rs | 121 +-
.../datasource/physical_plan/parquet/row_groups.rs | 247 ++-
.../datasource/physical_plan/parquet/statistics.rs | 1671 ++++++++++++++++----
datafusion/core/src/datasource/schema_adapter.rs | 46 +-
datafusion/core/src/datasource/statistics.rs | 21 +-
datafusion/core/src/datasource/stream.rs | 140 +-
datafusion/core/src/execution/context/csv.rs | 4 +-
datafusion/core/src/execution/context/mod.rs | 1206 ++------------
datafusion/core/src/execution/mod.rs | 2 +
datafusion/core/src/execution/session_state.rs | 1279 +++++++++++++++
datafusion/core/src/lib.rs | 151 +-
.../src/physical_optimizer/aggregate_statistics.rs | 82 +-
.../combine_partial_final_agg.rs | 132 +-
.../src/physical_optimizer/enforce_distribution.rs | 3 +
.../limited_distinct_aggregation.rs | 16 +-
datafusion/core/src/physical_optimizer/mod.rs | 6 +-
.../core/src/physical_optimizer/optimizer.rs | 18 +-
.../src/physical_optimizer/output_requirements.rs | 4 +-
.../src/physical_optimizer/pipeline_checker.rs | 334 ----
.../src/physical_optimizer/projection_pushdown.rs | 3 +
datafusion/core/src/physical_optimizer/pruning.rs | 54 +-
.../replace_with_order_preserving_variants.rs | 1 +
.../core/src/physical_optimizer/sanity_checker.rs | 666 ++++++++
.../core/src/physical_optimizer/sort_pushdown.rs | 31 +-
.../core/src/physical_optimizer/test_utils.rs | 11 +-
.../src/physical_optimizer/update_aggr_exprs.rs | 1 +
datafusion/core/src/physical_planner.rs | 161 +-
datafusion/core/src/test/mod.rs | 26 +-
datafusion/core/src/test_util/mod.rs | 10 +-
datafusion/core/tests/custom_sources_cases/mod.rs | 6 +-
.../provider_filter_pushdown.rs | 5 +
.../core/tests/custom_sources_cases/statistics.rs | 4 +
datafusion/core/tests/data/double_quote.csv | 5 +
.../core/tests/dataframe/dataframe_functions.rs | 7 +-
datafusion/core/tests/dataframe/mod.rs | 189 ++-
datafusion/core/tests/expr_api/mod.rs | 191 ++-
datafusion/core/tests/expr_api/parse_sql_expr.rs | 93 ++
datafusion/core/tests/expr_api/simplification.rs | 8 +-
datafusion/core/tests/fifo/mod.rs | 65 +-
datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs | 21 +-
datafusion/core/tests/fuzz_cases/join_fuzz.rs | 628 ++++++--
.../fuzz_cases/sort_preserving_repartition_fuzz.rs | 6 +-
datafusion/core/tests/fuzz_cases/window_fuzz.rs | 9 +-
datafusion/core/tests/parquet/arrow_statistics.rs | 865 +++++++---
datafusion/core/tests/parquet/custom_reader.rs | 2 +-
.../core/tests/parquet/external_access_plan.rs | 418 +++++
datafusion/core/tests/parquet/mod.rs | 49 +-
datafusion/core/tests/parquet/row_group_pruning.rs | 2 +-
datafusion/core/tests/parquet/utils.rs | 55 +
datafusion/core/tests/parquet_exec.rs | 2 +
datafusion/core/tests/path_partition.rs | 26 +-
datafusion/core/tests/sql/aggregates.rs | 38 +-
datafusion/core/tests/sql/explain_analyze.rs | 8 +-
datafusion/core/tests/sql/joins.rs | 16 +-
datafusion/core/tests/sql/select.rs | 2 +-
datafusion/core/tests/tpcds_planning.rs | 5 +-
datafusion/core/tests/user_defined/expr_planner.rs | 123 ++
datafusion/core/tests/user_defined/mod.rs | 3 +
.../tests/user_defined/user_defined_aggregates.rs | 4 +-
.../core/tests/user_defined/user_defined_plan.rs | 142 +-
.../user_defined/user_defined_scalar_functions.rs | 127 +-
.../user_defined/user_defined_table_functions.rs | 15 +
datafusion/execution/src/cache/cache_manager.rs | 4 +-
datafusion/execution/src/cache/cache_unit.rs | 6 +-
datafusion/execution/src/disk_manager.rs | 2 +-
datafusion/execution/src/lib.rs | 2 +
datafusion/execution/src/memory_pool/mod.rs | 4 +-
datafusion/execution/src/object_store.rs | 2 +-
datafusion/execution/src/task.rs | 23 +-
datafusion/expr/Cargo.toml | 1 +
datafusion/expr/src/aggregate_function.rs | 262 +--
datafusion/expr/src/expr.rs | 363 ++++-
datafusion/expr/src/expr_fn.rs | 140 +-
datafusion/expr/src/expr_rewriter/mod.rs | 3 +-
datafusion/expr/src/expr_rewriter/order_by.rs | 8 +-
datafusion/expr/src/expr_schema.rs | 110 +-
datafusion/expr/src/function.rs | 12 +-
datafusion/expr/src/interval_arithmetic.rs | 51 +-
datafusion/expr/src/lib.rs | 7 +-
datafusion/expr/src/logical_plan/builder.rs | 103 +-
datafusion/expr/src/logical_plan/ddl.rs | 25 +-
datafusion/expr/src/logical_plan/display.rs | 4 +-
datafusion/expr/src/logical_plan/dml.rs | 6 +-
datafusion/expr/src/logical_plan/mod.rs | 4 +-
datafusion/expr/src/logical_plan/plan.rs | 126 +-
datafusion/expr/src/logical_plan/tree_node.rs | 4 +-
datafusion/expr/src/planner.rs | 207 +++
datafusion/expr/src/registry.rs | 16 +
datafusion/expr/src/table_source.rs | 19 +-
datafusion/expr/src/test/function_stub.rs | 347 ++++
.../{sql/tests/cases => expr/src/test}/mod.rs | 2 +-
datafusion/expr/src/tree_node.rs | 3 +-
datafusion/expr/src/type_coercion/aggregates.rs | 362 +----
datafusion/expr/src/type_coercion/binary.rs | 44 +-
datafusion/expr/src/type_coercion/functions.rs | 28 +-
datafusion/expr/src/udaf.rs | 194 ++-
datafusion/expr/src/udf.rs | 22 +-
datafusion/expr/src/udwf.rs | 10 +-
datafusion/expr/src/utils.rs | 37 +-
.../src}/approx_distinct.rs | 272 ++--
.../functions-aggregate/src/approx_median.rs | 119 ++
.../src}/approx_percentile_cont.rs | 269 ++--
.../src}/approx_percentile_cont_with_weight.rs | 161 +-
.../src}/average.rs | 408 +++--
.../functions-aggregate/src/bit_and_or_xor.rs | 458 ++++++
datafusion/functions-aggregate/src/bool_and_or.rs | 343 ++++
datafusion/functions-aggregate/src/correlation.rs | 225 +++
datafusion/functions-aggregate/src/count.rs | 559 +++++++
datafusion/functions-aggregate/src/first_last.rs | 25 +-
datafusion/functions-aggregate/src/grouping.rs | 97 ++
.../src}/hyperloglog.rs | 2 +-
datafusion/functions-aggregate/src/lib.rs | 81 +-
datafusion/functions-aggregate/src/macros.rs | 44 +-
datafusion/functions-aggregate/src/median.rs | 4 +-
.../src}/nth_value.rs | 213 +--
.../aggregate => functions-aggregate/src}/regr.rs | 127 +-
datafusion/functions-aggregate/src/stddev.rs | 366 +++++
datafusion/functions-aggregate/src/string_agg.rs | 153 ++
datafusion/functions-aggregate/src/sum.rs | 10 +-
datafusion/functions-aggregate/src/variance.rs | 346 ++++
datafusion/functions-array/Cargo.toml | 3 +-
datafusion/functions-array/src/empty.rs | 10 +-
datafusion/functions-array/src/extract.rs | 19 +-
datafusion/functions-array/src/lib.rs | 4 +-
datafusion/functions-array/src/make_array.rs | 23 +-
datafusion/functions-array/src/planner.rs | 160 ++
datafusion/functions-array/src/range.rs | 12 +-
datafusion/functions-array/src/rewrite.rs | 181 ---
datafusion/functions-array/src/set_ops.rs | 15 +-
datafusion/functions-array/src/string.rs | 29 +-
datafusion/functions-array/src/utils.rs | 19 +-
datafusion/functions/Cargo.toml | 2 +-
datafusion/functions/src/core/arrow_cast.rs | 4 +
datafusion/functions/src/core/mod.rs | 2 +-
datafusion/functions/src/core/planner.rs | 59 +
datafusion/functions/src/datetime/date_bin.rs | 81 +-
datafusion/functions/src/datetime/date_part.rs | 8 +-
datafusion/functions/src/datetime/to_timestamp.rs | 152 +-
datafusion/functions/src/lib.rs | 3 +
datafusion/functions/src/math/factorial.rs | 44 +-
datafusion/functions/src/math/gcd.rs | 73 +-
datafusion/functions/src/math/lcm.rs | 50 +-
datafusion/functions/src/math/power.rs | 33 +-
datafusion/functions/src/math/round.rs | 109 +-
datafusion/functions/src/planner.rs | 51 +
datafusion/functions/src/regex/regexpreplace.rs | 2 +-
datafusion/functions/src/string/btrim.rs | 2 +-
.../src/string/{ends_with.rs => contains.rs} | 110 +-
datafusion/functions/src/string/ltrim.rs | 2 +-
datafusion/functions/src/string/mod.rs | 8 +-
datafusion/functions/src/string/rtrim.rs | 2 +-
datafusion/functions/src/unicode/substr.rs | 6 +
datafusion/functions/src/utils.rs | 2 +-
datafusion/optimizer/Cargo.toml | 4 +
datafusion/optimizer/README.md | 6 +-
.../optimizer/src/analyzer/count_wildcard_rule.rs | 85 +-
datafusion/optimizer/src/analyzer/mod.rs | 2 +-
datafusion/optimizer/src/analyzer/subquery.rs | 23 +-
datafusion/optimizer/src/analyzer/type_coercion.rs | 87 +-
.../optimizer/src/common_subexpr_eliminate.rs | 1628 +++++++++++++------
datafusion/optimizer/src/decorrelate.rs | 21 +-
.../src/decorrelate_predicate_subquery.rs | 94 +-
datafusion/optimizer/src/eliminate_cross_join.rs | 8 -
.../optimizer/src/eliminate_duplicated_expr.rs | 10 +-
datafusion/optimizer/src/eliminate_filter.rs | 21 +-
.../optimizer/src/eliminate_group_by_constant.rs | 36 +-
datafusion/optimizer/src/eliminate_join.rs | 10 +-
datafusion/optimizer/src/eliminate_limit.rs | 22 +-
datafusion/optimizer/src/eliminate_nested_union.rs | 44 +-
datafusion/optimizer/src/eliminate_one_union.rs | 13 +-
datafusion/optimizer/src/eliminate_outer_join.rs | 23 +-
.../optimizer/src/extract_equijoin_predicate.rs | 9 +-
datafusion/optimizer/src/filter_null_join_keys.rs | 10 +-
datafusion/optimizer/src/lib.rs | 1 +
.../optimizer/src/optimize_projections/mod.rs | 54 +-
.../src/optimize_projections/required_indices.rs | 4 +-
datafusion/optimizer/src/optimizer.rs | 55 +-
.../optimizer/src/propagate_empty_relation.rs | 246 ++-
datafusion/optimizer/src/push_down_filter.rs | 425 +++--
datafusion/optimizer/src/push_down_limit.rs | 10 +-
.../optimizer/src/replace_distinct_aggregate.rs | 33 +-
.../optimizer/src/rewrite_disjunctive_predicate.rs | 9 -
.../optimizer/src/scalar_subquery_to_join.rs | 33 +-
.../src/simplify_expressions/expr_simplifier.rs | 14 +-
.../optimizer/src/simplify_expressions/regex.rs | 68 +-
.../src/simplify_expressions/simplify_exprs.rs | 10 +-
.../optimizer/src/single_distinct_to_groupby.rs | 148 +-
datafusion/optimizer/src/test/mod.rs | 39 +-
.../optimizer/src/unwrap_cast_in_comparison.rs | 10 +-
datafusion/optimizer/src/utils.rs | 36 +-
.../optimizer/tests/optimizer_integration.rs | 56 +-
datafusion/physical-expr-common/Cargo.toml | 2 +
.../src/aggregate/count_distinct/bytes.rs | 10 +-
.../{groups_accumulator => count_distinct}/mod.rs | 9 +-
.../src/aggregate/count_distinct/native.rs | 29 +-
.../src/aggregate/merge_arrays.rs | 195 +++
.../physical-expr-common/src/aggregate/mod.rs | 30 +-
.../src/aggregate/tdigest.rs | 48 +-
.../src/binary_map.rs | 25 +-
datafusion/physical-expr-common/src/datum.rs | 147 ++
.../physical-expr-common/src/expressions/cast.rs | 100 +-
.../src/expressions/literal.rs | 3 +-
.../physical-expr-common/src/expressions/mod.rs | 2 +
datafusion/physical-expr-common/src/lib.rs | 2 +
.../physical-expr-common/src/physical_expr.rs | 4 +-
datafusion/physical-expr-common/src/sort_expr.rs | 23 +-
datafusion/physical-expr-common/src/utils.rs | 31 +-
datafusion/physical-expr/Cargo.toml | 1 -
.../physical-expr/src/aggregate/approx_median.rs | 99 --
.../physical-expr/src/aggregate/array_agg.rs | 21 +-
.../src/aggregate/array_agg_distinct.rs | 17 +-
.../src/aggregate/array_agg_ordered.rs | 219 +--
.../physical-expr/src/aggregate/bit_and_or_xor.rs | 695 --------
.../physical-expr/src/aggregate/bool_and_or.rs | 394 -----
datafusion/physical-expr/src/aggregate/build_in.rs | 878 +---------
.../physical-expr/src/aggregate/correlation.rs | 524 ------
datafusion/physical-expr/src/aggregate/count.rs | 348 ----
.../src/aggregate/count_distinct/mod.rs | 718 ---------
.../physical-expr/src/aggregate/covariance.rs | 227 ---
datafusion/physical-expr/src/aggregate/grouping.rs | 103 --
.../src/aggregate/groups_accumulator/mod.rs | 6 +-
datafusion/physical-expr/src/aggregate/min_max.rs | 211 ++-
datafusion/physical-expr/src/aggregate/mod.rs | 22 -
datafusion/physical-expr/src/aggregate/stddev.rs | 337 ----
.../physical-expr/src/aggregate/string_agg.rs | 246 ---
datafusion/physical-expr/src/aggregate/sum.rs | 291 ----
.../physical-expr/src/aggregate/sum_distinct.rs | 202 ---
datafusion/physical-expr/src/aggregate/variance.rs | 429 -----
datafusion/physical-expr/src/analysis.rs | 14 +-
datafusion/physical-expr/src/equivalence/class.rs | 130 +-
datafusion/physical-expr/src/equivalence/mod.rs | 32 +-
.../physical-expr/src/equivalence/ordering.rs | 26 +-
.../physical-expr/src/equivalence/projection.rs | 59 +-
.../physical-expr/src/equivalence/properties.rs | 270 ++--
datafusion/physical-expr/src/expressions/binary.rs | 74 +-
datafusion/physical-expr/src/expressions/case.rs | 42 +-
datafusion/physical-expr/src/expressions/datum.rs | 58 -
.../physical-expr/src/expressions/in_list.rs | 140 +-
.../physical-expr/src/expressions/is_not_null.rs | 56 +-
.../physical-expr/src/expressions/is_null.rs | 131 +-
datafusion/physical-expr/src/expressions/like.rs | 16 +-
datafusion/physical-expr/src/expressions/mod.rs | 31 +-
.../physical-expr/src/expressions/negative.rs | 2 +-
datafusion/physical-expr/src/expressions/not.rs | 2 +-
.../physical-expr/src/expressions/try_cast.rs | 4 +-
datafusion/physical-expr/src/functions.rs | 10 +-
.../physical-expr/src/intervals/cp_solver.rs | 81 +-
.../physical-expr/src/intervals/test_utils.rs | 18 +-
datafusion/physical-expr/src/intervals/utils.rs | 29 +-
datafusion/physical-expr/src/lib.rs | 8 +-
datafusion/physical-expr/src/partitioning.rs | 8 +-
datafusion/physical-expr/src/physical_expr.rs | 20 +-
datafusion/physical-expr/src/planner.rs | 40 +-
datafusion/physical-expr/src/scalar_function.rs | 2 +-
datafusion/physical-expr/src/utils/guarantee.rs | 32 +-
datafusion/physical-expr/src/utils/mod.rs | 6 +-
datafusion/physical-expr/src/window/built_in.rs | 4 +-
datafusion/physical-expr/src/window/lead_lag.rs | 6 +-
datafusion/physical-expr/src/window/nth_value.rs | 5 +-
.../physical-expr/src/window/sliding_aggregate.rs | 2 +-
datafusion/physical-expr/src/window/window_expr.rs | 7 +-
.../src/aggregates/group_values/bytes.rs | 2 +-
.../src/aggregates/group_values/primitive.rs | 2 +
.../src/aggregates/group_values/row.rs | 16 +-
datafusion/physical-plan/src/aggregates/mod.rs | 353 +++--
.../physical-plan/src/aggregates/no_grouping.rs | 9 +-
.../physical-plan/src/aggregates/order/partial.rs | 3 +-
.../physical-plan/src/aggregates/row_hash.rs | 16 +-
.../src/aggregates/topk/hash_table.rs | 2 +
.../physical-plan/src/aggregates/topk/heap.rs | 2 +
.../physical-plan/src/aggregates/topk_stream.rs | 10 +-
datafusion/physical-plan/src/analyze.rs | 16 +-
datafusion/physical-plan/src/coalesce_batches.rs | 8 +-
.../physical-plan/src/coalesce_partitions.rs | 12 +-
datafusion/physical-plan/src/common.rs | 7 +-
datafusion/physical-plan/src/display.rs | 51 +-
datafusion/physical-plan/src/empty.rs | 15 +-
datafusion/physical-plan/src/explain.rs | 6 +-
datafusion/physical-plan/src/filter.rs | 28 +-
datafusion/physical-plan/src/insert.rs | 24 +-
datafusion/physical-plan/src/joins/cross_join.rs | 14 +-
datafusion/physical-plan/src/joins/hash_join.rs | 313 ++--
.../physical-plan/src/joins/nested_loop_join.rs | 59 +-
.../physical-plan/src/joins/sort_merge_join.rs | 437 ++---
.../physical-plan/src/joins/stream_join_utils.rs | 11 +-
.../physical-plan/src/joins/symmetric_hash_join.rs | 46 +-
datafusion/physical-plan/src/joins/test_utils.rs | 60 +-
datafusion/physical-plan/src/joins/utils.rs | 215 ++-
datafusion/physical-plan/src/lib.rs | 101 +-
datafusion/physical-plan/src/limit.rs | 12 +-
datafusion/physical-plan/src/memory.rs | 9 +-
datafusion/physical-plan/src/placeholder_row.rs | 13 +-
datafusion/physical-plan/src/projection.rs | 20 +-
datafusion/physical-plan/src/recursive_query.rs | 23 +-
datafusion/physical-plan/src/repartition/mod.rs | 61 +-
datafusion/physical-plan/src/sorts/builder.rs | 6 +-
datafusion/physical-plan/src/sorts/merge.rs | 3 +-
datafusion/physical-plan/src/sorts/partial_sort.rs | 53 +-
datafusion/physical-plan/src/sorts/sort.rs | 160 +-
.../src/sorts/sort_preserving_merge.rs | 50 +-
datafusion/physical-plan/src/sorts/stream.rs | 2 +-
datafusion/physical-plan/src/stream.rs | 28 +-
datafusion/physical-plan/src/streaming.rs | 6 +-
datafusion/physical-plan/src/test.rs | 5 +-
datafusion/physical-plan/src/test/exec.rs | 38 +-
datafusion/physical-plan/src/topk/mod.rs | 4 +-
datafusion/physical-plan/src/tree_node.rs | 2 +-
datafusion/physical-plan/src/union.rs | 106 +-
datafusion/physical-plan/src/unnest.rs | 36 +-
datafusion/physical-plan/src/values.rs | 10 +-
.../src/windows/bounded_window_agg_exec.rs | 73 +-
datafusion/physical-plan/src/windows/mod.rs | 94 +-
.../physical-plan/src/windows/window_agg_exec.rs | 15 +-
datafusion/physical-plan/src/work_table.rs | 8 +-
datafusion/proto-common/Cargo.toml | 10 +-
datafusion/proto-common/gen/Cargo.toml | 2 +-
.../proto-common/proto/datafusion_common.proto | 22 +-
datafusion/proto-common/src/from_proto/mod.rs | 43 +-
datafusion/proto-common/src/generated/pbjson.rs | 302 +++-
datafusion/proto-common/src/generated/prost.rs | 43 +-
datafusion/proto-common/src/to_proto/mod.rs | 46 +-
datafusion/proto/CONTRIBUTING.md | 2 +-
datafusion/proto/Cargo.toml | 3 +-
datafusion/proto/gen/Cargo.toml | 2 +-
datafusion/proto/gen/src/main.rs | 1 +
datafusion/proto/proto/datafusion.proto | 75 +-
datafusion/proto/src/bytes/mod.rs | 5 +
datafusion/proto/src/bytes/registry.rs | 5 +
.../proto/src/generated/datafusion_proto_common.rs | 43 +-
datafusion/proto/src/generated/pbjson.rs | 272 ++--
datafusion/proto/src/generated/prost.rs | 169 +-
datafusion/proto/src/logical_plan/file_formats.rs | 409 +++++
datafusion/proto/src/logical_plan/from_proto.rs | 51 +-
datafusion/proto/src/logical_plan/mod.rs | 49 +-
datafusion/proto/src/logical_plan/to_proto.rs | 99 +-
datafusion/proto/src/physical_plan/from_proto.rs | 27 +-
datafusion/proto/src/physical_plan/mod.rs | 32 +-
datafusion/proto/src/physical_plan/to_proto.rs | 137 +-
.../proto/tests/cases/roundtrip_logical_plan.rs | 231 ++-
.../proto/tests/cases/roundtrip_physical_plan.rs | 128 +-
datafusion/sql/Cargo.toml | 1 +
datafusion/sql/examples/sql.rs | 22 +-
datafusion/sql/src/cte.rs | 11 +-
datafusion/sql/src/expr/binary_op.rs | 2 +
datafusion/sql/src/expr/function.rs | 134 +-
datafusion/sql/src/expr/json_access.rs | 31 -
datafusion/sql/src/expr/mod.rs | 627 +++-----
datafusion/sql/src/expr/subquery.rs | 6 +-
datafusion/sql/src/expr/substring.rs | 24 +-
datafusion/sql/src/expr/value.rs | 32 +-
datafusion/sql/src/parser.rs | 36 +-
datafusion/sql/src/planner.rs | 119 +-
datafusion/sql/src/query.rs | 18 +-
datafusion/sql/src/relation/join.rs | 31 +-
datafusion/sql/src/select.rs | 122 +-
datafusion/sql/src/statement.rs | 205 +--
datafusion/sql/src/unparser/ast.rs | 259 ++-
datafusion/sql/src/unparser/expr.rs | 422 +++--
datafusion/sql/src/unparser/mod.rs | 1 +
datafusion/sql/src/unparser/plan.rs | 29 +-
datafusion/sql/src/unparser/rewrite.rs | 101 ++
datafusion/sql/src/utils.rs | 146 +-
datafusion/sql/tests/cases/plan_to_sql.rs | 30 +-
datafusion/sql/tests/common/mod.rs | 38 +-
datafusion/sql/tests/sql_integration.rs | 208 ++-
datafusion/sqllogictest/Cargo.toml | 12 +-
.../test_files/agg_func_substitute.slt | 30 +-
datafusion/sqllogictest/test_files/aggregate.slt | 463 ++++--
datafusion/sqllogictest/test_files/array.slt | 159 +-
datafusion/sqllogictest/test_files/array_query.slt | 128 +-
.../sqllogictest/test_files/arrow_typeof.slt | 12 +-
datafusion/sqllogictest/test_files/avro.slt | 6 +-
datafusion/sqllogictest/test_files/cast.slt | 20 +
datafusion/sqllogictest/test_files/copy.slt | 21 +
datafusion/sqllogictest/test_files/cse.slt | 173 ++
datafusion/sqllogictest/test_files/csv_files.slt | 91 ++
datafusion/sqllogictest/test_files/cte.slt | 31 +
datafusion/sqllogictest/test_files/errors.slt | 10 +-
datafusion/sqllogictest/test_files/explain.slt | 38 +-
datafusion/sqllogictest/test_files/expr.slt | 149 +-
datafusion/sqllogictest/test_files/functions.slt | 24 +-
datafusion/sqllogictest/test_files/group_by.slt | 370 +++--
.../sqllogictest/test_files/information_schema.slt | 6 +
datafusion/sqllogictest/test_files/insert.slt | 28 +-
.../sqllogictest/test_files/insert_to_external.slt | 20 +-
datafusion/sqllogictest/test_files/join.slt | 193 +++
datafusion/sqllogictest/test_files/joins.slt | 448 +++++-
datafusion/sqllogictest/test_files/json.slt | 6 +-
datafusion/sqllogictest/test_files/limit.slt | 18 +-
datafusion/sqllogictest/test_files/math.slt | 118 ++
.../test_files/optimizer_group_by_constant.slt | 16 +-
datafusion/sqllogictest/test_files/order.slt | 4 +-
datafusion/sqllogictest/test_files/predicates.slt | 4 +-
.../sqllogictest/test_files/push_down_filter.slt | 124 ++
datafusion/sqllogictest/test_files/regexp.slt | 81 +
datafusion/sqllogictest/test_files/repartition.slt | 12 +-
datafusion/sqllogictest/test_files/scalar.slt | 80 +-
datafusion/sqllogictest/test_files/select.slt | 22 +-
.../sqllogictest/test_files/set_variable.slt | 8 +-
.../sqllogictest/test_files/sort_merge_join.slt | 131 +-
datafusion/sqllogictest/test_files/strings.slt | 49 +
datafusion/sqllogictest/test_files/struct.slt | 52 +
datafusion/sqllogictest/test_files/subquery.slt | 205 +--
datafusion/sqllogictest/test_files/timestamps.slt | 45 +-
.../sqllogictest/test_files/tpch/q1.slt.part | 16 +-
.../sqllogictest/test_files/tpch/q10.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q11.slt.part | 24 +-
.../sqllogictest/test_files/tpch/q12.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q13.slt.part | 18 +-
.../sqllogictest/test_files/tpch/q14.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q15.slt.part | 20 +-
.../sqllogictest/test_files/tpch/q16.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q17.slt.part | 24 +-
.../sqllogictest/test_files/tpch/q18.slt.part | 16 +-
.../sqllogictest/test_files/tpch/q19.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q20.slt.part | 14 +-
.../sqllogictest/test_files/tpch/q21.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q22.slt.part | 20 +-
.../sqllogictest/test_files/tpch/q3.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q4.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q5.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q6.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q7.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q8.slt.part | 10 +-
.../sqllogictest/test_files/tpch/q9.slt.part | 10 +-
datafusion/sqllogictest/test_files/union.slt | 18 +-
datafusion/sqllogictest/test_files/unnest.slt | 105 +-
datafusion/sqllogictest/test_files/window.slt | 587 +++----
datafusion/substrait/Cargo.toml | 7 +-
datafusion/substrait/src/logical_plan/consumer.rs | 902 +++++++----
datafusion/substrait/src/logical_plan/producer.rs | 110 +-
datafusion/substrait/src/physical_plan/consumer.rs | 87 +-
datafusion/substrait/src/physical_plan/producer.rs | 93 +-
.../substrait/tests/cases/consumer_integration.rs | 269 ++++
.../cases/{logical_plans.rs => function_test.rs} | 25 +-
datafusion/substrait/tests/cases/logical_plans.rs | 64 +-
datafusion/substrait/tests/cases/mod.rs | 2 +
.../tests/cases/roundtrip_logical_plan.rs | 223 +--
.../tests/cases/roundtrip_physical_plan.rs | 92 +-
datafusion/substrait/tests/testdata/Readme.md | 51 +
.../tests/testdata/contains_plan.substrait.json | 133 ++
datafusion/substrait/tests/testdata/data.csv | 4 +-
datafusion/substrait/tests/testdata/data.parquet | Bin 0 -> 4342 bytes
datafusion/substrait/tests/testdata/empty.parquet | Bin 0 -> 976 bytes
.../test_plans/non_nullable_lists.substrait.json | 71 +
.../select_not_bool.substrait.json | 0
.../test_plans/select_window.substrait.json | 153 ++
.../substrait/tests/testdata/tpch/customer.csv | 2 +
.../substrait/tests/testdata/tpch/lineitem.csv | 2 +
.../substrait/tests/testdata/tpch/nation.csv | 2 +
.../substrait/tests/testdata/tpch/orders.csv | 2 +
datafusion/substrait/tests/testdata/tpch/part.csv | 2 +
.../substrait/tests/testdata/tpch/partsupp.csv | 2 +
.../substrait/tests/testdata/tpch/region.csv | 2 +
.../substrait/tests/testdata/tpch/supplier.csv | 2 +
.../tests/testdata/tpch_substrait_plans}/README.md | 4 +-
.../testdata/tpch_substrait_plans/query_1.json | 810 ++++++++++
.../testdata/tpch_substrait_plans/query_2.json | 1582 ++++++++++++++++++
.../testdata/tpch_substrait_plans/query_3.json | 851 ++++++++++
.../testdata/tpch_substrait_plans/query_4.json | 540 +++++++
.../testdata/tpch_substrait_plans/query_5.json | 1254 +++++++++++++++
.../wasmtest/datafusion-wasm-app/package-lock.json | 40 +-
dev/changelog/39.0.0.md | 333 ++++
dev/release/README.md | 53 +-
dev/release/generate-changelog.py | 68 +-
docs/source/contributor-guide/getting_started.md | 4 +-
docs/source/contributor-guide/inviting.md | 427 +++++
docs/source/contributor-guide/testing.md | 25 +
docs/source/index.rst | 5 +-
docs/source/library-user-guide/adding-udfs.md | 4 +-
.../source/library-user-guide/using-the-sql-api.md | 195 +++
.../library-user-guide/working-with-exprs.md | 6 +-
docs/source/user-guide/cli/installation.md | 7 +-
docs/source/user-guide/cli/usage.md | 2 +-
docs/source/user-guide/configs.md | 5 +-
docs/source/user-guide/dataframe.md | 121 +-
docs/source/user-guide/example-usage.md | 13 +-
docs/source/user-guide/expressions.md | 14 +-
docs/source/user-guide/introduction.md | 2 +-
docs/source/user-guide/sql/aggregate_functions.md | 6 +-
docs/source/user-guide/sql/dml.md | 5 +-
docs/source/user-guide/sql/scalar_functions.md | 93 ++
docs/source/user-guide/sql/write_options.md | 10 +
587 files changed, 39827 insertions(+), 19665 deletions(-)
delete mode 100644 .github/workflows/pr_benchmarks.yml
delete mode 100644 .github_changelog_generator
delete mode 100644 .pre-commit-config.yaml
copy docs/.gitignore => benchmarks/requirements.txt (95%)
create mode 100644 datafusion-cli/examples/cli-session-context.rs
create mode 100644 datafusion-cli/src/cli_context.rs
copy datafusion/wasmtest/datafusion-wasm-app/webpack.config.js => datafusion-cli/src/pool_type.rs (56%)
create mode 100644 datafusion-examples/examples/advanced_parquet_index.rs
create mode 100644 datafusion-examples/examples/analyzer_rule.rs
delete mode 100644 datafusion-examples/examples/avro_sql.rs
create mode 100644 datafusion-examples/examples/composed_extension_codec.rs
delete mode 100644 datafusion-examples/examples/csv_sql.rs
create mode 100644 datafusion-examples/examples/custom_file_format.rs
create mode 100644 datafusion-examples/examples/file_stream_provider.rs
create mode 100644 datafusion-examples/examples/optimizer_rule.rs
delete mode 100644 datafusion-examples/examples/parquet_sql.rs
create mode 100644 datafusion-examples/examples/parse_sql_expr.rs
delete mode 100644 datafusion-examples/examples/rewrite_expr.rs
create mode 100644 datafusion-examples/examples/sql_analysis.rs
create mode 100644 datafusion-examples/examples/sql_frontend.rs
create mode 100644 datafusion/core/src/catalog/memory.rs
create mode 100644 datafusion/core/src/datasource/physical_plan/parquet/access_plan.rs
create mode 100644 datafusion/core/src/execution/session_state.rs
delete mode 100644 datafusion/core/src/physical_optimizer/pipeline_checker.rs
create mode 100644 datafusion/core/src/physical_optimizer/sanity_checker.rs
create mode 100644 datafusion/core/tests/data/double_quote.csv
create mode 100644 datafusion/core/tests/expr_api/parse_sql_expr.rs
create mode 100644 datafusion/core/tests/parquet/external_access_plan.rs
create mode 100644 datafusion/core/tests/parquet/utils.rs
create mode 100644 datafusion/core/tests/user_defined/expr_planner.rs
create mode 100644 datafusion/expr/src/planner.rs
create mode 100644 datafusion/expr/src/test/function_stub.rs
copy datafusion/{sql/tests/cases => expr/src/test}/mod.rs (97%)
rename datafusion/{physical-expr/src/aggregate => functions-aggregate/src}/approx_distinct.rs (79%)
create mode 100644 datafusion/functions-aggregate/src/approx_median.rs
rename datafusion/{physical-expr/src/aggregate => functions-aggregate/src}/approx_percentile_cont.rs (65%)
rename datafusion/{physical-expr/src/aggregate => functions-aggregate/src}/approx_percentile_cont_with_weight.rs (50%)
rename datafusion/{physical-expr/src/aggregate => functions-aggregate/src}/average.rs (76%)
create mode 100644 datafusion/functions-aggregate/src/bit_and_or_xor.rs
create mode 100644 datafusion/functions-aggregate/src/bool_and_or.rs
create mode 100644 datafusion/functions-aggregate/src/correlation.rs
create mode 100644 datafusion/functions-aggregate/src/count.rs
create mode 100644 datafusion/functions-aggregate/src/grouping.rs
rename datafusion/{physical-expr/src/aggregate => functions-aggregate/src}/hyperloglog.rs (99%)
rename datafusion/{physical-expr/src/aggregate => functions-aggregate/src}/nth_value.rs (75%)
rename datafusion/{physical-expr/src/aggregate => functions-aggregate/src}/regr.rs (84%)
create mode 100644 datafusion/functions-aggregate/src/stddev.rs
create mode 100644 datafusion/functions-aggregate/src/string_agg.rs
create mode 100644 datafusion/functions-aggregate/src/variance.rs
create mode 100644 datafusion/functions-array/src/planner.rs
delete mode 100644 datafusion/functions-array/src/rewrite.rs
create mode 100644 datafusion/functions/src/core/planner.rs
create mode 100644 datafusion/functions/src/planner.rs
copy datafusion/functions/src/string/{ends_with.rs => contains.rs} (55%)
rename datafusion/{physical-expr => physical-expr-common}/src/aggregate/count_distinct/bytes.rs (90%)
copy datafusion/physical-expr-common/src/aggregate/{groups_accumulator => count_distinct}/mod.rs (82%)
rename datafusion/{physical-expr => physical-expr-common}/src/aggregate/count_distinct/native.rs (90%)
create mode 100644 datafusion/physical-expr-common/src/aggregate/merge_arrays.rs
rename datafusion/{physical-expr => physical-expr-common}/src/aggregate/tdigest.rs (95%)
rename datafusion/{physical-expr => physical-expr-common}/src/binary_map.rs (98%)
create mode 100644 datafusion/physical-expr-common/src/datum.rs
rename datafusion/{physical-expr => physical-expr-common}/src/expressions/literal.rs (98%)
delete mode 100644 datafusion/physical-expr/src/aggregate/approx_median.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/bit_and_or_xor.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/bool_and_or.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/correlation.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/count.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/count_distinct/mod.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/covariance.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/grouping.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/stddev.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/string_agg.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/sum.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/sum_distinct.rs
delete mode 100644 datafusion/physical-expr/src/aggregate/variance.rs
delete mode 100644 datafusion/physical-expr/src/expressions/datum.rs
create mode 100644 datafusion/proto/src/logical_plan/file_formats.rs
delete mode 100644 datafusion/sql/src/expr/json_access.rs
create mode 100644 datafusion/sql/src/unparser/rewrite.rs
create mode 100644 datafusion/sqllogictest/test_files/cse.slt
create mode 100644 datafusion/sqllogictest/test_files/push_down_filter.slt
create mode 100644 datafusion/substrait/tests/cases/consumer_integration.rs
copy datafusion/substrait/tests/cases/{logical_plans.rs => function_test.rs} (60%)
create mode 100644 datafusion/substrait/tests/testdata/Readme.md
create mode 100644 datafusion/substrait/tests/testdata/contains_plan.substrait.json
create mode 100644 datafusion/substrait/tests/testdata/data.parquet
create mode 100644 datafusion/substrait/tests/testdata/empty.parquet
create mode 100644 datafusion/substrait/tests/testdata/test_plans/non_nullable_lists.substrait.json
rename datafusion/substrait/tests/testdata/{ => test_plans}/select_not_bool.substrait.json (100%)
create mode 100644 datafusion/substrait/tests/testdata/test_plans/select_window.substrait.json
create mode 100644 datafusion/substrait/tests/testdata/tpch/customer.csv
create mode 100644 datafusion/substrait/tests/testdata/tpch/lineitem.csv
create mode 100644 datafusion/substrait/tests/testdata/tpch/nation.csv
create mode 100644 datafusion/substrait/tests/testdata/tpch/orders.csv
create mode 100644 datafusion/substrait/tests/testdata/tpch/part.csv
create mode 100644 datafusion/substrait/tests/testdata/tpch/partsupp.csv
create mode 100644 datafusion/substrait/tests/testdata/tpch/region.csv
create mode 100644 datafusion/substrait/tests/testdata/tpch/supplier.csv
copy {python => datafusion/substrait/tests/testdata/tpch_substrait_plans}/README.md (77%)
create mode 100644 datafusion/substrait/tests/testdata/tpch_substrait_plans/query_1.json
create mode 100644 datafusion/substrait/tests/testdata/tpch_substrait_plans/query_2.json
create mode 100644 datafusion/substrait/tests/testdata/tpch_substrait_plans/query_3.json
create mode 100644 datafusion/substrait/tests/testdata/tpch_substrait_plans/query_4.json
create mode 100644 datafusion/substrait/tests/testdata/tpch_substrait_plans/query_5.json
create mode 100644 dev/changelog/39.0.0.md
create mode 100644 docs/source/contributor-guide/inviting.md