Hi,

The DataFusion project (an in-memory SQL engine written in Rust and built
on Arrow) has decided to adopt the Postgres dialect of SQL. Here 'dialect'
largely refers to the functions/API that Postgres provides on top of the
ANSI SQL standard functions, since every dialect has a slightly different
set (compare
https://www.postgresql.org/docs/13/functions-string.html vs
https://dev.mysql.com/doc/refman/8.0/en/string-functions.html). By
selecting the Postgres dialect, DataFusion is able to guarantee some level
of compatibility with existing user code bases, and the project can
identify and incorporate missing features based on the years of experience
in the Postgres project.

As an example, `concat_ws` (concatenate with separator) is a Postgres
function with a set of expected behaviors when executed (
https://www.postgresql.org/docs/13/functions-string.html#id-1.5.8.10.7.2.2.5.1.1.1).
The same function also exists in other SQL dialects (for example MySQL:
https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_concat-ws)
but may have slightly different behavior. To achieve and guarantee
compatibility, the DataFusion implementation relies on the Postgres
documentation to describe the behavior of each function and uses the
examples provided there in tests to ensure both behave the same.
DataFusion also re-uses the documentation from the Postgres project in code
comments, as it is generally succinct and well written and describes the
intended behavior of the implementation.
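
To make this concrete, here is a minimal sketch of the approach, not the
actual DataFusion code. The function name, signature and the use of
`Option` to stand in for SQL NULL are my own assumptions for illustration.
It ignores NULL arguments as the Postgres documentation describes, and it
assumes that a NULL separator yields NULL, which the documentation does not
spell out.

```rust
/// Hypothetical sketch of a Postgres-style `concat_ws`.
/// `None` stands in for SQL NULL; this is not DataFusion's real signature.
fn concat_ws(sep: Option<&str>, args: &[Option<&str>]) -> Option<String> {
    // Assumption: a NULL separator makes the whole result NULL.
    let sep = sep?;
    // Per the Postgres documentation, NULL arguments are ignored
    // rather than propagated.
    let parts: Vec<&str> = args.iter().flatten().copied().collect();
    Some(parts.join(sep))
}
```

The example from the Postgres documentation,
`concat_ws(',', 'abcde', 2, NULL, 22)` giving `abcde,2,22`, then becomes a
test case: calling the sketch with `Some(",")` and the arguments
`[Some("abcde"), Some("2"), None, Some("22")]` should return
`Some("abcde,2,22")`.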

As we want to ensure we are compliant with the Postgres License (
https://www.postgresql.org/about/licence/) I have proposed two actions:
- Adding the license to the
https://github.com/apache/arrow/blob/master/LICENSE.txt file in this PR:
https://github.com/apache/arrow/pull/9507
The wording was deliberately left general so that, if other contributors
inadvertently add Postgres-derived functions in other files, we still meet
the license obligations.
- Annotating the main impacted source file with the generic statement:
    // Some of these functions reference the Postgres documentation
    // or implementation to ensure compatibility and are subject to
    // the Postgres license.
  in this PR:
https://github.com/apache/arrow/pull/9551/files#diff-abe8768fe7124198cca7a84ad7b2c678b3cc8e5de3d1bc867d498536a2fdddc7R18

Can anyone provide advice here, or should we reach out to the Apache
Software Foundation lawyers (https://www.apache.org/legal/)?

Thanks
