slinkydeveloper opened a new pull request #17897:
URL: https://github.com/apache/flink/pull/17897
## What is the purpose of the change
This PR moves the FileSystemTableSource/Sink out of flink-table-runtime and
into the flink-connector-files module. The PR ended up being bigger than
expected, so I factored out the preliminary changes, which precede the actual
file moving, into separate hotfix commits. More details in the commit details
section.
## Brief change log
### Classes changes:
* Renamed package `org.apache.flink.table.filesystem` to
`org.apache.flink.connector.file.table`
* Moved every production class in `org.apache.flink.connector.file.table`
from `flink-table-runtime` to `flink-connector-files`
* Moved some test classes in `org.apache.flink.connector.file.table` from
`flink-table-runtime` to `flink-connector-files`, and others to
`flink-table-planner`. No filesystem-related test lives in
`flink-table-runtime` anymore
* Reworked and moved the testcsv format inside `flink-table-planner`
* Moved the columnar `RowData` implementations from `flink-table-runtime` to
`flink-table-common`, under the package `org.apache.flink.table.data.columnar`
* Moved `FileSystemFormatFactory`, `BulkWriterFormatFactory` and
`BulkReaderFormatFactory` to `flink-connector-files`, in a new package
`org.apache.flink.connector.file.table.factories`
* Moved `BulkDecodingFormat` to `flink-connector-files`, in a new package
`org.apache.flink.connector.file.table.formats`
### Module changes:
* No table-* module depends on `flink-connector-files` in its production jar
anymore
* `flink-table-planner` still depends on `flink-connector-files` in test
classpath
* `flink-orc` and `flink-parquet` don't depend on `flink-table-runtime`
anymore, hence the scala suffix is dropped
* Every format module now depends optionally on `flink-connector-files` to
implement `BulkWriterFormatFactory` and `BulkReaderFormatFactory`
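The optional dependency mentioned above could look like the following sketch (the coordinates mirror the module names from this PR, but the exact POM wiring is an assumption; in the real build the version is managed by the parent POM):

```xml
<!-- In a format module's pom.xml: compile against flink-connector-files
     to implement BulkWriterFormatFactory/BulkReaderFormatFactory, but do
     NOT propagate the dependency to consumers of this module. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-files</artifactId>
    <version>${project.version}</version>
    <optional>true</optional>
</dependency>
```

Marking the dependency `<optional>true</optional>` is what makes the classpath scenario described in the commit details possible: users of the format module may legitimately not have `flink-connector-files` at runtime.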
## Commit details
* Fix the `FactoryUtil` loading mechanism to tolerate
`NoClassDefFoundError`: this is necessary in scenarios where users bring in a
format module implementing
`BulkReaderFormatFactory`/`BulkWriterFormatFactory`, but don't have
flink-connector-files on the classpath because they don't use it. E.g. you're
using `flink-connector-kafka` and you want to use `flink-avro`, which brings in
an implementation of `BulkWriterFormatFactory`. This scenario is already
exercised by every test in flink-connector-kafka, which doesn't have
flink-connector-files in its test classpath.
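The idea behind this fix can be illustrated with a small, self-contained sketch (this is not the actual `FactoryUtil` code; the class and method names below are made up for illustration). Each supplier stands in for instantiating one factory discovered via `ServiceLoader`; candidates whose optional dependencies are missing are skipped instead of failing the whole lookup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hedged sketch of "tolerant" factory discovery: if instantiating a
// discovered factory throws NoClassDefFoundError (because an optional
// dependency such as flink-connector-files is absent), skip that factory
// rather than aborting the entire discovery.
public class TolerantLoading {

    public static <T> List<T> loadAvailable(List<Supplier<T>> candidates) {
        List<T> available = new ArrayList<>();
        for (Supplier<T> candidate : candidates) {
            try {
                available.add(candidate.get());
            } catch (NoClassDefFoundError e) {
                // The factory references a class that is not on the
                // classpath; tolerate it and continue with the others.
            }
        }
        return available;
    }
}
```

With this behavior, a lookup that finds a Kafka factory, a file-based factory with a missing dependency, and an Avro factory still returns the two usable ones.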
* Remove the planner dependency on `FileSystemConnectorFiles` by hardcoding
the used option
* Copied `DecimalDataUtils#is32BitDecimal` and
`DecimalDataUtils#is64BitDecimal` into `ParquetSchemaConverter` to remove the
dependency on `DecimalDataUtils` (which lives in the runtime module)
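The copied helpers are tiny; a hedged sketch of what they look like (the exact bodies in `DecimalDataUtils` may differ — the thresholds below are simply the standard digit capacities of `int` and `long`, which can hold any unscaled decimal of up to 9 and 18 digits respectively):

```java
// Illustrative version of the two precision checks copied into
// ParquetSchemaConverter. A decimal with precision <= 9 fits in a 32-bit
// int; precision <= 18 fits in a 64-bit long.
public class DecimalWidth {

    public static final int MAX_INT_DIGITS = 9;   // Integer.MAX_VALUE has 10 digits
    public static final int MAX_LONG_DIGITS = 18; // Long.MAX_VALUE has 19 digits

    public static boolean is32BitDecimal(int precision) {
        return precision <= MAX_INT_DIGITS;
    }

    public static boolean is64BitDecimal(int precision) {
        return precision <= MAX_LONG_DIGITS;
    }
}
```

Duplicating two one-line predicates is a small price for cutting a whole-module dependency edge from the format to the runtime.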
* Refactored the testcsv format:
- Reimplemented the writing side using `SerializationSchemaFormatFactory`,
since the implementation of `BulkReaderFormatFactory` was causing issues when
using the `flink-table-planner` test-jar in other modules.
  - Tried to make the format implementation as independent as possible from
planner-internal classes, but this cannot be done completely due to an issue
in `FileSystemTableSink#createSourceContext`, which cannot provide the
`DataStructureConverter`. This issue can potentially come up with other
formats used by `FileSystemFormatFactory` as well, so we probably need to
address it separately from this PR.
  - Moved the format to a new dedicated package in `flink-table-planner`
* Big commit that moves code. Other than moving code and fixing imports as
described above, it does the following:
  - Both the parquet and orc formats were using `InternalTypeInfo` to create
the `TypeInformation`. Now they use the context instead.
  - Changed some Javadocs of deprecated APIs in
`org.apache.flink.table.sinks` to remove the dependency on
`flink-connector-files` types
- All the code moved in common is now annotated with `@Internal`
  - Some tests couldn't be moved to flink-connector-files because they depend
on several test utils/test bases from the runtime module. Nevertheless, they
have been adapted to the new package name.
  - Every module depending on either the moved format factories or the
filesystem table sink/source internals now depends on `flink-connector-files`
* Other packaging changes are kept in separate commits for
clarity:
  - Dropped the scala suffix from flink-orc and flink-parquet
  - flink-table-uber now needs to manually import flink-connector-files (as
it's no longer a transitive dependency of flink-table-common)
## Verifying this change
Every change is already covered by existing tests
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): moves dependencies
around
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: changes the artifact names of flink-orc and
flink-parquet (the scala suffix is dropped)
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable