slinkydeveloper opened a new pull request #17897:
URL: https://github.com/apache/flink/pull/17897
## What is the purpose of the change
This PR moves the FileSystemTableSource/Sink out of flink-table-runtime and
into the flink-connector-files module. The PR ended up being bigger than
expected, so I factored out the preliminary changes, which precede the actual
file moving, into separate hotfix commits. More details in the commit details
section.
## Brief change log
### Classes changes:
* Renamed package `org.apache.flink.table.filesystem` to
`org.apache.flink.connector.file.table`
* Moved every production class in `org.apache.flink.connector.file.table`
from `flink-table-runtime` to `flink-connector-files`
* Moved some test classes in `org.apache.flink.connector.file.table` from
`flink-table-runtime` to `flink-connector-files`, and others to
`flink-table-planner`. No filesystem-related test lives in
`flink-table-runtime` anymore
* Reworked and moved the testcsv format inside `flink-table-planner`
* Moved the columnar `RowData` implementations from `flink-table-runtime` to
`flink-table-common`, under the package `org.apache.flink.table.data.columnar`
* Moved `FileSystemFormatFactory`, `BulkWriterFormatFactory` and
`BulkReaderFormatFactory` to `flink-connector-files`, in a new package
`org.apache.flink.connector.file.table.factories`
* Moved `BulkDecodingFormat` to `flink-connector-files`, in a new package
`org.apache.flink.connector.file.table.formats`
### Module changes:
* No table-* module depends on `flink-connector-files` in its production jar
anymore
* `flink-table-planner` still depends on `flink-connector-files` in test
classpath
* `flink-orc` and `flink-parquet` don't depend on `flink-table-runtime`
anymore, hence the scala suffix is dropped
* Every format module now depends optionally on `flink-connector-files` to
implement `BulkWriterFormatFactory` and `BulkReaderFormatFactory`
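The optional dependency mentioned above could look like the following sketch (the coordinates mirror the module names from this PR, but the exact POM wiring is an assumption; in the real build the version is managed by the parent POM):

```xml
<!-- In a format module's pom.xml: compile against flink-connector-files
     to implement BulkWriterFormatFactory/BulkReaderFormatFactory, but do
     NOT propagate the dependency to consumers of this module. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-files</artifactId>
    <version>${project.version}</version>
    <optional>true</optional>
</dependency>
```

Marking the dependency `<optional>true</optional>` is what makes the classpath scenario described in the commit details possible: users of the format module may legitimately not have `flink-connector-files` at runtime.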
## Commit details
* Fix the `FactoryUtil` loading mechanism to tolerate
`NoClassDefFoundError`: this is necessary in scenarios where users bring in a
format module implementing
`BulkReaderFormatFactory`/`BulkWriterFormatFactory`, but don't have
flink-connector-files on the classpath because they don't use it. E.g. you're
using `flink-connector-kafka` and you want to use `flink-avro`, which brings in
an implementation of `BulkWriterFormatFactory`. This scenario is already
exercised by every test in flink-connector-kafka, which doesn't have
flink-connector-files in its test classpath.
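The idea behind this fix can be illustrated with a small, self-contained sketch (this is not the actual `FactoryUtil` code; the class and method names below are made up for illustration). Each supplier stands in for instantiating one factory discovered via `ServiceLoader`; candidates whose optional dependencies are missing are skipped instead of failing the whole lookup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hedged sketch of "tolerant" factory discovery: if instantiating a
// discovered factory throws NoClassDefFoundError (because an optional
// dependency such as flink-connector-files is absent), skip that factory
// rather than aborting the entire discovery.
public class TolerantLoading {

    public static <T> List<T> loadAvailable(List<Supplier<T>> candidates) {
        List<T> available = new ArrayList<>();
        for (Supplier<T> candidate : candidates) {
            try {
                available.add(candidate.get());
            } catch (NoClassDefFoundError e) {
                // The factory references a class that is not on the
                // classpath; tolerate it and continue with the others.
            }
        }
        return available;
    }
}
```

With this behavior, a lookup that finds a Kafka factory, a file-based factory with a missing dependency, and an Avro factory still returns the two usable ones.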
* Remove the planner dependency on `FileSystemConnectorFiles` by hardcoding
the used option
* Copied `DecimalDataUtils#is32BitDecimal` and
`DecimalDataUtils#is64BitDecimal` into `ParquetSchemaConverter` to remove the
dependency on `DecimalDataUtils` (which lives in the runtime module)
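The copied helpers are tiny; a hedged sketch of what they look like (the exact bodies in `DecimalDataUtils` may differ — the thresholds below are simply the standard digit capacities of `int` and `long`, which can hold any unscaled decimal of up to 9 and 18 digits respectively):

```java
// Illustrative version of the two precision checks copied into
// ParquetSchemaConverter. A decimal with precision <= 9 fits in a 32-bit
// int; precision <= 18 fits in a 64-bit long.
public class DecimalWidth {

    public static final int MAX_INT_DIGITS = 9;   // Integer.MAX_VALUE has 10 digits
    public static final int MAX_LONG_DIGITS = 18; // Long.MAX_VALUE has 19 digits

    public static boolean is32BitDecimal(int precision) {
        return precision <= MAX_INT_DIGITS;
    }

    public static boolean is64BitDecimal(int precision) {
        return precision <= MAX_LONG_DIGITS;
    }
}
```

Duplicating two one-line predicates is a small price for cutting a whole-module dependency edge from the format to the runtime.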
* Refactored the testcsv format:
- Reimplemented the writing side using `SerializationSchemaFormatFactory`,
since the implementation of `BulkReaderFormatFactory` was causing issues when
using the `flink-table-planner` test-jar in other modules.
  - Tried to make the format implementation as independent as possible from
planner-internal classes, but this cannot be done completely due to an issue
in `FileSystemTableSink#createSourceContext`, which cannot provide the
`DataStructureConverter`. This issue can potentially come up with other
formats used by `FileSystemFormatFactory` as well, so we probably need to
address it separately from this PR.
  - Moved the format to a new dedicated package in `flink-table-planner`
* Big commit that moves code. Other than moving code and fixing imports as
described above, it does the following:
  - Both the parquet and orc formats were using `InternalTypeInfo` to create
the `TypeInformation`. Now they use the context instead.
  - Changed some Javadocs of deprecated APIs in
`org.apache.flink.table.sinks` to remove the dependency on
`flink-connector-files` types
- All the code moved in common is now annotated with `@Internal`
  - Some tests couldn't be moved to flink-connector-files because they depend
on several test utils/test bases from the runtime module. Nevertheless, they
have been adapted to the new package name.
  - Every module depending on either the moved format factories or the
filesystem table sink/source internals now depends on `flink-connector-files`
* Other packaging changes are kept in separate commits for
clarity:
  - Dropped the scala suffix from flink-orc and flink-parquet
  - flink-table-uber now needs to manually import flink-connector-files (as
it's no longer a transitive dependency of flink-table-common)
## Verifying this change
Every change is already covered by existing tests
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): moves dependencies
around
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: changes the artifact names of flink-orc and
flink-parquet (the scala suffix is dropped)
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable