devinjdangelo commented on code in PR #7244:
URL: https://github.com/apache/arrow-datafusion/pull/7244#discussion_r1289999392
##########
datafusion/core/src/datasource/file_format/parquet.rs:
##########
@@ -543,6 +574,172 @@ async fn fetch_statistics(
Ok(statistics)
}
+/// Implements [`DataSink`] for writing to a parquet file.
+struct ParquetSink {
+ /// Config options for writing data
+ config: FileSinkConfig,
+}
+
+impl Debug for ParquetSink {
+ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+ f.debug_struct("ParquetSink").finish()
+ }
+}
+
+impl DisplayAs for ParquetSink {
+ fn fmt_as(&self, t: DisplayFormatType, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+ match t {
+ DisplayFormatType::Default | DisplayFormatType::Verbose => {
+ write!(
+ f,
+ "ParquetSink(writer_mode={:?}, file_groups=",
+ self.config.writer_mode
+ )?;
+ FileGroupDisplay(&self.config.file_groups).fmt_as(t, f)?;
+ write!(f, ")")
+ }
+ }
+ }
+}
+
+impl ParquetSink {
+ fn new(config: FileSinkConfig) -> Self {
+ Self { config }
+ }
+
+ /// Builds a parquet WriterProperties struct, setting options as appropriate from TaskContext options
+ fn parquet_writer_props_from_context(
+ &self,
+ context: &Arc<TaskContext>,
+ ) -> WriterProperties {
+ let parquet_context = &context.session_config().options().execution.parquet;
+ let mut builder = WriterProperties::builder()
+ .set_created_by(parquet_context.created_by.clone())
+ .set_data_page_row_count_limit(parquet_context.data_page_row_count_limit)
+ .set_data_page_size_limit(parquet_context.data_pagesize_limit);
Review Comment:
Yes, there is more to do here. Some of the WriterProperties configs are not
primitive types (such as compression and statistics). I'm still working through
this, but the plan is to have the session-level config be a string and implement
`TryFrom<String>` for these types. Of course, since these types and the trait
are defined in other crates, we would need to either implement TryFrom in
arrow-rs or have a wrapper type in DataFusion.
Let me know if we already have a pattern for handling configs that ultimately
need to map to non-primitive types that I just missed. 🤔
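
To make the wrapper-type option concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: `Compression` is a local stand-in rather than the real arrow-rs type, `CompressionOption` is a hypothetical wrapper name, and the accepted string forms (`"snappy"`, `"zstd(3)"`) are just one possible convention.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a non-primitive writer option such as the
/// parquet compression setting. This is NOT the real arrow-rs type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Uncompressed,
    Snappy,
    Zstd(i32),
}

/// Hypothetical wrapper owned by the local crate. Because the wrapper is a
/// local type, implementing the std `TryFrom` trait for it is allowed even
/// though the wrapped type would live in another crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CompressionOption(Compression);

impl TryFrom<String> for CompressionOption {
    type Error = String;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        let normalized = value.trim().to_lowercase();

        // Accept either "codec" or "codec(level)" (assumed convention).
        let (name, level) = match normalized.split_once('(') {
            Some((name, rest)) => {
                let level = rest
                    .strip_suffix(')')
                    .ok_or_else(|| format!("missing ')' in {value:?}"))?;
                (name, Some(level))
            }
            None => (normalized.as_str(), None),
        };

        let codec = match (name.trim(), level) {
            ("uncompressed", None) => Compression::Uncompressed,
            ("snappy", None) => Compression::Snappy,
            ("zstd", Some(level)) => {
                let level: i32 = level
                    .trim()
                    .parse()
                    .map_err(|e| format!("invalid zstd level in {value:?}: {e}"))?;
                Compression::Zstd(level)
            }
            _ => return Err(format!("unsupported compression setting: {value:?}")),
        };

        Ok(CompressionOption(codec))
    }
}
```

With something like this, the session config would keep a plain `String`, and the sink would call `CompressionOption::try_from(setting)` when building the writer properties, returning a configuration error for unrecognized values instead of panicking.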
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]