devinjdangelo commented on code in PR #7244:
URL: https://github.com/apache/arrow-datafusion/pull/7244#discussion_r1289380024


##########
datafusion/common/src/config.rs:
##########
@@ -270,7 +270,48 @@ config_namespace! {
         /// will be reordered heuristically to minimize the cost of evaluation. If false,
         /// the filters are applied in the same order as written in the query
         pub reorder_filters: bool, default = false
+
+        // The following map to parquet::file::properties::WriterProperties
+
+        /// Sets best effort maximum size of data page in bytes
+        pub data_pagesize_limit: usize, default = 1024 * 1024

Review Comment:
   I think we will definitely want `COPY TO` to be able to set any of these configs on a per-statement basis.
   
   For `insert into`, we could allow the table itself to be registered with specific settings, e.g.:
   
   ```sql
   create external table my_table(x int, y int) 
   stored as parquet
   location '/tmp/my_table' 
   WITH (
   DATA_PAGESIZE_LIMIT 2048,
   DATA_PAGE_ROW_COUNT_LIMIT 100000,
   ...
   );
   ```
   
   `insert into my_table` would then use any table-specific settings or fall back to the session-level configs.
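
   To make the fallback concrete, here is a rough sketch of how the resolution could look. All type and function names (`SessionParquetOptions`, `TableParquetOverrides`, `resolve_writer_options`) are hypothetical and are not part of this PR or of DataFusion. It only illustrates "table setting if present, otherwise session setting" for two of the options:

   ```rust
   /// Hypothetical session-level Parquet writer settings (illustrative names only).
   #[derive(Debug, Clone, PartialEq)]
   pub struct SessionParquetOptions {
       pub data_pagesize_limit: usize,
       pub data_page_row_count_limit: usize,
   }

   /// Hypothetical per-table overrides. `None` means the option was not
   /// given when the table was registered.
   #[derive(Debug, Clone, Default, PartialEq)]
   pub struct TableParquetOverrides {
       pub data_pagesize_limit: Option<usize>,
       pub data_page_row_count_limit: Option<usize>,
   }

   /// Resolve the effective settings for a write: take the table-level value
   /// when one is set, otherwise fall back to the session-level value.
   pub fn resolve_writer_options(
       session: &SessionParquetOptions,
       table: &TableParquetOverrides,
   ) -> SessionParquetOptions {
       SessionParquetOptions {
           data_pagesize_limit: table
               .data_pagesize_limit
               .unwrap_or(session.data_pagesize_limit),
           data_page_row_count_limit: table
               .data_page_row_count_limit
               .unwrap_or(session.data_page_row_count_limit),
       }
   }
   ```

   With this shape, a table registered with only `DATA_PAGESIZE_LIMIT 2048` would write with a 2048-byte page size limit and the session's row count limit. A per-statement override for `COPY TO` could presumably be layered on top in the same way.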


