eladkal commented on code in PR #36817:
URL: https://github.com/apache/airflow/pull/36817#discussion_r1457350670


##########
airflow/providers/google/CHANGELOG.rst:
##########
@@ -27,6 +27,16 @@
 Changelog
 ---------
 
+.. note::
+  The default value of ``parquet_row_group_size`` in ``BaseSQLToGCSOperator`` has changed from 1 to
+  100000, in order to have a default that provides better compression efficiency and performance of
+  reading the data in the output Parquet files. In many cases, the previous value of 1 resulted in
+  very large files, long task durations and out of memory issues. A default value of 100000 may require
+  more memory to execute the operator, in which case users can override the ``parquet_row_group_size``
+  parameter in the operator. All operators that are derived from ``BaseSQLToGCSOperator`` are affected
+  when ``export_format`` is ``parquet``: ``MySQLToGCSOperator``, ``PrestoToGCSOperator``,
+  ``OracleToGCSOperator``, ``TrinoToGCSOperator``, ``MSSQLToGCSOperator`` and ``PostgresToGCSOperator``.

Review Comment:
   This is a good explanation. I would even highlight that we consider this
   change a bug fix, so that users understand that we weighed all the factors
   and made an informed decision. That is much better than users thinking we
   overlooked it.
   ```suggestion
     The default value of ``parquet_row_group_size`` in ``BaseSQLToGCSOperator`` has changed from 1 to
     100000, in order to have a default that provides better compression efficiency and performance of
     reading the data in the output Parquet files. In many cases, the previous value of 1 resulted in
     very large files, long task durations and out of memory issues. A default value of 100000 may require
     more memory to execute the operator, in which case users can override the ``parquet_row_group_size``
     parameter in the operator. All operators that are derived from ``BaseSQLToGCSOperator`` are affected
     when ``export_format`` is ``parquet``: ``MySQLToGCSOperator``, ``PrestoToGCSOperator``,
     ``OracleToGCSOperator``, ``TrinoToGCSOperator``, ``MSSQLToGCSOperator`` and ``PostgresToGCSOperator``.
     Due to the above, we treat this change as a bug fix.
   ```
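
To make the trade-off in the note concrete, here is a minimal sketch, not taken from the PR. The helper `row_group_count` is hypothetical and only approximates how many row groups a given setting produces (it ignores file splitting). The task ID, connection ID, query, bucket, and the value 10000 are placeholder assumptions, not recommendations.

```python
def row_group_count(total_rows: int, row_group_size: int) -> int:
    """Hypothetical helper: approximate number of Parquet row groups
    needed to hold total_rows rows, ignoring file splitting."""
    if total_rows < 0 or row_group_size < 1:
        raise ValueError("total_rows must be >= 0 and row_group_size >= 1")
    # Integer ceiling division, avoids floating point rounding.
    return (total_rows + row_group_size - 1) // row_group_size


# Old default of 1: one row group per row.
old_default_groups = row_group_count(1_000_000, 1)
# New default of 100000: far fewer, larger row groups.
new_default_groups = row_group_count(1_000_000, 100_000)

# Keyword arguments a user might pass to override the new default when
# memory is tight. All values here are placeholders (assumptions).
# Inside a DAG these would be used as, for example,
# PostgresToGCSOperator(**export_kwargs).
export_kwargs = {
    "task_id": "export_orders_to_gcs",
    "postgres_conn_id": "postgres_default",
    "sql": "SELECT * FROM orders",
    "bucket": "example-bucket",
    "filename": "orders/export_{}.parquet",
    "export_format": "parquet",
    "parquet_row_group_size": 10_000,  # smaller than the new default
}
```

Under these assumptions, a smaller `parquet_row_group_size` lowers the number of rows buffered per row group at the cost of more row groups per file.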


