gemini-code-assist[bot] commented on code in PR #38484:
URL: https://github.com/apache/beam/pull/38484#discussion_r3234785728


##########
sdks/python/apache_beam/dataframe/io.py:
##########
@@ -684,13 +684,18 @@ def expand(self, pcoll):
       dir, name = self.path, ''
     else:
       dir, name = io.filesystems.FileSystems.split(self.path)
+    num_shards = self.kwargs.pop('num_shards', None)
+    write_to_files_kwargs = {}
+    if num_shards is not None:
+      write_to_files_kwargs['shards'] = num_shards
+      write_to_files_kwargs['max_writers_per_bundle'] = 0

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   This change has two issues:
   1. It introduces a bug: a user-supplied `max_writers_per_bundle` is never removed from `self.kwargs`, so it is passed down to the underlying pandas writer function, which does not accept it and raises a `TypeError`.
   2. It unconditionally sets `max_writers_per_bundle` to 0 whenever `num_shards` is specified, overriding any value the user provided. This reduces flexibility.
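
To illustrate the first issue, here is a minimal sketch of how a leftover keyword argument fails. `fake_pandas_writer` is a hypothetical stand-in for a pandas writer function, not the real pandas or Beam API; the only assumption is that, like any Python function without `**kwargs`, it rejects keyword arguments it does not declare.

```python
def fake_pandas_writer(df, path, index=True):
    """Hypothetical stand-in for a pandas writer; declares no **kwargs."""
    return f"wrote {df} to {path} (index={index})"


# Keyword arguments as a user might pass them to the write transform.
# 'max_writers_per_bundle' is meant for the file-writing layer, not the writer.
user_kwargs = {'index': False, 'max_writers_per_bundle': 5}

try:
    fake_pandas_writer('df', 'out.csv', **user_kwargs)
    error = None
except TypeError as exc:
    # The unexpected keyword is rejected by the writer's signature.
    error = exc

print(type(error).__name__, '-', error)
```

Popping the key before the remaining kwargs are forwarded avoids this failure.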
   
   A better approach is to respect a user-provided `max_writers_per_bundle`, defaulting to 0 (to force sharding) only when `num_shards` is given and no value was supplied. In either case, `max_writers_per_bundle` should always be popped from `self.kwargs` to prevent the `TypeError`.
   
   ```python
       num_shards = self.kwargs.pop('num_shards', None)
       max_writers_per_bundle = self.kwargs.pop('max_writers_per_bundle', None)
       write_to_files_kwargs = {}
       if num_shards is not None:
         write_to_files_kwargs['shards'] = num_shards
         write_to_files_kwargs['max_writers_per_bundle'] = (
             0 if max_writers_per_bundle is None else max_writers_per_bundle)
       elif max_writers_per_bundle is not None:
         write_to_files_kwargs['max_writers_per_bundle'] = max_writers_per_bundle
   ```
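
The same logic can be checked in isolation. The sketch below pulls it into a standalone function so the three cases are easy to test; `split_write_kwargs` is a hypothetical helper name used only for illustration and is not part of the PR or of Beam. It works on a copy of the dict rather than mutating `self.kwargs`, which is a design choice of this sketch, not of the suggestion above.

```python
def split_write_kwargs(kwargs):
    """Hypothetical helper: separate file-writing options from writer kwargs.

    Returns (writer_kwargs, write_to_files_kwargs).
    """
    writer_kwargs = dict(kwargs)  # copy, so the caller's dict is untouched
    num_shards = writer_kwargs.pop('num_shards', None)
    # Always pop, so the key never reaches the underlying writer.
    max_writers = writer_kwargs.pop('max_writers_per_bundle', None)

    write_to_files_kwargs = {}
    if num_shards is not None:
        write_to_files_kwargs['shards'] = num_shards
        # Default to 0 only when the user gave no explicit value.
        write_to_files_kwargs['max_writers_per_bundle'] = (
            0 if max_writers is None else max_writers)
    elif max_writers is not None:
        write_to_files_kwargs['max_writers_per_bundle'] = max_writers
    return writer_kwargs, write_to_files_kwargs


# num_shards alone: the default of 0 is applied.
print(split_write_kwargs({'num_shards': 3, 'index': False}))
# Both given: the user's value wins.
print(split_write_kwargs({'num_shards': 3, 'max_writers_per_bundle': 10}))
# max_writers_per_bundle alone: forwarded, no 'shards' key.
print(split_write_kwargs({'max_writers_per_bundle': 10}))
```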



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
