Martin Thøgersen created ARROW-15494:
----------------------------------------
Summary: [Docs] Clarify {{existing_data_behavior}} docstring
Key: ARROW-15494
URL: https://issues.apache.org/jira/browse/ARROW-15494
Project: Apache Arrow
Issue Type: Improvement
Components: Documentation
Affects Versions: 7.0.1
Reporter: Martin Thøgersen

Slightly clarify the wording of the {{existing_data_behavior}} parameter of {{pyarrow.dataset.write_dataset()}}:
[https://github.com/apache/arrow/blob/a27c55660e575a3987283d5d9e443642db48f215/python/pyarrow/dataset.py#L812-L827]

Proposed wording:
{noformat}
existing_data_behavior : 'error' | 'overwrite_or_ignore' | \
        'delete_matching'
    Controls how the dataset will handle data that already exists in
    the destination. The default behavior ('error') is to raise an error
    if any data exists in the `base_dir` destination.

    'overwrite_or_ignore' will ignore any existing data and will
    overwrite files with the same name as an output file. Other
    existing files will be ignored. This behavior, in combination
    with a unique basename_template for each write, will allow for
    an append workflow.

    'delete_matching' is useful when you are writing a partitioned
    dataset. The first time each partition leaf-level directory is
    encountered the entire leaf-level directory will be deleted. This
    allows you to overwrite old partitions completely.
{noformat}

I.e. clarify that:
- {{error}} applies to the {{base_dir}} level.
- {{delete_matching}} applies to the leaf-level directory.