Martin Thøgersen created ARROW-15494:
----------------------------------------

             Summary: [Docs] Clarify {{existing_data_behavior}} docstring
                 Key: ARROW-15494
                 URL: https://issues.apache.org/jira/browse/ARROW-15494
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 7.0.1
            Reporter: Martin Thøgersen


Slightly clarify the wording of the {{existing_data_behavior}} parameter of
{{pyarrow.dataset.write_dataset()}}.

[https://github.com/apache/arrow/blob/a27c55660e575a3987283d5d9e443642db48f215/python/pyarrow/dataset.py#L812-L827]

Proposed wording:

{noformat}
    existing_data_behavior : 'error' | 'overwrite_or_ignore' | \
'delete_matching'
        Controls how the dataset will handle data that already exists in
        the destination.  The default behavior ('error') is to raise an error
        if any data exists in the `base_dir` destination.

        'overwrite_or_ignore' will ignore any existing data and will
        overwrite files with the same name as an output file.  Other
        existing files will be ignored.  This behavior, in combination
        with a unique basename_template for each write, will allow for
        an append workflow.

        'delete_matching' is useful when you are writing a partitioned
        dataset.  The first time each partition leaf-level directory is
        encountered, the entire leaf-level directory will be deleted.  This
        allows you to overwrite old partitions completely.
{noformat}

I.e. clarify that:
- {{error}} applies to the base_dir level.
- {{delete_matching}} applies to the leaf-level directory.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
