Joerg Schneider created ARROW-13048:
---------------------------------------

             Summary: [Python] S3FileSystem fails moving filepaths containing = or +
                 Key: ARROW-13048
                 URL: https://issues.apache.org/jira/browse/ARROW-13048
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 4.0.1
            Reporter: Joerg Schneider


Hi Arrow team,

We have the very common use case of partitioned Parquet tables on S3, written
by Spark. Their folder names contain an equals sign (=) to denote each
partition's value (e.g. `date=202007`).

When using PyArrow's S3FileSystem `move` function, it is not possible to move
objects within the bucket whose path contains `=` anywhere:
{code:java}
OSError: When copying key 'table/date=202007/part-00000-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket' to key 'table2/date=202007/part-00000-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket': AWS Error [code 133]: The specified key does not exist.
{code}

Moving with pre-emptively URL-quoted paths fails as well:
{code:java}
OSError: When copying key 'table/date%3D202007/part-00000-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket' to key 'table2/date%3D202007/part-00000-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet' in bucket 'bucket': AWS Error [code 133]: The specified key does not exist.
{code}
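
For reference, the quoted keys in this second error are exactly what Python's standard `urllib.parse.quote` produces from the original keys, assuming that is how they were generated (the report does not say). A minimal sketch; `quote_key` is a hypothetical helper name, and the same function shows how the `+` from the summary would be encoded:

```python
from urllib.parse import quote, unquote


def quote_key(key: str) -> str:
    """Percent-encode an S3 key while keeping '/' separators (hypothetical helper)."""
    # safe="/" is also quote()'s default; it is spelled out here for clarity.
    return quote(key, safe="/")


key = "table/date=202007/part-00000-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet"
print(quote_key(key))  # '=' becomes '%3D', matching the key in the error above
print(quote_key("a+b"))  # '+' becomes '%2B'
```

Note that the quoted string is a different key from the one `FileSelector` returned, so this attempt names an object other than the one that actually exists in the bucket.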

The source object definitely exists: it was returned by a PyArrow
`FileSelector` itself and is passed unchanged to `move`.


Is there a configuration option to set, or special quoting to use?

Thanks in advance.
Joerg

--
This message was sent by Atlassian Jira
(v8.3.4#803005)