[ https://issues.apache.org/jira/browse/ARROW-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426081#comment-17426081 ]

Weston Pace commented on ARROW-14257:
-------------------------------------

In Python the option is always use_async=True. In R the scanner is hidden from
the user on dataset writes, but the option there is also use_async. In C++ the
option is UseAsync on the ScannerBuilder. How about:

"Writing datasets requires that the input scanner is configured to scan
asynchronously via the use_async or UseAsync options."
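To make the requirement concrete, here is a toy model in plain Python of why the doc build failed. Everything in it is hypothetical: SyncScanner, AsyncScanner, make_scanner and write_dataset are stand-ins written for illustration, not the pyarrow or Arrow C++ API. Only the error message is taken from the traceback quoted below. The writer pulls batches through an asynchronous scan (the traceback shows file_base.cc calling ScanBatchesAsync()), so a scanner built without the async option fails at that call.

```python
# Toy model only: these classes are hypothetical stand-ins, NOT the pyarrow
# or Arrow C++ API. It mimics why a synchronous scanner cannot feed a writer
# that pulls batches asynchronously.
import asyncio
from typing import AsyncIterator, Iterator, List


class SyncScanner:
    """Hypothetical scanner that only supports synchronous iteration."""

    def __init__(self, batches: List[list]) -> None:
        self._batches = batches

    def scan_batches(self) -> Iterator[list]:
        yield from self._batches

    def scan_batches_async(self) -> AsyncIterator[list]:
        # Same message as the ArrowNotImplementedError in the traceback.
        raise NotImplementedError(
            "Asynchronous scanning is not supported by SyncScanner")


class AsyncScanner(SyncScanner):
    """Hypothetical scanner that can also deliver batches asynchronously."""

    async def _batch_stream(self) -> AsyncIterator[list]:
        for batch in self._batches:
            await asyncio.sleep(0)  # hand control back to the event loop
            yield batch

    def scan_batches_async(self) -> AsyncIterator[list]:
        return self._batch_stream()


def make_scanner(batches: List[list], use_async: bool) -> SyncScanner:
    # Hypothetical factory; use_async plays the role of use_async / UseAsync.
    return AsyncScanner(batches) if use_async else SyncScanner(batches)


def write_dataset(scanner: SyncScanner) -> List[list]:
    """Hypothetical writer that consumes the scanner asynchronously."""

    async def consume() -> List[list]:
        written = []
        async for batch in scanner.scan_batches_async():
            written.append(batch)  # a real writer would write files here
        return written

    return asyncio.run(consume())


if __name__ == "__main__":
    # Prints [[1, 2], [3]]
    print(write_dataset(make_scanner([[1, 2], [3]], use_async=True)))
    try:
        write_dataset(make_scanner([[1, 2]], use_async=False))
    except NotImplementedError as exc:
        # Prints: Asynchronous scanning is not supported by SyncScanner
        print(exc)
```

Modelling the option as a choice made when the scanner is constructed matches the proposed doc sentence: the fix belongs on the scanner side (enable use_async / UseAsync), not in the writer.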

> [Doc][Python] dataset doc build fails
> -------------------------------------
>
>                 Key: ARROW-14257
>                 URL: https://issues.apache.org/jira/browse/ARROW-14257
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Documentation, Python
>            Reporter: Antoine Pitrou
>            Assignee: Joris Van den Bossche
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> >>>-------------------------------------------------------------------------
> Exception in /home/antoine/arrow/dev/docs/source/python/dataset.rst at block ending on line 578
> Specify :okexcept: as an option in the ipython:: block to suppress this message
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-66-0fdb20f82a93> in <module>
> ----> 1 ds.write_dataset(scanner, new_root, format="parquet", partitioning=new_part)
> ~/arrow/dev/python/pyarrow/dataset.py in write_dataset(data, base_dir, basename_template, format, partitioning, partitioning_flavor, schema, filesystem, file_options, use_threads, max_partitions, file_visitor)
>     861     _filesystemdataset_write(
>     862         scanner, base_dir, basename_template, filesystem, partitioning,
> --> 863         file_options, max_partitions, file_visitor
>     864     )
> ~/arrow/dev/python/pyarrow/_dataset.pyx in pyarrow._dataset._filesystemdataset_write()
> ~/arrow/dev/python/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: Asynchronous scanning is not supported by SyncScanner
> /home/antoine/arrow/dev/cpp/src/arrow/dataset/file_base.cc:367  scanner->ScanBatchesAsync()
> <<<-------------------------------------------------------------------------
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
