[ https://issues.apache.org/jira/browse/ARROW-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426081#comment-17426081 ]
Weston Pace commented on ARROW-14257:
-------------------------------------

In Python it is always use_async=True. In R the scanner is hidden from the user on dataset writes but the option there is use_async as well. In C++ the option is UseAsync in the ScannerBuilder.

How about, "Writing datasets requires that the input scanner is configured to scan asynchronously via the use_async or UseAsync options."

> [Doc][Python] dataset doc build fails
> -------------------------------------
>
>                 Key: ARROW-14257
>                 URL: https://issues.apache.org/jira/browse/ARROW-14257
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Documentation, Python
>            Reporter: Antoine Pitrou
>            Assignee: Joris Van den Bossche
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> >>>-------------------------------------------------------------------------
> Exception in /home/antoine/arrow/dev/docs/source/python/dataset.rst at block ending on line 578
> Specify :okexcept: as an option in the ipython:: block to suppress this message
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-66-0fdb20f82a93> in <module>
> ----> 1 ds.write_dataset(scanner, new_root, format="parquet", partitioning=new_part)
> ~/arrow/dev/python/pyarrow/dataset.py in write_dataset(data, base_dir, basename_template, format, partitioning, partitioning_flavor, schema, filesystem, file_options, use_threads, max_partitions, file_visitor)
>     861     _filesystemdataset_write(
>     862         scanner, base_dir, basename_template, filesystem, partitioning,
> --> 863         file_options, max_partitions, file_visitor
>     864     )
> ~/arrow/dev/python/pyarrow/_dataset.pyx in pyarrow._dataset._filesystemdataset_write()
> ~/arrow/dev/python/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: Asynchronous scanning is not supported by SyncScanner
> /home/antoine/arrow/dev/cpp/src/arrow/dataset/file_base.cc:367  scanner->ScanBatchesAsync()
> <<<-------------------------------------------------------------------------
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)