Arttii commented on issue #12667:
URL: https://github.com/apache/arrow/issues/12667#issuecomment-1072673994


   Running this:
   ```python
   import pandas as pd
   import pyarrow
   import pyarrow.dataset

   print(pyarrow.__version__)
   df = pd.DataFrame({"a": [1, 2, 3]})
   table = pyarrow.Table.from_pandas(df)
   userdata_parquet_dataset = pyarrow.dataset.dataset(table)
   batches = [r for r in userdata_parquet_dataset.to_batches()]
   reader = pyarrow.dataset.Scanner.from_batches(
       batches, schema=userdata_parquet_dataset.schema
   ).to_reader()
   # This last call raises the TypeError below
   dataset = pyarrow.dataset.dataset(reader)
   ```
   
   gets me this:
   
   ```
   ---------------------------------------------------------------------------
   TypeError                                 Traceback (most recent call last)
   Input In [15], in <cell line: 7>()
         5 batches=[r for r in userdata_parquet_dataset.to_batches()]
         6 reader=pyarrow.dataset.Scanner.from_batches(batches,userdata_parquet_dataset.schema).to_reader()
   ----> 7 dataset=pyarrow.dataset.dataset(reader)
   
   File ~/workdir/duckdb/.venv/lib/python3.9/site-packages/pyarrow/dataset.py:687, in dataset(source, schema, format, filesystem, partitioning, partition_base_dir, exclude_invalid_files, ignore_prefixes)
       685     return _in_memory_dataset(source, **kwargs)
       686 else:
   --> 687     raise TypeError(
       688         'Expected a path-like, list of path-likes or a list of Datasets '
       689         'instead of the given type: {}'.format(type(source).__name__)
       690     )
   
   TypeError: Expected a path-like, list of path-likes or a list of Datasets instead of the given type: RecordBatchReader
   ```

