Arttii commented on issue #12667:
URL: https://github.com/apache/arrow/issues/12667#issuecomment-1072673994
Running this:
```
import pandas as pd
import pyarrow
import pyarrow.dataset

pyarrow.__version__
df = pd.DataFrame({"a": [1, 2, 3]})
table = pyarrow.Table.from_pandas(df)
userdata_parquet_dataset= pyarrow.dataset.dataset(table)
batches=[r for r in userdata_parquet_dataset.to_batches()]
reader=pyarrow.dataset.Scanner.from_batches(batches,userdata_parquet_dataset.schema).to_reader()
dataset=pyarrow.dataset.dataset(reader)
```
gets me this:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [15], in <cell line: 7>()
      5 batches=[r for r in userdata_parquet_dataset.to_batches()]
      6 reader=pyarrow.dataset.Scanner.from_batches(batches,userdata_parquet_dataset.schema).to_reader()
----> 7 dataset=pyarrow.dataset.dataset(reader)

File ~/workdir/duckdb/.venv/lib/python3.9/site-packages/pyarrow/dataset.py:687, in dataset(source, schema, format, filesystem, partitioning, partition_base_dir, exclude_invalid_files, ignore_prefixes)
    685     return _in_memory_dataset(source, **kwargs)
    686 else:
--> 687     raise TypeError(
    688         'Expected a path-like, list of path-likes or a list of Datasets '
    689         'instead of the given type: {}'.format(type(source).__name__)
    690     )

TypeError: Expected a path-like, list of path-likes or a list of Datasets instead of the given type: RecordBatchReader
```