bilelomrani1 opened a new issue, #334: URL: https://github.com/apache/arrow-julia/issues/334
I have an `.arrow` file generated with `pyarrow` whose schema is the following:

```
input: struct<open: fixed_size_list<item: float>[512], high: fixed_size_list<item: float>[512], low: fixed_size_list<item: float>[512], close: fixed_size_list<item: float>[512]> not null
  child 0, open: fixed_size_list<item: float>[512]
      child 0, item: float
  child 1, high: fixed_size_list<item: float>[512]
      child 0, item: float
  child 2, low: fixed_size_list<item: float>[512]
      child 0, item: float
  child 3, close: fixed_size_list<item: float>[512]
      child 0, item: float
```

With `pyarrow`, I load and iterate over records with the following:

```python
import pyarrow as pa

with pa.memory_map('arraydata.arrow', 'r') as source:
    loaded_arrays = pa.ipc.open_file(source).read_all()

a = 0
for batch in loaded_arrays.to_batches():
    for input_candles in batch["input"]:
        a += 1
```

Iterating over my example file (~10,000 rows) takes 210 ms.

In Julia, I load and iterate over the same file with the following:

```julia
using Arrow, BenchmarkTools

stream = Arrow.Stream("./arraydata.arrow")

function bench_iteration(stream)
    a = 0
    for batch in stream
        for sample in batch.input
            a += 1
        end
    end
    return a
end

@btime bench_iteration($stream)
```

```
3.169 s (25272097 allocations: 1.70 GiB)
```

Iterating over the records takes about 15× longer with `Arrow.jl`. Am I doing something wrong?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
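For comparison, here is a sketch of a column-wise variant of the Julia benchmark. It is an assumption, not a confirmed fix: it presumes the same `arraydata.arrow` file and that `batch.input` (as iterated in the snippet above) also supports `length`, so the row count per batch can be read in one call instead of allocating a view for every element:

```julia
using Arrow

# Hypothetical sketch: count rows via the column length rather than
# iterating element by element over the fixed-size-list column.
function bench_columnwise(path)
    a = 0
    for batch in Arrow.Stream(path)
        # length(batch.input) is the number of rows in this record batch;
        # no per-row struct/list views are materialized.
        a += length(batch.input)
    end
    return a
end
```

If the slowdown comes from materializing a nested `struct`-of-`fixed_size_list` view per row, a length-based (or otherwise columnar) traversal like this should show whether per-element iteration is the bottleneck.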