[ 
https://issues.apache.org/jira/browse/ARROW-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209945#comment-17209945
 ] 

Andy Grove commented on ARROW-10226:
------------------------------------

The query works fine against the tbl files but not against the Parquet files 
(it is somehow reading the wrong columns). Spark works fine, so the issue is 
not with the Parquet files. Really odd to find this now.

> [Rust] [DataFusion] TPC-H query 1 no longer completes for 100GB dataset
> -----------------------------------------------------------------------
>
>                 Key: ARROW-10226
>                 URL: https://issues.apache.org/jira/browse/ARROW-10226
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust, Rust - DataFusion
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> I re-installed my desktop a few days ago (now using Ubuntu 20.04 LTS) and 
> when I try to run the TPC-H benchmark, it never completes and eventually 
> uses up all 64 GB of RAM.
> I can run Spark against the data set and the query completes in 24 seconds, 
> which IIRC is how long it took before.
> It is possible that something is odd in my environment, but it is also 
> possible/likely that this is a real bug.
> I am investigating this and will update the Jira once I know more.
> I also went back to old commits that were working for me before and they show 
> the same issue, so I don't think this is related to a recent code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
