Hi,

I am querying 150 billion records spread over ~21,000 Parquet files stored in HDFS on 13 nodes (6 cores each, max direct memory: 32 GB, max heap: 8 GB).
Is there a known issue or Drill limitation that would explain why the first query below can't return the expected single row and its aggregation?

    create table ANALYSIS_RESULT as (
      select to_date(to_timestamp(SECONDS)), count(1)
      from hdfs.`/data/`
      where Int32Field2=123456 or Int32Field2=4567898
      group by to_date(to_timestamp(SECONDS)));

After *20 hours*:

    SYSTEM ERROR: Foreman Exception: One more more nodes lost connectivity during query.

If we do the query in 2 steps, the first step is:

    create table ANALYSIS_RESULT as (
      select Int32Field1 as SECONDS
      from hdfs.`/data/`
      where Int32Field2=123456 or Int32Field2=4567898);

The result was returned in *43 minutes* (a single record). The second step is:

    select to_date(to_timestamp(SECONDS)), count(1)
    from ANALYSIS_RESULT
    group by to_date(to_timestamp(SECONDS));

Aggregation of that single record is of course done in < 1 second:

    2016-04-04  1

I also tried:

    select to_date(to_timestamp(SECONDS)), count(1)
    from (
      select Int32Field1 as SECONDS
      from hdfs.`/data/`
      where Int32Field2=123456 or Int32Field2=4567898)
    group by to_date(to_timestamp(SECONDS))

Same thing: after *21 hours*, SYSTEM ERROR: Foreman Exception: One more more nodes lost connectivity during query.

Thanks for your help,
Francois
