Hi again,

  I am running into an issue with a query over 760,000 Parquet files
stored in HDFS. We are using Drill 1.10 with an 8GB heap and 20GB of
direct memory, and Drill runs with DEBUG logging enabled at all times.
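
For reference, the memory settings come from conf/drill-env.sh, along
these lines (a sketch; these are the variable names as I remember them,
with the values we actually run):

   # conf/drill-env.sh -- memory settings for the drillbit
   export DRILL_HEAP="8G"                 # JVM heap size (-Xmx)
   export DRILL_MAX_DIRECT_MEMORY="20G"   # off-heap (direct) memory limit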

The query is a standard select of 8 fields from hdfs.`/path` where this =
that ....
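
Concretely it looks something like this (the column names and the filter
here are placeholders, not the real ones):

   select f1, f2, f3, f4, f5, f6, f7, f8
   from hdfs.`/path`
   where f1 = 'some_value';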


For about an hour I see this message on the foreman:

[pool-9-thread-##] DEBUG o.a.d.exec.store.parquet.Metadata - It is
determined from metadata that the date values are definitely CORRECT

Then

[some UUID:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet
metadata : Executed 761659 out of 761659 using 16 threads. Time : 3022416ms

That is just over 50 minutes, which lines up with the hour of DEBUG
messages above.

Then:

java.lang.OutOfMemoryError: Java heap space
   at java.util.Arrays.copyOf
   ...
   at java.io.PrintWriter.println(PrintWriter.java:757)
   at org.apache.calcite.rel.externalize.RelWriterImpl.explain(RelWriterImpl.java:118)
   at org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:160)
   ...
   at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:1927)
   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.log(DefaultSqlHandler.java:138)
   ...
   at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:102)
   at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131)
   ...
   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050)
   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281)



I think it might be caused by having too many files in a single query;
chunking our select into smaller pieces actually helped.
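To be concrete about the chunking, I mean running the same select against
subdirectories of the root one at a time instead of the whole tree,
something like this (the subdirectory and column names are made up):

   select f1, f2, f3, f4, f5, f6, f7, f8
   from hdfs.`/path/chunk1`
   where f1 = 'some_value';

   -- then the same select for /path/chunk2, /path/chunk3, and so on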
I also suspect that the DEBUG logging is taxing the poor node a bit too
much.
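
If the logging turns out to matter, I assume turning it down would be
something like this in conf/logback.xml (a sketch; I have not verified
the exact appender name in our install):

   <logger name="org.apache.drill" additivity="false">
     <level value="info"/>  <!-- was "debug" -->
     <appender-ref ref="FILE"/>
   </logger>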

Do you think adding more memory would address the issue (I can't try this
right now), or do you think it is caused by a bug?


Thanks in advance for any advice,

Francois
