Ouch! 

Looks like a logger was left behind in DEBUG mode. Can you manually turn that 
off?
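As a sketch (assuming the stock conf/logback.xml that ships with Drill; adjust the path and logger name if your setup differs), pinning the noisy logger above DEBUG would look like:

```xml
<!-- conf/logback.xml: raise the parquet metadata logger out of DEBUG.
     Logger name taken from the log line quoted in the original message. -->
<logger name="org.apache.drill.exec.store.parquet.Metadata" level="INFO"/>
```

A restart of the drillbits may be needed for the change to take effect, depending on whether your logback configuration has scanning enabled.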

More memory would help in this case, because it is the foreman node that is 
running out of heap space as it goes through the metadata for all the files. Is 
there a reason you are generating so many files to query? There is most likely 
a lower bound on useful Parquet file size, below which you might be better off 
just using something like a CSV format.
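As a back-of-the-envelope check (the 10 KB per-file figure below is purely illustrative, not a measured number), the file count alone can plausibly exhaust an 8 GB heap:

```python
# Rough heap estimate for planning-time Parquet metadata.
# ASSUMPTION: ~10 KB retained per file is an illustration only.
files = 761_659                 # count from the log line in the original message
bytes_per_file = 10 * 1024      # hypothetical retained metadata per file
total_gib = files * bytes_per_file / 2**30
print(f"~{total_gib:.1f} GiB")  # in the same ballpark as the 8 GB heap
```

Even if the real per-file footprint is smaller, hundreds of thousands of files leave little headroom for the rest of planning.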



-----Original Message-----
From: François Méthot [mailto:fmetho...@gmail.com] 
Sent: Tuesday, October 17, 2017 10:35 AM
To: dev@drill.apache.org
Subject: log flooded by "date values definitively CORRECT"

Hi again,

  I am running into an issue with a query over 760,000 Parquet files stored in 
HDFS. We are using Drill 1.10 with an 8 GB heap and 20 GB direct memory. Drill 
runs with DEBUG logging enabled all the time.

The query is a standard select of 8 fields from hdfs.`/path` where this = that 
....


For about an hour I see this message on the foreman:

[pool-9-thread-##] DEBUG o.a.d.exec.store.parquet.Metadata - It is determined 
from metadata that the date values are definitely CORRECT

Then

[some UUID:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet 
metadata : Executed 761659 out of 761659 using 16 threads. Time : 3022416ms
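For scale, the numbers in that log line work out to roughly fifty minutes spent fetching metadata before planning even finishes (plain arithmetic, nothing Drill-specific):

```python
# Unpack the timing reported in the "Fetch parquet metadata" log line.
files = 761_659
elapsed_ms = 3_022_416
minutes = elapsed_ms / 60_000
files_per_sec = files / (elapsed_ms / 1000)
print(f"{minutes:.1f} min, {files_per_sec:.0f} files/s across 16 threads")
```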

Then :
java.lang.OutOfMemoryError: Java heap space
   at java.util.Arrays.copyOf
   ...
   at java.io.PrintWriter.println(PrintWriter.java:757)
   at org.apache.calcite.rel.externalize.RelWriterImpl.explain(RelWriterImpl.java:118)
   at org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:160)
    ...
   at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:1927)
   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.log(DefaultSqlHandler.java:138)
   ...
   at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:102)
   at org.apache.drill.exec.planner.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131)
   ...
   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050)
   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281)



I think it might be caused by having too many files to query; chunking our 
select into smaller pieces actually helped.
I also suspect that the DEBUG logging is taxing the poor node a bit much.

Do you think adding more memory would address the issue (I can't try this right 
now), or do you think it is caused by a bug?


Thanks in advance for any advice,

Francois
