Hi Marc,

I don't think any of the core team has used MinIO. Sounds like you are running 
Drill in Docker. So, the first question to others is: is anyone using Drill in 
Docker against plain old S3? If someone is, and does not hit the delay issues 
you describe, then we can narrow down the problem to something in your 
environment.

Can you test MinIO without Drill? Maybe create a Docker container with the AWS 
client, ssh into the container, and use the command line tools to download your 
nation.parquet file. Check if you also encounter the delay. If so, this tells 
us there is an environment issue. If the command line is fast, then perhaps we 
have a Drill issue.
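
If scripting is easier than the command line tools, the same check can be sketched in Python. This is only a rough sketch, not something we have run against MinIO: it assumes the third-party boto3 package is installed in the test container, and the endpoint, credentials, bucket name, and destination path in `main()` are placeholders to replace with your own. The helper names (`time_calls`, `make_minio_download`) are made up for illustration.

```python
import time
from typing import Callable, List


def time_calls(action: Callable[[], object], repeats: int = 3) -> List[float]:
    """Run `action` `repeats` times; return each call's wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def make_minio_download(endpoint: str, access_key: str, secret_key: str,
                        bucket: str, key: str, dest: str) -> Callable[[], None]:
    """Build a zero-argument action that downloads one object via the S3 API."""
    import boto3  # third-party; assumed to be installed in the container

    client = boto3.client(
        "s3",
        endpoint_url=endpoint,  # point the S3 client at MinIO instead of AWS
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    return lambda: client.download_file(bucket, key, dest)


def main() -> None:
    # Every value here is a placeholder (assumption), not taken from your setup.
    download = make_minio_download(
        endpoint="http://minio:9000",
        access_key="YOUR_ACCESS_KEY",
        secret_key="YOUR_SECRET_KEY",
        bucket="your-bucket",
        key="nation.parquet",
        dest="/tmp/nation.parquet",
    )
    for i, secs in enumerate(time_calls(download), start=1):
        print(f"download {i}: {secs:.3f} s")
```

Calling `main()` from a fresh container prints one timing per download. If the first download is slow and the later ones are fast, the delay is reproducible without Drill.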

The next step would be to enable logging. I don't know if we have detailed 
logging around file actions (open, read, close); so we'll have to check to see 
if logging will give us the detail we need.

Thanks,
- Paul

 

    On Tuesday, January 28, 2020, 4:04:07 AM PST, Marc Sole Fonte 
<ms...@iti.es> wrote:  
 
 Hello,

I'm currently trying to use Drill to query MinIO (S3 API), but I am having a lot 
of problems with how long queries take (I get a lot of timeouts). Both 
services (one instance each) are running in Docker on my local computer.

The problem is that the first query takes 40+ seconds and, once it has 
finished, later queries take less than 1 second. I am querying a very small 
Parquet file.

For instance, these are two queries that I executed. Planning for the first 
query took 27.08 seconds:

01/10/2020 13:42:04 anonymous
SELECT N_NAME as COUNTRY FROM minio_jupyter.`nation.parquet` WHERE N_REGIONKEY 
= 2
Succeeded 5.421 sec 0eff029cf8dc
01/10/2020 13:37:27 anonymous
SELECT N_NAME as COUNTRY FROM minio_jupyter.`nation.parquet` WHERE N_REGIONKEY 
= 2
Succeeded 33.508 sec 0eff029cf8dc

This is not an isolated case. It happens every time I try to use it. I run a 
clean Docker image each time.

Also, if I execute the same query multiple times (because of the timeout), I 
get the same problem until the first query (48.296 s planning in this case) 
finishes. Sometimes I even get slow queries after that (3+ seconds).

01/28/2020 10:33:59    anonymous      
<http://localhost:9000/profiles/21cff1e7-a8a8-6128-d329-ac369bd69c32>
SELECT N_NAME as COUNTRY FROM minio_jupyter.`nation.parquet` WHERE N_REGIONKEY 
= 2
        Succeeded      3.494 sec      86acfa9818e1
01/28/2020 10:21:14    anonymous      
<http://localhost:9000/profiles/21cff4e5-fe98-2db8-c617-d15c96470235>
SELECT N_NAME as COUNTRY FROM minio_jupyter.`nation.parquet` WHERE N_REGIONKEY 
= 2
        Succeeded      4.595 sec      86acfa9818e1
01/28/2020 10:20:33    anonymous      
<http://localhost:9000/profiles/21cff50d-ae9c-1629-f80f-db5c3a253762>
SELECT N_NAME as COUNTRY FROM minio_jupyter.`nation.parquet` WHERE N_REGIONKEY 
= 2
        Succeeded      31.801 sec      86acfa9818e1
01/28/2020 10:20:16    anonymous      
<http://localhost:9000/profiles/21cff51e-e55f-456e-c399-51289fadb77a>
SELECT N_NAME as COUNTRY FROM minio_jupyter.`nation.parquet` WHERE N_REGIONKEY 
= 2
        Succeeded      49.098 sec      86acfa9818e1
01/28/2020 10:20:03    anonymous      
<http://localhost:9000/profiles/21cff52e-2792-79e8-48b5-f258e6efb02b>
SELECT N_NAME as COUNTRY FROM minio_jupyter.`nation.parquet` WHERE N_REGIONKEY 
= 2
        Succeeded      01 min 2.494 sec        86acfa9818e1
Thank you for your help,
Marc

  
