[ https://issues.apache.org/jira/browse/DRILL-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294810#comment-17294810 ]

Paul Rogers commented on DRILL-7872:
------------------------------------

Hi [~Klar],

The released versions of Drill have a known memory limitation in the REST API. 
Basically, Drill caches the entire result set on the heap, as multiple objects 
per field of data (that is, multiple objects for each row and column). 
Needless to say, this does not scale.

Originally, the REST API was only for small, interactive queries run from the 
Drill web console; it was never meant for "production" use.

But, since many folks use the REST API as their primary API (despite its many 
limitations), the master version has a fix to stream the data without caching. 
See [DRILL-7733|https://github.com/apache/drill/pull/2149]. With this fix, 
memory usage for REST queries is constant (though the REST format is still 
terribly inefficient for large queries).

The answer is not to *downgrade*; it is to *upgrade* to the master branch. It 
should be easy to build your own version of Drill with the fix. Grab the latest 
master and do the build as described in the docs. Fortunately, Drill is quite 
easy to build once you have the right version of Maven and Java.
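As a rough sketch, the build amounts to the following (the repository URL is the official GitHub mirror; check the build docs for the Maven and JDK versions the branch requires):

```shell
# Clone and build Drill from master. Requires a suitable JDK and Maven
# (see the Drill build documentation for the supported versions).
git clone https://github.com/apache/drill.git
cd drill
mvn clean install -DskipTests
# The binary distribution ends up under distribution/target/
```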

Otherwise, as the error message suggests, you can use the JDBC or ODBC APIs, as 
they are designed to scale, run in parallel, support multiple queries, allow 
query cancellation, and so on.
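For example, Drill ships a command-line client (sqlline) that talks JDBC. Something like the sketch below should work; the install path, the zk=local embedded-mode URL, and the Parquet path are assumptions about your setup:

```shell
# Run a query over JDBC with the bundled sqlline client instead of the REST API.
# Adjust the path and connection URL for your installation; for a remote
# drillbit you might use jdbc:drill:drillbit=<host> instead of zk=local.
cd /opt/drill
bin/sqlline -u "jdbc:drill:zk=local" \
  -e "SELECT COUNT(*) FROM dfs.\`/data/example.parquet\`"
```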

> Heap memory within docker container does not get recycled
> ---------------------------------------------------------
>
>                 Key: DRILL-7872
>                 URL: https://issues.apache.org/jira/browse/DRILL-7872
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>            Reporter: Christian
>            Priority: Major
>             Fix For: 1.19.0
>
>
> Hi - I'm running the official Docker container apache/drill:1.18.0. The Heap 
> memory usage keeps increasing after each query until I get a
> "RESOURCE ERROR: There is not enough heap memory to run this query using the 
> web interface. Please try a query with fewer columns or with a filter or 
> limit condition to limit the data returned. You can also try an ODBC/JDBC 
> client. [Error Id: d5fef16a-cca7-4ee2-8316-98d510e0f41b]" error.
> I tested this by sending the same query repeatedly, and it always behaves the 
> same way: the first few queries succeed, but the heap grows each time until 
> the maximum heap size is reached, after which I only get the above error.
> I'm querying Parquet files through the REST API.
> I changed the following settings in drill-env.sh, but with no luck either:
> export DRILLBIT_MAX_PROC_MEM=${DRILLBIT_MAX_PROC_MEM:-"20G"}
> export DRILL_HEAP=${DRILL_HEAP:-"8G"}
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}
>  
> export DRILLBIT_CODE_CACHE_SIZE=${DRILLBIT_CODE_CACHE_SIZE:-"2G"}
> Any guidance would be appreciated.
> Regards,
> Christian
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
