weijunlu opened a new issue, #2871:
URL: https://github.com/apache/drill/issues/2871

   Drill version:
   <version>1.22.0-SNAPSHOT</version>
   
   **Describe the bug**
   the mergejoin memery leak.
   
   **To Reproduce**
   1. Configure direct memory, In the drill-env.sh file in the conf directory 
add:
   export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"1G"}
   2. drill hive storage plugin configuration:
   {
     "type": "hive",
     "configProps": {
       "hive.metastore.sasl.enabled": "false",
       "fs.default.name": "hdfs://xxx:9000",
       "datanucleus.schema.autoCreateAll": "true",
       "javax.jdo.option.ConnectionURL": 
"jdbc:mysql://xxx:3306/tpch?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false",
       "javax.jdo.option.ConnectionUserName": "root",
       "javax.jdo.option.ConnectionPassword": "******",
       "hive.metastore.uris": "thrift://xxx:9083",
       "hive.metastore.schema.verification": "false",
       "hive.metastore.warehouse.dir": "/user/hive/warehouse",
       "hive.jdbc.driver": "org.apache.hive.jdbc.HiveDriver",
       "hive.jdbc.url": "jdbc:hive2://xxx:10000",
       "hive.jdbc.user": "root",
       "hive.jdbc.password": "******",
       "hive.dml.mode": "jdbc",
       "writable": "true"
     },
     "enabled": true,
     "authMode": "SHARED_USER"
   }
   3. Create eight tpch tables, use tpch tool to generate 1s data, and import 
data into hive. See the attachment for details.
   4. Write the execution script, run the tpche sql statements in 5 concurrency:
   shell script:
   [root@jupiterdev memeryleak]# cat para_power_test.sh
   #!/bin/bash
   
   drill_home=/opt/apache-drill-1.22.0-SNAPSHOT/bin
   
   random_sql(){
   while true
   do
    $drill_home/sqlline -u \"jdbc:drill:zk=localhost:2181\" -f tpch_sql8.sql >> 
log/tpch1s_sql8.log 2>&1
   done
   }
   
   main(){
   unset HADOOP_CLASSPATH
   #sleep 2h
   #TPCH power test
   for i in `seq 1 5`
   do
        random_sql & 
   done
   }
   main $@
   the sql as follows:
   select 
   o_year, 
   sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as 
mkt_share 
   from (
   select 
   extract(year from o_orderdate) as o_year, 
   l_extendedprice * 1.0 as volume, 
   n2.n_name as nation 
   from hive.tpch.part, hive.tpch.supplier, hive.tpch.lineitem, 
hive.tpch.orders, hive.tpch.customer, hive.tpch.nation n1, hive.tpch.nation n2, 
hive.tpch.region
   where 
   p_partkey = l_partkey 
   and s_suppkey = l_suppkey 
   and l_orderkey = o_orderkey 
   and o_custkey = c_custkey 
   and c_nationkey = n1.n_nationkey 
   and n1.n_regionkey = r_regionkey 
   and r_name = 'ASIA' 
   and s_nationkey = n2.n_nationkey 
   and o_orderdate between date '1995-01-01' 
   and date '1996-12-31' 
   and p_type = 'LARGE BRUSHED BRASS') as all_nations 
   group by o_year 
   order by o_year;
   5. Execute the script and view the sqlline running status:
   [root@jupiterdev memeryleak]# ./para_power_test.sh
   [root@jupiterdev memeryleak]# jps | grep SqlLine
   5638 SqlLine
   5799 SqlLine
   5705 SqlLine
   5709 SqlLine
   5584 SqlLine
   
   6. View drill's direct memory through the 8047 page:
   <img width="959" alt="1" 
src="https://github.com/apache/drill/assets/104422448/e6331b33-c6c9-4f4f-8c6a-7334b46f6b69";>
   <img width="959" alt="2" 
src="https://github.com/apache/drill/assets/104422448/162500be-6a59-4a18-8b73-79a6bac8279d";>
   
   7. After running for a while, an sql statement error is detected, and then 
pkill -f para_power_test.sh
   8. Check again that there is no sqlline execution, bug the direct memery is 
not fully release.
   <img width="959" alt="4" 
src="https://github.com/apache/drill/assets/104422448/9f9724f8-6210-48c4-a1ca-132215d2602f";>
   
   **Expected behavior**
   Expected  that the direct memery can fully release.
   
   **Additional context**
   update the planner.enable_hashjoin option to false through the 8047 page
   <img width="562" alt="1705471433736" 
src="https://github.com/apache/drill/assets/104422448/a12ed78b-be76-44a7-b75a-08828440983e";>
   [hive.txt](https://github.com/apache/drill/files/13959322/hive.txt)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to