[
https://issues.apache.org/jira/browse/PHOENIX-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suhas Nalapure closed PHOENIX-4502.
-----------------------------------
Duplicate of PHOENIX-4503.
> Phoenix-Spark plugin doesn't release zookeeper connections
> ----------------------------------------------------------
>
> Key: PHOENIX-4502
> URL: https://issues.apache.org/jira/browse/PHOENIX-4502
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Environment: HBase 1.2 on Linux (Ubuntu, CentOS)
> Reporter: Suhas Nalapure
> Priority: Major
>
> *1. Phoenix-Spark plugin doesn't release zookeeper connections*
> Example:
> for (int i = 0; i < 50; i++) {
>     Dataset<Row> df = sqlContext.read()
>             .format("org.apache.phoenix.spark")
>             .option("table", "\"Sales\"")
>             .option("zkUrl", "localhost:2181")
>             .load();
>     df.show(2);
> }
> Thread.sleep(1000 * 60); // keep the JVM alive so the open connections can be observed
> When the above snippet is executed, the number of connections to port 2181
> keeps increasing, and the connections are not released until the main thread
> wakes up from sleep and the program ends, as shown below (14 is the number of
> connections before the program starts to run):
> netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:52:05
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 22
> 16:52:15
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 38
> 16:52:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 68
> 16:52:23
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 100
> 16:52:27
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:38
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:52
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:00
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:24
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:34
> root@user1 ~ $
> *2. If the "jdbc" format is used instead to create the Spark DataFrame, the
> connection count doesn't shoot up*
> Example:
> for (int i = 0; i < 50; i++) {
>     Dataset<Row> df = sqlContext.read().format("jdbc")
>             .option("url", "jdbc:phoenix:localhost:2181")
>             .option("dbtable", "\"Sales\"")
>             .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
>             .load();
>     df.show(2);
> }
> Thread.sleep(1000 * 60); // keep the JVM alive so the open connections can be observed
> Connection counts during program execution (14 being the count before
> execution starts):
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:42
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:43
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:46
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:50
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:55
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:12
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:28
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:34
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:37
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:39
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:02:07
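The netstat pipeline used above (grep 2181, grep EST, wc -l) matches any line that contains "2181" anywhere, so it can overcount. As a rough sketch only, the same count could be reproduced from captured netstat output in Java with a stricter match on the port and state columns. The class name ZkConnectionCount, the helper countEstablished, and the sample rows are hypothetical and not part of the report; the column layout assumed is that of Linux `netstat -an` TCP rows.

```java
import java.util.Arrays;
import java.util.List;

public class ZkConnectionCount {

    // Counts rows whose local or foreign address ends with ":<port>" and whose
    // state column is ESTABLISHED. Assumed column order (Linux netstat TCP rows):
    // proto, recv-q, send-q, local address, foreign address, state, ...
    static long countEstablished(List<String> netstatLines, int port) {
        String suffix = ":" + port;
        return netstatLines.stream()
                .map(line -> line.trim().split("\\s+"))
                .filter(cols -> cols.length >= 6)
                .filter(cols -> cols[5].equals("ESTABLISHED"))
                .filter(cols -> cols[3].endsWith(suffix) || cols[4].endsWith(suffix))
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, made up for illustration only.
        List<String> sample = Arrays.asList(
                "tcp 0 0 127.0.0.1:45012 127.0.0.1:2181 ESTABLISHED 1234/java",
                "tcp 0 0 127.0.0.1:2181 127.0.0.1:45012 ESTABLISHED 987/java",
                "tcp 0 0 127.0.0.1:45020 127.0.0.1:2181 TIME_WAIT -",
                "tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 987/java",
                "tcp 0 0 127.0.0.1:52181 127.0.0.1:8080 ESTABLISHED 42/java");
        System.out.println(countEstablished(sample, 2181)); // prints 2
    }
}
```

Like the original pipeline, this counts both ends of a loopback connection when client and ZooKeeper run on the same host, so the absolute number matters less than how it changes while the loop runs.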
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)