[ https://issues.apache.org/jira/browse/SPARK-13614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175516#comment-15175516 ]
chillon_m edited comment on SPARK-13614 at 3/3/16 2:14 AM:
-----------------------------------------------------------

@[~srowen] On the same dataset (hot.count()=599147, ghot.size=21844), collect() does not trigger the memory leak (first image), but show() does. Why? In general, collect() should be the one that triggers it more easily ("Keep in mind that your entire dataset must fit in memory on a single machine to use collect() on it, so collect() shouldn’t be used on large datasets." in Learning Spark), yet collect() does not trigger it.

> show() trigger memory leak,why?
> -------------------------------
>
>                 Key: SPARK-13614
>                 URL: https://issues.apache.org/jira/browse/SPARK-13614
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: chillon_m
>         Attachments: memory leak.png, memory.png
>
>
> hot.count()=599147
> ghot.size=21844
> [bigdata@namenode spark-1.5.2-bin-hadoop2.4]$ bin/spark-shell --driver-class-path /home/bigdata/mysql-connector-java-5.1.38-bin.jar
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>       /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
> Type in expressions to have them evaluated.
> Type :help for more information.
> Spark context available as sc.
> SQL context available as sqlContext.
> scala> val hot=sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://:/?user=&password=","dbtable" -> "")).load()
> Wed Mar 02 14:22:37 CST 2016 WARN: Establishing SSL connection without
> server's identity verification is not recommended. According to MySQL
> 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established
> by default if explicit option isn't set. For compliance with existing
> applications not using SSL the verifyServerCertificate property is set to
> 'false'. You need either to explicitly disable SSL by setting useSSL=false,
> or set useSSL=true and provide truststore for server certificate verification.
> hot: org.apache.spark.sql.DataFrame = []
> scala> val ghot=hot.groupBy("Num","pNum").count().collect()
> Wed Mar 02 14:22:59 CST 2016 WARN: Establishing SSL connection without
> server's identity verification is not recommended. According to MySQL
> 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established
> by default if explicit option isn't set. For compliance with existing
> applications not using SSL the verifyServerCertificate property is set to
> 'false'. You need either to explicitly disable SSL by setting useSSL=false,
> or set useSSL=true and provide truststore for server certificate verification.
> ghot: Array[org.apache.spark.sql.Row] = Array([[],[],[], [,42310...
> scala> ghot.take(20)
> res0: Array[org.apache.spark.sql.Row] = Array([],[],[],[],[],[],[],[]....)
> scala> hot.groupBy("Num","pNum").count().show()
> Wed Mar 02 14:26:05 CST 2016 WARN: Establishing SSL connection without
> server's identity verification is not recommended. According to MySQL
> 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established
> by default if explicit option isn't set. For compliance with existing
> applications not using SSL the verifyServerCertificate property is set to
> 'false'. You need either to explicitly disable SSL by setting useSSL=false,
> or set useSSL=true and provide truststore for server certificate verification.
> 16/03/02 14:26:33 ERROR Executor: Managed memory leak detected; size =
> 4194304 bytes, TID = 202
> +----------+---------+-----+
> |     QQNum| TroopNum|count|
> +----------+---------+-----+
> |1XXXXXXXXX|38XXXXXXX|    1|
> |1XXXXXXXXX| 5XXXXXXX|    2|
> |1XXXXXXXXX|26XXXXXXX|    6|
> |1XXXXXXXXX|14XXXXXXX|    3|
> |1XXXXXXXXX|41XXXXXXX|   14|
> |1XXXXXXXXX|48XXXXXXX|   18|
> |1XXXXXXXXX|23XXXXXXX|    2|
> |1XXXXXXXXX|  XXXXXXX|   34|
> |1XXXXXXXXX|52XXXXXXX|    1|
> |1XXXXXXXXX|52XXXXXXX|    2|
> |1XXXXXXXXX|49XXXXXXX|    3|
> |1XXXXXXXXX|42XXXXXXX|    3|
> |1XXXXXXXXX|17XXXXXXX|   11|
> |1XXXXXXXXX|25XXXXXXX|  129|
> |1XXXXXXXXX|13XXXXXXX|    2|
> |1XXXXXXXXX|19XXXXXXX|    1|
> |1XXXXXXXXX|32XXXXXXX|    9|
> |1XXXXXXXXX|38XXXXXXX|    6|
> |1XXXXXXXXX|38XXXXXXX|   13|
> |1XXXXXXXXX|30XXXXXXX|    4|
> +----------+---------+-----+
> only showing top 20 rows

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
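One possible explanation, offered as an assumption rather than a confirmed diagnosis: collect() pulls every row out of each partition's aggregation iterator, so the aggregation can release its memory itself once the iterator is exhausted, whereas show() only takes the first 20 rows and stops early. Anything still allocated when the task finishes is then freed by the executor's end-of-task cleanup, and that cleanup is what logs "Managed memory leak detected". The toy model below sketches this mechanism in plain Scala with no Spark dependency. It is not Spark's implementation: ToyTaskMemoryManager, ToyAggIterator, LeakCheckDemo and the 4 MiB page size are hypothetical names and values chosen only for illustration.

```scala
import scala.collection.mutable

// Hypothetical per-task memory manager (toy model, not Spark code).
class ToyTaskMemoryManager {
  private val pages = mutable.Set[Int]()
  private var nextId = 0
  // Hypothetical 4 MiB page, picked only to mirror the 4194304 bytes in the log.
  val pageSize: Long = 4L * 1024 * 1024

  def allocatePage(): Int = { nextId += 1; pages += nextId; nextId }
  def freePage(id: Int): Unit = pages -= id

  // End-of-task cleanup: frees whatever is still held, returns bytes released.
  def cleanUpAll(): Long = {
    val bytes = pages.size * pageSize
    pages.clear()
    bytes
  }
}

// Hypothetical aggregation output iterator: it frees its page only when a
// caller observes that it is exhausted (hasNext returns false).
class ToyAggIterator(rows: Seq[String], mm: ToyTaskMemoryManager) extends Iterator[String] {
  private val page = mm.allocatePage()
  private val it = rows.iterator
  private var freed = false

  def hasNext: Boolean = {
    val more = it.hasNext
    if (!more && !freed) { mm.freePage(page); freed = true }
    more
  }

  def next(): String = it.next()
}

object LeakCheckDemo {
  // Runs one toy "task": consume the rows, then report memory left behind.
  def runTask(rows: Seq[String], consume: Iterator[String] => Seq[String]): (Seq[String], Long) = {
    val mm = new ToyTaskMemoryManager
    val result = consume(new ToyAggIterator(rows, mm))
    val leftover = mm.cleanUpAll()
    if (leftover > 0) println(s"Managed memory leak detected; size = $leftover bytes")
    (result, leftover)
  }

  def main(args: Array[String]): Unit = {
    val rows = (1 to 100).map(i => s"row$i")
    // collect()-like consumer: drains the iterator, so the page is freed by the operator.
    val (all, leftCollect) = runTask(rows, _.toVector)
    // show()/take(20)-like consumer: stops early, so only the task cleanup frees the page.
    val (first, leftTake) = runTask(rows, _.take(20).toVector)
    println(s"collect: ${all.size} rows, leftover = $leftCollect bytes")
    println(s"take(20): ${first.size} rows, leftover = $leftTake bytes")
  }
}
```

In this model the draining run leaves 0 bytes behind, while the take(20) run reports 4194304 bytes even though it handled fewer rows. The message then reflects early termination rather than data volume, which would be consistent with collect() on the full result not triggering it.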