[jira] [Comment Edited] (SPARK-13614) show() triggers memory leak, why?
[ https://issues.apache.org/jira/browse/SPARK-13614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175516#comment-15175516 ]

chillon_m edited comment on SPARK-13614 at 3/3/16 2:16 AM:
---
[~srowen] For the same dataset (hot.count() = 599147, ghot.size = 21844, ~10 bytes/row), collect() does not trigger the memory-leak warning (first image), but show() does. Why? In general it is collect() that should be the risky call ("Keep in mind that your entire dataset must fit in memory on a single machine to use collect() on it, so collect() shouldn't be used on large datasets."), yet here collect() does not trigger it.

was (Author: chillon_m):
[~srowen] For the same dataset (hot.count() = 599147, ghot.size = 21844), collect() does not trigger the memory-leak warning (first image), but show() does. Why? In general it is collect() that should be the risky call ("Keep in mind that your entire dataset must fit in memory on a single machine to use collect() on it, so collect() shouldn't be used on large datasets."), yet here collect() does not trigger it.

> show() triggers memory leak, why?
> ---
>
> Key: SPARK-13614
> URL: https://issues.apache.org/jira/browse/SPARK-13614
> Project: Spark
> Issue Type: Question
> Components: SQL
> Affects Versions: 1.5.2
> Reporter: chillon_m
> Attachments: memory leak.png, memory.png
>
>
> hot.count()=599147
> ghot.size=21844
>
> [bigdata@namenode spark-1.5.2-bin-hadoop2.4]$ bin/spark-shell --driver-class-path /home/bigdata/mysql-connector-java-5.1.38-bin.jar
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>       /_/
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
> Type in expressions to have them evaluated.
> Type :help for more information.
> Spark context available as sc.
> SQL context available as sqlContext.
> scala> val hot=sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://:/?user==","dbtable" -> "")).load()
> Wed Mar 02 14:22:37 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> hot: org.apache.spark.sql.DataFrame = []
>
> scala> val ghot=hot.groupBy("Num","pNum").count().collect()
> Wed Mar 02 14:22:59 CST 2016 WARN: [same MySQL SSL warning as above]
> ghot: Array[org.apache.spark.sql.Row] = Array([[],[],[], [,42310...
>
> scala> ghot.take(20)
> res0: Array[org.apache.spark.sql.Row] = Array([],[],[],[],[],[],[],[])
>
> scala> hot.groupBy("Num","pNum").count().show()
> Wed Mar 02 14:26:05 CST 2016 WARN: [same MySQL SSL warning as above]
> 16/03/02 14:26:33 ERROR Executor: Managed memory leak detected; size = 4194304 bytes, TID = 202
> +------+---------+-----+
> | QQNum| TroopNum|count|
> +------+---------+-----+
> |    1X|    38XXX|    1|
> |    1X|     5XXX|    2|
> |    1X|    26XXX|    6|
> |    1X|    14XXX|    3|
> |    1X|    41XXX|   14|
> |    1X|    48XXX|   18|
> |    1X|    23XXX|    2|
> |    1X|      XXX|   34|
> |    1X|    52XXX|    1|
> |    1X|    52XXX|    2|
> |    1X|    49XXX|    3|
> |    1X|    42XXX|    3|
> |    1X|    17XXX|   11|
> |    1X|    25XXX|  129|
> |    1X|    13XXX|    2|
> |    1X|    19XXX|    1|
> |    1X|    32XXX|    9|
> |    1X|    38XXX|    6|
> |    1X|    38XXX|   13|
> |    1X|    30XXX|    4|
> +------+---------+-----+
> only showing top 20 rows

--
This message was sent by Atlassian JIRA
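[Editor's note] The "Managed memory leak detected" warning is generally logged when a task finishes without returning all of the memory it acquired from the task memory manager. A plausible reading of the report above: show() runs the query with a small limit and stops consuming the aggregate's output iterator early, so pages the aggregate acquired are never freed by normal iterator exhaustion, while collect() drains the iterator completely. The plain-Scala sketch below mimics that difference; `ToyOperator`, its page size, and its release-on-drain behavior are invented for illustration and are not Spark's actual TaskMemoryManager API.

```scala
// A toy operator that "acquires" a 4096-byte page per batch of records and
// frees everything only when its output iterator is drained to the end,
// loosely imitating an aggregate that releases memory on full consumption.
class ToyOperator(records: Seq[Int], batchSize: Int) {
  var acquiredBytes: Long = 0L

  def iterator: Iterator[Int] = new Iterator[Int] {
    private val underlying = records.iterator
    private var served = 0
    def hasNext: Boolean = {
      val more = underlying.hasNext
      if (!more) acquiredBytes = 0L // released only when fully drained
      more
    }
    def next(): Int = {
      if (served % batchSize == 0) acquiredBytes += 4096L // acquire a page
      served += 1
      underlying.next()
    }
  }
}

object LeakDemo {
  def main(args: Array[String]): Unit = {
    // collect()-style consumer: drains everything, so pages are released.
    val full = new ToyOperator(1 to 1000, batchSize = 100)
    full.iterator.toList
    println(s"after full drain: ${full.acquiredBytes} bytes held") // 0

    // show()-style consumer: take(20) stops early; its hasNext short-circuits
    // and never reaches the drained check, so the acquired page stays held.
    val limited = new ToyOperator(1 to 1000, batchSize = 100)
    limited.iterator.take(20).toList
    println(s"after take(20): ${limited.acquiredBytes} bytes held") // 4096
  }
}
```

In this analogy the early-terminating consumer is what leaves memory "leaked" at task end, which would match the observation that show() logs the warning while collect() on the same aggregation does not.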