[ https://issues.apache.org/jira/browse/SPARK-18857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-18857: ------------------------------------ Assignee: Apache Spark > SparkSQL ThriftServer hangs while extracting huge data volumes in incremental > collect mode > ------------------------------------------------------------------------------------------ > > Key: SPARK-18857 > URL: https://issues.apache.org/jira/browse/SPARK-18857 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.2 > Reporter: vishal agrawal > Assignee: Apache Spark > Attachments: GC-spark-1.6.3, GC-spark-2.0.2 > > > We are trying to run a sql query on our spark cluster and extracting around > 200 million records through SparkSQL ThriftServer interface. This query works > fine for Spark 1.6.3 version, however for spark 2.0.2, thrift server hangs > after fetching data from a few partitions (we are using incremental collect > mode with 400 partitions). As per documentation max memory taken up by thrift > server should be what is required by the biggest data partition. But we > observed that Thrift server is not releasing the old partitions memory > whenever the GC occurs even though it has moved to next partition data > fetches. which is not the case with 1.6.3 version. > On further investigation we found that SparkExecuteStatementOperation.scala > was modified for "[SPARK-16563][SQL] fix spark sql thrift server FetchResults > bug" and result set iterator was duplicated to keep a reference to the first > set. > + val (itra, itrb) = iter.duplicate > + iterHeader = itra > + iter = itrb > We suspect that this is resulting in the memory not being cleared on GC. To > confirm this we created an iterator in our test class and fetched the data > once without duplicating and second time with creating a duplicate. we could > see that in first instance it ran fine and fetched the entire data set while > in second instance driver hanged after fetching data from a few partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org