[ https://issues.apache.org/jira/browse/SPARK-20848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021186#comment-16021186 ]
Sean Owen commented on SPARK-20848:
-----------------------------------

Yes, possibly. The main tradeoff is that concurrent read jobs would share a pool of threads rather than each getting their own. You don't need to spin up new threads per job, but you will have a thread pool lying around for the whole app lifetime. No big deal. The main question is whether concurrent jobs were intended to limit their total concurrency on purpose by sharing a pool, or not.

> Dangling threads when reading parquet files in local mode
> ---------------------------------------------------------
>
>                 Key: SPARK-20848
>                 URL: https://issues.apache.org/jira/browse/SPARK-20848
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Nick Pritchard
>         Attachments: Screen Shot 2017-05-22 at 4.13.52 PM.png
>
> On each call to {{spark.read.parquet}}, a new ForkJoinPool is created. One of
> the threads in the pool is kept in the {{WAITING}} state and never stopped,
> which leads to unbounded growth in the number of threads.
> This behavior is a regression from v2.1.0.
> Reproducible example:
> {code}
> val spark = SparkSession
>   .builder()
>   .appName("test")
>   .master("local")
>   .getOrCreate()
>
> while (true) {
>   spark.read.parquet("/path/to/file")
>   Thread.sleep(5000)
> }
> {code}
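The two patterns being weighed here can be sketched in plain Scala against `java.util.concurrent.ForkJoinPool`. This is a minimal illustration, not Spark's actual internals: the object and method names (`PoolSketch`, `readWithSharedPool`, `readWithFreshPool`) are hypothetical, and the pool size of 8 is an arbitrary placeholder.

```scala
import java.util.concurrent.ForkJoinPool

// Hypothetical sketch of the tradeoff discussed above; not Spark code.
object PoolSketch {
  // Pattern 1: one shared pool, created lazily and reused for the
  // lifetime of the application. No per-call thread creation, but the
  // pool's worker threads persist until the JVM exits, and all callers
  // share (and are limited by) its parallelism.
  lazy val sharedPool = new ForkJoinPool(8)

  def readWithSharedPool(work: () => Unit): Unit =
    sharedPool.submit(new Runnable { def run(): Unit = work() }).join()

  // Pattern 2: a fresh pool per call. Each caller gets its own threads,
  // but if shutdown() is ever forgotten, the pool's idle workers linger
  // in the WAITING state forever -- the leak reported in this issue.
  def readWithFreshPool(work: () => Unit): Unit = {
    val pool = new ForkJoinPool(8)
    try pool.submit(new Runnable { def run(): Unit = work() }).join()
    finally pool.shutdown() // omitting this line reproduces the leak
  }
}
```

With the shared pool, repeated calls (as in the reproduction loop) keep the thread count flat; with a fresh pool per call and no `shutdown()`, the count grows without bound.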