[ https://issues.apache.org/jira/browse/SPARK-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567993#comment-14567993 ]
Reynold Xin commented on SPARK-8008:
------------------------------------

As discussed on the dev list, there's already a warning to avoid high concurrency:

{code}
  /**
   * Construct a [[DataFrame]] representing the database table accessible via JDBC URL
   * url named table. Partitions of the table will be retrieved in parallel based on the parameters
   * passed to this function.
   *
   * Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
   * your external database systems.
   *
   * @param url JDBC database url of the form `jdbc:subprotocol:subname`
   * @param table Name of the table in the external database.
   * @param columnName the name of a column of integral type that will be used for partitioning.
   * @param lowerBound the minimum value of `columnName` used to decide partition stride
   * @param upperBound the maximum value of `columnName` used to decide partition stride
   * @param numPartitions the number of partitions. the range `minValue`-`maxValue` will be split
   *                      evenly into this many partitions
   * @param connectionProperties JDBC database connection arguments, a list of arbitrary string
   *                             tag/value. Normally at least a "user" and "password" property
   *                             should be included.
   *
   * @since 1.4.0
   */
{code}

Even with the warning, it'd be great to have some way to throttle.

> sqlContext.jdbc can kill your database due to high concurrency
> --------------------------------------------------------------
>
>                 Key: SPARK-8008
>                 URL: https://issues.apache.org/jira/browse/SPARK-8008
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Rene Treffer
>
> Spark tries to load as many partitions as possible in parallel, which can in
> turn overload the database although it would be possible to load all
> partitions given a lower concurrency.
> It would be nice to either limit the maximum concurrency or to at least warn
> about this behavior.
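To make the partitioning parameters in the doc comment concrete, here is a minimal sketch of how a range on `columnName` could be split evenly into `numPartitions` per-partition WHERE clauses. `PartitionSketch` and `partitionPredicates` are hypothetical names used only for illustration; this is not Spark's actual implementation, and the exact boundary handling there may differ.

```scala
// Hypothetical helper for illustration only; not Spark's implementation.
object PartitionSketch {
  /** Split [lowerBound, upperBound] into numPartitions WHERE clauses on an integral column. */
  def partitionPredicates(
      columnName: String,
      lowerBound: Long,
      upperBound: Long,
      numPartitions: Int): Seq[String] = {
    require(numPartitions >= 1, "numPartitions must be at least 1")
    require(lowerBound <= upperBound, "lowerBound must not exceed upperBound")
    if (numPartitions == 1) {
      // A single partition reads the whole table with one query.
      Seq("1=1")
    } else {
      // Integer division: the last partition absorbs any remainder.
      val stride = (upperBound - lowerBound) / numPartitions
      (0 until numPartitions).map { i =>
        val lo = lowerBound + i * stride
        val hi = lo + stride
        // First and last partitions are open-ended so values outside the
        // bounds are still read. Rows with NULL in the column match none.
        if (i == 0) s"$columnName < $hi"
        else if (i == numPartitions - 1) s"$columnName >= $lo"
        else s"$columnName >= $lo AND $columnName < $hi"
      }
    }
  }
}
```

Under this sketch, each predicate corresponds to one query against the database, so `numPartitions` is an upper bound on the number of queries that could be in flight at once. For example, `PartitionSketch.partitionPredicates("id", 0L, 100L, 4)` yields four clauses with a stride of 25; choosing a smaller `numPartitions` is the only throttle this sketch offers.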
--
This message was sent by Atlassian JIRA (v6.3.4#6332)