[ 
https://issues.apache.org/jira/browse/SPARK-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567230#comment-14567230
 ] 

Rene Treffer commented on SPARK-8008:
-------------------------------------

At the moment each partition uses its own connection, as far as I can tell. I 
still have to double-check how this works on a cluster, where even multiple 
servers might fetch data at the same time.

I'm currently loading the data by year and month, because of the DB schema 
(index on the actual days, locality based on year/month).
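
A minimal sketch of what such a year+month split could look like, assuming a 
date column named `day` (a hypothetical name; the actual schema is not shown 
here). The helper `month_predicates` is made up for illustration and only 
builds one WHERE-clause fragment per calendar month:

```python
from datetime import date


def month_predicates(start_year, start_month, end_year, end_month, column="day"):
    """Return one SQL predicate per calendar month, both end months included.

    `column` is an assumed name for the indexed date column.
    """
    predicates = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        first = date(year, month, 1)
        # First day of the following month, used as an exclusive upper bound.
        nxt = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        predicates.append(
            "{c} >= '{lo}' AND {c} < '{hi}'".format(
                c=column, lo=first.isoformat(), hi=nxt.isoformat()
            )
        )
        year, month = nxt.year, nxt.month
    return predicates


if __name__ == "__main__":
    for p in month_predicates(2014, 11, 2015, 1):
        print(p)
```

Such a list could then be passed as the `predicates` argument of the JDBC 
reader, so that each month becomes one partition. Note that this only shapes 
the partitions; it does not by itself limit how many of them are fetched 
concurrently.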

I don't think larger batches would be a solution. Three months may require more 
than 160 million rows, and I don't think batching that into one partition is a 
good idea.

> sqlContext.jdbc can kill your database due to high concurrency
> --------------------------------------------------------------
>
>                 Key: SPARK-8008
>                 URL: https://issues.apache.org/jira/browse/SPARK-8008
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Rene Treffer
>
> Spark tries to load as many partitions as possible in parallel, which can in 
> turn overload the database although it would be possible to load all 
> partitions given a lower concurrency.
> It would be nice to either limit the maximum concurrency or to at least warn 
> about this behavior.



