Hi,

I'm using sqlContext.jdbc(uri, table, where).map(_ => 1).aggregate(0)(_+_,_+_) in an interactive shell, where "where" is an Array[String] of 32 to 48 elements. (The code is tailored to our db, specifically through the where conditions; I would have posted it otherwise.) That should be the DataFrame API, but I'm just trying to load everything and discard it as soon as possible :-)
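For reference, the pattern expands to roughly the following; the connection URI, table name, and predicates here are made-up placeholders, not our real ones:

    // Illustrative only: URL, table and predicates are placeholders.
    val uri = "jdbc:mysql://db-host:3306/mydb?user=spark&password=secret"
    val table = "events"
    // Each WHERE clause becomes one partition; in our case 32 to 48 of them.
    val where: Array[String] = (0 until 32).map(i => s"id % 32 = $i").toArray

    // Load every row, map it to 1 and sum: a count that discards the data.
    val n = sqlContext.jdbc(uri, table, where).map(_ => 1).aggregate(0)(_ + _, _ + _)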
(1) Never silently drop values by default: it kills confidence. An option sounds reasonable, and some sort of insight / log would be great (how many columns of what type were truncated, and why?). Note that I could declare the field as a string via JdbcDialects (thank you guys for merging that :-) ); a sketch of such a dialect is in the P.S. below. I have had quite bad experiences with silent drops / truncations of columns and thus _like_ the strict way of Spark. It causes trouble, but noticing later that your data was corrupted during conversion is even worse.

(2) SPARK-8004 https://issues.apache.org/jira/browse/SPARK-8004

(3) One option would be to make it safe to use; the other would be to document the behavior (something like "WARNING: this method tries to load as many partitions as possible, make sure your database can handle the load, or load them in chunks and use union"); a sketch of that chunked approach is also in the P.S. SPARK-8008 https://issues.apache.org/jira/browse/SPARK-8008

Regards,
Rene Treffer
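P.S. Two sketches for the points above. First, roughly what the JdbcDialects workaround from (1) looks like; the URL prefix and the type name I match on are placeholders for whatever your driver actually reports:

    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
    import org.apache.spark.sql.types.{DataType, MetadataBuilder, StringType}

    // Hypothetical dialect that reads an otherwise-unsupported column as a string.
    object StringFallbackDialect extends JdbcDialect {
      // Placeholder match: apply this dialect to our MySQL URLs only.
      override def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")
      // Map the problematic type name to StringType; None keeps the default mapping.
      override def getCatalystType(sqlType: Int, typeName: String,
          size: Int, md: MetadataBuilder): Option[DataType] =
        if (typeName == "BIGINT UNSIGNED") Some(StringType) else None
    }

    JdbcDialects.registerDialect(StringFallbackDialect)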
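And second, roughly what I mean in (3) by loading in chunks and using union (same placeholder names as the first sketch; the persist + count forces each chunk to load before the next one starts):

    // Pull at most 8 partitions at a time so the database isn't hit by all
    // 32-48 queries at once, then union the cached chunks back together.
    val chunks = where.grouped(8).map { chunk =>
      val df = sqlContext.jdbc(uri, table, chunk).persist()
      df.count()  // force this chunk's JDBC load before starting the next
      df
    }.toList
    val full = chunks.reduce(_ unionAll _)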