Hi Devs,

Hopefully one of you knows more about this?
Thanks,
Andy

---------- Forwarded message ----------
From: Andy Huang <andy.hu...@servian.com.au>
Date: Wed, Sep 23, 2015 at 12:39 PM
Subject: Parallel collection in driver programs
To: u...@spark.apache.org

Hi All,

Would like to know if anyone has experience with using parallel collections in the driver program, and whether there is any actual advantage/disadvantage to doing so — e.g. with a collection of JDBC connections and tables.

We have adapted our non-Spark code, which utilizes parallel collections, to Spark code, and it seems to work fine:

    import java.util.Properties

    val conf = List(
      ("tbl1", "dbo.tbl1::tb1_id::0::127::128"),
      ("tbl2", "dbo.tbl2::tb2_id::0::31::32"),
      ("tbl3", "dbo.tbl3::tb3_id::0::63::64")
    )

    val _JDBC_DEFAULT  = "jdbc:sqlserver://192.168.52.1;database=TestSource"
    val _STORE_DEFAULT = "hdfs://192.168.52.132:9000/"

    val prop = new Properties()
    prop.setProperty("user", "sa")
    prop.setProperty("password", "password")

    conf.par.map { pair =>
      // split the config string once instead of once per field
      val Array(qry, pCol, lo, hi, part) = pair._2.split("::")

      // create dataframe from jdbc table
      val jdbcDF = sqlContext.read.jdbc(
        _JDBC_DEFAULT,
        "(" + qry + ") a",
        pCol,
        lo.toInt,   // lower bound
        hi.toInt,   // upper bound
        part.toInt, // number of partitions
        prop        // java.util.Properties - key/value pairs
      )

      // save to parquet
      jdbcDF.write.mode("overwrite").parquet(_STORE_DEFAULT + pair._1 + ".parquet")
    }

Thanks.

--
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 | f: 02 9376 0730 | m: 0433221979
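For context, the concurrency here comes from the Scala parallel collection itself, not from Spark: each body passed to `.par.map` runs on the driver's fork-join pool, so independent Spark jobs get submitted concurrently while results still come back in the collection's original order. A minimal, Spark-free sketch of that behavior (table names are made up for the demo; assumes Scala 2.11/2.12, where `.par` is built in without the extra parallel-collections module):

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical table names, standing in for the real JDBC config pairs.
val tables = List("tbl1", "tbl2", "tbl3")

// Record which threads the mapped function actually ran on.
val threadsSeen = new ConcurrentHashMap[String, Boolean]()

val results = tables.par.map { t =>
  threadsSeen.put(Thread.currentThread.getName, true)
  t.toUpperCase
}

// Order is preserved even though execution may interleave across threads.
println(results.toList) // List(TBL1, TBL2, TBL3)
println(s"threads used: ${threadsSeen.size}")
```

In the Spark version, each mapped element triggers its own `read.jdbc`/`write.parquet` job; the parallel collection just keeps several of those driver-side calls in flight at once.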