Thanks Denny! That does help. I will give that a shot.

Question: if I go this route, how can I read only a few columns of a table (not the whole table) from JDBC as a data frame? This function from DataFrameReader does not give an option to read only certain columns:

    def jdbc(url: String, table: String, predicates: Array[String], connectionProperties: Properties): DataFrame
On the other hand, if I create a JdbcRDD I can specify a select query (instead of the full table):

    new JdbcRDD(sc: SparkContext, getConnection: () ⇒ Connection, sql: String, lowerBound: Long, upperBound: Long, numPartitions: Int, mapRow: (ResultSet) ⇒ T = JdbcRDD.resultSetToObjectArray)(implicit arg0: ClassTag[T])

If I do df.select(col1, col2) on a data frame created from a table, will Spark be smart enough to fetch only those two columns and not the entire table? Is there any way to test this?

From: Denny Lee <denny.g....@gmail.com>
Date: Thursday, November 3, 2016 at 10:59 AM
To: "Jain, Nishit" <nja...@underarmour.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: How do I convert a data frame to broadcast variable?

If you're able to read the data in as a DataFrame, perhaps you can use a BroadcastHashJoin so that you can join to that table, presuming it's small enough to be distributed? Here's a handy guide on a BroadcastHashJoin:
https://docs.cloud.databricks.com/docs/latest/databricks_guide/index.html#04%20SQL,%20DataFrames%20%26%20Datasets/05%20BroadcastHashJoin%20-%20scala.html

HTH!

On Thu, Nov 3, 2016 at 8:53 AM Jain, Nishit <nja...@underarmour.com> wrote:

I have a lookup table in a HANA database. I want to create a Spark broadcast variable for it. What would be the suggested approach? Should I read it as a data frame and convert the data frame into a broadcast variable?

Thanks,
Nishit
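On the question of reading only a few columns over JDBC: one common workaround is to pass a parenthesized subquery with an alias wherever the reader expects a table name, so the database only returns the projected columns. The sketch below is a minimal illustration, not a tested HANA setup. The object and method names (`JdbcColumnSubset`, `projectionSubquery`, `readColumns`, `showPlan`) and the default alias `t` are made up for this example. It assumes a Spark 2.x `SparkSession` and a JDBC driver on the classpath. The `showPlan` helper prints the query plan for a `select`, which is one way to check whether Spark asks the source for only the selected columns.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcColumnSubset {
  // Wraps a column projection in a derived table so it can be passed
  // wherever the JDBC reader expects a table name.
  // The alias "t" is an arbitrary choice for this sketch.
  def projectionSubquery(table: String, columns: Seq[String], alias: String = "t"): String =
    s"(SELECT ${columns.mkString(", ")} FROM $table) $alias"

  // Reads only the given columns by handing the subquery to DataFrameReader.jdbc.
  // url and props are whatever the JDBC driver in use requires (assumed here).
  def readColumns(spark: SparkSession, url: String, table: String,
                  columns: Seq[String], props: Properties): DataFrame =
    spark.read.jdbc(url, projectionSubquery(table, columns), props)

  // Prints the logical and physical plans for a column selection; the scan
  // node in the physical plan shows which columns are requested from the source.
  def showPlan(df: DataFrame, first: String, rest: String*): Unit =
    df.select(first, rest: _*).explain(true)
}
```

For example, with a hypothetical table `LOOKUP` and columns `ID` and `NAME`, `JdbcColumnSubset.readColumns(spark, url, "LOOKUP", Seq("ID", "NAME"), props)` would send `(SELECT ID, NAME FROM LOOKUP) t` as the table expression. Calling `showPlan` on a data frame read from the full table is a quick way to see whether a plain `df.select(...)` is already enough.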