This usage is fine, because you are only using the HiveContext locally on
the driver. It's applied in a function invoked on a plain Scala collection
(the Array that collect() returns), so calculate runs on the driver.

You can't use the HiveContext or SparkContext in a distributed operation,
that is, in a closure that Spark ships to executors. It has nothing to do
with for loops.
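
To make the boundary concrete, here's a minimal sketch of the same pattern
done both ways, assuming the 1.6-era APIs (the database and table names
are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ContextScope {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf())
    val hiveContext = new HiveContext(sc)
    val df = hiveContext.sql("SELECT name FROM some_db.some_table")

    // Fine: collect() returns a local Array[Row]; this foreach is an
    // ordinary Scala loop on the driver, so hiveContext is used locally.
    df.collect().foreach { row =>
      hiveContext.sql("SELECT '" + row.getString(0) + "'").show()
    }

    // Broken: this foreach is a distributed operation; the closure is
    // serialized to executors, where the deserialized context is dead.
    // df.rdd.foreach { row =>
    //   hiveContext.sql("SELECT '" + row.getString(0) + "'").show()
    // }
  }
}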

The fact that they're serializable is misleading. They're Serializable, I
believe, because these objects may be inadvertently referenced in the
closure of a function that executes remotely yet doesn't actually use the
context, and the closure cleaner can't always remove that reference.
Without serializability, the task would fail to serialize even though it
never uses the context. So you will find these objects serialize, but
then don't work if used remotely.
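
For illustration only (the class and method names are made up), here is
the kind of inadvertent capture being described:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.hive.HiveContext

// Hypothetical driver-side helper. The lambda in run() only calls
// normalize, but calling a method on `this` pulls the whole object
// (including the hiveContext field) into the closure. If the context
// weren't Serializable, the task would fail to serialize even though it
// never touches the context remotely.
class Cleaner(val hiveContext: HiveContext) extends Serializable {
  def normalize(s: String): String = s.trim.toLowerCase

  def run(words: RDD[String]): Array[String] =
    words.map(w => normalize(w)).collect()
}

The usual escape is to copy what the closure needs into a local val
inside the method, so the task captures only that value and not `this`.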

The NPE you see is an unrelated cosmetic problem that was fixed in 2.0.1
IIRC.

On Wed, Oct 26, 2016 at 4:28 AM Ajay Chander <itsche...@gmail.com> wrote:

> Hi Everyone,
>
> I was wondering whether I can use hiveContext inside foreach, like below:
>
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.hive.HiveContext
>
> object Test {
>   def main(args: Array[String]): Unit = {
>
>     val conf = new SparkConf()
>     val sc = new SparkContext(conf)
>     val hiveContext = new HiveContext(sc)
>
>     val dataElementsFile = args(0)
>     val deDF = hiveContext.read.text(dataElementsFile)
>       .toDF("DataElement").coalesce(1).distinct().cache()
>
>     def calculate(de: Row): Unit = {
>       val dataElement = de.getAs[String]("DataElement").trim
>       val df1 = hiveContext.sql("SELECT cyc_dt, supplier_proc_i, '" +
>         dataElement + "' as data_elm, " + dataElement +
>         " as data_elm_val FROM TEST_DB.TEST_TABLE1")
>       df1.write.insertInto("TEST_DB.TEST_TABLE1")
>     }
>
>     deDF.collect().foreach(calculate)
>   }
> }
>
>
> I looked at
> https://spark.apache.org/docs/1.6.0/api/scala/index.html#org.apache.spark.sql.hive.HiveContext
> and I see it extends SQLContext, which extends Logging with Serializable.
>
> Can anyone tell me if this is the right way to use it? Thanks for your time.
>
> Regards,
>
> Ajay
