Re: HiveContext is Serialized?

2016-10-26 Thread Mich Talebzadeh
Thanks Sean. I believe you are referring to the statement below: "You can't use the HiveContext or SparkContext in a distributed operation. It has nothing to do with for loops. The fact that they're serializable is misleading. It's there, I believe, because these objects may be inadvertently

Re: HiveContext is Serialized?

2016-10-26 Thread Sean Owen
Yes, but the question here is why the context objects are marked serializable when they are not meant to be sent somewhere as bytes. I tried to answer that apparent inconsistency below. On Wed, Oct 26, 2016, 10:21 Mich Talebzadeh wrote: > Hi, > > Sorry for asking this

Re: HiveContext is Serialized?

2016-10-26 Thread Mich Talebzadeh
Hi, Sorry for asking this rather naïve question. I am trying to understand the notion of serialisation in Spark and where things can or cannot be serialised. Does this generally refer to the concept of serialisation in the context of data storage? In this context, for example with reference to RDD operations, is it the process of

Re: HiveContext is Serialized?

2016-10-26 Thread Sean Owen
It is the driver that has the info needed to schedule and manage distributed jobs and that is by design. This is narrowly about using the HiveContext or SparkContext directly. Of course SQL operations are distributed. On Wed, Oct 26, 2016, 10:03 Mich Talebzadeh wrote:
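A minimal sketch of that distinction, with illustrative names: the sql() call below is issued on the driver, but the query it describes is planned and executed as a distributed job across the executors.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  object TempTableSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("TempTableSketch"))
      val hiveContext = new HiveContext(sc)
      import hiveContext.implicits._

      // DataFrame built from an RDD and registered as a temp table
      val df = sc.parallelize(1 to 1000000).map(i => (i, i % 10)).toDF("id", "bucket")
      df.registerTempTable("numbers")

      // sql() is called on the driver; the aggregation itself runs on the executors
      hiveContext.sql("SELECT bucket, count(*) AS cnt FROM numbers GROUP BY bucket").show()
    }
  }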

Re: HiveContext is Serialized?

2016-10-26 Thread ayan guha
In your use case, your deDF does not need to be a DataFrame. You could use sc.textFile().collect(). Even better, you can just read off a local file, as your file is very small, unless you are planning to use yarn-cluster mode. On 26 Oct 2016 16:43, "Ajay Chander" wrote: > Sean,
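A rough sketch of those two options, assuming the driver runs where the file lives (the file path and variable names are illustrative):

  import scala.io.Source

  // Option 1: go through Spark, then bring the small result back to the driver
  // val dataElements: Array[String] = sc.textFile("/tmp/dataElements.txt").collect()

  // Option 2: skip Spark and read the local file directly on the driver
  // (works in client mode; in yarn-cluster mode the driver may not see this path)
  val dataElements: List[String] = Source.fromFile("/tmp/dataElements.txt").getLines().toList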

Re: HiveContext is Serialized?

2016-10-26 Thread Mich Talebzadeh
Hi Sean, Your point: "You can't use the HiveContext or SparkContext in a distributed operation..." Is this because of a design issue? Case in point: if I created a DF from an RDD and registered it as a tempTable, does this imply that any sql calls on that table are localised and not distributed among

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sean, thank you for making it clear. It was helpful. Regards, Ajay On Wednesday, October 26, 2016, Sean Owen wrote: > This usage is fine, because you are only using the HiveContext locally on > the driver. It's applied in a function that's used on a Scala collection. > >

Re: HiveContext is Serialized?

2016-10-25 Thread Sunita Arvind
Thanks for the response Sean. I have seen the NPE on similar issues very consistently and assumed that could be the reason :) Thanks for clarifying. regards Sunita On Tue, Oct 25, 2016 at 10:11 PM, Sean Owen wrote: > This usage is fine, because you are only using the

Re: HiveContext is Serialized?

2016-10-25 Thread Sean Owen
This usage is fine, because you are only using the HiveContext locally on the driver. It's applied in a function that's used on a Scala collection. You can't use the HiveContext or SparkContext in a distributed operation. It has nothing to do with for loops. The fact that they're serializable

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sunita, Thanks for your time. In my scenario, based on each attribute from deDF (1 column with just 66 rows), I have to query a Hive table and insert into another table. Thanks, Ajay On Wed, Oct 26, 2016 at 12:21 AM, Sunita Arvind wrote: > Ajay, > > Afaik Generally these
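A hedged sketch of that use case along the lines Sean suggests (table, column, and variable names are hypothetical): collect the 66 values to the driver and loop there, so hiveContext is only ever used driver-side.

  // deDF has a single column with just ~66 rows, so collecting it is cheap
  val attributes: Array[String] = deDF.collect().map(_.getString(0))

  attributes.foreach { attr =>
    // Runs on the driver; each statement is itself a distributed Hive/Spark job
    hiveContext.sql(
      s"INSERT INTO TABLE target_tbl SELECT * FROM source_tbl WHERE attribute = '$attr'")
  }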

Re: HiveContext is Serialized?

2016-10-25 Thread Sunita Arvind
Ajay, AFAIK, generally these contexts cannot be accessed within loops. The sql query itself runs on distributed datasets, so it is already a parallel execution. Putting it inside a foreach would nest one distributed operation inside another, so serialization would become hard. Not sure I could explain it right. If you can

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Jeff, Thanks for your response. I see the below error in the logs. Do you think it has anything to do with hiveContext? Do I have to serialize it before using it inside foreach? 16/10/19 15:16:23 ERROR scheduler.LiveListenerBus: Listener SQLListener threw an exception java.lang.NullPointerException

Re: HiveContext is Serialized?

2016-10-25 Thread Jeff Zhang
In your sample code, you can use hiveContext in the foreach because it is a Scala List foreach operation, which runs on the driver side. But you cannot use hiveContext in RDD.foreach. Ajay Chander wrote on Wednesday, October 26, 2016 at 11:28 AM: > Hi Everyone, > > I was thinking if I can use hiveContext
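Roughly, the difference looks like this (variable and table names are illustrative):

  // Fine: a plain Scala List, so the closure runs on the driver
  dataElementsList.foreach { e =>
    hiveContext.sql(s"SELECT * FROM some_tbl WHERE elem = '$e'").show()
  }

  // Not fine: an RDD, so the closure is serialized and shipped to the executors,
  // and hiveContext cannot be used there
  // dataElementsRDD.foreach { e =>
  //   hiveContext.sql(s"SELECT * FROM some_tbl WHERE elem = '$e'")
  // }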

HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Hi Everyone, I was wondering if I can use hiveContext inside foreach like below:

  object Test {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
      val sc = new SparkContext(conf)
      val hiveContext = new HiveContext(sc)
      val dataElementsFile = args(0)
      val deDF =