Re: Does HiveContext connect to HiveServer2?

2015-06-24 Thread Nitin kak
Hi Marcelo, The issue does not happen while connecting to the Hive metastore; that works fine. It seems that HiveContext only uses the Hive CLI to execute the queries and does not go through HiveServer2. I don't think you can specify any configuration in hive-site.xml which can make it connect to
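
The behavior described above matches the usual Spark 1.x usage pattern. A minimal sketch in Scala, assuming a hive-site.xml on the classpath and a placeholder table name:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveContextDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveContextDemo"))
        // HiveContext reads hive-site.xml and talks to the metastore directly;
        // it does not open a session against HiveServer2, so HiveServer2-side
        // authorization is bypassed.
        val hiveContext = new HiveContext(sc)
        hiveContext.sql("SELECT * FROM some_table LIMIT 10").collect().foreach(println)
      }
    }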

Re: Hive query execution from Spark(through HiveContext) failing with Apache Sentry

2015-06-22 Thread Nitin kak
Any response to this, guys? On Fri, Jun 19, 2015 at 2:34 PM, Nitin kak nitinkak...@gmail.com wrote: Any other suggestions, guys? On Wed, Jun 17, 2015 at 7:54 PM, Nitin kak nitinkak...@gmail.com wrote: With Sentry, only the hive user has read/write/execute permission on the subdirectories

Re: Hive query execution from Spark(through HiveContext) failing with Apache Sentry

2015-06-19 Thread Nitin kak
Any other suggestions, guys? On Wed, Jun 17, 2015 at 7:54 PM, Nitin kak nitinkak...@gmail.com wrote: With Sentry, only the hive user has read/write/execute permission on the subdirectories of the warehouse. All the users get translated to hive when interacting with HiveServer2. But I think

Hive query execution from Spark(through HiveContext) failing with Apache Sentry

2015-06-17 Thread Nitin kak
I am trying to run a hive query from Spark code using the HiveContext object. It was running fine earlier, but since Apache Sentry was installed the process fails with this exception: *org.apache.hadoop.security.AccessControlException: Permission denied: user=kakn,

Re: Hive query execution from Spark(through HiveContext) failing with Apache Sentry

2015-06-17 Thread Nitin kak
Try to grant read/execute access through Sentry. On 18 Jun 2015 05:47, Nitin kak nitinkak...@gmail.com wrote: I am trying to run a hive query from Spark code using the HiveContext object. It was running fine earlier, but since Apache Sentry
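
A hedged sketch of that suggestion: Sentry-managed privileges are granted through HiveServer2, for example over JDBC (Beeline uses the same path). The URL, role, and group names below are placeholders, not values from this thread:

    import java.sql.DriverManager

    object GrantThroughHiveServer2 {
      def main(args: Array[String]): Unit = {
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hs2-host:10000/default", "admin", "")
        val stmt = conn.createStatement()
        stmt.execute("CREATE ROLE spark_readers")                   // placeholder role
        stmt.execute("GRANT ROLE spark_readers TO GROUP etl_users") // placeholder group
        stmt.execute("GRANT SELECT ON DATABASE default TO ROLE spark_readers")
        conn.close()
      }
    }

Note that such grants govern access through HiveServer2; as this thread points out, HiveContext reads the warehouse directly from HDFS, so HDFS-level permissions still apply.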

Re: HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-05-26 Thread Nitin kak
That is a much better solution than how I resolved it. I got around it by placing comma-separated jar paths for all the Hive-related jars in the --jars clause. I will try your solution. Thanks for sharing it. On Tue, May 26, 2015 at 4:14 AM, Mohammad Islam misla...@yahoo.com wrote: I got a similar
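
A spark-shell style sketch of the workaround described above; spark.jars takes the same comma-separated list as the --jars flag, and the jar paths here are illustrative only (a CDH parcel layout is assumed):

    import org.apache.spark.SparkConf

    // Comma-separated list of Hive jars to ship with the application.
    val hiveJars = Seq(
      "/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar",
      "/opt/cloudera/parcels/CDH/lib/hive/lib/hive-metastore.jar",
      "/opt/cloudera/parcels/CDH/lib/hive/lib/hive-common.jar"
    ).mkString(",")

    val conf = new SparkConf().setAppName("HiveContextTest").set("spark.jars", hiveJars)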

Re: Where can I find logs set inside RDD processing functions?

2015-02-06 Thread Nitin kak
YARN log aggregation is enabled, and the logs I get through yarn logs -applicationId your_application_id are no different from what I get through the logs in the YARN application tracking URL. They still don't have the above logs. On Fri, Feb 6, 2015 at 3:36 PM, Petar Zecevic

Re: Where can I find logs set inside RDD processing functions?

2015-02-06 Thread Nitin kak
yarn.nodemanager.remote-app-log-dir is set to /tmp/logs On Fri, Feb 6, 2015 at 4:14 PM, Ted Yu yuzhih...@gmail.com wrote: To add to what Petar said, when YARN log aggregation is enabled, consider specifying yarn.nodemanager.remote-app-log-dir, which is where aggregated logs are saved.
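
For the original question, this spark-shell style sketch shows why the logs end up where they do: code inside RDD closures runs on the executors, so its output lands in the YARN container logs, which aggregation then copies under yarn.nodemanager.remote-app-log-dir (/tmp/logs here). Names are illustrative:

    import org.apache.log4j.Logger
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("ClosureLogging"))
    sc.parallelize(1 to 100).foreachPartition { iter =>
      // Created inside the closure, so it lives in the executor JVM and writes
      // to that container's log, not the driver's.
      val log = Logger.getLogger("rdd-worker")
      iter.foreach(x => log.info("processing " + x))
    }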

Re: Sort based shuffle not working properly?

2015-02-03 Thread Nitin kak
bother to also sort them within each partition On Tue, Feb 3, 2015 at 5:41 PM, Nitin kak nitinkak...@gmail.com wrote: I thought that's what sort-based shuffle did: sort the keys going to the same partition. I have tried (c1, c2) as an (Int, Int) tuple as well. I don't think that ordering of c2
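
A hedged sketch of the idea quoted above, with made-up data: rather than relying on the shuffle implementation, ask Spark explicitly to sort keys within each partition:

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("SortWithinPartitions"))
    val pairs = sc.parallelize(Seq(((1, 2), "a"), ((1, 1), "b"), ((2, 3), "c")))
    // Requires an Ordering on the key; Scala supplies one for (Int, Int).
    val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(4))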

Re: Sort based shuffle not working properly?

2015-02-03 Thread Nitin kak
I thought that's what sort-based shuffle did: sort the keys going to the same partition. I have tried (c1, c2) as an (Int, Int) tuple as well. I don't think that ordering of the c2 type is the problem here. On Tue, Feb 3, 2015 at 5:21 PM, Sean Owen so...@cloudera.com wrote: Hm, I don't think the sort

Re: Running beyond memory limits in ConnectedComponents

2015-01-15 Thread Nitin kak
memory asked by Spark to approximately 22G. On Thu, Jan 15, 2015 at 12:54 PM, Nitin kak nitinkak...@gmail.com wrote: Is this overhead memory allocation used for any specific purpose? For example, will it be any different if I do *--executor-memory 22G* with overhead set to 0% (hypothetically) vs

Re: Running beyond memory limits in ConnectedComponents

2015-01-15 Thread Nitin kak
I am sorry for the formatting error; the value is *yarn.scheduler.maximum-allocation-mb = 28G* On Thu, Jan 15, 2015 at 11:31 AM, Nitin kak nitinkak...@gmail.com wrote: Thanks for sticking to this thread. I am guessing what memory my app requests and what Yarn requests on my part should

Re: Running beyond memory limits in ConnectedComponents

2015-01-15 Thread Nitin kak
20G or about 1.4G. You might set this higher to 2G to give more overhead. See the --conf property=value syntax documented in http://spark.apache.org/docs/latest/submitting-applications.html On Thu, Jan 15, 2015 at 3:47 AM, Nitin kak nitinkak...@gmail.com wrote: Thanks Sean. I guess
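
Using the numbers from this thread: YARN allocates executor memory plus the overhead as one container, and that total must stay under yarn.scheduler.maximum-allocation-mb. A sketch with the Spark 1.x property (value is in MB):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "20g")
      // Default overhead is roughly 7% of executor memory (about 1.4G for 20G);
      // raising it to 2G as suggested:
      .set("spark.yarn.executor.memoryOverhead", "2048")
    // Equivalent on the command line:
    //   spark-submit --executor-memory 20g \
    //     --conf spark.yarn.executor.memoryOverhead=2048 ...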

Re: Running beyond memory limits in ConnectedComponents

2015-01-14 Thread Nitin kak
Thanks Sean. I guess Cloudera Manager has the parameters executor_total_max_heapsize and worker_max_heapsize, which point to the parameters you mentioned above. How big should the cushion between the JVM heap size and the YARN memory limit be? I tried setting the JVM memory to 20g and YARN to 24g, but it

Re: Is Spark 1.1.0 incompatible with Hive?

2014-10-27 Thread Nitin kak
Yes, I added all the Hive jars present in the Cloudera distribution of Hadoop. I added them because I was getting ClassNotFoundException for many required classes (one example stack trace below). So someone in the community suggested including the hive jars: *Exception in thread main

Re: Is Spark 1.1.0 incompatible with Hive?

2014-10-27 Thread Nitin kak
is to deploy the plain Apache version of Spark on CDH YARN. On Mon, Oct 27, 2014 at 11:10 AM, Nitin kak nitinkak...@gmail.com wrote: Yes, I added all the Hive jars present in the Cloudera distribution of Hadoop. I added them because I was getting ClassNotFoundException for many required classes (one

Re: Is Spark 1.1.0 incompatible with Hive?

2014-10-27 Thread Nitin kak
Somehow got it to work by placing all the jars (except Guava) from the Hive lib directory after --jars. Had initially tried to place the jars under another temporary folder and point the executor and driver extraClassPath to that directory, but it didn't work. On Mon, Oct 27, 2014 at 2:21 PM, Nitin kak nitinkak
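
A sketch contrasting the two approaches in this thread; paths assume a CDH parcel layout and are illustrative. Note the driver classpath must be in place before the driver JVM starts, so in practice these go on the spark-submit command line or in spark-defaults.conf rather than in code:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.driver.extraClassPath", "/opt/cloudera/parcels/CDH/lib/hive/lib/*")
      .set("spark.executor.extraClassPath", "/opt/cloudera/parcels/CDH/lib/hive/lib/*")
    // Equivalent spark-submit flags:
    //   --driver-class-path /opt/cloudera/parcels/CDH/lib/hive/lib/*
    //   --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*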