[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse

2015-12-16 Thread Vijay Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061453#comment-15061453
 ] 

Vijay Singh commented on SPARK-9042:


Hi Charmee,

You can invoke spark-shell or spark-submit in the following fashion to gain 
access to HiveContext functionality. Here is an example for spark-shell:
{code}
HADOOP_CONF_DIR=/etc/hive/conf spark-shell --master yarn-client \
  --driver-class-path '/opt/cloudera/parcels/CDH/lib/hive/lib/*' \
  --driver-java-options '-Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*'
{code}
Additionally, if metastore access is restricted, the service or user account's 
group can be granted access to the metastore as follows:
# Go to Cloudera Manager > Hive > Configuration > Service-Wide > Proxy > Hive 
Metastore Access Control and Proxy User Groups Override.
# Add the group name for {color:red}all service accounts and users that 
require Hive metastore access{color}, in addition to the hive and hue users.
# Restart the Hive Metastore Server for the changes to take effect.
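
Before changing the list in step 2, it can help to check whether an account already belongs to one of the listed groups. The following is only a rough sketch under stated assumptions: the allow-list value and the group name etl_users are hypothetical, and it looks at the local OS groups of the current user, which may differ from what the metastore host resolves.

```shell
#!/bin/sh
# Sketch only: checks whether any of an account's groups appears in a
# comma-separated allow-list such as the one entered in step 2.
# "etl_users" below is a hypothetical group name.

# in_allowed_groups "SPACE_SEPARATED_USER_GROUPS" "COMMA_SEPARATED_ALLOWED"
# Returns 0 if at least one user group is in the allowed list, 1 otherwise.
in_allowed_groups() {
    # A lone "*" is treated here as "every group is allowed".
    if [ "$2" = "*" ]; then
        return 0
    fi
    for g in $1; do
        case ",$2," in
            *",$g,"*) return 0 ;;
        esac
    done
    return 1
}

allowed="hive,hue,etl_users"   # hypothetical allow-list
my_groups=$(id -Gn)            # groups of the current OS user

if in_allowed_groups "$my_groups" "$allowed"; then
    echo "at least one group is in the allow-list"
else
    echo "no group is in the allow-list"
fi
```

Group names that contain spaces would be split incorrectly by this sketch, since the output of id -Gn is space-separated.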



> Spark SQL incompatibility if security is enforced on the Hive warehouse
> ---
>
> Key: SPARK-9042
> URL: https://issues.apache.org/jira/browse/SPARK-9042
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0
> Reporter: Nitin Kak
>
> Hive queries executed from Spark using HiveContext use the CLI to create the 
> query plan and then access the Hive table directories (under 
> /user/hive/warehouse/) directly. This gives an AccessControlException if 
> Apache Sentry is installed:
> org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=kakn, access=READ_EXECUTE, 
> inode="/user/hive/warehouse/mastering.db/sample_table":hive:hive:drwxrwx--t 
> With Apache Sentry, only the "hive" user (created only for Sentry) has 
> permission to access the Hive warehouse directory. After Sentry is 
> installed, all queries are directed to HiveServer2, which changes the 
> invoking user to "hive" and then accesses the Hive warehouse 
> directory. However, HiveContext does not execute the query through 
> HiveServer2, which leads to the issue. Here is an example of executing a 
> Hive query through HiveContext:
> import org.apache.spark.sql.hive.HiveContext
> val hqlContext = new HiveContext(sc) // Create context to run Hive queries 
> val pairRDD = hqlContext.sql(hql) // where hql is the string with hive query 
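
To see why the exception quoted above is raised for user kakn, the message can be taken apart field by field. This is only a rough sketch: it assumes the inode="path":owner:group:mode layout shown in the message, assumes kakn is not a member of group hive, and ignores HDFS ACLs entirely.

```shell
#!/bin/sh
# Sketch only: pull the fields out of the "Permission denied" message and
# show which rwx triplet a plain HDFS permission check would consult.
# HDFS ACLs are ignored here.

msg='Permission denied: user=kakn, access=READ_EXECUTE, inode="/user/hive/warehouse/mastering.db/sample_table":hive:hive:drwxrwx--t'

user=$(printf '%s\n' "$msg"  | sed -n 's/.*user=\([^,]*\),.*/\1/p')
owner=$(printf '%s\n' "$msg" | sed -n 's/.*inode="[^"]*":\([^:]*\):.*/\1/p')
group=$(printf '%s\n' "$msg" | sed -n 's/.*inode="[^"]*":[^:]*:\([^:]*\):.*/\1/p')
mode=$(printf '%s\n' "$msg"  | sed -n 's/.*inode="[^"]*":[^:]*:[^:]*:\(.*\)$/\1/p')

# perm_bits MODE USER OWNER GROUP "USER_GROUPS"
# Prints the triplet that applies: owner, group, or other.
perm_bits() {
    p_mode=$1 p_user=$2 p_owner=$3 p_group=$4 p_groups=$5
    if [ "$p_user" = "$p_owner" ]; then
        printf '%s\n' "$p_mode" | cut -c2-4
    elif printf '%s\n' $p_groups | grep -qxF -- "$p_group"; then
        printf '%s\n' "$p_mode" | cut -c5-7
    else
        printf '%s\n' "$p_mode" | cut -c8-10
    fi
}

# allows_read_execute TRIPLET
# READ_EXECUTE needs both read and execute; a lower-case "t" in the last
# position means sticky bit plus execute.
allows_read_execute() {
    case $1 in
        r?[xt]) echo yes ;;
        *)      echo no ;;
    esac
}

# Assumption: kakn is not in group "hive". On a real cluster,
# `hdfs groups kakn` would show the groups the NameNode resolves.
user_groups=""

bits=$(perm_bits "$mode" "$user" "$owner" "$group" "$user_groups")
echo "user=$user is checked against '$bits'; READ_EXECUTE allowed: $(allows_read_execute "$bits")"
```

Under these assumptions kakn falls into the "other" class, which has execute (via the lower-case t) but not read, so a READ_EXECUTE request on the table directory is denied.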



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse

2015-12-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060223#comment-15060223
 ] 

Sean Owen commented on SPARK-9042:
--

OK, could be. I'm updating this by request from more expert people internally, 
who might comment here. If it's a slightly different issue we can reopen and 
alter the description if needed to narrow it down.

That said, isn't this a Sentry question, and not a Spark one? It seems like 
Sentry either blocks metastore access on purpose, or it can allow the necessary 
access, in which case some configuration change is needed to allow it.




[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse

2015-12-16 Thread Charmee Patel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060219#comment-15060219
 ] 

Charmee Patel commented on SPARK-9042:
--

Yes, we can take it back to Cloudera land. I will follow up on that
separately.

However, I do not believe we had an HDFS access issue. We could read the
data fine. We could also see our queries write into temp directories under
the table. After all data was written to the temp location, our process
failed because the Hive Metastore rejected changes to partitions.







[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse

2015-12-16 Thread Andrew Ray (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060208#comment-15060208
 ] 

Andrew Ray commented on SPARK-9042:
---

Sean, I think there are a couple of issues going on here. In my experience with 
the Sentry HDFS plugin, you can read tables just fine from Spark (which was the 
stated issue here). However, there are other, similar issues that are real: you 
can't create or modify any tables. There are two problems there. The first is 
HDFS permissions: the Sentry HDFS plugin only gives you read access. The second 
is Hive metastore permissions: even if you create the table in some other HDFS 
location that you have write access to, you will still fail because you can't 
modify the Hive metastore, which has a whitelist of users that by default is 
set to just hive and impala.




[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse

2015-12-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060201#comment-15060201
 ] 

Sean Owen commented on SPARK-9042:
--

Yea, this is coming from our support. (Can we take this to Cloudera land until 
it's clear?) I was told this was resolved and it was a Sentry plugin config 
issue.




[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse

2015-12-16 Thread Charmee Patel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060196#comment-15060196
 ] 

Charmee Patel commented on SPARK-9042:
--

The Sentry plugin only manages access to HDFS. We have no issues reading
data from tables based on appropriate permissions. The production cluster
where we encounter this issue was configured for Sentry by the Cloudera team.
But we can follow up one more time. Vijay Singh, who commented on this
issue and was on the Cloudera team, helped us narrow down the issue to Hive
Metastore + Sentry being the culprit.








[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse

2015-12-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060185#comment-15060185
 ] 

Sean Owen commented on SPARK-9042:
--

As I say -- AFAIK this problem is resolved by enabling the Sentry plugin for 
HDFS. Did you do that?




[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse

2015-12-16 Thread Charmee Patel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060177#comment-15060177
 ] 

Charmee Patel commented on SPARK-9042:
--

I don't agree with closing this issue. 

We have an actual production environment where Cloudera Support has configured 
Sentry. Read/Insert on a specific table works fine. But as soon as we have 
queries that create new partitions, we get the exact same permissions issue. 
Any queries that alter information in the Hive metastore are also blocked. 
"Create table As" is also blocked. 

Look at the comment up here by Vijay Singh: he nails down that the problem is 
how HiveContext works directly with the Hive Metastore, and Sentry blocks that.
