Multiple Spark Thrift Servers running on the same machine throw org.apache.hadoop.security.AccessControlException

2016-11-24 Thread
I have two users (etl, dev) who each start a Spark Thrift Server on the same
machine. I connected with beeline to the etl STS and executed a command, which
threw org.apache.hadoop.security.AccessControlException. I don't understand why
the operation is performed as the dev user rather than etl.
Is this a Spark bug? I am using Spark 2.0.2.

Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 Permission denied: user=dev, access=EXECUTE, 
inode="/user/hive/warehouse/tb_spark_sts/etl_cycle_id=20161122":etl:supergroup:drwxr-x---,group:etl:rwx,group:oth_dev:rwx,default:user:data_mining:r-x,default:group::rwx,default:group:etl:rwx,default:group:oth_dev:rwx,default:mask::rwx,default:other::---
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkAccessAcl(DefaultAuthorizationProvider.java:335)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:231)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:178)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6250)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3942)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:811)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:502)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:815)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
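
One hedged reading of the trace: the inode is owned by etl with mode
drwxr-x--- plus ACL entries for groups etl and oth_dev, so a user in neither
group falls through to other::--- and is denied EXECUTE (the permission needed
just to traverse the path). Also, by default the Thrift server executes
statements as the OS user that launched the server process, not as the user
beeline connected with (unless impersonation via hive.server2.enable.doAs is
in effect), so if the connection landed on the dev user's instance, HDFS would
see user=dev exactly as above. A minimal sketch for ruling this out, with
hypothetical ports 10001/10002:

    # run as user etl: give this instance its own port
    ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001

    # run as user dev: a different port
    ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10002

    # connect beeline explicitly to the etl instance
    beeline -u jdbc:hive2://localhost:10001 -n etl

If the exception still names dev after connecting to the etl port, the two
instances may be sharing some state (for example the same
HIVE_SERVER2_THRIFT_PORT environment setting), which would be worth isolating
next.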


Regards.





hive.exec.stagingdir has no effect in Spark 2.0.1

2016-10-18 Thread

Hi,
 I have set the property "hive.exec.stagingdir" to the HDFS directory
"/tmp/spark_log/${user.name}/.hive-staging" in hive-site.xml, but it has no
effect in Spark 2.0.1: the staging directories are still created inside the
table locations. It works in Spark 1.6, which creates the hive-staging files
under "/tmp/spark_log/${user.name}/.hive-staging", so I can batch-delete the
files with a shell script.
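
A hedged workaround to try: Spark copies any property prefixed with
spark.hadoop. verbatim into the Hadoop Configuration it uses, so passing the
setting on the command line sidesteps the question of whether hive-site.xml is
being honoured. This is a sketch, not a confirmed fix for 2.0.1 (Spark 2.x
resolves the staging directory itself and may still place it relative to the
table location):

    # hypothetical invocation; $USER stands in for Hive's ${user.name}
    ./bin/spark-sql \
      --conf spark.hadoop.hive.exec.stagingdir=/tmp/spark_log/$USER/.hive-staging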





How to share cached tables when the Thrift server runs in multi-session mode in Spark 1.6

2016-06-03 Thread
 Hi,

  I created a cached table through session A via beeline, through which I
am able to access the data. I tried to access this cached table from another
session, but I cannot find it.

I got the solution from the Spark site itself:

From Spark 1.6, by default the Thrift server runs in multi-session mode, which
means each JDBC/ODBC connection owns a copy of its own SQL configuration and
temporary function registry. Cached tables are still shared, though. If you
prefer to run the Thrift server in the old single-session mode, please set the
option spark.sql.hive.thriftServer.singleSession to true. You may either add
this option to spark-defaults.conf, or pass it to start-thriftserver.sh via
--conf:
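
The snippet that colon refers to looks like this (from the Spark SQL
programming guide):

    ./sbin/start-thriftserver.sh \
      --conf spark.sql.hive.thriftServer.singleSession=true \
      ...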


What is the meaning of “Cached tables are still shared though”?
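
My hedged reading of that sentence: in multi-session mode each connection gets
its own temporary tables, so a temp table registered in session A is simply
invisible to session B, which matches what you observed. What remains shared is
the in-memory cache of persistent (metastore) tables: if one session runs
CACHE TABLE on a real table, every other session querying that table is served
from the same cache. A hypothetical beeline sketch (host, port, and the table
name "sales" are placeholders):

    # session A caches a persistent Hive table
    beeline -u jdbc:hive2://host:10000 -e "CACHE TABLE sales;"

    # session B is a separate JDBC connection; the name resolves because
    # sales lives in the metastore, and the scan is served from the
    # shared in-memory cache populated by session A
    beeline -u jdbc:hive2://host:10000 -e "SELECT COUNT(*) FROM sales;"

A TEMPORARY table created in session A, by contrast, would not be visible in
session B at all unless you switch to single-session mode.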





Regards

Reply: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread
Thanks for sharing!
Please include me too.

From: Mich Talebzadeh
Sent: 2016/5/18 5:16
To: user @spark
Subject: Re: My notes on Spark Performance & Tuning Guide

Hi all,

Many thanks for your tremendous interest in the forthcoming notes. I have
had nearly thirty requests and many kind, supportive words from the
colleagues in this forum.

I will strive to get the first draft ready as soon as possible. Apologies
for not being more specific; hopefully it will not be too long before it is
ready for your perusal.


Regards,


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 12 May 2016 at 11:08, Mich Talebzadeh  wrote:

> Hi all,
>
>
> Following the threads in the Spark forum, I decided to write up on the
> configuration of Spark, including allocation of resources, configuration
> of the driver, executors, and threads, execution of Spark apps, and general
> troubleshooting, taking into account the allocation of resources for Spark
> applications and the OS tools at our disposal.
>
> Since the most widespread configuration, as far as I can see, is "Spark
> Standalone Mode", I have decided to write these notes starting with
> Standalone and later moving on to YARN:
>
>
>
> - *Standalone* – a simple cluster manager included with Spark that makes
> it easy to set up a cluster.
> - *YARN* – the resource manager in Hadoop 2.
>
>
> I would appreciate it if anyone interested in reading and commenting would
> get in touch with me directly at mich.talebza...@gmail.com so I can send the
> write-up for their review and comments.
>
>
> Just to be clear, this is not meant to be a commercial proposition or
> anything like that. As I often get involved with members' troubleshooting
> issues and threads on this topic, I thought it worthwhile to write a note
> summarising the findings for the benefit of the community.
>
>
> Regards.
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>


Inserting into a partitioned table takes a long time

2016-04-27 Thread

 Hello Sir/Madam,
   I want to insert into a partitioned table using dynamic partitioning (about
300 GB; the destination table is created in ORC format),
   but the stage "get_partition_with_auth" takes a long time,
   even though I have set:

hive.exec.dynamic.partition=true

hive.exec.dynamic.partition.mode="nonstrict"

   The following is my environment:
   hadoop 2.5.0-cdh5.2.1
   hive 0.13.1
   spark-1.6.1-bin-2.5.0-cdh5.2.1 (I have recompiled it, but hive.version=1.2.1)
   
   I found an issue: https://issues.apache.org/jira/browse/SPARK-11785
  When deployed against a remote Hive metastore, the execution Hive client
points to the actual Hive metastore rather than the local execution Derby
metastore, using the Hive 1.2.1 libraries delivered together with Spark
(SPARK-11783).
JDBC calls are not properly dispatched to the metastore Hive client in the
Thrift server, but are handled by execution Hive (SPARK-9686).
When a JDBC call like getSchemas() comes in, the execution Hive client, which
uses a higher version (1.2.1), is used to talk to a lower-version Hive
metastore (0.13.1). Because of incompatible changes made between these two
versions, the Thrift RPC call fails and exceptions are thrown.
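
A hedged suggestion based on that JIRA: Spark can be told which metastore
version to talk to, so that the built-in 1.2.1 client is not used against a
0.13.1 metastore. Something like the sketch below, where the jar paths are
placeholders for the CDH Hive 0.13.1 client jars (I cannot confirm this
removes the get_partition_with_auth slowness, but it eliminates the version
mismatch the JIRA describes):

    ./bin/spark-sql \
      --conf spark.sql.hive.metastore.version=0.13.1 \
      --conf spark.sql.hive.metastore.jars=/path/to/hive-0.13.1-jars/*:/path/to/hadoop-jars/*

Separately, the log below says "underlying DB is DERBY", which suggests this
spark-sql session is not seeing hive-site.xml at all and is falling back to a
local Derby metastore; making sure hive-site.xml is in Spark's conf/ directory
would be the first thing to check.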
  
   When I run bin/spark-sql, here is the info:
   16/04/28 11:08:59 INFO metastore.MetaStoreDirectSql: Using direct SQL, 
underlying DB is DERBY
   16/04/28 11:08:59 INFO metastore.ObjectStore: Initialized ObjectStore
   16/04/28 11:08:59 WARN metastore.ObjectStore: Version information not found 
in metastore. hive.metastore.schema.verification is not enabled so recording 
the schema version 1.2.0
   16/04/28 11:08:59 WARN metastore.ObjectStore: Failed to get database 
default, returning NoSuchObjectException
   16/04/28 11:08:59 INFO metastore.HiveMetaStore: Added admin role in metastore
   16/04/28 11:08:59 INFO metastore.HiveMetaStore: Added public role in 
metastore
   16/04/28 11:09:00 INFO metastore.HiveMetaStore: No user is added in admin 
role, since config is empty
   16/04/28 11:09:00 INFO metastore.HiveMetaStore: 0: get_all_databases
   16/04/28 11:09:00 INFO HiveMetaStore.audit: ugi=ocdc ip=unknown-ip-addr cmd=get_all_databases
   16/04/28 11:09:00 INFO metastore.HiveMetaStore: 0: get_functions: db=default 
pat=*
   16/04/28 11:09:00 INFO HiveMetaStore.audit: ugi=ocdc ip=unknown-ip-addr cmd=get_functions: db=default pat=*
   
   
So can you suggest an optimized approach, or do I have to upgrade my Hadoop
and Hive versions?

 Thanks