Re: Spark job is failing with kerberos error while creating hive context in yarn-cluster mode (through spark-submit)
I have a custom hive-site.xml for Spark in Spark's conf directory. These properties are the minimal ones that you need for Spark, I believe.

hive.metastore.kerberos.principal = copy from your hive-site.xml, i.e. "hive/_h...@foo.com"
hive.metastore.uris = copy from your hive-site.xml, i.e. thrift://ms1.foo.com:9083
hive.metastore.sasl.enabled = true
hive.security.authorization.enabled = false

Cheers,

Doug

> On May 23, 2016, at 7:41 AM, Chandraprakash Bhagtani wrote:
>
> Hi,
>
> My Spark job is failing with kerberos issues while creating hive context in
> yarn-cluster mode. However it is running with yarn-client mode. My spark
> version is 1.6.1
>
> I am passing hive-site.xml through the --files option.
>
> I tried searching online and found that the same issue is fixed with the
> following jira: SPARK-6207. It is fixed in spark 1.4, but I am running 1.6.1.
>
> Am I missing any configuration here?
>
> --
> Thanks & Regards,
> Chandra Prakash Bhagtani

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
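Laid out as an actual hive-site.xml, the minimal file Doug describes would look something like the sketch below. The principal and metastore URI are placeholders in the style of his examples; copy the real values from your cluster's hive-site.xml.

```xml
<configuration>
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@EXAMPLE.COM</value> <!-- placeholder: copy from your hive-site.xml -->
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value> <!-- placeholder -->
  </property>
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>false</value>
  </property>
</configuration>
```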
Error trying to get DF for Hive table stored in HBase
I’m trying to create a DF for an external Hive table that is stored in HBase. I get a NoSuchMethodError:

org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe$SerDeParameters;

I’m running Spark 1.6.0 on HDP 2.2.4-12-1 (Hive 0.14 and HBase 0.98.4) in secure mode. Anybody see this before? Below is a stack trace and the hive table’s info.

scala> sqlContext.table("item_data_lib.pcn_item")
java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe$SerDeParameters;
  at org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:93)
  at org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:92)
  at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
  at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
  at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
  at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
  at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
  at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
  at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:331)
  at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:326)
  at scala.Option.map(Option.scala:145)
  at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:326)
  at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:321)
  at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:279)
  at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:226)
  at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:225)
  at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:268)
  at org.apache.spark.sql.hive.client.ClientWrapper.getTableOption(ClientWrapper.scala:321)
  at org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:122)
  at org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:60)
  at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:384)
  at org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:457)
  at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)
  at org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:457)
  at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831)
  at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)

hive> show create table item_data_lib.pcn_item;
OK
CREATE EXTERNAL TABLE `item_data_lib.pcn_item`(
  `key` string COMMENT 'from deserializer',
  `p1` string COMMENT 'from deserializer',
  `p2` string COMMENT 'from deserializer',
  `p3` string COMMENT 'from deserializer',
  `p4` string COMMENT 'from deserializer',
  `p5` string COMMENT 'from deserializer',
  `p6` string COMMENT 'from deserializer',
  `p7` string COMMENT 'from deserializer',
  `p8` string COMMENT 'from deserializer',
  `p9` string COMMENT 'from deserializer',
  `p10` string COMMENT 'from deserializer',
  `p11` string COMMENT 'from deserializer',
  `p12` string COMMENT 'from deserializer',
  `p13` string COMMENT 'from deserializer',
  `d1` string COMMENT 'from deserializer',
  `d2` string COMMENT 'from deserializer',
  `d3` string COMMENT 'from deserializer',
  `d4` string COMMENT 'from deserializer',
  `d5` string COMMENT 'from deserializer',
  `d6` string COMMENT 'from deserializer',
  `d7` string COMMENT 'from deserializer',
  `d8` string COMMENT 'from deserializer',
  `d9` string COMMENT 'from deserializer',
  `d10` string COMMENT 'from deserializer',
  `d11` string COMMENT 'from deserializer',
  `d12` string COMMENT 'from deserializer',
  `d13` string COMMENT 'from deserializer',
  `d14` string COMMENT 'from deserializer',
  `d15` string COMMENT 'from deserializer',
  `d16` string COMMENT 'from deserializer',
  `d17` string COMMENT 'from deserializer')
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'=':key,p:p1,p:p2,p:p3,p:p4,p:p5,p:p6,p:
Re: Accessing external Kerberised resources from Spark executors in Yarn client/cluster mode
Another thing to check is to make sure each one of your executor nodes has the JCE jars installed:

  try { javax.crypto.Cipher.getMaxAllowedKeyLength("AES") > 128 }
  catch { case e: java.security.NoSuchAlgorithmException => false }

Setting "-Dsun.security.krb5.debug=true" and "-Dsun.security.jgss.debug=true" in spark.executor.extraJavaOptions and running loginUserFromKeytab() will generate a lot of info in the executor logs, which might be helpful to figure out what is going on too.

Cheers,

Doug

> On Oct 22, 2015, at 7:59 AM, Deenar Toraskar wrote:
>
> Hi All
>
> I am trying to access a SQLServer that uses Kerberos for authentication from
> Spark. I can successfully connect to the SQLServer from the driver node, but
> any connections to SQLServer from executors fail with "Failed to find any
> Kerberos tgt".
>
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser on the driver
> returns myPrincipal (auth:KERBEROS) as expected. And the same call on the
> executors,
>
> sc.parallelize(0 to 10).map { _ => (("hostname".!!).trim,
>   UserGroupInformation.getCurrentUser.toString)}.collect.distinct
>
> returns
>
> Array((hostname1, myprincipal (auth:SIMPLE), (hostname2, myprincipal (auth:SIMPLE))
>
> I tried passing the keytab and logging in explicitly from the executors, but
> that didn't help either.
>
> sc.parallelize(0 to 10).map { _ =>
>   (SparkHadoopUtil.get.loginUserFromKeytab("myprincipal", SparkFiles.get("myprincipal.keytab")),
>    ("hostname".!!).trim,
>    UserGroupInformation.getCurrentUser.toString)}.collect.distinct
>
> Digging deeper I found SPARK-6207 and came across code for each Kerberised
> service that is accessed from the executors in Yarn Client, such as
>
> obtainTokensForNamenodes(nns, hadoopConf, credentials)
> obtainTokenForHiveMetastore(hadoopConf, credentials)
>
> I was wondering if anyone has been successful in accessing external resources
> (running external to the Hadoop cluster) secured by Kerberos in Spark
> executors running in Yarn. 
> > > > Regards > Deenar > > > On 20 April 2015 at 21:58, Andrew Lee wrote: > Hi All, > > Affected version: spark 1.2.1 / 1.2.2 / 1.3-rc1 > > Posting this problem to user group first to see if someone is encountering > the same problem. > > When submitting spark jobs that invokes HiveContext APIs on a Kerberos Hadoop > + YARN (2.4.1) cluster, > I'm getting this error. > > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > > Apparently, the Kerberos ticket is not on the remote data node nor computing > node since we don't > deploy Kerberos tickets, and that is not a good practice either. On the other > hand, we can't just SSH to every machine and run kinit for that users. This > is not practical and it is insecure. > > The point here is that shouldn't there be a delegation token during the doAs > to use the token instead of the ticket ? > I'm trying to understand what is missing in Spark's HiveContext API while a > normal MapReduce job that invokes Hive APIs will work, but not in Spark SQL. > Any insights or feedback are appreciated. > > Anyone got this running without pre-deploying (pre-initializing) all tickets > node by node? Is this worth filing a JIRA? 
> > > > 15/03/25 18:59:08 INFO hive.metastore: Trying to connect to metastore with > URI thrift://alee-cluster.test.testserver.com:9083 > 15/03/25 18:59:08 ERROR transport.TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) > at > org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:214) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) >
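Doug's JCE check from the top of the thread can be written as a small standalone Scala program (no Spark dependency); run it on each executor node, or wrap the same expression in a map() over a throwaway RDD to probe every node at once. This is a sketch, not part of the original thread.

```scala
import javax.crypto.Cipher
import java.security.NoSuchAlgorithmException

object JceCheck {
  // True when the JCE Unlimited Strength policy files are installed,
  // i.e. AES key lengths beyond 128 bits are allowed.
  def unlimitedStrength: Boolean =
    try Cipher.getMaxAllowedKeyLength("AES") > 128
    catch { case _: NoSuchAlgorithmException => false }

  def main(args: Array[String]): Unit =
    println(s"JCE unlimited strength available: $unlimitedStrength")
}
```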
Re: Spark_1.5.1_on_HortonWorks
I have been running 1.5.1 with Hive in secure mode on HDP 2.2.4 without any problems.

Doug

> On Oct 21, 2015, at 12:05 AM, Ajay Chander wrote:
>
> Hi Everyone,
>
> Any one has any idea if spark-1.5.1 is available as a service on HortonWorks?
> I have spark-1.3.1 installed on the Cluster and it is a HortonWorks
> distribution. Now I want to upgrade it to spark-1.5.1. Anyone here have any
> idea about it? Thank you in advance.
>
> Regards,
> Ajay
Re: Unable to start spark-shell on YARN
The error is because the shell is trying to resolve hdp.version and can’t. To fix this, you need to put a file called java-opts in your conf directory that has something like this:

  -Dhdp.version=2.x.x.x

where 2.x.x.x is the version of HDP that you are using.

Cheers,

Doug

> On Sep 24, 2015, at 6:11 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>
> Spark 1.4.1
> YARN
> Hadoop version: 2.7.1.2.3.1.0-2574
> ./bin/spark-shell --master yarn
> Hadoop cluster setup using Ambari.
>
> Shell fails as the YARN job failed. Any suggestions?
>
> LOGS:
>
> 15/09/24 15:07:51 INFO impl.YarnClientImpl: Submitted application application_1443126834156_0016
> 15/09/24 15:07:52 INFO yarn.Client: Application report for application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:52 INFO yarn.Client:
>   client token: N/A
>   diagnostics: N/A
>   ApplicationMaster host: N/A
>   ApplicationMaster RPC port: -1
>   queue: default
>   start time: 1443132471179
>   final status: UNDEFINED
>   tracking URL: http://host:8088/proxy/application_1443126834156_0016/
>   user: zeppelin
> 15/09/24 15:07:53 INFO yarn.Client: Application report for application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:54 INFO yarn.Client: Application report for application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:55 INFO yarn.Client: Application report for application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:56 INFO yarn.Client: Application report for application_1443126834156_0016 (state: FAILED)
> 15/09/24 15:07:56 INFO yarn.Client:
>   client token: N/A
>   diagnostics: Application application_1443126834156_0016 failed 2 times
>   due to AM Container for appattempt_1443126834156_0016_02 exited with
>   exitCode: 1
> For more detailed output, check the application tracking
> page: http://host:8088/cluster/app/application_1443126834156_0016 Then, click
> on links to logs of each attempt.
> Diagnostics: Exception from container-launch. 
> Container id: container_e03_1443126834156_0016_02_01 > Exit code: 1 > Exception message: > /hadoop/yarn/local/usercache/zeppelin/appcache/application_1443126834156_0016/container_e03_1443126834156_0016_02_01/launch_container.sh: > line 24: > $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: > bad substitution > > Stack trace: ExitCodeException exitCode=1: > /hadoop/yarn/local/usercache/zeppelin/appcache/application_1443126834156_0016/container_e03_1443126834156_0016_02_01/launch_container.sh: > line 24: > 
$PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: > bad substitution > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > at org.apache.hadoop.util.Shell.run(Shell.java:456) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolEx
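Doug's java-opts fix above can be scripted. In this sketch, SPARK_CONF_DIR and the HDP version string (2.3.1.0-2574, taken from the reporter's Hadoop version 2.7.1.2.3.1.0-2574) are placeholders; use your own conf directory and the version your cluster actually runs.

```shell
# Write conf/java-opts so spark-shell can expand hdp.version.
# SPARK_CONF_DIR defaults to ./conf here for illustration.
SPARK_CONF_DIR="${SPARK_CONF_DIR:-./conf}"
mkdir -p "$SPARK_CONF_DIR"
echo "-Dhdp.version=2.3.1.0-2574" > "$SPARK_CONF_DIR/java-opts"
cat "$SPARK_CONF_DIR/java-opts"
```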
Re: Local Repartition
Hi Daniel,

Take a look at .coalesce(). I’ve seen good results by coalescing to (num executors * 10), but I’m still trying to figure out the optimal number of partitions per executor. To get the number of executors:

  sc.getConf.getInt("spark.executor.instances", -1)

Cheers,

Doug

> On Jul 20, 2015, at 5:04 AM, Daniel Haviv wrote:
>
> Hi,
> My data is constructed from a lot of small files which results in a lot of
> partitions per RDD.
> Is there some way to locally repartition the RDD without shuffling, so that
> all of the partitions that reside on a specific node will become X partitions
> on the same node?
>
> Thank you.
> Daniel
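A sketch of the pattern Doug describes, assuming an existing SparkContext `sc` and an RDD `rdd` (the ×10 factor is his observed heuristic, not a guaranteed optimum):

```scala
// Shrink a many-partition RDD without a full shuffle: with
// shuffle = false, coalesce merges existing partitions instead of
// redistributing every record across the cluster.
val numExecutors = sc.getConf.getInt("spark.executor.instances", -1)
val target =
  if (numExecutors > 0) numExecutors * 10
  else rdd.partitions.length // config key absent: leave partitioning alone
val compacted = rdd.coalesce(target, shuffle = false)
```

Note that coalesce without a shuffle can only merge partitions, never increase their number, which fits Daniel's many-small-files case.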
Re: Spark 1.4 on HortonWork HDP 2.2
If you run Hadoop in secure mode and want to talk to Hive 0.14, it won’t work; see SPARK-5111. I have a patched version of 1.3.1 that I’ve been using. I haven’t had the time to get 1.4.0 working.

Cheers,

Doug

> On Jun 19, 2015, at 8:39 AM, ayan guha wrote:
>
> I think you can get spark 1.4 pre-built with hadoop 2.6 (as that is what hdp 2.2
> provides) and just start using it
>
> On Fri, Jun 19, 2015 at 10:28 PM, Ashish Soni wrote:
> I do not know where to start, as Spark 1.2 comes bundled with HDP2.2 but i want to
> use 1.4 and i do not know how to update it to 1.4
>
> Ashish
>
> On Fri, Jun 19, 2015 at 8:26 AM, ayan guha wrote:
> what problem are you facing? are you trying to build it yourself or
> getting a pre-built version?
>
> On Fri, Jun 19, 2015 at 10:22 PM, Ashish Soni wrote:
> Hi,
>
> Is any one able to install Spark 1.4 on HDP 2.2? Please let me know how
> i can do the same.
>
> Ashish
>
> --
> Best Regards,
> Ayan Guha
Re: Spark 1.4.0-rc4 HiveContext.table("db.tbl") NoSuchTableException
Hi Yin,

Thanks for the suggestion. I’m not happy about this, and I don’t agree with your position that since it wasn’t an “officially” supported feature, no harm was done breaking it in the course of implementing SPARK-6908. I would still argue that it changed and therefore broke .table()’s API. (As you know, I’ve filed 2 bugs regarding this, SPARK-8105 and SPARK-8107.)

I’m done complaining about this issue. My short term plan is to change my code for 1.4.0 and possibly work on a cleaner solution for 1.5.0 that will be acceptable.

Thanks for looking into it and responding to my initial email.

Doug

> On Jun 5, 2015, at 3:36 PM, Yin Huai wrote:
>
> Hi Doug,
>
> For now, I think you can use "sqlContext.sql("USE databaseName")" to change
> the current database.
>
> Thanks,
>
> Yin
>
> On Thu, Jun 4, 2015 at 12:04 PM, Yin Huai wrote:
> Hi Doug,
>
> sqlContext.table does not officially support database name. It only supports
> table name as the parameter. We will add a method to support database name in
> future.
>
> Thanks,
>
> Yin
>
> On Thu, Jun 4, 2015 at 8:10 AM, Doug Balog wrote:
> Hi Yin,
> I’m very surprised to hear that it's not supported in 1.3, because I’ve been
> using it since 1.3.0. It worked great up until SPARK-6908 was merged into master.
>
> What is the supported way to get a DF for a table that is not in the default
> database?
>
> IMHO, if you are not going to support “databaseName.tableName”,
> sqlContext.table() should have a version that takes a database and a table, i.e.
>
> def table(databaseName: String, tableName: String): DataFrame =
>   DataFrame(this, catalog.lookupRelation(Seq(databaseName, tableName)))
>
> The handling of databases in Spark (sqlContext, hiveContext, Catalog) could be
> better.
>
> Thanks,
>
> Doug
>
> > On Jun 3, 2015, at 8:21 PM, Yin Huai wrote:
> >
> > Hi Doug,
> >
> > Actually, sqlContext.table does not support database name in both Spark 1.3
> > and Spark 1.4. We will support it in future version. 
> > > > Thanks, > > > > Yin > > > > > > > > On Wed, Jun 3, 2015 at 10:45 AM, Doug Balog > > wrote: > > Hi, > > > > sqlContext.table(“db.tbl”) isn’t working for me, I get a > > NoSuchTableException. > > > > But I can access the table via > > > > sqlContext.sql(“select * from db.tbl”) > > > > So I know it has the table info from the metastore. > > > > Anyone else see this ? > > > > I’ll keep digging. > > I compiled via make-distribution -Pyarn -phadoop-2.4 -Phive > > -Phive-thriftserver > > It worked for me in 1.3.1 > > > > Cheers, > > > > Doug > > > > > > - > > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > > For additional commands, e-mail: user-h...@spark.apache.org > > > > > > > - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
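Yin's suggested workaround from this thread, as it would look in 1.4.0 (database and table names are placeholders):

```scala
// Switch the session's current database first, then look the table
// up by its bare name -- avoids passing "db.tbl" to table().
sqlContext.sql("USE databaseName")
val df = sqlContext.table("tbl")
```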
Re: Spark 1.4.0-rc4 HiveContext.table("db.tbl") NoSuchTableException
Hi Yin, I’m very surprised to hear that its not supported in 1.3 because I’ve been using it since 1.3.0. It worked great up until SPARK-6908 was merged into master. What is the supported way to get DF for a table that is not in the default database ? IMHO, If you are not going to support “databaseName.tableName”, sqlContext.table() should have a version that takes a database and a table, ie def table(databaseName: String, tableName: String): DataFrame = DataFrame(this, catalog.lookupRelation(Seq(databaseName,tableName))) The handling of databases in Spark(sqlContext, hiveContext, Catalog) could be better. Thanks, Doug > On Jun 3, 2015, at 8:21 PM, Yin Huai wrote: > > Hi Doug, > > Actually, sqlContext.table does not support database name in both Spark 1.3 > and Spark 1.4. We will support it in future version. > > Thanks, > > Yin > > > > On Wed, Jun 3, 2015 at 10:45 AM, Doug Balog wrote: > Hi, > > sqlContext.table(“db.tbl”) isn’t working for me, I get a NoSuchTableException. > > But I can access the table via > > sqlContext.sql(“select * from db.tbl”) > > So I know it has the table info from the metastore. > > Anyone else see this ? > > I’ll keep digging. > I compiled via make-distribution -Pyarn -phadoop-2.4 -Phive > -Phive-thriftserver > It worked for me in 1.3.1 > > Cheers, > > Doug > > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark 1.4.0-rc4 HiveContext.table("db.tbl") NoSuchTableException
Hi,

sqlContext.table(“db.tbl”) isn’t working for me; I get a NoSuchTableException.

But I can access the table via

  sqlContext.sql("select * from db.tbl")

So I know it has the table info from the metastore. Anyone else see this? I’ll keep digging.

I compiled via make-distribution -Pyarn -Phadoop-2.4 -Phive -Phive-thriftserver
It worked for me in 1.3.1

Cheers,

Doug
Re: Spark Job triggers second attempt
I bet you are running on YARN in cluster mode.

If you are running on yarn in client mode, .set("spark.yarn.maxAppAttempts", "1") works as you expect, because YARN doesn’t start your app on the cluster until you call SparkContext().

But if you are running on yarn in cluster mode, the driver program runs on a cluster node, so your app is already running on the cluster by the time you call .set(). To make it work in cluster mode, the property must be set on the spark-submit command line via

  --conf spark.yarn.maxAppAttempts=1

or

  --driver-java-options "-Dspark.yarn.maxAppAttempts=1"

A note should be added to running-on-yarn.html in the "Important notes" section saying that in cluster mode you need to set spark.yarn.* properties from the spark-submit command line.

Cheers,

Doug

> On May 7, 2015, at 2:34 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>
> How i can stop Spark to stop triggering second attempt in case the first
> fails. I do not want to wait for the second attempt to fail again so that i
> can debug faster.
>
> .set("spark.yarn.maxAppAttempts", "0") OR .set("spark.yarn.maxAppAttempts", "1")
>
> is not helping.
>
> --
> Deepak
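For cluster mode, the submit line Doug describes would look something like this sketch (the application jar and main class are placeholders):

```shell
spark-submit \
  --master yarn-cluster \
  --conf spark.yarn.maxAppAttempts=1 \
  --class com.example.MyApp \
  my-app.jar
```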
Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class
The “best” solution to spark-shell’s problem is creating a file $SPARK_HOME/conf/java-opts with “-Dhdp.version=2.2.0.0-2041”.

Cheers,

Doug

> On Mar 28, 2015, at 1:25 PM, Michael Stone wrote:
>
> I've also been having trouble running 1.3.0 on HDP. The
> spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
> configuration directive seems to work with pyspark, but not propagate when
> using spark-shell. (That is, everything works fine with pyspark, and
> spark-shell fails with the "bad substitution" message.)
>
> Mike Stone
Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class
I found the problem.

In mapred-site.xml, mapreduce.application.classpath has references to “${hdp.version}” which are not getting replaced when launch_container.sh is created. The executor fails with a substitution error at line 27 in launch_container.sh because bash can’t deal with “${hdp.version}”. I have hdp.version defined in my spark-defaults.conf via

  spark.{driver,yarn.am}.extraJavaOptions -Dhdp.version=2.2.0.0-2041

so something is not doing the substitution.

To work around this problem, I replaced "${hdp.version}” with “current” in mapred-site.xml.

I found a similar bug, https://issues.apache.org/jira/browse/AMBARI-8028, and the fix was exactly what I did to work around it. Not sure if this is an AMBARI bug (not doing variable substitution when writing mapred-site.xml) or a YARN bug (it's not doing the variable substitution when writing launch_container.sh). Anybody have an opinion?

Doug

> On Mar 19, 2015, at 5:51 PM, Doug Balog wrote:
>
> I’m seeing the same problem.
> I’ve set logging to DEBUG, and I think some hints are in the “Yarn AM launch
> context” that is printed out before Yarn runs java.
>
> My next step is to talk to the admins and get them to set
> yarn.nodemanager.delete.debug-delay-sec in the config, as recommended in
> http://spark.apache.org/docs/latest/running-on-yarn.html
> Then I can see exactly what's in the directory.
>
> Doug
>
> ps Sorry for the dup message Bharath and Todd, used wrong email address.
>
>> On Mar 19, 2015, at 1:19 AM, Bharath Ravi Kumar wrote:
>>
>> Thanks for clarifying Todd. This may then be an issue specific to the HDP
>> version we're using. Will continue to debug and post back if there's any
>> resolution.
>>
>> On Thu, Mar 19, 2015 at 3:40 AM, Todd Nist wrote:
>> Yes I believe you are correct.
>>
>> For the build you may need to specify the specific HDP version of hadoop to
>> use with the -Dhadoop.version=. I went with the default 2.6.0, but
>> Horton may have a vendor specific version that needs to go here. 
I know I >> saw a similar post today where the solution was to use >> -Dhadoop.version=2.5.0-cdh5.3.2 but that was for a cloudera installation. I >> am not sure what the HDP version would be to put here. >> >> -Todd >> >> On Wed, Mar 18, 2015 at 12:49 AM, Bharath Ravi Kumar >> wrote: >> Hi Todd, >> >> Yes, those entries were present in the conf under the same SPARK_HOME that >> was used to run spark-submit. On a related note, I'm assuming that the >> additional spark yarn options (like spark.yarn.jar) need to be set in the >> same properties file that is passed to spark-submit. That apart, I assume >> that no other host on the cluster should require a "deployment of" the spark >> distribution or any other config change to support a spark job. Isn't that >> correct? >> >> On Tue, Mar 17, 2015 at 6:19 PM, Todd Nist wrote: >> Hi Bharath, >> >> Do you have these entries in your $SPARK_HOME/conf/spark-defaults.conf file? >> >> spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 >> spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 >> >> >> >> >> On Tue, Mar 17, 2015 at 1:04 AM, Bharath Ravi Kumar >> wrote: >> Still no luck running purpose-built 1.3 against HDP 2.2 after following all >> the instructions. Anyone else faced this issue? >> >> On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar >> wrote: >> Hi Todd, >> >> Thanks for the help. I'll try again after building a distribution with the >> 1.3 sources. However, I wanted to confirm what I mentioned earlier: is it >> sufficient to copy the distribution only to the client host from where >> spark-submit is invoked(with spark.yarn.jar set), or is there a need to >> ensure that the entire distribution is deployed made available pre-deployed >> on every host in the yarn cluster? I'd assume that the latter shouldn't be >> necessary. >> >> On Mon, Mar 16, 2015 at 8:38 PM, Todd Nist wrote: >> Hi Bharath, >> >> I ran into the same issue a few days ago, here is a link to a post on >> Horton's fourm. 
http://hortonworks.com/community/forums/search/spark+1.2.1/ >> Incase anyone else needs to perform this these are the steps I took to get >> it to work with Spark 1.2.1 as well as Spark 1.3.0-RC3: >> >> 1. Pull 1.2.1 Source >> 2. Apply the following patches >> a. Address jackson version, https://github.com/apache/spark/pull/3938 >> b. Address the propagation of the hdp.version set in
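Doug's workaround from the top of this thread amounts to an edit like the following in mapred-site.xml. The classpath value is illustrative and elided; only the ${hdp.version} → current substitution matters, and "current" works because HDP maintains /usr/hdp/current as a symlink to the installed version.

```xml
<!-- Before: bash chokes on the unexpanded variable in launch_container.sh -->
<!-- <value>...:/usr/hdp/${hdp.version}/hadoop/lib/*:...</value> -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>...:/usr/hdp/current/hadoop/lib/*:...</value>
</property>
```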
Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class
I’m seeing the same problem. I’ve set logging to DEBUG, and I think some hints are in the “Yarn AM launch context” that is printed out before Yarn runs java. My next step is to talk to the admins and get them to set yarn.nodemanager.delete.debug-delay-sec in the config, as recommended in http://spark.apache.org/docs/latest/running-on-yarn.html Then I can see exactly whats in the directory. Doug ps Sorry for the dup message Bharath and Todd, used wrong email address. > On Mar 19, 2015, at 1:19 AM, Bharath Ravi Kumar wrote: > > Thanks for clarifying Todd. This may then be an issue specific to the HDP > version we're using. Will continue to debug and post back if there's any > resolution. > > On Thu, Mar 19, 2015 at 3:40 AM, Todd Nist wrote: > Yes I believe you are correct. > > For the build you may need to specify the specific HDP version of hadoop to > use with the -Dhadoop.version=. I went with the default 2.6.0, but > Horton may have a vendor specific version that needs to go here. I know I > saw a similar post today where the solution was to use > -Dhadoop.version=2.5.0-cdh5.3.2 but that was for a cloudera installation. I > am not sure what the HDP version would be to put here. > > -Todd > > On Wed, Mar 18, 2015 at 12:49 AM, Bharath Ravi Kumar > wrote: > Hi Todd, > > Yes, those entries were present in the conf under the same SPARK_HOME that > was used to run spark-submit. On a related note, I'm assuming that the > additional spark yarn options (like spark.yarn.jar) need to be set in the > same properties file that is passed to spark-submit. That apart, I assume > that no other host on the cluster should require a "deployment of" the spark > distribution or any other config change to support a spark job. Isn't that > correct? > > On Tue, Mar 17, 2015 at 6:19 PM, Todd Nist wrote: > Hi Bharath, > > Do you have these entries in your $SPARK_HOME/conf/spark-defaults.conf file? 
> > spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 > spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 > > > > > On Tue, Mar 17, 2015 at 1:04 AM, Bharath Ravi Kumar > wrote: > Still no luck running purpose-built 1.3 against HDP 2.2 after following all > the instructions. Anyone else faced this issue? > > On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar > wrote: > Hi Todd, > > Thanks for the help. I'll try again after building a distribution with the > 1.3 sources. However, I wanted to confirm what I mentioned earlier: is it > sufficient to copy the distribution only to the client host from where > spark-submit is invoked(with spark.yarn.jar set), or is there a need to > ensure that the entire distribution is deployed made available pre-deployed > on every host in the yarn cluster? I'd assume that the latter shouldn't be > necessary. > > On Mon, Mar 16, 2015 at 8:38 PM, Todd Nist wrote: > Hi Bharath, > > I ran into the same issue a few days ago, here is a link to a post on > Horton's fourm. http://hortonworks.com/community/forums/search/spark+1.2.1/ > Incase anyone else needs to perform this these are the steps I took to get it > to work with Spark 1.2.1 as well as Spark 1.3.0-RC3: > > 1. Pull 1.2.1 Source > 2. Apply the following patches > a. Address jackson version, https://github.com/apache/spark/pull/3938 > b. Address the propagation of the hdp.version set in the spark-default.conf, > https://github.com/apache/spark/pull/3409 > 3. build with $SPARK_HOME./make-distribution.sh –name hadoop2.6 –tgz -Pyarn > -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests > package > > Then deploy the resulting artifact => spark-1.2.1-bin-hadoop2.6.tgz following > instructions in the HDP Spark preview > http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/ > > FWIW spark-1.3.0 appears to be working fine with HDP as well and steps 2a and > 2b are not required. 
> > HTH > > -Todd > > > On Mon, Mar 16, 2015 at 10:13 AM, Bharath Ravi Kumar > wrote: > Hi, > > Trying to run spark ( 1.2.1 built for hdp 2.2) against a yarn cluster results > in the AM failing to start with following error on stderr: > Error: Could not find or load main class > org.apache.spark.deploy.yarn.ExecutorLauncher > An application id was assigned to the job, but there were no logs. Note that > the spark distribution has not been "installed" on every host in the cluster > and the aforementioned spark build was copied to one of the hadoop client > hosts in the cluster to launch the > job. Spark-submit was run with --master yarn-client and spark.yarn.jar was > set to the assembly jar from the above distribution. Switching the spark > distribution to the HDP recommended version > and following the instructions on this page did not fix the problem either. > Any idea what may have caused this error ? > > Thanks, > Bharath > > > > > > > - To unsubscribe, e-ma