Re: Spark job is failing with kerberos error while creating hive context in yarn-cluster mode (through spark-submit)

2016-05-23 Thread Doug Balog
I have a custom hive-site.xml for Spark in Spark's conf directory.
These properties are the minimal ones that you need for Spark, I believe.

hive.metastore.kerberos.principal = copy from your hive-site.xml,  i.e.  
"hive/_h...@foo.com"
hive.metastore.uris = copy from your hive-site.xml,  i.e. 
thrift://ms1.foo.com:9083
hive.metastore.sasl.enabled = true
hive.security.authorization.enabled = false
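
If it helps, here is a rough programmatic equivalent of those entries, as a sketch only.
The principal is a placeholder, the URI is just the example from above, and I have not
tested whether setting them this late works against a secure metastore; I keep them in
hive-site.xml as described above.

// Sketch: mirror the hive-site.xml entries above via setConf (placeholder values).
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hive-kerberos-example")) // hypothetical app name
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.kerberos.principal", "hive/_HOST@EXAMPLE.COM") // placeholder principal
hiveContext.setConf("hive.metastore.uris", "thrift://ms1.foo.com:9083")
hiveContext.setConf("hive.metastore.sasl.enabled", "true")
hiveContext.setConf("hive.security.authorization.enabled", "false")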

Cheers,

Doug



> On May 23, 2016, at 7:41 AM, Chandraprakash Bhagtani  
> wrote:
> 
> Hi,
> 
> My Spark job is failing with Kerberos issues while creating a hive context in 
> yarn-cluster mode. However, it runs in yarn-client mode. My Spark 
> version is 1.6.1.
> 
> I am passing hive-site.xml through the --files option. 
> 
> I tried searching online and found that the same issue is fixed by the 
> following JIRA: SPARK-6207. It is fixed in Spark 1.4, but I am running 1.6.1.
> 
> Am I missing any configuration here?
> 
> 
> -- 
> Thanks & Regards,
> Chandra Prakash Bhagtani





Error trying to get DF for Hive table stored HBase

2016-02-02 Thread Doug Balog
I’m trying to create a DF for an external Hive table that is in HBase. 
I get a NoSuchMethodError 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe$SerDeParameters;

I’m running Spark 1.6.0 on HDP 2.2.4-12-1 (Hive 0.14 and HBase 0.98.4) in 
secure mode. 

Anybody see this before ?
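
For anyone who wants to poke at it, a small diagnostic sketch I would run in spark-shell
on the driver to see which jar each of the conflicting classes is loaded from (the class
names are taken from the stack trace below):

// Diagnostic sketch: print the jar each class is loaded from on the driver classpath.
Seq("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "org.apache.hadoop.hive.hbase.HBaseSerDe").foreach { name =>
  val src = Option(Class.forName(name).getProtectionDomain.getCodeSource)
  println(s"$name -> " + src.map(_.getLocation.toString).getOrElse("unknown source"))
}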

Below is a stack trace and the hive table’s info.

scala> sqlContext.table("item_data_lib.pcn_item")
java.lang.NoSuchMethodError: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe$SerDeParameters;
at 
org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:93)
at 
org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:92)
at 
org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:331)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:326)
at scala.Option.map(Option.scala:145)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:326)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:321)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:279)
at 
org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:226)
at 
org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:225)
at 
org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:268)
at 
org.apache.spark.sql.hive.client.ClientWrapper.getTableOption(ClientWrapper.scala:321)
at 
org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:122)
at 
org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:60)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:384)
at 
org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:457)
at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)
at 
org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:457)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)


hive> show create table item_data_lib.pcn_item;
OK
CREATE EXTERNAL TABLE `item_data_lib.pcn_item`(
  `key` string COMMENT 'from deserializer',
  `p1` string COMMENT 'from deserializer',
  `p2` string COMMENT 'from deserializer',
  `p3` string COMMENT 'from deserializer',
  `p4` string COMMENT 'from deserializer',
  `p5` string COMMENT 'from deserializer',
  `p6` string COMMENT 'from deserializer',
  `p7` string COMMENT 'from deserializer',
  `p8` string COMMENT 'from deserializer',
  `p9` string COMMENT 'from deserializer',
  `p10` string COMMENT 'from deserializer',
  `p11` string COMMENT 'from deserializer',
  `p12` string COMMENT 'from deserializer',
  `p13` string COMMENT 'from deserializer',
  `d1` string COMMENT 'from deserializer',
  `d2` string COMMENT 'from deserializer',
  `d3` string COMMENT 'from deserializer',
  `d4` string COMMENT 'from deserializer',
  `d5` string COMMENT 'from deserializer',
  `d6` string COMMENT 'from deserializer',
  `d7` string COMMENT 'from deserializer',
  `d8` string COMMENT 'from deserializer',
  `d9` string COMMENT 'from deserializer',
  `d10` string COMMENT 'from deserializer',
  `d11` string COMMENT 'from deserializer',
  `d12` string COMMENT 'from deserializer',
  `d13` string COMMENT 'from deserializer',
  `d14` string COMMENT 'from deserializer',
  `d15` string COMMENT 'from deserializer',
  `d16` string COMMENT 'from deserializer',
  `d17` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  

Re: Accessing external Kerberised resources from Spark executors in Yarn client/cluster mode

2015-10-22 Thread Doug Balog
Another thing to check is to make sure each one of your executor nodes has the 
JCE (unlimited strength) policy jars installed.

try { javax.crypto.Cipher.getMaxAllowedKeyLength("AES") > 128 }
catch { case e: java.security.NoSuchAlgorithmException => false }
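
A sketch of running that same check across the executors (it mirrors the sc.parallelize
trick further down in this thread; the 1 to 100 range is arbitrary, just enough work to
land on every executor):

// Sketch: run the JCE key-length check in the executor JVMs and collect per-host results.
import scala.sys.process._
sc.parallelize(1 to 100).map { _ =>
  val unlimited =
    try { javax.crypto.Cipher.getMaxAllowedKeyLength("AES") > 128 }
    catch { case _: java.security.NoSuchAlgorithmException => false }
  ("hostname".!!.trim, unlimited)   // executor hostname, and whether unlimited JCE is present
}.collect.distinct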

Setting "-Dsun.security.krb5.debug=true" and "-Dsun.security.jgss.debug=true" 
in spark.executor.extraJavaOptions
and running loginUserFromKeytab() will generate a lot of info in the executor 
logs, which might be helpful to figure out what is going on too.

Cheers,

Doug


> On Oct 22, 2015, at 7:59 AM, Deenar Toraskar  
> wrote:
> 
> Hi All
> 
> I am trying to access a SQLServer that uses Kerberos for authentication from 
> Spark. I can successfully connect to the SQLServer from the driver node, but 
> any connections to SQLServer from executors fails with "Failed to find any 
> Kerberos tgt". 
> 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser on the driver 
> returns myPrincipal (auth:KERBEROS) as expected. And the same call on 
> executors returns
> 
> sc.parallelize(0 to 10).map { _ =>(("hostname".!!).trim, 
> UserGroupInformation.getCurrentUser.toString)}.collect.distinct 
> 
> returns
> 
> Array((hostname1, myprincipal (auth:SIMPLE), (hostname2, myprincipal 
> (auth:SIMPLE))
> 
> 
> I tried passing the keytab and logging in explicitly from the executors, but 
> that didnt help either.
> 
> sc.parallelize(0 to 10).map { _ 
> =>(SparkHadoopUtil.get.loginUserFromKeytab("myprincipal",SparkFiles.get("myprincipal.keytab")),
>  ("hostname".!!).trim, 
> UserGroupInformation.getCurrentUser.toString)}.collect.distinct
> 
> Digging deeper I found SPARK-6207 and came across code for each Kerberised 
> service that is accessed from the executors in Yarn Client, such as
> 
> obtainTokensForNamenodes(nns, hadoopConf, credentials)
> 
> obtainTokenForHiveMetastore(hadoopConf, credentials)
> 
> I was wondering if anyone has been successful in accessing external resources 
> (running external to the Hadoop cluster) secured by Kerberos in Spark 
> executors running in Yarn. 
> 
> 
> 
> Regards
> Deenar
> 
> 
> On 20 April 2015 at 21:58, Andrew Lee  wrote:
> Hi All,
> 
> Affected version: spark 1.2.1 / 1.2.2 / 1.3-rc1
> 
> Posting this problem to user group first to see if someone is encountering 
> the same problem. 
> 
> When submitting spark jobs that invokes HiveContext APIs on a Kerberos Hadoop 
> + YARN (2.4.1) cluster, 
> I'm getting this error. 
> 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 
> Apparently, the Kerberos ticket is not on the remote data node nor computing 
> node since we don't 
> deploy Kerberos tickets, and that is not a good practice either. On the other 
> hand, we can't just SSH to every machine and run kinit for that users. This 
> is not practical and it is insecure.
> 
> The point here is that shouldn't there be a delegation token during the doAs 
> to use the token instead of the ticket ? 
> I'm trying to understand what is missing in Spark's HiveContext API while a 
> normal MapReduce job that invokes Hive APIs will work, but not in Spark SQL. 
> Any insights or feedback are appreciated.
> 
> Anyone got this running without pre-deploying (pre-initializing) all tickets 
> node by node? Is this worth filing a JIRA?
> 
> 
> 
> 15/03/25 18:59:08 INFO hive.metastore: Trying to connect to metastore with 
> URI thrift://alee-cluster.test.testserver.com:9083
> 15/03/25 18:59:08 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>   at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
>   at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>   at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>   at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>   at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
>   at 

Re: Spark_1.5.1_on_HortonWorks

2015-10-20 Thread Doug Balog
I have been running 1.5.1 with Hive in secure mode on HDP 2.2.4 without any 
problems.

Doug

> On Oct 21, 2015, at 12:05 AM, Ajay Chander  wrote:
> 
> Hi Everyone,
> 
> Any one has any idea if spark-1.5.1 is available as a service on HortonWorks 
> ? I have spark-1.3.1 installed on the Cluster and it is a HortonWorks 
> distribution. Now I want upgrade it to spark-1.5.1. Anyone here have any idea 
> about it? Thank you in advance.
> 
> Regards,
> Ajay





Re: Unable to start spark-shell on YARN

2015-09-24 Thread Doug Balog
The error is because the shell is trying to resolve hdp.version and can’t.
To fix this, you need to put a file called java-opts in your conf directory 
that  has something like this.

-Dhdp.version=2.x.x.x

Where 2.x.x.x is the version of HDP that you are using.
Cheers,

Doug

> On Sep 24, 2015, at 6:11 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)  wrote:
> 
> Spark 1.4.1
> YARN
> Hadoop version: 2.7.1.2.3.1.0-2574
> ./bin/spark-shell  --master yarn
> Hadoop cluster setup using Ambari.
> 
> 
> Shell fails as YARN job failed. Any suggestions ? 
> 
> LOGS:
> 
> 15/09/24 15:07:51 INFO impl.YarnClientImpl: Submitted application 
> application_1443126834156_0016
> 15/09/24 15:07:52 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:52 INFO yarn.Client: 
>client token: N/A
>diagnostics: N/A
>ApplicationMaster host: N/A
>ApplicationMaster RPC port: -1
>queue: default
>start time: 1443132471179
>final status: UNDEFINED
>tracking URL: http://host:8088/proxy/application_1443126834156_0016/
>user: zeppelin
> 15/09/24 15:07:53 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:54 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:55 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:56 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: FAILED)
> 15/09/24 15:07:56 INFO yarn.Client: 
>client token: N/A
>diagnostics: Application application_1443126834156_0016 failed 2 times 
> due to AM Container for appattempt_1443126834156_0016_02 exited with  
> exitCode: 1
> For more detailed output, check application tracking 
> page:http://host:8088/cluster/app/application_1443126834156_0016Then, click 
> on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e03_1443126834156_0016_02_01
> Exit code: 1
> Exception message: 
> /hadoop/yarn/local/usercache/zeppelin/appcache/application_1443126834156_0016/container_e03_1443126834156_0016_02_01/launch_container.sh:
>  line 24: 
> $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
>  bad substitution
> 
> Stack trace: ExitCodeException exitCode=1: 
> /hadoop/yarn/local/usercache/zeppelin/appcache/application_1443126834156_0016/container_e03_1443126834156_0016_02_01/launch_container.sh:
>  line 24: 
> $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
>  bad substitution
> 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>   at org.apache.hadoop.util.Shell.run(Shell.java:456)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

Re: Local Repartition

2015-07-20 Thread Doug Balog
Hi Daniel,
Take a look at .coalesce().
I’ve seen good results by coalescing to num executors * 10, but I’m still 
trying to figure out the 
optimal number of partitions per executor. 
To get the number of executors: sc.getConf.getInt("spark.executor.instances", -1)
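
Roughly, as a sketch (inputRdd and its path are placeholders for your many-partition RDD):

// Sketch of the heuristic above: coalesce (no full shuffle) down to ~10 partitions per executor.
val inputRdd = sc.textFile("hdfs:///path/to/many/small/files")     // hypothetical input path
val numExecutors = sc.getConf.getInt("spark.executor.instances", -1)
val coalesced =
  if (numExecutors > 0) inputRdd.coalesce(numExecutors * 10)
  else inputRdd                                                    // fall back if the setting is absent
println(s"partitions before: ${inputRdd.partitions.length}, after: ${coalesced.partitions.length}")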


Cheers,

Doug

 On Jul 20, 2015, at 5:04 AM, Daniel Haviv daniel.ha...@veracity-group.com 
 wrote:
 
 Hi,
 My data is constructed from a lot of small files which results in a lot of 
 partitions per RDD.
 Is there some way to locally repartition the RDD without shuffling so that 
 all of the partitions that reside on a specific node will become X partitions 
 on the same node ?
 
 Thank you.
 Daniel





Re: Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Doug Balog
If you run Hadoop in secure mode and want to talk to Hive 0.14, it won’t work, 
see SPARK-5111 
I have a patched version of 1.3.1 that I’ve been using.
I haven’t had the time to get 1.4.0 working. 

Cheers,

Doug



 On Jun 19, 2015, at 8:39 AM, ayan guha guha.a...@gmail.com wrote:
 
 I think you can get Spark 1.4 pre-built with Hadoop 2.6 (as that is what HDP 2.2 
 provides) and just start using it.
 
 On Fri, Jun 19, 2015 at 10:28 PM, Ashish Soni asoni.le...@gmail.com wrote:
 I do not know where to start, as Spark 1.2 comes bundled with HDP 2.2, but I want to 
 use 1.4 and I do not know how to update it to 1.4.
 
 Ashish
 
 On Fri, Jun 19, 2015 at 8:26 AM, ayan guha guha.a...@gmail.com wrote:
 What problem are you facing? Are you trying to build it yourself or 
 use the pre-built version?
 
 On Fri, Jun 19, 2015 at 10:22 PM, Ashish Soni asoni.le...@gmail.com wrote:
 Hi , 
 
 Is any one able to install Spark 1.4 on HDP 2.2 , Please let me know how can 
 i do the same ?
 
 Ashish
 
 
 
 -- 
 Best Regards,
 Ayan Guha
 
 
 
 
 -- 
 Best Regards,
 Ayan Guha





Re: Spark 1.4.0-rc4 HiveContext.table(db.tbl) NoSuchTableException

2015-06-05 Thread Doug Balog
Hi Yin,
 Thanks for the suggestion.
I’m not happy about this, and I don’t agree with your position that, since it 
wasn’t an “officially” supported feature, 
no harm was done breaking it in the course of implementing SPARK-6908. I would 
still argue that it changed, 
and therefore broke, .table()’s API.
(As you know, I’ve filed 2 bugs regarding this: SPARK-8105 and SPARK-8107.)

I’m done complaining about this issue. 
My short-term plan is to change my code for 1.4.0 and 
possibly work on a cleaner solution for 1.5.0 that will be acceptable.
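
Concretely, the short-term change amounts to something like this (just a sketch, based
on your USE suggestion below; db and tbl are placeholders):

// What I had: sqlContext.table("db.tbl")    // now throws NoSuchTableException
sqlContext.sql("USE db")                     // switch the current database first
val df = sqlContext.table("tbl")             // then look the table up by its bare name
// or keep going through SQL directly, which still resolves db.tbl:
val df2 = sqlContext.sql("select * from db.tbl")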

Thanks for looking into it and responding to my initial email.

Doug


 On Jun 5, 2015, at 3:36 PM, Yin Huai yh...@databricks.com wrote:
 
 Hi Doug,
 
 For now, I think you can use sqlContext.sql("USE databaseName") to change 
 the current database.
 
 Thanks,
 
 Yin
 
 On Thu, Jun 4, 2015 at 12:04 PM, Yin Huai yh...@databricks.com wrote:
 Hi Doug,
 
 sqlContext.table does not officially support database name. It only supports 
 table name as the parameter. We will add a method to support database name in 
 future.
 
 Thanks,
 
 Yin
 
 On Thu, Jun 4, 2015 at 8:10 AM, Doug Balog doug.sparku...@dugos.com wrote:
 Hi Yin,
  I’m very surprised to hear that its not supported in 1.3 because I’ve been 
 using it since 1.3.0.
 It worked great up until  SPARK-6908 was merged into master.
 
 What is the supported way to get  DF for a table that is not in the default 
 database ?
 
 IMHO, If you are not going to support “databaseName.tableName”, 
 sqlContext.table() should have a version that takes a database and a table, ie
 
 def table(databaseName: String, tableName: String): DataFrame =
   DataFrame(this, catalog.lookupRelation(Seq(databaseName,tableName)))
 
 The handling of databases in Spark(sqlContext, hiveContext, Catalog) could be 
 better.
 
 Thanks,
 
 Doug
 
  On Jun 3, 2015, at 8:21 PM, Yin Huai yh...@databricks.com wrote:
 
  Hi Doug,
 
  Actually, sqlContext.table does not support database name in both Spark 1.3 
  and Spark 1.4. We will support it in future version.
 
  Thanks,
 
  Yin
 
 
 
  On Wed, Jun 3, 2015 at 10:45 AM, Doug Balog doug.sparku...@dugos.com 
  wrote:
  Hi,
 
  sqlContext.table(“db.tbl”) isn’t working for me, I get a 
  NoSuchTableException.
 
  But I can access the table via
 
  sqlContext.sql(“select * from db.tbl”)
 
  So I know it has the table info from the metastore.
 
  Anyone else see this ?
 
  I’ll keep digging.
  I compiled via make-distribution  -Pyarn -phadoop-2.4 -Phive 
  -Phive-thriftserver
  It worked for me in 1.3.1
 
  Cheers,
 
  Doug
 
 
  -
  To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
  For additional commands, e-mail: user-h...@spark.apache.org
 
 
 
 
 





Re: Spark 1.4.0-rc4 HiveContext.table(db.tbl) NoSuchTableException

2015-06-04 Thread Doug Balog
Hi Yin,
 I’m very surprised to hear that it’s not supported in 1.3, because I’ve been 
using it since 1.3.0.
It worked great up until SPARK-6908 was merged into master.

What is the supported way to get  DF for a table that is not in the default 
database ?

IMHO, if you are not going to support “databaseName.tableName”, 
sqlContext.table() should have a version that takes a database and a table, i.e.

def table(databaseName: String, tableName: String): DataFrame =
  DataFrame(this, catalog.lookupRelation(Seq(databaseName,tableName)))

The handling of databases in Spark(sqlContext, hiveContext, Catalog) could be 
better.

Thanks,

Doug

 On Jun 3, 2015, at 8:21 PM, Yin Huai yh...@databricks.com wrote:
 
 Hi Doug,
 
 Actually, sqlContext.table does not support database name in both Spark 1.3 
 and Spark 1.4. We will support it in future version. 
 
 Thanks,
 
 Yin
 
  
 
 On Wed, Jun 3, 2015 at 10:45 AM, Doug Balog doug.sparku...@dugos.com wrote:
 Hi,
 
 sqlContext.table(“db.tbl”) isn’t working for me, I get a NoSuchTableException.
 
 But I can access the table via
 
 sqlContext.sql(“select * from db.tbl”)
 
 So I know it has the table info from the metastore.
 
 Anyone else see this ?
 
 I’ll keep digging.
 I compiled via make-distribution  -Pyarn -phadoop-2.4 -Phive 
 -Phive-thriftserver
 It worked for me in 1.3.1
 
 Cheers,
 
 Doug
 
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 
 





Spark 1.4.0-rc4 HiveContext.table(db.tbl) NoSuchTableException

2015-06-03 Thread Doug Balog
Hi, 
 
sqlContext.table("db.tbl") isn’t working for me, I get a NoSuchTableException.

But I can access the table via 

sqlContext.sql("select * from db.tbl")

So I know it has the table info from the metastore. 

Anyone else see this ?

I’ll keep digging. 
I compiled via make-distribution -Pyarn -Phadoop-2.4 -Phive -Phive-thriftserver
It worked for me in 1.3.1

Cheers,

Doug





Re: Spark Job triggers second attempt

2015-05-07 Thread Doug Balog
I bet you are running on YARN in cluster mode.

If you are running on yarn in client mode, 
.set("spark.yarn.maxAppAttempts", "1") works as you expect,
because YARN doesn’t start your app on the cluster until you call 
SparkContext().

But If you are running on yarn in cluster mode, the driver program runs from a 
cluster node.
So your app is already running on the cluster when you call .set().
To make it work in cluster mode, the property must be set on the spark-submit 
command line via 
--conf spark.yarn.maxAppAttempts=1
or --driver-options "-Dspark.yarn.maxAppAttempts=1"
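
To spell out the client-mode case that does work with .set(), a minimal sketch (the app
name is a placeholder; in cluster mode this same .set() is too late, so use the
spark-submit flag above instead):

// Client mode only: the property is read before YARN gets involved,
// because nothing is submitted to the cluster until the SparkContext is created.
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()
  .setAppName("single-attempt-example")        // hypothetical app name
  .set("spark.yarn.maxAppAttempts", "1")
val sc = new SparkContext(conf)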


A note should be added to running-on-yarn.html in the “Important notes” section 
that
says that in cluster mode you need to set spark.yarn.* properties from the 
spark-submit command line.

Cheers,

Doug




 On May 7, 2015, at 2:34 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
 
 How can I stop Spark from triggering a second attempt in case the first 
 fails?
 I do not want to wait for the second attempt to fail again, so that I can 
 debug faster.
 
 .set("spark.yarn.maxAppAttempts", "0") OR .set("spark.yarn.maxAppAttempts", 
 "1")
 
 is not helping.
 
 
 -- 
 Deepak
 





Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-30 Thread Doug Balog
The “best” solution to spark-shell’s  problem is creating a file 
$SPARK_HOME/conf/java-opts
with “-Dhdp.version=2.2.0.0-2014”

Cheers,

Doug

 On Mar 28, 2015, at 1:25 PM, Michael Stone mst...@mathom.us wrote:
 
 I've also been having trouble running 1.3.0 on HDP. The 
 spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
 configuration directive seems to work with pyspark, but not propagate when 
 using spark-shell. (That is, everything works fine with pyspark, and 
 spark-shell fails with the bad substitution message.)
 
 Mike Stone
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 





Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-24 Thread Doug Balog
I found the problem.
In mapred-site.xml, mapreduce.application.classpath has references to 
"${hdp.version}" which is not getting replaced
when launch_container.sh is created. The executor fails with a substitution 
error at line 27 in launch_container.sh because bash
can’t deal with "${hdp.version}".
I have hdp.version defined in my spark-defaults.conf via 
spark.{driver,yarn.am}.extraJavaOptions -Dhdp.version=2.2.0-2041,
so something is not doing the substitution.
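
One quick sanity check (a sketch; run it in spark-shell or early in the driver) to see
whether -Dhdp.version actually reached that JVM at all:

// If this prints "NOT SET", the extraJavaOptions value never made it into this JVM.
println(sys.props.getOrElse("hdp.version", "NOT SET"))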

To work around this problem, I replaced “${hdp.version}” with “current” in 
mapred-site.xml.
I found a similar bug, https://issues.apache.org/jira/browse/AMBARI-8028, and 
the fix was exactly what I did to work around it.
Not sure if this is an AMBARI bug (not doing variable substitution when writing 
mapred-site.xml) or a YARN bug (it’s not doing the variable substitution when 
writing launch_container.sh).

Anybody have an opinion ? 

Doug



 On Mar 19, 2015, at 5:51 PM, Doug Balog doug.sparku...@dugos.com wrote:
 
 I’m seeing the same problem.
 I’ve set logging to DEBUG, and I think some hints are in the “Yarn AM launch 
 context” that is printed out 
 before Yarn  runs java. 
 
 My next step is to talk to the admins and get them to set 
 yarn.nodemanager.delete.debug-delay-sec
 in the config, as recommended in 
 http://spark.apache.org/docs/latest/running-on-yarn.html
 Then I can see exactly whats in the directory.
 
 Doug
 
 ps Sorry for the dup message Bharath and Todd, used wrong email address.
 
 
 On Mar 19, 2015, at 1:19 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
 
 Thanks for clarifying Todd. This may then be an issue specific to the HDP 
 version we're using. Will continue to debug and post back if there's any 
 resolution.
 
 On Thu, Mar 19, 2015 at 3:40 AM, Todd Nist tsind...@gmail.com wrote:
 Yes I believe you are correct.  
 
 For the build you may need to specify the specific HDP version of hadoop to 
 use with the -Dhadoop.version=.  I went with the default 2.6.0, but 
 Horton may have a vendor specific version that needs to go here.  I know I 
 saw a similar post today where the solution was to use 
 -Dhadoop.version=2.5.0-cdh5.3.2 but that was for a cloudera installation.  I 
 am not sure what the HDP version would be to put here.
 
 -Todd
 
 On Wed, Mar 18, 2015 at 12:49 AM, Bharath Ravi Kumar reachb...@gmail.com 
 wrote:
 Hi Todd,
 
 Yes, those entries were present in the conf under the same SPARK_HOME that 
 was used to run spark-submit. On a related note, I'm assuming that the 
 additional spark yarn options (like spark.yarn.jar) need to be set in the 
 same properties file that is passed to spark-submit. That apart, I assume 
 that no other host on the cluster should require a deployment of the spark 
 distribution or any other config change to support a spark job.  Isn't that 
 correct?
 
 On Tue, Mar 17, 2015 at 6:19 PM, Todd Nist tsind...@gmail.com wrote:
 Hi Bharath,
 
 Do you have these entries in your $SPARK_HOME/conf/spark-defaults.conf file?
 
 spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
 spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
 
 
 
 
 On Tue, Mar 17, 2015 at 1:04 AM, Bharath Ravi Kumar reachb...@gmail.com 
 wrote:
 Still no luck running purpose-built 1.3 against HDP 2.2 after following all 
 the instructions. Anyone else faced this issue?
 
 On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar reachb...@gmail.com 
 wrote:
 Hi Todd,
 
 Thanks for the help. I'll try again after building a distribution with the 
 1.3 sources. However, I wanted to confirm what I mentioned earlier:  is it 
 sufficient to copy the distribution only to the client host from where  
 spark-submit is invoked(with spark.yarn.jar set), or is there a need to 
 ensure that the entire distribution is deployed made available pre-deployed 
 on every host in the yarn cluster? I'd assume that the latter shouldn't be 
 necessary.
 
 On Mon, Mar 16, 2015 at 8:38 PM, Todd Nist tsind...@gmail.com wrote:
 Hi Bharath,
 
 I ran into the same issue a few days ago; here is a link to a post on 
 Horton's forum: http://hortonworks.com/community/forums/search/spark+1.2.1/
 In case anyone else needs to perform this, these are the steps I took to get 
 it to work with Spark 1.2.1 as well as Spark 1.3.0-RC3:
 
 1. Pull 1.2.1 Source
 2. Apply the following patches
 a. Address jackson version, https://github.com/apache/spark/pull/3938
 b. Address the propagation of the hdp.version set in the spark-default.conf, 
 https://github.com/apache/spark/pull/3409
 3. build with $SPARK_HOME/make-distribution.sh --name hadoop2.6 --tgz -Pyarn 
 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests 
 package
 
 Then deploy the resulting artifact = spark-1.2.1-bin-hadoop2.6.tgz 
 following instructions in the HDP Spark preview 
 http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/
 
 FWIW spark-1.3.0 appears to be working fine with HDP as well and steps 2a 
 and 2b are not required.
 
 HTH
 
 -Todd

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-19 Thread Doug Balog
I’m seeing the same problem.
I’ve set logging to DEBUG, and I think some hints are in the “Yarn AM launch 
context” that is printed out 
before Yarn  runs java. 

My next step is to talk to the admins and get them to set 
yarn.nodemanager.delete.debug-delay-sec
in the config, as recommended in 
http://spark.apache.org/docs/latest/running-on-yarn.html
Then I can see exactly whats in the directory.

Doug

ps Sorry for the dup message Bharath and Todd, used wrong email address.


 On Mar 19, 2015, at 1:19 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
 
 Thanks for clarifying Todd. This may then be an issue specific to the HDP 
 version we're using. Will continue to debug and post back if there's any 
 resolution.
 
 On Thu, Mar 19, 2015 at 3:40 AM, Todd Nist tsind...@gmail.com wrote:
 Yes I believe you are correct.  
 
 For the build you may need to specify the specific HDP version of hadoop to 
 use with the -Dhadoop.version=.  I went with the default 2.6.0, but 
 Horton may have a vendor specific version that needs to go here.  I know I 
 saw a similar post today where the solution was to use 
 -Dhadoop.version=2.5.0-cdh5.3.2 but that was for a cloudera installation.  I 
 am not sure what the HDP version would be to put here.
 
 -Todd
 
 On Wed, Mar 18, 2015 at 12:49 AM, Bharath Ravi Kumar reachb...@gmail.com 
 wrote:
 Hi Todd,
 
 Yes, those entries were present in the conf under the same SPARK_HOME that 
 was used to run spark-submit. On a related note, I'm assuming that the 
 additional spark yarn options (like spark.yarn.jar) need to be set in the 
 same properties file that is passed to spark-submit. That apart, I assume 
 that no other host on the cluster should require a deployment of the spark 
 distribution or any other config change to support a spark job.  Isn't that 
 correct?
 
 On Tue, Mar 17, 2015 at 6:19 PM, Todd Nist tsind...@gmail.com wrote:
 Hi Bharath,
 
 Do you have these entries in your $SPARK_HOME/conf/spark-defaults.conf file?
 
 spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
 spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
 
 
 
 
 On Tue, Mar 17, 2015 at 1:04 AM, Bharath Ravi Kumar reachb...@gmail.com 
 wrote:
 Still no luck running purpose-built 1.3 against HDP 2.2 after following all 
 the instructions. Anyone else faced this issue?
 
 On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar reachb...@gmail.com 
 wrote:
 Hi Todd,
 
 Thanks for the help. I'll try again after building a distribution with the 
 1.3 sources. However, I wanted to confirm what I mentioned earlier:  is it 
 sufficient to copy the distribution only to the client host from where  
 spark-submit is invoked(with spark.yarn.jar set), or is there a need to 
 ensure that the entire distribution is deployed made available pre-deployed 
 on every host in the yarn cluster? I'd assume that the latter shouldn't be 
 necessary.
 
 On Mon, Mar 16, 2015 at 8:38 PM, Todd Nist tsind...@gmail.com wrote:
 Hi Bharath,
 
 I ran into the same issue a few days ago; here is a link to a post on 
 Horton's forum: http://hortonworks.com/community/forums/search/spark+1.2.1/
 In case anyone else needs to perform this, these are the steps I took to get it 
 to work with Spark 1.2.1 as well as Spark 1.3.0-RC3:
 
 1. Pull 1.2.1 Source
 2. Apply the following patches
 a. Address jackson version, https://github.com/apache/spark/pull/3938
 b. Address the propagation of the hdp.version set in the spark-default.conf, 
 https://github.com/apache/spark/pull/3409
 3. build with $SPARK_HOME/make-distribution.sh --name hadoop2.6 --tgz -Pyarn 
 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests 
 package
 
 Then deploy the resulting artifact = spark-1.2.1-bin-hadoop2.6.tgz following 
 instructions in the HDP Spark preview 
 http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/
 
 FWIW spark-1.3.0 appears to be working fine with HDP as well and steps 2a and 
 2b are not required.
 
 HTH
 
 -Todd
 
 
 On Mon, Mar 16, 2015 at 10:13 AM, Bharath Ravi Kumar reachb...@gmail.com 
 wrote:
 Hi,
 
 Trying to run Spark (1.2.1 built for HDP 2.2) against a YARN cluster results 
 in the AM failing to start with the following error on stderr: 
 Error: Could not find or load main class 
 org.apache.spark.deploy.yarn.ExecutorLauncher 
 An application id was assigned to the job, but there were no logs. Note that 
 the spark distribution has not been installed on every host in the cluster 
 and the aforementioned spark build was copied  to one of the hadoop client 
 hosts in the cluster to launch the 
 job. Spark-submit was run with --master yarn-client and spark.yarn.jar was 
 set to the assembly jar from the above distribution. Switching the spark 
 distribution to the HDP recommended  version 
 and following the instructions on this page did not fix the problem either. 
 Any idea what may have caused this error ? 
 
 Thanks,
 Bharath