Spark 1.4.0-rc4 HiveContext.table("db.tbl") NoSuchTableException

2015-06-03 Thread Doug Balog
Hi, 
 
sqlContext.table(“db.tbl”) isn’t working for me, I get a NoSuchTableException.

But I can access the table via 

sqlContext.sql(“select * from db.tbl”)

So I know it has the table info from the metastore. 

Anyone else see this?

I’ll keep digging. 
I compiled via make-distribution.sh -Pyarn -Phadoop-2.4 -Phive -Phive-thriftserver.
It worked for me in 1.3.1

Cheers,

Doug





Re: Spark 1.4.0-rc4 HiveContext.table("db.tbl") NoSuchTableException

2015-06-04 Thread Doug Balog
Hi Yin,
 I’m very surprised to hear that it’s not supported in 1.3, because I’ve been 
using it since 1.3.0.
It worked great up until SPARK-6908 was merged into master.

What is the supported way to get a DataFrame for a table that is not in the default 
database?

IMHO, if you are not going to support “databaseName.tableName”, 
sqlContext.table() should have a version that takes a database name and a table name, i.e.

def table(databaseName: String, tableName: String): DataFrame =
  DataFrame(this, catalog.lookupRelation(Seq(databaseName, tableName)))

The handling of databases in Spark (SQLContext, HiveContext, Catalog) could be 
better.

Thanks,

Doug

> On Jun 3, 2015, at 8:21 PM, Yin Huai  wrote:
> 
> Hi Doug,
> 
> Actually, sqlContext.table does not support a database name in either Spark 1.3 
> or Spark 1.4. We will support it in a future version. 
> 
> Thanks,
> 
> Yin
> 
>  
> 
> On Wed, Jun 3, 2015 at 10:45 AM, Doug Balog  wrote:
> Hi,
> 
> sqlContext.table(“db.tbl”) isn’t working for me, I get a NoSuchTableException.
> 
> But I can access the table via
> 
> sqlContext.sql(“select * from db.tbl”)
> 
> So I know it has the table info from the metastore.
> 
> Anyone else see this ?
> 
> I’ll keep digging.
> I compiled via make-distribution  -Pyarn -phadoop-2.4 -Phive 
> -Phive-thriftserver
> It worked for me in 1.3.1
> 
> Cheers,
> 
> Doug
> 
> 
> 
> 





Re: Spark 1.4.0-rc4 HiveContext.table("db.tbl") NoSuchTableException

2015-06-05 Thread Doug Balog
Hi Yin,
 Thanks for the suggestion.
I’m not happy about this, and I don’t agree with your position that, since it 
wasn’t an “officially” supported feature, no harm was done by breaking it in the 
course of implementing SPARK-6908. I would still argue that it changed, and 
therefore broke, .table()’s API.
(As you know, I’ve filed two bugs about this: SPARK-8105 and SPARK-8107.)

I’m done complaining about this issue. 
My short-term plan is to change my code for 1.4.0 and 
possibly work on a cleaner solution for 1.5.0 that will be acceptable.
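
For anyone hitting the same thing, the 1.4.0 workaround Yin describes below looks 
roughly like this (a minimal sketch against a HiveContext; "mydb" and "mytbl" are 
placeholder names):

  // switch the session's current database, then look up the table by name only
  sqlContext.sql("USE mydb")
  val df = sqlContext.table("mytbl")   // stands in for the old sqlContext.table("mydb.mytbl")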

Thanks for looking into it and responding to my initial email.

Doug


> On Jun 5, 2015, at 3:36 PM, Yin Huai  wrote:
> 
> Hi Doug,
> 
> For now, I think you can use "sqlContext.sql("USE databaseName")" to change 
> the current database.
> 
> Thanks,
> 
> Yin
> 
> On Thu, Jun 4, 2015 at 12:04 PM, Yin Huai  wrote:
> Hi Doug,
> 
> sqlContext.table does not officially support a database name; it only takes a 
> table name as the parameter. We will add a method that supports a database name 
> in a future version.
> 
> Thanks,
> 
> Yin
> 
> On Thu, Jun 4, 2015 at 8:10 AM, Doug Balog  wrote:
> Hi Yin,
>  I’m very surprised to hear that its not supported in 1.3 because I’ve been 
> using it since 1.3.0.
> It worked great up until  SPARK-6908 was merged into master.
> 
> What is the supported way to get  DF for a table that is not in the default 
> database ?
> 
> IMHO, If you are not going to support “databaseName.tableName”, 
> sqlContext.table() should have a version that takes a database and a table, ie
> 
> def table(databaseName: String, tableName: String): DataFrame =
>   DataFrame(this, catalog.lookupRelation(Seq(databaseName,tableName)))
> 
> The handling of databases in Spark(sqlContext, hiveContext, Catalog) could be 
> better.
> 
> Thanks,
> 
> Doug
> 
> > On Jun 3, 2015, at 8:21 PM, Yin Huai  wrote:
> >
> > Hi Doug,
> >
> > Actually, sqlContext.table does not support database name in both Spark 1.3 
> > and Spark 1.4. We will support it in future version.
> >
> > Thanks,
> >
> > Yin
> >
> >
> >
> > On Wed, Jun 3, 2015 at 10:45 AM, Doug Balog  
> > wrote:
> > Hi,
> >
> > sqlContext.table(“db.tbl”) isn’t working for me, I get a 
> > NoSuchTableException.
> >
> > But I can access the table via
> >
> > sqlContext.sql(“select * from db.tbl”)
> >
> > So I know it has the table info from the metastore.
> >
> > Anyone else see this ?
> >
> > I’ll keep digging.
> > I compiled via make-distribution  -Pyarn -phadoop-2.4 -Phive 
> > -Phive-thriftserver
> > It worked for me in 1.3.1
> >
> > Cheers,
> >
> > Doug
> >
> >
> >
> >
> 
> 
> 





Re: Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Doug Balog
If you run Hadoop in secure mode and want to talk to Hive 0.14, it won’t work; 
see SPARK-5111.
I have a patched version of 1.3.1 that I’ve been using.
I haven’t had the time to get 1.4.0 working yet.

Cheers,

Doug



> On Jun 19, 2015, at 8:39 AM, ayan guha  wrote:
> 
> I think you can get Spark 1.4 pre-built with Hadoop 2.6 (as that’s what HDP 2.2 
> provides) and just start using it.
> 
> On Fri, Jun 19, 2015 at 10:28 PM, Ashish Soni  wrote:
> I do not know where to start, as Spark 1.2 comes bundled with HDP 2.2, but I want 
> to use 1.4 and I do not know how to upgrade it to 1.4.
> 
> Ashish
> 
> On Fri, Jun 19, 2015 at 8:26 AM, ayan guha  wrote:
> What problem are you facing? Are you trying to build it yourself or get the 
> pre-built version?
> 
> On Fri, Jun 19, 2015 at 10:22 PM, Ashish Soni  wrote:
> Hi , 
> 
> Is anyone able to install Spark 1.4 on HDP 2.2? Please let me know how I can 
> do the same.
> 
> Ashish
> 
> 
> 
> -- 
> Best Regards,
> Ayan Guha
> 
> 
> 
> 
> -- 
> Best Regards,
> Ayan Guha





Re: Spark job is failing with kerberos error while creating hive context in yarn-cluster mode (through spark-submit)

2016-05-23 Thread Doug Balog
I have a custom hive-site.xml for Spark in Spark’s conf directory.
These properties are the minimal ones that you need for Spark, I believe.

hive.metastore.kerberos.principal = copy from your hive-site.xml, i.e. "hive/_h...@foo.com"
hive.metastore.uris = copy from your hive-site.xml, i.e. thrift://ms1.foo.com:9083
hive.metastore.sasl.enabled = true
hive.security.authorization.enabled = false
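
If it helps, in hive-site.xml form that is roughly the following (a sketch; the 
principal and metastore URI are placeholders you would copy from your cluster’s 
own hive-site.xml):

  <configuration>
    <property>
      <name>hive.metastore.kerberos.principal</name>
      <value>hive/_HOST@EXAMPLE.COM</value>  <!-- placeholder -->
    </property>
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://metastore-host.example.com:9083</value>  <!-- placeholder -->
    </property>
    <property>
      <name>hive.metastore.sasl.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.security.authorization.enabled</name>
      <value>false</value>
    </property>
  </configuration>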

Cheers,

Doug



> On May 23, 2016, at 7:41 AM, Chandraprakash Bhagtani  
> wrote:
> 
> Hi,
> 
> My Spark job is failing with Kerberos issues while creating a HiveContext in 
> yarn-cluster mode. However, it runs fine in yarn-client mode. My Spark version 
> is 1.6.1.
> 
> I am passing hive-site.xml through --files option. 
> 
> I tried searching online and found that the same issue is fixed by the 
> following JIRA: SPARK-6207. It is fixed in Spark 1.4, but I am running 1.6.1.
> 
> Am i missing any configuration here?
> 
> 
> -- 
> Thanks & Regards,
> Chandra Prakash Bhagtani





Error trying to get DF for Hive table stored HBase

2016-02-02 Thread Doug Balog
I’m trying to create a DataFrame for an external Hive table that is stored in HBase. 
I get a NoSuchMethodError:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe$SerDeParameters;

I’m running Spark 1.6.0 on HDP 2.2.4-12-1 (Hive 0.14 and HBase 0.98.4) in 
secure mode. 

Anybody seen this before?

Below is a stack trace and the hive table’s info.

scala> sqlContext.table("item_data_lib.pcn_item")
java.lang.NoSuchMethodError: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe$SerDeParameters;
at 
org.apache.hadoop.hive.hbase.HBaseSerDeParameters.(HBaseSerDeParameters.java:93)
at 
org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:92)
at 
org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:331)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:326)
at scala.Option.map(Option.scala:145)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:326)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:321)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:279)
at 
org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:226)
at 
org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:225)
at 
org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:268)
at 
org.apache.spark.sql.hive.client.ClientWrapper.getTableOption(ClientWrapper.scala:321)
at 
org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:122)
at 
org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:60)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:384)
at 
org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:457)
at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)
at 
org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:457)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:831)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)


hive> show create table item_data_lib.pcn_item;
OK
CREATE EXTERNAL TABLE `item_data_lib.pcn_item`(
  `key` string COMMENT 'from deserializer',
  `p1` string COMMENT 'from deserializer',
  `p2` string COMMENT 'from deserializer',
  `p3` string COMMENT 'from deserializer',
  `p4` string COMMENT 'from deserializer',
  `p5` string COMMENT 'from deserializer',
  `p6` string COMMENT 'from deserializer',
  `p7` string COMMENT 'from deserializer',
  `p8` string COMMENT 'from deserializer',
  `p9` string COMMENT 'from deserializer',
  `p10` string COMMENT 'from deserializer',
  `p11` string COMMENT 'from deserializer',
  `p12` string COMMENT 'from deserializer',
  `p13` string COMMENT 'from deserializer',
  `d1` string COMMENT 'from deserializer',
  `d2` string COMMENT 'from deserializer',
  `d3` string COMMENT 'from deserializer',
  `d4` string COMMENT 'from deserializer',
  `d5` string COMMENT 'from deserializer',
  `d6` string COMMENT 'from deserializer',
  `d7` string COMMENT 'from deserializer',
  `d8` string COMMENT 'from deserializer',
  `d9` string COMMENT 'from deserializer',
  `d10` string COMMENT 'from deserializer',
  `d11` string COMMENT 'from deserializer',
  `d12` string COMMENT 'from deserializer',
  `d13` string COMMENT 'from deserializer',
  `d14` string COMMENT 'from deserializer',
  `d15` string COMMENT 'from deserializer',
  `d16` string COMMENT 'from deserializer',
  `d17` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  
'hbase.columns.mapping'=':key,p:p1,p:p2,p:p3,p:p4,p:p5,p:p6,p:

Re: Spark_1.5.1_on_HortonWorks

2015-10-20 Thread Doug Balog
I have been running 1.5.1 with Hive in secure mode on HDP 2.2.4 without any 
problems.

Doug

> On Oct 21, 2015, at 12:05 AM, Ajay Chander  wrote:
> 
> Hi Everyone,
> 
> Does anyone know if Spark 1.5.1 is available as a service on Hortonworks? I have 
> Spark 1.3.1 installed on the cluster, and it is a Hortonworks distribution. Now I 
> want to upgrade it to Spark 1.5.1. Does anyone here know anything about it? 
> Thank you in advance.
> 
> Regards,
> Ajay





Re: Accessing external Kerberised resources from Spark executors in Yarn client/cluster mode

2015-10-22 Thread Doug Balog
Another thing to check is to make sure each one of your executor nodes has the 
JCE jars installed.

try { javax.crypto.Cipher.getMaxAllowedKeyLength("AES") > 128 }
catch { case e: java.security.NoSuchAlgorithmException => false }
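
From spark-shell, a quick way to run that check on every executor and see which 
hosts are missing the unlimited-strength policy files (a sketch along the lines of 
Deenar's snippet below; it shells out to the hostname command):

  import scala.sys.process._
  sc.parallelize(1 to 1000).map { _ =>
    val unlimited =
      try { javax.crypto.Cipher.getMaxAllowedKeyLength("AES") > 128 }
      catch { case _: java.security.NoSuchAlgorithmException => false }
    ("hostname".!!.trim, unlimited)   // (executor hostname, has unlimited JCE policy)
  }.collect.distinct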

Setting  "-Dsun.security.krb5.debug=true” and “-Dsun.security.jgss.debug=true”  
in spark.executor.extraJavaOptions
and running loginUserFromKeytab() will generate a lot of info in the executor 
logs, which might be helpful to figure out what is going on too.
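
On the spark-submit command line that might look like this (one possible form; the 
quoting matters because the value contains spaces):

  --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true"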

Cheers,

Doug


> On Oct 22, 2015, at 7:59 AM, Deenar Toraskar  
> wrote:
> 
> Hi All
> 
> I am trying to access a SQLServer that uses Kerberos for authentication from 
> Spark. I can successfully connect to the SQLServer from the driver node, but 
> any connections to SQLServer from executors fails with "Failed to find any 
> Kerberos tgt". 
> 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser on the driver 
> returns myPrincipal (auth:KERBEROS) as expected. And the same call on 
> executors returns
> 
> sc.parallelize(0 to 10).map { _ =>(("hostname".!!).trim, 
> UserGroupInformation.getCurrentUser.toString)}.collect.distinct 
> 
> returns
> 
> Array((hostname1, myprincipal (auth:SIMPLE), (hostname2, myprincipal 
> (auth:SIMPLE))
> 
> 
> I tried passing the keytab and logging in explicitly from the executors, but 
> that didn’t help either.
> 
> sc.parallelize(0 to 10).map { _ 
> =>(SparkHadoopUtil.get.loginUserFromKeytab("myprincipal",SparkFiles.get("myprincipal.keytab")),
>  ("hostname".!!).trim, 
> UserGroupInformation.getCurrentUser.toString)}.collect.distinct
> 
> Digging deeper I found SPARK-6207 and came across code for each Kerberised 
> service that is accessed from the executors in Yarn Client, such as
> 
> obtainTokensForNamenodes(nns, hadoopConf, credentials)
> 
> obtainTokenForHiveMetastore(hadoopConf, credentials)
> 
> I was wondering if anyone has been successful in accessing external resources 
> (running external to the Hadoop cluster) secured by Kerberos in Spark 
> executors running in Yarn. 
> 
> 
> 
> Regards
> Deenar
> 
> 
> On 20 April 2015 at 21:58, Andrew Lee  wrote:
> Hi All,
> 
> Affected version: spark 1.2.1 / 1.2.2 / 1.3-rc1
> 
> Posting this problem to user group first to see if someone is encountering 
> the same problem. 
> 
> When submitting spark jobs that invokes HiveContext APIs on a Kerberos Hadoop 
> + YARN (2.4.1) cluster, 
> I'm getting this error. 
> 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 
> Apparently, the Kerberos ticket is not on the remote data node nor computing 
> node since we don't 
> deploy Kerberos tickets, and that is not a good practice either. On the other 
> hand, we can't just SSH to every machine and run kinit for that users. This 
> is not practical and it is insecure.
> 
> The point here is that shouldn't there be a delegation token during the doAs 
> to use the token instead of the ticket ? 
> I'm trying to understand what is missing in Spark's HiveContext API while a 
> normal MapReduce job that invokes Hive APIs will work, but not in Spark SQL. 
> Any insights or feedback are appreciated.
> 
> Anyone got this running without pre-deploying (pre-initializing) all tickets 
> node by node? Is this worth filing a JIRA?
> 
> 
> 
> 15/03/25 18:59:08 INFO hive.metastore: Trying to connect to metastore with 
> URI thrift://alee-cluster.test.testserver.com:9083
> 15/03/25 18:59:08 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>   at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
>   at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>   at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>   at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>   at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:214)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>

Re: Local Repartition

2015-07-20 Thread Doug Balog
Hi Daniel,
Take a look at .coalesce().
I’ve seen good results by coalescing to (number of executors * 10), but I’m still 
trying to figure out the optimal number of partitions per executor.
To get the number of executors: sc.getConf.getInt("spark.executor.instances", -1)
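
Roughly, something like this (a sketch; rdd stands in for whatever RDD you have 
built from the small files):

  val numExecutors = sc.getConf.getInt("spark.executor.instances", -1)  // -1 if the property is not set
  val target = numExecutors * 10            // rule of thumb above; tune for your workload
  val coalesced = rdd.coalesce(target)      // shuffle defaults to false, so partitions merge without a network shuffle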


Cheers,

Doug

> On Jul 20, 2015, at 5:04 AM, Daniel Haviv  
> wrote:
> 
> Hi,
> My data is constructed from a lot of small files which results in a lot of 
> partitions per RDD.
> Is there some way to locally repartition the RDD without shuffling so that 
> all of the partitions that reside on a specific node will become X partitions 
> on the same node ?
> 
> Thank you.
> Daniel





Re: Unable to start spark-shell on YARN

2015-09-24 Thread Doug Balog
The error is because the shell is trying to resolve hdp.version and can’t.
To fix this, you need to put a file called java-opts in your conf directory 
that has something like this:

-Dhdp.version=2.x.x.x

Where 2.x.x.x is the version of HDP that you are using.
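
For the cluster in this thread (Hadoop 2.7.1.2.3.1.0-2574 in the log below), that 
presumably means a $SPARK_HOME/conf/java-opts containing the single line

-Dhdp.version=2.3.1.0-2574

but that version string is only inferred from the Hadoop build above, so 
double-check it against your cluster.
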
Cheers,

Doug

> On Sep 24, 2015, at 6:11 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)  wrote:
> 
> Spark 1.4.1
> YARN
> Hadoop version: 2.7.1.2.3.1.0-2574
> ./bin/spark-shell  --master yarn
> Hadoop cluster setup using Ambari.
> 
> 
> Shell fails as YARN job failed. Any suggestions ? 
> 
> LOGS:
> 
> 15/09/24 15:07:51 INFO impl.YarnClientImpl: Submitted application 
> application_1443126834156_0016
> 15/09/24 15:07:52 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:52 INFO yarn.Client: 
>client token: N/A
>diagnostics: N/A
>ApplicationMaster host: N/A
>ApplicationMaster RPC port: -1
>queue: default
>start time: 1443132471179
>final status: UNDEFINED
>tracking URL: http://host:8088/proxy/application_1443126834156_0016/
>user: zeppelin
> 15/09/24 15:07:53 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:54 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:55 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: ACCEPTED)
> 15/09/24 15:07:56 INFO yarn.Client: Application report for 
> application_1443126834156_0016 (state: FAILED)
> 15/09/24 15:07:56 INFO yarn.Client: 
>client token: N/A
>diagnostics: Application application_1443126834156_0016 failed 2 times 
> due to AM Container for appattempt_1443126834156_0016_02 exited with  
> exitCode: 1
> For more detailed output, check application tracking 
> page:http://host:8088/cluster/app/application_1443126834156_0016Then, click 
> on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e03_1443126834156_0016_02_01
> Exit code: 1
> Exception message: 
> /hadoop/yarn/local/usercache/zeppelin/appcache/application_1443126834156_0016/container_e03_1443126834156_0016_02_01/launch_container.sh:
>  line 24: 
> $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
>  bad substitution
> 
> Stack trace: ExitCodeException exitCode=1: 
> /hadoop/yarn/local/usercache/zeppelin/appcache/application_1443126834156_0016/container_e03_1443126834156_0016_02_01/launch_container.sh:
>  line 24: 
> $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
>  bad substitution
> 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>   at org.apache.hadoop.util.Shell.run(Shell.java:456)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolEx

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-19 Thread Doug Balog
I’m seeing the same problem.
I’ve set logging to DEBUG, and I think some hints are in the “Yarn AM launch 
context” that is printed out before YARN runs java.

My next step is to talk to the admins and get them to set 
yarn.nodemanager.delete.debug-delay-sec in the config, as recommended in 
http://spark.apache.org/docs/latest/running-on-yarn.html
Then I can see exactly what’s in the directory.

Doug

PS: Sorry for the duplicate message, Bharath and Todd; I used the wrong email address.


> On Mar 19, 2015, at 1:19 AM, Bharath Ravi Kumar  wrote:
> 
> Thanks for clarifying Todd. This may then be an issue specific to the HDP 
> version we're using. Will continue to debug and post back if there's any 
> resolution.
> 
> On Thu, Mar 19, 2015 at 3:40 AM, Todd Nist  wrote:
> Yes I believe you are correct.  
> 
> For the build you may need to specify the specific HDP version of hadoop to 
> use with the -Dhadoop.version=.  I went with the default 2.6.0, but 
> Horton may have a vendor specific version that needs to go here.  I know I 
> saw a similar post today where the solution was to use 
> -Dhadoop.version=2.5.0-cdh5.3.2 but that was for a cloudera installation.  I 
> am not sure what the HDP version would be to put here.
> 
> -Todd
> 
> On Wed, Mar 18, 2015 at 12:49 AM, Bharath Ravi Kumar  
> wrote:
> Hi Todd,
> 
> Yes, those entries were present in the conf under the same SPARK_HOME that 
> was used to run spark-submit. On a related note, I'm assuming that the 
> additional spark yarn options (like spark.yarn.jar) need to be set in the 
> same properties file that is passed to spark-submit. That apart, I assume 
> that no other host on the cluster should require a "deployment of" the spark 
> distribution or any other config change to support a spark job.  Isn't that 
> correct?
> 
> On Tue, Mar 17, 2015 at 6:19 PM, Todd Nist  wrote:
> Hi Bharath,
> 
> Do you have these entries in your $SPARK_HOME/conf/spark-defaults.conf file?
> 
> spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
> spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
> 
> 
> 
> 
> On Tue, Mar 17, 2015 at 1:04 AM, Bharath Ravi Kumar  
> wrote:
> Still no luck running purpose-built 1.3 against HDP 2.2 after following all 
> the instructions. Anyone else faced this issue?
> 
> On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar  
> wrote:
> Hi Todd,
> 
> Thanks for the help. I'll try again after building a distribution with the 
> 1.3 sources. However, I wanted to confirm what I mentioned earlier:  is it 
> sufficient to copy the distribution only to the client host from where  
> spark-submit is invoked(with spark.yarn.jar set), or is there a need to 
> ensure that the entire distribution is deployed made available pre-deployed 
> on every host in the yarn cluster? I'd assume that the latter shouldn't be 
> necessary.
> 
> On Mon, Mar 16, 2015 at 8:38 PM, Todd Nist  wrote:
> Hi Bharath,
> 
> I ran into the same issue a few days ago; here is a link to a post on 
> Hortonworks' forum: http://hortonworks.com/community/forums/search/spark+1.2.1/
> In case anyone else needs to perform this, these are the steps I took to get it 
> to work with Spark 1.2.1 as well as Spark 1.3.0-RC3:
> 
> 1. Pull 1.2.1 Source
> 2. Apply the following patches
> a. Address jackson version, https://github.com/apache/spark/pull/3938
> b. Address the propagation of the hdp.version set in the spark-default.conf, 
> https://github.com/apache/spark/pull/3409
> 3. build with $SPARK_HOME./make-distribution.sh –name hadoop2.6 –tgz -Pyarn 
> -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests 
> package
> 
> Then deploy the resulting artifact => spark-1.2.1-bin-hadoop2.6.tgz following 
> instructions in the HDP Spark preview 
> http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/
> 
> FWIW spark-1.3.0 appears to be working fine with HDP as well and steps 2a and 
> 2b are not required.
> 
> HTH
> 
> -Todd
> 
> 
> On Mon, Mar 16, 2015 at 10:13 AM, Bharath Ravi Kumar  
> wrote:
> Hi,
> 
> Trying to run spark ( 1.2.1 built for hdp 2.2) against a yarn cluster results 
> in the AM failing to start with following error on stderr: 
> Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ExecutorLauncher 
> An application id was assigned to the job, but there were no logs. Note that 
> the spark distribution has not been "installed" on every host in the cluster 
> and the aforementioned spark build was copied  to one of the hadoop client 
> hosts in the cluster to launch the 
> job. Spark-submit was run with --master yarn-client and spark.yarn.jar was 
> set to the assembly jar from the above distribution. Switching the spark 
> distribution to the HDP recommended  version 
> and following the instructions on this page did not fix the problem either. 
> Any idea what may have caused this error ? 
> 
> Thanks,
> Bharath
> 
> 
> 
> 
> 
> 
> 



Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-24 Thread Doug Balog
I found the problem.
In mapred-site.xml, mapreduce.application.classpath has references to 
“${hdp.version}” which are not getting replaced when launch_container.sh is 
created. The executor fails with a substitution error at line 27 in 
launch_container.sh because bash can’t deal with “${hdp.version}”.
I have hdp.version defined in my spark-defaults.conf via 
spark.{driver,yarn.am}.extraJavaOptions -Dhdp.version=2.2.0-2041,
so something is not doing the substitution.

To work around this problem, I replaced "${hdp.version}” with “current” in 
mapred-site.xml.
I found a similar bug, https://issues.apache.org/jira/browse/AMBARI-8028, and 
the fix was exactly what I did to work around it.
Not sure if this is an Ambari bug (not doing variable substitution when writing 
mapred-site.xml) or a YARN bug (it’s not doing the variable substitution when 
writing launch_container.sh).

Anybody have an opinion?

Doug



> On Mar 19, 2015, at 5:51 PM, Doug Balog  wrote:
> 
> I’m seeing the same problem.
> I’ve set logging to DEBUG, and I think some hints are in the “Yarn AM launch 
> context” that is printed out 
> before Yarn  runs java. 
> 
> My next step is to talk to the admins and get them to set 
> yarn.nodemanager.delete.debug-delay-sec
> in the config, as recommended in 
> http://spark.apache.org/docs/latest/running-on-yarn.html
> Then I can see exactly whats in the directory.
> 
> Doug
> 
> ps Sorry for the dup message Bharath and Todd, used wrong email address.
> 
> 
>> On Mar 19, 2015, at 1:19 AM, Bharath Ravi Kumar  wrote:
>> 
>> Thanks for clarifying Todd. This may then be an issue specific to the HDP 
>> version we're using. Will continue to debug and post back if there's any 
>> resolution.
>> 
>> On Thu, Mar 19, 2015 at 3:40 AM, Todd Nist  wrote:
>> Yes I believe you are correct.  
>> 
>> For the build you may need to specify the specific HDP version of hadoop to 
>> use with the -Dhadoop.version=.  I went with the default 2.6.0, but 
>> Horton may have a vendor specific version that needs to go here.  I know I 
>> saw a similar post today where the solution was to use 
>> -Dhadoop.version=2.5.0-cdh5.3.2 but that was for a cloudera installation.  I 
>> am not sure what the HDP version would be to put here.
>> 
>> -Todd
>> 
>> On Wed, Mar 18, 2015 at 12:49 AM, Bharath Ravi Kumar  
>> wrote:
>> Hi Todd,
>> 
>> Yes, those entries were present in the conf under the same SPARK_HOME that 
>> was used to run spark-submit. On a related note, I'm assuming that the 
>> additional spark yarn options (like spark.yarn.jar) need to be set in the 
>> same properties file that is passed to spark-submit. That apart, I assume 
>> that no other host on the cluster should require a "deployment of" the spark 
>> distribution or any other config change to support a spark job.  Isn't that 
>> correct?
>> 
>> On Tue, Mar 17, 2015 at 6:19 PM, Todd Nist  wrote:
>> Hi Bharath,
>> 
>> Do you have these entries in your $SPARK_HOME/conf/spark-defaults.conf file?
>> 
>> spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
>> spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
>> 
>> 
>> 
>> 
>> On Tue, Mar 17, 2015 at 1:04 AM, Bharath Ravi Kumar  
>> wrote:
>> Still no luck running purpose-built 1.3 against HDP 2.2 after following all 
>> the instructions. Anyone else faced this issue?
>> 
>> On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar  
>> wrote:
>> Hi Todd,
>> 
>> Thanks for the help. I'll try again after building a distribution with the 
>> 1.3 sources. However, I wanted to confirm what I mentioned earlier:  is it 
>> sufficient to copy the distribution only to the client host from where  
>> spark-submit is invoked(with spark.yarn.jar set), or is there a need to 
>> ensure that the entire distribution is deployed made available pre-deployed 
>> on every host in the yarn cluster? I'd assume that the latter shouldn't be 
>> necessary.
>> 
>> On Mon, Mar 16, 2015 at 8:38 PM, Todd Nist  wrote:
>> Hi Bharath,
>> 
>> I ran into the same issue a few days ago, here is a link to a post on 
>> Horton's fourm.  http://hortonworks.com/community/forums/search/spark+1.2.1/
>> Incase anyone else needs to perform this these are the steps I took to get 
>> it to work with Spark 1.2.1 as well as Spark 1.3.0-RC3:
>> 
>> 1. Pull 1.2.1 Source
>> 2. Apply the following patches
>> a. Address jackson version, https://github.com/apache/spark/pull/3938
>> b. Address the propagation of the hdp.version set in

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-30 Thread Doug Balog
The “best” solution to spark-shell’s problem is creating a file 
$SPARK_HOME/conf/java-opts
with “-Dhdp.version=2.2.0.0-2041”

Cheers,

Doug

> On Mar 28, 2015, at 1:25 PM, Michael Stone  wrote:
> 
> I've also been having trouble running 1.3.0 on HDP. The 
> spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
> configuration directive seems to work with pyspark, but does not propagate when 
> using spark-shell. (That is, everything works fine with pyspark, and 
> spark-shell fails with the "bad substitution" message.)
> 
> Mike Stone
> 
> 





Re: Spark Job triggers second attempt

2015-05-07 Thread Doug Balog
I bet you are running on YARN in cluster mode.

If you are running on YARN in client mode, 
.set(“spark.yarn.maxAppAttempts”, ”1”) works as you expect, 
because YARN doesn’t start your app on the cluster until you create the 
SparkContext.
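
In other words, in yarn-client mode something like this takes effect (a minimal 
sketch; the application body is omitted):

  import org.apache.spark.{SparkConf, SparkContext}

  // The conf is read when the SparkContext is created on the submitting machine,
  // before YARN ever sees the application, so the setting is applied.
  val conf = new SparkConf().setAppName("my-app").set("spark.yarn.maxAppAttempts", "1")
  val sc = new SparkContext(conf)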

But if you are running on YARN in cluster mode, the driver program runs on a 
cluster node, so your app is already running on the cluster when you call .set().
To make it work in cluster mode, the property must be set on the spark-submit 
command line via 
“--conf spark.yarn.maxAppAttempts=1”
or --driver-java-options “-Dspark.yarn.maxAppAttempts=1”
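
For example (a sketch; the class and jar names are placeholders):

  spark-submit --master yarn-cluster \
    --conf spark.yarn.maxAppAttempts=1 \
    --class com.example.MyApp myapp.jar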


A note should be added to running-on-yarn.html, in the “Important notes” section, 
saying that in cluster mode you need to set spark.yarn.* properties from the 
spark-submit command line.

Cheers,

Doug




> On May 7, 2015, at 2:34 AM, ÐΞ€ρ@Ҝ (๏̯͡๏)  wrote:
> 
> How can I stop Spark from triggering a second attempt when the first attempt 
> fails?
> I do not want to wait for the second attempt to fail again, so that I can 
> debug faster.
> 
> .set("spark.yarn.maxAppAttempts", "0") OR .set("spark.yarn.maxAppAttempts", 
> "1")
> 
> is not helping.
> 
> 
> -- 
> Deepak
> 

