Hive Metastore Service Startup Fails

2015-06-01 Thread Pratik Gadiya
Hello All,

When I try to deploy a Hortonworks cluster using the Ambari blueprint APIs, it fails while starting the Hive Metastore service.

The same blueprint usually works fine in the same environment.

The only parameter that changes in the entire blueprint, with respect to Hive, is the one shown in the host mapping file content below:

Host Mapping File Content:
{'blueprint': 'onemasterblueprint',
 'configurations': [
     {'hive-env': {'hive_metastore_user_passwd': 'tkdw1rN'}},
     {'gateway-site': {'gateway.port': '8445'}},
     {'nagios-env': {'nagios_contact': 'a...@us.ibm.com'}},
     {'hive-site': {'javax.jdo.option.ConnectionPassword': 'tkdw1rN'}},
     {'hdfs-site': {'dfs.datanode.data.dir': '/disk1/hadoop/hdfs/data,/disk2/hadoop/hdfs/data',
                    'dfs.namenode.checkpoint.dir': '/disk1/hadoop/hdfs/namesecondary',
                    'dfs.namenode.name.dir': '/disk1/hadoop/hdfs/namenode'}},
     {'core-site': {'fs.swift.impl': 'org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem',
                    'fs.swift.service.softlayer.auth.url': 'https://dal05.objectstorage.service.networklayer.com/auth/v1.0',
                    'fs.swift.service.softlayer.connect.timeout': '12',
                    'fs.swift.service.softlayer.public': 'false',
                    'fs.swift.service.softlayer.use.encryption': 'true',
                    'fs.swift.service.softlayer.use.get.auth': 'true'}}],
 'default_password': 'tkdw1rN',
 'host_groups': [{'hosts': [{'fqdn': 'vmktest0003.test.analytics.com'}], 'name': 'master'},
                 {'hosts': [{'fqdn': 'vmktest0004.test.analytics.com'}], 'name': 'compute'}]}
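
For context, a host mapping like the one above is posted to Ambari over its REST API. A minimal sketch of that step, assuming Python's requests library; the server address, cluster name and admin credentials are placeholders, and only the blueprint name 'onemasterblueprint' comes from the mapping above:

# Hypothetical sketch of registering the blueprint and creating the cluster
# through the Ambari v1 REST API. Server address, cluster name and admin
# credentials are placeholders, not values from this deployment.
import json
import requests

AMBARI = "http://ambari-server.example.com:8080/api/v1"
AUTH = ("admin", "admin")                  # placeholder credentials
HEADERS = {"X-Requested-By": "ambari"}     # Ambari requires this header on POSTs

with open("blueprint.json") as f:
    blueprint = json.load(f)
with open("hostmapping.json") as f:        # the host mapping shown above
    host_mapping = json.load(f)

# 1. Register the blueprint under the name the host mapping refers to.
r = requests.post(AMBARI + "/blueprints/onemasterblueprint",
                  auth=AUTH, headers=HEADERS, data=json.dumps(blueprint))
r.raise_for_status()

# 2. Create the cluster; Ambari then installs and starts every service,
#    which is the stage where the Hive Metastore start fails here.
r = requests.post(AMBARI + "/clusters/testcluster",
                  auth=AUTH, headers=HEADERS, data=json.dumps(host_mapping))
r.raise_for_status()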

Error.txt:
2015-06-01 05:59:22,178 - Error while executing command 'start':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive_metastore.py", line 43, in start
    self.configure(env) # FOR SECURITY
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive_metastore.py", line 38, in configure
    hive(name='metastore')
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive.py", line 97, in hive
    not_if = check_schema_created_cmd
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'export HIVE_CONF_DIR=/etc/hive/conf.server ; /usr/hdp/current/hive-client/bin/schematool -initSchema -dbType mysql -userName hive -passWord [PROTECTED]' returned 1.
15/06/01 05:59:21 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
15/06/01 05:59:21 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
15/06/01 05:59:21 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
15/06/01 05:59:21 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Metastore connection URL: jdbc:mysql://vmktest0009.test.analytics.ibmcloud.com/hive?createDatabaseIfNotExist=true
Metastore Connection Driver :  com.mysql.jdbc.Driver
Metastore connection User: hive
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
*** schemaTool failed ***
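
The "Failed to get schema version" error means schematool could not read the VERSION table of the metastore database, which usually comes down to connectivity or credentials. A quick way to check is a sketch like the following, assuming the pymysql package is available; the host, user and password are taken from the log and blueprint above, not verified values:

# Hypothetical connectivity check against the Hive metastore database.
# Assumes pymysql is installed; host, user and password are placeholders
# copied from the blueprint and the schematool log above.
import pymysql

conn = pymysql.connect(host="vmktest0009.test.analytics.ibmcloud.com",
                       user="hive",
                       password="tkdw1rN",
                       database="hive")
try:
    with conn.cursor() as cur:
        # schematool reads the schema version from the VERSION table;
        # if this query fails, the schema was never initialized or the
        # credentials in hive-site do not match the database.
        cur.execute("SELECT SCHEMA_VERSION FROM VERSION")
        print(cur.fetchall())
finally:
    conn.close()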

Output.txt:


2015-06-01 05:59:07,907 - Changing permission for /var/lib/ambari-agent/data/tmp/start_metastore_script from 644 to 755
2015-06-01 05:59:07,909 - Execute['export HIVE_CONF_DIR=/etc/hive/conf.server ; /usr/hdp/current/hive-client/bin/schematool -initSchema -dbType mysql -userName hive -passWord [PROTECTED]'] {'not_if': 'export HIVE_CONF_DIR=/etc/hive/conf.server ; /usr/hdp/current/hive-client/bin/schematool -info -dbType mysql -userName hive -passWord \'Hb2\'\'\'aasz\''}
2015-06-01 05:59:22,178 - Error while executing command 'start':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive_metastore.py", line 43, in start
    self.configure(env) # FOR SECURITY
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive_metastore.py", line 38, in configure

Streaming K-medoids

2015-06-01 Thread Marko Dinic

Hello everyone,

I have an idea and I would like to get some validation from the community about it.


Mahout has an implementation of Streaming K-means. In your opinion, would it make sense to make a similar implementation of Streaming K-medoids?


K-medoids has even bigger scalability problems than K-means, but it can be useful in some cases (e.g. it allows more sophisticated distance measures).


What is your opinion about such an approach? Does anyone see problems 
with it?


I have already implemented K-medoids using the approach from https://seer.lcc.ufmg.br/index.php/jidm/article/viewFile/99/82, but I now have a problem with a distance measure that does not allow projections, which led me to the idea of implementing it along the lines of Streaming K-means, as a Streaming K-medoids.
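
To make the idea concrete, here is a rough illustrative sketch (not Mahout code; the absorb-or-spawn threshold policy is a simplification of what Streaming K-means actually does) of a one-pass medoid summary that keeps real data points as representatives, so any distance function can be plugged in:

# Hypothetical one-pass "streaming K-medoids" sketch (not Mahout code).
# Representatives are always real data points, so any pairwise distance
# function can be used; the threshold policy is a deliberate simplification.
import random

def streaming_kmedoids(points, distance, max_representatives, threshold):
    """Single pass over `points`, keeping at most `max_representatives`
    weighted medoid candidates; a final (offline) K-medoids pass could
    then cluster these candidates, as Streaming K-means does."""
    reps = []          # list of (point, weight) pairs
    for p in points:
        if not reps:
            reps.append((p, 1))
            continue
        # find the nearest existing representative
        d, idx = min((distance(p, r), i) for i, (r, _) in enumerate(reps))
        if d <= threshold or len(reps) >= max_representatives:
            # absorb the point into its nearest representative
            r, w = reps[idx]
            reps[idx] = (r, w + 1)
        else:
            # the point is far from everything seen so far: keep it
            reps.append((p, 1))
    return reps

# Example usage with a plain absolute-difference distance on numbers.
data = [random.gauss(mu, 1.0) for mu in (0, 0, 10, 10, 20) for _ in range(50)]
random.shuffle(data)
sketch = streaming_kmedoids(data, lambda a, b: abs(a - b),
                            max_representatives=20, threshold=2.0)
print(sorted(sketch, key=lambda rw: -rw[1])[:5])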


Best regards,
Marko


reduce finished container statuses not present at AM

2015-06-01 Thread Grandl Robert
Hi guys,
I was running a simple Terasort job with 4 mappers and 2 reducers using Hadoop 3.0.0-SNAPSHOT (trunk).
I was analyzing the finished containers reported to the AM in RMContainerAllocator.java - getResources(), and I realized that none of the finished containers for the reducers are sent back to the AM. Is this a bug, or is it by design?

I remember doing a similar analysis for Hadoop 2.3, and there I got finished container statuses for both mappers and reducers.

Could someone explain what is going on?

Thank you,
Robert



Re: a non-commercial distribution of hadoop ecosystem?

2015-06-01 Thread Chris Nauroth
Hello Demai,

Apache Bigtop is a project that tests and publishes rpm and deb packages for 
Hadoop ecosystem components.  They'll have more details on their own site.

http://bigtop.apache.org/

Would this suit your needs?

--Chris Nauroth

From: Demai Ni <nid...@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Monday, June 1, 2015 at 1:37 PM
To: user@hadoop.apache.org
Subject: a non-commercial distribution of hadoop ecosystem?

hi, Guys,

I have been doing some research/POC work using the Hadoop ecosystem. Normally, I either use Homebrew on a Mac for a single-node installation, or use CDH (Cloudera) for a small 3~4 node Linux cluster.

My question is: besides the commercial distributions CDH (Cloudera), HDP (Hortonworks), and others like MapR, IBM, etc., is there a distribution that is NOT owned by a company? I am looking for something simple for cluster configuration/installation of multiple components: HDFS, YARN, ZooKeeper, Hive, HBase, maybe Spark. Sure, a well-experienced person (not me) can build a distribution from the Apache releases, but I am more interested in building applications on top of it, and I hope to find one that packages these components together.

BTW, I don't need the latest releases the way the commercial distributions offer them. I am also looking into ODP (the Open Data Platform), but that project has been kind of quiet since the initial February announcement.

Thanks.

 Demai


Re: a non-commercial distribution of hadoop ecosystem?

2015-06-01 Thread Demai Ni
Chris and Roman,

many thanks for the quick response. I will take a look at Bigtop. Actually, I had heard about it, but thought it was an installation framework rather than a Hadoop distribution. Now I am looking at the Bigtop 0.7.0 Hadoop instructions, which will probably work fine for my needs. Appreciate the pointer.

Roman, I will ping you off list about ODP. I was hoping ODP would be the one for me. Well, in reality it is owned by a few companies, though at least not by ONE company. :-)  That is fine with me, as long as ODP is open to be used by others. I am just having trouble finding documentation/installation info for ODP. Maybe I should google harder? :-)

Demai


On Mon, Jun 1, 2015 at 1:46 PM, Roman Shaposhnik r...@apache.org wrote:

 On Mon, Jun 1, 2015 at 1:37 PM, Demai Ni nid...@gmail.com wrote:
  My question is besides the commercial distributions: CDH(Cloudera)  , HDP
  (Horton work), and others like Mapr, IBM... Is there a distribution that
 is
  NOT owned by a company?  I am looking for something simple for cluster
  configuration/installation for multiple components: hdfs, yarn,
 zookeeper,
  hive, hbase, maybe Spark. Surely, for a well-experience person(not me),
  he/she can build the distribution from Apache releases. Well, I am more
  interested on building application on top of it, and hopefully to find
 one
  packed them together.

 Apache Bigtop (CCed) aims at delivering a 100% open and
 community-driven distribution of big data management technologies
 around Apache Hadoop. Same as, for example, what Debian is trying
 to do for Linux.

  BTW, I don't need the latest releases like other commercial distribution
  offered.  I am also looking into the ODP(the open data platform), but
 that
  project is kind of quiet after the initial Feb announcement.

 Feel free to ping me off list if you want more details on ODP.

 Thanks,
 Roman.



Re: a non-commercial distribution of hadoop ecosystem?

2015-06-01 Thread Roman Shaposhnik
On Mon, Jun 1, 2015 at 1:37 PM, Demai Ni nid...@gmail.com wrote:
 My question is besides the commercial distributions: CDH(Cloudera)  , HDP
 (Horton work), and others like Mapr, IBM... Is there a distribution that is
 NOT owned by a company?  I am looking for something simple for cluster
 configuration/installation for multiple components: hdfs, yarn, zookeeper,
 hive, hbase, maybe Spark. Surely, for a well-experience person(not me),
 he/she can build the distribution from Apache releases. Well, I am more
 interested on building application on top of it, and hopefully to find one
 packed them together.

Apache Bigtop (CCed) aims at delivering a 100% open and
community-driven distribution of big data management technologies
around Apache Hadoop. Same as, for example, what Debian is trying
to do for Linux.

 BTW, I don't need the latest releases like other commercial distribution
 offered.  I am also looking into the ODP(the open data platform), but that
 project is kind of quiet after the initial Feb announcement.

Feel free to ping me off list if you want more details on ODP.

Thanks,
Roman.


Re: a non-commercial distribution of hadoop ecosystem?

2015-06-01 Thread Andrew Purtell
Bigtop, in a nutshell, is a non-commercial, multi-stakeholder Apache project that produces a build framework: it takes as input the source of Hadoop and related big data projects and produces as output OS-native packages for installation and management (certainly a distribution of the Hadoop ecosystem), coupled with a suite of integration tests to ensure the distribution components work well together and a suite of Puppet scripts for post-deploy configuration management. It's a rather large nutshell. (Smile)  Bigtop distribution packages are supported by Cask's Coopr (coopr.io) and, I think, to some extent by Ambari (I haven't tried it).

I've personally used Bigtop for years to produce several custom Hadoop 
distributions. For this purpose it is a great tool. 

Please mail u...@bigtop.apache.org if you would like to know more; we'd love to talk with you.

