Hive Metastore Service Startup Fails
Hello All,

When I try to deploy a Hortonworks cluster using the Ambari blueprint APIs, the deployment fails while starting the Hive Metastore service. The same blueprint works most of the time in the same environment. The parameter that changes in the blueprint with respect to Hive is:

Host Mapping File Content:

{'blueprint': 'onemasterblueprint',
 'configurations': [
  {u'hive-env': {u'hive_metastore_user_passwd': 'tkdw1rN'}},
  {u'gateway-site': {u'gateway.port': u'8445'}},
  {u'nagios-env': {u'nagios_contact': u'a...@us.ibm.com'}},
  {u'hive-site': {u'javax.jdo.option.ConnectionPassword': 'tkdw1rN'}},
  {'hdfs-site': {'dfs.datanode.data.dir': '/disk1/hadoop/hdfs/data,/disk2/hadoop/hdfs/data',
                 'dfs.namenode.checkpoint.dir': '/disk1/hadoop/hdfs/namesecondary',
                 'dfs.namenode.name.dir': '/disk1/hadoop/hdfs/namenode'}},
  {'core-site': {'fs.swift.impl': 'org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem',
                 'fs.swift.service.softlayer.auth.url': 'https://dal05.objectstorage.service.networklayer.com/auth/v1.0',
                 'fs.swift.service.softlayer.connect.timeout': '12',
                 'fs.swift.service.softlayer.public': 'false',
                 'fs.swift.service.softlayer.use.encryption': 'true',
                 'fs.swift.service.softlayer.use.get.auth': 'true'}}],
 'default_password': 'tkdw1rN',
 'host_groups': [
  {'hosts': [{'fqdn': 'vmktest0003.test.analytics.com'}], 'name': 'master'},
  {'hosts': [{'fqdn': 'vmktest0004.test.analytics.com'}], 'name': 'compute'}]}

Error.txt:

2015-06-01 05:59:22,178 - Error while executing command 'start':
Traceback (most recent call last):
  File /usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py, line 123, in execute
    method(env)
  File /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive_metastore.py, line 43, in start
    self.configure(env) # FOR SECURITY
  File /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive_metastore.py, line 38, in configure
    hive(name='metastore')
  File /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive.py, line 97, in hive
    not_if = check_schema_created_cmd
  File /usr/lib/python2.6/site-packages/resource_management/core/base.py, line 148, in __init__
    self.env.run()
  File /usr/lib/python2.6/site-packages/resource_management/core/environment.py, line 149, in run
    self.run_action(resource, action)
  File /usr/lib/python2.6/site-packages/resource_management/core/environment.py, line 115, in run_action
    provider_action()
  File /usr/lib/python2.6/site-packages/resource_management/core/providers/system.py, line 241, in action_run
    raise ex
Fail: Execution of 'export HIVE_CONF_DIR=/etc/hive/conf.server ; /usr/hdp/current/hive-client/bin/schematool -initSchema -dbType mysql -userName hive -passWord [PROTECTED]' returned 1.
15/06/01 05:59:21 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
15/06/01 05:59:21 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
15/06/01 05:59:21 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
15/06/01 05:59:21 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Metastore connection URL: jdbc:mysql://vmktest0009.test.analytics.ibmcloud.com/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
*** schemaTool failed ***

Output.txt:

2015-06-01 05:59:07,907 - Changing permission for /var/lib/ambari-agent/data/tmp/start_metastore_script from 644 to 755
2015-06-01 05:59:07,909 - Execute['export HIVE_CONF_DIR=/etc/hive/conf.server ; /usr/hdp/current/hive-client/bin/schematool -initSchema -dbType mysql -userName hive -passWord [PROTECTED]'] {'not_if': 'export HIVE_CONF_DIR=/etc/hive/conf.server ; /usr/hdp/current/hive-client/bin/schematool -info -dbType mysql -userName hive -passWord \'Hb2\'\'\'aasz\''}
2015-06-01 05:59:22,178 - Error while executing command 'start':
Traceback (most recent call last):
  File /usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py, line 123, in execute
    method(env)
  File /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive_metastore.py, line 43, in start
    self.configure(env) # FOR SECURITY
  File /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hive_metastore.py, line 38, in configure
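The host mapping in this message sets the Hive metastore password in two places: hive_metastore_user_passwd under hive-env and javax.jdo.option.ConnectionPassword under hive-site. Below is a minimal sketch, assuming the mapping is available as a Python dict shaped like the one posted, of checking that the two values agree before the mapping is submitted. The function names and the placeholder password are hypothetical, the sketch does not talk to Ambari, and a mismatch is only one possible cause of a schematool failure, not a diagnosis of this one.

```python
def hive_passwords(host_mapping):
    """Collect the Hive metastore password entries found in a host-mapping dict."""
    found = {}
    for entry in host_mapping.get("configurations", []):
        for config_type, props in entry.items():
            if config_type == "hive-env" and "hive_metastore_user_passwd" in props:
                found["hive-env"] = props["hive_metastore_user_passwd"]
            if config_type == "hive-site" and "javax.jdo.option.ConnectionPassword" in props:
                found["hive-site"] = props["javax.jdo.option.ConnectionPassword"]
    return found


def passwords_consistent(host_mapping):
    """True only if both entries are present and hold the same value."""
    found = hive_passwords(host_mapping)
    return len(found) == 2 and found["hive-env"] == found["hive-site"]


# Hypothetical mapping with a placeholder password (not a real credential).
host_mapping = {
    "blueprint": "onemasterblueprint",
    "configurations": [
        {"hive-env": {"hive_metastore_user_passwd": "example-password"}},
        {"hive-site": {"javax.jdo.option.ConnectionPassword": "example-password"}},
    ],
}

print(passwords_consistent(host_mapping))
```

Running a check like this on each generated mapping would at least rule out one source of run-to-run variation when the same blueprint sometimes works and sometimes fails.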
Streaming K-medoids
Hello everyone,

I have an idea and I would like the community to validate it. Mahout has an implementation of Streaming K-means. Would it make sense to build a similar implementation of Streaming K-medoids? K-medoids has even bigger scalability problems than K-means, but it can be useful in some cases (e.g., it allows more sophisticated distance measures).

What is your opinion of such an approach? Does anyone see problems with it?

I have already implemented K-medoids using this approach:

https://seer.lcc.ufmg.br/index.php/jidm/article/viewFile/99/82

but I now have a problem with a distance measure that does not allow projections, so I came up with the idea of implementing it in a similar way, as Streaming K-medoids.

Best regards,
Marko
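To illustrate why K-medoids works with arbitrary distance measures, here is a minimal sketch of plain, non-streaming K-medoids using the simple alternating scheme (assign each point to its nearest medoid, then re-pick each medoid as the cluster member with the smallest total distance to the rest). It is not Mahout's API and not the method from the linked paper; the function names are made up for illustration, and duplicate points and empty clusters are handled only in the crudest way. The only thing it needs from the data is a callable dist(a, b), with no means or projections.

```python
import random


def assign(points, medoids, dist):
    """Map each medoid index to the list of point indices closest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, dist, max_iter=100, seed=0):
    """Alternating k-medoids; returns (medoid indices, clusters by medoid index)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # initial medoids, as indices
    for _ in range(max_iter):
        clusters = assign(points, medoids, dist)
        # New medoid = member with the smallest summed distance to its cluster;
        # an empty cluster simply keeps its old medoid.
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)


# Toy data: two well-separated groups on a line, with absolute difference as distance.
points = [1, 2, 3, 10, 11, 12]
medoids, clusters = k_medoids(points, k=2, dist=lambda a, b: abs(a - b))
print(sorted(points[m] for m in medoids))
```

The update step costs a quadratic number of distance evaluations per cluster, which is the scalability problem mentioned above and the motivation for a streaming variant.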
reduce finished container statuses not present at AM
Hi guys,

I was running a simple Terasort job with 4 mappers and 2 reducers using Hadoop 3.0.0-SNAPSHOT (trunk). I was analyzing the finished containers reported to the AM in RMContainerAllocator.java - getResources(), and I noticed that none of the finished containers for reducers are sent back to the AM. Is this a bug, or is it by design?

I remember doing a similar analysis for Hadoop 2.3, where I got finished container statuses for both mappers and reducers.

Could someone explain what is going on?

Thank you,
Robert
Re: a non-commercial distribution of hadoop ecosystem?
Hello Demai,

Apache Bigtop is a project that tests and publishes rpm and deb packages for Hadoop ecosystem components. They'll have more details on their own site.

http://bigtop.apache.org/

Would this suit your needs?

--Chris Nauroth

From: Demai Ni nid...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Monday, June 1, 2015 at 1:37 PM
To: user@hadoop.apache.org
Subject: a non-commercial distribution of hadoop ecosystem?

hi, Guys,

I have been doing some research/POC using the Hadoop system. Normally, I either use homebrew on a Mac for a single-node installation, or use CDH (Cloudera) for a small 3~4 node Linux cluster.

My question is: besides the commercial distributions, CDH (Cloudera), HDP (Hortonworks), and others like MapR, IBM..., is there a distribution that is NOT owned by a company? I am looking for something simple for cluster configuration/installation of multiple components: hdfs, yarn, zookeeper, hive, hbase, maybe Spark. Surely a well-experienced person (not me) can build the distribution from Apache releases. I am more interested in building applications on top of it, and hope to find a distribution that packages them together.

BTW, I don't need the latest releases that the commercial distributions offer. I am also looking into the ODP (the Open Data Platform), but that project has been rather quiet since the initial February announcement.

Thanks.

Demai
Re: a non-commercial distribution of hadoop ecosystem?
Chris and Roman,

Many thanks for the quick response. I will take a look at Bigtop. Actually, I had heard about it, but thought it was an installation framework rather than a Hadoop distribution. Now I am looking at the Bigtop 0.7.0 Hadoop instructions, which will probably work fine for my needs. I appreciate the pointer.

Roman, I will ping you off list about ODP. I was hoping ODP would be the one for me. Well, in reality it is owned by a few companies, but at least not by ONE company. :-) That is fine with me, as long as ODP is open to be used by others. I am just having trouble finding documentation/installation info for ODP. Maybe I should google harder? :-)

Demai
Re: a non-commercial distribution of hadoop ecosystem?
On Mon, Jun 1, 2015 at 1:37 PM, Demai Ni nid...@gmail.com wrote:

> My question is: besides the commercial distributions, CDH (Cloudera), HDP (Hortonworks), and others like MapR, IBM..., is there a distribution that is NOT owned by a company? I am looking for something simple for cluster configuration/installation of multiple components: hdfs, yarn, zookeeper, hive, hbase, maybe Spark. Surely a well-experienced person (not me) can build the distribution from Apache releases. I am more interested in building applications on top of it, and hope to find a distribution that packages them together.

Apache Bigtop (CCed) aims at delivering a 100% open and community-driven distribution of big data management technologies around Apache Hadoop. Same as, for example, what Debian is trying to do for Linux.

> BTW, I don't need the latest releases that the commercial distributions offer. I am also looking into the ODP (the Open Data Platform), but that project has been rather quiet since the initial February announcement.

Feel free to ping me off list if you want more details on ODP.

Thanks,
Roman.
Re: a non-commercial distribution of hadoop ecosystem?
Bigtop, in a nutshell, is a non-commercial, multi-stakeholder Apache project. It produces a build framework that takes as input the source of Hadoop and related big data projects and produces as output OS-native packages for installation and management (certainly a distribution of the Hadoop ecosystem), coupled with a suite of integration tests for ensuring the distribution components work well together, and a suite of Puppet scripts for post-deploy configuration management. It's a rather large nutshell. (Smile)

Bigtop distribution packages are supported by Cask's Coopr (coopr.io) and, I think, to some extent by Ambari (I haven't tried it).

I've personally used Bigtop for years to produce several custom Hadoop distributions. For this purpose it is a great tool.

Please mail u...@bigtop.apache.org if you would like to know more; we'd love to talk with you.