Reduce the amount of logging going into /var/log/hive/userlogs
Last time I looked there wasn't much information available on how to reduce the size of the logs written here (the only suggestion being to delete them after a day). Is there anything I can do now to reduce what's logged there in the first place? Cheers, Krishna
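P.S. The one knob I've found so far is the root logger level in hive-log4j.properties, though whether that is what actually feeds the userlogs directory on my distro is a guess on my part:

```properties
# hive-log4j.properties -- raise the threshold so only warnings and errors are logged
hive.root.logger=WARN,DRFA
```

The same thing can be set for a single session with: hive -hiveconf hive.root.logger=WARN,console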
Re: Loading xml to hive and fetching unbounded tags
Are you trying to capture this data in one column and use XPath with a UDF to get the data?

On Wed, Jun 11, 2014 at 11:12 AM, harish tangella harish.tange...@gmail.com wrote:
Hi, request your help: we are fetching unbounded tags from XML in Hive. We tried with xpath but are unable to get all the unbounded tags. A sample XML file is:

    <Rows>
      <Row>
        <APPLICATION_ID>1</APPLICATION_ID>
        <AppDetails>
          <AppDetail>
            <APPLICATION_CODE>1</APPLICATION_CODE>
          </AppDetail>
          <AppDetail>
            <APPLICATION_CODE>2</APPLICATION_CODE>
          </AppDetail>
        </AppDetails>
      </Row>
    </Rows>

We are able to get the application code by giving [1] in AppDetail. Request for help to get all the AppDetail tags.
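If so, something like the sketch below is what I had in mind. The table and column names (xml_table, xml_doc) are invented placeholders for wherever the raw document lands, but the xpath() UDF and LATERAL VIEW explode() are built into Hive: xpath() returns an array<string>, and exploding that array gives one row per matching tag instead of one array per row.

```sql
-- one row per APPLICATION_CODE, paired with the row's APPLICATION_ID
SELECT xpath(xml_doc, '/Rows/Row/APPLICATION_ID/text()')[0] AS application_id,
       code AS application_code
FROM xml_table
LATERAL VIEW explode(
  xpath(xml_doc, '/Rows/Row/AppDetails/AppDetail/APPLICATION_CODE/text()')
) t AS code;
```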
Re: Loading xml to hive and fetching unbounded tags
Hi, we are trying to get the data in the form of rows, not columns. We are able to get partial data by implementing a RecordReader. The logic we applied is reading the XML with 'Row' as the start and end tag, but as the result we get only the second row; expected is 2 rows. Referring to the XML below, the expected result is:

    <Row><APPLICATION_ID>1</APPLICATION_ID><AppDetails><AppDetail><APPLICATION_CODE>1</APPLICATION_CODE></AppDetail></AppDetails></Row>
    <Row><APPLICATION_ID>1</APPLICATION_ID><AppDetails><AppDetail><APPLICATION_CODE>2</APPLICATION_CODE></AppDetail></AppDetails></Row>

If we use xpath, we get the data column-wise: when we do select APPLICATION_ID, APPLICATION_CODE from the table, we get 1, [1,2].

On Fri, Jun 13, 2014 at 4:01 PM, Knowledge gatherer knowledge.gatherer@gmail.com wrote: Are you trying to capture this data in one column and use XPath with a UDF to get the data? [snip]
HCatalog access from a Java app
I’m experimenting with HCatalog, and would like to be able to access tables and their schemas from a Java application (not Hive/Pig/MapReduce). However, the API seems to be hidden, which leads me to believe that this is not a supported use case. Is HCatalog use limited to one of the supported frameworks? TIA Brian
Re: HCatalog access from a Java app
You should be able to access this information. The exact API depends on the version of Hive/HCat. As you may know, the earlier HCat API is being deprecated and will be removed in Hive 0.14.0. I can provide you with a code sample if you tell me what you are trying to do and what version of Hive you are using.

On Fri, Jun 13, 2014 at 7:33 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote: I’m experimenting with HCatalog, and would like to be able to access tables and their schemas from a Java application (not Hive/Pig/MapReduce). However, the API seems to be hidden, which leads me to believe that this is not a supported use case. Is HCatalog use limited to one of the supported frameworks? TIA Brian
Re: HCatalog access from a Java app
Version 0.12.0. I’d like to obtain the table’s schema, scan a table partition, and use the schema to parse the rows. I can probably figure this out by looking at the HCatalog source. My concern was that the HCatalog packages in the Hive distributions are excluded from the JavaDoc, which implies that the API is not public. Is there a reason for this? Brian

On Jun 13, 2014, at 9:10 AM, Dmitry Vasilenko dvasi...@gmail.com wrote: You should be able to access this information. The exact API depends on the version of Hive/HCat. As you may know, the earlier HCat API is being deprecated and will be removed in Hive 0.14.0. I can provide you with a code sample if you tell me what you are trying to do and what version of Hive you are using. [snip]
Re: HCatalog access from a Java app
I am not sure about the Javadocs... ;-] I have spent the last three years integrating with HCat, and to make it work I had to go through the code. So here are some samples that may be helpful to start with. If you are using Hive 0.12.0 I would not bother with the new APIs. I had to create some shim classes for HCat to make my code version independent, but I cannot share that.

1. To enumerate tables, just use the Hive client; this seems to be version independent:

    // conf should contain the hive.metastore.uris property pointing to your Hive metastore Thrift server
    HiveMetaStoreClient hiveMetastoreClient = new HiveMetaStoreClient(conf);
    List<String> databases = hiveMetastoreClient.getAllDatabases();   // all the databases
    List<String> tables = hiveMetastoreClient.getAllTables(database); // all the tables for the given database

2. To get the table schema (I assume that you are after the HCat schema):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hcatalog.data.schema.HCatSchema;
    import org.apache.hcatalog.data.schema.HCatSchemaUtils;
    import org.apache.hcatalog.mapreduce.HCatInputFormat;
    import org.apache.hcatalog.mapreduce.HCatSplit;
    import org.apache.hcatalog.mapreduce.InputJobInfo;

    Job job = new Job(config);
    job.setJarByClass(XX.class); // this will be your class
    job.setInputFormatClass(HCatInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    InputJobInfo inputJobInfo = InputJobInfo.create("my_data_base", "my_table", "partition filter");
    HCatInputFormat.setInput(job, inputJobInfo);
    HCatSchema s = HCatInputFormat.getTableSchema(job);

3. To read the HCat records: it depends on how you'd like to read them. Will you be reading ALL the records remotely from the client app, or will you get input splits and read the records in mappers? The code will be somewhat different... let me know...
Re: HCatalog access from a Java app
Doing this, with the appropriate substitutions for my table, jar class, etc.:

    Job job = new Job(config);
    job.setJarByClass(XX.class);
    job.setInputFormatClass(HCatInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    InputJobInfo inputJobInfo = InputJobInfo.create("my_data_base", "my_table", "partition filter");
    HCatInputFormat.setInput(job, inputJobInfo);
    HCatSchema s = HCatInputFormat.getTableSchema(job);

results in:

    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.hcatalog.mapreduce.HCatBaseInputFormat.getTableSchema(HCatBaseInputFormat.java:234)
Re: HCatalog access from a Java app
Please take a look at http://stackoverflow.com/questions/22630323/hadoop-java-lang-incompatibleclasschangeerror-found-interface-org-apache-hadoo

On Fri, Jun 13, 2014 at 9:53 AM, Brian Jeltema brian.jelt...@digitalenvoy.net wrote: Doing this, with the appropriate substitutions for my table, jar class, etc. [snip] results in: Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
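The short version, in case the link rots: HCatalog jars built against Hadoop 1 (where JobContext is a class) fail this way when run on Hadoop 2 (where it is an interface), so you need HCat artifacts compiled for your Hadoop major version. If you are on Maven, the dependencies might look roughly like this -- the coordinates are from memory, so please verify them against your repository:

```xml
<!-- HCatalog 0.12 lived under the org.apache.hcatalog groupId -->
<dependency>
  <groupId>org.apache.hcatalog</groupId>
  <artifactId>hcatalog-core</artifactId>
  <version>0.12.0</version>
</dependency>
<!-- pull Hadoop 2 client classes so JobContext resolves to the interface -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version>
</dependency>
```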
Re: HCatalog access from a Java app
BTW, you can also get the Hive schema and partitions (using the code from #1):

    Table table = hiveMetastoreClient.getTable(databaseName, tableName);
    List<FieldSchema> schema = hiveMetastoreClient.getSchema(databaseName, tableName);
    List<FieldSchema> partitions = table.getPartitionKeys();

The HCat and Hive APIs for the schema differ, but for the task at hand maybe you do not need HCatSchema... just a thought...

On Fri, Jun 13, 2014 at 10:32 AM, Dmitry Vasilenko dvasi...@gmail.com wrote: Please take a look at http://stackoverflow.com/questions/22630323/hadoop-java-lang-incompatibleclasschangeerror-found-interface-org-apache-hadoo [snip]
RE: hbase import..
FYI: I had neglected to turn off iptables.

From: Kennedy, Sean C.
Sent: Thursday, June 12, 2014 8:26 PM
To: user@hive.apache.org
Subject: hbase import..

Trying to run importtsv:

    /hd/hadoop/bin/hadoop jar /hbase/hbase-0.94.15/hbase-0.94.15.jar importtsv '-Dimporttsv.separator=,' -Dimporttsv.columns=HBASE_ROW_KEY,ColumnONE,ColumnTWO,ColumnThree TESTTABLE /ma/segwhdfs/hpvppm/test/3coltestfile

It looks like my ZooKeepers are up but I still run into the problem. Any help appreciated... Using Hadoop 1.2.1.

    14/06/12 20:24:06 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=18 watcher=hconnection
    14/06/12 20:24:06 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 26555@usbhbase
    14/06/12 20:24:06 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
    14/06/12 20:24:06 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
    java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary, copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.
Running Hive JDBC get ClassNotFound: org.apache.hadoop.conf.Configuration
Hi, I recently downloaded the HDP 2.1 Sandbox. I'm trying to create a simple Java program that connects to the Hive server. I'm using Maven with the following dependency:

    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>0.13.1</version>
    </dependency>

Here is the Java program:

    public class PruebaHive {

        public PruebaHive() {
        }

        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection connection = DriverManager.getConnection("jdbc:hive2://192.168.182.128:1", "", "");
            connection.close();
        }
    }

But I'm getting the following exception:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:367)
        at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:200)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:185)
        at ve.com.pacific.buscador.PruebaHive.main(PruebaHive.java:18)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

Regards, Néstor
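P.S. One thing I am trying now, in case it helps others: the stack trace says org.apache.hadoop.conf.Configuration is missing, and hive-jdbc alone does not seem to pull it in, so I added a Hadoop dependency next to it. The version below is my guess at what the HDP 2.1 sandbox ships; adjust as needed:

```xml
<!-- provides org.apache.hadoop.conf.Configuration at runtime -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.4.0</version>
</dependency>
```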