Reduce the amount of logging going into /var/log/hive/userlogs

2014-06-13 Thread Krishna Rao
Last time I looked there wasn't much info available on how to reduce the
size of the logs written here (the only suggestion being to delete them
after a day).

Is there anything I can do now to reduce what's logged there in the first
place?

Cheers,

Krishna
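
[A possible starting point, not from the thread: assuming the logs under
/var/log/hive/userlogs are the usual MapReduce task logs and assuming
Hadoop 1.x property names (check your version's mapred-default.xml before
relying on these), two knobs in mapred-site.xml can cap and expire them:]

```xml
<!-- mapred-site.xml: hedged sketch, property names from Hadoop 1.x -->
<property>
  <name>mapred.userlog.limit.kb</name>
  <!-- cap each task's log at 256 KB; 0 means unlimited -->
  <value>256</value>
</property>
<property>
  <name>mapred.userlog.retain.hours</name>
  <!-- delete task logs after 6 hours instead of the default 24 -->
  <value>6</value>
</property>
```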


Re: Loading xml to hive and fetching unbounded tags

2014-06-13 Thread Knowledge gatherer
Are you trying to capture this data in one column and use an XPath UDF to
get the data?


On Wed, Jun 11, 2014 at 11:12 AM, harish tangella harish.tange...@gmail.com wrote:

 Hi,

 Request your help: we are fetching unbounded (repeating) tags from XML in Hive.

 We tried XPath but were unable to get all of the repeated tags.

 A sample XML file is:

 <Rows>
   <Row>
     <APPLICATION_ID>1</APPLICATION_ID>
     <AppDetails>
       <AppDetail>
         <APPLICATION_CODE>1</APPLICATION_CODE>
       </AppDetail>
       <AppDetail>
         <APPLICATION_CODE>2</APPLICATION_CODE>
       </AppDetail>
     </AppDetails>
   </Row>
 </Rows>

 We are able to get one application code by giving an index like [1] on AppDetail.
 Request help to get all of the AppDetail tags.



Re: Loading xml to hive and fetching unbounded tags

2014-06-13 Thread harish tangella
Hi,

We are trying to get the data as rows, not as columns. We are able to get
partial data by implementing a RecordReader: the logic we applied is to
split the XML using 'Row' as the start and end tag, but as a result we get
only the second row, whereas we expect 2 rows.

Referring to the sample XML from the earlier message, the expected result is:

<Row><APPLICATION_ID>1</APPLICATION_ID><AppDetails><AppDetail><APPLICATION_CODE>1</APPLICATION_CODE></AppDetail></AppDetails></Row>

<Row><APPLICATION_ID>1</APPLICATION_ID><AppDetails><AppDetail><APPLICATION_CODE>2</APPLICATION_CODE></AppDetail></AppDetails></Row>

If we use XPath instead, we get the data column-wise: when we do
select APPLICATION_ID, APPLICATION_CODE from the table, we get 1, [1,2].
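
[For what it's worth, the "[1] returns only one value" behaviour is an XPath
indexing question rather than a Hive one. A minimal plain-Java sketch, outside
Hive and using only the JDK's javax.xml packages, that pulls every
APPLICATION_CODE from the sample document by evaluating the expression as a
node-set; the class and method names here are my own:]

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AppDetailXPath {

    // The sample document from the thread, with its markup restored.
    public static final String SAMPLE =
            "<Rows><Row><APPLICATION_ID>1</APPLICATION_ID><AppDetails>"
          + "<AppDetail><APPLICATION_CODE>1</APPLICATION_CODE></AppDetail>"
          + "<AppDetail><APPLICATION_CODE>2</APPLICATION_CODE></AppDetail>"
          + "</AppDetails></Row></Rows>";

    // Evaluate the XPath as a NODESET so every matching APPLICATION_CODE
    // is returned, not just the one selected by an index like [1].
    public static List<String> extractCodes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/Rows/Row/AppDetails/AppDetail/APPLICATION_CODE/text()",
                    doc, XPathConstants.NODESET);
            List<String> codes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                codes.add(nodes.item(i).getNodeValue());
            }
            return codes;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractCodes(SAMPLE)); // [1, 2]
    }
}
```

[If I recall correctly, Hive's array-returning xpath() UDF (as opposed to
xpath_string/xpath_int, which return a single value) behaves the same way,
which would explain the 1, [1,2] result described above.]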






HCatalog access from a Java app

2014-06-13 Thread Brian Jeltema
I’m experimenting with HCatalog, and would like to be able to access tables
and their schemas from a Java application (not Hive/Pig/MapReduce). However,
the API seems to be hidden, which leads me to believe that this is not a
supported use case. Is HCatalog use limited to one of the supported
frameworks?

TIA

Brian

Re: HCatalog access from a Java app

2014-06-13 Thread Dmitry Vasilenko
You should be able to access this information. The exact API depends on the
version of Hive/HCat. As you may know, the earlier HCat API is deprecated
and will be removed in Hive 0.14.0. I can provide you with a code sample if
you tell me what you are trying to do and what version of Hive you are
using.




Re: HCatalog access from a Java app

2014-06-13 Thread Brian Jeltema
Version 0.12.0.

I’d like to obtain the table’s schema, scan a table partition, and use the
schema to parse the rows.

I can probably figure this out by looking at the HCatalog source. My concern
was that the HCatalog packages in the Hive distributions are excluded from
the JavaDoc, which implies that the API is not public. Is there a reason for
this?

Brian




Re: HCatalog access from a Java app

2014-06-13 Thread Dmitry Vasilenko
I am not sure about the javadocs... ;-]
I have spent the last three years integrating with HCat, and to make it work
I had to go through the code...

So here are some samples that may be helpful to start with. If you are using
Hive 0.12.0 I would not bother with the new APIs... I had to create some
shim classes for HCat to make my code version-independent, but I cannot
share that.

So:

1. To enumerate tables, just use the Hive client... this seems to be
version-independent:

// the conf should contain the hive.metastore.uris property pointing to
// your Hive metastore Thrift server
HiveMetaStoreClient hiveMetastoreClient = new HiveMetaStoreClient(conf);
// this will get you all the databases
List<String> databases = hiveMetastoreClient.getAllDatabases();
// this will get you all the tables for the given database
List<String> tables = hiveMetastoreClient.getAllTables(database);

2. To get the table schema... I assume that you are after the HCat schema:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.data.schema.HCatSchemaUtils;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.HCatSplit;
import org.apache.hcatalog.mapreduce.InputJobInfo;

Job job = new Job(config);
job.setJarByClass(XX.class); // this will be your class
job.setInputFormatClass(HCatInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
// database, table, partition filter
InputJobInfo inputJobInfo = InputJobInfo.create("my_data_base", "my_table", "partition filter");
HCatInputFormat.setInput(job, inputJobInfo);
HCatSchema s = HCatInputFormat.getTableSchema(job);


3. To read the HCat records:

It depends on how you'd like to read the records... Will you be reading ALL
the records remotely from the client app, or will you get input splits and
read the records on mappers?

The code will be different (somewhat)... let me know...





Re: HCatalog access from a Java app

2014-06-13 Thread Brian Jeltema
Doing this, with the appropriate substitutions for my table, jarClass, etc:

 2. To get the table schema... I assume that you are after the HCat schema:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.InputSplit;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.hcatalog.data.schema.HCatSchema;
 import org.apache.hcatalog.data.schema.HCatSchemaUtils;
 import org.apache.hcatalog.mapreduce.HCatInputFormat;
 import org.apache.hcatalog.mapreduce.HCatSplit;
 import org.apache.hcatalog.mapreduce.InputJobInfo;

 Job job = new Job(config);
 job.setJarByClass(XX.class); // this will be your class
 job.setInputFormatClass(HCatInputFormat.class);
 job.setOutputFormatClass(TextOutputFormat.class);
 InputJobInfo inputJobInfo = InputJobInfo.create("my_data_base", "my_table", "partition filter");
 HCatInputFormat.setInput(job, inputJobInfo);
 HCatSchema s = HCatInputFormat.getTableSchema(job);

results in:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.hcatalog.mapreduce.HCatBaseInputFormat.getTableSchema(HCatBaseInputFormat.java:234)




Re: HCatalog access from a Java app

2014-06-13 Thread Dmitry Vasilenko
Please take a look at
http://stackoverflow.com/questions/22630323/hadoop-java-lang-incompatibleclasschangeerror-found-interface-org-apache-hadoo









Re: HCatalog access from a Java app

2014-06-13 Thread Dmitry Vasilenko
BTW, you can also get the Hive schema and partitions using the code from #1:

Table table = hiveMetastoreClient.getTable(databaseName, tableName);
List<FieldSchema> schema = hiveMetastoreClient.getSchema(databaseName, tableName);
List<FieldSchema> partitions = table.getPartitionKeys();

The HCat and Hive APIs for the schema differ, but for the task at hand maybe
you do not need HCatSchema... just a thought...








RE: hbase import..

2014-06-13 Thread Kennedy, Sean C.
FYI: I had neglected to turn off iptables.

From: Kennedy, Sean C.
Sent: Thursday, June 12, 2014 8:26 PM
To: user@hive.apache.org
Subject: hbase import..

Trying to run importtsv:

/hd/hadoop/bin/hadoop jar /hbase/hbase-0.94.15/hbase-0.94.15.jar importtsv 
'-Dimporttsv.separator=,' 
-Dimporttsv.columns=HBASE_ROW_KEY,ColumnONE,ColumnTWO,ColumnThree TESTTABLE  
/ma/segwhdfs/hpvppm/test/3coltestfile


It looks like my ZooKeeper quorum is up, but I still run into the problem
below. Any help appreciated... Using Hadoop 1.2.1.



14/06/12 20:24:06 INFO zookeeper.ZooKeeper: Initiating client connection, 
connectString=localhost:2181 sessionTimeout=18 watcher=hconnection
14/06/12 20:24:06 INFO zookeeper.RecoverableZooKeeper: The identifier of this 
process is 26555@usbhbase
14/06/12 20:24:06 INFO zookeeper.ClientCnxn: Opening socket connection to 
server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL 
(unknown error)
14/06/12 20:24:06 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
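
["Connection refused" at the socket level means nothing accepted the
connection on localhost:2181 — either no ZooKeeper is listening there, or a
firewall such as the iptables issue mentioned at the top of this thread is
rejecting it. A small self-contained sketch for checking a port before
launching a job; the class and method names here are my own:]

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if something is listening on host:port within timeoutMs.
    public static boolean isOpen(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // includes "Connection refused", as in the log above
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("zookeeper reachable: "
                + isOpen("localhost", 2181, 2000));
    }
}
```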

Notice: This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
New Jersey, USA 08889), and/or its affiliates (direct contact information
for affiliates is available at
http://www.merck.com/contact/contacts.html) that may be confidential,
proprietary, copyrighted and/or legally privileged. It is intended solely
for the use of the individual or entity named on this message. If you are
not the intended recipient, and have received this message in error,
please notify us immediately by reply e-mail and then delete it from
your system.


Running Hive JDBC get ClassNotFound: org.apache.hadoop.conf.Configuration

2014-06-13 Thread Néstor Boscán
Hi

I recently downloaded the HDP 2.1 Sandbox. I'm trying to create a simple
java program that connects to the hive server. I'm using maven with the
following dependency:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>0.13.1</version>
</dependency>

Here is the java program:

import java.sql.Connection;
import java.sql.DriverManager;

public class PruebaHive {
    public PruebaHive() {
    }

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection connection = DriverManager.getConnection(
                "jdbc:hive2://192.168.182.128:1", "", "");
        connection.close();
    }
}

But I'm getting the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/conf/Configuration
        at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:367)
        at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:200)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:185)
        at ve.com.pacific.buscador.PruebaHive.main(PruebaHive.java:18)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

Regards,

Néstor
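
[One likely cause, not stated in the thread: the NoClassDefFoundError above
is the JDBC driver failing to find Hadoop's org.apache.hadoop.conf.Configuration
class, which hive-jdbc does not pull in by itself. A hedged sketch of a fix —
add hadoop-common to the POM; the version shown is only a guess for an HDP 2.1
sandbox and should be matched to the actual cluster:]

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.4.0</version>
</dependency>
```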