Re: select count(*) from table;
If you have enabled statistics collection, the count will come from the stored table statistics. If the underlying file format supports in-file statistics (like ORC), the count can come from there as well. If it is just a plain vanilla text file format, Hive needs to run a job to compute the count, which takes the longest of all.

On Tue, Mar 22, 2016 at 12:44 PM, Amey Barve wrote:
> select count(*) from table;
>
> How does hive evaluate count(*) on a table?
>
> Does it return count by actually querying table, or directly return count
> by consulting some statistics locally.
>
> For Hive's Text format it takes few seconds while Hive's Orc format takes
> fraction of seconds.
>
> Regards,
> Amey

--
Nitin Pawar
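For context, whether count(*) can be answered from statistics depends on the stats having been computed and the optimizer being allowed to use them. A minimal sketch, assuming a hypothetical table named my_table (the statements and property names are standard Hive; availability of hive.compute.query.using.stats depends on your Hive version):

```sql
-- Gather table-level statistics so simple aggregates can be answered
-- from the metastore instead of a full scan.
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Allow the optimizer to answer count(*)/min/max from stored stats.
SET hive.compute.query.using.stats=true;

-- Served from stats when they are current; otherwise a job runs.
SELECT COUNT(*) FROM my_table;
```

If the stats are stale or the setting is off, the query falls back to a scan, which is why text tables take seconds while ORC can answer in a fraction of a second.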
Re: [GitHub] hive pull request: HIVE 2304 : for hive2
As per my understanding, Apache Hive development does not support GitHub pull requests yet. You may want to create a patch and upload it to the appropriate JIRA ticket.

On Sat, Jul 19, 2014 at 3:15 AM, codingtony wrote:
> GitHub user codingtony opened a pull request:
>
> https://github.com/apache/hive/pull/20
>
> HIVE 2304 : for hive2
>
> Fix for HivePreparedStatement for the hive2 driver.
>
> Applied the same setObject() code that fixed HIVE 2304 for the hive1 driver.
>
> You can merge this pull request into a Git repository by running:
>
> $ git pull https://github.com/codingtony/hive HIVE-2304-hive2
>
> Alternatively you can review and apply these changes as the patch at:
>
> https://github.com/apache/hive/pull/20.patch
>
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
>
> This closes #20
>
> commit 05dea4aaa70f9fb1676c97fe57b3f6813eeef111
> Author: Sushanth Sowmyan
> Date: 2014-06-02T19:25:00Z
>
> Hive 0.13.1-rc3 release.
>
> git-svn-id: https://svn.apache.org/repos/asf/hive/tags/release-0.13.1-rc3@1599318 13f79535-47bb-0310-9956-ffa450edef68
>
> commit 85a78a0d6b992df238bce96fd57afb385b5d8b06
> Author: Sushanth Sowmyan
> Date: 2014-06-05T21:03:32Z
>
> Hive 0.13.1 release.
>
> git-svn-id: https://svn.apache.org/repos/asf/hive/tags/release-0.13.1@1600763 13f79535-47bb-0310-9956-ffa450edef68
>
> commit 5464913c3e4707eba29eb5c917453afed905411b
> Author: Tony Bussieres
> Date: 2014-07-18T21:36:08Z
>
> HIVE-2304 : Apply same fix but for org.apache.hive.jdbc.HivePreparedStatement (hive2)
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---

--
Nitin Pawar
Re: Possible bug loading data in Hive.
The error you see is with the hive metastore. These issues have generally been related to two things:
1) load on the metastore
2) DataNucleus

For now, if possible, see if restarting the hive metastore resolves your issue.

On Tue, Jun 10, 2014 at 3:27 PM, Fernando Agudo wrote:
> I have problems upgrading to hive-0.13 or 0.12 because this is in production.
> I only have this configuration for DataNucleus:
>
> datanucleus.fixedDatastore
> true
>
> datanucleus.autoCreateSchema
> false
>
> Is this relevant for the problem?
>
> Thanks,
>
> On 10/06/14 10:53, Nitin Pawar wrote:
>> Hive 0.9.0 with CDH4.1 <--- This is a very old release.
>>
>> I would recommend upgrading to hive-0.13 or at least 0.12 and checking there.
>>
>> The error you are seeing is on loading data into a partition, where the
>> metastore alter/add partition call is failing.
>>
>> Can you try upgrading and see if that resolves your issue?
>> If not, can you share your DataNucleus related settings in hive?
>>
>> On Tue, Jun 10, 2014 at 2:16 PM, Fernando Agudo wrote:
>>> Hello,
>>>
>>> I'm working with Hive 0.9.0 with CDH4.1. I have a process which is
>>> loading data into Hive every minute. It creates the partition if
>>> necessary.
>>> I have been monitoring this process for three days and I noticed that
>>> there's a method (listStorageDescriptorsWithCD) whose execution time keeps
>>> increasing. The first execution of this method took about 15 milliseconds,
>>> and in the end it took more than 3 seconds (three days later); after that,
>>> Hive throws an exception and starts working again.
>>>
>>> I have checked this method but I haven't found anything suspicious;
>>> could it be a bug?
>>> 2014-06-05 09:58:20,921 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2036)) - Executing listStorageDescriptorsWithCD
>>> 2014-06-05 09:58:20,928 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2045)) - Done executing query for listStorageDescriptorsWithCD
>>>
>>> 2014-06-08 20:15:33,867 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2036)) - Executing listStorageDescriptorsWithCD
>>> 2014-06-08 20:15:36,134 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2045)) - Done executing query for listStorageDescriptorsWithCD
>>>
>>> 2014-06-08 20:16:34,600 DEBUG metastore.ObjectStore (ObjectStore.java:removeUnusedColumnDescriptor(1989)) - execute removeUnusedColumnDescriptor
>>> 2014-06-08 20:16:34,600 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2036)) - Executing listStorageDescriptorsWithCD
>>> 2014-06-08 20:16:34,805 ERROR metadata.Hive (Hive.java:getPartition(1453)) - org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition.
>>> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:429)
>>> at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1446)
>>> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1158)
>>> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:304)
>>> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>>> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>>> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
>>> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
>>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
>>> at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191)
>>> at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:630)
>>> at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:618)
>>> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
>>> at org.apach
Re: Possible bug loading data in Hive.
metastore.HiveMetaStore$HMSHandler.alter_partition(HiveMetaStore.java:1771)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partition(HiveMetaStoreClient.java:834)
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:425)
> ... 17 more
>
> 2014-06-08 20:16:34,827 ERROR exec.Task (SessionState.java:printError(403)) - Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition.
> org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition.
> at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1454)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1158)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:304)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
> at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191)
> at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:630)
> at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:618)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition.
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:429)
> at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1446)
> ... 16 more
> Caused by: MetaException(message:The transaction for alter partition did not commit successfully.)
> at org.apache.hadoop.hive.metastore.ObjectStore.alterPartition(ObjectStore.java:1927)
> at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
> at $Proxy0.alterPartition(Unknown Source)
> at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartition(HiveAlterHandler.java:254)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:1816)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:1788)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partition(HiveMetaStore.java:1771)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partition(HiveMetaStoreClient.java:834)
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:425)
> ... 17 more
> 2014-06-08 20:16:34,852 ERROR ql.Driver (SessionState.java:printError(403)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
>
> --
> Fernando Agudo Tarancón
> Big Data Software Engineer
>
> Telf.: +34 917 680 490
> Fax: +34 913 833 301
> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
> http://www.bidoop.es

--
Nitin Pawar
Re: Scheduling the next Hive Contributors Meeting
I am not a contributor, but a spectator of what Hive has been doing for the last couple of years. I work out of India and would love to just sit back and listen to all the new upcoming things (if that's allowed) :)

On Sat, Nov 9, 2013 at 1:08 AM, Brock Noland wrote:
> Hi,
>
> Thanks Carl and Thejas! I would be attending remotely so the webex or
> google hangout would be very much appreciated. Please let me know if there
> is anything I can do to help enable either a webex or hangout!
>
> The Apache Sentry (incubating)[1] community which depends on Hive would be
> interested in briefly describing the project to the Hive community and
> discussing how we can work together to move both projects forward! As a side
> note, there have been lively discussions on the integration of other
> incubating projects, therefore I'd just like to share that the changes
> Sentry is interested in are very small in scope and unlikely to cause
> disruption to the Hive community.
>
> Cheers!
> Brock
>
> [1] http://incubator.apache.org/projects/sentry.html
>
> On Fri, Nov 8, 2013 at 1:08 PM, Carl Steinbach wrote:
>> We're long overdue for a Hive Contributors Meeting. Thejas has offered to
>> host the next meeting at Hortonworks on November 19th from 4-6pm. We will
>> have a Google Hangout or Webex set up for people who wish to attend
>> remotely. If you want to attend but can't because of a scheduling conflict,
>> please let us know. If enough people fall into this category we will try to
>> reschedule.
>>
>> Thanks.
>>
>> Carl

--
Nitin Pawar
Re: Skip trash while dropping Hive table
On the hive CLI I normally put set fs.trash.interval=0; in .hiverc and use that.

This setting is HDFS related, and I would not recommend setting it in hdfs-site.xml, as it would then apply across all of HDFS, which is not desirable most of the time.

On Tue, Nov 5, 2013 at 5:28 AM, Chu Tong wrote:
> Hi all,
>
> Is there an existing way to drop Hive tables without having the deleted
> files hitting trash? If not, can we add something similar to Hive for this?
>
> Thanks a lot.

--
Nitin Pawar
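A minimal sketch of the per-session approach described above (the table name is hypothetical):

```sql
-- In ~/.hiverc or at the start of a session: set the trash interval to 0
-- so files from dropped tables are deleted immediately instead of being
-- moved to the HDFS .Trash directory.
SET fs.trash.interval=0;

DROP TABLE IF EXISTS staging_tmp;
```

Later Hive releases also added a PURGE option to DROP TABLE for the same effect on a per-statement basis, which avoids changing the trash behavior for the whole session.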
Re: Single Mapper - HIVE 0.11
What's the size of the table (in GBs)? What max and min split sizes have you provided?

On Wed, Oct 9, 2013 at 10:28 PM, Gourav Sengupta wrote:
> Hi,
>
> I am trying to run a join using two tables stored in ORC file format.
>
> The first table has 34 million records and the second has around 300,000
> records.
>
> Setting "set hive.auto.convert.join=true" makes the entire query run via a
> single mapper.
> In case I set "set hive.auto.convert.join=false" then there are two
> mappers: the first one reads the second table and then the entire large table
> goes through the second mapper.
>
> Is there something that I am doing wrong? There are three nodes in
> the HADOOP cluster currently and I was expecting that at least 6 mappers
> would have been used.
>
> Thanks and Regards,
> Gourav

--
Nitin Pawar
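For reference, these are the kinds of settings being asked about. A hedged sketch with illustrative values (the property names are the standard MapReduce-era ones; tune the numbers to your cluster):

```sql
-- Cap the split size so a large ORC table is spread across more mappers
-- (values in bytes: 256 MB max, 128 MB min per split).
SET mapred.max.split.size=268435456;
SET mapred.min.split.size=134217728;

-- The small-table threshold also influences whether the join is
-- auto-converted to a single-stage map join.
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;
```

Smaller max split sizes generally mean more mappers, at the cost of more task overhead.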
Self join issue
Hi,

I just raised a ticket for a table with a self join query.

The table is created with the json serde provided by Cloudera. When I run a single query on the table, like select col from table where col='xyz', it works perfectly fine with a mapreduce job. But when I try to run a self join query on the table, it says the serde is not found during query parsing.

I have mentioned the steps in detail on JIRA HIVE-5432 <https://issues.apache.org/jira/browse/HIVE-5432>.

Can somebody tell me what's special about how the query is parsed for a join versus a standalone query? Due to this issue, I have to create temporary tables and make sure I clean them up myself after the jobs are over.

Thanks,
Nitin Pawar
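The temporary-table workaround mentioned above might look like this sketch (the staging table and column names are hypothetical; the jar path is the one from the JIRA reproduction steps):

```sql
-- Workaround: materialize each side of the self join into plain staging
-- tables so the join itself no longer needs the JSON serde.
ADD JAR /home/hive/udfs/hive-serdes-1.0-SNAPSHOT.jar;

CREATE TABLE tmp_x AS SELECT b FROM test WHERE a = 'test';
CREATE TABLE tmp_y AS SELECT b FROM test WHERE a = 'test1';

SELECT x.b FROM tmp_x x JOIN tmp_y y ON (x.b = y.b);

-- Clean up afterwards, as noted in the mail.
DROP TABLE tmp_x;
DROP TABLE tmp_y;
```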
[jira] [Created] (HIVE-5432) self join for a table with serde definition fails with classNotFoundException, single queries work fine
Nitin Pawar created HIVE-5432:
-
Summary: self join for a table with serde definition fails with classNotFoundException, single queries work fine
Key: HIVE-5432
URL: https://issues.apache.org/jira/browse/HIVE-5432
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.11.0
Environment: rhel6.4
Reporter: Nitin Pawar

Steps to reproduce:

hive> add jar /home/hive/udfs/hive-serdes-1.0-SNAPSHOT.jar;
Added /home/hive/udfs/hive-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/hive/udfs/hive-serdes-1.0-SNAPSHOT.jar
hive> create table if not exists test(a string, b string) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe';
OK
Time taken: 0.159 seconds
hive> load data local inpath '/tmp/1' overwrite into table test;
Copying data from file:/tmp/1
Copying file: file:/tmp/1
Loading data to table default.test
Table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 51, raw_data_size: 0]
OK
Time taken: 0.659 seconds
hive> select a from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
...
...
hive> select * from (select b from test where a="test")x join (select b from test where a="test1")y on (x.b = y.b);
Total MapReduce jobs = 1
setting HADOOP_USER_NAME hive
Execution log at: /tmp/hive/.log
java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
Continuing ...
2013-10-03 05:13:00 Starting to launch local task to process map join; maximum memory = 1065484288
org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRowInspectorFromTable(FetchOperator.java:230)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getOutputObjectInspector(FetchOperator.java:595)
at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:406)
at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:290)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:682)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getOutputObjectInspector(FetchOperator.java:631)
at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:406)
at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:290)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:682)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:

--
This message was sent by Atlassian JIRA
(v6.1#6144)
Re: Error - loading data into tables
Manickam,

I am really not sure if Hive supports federated namespaces yet. I have cc'd the dev list; maybe one of the core Hive developers will be able to tell how to load data using Hive on a federated HDFS.

On Tue, Oct 1, 2013 at 12:59 PM, Manickam P wrote:
> Hi Pawar,
>
> I tried that option but it is not working. I have a federated HDFS cluster and
> given below is my core-site.xml.
>
> I created the HDFS directory inside that /home/storage/mount1 and tried to
> load the file; I'm still getting the same error.
>
> Can you please tell me what mistake I'm making here? Because I don't have any clue.
>
> fs.default.name
> viewfs:///
>
> fs.viewfs.mounttable.default.link./home/storage/mount1
> hdfs://10.108.99.68:8020
>
> fs.viewfs.mounttable.default.link./home/storage/mount2
> hdfs://10.108.99.69:8020
>
> Thanks,
> Manickam P
>
> --
> Date: Mon, 30 Sep 2013 21:53:03 +0530
> Subject: Re: Error - loading data into tables
> From: nitinpawar...@gmail.com
> To: u...@hive.apache.org
>
> Is this /home/storage/... a hdfs directory?
> I think it's a normal filesystem directory.
>
> Try running this:
> load data local inpath '/home/storage/mount1/tabled.txt' INTO TABLE TEST;
>
> On Mon, Sep 30, 2013 at 7:13 PM, Manickam P wrote:
>> Hi,
>>
>> I'm getting the below error while loading the data into a hive table:
>> return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
>>
>> I used "LOAD DATA INPATH '/home/storage/mount1/tabled.txt' INTO TABLE TEST;" to insert into the table.
>>
>> Thanks,
>> Manickam P
>
> --
> Nitin Pawar

--
Nitin Pawar
Re: Hive Issue
nnection.(JDBC4Connection.java:47)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
> at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:381)
> at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:305)
> at java.sql.DriverManager.getConnection(DriverManager.java:582)
> at java.sql.DriverManager.getConnection(DriverManager.java:185)
> at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:75)
> at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
> at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
> at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
> at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:521)
> ... 44 more
> Caused by: java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> at java.net.Socket.connect(Socket.java:529)
> at java.net.Socket.connect(Socket.java:478)
> at java.net.Socket.(Socket.java:375)
> at java.net.Socket.(Socket.java:218)
> at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:257)
> at com.mysql.jdbc.MysqlIO.(MysqlIO.java:294)
> ... 63 more
> Nested Throwables StackTrace:
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
>
> The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
> at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1116)
> at com.mysql.jdbc.MysqlIO.(MysqlIO.java:344)
> at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2332)
> at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2369)
> at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2153)
> at com.mysql.jdbc.ConnectionImpl.(ConnectionImpl.java:792)
> at com.mysql.jdbc.JDBC4Connection.(JDBC4Connection.java:47)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
> at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:381)
> at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:305)
> at java.sql.DriverManager.getConnection(DriverManager.java:582)
> at java.sql.DriverManager.getConnection(DriverManager.java:185)
> at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:75)
> at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
> at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
> at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
> at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:521)
> at org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:290)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:593)
> at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:300)
> at org.datanucleus.ObjectManagerFactoryImpl.initialiseStoreManager(ObjectManagerFactoryImpl.java:161)
> at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:583)

--
Nitin Pawar
Re: Last time request for cwiki update privileges
rmat class as a string literal, e.g.,
> 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'.
> For LZO compression, the values to use are 'INPUTFORMAT
> "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"' (see LZO
> Compression <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO>).
>
> My cwiki id is
> https://cwiki.apache.org/confluence/display/~sanjaysubraman...@yahoo.com
> It will be great if I could get edit privileges
>
> Thanks
> sanjay
>
> CONFIDENTIALITY NOTICE
> ==
> This email message and any attachments are for the exclusive use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
--
Nitin Pawar
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
The mentioned flow is invoked when you have an unsecure mode of thrift metastore client-server connection, so one way to avoid this is to use a secure one.

public boolean process(final TProtocol in, final TProtocol out) throws TException {
    setIpAddress(in);
    ...

@Override
protected void setIpAddress(final TProtocol in) {
    TUGIContainingTransport ugiTrans = (TUGIContainingTransport) in.getTransport();
    Socket socket = ugiTrans.getSocket();
    if (socket != null) {
        setIpAddress(socket);

From the above code snippet, it looks like the null pointer exception is not handled if getSocket returns null.

Can you check what the ulimit setting on the server is? If it's set to the default, can you set it to unlimited and restart the hcat server? (This is just a wild guess.)

Also, the getSocket method documentation says: "If the underlying TTransport is an instance of TSocket, it returns the Socket object which it contains. Otherwise it returns null." So someone from the thrift gurus needs to tell us what's happening; I have no knowledge of this depth. Maybe Ashutosh or Thejas will be able to help on this.

From the netstat CLOSE_WAIT output, it looks like the hive metastore server has not closed the connection (don't know why yet); maybe the hive dev guys can help. Are there too many connections in CLOSE_WAIT state?

On Tue, Jul 30, 2013 at 5:52 AM, agateaaa wrote:
> Looking at the hive metastore server logs I see errors like these:
>
> 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer
> (TThreadPoolServer.java:run(182)) - Error occurred during processing of
> message.
> java.lang.NullPointerException > at > > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183) > at > > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79) > at > > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) > at > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > > approx same time as we see timeout or connection reset errors. > > Dont know if this is the cause or the side affect of he connection > timeout/connection reset errors. Does anybody have any pointers or > suggestions ? > > Thanks > > > On Mon, Jul 29, 2013 at 11:29 AM, agateaaa wrote: > > > Thanks Nitin! > > > > We have simiar setup (identical hcatalog and hive server versions) on a > > another production environment and dont see any errors (its been running > ok > > for a few months) > > > > Unfortunately we wont be able to move to hcat 0.5 and hive 0.11 or hive > > 0.10 soon. > > > > I did see that the last time we ran into this problem doing a netstat-ntp > > | grep ":1" see that server was holding on to one socket connection > in > > CLOSE_WAIT state for a long time > > (hive metastore server is running on port 1). Dont know if thats > > relevant here or not > > > > Can you suggest any hive configuration settings we can tweak or > networking > > tools/tips, we can use to narrow this down ? > > > > Thanks > > Agateaaa > > > > > > > > > > On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar >wrote: > > > >> Is there any chance you can do a update on test environment with > hcat-0.5 > >> and hive-0(11 or 10) and see if you can reproduce the issue? 
> >> > >> We used to see this error when there was load on hcat server or some > >> network issue connecting to the server(second one was rare occurrence) > >> > >> > >> On Mon, Jul 29, 2013 at 11:13 PM, agateaaa wrote: > >> > >>> Hi All: > >>> > >>> We are running into frequent problem using HCatalog 0.4.1 (HIve > Metastore > >>> Server 0.9) where we get connection reset or connection timeout errors. > >>> > >>> The hive metastore server has been allocated enough (12G) memory. > >>> > >>> This is a critical problem for us and would appreciate if anyone has > any > >>> pointers. > >>> > >>> We did add a retry logic in our client, which seems to help, but I am > >>> just > >>> wondering how can we narrow down to the root cause > >>> of this problem. Could this be a hiccup in networking which causes the > >>> hive > >>> server to get into a unresponsive state ? > >>> > >>> Thanks >
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
t_ugi(ThriftHiveMetastore.java:2136)
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:157)
> at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> ... 31 more

--
Nitin Pawar
Re: HCatalog (from Hive 0.11) and Hadoop 2
There is a build scheduled on Jenkins for Hive trunk which is failing. I will give it a try on my local machine for hive-0.11. There is another build which runs the ptests, but it is disabled due to lots of test case failures.
https://builds.apache.org/job/Hive-trunk-hadoop2/
I will update you if I can build it.

On Mon, Jul 29, 2013 at 8:07 PM, Rodrigo Trujillo <rodrigo.truji...@linux.vnet.ibm.com> wrote:
> Hi,
>
> is it possible to build Hive 0.11 and HCatalog with Hadoop 2 (2.0.4-alpha) ??
>
> Regards,
>
> Rodrigo

-- Nitin Pawar
Re: ant maven-build not working in trunk
I just tried a build with both JDK versions (build = ant clean package). JDK7 on branch-0.10 with the patch from HIVE-3384 works; JDK6 on trunk without any changes works. I also created a new Red Hat VM, installed Sun JDK 6u43, and tried it; it works too.

When I try ant maven-build -Dmvn.publish.repo=local, it does fail with the make-pom target not existing. Alan has a JIRA on this: https://issues.apache.org/jira/browse/HIVE-4387
There is a patch available there for branch-0.11. I will try to build with that.

On Thu, Jun 13, 2013 at 10:14 AM, amareshwari sriramdasu <amareshw...@gmail.com> wrote:
> Nitin,
>
> Hive does not compile with jdk7. You have to use jdk6 for compiling
>
> On Wed, Jun 12, 2013 at 9:42 PM, Nitin Pawar wrote:
> > I tried the build on trunk
> >
> > i did not hit the issue of make-pom but i hit the issue of jdbc with jdk7.
> > I will apply the patch and try again
> >
> > On Wed, Jun 12, 2013 at 4:48 PM, amareshwari sriramdasu <
> > amareshw...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> ant maven-build -Dmvn.publish.repo=local fails to build hcatalog with
> >> following error :
> >>
> >> /home/amareshwaris/hive/build.xml:121: The following error occurred while executing this line:
> >> /home/amareshwaris/hive/build.xml:123: The following error occurred while executing this line:
> >> Target "make-pom" does not exist in the project "hcatalog".
> >>
> >> Was curious to know if I'm only one facing this or Is there anyother way to
> >> publish maven artifacts locally?
> >>
> >> Thanks
> >> Amareshwari

-- Nitin Pawar
Re: ant maven-build not working in trunk
I tried the build on trunk.

I did not hit the make-pom issue, but I hit the JDBC issue with JDK7. I will apply the patch and try again.

On Wed, Jun 12, 2013 at 4:48 PM, amareshwari sriramdasu <amareshw...@gmail.com> wrote:
> Hello,
>
> ant maven-build -Dmvn.publish.repo=local fails to build hcatalog with
> following error :
>
> /home/amareshwaris/hive/build.xml:121: The following error occurred while executing this line:
> /home/amareshwaris/hive/build.xml:123: The following error occurred while executing this line:
> Target "make-pom" does not exist in the project "hcatalog".
>
> Was curious to know if I'm only one facing this or Is there anyother way to
> publish maven artifacts locally?
>
> Thanks
> Amareshwari

-- Nitin Pawar
adding a new property for hive history file HIVE-1708
Hi Guys,

I am trying to work on JIRA HIVE-1708 <https://issues.apache.org/jira/browse/HIVE-1708>.

I have added a property, HIVE_CLI_ENABLE_LOGGING, to enable or disable the history, and tested it. I am stuck at one point: what should the default value for HIVE_CLI_HISTORY_FILE_PATH be? Currently it is set to:

String historyDirectory = System.getProperty("user.home");
String historyFile = historyDirectory + File.separator + HISTORYFILE;

Any ideas on what the default path should be then?

-- Nitin Pawar
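For illustration, here is a minimal sketch of the fallback logic under discussion. The class name and the idea of passing a configured directory are hypothetical (this is not Hive's actual code); it only shows how a configured path could override the current user.home default:

```java
import java.io.File;

// Sketch of the HIVE-1708 question: use a configured directory when one is
// set, otherwise fall back to the current default (the user's home dir).
public class HistoryPathSketch {
    static final String HISTORYFILE = ".hivehistory";

    // configuredDir stands in for a hypothetical HIVE_CLI_HISTORY_FILE_PATH
    // value; null or empty means "property not set".
    static String resolveHistoryFile(String configuredDir) {
        String dir = (configuredDir == null || configuredDir.isEmpty())
                ? System.getProperty("user.home")   // current Hive default
                : configuredDir;                    // e.g. /tmp, as proposed
        return dir + File.separator + HISTORYFILE;
    }

    public static void main(String[] args) {
        System.out.println(resolveHistoryFile(null));
        System.out.println(resolveHistoryFile("/tmp/hive-logs"));
    }
}
```

With this shape, leaving the property unset keeps today's behaviour, so existing users would see no change.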
Re: plugable partitioning
Whenever you create a partition in Hive, it needs to be registered with the metadata store. So the short answer is: partition data is looked up from the metadata store instead of the actual source data.

Having a lot of partitions does slow Hive down (around 1+). Normally I have not seen anyone using hourly partitions; you may want to look at adding a daily partition and bucketing by hour. But if you are adding data directly into the partition directories, then there is no alternative to registering the partitions in the metadata store manually via alter partition.

If you are using HCatalog as the metadata store, it does provide an API to register your partitions, so you can automate data loading and registration in a single flow.

Others will correct me if I have made any wrong assumptions.

On Mon, Apr 15, 2013 at 8:15 PM, Steve Hoffman wrote:
> Looking for some pointers on where the partitioning is figured out in the
> source when a query is executed.
> I'm investigating an alternative partitioning scheme based on date patterns
> (using external tables).
>
> The situation is that I have data being written to some HDFS root directory
> with some dated pattern (i.e. /MM/DD). Today I have to run an alter
> table to insert this partition every day. It gets worse if you have hourly
> partitions. This seems like it can be described once (root + date
> partition pattern in the metastore).
>
> So looking for some pointers on where in the code this is currently
> handled.
>
> Thanks,
> Steve

-- Nitin Pawar
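To make the daily-partition-plus-hourly-buckets suggestion concrete, a hedged HiveQL sketch (the table, column names, and HDFS paths are invented for illustration, not from the thread):

```sql
-- One partition per day instead of per hour; the hour becomes a bucket column.
CREATE EXTERNAL TABLE events (
  event_id   STRING,
  event_hour INT,
  payload    STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (event_hour) INTO 24 BUCKETS
LOCATION '/data/events';

-- One ADD PARTITION per day, rather than 24 hourly ALTER TABLE calls:
ALTER TABLE events ADD PARTITION (dt='2013-04-15')
  LOCATION '/data/events/2013/04/15';
```

This keeps the metastore partition count at one per day while still letting queries prune by hour via the bucketing column.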
[jira] [Commented] (HIVE-1708) make hive history file configurable
[ https://issues.apache.org/jira/browse/HIVE-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630609#comment-13630609 ]

Nitin Pawar commented on HIVE-1708:
---

I added a new setting to hive-site.xml, made some changes in the CLI code, and tested making the Hive history optional.

I wanted to add one more property for the Hive history file path, but currently it is set to .hivehistory inside each individual user's home directory. If I retain this property, how will I keep a default value in hive-site.xml? Since all users will have different home directories on different Linux distributions, how do we default the path?

Can we change the file path to something like the log location, which resides inside /tmp? Is that an acceptable change?

> make hive history file configurable
> ---
>
> Key: HIVE-1708
> URL: https://issues.apache.org/jira/browse/HIVE-1708
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
>
> Currentlly, it is derived from
> System.getProperty("user.home")/.hivehistory;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Hive compilation issues on branch-0.10 and trunk
Hi Mark,
Yes, I applied the patch and got it working with JDK7. Can we continue using JDK7?

Thanks,
Nitin

On Apr 11, 2013 8:48 PM, "Mark Grover" wrote:
> Nitin,
> I have been able to build hive trunk with JDK 1.6. Did you try the
> workaround listed in HIVE-4231?
>
> Mark
>
> On Thu, Apr 11, 2013 at 2:42 AM, Nitin Pawar wrote:
> > Hello,
> >
> > I am trying to build hive on both trunk and branch-0.10
> >
> > I have tried both SUN JDK6 and JDK7
> > With both the version running into different issues
> >
> > with JDK6 running into issue mentioned at HIVE-4231
> > with JDK7 running into issue mentioned at HIVE-3384
> >
> > can somebody please help out with this?
> >
> > What would be recommended JDK version going forward for development
> > activities ?
> >
> > --
> > Nitin Pawar
Hive compilation issues on branch-0.10 and trunk
Hello,

I am trying to build Hive on both trunk and branch-0.10. I have tried both Sun JDK6 and JDK7 and run into a different issue with each version:

with JDK6, the issue mentioned in HIVE-4231
with JDK7, the issue mentioned in HIVE-3384

Can somebody please help out with this? What is the recommended JDK version going forward for development activities?

-- Nitin Pawar
[jira] [Commented] (HIVE-4231) Build fails with "WrappedRuntimeException: Content is not allowed in prolog." when _JAVA_OPTIONS="-Dfile.encoding=UTF-8"
[ https://issues.apache.org/jira/browse/HIVE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628789#comment-13628789 ]

Nitin Pawar commented on HIVE-4231:
---

I am running into the same issue when trying to build the Hive project. I have the same environment as Sho, but the OS is rhel 6.3. The log says exactly the same; in addition, the contents of the failed XML file are:

[root@localhost branch-0.10]# cat /root/apache/hive/branch-0.10/build/builtins/metadata/class-info.xml
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/hive/ql/exec/Description : Unsupported major.minor version 51.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
	at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	at org.apache.hive.pdk.FunctionExtractor.main(FunctionExtractor.java:27)
[root@localhost branch-0.10]#

> Build fails with "WrappedRuntimeException: Content is not allowed in prolog."
> when _JAVA_OPTIONS="-Dfile.encoding=UTF-8" > > > Key: HIVE-4231 > URL: https://issues.apache.org/jira/browse/HIVE-4231 > Project: Hive > Issue Type: Bug >Affects Versions: 0.11.0 >Reporter: Sho Shimauchi >Priority: Minor > > Build failed with the follwing error when I set _JAVA_OPTIONS to > "-Dfile.encoding=UTF-8": > {code} > extract-functions: > [xslt] Processing > /Users/sho/src/apache/hive/build/builtins/metadata/class-info.xml to > /Users/sho/src/apache/hive/build/builtins/metadata/class-registration.sql > [xslt] Loading stylesheet > /Users/sho/src/apache/hive/pdk/scripts/class-registration.xsl > [xslt] : Error! Content is not allowed in prolog. > [xslt] : Error! > com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Content is not > allowed in prolog. > [xslt] Failed to process > /Users/sho/src/apache/hive/build/builtins/metadata/class-info.xml > BUILD FAILED > /Users/sho/src/apache/hive/build.xml:517: The following error occurred while > executing this line: > /Users/sho/src/apache/hive/builtins/build.xml:37: The following error > occurred while executing this line: > /Users/sho/src/apache/hive/pdk/scripts/build-plugin.xml:118: > javax.xml.transform.TransformerException: > javax.xml.transform.TransformerException: > com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Content is not > allowed in prolog. 
> at > com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:735) > at > com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336) > at > org.apache.tools.ant.taskdefs.optional.TraXLiaison.transform(TraXLiaison.java:194) > at > org.apache.tools.ant.taskdefs.XSLTProcess.process(XSLTProcess.java:852) > at > org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:388) > at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) > at org.apache.tools.ant.Task.perform(Task.java:348) > at org.apache.tools.ant.Target.execute(Target.java:390) > at org.apache.tools.ant.Target.performTasks(Target.java:411) > at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399) > at > org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38) > at org.apache.tools.ant.Project.executeTargets(Project.java:1251) > at org.apache.tools.ant.taskdefs.Ant.exec
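The "Unsupported major.minor version 51.0" inside class-info.xml above means classes compiled for Java 7 (class-file major version 51) were being loaded by an older JVM, which is consistent with a build mixing JDK6 and JDK7. As an illustrative aside (not Hive code), a self-contained sketch that reads a class file's header to see which Java version it targets:

```java
// Illustrative aside: inspect a .class file's header bytes. The class-file
// format starts with magic (4 bytes), minor_version (2), major_version (2);
// major 50 = Java 6, major 51 = Java 7 — so a Java 6 JVM rejects major 51.
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersionCheck {
    static int readMajor(InputStream raw) throws IOException {
        try (DataInputStream in = new DataInputStream(raw)) {
            in.readInt();                   // magic number 0xCAFEBABE
            in.readUnsignedShort();         // minor_version
            return in.readUnsignedShort();  // major_version
        }
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own bytecode from the classpath.
        int major = readMajor(
                ClassVersionCheck.class.getResourceAsStream("/ClassVersionCheck.class"));
        System.out.println("class file major version: " + major);
    }
}
```

Running it on a class compiled by the offending JDK would show which compiler actually produced it, which is usually the quickest way to confirm this kind of mismatch.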
[jira] [Created] (HIVE-2980) Show a warning or an error when the data directory is empty or not existing
Nitin Pawar created HIVE-2980:
-

Summary: Show a warning or an error when the data directory is empty or not existing
Key: HIVE-2980
URL: https://issues.apache.org/jira/browse/HIVE-2980
Project: Hive
Issue Type: Improvement
Reporter: Nitin Pawar

It looks like a good idea to show a warning or an error when the data directory is missing or empty. This will help cut down debugging time, and it is also good information to have about deleted data.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2814) Can we have a feature to disable creating empty buckets on a larger number of buckets creates?
Can we have a feature to disable creating empty buckets on a larger number of buckets creates?
---

Key: HIVE-2814
URL: https://issues.apache.org/jira/browse/HIVE-2814
Project: Hive
Issue Type: Bug
Reporter: Nitin Pawar
Priority: Minor

When we create buckets on larger datasets, it is not often that all the partitions have the same number of buckets, so we choose the largest possible number to cover most of the buckets. This results in creating a lot of empty buckets, which can be an overhead for Hadoop as well as for Hive queries. It also takes a lot of time just to create the empty buckets.

Is there a way to say: do not create empty buckets?

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira