Re: Question regarding lock manager

2021-09-03 Thread Alan Gates
You do not need ZooKeeper to use ACID in Hive. The first thing I would check is that you have configured your system as described on this page: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions. Also, make sure you have not set hive.lock.manager to zookeeper. There are other

[ANNOUNCE] Apache Hive 2.3.7 Released

2020-04-19 Thread Alan Gates
The Apache Hive team is proud to announce the release of Apache Hive version 2.3.7. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides, among others: * Tools to enable

Re: metastore without hadoop

2020-04-10 Thread Alan Gates
It needs Hadoop libraries; it still uses some HDFS libraries for file reading and password management. It does not require a running, installed Hadoop. It does not ship with the required Hadoop libraries to avoid version clashes when installed on a running Hadoop system. Alan. On Thu, Apr 9,

Re: If Hive Metastore is compatibility with MariaDB version 10.x.?

2020-01-17 Thread Alan Gates
Hive is tested against MariaDB 5.5, so I can't say whether it will work against version 10. You would need to do some testing with it to see. Alan. On Fri, Jan 17, 2020 at 4:29 AM Oleksiy S wrote: > Hi all. > > Could you please help? Customer asked if Hive Metastore is compatible with >

Re: Locks with ACID: need some clarifications

2019-09-09 Thread Alan Gates
say "you just can't have two simultaneous deletes in the same > partition", simultaneous means for the same transaction ? > If I create 2 "transactions" for 2 deletes on the same table/partition it > works. Am I right ? > > > On Mon, Sep 9, 2019 at 19:04, Alan Ga

Re: Locks with ACID: need some clarifications

2019-09-09 Thread Alan Gates
In Hive 2 update and delete take what are called semi-shared locks (meaning they allow shared locks through, while not allowing other semi-shared locks), and insert and select take shared locks. So you can insert or select while deleting, you just can't have two simultaneous deletes in the same
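
The compatibility rule described in this message can be sketched as a small lookup. This is only an illustration in Python, not Hive's lock manager: the names `LOCK_FOR_OPERATION`, `compatible`, and `can_run_together` are hypothetical, exclusive locks are left out because the message does not describe them, and both operations are assumed to target the same partition.

```python
# Toy model of the Hive 2 lock types named in the message.
# Assumption: only shared and semi-shared locks are modeled.
SHARED = "shared"
SEMI_SHARED = "semi-shared"

# Which lock each statement type takes, per the message.
LOCK_FOR_OPERATION = {
    "select": SHARED,
    "insert": SHARED,
    "update": SEMI_SHARED,
    "delete": SEMI_SHARED,
}


def compatible(held, requested):
    """Semi-shared lets shared locks through but blocks another semi-shared."""
    return not (held == SEMI_SHARED and requested == SEMI_SHARED)


def can_run_together(op_a, op_b):
    """True if two statements on the same partition may hold their locks at once."""
    return compatible(LOCK_FOR_OPERATION[op_a], LOCK_FOR_OPERATION[op_b])
```

Under this model an insert or select can proceed alongside a delete, while two deletes (or a delete and an update) on the same partition cannot.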

[ANNOUNCE] Apache Hive 3.1.2 released

2019-08-27 Thread Alan Gates
The Apache Hive team is proud to announce the release of Apache Hive version 3.1.2. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides, among others: * Tools to enable easy

[ANNOUNCE] Apache Hive 2.3.6 Released

2019-08-23 Thread Alan Gates
The Apache Hive team is proud to announce the release of Apache Hive version 2.3.6. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides, among others: * Tools to enable easy

Re: Question on Hive metastore thrift uri

2019-06-25 Thread Alan Gates
It depends on how you configure the system. If you are using HS2 you can configure it to talk directly to the metastoredb (by providing it with the JDBC connection information and setting the metastore thrift url to localhost) or to talk through a metastore server instance (by not providing the

Re: Restrict users from creating tables in default warehouse

2019-06-06 Thread Alan Gates
The easiest way to do this is through grant and revoke statements and/or file permissions. Hive has several authorization schemes (storage based auth, sql standard auth, integration with Ranger and Sentry) added over several releases. Which version of Hive are you using and which, if any, of

Re: Hive Insert and Select only specific columns ( not all columns ) - Partitioned table

2019-05-30 Thread Alan Gates
You need to provide a value for the deptno partition key. You can't insert into a partitioned table without providing a value for the partition column. You can either give it a static value: insert into table emp_parquet partition (deptno = 'x') select empno, ename from emp or you can set it

Re: hcatalog and hiveserver2

2019-05-24 Thread Alan Gates
HCatalog was built as an interface to allow tools such as Pig and MapReduce to access Hive tabular data, for both read and write. In more recent versions of Hive, HCatalog has not been updated to support the newest features, such as reading or writing transactional data or, in Hive 3.x, accessing

Re: Consuming delta from Hive tables

2019-05-20 Thread Alan Gates
an use write_id to consume only > updated rows. > Store the maximum write_id(X) seen in the result and next time query for > all rows with row_id greater than X. > > Thanks, > Bhargav > > On Fri, May 17, 2019 at 10:37 PM Alan Gates wrote: > >> Sorry, looks like you sen

Re: Consuming delta from Hive tables

2019-05-17 Thread Alan Gates
Sorry, looks like you sent this earlier and I missed it. A couple of things. One, write_id is per transaction per table. So for table T, all rows written in w1 will have the same write_id, though they will each have their own monotonically increasing row_ids. Row_ids are scoped by a write_id,
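
The relationship between write_id and row_id described here can be pictured with a toy model. This is a sketch under stated assumptions, not Hive's implementation: `ToyTable` and its methods are hypothetical names, each call to `write` stands in for one transaction writing to the table, and starting both counters at fixed values (1 for write_id, 0 for row_id) is an arbitrary choice.

```python
from itertools import count


class ToyTable:
    """Hypothetical stand-in for one ACID table."""

    def __init__(self):
        # One write_id sequence per table (assumption: starts at 1).
        self._next_write_id = count(1)
        self.rows = []  # list of (write_id, row_id, payload)

    def write(self, payloads):
        """Model one transaction writing a batch of rows to this table."""
        write_id = next(self._next_write_id)
        # row_ids restart for every write_id, so they are only
        # meaningful together with the write_id that scopes them.
        for row_id, payload in enumerate(payloads):
            self.rows.append((write_id, row_id, payload))
        return write_id
```

In this model a row is identified by the pair (write_id, row_id); a row_id on its own repeats across writes.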

Re: Any HIVE DDL statement takes minutes to execute

2019-05-16 Thread Alan Gates
no issue. > > > > HTH > > > > > > On Thu, 16 May 2019 at 08:16, Iulian Mongescu > wrote: > > Hi Alan, > > > > I’m using MySQL (Mariadb) for the metastore and I was thinking on this > possibility too but from all my tests on metastore database th

Re: Any HIVE DDL statement takes minutes to execute

2019-05-15 Thread Alan Gates
What are you using as the RDBMS for your metastore? A first place I'd look is if the communications with the RDBMS are slow for some reason. Alan. On Wed, May 15, 2019 at 10:34 AM Iulian Mongescu wrote: > Hello, > > > > I'm working on a HDP-2.6.5.0 cluster with kerberos enabled and I have a >

[ANNOUNCE] Apache Hive 2.3.5 Released

2019-05-15 Thread Alan Gates
The Apache Hive team is proud to announce the release of Apache Hive version 2.3.5. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides, among others: * Tools to enable easy

Re: Consuming delta from Hive tables

2019-05-06 Thread Alan Gates
The other issue is an external system has no ability to control when the compactor is run (it rewrites deltas into the base files and thus erases intermediate states that would interest you). The mapping of writeids (table specific) to transaction ids (system wide) is also cleaned intermittently,

Re: HS2: Permission denied for my own table?

2019-04-17 Thread Alan Gates
See https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2#SettingUpHiveServer2-Impersonation Alan. On Tue, Apr 16, 2019 at 10:03 PM Kaidi Zhao wrote: > Hello! > > Did I miss anything here or it is an known issue? Hive 1.2.1, hadoop > 2.7.x, kerberos, impersonation. > > Using

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread Alan Gates
/docs/acid.html > The INSERT/UPDATE/DELETE seems to be implemented: > OPERATION SERIALIZATION > INSERT 0 > UPDATE 1 > DELETE 2 > Do you think this approach is suitable ? > > > > On Tue, Mar 12, 2019 at 19:30, Alan Gates wrote: > >> Have you looked at Hive's st

Re: Read Hive ACID tables in Spark or Pig

2019-03-12 Thread Alan Gates
d friends in the >> process. Hope one day all the friends speak together again: pig, spark, >> presto read/write ACID together. >> >> On Sat, Mar 09, 2019 at 02:23:48PM -0800, Alan Gates wrote: >> > There's only been one significant change in ACID that requires di

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread Alan Gates
Have you looked at Hive's streaming ingest? https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest It is designed for this case, though it only handles insert (not update), so if you need updates you'd have to do the merge as you are currently doing. Alan. On Mon, Mar 11, 2019 at

Re: Read Hive ACID tables in Spark or Pig

2019-03-09 Thread Alan Gates
ors ? > > Thanks > > > On Wed, Mar 06, 2019 at 09:51:51AM -0800, Alan Gates wrote: > > Pig is in the same place as Spark, that the tables need to be compacted > first. > > The issue is that outside readers don't understand which records in the > delta > > files a

Re: Read Hive ACID tables in Spark or Pig

2019-03-06 Thread Alan Gates
Pig is in the same place as Spark: the tables need to be compacted first. The issue is that outside readers don't understand which records in the delta files are valid and which are not. Theoretically all this is possible, as outside clients could get the valid transaction list from the

Re: Standalone Metastore Question

2019-02-26 Thread Alan Gates
The standalone metastore released in 3.0 is the exact same metastore released with Hive 3.0. The only differences are in the install tool 'schematool' and the start and stop script. Hive 3 is being used in production in a number of places. I don't know if anyone is running the metastore alone in

Re: Difference in performance of temp table vs subqueries

2019-01-24 Thread Alan Gates
That's a broad question and it depends on what you're doing. Since temp tables will materialize the intermediate result while subqueries will not, I'd guess in most cases subqueries are faster. But again, it depends on what you're doing, and you'd need to benchmark your particular queries both

Re: Hive Metastore Hook to to fire only on success

2018-10-05 Thread Alan Gates
Which version of Hive are you on and which hook are you seeing fire? Based on looking at the master code you should only see the commitCreateTable hook call if the creation succeeds. Alan. On Thu, Oct 4, 2018 at 12:36 AM Daniel Haviv wrote: > Hi, > I'm writing a HMS hook and I noticed that

Re: How to Grant All Privileges for All Databases except one in Hive SQL

2018-09-21 Thread Alan Gates
> Yes i am looking something like this only and since it is not available, > does that mean i have to go for each table ? > > I am asking because we have many DBs and a lot of tables within each DB so > is there any other way ? > > Regards, > Anup Tiwari > > > On

Re: Question about OVER clause

2018-09-21 Thread Alan Gates
This article might be helpful. It's for SQL Server, but the semantics should be similar. https://www.sqlpassion.at/archive/2015/01/22/sql-server-windowing-functions-rows-vs-range/ Alan. On Wed, Sep 19, 2018 at 6:47 AM 孙志禹 wrote: > Dears, >What is the difference between *ROW BETWEEN* and

Re: How to Grant All Privileges for All Databases except one in Hive SQL

2018-09-17 Thread Alan Gates
+-+-+-++---++--+ > | anup ||| | readonly| > ROLE| SELECT | false | 1537187896000 | hadoop | > > +---+++-+-+-++-

Re: How to Grant All Privileges for All Databases except one in Hive SQL

2018-09-14 Thread Alan Gates
You can see a full list of what grant supports at https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-Grant There is no "grant x to user on all databases" or regex expressions for database names. So you'll have to do the

Re: Hive Metada as a microservice

2018-07-05 Thread Alan Gates
In 3.0, you can download the metastore as a separate artifact, either source or binary (e.g. http://ftp.wayne.edu/apache/hive/hive-standalone-metastore-3.0.0/). It does not require any other parts of Hive beyond what's released in that artifact. I'm not sure if this meets your definition of a

Re: drop partitions

2018-06-18 Thread Alan Gates
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropPartitions Alan. On Sat, Jun 16, 2018 at 8:03 PM Mahender Sarangam < mahender.bigd...@outlook.com> wrote: > Hi All, > > What is right syntax for dropping the partitions. Alter table drop if > exists

Re: Oracle 11g Hive 2.1 metastore backend

2018-06-06 Thread Alan Gates
We currently run our Oracle tests against 11g, but that is only for the 3.0 and beyond releases. Given the error I am guessing this is a result of the Oracle version plus the datanucleus version, which we changed between 2.1 and 2.3. Alan. On Wed, Jun 6, 2018 at 12:12 PM Mich Talebzadeh wrote:

Re: Is 'application' a reserved word?

2018-05-30 Thread Alan Gates
It is. You can see the definitive list of keywords at https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g (Note this is for the master branch, you can switch the branch around to find the list for a particular release.) It would be good to file a

Re: Combining hive tables as one query

2018-05-15 Thread Alan Gates
in same functionality as in > postgres and views are also not helping here. > > On Tue, May 15, 2018 at 11:19 AM, Alan Gates <alanfga...@gmail.com> wrote: > >> In general this is done using joins, as in all SQL engines. A google >> search on "intro to SQL joins"

Re: Combining hive tables as one query

2018-05-15 Thread Alan Gates
In general this is done using joins, as in all SQL engines. A google search on "intro to SQL joins" will suggest a number of resources, for example https://www.essentialsql.com/get-ready-to-learn-sql-12-introduction-to-database-joins/ Alan. On Tue, May 15, 2018 at 7:37 AM, Sowjanya Kakarala

Re: Dynamic vs Static partition

2018-01-08 Thread Alan Gates
When doing dynamic partitioning, Hive needs to look at each record and determine which partition to place it in. In the case where all records go to the same partition, it is more efficient to tell Hive up front, that is, to use static partitioning. So you can use dynamic partition for large
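
The cost difference described here can be illustrated with a toy sketch. This is not how Hive routes rows internally: `dynamic_partition` and `static_partition` are hypothetical helpers, and records are modeled as plain dictionaries.

```python
def dynamic_partition(records, key):
    """Inspect every record to decide which partition it belongs to."""
    partitions = {}
    for record in records:
        # One lookup per record: this is the per-row work that
        # dynamic partitioning cannot avoid.
        partitions.setdefault(record[key], []).append(record)
    return partitions


def static_partition(records, partition_value):
    """The caller names the partition up front, so no record is inspected."""
    return {partition_value: list(records)}
```

When every record carries the same partition value, both helpers produce the same layout, but the static version never looks inside a record.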

Re: Options for connecting to Apache Hive

2017-11-10 Thread Alan Gates
There are ODBC drivers available for Hive, though they aren’t part of the open source package and are not free. Google can help you find them. As Elliot says, you can use the Thrift protocol, which is what the JDBC driver uses. You can find the thrift definition in the code at

Re: HCatClient vs HiveMetaStoreClient (or IMetaStoreClient)

2017-11-10 Thread Alan Gates
HCatClient is useful if you are already using HCat. If not, use HiveMetaStoreClient. It’s been kept much more up to date. Alan. On Fri, Nov 10, 2017 at 9:23 AM, Patel,Stephen wrote: > From a cursory inspection, it seems that HCatClient provides a subset of > the

Re: on master branch, hive code has some itests maven build error

2017-09-25 Thread Alan Gates
I would suggest removing -Pdist in the initial mvn command. That should only be used to build tarballs for distribution. So your initial mvn command should just be: mvn clean install -DskipTests Alan. On Sat, Sep 23, 2017 at 3:55 AM, eric wong wrote: > I try to add some q

Re: please help unsubscribing from mailing lists

2017-08-15 Thread Alan Gates
I think http://untroubled.org/ezmlm/manual/Unsubscribing.html#Unsubscribing has what you need. Alan. On Tue, Aug 15, 2017 at 1:09 PM, Chris Drome wrote: > I am currently subscribed to all three Hive mailing lists (user, dev, > commits) using cdr...@yahoo-inc.com. > > I'm

Re: FYI: Backports of Hive UDFs

2017-06-02 Thread Alan Gates
Research Engineer, Treasure Data, Inc. > http://myui.github.io/ > > 2017-06-02 2:24 GMT+09:00 Alan Gates <alanfga...@gmail.com>: > > I'm curious why these can't be backported inside Hive. If someone is > > willing to do the work to do the backport we can check them into the

Re: FYI: Backports of Hive UDFs

2017-06-01 Thread Alan Gates
I'm curious why these can't be backported inside Hive. If someone is willing to do the work to do the backport we can check them into the Hive 1 branch. On Thu, Jun 1, 2017 at 1:44 AM, Makoto Yui wrote: > Hi, > > I created a repository for backporting recent Hive UDFs (as of

Re: Compaction - get compacted files

2017-04-13 Thread Alan Gates
Answers inline. Alan. > On Mar 29, 2017, at 03:08, Riccardo Iacomini > wrote: > > Hello, > I have some questions about the compaction process. I need to manually > trigger compaction operations on a standard partitioned orc table (not ACID), > and be able to

Re: Adding a Hive Statement of SQL Conformance to the docs

2017-01-13 Thread Alan Gates
+1. I think this will be great for existing and potential Hive users. Alan. > On Jan 13, 2017, at 9:09 AM, Carter Shanklin wrote: > > I get asked from time to time what Hive's level of SQL conformance is, and > it's difficult to provide a clean answer. Most SQL

Re: HMS connections to meta db

2016-12-19 Thread Alan Gates
Do you mean the connection between the Hive client and the Hive metastore (if you are using the command line), or the connection between the metastore server code and the RDBMS? The connection to the RDBMS uses JDBC connection pooling to avoid making and tearing down many connections. The

Re: [ANNOUNCE] Apache Hive 2.1.1 Released

2016-12-08 Thread Alan Gates
Apache keeps just the latest version of each release on the mirrors. You can find all Hive releases at https://archive.apache.org/dist/hive/ if you need 2.1.0. Alan. > On Dec 8, 2016, at 14:40, Stephen Sprague wrote: > > out of curiosity any reason why release 2.1.0

Re: I delete my table in hive,but the file in HDFS not be deleted

2016-12-06 Thread Alan Gates
Is the table external or managed? External tables do not remove their data when dropped, managed tables do. Alan. > On Dec 6, 2016, at 18:08, 446463...@qq.com wrote: > > I meet a problem in hive. > > I drop a table in hive and the table name ' user_info_20161206' >

Re: Compaction in hive

2016-12-06 Thread Alan Gates
What exactly do you mean by compaction? Hive has a compactor that runs over ACID tables to handle the delta files[1], but I’m guessing you don’t mean that. Are you wanting to concatenate files in existing tables? The usual way to do that is alter table concatenate[2]. Or do you mean

Re: Difference between MANAGED_TABLE and EXTERNAL_TABLE in org.apache.hadoop.hive.metastore.TableType

2016-12-01 Thread Alan Gates
Hive does not assume that it owns the data for an external table. Thus when an external table is dropped, the data is not deleted. People often use this as a way to load data into a directory in HDFS and then “cast” a table structure over it by creating an external table with that directory

Re: Problems with Hive Streaming. Compactions not working. Out of memory errors.

2016-11-29 Thread Alan Gates
I’m guessing that this is an issue in the metastore database where it is unable to read from the transaction tables due to the ingestion rate. What version of Hive are you using? What database are you storing the metadata in? Alan. > On Nov 29, 2016, at 00:05, Diego Fustes Villadóniga

Re: Adding a New Primitive Type in Hive

2016-11-21 Thread Alan Gates
A few questions: 1) What operators do you envision UUID supporting? Are there UDFs specific to it? Are there constraints on assuring its uniqueness? 2) A more general form of question 1, what about UUID is different from a string or decimal(20, 0) (either of which should be able to store a

Re: kylin log file is to large

2016-11-13 Thread Alan Gates
This question would be better asked on the kylin lists, since Hive doesn’t start either kylin.sh or diag.sh. Alan. > On Nov 14, 2016, at 06:29, 446463...@qq.com wrote: > > Hi: > I run kylin instance in mine test environment.I find that the kylin log > increases so fast. and my hard disk is

Re: Hive metadata on Hbase

2016-10-24 Thread Alan Gates
Some thoughts on this: First, there’s no plan to remove the option to use an RDBMS such as Oracle as your backend. Hive’s RawStore interface is built such that various implementations of the metadata storage can easily coexist. Obviously different users will make different choices about what

Re: Hive in-memory offerings in forthcoming releases

2016-10-10 Thread Alan Gates
Hive doesn’t usually publish long term roadmaps. I am not familiar with either SAP ASE or Oracle 12c so I can’t say whether Hive is headed in that direction or not. We see LLAP as very important for speeding up Hive processing, especially in the cloud where fetches from blob storage are very

Re: Hive orc use case

2016-09-26 Thread Alan Gates
which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 26 September 2016 at 18:54, Alan Gates <alanfga...@gmail.com> wro

Re: Hive orc use case

2016-09-26 Thread Alan Gates
ORC does not store data row by row. It decomposes the rows into columns, and then stores pointers to those columns, as well as a number of indices and statistics, in a footer of the file. Due to the footer, in the simple case you cannot read the file before you close it or append to it. We

Re: How can I force Hive to start compaction on a table immediately

2016-08-01 Thread Alan Gates
There’s no way to force immediate compaction. If there are compaction workers in the metastore that aren’t busy they should pick that up immediately. But there isn’t an ability to create a worker thread and start compacting. Alan. > On Aug 1, 2016, at 14:50, Mich Talebzadeh

Re: Hive compaction didn't launch

2016-07-28 Thread Alan Gates
txnIds in partition B > deltas, compaction won't happen. So open transaction in partition A blocks > compaction in partition B. That's seems wrong to me. > > On Thu, Jul 28, 2016 at 7:06 PM, Alan Gates <alanfga...@gmail.com> wrote: > Hive is doing the right thing there, as it can

Re: Hive compaction didn't launch

2016-07-28 Thread Alan Gates
, Jul 27, 2016 at 8:46 PM, Igor Kuzmenko <f1she...@gmail.com> wrote: > Thanks for reply, Alan. My guess with Storm was wrong. Today I get same > behavior with running Storm topology. > Anyway, I'd like to know, how can I check that transaction batch was closed > correctly?

Re: Hive compaction didn't launch

2016-07-27 Thread Alan Gates
I don’t know the details of how the storm application that streams into Hive works, but this sounds like the transaction batches weren’t getting closed. Compaction can’t happen until those batches are closed. Do you know how you had storm configured? Also, you might ask separately on the

Re: Want to be one contributor

2016-07-18 Thread Alan Gates
esume the build with the > command > [ERROR] mvn -rf :hive-common > > > Rgds, > Alpesh > > On Thu, Jul 14, 2016 at 5:11 PM, Alan Gates <alanfga...@gmail.com> wrote: > https://cwiki.apache.org/confluence/display/Hive/Home#Home-ResourcesforContributors >

Re: Want to be one contributor

2016-07-14 Thread Alan Gates
https://cwiki.apache.org/confluence/display/Hive/Home#Home-ResourcesforContributors is a good place to start. Welcome to Hive. Alan. > On Jul 14, 2016, at 16:01, Alpesh Patel wrote: > > Hi Guys, > > I am part of this group since 1 year. Just an audience and now want

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Alan Gates
> On Jul 11, 2016, at 16:22, Mich Talebzadeh wrote: > > > • If I add LLAP, will that be more efficient in terms of memory usage > compared to Hive or not? Will it keep the data in memory for reuse or not. > Yes, this is exactly what LLAP does. It keeps

Re: Delete hive partition while executing query.

2016-06-07 Thread Alan Gates
java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036) > ... 15 more > > So second thread definitely waits until first thread completes and than make > a partition dr

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-06 Thread Alan Gates
This JIRA https://issues.apache.org/jira/browse/HIVE-12366 moved the heartbeat logic from the engine to the client. AFAIK this was the only issue preventing working with Spark as an engine. That JIRA was released in 2.0. I want to stress that to my knowledge no one has tested this combination

Re: Delete hive partition while executing query.

2016-06-06 Thread Alan Gates
Do you have the system configured to use the DbTxnManager? See https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration for details on how to set this up. The transaction manager is what manages locking and makes sure that your queries don’t stomp each

Re: Hello, I have an issue about hcatalog

2016-05-18 Thread Alan Gates
This looks to me like a Hadoop issue rather than Hive. It appears that you cannot connect to HDFS. Have you tried connecting to HDFS outside of Hive/HCatalog? Alan. > On May 18, 2016, at 04:24, Mark Memory wrote: > > hello guys, sorry to bother you. > > I'm using

Re: Hive SQL based authorization don't have support for group

2016-05-12 Thread Alan Gates
By group here I assume you mean posix style file group (from HDFS)? No, there isn’t any connection right now. We’d like to be able to pick up groups from HDFS and define those as roles in Hive, but we haven’t added that feature. You’ll need to define a role that includes the members of that

Re: Standard Deviation in Hive 2 is still incorrect

2016-04-19 Thread Alan Gates
Have you filed a JIRA ticket for this? If not, please do so we can track it and fix it. Patches are welcomed as well. :) Alan. > On Apr 4, 2016, at 15:27, Mich Talebzadeh wrote: > > > Hi, > > I reported back in April 2015 that what Hive calls Standard Deviation

Re: Hive footprint

2016-04-18 Thread Alan Gates
> On Apr 18, 2016, at 15:34, Mich Talebzadeh wrote: > > Hi, > > > If Hive had the ability (organic) to have local variable and stored procedure > support then it would be top notch Data Warehouse. Given its metastore, I > don't see any technical reason why it

Re: [VOTE] Bylaws change to allow some commits without review

2016-04-18 Thread Alan Gates
+1. Alan. > On Apr 18, 2016, at 06:53, Lars Francke wrote: > > Thanks for the votes so far. Can I get some more people interested please? > > On Fri, Apr 15, 2016 at 7:35 PM, Jason Dere wrote: > +1 > From: Lefty Leverenz

Re: Automatic Update statistics on ORC tables in Hive

2016-03-28 Thread Alan Gates
I resolved that as Won’t Fix. See the last comment on the JIRA for my rationale. Alan. > On Mar 28, 2016, at 03:53, Mich Talebzadeh wrote: > > Thanks. This does not seem to be implemented although the Jira says resolved. > It also mentions the timestamp of the

Re: Spark SQL is not returning records for HIVE transactional tables on HDP

2016-03-14 Thread Alan Gates
> On Mar 14, 2016, at 10:31, Mich Talebzadeh wrote: > > That is an interesting point Alan. > > Does this imply that Hive on Spark (Hive 2 encourages Spark or TEZ) is going > to have an issue with transactional tables? I was partially wrong. In HIVE-12366 Wei

Re: Spark SQL is not returning records for HIVE transactional tables on HDP

2016-03-14 Thread Alan Gates
I don’t know why you’re seeing Hive on Spark sometimes work with transactional tables and sometimes not. But note that in general it doesn’t work. The Spark runtime in Hive does not send heartbeats to the transaction/lock manager so it will time out any job that takes longer than the heartbeat

Re: Hive StreamingAPI leaves table in not consistent state

2016-03-11 Thread Alan Gates
I believe this is an issue in the Storm Hive bolt. I don’t have an Apache JIRA on it, but if you ask on the Hortonworks lists we can connect you with the fix for the storm bolt. Alan. > On Mar 10, 2016, at 04:02, Igor Kuzmenko wrote: > > Hello, I'm using Hortonworks Data

Re: Hive Context: Hive Metastore Client

2016-03-09 Thread Alan Gates
One way people have gotten around the lack of LDAP connectivity in HS2 has been to use Apache Knox. That project’s goal is to provide a single login capability for Hadoop related projects so that users can tie their LDAP or Active Directory servers into Hadoop. Alan. > On Mar 8, 2016, at

Re: How does Hive do authentication on UDF

2016-03-01 Thread Alan Gates
There are several Hive authorization schemes, but at the moment none of them restrict function use. At some point we’d like to add that feature to SQL standard authorization (see https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization ) but no one has done it

Re: Hive-2.0.1 Release date

2016-02-29 Thread Alan Gates
HPLSQL was released with Hive in 2.0, but it hasn’t been integrated, in that there isn’t a single parser that handles HPLSQL and Hive SQL. The work to integrate it is significant, and a new feature, so it definitely won’t be done in a bug fix release but in a feature bearing release (that is 2.x, not

Re: Hive 2 performance

2016-02-25 Thread Alan Gates
HPLSQL is part of Hive, but it is not fully integrated into Hive itself yet. It is still an external module that handles the control flow while passing Hive SQL into Hive via JDBC. We’d like to integrate it fully with Hive’s parser but we’re not there yet. Alan. > On Feb 25, 2016, at 14:26,

Re: Stroing boolean value in Hive table

2016-02-18 Thread Alan Gates
How the data is stored is up to the storage format (text, rcfile, orc, etc.). Do you mean in your text file you’d like booleans stored as 0 or 1? You could use the case statement to convert them to integers like: select case _boolvar_ when true then 1 when false then 0 end from … Alan. > On

Re: ORC format

2016-02-01 Thread Alan Gates
ORC does not currently expose a primary key to the user, though we have talked of having it do that. As Mich says the indexing on ORC is oriented towards statistics that help the optimizer plan the query. This can be very important in split generation (determining which parts of the input

Re: Indexes in Hive

2016-01-06 Thread Alan Gates
The issue with this is that HDFS lacks the ability to co-locate blocks. So if you break your columns into one file per column (the more traditional column route) you end up in a situation where 2/3 of the time only one of your columns is being locally read, which results in a significant

Re: Immutable data in Hive

2015-12-30 Thread Alan Gates
Traditionally data in Hive was write once (insert) read many. You could append to tables and partitions, add new partitions, etc. You could remove data by dropping tables or partitions. But there were no updates of data or deletes of particular rows. This was what was meant by immutable.

Re: Loop if table is not empty

2015-12-28 Thread Alan Gates
Have you looked at the new procedural HPL/SQL available in recent Hive? If you are using an older version of Hive you can check out hplsql.org, which allows you to install it separately. Alan. Thomas Achache December 28, 2015 at 2:30 Hi everyone, I am running

Re: Attempt to do update or delete using transaction manager that does not support these operations. (state=42000,code=10294)

2015-12-22 Thread Alan Gates
Also note that transactions only work with MR or Tez as the backend. The required work to have them work with Spark hasn't been done. Alan. Mich Talebzadeh December 22, 2015 at 9:43 Dropped and created table tt as follows: drop table if exists tt; create table

Re: Attempt to do update or delete using transaction manager that does not support these operations. (state=42000,code=10294)

2015-12-22 Thread Alan Gates
s email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility. *From:*Alan Gates [mailto:alanfga...@gmail.com] *Sent:* 22 December 2015 20:39 *To:* user@hive.apache.org *Subject:* Re: Attempt to do update or delete using transactio

Re: Difference between ORC and RC files

2015-12-21 Thread Alan Gates
ORC offers a number of features not available in RC files: * Better encoding of data. Integer values are run length encoded. Strings and dates are stored in a dictionary (and the resulting pointers then run length encoded). * Internal indexes and statistics on the data. This allows for more
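
The two encodings named in the first bullet can be sketched in a few lines. This is a simplified illustration, not ORC's on-disk format (which uses more elaborate run-length variants): `run_length_encode` and `dictionary_encode` are hypothetical helper names.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def dictionary_encode(strings):
    """Replace each string with a pointer (index) into a list of distinct values."""
    dictionary = []
    index_of = {}
    pointers = []
    for s in strings:
        if s not in index_of:
            index_of[s] = len(dictionary)
            dictionary.append(s)
        pointers.append(index_of[s])
    return dictionary, pointers
```

For a string column, the pointers returned by `dictionary_encode` can themselves be passed through `run_length_encode`, which mirrors the two-step scheme the message describes for strings and dates.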

Re: Hive partition load

2015-12-17 Thread Alan Gates
Yes, you can load different partitions simultaneously. Alan. Suyog Parlikar December 17, 2015 at 5:02 Hello everyone, Can we load different partitions of a hive table simultaneously. Is there any locking issues in that if yes what are they? Please find

Re: How to register permanent function during hive thrift server is running

2015-12-03 Thread Alan Gates
No restart of the thrift service should be required. Alan. Todd December 3, 2015 at 3:12 Hi, I am using Hive 0.14.0, and have hive thrift server running.During its running, I would use “create function” to add a permanent function, Does hive support this **without

Re: Query performance correlated to increase in delta files?

2015-11-20 Thread Alan Gates
Are you running the compactor as part of your metastore? It occasionally compacts the delta files in order to reduce read time. See https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions for details. Alan. Sai Gopalakrishnan November

Re: ORC tables loading

2015-11-17 Thread Alan Gates
The reads and writes both happen in parallel, so as more nodes are available for read and write, at least in this case, the time stays roughly the same. Alan. James Pirz November 16, 2015 at 21:23 Hi, I am using Hive 1.2 with ORC tables on Hadoop 2.6 on a

Re: hive locking doubt

2015-11-16 Thread Alan Gates
You are correct that DbTxnManager does not support the explicit locking of tables. Instead it obtains locks based on SQL statements that are being executed. If you use the DummyTxnManager (the default) and set concurrency to true and the lock manager to ZooKeeperHiveLockManager then your

Re: clarification please

2015-10-29 Thread Alan Gates
Ashok Kumar October 28, 2015 at 22:43 hi gurus, kindly clarify the following please * Hive currently does not support indexes or indexes are not used in the query Mostly true. There is a create index, but Hive does not use the resulting index by

Re: insert timestamp values in Hive

2015-10-27 Thread Alan Gates
Actually, for INSERT VALUES you don't have to have a transactional table (you do to use UPDATE or DELETE). So I would expect this to work as is. What happens if you do: create table foo (x int); insert into foo values (5); select * from foo; Do you get 5 or null? This will tell whether the

Re: Locking when using the Metastore/HCatalog APIs.

2015-10-27 Thread Alan Gates
Answers inlined. Elliot West October 22, 2015 at 6:40 I notice from the Hive locking wiki page that locks may be acquired for a range of HQL DDL operations. I wanted to know how the locking scheme mapped

Re: Question about hive-jdbc

2015-10-21 Thread Alan Gates
The way to keep track of when things are getting done in Hive is to check the JIRA, https://issues.apache.org/jira/browse/HIVE I'm not aware of anyone working on those issues at the moment, but a search of the JIRA will tell you if anyone has filed a bug on it. Alan. Hafiz Mujadid

Re: View definition information

2015-10-15 Thread Alan Gates
It should certainly be possible. Can you file a JIRA adding this as a new feature, and if you're so inclined feel free to contribute a patch to add this. Alan. Rachna Jotwani Bakhru October 14, 2015 at 16:47 We are currently using the HCatalog API to get the Hive

Re: truncating tables via hcatalog api?

2015-10-08 Thread Alan Gates
That's correct, HCatClient doesn't provide that feature at this time. It would be easy enough to add if you want to provide a patch for it. Alan. Nathan Bamford October 6, 2015 at 12:14 Hello all, The product I work on using the HCatalog api
