RE: HiveServer2 Kerberos
The prompts are a waste of time when kerberised: you can just hit Enter twice at them once you have a ticket, so what's the point? I think the JIRA is valid (but if I recall correctly it is also a duplicate of an existing one).

From: Rahul Sharma kippy@gmail.com
Sent: 26/08/2015 17:53
To: user@hive.apache.org
Subject: Re: HiveServer2 Kerberos

Even I (and a few others I know in different orgs) have been confused by the password prompts. So, given that multiple users authenticate with their own credentials, does that mean Kerberos is not used for authentication, only for authorization? In which case, what will the authorization be verified against: the credentials the user supplied, or the principal that was supplied?

At the risk of sounding too naive:

* How is Kerberos used with HiveServer2? Is it only used for secure (as in authenticated, authorized) communication with the metastore and Hadoop services? In which case having a different user name and password for the user to log in would make sense.
* If it's also used to authenticate/authorize the JDBC connection, then wouldn't separate keytabs/principals solve the multiple-users use case?

Again, my apologies if the questions are too naive. The docs didn't answer these questions. I would be happy to help update them if others feel the questions are valid.

On Wed, Aug 26, 2015 at 9:01 AM, kulkarni.swar...@gmail.com wrote:

Nope. Because the credentials are different. You might have multiple users using their own credentials to authenticate themselves, but there is only a single set of defined credentials to be used by the metastore server.

On Wed, Aug 26, 2015 at 10:58 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote:

I understand the behavior, but when Kerberos is enabled, isn't that a bit redundant?
Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-08-26 17:53 GMT+02:00 kulkarni.swar...@gmail.com:

"My understanding is that after using Kerberos authentication, you probably don't need the password."

That is not an accurate statement. Beeline is a JDBC client, as compared to Hive CLI which is a Thrift client, to talk to HiveServer2. So it would need the password to establish that JDBC connection. If you look at the Beeline console code [1], it actually first tries to read the javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword properties, which are the same username and password that you have set up your backing metastore DB with. If it is MySQL, it would be the password you set MySQL up with, or empty if you haven't (or are using Derby). Kerberos is merely a tool for you to authenticate yourself so that you cannot impersonate someone else.

[1] https://github.com/apache/hive/blob/3991dba30c5068cac296f32e24e97cf87efa266c/beeline/src/java/org/apache/hive/beeline/Commands.java#L1117-L1125

On Wed, Aug 26, 2015 at 10:13 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote:

Here it is: https://issues.apache.org/jira/browse/HIVE-11653

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-08-25 23:10 GMT+02:00 Sergey Shelukhin ser...@hortonworks.com:

Sure!

From: Loïc Chanel loic.cha...@telecomnancy.net
Reply-To: user@hive.apache.org
Date: Tuesday, August 25, 2015 at 00:23
To: user@hive.apache.org
Subject: Re: HiveServer2 Kerberos

It is the case. Would you like me to file a JIRA about it?
Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-08-24 19:24 GMT+02:00 Sergey Shelukhin ser...@hortonworks.com:

If that is the case, it sounds like a bug…

From: Jary Du jary...@gmail.com
Reply-To: user@hive.apache.org
Date: Thursday, August 20, 2015 at 08:56
To: user@hive.apache.org
Subject: Re: HiveServer2 Kerberos

My understanding is that it will always ask you for a user/password even though you don't need them. It is just the way Hive is set up.

On Aug 20, 2015, at 8:28 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote:

!connect jdbc:hive2://192.168.6.210:1/db;principal=hive/hiveh...@westeros.wl
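
As a rough illustration of the connection string in the original question, the sketch below assembles a jdbc:hive2:// URL with a principal= parameter in the same shape as the !connect line above, and wraps it in a Beeline argument list using Beeline's -u (connection URL) option. The host, port, database and principal values are hypothetical placeholders, the helper names are invented for this sketch, and a valid Kerberos ticket (for example from kinit) is assumed to exist already. Whether the user/password prompts still appear may depend on the Beeline version, so treat this as a starting point rather than a fix for the prompts discussed in the thread.

```python
import shlex
from typing import List


def hive2_kerberos_url(host: str, port: int, database: str, principal: str) -> str:
    """Return a HiveServer2 JDBC URL that carries the server's Kerberos principal."""
    return f"jdbc:hive2://{host}:{port}/{database};principal={principal}"


def beeline_command(url: str) -> List[str]:
    """Return an argument list that starts Beeline against the given JDBC URL."""
    return ["beeline", "-u", url]


if __name__ == "__main__":
    # Hypothetical values; substitute your own host, port, database and principal.
    url = hive2_kerberos_url(
        "hs2.example.com", 10000, "db", "hive/hs2.example.com@EXAMPLE.COM"
    )
    # Quote each argument so the printed line can be pasted into a shell as-is.
    print(" ".join(shlex.quote(arg) for arg in beeline_command(url)))
```

Building the argument list separately from the URL keeps the semicolon in the URL from being interpreted by a shell, which is an easy mistake to make when typing the command by hand.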
RE: Partition Columns
Hi Appan,

I think the answer is that the parser is not able to detect that partitions are useful in Query 2, because the WHERE condition is on a derived field. That is, Hive can tell that if you say where some_partition_field="some partition value" then it only needs to scan that partition, but if you bury the partition columns in a derived field, as in Query 2, it is unable to spot that and so does a full table scan. I think (but don't know for sure) that this will be fairly typical of all SQL engines.

Your best bet is to use direct conditions, as in Query 1. In this case it may have been better for you to persist a field containing the whole date and partition on that instead, to make it simpler to pick up a date range along the lines of Query 2.

Thanks,
Martin.

From: Appan Thirumaligai [mailto:appanhiv...@gmail.com]
Sent: 15 May 2015 01:18
To: user@hive.apache.org
Subject: Re: Partition Columns

Mungeol,

I did check the number of mappers and that did not change between the two queries, but when I ran a count(*) query the total execution time reduced significantly for Query 1 vs Query 2. Also, the amount of data the query reads does change when the WHERE clause changes. I still can't explain why one is faster than the other.

Thanks,
Appan

On Thu, May 14, 2015 at 4:46 PM, Mungeol Heo mungeol@gmail.com wrote:

Hi, Appan.

You can simply check the amount of data your query reads from the table, or the number of mappers used to run that query. Then you can tell whether it is filtering or scanning the whole table. Of course, it is a lazy approach, but you can give it a try. I think Query 1 should work fine, because I use a lot of queries of that kind and they work fine for me.

Thanks,
mungeol

On Fri, May 15, 2015 at 8:31 AM, Appan Thirumaligai appanhiv...@gmail.com wrote:

I agree with you, Viral. I see the same behavior as well. We are on Hive 0.13 for the cluster where I'm testing this.

On Thu, May 14, 2015 at 2:16 PM, Viral Bajaria viral.baja...@gmail.com wrote:

Hi Appan,

In my experience, Query 2 does not use partition pruning because it's not straight-up filtering and involves functions (aka UDFs). What version of Hive are you using?

Thanks,
Viral

On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai appanhiv...@gmail.com wrote:

Hi,

I have a question on the Hive optimizer. I have a table with partition columns, e.g. Sales partitioned by year, month, day. Assume that I have two years' worth of data in this table. I'm running two queries on this table.

Query 1: Select * from Sales where year=2015 and month = 5 and day between 1 and 7

Query 2: Select * from Sales where concat_ws('-',cast(year as string),lpad(cast(month as string),2,'0'),lpad(cast(day as string),2,'0')) between '2015-01-01' and '2015-01-07'

When I ran the Explain command on the above two queries, I got a Filter operation for the second query and no Filter operation for the first query.

My question is: do both queries use the partitions, or are they used only in Query 1, with Query 2 scanning all the data?

Thanks for your help.

Thanks,
Appan

Registered in England and Wales at Players House, 300 Attercliffe Common, Sheffield, S9 2AG. Company number 05935923. This email and its attachments are confidential and are intended solely for the use of the addressed recipient. Any views or opinions expressed are those of the author and do not necessarily represent Jaywing. If you are not the intended recipient, you must not forward or show this to anyone or take any action based upon it. Please contact the sender if you received this in error.
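
Following the suggestion in this thread to filter on the partition columns directly (as Query 1 does) rather than on a derived date string (as Query 2 does), here is a minimal sketch of one way to turn a date range into such a filter. It assumes the Sales table is partitioned by integer year, month and day columns, as the queries imply; the function name partition_predicate is made up for this example. Whether Hive prunes partitions for the generated OR of conditions is worth confirming with EXPLAIN on your own version.

```python
from datetime import date, timedelta
from itertools import groupby


def partition_predicate(start: date, end: date) -> str:
    """Build a WHERE fragment on year/month/day for an inclusive date range."""
    if end < start:
        raise ValueError("end must not be before start")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    clauses = []
    # Consecutive days within one (year, month) collapse into a single BETWEEN.
    for (year, month), group in groupby(days, key=lambda d: (d.year, d.month)):
        run = list(group)
        clauses.append(
            f"(year = {year} AND month = {month} "
            f"AND day BETWEEN {run[0].day} AND {run[-1].day})"
        )
    return " OR ".join(clauses)


if __name__ == "__main__":
    # The date range that Query 2 expresses through a concatenated string.
    where = partition_predicate(date(2015, 1, 1), date(2015, 1, 7))
    print("SELECT * FROM Sales WHERE " + where)
```

The predicate only ever compares partition columns with constants, which is the form the replies describe as working for Query 1. A range that crosses a month boundary produces one parenthesised clause per month, joined with OR.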
Clear up Hive scratch directory
Hi,

One of my users tried to run a HUGE join, which failed due to a lack of space in HDFS. This has left a large amount of data in the Hive scratch directory, which I need to clear down. I've tried setting hive.start.cleanup.scratchdir to true and restarting Hive, but it didn't tidy it up.

So, I'm wondering if it is safe to just delete the contents of the directory in HDFS (while Hive is stopped). Could anyone advise, please?

Many thanks,
Martin.