How the ALTER TABLE statement affects reading ORC files written before a table schema change
Hi, Experts, We have an application which creates an ORC table in Hive to store events. The table schema needs to change dynamically, since new data adds new columns (append only). We noticed the application needs to use ALTER TABLE to change the table schema before loading the new data, in order to read the previous records (written before the schema change) correctly. For example: create table t1 (c1 struct<f1:string, f2:int>) partitioned by (ds timestamp) stored as orc; load data inpath '/old/data/file/path' into table t1 partition (ds='2015-06-02 00:00:00'); alter table t1 change c1 c1 struct<f1:string, f2:int, f3:string>; load data inpath '/new/data/file/path' into table t1 partition (ds='2015-06-02 00:00:00'); If the ALTER is not executed (say, we just load both the old and new ORC files into another table with the final schema), the previous records cannot be read at all; reading them hits a ClassCastException. May I know what magic the ALTER statement does so that the ORC reader can read the files written before the schema change? Thanks very much for your help! Jessica
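For readability, here is the statement sequence from Jessica's message laid out as separate statements, with the struct angle-bracket syntax (mangled by the mail client) restored; table names and paths are her own:

```sql
-- Original schema: one struct column, partitioned table stored as ORC.
CREATE TABLE t1 (c1 STRUCT<f1:STRING, f2:INT>)
PARTITIONED BY (ds TIMESTAMP)
STORED AS ORC;

LOAD DATA INPATH '/old/data/file/path'
INTO TABLE t1 PARTITION (ds='2015-06-02 00:00:00');

-- Append-only schema evolution: a new field f3 is added to the struct
-- before loading files that carry the wider schema.
ALTER TABLE t1 CHANGE c1 c1 STRUCT<f1:STRING, f2:INT, f3:STRING>;

LOAD DATA INPATH '/new/data/file/path'
INTO TABLE t1 PARTITION (ds='2015-06-02 00:00:00');
```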
RE: Top N query
Is Inner Join supported by the Hive version you are using ? Regards Arpita Singh From: Ravisankar Mani [mailto:rrav...@gmail.com] Sent: Wednesday, June 03, 2015 11:29 AM To: user@hive.apache.org Subject: Top N query Hi everyone, FAILED: SemanticException [Error 10019]: Line 9:5 OR not supported in JOIN currently 'movieid_0'. This exception occurs while executing the following query, but I need to add these conditions. Kindly suggest another way to write the query. Please refer to the following query: SELECT IF(ISNOTNULL(SUM(recommend_ratings2.rating)),SUM(recommend_ratings2.rating), 0 ) AS rating ,recommend_ratings2.movieid AS movieid FROM default.recommend_ratings2 INNER JOIN (SELECT recommend_ratings2.movieid AS movieid_0 ,IF(ISNOTNULL(SUM(recommend_ratings2.rating)),SUM(recommend_ratings2.rating), 0 ) AS Measure_0 FROM default.recommend_ratings2 GROUP BY recommend_ratings2.movieid ORDER BY Measure_0 DESC limit 2 ) T_0 ON ( recommend_ratings2.movieid = T_0.movieid_0 or ((recommend_ratings2.movieid IS NULL) AND (T_0.movieid_0 IS NULL))) Regards Ravi This email and any attachments may contain confidential information and intellectual property (including copyright material). It is only for the use of the addressee(s) in accordance with any instructions contained within it. If you are not the addressee, you are prohibited from copying, forwarding, disclosing, saving or otherwise using it in any way. If you receive this email in error, please immediately advise the sender and delete it. Steria may monitor the content of emails within its network to ensure compliance with its policies and procedures. Emails are susceptible to alteration and their integrity (including origin) cannot be assured. Steria shall not be liable for any modification to a message, or for messages falsely sent.
MetaException(message:java.security.AccessControlException: Permission denied
Hi, I was running a Spark job to INSERT OVERWRITE a Hive table and got Permission denied. My question is: why did the Spark job do the insert as user 'hive', rather than as me, the user who ran the job? How can I fix the problem? val hiveContext = new HiveContext(sc) import hiveContext.implicits._ hiveContext.sql("INSERT OVERWRITE TABLE 4dim ... ") Caused by: MetaException(message:java.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode=/apps/hive/warehouse/wrf_tables/4dim/zone=2/z=1/year=2009/month=1:patcharee:hdfs:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6795) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6777) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6702) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAccess(FSNamesystem.java:9529) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkAccess(NameNodeRpcServer.java:1516) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.checkAccess(ClientNamenodeProtocolServerSideTranslatorPB.java:1433) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) ) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partition_result$alter_partition_resultStandardScheme.read(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partition_result$alter_partition_resultStandardScheme.read(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partition_result.read(ThriftHiveMetastore.java) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_partition(ThriftHiveMetastore.java:2033) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_partition(ThriftHiveMetastore.java:2018) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partition(HiveMetaStoreClient.java:1091) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) at com.sun.proxy.$Proxy37.alter_partition(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:469) ... 26 more BR, Patcharee
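A hedged pointer rather than a confirmed fix: in HiveServer2-style deployments, whether operations run as the connecting end user or as the 'hive' service user is governed by the doAs setting. The property name below is from the Hive configuration documentation; whether it applies to a direct Spark HiveContext setup like the one above is an assumption to verify against your distribution:

```xml
<!-- hive-site.xml fragment (assumption: your writes go through HiveServer2
     or a metastore service running as 'hive'). With doAs enabled, operations
     are performed as the connecting user instead of the service user. -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```

Alternatively, loosening the HDFS permissions or ownership on the affected warehouse directories would mask the symptom, but fixing which user performs the write is the cleaner route.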
Group by query
Hi everyone, I have selected specific columns along with an aggregation. FAILED: SemanticException [Error 10002]: Line 2:26 Invalid column reference 'contactid'. This exception occurs while executing the following query, but it works in SQL Server without using GROUP BY. Kindly refer to the query. SELECT adventurepersoncontacts.contactid as contactid ,adventurepersoncontacts.fullname as fullname ,adventurepersoncontacts.age as age ,adventurepersoncontacts.emailaddress as emailaddress ,adventurepersoncontacts.phoneno as phoneno ,adventurepersoncontacts.modifieddate as modifieddate , COUNT(adventurepersoncontacts.contactid) as contact FROM default.adventurepersoncontacts Regards, Ravisankar M R
Hive Query
Ravi, Use an IF condition .. it works .. Use ISNOTNULL instead, and give the replacement value as the third parameter of the IF condition. NOTE: false and null are treated together in an IF condition. IF(ISNOTNULL(SUM(columnname)), SUM(columnname), 0) AS sumVendor Regards, Teja
Re: Write access to wiki
Done. Welcome to the Hive wiki team, Anant! -- Lefty On Tue, Jun 2, 2015 at 9:57 PM, anant nag nntna...@gmail.com wrote: Hi, I want to update the Hive Cli documentation with the information on how to use the feature added in HIVE-9664( Hive add jar command should be able to download and add jars from a repository). Could you please provide me write access to the wiki. My confluence username is *nntnag17* Thanks, -- Anant Nag Software Engineer LinkedIn Corporation an...@linkedin.com
Re: Top N query
Hi Arpita, Inner join works perfectly, but the 'or' condition is not supported in an inner join. Regards Ravi On Wed, Jun 3, 2015 at 6:39 AM, Arpita SINGH arpita.si...@steria.co.in wrote: Is Inner Join supported by the Hive version you are using ? Regards Arpita Singh
Re: Null condition hive query
Why don't you try IF(column_name IS NULL, 0, column_name) AS column_name? (Hive's IF takes three arguments: condition, value if true, value if false.) After this you can perform your aggregation on top of it. On Wed, Jun 3, 2015 at 11:16 AM, Ravisankar Mani rrav...@gmail.com wrote: Hi Everyone, I need to check whether a measure value is null or not: if it is null, replace it with 0, otherwise return the same value. Here is a sample SQL query: SELECT ISNULL(SUM(CAST([Purchasing].[Vendor].[BusinessEntityID] AS decimal(38,6))),0) AS [BusinessEntityID], [Purchasing].[Vendor].[AccountNumber] AS [AccountNumber] FROM [Purchasing].[Vendor] GROUP BY [Purchasing].[Vendor].[AccountNumber] ORDER BY 2 ASC If I give 'sum(columnname,0)' it replaces nulls with 0 wherever any null is found. In a Hive query I have checked ISNULL(columnname), but it only returns true or false, so how do I replace with 0? Could you please help regarding this query. Regards. Ravi
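For reference, a sketch of the Hive equivalent of the SQL Server ISNULL pattern in the question above, using COALESCE; the column names follow the original query, and the unbracketed table name vendor is an assumption about how the table would be named in Hive:

```sql
-- COALESCE returns its first non-NULL argument, so a NULL SUM becomes 0.
SELECT COALESCE(SUM(CAST(v.BusinessEntityID AS DECIMAL(38,6))), 0) AS BusinessEntityID,
       v.AccountNumber AS AccountNumber
FROM vendor v
GROUP BY v.AccountNumber
ORDER BY 2 ASC;
```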
Re: Group by query
Ravishankar, the query below will work fine without including a GROUP BY: SELECT COUNT(adventurepersoncontacts.contactid) as contact FROM default.adventurepersoncontacts But when you plan to select an aggregated column alongside non-aggregated columns, you have to specify the level of aggregation. Why don't you include the selected columns in the GROUP BY? I don't think it will affect your output. Thanks, Bhagwan
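Concretely, Bhagwan's suggestion applied to the query from the original message would look like this, with every non-aggregated select column repeated in the GROUP BY:

```sql
SELECT c.contactid        AS contactid,
       c.fullname         AS fullname,
       c.age              AS age,
       c.emailaddress     AS emailaddress,
       c.phoneno          AS phoneno,
       c.modifieddate     AS modifieddate,
       COUNT(c.contactid) AS contact
FROM default.adventurepersoncontacts c
-- Hive requires every non-aggregated column in the select list
-- to appear in the GROUP BY.
GROUP BY c.contactid, c.fullname, c.age,
         c.emailaddress, c.phoneno, c.modifieddate;
```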
Re: HIve Joins vs Pig Joins
Are Tez, CBO, and ORC considered in this? With so much effort invested in Hive, especially from Hortonworks and Cloudera, I think Hive has the stronger future, with work such as Hive on Spark and so on. On Wed, Jun 3, 2015 at 12:06 AM, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Check this paper : Pig-Hive benchmarks http://www.ibm.com/developerworks/library/ba-pigvhive/pighivebenchmarking.pdf It discusses these things in detail. Cheers, Ritesh On Tue, Jun 2, 2015 at 11:32 AM, sreejesh s sreejesh...@yahoo.com wrote: Hi, I have read a couple of articles that say Pig joins perform better compared to Hive joins... Is that true? If yes, could you please explain the reason. Thanks -- Thanks, Will
Re: Hive Query
Hi IF(ISNOTNULL(SUM(columnname)), SUM(columnname), 0) AS sumVendor Or *coalesce(sum(columnname), 0) as ...* As explained here https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF, COALESCE(T v1, T v2, ...) returns the first v that is not NULL, or NULL if all v's are NULL. hth GB On 6/3/2015 2:23 AM, Teja Kunapareddy wrote: Ravi, Use an IF condition .. it works .. Use ISNOTNULL instead, and give the replacement value as the third parameter of the IF condition. NOTE: false and null are treated together in an IF condition. IF(ISNOTNULL(SUM(columnname)), SUM(columnname), 0) AS sumVendor Regards, Teja -- The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.
Re: Hive Query
Hey all, Has anyone else also found the coalesce function to be prone to some weird behaviours? e.g. 1: giving NULL when it shouldn't. e.g. 2: I had to change a coalesce(v1,v2,v3) to coalesce(v1,v2,v3,null) (???), otherwise the query would crash! Regards, João Alves On 03 Jun 2015, at 17:00, gabriel balan gabriel.ba...@oracle.com wrote: Hi IF(ISNOTNULL(SUM(columnname)), SUM(columnname), 0) AS sumVendor Or coalesce(sum(columnname), 0) as ... As explained here https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF, COALESCE(T v1, T v2, ...) returns the first v that is not NULL, or NULL if all v's are NULL. hth GB On 6/3/2015 2:23 AM, Teja Kunapareddy wrote: Ravi, Use an IF condition .. it works .. Use ISNOTNULL instead, and give the replacement value as the third parameter of the IF condition. NOTE: false and null are treated together in an IF condition. IF(ISNOTNULL(SUM(columnname)), SUM(columnname), 0) AS sumVendor Regards, Teja -- The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.
Re: HIve Joins vs Pig Joins
These results are fairly old. Both Hive and Pig have changed significantly since then. Many of Hive's internal operators have been rewritten. Both can now run over Tez or Spark. So at this point it's hard to say which is better. The best option is to develop some tests based on your use case and try both to see which works best for you. Alan. Ritesh Kumar Singh mailto:riteshoneinamill...@gmail.com June 2, 2015 at 21:06 Check this paper : Pig-Hive benchmarks http://www.ibm.com/developerworks/library/ba-pigvhive/pighivebenchmarking.pdf It discusses these things in detail. Cheers, Ritesh sreejesh s mailto:sreejesh...@yahoo.com June 1, 2015 at 23:02 Hi, I have read a couple of articles that say Pig joins perform better compared to Hive joins... Is that true? If yes, could you please explain the reason. Thanks
Re: Query in HQL
modify to say 'CustomerId' On Wednesday, June 3, 2015 9:44 PM, Renuka Be renunalin...@gmail.com wrote: Hi Folks, Here is my query, select customerid as CustomerId from Customers It returns table with column name as 'customerid', ' CustomerId' in the query was not reflected into table. What shall i do to display column name as 'CustomerId'. Regards,Renuka
Hive 0.14: impersonation not working without system property
Hi, While executing hive queries through CliDriver, we are getting a permission denied error on the scratch dir when executing with impersonation. The error goes away when the system property HADOOP_PROXY_USER is set to the proxy/impersonated user name. Attaching the sample program to reproduce the issue. This happens on HDP 2.2 (did not try on CDH 5u4, which also has Hive 0.14). The sample program, when executed without any command-line parameters, reproduces this issue. When a command-line arg is provided, it sets the system property and the queries are executed without an error. Has anyone seen this? Seems like a possible regression from 0.13 to 0.14. It does not happen on CDH 5u2, which has Hive 0.13. Kiran HiveCliSample.java Description: HiveCliSample.java
Re: Top N query
Hi Gopal, Thanks for your response. I don't quite understand your solution. Could you please provide the exact query? Regards, Ravi On Wed, Jun 3, 2015 at 2:23 PM, Gopal Vijayaraghavan gop...@apache.org wrote: Hi, This particular case can be handled by a special case of the inner equi-join. ( recommend_ratings2.movieid = T_0.movieid_0 or ((recommend_ratings2.movieid IS NULL) AND (T_0.movieid_0 IS NULL))) can be rewritten as a null-safe equi-join (i.e. where movieid = movieid_0 also matches NULL against NULL, which is not standard SQL semantics). That, however, is a corner case to work around folks who want OR joins to just handle NULL == NULL as a match. Cheers, Gopal
Re: Group by query
Hi Bhagwan, Thanks for your response... Is it possible to use aggregated and non-aggregated columns in a Hive query, to get the data without using GROUP BY? Regards, Ravi
Query in HQL
Hi Folks, Here is my query: select customerid as CustomerId from Customers It returns a result set with the column name 'customerid'; the 'CustomerId' alias in the query was not reflected in the result. What shall I do to display the column name as 'CustomerId'? Regards, Renuka
Re: Query in HQL
My bad .. I looked up the code, and Hive really does transform everything to lower case. On Wednesday, June 3, 2015 10:02 PM, Renuka Be renunalin...@gmail.com wrote: Hi Ankit, Thanks for your reply. I have tried as you suggested and it throws the error below. FAILED: ParseException line 1:21 cannot recognize input near 'as' ''CustomerId'' 'from' in selection target Query: select customerid as 'CustomerId' from Customers Regards, Renuka On Thu, Jun 4, 2015 at 12:49 AM, Ankit Bhatnagar ank...@yahoo-inc.com wrote: modify to say 'CustomerId' On Wednesday, June 3, 2015 9:44 PM, Renuka Be renunalin...@gmail.com wrote: Hi Folks, Here is my query: select customerid as CustomerId from Customers It returns a result set with the column name 'customerid'; the 'CustomerId' alias in the query was not reflected in the result. What shall I do to display the column name as 'CustomerId'? Regards, Renuka
Re: Top N query
Hi, This particular case can be handled by a special case of the inner equi-join. ( recommend_ratings2.movieid = T_0.movieid_0 or ((recommend_ratings2.movieid IS NULL) AND (T_0.movieid_0 IS NULL))) can be rewritten as a null-safe equi-join (i.e. where movieid = movieid_0 also matches NULL against NULL, which is not standard SQL semantics). That, however, is a corner case to work around folks who want OR joins to just handle NULL == NULL as a match. Cheers, Gopal
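A sketch of what that null-safe rewrite might look like using Hive's <=> operator (documented in the Hive LanguageManual as null-safe equality since 0.9.0); whether your Hive version accepts <=> inside a JOIN ... ON clause should be verified, and note a GROUP BY on the outer movieid is added, which the original query would also need:

```sql
SELECT COALESCE(SUM(r.rating), 0) AS rating,
       r.movieid                  AS movieid
FROM default.recommend_ratings2 r
INNER JOIN (SELECT r2.movieid                   AS movieid_0,
                   COALESCE(SUM(r2.rating), 0)  AS Measure_0
            FROM default.recommend_ratings2 r2
            GROUP BY r2.movieid
            ORDER BY Measure_0 DESC
            LIMIT 2) T_0
  -- <=> treats NULL <=> NULL as a match, replacing the unsupported
  -- OR-of-IS-NULL join condition with a single equi-join predicate.
  ON (r.movieid <=> T_0.movieid_0)
GROUP BY r.movieid;
```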
When using ORC, how to confirm stripes/row groups are being skipped?
Hi Community, While using the ORC file format, I would like to see in the logs that stripes and/or row groups are being skipped based on my WHERE clause. I don't see any skip messages in the logs. Is that info even output? If so, what do I need to do to enable it? Thank you, *Chandan*
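Not a confirmed answer to the logging question, but a prerequisite worth checking: stripe and row-group skipping in ORC relies on predicate pushdown being enabled, and the property below comes from the Hive configuration docs (its default varies by version, so verify in your install). Absent explicit skip messages, comparing the input-records counters of the same query with the setting on and off is one way to observe the skipping indirectly:

```sql
-- Push WHERE-clause predicates down into the ORC reader so its per-stripe
-- and per-row-group min/max statistics can be used to skip data.
SET hive.optimize.index.filter=true;
```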
connection pooling for hive JDBC client
Hello, I am working on an application that queries and interacts with Hive using the JDBC API. In many other cases, using a JDBC connection pool such as commons-dbcp or BoneCP or HikariCP is a recommended practice and results in much better performance. All of the examples I’ve found of accessing Hive through the JDBC API (such as https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBCClientSampleCode) use a raw JDBC connection to Hive directly. A few questions: 1. Should Hive work with connection pools such as commons-dbcp, BoneCP, or HikariCP? * Note: I have tried both BoneCP and HikariCP, and cannot seem to get either to work because HiveConnection.setReadOnly throws a java.sql.SQLException saying “Method not supported”. I am using Hive 0.14.0. 2. Do you recommend using a JDBC connection pool for interacting with Hive from an application that will execute many repeated and concurrent queries/statements? Also, just to be clear, I am not asking about configuring the Hive metastore to use a connection pool to connect to its underlying database (MySQL, PostgreSQL, etc), but about using a connection pool to interact with and query Hive through the JDBC API from an application. Thank you, David McWhorter — David McWhorter Senior Developer, Foundations Informatics and Technology Services Office: 434.260.5232 | Mobile: 434.227.2551 david_mcwhor...@premierinc.commailto:david_mcwhor...@premierinc.com | Premier, Inc. (NASDAQ: PINC)
Re: current_date function in hive
Hi everyone, Thanks for the responses. My problem is solved now. I was not using a version of Hive recent enough to have the current_date function. Thanks Gopal for the UDF link. I believe I will have to add the UDF to HiveServer2 in order to be able to run the query using Oozie. On Tue, Jun 2, 2015 at 1:30 PM, Gopal Vijayaraghavan gop...@apache.org wrote: Hi, You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP())) That would be a very bad idea for query correctness. The current_date UDF was introduced because of that particular anti-pattern. unix_timestamp() is evaluated when a row is read, so each row gets a slightly different timestamp when doing ETLs. I have seen it give completely incorrect results when used near midnight (i.e. a query kicked off at 11:59, executed at 00:01 the next day). Plus, map-reduce failure handling mandates (logically) identical outputs on map task retries or speculation. I've recently pulled the current_timestamp() impl out into a reusable UDF so that I can use it in hive-1.0.x installs as well. https://github.com/t3rmin4t0r/current-timestamp You can use that UDF JAR with hive-1.0+; the constant folding will replace the value in-place during compilation. Cheers, Gopal -- Thanking You, Ayazur Rehman +1 720 917 8540
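To make Gopal's point concrete, a sketch of the two patterns side by side; the table name events and partition column ds are hypothetical, and current_date assumes a Hive version that ships it (or the UDF JAR above on 1.0.x):

```sql
-- Anti-pattern: unix_timestamp() is evaluated per row at read time, so a
-- query that straddles midnight can match two different dates, and retried
-- map tasks can produce different output than the original attempt.
SELECT COUNT(*) FROM events
WHERE ds = to_date(from_unixtime(unix_timestamp()));

-- Preferred: current_date is constant-folded once at compile time, so every
-- row, retry, and speculative task sees the same value.
SELECT COUNT(*) FROM events
WHERE ds = current_date;
```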