store full line in block
Hi All, My input data contains 45 columns. When I store the data in a Hive table, a line at the end of a block is split so that only part of it is stored in that block, and my queries return wrong results. Please help me store each full line in a single block. -- Thanks, Kishore
Re: store full line in block
Some part of the line is stored in one block and the rest of the line in another block, so when I query for column one, Hive returns as the first column the first column of the other block (which begins in the middle of the line). On Tue, Apr 29, 2014 at 2:05 PM, Kishore kumar kish...@techdigita.in wrote: ...
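For context on the thread above: line-oriented input formats are normally expected to handle lines that straddle a block/split boundary, by having each reader finish the line it started and skip the partial line at the start of its own split. The sketch below is a hypothetical illustration of that scheme in plain Python (it is not Hive or Hadoop source code; the function name and layout are invented for illustration):

```python
# Hypothetical sketch of boundary-aware line reading, not Hive/Hadoop source.
# A reader for a split keeps reading past the split's end to finish the last
# line; the reader of the next split skips its leading partial line.

def read_split(data: bytes, start: int, end: int):
    """Yield every complete line whose first byte lies in [start, end)."""
    pos = start
    if start != 0:
        # Skip the partial first line; the previous split's reader owns it.
        nl = data.find(b"\n", start)
        if nl == -1:
            return
        pos = nl + 1
    while pos < end:  # only lines *starting* inside the split belong to it
        nl = data.find(b"\n", pos)
        if nl == -1:
            yield data[pos:]
            return
        yield data[pos:nl]  # may extend past `end` -- that is the point
        pos = nl + 1

data = b"row1,a,b\nrow2,c,d\nrow3,e,f\n"
# A split boundary at byte 12 falls in the middle of "row2,c,d".
split1 = list(read_split(data, 0, 12))
split2 = list(read_split(data, 12, len(data)))
```

With this scheme no row is ever torn in half between two readers, which is why a symptom like the one described usually points at the data (e.g. stray newlines inside field values) or the table's delimiters rather than at block placement itself.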
Number of hive-server2 threads increments after jdbc connection
Hello everyone, When I run the JDBC example from https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBCClientSampleCode against my Hive server, the number of hive-server2 threads increments. If I run it long enough, I either start seeing exceptions:

Exception in thread "main" java.sql.SQLException: org.apache.thrift.TApplicationException: Internal error processing ExecuteStatement
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:203)
    at HiveJdbcClient.main(HiveJdbcClient.java:24)
Caused by: org.apache.thrift.TApplicationException: Internal error processing ExecuteStatement
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_ExecuteStatement(TCLIService.java:213)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:200)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:197)
    ... 1 more

or I bump into Zookeeper's connection limit (each hive-server2 thread maintains a connection with Zookeeper; I have hive.support.concurrency enabled). In either case I can't connect to the Hive server after that. I've tried this on Hive 0.10 (CDH 4.4) and Hive 0.12 (CDH 5.0 and HDP 2.0.6) with the same results. Please, could anyone help me resolve this? Thanks in advance.
OrcOutputFormat
Hi, I am trying to run an MR job to write files in ORC format. I do not see any files created, although the job runs successfully. If I change the output format from OrcOutputFormat to TextOutputFormat (that being the only change), I see the output files get created. I am using Hive 0.12.0. I tried upgrading to Hive 0.13.0, but with this version I get the following error:

2014-04-29 10:37:07,426 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: org/apache/hadoop/hive/ql/io/orc/OrcProto$RowIndex
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.init(WriterImpl.java:129)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:369)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:104)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:411)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

How do you think this issue can be resolved? Thanks, Seema
Re: Number of hive-server2 threads increments after jdbc connection
Hi, In your code, are more connections and statements being created? If so, are those connections closed? Close unused connections and statements after use. On Tue, Apr 29, 2014 at 3:47 PM, Dima Fadeyev dfade...@pragsis.com wrote: ... -- Hope It Helps, Chinna
Re: Number of hive-server2 threads increments after jdbc connection
Hi, Chinna. Thanks for your reply. Yes, modifying the code solves the problem. This is what my code looks like (a piece of it):

Connection con = DriverManager.getConnection("jdbc:hive2://localhost:1/default", "hive", "hive");
Statement stmt = con.createStatement();
String tableName = "testHiveDriverTable";
stmt.execute("drop table if exists " + tableName);
//stmt.close();

When I uncomment the last line, the number of hive-server2 threads doesn't keep incrementing to infinity. However, I'm investigating a case where the code is not really my code. Is there a way to correct this behavior from within hive-server2 without changing the client's code? On 29/04/14 14:05, Chinna Rao Lalam wrote: Hi, In your code, are more connections and statements being created? If so, are those connections closed? Close unused connections and statements after use. Hope It Helps, Chinna On Tue, Apr 29, 2014 at 3:47 PM, Dima Fadeyev dfade...@pragsis.com wrote: ...
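The leak pattern discussed in this thread, and the fix, can be sketched in plain Python. The classes below are dummies standing in for the real JDBC/pyhs2 objects (they are not any real driver's API); the point is that each statement left unclosed holds a server-side handle, so wrapping statements in a context manager that guarantees close() keeps the handle count flat:

```python
# Illustrative sketch with dummy classes, not the real pyhs2/JDBC API:
# every unclosed Statement leaves a handle (and, server-side, a thread)
# behind; closing deterministically releases it.
from contextlib import closing

OPEN_HANDLES = []  # stands in for HiveServer2's per-handle threads

class Statement:
    def __init__(self):
        OPEN_HANDLES.append(self)
    def execute(self, sql):
        return "ok"
    def close(self):
        OPEN_HANDLES.remove(self)

class Connection:
    def create_statement(self):
        return Statement()

def leaky(con):
    stmt = con.create_statement()
    stmt.execute("drop table if exists t")
    # never closed: one handle leaks per call

def tidy(con):
    with closing(con.create_statement()) as stmt:
        stmt.execute("drop table if exists t")

con = Connection()
for _ in range(5):
    leaky(con)
leaked = len(OPEN_HANDLES)      # 5 handles left dangling
OPEN_HANDLES.clear()
for _ in range(5):
    tidy(con)
remaining = len(OPEN_HANDLES)   # 0 -- every handle was released
```

On the server side, an idle-operation/session timeout (where the deployed HiveServer2 version supports one) is the usual mitigation when the client code cannot be changed, but client-side closing is the real fix.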
Bug in Hive Partition windowing functions?
Hi, we have an issue where a windowing function query never completes when run against a large dataset (over 25,000 rows). The reducer (there is only one) never exits and appears stuck in an infinite loop; I looked at the reducer counters and they never changed over the 6 hours it was stuck. When the data set is small (under 25K rows), it runs fine. Is there any workaround for this issue? We tested against Hive 0.11/0.12/0.13 and the result is the same.

create table large_table (
    date string,
    name string,
    stime string,
    bprice decimal,
    aprice decimal,
    otime double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
stored as textfile;

create table window_function_fail as
select a.*,
    sum(case when bprice is not null then 1 else 0 end)
        over (partition by date, name
              order by otime, bprice, aprice desc
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) bidpid
from large_table a;

Thanks in advance.
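For cross-checking results on a small sample, the window expression in the query above can be reproduced in plain Python: within each (date, name) partition, rows are ordered and each row receives a running count of the non-NULL bprice values seen so far, itself included. This is only a reference sketch (the ordering is simplified to otime alone, since the full ORDER BY mixes a nullable column with a DESC key), not Hive's implementation:

```python
# Reference implementation of the bidpid window expression above,
# simplified to ORDER BY otime within each (date, name) partition.
from itertools import groupby

def bidpid(rows):
    """rows: dicts with keys date, name, otime, bprice, aprice."""
    out = []
    key = lambda r: (r["date"], r["name"])
    for _, part in groupby(sorted(rows, key=key), key=key):
        running = 0
        for r in sorted(part, key=lambda r: r["otime"]):
            if r["bprice"] is not None:
                running += 1  # CASE WHEN bprice IS NOT NULL THEN 1 ELSE 0
            out.append({**r, "bidpid": running})
    return out

rows = [
    {"date": "d1", "name": "a", "otime": 1.0, "bprice": 10,   "aprice": 11},
    {"date": "d1", "name": "a", "otime": 2.0, "bprice": None, "aprice": 12},
    {"date": "d1", "name": "a", "otime": 3.0, "bprice": 13,   "aprice": 14},
]
result = [r["bidpid"] for r in bidpid(rows)]  # running non-NULL count
```

Comparing such a reference output against the Hive output on a subset that does complete can help establish whether the hang is data-dependent (e.g. one very large partition) or a planner/runtime bug.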
Re: Problem adding jar using pyhs2
Hi Brad, Your test, after editing for local host/file names, etc., worked. It must be something else I'm doing wrong in my development stuff. At least I know it should work. I'll figure it out eventually. Thanks again. David On Mon, Apr 28, 2014 at 10:22:57AM -0700, Brad Ruderman wrote: Hi David- Can you test the code? It is working for me. Make sure your jar is in HDFS and you are using the FQDN for referencing it.

import pyhs2

with pyhs2.connect(host='127.0.0.1',
                   port=1,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
        cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
        # Execute query
        cur.execute("select substr(description,2,4) from sample_07")
        # Return column info from query
        print cur.getSchema()
        # Fetch table results
        for i in cur.fetch():
            print i

Thanks, Brad On Mon, Apr 28, 2014 at 7:39 AM, David Engel da...@istwok.net wrote: Thanks for your response. We've essentially done your first suggestion in the past by copying or symlinking our jar into Hive's lib directory. It works, but we'd like a better way for different users to use different versions of our jar during development. Perhaps that's not possible, though, without running completely different instances of Hive. I don't think your second suggestion will work. The original problem is that when "add jar file.jar" is run through pyhs2, the full command gets passed to AddResourceProcessor.run(), yet AddResourceProcessor.run() is written such that it only expects "jar file.jar" to be passed to it. That's how it appears to work when "add jar file.jar" is run from a stand-alone Hive CLI and from beeline.
David On Sat, Apr 26, 2014 at 12:14:53AM -0700, Brad Ruderman wrote: An easy solution would be to add the jar to the classpath or auxlibs, so every instance of Hive already has the jar and you just need to create the temporary function. Otherwise you can put the JAR in HDFS and reference it in "add jar" using the hdfs scheme. Example: ... On Fri, Apr 25, 2014 at 7:54 AM, David Engel da...@istwok.net wrote: Hi, I'm trying to convert some of our Hive queries to use the pyhs2 Python package (https://github.com/BradRuderman/pyhs2). Because we have our own jar with some custom SerDes and UDFs, we need to use the "add jar /path/to/my.jar" command to make them available to Hive. This works fine using the Hive CLI directly and also with the Beeline client. It doesn't work, however, with pyhs2. I naively tracked the problem down to a bug in AddResourceProcessor.run(). See HIVE-6971 in Jira. My attempted fix turned out not to be correct because it breaks the add command when used from the CLI and Beeline. It seems the "add" part of any "add file|jar|archive ..." command needs to be stripped off somewhere before it gets passed to AddResourceProcessor.run(). Unfortunately, I can't find that location when the command comes from pyhs2. Can someone help? David -- David Engel da...@istwok.net
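The parsing mismatch described in this thread (see HIVE-6971) can be sketched as follows. This is a hypothetical illustration in Python, not Hive's Java code: one call path hands the processor the full command ("add jar /p/my.jar") while the processor expects only the remainder ("jar /p/my.jar"); a normalizing step that strips a leading "add" keyword makes both paths parse identically. All names here are invented for the sketch:

```python
# Hypothetical sketch of the HIVE-6971 mismatch; names are illustrative,
# not Hive's. One caller passes "add jar <path>", another "jar <path>";
# stripping the optional leading "add" makes both parse the same way.

RESOURCE_TYPES = {"jar", "file", "archive"}

def normalize_add_command(command: str) -> str:
    """Return the command with any leading 'add' keyword removed."""
    tokens = command.strip().split()
    if tokens and tokens[0].lower() == "add":
        tokens = tokens[1:]
    return " ".join(tokens)

def parse_resources(command: str):
    """Mimic a processor that expects '<type> <path>...' with no 'add'."""
    tokens = normalize_add_command(command).split()
    if not tokens or tokens[0].lower() not in RESOURCE_TYPES:
        raise ValueError("expected: [add] jar|file|archive <path>...")
    return tokens[0].lower(), tokens[1:]

# Both the CLI-style and the HiveServer2-style invocations now parse alike:
cli = parse_resources("jar /path/to/my.jar")
hs2 = parse_resources("add jar /path/to/my.jar")
```

Doing the normalization once, at the point where the command word is dispatched to a processor, is what keeps a fix from breaking the CLI and Beeline paths the way the naive patch described above did.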
Re: OrcOutputFormat
Hi, AFAIK, you would need to use the HCatalog APIs to read from / write to an ORC file. Please refer to https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput -Abhishek On Tue, Apr 29, 2014 at 6:40 AM, Seema Datar sda...@yahoo-inc.com wrote: ...