Re: Hive Metastore Hook to fire only on success
Thanks. I was using MetaStoreEventListener and wasn't aware there's another type of HMS hook. TY. Daniel

On Fri, Oct 5, 2018 at 10:54 PM Alan Gates wrote:
> Which version of Hive are you on and which hook are you seeing fire?
> Based on looking at the master code you should only see the
> commitCreateTable hook call if the creation succeeds.
>
> Alan.
>
> On Thu, Oct 4, 2018 at 12:36 AM Daniel Haviv wrote:
>
>> Hi,
>> I'm writing a HMS hook and I noticed that the hook fires no matter if the
>> operation succeeded or not.
>> For example, if a user creates an already existing table, the operation
>> will fail but the hook will fire regardless.
>>
>> Is there a way to either validate that the operation succeeded or fire
>> only upon success?
>>
>> TY.
>> Daniel
Hive Metastore Hook to fire only on success
Hi, I'm writing a HMS hook and I noticed that the hook fires whether or not the operation succeeded. For example, if a user creates an already existing table, the operation will fail but the hook will fire regardless. Is there a way to either validate that the operation succeeded or fire only upon success? TY. Daniel
Specifying orc.stripe.size in Spark
Hi, When writing a dataframe using df.write.orc("/path/to/orc"), how can I specify ORC parameters such as orc.stripe.size? Thank you, Daniel
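One possible route, sketched here without verification against every Spark release: newer Spark versions pass data source options that start with `orc.` through to the ORC writer, so in the DataFrame API `df.write.option("orc.stripe.size", "67108864").orc("/path/to/orc")` may work. A hypothetical Spark SQL equivalent (table and view names are made up, and option pass-through is an assumption):

```sql
-- Hedged sketch: assumes the Spark ORC data source forwards orc.* options
-- to the writer. Table name, view name, and location are hypothetical.
CREATE TABLE orc_out
USING orc
OPTIONS ('orc.stripe.size' = '67108864')  -- request 64 MB stripes
LOCATION '/path/to/orc'
AS SELECT * FROM source_view;
```

If the option is silently ignored in your version, checking the resulting files with `orc-tools meta` (or the stripe statistics) is the only reliable confirmation.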
Column names in ORC file
Hi, When I'm generating ORC files using Spark the column names are written into the ORC file, but when generated using Hive I get the following column names: _col107, _col33, _col23, _col102. Is it possible to somehow configure Hive to store the column names properly, as Spark does? Thank you, Daniel
Re: IntWritable cannot be cast to LongWritable
I'm using 1.1.0. I always thought these issues were resolved way back at 0.13-0.14. So rewriting the data is the only way to handle this? Thank you, Daniel On Wed, Dec 14, 2016 at 8:42 PM, Owen O'Malley wrote: > Which version of Hive are you on? Hive 2.1 should automatically handle the > type conversions from the file to the table. > > .. Owen > > On Wed, Dec 14, 2016 at 9:36 AM, Daniel Haviv < > daniel.ha...@veracity-group.com> wrote: > >> Hi, >> I have an ORC table where one of the fields was an int and is now a >> bigint. >> Whenever I query a partition before the schema change I encounter the >> following error: >> Error: java.io.IOException: java.io.IOException: >> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be >> cast to org.apache.hadoop.io.LongWritable >> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handle >> RecordReaderNextException(HiveIOExceptionHandlerChain.java:121) >> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleR >> ecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) >> at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRe >> cordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226) >> at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRe >> cordReader.next(HadoopShimsSecure.java:136) >> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToN >> ext(MapTask.java:199) >> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next( >> MapTask.java:185) >> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) >> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java: >> 453) >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) >> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) >> at java.security.AccessController.doPrivileged(Native Method) >> at javax.security.auth.Subject.doAs(Subject.java:415) >> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro >> upInformation.java:1698) >> >> I tried to manually 
go through the old partitions and set that column to >> int but I'm still getting the same exceptions. >> I expected promoting an int to a bigint shouldn't cause any problems. >> >> Am I doing something wrong ? >> >> Thank you, >> Daniel >> > >
IntWritable cannot be cast to LongWritable
Hi,
I have an ORC table where one of the fields was an int and is now a bigint. Whenever I query a partition from before the schema change I encounter the following error:

Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:136)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)

I tried to manually go through the old partitions and set that column back to int but I'm still getting the same exceptions. I expected that promoting an int to a bigint shouldn't cause any problems.

Am I doing something wrong?

Thank you,
Daniel
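When the reader in an older Hive cannot up-convert int data at read time (as Owen notes, Hive 2.1 handles this automatically), one workaround is to rewrite the affected partitions so the ORC files themselves carry the new type. A hedged sketch with hypothetical table, column, and partition names; some Hive versions may require staging through a temporary table rather than overwriting a partition in place:

```sql
-- The column metadata was already changed to bigint in the thread;
-- this forces Hive to re-serialize the old rows with the current schema.
-- All names below are hypothetical.
INSERT OVERWRITE TABLE mytable PARTITION (dt='2016-01-01')
SELECT * FROM mytable WHERE dt='2016-01-01';
```

Running this once per pre-change partition rewrites the files as bigint, after which the cast exception should no longer be possible.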
Re: How to setup Hive JDBC client to connect remote Hiveserver
You can see which pid is listening on port 1 by running "netstat -pan | grep 1" The logs are usually under /var/log/hive or under a log dir inside Hive's dir Thank you. Daniel > On 4 Apr 2016, at 08:11, brajmohan saxena wrote: > > My Hiveserver2 is up and running. > But I think its not able to listening at port 1. > What should i do now ? > > Also I am using apache-hive-1.2.1-bin copied in my home directory and running > the Hiveserver2 from bin. > But I do not find any hive.log file anywhere, Could you please suggest me the > exact location. > > Thanks > Braj > >> On Mon, Apr 4, 2016 at 10:25 AM, Daniel Haviv >> wrote: >> It seems your hive server is not up (or not listening on port 1). >> hiveserver's logs might shed some light (usually at /var/log/hive) >> >> Thank you. >> Daniel >> >>> On 4 Apr 2016, at 07:00, brajmohan saxena wrote: >>> >>> Hi Shumin, >>> >>> I did telnet >>> >>> braj-laptop:bin brajmohan$ telnet 192.168.1.103 >>> >>> Trying 192.168.1.103... >>> >>> telnet: connect to address 192.168.1.103: Connection refused >>> >>> telnet: Unable to connect to remote host >>> >>> Thanks >>> >>> Braj >>> >>> >>>> On Mon, Apr 4, 2016 at 8:41 AM, Shumin Guo wrote: >>>> Can you telnet to that port? >>>> >>>> $ telnet 192.168.1.103 1 >>>> >>>>> On Sun, Apr 3, 2016 at 9:43 PM, brajmohan saxena >>>>> wrote: >>>>> Hi, >>>>> >>>>> Could you please tell me how to connect a simple JDBC program to remote >>>>> Hiveserver2 with default Derby database. >>>>> >>>>> I have Hiveserver2 running on remote machine and i am trying to run >>>>> simple JDBC program from client machine ( >>>>> DriverManager.getConnection("jdbc:hive2://192.168.1.103:1/default", >>>>> "", ""); ) >>>>> >>>>> but getting the following error. >>>>> Error: Could not open client transport with JDBC Uri: >>>>> jdbc:hive2://192.168.1.103:10001: java.net.ConnectException: Connection >>>>> refused (state=08S01,code=0) >>>>> >>>>> Do I need to change hive-site.xml file at server side. 
>>>>> >>>>> Thanks in advance >>>>> >>>>> Regards >>>>> >>>>> Braj >>>>> >
Re: How to setup Hive JDBC client to connect remote Hiveserver
It seems your hive server is not up (or not listening on port 1). hiveserver's logs might shed some light (usually at /var/log/hive) Thank you. Daniel > On 4 Apr 2016, at 07:00, brajmohan saxena wrote: > > Hi Shumin, > > I did telnet > > braj-laptop:bin brajmohan$ telnet 192.168.1.103 > > Trying 192.168.1.103... > > telnet: connect to address 192.168.1.103: Connection refused > > telnet: Unable to connect to remote host > > Thanks > > Braj > > >> On Mon, Apr 4, 2016 at 8:41 AM, Shumin Guo wrote: >> Can you telnet to that port? >> >> $ telnet 192.168.1.103 1 >> >>> On Sun, Apr 3, 2016 at 9:43 PM, brajmohan saxena >>> wrote: >>> Hi, >>> >>> Could you please tell me how to connect a simple JDBC program to remote >>> Hiveserver2 with default Derby database. >>> >>> I have Hiveserver2 running on remote machine and i am trying to run simple >>> JDBC program from client machine ( >>> DriverManager.getConnection("jdbc:hive2://192.168.1.103:1/default", "", >>> ""); ) >>> >>> but getting the following error. >>> Error: Could not open client transport with JDBC Uri: >>> jdbc:hive2://192.168.1.103:10001: java.net.ConnectException: Connection >>> refused (state=08S01,code=0) >>> >>> Do I need to change hive-site.xml file at server side. >>> >>> Thanks in advance >>> >>> Regards >>> >>> Braj >>> >
Re: Hive_CSV
Hi Ajay, Use the CSV serde to read your file, map all three columns but only select the relevant ones when you insert: Create table csvtab ( irrelevant string, sportName string, sportType string) ... Insert into loaded_table select sportName, sportType from csvtab; Daniel > On 9 Mar 2016, at 19:43, Ajay Chander wrote: > > Hi Everyone, > > I am looking for a way, to ignore the first occurrence of the delimiter while > loading the data from csv file to hive external table. > > Csv file: > > Xyz, baseball, outdoor > > Hive table has two columns sport_name & sport_type and fields are separated > by ',' > > Now I want to load by data into table such that while loading it has to > ignore the first delimiter that ignore xyz and load the data from second > delimiter. > > In the end my hive table should have the following data, > > Baseball, outdoor . > > Any inputs are appreciated. Thank you for your time.
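A fuller sketch of the approach above, using the OpenCSVSerde that ships with Hive 0.14 and later; all table, column, and path names here are hypothetical:

```sql
-- Map all three CSV columns, including the one to be ignored.
CREATE EXTERNAL TABLE csvtab (
  irrelevant  STRING,   -- first field (e.g. "Xyz"), dropped on insert
  sport_name  STRING,
  sport_type  STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ',')
LOCATION '/path/to/csv';

-- Select only the columns that matter into the target table.
INSERT INTO TABLE loaded_table
SELECT sport_name, sport_type FROM csvtab;
```

Note that OpenCSVSerde reads everything as STRING, so any type conversion also belongs in the SELECT.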
Partition level inputformat
Hi,
I'm trying to add external partitions to a table with a different input format and row delimiter properties, but I keep failing and I can't find any documentation that explains the correct syntax. This is the DDL I'm running:

hive> alter table test_tbl_parquet add partition (year=2016,month=01,day=27)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\u0001'
    > STORED AS INPUTFORMAT
    > 'com.mycopmany.hive.WhaleAvroGenericInputFormat'
    > OUTPUTFORMAT
    > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    > location '/mycopmany/data/test_tbl/year=2016/month=01/day=27';
FAILED: ParseException line 1:90 missing EOF at 'ROW' near ')'

Thank you.
Daniel
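For reference, `ADD PARTITION` does not accept `ROW FORMAT`/`STORED AS` clauses, which is what the parser is rejecting; per-partition storage is set after the partition exists. A sketch using the names from the question (exact clause support varies by Hive version, so treat this as an outline rather than verified syntax for this release):

```sql
-- Step 1: add the partition with only a location.
ALTER TABLE test_tbl_parquet ADD PARTITION (year=2016, month=01, day=27)
  LOCATION '/mycopmany/data/test_tbl/year=2016/month=01/day=27';

-- Step 2: override the partition's file format.
ALTER TABLE test_tbl_parquet PARTITION (year=2016, month=01, day=27)
  SET FILEFORMAT
  INPUTFORMAT  'com.mycopmany.hive.WhaleAvroGenericInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- Step 3: set the field delimiter on the partition's SerDe.
ALTER TABLE test_tbl_parquet PARTITION (year=2016, month=01, day=27)
  SET SERDEPROPERTIES ('field.delim' = '\u0001');
```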
Re: chmod: changing permissions of '/datadir/000056_0': Permission denied. user=danielh is not the owner of inode=000056_0
Hi, Any thoughts on this issue? Thank you. Daniel

On Wed, Jan 20, 2016 at 12:28 PM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote:
> Hi,
> We have a table in which the files are created by different users (under the same group).
> When a user inserts into the table it will finish successfully, but after moving the files the user will receive the following error(s):
> chmod: changing permissions of '/datadir/000056_0': Permission denied. user=danielh is not the owner of inode=000056_0
>
> and that's because Hive is trying to chmod a file that the specific user did not create.
>
> Is there a way to prevent this behavior?
>
> Thank you.
> Daniel
chmod: changing permissions of '/datadir/000056_0': Permission denied. user=danielh is not the owner of inode=000056_0
Hi, We have a table in which the files are created by different users (under the same group). When a user inserts into the table it will finish successfully, but after moving the files the user will receive the following error(s):

chmod: changing permissions of '/datadir/000056_0': Permission denied. user=danielh is not the owner of inode=000056_0

This happens because Hive is trying to chmod a file that the specific user did not create. Is there a way to prevent this behavior? Thank you. Daniel
Fwd: Conversion
Hi, We have a string column that represents an array of doubles that looks like this: f7 ad 3b 38 89 b7 e5 3f a1 c1 1a 74 db To parse it we use unhex(translate(signalvalues,' ','')) which returns a BINARY value. How can we convert it to ARRAY<DOUBLE>? Thank you. Daniel
simple usage of stack UDTF causes a cast exception
Hi,
I'm trying to break a row into two rows based on two different columns by using the following query:

SELECT mystack.alias1 FROM cdrtable LATERAL VIEW stack(2, caller_IMEI, recipient_IMEI) mystack AS alias1;

The exception I'm hitting is:

java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString cannot be cast to org.apache.hadoop.io.Text

This is the table's DDL:

CREATE TABLE `cdrtable`(
  `ts` string,
  `caller_msisdn` string,
  `caller_imei` string,
  `caller_imsi` string,
  `caller_cell` string,
  `recipient_msisdn` string,
  `recipient_imei` string,
  `recipient_imsi` string,
  `recipient_cell` string,
  `call_type` string,
  `call_duration` string,
  `call_length` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

The whole stack trace:

16/01/10 14:20:34 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString cannot be cast to org.apache.hadoop.io.Text
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:125)
    at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:107)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFStack.process(GenericUDTFStack.java:123)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:108)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:424)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:416)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
    ... 13 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
    at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:220)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:306)
    at
Re: UPDATE RE: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes (beeline - hive server 2)
Hi, I remember encountering a similar problem that was caused by an old mysql client driver. You can try and upgrade your mysql connector. Daniel On Mon, Nov 30, 2015 at 8:12 PM, Timothy Garza < timothy.ga...@collinsongroup.com> wrote: > We’ve been playing with the MySQL Global Settings: (Hive metastore) > > > > mysql> set global innodb_large_prefix = ON; (this was set to OFF > previously) > > > > …and now the ERROR is thus: > > Specified key was too long; max key length is 3072 bytes > > > > So it’s still ‘failing’ (but the HDFS operation itself succeeds). This > must be the problem area as the message has changed from: > > > > Specified key was too long; max key length is 767 bytes > > to > > Specified key was too long; max key length is 3072 bytes > > > > …simply by altering the MySQL Global settings. So is hiveserver2 trying to > use a key larger than MySQL supports (v5.5.2, file format Antelope)? > > > > NB. This only occurs when executing beeline INSERT, not CREATE nor SELECT > statements on a Hive Table (in this case a Sequence File). > > > > My colleague thinks this is SSL related (because of the use of the word > ‘key’ in the error), is HiveServer2 connecting to the Metastore using SSL? > > > -- > > Weirdly I’m experiencing exactly the same issue when trying to populate a > Hive Table using INSERT OVERWRITE TABLE. We’re recently upgraded from Hive > 0.13 to 1.2.1. NB. The Hive Table populates but the map-reduce returns an > error code. I have run the hive Schema Tool: schematool -dbType mysql > -upgradeSchemaFrom 0.13 > > > > The only table I can see with 767 size column is “PART_COL_STATS” – > implemented in one of the metastore upgrade scripts. Column Name: > PARTITION_NAME > | varchar(767). I changed this column to varchar(1000) but get the same > message afterwards: > > > > ERROR jdbc.JDBCStatsPublisher: Error during JDBC initialization. 
> > com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key > was too long; max key length is 767 bytes > > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > > at com.mysql.jdbc.Util.handleNewInstance(Util.java:409) > > at com.mysql.jdbc.Util.getInstance(Util.java:384) > > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052) > > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4232) > > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4164) > > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615) > > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776) > > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2832) > > at > com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1755) > > at > com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1679) > > at > org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.init(JDBCStatsPublisher.java:292) > > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:411) > > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) > > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) > > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653) > > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412) > > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195) > > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) > > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) > > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) > > at > 
org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) > > at > org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) > > at java.security.AccessController.doPrivileged(Native Method) > > at javax.security.auth.Subject.doAs(Subject.java:415) > > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > > at > org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
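A commonly cited remedy for this class of error (rather than raising MySQL's index limits) is to keep the metastore database in a single-byte character set, so that `varchar(767)` index keys fit within InnoDB's 767-byte limit even at one byte per character. A hedged sketch; the database name `metastore` is hypothetical, and you should back up the metastore before altering it:

```sql
-- Run in the MySQL instance hosting the Hive metastore (sketch).
-- With utf8 (3 bytes/char), varchar(767) keys can exceed 767 bytes;
-- latin1 keeps them within the InnoDB Antelope index-key limit.
ALTER DATABASE metastore CHARACTER SET latin1 COLLATE latin1_bin;
```

Existing tables created under utf8 may also need their character set converted for the change to take effect on new indexes.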
Re: Hive On Spark - Using custom SerDe
Hi, How should I set it? Just a normal SET in Hive, or add it via the safety valve to the Hive or Spark configuration? Thank you. Daniel

On Mon, Nov 16, 2015 at 5:46 PM, Jimmy Xiang wrote:
> Have you added your class to "spark.kryo.classesToRegister"? You also need
> to make sure your jar is in "hive.aux.jars.path".
>
> Thanks,
> Jimmy
>
> On Mon, Nov 16, 2015 at 1:44 AM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote:
>
>> Hi,
>> We have a custom SerDe we would like to use with Hive on Spark but I'm
>> not sure how to.
>> The error messages are pretty clear about the fact that it can't find my
>> SerDe's class:
>>
>> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable
>> to find class: com.mycompany.hive.WhaleAvroGenericInputFormat
>>
>> Thank you.
>> Daniel
Hive On Spark - Using custom SerDe
Hi, We have a custom SerDe we would like to use with Hive on Spark but I'm not sure how to. The error messages are pretty clear about the fact that it can't find my SerDe's class: Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.mycompany.hive.WhaleAvroGenericInputFormat Thank you. Daniel
Re: Disabling local mode optimization
Hi, I'm trying to set hive.exec.mode.local.auto.inputbytes.max & hive.exec.mode.local.auto.tasks.max to 1 or 0 but still local mode is being used instead of M/R. Any ideas? Thank you. Daniel On Thu, Sep 3, 2015 at 8:02 AM, sreebalineni . wrote: > Hi, > > Is not it that you should set it true, by default it is disabled which is > false. > > Hive analyzes the size of each map-reduce job in a query and may run it > locally if the following thresholds are satisfied: > >- The total input size of the job is lower than: >hive.exec.mode.local.auto.inputbytes.max (128MB by default) >- The total number of map-tasks is less than: >hive.exec.mode.local.auto.tasks.max (4 by default) >- The total number of reduce tasks required is 1 or 0. > > So for queries over small data sets, or for queries with multiple > map-reduce jobs where the input to subsequent jobs is substantially smaller > (because of reduction/filtering in the prior job), jobs may be run locally. > > so we may need to check the sizeof your input, which version of hive are > you using? it can work only from Hive 0.7 onwards > > On Wed, Sep 2, 2015 at 4:46 PM, Daniel Haviv < > daniel.ha...@veracity-group.com> wrote: > >> Hi, >> I would like to disable the optimization where a query that just selects >> data is running without mapreduce (local mode). >> >> hive.exec.mode.local.auto is set to false but hive still runs in local mode >> for some queries. >> >> >> How can I disable local mode completely? >> >> >> Thank you. >> >> Daniel >> >> >
Re: Merging small files
Changed it to sort by. On Sat, Oct 17, 2015 at 6:05 PM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote: > Thanks for the tip Gopal. > I tried what you suggested (on Tez) but I'm getting a middle stage with 1 > reducer (which is awful for performance). > > This is my query: > insert into upstreamparam_org partition(day_ts, cmtsid) select * from > upstreamparam_20151013 order by datats,macaddress; > > I've attached the query plan in case it might help understand why. > > Thank you. > Daniel. > > > > > On Fri, Oct 16, 2015 at 7:19 PM, Gopal Vijayaraghavan > wrote: > >> >> > Is there a more efficient way to have Hive merge small files on the >> >files without running with two passes? >> >> Not entirely an efficient way, but adding a shuffle stage usually works >> much better as it gives you the ability to layout the files for better >> vectorization. >> >> Like for TPC-H, doing ETL with >> >> create table lineitem as select * from lineitem sort by l_shipdate, >> l_suppkey; >> >> will produce fewer files (exactly as many as your reducer #) & compresses >> harder due to the natural order of transactions (saves ~20Gb or so at 1000 >> scale). >> >> Caveat: that is not more efficient in MRv2, only in Tez/Spark which can >> run MRR pipelines as-is. >> >> Cheers, >> Gopal >> >> >> >
Re: Merging small files
Thanks for the tip Gopal. I tried what you suggested (on Tez) but I'm getting a middle stage with 1 reducer (which is awful for performance). This is my query: insert into upstreamparam_org partition(day_ts, cmtsid) select * from upstreamparam_20151013 order by datats,macaddress; I've attached the query plan in case it might help understand why. Thank you. Daniel. On Fri, Oct 16, 2015 at 7:19 PM, Gopal Vijayaraghavan wrote: > > > Is there a more efficient way to have Hive merge small files on the > >files without running with two passes? > > Not entirely an efficient way, but adding a shuffle stage usually works > much better as it gives you the ability to layout the files for better > vectorization. > > Like for TPC-H, doing ETL with > > create table lineitem as select * from lineitem sort by l_shipdate, > l_suppkey; > > will produce fewer files (exactly as many as your reducer #) & compresses > harder due to the natural order of transactions (saves ~20Gb or so at 1000 > scale). > > Caveat: that is not more efficient in MRv2, only in Tez/Spark which can > run MRR pipelines as-is. > > Cheers, > Gopal > > > Plan not optimized by CBO. 
Vertex dependency in root stage:
  Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-3
  Stats-Aggr Operator
Stage-0
  Move Operator
    partition:{}
    table:{"name:":"default.upstreamparam_org","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
Stage-2
  Dependency Collection{}
Stage-1
  Reducer 2
    File Output Operator [FS_5]
      compressed:false
      Statistics:Num rows: 8707462208 Data size: 1767614828224 Basic stats: COMPLETE Column stats: NONE
      table:{"name:":"default.upstreamparam_org","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
      Select Operator [SEL_3]
      |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20"]
      |  Statistics:Num rows: 8707462208 Data size: 1767614828224 Basic stats: COMPLETE Column stats: NONE
      |<-Map 1 [SIMPLE_EDGE]
          Reduce Output Operator [RS_7]
            key expressions:_col1 (type: bigint), _col0 (type: bigint)
            sort order:++
            Statistics:Num rows: 8707462208 Data size: 1767614828224 Basic stats: COMPLETE Column stats: NONE
            value expressions:_col2 (type: bigint), _col3 (type: int), _col4 (type: int), _col5 (type: bigint), _col6 (type: float), _col7 (type: float), _col8 (type: float), _col9 (type: float), _col10 (type: float), _col11 (type: float), _col12 (type: float), _col13 (type: float), _col14 (type: float), _col15 (type: float), _col16 (type: bigint), _col17 (type: bigint), _col18 (type: bigint), _col19 (type: bigint), _col20 (type: string)
            Select Operator [OP_6]
              outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20"]
              Statistics:Num rows: 8707462208 Data size: 1767614828224 Basic stats: COMPLETE Column stats: NONE
              TableScan [TS_0]
                alias:upstreamparam_20151013
                Statistics:Num rows: 8707462208 Data size: 1767614828224 Basic stats: COMPLETE Column stats: NONE
Merging small files
Hi, We are using Hive to merge small files by setting hive.merge.smallfiles.avgsize to 12000 and doing an insert as select to a table. The problem is that this take two passes over the data, first to insert the data and then to merge it. Is there a more efficient way to have Hive merge small files on the files without running with two passes? Thank you. Daniel
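For context, the merge behaviour discussed in this thread is driven by a small family of settings. A hedged sketch of the relevant knobs; the sizes below are illustrative values, not recommendations:

```sql
-- Merge-related settings referenced in this thread (sketch).
SET hive.merge.mapfiles=true;                -- merge small outputs of map-only jobs
SET hive.merge.mapredfiles=true;             -- merge small outputs of map-reduce jobs
SET hive.merge.smallfiles.avgsize=16000000;  -- trigger a merge pass when the average
                                             -- output file is below this many bytes
SET hive.merge.size.per.task=256000000;      -- target size of each merged file
```

When the average-size threshold is hit, Hive launches the extra merge pass the original poster describes; Gopal's suggestion of adding a `SORT BY` shuffle stage instead produces exactly one file per reducer in a single job on Tez/Spark.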
Re: Hive SerDe regex error
Hi, You didn't escape the ^ character at the end. Try using this string instead: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ \[]*)\[([^ ]*)\]: \(([^ ]*)\) ([\^]*) Daniel

On Thu, Oct 1, 2015 at 3:17 PM, IT CTO wrote:
> Hi,
> I am trying to create a table with Regex SerDe but failing with no good reason:
> CREATE EXTERNAL TABLE syslog (
> month STRING,
> day STRING,
> time STRING,
> source STRING,
> process STRING,
> pid STRING,
> uname STRING,
> message STRING)
> COMMENT 'This is the syslog sample table'
> ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.RegexSerDe"
> WITH SERDEPROPERTIES (
> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ \[]*)\[([^ ]*)\]:
> \(([^ ]*)\) ([^]*)"
> )
> STORED AS TEXTFILE
> LOCATION 'dfs://localhost:8020/data/flumeTest/flume-test-spoolDir';
>
> The regex itself works on regex tester so I don't understand why I am getting:
>
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
> java.util.regex.PatternSyntaxException: Unclosed character class near index 66
> ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ []*)[([^ ]*)]: (([^ ]*)) ([^]*)
>
> Any help?
> --
> Eran | "You don't need eyes to see, you need vision" (Faithless)
Re: Error: java.lang.IllegalArgumentE:Column has wrong number of index entries found - when trying to insert from JSON external table to ORC table
Hi Prasanth, Can you elaborate on what the hive.merge.orcfile.stripe.level parameter affects? Thank you for your help. Daniel Sent from my iPhone > On 8 Sep 2015, at 17:48, Prasanth Jayachandran > wrote: > > hive.merge.orcfile.stripe.level
Permission denied error when starting HiveServer2
Hi, I'm getting this error when starting HiveServer2: 2015-09-07 08:09:50,356 WARN org.apache.hive.service.server.HiveServer2: Error starting HiveServer2 on attempt 1, will retry in 60 seconds java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472) at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:124) at org.apache.hive.service.cli.CLIService.init(CLIService.java:111) at org.apache.hive.service.CompositeService.init(CompositeService.java:59) at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:92) at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:309) at org.apache.hive.service.server.HiveServer2.access$400(HiveServer2.java:68) at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:523) at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:396) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:465) ... 14 more Caused by: java.io.IOException: Permission denied at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.createTempFile(File.java:2024) at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:740) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:463) ... 
14 more The hive user has write permissions to the scratch dir; is there another path I should take care of? Thank you. Daniel
Re: Disabling local mode optimization
Exactly the info I needed. Thanks Daniel > On Sep 3, 2015, at 09:02, sreebalineni . wrote: > > Hi, > > Isn't it that you should set it to true? By default it is disabled, which is > false. > Hive analyzes the size of each map-reduce job in a query and may run it > locally if the following thresholds are satisfied: > The total input size of the job is lower than: > hive.exec.mode.local.auto.inputbytes.max (128MB by default) > The total number of map-tasks is less than: > hive.exec.mode.local.auto.tasks.max (4 by default) > The total number of reduce tasks required is 1 or 0. > So for queries over small data sets, or for queries with multiple map-reduce > jobs where the input to subsequent jobs is substantially smaller (because of > reduction/filtering in the prior job), jobs may be run locally. > So we may need to check the size of your input. Which version of hive are you > using? It can work only from Hive 0.7 onwards > >> On Wed, Sep 2, 2015 at 4:46 PM, Daniel Haviv >> wrote: >> Hi, >> I would like to disable the optimization where a query that just selects >> data runs without MapReduce (local mode). >> hive.exec.mode.local.auto is set to false but Hive still runs in local mode >> for some queries. >> >> How can I disable local mode completely? >> >> Thank you. >> Daniel >
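The thresholds quoted above, plus one related knob, can be sketched as session settings (hedged; the fetch-task line is my assumption, not from the reply, and its `none` value only exists in later Hive versions):

```sql
-- Settings that govern automatic local-mode selection (defaults per the reply).
SET hive.exec.mode.local.auto=false;                     -- disable automatic local mode
SET hive.exec.mode.local.auto.inputbytes.max=134217728;  -- 128 MB input threshold
SET hive.exec.mode.local.auto.tasks.max=4;               -- map-task threshold
-- Hedged assumption: a SELECT running with no job at all may be the separate
-- fetch-task optimization rather than local mode:
SET hive.fetch.task.conversion=none;                     -- 'none' exists in Hive 0.14+
```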
Disabling local mode optimization
Hi, I would like to disable the optimization where a query that just selects data runs without MapReduce (local mode). hive.exec.mode.local.auto is set to false but Hive still runs in local mode for some queries. How can I disable local mode completely? Thank you. Daniel
Re: Data presentation to consumer layer
Hi, There is a myriad of solutions, among them: Impala Presto Drill Kylin Tajo On Tue, Aug 25, 2015 at 10:44 AM, Mich Talebzadeh wrote: > Hi, > > > > My question concerns the means of presenting data to consumer layer from > Hive. > > > > Obviously Hive is very suitable for batch analysis. However, the MapReduce > nature of extracting data make is unlikely as a direct access tool for > consumer layer. > > > > So my question is what products are there that can be used effectively to > get the data from Hive to visualisations tools like Tableau. > > > > I thought of using Oracle TimesTen in-memory database to get the data out > of Hive/Hadoop and keep the most frequently used data in memory. What are > other alternatives around? > > > > Thanks, > > > > > > Mich Talebzadeh > > > > *Sybase ASE 15 Gold Medal Award 2008* > > A Winning Strategy: Running the most Critical Financial Data on ASE 15 > > > http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf > > Author of the books* "A Practitioner’s Guide to Upgrading to Sybase ASE > 15", ISBN 978-0-9563693-0-7*. > > co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN > 978-0-9759693-0-4* > > *Publications due shortly:* > > *Complex Event Processing in Heterogeneous Environments*, ISBN: > 978-0-9563693-3-8 > > *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume > one out shortly > > > > http://talebzadehmich.wordpress.com > > > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. 
It is > the responsibility of the recipient to ensure that this email is virus > free, therefore neither Peridale Ltd, its subsidiaries nor their employees > accept any responsibility. > > >
Re: Loading multiple file format in hive
Hi, You can set a different file format per partition. You can't mix files in the same directory (you could theoretically write some kind of custom SerDe). Daniel. On Mon, Aug 24, 2015 at 6:15 PM, Jeetendra G wrote: > Can anyone put some light on this please? > > On Mon, Aug 24, 2015 at 12:32 PM, Jeetendra G > wrote: > >> HI All, >> >> I have a directory where I have json formatted and parquet files in same >> folder. can hive load these? >> >> I am getting Json data and storing in HDFS. later I am running a job to >> convert JSon to Parquet (every 15 mins). so we will have 15 mins of Json data. >> >> Can I provide multiple serde in hive? >> >> regards >> Jeetendra >> > >
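The per-partition format point can be sketched as follows (hedged; the table and partition names are hypothetical):

```sql
-- Hedged sketch: a partitioned table whose default format is Parquet,
-- with one partition switched to plain text (e.g. for the JSON-as-text data).
CREATE TABLE events (payload STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

ALTER TABLE events ADD PARTITION (dt='2015-08-23');
ALTER TABLE events PARTITION (dt='2015-08-23') SET FILEFORMAT TEXTFILE;
```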
Re: Improve performance in displaying content of data frame
Hi Vinod, A better place to ask this would be at Spark's mailing list. Your select isn't executed until you're running the foreach on it, so you get the impression that the select ran fast. Daniel > On Jul 1, 2015, at 12:56, Vinod Kuamr wrote: > > Hi Everyone, > > I am using following sqlContext > > var df=sqlContext.sql("SELECT fullname,SUM(CAST(contactid AS decimal(38,6))) > FROM adventurepersoncontacts GROUP BY fullname ORDER BY fullname ASC"); > > It executes fine but when I display the content of the data frame by using > println method it takes much more time to return the result > > df.foreach(println) > > can you please let me know how to get the content of the data frame in an optimized > way? > > My Environment is: > Spark 1.3.1 > Windows 8 > Sample Data with 15000 records > > Thank you, > Vinod
Re: Understanding ORC file format compression
Hi Sreejesh, The data in an ORC file is divided into stripes, and within each stripe columns are divided into column groups. Compression is applied at the column-group level, so to answer your question, ORC files are splittable no matter which codec is used. Daniel > On Jun 21, 2015, at 10:56, sreejesh s wrote: > > Hi, > > As per my understanding, the available codecs for ORC file format Hive table > compression are either Zlib or Snappy. > Both compression techniques are non-splittable. Does it mean that any > queries on a Hive table stored as ORC and compressed will not run multiple maps > in parallel? > > I know that is not correct, please help me understand what I am missing > here... > > Thanks
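A hedged sketch of a table matching the description above; readers split on stripe boundaries, so the codec choice does not affect splittability:

```sql
-- Hedged sketch: ORC with Snappy; compression applies inside each stripe's
-- column data, while splits happen per stripe.
CREATE TABLE t_orc (id BIGINT, name STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY',
               'orc.stripe.size'='67108864');  -- 64 MB stripes
```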
Re: Output of Hive
It seems like your query returns no results; try using count(*) to confirm. Daniel > On May 16, 2015, at 14:40, Anand Murali wrote: > > Dear All: > > I am new to hive so pardon my ignorance. I have the following query but do > not see any output. I wondered if it may be in HDFS and checked there, but do not > find it there. Can somebody advise > > hive> select year, MAX(Temperature) from records where temperature <> > and (quality = 0 or quality = 1 or quality = 4 or quality = 5 or quality = 9) > > group by year > > ; > Query ID = anand_vihar_20150516170505_9b23d8ba-19d7-4fa7-b972-4f199e3bf56a > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks not specified. Estimated from input data size: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapreduce.job.reduces= > Job running in-process (local Hadoop) > 2015-05-16 17:05:11,504 Stage-1 map = 100%, reduce = 100% > Ended Job = job_local927727978_0003 > MapReduce Jobs Launched: > Stage-Stage-1: HDFS Read: 5329140 HDFS Write: 0 SUCCESS > Total MapReduce CPU Time Spent: 0 msec > OK > Time taken: 1.258 seconds > > Thanks > > Anand Murali >
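The suggested count check might look like this (hedged; the WHERE clause in the quoted query is truncated, so only the quality filter is reproduced):

```sql
-- Hedged sketch: verify the predicate matches any rows at all.
SELECT COUNT(*)
FROM records
WHERE quality IN (0, 1, 4, 5, 9);
```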
Re: Hive Alter Partition Location Issue
I think you have an extra '/' in the HDFS URI. Daniel > On Apr 30, 2015, at 16:46, Harsha N wrote: > > Thanks for your reply, > > analyze table table1 partition (dt=201501) compute statistics;--returns the > same error > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks is set to 0 since there's no reduce operator > java.io.IOException: cannot find dir = hdfs:///data/dt =201501/1430201400/ > in pathToPartitionInfo: [hdfs:/data/dt=201501/1430201400/] > at > org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:347) > > To Add On > I am working on External tables in Hive 0.13.1-cdh5.3.2 > > -Harsha > >> On Thu, Apr 30, 2015 at 12:37 AM, Mich Talebzadeh >> wrote: >> Hi Harsha, >> >> >> >> Have you updated stats on table1 after partition adding? In other words it >> is possible that the optimiser is not aware of that partition yet? >> >> >> >> analyze table table1 partition (dt=201501) compute statistics; >> >> >> >> HTH >> >> >> >> Mich Talebzadeh >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> >> Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", >> ISBN 978-0-9563693-0-7. >> >> co-author "Sybase Transact SQL Guidelines Best Practices", ISBN >> 978-0-9759693-0-4 >> >> Publications due shortly: >> >> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and >> Coherence Cache >> >> Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume >> one out shortly >> >> >> >> NOTE: The information in this email is proprietary and confidential. This >> message is for the designated recipient only, if you are not the intended >> recipient, you should destroy it immediately. Any information in this >> message shall not be understood as given or endorsed by Peridale Ltd, its >> subsidiaries or their employees, unless expressly so stated. 
It is the >> responsibility of the recipient to ensure that this email is virus free, >> therefore neither Peridale Ltd, its subsidiaries nor their employees accept >> any responsibility. >> >> >> >> From: Harsha N [mailto:harsha.hadoo...@gmail.com] >> Sent: 30 April 2015 07:24 >> To: user@hive.apache.org >> Subject: Hive Alter Partition Location Issue >> >> >> >> Hi All, >> >> Can experts share your view on Hive behaviour in below scenario. >> >> >> >> I am facing below issue on using alter partition locations in hive. >> >> >> >> select count(*) from table1 where dt = 201501; >> >> >> >> Total jobs = 1 >> >> Launching Job 1 out of 1 >> >> Number of reduce tasks determined at compile time: 1 >> >> In order to change the average load for a reducer (in bytes): >> >> set hive.exec.reducers.bytes.per.reducer= >> >> In order to limit the maximum number of reducers: >> >> set hive.exec.reducers.max= >> >> In order to set a constant number of reducers: >> >> set mapreduce.job.reduces= >> >> java.io.IOException: cannot find dir = hdfs:///data/dt =201501/1430201400/ >> >> in pathToPartitionInfo: [hdfs:/data/dt=201501/1430201400/] >> >> >> >> Below are the steps I have followed. >> >> I have altered a partition location in hive using below command. >> >> ALTER TABLE table1 PARTITION (dt=201501) SET LOCATION >> 'hdfs:///data/dt=201501/1430201400/'; >> >> >> >> I have inserted new data into this new location. >> >> >> >> INSERT INTO TABLE table1 >> >> SELECT * FROM table2 where dt=201501 >> >> >> >> select count(*) from table1 where dt = 201501; doesn't work but >> >> select * from table1 where dt = 201501 works good. >> >> >> >> Please let me know if you need more information. >> >> >> >> Thanks >> >> Harsha >> >
Re: creating parquet table using avro schame
Sorry, I misunderstood. AFAIK you can't do that. Daniel > On Apr 29, 2015, at 18:49, Yosi Botzer wrote: > > Hi, > > I have parquet files that are the product of a map-reduce job. > > I have used AvroParquetOutputFormat in order to produce them, so I have an > avro schema file describing the structure of the data. > > When I want to create an avro based table in hive I can use: > TBLPROPERTIES > ('avro.schema.url'='hdfs:///schema/report/dashboard_report.avsc'); > > So I do not need to specify every field in the create statement. > > Is there a way to use the avro schema file to create the parquet table as > well? > > > > Yosi
Re: creating parquet table using avro schame
You should be able to get the schema out using parquet tools: http://blog.cloudera.com/blog/2015/03/converting-apache-avro-data-to-parquet-format-in-apache-hadoop/ Daniel > On Apr 29, 2015, at 18:49, Yosi Botzer wrote: > > Hi, > > I have parquet files that are the product of a map-reduce job. > > I have used AvroParquetOutputFormat in order to produce them, so I have an > avro schema file describing the structure of the data. > > When I want to create an avro based table in hive I can use: > TBLPROPERTIES > ('avro.schema.url'='hdfs:///schema/report/dashboard_report.avsc'); > > So I do not need to specify every field in the create statement. > > Is there a way to use the avro schema file to create the parquet table as > well? > > > > Yosi
Re: Extremely Slow Data Loading with 40k+ Partitions
Is this a test environment? If so, can you try and disable concurrency? Daniel > On Apr 16, 2015, at 19:44, Tianqi Tong wrote: > > Hi Daniel, > Actually the mapreduce job was just fine, but the process got stuck on the data > loading after that. > The output stopped at: > Loading data to table default.parquet_table_with_40k_partitions partition > (yearmonth=null, prefix=null) > > When I look at the size of hdfs files of the table, I can see the size is > growing, but it's kind of slow. > For the mapreduce job, I had 400+ mappers and 100+ reducers. > > Thanks > Tianqi > > From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] > Sent: Wednesday, April 15, 2015 9:23 PM > To: user@hive.apache.org > Subject: Re: Extremely Slow Data Loading with 40k+ Partitions > > How many reducers are you using? > > Daniel > > On Apr 16, 2015, at 00:55, Tianqi Tong wrote: > > Hi, > I'm loading data to a Parquet table with dynamic partitions. I have 40k+ > partitions, and I have skipped the partition stats computation step. > Somehow it's still extremely slow loading data into partitions (800MB/h). > Do you have any hints on the possible reason and solution? > > Thank you > Tianqi Tong >
Re: Extremely Slow Data Loading with 40k+ Partitions
How many reducers are you using? Daniel > On Apr 16, 2015, at 00:55, Tianqi Tong wrote: > > Hi, > I'm loading data to a Parquet table with dynamic partitions. I have 40k+ > partitions, and I have skipped the partition stats computation step. > Somehow it's still extremely slow loading data into partitions (800MB/h). > Do you have any hints on the possible reason and solution? > > Thank you > Tianqi Tong >
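Not a confirmed fix from this thread, but a hedged sketch of the settings usually reviewed for many-partition dynamic loads:

```sql
-- Hedged sketch (assumptions, not from the thread): knobs for 40k+ dynamic partitions.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=50000;
SET hive.exec.max.dynamic.partitions.pernode=10000;
SET hive.stats.autogather=false;  -- the poster already skips partition stats
```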
Re: A simple insert stuck in hive
I would guess it has something to do with container allocation. Daniel > On Apr 8, 2015, at 20:26, Alan Gates wrote: > > If you're seeing it list progress (or attempted progress) as here, this isn't > a locking issue. All locks are obtained before the job is submitted to > Hadoop. > > Alan. > >> Mich Talebzadeh April 7, 2015 at 14:09 >> Hi, >> >> Today I have noticed the following issue. >> >> A simple insert into a table is sitting there throwing the following >> >> hive> insert into table mytest values(1,'test'); >> Query ID = hduser_20150407215959_bc030fac-258f-4996-b50f-3d2d49371cca >> Total jobs = 3 >> Launching Job 1 out of 3 >> Number of reduce tasks is set to 0 since there's no reduce operator >> Starting Job = job_1428439695331_0002, Tracking URL = >> http://rhes564:8088/proxy/application_1428439695331_0002/ >> Kill Command = /home/hduser/hadoop/hadoop-2.6.0/bin/hadoop job -kill >> job_1428439695331_0002 >> Hadoop job information for Stage-1: number of mappers: 1; number of >> reducers: 0 >> 2015-04-07 21:59:35,068 Stage-1 map = 0%, reduce = 0% >> 2015-04-07 22:00:35,545 Stage-1 map = 0%, reduce = 0% >> 2015-04-07 22:01:35,832 Stage-1 map = 0%, reduce = 0% >> 2015-04-07 22:02:36,058 Stage-1 map = 0%, reduce = 0% >> 2015-04-07 22:03:36,279 Stage-1 map = 0%, reduce = 0% >> 2015-04-07 22:04:36,486 Stage-1 map = 0%, reduce = 0% >> >> I have been messing around with concurrency for hive. That did not work. My >> metastore is built in Oracle. So I dropped that schema and recreated from >> scratch. Got rid of concurrency parameters. First I was getting “container >> is running beyond virtual memory limits” for the task. I changed the >> following parameters in yarn-site.xml >> >> >> >> yarn.nodemanager.resource.memory-mb >> 2048 >> Amount of physical memory, in MB, that can be allocated for >> containers. 
>> >> >> yarn.scheduler.minimum-allocation-mb >> 1024 >> >> >> and mapred-site.xml >> >> >> mapreduce.map.memory.mb >> 4096 >> >> >> mapreduce.reduce.memory.mb >> 4096 >> >> >> mapreduce.map.java.opts >> -Xmx3072m >> >> >> mapreduce.recduce.java.opts >> -Xmx6144m >> >> >> yarn.app.mapreduce.am.resource.mb >> 400 >> >> >> However, nothing has helped except that virtual memory error has gone. Any >> ideas appreciated. >> >> Thanks >> >> Mich Talebzadeh >> >> http://talebzadehmich.wordpress.com >> >> Publications due shortly: >> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and >> Coherence Cache >> >> NOTE: The information in this email is proprietary and confidential. This >> message is for the designated recipient only, if you are not the intended >> recipient, you should destroy it immediately. Any information in this >> message shall not be understood as given or endorsed by Peridale Ltd, its >> subsidiaries or their employees, unless expressly so stated. It is the >> responsibility of the recipient to ensure that this email is virus free, >> therefore neither Peridale Ltd, its subsidiaries nor their employees accept >> any responsibility.
HiveServer2 addressing standby namenode
Hi, We get a lot of error messages on the standby namenode indicating that hive is trying to address the standby namenode. As all of our jobs function normally, my guess is that Hive is constantly trying to address both namenodes and only works with the active one. Is this correct? Can this be modified so it will only address the active one and still maintain an HA architecture? Thanks, Daniel
Re: hive 0.14 return some not NULL value as NULL
Can you also supply the table's DDL and a few lines of your raw data? Daniel > On Apr 1, 2015, at 09:16, "r7raul1...@163.com" wrote: > > > > > I use hive 0.14 the result is > 87FQEZT1UEDXJHJQPFFX7G7ET8S2DVPM2357378283356 9150119100048 > 7326356 NULL > 87FQEZT1UEDXJHJQPFFX7G7ET8S2DVPM2357378283356 121501191035580028 > 7326356 NULL > UBDTK8D9XUZ9GRZU8NZNXDEG73D4PCZG2362223711289 161501191549050061 > 14837289 NULL > Y49EY895ACABHS95DRQEE8DVFEB8JSE12360853052224 111501191426280023 > 115883224 NULL > > I use hive 0.10 the result is > 87FQEZT1UEDXJHJQPFFX7G7ET8S2DVPM2357378283356 9150119100048 > 73263562015-01-19 10:44:44 > 87FQEZT1UEDXJHJQPFFX7G7ET8S2DVPM2357378283356 121501191035580028 > 73263562015-01-19 10:35:58 > UBDTK8D9XUZ9GRZU8NZNXDEG73D4PCZG2362223711289 161501191549050061 > 14837289 2015-01-19 15:49:05 > Y49EY895ACABHS95DRQEE8DVFEB8JSE12360853052224 111501191426280023 > 115883224 2015-01-19 14:26:28 > > Why? I attach my log. Also in my log I found 2015-04-01 09:55:38,409 WARN > [main] org.apache.hadoop.hive.serde2.lazy.LazyStruct: Extra bytes detected at > the end of the row! Ignoring similar problems. > > r7raul1...@163.com >
Re: Understanding Hive's execution plan
plan was produced! > > Thanks > > > Mich Talebzadeh > > http://talebzadehmich.wordpress.com > > Publications due shortly: > Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and > Coherence Cache > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this message > shall not be understood as given or endorsed by Peridale Ltd, its > subsidiaries or their employees, unless expressly so stated. It is the > responsibility of the recipient to ensure that this email is virus free, > therefore neither Peridale Ltd, its subsidiaries nor their employees accept > any responsibility. > > From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] > Sent: 26 March 2015 17:27 > To: user@hive.apache.org > Subject: Understanding Hive's execution plan > > Hi, > Can anyone direct me to a good explanation on understanding Hive's execution > plan? > > Thanks, > Daniel
Re: 0.14 parse exception, row format question
Your quotation marks around the location string seem to be wrong. Daniel > On Mar 26, 2015, at 22:10, bitsofinfo wrote: > > Hi, > > What is wrong with this query? I am reading the docs and it appears that > this should work no? > > INSERT OVERWRITE DIRECTORY “/user/admin/mydirectory” > ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘\t’ > select * from my_table_that_exists; > > Error occurred executing hive query: Error while compiling statement: > FAILED: ParseException line 2:0 cannot recognize input near ‘ROW’ > ‘FORMAT’ ‘DELIMITED’ in statement > > Version of Hue/Hive etc I am running: > ———- > > Hue > 2.6.1-2041 > > HDP > 2.2.0 > > Hadoop > 2.6.0 > > Pig > 0.14.0 > > Hive-Hcatalog > 0.14.0 > > Oozie > 4.1.0 > > Ambari > 1.7-169 > > HBase > 0.98.4 > > Knox > 0.5.0 > > Storm > 0.9.3 > > Falcon > 0.6.0
Understanding Hive's execution plan
Hi, Can anyone direct me to a good explanation on understanding Hive's execution plan? Thanks, Daniel
Re: how to set column level privileges
Create a view with the permitted columns and handle the privileges for it. Daniel > On Mar 26, 2015, at 12:40, Allen wrote: > > hi, > > We use SQL standards based authorization for authorization in Hive > 0.14. But it has no support for column level privileges. > > So, I want to know: is there any way to set column level privileges? > > > > Thanks! > > > >
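The view-based workaround can be sketched as follows (hedged; the table, column, and user names are hypothetical):

```sql
-- Hedged sketch: expose only permitted columns through a view,
-- then grant on the view instead of the base table.
CREATE VIEW customers_public AS
SELECT customer_id, name   -- restricted columns (e.g. ssn) are omitted
FROM customers;

GRANT SELECT ON customers_public TO USER analyst1;
```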
Re: How to clean up a table for which the underlying hdfs file no longer exists
You can also use ALTER TABLE SET TBLPROPERTIES('EXTERNAL'='TRUE') and then drop it. Daniel > On Mar 22, 2015, at 04:15, Stephen Boesch wrote: > > > There is a hive table for which the metadata points to a non-existing hdfs > file. Simply calling > > drop table > > results in: > > Failed to load metadata for table: db.mytable > Caused by TAbleLoadingException: Failed to load metadata for table: db.mytable > File does not exist: hdfs:// > Caused by FileNotFoundException: File does not exist: hdfs:// .. > > So: the file does not exist in hdfs , and it is not possible to remove the > metadata for it directly. Is the next step going to be: "run some sql > commands against the metastore to manually delete the associated rows"? If > so, what are those delete commands? > > thanks
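A hedged sketch of the workaround (table name taken from the error message in the question):

```sql
-- Hedged sketch: flip the table to external so DROP removes only metadata;
-- no HDFS path needs to exist for the drop to succeed.
ALTER TABLE db.mytable SET TBLPROPERTIES('EXTERNAL'='TRUE');
DROP TABLE db.mytable;
```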
Re: How to clean up a table for which the underlying hdfs file no longer exists
You can (as a workaround) just create its directory and then drop it. Daniel > On Mar 22, 2015, at 04:15, Stephen Boesch wrote: > > > There is a hive table for which the metadata points to a non-existing hdfs > file. Simply calling > > drop table > > results in: > > Failed to load metadata for table: db.mytable > Caused by TAbleLoadingException: Failed to load metadata for table: db.mytable > File does not exist: hdfs:// > Caused by FileNotFoundException: File does not exist: hdfs:// .. > > So: the file does not exist in hdfs , and it is not possible to remove the > metadata for it directly. Is the next step going to be: "run some sql > commands against the metastore to manually delete the associated rows"? If > so, what are those delete commands? > > thanks
Re: Which SerDe for Custom Binary Data.
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HowtoWriteYourOwnSerDe Daniel > On Mar 13, 2015, at 17:56, karthik maddala wrote: > > > > I want to set up a DW based on Hive. However, my data does not come as handy > csv files but as binary files in a proprietary format. > > The binary file consists of serialized data using the C language. > > > Could you please suggest which input format to use and how to write a > custom SerDe for the above-mentioned data. > > > Thanks, > Karthik Maddala > >
Re: insert table error
What is the error you get? Daniel > On Mar 13, 2015, at 13:13, zhangjp wrote: > > case fail > CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2)) > CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC; > INSERT INTO TABLE students > VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
Bucket pruning
Hi, We created a bucketed table and when we select in the following way: select * from testtble where bucket_col ='X'; We observe that the whole table is being read and not just the specific bucket. Does Hive support such a feature? Thanks, Daniel
Re: Simple way to export data from a Hive table in to Avro?
I might be missing something here but you could use: Create table newtable stored as avro as select * from oldtable On Mon, Feb 2, 2015 at 3:09 PM, Michael Segel wrote: > Currently using Hive 13.x > > Would like to select from a table that exists and output to an external > file(s) in avro via hive. > > Is there a simple way to do this? > > From what I’ve seen online, the docs tend to imply you need to know the > avro schema when you specify the table. > Could you copy from an existing table, or do I need to dump the current > schema and write some code to generate an avro schema? > > Thx > > -Mike > >
Trying to improve compression ratio for an ORC table
Hi guys, I'm experiencing something very odd: I have an ORC table with the "orc.compress"="SNAPPY" property that weighs 4.9 GB and is composed of 253 files. I then do a CTAS into a new table where I added this property "orc.compress.size"="2485760" to improve the compression ratio. The new table weighs 5.2 GB over 18 files, so not only did the compression ratio not improve, it got worse. How can this be? Thanks, Daniel
Re: Adding new columns to parquet based Hive table
Hi Kumar, Altering the table just updates Hive's metadata without updating Parquet's schema. I believe that if you insert into your table (after adding the column) you'll later be able to select all 3 columns. Daniel > On Jan 14, 2015, at 21:34, Kumar V wrote: > > Hi, > > Any ideas on how to go about this? Any insights you have would be > helpful. I am kinda stuck here. > > Here are the steps I followed on hive 0.13 > > 1) create table t (f1 String, f2 string) stored as Parquet; > 2) upload parquet files with 2 fields > 3) select * from t; < Works fine. > 4) alter table t add columns (f3 string); > 5) Select * from t; <- ERROR "Caused by: > java.lang.IllegalStateException: Column f3 at index 2 does not exist > at > org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:116) > at > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204) > at > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:79) > at > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:66) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:65) > > > > > > On Wednesday, January 7, 2015 2:55 PM, Kumar V > wrote: > > > Hi, > I have a Parquet format Hive table with a few columns. I have loaded a > lot of data to this table already and it seems to work. > I have to add a few new columns to this table. If I add new columns, queries > don't work anymore since I have not reloaded the old data. > Is there a way to add new fields to the table and not reload the old Parquet > files and make the query work? > > I tried this in Hive 0.10 and also on hive 0.13. Getting an error in both > versions. > > Please let me know how to handle this. 
> > Regards, > Kumar. > >
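A hedged sketch of the sequence described in the reply (table and column names follow the thread; the staging table is hypothetical):

```sql
-- Hedged sketch: after adding the column, newly written rows carry it,
-- so a later SELECT of all three columns can succeed.
ALTER TABLE t ADD COLUMNS (f3 STRING);

INSERT INTO TABLE t
SELECT f1, f2, f3 FROM t_staging;  -- t_staging is a hypothetical 3-column source
```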
Re: Monitoring Hive Thread Usage
Found a solution (aside from JMX): ps -eLf | grep [HiveServer2 PID] On Tue, Jan 6, 2015 at 11:03 AM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote: > Hi, > I suspect we have a problem with clients opening connections and not > closing them. > To verify that I'd like to monitor the Hive's number of threads but I > can't seem to find a way to do so. > > Anyone has ever tried or has any ideas? > > Thanks, > Daniel >
Monitoring Hive Thread Usage
Hi, I suspect we have a problem with clients opening connections and not closing them. To verify that I'd like to monitor the Hive's number of threads but I can't seem to find a way to do so. Anyone has ever tried or has any ideas? Thanks, Daniel
Re: How to pass information to hive udf except as arguments
First result in google: http://stackoverflow.com/questions/12464636/how-to-set-variables-in-hive-scripts Daniel > On Dec 19, 2014, at 10:54, Dilip Agarwal wrote: > > > Hi, I have created a udf which accepts geo location points as arguments and > returns the name of the location fetched from a URL. I have to set this URL > dynamically at the time of the hive script run. > > I don't like to pass this url as a separate argument to the udf evaluate > method. Is there a way to set this url in the hive script and get it from the hive udf, > or set this in the user environment and then fetch it. Please tell me the full > procedure to do this. > > > Thanks & Regards > Dilip Agarwal > +91 8287857554
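The variable-passing approach behind that link can be sketched as follows (hedged; the UDF, table, and URL are hypothetical):

```sql
-- Hedged sketch: invoked as
--   hive --hivevar geo_url='http://example.com/geocode' -f script.hql
-- and referenced inside the script:
SELECT geo_lookup(lat, lon, '${hivevar:geo_url}')  -- geo_lookup stands in for the user's UDF
FROM location_points;
```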
Re: Case inside select statement in hive
Hi, Please RTFM before asking questions. Taken from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF:

Conditional Functions (return type T in all cases):
- if(boolean testCondition, T valueTrue, T valueFalseOrNull): returns valueTrue when testCondition is true, returns valueFalseOrNull otherwise.
- COALESCE(T v1, T v2, ...): returns the first v that is not NULL, or NULL if all v's are NULL.
- CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END: when a = b, returns c; when a = d, returns e; else returns f.
- CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END: when a = true, returns b; when c = true, returns d; else returns e.

BR, Daniel On Tue, Dec 16, 2014 at 6:37 PM, Gayathri Swaroop wrote: > > Hi, > > I have an oracle query which I want to hit against a hive table. > The oracle query has a case-if-exists select; > would this work in hive? > This is my oracle query that needs to be converted to hive. > > select distinct CONTR.BMF_PARTNER_ID AS BMF_BMF_PARTNER_ID, > CONTR.BUSINESS_PARTNER AS BMF_BUS_PRTNR_ID, > CONTR.CONTRACT_ACCOUNT AS BMF_CONTR_ACCT_ID, > CONTR.CONTRACT_NBR AS BMF_CONTR_ID, > CONTR.ESIID AS BMF_ESI_ID, > CONTR.INSTALLATION_ID AS BMF_INSTALLATION_ID, > CONTR.SEGMENT_TYPE_CD AS BMF_SEGMENT_TYPE_CD, > CONTR.PARTNER_TYPE AS BMF_PARTNER_TYPE, > CONTR.ACTUAL_MOVEIN_DATE AS BMF_ACTUAL_MVI_DT, > CONTR.ACTUAL_MOVEOUT_DATE AS BMF_ACTUAL_MVO_DT, > CONTR.ENRL_RATE_CATEGORY AS BMF_ENRL_RATE_CATEGORY, > CONTR.CAMPAIGN_CD AS BMF_CAMPAIGN_CD, > CONTR.OFFER_CD AS BMF_OFFER_CD, >case when exists (select * from KSS_ACTIVITY_STG_CURR_STAT C_ID > where c_id.esiid = contr.esiid > and c_id.contract_nbr = contr.contract_nbr > and c_id.BMF_PARTNER_ID <> contr.BMF_PARTNER_ID > and c_id.partner_type=2 > and c_id.actual_movein_date < > to_date('09/30/2014','mm/dd/') > and c_id.actual_moveout_date > >=to_date('09/30/2014','mm/dd/')) > then 'YES' else NULL end > as IS_DUPLICATE_BMF > FROM KSS_ESIID_LIST ESID INNER JOIN > KSS_ACTIVITY_STG_CURR_STAT 
CONTR ON > ESID.BMF_PARTNER_ID = CONTR.BMF_PARTNER_ID > WHERE contr.partner_type=2 > and CONTR.actual_movein_date < > to_date('09/30/2014','mm/dd/') > and CONTR.actual_moveout_date > >=to_date('09/30/2014','mm/dd/'); > > > Thanks, > Gayathri >
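Regarding the CASE WHEN EXISTS part of the Oracle query above: Hive of that era does not support an EXISTS subquery inside a SELECT-list CASE expression, so a hedged generic rewrite (names hypothetical) uses a LEFT OUTER JOIN plus a NULL test:

```sql
-- Hedged sketch: emulate CASE WHEN EXISTS (...) with a join.
SELECT a.id,
       CASE WHEN b.id IS NOT NULL THEN 'YES' ELSE NULL END AS has_match
FROM t_a a
LEFT OUTER JOIN (SELECT DISTINCT id FROM t_b) b
  ON a.id = b.id;
```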
Re: Concatenating ORC files
Thanks a lot, I'll try it out. Daniel > On Dec 12, 2014, at 03:45, Prasanth Jayachandran > wrote: > > Thanks Daniel for filing the jira and the test case. I have put up a patch in > the HIVE-9067 jira that should fix this issue. > > - Prasanth > > >> On Thu, Dec 11, 2014 at 3:29 AM, Daniel Haviv >> wrote: >> Hi, >> I've created a JIRA with a test case: >> https://issues.apache.org/jira/browse/HIVE-9080 >> >> Thanks! >> Daniel >> >>> On Thu, Dec 11, 2014 at 12:49 AM, Prasanth Jayachandran >>> wrote: >>> I am unable to reproduce the case that causes the exception that you are >>> seeing. It will be great if you can provide a repro. >>> >>> - Prasanth >>> >>> >>>> On Wed, Dec 10, 2014 at 1:43 PM, Prasanth Jayachandran >>>> wrote: >>>> I can see a bug for case 2 where the orc index is disabled. I have created >>>> a jira to track that issue. >>>> https://issues.apache.org/jira/browse/HIVE-9067 >>>> >>>> I am not sure why it fails in case 1 though. Can you create a jira >>>> with a reproducible case? I can take a look at it. 
>>>> >>>> - Prasanth >>>> >>>> >>>>> On Wed, Dec 10, 2014 at 10:37 AM, Daniel Haviv >>>>> wrote: >>>>> I've made a little experiment and recreated the table with >>>>> 'orc.create.index'='FALSE' and now it fails on something else: >>>>> Error: java.io.IOException: >>>>> org.apache.hadoop.hive.ql.metadata.HiveException: >>>>> java.lang.ClassCastException: >>>>> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl >>>>> cannot be cast to >>>>> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$BooleanStatisticsImpl >>>>> at >>>>> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.map(MergeFileMapper.java:115) >>>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) >>>>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) >>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) >>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) >>>>> at java.security.AccessController.doPrivileged(Native Method) >>>>> at javax.security.auth.Subject.doAs(Subject.java:415) >>>>> at >>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) >>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) >>>>> >>>>> It seems that the concatenation feature needs more work.. 
>>>>> >>>>> Daniel >>>>> >>>>>> On Wed, Dec 10, 2014 at 4:54 PM, Daniel Haviv >>>>>> wrote: >>>>>> Hi, >>>>>> I'm trying to use the new concatenate command merge small ORC files and >>>>>> file right away: >>>>>> >>>>>> alter table requests partition(day_ts=1418083200, hour_ts=1418151600) >>>>>> concatenate; >>>>>> >>>>>> Diagnostic Messages for this Task: >>>>>> Error: java.lang.IllegalArgumentException: Column has wrong number of >>>>>> index entries found: 0 expected: 1 >>>>>> at >>>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726) >>>>>> at >>>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614) >>>>>> at >>>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996) >>>>>> at >>>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288) >>>>>> at >>>>>> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215) >>>>>> at >>>>>> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98) >>>>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) >>>>>> at >>>>>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) >>>>>> at org.apache
Re: Concatenating ORC files
Hi, I've created a JIRA with a test case: https://issues.apache.org/jira/browse/HIVE-9080 Thanks! Daniel On Thu, Dec 11, 2014 at 12:49 AM, Prasanth Jayachandran < pjayachand...@hortonworks.com> wrote: > I am unable to reproduce the case that causes exception that you are > seeing. Will be great if you can provide a repro. > > - Prasanth > > > On Wed, Dec 10, 2014 at 1:43 PM, Prasanth Jayachandran < > pjayachand...@hortonworks.com> wrote: > >> I can see a bug for the case 2 where orc index is disabled. I have >> created a jira to track that issue. >> https://issues.apache.org/jira/browse/HIVE-9067 >> >> I am not sure why does it fail in case 1 though. Can you create a jira >> with a reproducible case? I can take a look at it. >> >> - Prasanth >> >> >> On Wed, Dec 10, 2014 at 10:37 AM, Daniel Haviv < >> daniel.ha...@veracity-group.com> wrote: >> >>> I've made a little experiment and recreated the table >>> with 'orc.create.index'='FALSE' and now it fails on something else: >>> Error: java.io.IOException: >>> org.apache.hadoop.hive.ql.metadata.HiveException: >>> java.lang.ClassCastException: >>> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl >>> cannot be cast to >>> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$BooleanStatisticsImpl >>> at >>> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.map(MergeFileMapper.java:115) >>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) >>> at >>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) >>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) >>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at javax.security.auth.Subject.doAs(Subject.java:415) >>> at >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) >>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) >>> >>> It seems that the concatenation feature needs more 
work.. >>> >>> Daniel >>> >>> On Wed, Dec 10, 2014 at 4:54 PM, Daniel Haviv < >>> daniel.ha...@veracity-group.com> wrote: >>> >>>> Hi, >>>> I'm trying to use the new concatenate command merge small ORC files and >>>> file right away: >>>> >>>> alter table requests partition(day_ts=1418083200, hour_ts=1418151600) >>>> concatenate; >>>> >>>> Diagnostic Messages for this Task: >>>> Error: java.lang.IllegalArgumentException: Column has wrong number of >>>> index entries found: 0 expected: 1 >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996) >>>> at >>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288) >>>> at >>>> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215) >>>> at >>>> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98) >>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) >>>> at >>>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) >>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) >>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) >>>> at java.security.AccessController.doPrivileged(Native Method) >>>> at javax.security.auth.Subject.doAs(Subject.java:415) >>>> at >>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) >>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) >>>> >>>> >>>> Is there some property I need to set for ORC to be able to support >>>> concatenation? 
>>>> >>>> Thanks, >>>> Daniel >>>> >>>> >>>> >>> >> > > CONFIDENTIALITY NOTICE > NOTICE: This message is intended for the use of the individual or entity > to which it is addressed and may contain information that is confidential, > privileged and exempt from disclosure under applicable law. If the reader > of this message is not the intended recipient, you are hereby notified that > any printing, copying, dissemination, distribution, disclosure or > forwarding of this communication is strictly prohibited. If you have > received this communication in error, please contact the sender immediately > and delete it from your system. Thank You. >
Re: Concatenating ORC files
Hi Prasanth,
The first attempt had ("orc.compress"="Snappy") and all the files under it were created the same way so I'm assuming they all should have indexes created. In the second attempt I used ("orc.create.index"="false", "orc.compress"="Snappy").

Thanks,
Daniel

On Wed, Dec 10, 2014 at 9:04 PM, Prasanth Jayachandran < pjayachand...@hortonworks.com> wrote: > Hi Daniel > > In you first run, are there some files with “orc.create.index”=“false”? > What are the table properties used to create ORC files in both cases? > > - Prasanth > > > On Wed, Dec 10, 2014 at 7:55 AM, Daniel Haviv < > daniel.ha...@veracity-group.com> wrote: > >> Hi, >> I'm trying to use the new concatenate command merge small ORC files and >> file right away: >> >> alter table requests partition(day_ts=1418083200, hour_ts=1418151600) >> concatenate; >> >> Diagnostic Messages for this Task: >> Error: java.lang.IllegalArgumentException: Column has wrong number of >> index entries found: 0 expected: 1 >> at >> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726) >> at >> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614) >> at >> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996) >> at >> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288) >> at >> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215) >> at >> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98) >> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) >> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) >> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) >> at java.security.AccessController.doPrivileged(Native Method) >> at javax.security.auth.Subject.doAs(Subject.java:415) >> at >> 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) >> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) >> >> >> Is there some property I need to set for ORC to be able to support >> concatenation? >> >> Thanks, >> Daniel >> >> >>
Re: Concatenating ORC files
I've made a little experiment and recreated the table with 'orc.create.index'='FALSE' and now it fails on something else: Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl cannot be cast to org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$BooleanStatisticsImpl at org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.map(MergeFileMapper.java:115) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) It seems that the concatenation feature needs more work.. 
Daniel On Wed, Dec 10, 2014 at 4:54 PM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote: > Hi, > I'm trying to use the new concatenate command merge small ORC files and > file right away: > > alter table requests partition(day_ts=1418083200, hour_ts=1418151600) > concatenate; > > Diagnostic Messages for this Task: > Error: java.lang.IllegalArgumentException: Column has wrong number of > index entries found: 0 expected: 1 > at > org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726) > at > org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614) > at > org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996) > at > org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215) > at > org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > > > Is there some property I need to set for ORC to be able to support > concatenation? > > Thanks, > Daniel > > >
Concatenating ORC files
Hi,
I'm trying to use the new concatenate command to merge small ORC files and it fails right away:

alter table requests partition(day_ts=1418083200, hour_ts=1418151600) concatenate;

Diagnostic Messages for this Task:
Error: java.lang.IllegalArgumentException: Column has wrong number of index entries found: 0 expected: 1
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288)
at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215)
at org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Is there some property I need to set for ORC to be able to support concatenation?

Thanks,
Daniel
Re: Insert into dynamic partitions performance
I see. Thanks a lot that's very helpful! Daniel > On 7 בדצמ׳ 2014, at 09:10, Gopal V wrote: > >> On 12/6/14, 10:11 PM, Daniel Haviv wrote: >> >> Isn't there a way to make hive allocate more than one reducer for the whole >> job? Maybe one >> per partition. > > Yes. > > hive.optimize.sort.dynamic.partition=true; does nearly that. > > It raises the net number of useful reducers to total-num-of-partitions x > total-num-buckets. > > If you have say, data being written into six hundred partitions with 1 bucket > each, it can use anywhere between 1 and 600 reducers (hashcode collisions > causing skews, of course). > > It's turned off by default, because it really slows down the 1 partition > without buckets insert speed. > > Cheers, > Gopal > >>>> On 7 בדצמ׳ 2014, at 06:06, Gopal V wrote: >>>> >>>> On 12/6/14, 6:27 AM, Daniel Haviv wrote: >>>> Hi, >>>> I'm executing an insert statement that goes over 1TB of data. >>>> The map phase goes well but the reduce stage only used one reducer which >>>> becomes >> a great bottleneck. >>> >>> Are you inserting into a bucketed or sorted table? >>> >>> If the destination table is bucketed + partitioned, you can use the dynamic >>> partition >> sort optimization to get beyond the single reducer. >>> >>> Cheers, >>> Gopal >
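Gopal's suggestion above amounts to a session like the following sketch; the setting name is from his reply, while the table and column names are placeholders, not from the thread:

```sql
-- Illustrative sketch: enable the dynamic partition sort optimization so the
-- insert fans out over (partitions x buckets) reducers instead of one reducer.
SET hive.optimize.sort.dynamic.partition=true;

INSERT OVERWRITE TABLE target PARTITION (day_ts, hour_ts)
SELECT col_a, col_b, day_ts, hour_ts
FROM source;
```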
Re: Insert into dynamic partitions performance
Thanks Gopal, I don't want to divide my data any further. Isn't there a way to make hive allocate more than one reducer for the whole job? Maybe one per partition. Daniel > On 7 בדצמ׳ 2014, at 06:06, Gopal V wrote: > >> On 12/6/14, 6:27 AM, Daniel Haviv wrote: >> Hi, >> I'm executing an insert statement that goes over 1TB of data. >> The map phase goes well but the reduce stage only used one reducer which >> becomes a great bottleneck. > > Are you inserting into a bucketed or sorted table? > > If the destination table is bucketed + partitioned, you can use the dynamic > partition sort optimization to get beyond the single reducer. > > Cheers, > Gopal
Insert into dynamic partitions performance
Hi,
I'm executing an insert statement that goes over 1TB of data. The map phase goes well, but the reduce stage only uses one reducer, which becomes a great bottleneck. I've tried to set the number of reducers to four and added a distribute by clause to the statement, but I'm still using just one reducer. How can I increase the reducer's parallelism?

Thanks,
Daniel
Re: Start hiveserver2 as a daemon
Try using screen Daniel > On 5 בדצמ׳ 2014, at 19:08, peterm_second wrote: > > yes, > I've tried nohup , & even sh -c . > & works but after the first call get's executed in the background I get the > message you can see when a hadoop job is submitted to the cluster and then > the terminal get's frozen. I think the problem is in the > ext/hiveserver2.sh:hiveserver2 function. it's says something along the lines > of > exec $HADOOP jar $JAR $CLASS $HIVE_OPTS "$@" > I am not 100% how the exec command works , but somehow it re-owns the > launching terminal. My problem is aggravated by the fact that I am launching > hive using sshpass > > Peter > >> On 5.12.2014 18:55, Jörn Franke wrote: >> Have you tried nohup ? >> >> Le 5 déc. 2014 15:25, "peterm_second" a écrit : >>> Hi Guys, >>> How can I launch the Hiveserver2 as a daemon. >>> I am launching the hiverserv2 using sshpass and I can't detach >>> hiveserver2 from my terminal. Is there a way to deamonise the hiveserver2 ? >>> >>> I've also tried using & but it's not working either, any thoughts ? >>> >>> Regards, >>> Peter >
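Besides screen, the process can be detached with nohup plus disown so it survives the terminal (and the sshpass session) going away. A sketch follows; the actual launch command for HiveServer2 is an assumption here, and `sleep` stands in for it:

```shell
# Sketch: detach a long-running service from the launching terminal.
# Substitute the real launch command (e.g. the hiveserver2 startup script --
# an assumption about your setup) for the arguments passed in.
launch_detached() {
  nohup "$@" >/tmp/hiveserver2.out 2>&1 &   # ignore SIGHUP, detach stdio
  pid=$!
  disown "$pid" 2>/dev/null || true         # drop it from the job table (bash)
  echo "$pid"
}

pid=$(launch_detached sleep 5)
echo "started detached process $pid"
```

The caller gets the PID back on stdout, so a wrapper script can record it for later shutdown.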
Running hive inside a bash script
Hi,
I have a bash script that runs a hive query, and I would like it to do something if the query succeeds and something else if it fails. My tests show that a query failure does not change Hive's exit code. What's the right way to achieve this?

Thanks,
Daniel
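The usual bash pattern for this is to branch on the command's exit status, assuming the hive CLI propagates a nonzero exit code on query failure (behavior has varied across versions, so verify on your build). In the sketch below, `true` and `false` stand in for a `hive -e "<query>"` invocation:

```shell
# Generic success/failure branching on a command's exit code.
# In real use the command would be: hive -e "<query>"
# (an assumption about the CLI's exit-code behavior -- verify on your version).
run_and_report() {
  if "$@"; then
    echo "query succeeded"
  else
    echo "query failed (exit code $?)"
  fi
}

run_and_report true    # stands in for a query that succeeds
run_and_report false   # stands in for a query that fails
```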
Re: Container launch failed Error
Good luck Share your results with us Daniel > On 24 בנוב׳ 2014, at 19:36, Amit Behera wrote: > > Hi Daniel, > > Thanks a lot, > > > I will do that and rerun the query. :) > >> On Mon, Nov 24, 2014 at 10:59 PM, Daniel Haviv >> wrote: >> It is a problem as the application master needs to contact the other nodes >> >> Try updating the hosts file on all the machines and try again. >> >> Daniel >> >>> On 24 בנוב׳ 2014, at 19:26, Amit Behera wrote: >>> >>> I did not modify in all the slaves. except slave >>> >>> will it be a problem ? >>> >>> But for small data (up to 20 GB table) it is running and for 300GB table >>> only count(*) running sometimes and sometimes failed >>> >>> Thanks >>> Amit >>> >>>> On Mon, Nov 24, 2014 at 10:37 PM, Daniel Haviv >>>> wrote: >>>> did you copy the hosts file to all the nodes? >>>> >>>> Daniel >>>> >>>>> On 24 בנוב׳ 2014, at 19:04, Amit Behera wrote: >>>>> >>>>> hi Daniel, >>>>> >>>>> >>>>> this stacktrace same for other query . >>>>> for different run I am getting slave7 sometime slave8... >>>>> >>>>> And also I registered all machine IPs in /etc/hosts >>>>> >>>>> Regards >>>>> Amit >>>>> >>>>> >>>>> >>>>>> On Mon, Nov 24, 2014 at 10:22 PM, Daniel Haviv >>>>>> wrote: >>>>>> It seems that the application master can't resolve slave6's name to an IP >>>>>> >>>>>> Daniel >>>>>> >>>>>>> On 24 בנוב׳ 2014, at 18:49, Amit Behera wrote: >>>>>>> >>>>>>> Hi Users, >>>>>>> >>>>>>> my cluster(1+8) configuration: >>>>>>> >>>>>>> RAM : 32 GB each >>>>>>> HDFS : 1.5 TB SSD >>>>>>> CPU : 8 core each >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> I am trying to query on 300GB of table but I am able to run only select >>>>>>> query. >>>>>>> >>>>>>> Except select query , for all other query I am getting following >>>>>>> exception. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Total jobs = 1 >>>>>>> Stage-1 is selected by condition resolver. >>>>>>> Launching Job 1 out of 1 >>>>>>> Number of reduce tasks not specified. 
Estimated >>>>>>> from input data size: 183 >>>>>>> In order to change the average load for a >>>>>>> reducer (in bytes): >>>>>>> set >>>>>>> hive.exec.reducers.bytes.per.reducer= >>>>>>> In order to limit the maximum number of >>>>>>> reducers: >>>>>>> set hive.exec.reducers.max= >>>>>>> In order to set a constant number of reducers: >>>>>>> set mapreduce.job.reduces= >>>>>>> Starting Job = job_1416831990090_0005, Tracking >>>>>>> URL = http://master:8088/proxy/application_1416831990090_0005/ >>>>>>> Kill Command = /root/hadoop/bin/hadoop job >>>>>>> -kill job_1416831990090_0005 >>>>>>> Hadoop job information for Stage-1: number of >>>>>>> mappers: 679; number of reducers: 183 >>>>>>> 2014-11-24 19:43:01,523 Stage-1 map = 0%, >>>>>>> reduce = 0% >>>>>>> 2014-11-24 19:43:22,730 Stage-1 map = 53%, >>>>>>> reduce = 0%, Cumulative CPU 625.19 sec >>>>>>> 2014-11-24 19:43:23,778 Stage-1 map = 100%, >>>>>>> reduce = 100% >>>>>>> MapReduce Total cumulative CPU time: 10 minutes >>>>>>> 25 seconds 190 msec >>>>>>> Ended Job = job_1416831990090_0005 with errors >>>>>>> Error during job, obtaining debugging >>>>>>> information... >>>>>>> Examining task ID: >>>>>>> task
Re: Container launch failed Error
It is a problem as the application master needs to contact the other nodes Try updating the hosts file on all the machines and try again. Daniel > On 24 בנוב׳ 2014, at 19:26, Amit Behera wrote: > > I did not modify in all the slaves. except slave > > will it be a problem ? > > But for small data (up to 20 GB table) it is running and for 300GB table only > count(*) running sometimes and sometimes failed > > Thanks > Amit > >> On Mon, Nov 24, 2014 at 10:37 PM, Daniel Haviv >> wrote: >> did you copy the hosts file to all the nodes? >> >> Daniel >> >>> On 24 בנוב׳ 2014, at 19:04, Amit Behera wrote: >>> >>> hi Daniel, >>> >>> >>> this stacktrace same for other query . >>> for different run I am getting slave7 sometime slave8... >>> >>> And also I registered all machine IPs in /etc/hosts >>> >>> Regards >>> Amit >>> >>> >>> >>>> On Mon, Nov 24, 2014 at 10:22 PM, Daniel Haviv >>>> wrote: >>>> It seems that the application master can't resolve slave6's name to an IP >>>> >>>> Daniel >>>> >>>>> On 24 בנוב׳ 2014, at 18:49, Amit Behera wrote: >>>>> >>>>> Hi Users, >>>>> >>>>> my cluster(1+8) configuration: >>>>> >>>>> RAM : 32 GB each >>>>> HDFS : 1.5 TB SSD >>>>> CPU : 8 core each >>>>> >>>>> --- >>>>> >>>>> I am trying to query on 300GB of table but I am able to run only select >>>>> query. >>>>> >>>>> Except select query , for all other query I am getting following >>>>> exception. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Total jobs = 1 >>>>> Stage-1 is selected by condition resolver. >>>>> Launching Job 1 out of 1 >>>>> Number of reduce tasks not specified. 
Estimated >>>>> from input data size: 183 >>>>> In order to change the average load for a >>>>> reducer (in bytes): >>>>> set >>>>> hive.exec.reducers.bytes.per.reducer= >>>>> In order to limit the maximum number of >>>>> reducers: >>>>> set hive.exec.reducers.max= >>>>> In order to set a constant number of reducers: >>>>> set mapreduce.job.reduces= >>>>> Starting Job = job_1416831990090_0005, Tracking >>>>> URL = http://master:8088/proxy/application_1416831990090_0005/ >>>>> Kill Command = /root/hadoop/bin/hadoop job >>>>> -kill job_1416831990090_0005 >>>>> Hadoop job information for Stage-1: number of >>>>> mappers: 679; number of reducers: 183 >>>>> 2014-11-24 19:43:01,523 Stage-1 map = 0%, >>>>> reduce = 0% >>>>> 2014-11-24 19:43:22,730 Stage-1 map = 53%, >>>>> reduce = 0%, Cumulative CPU 625.19 sec >>>>> 2014-11-24 19:43:23,778 Stage-1 map = 100%, >>>>> reduce = 100% >>>>> MapReduce Total cumulative CPU time: 10 minutes >>>>> 25 seconds 190 msec >>>>> Ended Job = job_1416831990090_0005 with errors >>>>> Error during job, obtaining debugging >>>>> information... >>>>> Examining task ID: >>>>> task_1416831990090_0005_m_05 (and more) from job >>>>> job_1416831990090_0005 >>>>> Examining task ID: >>>>> task_1416831990090_0005_m_42 (and more) from job >>>>> job_1416831990090_0005 >>>>> Examining task ID: >>>>> task_1416831990090_0005_m_35 (and more) from job >>>>> job_1416831990090_0005 >>>>> Examining task ID: >>>>> task_1416831990090_0005_m_65 (and more) from job >>>>> job_1416831990090_0005 >>>>> Examining task ID: >>>>> task_1416831990090_0005_m_02 (and more) from job >>>>> job_1416831990090_0005 >>>>> Examining task ID: >>>>> task_1416831990090_0005_m_07 (and more) from job >>>>> job_1416831990090_0005 >>>>> Examining task ID: >>>>> task_1416831990090_0005_m_58 (and more) from job >>>>> job_1416831990090_0005 >>>>> Examini
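The hosts-file fix suggested above can be sanity-checked on every node before rerunning the query. A sketch, with the hostnames taken from the thread (substitute the cluster's real node names):

```shell
# Verify that each slave hostname resolves on this machine; run on every node.
# The hostname list is illustrative -- use the cluster's actual node names.
for host in slave6 slave7 slave8; do
  if getent hosts "$host" >/dev/null; then
    echo "$host: resolves"
  else
    echo "$host: NOT resolvable - add it to /etc/hosts"
  fi
done
```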
Re: Container launch failed Error
did you copy the hosts file to all the nodes? Daniel > On 24 בנוב׳ 2014, at 19:04, Amit Behera wrote: > > hi Daniel, > > > this stacktrace same for other query . > for different run I am getting slave7 sometime slave8... > > And also I registered all machine IPs in /etc/hosts > > Regards > Amit > > > >> On Mon, Nov 24, 2014 at 10:22 PM, Daniel Haviv >> wrote: >> It seems that the application master can't resolve slave6's name to an IP >> >> Daniel >> >>> On 24 בנוב׳ 2014, at 18:49, Amit Behera wrote: >>> >>> Hi Users, >>> >>> my cluster(1+8) configuration: >>> >>> RAM : 32 GB each >>> HDFS : 1.5 TB SSD >>> CPU : 8 core each >>> >>> --- >>> >>> I am trying to query on 300GB of table but I am able to run only select >>> query. >>> >>> Except select query , for all other query I am getting following exception. >>> >>> >>> >>> >>> >>> Total jobs = 1 >>> Stage-1 is selected by condition resolver. >>> Launching Job 1 out of 1 >>> Number of reduce tasks not specified. Estimated >>> from input data size: 183 >>> In order to change the average load for a >>> reducer (in bytes): >>> set >>> hive.exec.reducers.bytes.per.reducer= >>> In order to limit the maximum number of >>> reducers: >>> set hive.exec.reducers.max= >>> In order to set a constant number of reducers: >>> set mapreduce.job.reduces= >>> Starting Job = job_1416831990090_0005, Tracking >>> URL = http://master:8088/proxy/application_1416831990090_0005/ >>> Kill Command = /root/hadoop/bin/hadoop job >>> -kill job_1416831990090_0005 >>> Hadoop job information for Stage-1: number of >>> mappers: 679; number of reducers: 183 >>> 2014-11-24 19:43:01,523 Stage-1 map = 0%, >>> reduce = 0% >>> 2014-11-24 19:43:22,730 Stage-1 map = 53%, >>> reduce = 0%, Cumulative CPU 625.19 sec >>> 2014-11-24 19:43:23,778 Stage-1 map = 100%, >>> reduce = 100% >>> MapReduce Total cumulative CPU time: 10 minutes >>> 25 seconds 190 msec >>> Ended Job = job_1416831990090_0005 with errors >>> Error during job, obtaining debugging >>> 
information... >>> Examining task ID: >>> task_1416831990090_0005_m_05 (and more) from job >>> job_1416831990090_0005 >>> Examining task ID: >>> task_1416831990090_0005_m_42 (and more) from job >>> job_1416831990090_0005 >>> Examining task ID: >>> task_1416831990090_0005_m_35 (and more) from job >>> job_1416831990090_0005 >>> Examining task ID: >>> task_1416831990090_0005_m_65 (and more) from job >>> job_1416831990090_0005 >>> Examining task ID: >>> task_1416831990090_0005_m_02 (and more) from job >>> job_1416831990090_0005 >>> Examining task ID: >>> task_1416831990090_0005_m_07 (and more) from job >>> job_1416831990090_0005 >>> Examining task ID: >>> task_1416831990090_0005_m_58 (and more) from job >>> job_1416831990090_0005 >>> Examining task ID: >>> task_1416831990090_0005_m_43 (and more) from job >>> job_1416831990090_0005 >>> >>> >>> Task with the most failures(4): >>> - >>> Task ID: >>> task_1416831990090_0005_m_05 >>> >>> >>> URL: >>> >>> http://master:8088/taskdetails.jsp?jobid=job_1416831990090_0005&tipid=task_1416831990090_0005_m_05 >>> - >>> Diagnostic Messages for this Task: >>> Container launch failed for >>> container_1416831990090_0005_01_000112 : >>> java.lang.IllegalArgumentException: java.net.UnknownHostException: >>> slave6 >>> at >>> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418) >>> at >>> org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:397) >>> at >>> org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:233) >>> at >>> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:211) >>> at >>> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtoco
Re: Container launch failed Error
It seems that the application master can't resolve slave6's name to an IP Daniel > On 24 בנוב׳ 2014, at 18:49, Amit Behera wrote: > > Hi Users, > > my cluster(1+8) configuration: > > RAM : 32 GB each > HDFS : 1.5 TB SSD > CPU : 8 core each > > --- > > I am trying to query on 300GB of table but I am able to run only select query. > > Except select query , for all other query I am getting following exception. > > > > > > Total jobs = 1 > Stage-1 is selected by condition resolver. > Launching Job 1 out of 1 > Number of reduce tasks not specified. Estimated > from input data size: 183 > In order to change the average load for a > reducer (in bytes): > set > hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of > reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapreduce.job.reduces= > Starting Job = job_1416831990090_0005, Tracking > URL = http://master:8088/proxy/application_1416831990090_0005/ > Kill Command = /root/hadoop/bin/hadoop job > -kill job_1416831990090_0005 > Hadoop job information for Stage-1: number of > mappers: 679; number of reducers: 183 > 2014-11-24 19:43:01,523 Stage-1 map = 0%, > reduce = 0% > 2014-11-24 19:43:22,730 Stage-1 map = 53%, > reduce = 0%, Cumulative CPU 625.19 sec > 2014-11-24 19:43:23,778 Stage-1 map = 100%, > reduce = 100% > MapReduce Total cumulative CPU time: 10 minutes > 25 seconds 190 msec > Ended Job = job_1416831990090_0005 with errors > Error during job, obtaining debugging > information... 
> Examining task ID: > task_1416831990090_0005_m_05 (and more) from job > job_1416831990090_0005 > Examining task ID: > task_1416831990090_0005_m_42 (and more) from job > job_1416831990090_0005 > Examining task ID: > task_1416831990090_0005_m_35 (and more) from job > job_1416831990090_0005 > Examining task ID: > task_1416831990090_0005_m_65 (and more) from job > job_1416831990090_0005 > Examining task ID: > task_1416831990090_0005_m_02 (and more) from job > job_1416831990090_0005 > Examining task ID: > task_1416831990090_0005_m_07 (and more) from job > job_1416831990090_0005 > Examining task ID: > task_1416831990090_0005_m_58 (and more) from job > job_1416831990090_0005 > Examining task ID: > task_1416831990090_0005_m_43 (and more) from job > job_1416831990090_0005 > > > Task with the most failures(4): > - > Task ID: > task_1416831990090_0005_m_05 > > > URL: > > http://master:8088/taskdetails.jsp?jobid=job_1416831990090_0005&tipid=task_1416831990090_0005_m_05 > - > Diagnostic Messages for this Task: > Container launch failed for > container_1416831990090_0005_01_000112 : > java.lang.IllegalArgumentException: java.net.UnknownHostException: > slave6 > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418) > at > org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:397) > at > org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:233) > at > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:211) > at > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:189) > at > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:110) > at > 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.UnknownHostException: slave6 > ... 12 more > > > > > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.mr.MapRedTask > MapReduce Jobs Launched: > Job 0: Map: 679 Reduce: 183 Cumulative CPU: > 625.19 sec HDFS Read: 0 HDFS Write: 0 FAIL > Total MapReduce CPU Time Spent: 10 minutes 25 > seconds 190 mse > > > > Please help me to fix the issue. > > Thanks > Amit
Problem after upgrading to hive 0.14
Hi,
After upgrading to Hive 0.14, any query I run hits the following message:

. . . . . . . . . . . . . . . .> ;
INFO  : Tez session hasn't been created yet. Opening session
Error: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.tez.TezTask. org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvBasedOnMRAMEnv(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V (state=08S01,code=-101)
0: jdbc:hive2://localhost:1> Closing: 0: jdbc:hive2://localhost:1

When I look into the HiveServer2 logs, these are the errors I get:

1.
2014-11-22 10:22:24,812 ERROR [HiveServer2-Background-Pool: Thread-123]: operation.Operation (SQLOperation.java:run(199)) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.tez.TezTask. org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvBasedOnMRAMEnv(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:314)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:146)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NoSuchMethodError: org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvBasedOnMRAMEnv(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:169)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:234)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:136)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:999)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
    ... 12 more

2.
2014-11-22 10:22:33,015 ERROR [HiveServer2-Handler-Pool: Thread-35]: server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
    at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    ... 4 more

Any ideas what can cause this?

Thanks,
Daniel
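[Editor's note] The `NoSuchMethodError` in error 1 typically indicates that the Tez jars on HiveServer2's classpath are older than the Tez version Hive 0.14 was built against (Hive 0.14 expects a Tez 0.5.x release, which has `MRHelpers.updateEnvBasedOnMRAMEnv`; Tez 0.4.x does not). A rough, illustrative sketch for spotting mixed or stale Tez versions by jar filename (the search directories below are assumptions; adjust them for your install; this was not part of the original thread):

```python
import re
from pathlib import Path

# Hypothetical install locations -- replace with wherever your Tez jars live.
SEARCH_DIRS = ["/usr/lib/tez", "/opt/tez"]

# Matches e.g. tez-dag-0.5.2.jar or tez-api-0.4.1-incubating.jar,
# capturing the version suffix.
TEZ_JAR = re.compile(r"tez-.*-(\d+\.\d+\.\d+(?:[-.]\w+)*)\.jar$")

def tez_versions(dirs):
    """Collect the distinct Tez version strings found in jar filenames
    under the given directories (missing directories are skipped)."""
    versions = set()
    for d in dirs:
        for jar in Path(d).rglob("*.jar"):
            m = TEZ_JAR.search(jar.name)
            if m:
                versions.add(m.group(1))
    return versions

if __name__ == "__main__":
    found = tez_versions(SEARCH_DIRS)
    print("Tez versions on disk:", found or "none found")
```

Seeing more than one version, or any 0.4.x jar alongside Hive 0.14, would point to exactly the kind of mismatch that produces this `NoSuchMethodError`; the same check is worth running against the Tez tarball uploaded to HDFS for `tez.lib.uris`.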