Re: Count bug in Hive 3.0.0.3.1
On Tue, Apr 28, 2020, 1:23 AM Deepak Krishna wrote:

> Hi team,
>
> We came across a bug related to the count function. We are using Hive
> 3.0.0.3.1 with Tez 0.9.0.3.1. PFA the queries to replicate the issue.
>
> Please register this as a bug and let us know if we can support in any way
> to fix the issue. It would also be helpful to know if there are any other
> workarounds for this issue.
>
> Thanks and Regards,
> Deepak Krishna
> Big Data Engineer
> solute GmbH, Zeppelinstraße 15, 76185 Karlsruhe, Germany
Re: Permissions preventing me from inserting data into table I have just created
Make sure /home/yaboulnaga/tmp/hive-scratch/ is writeable by your processes.

On Mon, Nov 26, 2012 at 10:07 AM, yabou...@uwaterloo.ca wrote:

Hello,

I'm using Cloudera's CDH4 with Hive 0.9 and Hive Server 2. I am trying to load data into Hive using the JDBC driver (the one distributed with Cloudera CDH4, org.apache.hive.jdbc.HiveDriver). I can create the staging table and LOAD LOCAL into it. However, when I try to insert data into a table with Columnar SerDe stored as RCFILE, I get an error caused by file permissions. I don't think the SerDe or the STORED AS parameters have anything to do with the problem, but I mention them for completeness.

The problem is that Hive creates a temporary file in its (local) scratch folder, owned by hive:hive with permissions 755, then passes it as an input to a mapper running as the user mapred:mapred. The mapper then tries to create something inside that input folder, and the following exception is thrown:

org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Mkdirs failed to create file:/home/yaboulnaga/tmp/hive-scratch/hive_2012-11-26_10-46-44_887_2004468370569495405/_task_tmp.-ext-10002
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:709)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

As you might have noticed, I moved the scratch folder to a directory under my home dir so that I could give it 777 permissions. The idea was to use hive.files.umask.value to cause subdirectories to inherit the same open permissions (not the best workaround, but it wouldn't hurt on my local machine). Unfortunately this didn't work even when I added a umask to /etc/init.d/hiveserver2. Can someone please tell me the right way to do this? I mean, create a table and then insert values into it! The HiveQL statements I use are very similar to the ones in the tutorials about loading data.

Cheers!
--
Younos

--
The whole world is you. Yet you keep thinking there is something else. - Xuefeng Yicun 822-902 A.D.

Tim R. Havens
Google Phone: 573.454.1232
ICQ: 495992798
ICBM: 37°51'34.79N 90°35'24.35W
ham radio callsign: NW0W
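Besides making the directory itself writeable, the scratch location and umask discussed above can be set per session rather than machine-wide. A minimal sketch, assuming the property names as they existed in Hive 0.9 (the path is illustrative, not a recommendation):

```sql
-- Point the scratch dir at a world-writable location (illustrative path),
-- and relax the umask so directories Hive creates inherit open permissions.
SET hive.exec.scratchdir=/tmp/hive-scratch;
SET hive.files.umask.value=0000;
```

The same properties can also go into hive-site.xml so that HiveServer2 picks them up at startup instead of per session.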
Re: Long running Join Query - Reduce task fails due to failing to report status
Just curious if you've tried using Hive's EXPLAIN to see what IT thinks of your query.

On Fri, Aug 24, 2012 at 9:36 AM, Himanish Kushary himan...@gmail.com wrote:

Hi,

We have a complex query that involves several left outer joins, resulting in 8 M/R jobs in Hive. During execution of one of the stages (after three M/R jobs have run), the job fails due to a few reduce tasks failing from inactivity. Most of the reduce tasks go through fine (within 3 minutes), but the last one gets stuck for a long time (about 1 hour) and, after several attempts, is finally killed with "failed to report status for 600 seconds. Killing!"

What may be causing this issue? Would hive.script.auto.progress help in this case? As we are not able to get much information from the log files, how should we approach resolving this? Will tweaking any specific M/R parameters help?

The task attempt log shows several lines like this before exiting:

2012-08-23 19:17:23,848 INFO ExecReducer: ExecReducer: processing 21900 rows: used memory = 408582240
2012-08-23 19:17:30,189 INFO ExecReducer: ExecReducer: processing 22000 rows: used memory = 346110400
2012-08-23 19:17:37,510 INFO ExecReducer: ExecReducer: processing 22100 rows: used memory = 583913576
2012-08-23 19:17:44,829 INFO ExecReducer: ExecReducer: processing 22200 rows: used memory = 513071504
2012-08-23 19:17:47,923 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1

Here are the reduce task counters (Map-Reduce Framework):

Combine input records: 0
Combine output records: 0
Reduce input groups: 222,480,335
Reduce shuffle bytes: 7,726,141,897
Reduce input records: 222,480,335
Reduce output records: 0
Spilled Records: 355,827,191
CPU time spent (ms): 2,152,160
Physical memory (bytes) snapshot: 1,182,490,624
Virtual memory (bytes) snapshot: 1,694,531,584
Total committed heap usage (bytes): 990,052,352

The tasktracker log gives a thread dump at that time but no exception:

2012-08-23 20:05:49,319 INFO org.apache.hadoop.mapred.TaskTracker: Process Thread Dump: lost task
69 active threads

---
Thanks & Regards
Himanish
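One pattern in the counters above (a single reduce task consuming 222M input records) is consistent with join-key skew, where one key's rows all land on the same reducer. If that task is genuinely making progress, a common workaround is to raise the inactivity timeout rather than rely on progress reporting. A hedged sketch; the timeout value is illustrative, and mapred.task.timeout is the classic MapReduce-1 property name:

```sql
-- Give slow-but-alive reduce tasks longer before the tracker kills them
-- (value in milliseconds; 1800000 = 30 minutes, chosen for illustration).
SET mapred.task.timeout=1800000;

-- hive.script.auto.progress, asked about in the thread, applies to
-- TRANSFORM/script operators; whether it helps a plain join is version-dependent.
SET hive.script.auto.progress=true;
```

Whether the timeout bump is a fix or just a band-aid depends on whether the skewed reducer eventually finishes; restructuring the join to spread the hot key is the durable remedy.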
Re: How to list all views
There may be a way to select table names from the metastore... if so, name your views like *_view and select for those table names. Of course, if you can query the metastore, there's probably an even better way where you don't have to name your views anything special: just search for a table type or some other descriptor. I've not really poked around in the metastore much, but that's probably where Hive would have to look anyway. I'm not sure if there are any built-in commands for selecting data from the metastore directly like this, other than things like 'show tables'.

Tim

On Thu, Jul 5, 2012 at 10:15 AM, Stephen R. Scaffidi sscaff...@tripadvisor.com wrote:

Thank you, but what I need is to list only the views, or conversely, only the actual tables.

On 07/05/2012 11:14 AM, Bejoy KS wrote:

Hi Stephen

You can see the views as well, along with tables, using the Show tables; command.

Regards
Bejoy KS
Sent from handheld, please excuse typos.

--Original Message--
From: Stephen R. Scaffidi
To: user@hive.apache.org
ReplyTo: sscaff...@tripadvisor.com
Subject: How to list all views
Sent: Jul 5, 2012 20:38

How can I list all the views in Hive? I can't seem to figure out how to do it with HQL.
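The metastore route suggested above can be done with a plain SQL query against the metastore database itself rather than through Hive. A sketch, assuming the standard metastore schema of that era (table and column names may vary across Hive versions):

```sql
-- Run against the metastore RDBMS (e.g. MySQL or Derby), not inside the Hive CLI.
SELECT TBL_NAME
FROM TBLS
WHERE TBL_TYPE = 'VIRTUAL_VIEW';

-- Conversely, 'MANAGED_TABLE' or 'EXTERNAL_TABLE' in TBL_TYPE
-- selects only the actual tables.
```

Much later Hive releases also added a native SHOW VIEWS command, which removes the need to touch the metastore directly.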
Re: How to override a property with in a hive query
What are you trying to accomplish that a method like this won't work for?

On Mon, Jul 2, 2012 at 10:25 PM, Abhishek abhishek.dod...@gmail.com wrote:

Hi Tim,

Is this the only way, or do we have any other ways?

Sent from my iPhone

On Jul 2, 2012, at 8:49 PM, Tim Havens timhav...@gmail.com wrote:

Before your query, type: SET property = blah;

On Mon, Jul 2, 2012 at 7:13 PM, Abhishek abhishek.dod...@gmail.com wrote:

Hi all,

Can I know how to override a property of mapred-site.xml in a Hive join query? For a MapReduce job we override it using -D property_name=value; how do we do the same within a Hive query?

Regards
Abhishek
Re: How to override a property with in a hive query
Before your query, type: SET property = blah;

On Mon, Jul 2, 2012 at 7:13 PM, Abhishek abhishek.dod...@gmail.com wrote:

Hi all,

Can I know how to override a property of mapred-site.xml in a Hive join query? For a MapReduce job we override it using -D property_name=value; how do we do the same within a Hive query?

Regards
Abhishek
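Concretely, the SET approach looks like this inside a Hive session. The property is real, but the value and the table/column names below are purely illustrative:

```sql
-- Session-scoped override of a mapred-site.xml property; this is the
-- in-query equivalent of passing -Dmapred.reduce.tasks=32 to a plain M/R job.
SET mapred.reduce.tasks=32;

-- Hypothetical join that then runs with the overridden setting.
SELECT o.id, c.name
FROM orders o JOIN customers c ON (o.customer_id = c.id);
```

An alternative, for properties you want set before the session starts, is passing --hiveconf property=value on the hive command line.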
Please HELP: HIVE alter table add new partition to schema...
So... I have a table that has thousands of files, and billions of rows related to it. Let's make this a simple table:

CREATE TABLE test_table (
  ts BIGINT,
  exec_time DOUBLE,
  domain_id BIGINT,
  domain_name STRING
)
PARTITIONED BY (logdate STRING, source STRING, datacenter STRING, hostname STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY '\001'
  MAP KEYS TERMINATED BY '\002'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

What I need to do is ADD a column to the PARTITIONED BY spec above. The column I want to add is 'loghour STRING'. I can't seem to find any way to actually add a NEW PARTITION COLUMN to the 'PARTITIONED BY' spec without completely recreating and reloading the table. What's the correct way of adding a new column like 'loghour STRING' to the partition schema? I'm not trying to add an entry into the table DATA; I'm trying to add a completely new PARTITIONED BY column...
Re: Please HELP: HIVE alter table add new partition to schema...
Thanks for the 'small files' heads up. The current LOGDATE file is about 7.2 GB gzipped, so I don't think we'll be running into many small files when that's split into 24 hours. But thanks for the reminder about 'small files'. I guess I'd rather just reload the 2500 logs or so than have to jump through hoops...

On Tue, Jun 19, 2012 at 10:45 AM, Mark Grover mgro...@oanda.com wrote:

I agree. It would be best if you wrote a script that iterates through each leaf-level partition of your existing table (logdate='A', source='B', datacenter='C', hostname='D') and populates new leaf-level partitions in the new table (logdate='A', source='B', datacenter='C', hostname='D', loghour). By leaf-level partition, I am referring to the partition column that appears last in the list of partition columns.

Technically, you can do dynamic partitioning in one query with something like:

FROM test_table src
INSERT OVERWRITE TABLE dest PARTITION(logdate, source, datacenter, hostname, loghour)
SELECT ts, exec_time, domain_id, domain_name, logdate, source, datacenter, hostname, loghour;

However, as far as I understand, you wouldn't be able to take advantage of the existing partitioning in your source table (test_table). If you would like to take advantage of existing partitioning, you would have to issue a series of queries like this (with the static partition columns fixed in the PARTITION clause and therefore left out of the SELECT list):

FROM test_table src
INSERT OVERWRITE TABLE dest PARTITION(logdate='A', source='B', datacenter='C', hostname='D', loghour)
SELECT ts, exec_time, domain_id, domain_name, loghour
WHERE logdate='A' AND source='B' AND datacenter='C' AND hostname='D';

Just as a side note: whenever you think about your table partitioning, keep in mind not to overdo it. Creating more partitions can lead to a lot of small files on HDFS, which reduces the performance of your Hadoop cluster. A couple of people have talked about this small-files problem:

http://arunxjacob.blogspot.ca/2011/04/hdfs-file-size-vs-allocation-other.html
http://www.cloudera.com/blog/2009/02/the-small-files-problem/
http://blog.rapleaf.com/dev/2008/11/20/give-me-liberty-or-give-me-death-but-dont-give-me-small-files/

In general, you would like your file sizes to be at least the HDFS block size, most likely a small multiple of it. If you do find that you are running into the small-files problem, there are other ways to get around it, like bucketing.

Good luck!
Mark

----- Original Message -----
From: Edward Capriolo edlinuxg...@gmail.com
To: user@hive.apache.org
Sent: Tuesday, June 19, 2012 11:12:48 AM
Subject: Re: Please HELP: HIVE alter table add new partition to schema...

You cannot change the partition columns. I would use a dynamic-partition insert to select all the data from the original table into the new table.

On 6/19/12, Tim Havens timhav...@gmail.com wrote:
> So... I have a table that has thousands of files, and billions of rows
> related to it. [...]
> What's the correct way of adding a new column like 'loghour STRING' to
> the partition schema?
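To make the dynamic-partition route in this thread concrete: before running the INSERT, dynamic partitioning has to be enabled, and the partition limits raised if there are many date/hour combinations. A sketch with illustrative limit values; the loghour derivation assumes ts is epoch seconds, which the original table definition does not actually state:

```sql
-- nonstrict mode allows every partition column to be dynamic (no static prefix).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=100000;        -- illustrative limits
SET hive.exec.max.dynamic.partitions.pernode=10000;

FROM test_table src
INSERT OVERWRITE TABLE dest
  PARTITION (logdate, source, datacenter, hostname, loghour)
SELECT ts, exec_time, domain_id, domain_name,
       logdate, source, datacenter, hostname,
       hour(from_unixtime(ts)) AS loghour;  -- assumes ts is epoch seconds
```

Dynamic partition columns must come last in the SELECT list, in the same order as in the PARTITION clause.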
Re: Order by Sort by partitioned columns
I frequently sort by partitioned columns without issues. Post your table schema and the query that's failing; let's see what's going on.

Tim

On Mon, May 14, 2012 at 1:28 AM, Shin Chan had...@gmx.com wrote:

Hi All

Just curious if it's possible to ORDER BY or SORT BY partitioned columns. I tried it, and it fails because it is not able to find those columns. They are present as partitions.

Any help would be appreciated.

Thanks and Regards
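For reference, partition columns can be used in WHERE, GROUP BY, and SORT BY clauses just like ordinary columns. A sketch against the test_table schema from the partitioning thread earlier in this archive; the date value is illustrative:

```sql
-- logdate and hostname are partition columns, yet filter, group,
-- and sort on them like any other column.
SELECT logdate, hostname, count(*) AS row_count
FROM test_table
WHERE logdate = '2012-05-01'
GROUP BY logdate, hostname
SORT BY hostname;
```

A "column not found" failure usually points at a typo or at querying the column through a view or subquery that does not expose it, rather than at partitioning itself.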
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask
I have a similar UDF to this one which creates just fine. I can't seem to resolve what 'return code -101' means, however, with this one. Can anyone tell me what 'return code -101' means?

My StemTermsUDF.jar has the proper classpath for the JWNL jars already; I'm trying to ensure they're REALLY available via 'add jar' within Hive. All the paths are correct and I've checked them MANY times to be sure before posting this.

hive> add file /usr/lib/hadoop/lib/jwnl/jwnl_properties.xml;
Added resource: /usr/lib/hadoop/lib/jwnl/jwnl_properties.xml
hive> add jar /usr/lib/hive/lib/jwnl/jwnl.jar;
Added /usr/lib/hive/lib/jwnl/jwnl.jar to class path
Added resource: /usr/lib/hive/lib/jwnl/jwnl.jar
hive> add jar /usr/lib/hive/lib/jwnl/commons-logging.jar;
Added /usr/lib/hive/lib/jwnl/commons-logging.jar to class path
Added resource: /usr/lib/hive/lib/jwnl/commons-logging.jar
hive> add jar StemTermsUDF.jar;
Added StemTermsUDF.jar to class path
Added resource: StemTermsUDF.jar
hive> create temporary function StemTermsUDF as 'org.apache.hadoop.hive.ql.udf.StemTermsUDF';
java.lang.NoClassDefFoundError: net/didion/jwnl/JWNLException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:119)
    at org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75)
    at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:516)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.ClassNotFoundException: net.didion.jwnl.JWNLException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    ... 18 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask
Re: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask
Unfortunately, the issue appears to be something with the jar, or my UDF. What I can't seem to resolve is what is causing the -101 error code.

Tim

On Sun, Jan 22, 2012 at 3:26 PM, Aniket Mokashi aniket...@gmail.com wrote:

The simplest way would be to put the jar in the auxlib directory. That does both for you, I guess. After that you can directly create the temporary function in Hive.

~Aniket

On Sun, Jan 22, 2012 at 1:24 PM, Aniket Mokashi aniket...@gmail.com wrote:

Add the jar to HADOOP_CLASSPATH when you launch Hive. That should help.

Thanks,
Aniket

On Sun, Jan 22, 2012 at 9:25 AM, Tim Havens timhav...@gmail.com wrote:

> I have a similar UDF to this one which creates just fine. I can't seem to
> resolve what 'return code -101' means, however, with this one. Can anyone
> tell me what 'return code -101' means? [...]
> FAILED: Execution Error, return code -101 from
> org.apache.hadoop.hive.ql.exec.FunctionTask

--
...:::Aniket:::... Quetzalco@tl