OrcOutputFormat
Hi,

I am trying to run an MR job to write files in ORC format. The job runs successfully, but I do not see any output files created. If I change the output format from OrcOutputFormat to TextOutputFormat (that being the only change), the output files are created. I am using Hive 0.12.0.

I tried upgrading to Hive 0.13.0, but with this version I get the following error:

2014-04-29 10:37:07,426 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: org/apache/hadoop/hive/ql/io/orc/OrcProto$RowIndex
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.init(WriterImpl.java:129)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:369)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:104)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:411)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

How do you think this issue can be resolved?

Thanks,
Seema
Re: computing median and percentiles
I understand the percentile function is supported in Hive in the latest versions. However, how does one calculate percentiles when the data is spread across two columns, one holding the value and the other holding how many times it occurred? So say:

Value  Count
100    2    (so basically 100 occurred twice)
200    4
300    1
400    6
500    3

I want to find the 0.25 percentile for the value distribution. How can I do it using the Hive percentile function?
Re: computing median and percentiles
Not really. If it were a single column with no counts, Hive provides the percentile function for that. So basically, if the data looked like this:

100
100
200
200
200
200
300

But if we have two columns, one that holds the value and the other that holds the count, how can Hive be used to derive the percentile?

Value  Count
100    2
200    4
300    1

Thanks,
Seema

From: Stephen Sprague sprag...@gmail.com
Reply-To: user@hive.apache.org
Date: Thursday, March 20, 2014 5:28 AM
To: user@hive.apache.org
Subject: Re: computing median and percentiles

not a hive question is it? its more like a math question.

On Wed, Mar 19, 2014 at 1:30 PM, Seema Datar sda...@yahoo-inc.com wrote:

I understand the percentile function is supported in Hive in the latest versions. However, how does one calculate percentiles when the data is spread across two columns, one holding the value and the other holding how many times it occurred? So say:

Value  Count
100    2    (so basically 100 occurred twice)
200    4
300    1
400    6
500    3

I want to find the 0.25 percentile for the value distribution. How can I do it using the Hive percentile function?
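
For reference, the arithmetic behind a percentile over (value, count) pairs can be sketched without expanding the rows. The Python below is only an illustration, not Hive code and not an answer given in this thread: the function name weighted_percentile is made up for the example, and it assumes the percentile is defined by linear interpolation at zero-based position p * (N - 1) in the sorted, expanded list, where N is the sum of the counts. That is one common definition; whether it matches Hive's percentile exactly should be checked against the Hive documentation.

```python
from bisect import bisect_right
from itertools import accumulate
from math import ceil, floor


def weighted_percentile(pairs, p):
    """Percentile of a distribution given as (value, count) pairs.

    Assumption: linear interpolation at zero-based position p * (N - 1)
    in the sorted, expanded list of values, where N is the total count.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Sort by value and ignore rows that contribute no occurrences.
    rows = sorted((v, c) for v, c in pairs if c > 0)
    if not rows:
        raise ValueError("no rows with a positive count")

    values = [v for v, _ in rows]
    # cumulative[i] = number of expanded elements with value <= values[i]
    cumulative = list(accumulate(c for _, c in rows))
    total = cumulative[-1]

    def value_at(index):
        # Value found at a zero-based index of the (virtual) expanded list.
        return values[bisect_right(cumulative, index)]

    position = p * (total - 1)
    lower, higher = floor(position), ceil(position)
    if lower == higher:
        return float(value_at(lower))
    # Interpolate between the two neighbouring expanded elements.
    return ((higher - position) * value_at(lower)
            + (position - lower) * value_at(higher))


if __name__ == "__main__":
    data = [(100, 2), (200, 4), (300, 1), (400, 6), (500, 3)]
    print(weighted_percentile(data, 0.25))
```

With the five rows from the question there are 16 occurrences in total, so the 0.25 percentile sits at position 3.75, between the fourth and fifth elements of the expanded list. Both of those are 200, so under the interpolation assumption above the result is 200. The same running-total idea (sort by value, accumulate the counts, find the row whose cumulative count first passes the target position) is what any SQL formulation would need to express.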