OrcOutputFormat

2014-04-29 Thread Seema Datar
Hi,

I am trying to run an MR job that writes files in ORC format. The job runs 
successfully, but no output files are created. If I change the output format 
from OrcOutputFormat to TextOutputFormat (and that is the only change), the 
output files are created. I am using Hive 0.12.0. I tried upgrading to Hive 
0.13.0, but with that version I get the following error:


2014-04-29 10:37:07,426 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: org/apache/hadoop/hive/ql/io/orc/OrcProto$RowIndex
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.init(WriterImpl.java:129)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:369)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:104)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:411)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)


How do you think this issue can be resolved?


Thanks,

Seema


Re: computing median and percentiles

2014-03-19 Thread Seema Datar


I understand that the percentile function is supported in recent versions of 
Hive. However, how does one calculate percentiles when the data is spread 
across two columns, a value and its count? For example:

Value  Count
100    2      (so 100 occurred twice)
200    4
300    1
400    6
500    3


I want to find out the 0.25 percentile for the value distribution. How can I do 
it using the Hive percentile function?
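
For illustration only, here is a minimal sketch of the underlying arithmetic in 
plain Python rather than HiveQL. It is not from the thread. It walks the 
cumulative counts of the (value, count) pairs instead of expanding them into 
rows. The function name weighted_percentile is hypothetical, and the 
interpolation rule (linear interpolation at 0-based position p * (N - 1)) is an 
assumption; other percentile definitions can give different answers.

```python
import math


def weighted_percentile(pairs, p):
    """Percentile of a distribution given as (value, count) pairs.

    Assumption: linear interpolation at 0-based position p * (N - 1),
    where N is the sum of all counts.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    pairs = sorted((v, c) for v, c in pairs if c > 0)
    total = sum(c for _, c in pairs)
    if total == 0:
        raise ValueError("no observations")

    position = p * (total - 1)
    lower, upper = math.floor(position), math.ceil(position)

    def value_at(index):
        # Walk the cumulative counts until they cover the requested index.
        seen = 0
        for value, count in pairs:
            seen += count
            if index < seen:
                return value
        raise IndexError(index)

    low_value, high_value = value_at(lower), value_at(upper)
    return low_value + (high_value - low_value) * (position - lower)


data = [(100, 2), (200, 4), (300, 1), (400, 6), (500, 3)]
print(weighted_percentile(data, 0.25))  # 200.0
```

With 16 observations in total, the 0.25 position is 0.25 * 15 = 3.75, and both 
neighbouring observations (indices 3 and 4) are 200, so the sketch prints 200.0.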




Re: computing median and percentiles

2014-03-19 Thread Seema Datar
Not really. If the data were a single column with no counts, Hive's percentile 
function could be applied directly, for example if the data looked like this:

100
100
200
200
200
200
300

But if we have two columns, one that holds the value and another that holds 
its count, how can Hive be used to derive the percentile?

Value Count
100  2
200  4
300  1
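
To make the equivalence between the two layouts concrete, here is a small 
sketch in plain Python (not HiveQL, and not from the thread). It expands the 
(value, count) pairs back into the single-column form and takes the median of 
the result. The helper name expand is hypothetical, and this brute-force 
approach assumes the total count is small enough to hold in memory.

```python
from itertools import chain, repeat
from statistics import median


def expand(pairs):
    """Turn (value, count) pairs into a flat, sorted list of repeated values."""
    return sorted(chain.from_iterable(repeat(v, c) for v, c in pairs))


rows = expand([(100, 2), (200, 4), (300, 1)])
print(rows)          # [100, 100, 200, 200, 200, 200, 300]
print(median(rows))  # 200
```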

Thanks,
Seema

From: Stephen Sprague <sprag...@gmail.com>
Reply-To: user@hive.apache.org
Date: Thursday, March 20, 2014 5:28 AM
To: user@hive.apache.org
Subject: Re: computing median and percentiles

Not a Hive question, is it? It's more like a math question.



On Wed, Mar 19, 2014 at 1:30 PM, Seema Datar <sda...@yahoo-inc.com> wrote:


I understand that the percentile function is supported in recent versions of 
Hive. However, how does one calculate percentiles when the data is spread 
across two columns, a value and its count? For example:

Value  Count
100    2      (so 100 occurred twice)
200    4
300    1
400    6
500    3


I want to find out the 0.25 percentile for the value distribution. How can I do 
it using the Hive percentile function?