Re: Hive HWI ... request for your experience to be used Production
Hi Manish,

Glad to receive your email, because we have been making efforts on HWI. We have improved the original, added some features, and put it on GitHub: https://github.com/anjuke/hwi

It's far from mature and standard, but it's improving and has already been deployed for use at our company. Give it a try and share any advice if you're interested in it.

Thanks,
Qiang

2013/1/5 Manish Malhotra:
> Hi All,
>
> We are exploring HWI for use in a PROD environment for ad-hoc queries etc.
> Want to check with the Hive community: can somebody share their
> experience using HWI in prod (or any environment) in terms of its
> stability and performance?
> We are also evaluating enhancing it to make it more useful with different features.
>
> Thanks for your time and help!
>
> Regards,
> Manish
Re: HiveHistoryViewer concurrency problem
Maybe it's not. But this exception happens when I create a *HiveHistoryViewer* instance, in which case only reading and parsing the file is involved, and the instance is not intended to be shared between threads. So the exception surprised me, and I wonder why a static buffer was used instead of a local buffer, which would have no concurrency issue.

2013/1/5 Edward Capriolo:
> It is likely an oversight. The majority of Hive code was not written to be
> multi-threaded.
>
> On Fri, Jan 4, 2013 at 10:41 PM, Jie Li wrote:
>> Hi Qiang,
>>
>> Could you describe how HiveHistoryViewer is used? I'm also looking for
>> a tool to understand the Hive log.
>>
>> Thanks,
>> Jie
>>
>> On Sat, Jan 5, 2013 at 9:54 AM, Qiang Wang wrote:
>>> Does anybody have an idea about this?
>>>
>>> https://issues.apache.org/jira/browse/HIVE-3857
>>>
>>> 2013/1/4 Qiang Wang:
>>>> new HiveHistoryViewer() throws ConcurrentModificationException when called
>>>> concurrently by several threads.
>>>>
>>>> According to the stack trace, HiveHistory.parseLine uses a private static
>>>> Map parseBuffer to store parsed data, and this caused the exception.
>>>>
>>>> I don't know why a static buffer rather than a local buffer is used!
>>>> Does anybody have an idea about this?
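The failure mode described above (a static buffer shared by all parser instances) can be sketched outside Hive. The names below are illustrative stand-ins for HiveHistory.parseLine's shared map, not Hive's actual code; Python raises RuntimeError where Java would throw ConcurrentModificationException.

```python
import threading

# Stand-in for a "private static Map parseBuffer": one dict shared by
# every parse call, so concurrent callers clear and mutate it under
# each other. Copying it while another thread clears it can raise
# RuntimeError ("dictionary changed size during iteration").
SHARED_BUFFER = {}

def parse_shared(lines):
    SHARED_BUFFER.clear()
    for i, line in enumerate(lines):
        SHARED_BUFFER[i] = line
    return dict(SHARED_BUFFER)  # snapshot may race with another clear()

def parse_local(lines):
    buffer = {}  # per-call buffer: nothing shared, nothing to race on
    for i, line in enumerate(lines):
        buffer[i] = line
    return buffer

def run_concurrently(parse, n_threads=8, iterations=1000):
    """Hammer `parse` from several threads; return any RuntimeErrors."""
    errors = []
    def worker():
        try:
            for _ in range(iterations):
                parse(["QueryStart QUERY_ID=q1", "TaskEnd TASK_ID=t2"])
        except RuntimeError as exc:
            errors.append(exc)
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors

# The local-buffer variant is safe no matter how many threads call it:
assert run_concurrently(parse_local) == []
```

The shared-buffer variant may or may not fail on a given run (the race is timing-dependent), which is exactly why such bugs surface only under concurrent use.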
Re: HiveHistoryViewer concurrency problem
Hi Jie,

As far as I know, the Hive history log is structured, and the class *HiveHistory* is used to write and read it. *HiveHistoryViewer* serves as a listener that parses the log and stores the parsed data. It has two members:

- HashMap *jobInfoMap*, which stores the QueryInfo related to each Hive query
- HashMap *taskInfoMap*, which stores the TaskInfo related to each Hadoop map/reduce job

You can dump the two maps and find what you want. Hope this info helps.

Qiang

2013/1/5 Jie Li:
> Hi Qiang,
>
> Could you describe how HiveHistoryViewer is used? I'm also looking for
> a tool to understand the Hive log.
>
> Thanks,
> Jie
>
> On Sat, Jan 5, 2013 at 9:54 AM, Qiang Wang wrote:
>> Does anybody have an idea about this?
>>
>> https://issues.apache.org/jira/browse/HIVE-3857
>>
>> 2013/1/4 Qiang Wang:
>>> new HiveHistoryViewer() throws ConcurrentModificationException when called
>>> concurrently by several threads.
>>>
>>> According to the stack trace, HiveHistory.parseLine uses a private static
>>> Map parseBuffer to store parsed data, and this caused the exception.
>>>
>>> I don't know why a static buffer rather than a local buffer is used!
>>> Does anybody have an idea about this?
Re: HiveHistoryViewer concurrency problem
It is likely an oversight. The majority of Hive code was not written to be multi-threaded.

On Fri, Jan 4, 2013 at 10:41 PM, Jie Li wrote:
> Hi Qiang,
>
> Could you describe how HiveHistoryViewer is used? I'm also looking for
> a tool to understand the Hive log.
>
> Thanks,
> Jie
>
> On Sat, Jan 5, 2013 at 9:54 AM, Qiang Wang wrote:
>> Does anybody have an idea about this?
>>
>> https://issues.apache.org/jira/browse/HIVE-3857
>>
>> 2013/1/4 Qiang Wang:
>>> new HiveHistoryViewer() throws ConcurrentModificationException when called
>>> concurrently by several threads.
>>>
>>> According to the stack trace, HiveHistory.parseLine uses a private static
>>> Map parseBuffer to store parsed data, and this caused the exception.
>>>
>>> I don't know why a static buffer rather than a local buffer is used!
>>> Does anybody have an idea about this?
Map-only aggregation
Hi all,

Can Hive implement an aggregation as a map-only job? As we know, the data may be pre-partitioned via PARTITIONED BY or CLUSTERED BY, so we shouldn't need the reduce phase to repartition the data. The bucket map join seems to take advantage of the buckets for joins, so I wonder if there is a similar optimization for aggregations.

Thanks,
Jie
Re: HiveHistoryViewer concurrency problem
Hi Qiang,

Could you describe how HiveHistoryViewer is used? I'm also looking for a tool to understand the Hive log.

Thanks,
Jie

On Sat, Jan 5, 2013 at 9:54 AM, Qiang Wang wrote:
> Does anybody have an idea about this?
>
> https://issues.apache.org/jira/browse/HIVE-3857
>
> 2013/1/4 Qiang Wang:
>> new HiveHistoryViewer() throws ConcurrentModificationException when called
>> concurrently by several threads.
>>
>> According to the stack trace, HiveHistory.parseLine uses a private static
>> Map parseBuffer to store parsed data, and this caused the exception.
>>
>> I don't know why a static buffer rather than a local buffer is used!
>> Does anybody have an idea about this?
Fwd: Hive HWI ... request for your experience to be used Production
Hi All,

We are exploring HWI for use in a PROD environment for ad-hoc queries etc. Want to check with the Hive community: can somebody share their experience using HWI in prod (or any environment) in terms of its stability and performance? We are also evaluating enhancing it to make it more useful with different features.

Thanks for your time and help!

Regards,
Manish
Re: HiveHistoryViewer concurrency problem
Does anybody have an idea about this?

https://issues.apache.org/jira/browse/HIVE-3857

2013/1/4 Qiang Wang:
> new HiveHistoryViewer() throws ConcurrentModificationException when called
> concurrently by several threads.
>
> According to the stack trace, HiveHistory.parseLine uses a *private static
> Map parseBuffer* to store parsed data, and this caused the exception.
>
> I don't know why a static buffer rather than a local buffer is used!
> Does anybody have an idea about this?
Re: 0.8.0 -> 0.9.0 mysql schema upgrade
Looks like this column is not even in the 0.8/0.9 schema files. I have no idea how I ended up with it in my schema. I just set a default 'false' value and I'm fine now.

Sam

On Jan 4, 2013, at 2:22 PM, Sam William wrote:
> When I upgraded to 0.9.0, I'm getting an exception when I try to create tables:
>
> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor@4774e78a" using
> statement "INSERT INTO `SDS`
> (`SD_ID`,`NUM_BUCKETS`,`LOCATION`,`INPUT_FORMAT`,`CD_ID`,`OUTPUT_FORMAT`,`SERDE_ID`,`IS_COMPRESSED`)
> VALUES (?,?,?,?,?,?,?,?)" failed : Field 'IS_STOREDASSUBDIRECTORIES' doesn't
> have a default value
> NestedThrowables:
> java.sql.SQLException: Field 'IS_STOREDASSUBDIRECTORIES' doesn't have a
> default value
>
> The upgrade script from 0.8 to 0.9 doesn't include anything for this. What am I missing?
>
> Sam William
> sa...@stumbleupon.com
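For anyone hitting the same error, Sam's fix ("set a default 'false' value") would look roughly like the MySQL DDL below. This is a hedged sketch: the BIT(1) column type is an assumption based on the boolean-looking field name, not taken from the thread, so check the real definition with SHOW CREATE TABLE SDS first.

```sql
-- Hypothetical sketch: give the stray column a default so Hive's
-- eight-column INSERT into SDS no longer fails. Verify the column's
-- actual type before running this.
ALTER TABLE `SDS`
  MODIFY COLUMN `IS_STOREDASSUBDIRECTORIES` BIT(1) NOT NULL DEFAULT 0;
```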
0.8.0 -> 0.9.0 mysql schema upgrade
When I upgraded to 0.9.0, I'm getting an exception when I try to create tables:

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor@4774e78a" using
statement "INSERT INTO `SDS`
(`SD_ID`,`NUM_BUCKETS`,`LOCATION`,`INPUT_FORMAT`,`CD_ID`,`OUTPUT_FORMAT`,`SERDE_ID`,`IS_COMPRESSED`)
VALUES (?,?,?,?,?,?,?,?)" failed : Field 'IS_STOREDASSUBDIRECTORIES' doesn't
have a default value
NestedThrowables:
java.sql.SQLException: Field 'IS_STOREDASSUBDIRECTORIES' doesn't have a default value

The upgrade script from 0.8 to 0.9 doesn't include anything for this. What am I missing?

Sam William
sa...@stumbleupon.com
Re: Timestamp, Epoch Time, Functions and other Frustrations
So I read that JIRA, and also found this linked JIRA: https://issues.apache.org/jira/browse/HIVE-3454

So I decided to try the * 1.0 workaround:

select starttime,
  from_unixtime(starttime) as unixtime,
  cast((starttime * 1.0) as timestamp) as castts,
  from_utc_timestamp(starttime * 1.0, 'GMT') as fromtsgmt,
  from_utc_timestamp(starttime * 1.0, 'CST') as fromtscst
from table

Hypothesis, given starttime = 1356588013 (and based on the epoch converter website):

unixtime = 2012-12-27 00:00:13   # from_unixtime displays the time in the system time zone
castts = 2012-12-27 06:00:13.0   # a timestamp is a UTC time; it should match the GMT time
fromtsgmt = 2012-12-27 06:00:13.0  # this should be exactly what the TS is, so the same as the cast
fromtscst = 2012-12-27 00:00:13.0  # this should be the same (time-based) result as from_unixtime

Actual results:

unixtime = 2012-12-27 00:00:13   # 1 for 1!
castts = 2012-12-27 00:00:13.0   # What? Why is this the same as unixtime?
fromtsgmt = 2012-12-27 00:00:13.0  # Why is THIS the same as unixtime?
fromtscst = 2012-12-26 18:00:13.0  # This is 6 hours behind? Why did my epoch time get converted to timestamp as if we added 6 to the hour?

That makes NO sense. Even ignoring the bug in the conversion requiring a float, am I doing this wrong, or is there a different bug in how this is approached?

On Fri, Jan 4, 2013 at 10:30 AM, Mark Grover wrote:
> Brad is correct, there is a JIRA about this already:
> https://issues.apache.org/jira/browse/HIVE-3822
>
> Sorry for the inconvenience.
>
> Mark
>
> On Fri, Jan 4, 2013 at 8:25 AM, Brad Cavanagh wrote:
>> Try multiplying your values by 1000, then running the conversions. I bet
>> they expect milliseconds since the epoch instead of seconds.
>>
>> Brad.
>>
>> On 2013-01-04, at 8:03 AM, John Omernik wrote:
>>> Greetings all. I am getting frustrated with the documentation and lack of
>>> intuitiveness in Hive relating to timestamps, and was hoping I could post
>>> here and get some clarification or other ideas.
>>>
>>> I have a field that is a string but is actually a 10-digit int
>>> representation of epoch time. I am going to list out the results of
>>> various functions.
>>>
>>> Value = 1356588013
>>>
>>> Hive:
>>>
>>> from_unixtime(Value) = 2012-12-27 00:00:13 (timezone CST on the system time, so that works)
>>> cast(value as timestamp) = 1970-01-16 10:49:48.013
>>> cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
>>> from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
>>> from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013
>>>
>>> Epoch Converter - http://www.epochconverter.com/
>>>
>>> Thu, 27 Dec 2012 06:00:13 GMT - GMT representation of the time
>>> Thu Dec 27 2012 00:00:13 GMT-6 - my timezone representation
>>>
>>> Given all of these representations, how do I get the Value (a valid
>>> epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just
>>> doing math? (Math is error-prone as we move across timezones.) Why
>>> doesn't casting the value to timestamp, or even casting the int cast of
>>> the value, work? Why does it read 1970? This is very frustrating and
>>> should be more intuitive. Please advise.
Re: Timestamp, Epoch Time, Functions and other Frustrations
Brad is correct, there is a JIRA about this already: https://issues.apache.org/jira/browse/HIVE-3822

Sorry for the inconvenience.

Mark

On Fri, Jan 4, 2013 at 8:25 AM, Brad Cavanagh wrote:
> Try multiplying your values by 1000, then running the conversions. I bet
> they expect milliseconds since the epoch instead of seconds.
>
> Brad.
>
> On 2013-01-04, at 8:03 AM, John Omernik wrote:
>> Greetings all. I am getting frustrated with the documentation and lack of
>> intuitiveness in Hive relating to timestamps, and was hoping I could post
>> here and get some clarification or other ideas.
>>
>> I have a field that is a string but is actually a 10-digit int
>> representation of epoch time. I am going to list out the results of
>> various functions.
>>
>> Value = 1356588013
>>
>> Hive:
>>
>> from_unixtime(Value) = 2012-12-27 00:00:13 (timezone CST on the system time, so that works)
>> cast(value as timestamp) = 1970-01-16 10:49:48.013
>> cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
>> from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
>> from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013
>>
>> Epoch Converter - http://www.epochconverter.com/
>>
>> Thu, 27 Dec 2012 06:00:13 GMT - GMT representation of the time
>> Thu Dec 27 2012 00:00:13 GMT-6 - my timezone representation
>>
>> Given all of these representations, how do I get the Value (a valid
>> epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just
>> doing math? (Math is error-prone as we move across timezones.) Why
>> doesn't casting the value to timestamp, or even casting the int cast of
>> the value, work? Why does it read 1970? This is very frustrating and
>> should be more intuitive. Please advise.
Re: Timestamp, Epoch Time, Functions and other Frustrations
Try multiplying your values by 1000, then running the conversions. I bet they expect milliseconds since the epoch instead of seconds.

Brad.

On 2013-01-04, at 8:03 AM, John Omernik wrote:
> Greetings all. I am getting frustrated with the documentation and lack of
> intuitiveness in Hive relating to timestamps, and was hoping I could post
> here and get some clarification or other ideas.
>
> I have a field that is a string but is actually a 10-digit int
> representation of epoch time. I am going to list out the results of
> various functions.
>
> Value = 1356588013
>
> Hive:
>
> from_unixtime(Value) = 2012-12-27 00:00:13 (timezone CST on the system time, so that works)
> cast(value as timestamp) = 1970-01-16 10:49:48.013
> cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013
>
> Epoch Converter - http://www.epochconverter.com/
>
> Thu, 27 Dec 2012 06:00:13 GMT - GMT representation of the time
> Thu Dec 27 2012 00:00:13 GMT-6 - my timezone representation
>
> Given all of these representations, how do I get the Value (a valid
> epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just
> doing math? (Math is error-prone as we move across timezones.) Why
> doesn't casting the value to timestamp, or even casting the int cast of
> the value, work? Why does it read 1970? This is very frustrating and
> should be more intuitive. Please advise.
Re: Timestamp, Epoch Time, Functions and other Frustrations
One more test:

to_utc_timestamp(from_unixtime(value), 'CST') as to_from

provided the proper timestamp for me. However, I still had to provide the timezone, which I should NOT have to do. I know that this data coming in is in epoch time, therefore I should be able to create a timestamp without knowing a timezone or timezone offset.

On Fri, Jan 4, 2013 at 10:03 AM, John Omernik wrote:
> Greetings all. I am getting frustrated with the documentation and lack of
> intuitiveness in Hive relating to timestamps, and was hoping I could post
> here and get some clarification or other ideas.
>
> I have a field that is a string but is actually a 10-digit int
> representation of epoch time. I am going to list out the results of
> various functions.
>
> Value = 1356588013
>
> Hive:
>
> from_unixtime(Value) = 2012-12-27 00:00:13 (timezone CST on the system time, so that works)
> cast(value as timestamp) = 1970-01-16 10:49:48.013
> cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013
>
> Epoch Converter - http://www.epochconverter.com/
>
> Thu, 27 Dec 2012 06:00:13 GMT - GMT representation of the time
> Thu Dec 27 2012 00:00:13 GMT-6 - my timezone representation
>
> Given all of these representations, how do I get the Value (a valid
> epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just
> doing math? (Math is error-prone as we move across timezones.) Why
> doesn't casting the value to timestamp, or even casting the int cast of
> the value, work? Why does it read 1970? This is very frustrating and
> should be more intuitive. Please advise.
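The underlying point is that an epoch value is timezone-free; only the *rendering* needs a zone, which is why from_unixtime (which renders in the system zone) forces the extra to_utc_timestamp step. A small Python sketch of the same instant rendered two ways (this is plain Python, not Hive; 'America/Chicago' stands in for CST):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib in Python 3.9+

value = 1356588013  # epoch seconds from the thread

# Rendering in UTC needs no knowledge of any local zone:
utc = datetime.fromtimestamp(value, tz=timezone.utc)
print(utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2012-12-27 06:00:13

# A local rendering is the same instant, displayed differently:
cst = datetime.fromtimestamp(value, tz=ZoneInfo("America/Chicago"))
print(cst.strftime("%Y-%m-%d %H:%M:%S"))  # 2012-12-27 00:00:13

# The two datetimes are equal; only the display zone differs.
assert utc == cst
```

These match the GMT and GMT-6 representations from the epoch converter quoted in the thread.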
Timestamp, Epoch Time, Functions and other Frustrations
Greetings all. I am getting frustrated with the documentation and lack of intuitiveness in Hive relating to timestamps, and was hoping I could post here and get some clarification or other ideas.

I have a field that is a string but is actually a 10-digit int representation of epoch time. I am going to list out the results of various functions.

Value = 1356588013

Hive:

from_unixtime(Value) = 2012-12-27 00:00:13 (timezone CST on the system time, so that works)
cast(value as timestamp) = 1970-01-16 10:49:48.013
cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013

Epoch Converter - http://www.epochconverter.com/

Thu, 27 Dec 2012 06:00:13 GMT - GMT representation of the time
Thu Dec 27 2012 00:00:13 GMT-6 - my timezone representation

Given all of these representations, how do I get the Value (a valid epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just doing math? (Math is error-prone as we move across timezones.) Why doesn't casting the value to timestamp, or even casting the int cast of the value, work? Why does it read 1970? This is very frustrating and should be more intuitive. Please advise.
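The 1970 results above are consistent with Brad's later suggestion in this thread: the timestamp cast is treating the 10-digit value as *milliseconds* since the epoch rather than seconds. A quick Python check of both interpretations (plain Python as a model, not Hive's actual cast code):

```python
from datetime import datetime, timezone

value = 1356588013  # the 10-digit epoch value from the thread

def render_utc(seconds):
    """Render an epoch-seconds value as a UTC wall-clock string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# Interpreted as SECONDS since the epoch: the date John expects.
print(render_utc(value))         # 2012-12-27 06:00:13

# Interpreted as MILLISECONDS: effectively 1/1000 the duration,
# landing in mid-January 1970 -- matching the strange
# "1970-01-16 ..." results from cast(... as timestamp).
print(render_utc(value / 1000))  # 1970-01-16 16:49:48 UTC
```

(The thread's 1970-01-16 10:49:48.013 is this same instant shown in CST, six hours behind UTC.)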
Re: Thrift Hive client for CDH 4.1 HiveServer2?
They are in src/service/if and src/metastore/if.

On 2013-1-4 at 7:16 AM, "David Morel" wrote:
> Hi all (and happy New Year!)
>
> Is it possible to build a Perl Thrift client for HiveServer2 (from
> Cloudera's 4.1.x)?
>
> I'm following the instructions found here:
> http://stackoverflow.com/questions/5289164/perl-thrift-client-to-hive
>
> I downloaded Hive from Cloudera's site, but then I'm a bit lost: where do I
> find the Thrift files that I need to build the Perl libs? I have the Thrift
> compiler working OK, but that's as far as I got.
>
> Any help would be most welcome.
>
> Thanks!
>
> D.Morel
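Given those paths, generating the Perl bindings would look roughly like the following. This is a hedged sketch: the .thrift filenames below are placeholders, since the exact files under src/service/if and src/metastore/if vary by release; list those directories and adjust.

```shell
# Hypothetical sketch: run the Thrift compiler over the interface
# definitions in the directories named above (paths relative to the
# Hive source root). Substitute the real .thrift filenames you find.
cd hive-src
thrift --gen perl src/service/if/hive_service.thrift
thrift --gen perl src/metastore/if/hive_metastore.thrift
# Generated Perl modules land in ./gen-perl/
ls gen-perl
```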
Re: Job counters limit exceeded exception
I ended up increasing the counters limit to 130, which solved my issue. Do you know of any good sources to learn how to decipher Hive's EXPLAIN?

Cheers,
Krishna

On 2 January 2013 11:20, Alexander Alten-Lorenz wrote:
> Hi,
>
> This happens when operators are used in queries (Hive operators). Hive
> creates 4 counters per operator, up to a maximum of 1000, plus a few
> additional counters for things like file reads/writes, partitions and
> tables. Hence the number of counters required depends on the query.
>
> Using "EXPLAIN EXTENDED" and "grep -ri operators | wc -l" prints out the
> number of operators used. Use this value to tweak the MR settings
> carefully.
>
> Praveen has a good explanation about counters online:
> http://www.thecloudavenue.com/2011/12/limiting-usage-counters-in-hadoop.html
>
> Rule of thumb for Hive:
> count of operators * 4 + n (n for file ops and other stuff).
>
> cheers,
> Alex
>
> On Jan 2, 2013, at 10:35 AM, Krishna Rao wrote:
>> A particular query that I run fails with the following error:
>>
>> ***
>> Job 18: Map: 2 Reduce: 1 Cumulative CPU: 3.67 sec HDFS Read: 0 HDFS Write: 0 SUCCESS
>> Exception in thread "main"
>> org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many
>> counters: 121 max=120
>> ...
>> ***
>>
>> Googling suggests that I should increase "mapreduce.job.counters.limit",
>> and that the number of counters a job uses has an effect on the memory
>> used by the JobTracker, so I shouldn't increase this number too high.
>>
>> Is there a rule of thumb for what this number should be as a function of
>> JobTracker memory? That is, should I be cautious and increase by 5 at a
>> time, or could I just double it?
>>
>> Cheers,
>>
>> Krishna
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
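Alex's rule of thumb (counters ≈ operators × 4 + n) is simple arithmetic; a tiny sketch, where the operator count comes from counting operators in EXPLAIN EXTENDED output and the overhead n is an assumed placeholder, not a Hive constant:

```python
def estimated_counters(operator_count, overhead=10):
    """Rule of thumb from the thread: 4 counters per Hive operator,
    plus n extra for file ops, partitions, tables, etc.
    `overhead` is an illustrative assumption, not a documented value."""
    return operator_count * 4 + overhead

# E.g. if EXPLAIN EXTENDED shows ~28 operators, the estimate
# (28 * 4 + 10 = 122) already exceeds the default limit of 120 behind
# Krishna's "Too many counters: 121 max=120" error.
estimate = estimated_counters(28)
assert estimate == 122
assert estimate > 120  # so mapreduce.job.counters.limit must be raised
```

Raising the limit to just above the estimate (as Krishna did with 130) keeps the extra JobTracker memory cost small while leaving a little headroom.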