Re: Hive HWI ... request for your experience to be used in Production

2013-01-04 Thread Qiang Wang
Hi Manish:

Glad to receive your email, as we have been putting effort into HWI.

We have improved the original version, added some features, and put it on
GitHub:

https://github.com/anjuke/hwi

It's far from mature and standardized, but it's improving and has already
been deployed for use at our company.

In any case, give it a try and offer some advice if you're interested in it.

Thanks

Qiang


2013/1/5 Manish Malhotra 

>
> Hi All,
>
> We are exploring using HWI in a production environment for ad hoc queries.
> We wanted to check with the Hive community: can somebody share their
> experience using HWI in production, or any other environment, in terms of
> its stability and performance?
> We are also evaluating enhancing it to make it more useful with different
> features.
>
> Thanks for your time and help!
>
> Regards,
> Manish
>
>
>
>


Re: HiveHistoryViewer concurrency problem

2013-01-04 Thread Qiang Wang
Maybe it's not.

But this exception happens when I create a *HiveHistoryViewer* instance,
in which case only reading and parsing a file is involved, and the instance
is not intended to be shared between threads.

So the exception surprised me, and I wonder why a static buffer was used
instead of a local buffer, which would have no concurrency issues.
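
For the record, here is a minimal sketch (not Hive's actual code; the class
name is just for illustration) of why a shared static buffer fails like
this, assuming several threads call parseLine at once:

import java.util.HashMap;
import java.util.Map;

public class StaticBufferDemo {
    // shared across threads, like HiveHistory.parseBuffer
    private static final Map<String, String> parseBuffer =
            new HashMap<String, String>();

    static void parseLine(String line) {
        parseBuffer.clear();
        parseBuffer.put("line", line);
        // iterating here while another thread clears/puts usually dies
        // with java.util.ConcurrentModificationException
        for (Map.Entry<String, String> e : parseBuffer.entrySet()) {
            e.getValue();
        }
    }

    public static void main(String[] args) {
        for (int t = 0; t < 4; t++) {
            new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000000; i++) {
                        parseLine("TaskProgress key=" + i);
                    }
                }
            }).start();
        }
    }
}

Making the buffer a local variable inside parseLine would make each call
independent, with nothing to race on.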


2013/1/5 Edward Capriolo 

> It is likely an oversight. The majority of Hive code was not written to be
> multi-threaded.
>
>
>
> On Fri, Jan 4, 2013 at 10:41 PM, Jie Li  wrote:
>
>> Hi Qiang,
>>
>> Could you describe how HiveHistoryViewer is used? I'm also looking for
>> a tool to understand the Hive log.
>>
>> Thanks,
>> Jie
>>
>> On Sat, Jan 5, 2013 at 9:54 AM, Qiang Wang  wrote:
>> > Does anybody have an idea about this?
>> >
>> > https://issues.apache.org/jira/browse/HIVE-3857
>> >
>> >
>> > 2013/1/4 Qiang Wang 
>> >>
>> >> new HiveHistoryViewer() throws ConcurrentModificationException when
>> >> called concurrently by several threads.
>> >>
>> >> According to the stack trace, HiveHistory.parseLine uses a private static
>> >> Map<String, String> parseBuffer to store parsed data, and this causes
>> >> the exception.
>> >>
>> >> I don't know why a static buffer rather than a local buffer is used!
>> >> Anybody have an idea about this?
>> >
>> >
>>
>
>


Re: HiveHistoryViewer concurrency problem

2013-01-04 Thread Qiang Wang
Hi Jie:

As far as I know, the Hive history log is structured, and the class
*HiveHistory* is used to write and read it.

*HiveHistoryViewer* serves as a listener that receives and stores the parsed
log data. It has two members:

HashMap<String, QueryInfo> *jobInfoMap*, which stores the QueryInfo
associated with each Hive query,

and

HashMap<String, TaskInfo> *taskInfoMap*, which stores the TaskInfo
associated with each Hadoop map/reduce job.

You can dump the two maps and find what you want.
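
A small sketch of dumping them (the class, method, and field names here are
from the 0.9-era source, so double-check them against your release):

import java.util.Map;
import org.apache.hadoop.hive.ql.history.HiveHistory.QueryInfo;
import org.apache.hadoop.hive.ql.history.HiveHistory.TaskInfo;
import org.apache.hadoop.hive.ql.history.HiveHistoryViewer;

public class DumpHiveHistory {
    public static void main(String[] args) {
        // args[0]: a hive_job_log_*.txt file under hive.querylog.location
        HiveHistoryViewer viewer = new HiveHistoryViewer(args[0]);

        // one QueryInfo per Hive query; hm holds the raw key/value pairs
        for (Map.Entry<String, QueryInfo> e : viewer.getJobInfoMap().entrySet()) {
            System.out.println("query " + e.getKey() + " -> " + e.getValue().hm);
        }

        // one TaskInfo per Hadoop map/reduce job
        for (Map.Entry<String, TaskInfo> e : viewer.getTaskInfoMap().entrySet()) {
            System.out.println("task " + e.getKey() + " -> " + e.getValue().hm);
        }
    }
}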

Hope this info helps.

Qiang


2013/1/5 Jie Li 

> Hi Qiang,
>
> Could you describe how HiveHistoryViewer is used? I'm also looking for
> a tool to understand the Hive log.
>
> Thanks,
> Jie
>
> On Sat, Jan 5, 2013 at 9:54 AM, Qiang Wang  wrote:
> > Does anybody have an idea about this?
> >
> > https://issues.apache.org/jira/browse/HIVE-3857
> >
> >
> > 2013/1/4 Qiang Wang 
> >>
> >> new HiveHistoryViewer() throws ConcurrentModificationException when
> >> called concurrently by several threads.
> >>
> >> According to the stack trace, HiveHistory.parseLine uses a private static
> >> Map<String, String> parseBuffer to store parsed data, and this causes
> >> the exception.
> >>
> >> I don't know why a static buffer rather than a local buffer is used!
> >> Anybody have an idea about this?
> >
> >
>


Re: HiveHistoryViewer concurrency problem

2013-01-04 Thread Edward Capriolo
It is likely an oversight. The majority of Hive code was not written to be
multi-threaded.


On Fri, Jan 4, 2013 at 10:41 PM, Jie Li  wrote:

> Hi Qiang,
>
> Could you describe how HiveHistoryViewer is used? I'm also looking for
> a tool to understand the Hive log.
>
> Thanks,
> Jie
>
> On Sat, Jan 5, 2013 at 9:54 AM, Qiang Wang  wrote:
> > Does anybody have an idea about this?
> >
> > https://issues.apache.org/jira/browse/HIVE-3857
> >
> >
> > 2013/1/4 Qiang Wang 
> >>
> >> new HiveHistoryViewer() throws ConcurrentModificationException when
> >> called concurrently by several threads.
> >>
> >> According to the stack trace, HiveHistory.parseLine uses a private static
> >> Map<String, String> parseBuffer to store parsed data, and this causes
> >> the exception.
> >>
> >> I don't know why a static buffer rather than a local buffer is used!
> >> Anybody have an idea about this?
> >
> >
>


Map-only aggregation

2013-01-04 Thread Jie Li
Hi all,

Can Hive implement an aggregation as a map-only job? As we know, the
data may be pre-partitioned via PARTITIONED BY or CLUSTERED BY, so we
shouldn't need the reduce phase to repartition the data.

The bucket join seems to take advantage of the buckets for joins, so I
wonder if there is a similar optimization for aggregations.

Thanks,
Jie


Re: HiveHistoryViewer concurrency problem

2013-01-04 Thread Jie Li
Hi Qiang,

Could you describe how HiveHistoryViewer is used? I'm also looking for
a tool to understand the Hive log.

Thanks,
Jie

On Sat, Jan 5, 2013 at 9:54 AM, Qiang Wang  wrote:
> Does anybody have an idea about this?
>
> https://issues.apache.org/jira/browse/HIVE-3857
>
>
> 2013/1/4 Qiang Wang 
>>
>> new HiveHistoryViewer() throws ConcurrentModificationException when called
>> concurrently by several threads.
>>
>> According to the stack trace, HiveHistory.parseLine uses a private static
>> Map<String, String> parseBuffer to store parsed data, and this causes the
>> exception.
>>
>> I don't know why a static buffer rather than a local buffer is used!
>> Anybody have an idea about this?
>
>


Fwd: Hive HWI ... request for your experience to be used in Production

2013-01-04 Thread Manish Malhotra
Hi All,

We are exploring using HWI in a production environment for ad hoc queries.
We wanted to check with the Hive community: can somebody share their
experience using HWI in production, or any other environment, in terms of
its stability and performance?
We are also evaluating enhancing it to make it more useful with different
features.

Thanks for your time and help!

Regards,
Manish


Re: HiveHistoryViewer concurrency problem

2013-01-04 Thread Qiang Wang
Does anybody have an idea about this?

https://issues.apache.org/jira/browse/HIVE-3857


2013/1/4 Qiang Wang 

> new HiveHistoryViewer() throws ConcurrentModificationException when called
> concurrently by several threads.
>
> According to the stack trace, HiveHistory.parseLine uses a *private static
> Map<String, String> parseBuffer* to store parsed data, and this causes the
> exception.
>
> I don't know why a static buffer rather than a local buffer is used!
> Anybody have an idea about this?
>


Re: 0.8.0 -> 0.9.0 mysql schema upgrade

2013-01-04 Thread Sam William
Looks like this column is not even there in the 0.8/0.9 schema files. I have
no idea how it got into my schema. I just set a default 'false' value, and I'm
fine now.
Sam

On Jan 4, 2013, at 2:22 PM, Sam William  wrote:

> When I upgraded to 0.9.0, I started getting an exception when I try to
> create tables:
> 
> FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor@4774e78a" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`LOCATION`,`INPUT_FORMAT`,`CD_ID`,`OUTPUT_FORMAT`,`SERDE_ID`,`IS_COMPRESSED`)
>  VALUES (?,?,?,?,?,?,?,?)" failed : Field 'IS_STOREDASSUBDIRECTORIES' doesn't 
> have a default value
> NestedThrowables:
> java.sql.SQLException: Field 'IS_STOREDASSUBDIRECTORIES' doesn't have a 
> default value
> 
> 
> The upgrade script from 0.8 to 0.9 doesn't have anything for this. What am
> I missing?
> 
> Sam William
> sa...@stumbleupon.com
> 
> 
> 

Sam William
sa...@stumbleupon.com





0.8.0 -> 0.9.0 mysql schema upgrade

2013-01-04 Thread Sam William
When I upgraded to 0.9.0, I started getting an exception when I try to create tables:

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor@4774e78a" using 
statement "INSERT INTO `SDS` 
(`SD_ID`,`NUM_BUCKETS`,`LOCATION`,`INPUT_FORMAT`,`CD_ID`,`OUTPUT_FORMAT`,`SERDE_ID`,`IS_COMPRESSED`)
 VALUES (?,?,?,?,?,?,?,?)" failed : Field 'IS_STOREDASSUBDIRECTORIES' doesn't 
have a default value
NestedThrowables:
java.sql.SQLException: Field 'IS_STOREDASSUBDIRECTORIES' doesn't have a default 
value


The upgrade script from 0.8 to 0.9 doesn't have anything for this. What am I missing?

Sam William
sa...@stumbleupon.com





Re: Timestamp, Epoch Time, Functions and other Frustrations

2013-01-04 Thread John Omernik
So I read that JIRA, and also found this linked JIRA:

https://issues.apache.org/jira/browse/HIVE-3454

So I decided to try the * 1.0 workaround.

select
  starttime,
  from_unixtime(starttime) as unixtime,
  cast((starttime * 1.0) as timestamp) as castts,
  from_utc_timestamp(starttime * 1.0, 'GMT') as fromtsgmt,
  from_utc_timestamp(starttime * 1.0, 'CST') as fromtscst
from table

Hypothesis, given starttime = 1356588013 (and based on the epoch converter
website):

unixtime = 2012-12-27 00:00:13 # This is because from_unixtime displays the
time in the system time zone
castts = 2012-12-27 06:00:13.0 # This is because a timestamp is a UTC time;
it should match the GMT time
fromtsgmt = 2012-12-27 06:00:13.0 # This should be exactly what the TS is,
so it should be the same as the cast
fromtscst = 2012-12-27 00:00:13.0 # This should be the same (time-based)
result as from_unixtime

Actual Results:

unixtime = 2012-12-27 00:00:13 # 1 for 1!
castts = 2012-12-27 00:00:13.0 # What? Why is this the same as unixtime?
fromtsgmt = 2012-12-27 00:00:13.0 # Why is THIS the same as unixtime?
fromtscst = 2012-12-26 18:00:13.0 # This is 6 hours behind? Why did my
epoch time get converted to timestamp as if we added 6 to the hour?

That makes NO sense. Even ignoring the bug of the conversion requiring a
float, am I doing this wrong, or is there a different bug in how this is
approached?





On Fri, Jan 4, 2013 at 10:30 AM, Mark Grover wrote:

> Brad is correct, there is a JIRA about this already:
> https://issues.apache.org/jira/browse/HIVE-3822
>
> Sorry for the inconvenience.
>
> Mark
>
> On Fri, Jan 4, 2013 at 8:25 AM, Brad Cavanagh 
> wrote:
> > Try multiplying your values by 1000, then running the conversions. I bet
> > they expect milliseconds since the epoch instead of seconds.
> >
> > Brad.
> >
> >
> > On 2013-01-04, at 8:03 AM, John Omernik  wrote:
> >
> > Greetings all. I am getting frustrated with the documentation and lack of
> > intuitiveness in Hive relating to timestamps and was hoping I could post
> > here and get some clarification or other ideas.
> >
> > I have a field that is a string but is actually a 10-digit int
> > representation of epoch time. I am going to list out the results of
> > various functions.
> >
> > Value = 1356588013
> >
> > Hive:
> >
> > from_unixtime(Value) = 2012-12-27 00:00:13 (Timezone CST on the system
> > time, so that works)
> > cast(value as timestamp) = 1970-01-16 10:49:48.013
> > cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
> > from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
> > from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013
> >
> >
> > Epoch Converter - http://www.epochconverter.com/
> >
> > Thu, 27 Dec 2012 06:00:13 GMT - GMT Representation of the time
> > Thu Dec 27 2012 00:00:13 GMT-6 - My Timezone representation
> >
> > OK, given all of these representations... how do I get the Value (a valid
> > epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just
> > doing math? (Math is error-prone as systems move across time zones.) Why
> > doesn't casting the value to timestamp, or even casting the int cast of
> > the value to timestamp, work? Why does it read 1970? This is very
> > frustrating and should be more intuitive. Please advise.
> >
> >
>


Re: Timestamp, Epoch Time, Functions and other Frustrations

2013-01-04 Thread Mark Grover
Brad is correct, there is a JIRA about this already:
https://issues.apache.org/jira/browse/HIVE-3822

Sorry for the inconvenience.

Mark

On Fri, Jan 4, 2013 at 8:25 AM, Brad Cavanagh  wrote:
> Try multiplying your values by 1000, then running the conversions. I bet
> they expect milliseconds since the epoch instead of seconds.
>
> Brad.
>
>
> On 2013-01-04, at 8:03 AM, John Omernik  wrote:
>
> Greetings all. I am getting frustrated with the documentation and lack of
> intuitiveness in Hive relating to timestamps and was hoping I could post
> here and get some clarification or other ideas.
>
> I have a field that is a string but is actually a 10-digit int
> representation of epoch time. I am going to list out the results of various
> functions.
>
> Value = 1356588013
>
> Hive:
>
> from_unixtime(Value) = 2012-12-27 00:00:13 (Timezone CST on the system time,
> so that works)
> cast(value as timestamp) = 1970-01-16 10:49:48.013
> cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013
>
>
> Epoch Converter - http://www.epochconverter.com/
>
> Thu, 27 Dec 2012 06:00:13 GMT - GMT Representation of the time
> Thu Dec 27 2012 00:00:13 GMT-6 - My Timezone representation
>
> OK, given all of these representations... how do I get the Value (a valid
> epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just
> doing math? (Math is error-prone as systems move across time zones.) Why
> doesn't casting the value to timestamp, or even casting the int cast of the
> value to timestamp, work? Why does it read 1970? This is very frustrating
> and should be more intuitive. Please advise.
>
>


Re: Timestamp, Epoch Time, Functions and other Frustrations

2013-01-04 Thread Brad Cavanagh
Try multiplying your values by 1000, then running the conversions. I bet they 
expect milliseconds since the epoch instead of seconds.

Brad. 

On 2013-01-04, at 8:03 AM, John Omernik  wrote:

> Greetings all. I am getting frustrated with the documentation and lack of 
> intuitiveness in Hive relating to timestamps and was hoping I could post here 
> and get some clarification or other ideas. 
> 
> I have a field that is a string but is actually a 10-digit int
> representation of epoch time. I am going to list out the results of various
> functions.
> 
> Value = 1356588013
> 
> Hive:
> 
> from_unixtime(Value) = 2012-12-27 00:00:13 (Timezone CST on the system time, 
> so that works)
> cast(value as timestamp) = 1970-01-16 10:49:48.013
> cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013
> 
> 
> Epoch Converter - http://www.epochconverter.com/
> 
> Thu, 27 Dec 2012 06:00:13 GMT - GMT Representation of the time
> Thu Dec 27 2012 00:00:13 GMT-6 - My Timezone representation
> 
> OK, given all of these representations... how do I get the Value (a valid
> epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just
> doing math? (Math is error-prone as systems move across time zones.) Why
> doesn't casting the value to timestamp, or even casting the int cast of the
> value to timestamp, work? Why does it read 1970? This is very frustrating
> and should be more intuitive. Please advise.
> 
> 


Re: Timestamp, Epoch Time, Functions and other Frustrations

2013-01-04 Thread John Omernik
One more test:

to_utc_timestamp(from_unixtime(value), 'CST') as to_from provided the
proper timestamp for me; however, I still had to provide the time zone,
which I should NOT have to do. I know that the data coming in is in epoch
time; therefore, I should be able to create a timestamp without knowing a
time zone or time zone offset.



On Fri, Jan 4, 2013 at 10:03 AM, John Omernik  wrote:

> Greetings all. I am getting frustrated with the documentation and lack of
> intuitiveness in Hive relating to timestamps and was hoping I could post
> here and get some clarification or other ideas.
>
> I have a field that is a string but is actually a 10-digit int
> representation of epoch time. I am going to list out the results of various
> functions.
>
> Value = 1356588013
>
> Hive:
>
> from_unixtime(Value) = 2012-12-27 00:00:13 (Timezone CST on the system
> time, so that works)
> cast(value as timestamp) = 1970-01-16 10:49:48.013
> cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
> from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013
>
>
> Epoch Converter - http://www.epochconverter.com/
>
> Thu, 27 Dec 2012 06:00:13 GMT - GMT Representation of the time
> Thu Dec 27 2012 00:00:13 GMT-6 - My Timezone representation
>
> OK, given all of these representations... how do I get the Value (a valid
> epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just
> doing math? (Math is error-prone as systems move across time zones.) Why
> doesn't casting the value to timestamp, or even casting the int cast of the
> value to timestamp, work? Why does it read 1970? This is very frustrating
> and should be more intuitive. Please advise.
>
>
>


Timestamp, Epoch Time, Functions and other Frustrations

2013-01-04 Thread John Omernik
Greetings all. I am getting frustrated with the documentation and lack of
intuitiveness in Hive relating to timestamps and was hoping I could post
here and get some clarification or other ideas.

I have a field that is a string but is actually a 10-digit int
representation of epoch time. I am going to list out the results of various
functions.

Value = 1356588013

Hive:

from_unixtime(Value) = 2012-12-27 00:00:13 (Timezone CST on the system
time, so that works)
cast(value as timestamp) = 1970-01-16 10:49:48.013
cast(cast(value as int) as timestamp) = 1970-01-16 10:49:48.013
from_utc_timestamp(starttime, 'GMT') = 1970-01-16 10:49:48.013
from_utc_timestamp(starttime, 'CST') = 1970-01-16 04:49:48.013


Epoch Converter - http://www.epochconverter.com/

Thu, 27 Dec 2012 06:00:13 GMT - GMT Representation of the time
Thu Dec 27 2012 00:00:13 GMT-6 - My Timezone representation

OK, given all of these representations... how do I get the Value (a valid
epoch time) into a GMT time, basically 2012-12-27 06:00:13, without just
doing math? (Math is error-prone as systems move across time zones.) Why
doesn't casting the value to timestamp, or even casting the int cast of the
value to timestamp, work? Why does it read 1970? This is very frustrating
and should be more intuitive. Please advise.
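
For what it's worth, those 1970 results are consistent with the integer
being read as milliseconds since the epoch (see HIVE-3454). A plain
java.sql.Timestamp check, nothing Hive-specific, shows the same pattern:

import java.sql.Timestamp;

public class EpochUnitsCheck {
    public static void main(String[] args) {
        long value = 1356588013L; // epoch seconds

        // Timestamp takes milliseconds, so passing seconds lands in
        // mid-January 1970, and the last three digits (013) surface
        // as the fractional .013 -- just like the cast results above
        System.out.println(new Timestamp(value));

        // scaled to milliseconds, it lands on 2012-12-27 (rendered in
        // the JVM's default time zone)
        System.out.println(new Timestamp(value * 1000L));
    }
}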


Re: Thrift Hive client for CDH 4.1 HiveServer2?

2013-01-04 Thread Jov
They are in src/service/if and src/metastore/if.

On 2013-1-4 at 7:16 AM, "David Morel" wrote:

> Hi all (and happy New Year!)
>
> Is it possible to build a Perl Thrift client for HiveServer2 (from
> Cloudera's 4.1.x)?
>
> I'm following the instructions found here:
> http://stackoverflow.com/questions/5289164/perl-thrift-client-to-hive
>
> I downloaded Hive from Cloudera's site, but then I'm a bit lost: where do I
> find the Thrift files that I need to build the Perl libs? I have the Thrift
> compiler working OK, but that's as far as I got.
>
> Any help would be most welcome
>
> Thanks!
>
> D.Morel
>


Re: Job counters limit exceeded exception

2013-01-04 Thread Krishna Rao
I ended up increasing the counters limit to 130, which solved my issue.

Do you know of any good sources to learn how to decipher Hive's EXPLAIN?

Cheers,

Krishna


On 2 January 2013 11:20, Alexander Alten-Lorenz  wrote:

> Hi,
>
> This happens when operators are used in queries (Hive operators). Hive
> creates 4 counters per operator, up to a max of 1000, plus a few additional
> counters for things like file reads/writes, partitions, and tables. Hence
> the number of counters required depends upon the query.
>
> Using "EXPLAIN EXTENDED" and "grep -ri operators | wc -l" print out the
> used numbers of operators. Use this value to tweak the MR settings
> carefully.
>
> Praveen has a good explanation 'bout counters online:
>
> http://www.thecloudavenue.com/2011/12/limiting-usage-counters-in-hadoop.html
>
> Rule of thumb for Hive:
> count of operators * 4 + n (n for file ops and other stuff).
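>
> For example, a query whose plan shows 30 operators needs roughly
> 30 * 4 + n = 120 + n counters, which already pushes past the default
> limit of 120 behind the "Too many counters: 121 max=120" error.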
>
> cheers,
>  Alex
>
>
> On Jan 2, 2013, at 10:35 AM, Krishna Rao  wrote:
>
> > A particular query that I run fails with the following error:
> >
> > ***
> > Job 18: Map: 2  Reduce: 1   Cumulative CPU: 3.67 sec   HDFS Read: 0 HDFS
> > Write: 0 SUCCESS
> > Exception in thread "main"
> > org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many
> > counters: 121 max=120
> > ...
> > ***
> >
> > Googling suggests that I should increase "mapreduce.job.counters.limit",
> > and that the number of counters a job uses has an effect on the memory
> > used by the JobTracker, so I shouldn't increase this number too high.
> >
> > Is there a rule of thumb for what this number should be as a function of
> > JobTracker memory? That is, should I be cautious and increase it by 5 at a
> > time, or could I just double it?
> >
> > Cheers,
> >
> > Krishna
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>