Re: New to hive... slow query performance

Edward Capriolo Thu, 23 Sep 2010 07:10:16 -0700

On Thu, Sep 23, 2010 at 4:25 AM, Sharma, Raghvendra
<sraghven...@corelogic.com> wrote:
> Hi,
>
> I am very new to Hive and have just been able to load some data into it.
>
> I am running Hadoop on an old Pentium 4 box with 4 GB RAM.
> It’s a single-node cluster, configured based on tutorials from the Apache
> site and others.
>
> The load speeds to HDFS look OK; I am able to load approx. 20 million rows
> in around 2 minutes.
>
> However, the querying is pathetic. A query with a single WHERE clause takes
> minutes to come back, a simple count(*) sends it to sleep, and a join takes
> hours.
>
> Since I am just starting, I am not using anything fancy such as clustering
> or partitioning of the table. (Is that the wrong choice? I thought I’d start
> simple.)
>
> Somehow I have a feeling that it has something to do with the wrong kind of
> configuration.
>
> Can someone help me?
>
> PS: Are there no “forums” for Hadoop/Hive? I couldn’t find any. :(
>
> --raghav


http://wiki.apache.org/hadoop/Hive

....
Hadoop is a batch processing system and Hadoop jobs tend to have high
latency and incur substantial overheads in job submission and
scheduling. As a result, latency for Hive queries is generally very
high (minutes) even when the data sets involved are very small (say a few
hundred megabytes). As a result it cannot be compared with systems
such as Oracle where analyses are conducted on a significantly smaller
amount of data but the analyses proceed much more iteratively with the
response times between iterations being less than a few minutes. Hive
aims to provide acceptable (but not optimal) latency for interactive
data browsing, queries over small data sets or test queries. Hive also
does not provide any sort of data or query cache to make repeated queries
over the same data set faster.
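To make the overhead argument concrete, here is a toy Python cost model. The default numbers are hypothetical placeholders, not measurements of any cluster. `job_overhead_s` stands in for the fixed cost of submitting and scheduling one MapReduce job, and `scan_mb_per_s` stands in for how fast a job reads data. The model only illustrates that a fixed per-job cost dominates when the data is small.

```python
# Toy cost model; all default values are hypothetical, not measured.

def estimated_query_seconds(data_mb: float,
                            num_jobs: int = 1,
                            job_overhead_s: float = 30.0,
                            scan_mb_per_s: float = 50.0) -> float:
    """Rough wall-clock estimate: a fixed submit/schedule cost per MapReduce
    job plus time proportional to the amount of data scanned."""
    return num_jobs * job_overhead_s + data_mb / scan_mb_per_s


def overhead_fraction(data_mb: float,
                      num_jobs: int = 1,
                      job_overhead_s: float = 30.0,
                      scan_mb_per_s: float = 50.0) -> float:
    """Share of the estimated time spent on job startup rather than on data."""
    total = estimated_query_seconds(data_mb, num_jobs, job_overhead_s, scan_mb_per_s)
    return num_jobs * job_overhead_s / total


if __name__ == "__main__":
    for size_mb in (100, 1_000, 100_000):
        secs = estimated_query_seconds(size_mb)
        share = overhead_fraction(size_mb)
        print(f"{size_mb:>7} MB: ~{secs:.0f} s, {share:.0%} of it job overhead")
```

Under these assumed numbers, a few hundred megabytes costs barely more than an empty job. A query that compiles to several jobs (as joins often do) multiplies the fixed term.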

If you do not have a lot of data, you probably do not need Hive. Hive
performs differently than main-memory databases due to its map/reduce
architecture.
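The map/reduce point can be sketched with a toy simulation. It is plain Python, not Hive or Hadoop code, and the function names are made up for illustration. A `SELECT COUNT(*)` with no `GROUP BY` is typically run as one MapReduce job: each mapper counts the rows in its input split, and a single reducer adds up the partial counts. On a cluster each phase is scheduled as separate tasks, so even this simple query pays the job startup cost.

```python
from typing import Iterable, Iterator, List, Tuple


def map_count(split: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: count the rows in one input split and emit one partial count."""
    n = 0
    for _row in split:
        n += 1
    yield ("count", n)


def reduce_sum(pairs: Iterable[Tuple[str, int]]) -> int:
    """Single reducer: add up the partial counts from every mapper."""
    return sum(value for _key, value in pairs)


def count_star(splits: List[List[str]]) -> int:
    """Simulate COUNT(*): run one mapper per split, then one reducer."""
    shuffled: List[Tuple[str, int]] = []
    for split in splits:          # on a cluster, each mapper is a separate task
        shuffled.extend(map_count(split))
    return reduce_sum(shuffled)   # a global aggregate goes to a single reducer


if __name__ == "__main__":
    table = [["r1", "r2", "r3"], ["r4", "r5"], []]  # three hypothetical splits
    print(count_star(table))
```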
