Re: Map join optimization issue

2013-02-15 Thread Mayuresh Kunjir
I am on 0.9.

If I have a selective filter on the small table, does Hive try to estimate
the filtered data size before deciding on the join algorithm? If so, it
would make sense to use a map join even when the small table (before the
filter) is larger than the hive.mapjoin.smalltable.filesize parameter. Any ideas?

~Mayuresh
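
A possible way to sidestep the question, sketched under assumptions rather
than confirmed anywhere in this thread: materialize the filtered rows into
their own table first, so that the size Hive sees on disk already reflects
the filter. The table and column names (big_t, small_t, small_filtered, k,
v, w, region) are hypothetical, and whether the filtered table actually
falls below the threshold depends on the data.

```sql
-- Hypothetical tables: big_t(k, v) and small_t(k, region, w).
-- Step 1: write only the rows that survive the filter to a new table.
CREATE TABLE small_filtered AS
SELECT k, w
FROM small_t
WHERE region = 'EU';

-- Step 2: let Hive choose the join strategy for the smaller table.
SET hive.auto.convert.join=true;

SELECT b.k, b.v, s.w
FROM big_t b
JOIN small_filtered s ON (b.k = s.k);
```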





Re: Map join optimization issue

2013-02-15 Thread Aniket Mokashi
I have tested that the hive.mapjoin.smalltable.filesize parameter works
well with 0.8. What version of Hive are you on?




-- 
"...:::Aniket:::... Quetzalco@tl"


Re: Map join optimization issue

2013-02-15 Thread bejoy_ks
Hi 

In later versions of Hive you don't actually need a map join hint in your
query. Setting the following should suffice:

Set hive.auto.convert.join=true 

Regards 
Bejoy KS
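
As a minimal sketch of the setup described above: the two settings are the
ones named in this thread, while the table and column names (big_t,
small_t, k, v, w) are hypothetical. The threshold is given in bytes;
25000000 is the documented default and is shown only for illustration.

```sql
-- Enable automatic conversion of common joins to map joins.
SET hive.auto.convert.join=true;
-- Size threshold for the small table, in bytes.
SET hive.mapjoin.smalltable.filesize=25000000;

-- No MAPJOIN hint in the query itself.
SELECT b.k, b.v, s.w
FROM big_t b
JOIN small_t s ON (b.k = s.k);
```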




Re: Map join optimization issue

2013-02-15 Thread Mayuresh Kunjir
Thanks Aniket. I actually had not specified the map-join hint though. Sorry
for providing the wrong information earlier. I had only
set hive.auto.convert.join=true before firing my join query.

~Mayuresh





Re: Map join optimization issue

2013-02-14 Thread Aniket Mokashi
I think the hive.mapjoin.smalltable.filesize parameter will be disregarded
in that case.




-- 
"...:::Aniket:::... Quetzalco@tl"


Re: Map join optimization issue

2013-02-14 Thread Mayuresh Kunjir
Yes, the hint was specified.


Re: Map join optimization issue

2013-02-14 Thread Aniket Mokashi
Have you specified a map-join hint in your query?
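
For reference, a map-join hint is written as a comment directly after
SELECT and names the alias of the table to hold in memory. The tables and
columns below (big_t, small_t, k, v, w) are hypothetical and only
illustrate the syntax.

```sql
-- The hint asks Hive to load alias s into memory for the join.
SELECT /*+ MAPJOIN(s) */ b.k, b.v, s.w
FROM big_t b
JOIN small_t s ON (b.k = s.k);
```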




-- 
"...:::Aniket:::... Quetzalco@tl"


Fwd: Map join optimization issue

2013-02-07 Thread Mayuresh Kunjir
Hello all,


I am trying to join two tables, the smaller being 4GB in size. When I set
the hive.mapjoin.smalltable.filesize parameter above 500MB, Hive tries to
perform a local task to read the smaller file. This of course fails since
the file size is greater, and the backup common join is then run. What I do
not understand is why Hive attempted a map join when the small file's size
was greater than the smalltable.filesize parameter.


~Mayuresh
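
One way to check which join strategy Hive plans before running the query
is to prefix it with EXPLAIN. This is only a sketch of the scenario
described above: the table and column names are hypothetical, the
threshold value is an illustrative figure above 500MB, and
hive.auto.convert.join=true is taken from later in the thread. The exact
plan output varies by Hive version.

```sql
SET hive.auto.convert.join=true;
-- Illustrative threshold in bytes (about 600MB); not a recommended value.
SET hive.mapjoin.smalltable.filesize=600000000;

-- Print the plan instead of running the join.
EXPLAIN
SELECT b.k, b.v, s.w
FROM big_t b
JOIN small_t s ON (b.k = s.k);
```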