[ 
https://issues.apache.org/jira/browse/HIVE-27142?focusedWorklogId=851455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-851455
 ]

ASF GitHub Bot logged work on HIVE-27142:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Mar/23 05:57
            Start Date: 17/Mar/23 05:57
    Worklog Time Spent: 10m 
      Work Description: shameersss1 commented on PR #4120:
URL: https://github.com/apache/hive/pull/4120#issuecomment-1473178710

   @kasakrisz @kgyrtkirk Could you please review the changes?




Issue Time Tracking
-------------------

    Worklog Id:     (was: 851455)
    Time Spent: 40m  (was: 0.5h)

>  Map Join not working as expected when joining non-native tables with native 
> tables
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-27142
>                 URL: https://issues.apache.org/jira/browse/HIVE-27142
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: All Versions
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> *1. Issue :*
> When *_hive.auto.convert.join=true_* and a query joins a large non-native 
> Hive table with a small native Hive table, the map join is placed on the 
> wrong side, i.e. in the map tasks that process the small native Hive table, 
> so the large non-native table is the one that has to be held in memory. This 
> can lead to OOM when the non-native table is really large and only a few map 
> tasks are spawned to scan the small native Hive table.
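>
> One way to check which side was picked is to look at the query plan. The 
> sketch below is illustrative only: big_hbase_events (non-native) and 
> small_dim (native) are hypothetical table names that are not part of this 
> report, and the exact plan layout depends on the execution engine.
>
> ```sql
> -- Hypothetical tables: big_hbase_events is non-native, small_dim is native.
> SET hive.auto.convert.join=true;
>
> -- Inspect the plan and check which table feeds the Map Join Operator
> -- as the in-memory (small) side.
> EXPLAIN
> SELECT e.event_id, d.label
> FROM big_hbase_events e
> JOIN small_dim d ON e.dim_id = d.dim_id;
> ```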
>  
> *2. Why is this happening ?*
> This happens because stats for non-native Hive tables are collected/computed 
> improperly. The data of a non-native Hive table is actually stored in a 
> different location that Hive does not know of, and the temporary path that 
> is visible to Hive when the non-native table is created does not store the 
> actual data. The stats collection logic therefore tends to underestimate 
> the data size/row count, which causes the map join to happen on the wrong 
> side.
>  
> *3. Potential Solutions*
>  3.1 Set *_hive.auto.convert.join=false_*. This can have a negative impact 
> on the query if the same query does multiple joins, i.e. one join involving 
> non-native tables and another join where both tables are native.
>  3.2 Compute stats for the non-native table by running the ANALYZE TABLE <> 
> command before joining native and non-native tables. The user may or may 
> not choose to do it.
>  3.3 Do not collect/estimate stats for non-native Hive tables by default 
> (Preferred solution)
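>
> For reference, options 3.1 and 3.2 could look roughly like the following in 
> a session. This is a minimal sketch: big_hbase_events is a hypothetical 
> non-native table name, and whether ANALYZE TABLE yields accurate stats for 
> a given storage handler is an assumption that should be verified.
>
> ```sql
> -- 3.1: disable automatic map-join conversion for the current session.
> -- Note that this also affects joins where both tables are native.
> SET hive.auto.convert.join=false;
>
> -- 3.2: keep auto conversion on, but compute stats for the non-native
> -- table before running the join (hypothetical table name).
> SET hive.auto.convert.join=true;
> ANALYZE TABLE big_hbase_events COMPUTE STATISTICS;
> ```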



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
