Rajesh Balamohan created HCATALOG-399:
-----------------------------------------

             Summary: HCatalog + Pig (Filter not used during skewed joins)
                 Key: HCATALOG-399
                 URL: https://issues.apache.org/jira/browse/HCATALOG-399
             Project: HCatalog
          Issue Type: Bug
         Environment: Pig 0.9.0, HCatalog 0.4.0, Linux
            Reporter: Rajesh Balamohan


Pig 0.9.0
HCatalog 0.4.0
Hadoop 0.20.20x

dim_referrer = LOAD 'tableA' USING org.apache.hcatalog.pig.HCatLoader();
source_data = LOAD 'tableB' USING org.apache.hcatalog.pig.HCatLoader();
source_data_new = FILTER source_data BY d == '20120415';
joined_data_referrer = JOIN source_data_new BY referrer LEFT OUTER,
                            dim_referrer BY referrer_url USING 'skewed';
DUMP joined_data_referrer;

In this case, all records are scanned: HCatalog does not apply the filter
on d when the join is 'skewed'.

Shouldn't the filter be applied first, before the sampling M/R job that the
"skewed" join requires?
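
To make the expected ordering concrete, here is a minimal Python sketch. It is
not HCatalog or Pig code. It assumes that d is a partition column of tableB
(the report does not say so explicitly), and the names scan, sample_keys and
table_b are hypothetical. It only contrasts how many records are read when the
filter is applied after a full scan with how many are read when it is applied
before the sampling step.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str]  # (d, referrer); hypothetical two-column layout


def scan(partitions: Dict[str, List[Row]],
         wanted_d: Optional[str] = None) -> Tuple[List[Row], int]:
    """Return (rows, rows_read).

    With wanted_d set, only the matching partition is opened, which models
    the filter being handed to the loader. Without it, every partition is read.
    """
    if wanted_d is not None:
        rows = list(partitions.get(wanted_d, []))
        return rows, len(rows)
    rows = [row for part in partitions.values() for row in part]
    return rows, len(rows)


def sample_keys(rows: List[Row], every: int = 2) -> List[str]:
    """Toy stand-in for the skewed-join sampling job: take every n-th join key."""
    return [referrer for _, referrer in rows[::every]]


# Made-up data, keyed by the assumed partition column d.
table_b = {
    "20120414": [("20120414", "a.com"), ("20120414", "b.com")],
    "20120415": [("20120415", "a.com"), ("20120415", "a.com"),
                 ("20120415", "c.com")],
    "20120416": [("20120416", "d.com")],
}

# Reported behaviour: everything is scanned, the filter runs afterwards.
all_rows, read_without_pushdown = scan(table_b)
filtered_late = [row for row in all_rows if row[0] == "20120415"]

# Expected behaviour: the filter is applied first, then the sampler runs.
filtered_early, read_with_pushdown = scan(table_b, wanted_d="20120415")
sampled = sample_keys(filtered_early)
```

Both paths end with the same filtered rows. The only difference in this model
is the number of records read before the sampling step, which is the cost the
report is asking about.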

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
