Rajesh Balamohan created HCATALOG-399:
-----------------------------------------
Summary: Hcatalog + Pig (Filter not used during skewed joins)
Key: HCATALOG-399
URL: https://issues.apache.org/jira/browse/HCATALOG-399
Project: HCatalog
Issue Type: Bug
Environment: Pig 0.9.0
HCatalog 0.4.0
Linux
Reporter: Rajesh Balamohan
Pig 0.9.0
HCatalog 0.4.0
Hadoop 0.20.20x
dim_referrer = LOAD 'tableA' USING org.apache.hcatalog.pig.HCatLoader();
source_data = LOAD 'tableB' USING org.apache.hcatalog.pig.HCatLoader();
source_data_new = FILTER source_data BY d == '20120415';
joined_data_referrer = JOIN source_data_new BY referrer LEFT OUTER,
    dim_referrer BY referrer_url USING 'skewed';
DUMP joined_data_referrer;
In this case, all records are scanned; HCatalog does not apply the filter.
Shouldn't the filter be applied first, before the sampling M/R job that the
"skewed" join requires?
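
One way to narrow this down might be to compare the plan Pig generates for the
same script with a default join. This is a minimal sketch, not a confirmed
diagnosis: it reuses the tables and aliases from the script above,
`joined_default` is a hypothetical alias introduced only for this comparison,
and it assumes `d` is a partition column of tableB, which this report does not
state.

```pig
-- Same loads and filter as in the report.
dim_referrer = LOAD 'tableA' USING org.apache.hcatalog.pig.HCatLoader();
source_data = LOAD 'tableB' USING org.apache.hcatalog.pig.HCatLoader();
source_data_new = FILTER source_data BY d == '20120415';

-- Default (non-skewed) join; joined_default is a hypothetical alias.
joined_default = JOIN source_data_new BY referrer LEFT OUTER,
    dim_referrer BY referrer_url;

-- EXPLAIN prints the plans without running the job, so the position
-- of the filter relative to the load can be inspected.
EXPLAIN joined_default;
```

Running EXPLAIN on joined_data_referrer from the original script as well and
comparing the two outputs should show whether the filter is handled differently
when the 'skewed' join is used.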