Sergey created PIG-3625:
---------------------------
Summary: Strange schema transformation and API path to get nested
objects
Key: PIG-3625
URL: https://issues.apache.org/jira/browse/PIG-3625
Project: Pig
Issue Type: Bug
Affects Versions: 0.10.1
Environment: CDH 4.4
Reporter: Sergey
Hi, here is a part of my script:
{code}
describe groupedNotMatchedSaleItems;
/*
groupedNotMatchedSaleItems: {
group: long,
notMatchedSaleItems: {(sale_id: long,sale_item_id: long)}
}
*/
describe groupedFlatSales;
/*
groupedFlatSales: {
group: long,
flatSales:
{(npl_id: long,block_id: int,is_napoleon: int,rec_cnt: int,recs:
chararray,item_id: long,shop_id: int,internal_id: int,catalog_category_id:
long,sale_item_id: long,sale_id: long,price: int,count: int)}
}
*/
describe projectedRecsOf2ndLevel;
/*
projectedRecsOf2ndLevel: {sale_id: long,sale_item_id: long,npl_id:
long,recs: chararray}
*/
cogroupedSalesNotMatched = COGROUP groupedFlatSales by group,
groupedNotMatchedSaleItems by group,
projectedRecsOf2ndLevel by sale_id;
describe cogroupedSalesNotMatched;
/*
cogroupedSalesNotMatched: {
group: long,
groupedFlatSales: {
(
group: long,
flatSales:
{(npl_id: long,block_id: int,is_napoleon: int,rec_cnt: int,recs:
chararray,item_id: long,shop_id: int,internal_id: int,catalog_category_id:
long,sale_item_id: long,sale_id: long,price: int,count: int)}
)
},
groupedNotMatchedSaleItems: {
(
group: long,
notMatchedSaleItems: {(sale_id: long,sale_item_id: long)}
)
},
projectedRecsOf2ndLevel: {
(sale_id: long,sale_item_id: long,npl_id: long,recs: chararray)
}
}
*/
secondLevelRecommendations = FOREACH cogroupedSalesNotMatched{
GENERATE
NplRecSecondLevelMatcher(groupedNotMatchedSaleItems.notMatchedSaleItems,
groupedFlatSales.flatSales,
projectedRecsOf2ndLevel);
}
{code}
NplRecSecondLevelMatcher is a Java UDF.
The input schema inside the UDF is:
{code}
{
{
(
notMatchedSaleItems:{(sale_id: long,sale_item_id: long)}
)
},
{
(
flatSales:{(npl_id: long,block_id: int,is_napoleon:
int,rec_cnt: int,recs: chararray,item_id: long,shop_id: int,internal_id:
int,catalog_category_id: long,sale_item_id: long,sale_id: long,price:
int,count: int)}
)
},
projectedRecsOf2ndLevel: {(sale_id: long,sale_item_id:
long,npl_id: long,recs: chararray)}
}
{code}
Why is the schema so strange for notMatchedSaleItems and flatSales?
I have to write this strange code to get access to the notMatchedSaleItems bag:
{code}
/**
It's Groovy.
@param input is the input tuple for the UDF
@param bagName is a bag name in the schema. The DataFu lib is used.
*/
def getInputBag(Tuple input, String bagName){
def bag = getBag(input, bagName)
(bag.iterator().next() as Tuple).get(0) as DataBag
}
{code}
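To make the nesting concrete, here is a minimal Java sketch of the shape the UDF seems to receive and of the unwrapping it forces. It does not use any Pig classes: a tuple is modelled as a List<Object> and a bag as a List of such tuples. The class name NestedBagSketch and the helpers project and unwrap are made up for illustration, and project only reflects my assumption (consistent with the schema above) that groupedNotMatchedSaleItems.notMatchedSaleItems yields one single-field tuple per tuple of the outer bag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model, not Pig API: tuple = List<Object>, bag = List<List<Object>>.
public class NestedBagSketch {

    // Assumed behaviour of a projection like relation.field on a bag:
    // every tuple of the outer bag becomes a tuple holding only that field.
    public static List<List<Object>> project(List<List<Object>> bag, int fieldIndex) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> tuple : bag) {
            out.add(Collections.singletonList(tuple.get(fieldIndex)));
        }
        return out;
    }

    // Same idea as the Groovy helper: take the first tuple of the
    // projected bag and return its only field, which is the inner bag.
    @SuppressWarnings("unchecked")
    public static List<List<Object>> unwrap(List<List<Object>> projected) {
        return (List<List<Object>>) projected.get(0).get(0);
    }

    public static void main(String[] args) {
        // Inner bag: {(sale_id, sale_item_id)}
        List<List<Object>> notMatched = Arrays.asList(
                Arrays.<Object>asList(1L, 10L),
                Arrays.<Object>asList(1L, 11L));

        // Outer bag inside the COGROUP result: {(group, notMatchedSaleItems)}
        List<List<Object>> grouped = Arrays.asList(
                Arrays.<Object>asList(1L, notMatched));

        // What the UDF argument looks like: {(notMatchedSaleItems: {...})}
        List<List<Object>> udfArg = project(grouped, 1);
        System.out.println("UDF argument: " + udfArg);
        System.out.println("Unwrapped:    " + unwrap(udfArg));
    }
}
```

In this model the extra bag-and-tuple layer comes from the projection step, not from the UDF call itself, which is why the first element has to be unwrapped before the inner bag is reachable.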
I expected that
{code}
(DataBag)udfInputTuple.get(0)
{code}
would return the bag with "notMatchedSaleItems".
Why is my input wrapped in these bags and tuples?