RE: two-level access problem?
The twoLevelAccessRequired flag is not a long-term solution to the problem. The problem is that we treat the output of relations as bags, but their schemas do NOT have twoLevelAccessRequired set to true. Only bag constants and bags from input data have this flag set to true.

We need to either move to *all* bag schemas having a tuple schema that holds the real schema and reflects the layout of the bag, or think of an alternative. Implementing the solution may involve many more details that will need to be looked at. Once we arrive at a solution, this flag should be removed and should no longer be needed. Otherwise Resource Schema would also need to have this notion of two-level access for bag fields.

Pradeep

-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
Sent: Tuesday, November 03, 2009 12:30 PM
To: pig-dev@hadoop.apache.org
Subject: Re: two-level access problem?

Thanks Pradeep, I saw that comment.
I guess my question is, given the solution this comment describes, what are you referring to in the Load/Store redesign doc when you say "we must fix the two level access issues with schema of bags in current schema before we make these changes, otherwise that same contagion will afflict us here"?

-D
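To make the first option in Pradeep's reply concrete, here is a minimal sketch of what "all bag schemas having a tuple schema" could look like. The names UniformBagSchemaSketch, TupleSchema, BagSchema and positionOf are hypothetical and are not Pig classes or methods; the sketch only assumes that a relation's output would be modeled like any other bag.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; these are NOT Pig classes.
final class UniformBagSchemaSketch {

    /** Ordered field aliases of one tuple. */
    static final class TupleSchema {
        final List<String> aliases;

        TupleSchema(String... aliases) {
            this.aliases = Arrays.asList(aliases);
        }
    }

    /** Every bag, including a relation's output, wraps exactly one tuple schema. */
    static final class BagSchema {
        final TupleSchema tuple;

        BagSchema(TupleSchema tuple) {
            this.tuple = tuple;
        }

        /** Resolves "bag.alias" the same way for every bag, so no flag is needed. */
        int positionOf(String alias) {
            return tuple.aliases.indexOf(alias);
        }
    }

    public static void main(String[] args) {
        BagSchema relationOutput = new BagSchema(new TupleSchema("i", "j"));
        BagSchema loadedBag = new BagSchema(new TupleSchema("i", "j"));
        System.out.println(relationOutput.positionOf("j")); // prints 1
        System.out.println(loadedBag.positionOf("j"));      // prints 1
    }
}
```

Because both kinds of bag share one layout in this sketch, the lookup has a single code path, which is the property that would let the flag be dropped.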
RE: two-level access problem?
From comments in Schema.java:

// In bags which have a schema with a tuple which contains
// the fields present in it, if we access the second field (say)
// we are actually trying to access the second field in the
// tuple in the bag. This is currently true for two cases:
// 1) bag constants - the schema of bag constant has a tuple
// which internally has the actual elements
// 2) When bags are loaded from input data, if the user
// specifies a schema with the "bag" type, he has to specify
// the bag as containing a tuple with the actual elements in
// the schema declaration. However in both the cases above,
// the user can still say b.i where b is the bag and i is
// an element in the bag's tuple schema. So in these cases,
// the access should translate to a lookup for "i" in the
// tuple schema present in the bag. To indicate this, the
// flag below is used. It is false by default because,
// currently we use bag as the type for relations. However
// the schema of a relation does NOT have a tuple fieldschema
// with items in it. Instead, the schema directly has the
// field schema of the items. So for a relation "b", the
// above b.i access would be a direct single level access
// of i in b's schema. This is treated as the "default" case
private boolean twoLevelAccessRequired = false;

-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
Sent: Monday, November 02, 2009 5:33 PM
To: pig-dev@hadoop.apache.org
Subject: two-level access problem?

Could someone explain the nature of the "two-level access problem" referred to in the Load/Store redesign wiki and in the DataType code?

Thanks,
-D
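A rough illustration of the behavior the Schema.java comment describes, for anyone following along. This is a sketch under assumptions: TwoLevelAccessSketch, Field, indexOf and resolve are hypothetical names invented here and do not reproduce Pig's actual Schema code. It only shows why the same b.i access needs one lookup level for a relation and two for a bag constant or a loaded bag.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT Pig's real Schema classes.
final class TwoLevelAccessSketch {

    /** A named field that may carry a nested schema (for bags and tuples). */
    static final class Field {
        final String alias;
        final List<Field> inner;               // empty for simple fields
        final boolean twoLevelAccessRequired;  // mirrors the flag in the comment

        Field(String alias, boolean twoLevelAccessRequired, Field... inner) {
            this.alias = alias;
            this.twoLevelAccessRequired = twoLevelAccessRequired;
            this.inner = Arrays.asList(inner);
        }
    }

    /** Returns the position of alias within fields, or -1 if absent. */
    static int indexOf(List<Field> fields, String alias) {
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i).alias.equals(alias)) {
                return i;
            }
        }
        return -1;
    }

    /** Resolves "b.alias": one level for relations, two levels when the flag is set. */
    static int resolve(Field b, String alias) {
        if (b.twoLevelAccessRequired) {
            // Bag constant or loaded bag: the schema holds a single tuple field,
            // and the real fields live inside that tuple.
            Field tuple = b.inner.get(0);
            return indexOf(tuple.inner, alias);
        }
        // Relation: the item fields sit directly in the schema.
        return indexOf(b.inner, alias);
    }

    public static void main(String[] args) {
        Field i = new Field("i", false);
        Field j = new Field("j", false);
        Field relation = new Field("b", false, i, j);
        Field loadedBag = new Field("b", true, new Field("t", false, i, j));
        Field unflaggedBag = new Field("b", false, new Field("t", false, i, j));

        System.out.println(resolve(relation, "j"));     // prints 1
        System.out.println(resolve(loadedBag, "j"));    // prints 1
        System.out.println(resolve(unflaggedBag, "j")); // prints -1
    }
}
```

The last call shows the failure mode in this toy model: a bag laid out with an inner tuple but without the flag is searched at the wrong level, so the field is not found.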
Re: two-level access problem?
Thanks Pradeep, I saw that comment.
I guess my question is, given the solution this comment describes, what are you referring to in the Load/Store redesign doc when you say "we must fix the two level access issues with schema of bags in current schema before we make these changes, otherwise that same contagion will afflict us here"?

-D

On Tue, Nov 3, 2009 at 2:10 PM, Pradeep Kamath wrote:
> From comments in Schema.java:
>
> // In bags which have a schema with a tuple which contains
> // the fields present in it, if we access the second field (say)
> // we are actually trying to access the second field in the
> // tuple in the bag. This is currently true for two cases:
> // 1) bag constants - the schema of bag constant has a tuple
> // which internally has the actual elements
> // 2) When bags are loaded from input data, if the user
> // specifies a schema with the "bag" type, he has to specify
> // the bag as containing a tuple with the actual elements in
> // the schema declaration. However in both the cases above,
> // the user can still say b.i where b is the bag and i is
> // an element in the bag's tuple schema. So in these cases,
> // the access should translate to a lookup for "i" in the
> // tuple schema present in the bag. To indicate this, the
> // flag below is used. It is false by default because,
> // currently we use bag as the type for relations. However
> // the schema of a relation does NOT have a tuple fieldschema
> // with items in it. Instead, the schema directly has the
> // field schema of the items. So for a relation "b", the
> // above b.i access would be a direct single level access
> // of i in b's schema. This is treated as the "default" case
> private boolean twoLevelAccessRequired = false;