We can add that "strict typing" option to pig.properties and have the type checking validation read that config key. By default, however, I want it turned on.
Pi

On 5/15/08, Alan Gates <[EMAIL PROTECTED]> wrote:
>
> I agree this will be somewhat surprising, perhaps we should give a warning.
> But we need to preserve our philosophy that "Pig's eat anything". This
> would seem to dictate that we allow people to use union regardless of the
> schemas. One open question in my mind is whether we have a "strict mode"
> (similar to 'use strict' in perl) where things like this cause errors
> instead of (possibly) warnings.
>
> Alan.
>
> pi song wrote:
>
>> Alan,
>>
>> On my second thought, union of two incompatible data streams can cause
>> undefined state in downstream operators, resulting in a mix of good output
>> and garbage. This seems to break the rule of least surprise. What do you
>> think?
>>
>> Pi
>>
>> On Wed, May 14, 2008 at 9:06 AM, pi song <[EMAIL PROTECTED]> wrote:
>>
>>> Ok, will follow that.
>>>
>>> On 5/14/08, Alan Gates <[EMAIL PROTECTED]> wrote:
>>>
>>>> I agree that option 3 is the correct course.
>>>>
>>>> One note, you say:
>>>>
>>>> In case that schemas from all the input ports are not compatible, no
>>>> problem because we won't process it.
>>>>
>>>> How do you mean "won't process it"? We still have to allow a union
>>>> operation between two non-compatible inputs (otherwise we can only use
>>>> union when we have schemas). But the resulting union will not have a
>>>> schema (since the output no longer has a consistent schema).
>>>>
>>>> Alan.
>>>>
>>>> pi song wrote:
>>>>
>>>>> Union is an example of bag (relational) operators that can have more
>>>>> than one input.
>>>>>
>>>>> In case that schemas from all the input ports are the same, no problem.
>>>>> In case that schemas from all the input ports are not compatible, no
>>>>> problem because we won't process it.
>>>>> In case that schemas from all the input ports are not the same, but
>>>>> compatible, here comes a problem.
>>>>>
>>>>> Example:
>>>>>
>>>>> C = UNION A,B ;
>>>>>
>>>>> Schema(A) = < Int, Chararray >
>>>>> Schema(B) = < Double, Chararray >
>>>>>
>>>>> The output schema will get resolved to < Double, Chararray >. Here is
>>>>> the problem. The Union operator at the moment doesn't support casting
>>>>> in any layer. In this case if we don't cast it, the binary data of Int
>>>>> will get picked up as Double by the downstream operator!! There are a
>>>>> couple solutions for this:-
>>>>>
>>>>> 1) Implement LOUnion and POUnion to support type casting internally
>>>>> 2) Add casting support in LOUnion operator and let the
>>>>>    LogicalToPhysical compiler generates LOForeach for it.
>>>>> 3) Explicitly insert LOForEach to do necessary casting between Union
>>>>>    and the problematic input. This is analogous to the way we implement
>>>>>    implicit casting for expression operators.
>>>>> 4) Don't support "not same but compatible" case at all.
>>>>>
>>>>> I will do (3) because it makes the most sense to me plus incurs the
>>>>> least impact on other modules. Does anyone have problem with it?
>>>>>
>>>>> Pi
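
For illustration only, the three schema cases discussed in this thread (same, compatible, incompatible) and the effect of a strict mode can be sketched in a few lines of Python. This is not Pig's implementation: the function names, the `strict` flag, and the numeric widening order below are all assumptions made for the sketch.

```python
from typing import Dict, List, Optional, Tuple

# Assumed widening order for this sketch, narrowest to widest.
NUMERIC_ORDER = ["int", "long", "float", "double"]


def merge_type(a: str, b: str) -> Optional[str]:
    """Return the common type of a and b, or None if they are incompatible."""
    if a == b:
        return a
    if a in NUMERIC_ORDER and b in NUMERIC_ORDER:
        # Pick whichever type sits later in the widening order.
        return max(a, b, key=NUMERIC_ORDER.index)
    return None


def plan_union(
    schemas: List[List[str]], strict: bool = False
) -> Tuple[Optional[List[str]], List[Dict[int, str]]]:
    """Resolve the union's output schema and the casts each input needs.

    casts[i] maps a column position of input i to the type it must be cast
    to. For incompatible inputs the output schema is None (no schema), or a
    TypeError is raised when the hypothetical strict flag is set.
    """
    merged: Optional[List[str]] = None
    if len({len(s) for s in schemas}) == 1:
        merged = []
        for column in zip(*schemas):
            common: Optional[str] = column[0]
            for t in column[1:]:
                common = merge_type(common, t) if common is not None else None
            if common is None:
                merged = None
                break
            merged.append(common)

    if merged is None:
        if strict:
            raise TypeError("union inputs have incompatible schemas")
        return None, [{} for _ in schemas]

    # Only inputs whose column type differs from the merged type need a cast.
    casts = [
        {i: target for i, (t, target) in enumerate(zip(s, merged)) if t != target}
        for s in schemas
    ]
    return merged, casts
```

With the schemas from the example above, only the first column of A receives a cast to double and B is left untouched, which corresponds to inserting a casting step only in front of the problematic input as in option 3.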
