My issue relates to data types in schemas and compile-time vs run-time type checking.
http://pig.apache.org/docs/r0.14.0/basic.html#schemas says: "we encourage you to use them whenever possible; type declarations result in better parse-time error checking and more efficient code execution." While trying to follow this sound advice, I stumbled upon some puzzling behavior, and it bit me. I'm hoping the Pig gods can shed some light.

The following creates a single tuple with one chararray field whose value is 'abc':

    grunt> a = LOAD 'one-line-file.txt' AS (s:chararray);
    grunt> b = FOREACH a GENERATE 'abc' AS s:chararray;
    grunt> DESCRIBE b;
    b: {s: chararray}
    grunt> DUMP b;
    (abc)

Now let's provide a schema that declares a different type. Note that I am not performing an explicit cast, just changing the declared type:

    grunt> c = FOREACH b GENERATE s AS m:long;

I fully expected this to fail at compile time, since I am taking a chararray and declaring that it is now a long. It didn't:

    grunt> DESCRIBE c;
    c: {m: long}

Since it didn't fail at compile time, I expected it either to:

1) fail at runtime, or
2) perform a cast.

Yet I get:

    grunt> DUMP c;
    (abc)

So it complained neither at compile time nor at runtime. If I then apply an explicit cast, the system thinks that c.m is already a long and doesn't modify the value:

    grunt> d = FOREACH c GENERATE (long)m AS n;
    grunt> DESCRIBE d;
    d: {n: long}
    grunt> DUMP d;
    (abc)

This looks like a bug to me. It seems to me that the 'conversion' from chararray => long without an explicit cast should generate a compile-time error; otherwise, it should be interpreted as an implicit cast. In either case, the value 'abc' should not be allowed to 'pass' as a long integer value.

I ran into this issue because I am trying to auto-generate some Pig code. I thought it would be best to fully specify schemas and types during FOREACH/GENERATE projections, because that would give me more compile-time type safety. It now looks to me like this would be a mistake, because erroneous type declarations might make things even worse.
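For comparison, one defensive pattern for generated code would be to put the explicit cast at the point of conversion, directly on the field that really is a chararray, instead of relying on the AS type alone. This is only a sketch, and it rests on my assumption that Pig's chararray-to-long cast turns a non-numeric value such as 'abc' into a null (with a conversion warning) rather than passing it through; the relation name e is just illustrative.

```pig
-- Sketch (assumption): cast explicitly while the field is still a chararray,
-- rather than only re-declaring its type in the AS clause.
a = LOAD 'one-line-file.txt' AS (s:chararray);
b = FOREACH a GENERATE 'abc' AS s:chararray;

-- Explicit chararray -> long cast, then declare the (now matching) type.
-- 'abc' is not numeric, so under the assumption above m should be null here.
e = FOREACH b GENERATE (long)s AS m:long;

DESCRIBE e;
DUMP e;
```

If that assumption holds, the bad value at least surfaces as a null that downstream code can detect, instead of 'abc' silently masquerading as a long.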
Thoughts/advice appreciated. Thanks for your great work.

Michael
