Subtle point. I can provide a schema with Parquet, as you note. (Actually, for Parquet, Drill is schema-required: I can't not provide a schema due to the nature of Parquet...)
But, I can't provide a schema for JSON, CSV, etc. The point is, Drill forbids the user from providing a schema; only the file format itself can provide the schema (or not, in the case of JSON). This is the very heart of the problem.

The root cause of our schema change exception is that vectors are, indeed, strongly typed. But, file columns are not. Here is my favorite:

{x: 10}
{x: 10.1}

Blam! Query fails because the vector is chosen as BigInt, then we discover it really should have been Float8. (If the answer is: go back and rebuild the vector with the new type, consider the case that 100K records separate the two above so that the first batch is long gone by the time we see the offending record.) If only I could tell Drill to use Float8 (or Decimal) up front...

Views won't help here because the failure occurs before a view can kick in. However, presumably, I could write a view to handle a different classic case:

myDir /
|- File 1: {a: 10, b: "foo"}
|- File 2: {a: 20}

With query:

SELECT a, b FROM myDir

For File 2, Drill will guess that b is a Nullable Int, but it is really VarChar. I think I could write clever SQL that says:

If b is of type Nullable Int, return NULL cast to nullable VarChar, else return b

The irony is that I must write procedural code to declare a static attribute of the data. Yet SQL is otherwise declarative: I state what I want, not how to implement it. Life would be so much easier if I could just say, "trust me, when you read column b, it is a VarChar."

Thanks,
- Paul

On Tuesday, April 3, 2018, 10:53:27 AM PDT, Ted Dunning <ted.dunn...@gmail.com> wrote:

I don't see why you say that Drill is schema-forbidden. The Parquet reader, for instance, makes strong use of the implied schema to facilitate reading of typed data. Likewise, the vectorized internal format is strongly typed and, as such, uses schema information. Views are another way to communicate schema information.
It is true that you can't, say, view comments on fields from the command line. But I don't understand saying "schema-forbidden".

On Tue, Apr 3, 2018 at 10:01 AM, Paul Rogers <par0...@yahoo.com.invalid> wrote:

> Here is another way to think about it. Today, Drill is "schema-forbidden":
> even if I know the schema, I can't communicate that to Drill; Drill must
> figure it out on its own, making the same mistakes every time on ambiguous
> schemas.
>
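
The two cases discussed in this thread ({x: 10} followed by {x: 10.1}, and a column b that is missing from one file) can be made concrete with a small sketch. This is plain Python, not Drill code: read_column, infer_type, SchemaChangeError and the hint parameter are all hypothetical names invented for illustration, and the "Int" fallback simply mirrors the Nullable Int guess described in the thread. It only models the idea of a strongly typed vector whose type is fixed by the first value seen, versus one whose type the user states up front.

```python
# Hypothetical sketch: none of these names are Drill APIs.

class SchemaChangeError(Exception):
    """Raised when a value does not fit the type already chosen for a column."""


def infer_type(value):
    # Mimic a reader guessing a column type from a single value.
    if isinstance(value, bool):
        return "Bit"
    if isinstance(value, int):
        return "BigInt"
    if isinstance(value, float):
        return "Float8"
    if isinstance(value, str):
        return "VarChar"
    raise TypeError(f"unsupported value: {value!r}")


def read_column(records, column, hint=None):
    """Load one column into a single-typed 'vector' (a list plus a type name).

    hint is an assumed, user-supplied type such as "Float8". Without it, the
    type is fixed by the first non-null value encountered.
    """
    col_type = hint
    vector = []
    for record in records:
        value = record.get(column)
        if value is None:
            vector.append(None)
            continue
        if col_type is None:
            col_type = infer_type(value)
        is_plain_int = isinstance(value, int) and not isinstance(value, bool)
        if col_type == "Float8" and is_plain_int:
            value = float(value)  # widening an int to a float is safe
        elif infer_type(value) != col_type:
            raise SchemaChangeError(
                f"column {column!r} is {col_type}, got {infer_type(value)}"
            )
        vector.append(value)
    # If no value was ever seen, fall back to the Int guess the thread describes.
    return col_type or "Int", vector


records = [{"x": 10}, {"x": 10.1}]

# Without a hint, the vector is fixed as BigInt and the second record fails.
try:
    read_column(records, "x")
    outcome = "ok"
except SchemaChangeError:
    outcome = "schema change"

# With a hint, both records load into one Float8 vector.
hinted = read_column(records, "x", hint="Float8")
```

In this sketch the hint plays the role of "trust me, when you read column b, it is a VarChar": calling read_column([{"a": 20}], "b", hint="VarChar") yields an all-null VarChar column rather than a guessed Int.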