> Please elaborate.

I'm mainly aware of the situation in Scala, where the lack of named tuples is the reason why type-safe schema transformation is rather limited. When working with typed data, there are basically two options:
* Use unnamed tuples. This is not really an option, because it either hard-codes column positions (unreadable) or requires tedious pattern matching over all fields.
* Use (case) classes, which is the standard solution: you write out a case class, which involves typing out all the field names/types once. The problem is that transforming the data cannot be done automatically. Assume the input data has 30 columns, so we have to write a first class `RawInput` with 30 fields. In a later processing step we might want to remove a few columns, which again requires defining a new class `ReducedInput` with 20+ fields. Eventually we might want to add a bunch of derived columns, and again we have to introduce a new type. The problem can be mitigated by inheritance/traits, but it remains a work-around that is not very convenient to use.

In Nim, the same can be solved very elegantly by just transforming/constructing named tuples everywhere. This is what the DSL looks like:

```nim
# A const schema definition is required once. Ideally this is the
# only point where we have to type out our 30 columns.
const schema = ... # array with field information

# From here on, it is just a bunch of macros performing named tuple transformations.
let df = DF.fromText("test.csv")
           .map(schemaParser(schema, ";"))

# Projection can use whichever is shorter to type:
df.map(t => t.projectAway(fields, to, remove))
df.map(t => t.projectTo(fields, to, keep))

# Adding new fields also does not require repeating existing fields:
df.map(t => t.addFields(length: sqrt(t.x^2 + t.y^2)))

# Eventually even the schema of a join can be computed statically:
let joined = dfA.join(dfB, on=[joinField])
```

This should also play nicely with structural typing in Nim: passing data frames to functions can be done generically and does not require writing out field names explicitly. I'm not sure how this would work with objects.
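To illustrate the structural-typing point with plain Nim (a hedged sketch, independent of the DSL; all names here are made up for illustration): named tuples are compared structurally, so a generic proc can accept any tuple that provides the fields it uses, regardless of how many other fields it carries.

```nim
import math

proc vectorLength[T: tuple](t: T): float =
  # Compiles for any named tuple providing float fields x and y.
  sqrt(t.x * t.x + t.y * t.y)

let small = (x: 3.0, y: 4.0)
let wide  = (x: 3.0, y: 4.0, label: "extra fields are fine")

echo vectorLength(small)  # 5.0
echo vectorLength(wide)   # 5.0
```

The same mechanism is what would let a data-frame transformation pipeline stay generic: a proc constrained only on the fields it actually touches works for every intermediate schema.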
Since they are nominal, I guess they would have to be made explicitly available in the outer scope. Currently I leave it up to the user whether they want to define their types explicitly, for instance via this macro:

```nim
type MyRowType = schemaType(schema)

proc myExplicitlyTypedProc(df: DataFrame[MyRowType]) = ...
```

What I wanted to avoid is that a user has to explicitly name their types for each transformation.
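For reference, the nominal-vs-structural distinction in plain Nim (a minimal sketch; the type and proc names are made up):

```nim
# Two independently written named tuples with the same fields have the
# same structural type, so they are freely interchangeable:
let a: tuple[x, y: float] = (x: 1.0, y: 2.0)
let b = (x: 1.0, y: 2.0)
doAssert a == b

# An object type, by contrast, is nominal: any proc taking Point needs
# the Point declaration itself to be visible at the call site.
type Point = object
  x, y: float

proc shift(p: Point): Point = Point(x: p.x + 1.0, y: p.y)
doAssert shift(Point(x: 0.0, y: 0.0)).x == 1.0
```

This is why tuple-based transformations can stay anonymous while object-based ones would force the user to name and export a type per schema.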