Thanks for the info. I have been converting an internal data source to V2 and am now preparing it for 2.4.0.
I have a couple of suggestions from my experience so far.

First, I believe we are missing documentation on this. I am currently writing an internal tutorial based on what I am learning, and I would be happy to share it once it gets a little better (not sure where it should go, though).

The change from using Row to using InternalRow is a little confusing. For a generic Row we can do Row.fromSeq(values), where the values are regular Java/Scala types matching the schema. This even works for more complex types like Array[String], and everything just works. For InternalRow, this doesn't work for non-trivial types. I figured out how to convert strings and timestamps (hopefully I even did it correctly), but I couldn't figure out Array[String]. Beyond the fact that I would love to learn how to do the conversion correctly for the various types (arrays in particular), I would suggest we add some method to create an InternalRow from base types. In the 2.3.0 version, the Row we got from get() would be encoded via an encoder which was provided. I managed to get it to work by doing

    val encoder = RowEncoder.apply(schema).resolveAndBind()

in the constructor and then

    encoder.toRow(Row.fromSeq(values))

per record, but this simply feels a little weird to me.

Another issue I encountered is handling bad data. In our legacy source we have cases where a specific row is bad; in non-Spark code we would simply skip it. The problem is that in Spark, if next() returns true we must have some row ready for the get() function. This means we always need to read ahead to figure out whether we actually have something or not. Might we instead be allowed to return null from get(), in which case the line would just be skipped?

Lastly, I would be happy for a means to return metrics from the reading (how many records we read, how many bad records we have), perhaps by allowing the use of accumulators in the data source?

Sorry for the long-winded message; I will probably have more as I continue to explore this. I have appended a few rough snippets at the end of this mail to make the points above more concrete.

Thanks,
Assaf.
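P.S. To make the InternalRow question concrete, here is roughly what I ended up with for strings and timestamps, plus my best guess at how arrays are supposed to work. The schema and values are just made-up examples, and I am not at all sure this is the intended way:

    import java.sql.Timestamp

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.catalyst.util.{DateTimeUtils, GenericArrayData}
    import org.apache.spark.unsafe.types.UTF8String

    // Example schema: StructType(Seq(
    //   StructField("name", StringType),
    //   StructField("ts", TimestampType),
    //   StructField("tags", ArrayType(StringType))))
    val name = "some name"
    val ts = Timestamp.valueOf("2018-01-01 00:00:00")
    val tags = Array("a", "b")

    // InternalRow wants Catalyst's internal representations:
    //   StringType    -> UTF8String
    //   TimestampType -> Long (microseconds since the epoch)
    //   ArrayType     -> ArrayData, with the elements converted as well
    val row = InternalRow(
      UTF8String.fromString(name),
      DateTimeUtils.fromJavaTimestamp(ts),
      new GenericArrayData(tags.map(t => UTF8String.fromString(t): Any)))

If there is a simpler or more official way to do the array part, I would love to hear it.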
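And the RowEncoder workaround I mentioned, spelled out against a made-up schema:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.catalyst.encoders.RowEncoder
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("tags", ArrayType(StringType))))

    // Built once, in the reader's constructor
    val encoder = RowEncoder.apply(schema).resolveAndBind()

    // Then per record: build an external Row from plain types and let the
    // encoder turn it into an InternalRow
    val internal = encoder.toRow(Row.fromSeq(Seq("some name", Array("a", "b"))))

It works, it just feels like a detour to go through the external Row representation for every record.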
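For the bad-record handling, this is roughly the read-ahead shape I meant, sketched against the 2.4 InputPartitionReader interface. The lines iterator and the parse function are hypothetical stand-ins for our legacy reader, with parse returning None for a bad record:

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.sources.v2.reader.InputPartitionReader

    // `lines` and `parse` stand in for our legacy reader; `parse` returns
    // None when the record is bad and should be skipped.
    class SkippingReader(
        lines: Iterator[String],
        parse: String => Option[InternalRow])
      extends InputPartitionReader[InternalRow] {

      private var current: InternalRow = _

      // next() has to read ahead until it finds a good record (or runs out),
      // because get() must return a row whenever next() returned true.
      override def next(): Boolean = {
        current = null
        while (current == null && lines.hasNext) {
          parse(lines.next()).foreach(r => current = r)
        }
        current != null
      }

      override def get(): InternalRow = current

      override def close(): Unit = ()
    }

Being allowed to return null from get() would remove the need for this buffering.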
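And finally, the kind of thing I meant by metrics via accumulators (just an illustration; the accumulators would have to be created on the driver and shipped to the reader, e.g. through the serialized partition):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.util.LongAccumulator

    val spark = SparkSession.builder().getOrCreate()

    // Registered on the driver; Spark merges the updates coming back from tasks
    val recordsRead: LongAccumulator = spark.sparkContext.longAccumulator("recordsRead")
    val badRecords: LongAccumulator = spark.sparkContext.longAccumulator("badRecords")

    // Inside the reader, on the executor side, we would then do
    //   recordsRead.add(1)   // for every record handed to Spark
    //   badRecords.add(1)    // for every record we had to skip
    // and read the totals back on the driver via recordsRead.value.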