Pat McDonough created SPARK-16641:
-------------------------------------

             Summary: Add an Option to Create a Dataset With a Case Class, Ignoring Column Names (Using ordinal instead)
                 Key: SPARK-16641
                 URL: https://issues.apache.org/jira/browse/SPARK-16641
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Pat McDonough
            Priority: Minor
When working with a CSV that has no header row, there isn't a concise way to create a Dataset using a case class. An option to map fields by ordinal rather than by field name would be great.

For example, given the following case class:

{code}
case class Part(partkey: Int, name: String, mfgr: String, brand: String, _type: String, size: Int, container: String, retailprice: Double, comments: String)
{code}

I'd like to use the following:

{code}
val parts = spark.read.option("delimiter", "|").option("header", "false")
  .csv("dbfs:/databricks-datasets/tpch/data-001/part/").as[Part]
{code}

But that won't work because the field names Spark assigns to a headerless CSV (_c0, _c1, _c2, ...) do not match the case class field names. Instead, I end up writing a bunch of extra conversion code in a map function:

{code}
val parts = spark.read.option("delimiter", "|").option("header", "false")
  .csv("dbfs:/databricks-datasets/tpch/data-001/part/")
  .map(p => Part(p.getString(0).trim().toInt, p.getString(1), p.getString(2),
    p.getString(3), p.getString(4), p.getString(5).trim().toInt, p.getString(6),
    p.getString(7).trim().toDouble, p.getString(8)))
{code}

CC: [~rxin]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
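One interim workaround, sketched below under the assumption that the CSV column order matches the Part constructor order, is to derive a schema from the case class encoder and pass it to the reader, so the positional columns pick up the case class field names and types before the .as[Part] conversion:

{code}
// Sketch of a workaround (not the requested ordinal option): derive the
// column names and types from the Part case class and apply them
// positionally when reading. This also lets Spark cast the raw string
// columns to the declared Int/Double types, avoiding the manual map.
import org.apache.spark.sql.Encoders

val partSchema = Encoders.product[Part].schema  // names and types from Part

val parts = spark.read.option("delimiter", "|").option("header", "false")
  .schema(partSchema)  // field names applied by ordinal position
  .csv("dbfs:/databricks-datasets/tpch/data-001/part/")
  .as[Part]
{code}

This only works when the file's column count and order line up with the case class, which is exactly the positional mapping this issue asks to make a first-class option.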