Yes, this is what I am after. But I have to use the Java API, and with the Java API I was not able to get the .as() function working.
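Here is a minimal sketch of what I think the Java equivalent might look like (assuming Spark 2.0's Java API and the same csv-logs/people-1.csv file from the example below; the class name and app name are just placeholders). From the error message, I gather the encoder passed to .as() has to come from the Encoders factory methods, e.g. Encoders.tuple(Encoders.STRING(), Encoders.STRING()), rather than from my own class implementing org.apache.spark.sql.Encoder, since only expression encoders are supported:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class CsvSelectSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("csv-select-sketch")   // placeholder app name
            .master("local[*]")
            .getOrCreate();

        // 1. Read the whole CSV; the header row supplies the column names.
        Dataset<Row> df = spark.read()
            .option("header", "true")
            .csv("csv-logs/people-1.csv");

        // 2. Select only the columns that are needed.
        // 3. Convert to a typed Dataset using an expression encoder obtained
        //    from the Encoders factory instead of a hand-written Encoder.
        Dataset<Tuple2<String, String>> nameCityPairs = df
            .select("name", "city")
            .as(Encoders.tuple(Encoders.STRING(), Encoders.STRING()));

        nameCityPairs.printSchema();
        spark.stop();
    }
}

If a POJO is preferred over a tuple, I assume Encoders.bean(SomeBean.class) would work the same way (SomeBean being a hypothetical class with name/city getters and setters).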
On Fri, Aug 5, 2016 at 7:09 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> I don't understand where the issue is...
>
> ➜  spark git:(master) ✗ cat csv-logs/people-1.csv
> name,city,country,age,alive
> Jacek,Warszawa,Polska,42,true
>
> val df = spark.read.option("header", true).csv("csv-logs/people-1.csv")
> val nameCityPairs = df.select('name, 'city).as[(String, String)]
>
> scala> nameCityPairs.printSchema
> root
>  |-- name: string (nullable = true)
>  |-- city: string (nullable = true)
>
> Is this what you're after?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Aug 5, 2016 at 2:06 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> > I need to use a few columns out of a CSV. As there is no option to read
> > only some columns from a CSV,
> > 1. I am reading the whole CSV using SparkSession.csv()
> > 2. selecting a few of the columns using DataFrame.select()
> > 3. applying a schema using the .as() function of Dataset<Row>. I tried to
> > extend org.apache.spark.sql.Encoder as the input for the as function.
> >
> > But I am getting the following exception:
> >
> > Exception in thread "main" java.lang.RuntimeException: Only expression
> > encoders are supported today
> >
> > So my questions are -
> > 1. Is it possible to read a few columns instead of the whole CSV? I cannot
> > change the CSV as that is upstream data
> > 2. How do I apply a schema to a few columns if I cannot write my own encoder?