You can call that on sparkSession to

On Thu, 18 Nov 2021, 10:48, <mar...@wunderlich.com> wrote:
> PS: The following works, but it seems rather awkward to have to use the
> SQLContext here.
>
> SQLContext sqlContext = new SQLContext(sparkContext);
>
> Dataset<Row> data = sqlContext
>     .createDataset(textList, Encoders.STRING())
>     .withColumnRenamed("value", "text");
>
>
> On 2021-11-18 11:26, mar...@wunderlich.com wrote:
>
> Hello,
>
> I am struggling with a task that should be super simple: I would like to
> create a Spark DataFrame of type Dataset<Row> with one column from a single
> String (or from a one-element List of Strings). The column header should be
> "text".
>
> SparkContext.parallelize() does not work, because it returns RDD<T> rather
> than Dataset<Row>, and it takes a "ClassTag" as its 3rd parameter.
>
> I am able to convert a List of Strings to a JavaRDD<String> using this:
>
> JavaSparkContext javaSparkContext = new JavaSparkContext(sparkContext);
>
> JavaRDD<String> javaRdd = javaSparkContext.parallelize(textList);
>
> But then I am stuck with this JavaRDD. Besides, it seems overly complex
> to have to create an intermediate representation.
>
> There is also this SO post with a solution in Scala that I have not been
> able to convert to Java, because the APIs differ:
>
> https://stackoverflow.com/questions/44028677/how-to-create-a-dataframe-from-a-string
>
> Basically, what I am looking for is something simple like:
>
> Dataset<Row> myData = sparkSession.createDataFrame(textList, "text");
>
> Any hints? Thanks a lot.
>
> Cheers,
>
> Martin
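
For reference, the suggestion at the top of this message could look roughly like the following in Java. This is only a sketch, assuming Spark 2.x or later, where SparkSession offers createDataset(List<T>, Encoder<T>) directly, so no SQLContext is needed. The class name SingleColumnFrame, the helper toTextFrame, the app name, and the local[1] master are made up for illustration and are not from the thread.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SingleColumnFrame {

    // Hypothetical helper: builds a one-column Dataset<Row> named "text"
    // from a list of strings, without an intermediate RDD or SQLContext.
    static Dataset<Row> toTextFrame(SparkSession spark, List<String> textList) {
        return spark
                // Dataset<String> whose single column is called "value" by default
                .createDataset(textList, Encoders.STRING())
                // Convert to Dataset<Row> and name the column "text" in one step
                .toDF("text");
    }

    public static void main(String[] args) {
        // Assumption: a local session, used only to make the sketch runnable
        SparkSession spark = SparkSession.builder()
                .appName("single-column-frame")
                .master("local[1]")
                .getOrCreate();

        Dataset<Row> data = toTextFrame(spark, Collections.singletonList("some text"));
        data.printSchema();
        data.show(false);

        spark.stop();
    }
}
```

Using toDF("text") instead of withColumnRenamed("value", "text") avoids depending on the default column name, but either should give the same one-column result here.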