You can call that on sparkSession too.
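
For what it's worth, here is a rough sketch of how that might look without the SQLContext. It assumes a SparkSession is already available; the class name SingleColumnFrame, the helper toTextFrame and the local[1] master are made up for illustration, and toDF("text") is just one alternative to the withColumnRenamed call from your PS.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical helper class, named here only for illustration.
public class SingleColumnFrame {

    // Builds a one-column Dataset<Row> named "text" from a list of strings.
    // createDataset yields a Dataset<String> whose single column is called
    // "value"; toDF("text") turns it into a Dataset<Row> and renames that column.
    public static Dataset<Row> toTextFrame(SparkSession spark, List<String> texts) {
        return spark.createDataset(texts, Encoders.STRING()).toDF("text");
    }

    public static void main(String[] args) {
        // Assumption: a local session for trying this out; in a real job
        // you would reuse the session you already have.
        SparkSession spark = SparkSession.builder()
                .appName("single-column-frame")
                .master("local[1]")
                .getOrCreate();

        Dataset<Row> data = toTextFrame(spark, Collections.singletonList("some text"));
        data.printSchema();
        data.show(false);

        spark.stop();
    }
}
```

If you prefer to keep withColumnRenamed("value", "text"), that should work on the Dataset returned by sparkSession.createDataset in the same way as in your SQLContext version.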

On Thu, 18 Nov 2021, 10:48 , <mar...@wunderlich.com> wrote:

> PS: The following works, but it seems rather awkward having to use the
> SQLContext here.
>
> SQLContext sqlContext = new SQLContext(sparkContext);
>
> Dataset<Row> data = sqlContext
>       .createDataset(textList, Encoders.STRING())
>       .withColumnRenamed("value", "text");
>
>
>
>
> On 2021-11-18 11:26, mar...@wunderlich.com wrote:
>
> Hello,
>
> I am struggling with a task that should be super simple: I would like to
> create a Spark DF of Type Dataset<Row> with one column from a single String
> (or from a one-element List of Strings). The column header should be "text".
>
> SparkContext.parallelize() does not work, because it returns RDD<T> and
> not Dataset<Row> and it takes a "ClassTag" as 3rd parameter.
>
> I am able to convert a List of Strings to JavaRDD<String> using this:
>     JavaSparkContext javaSparkContext = new JavaSparkContext(sparkContext);
>
>     JavaRDD<String> javaRdd = javaSparkContext.parallelize(textList);
>
> But then I am stuck with this javaRDD. Besides, it seems overly complex
> having to create an intermediate representation.
>
> There is also this SO post with a solution in Scala that I have not been
> able to convert to Java, because the APIs differ:
>
>
> https://stackoverflow.com/questions/44028677/how-to-create-a-dataframe-from-a-string
>
> Basically, what I am looking for is something simple like:
>
>     Dataset<Row> myData = sparkSession.createDataFrame(textList, "text");
>
> Any hints? Thanks a lot.
>
> Cheers,
>
> Martin
>
>
