Re: RDD[InternalRow] -> Dataset
Not possible through the public API, but you can add your own object to Spark's org.apache.spark.sql package in your project. That gives you access to its package-private (private[sql]) methods:

    package org.apache.spark.sql

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.execution.LogicalRDD
    import org.apache.spark.sql.types.StructType

    object DataFrameUtil {
      /**
       * Creates a DataFrame out of an RDD[InternalRow], such as the one you get from `df.queryExecution.toRdd`.
       */
      def createFromInternalRows(sparkSession: SparkSession, schema: StructType, rdd: RDD[InternalRow]): DataFrame = {
        // Wrap the RDD in a logical plan node whose output attributes are derived from the schema.
        val logicalPlan = LogicalRDD(schema.toAttributes, rdd)(sparkSession)
        // Dataset.ofRows is private[sql], which is why this object lives in org.apache.spark.sql.
        Dataset.ofRows(sparkSession, logicalPlan)
      }
    }
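A minimal usage sketch, assuming the DataFrameUtil object above is compiled into the same project and a Spark 2.x build whose internals match it (LogicalRDD, StructType.toAttributes and Dataset.ofRows are private APIs and have changed between releases). The object InternalRowRoundTrip and the sample data are hypothetical. It lives outside org.apache.spark.sql to show that the public helper can be called from any package once it exists:

```scala
import org.apache.spark.sql.{DataFrame, DataFrameUtil, SparkSession}

// Hypothetical example object; assumes DataFrameUtil (declared in org.apache.spark.sql) is on the classpath.
object InternalRowRoundTrip {

  // Builds a small DataFrame, drops down to its RDD[InternalRow], then wraps it back into a DataFrame.
  def roundTrip(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
    // queryExecution.toRdd exposes the Catalyst-internal rows as an RDD[InternalRow].
    val internalRows = df.queryExecution.toRdd
    // The schema must describe the internal rows exactly, so reuse the source DataFrame's schema.
    DataFrameUtil.createFromInternalRows(spark, df.schema, internalRows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("internal-row-round-trip").getOrCreate()
    try {
      roundTrip(spark).show()
    } finally {
      spark.stop()
    }
  }
}
```

Reusing df.schema is the safe choice here: LogicalRDD trusts that the attributes match the row layout, and nothing checks it for you.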
Re: RDD[InternalRow] -> Dataset

Hi Satyajit,

That's exactly what Dataset.rdd does: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala?utf8=%E2%9C%93#L2916-L2921

Regards,
Jacek Laskowski
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Dec 8, 2017 at 5:25 AM, satyajit vegesna wrote:
> Hi All,
>
> Is there a way to convert RDD[internalRow] to Dataset , from outside spark
> sql package.
>
> Regards,
> Satyajit.
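For comparison, a sketch of the fully public route, which swaps RDD[InternalRow] for RDD[Row]: Dataset.rdd turns the internal rows into external Row objects, and SparkSession.createDataFrame(RDD[Row], StructType) converts them back. This is a different technique from the one asked about, since it pays for a conversion in each direction, but it needs no code in Spark's packages. The object PublicRowRoundTrip and the sample data are hypothetical:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical example object using only public Spark SQL APIs.
object PublicRowRoundTrip {

  def roundTrip(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
    // Dataset.rdd deserializes the internal rows into external Row objects.
    val rows: RDD[Row] = df.rdd
    // createDataFrame(RDD[Row], StructType) is public and converts the Rows back to internal rows.
    spark.createDataFrame(rows, df.schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("public-row-round-trip").getOrCreate()
    try {
      roundTrip(spark).show()
    } finally {
      spark.stop()
    }
  }
}
```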
RDD[InternalRow] -> Dataset

Hi All,

Is there a way to convert an RDD[InternalRow] to a Dataset from outside the Spark SQL package?

Regards,
Satyajit.