[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841214#comment-17841214 ]
Rushikesh Kavar commented on SPARK-48009:
-----------------------------------------

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

import java.util.List;

public class Writer {

    // Defaults to Overwrite when no SaveMode is given.
    public static <T> void writeAvro(List<T> list, String path) {
        writeAvro(list, path, SaveMode.Overwrite);
    }

    public static <T> void writeAvro(List<T> list, String path, SaveMode saveMode) {
        Dataset<Row> dataset = getDatasetFromList(list);
        dataset.write().format("avro")
                .mode(saveMode)
                .save(path);
    }

    public static void writeAvro(Dataset<Row> ds, String path, SaveMode saveMode) {
        ds.write().format("avro")
                .mode(saveMode)
                .save(path);
    }

    // Builds a DataFrame from a list of Java beans, inferring the schema
    // from the class of the first element.
    @SuppressWarnings("unchecked")
    public static <T> Dataset<Row> getDatasetFromList(List<T> list) {
        Class<T> clazz = (Class<T>) list.get(0).getClass();
        SparkSession spark = SparkSession.builder()
                .config("spark.master", "local")
                .getOrCreate();
        SQLContext context = spark.sqlContext();
        return context.createDataset(list, Encoders.bean(clazz)).toDF();
    }
}

> Specifications for Apache Spark hadoop Avro append operation
> ------------------------------------------------------------
>
>                 Key: SPARK-48009
>                 URL: https://issues.apache.org/jira/browse/SPARK-48009
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.3
>            Reporter: Rushikesh Kavar
>            Priority: Minor
>
> Consider a path /a/b/c.
> Assume I write Avro data to this folder using Apache Spark, and that after
> it is written I try to append a dataset to the same folder.
> I want to see a specification of what happens in the append case.
> After doing a PoC, I found that when the appended dataset has the same
> schema as the existing data, the rows are simply appended. But I want to
> see clear documentation of what exactly happens on append.
> I am attaching my testing Java code.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
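The append behavior described above can be sketched as follows. This is a minimal illustration, not Spark's official specification: it assumes spark-sql and the separate spark-avro module are on the classpath, and the class name and output path are hypothetical. It writes a first batch with SaveMode.Overwrite, appends a second batch with the same schema via SaveMode.Append, and reads the folder back to show both batches are visible.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class AppendDemo {

    // Writes 2 rows, appends 3 more with an identical schema, and returns
    // the total row count read back from the same path.
    public static long overwriteThenAppend(String path) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("avro-append-demo")
                .getOrCreate();

        // First batch: SaveMode.Overwrite replaces anything at the path.
        Dataset<Row> first = spark.range(2).toDF("id");
        first.write().format("avro").mode(SaveMode.Overwrite).save(path);

        // Second batch, same schema: SaveMode.Append adds new files to the
        // folder without touching the existing ones.
        Dataset<Row> second = spark.range(3).toDF("id");
        second.write().format("avro").mode(SaveMode.Append).save(path);

        long total = spark.read().format("avro").load(path).count();
        spark.stop();
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical local path for the demo output.
        System.out.println(overwriteThenAppend("/tmp/avro-append-demo"));
    }
}
```

With matching schemas the read-back count is the sum of both batches (5 here), consistent with the PoC result that appended data is simply added alongside the existing files.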