[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841216#comment-17841216 ]

Rushikesh Kavar commented on SPARK-48009:
-----------------------------------------

I run OverrideAvro first and then AppendAvro.

> Specifications for Apache Spark hadoop Avro append operation
> ------------------------------------------------------------
>
>                 Key: SPARK-48009
>                 URL: https://issues.apache.org/jira/browse/SPARK-48009
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.3
>            Reporter: Rushikesh Kavar
>            Priority: Minor
>
> Consider a path /a/b/c.
> Assume I write Avro data to this folder using Apache Spark.
> After it is written, assume I try to append a dataset to the same folder.
> I want to see the specification of what happens in the case of an append.
> After doing a PoC, I found that when the appended dataset has the same
> schema as the existing data, the data simply gets appended. But I want to see
> clear docs of what exactly happens in the case of an append.
> I am attaching my testing Java code.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841214#comment-17841214 ]

Rushikesh Kavar commented on SPARK-48009:
-----------------------------------------

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

import java.util.List;

public class Writer {

    // Defaults to SaveMode.Overwrite when no mode is given.
    public static <T> void writeAvro(List<T> list, String path) {
        writeAvro(list, path, SaveMode.Overwrite);
    }

    public static <T> void writeAvro(List<T> list, String path, SaveMode saveMode) {
        writeAvro(getDatasetFromList(list), path, saveMode);
    }

    // The "avro" format needs the external spark-avro module on the classpath.
    public static void writeAvro(Dataset<?> ds, String path, SaveMode saveMode) {
        ds.write().format("avro")
                .mode(saveMode)
                .save(path);
    }

    // Builds a DataFrame from a non-empty list of JavaBeans.
    // The bean class is taken from the first element of the list.
    public static <T> Dataset<Row> getDatasetFromList(List<T> list) {
        @SuppressWarnings("unchecked")
        Class<T> clazz = (Class<T>) list.get(0).getClass();
        SparkSession spark = SparkSession.builder()
                .config("spark.master", "local")
                .getOrCreate();
        return spark.createDataset(list, Encoders.bean(clazz)).toDF();
    }
}
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841213#comment-17841213 ]

Rushikesh Kavar commented on SPARK-48009:
-----------------------------------------

import org.apache.spark.sql.SaveMode;
import org.example.avro.Writer;

import java.util.ArrayList;
import java.util.List;

public class OverrideAvro {

    public static void main(String[] args) {
        // Output directory: C:\Users\kavarus\testing\spark-testing\data
        Writer.writeAvro(getMockData(),
                "C:\\Users\\kavarus\\testing\\spark-testing\\data",
                SaveMode.Overwrite);
    }

    // Two initial records, written with SaveMode.Overwrite.
    public static List<Modal> getMockData() {
        List<Modal> lst = new ArrayList<>();
        lst.add(new Modal("1", "Test1", 26));
        lst.add(new Modal("2", "Test2", 28));
        return lst;
    }
}
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841211#comment-17841211 ]

Rushikesh Kavar commented on SPARK-48009:
-----------------------------------------

import org.apache.spark.sql.SaveMode;
import org.example.avro.Writer;

import java.util.ArrayList;
import java.util.List;

public class AppendAvro {

    public static void main(String[] args) {
        // Same output directory as OverrideAvro, but written with SaveMode.Append.
        Writer.writeAvro(getMockData(),
                "C:\\Users\\kavarus\\testing\\spark-testing\\data",
                SaveMode.Append);
    }

    // Two further records with the same schema as the existing data.
    public static List<Modal> getMockData() {
        List<Modal> lst = new ArrayList<>();
        lst.add(new Modal("3", "Test3", 27));
        lst.add(new Modal("4", "Test4", 27));
        return lst;
    }
}
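
For reference, here is a rough model of what the overwrite-then-append sequence does at the directory level. As far as I understand, with file-based sources such as Avro, SaveMode.Append adds new part files to the target directory and leaves the existing files untouched, while SaveMode.Overwrite removes the existing data first. The sketch below is plain Java and does not call Spark or write real Avro; the class SaveModeSketch, its Mode enum, and the helper names (write, listParts, readAll, newTempDir) are all hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy model of directory-level save modes. Not Spark's implementation. */
final class SaveModeSketch {

    /** Hypothetical stand-in for SaveMode.Overwrite and SaveMode.Append. */
    public enum Mode { OVERWRITE, APPEND }

    /** Writes the records as one new "part" file inside dir. */
    public static void write(Path dir, List<String> records, Mode mode) {
        try {
            Files.createDirectories(dir);
            if (mode == Mode.OVERWRITE) {
                // Overwrite: remove the part files that are already there.
                for (Path old : listParts(dir)) {
                    Files.delete(old);
                }
            }
            // Both modes: add a uniquely named file; existing files are never edited.
            Path part = dir.resolve("part-" + UUID.randomUUID() + ".txt");
            Files.write(part, records);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lists the part files currently in dir. */
    public static List<Path> listParts(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads every record from every part file, the way a directory read would. */
    public static List<String> readAll(Path dir) {
        List<String> rows = new ArrayList<>();
        try {
            for (Path part : listParts(dir)) {
                rows.addAll(Files.readAllLines(part));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    /** Creates an empty scratch directory for the demo. */
    public static Path newTempDir() {
        try {
            return Files.createTempDirectory("save-mode-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        // Mirrors OverrideAvro followed by AppendAvro.
        write(dir, Arrays.asList("1,Test1,26", "2,Test2,28"), Mode.OVERWRITE);
        write(dir, Arrays.asList("3,Test3,27", "4,Test4,27"), Mode.APPEND);
        // prints: 2 part files, 4 rows
        System.out.println(listParts(dir).size() + " part files, " + readAll(dir).size() + " rows");
    }
}
```

In this model, reading the directory after the append returns the rows from both writes, which matches what the PoC observed when the schemas are the same. The model says nothing about what happens when the appended schema differs.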
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841212#comment-17841212 ]

Rushikesh Kavar commented on SPARK-48009:
-----------------------------------------

// JavaBean used as the record type; Encoders.bean derives the schema from its getters and setters.
public class Modal {

    public String id;
    public String name;
    public int age;

    // No-arg constructor: only needed if the data is read back as Dataset<Modal>.
    public Modal() {
    }

    public Modal(String id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841210#comment-17841210 ]

Rushikesh Kavar commented on SPARK-48009:
-----------------------------------------

I will attach the code within a few hours.