[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841216#comment-17841216 ]

Rushikesh Kavar commented on SPARK-48009:
-

I run OverrideAvro first and then AppendAvro.
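The sketch below checks what each step actually left on disk. It uses only the JDK, so Spark does not need to be on the classpath, and it lists the data files in the output directory. It assumes Spark's usual naming for file-source output (`part-<task>-<uuid>-c<n>.avro`), with a `_SUCCESS` marker and hidden `.crc` checksum files alongside the data when writing to the local filesystem. The class name `ListPartFiles` is hypothetical. As far as I know, for a path-based write, `SaveMode.Overwrite` clears the directory before writing. `SaveMode.Append` only adds new part files next to the existing ones and does not compare their schema with the files already there. If that holds, the list should grow after AppendAvro and the original files should be unchanged.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListPartFiles {

    // Returns the sorted names of Avro data files directly under dir.
    // Assumes Spark's usual part-file naming (part-*.avro); the _SUCCESS marker
    // and hidden .crc checksum files do not match the glob and are skipped.
    public static List<String> partFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*.avro")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Run once after OverrideAvro and again after AppendAvro, then compare the output.
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "C:\\Users\\kavarus\\testing\\spark-testing\\data");
        List<String> files = partFiles(dir);
        System.out.println(files.size() + " part file(s) in " + dir);
        files.forEach(System.out::println);
    }
}
```

Comparing the two listings shows whether Append rewrote anything or only added files.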

> Specifications for Apache Spark hadoop Avro append operation
> 
>
> Key: SPARK-48009
> URL: https://issues.apache.org/jira/browse/SPARK-48009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.3
>Reporter: Rushikesh Kavar
>Priority: Minor
>
> Consider a path /a/b/c.
> Assume I write Avro data to that folder using Apache Spark.
> After it is written, assume I try to append a dataset to the same folder.
> I want to see the specification of what happens in the case of an append.
> After doing a PoC, I found that when the appended dataset has the
> same schema as the existing data, the data simply gets appended. But I want to see
> clear docs of exactly what happens in the case of an append.
> I am attaching my testing Java code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841214#comment-17841214 ]

Rushikesh Kavar commented on SPARK-48009:
-

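package org.example.avro;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

import java.util.List;

public class Writer {

    // Convenience overload: replaces whatever is already at `path`.
    public static <T> void writeAvro(List<T> list, String path) {
        writeAvro(list, path, SaveMode.Overwrite);
    }

    public static <T> void writeAvro(List<T> list, String path, SaveMode saveMode) {
        writeAvro(getDatasetFromList(list), path, saveMode);
    }

    // The "avro" format comes from the external spark-avro module
    // (org.apache.spark:spark-avro_<scala version>), which must be on the classpath.
    public static void writeAvro(Dataset<Row> ds, String path, SaveMode saveMode) {
        ds.write().format("avro")
                .mode(saveMode)
                .save(path);
    }

    // Builds a DataFrame from a non-empty list of JavaBeans; the column schema
    // is derived from the bean class via Encoders.bean.
    public static <T> Dataset<Row> getDatasetFromList(List<T> list) {
        if (list.isEmpty()) {
            throw new IllegalArgumentException("list must not be empty");
        }
        @SuppressWarnings("unchecked")
        Class<T> clazz = (Class<T>) list.get(0).getClass();

        SparkSession spark = SparkSession.builder()
                .master("local")
                .getOrCreate();
        return spark.createDataset(list, Encoders.bean(clazz)).toDF();
    }

}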




[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841213#comment-17841213 ]

Rushikesh Kavar commented on SPARK-48009:
-

import org.apache.spark.sql.SaveMode;
import org.example.avro.Writer;

import java.util.ArrayList;
import java.util.List;

public class OverrideAvro {

    public static void main(String[] args) {
        // Step 1: write two rows, replacing anything already in the target directory.
        Writer.writeAvro(getMockData(),
                "C:\\Users\\kavarus\\testing\\spark-testing\\data", SaveMode.Overwrite);
    }

    public static List<Modal> getMockData() {
        List<Modal> lst = new ArrayList<>();
        lst.add(new Modal("1", "Test1", 26));
        lst.add(new Modal("2", "Test2", 28));
        return lst;
    }

}




[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841211#comment-17841211 ]

Rushikesh Kavar commented on SPARK-48009:
-

 

import org.apache.spark.sql.SaveMode;
import org.example.avro.Writer;

import java.util.ArrayList;
import java.util.List;

public class AppendAvro {

    public static void main(String[] args) {
        // Step 2: add two more rows with the same bean (and therefore same schema).
        Writer.writeAvro(getMockData(),
                "C:\\Users\\kavarus\\testing\\spark-testing\\data", SaveMode.Append);
    }

    public static List<Modal> getMockData() {
        List<Modal> lst = new ArrayList<>();
        lst.add(new Modal("3", "Test3", 27));
        lst.add(new Modal("4", "Test4", 27));
        return lst;
    }
}




[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841212#comment-17841212 ]

Rushikesh Kavar commented on SPARK-48009:
-

 

public class Modal {
    private String id;
    private String name;
    private int age;

    // Public no-arg constructor, expected by the JavaBean convention that
    // Encoders.bean relies on (e.g. when rows are converted back into Modal objects).
    public Modal() {
    }

    public Modal(String id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}
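Both OverrideAvro and AppendAvro build their DataFrame from `Modal` through `Encoders.bean`, so both writes should end up with the same column set. That would explain why the append in this PoC lined up with the existing data. As far as I understand, Spark derives a bean's schema from its JavaBean properties (getter/setter pairs), which it discovers with `java.beans.Introspector`. The plain-JDK sketch below uses a hypothetical helper class `BeanProps` and a copy of `Modal` to show which property names that convention yields. It does not call Spark, so treat it as an approximation of the encoder's behavior, not its implementation.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Set;
import java.util.TreeSet;

public class BeanProps {

    // Hypothetical copy of the Modal bean, including the no-arg constructor.
    public static class Modal {
        private String id;
        private String name;
        private int age;

        public Modal() {
        }

        public Modal(String id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Returns, in alphabetical order, the bean properties that have both a getter
    // and a setter. Object.class is the stop class, so getClass() is not reported.
    public static Set<String> readWriteProperties(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            Set<String> names = new TreeSet<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException("cannot introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWriteProperties(Modal.class)); // prints [age, id, name]
    }
}
```

If a later version of the bean added or renamed a getter/setter pair, this list would change. Any Avro files appended after that change would then carry a different schema from the files already in the folder.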




[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841210#comment-17841210 ]

Rushikesh Kavar commented on SPARK-48009:
-

I will attach the code within a few hours.
