[ https://issues.apache.org/jira/browse/PHOENIX-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181346#comment-15181346 ]
lichenglin edited comment on PHOENIX-2745 at 3/5/16 12:30 AM:
--------------------------------------------------------------

Here is my demo code:

SparkConf conf = new SparkConf();
conf.setAppName("test");
conf.setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sql = new SQLContext(sc);
JavaRDD<Person> rdd1 = sc.parallelize(Arrays.asList(
    new Person("A", 14), new Person("B", 18)));
JavaRDD<Person> rdd2 = sc.parallelize(Arrays.asList(
    new Person("A", 16), new Person("B", 18)));
DataFrame d1 = sql.createDataFrame(rdd1, Person.class);
DataFrame d2 = sql.createDataFrame(rdd2, Person.class);

d1.write().mode(SaveMode.Overwrite).save("sparktable");
d2.write().mode(SaveMode.Overwrite).save("sparktable");
sql.read().load("sparktable").show();

d1.write().mode(SaveMode.Overwrite).format("org.apache.phoenix.spark")
    .option("table", "test").option("zkUrl", "namenode:2181")
    .save();
d2.write().mode(SaveMode.Overwrite).format("org.apache.phoenix.spark")
    .option("table", "test").option("zkUrl", "namenode:2181")
    .save();
sql.read().format("org.apache.phoenix.spark").option("table", "test")
    .option("zkUrl", "namenode:2181").load().show();

Person.class has just two fields, age and name. The Phoenix table's DDL is:

CREATE TABLE test (
  age integer not null,
  name varchar not null,
  CONSTRAINT pk PRIMARY KEY (age, name)
);

Spark shows this:

+---+----+
|age|name|
+---+----+
| 16|   A|
| 18|   B|
+---+----+

but Phoenix shows this:

+---+----+
|AGE|NAME|
+---+----+
| 14|   A|
| 16|   A|
| 18|   B|
+---+----+

By the way, if you use a HiveContext instead of a SQLContext, you can see the DROP TABLE DDL in the Spark log.
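Until the connector maps SaveMode.Overwrite to a real overwrite, one possible workaround is to clear the Phoenix table over plain JDBC before calling save(). This is only a minimal sketch under the assumptions of this report (table "test", ZooKeeper quorum "namenode:2181", Phoenix JDBC driver on the classpath); it is not part of the phoenix-spark API:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of a pre-write cleanup: delete the existing rows so that the
// subsequent phoenix-spark save() behaves like a true overwrite.
public class OverwriteWorkaround {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:namenode:2181");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("DELETE FROM test"); // table name taken from the demo above
            conn.commit();                          // Phoenix connections do not autocommit by default
        }
        // ...then run the phoenix-spark write exactly as in the demo:
        // d2.write().mode(SaveMode.Overwrite).format("org.apache.phoenix.spark")
        //     .option("table", "test").option("zkUrl", "namenode:2181").save();
    }
}

Running "DELETE FROM test" (or dropping and recreating the table) from sqlline before the Spark job would have the same effect.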
> The Spark SaveMode does not work correctly
> ------------------------------------------
>
>                 Key: PHOENIX-2745
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2745
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.6.0
>            Reporter: lichenglin
>
> When saving a DataFrame with SaveMode.Overwrite, Spark is supposed to drop the table first and then load the new DataFrame, but Phoenix just upserts the new data over the old data according to the primary key. The old rows still exist, so Overwrite actually works as Append.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)