[ https://issues.apache.org/jira/browse/PHOENIX-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181346#comment-15181346 ]
lichenglin edited comment on PHOENIX-2745 at 3/5/16 12:30 AM:
--------------------------------------------------------------

Here is my demo code:

SparkConf conf = new SparkConf();
conf.setAppName("test");
conf.setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sql = new SQLContext(sc);
JavaRDD<Person> rdd1 = sc.parallelize(Arrays.asList(
    new Person("A", 14), new Person("B", 18)));
JavaRDD<Person> rdd2 = sc.parallelize(Arrays.asList(
    new Person("A", 16), new Person("B", 18)));
DataFrame d1 = sql.createDataFrame(rdd1, Person.class);
DataFrame d2 = sql.createDataFrame(rdd2, Person.class);

d1.write().mode(SaveMode.Overwrite).save("sparktable");
d2.write().mode(SaveMode.Overwrite).save("sparktable");
sql.read().load("sparktable").show();

d1.write().mode(SaveMode.Overwrite).format("org.apache.phoenix.spark")
    .option("table", "test").option("zkUrl", "namenode:2181")
    .save();
d2.write().mode(SaveMode.Overwrite).format("org.apache.phoenix.spark")
    .option("table", "test").option("zkUrl", "namenode:2181")
    .save();
sql.read().format("org.apache.phoenix.spark").option("table", "test")
    .option("zkUrl", "namenode:2181").load().show();

Person.class has just two fields, age and name. The Phoenix table's DDL is:

CREATE TABLE test (
  age integer not null,
  name varchar not null,
  CONSTRAINT pk PRIMARY KEY (age, name)
);

Spark shows this:

+---+----+
|age|name|
+---+----+
| 16|   A|
| 18|   B|
+---+----+

but Phoenix shows this:

+---+----+
|AGE|NAME|
+---+----+
| 14|   A|
| 16|   A|
| 18|   B|
+---+----+

By the way, if you use a HiveContext instead of a SQLContext, you can see the DROP TABLE DDL in the Spark log.
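Until the connector maps SaveMode.Overwrite to a real overwrite, one possible workaround is to clear the Phoenix table over plain JDBC before calling save(). This is only a minimal sketch under the assumptions of this report (table "test", ZooKeeper quorum "namenode:2181", Phoenix JDBC driver on the classpath); it is not part of the phoenix-spark API:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of a pre-write cleanup: delete the existing rows so that the
// subsequent phoenix-spark save() behaves like a true overwrite.
public class OverwriteWorkaround {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:namenode:2181");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("DELETE FROM test"); // table name taken from the demo above
            conn.commit();                          // Phoenix connections do not autocommit by default
        }
        // ...then run the phoenix-spark write exactly as in the demo:
        // d2.write().mode(SaveMode.Overwrite).format("org.apache.phoenix.spark")
        //     .option("table", "test").option("zkUrl", "namenode:2181").save();
    }
}

Running "DELETE FROM test" (or dropping and recreating the table) from sqlline before the Spark job would have the same effect.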
> The Spark SaveMode does not work correctly
> ------------------------------------------
>
>                 Key: PHOENIX-2745
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2745
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.6.0
>            Reporter: lichenglin
>
> When saving a DataFrame with SaveMode.Overwrite, Spark is supposed to drop the table first and then load the new DataFrame, but Phoenix just upserts the new data over the old data according to the primary key. The old rows still exist, so Overwrite actually works as Append.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)