[
https://issues.apache.org/jira/browse/PHOENIX-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181346#comment-15181346
]
lichenglin edited comment on PHOENIX-2745 at 3/5/16 12:30 AM:
--------------------------------------------------------------
Here is my demo code:

SparkConf conf = new SparkConf();
conf.setAppName("test");
conf.setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sql = new SQLContext(sc);
JavaRDD<Person> rdd1 = sc.parallelize(Arrays.asList(
    new Person("A", 14), new Person("B", 18)));
JavaRDD<Person> rdd2 = sc.parallelize(Arrays.asList(
    new Person("A", 16), new Person("B", 18)));
DataFrame d1 = sql.createDataFrame(rdd1, Person.class);
DataFrame d2 = sql.createDataFrame(rdd2, Person.class);

// Plain Spark writes: the second Overwrite fully replaces the first one.
d1.write().mode(SaveMode.Overwrite).save("sparktable");
d2.write().mode(SaveMode.Overwrite).save("sparktable");
sql.read().load("sparktable").show();

// Phoenix writes: the second Overwrite should also fully replace the first,
// but rows from d1 whose primary keys are not in d2 survive.
d1.write().mode(SaveMode.Overwrite).format("org.apache.phoenix.spark")
    .option("table", "test").option("zkUrl", "namenode:2181")
    .save();
d2.write().mode(SaveMode.Overwrite).format("org.apache.phoenix.spark")
    .option("table", "test").option("zkUrl", "namenode:2181")
    .save();
sql.read().format("org.apache.phoenix.spark").option("table", "test")
    .option("zkUrl", "namenode:2181").load().show();
Person.class just has two fields, age and name.
The Phoenix table's DDL is "CREATE TABLE test ( age integer not null, name varchar
not null, CONSTRAINT pk PRIMARY KEY (age, name) );"
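For reference, a minimal sketch of how the Person bean is assumed to look (the no-arg
constructor and the public getters/setters are assumptions; createDataFrame(rdd, Person.class)
infers the schema from a bean-style class):

// Hypothetical Person bean matching the demo above; Spark derives the
// DataFrame columns (age, name) from the public getters.
public class Person implements java.io.Serializable {
    private String name;
    private int age;

    public Person() {}                    // no-arg constructor required by the bean convention

    public Person(String name, int age) { // matches new Person("A", 14) in the demo
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}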
The plain Spark read shows this:
+---+----+
|age|name|
+---+----+
| 16| A|
| 18| B|
+---+----+
but the Phoenix read shows this:
+---+----+
|AGE|NAME|
+---+----+
| 14| A|
| 16| A|
| 18| B|
+---+----+
By the way, if you use a HiveContext instead of a SQLContext, you can see the DROP
DDL in the Spark log.
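A rough sketch of that HiveContext variant (this assumes spark-hive is on the classpath;
only the context construction changes, the rest of the demo stays the same):

// org.apache.spark.sql.hive.HiveContext built on the same JavaSparkContext;
// with it, the DROP statement issued for SaveMode.Overwrite shows up in the Spark log.
HiveContext sql = new HiveContext(sc.sc());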
> The spark savemode not work correctly
> -------------------------------------
>
> Key: PHOENIX-2745
> URL: https://issues.apache.org/jira/browse/PHOENIX-2745
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.6.0
> Reporter: lichenglin
>
> When saving a dataframe with SaveMode.Overwrite,
> Spark will drop the table first and then load the new dataframe,
> but Phoenix just replaces the old data with the new data according to the
> primary key;
> the old data still exists.
> The Overwrite mode actually works as Append.
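A possible workaround sketch (this is an assumption, not something the phoenix-spark
connector does today): clear the Phoenix table over JDBC before the Overwrite save, so
the write behaves like a true overwrite. The URL jdbc:phoenix:namenode:2181 and the
table name test below simply mirror the demo above.

// Hypothetical workaround: delete the existing rows through the Phoenix JDBC driver,
// then let the connector UPSERT the new dataframe as it does today.
try (Connection conn = DriverManager.getConnection("jdbc:phoenix:namenode:2181")) {
    conn.createStatement().executeUpdate("DELETE FROM test");
    conn.commit(); // Phoenix connections do not auto-commit by default
}
d2.write().mode(SaveMode.Overwrite).format("org.apache.phoenix.spark")
    .option("table", "test").option("zkUrl", "namenode:2181")
    .save();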