stoty commented on code in PR #145:
URL:
https://github.com/apache/phoenix-connectors/pull/145#discussion_r2115135405
##########
phoenix5-spark/src/main/scala/org/apache/phoenix/spark/datasource/v2/PhoenixSparkSqlRelation.scala:
##########
@@ -1,3 +1,21 @@
+/*
Review Comment:
Thanks.
As you are the original author, we can handle this here instead of a separate ticket.
##########
phoenix5-spark/src/it/java/org/apache/phoenix/spark/DataSourceApiIT.java:
##########
@@ -73,10 +73,8 @@ public Configuration getConfiguration(Configuration confToClone) {
@Test
public void basicWriteAndReadBackTest() throws SQLException {
- SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("phoenix-test")
- .set("spark.hadoopRDD.ignoreEmptySplits", "false");
- JavaSparkContext jsc = new JavaSparkContext(sparkConf);
- SQLContext sqlContext = new SQLContext(jsc);
+
+ SparkSession spark = SparkUtil.getSparkSession();
Review Comment:
Looks like spark.hadoopRDD.ignoreEmptySplits has been enabled by default since 3.2.0, so
removing it should be OK.
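For reference, a minimal sketch of what a shared helper along the lines of `SparkUtil.getSparkSession()` could look like. The actual `SparkUtil` in this PR is not shown here, so the body below is only an assumption:

```java
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch only; the real SparkUtil in the PR may differ.
public final class SparkUtil {
    private SparkUtil() { }

    public static SparkSession getSparkSession() {
        // getOrCreate() reuses one local session across the integration tests.
        return SparkSession.builder()
            .master("local")
            .appName("phoenix-test")
            .getOrCreate();
    }
}
```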
##########
phoenix5-spark3/src/main/java/org/apache/phoenix/spark/sql/connector/writer/PhoenixWriteBuilder.java:
##########
@@ -18,13 +18,28 @@
package org.apache.phoenix.spark.sql.connector.writer;
import org.apache.phoenix.thirdparty.com.google.common.annotations.VisibleForTesting;
-import org.apache.spark.sql.connector.write.BatchWrite;
-import org.apache.spark.sql.connector.write.LogicalWriteInfo;
import org.apache.spark.sql.connector.write.WriteBuilder;
+import org.apache.spark.sql.connector.write.SupportsOverwrite;
+import org.apache.spark.sql.connector.write.LogicalWriteInfo;
+import org.apache.spark.sql.connector.write.BatchWrite;
+import org.apache.spark.sql.sources.Filter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
import java.util.Map;
-public class PhoenixWriteBuilder implements WriteBuilder {
+/**
+ * The PhoenixWriteBuilder class is responsible for constructing and configuring a write operation
+ * for Phoenix when interfacing with Spark's data source API. This class implements the WriteBuilder
+ * interface for building write operations and SupportsOverwrite interface to handle overwrite behavior.
+ *
+ * The class facilitates the creation of a batch write operation that is configured with the provided
+ * logical write information and options specific to the Phoenix data source.
+ *
+ * Note: Overwrite mode does not do truncate table and behave the same as Append mode.
Review Comment:
grammar: "behaves"
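Aside from the wording, a minimal sketch of the shape the Javadoc describes; class and member names other than the Spark interfaces are illustrative here, not the PR's actual implementation:

```java
import org.apache.spark.sql.connector.write.BatchWrite;
import org.apache.spark.sql.connector.write.LogicalWriteInfo;
import org.apache.spark.sql.connector.write.SupportsOverwrite;
import org.apache.spark.sql.connector.write.WriteBuilder;
import org.apache.spark.sql.sources.Filter;

// Illustrative only: a WriteBuilder that accepts overwrite mode but keeps
// append semantics, as the Javadoc above describes.
class ExampleWriteBuilder implements WriteBuilder, SupportsOverwrite {
    private final LogicalWriteInfo writeInfo;

    ExampleWriteBuilder(LogicalWriteInfo writeInfo) {
        this.writeInfo = writeInfo;
    }

    @Override
    public WriteBuilder overwrite(Filter[] filters) {
        // No truncate: overwrite is treated the same as append.
        return this;
    }

    @Override
    public BatchWrite buildForBatch() {
        // Build the batch write from the logical write info and its options.
        return createBatchWrite(writeInfo);
    }

    private BatchWrite createBatchWrite(LogicalWriteInfo info) {
        // Placeholder for the Phoenix-specific batch write construction.
        throw new UnsupportedOperationException("sketch only");
    }
}
```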
##########
phoenix5-spark/src/it/java/org/apache/phoenix/spark/DataSourceApiIT.java:
##########
@@ -73,10 +73,8 @@ public Configuration getConfiguration(Configuration confToClone) {
@Test
public void basicWriteAndReadBackTest() throws SQLException {
- SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("phoenix-test")
- .set("spark.hadoopRDD.ignoreEmptySplits", "false");
- JavaSparkContext jsc = new JavaSparkContext(sparkConf);
- SQLContext sqlContext = new SQLContext(jsc);
+
+ SparkSession spark = SparkUtil.getSparkSession();
Review Comment:
Since 3.2.0 / SPARK-34809, spark.hadoopRDD.ignoreEmptySplits is enabled by default.
However, this uses Spark 2. Shouldn't we keep that property for Spark 2?
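If we do keep it for the Spark 2 module, one option is to set it on the shared session builder. This is only a sketch, assuming SparkUtil builds the session roughly like this:

```java
import org.apache.spark.sql.SparkSession;

// Sketch only: keeps the pre-SPARK-34809 behaviour for the Spark 2 based module
// by setting the property explicitly on the session used in the tests.
SparkSession spark = SparkSession.builder()
    .master("local")
    .appName("phoenix-test")
    .config("spark.hadoopRDD.ignoreEmptySplits", "false")
    .getOrCreate();
```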
##########
phoenix5-spark3/README.md:
##########
@@ -165,7 +165,9 @@ The `save` is method on DataFrame allows passing in a data source type. You can
specify which table to persist the DataFrame to. The column names are derived from
the DataFrame's schema field names, and must match the Phoenix column names.
-The `save` method also takes a `SaveMode` option, for which only `SaveMode.Append` is supported.
+The `save` method also takes a `SaveMode` option, it is recommended to use `SaveMode.Append`.
+For maintaining compatibility with source type `"org.apache.phoenix.spark"`,
+`SaveMode.Overwrite` is accepted but it behave same way as `SaveMode.Append`.
Review Comment:
grammar: "behaves the same way"
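For context, a hedged usage sketch of the behaviour this README text describes; the table name and option keys below are assumptions for illustration, not taken from the excerpt above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

// Sketch: SaveMode.Append is the recommended mode; SaveMode.Overwrite is
// accepted for compatibility but behaves the same way as Append.
static void writeToPhoenix(Dataset<Row> df) {
    df.write()
      .format("phoenix")
      .option("table", "OUTPUT_TABLE")    // illustrative table name and option key
      .option("jdbcUrl", "jdbc:phoenix")  // URL form shown elsewhere in the README
      .mode(SaveMode.Append)
      .save();
}
```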
##########
phoenix5-spark3/README.md:
##########
@@ -341,10 +343,8 @@ the deprected `zkUrl` parameter for backwards compatibility purposes. If neither
it falls back to using connection defined by hbase-site.xml.
- `"jdbcUrl"` expects a full Phoenix JDBC URL, i.e. "jdbc:phoenix" or "jdbc:phoenix:zkHost:zkport",
while `"zkUrl"` expects the ZK quorum only, i.e. "zkHost:zkPort"
-- If you want to use DataSourceV1, you can use source type `"org.apache.phoenix.spark"`
- instead of `"phoenix"`, however this is deprecated.
- The `"org.apache.phoenix.spark"` datasource does not accept the `"jdbcUrl"` parameter,
- only `"zkUrl"`
+- DataSourceV1 implementation was removed, source type `"org.apache.phoenix.spark"`
+uses DatasourceV2 since connector 6.0.0 release.
Review Comment:
grammar: "uses the"
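For illustration, a read through the DataSourceV2 path might look like the sketch below; the `"jdbcUrl"` form follows the excerpt above, while the table option key and table name are assumptions:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch: source type "phoenix" (or "org.apache.phoenix.spark", which now also
// uses the DataSourceV2 implementation) with a full Phoenix JDBC URL.
static Dataset<Row> readFromPhoenix(SparkSession spark) {
    return spark.read()
        .format("phoenix")
        .option("table", "INPUT_TABLE")                   // illustrative table name and option key
        .option("jdbcUrl", "jdbc:phoenix:zkHost:zkport")  // full JDBC URL, per the README
        .load();
}
```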
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]