stoty commented on code in PR #145:
URL:
https://github.com/apache/phoenix-connectors/pull/145#discussion_r2115135405
##########
phoenix5-spark/src/main/scala/org/apache/phoenix/spark/datasource/v2/PhoenixSparkSqlRelation.scala:
##########
@@ -1,3 +1,21 @@
+/*
Review Comment:
Thanks.
As you are the original author, we can handle this here instead of a separate ticket.
##########
phoenix5-spark/src/it/java/org/apache/phoenix/spark/DataSourceApiIT.java:
##########
@@ -73,10 +73,8 @@ public Configuration getConfiguration(Configuration confToClone) {
@Test
public void basicWriteAndReadBackTest() throws SQLException {
- SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("phoenix-test")
- .set("spark.hadoopRDD.ignoreEmptySplits", "false");
- JavaSparkContext jsc = new JavaSparkContext(sparkConf);
- SQLContext sqlContext = new SQLContext(jsc);
+
+ SparkSession spark = SparkUtil.getSparkSession();
Review Comment:
Looks like spark.hadoopRDD.ignoreEmptySplits has been enabled by default since 3.2.0, so
removing it should be OK.
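For reference, a minimal sketch of what a shared helper along the lines of `SparkUtil.getSparkSession()` could look like. The actual `SparkUtil` in this PR is not shown here, so the body below is only an assumption:

```java
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch only; the real SparkUtil in the PR may differ.
public final class SparkUtil {
    private SparkUtil() { }

    public static SparkSession getSparkSession() {
        // getOrCreate() reuses one local session across the integration tests.
        return SparkSession.builder()
            .master("local")
            .appName("phoenix-test")
            .getOrCreate();
    }
}
```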
##########
phoenix5-spark3/src/main/java/org/apache/phoenix/spark/sql/connector/writer/PhoenixWriteBuilder.java:
##########
@@ -18,13 +18,28 @@
package org.apache.phoenix.spark.sql.connector.writer;
import org.apache.phoenix.thirdparty.com.google.common.annotations.VisibleForTesting;
-import org.apache.spark.sql.connector.write.BatchWrite;
-import org.apache.spark.sql.connector.write.LogicalWriteInfo;
import org.apache.spark.sql.connector.write.WriteBuilder;
+import org.apache.spark.sql.connector.write.SupportsOverwrite;
+import org.apache.spark.sql.connector.write.LogicalWriteInfo;
+import org.apache.spark.sql.connector.write.BatchWrite;
+import org.apache.spark.sql.sources.Filter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
import java.util.Map;
-public class PhoenixWriteBuilder implements WriteBuilder {
+/**
+ * The PhoenixWriteBuilder class is responsible for constructing and configuring a write operation
+ * for Phoenix when interfacing with Spark's data source API. This class implements the WriteBuilder
+ * interface for building write operations and SupportsOverwrite interface to handle overwrite behavior.
+ *
+ * The class facilitates the creation of a batch write operation that is configured with the provided
+ * logical write information and options specific to the Phoenix data source.
+ *
+ * Note: Overwrite mode does not do truncate table and behave the same as Append mode.
Review Comment:
grammar: "behaves"
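Aside from the wording, a minimal sketch of the shape the Javadoc describes; class and member names other than the Spark interfaces are illustrative here, not the PR's actual implementation:

```java
import org.apache.spark.sql.connector.write.BatchWrite;
import org.apache.spark.sql.connector.write.LogicalWriteInfo;
import org.apache.spark.sql.connector.write.SupportsOverwrite;
import org.apache.spark.sql.connector.write.WriteBuilder;
import org.apache.spark.sql.sources.Filter;

// Illustrative only: a WriteBuilder that accepts overwrite mode but keeps
// append semantics, as the Javadoc above describes.
class ExampleWriteBuilder implements WriteBuilder, SupportsOverwrite {
    private final LogicalWriteInfo writeInfo;

    ExampleWriteBuilder(LogicalWriteInfo writeInfo) {
        this.writeInfo = writeInfo;
    }

    @Override
    public WriteBuilder overwrite(Filter[] filters) {
        // No truncate: overwrite is treated the same as append.
        return this;
    }

    @Override
    public BatchWrite buildForBatch() {
        // Build the batch write from the logical write info and its options.
        return createBatchWrite(writeInfo);
    }

    private BatchWrite createBatchWrite(LogicalWriteInfo info) {
        // Placeholder for the Phoenix-specific batch write construction.
        throw new UnsupportedOperationException("sketch only");
    }
}
```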
##########
phoenix5-spark/src/it/java/org/apache/phoenix/spark/DataSourceApiIT.java:
##########
@@ -73,10 +73,8 @@ public Configuration getConfiguration(Configuration confToClone) {
@Test
public void basicWriteAndReadBackTest() throws SQLException {
- SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("phoenix-test")
- .set("spark.hadoopRDD.ignoreEmptySplits", "false");
- JavaSparkContext jsc = new JavaSparkContext(sparkConf);
- SQLContext sqlContext = new SQLContext(jsc);
+
+ SparkSession spark = SparkUtil.getSparkSession();
Review Comment:
Since 3.2.0 / SPARK-34809, spark.hadoopRDD.ignoreEmptySplits is enabled by default.
However, this uses Spark 2. Shouldn't we keep that property for Spark 2?
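If we do keep it for the Spark 2 module, one option is to set it on the shared session builder. This is only a sketch, assuming SparkUtil builds the session roughly like this:

```java
import org.apache.spark.sql.SparkSession;

// Sketch only: keeps the pre-SPARK-34809 behaviour for the Spark 2 based module
// by setting the property explicitly on the session used in the tests.
SparkSession spark = SparkSession.builder()
    .master("local")
    .appName("phoenix-test")
    .config("spark.hadoopRDD.ignoreEmptySplits", "false")
    .getOrCreate();
```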
##########
phoenix5-spark3/README.md:
##########
@@ -165,7 +165,9 @@ The `save` is method on DataFrame allows passing in a data source type. You can
specify which table to persist the DataFrame to. The column names are derived from
the DataFrame's schema field names, and must match the Phoenix column names.
-The `save` method also takes a `SaveMode` option, for which only `SaveMode.Append` is supported.
+The `save` method also takes a `SaveMode` option, it is recommended to use `SaveMode.Append`.
+For maintaining compatibility with source type `"org.apache.phoenix.spark"`,
+`SaveMode.Overwrite` is accepted but it behave same way as `SaveMode.Append`.
Review Comment:
grammar: "behaves the same way"
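For context, a hedged usage sketch of the behaviour this README text describes; the table name and option keys below are assumptions for illustration, not taken from the excerpt above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

// Sketch: SaveMode.Append is the recommended mode; SaveMode.Overwrite is
// accepted for compatibility but behaves the same way as Append.
static void writeToPhoenix(Dataset<Row> df) {
    df.write()
      .format("phoenix")
      .option("table", "OUTPUT_TABLE")    // illustrative table name and option key
      .option("jdbcUrl", "jdbc:phoenix")  // URL form shown elsewhere in the README
      .mode(SaveMode.Append)
      .save();
}
```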
##########
phoenix5-spark3/README.md:
##########
@@ -341,10 +343,8 @@ the deprected `zkUrl` parameter for backwards compatibility purposes. If neither
it falls back to using connection defined by hbase-site.xml.
- `"jdbcUrl"` expects a full Phoenix JDBC URL, i.e. "jdbc:phoenix" or "jdbc:phoenix:zkHost:zkport",
while `"zkUrl"` expects the ZK quorum only, i.e. "zkHost:zkPort"
-- If you want to use DataSourceV1, you can use source type `"org.apache.phoenix.spark"`
- instead of `"phoenix"`, however this is deprecated.
- The `"org.apache.phoenix.spark"` datasource does not accept the `"jdbcUrl"` parameter,
- only `"zkUrl"`
+- DataSourceV1 implementation was removed, source type `"org.apache.phoenix.spark"`
+uses DatasourceV2 since connector 6.0.0 release.
Review Comment:
grammar: "uses the"
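For illustration, a read through the DataSourceV2 path might look like the sketch below; the `"jdbcUrl"` form follows the excerpt above, while the table option key and table name are assumptions:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch: source type "phoenix" (or "org.apache.phoenix.spark", which now also
// uses the DataSourceV2 implementation) with a full Phoenix JDBC URL.
static Dataset<Row> readFromPhoenix(SparkSession spark) {
    return spark.read()
        .format("phoenix")
        .option("table", "INPUT_TABLE")                   // illustrative table name and option key
        .option("jdbcUrl", "jdbc:phoenix:zkHost:zkport")  // full JDBC URL, per the README
        .load();
}
```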
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]