(spark) branch master updated: [SPARK-46473][SQL] Reuse `getPartitionedFile` method
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 223afea9960c [SPARK-46473][SQL] Reuse `getPartitionedFile` method
223afea9960c is described below

commit 223afea9960c7ef1a4c8654e043e860f6c248185
Author: huangxiaoping <1754789...@qq.com>
AuthorDate: Wed Jan 31 22:59:20 2024 -0600

    [SPARK-46473][SQL] Reuse `getPartitionedFile` method

    ### What changes were proposed in this pull request?
    Reuse the `getPartitionedFile` method to reduce redundant code.

    ### Why are the changes needed?
    Reduce redundant code.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #44437 from huangxiaopingRD/SPARK-46473.

    Authored-by: huangxiaoping <1754789...@qq.com>
    Signed-off-by: Sean Owen
---
 .../apache/spark/sql/execution/DataSourceScanExec.scala  |  2 +-
 .../apache/spark/sql/execution/PartitionedFileUtil.scala | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
index b3b2b0eab055..2622eadaefb3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
@@ -645,7 +645,7 @@ case class FileSourceScanExec(
     logInfo(s"Planning with ${bucketSpec.numBuckets} buckets")
     val filesGroupedToBuckets = selectedPartitions.flatMap { p =>
-      p.files.map(f => PartitionedFileUtil.getPartitionedFile(f, p.values))
+      p.files.map(f => PartitionedFileUtil.getPartitionedFile(f, p.values, 0, f.getLen))
     }.groupBy { f =>
       BucketingUtils
         .getBucketId(f.toPath.getName)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
index b31369b6768e..997859058de1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
@@ -33,20 +33,20 @@ object PartitionedFileUtil {
       (0L until file.getLen by maxSplitBytes).map { offset =>
         val remaining = file.getLen - offset
         val size = if (remaining > maxSplitBytes) maxSplitBytes else remaining
-        val hosts = getBlockHosts(getBlockLocations(file.fileStatus), offset, size)
-        PartitionedFile(partitionValues, SparkPath.fromPath(file.getPath), offset, size, hosts,
-          file.getModificationTime, file.getLen, file.metadata)
+        getPartitionedFile(file, partitionValues, offset, size)
       }
     } else {
-      Seq(getPartitionedFile(file, partitionValues))
+      Seq(getPartitionedFile(file, partitionValues, 0, file.getLen))
     }
   }

   def getPartitionedFile(
       file: FileStatusWithMetadata,
-      partitionValues: InternalRow): PartitionedFile = {
-    val hosts = getBlockHosts(getBlockLocations(file.fileStatus), 0, file.getLen)
-    PartitionedFile(partitionValues, SparkPath.fromPath(file.getPath), 0, file.getLen, hosts,
+      partitionValues: InternalRow,
+      start: Long,
+      length: Long): PartitionedFile = {
+    val hosts = getBlockHosts(getBlockLocations(file.fileStatus), start, length)
+    PartitionedFile(partitionValues, SparkPath.fromPath(file.getPath), start, length, hosts,
       file.getModificationTime, file.getLen, file.metadata)
   }
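The net effect is that both the split and non-split code paths now funnel through one helper that takes an explicit `(start, length)` byte range. A minimal sketch of the resulting call pattern, using only names from the diff above (the parameter list here is illustrative, not the exact `splitFiles` signature):

```scala
// Sketch: both branches delegate to the shared helper instead of
// constructing PartitionedFile by hand; the byte range is always explicit.
def splitFile(
    file: FileStatusWithMetadata,
    partitionValues: InternalRow,
    isSplitable: Boolean,
    maxSplitBytes: Long): Seq[PartitionedFile] = {
  if (isSplitable) {
    (0L until file.getLen by maxSplitBytes).map { offset =>
      val size = math.min(maxSplitBytes, file.getLen - offset)
      PartitionedFileUtil.getPartitionedFile(file, partitionValues, offset, size)
    }
  } else {
    Seq(PartitionedFileUtil.getPartitionedFile(file, partitionValues, 0, file.getLen))
  }
}
```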
(spark) branch master updated: [SPARK-46882][SS][TEST] Replace unnecessary AtomicInteger with int
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c63dea8f4235 [SPARK-46882][SS][TEST] Replace unnecessary AtomicInteger with int
c63dea8f4235 is described below

commit c63dea8f42357ecfd4fe41f04732e2cb0d0d53ae
Author: beliefer
AuthorDate: Wed Jan 31 17:47:50 2024 -0800

    [SPARK-46882][SS][TEST] Replace unnecessary AtomicInteger with int

    ### What changes were proposed in this pull request?
    This PR proposes to replace an unnecessary `AtomicInteger` with a plain int.

    ### Why are the changes needed?
    The variable `value` of `GetMaxCounter` is always guarded by the object's own lock, so the unnecessary `AtomicInteger` can be replaced with an int.

    ### Does this PR introduce _any_ user-facing change?
    'No'.

    ### How was this patch tested?
    GA.

    ### Was this patch authored or co-authored using generative AI tooling?
    'No'.

    Closes #44907 from beliefer/SPARK-46882.

    Authored-by: beliefer
    Signed-off-by: Dongjoon Hyun
---
 .../apache/spark/streaming/util/WriteAheadLogSuite.scala | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala b/streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala
index 3a9fffec13cf..cf9d5b7387f7 100644
--- a/streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala
+++ b/streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala
@@ -20,7 +20,6 @@ import java.io._
 import java.nio.ByteBuffer
 import java.util.{Iterator => JIterator}
 import java.util.concurrent.{CountDownLatch, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}
-import java.util.concurrent.atomic.AtomicInteger

 import scala.collection.mutable.ArrayBuffer
 import scala.concurrent._
@@ -238,14 +237,14 @@ class FileBasedWriteAheadLogSuite
     val executionContext = ExecutionContext.fromExecutorService(fpool)
     class GetMaxCounter {
-      private val value = new AtomicInteger()
-      @volatile private var max: Int = 0
+      private var value = 0
+      private var max: Int = 0
       def increment(): Unit = synchronized {
-        val atInstant = value.incrementAndGet()
-        if (atInstant > max) max = atInstant
+        value = value + 1
+        if (value > max) max = value
       }
-      def decrement(): Unit = synchronized { value.decrementAndGet() }
-      def get(): Int = synchronized { value.get() }
+      def decrement(): Unit = synchronized { value = value - 1 }
+      def get(): Int = synchronized { value }
       def getMax(): Int = synchronized { max }
     }
     try {
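The rule of thumb behind this change, as a minimal standalone sketch: once every access to a field happens inside methods synchronized on the same monitor, the monitor already provides both mutual exclusion and visibility, so `AtomicInteger` (or `@volatile`) adds nothing.

```scala
// All reads and writes of `count` go through the same monitor, so a plain
// Int is safe here; an AtomicInteger would be a second, redundant mechanism.
class SynchronizedCounter {
  private var count = 0
  def increment(): Unit = synchronized { count += 1 }
  def get(): Int = synchronized { count }
}
```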
(spark) branch master updated: [SPARK-46931][PS] Implement `{Frame, Series}.to_hdf`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 88f121c47778 [SPARK-46931][PS] Implement `{Frame, Series}.to_hdf`
88f121c47778 is described below

commit 88f121c47778f0755862046d09484a83932cb30b
Author: Ruifeng Zheng
AuthorDate: Wed Jan 31 08:41:21 2024 -0800

    [SPARK-46931][PS] Implement `{Frame, Series}.to_hdf`

    ### What changes were proposed in this pull request?
    Implement `{Frame, Series}.to_hdf`.

    ### Why are the changes needed?
    Pandas parity.

    ### Does this PR introduce _any_ user-facing change?
    Yes:
    ```
    In [3]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

    In [4]: df.to_hdf('/tmp/data.h5', key='df', mode='w')

    In [5]: psdf = ps.from_pandas(df)

    In [6]: psdf.to_hdf('/tmp/data2.h5', key='df', mode='w')
    /Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1015: PandasAPIOnSparkAdviceWarning: `to_hdf` loads all data into the driver's memory. It should only be used if the resulting DataFrame is expected to be small.
      warnings.warn(message, PandasAPIOnSparkAdviceWarning)

    In [7]: !ls /tmp/*h5
    /tmp/data.h5    /tmp/data2.h5

    In [8]: !ls -lh /tmp/*h5
    -rw-r--r--  1 ruifeng.zheng  wheel  6.9K Jan 31 12:21 /tmp/data.h5
    -rw-r--r--  1 ruifeng.zheng  wheel  6.9K Jan 31 12:21 /tmp/data2.h5
    ```

    ### How was this patch tested?
    Manually tested. `to_hdf` requires the additional library `pytables`, which in turn needs [many prerequisites](https://www.pytables.org/usersguide/installation.html#prerequisites). Since `pytables` is just an optional dependency of pandas, we can avoid adding it to CI for now.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #44966 from zhengruifeng/ps_to_hdf.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Dongjoon Hyun
---
 .../docs/source/reference/pyspark.pandas/frame.rst  |   1 +
 .../source/reference/pyspark.pandas/series.rst      |   1 +
 python/pyspark/pandas/generic.py                    | 120 +
 python/pyspark/pandas/missing/frame.py              |   1 -
 python/pyspark/pandas/missing/series.py             |   1 -
 5 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/python/docs/source/reference/pyspark.pandas/frame.rst b/python/docs/source/reference/pyspark.pandas/frame.rst
index 12cf6e7db12f..77b60468b8fb 100644
--- a/python/docs/source/reference/pyspark.pandas/frame.rst
+++ b/python/docs/source/reference/pyspark.pandas/frame.rst
@@ -286,6 +286,7 @@ Serialization / IO / Conversion
    DataFrame.to_json
    DataFrame.to_dict
    DataFrame.to_excel
+   DataFrame.to_hdf
    DataFrame.to_clipboard
    DataFrame.to_markdown
    DataFrame.to_records
diff --git a/python/docs/source/reference/pyspark.pandas/series.rst b/python/docs/source/reference/pyspark.pandas/series.rst
index 88d1861c6ccf..5606fa93a5f3 100644
--- a/python/docs/source/reference/pyspark.pandas/series.rst
+++ b/python/docs/source/reference/pyspark.pandas/series.rst
@@ -486,6 +486,7 @@ Serialization / IO / Conversion
    Series.to_json
    Series.to_csv
    Series.to_excel
+   Series.to_hdf
    Series.to_frame

 Pandas-on-Spark specific
diff --git a/python/pyspark/pandas/generic.py b/python/pyspark/pandas/generic.py
index 77cefb53fe5d..ed2aeb8ea6af 100644
--- a/python/pyspark/pandas/generic.py
+++ b/python/pyspark/pandas/generic.py
@@ -1103,6 +1103,126 @@ class Frame(object, metaclass=ABCMeta):
             psdf._to_internal_pandas(), self.to_excel, f, args
         )

+    def to_hdf(
+        self,
+        path_or_buf: Union[str, pd.HDFStore],
+        key: str,
+        mode: str = "a",
+        complevel: Optional[int] = None,
+        complib: Optional[str] = None,
+        append: bool = False,
+        format: Optional[str] = None,
+        index: bool = True,
+        min_itemsize: Optional[Union[int, Dict[str, int]]] = None,
+        nan_rep: Optional[Any] = None,
+        dropna: Optional[bool] = None,
+        data_columns: Optional[Union[bool, List[str]]] = None,
+        errors: str = "strict",
+        encoding: str = "UTF-8",
+    ) -> None:
+        """
+        Write the contained data to an HDF5 file using HDFStore.
+
+        .. note:: This method should only be used if the resulting DataFrame is expected
+            to be small, as all the data is loaded into the driver's memory.
+
+        .. versionadded:: 4.0.0
+
+        Parameters
+        ----------
+        path_or_buf : str or pandas.HDFStore
+            File path or HDFStore object.
+        key : str
+            Identifier for the group in the store.
+        mode : {'a', 'w', 'r+'}, default 'a'
+            Mode to open file:
+
+            - 'w': write, a new file is
(spark) branch master updated: [SPARK-46930][SQL] Add support for a custom prefix for Union type fields in Avro
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d49265a170fb [SPARK-46930][SQL] Add support for a custom prefix for Union type fields in Avro
d49265a170fb is described below

commit d49265a170fb7bb06471d97f4483139529939ecd
Author: Ivan Sadikov
AuthorDate: Wed Jan 31 08:39:46 2024 -0800

    [SPARK-46930][SQL] Add support for a custom prefix for Union type fields in Avro

    ### What changes were proposed in this pull request?
    This PR enhances the stable-ids functionality in Avro by allowing users to configure a custom prefix for Union type member fields when `enableStableIdentifiersForUnionType` is enabled.

    Without the patch, the fields are generated with the `member_` prefix, e.g. `member_int`, `member_string`. This can be difficult to change for complex schemas. The solution is to add a new option `stableIdentifierPrefixForUnionType`, which defaults to `member_` and allows users to configure whatever prefix they require, e.g. `member`, `tmp_`, or even an empty string.

    ### Why are the changes needed?
    This allows users to customise the prefix of stable ids in Avro without renaming all of the columns, which can be cumbersome for complex schemas.

    ### Does this PR introduce _any_ user-facing change?
    Yes. The PR adds a new option in Avro: `stableIdentifierPrefixForUnionType`.

    ### How was this patch tested?
    Existing tests + a new unit test to verify different prefixes.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44964 from sadikovi/SPARK-46930.

    Authored-by: Ivan Sadikov
    Signed-off-by: Dongjoon Hyun
---
 .../apache/spark/sql/avro/AvroDataToCatalyst.scala     | 12 +++--
 .../apache/spark/sql/avro/AvroDeserializer.scala       | 12 +++--
 .../org/apache/spark/sql/avro/AvroFileFormat.scala     |  3 +-
 .../org/apache/spark/sql/avro/AvroOptions.scala        |  6 +++
 .../org/apache/spark/sql/avro/AvroUtils.scala          |  5 +-
 .../apache/spark/sql/avro/SchemaConverters.scala       | 58 +++---
 .../sql/v2/avro/AvroPartitionReaderFactory.scala       |  3 +-
 .../sql/avro/AvroCatalystDataConversionSuite.scala     |  7 +--
 .../apache/spark/sql/avro/AvroFunctionsSuite.scala     |  3 +-
 .../apache/spark/sql/avro/AvroRowReaderSuite.scala     |  3 +-
 .../org/apache/spark/sql/avro/AvroSerdeSuite.scala     |  3 +-
 .../org/apache/spark/sql/avro/AvroSuite.scala          | 54 +++-
 docs/sql-data-sources-avro.md                          | 10 +++-
 13 files changed, 133 insertions(+), 46 deletions(-)

diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
index 9f31a2db55a5..7d80998d96eb 100644
--- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
+++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
@@ -40,7 +40,9 @@ private[sql] case class AvroDataToCatalyst(

   override lazy val dataType: DataType = {
     val dt = SchemaConverters.toSqlType(
-      expectedSchema, avroOptions.useStableIdForUnionType).dataType
+      expectedSchema,
+      avroOptions.useStableIdForUnionType,
+      avroOptions.stableIdPrefixForUnionType).dataType
     parseMode match {
       // With PermissiveMode, the output Catalyst row might contain columns of null values for
       // corrupt records, even if some of the columns are not nullable in the user-provided schema.
@@ -62,8 +64,12 @@ private[sql] case class AvroDataToCatalyst(
   @transient private lazy val reader = new GenericDatumReader[Any](actualSchema, expectedSchema)

   @transient private lazy val deserializer =
-    new AvroDeserializer(expectedSchema, dataType,
-      avroOptions.datetimeRebaseModeInRead, avroOptions.useStableIdForUnionType)
+    new AvroDeserializer(
+      expectedSchema,
+      dataType,
+      avroOptions.datetimeRebaseModeInRead,
+      avroOptions.useStableIdForUnionType,
+      avroOptions.stableIdPrefixForUnionType)

   @transient private var decoder: BinaryDecoder = _

diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
index 9e10fac8bb55..139c45adb442 100644
--- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
+++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
@@ -50,20 +50,23 @@ private[sql] class AvroDeserializer(
     positionalFieldMatch: Boolean,
     datetimeRebaseSpec: RebaseSpec,
     filters: StructFilters,
-    useStableIdForUnionType: Boolean) {
+    useStableIdForUnionType: Boolean,
+
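A usage sketch of the new option (the path and data are hypothetical; the two option names are the ones the commit message documents, and `spark` is assumed to be a `SparkSession`):

```scala
// With an Avro union such as ["int", "string"], the resulting struct
// fields become tmp_int / tmp_string instead of member_int / member_string.
val df = spark.read
  .format("avro")
  .option("enableStableIdentifiersForUnionType", "true")
  .option("stableIdentifierPrefixForUnionType", "tmp_") // default: "member_"
  .load("/path/to/union-data.avro")
```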
(spark) branch master updated: [SPARK-46929][CORE][CONNECT][SS] Use ThreadUtils.shutdown to close thread pools
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 262ed5bcab0b [SPARK-46929][CORE][CONNECT][SS] Use ThreadUtils.shutdown to close thread pools
262ed5bcab0b is described below

commit 262ed5bcab0ba750b089b0693dbb1a59ef6fd11f
Author: beliefer
AuthorDate: Wed Jan 31 09:52:19 2024 -0600

    [SPARK-46929][CORE][CONNECT][SS] Use ThreadUtils.shutdown to close thread pools

    ### What changes were proposed in this pull request?
    This PR proposes to use `ThreadUtils.shutdown` to close thread pools.

    ### Why are the changes needed?
    `ThreadUtils` provides `shutdown`, which wraps the common logic for shutting down thread pools. We should use `ThreadUtils.shutdown` instead of hand-rolled shutdown sequences.

    ### Does this PR introduce _any_ user-facing change?
    'No'.

    ### How was this patch tested?
    GA

    ### Was this patch authored or co-authored using generative AI tooling?
    'No'.

    Closes #44962 from beliefer/SPARK-46929.

    Authored-by: beliefer
    Signed-off-by: Sean Owen
---
 .../sql/connect/service/SparkConnectExecutionManager.scala  |  5 +++--
 .../sql/connect/service/SparkConnectSessionManager.scala    |  5 +++--
 .../connect/service/SparkConnectStreamingQueryCache.scala   |  9 +++--
 .../scala/org/apache/spark/ExecutorAllocationManager.scala  |  4 ++--
 .../org/apache/spark/status/ElementTrackingStore.scala      |  6 ++
 .../main/scala/org/apache/spark/streaming/Checkpoint.scala  | 12 +---
 .../org/apache/spark/streaming/scheduler/JobScheduler.scala | 13 ++---
 7 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala
index c90f53ac07df..85fb150b3171 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala
@@ -21,6 +21,7 @@ import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
 import javax.annotation.concurrent.GuardedBy

 import scala.collection.mutable
+import scala.concurrent.duration.FiniteDuration
 import scala.jdk.CollectionConverters._
 import scala.util.control.NonFatal

@@ -30,6 +31,7 @@ import org.apache.spark.{SparkEnv, SparkSQLException}
 import org.apache.spark.connect.proto
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.connect.config.Connect.{CONNECT_EXECUTE_MANAGER_ABANDONED_TOMBSTONES_SIZE, CONNECT_EXECUTE_MANAGER_DETACHED_TIMEOUT, CONNECT_EXECUTE_MANAGER_MAINTENANCE_INTERVAL}
+import org.apache.spark.util.ThreadUtils

 // Unique key identifying execution by combination of user, session and operation id
 case class ExecuteKey(userId: String, sessionId: String, operationId: String)
@@ -167,8 +169,7 @@ private[connect] class SparkConnectExecutionManager() extends Logging {

   private[connect] def shutdown(): Unit = executionsLock.synchronized {
     scheduledExecutor.foreach { executor =>
-      executor.shutdown()
-      executor.awaitTermination(1, TimeUnit.MINUTES)
+      ThreadUtils.shutdown(executor, FiniteDuration(1, TimeUnit.MINUTES))
     }
     scheduledExecutor = None
     // note: this does not cleanly shut down the executions, but the server is shutting down.
diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala
index ef14cd305d40..4da728b95a33 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala
@@ -22,6 +22,7 @@ import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
 import javax.annotation.concurrent.GuardedBy

 import scala.collection.mutable
+import scala.concurrent.duration.FiniteDuration
 import scala.jdk.CollectionConverters._
 import scala.util.control.NonFatal

@@ -31,6 +32,7 @@ import org.apache.spark.{SparkEnv, SparkSQLException}
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.connect.config.Connect.{CONNECT_SESSION_MANAGER_CLOSED_SESSIONS_TOMBSTONES_SIZE, CONNECT_SESSION_MANAGER_DEFAULT_SESSION_TIMEOUT, CONNECT_SESSION_MANAGER_MAINTENANCE_INTERVAL}
+import org.apache.spark.util.ThreadUtils

 /**
  * Global tracker of all SessionHolders
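The before/after pattern, as a self-contained sketch. `ThreadUtils.shutdown` wraps the usual shutdown-and-await sequence; the one-minute timeout matches the code being replaced in this commit.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.duration.FiniteDuration
import org.apache.spark.util.ThreadUtils

val executor = Executors.newSingleThreadScheduledExecutor()

// Before: the hand-rolled two-step sequence repeated at every call site.
// executor.shutdown()
// executor.awaitTermination(1, TimeUnit.MINUTES)

// After: one call that encapsulates the same shutdown-and-await logic.
ThreadUtils.shutdown(executor, FiniteDuration(1, TimeUnit.MINUTES))
```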
(spark) branch master updated: [SPARK-46400][CORE][SQL] When there are corrupted files in the local maven repo, skip this cache and try again
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f2a471e9cc75 [SPARK-46400][CORE][SQL] When there are corrupted files in the local maven repo, skip this cache and try again
f2a471e9cc75 is described below

commit f2a471e9cc752f3826232eedc9025fd156a85965
Author: panbingkun
AuthorDate: Wed Jan 31 09:46:07 2024 -0600

    [SPARK-46400][CORE][SQL] When there are corrupted files in the local maven repo, skip this cache and try again

    ### What changes were proposed in this pull request?
    The PR aims to
    - fix a potential bug (i.e. https://github.com/apache/spark/pull/44208) and enhance the user experience;
    - make the code more compliant with standards.

    ### Why are the changes needed?
    We use the local maven repo as the first-level cache in Ivy. The original intention was to reduce the time required to resolve and fetch artifacts, but when there are corrupted files in the local maven repo, that mechanism is interrupted outright and the error message is very unfriendly, which greatly confuses users. In keeping with the original intention, we should simply skip the cache in such situations and retry.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manually tested.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44343 from panbingkun/SPARK-46400.

    Authored-by: panbingkun
    Signed-off-by: Sean Owen
---
 .../scala/org/apache/spark/util/MavenUtils.scala   | 147 +++--
 .../sql/hive/client/IsolatedClientLoader.scala     |   4 +
 2 files changed, 112 insertions(+), 39 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala
index 2d7fba6f07d5..65530b7fa473 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala
@@ -27,7 +27,7 @@ import org.apache.ivy.Ivy
 import org.apache.ivy.core.LogOptions
 import org.apache.ivy.core.module.descriptor.{Artifact, DefaultDependencyDescriptor, DefaultExcludeRule, DefaultModuleDescriptor, ExcludeRule}
 import org.apache.ivy.core.module.id.{ArtifactId, ModuleId, ModuleRevisionId}
-import org.apache.ivy.core.report.ResolveReport
+import org.apache.ivy.core.report.{DownloadStatus, ResolveReport}
 import org.apache.ivy.core.resolve.ResolveOptions
 import org.apache.ivy.core.retrieve.RetrieveOptions
 import org.apache.ivy.core.settings.IvySettings
@@ -43,8 +43,8 @@ import org.apache.spark.util.ArrayImplicits._
 private[spark] object MavenUtils extends Logging {
   val JAR_IVY_SETTING_PATH_KEY: String = "spark.jars.ivySettings"

-//  // Exposed for testing
-//  var printStream = SparkSubmit.printStream
+  // Exposed for testing
+  // var printStream = SparkSubmit.printStream

   // Exposed for testing.
   // These components are used to make the default exclusion rules for Spark dependencies.
@@ -113,7 +113,7 @@ private[spark] object MavenUtils extends Logging {
         splits(2) != null && splits(2).trim.nonEmpty, s"The version cannot be null or " +
           s"be whitespace. The version provided is: ${splits(2)}")
-      new MavenCoordinate(splits(0), splits(1), splits(2))
+      MavenCoordinate(splits(0), splits(1), splits(2))
     }.toImmutableArraySeq
   }

@@ -128,24 +128,30 @@ private[spark] object MavenUtils extends Logging {
   }

   /**
-   * Extracts maven coordinates from a comma-delimited string
+   * Create a ChainResolver used by Ivy to search for and resolve dependencies.
    *
    * @param defaultIvyUserDir
    *   The default user path for Ivy
+   * @param useLocalM2AsCache
+   *   Whether to use the local maven repo as a cache
    * @return
    *   A ChainResolver used by Ivy to search for and resolve dependencies.
    */
-  private[util] def createRepoResolvers(defaultIvyUserDir: File): ChainResolver = {
+  private[util] def createRepoResolvers(
+      defaultIvyUserDir: File,
+      useLocalM2AsCache: Boolean = true): ChainResolver = {
     // We need a chain resolver if we want to check multiple repositories
     val cr = new ChainResolver
     cr.setName("spark-list")

-    val localM2 = new IBiblioResolver
-    localM2.setM2compatible(true)
-    localM2.setRoot(m2Path.toURI.toString)
-    localM2.setUsepoms(true)
-    localM2.setName("local-m2-cache")
-    cr.add(localM2)
+    if (useLocalM2AsCache) {
+      val localM2 = new IBiblioResolver
+      localM2.setM2compatible(true)
+      localM2.setRoot(m2Path.toURI.toString)
+      localM2.setUsepoms(true)
+      localM2.setName("local-m2-cache")
+      cr.add(localM2)
+    }

     val localIvy = new FileSystemResolver
     val
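The retry idea, as a hedged sketch (only `createRepoResolvers(_, useLocalM2AsCache)` comes from the patch; `doResolve` is a hypothetical stand-in for the Ivy resolve step, and the method's `private[util]` visibility is ignored for illustration):

```scala
import java.io.File
import scala.util.control.NonFatal
import org.apache.ivy.core.report.ResolveReport
import org.apache.ivy.plugins.resolver.ChainResolver
import org.apache.spark.util.MavenUtils

def resolveSkippingCorruptM2Cache(
    ivyDir: File,
    doResolve: ChainResolver => ResolveReport): ResolveReport = {
  try {
    // First attempt: the local ~/.m2 repo is the first resolver in the chain.
    doResolve(MavenUtils.createRepoResolvers(ivyDir, useLocalM2AsCache = true))
  } catch {
    case NonFatal(_) =>
      // A corrupted file in the local maven repo aborts the whole chain;
      // rebuild the chain without the local-m2 cache and try once more.
      doResolve(MavenUtils.createRepoResolvers(ivyDir, useLocalM2AsCache = false))
  }
}
```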
(spark) branch master updated: [SPARK-45522][BUILD][CORE][SQL][UI] Migrate from Jetty 9 to Jetty 10
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6c19bf6b48e7 [SPARK-45522][BUILD][CORE][SQL][UI] Migrate from Jetty 9 to Jetty 10
6c19bf6b48e7 is described below

commit 6c19bf6b48e7e2ab9937dc2d91ea23dd83abae64
Author: HiuFung Kwok
AuthorDate: Wed Jan 31 09:42:16 2024 -0600

    [SPARK-45522][BUILD][CORE][SQL][UI] Migrate from Jetty 9 to Jetty 10

    ### What changes were proposed in this pull request?
    This is an upgrade ticket to bump the Jetty version from 9 to 10. This PR aims to bring incremental Jetty upgrades to Spark, as Jetty 9 support has already reached EOL.

    ### Why are the changes needed?
    Jetty 9 is already beyond EOL, which means Spark will not receive any further security fixes for it.

    ### Does this PR introduce _any_ user-facing change?
    No. The SNI host check now defaults to true in embedded Jetty, so it is set back to false to maintain backward compatibility. The redirect behaviour for a trailing '/' changed, but modern browsers pick up the 302 status code and redirect accordingly, so there is no user-level impact.

    ### How was this patch tested?
    JUnit test case.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #43765 from HiuKwok/ft-hf-SPARK-45522-jetty-upgradte.

    Lead-authored-by: HiuFung Kwok
    Co-authored-by: HiuFung Kwok <37996731+hiuk...@users.noreply.github.com>
    Signed-off-by: Sean Owen
---
 LICENSE-binary                                     |  1 -
 core/pom.xml                                       |  8 +---
 .../main/scala/org/apache/spark/SSLOptions.scala   |  2 +-
 .../main/scala/org/apache/spark/TestUtils.scala    | 13 +
 .../scala/org/apache/spark/ui/JettyUtils.scala     | 13 ++---
 .../test/scala/org/apache/spark/ui/UISuite.scala   | 22 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3              |  4 ++--
 dev/test-dependencies.sh                           |  2 +-
 pom.xml                                            |  8 +---
 .../service/cli/thrift/ThriftHttpCLIService.java   | 12 ++--
 10 files changed, 52 insertions(+), 33 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index c6f291f11088..2073d85246b6 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -368,7 +368,6 @@ xerces:xercesImpl
 org.codehaus.jackson:jackson-jaxrs
 org.codehaus.jackson:jackson-xc
 org.eclipse.jetty:jetty-client
-org.eclipse.jetty:jetty-continuation
 org.eclipse.jetty:jetty-http
 org.eclipse.jetty:jetty-io
 org.eclipse.jetty:jetty-jndi
diff --git a/core/pom.xml b/core/pom.xml
index c093213bd6b9..f780551fb555 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -146,11 +146,6 @@
       jetty-http
       compile
     
-    
-      org.eclipse.jetty
-      jetty-continuation
-      compile
-    
     
       org.eclipse.jetty
       jetty-servlet
@@ -538,7 +533,7 @@
           true
           true
-          guava,protobuf-java,jetty-io,jetty-servlet,jetty-servlets,jetty-continuation,jetty-http,jetty-plus,jetty-util,jetty-server,jetty-security,jetty-proxy,jetty-client
+          guava,protobuf-java,jetty-io,jetty-servlet,jetty-servlets,jetty-http,jetty-plus,jetty-util,jetty-server,jetty-security,jetty-proxy,jetty-client
           true
@@ -558,7 +553,6 @@
           org.eclipse.jetty:jetty-http
           org.eclipse.jetty:jetty-proxy
           org.eclipse.jetty:jetty-client
-          org.eclipse.jetty:jetty-continuation
           org.eclipse.jetty:jetty-servlet
           org.eclipse.jetty:jetty-servlets
           org.eclipse.jetty:jetty-plus
diff --git a/core/src/main/scala/org/apache/spark/SSLOptions.scala b/core/src/main/scala/org/apache/spark/SSLOptions.scala
index 26108d885e4c..ce058cec2686 100644
--- a/core/src/main/scala/org/apache/spark/SSLOptions.scala
+++ b/core/src/main/scala/org/apache/spark/SSLOptions.scala
@@ -87,7 +87,7 @@ private[spark] case class SSLOptions(
   /**
    * Creates a Jetty SSL context factory according to the SSL settings represented by this object.
    */
-  def createJettySslContextFactory(): Option[SslContextFactory] = {
+  def createJettySslContextFactoryServer(): Option[SslContextFactory.Server] = {
     if (enabled) {
       val sslContextFactory = new SslContextFactory.Server()

diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala b/core/src/main/scala/org/apache/spark/TestUtils.scala
index e85f98ff55c5..5e3078d7292b 100644
--- a/core/src/main/scala/org/apache/spark/TestUtils.scala
+++ b/core/src/main/scala/org/apache/spark/TestUtils.scala
@@ -252,6 +252,19 @@ private[spark] object TestUtils extends
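A sketch of the two Jetty 10 API points the commit message mentions; this is standard Jetty 10 API rather than code from the patch itself: the server-side SSL context factory is the `SslContextFactory.Server` subtype, and the SNI host check, which Jetty 10 enables by default, is switched back off for compatibility.

```scala
import org.eclipse.jetty.server.{HttpConfiguration, SecureRequestCustomizer}
import org.eclipse.jetty.util.ssl.SslContextFactory

// Jetty 10 splits client/server roles; servers use SslContextFactory.Server.
val sslContextFactory = new SslContextFactory.Server()

// Restore pre-Jetty-10 behaviour: do not reject requests whose SNI name
// does not match the certificate (Jetty 10 defaults this check to true).
val httpConfig = new HttpConfiguration()
val customizer = new SecureRequestCustomizer()
customizer.setSniHostCheck(false)
httpConfig.addCustomizer(customizer)
```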
(spark) branch master updated: [MINOR][SQL] Use `DecimalType.MINIMUM_ADJUSTED_SCALE` instead of magic number `6` in `Divide` class
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0c7770f4de56 [MINOR][SQL] Use `DecimalType.MINIMUM_ADJUSTED_SCALE` instead of magic number `6` in `Divide` class
0c7770f4de56 is described below

commit 0c7770f4de560ad74e93b0902ab7a6be52c655be
Author: longfei.jiang <1251489...@qq.com>
AuthorDate: Wed Jan 31 09:40:07 2024 -0600

    [MINOR][SQL] Use `DecimalType.MINIMUM_ADJUSTED_SCALE` instead of magic number `6` in `Divide` class

    ### What changes were proposed in this pull request?
    Replace the magic value `6` with the constant `DecimalType.MINIMUM_ADJUSTED_SCALE`.

    ### Why are the changes needed?
    Magic values are less self-documenting than named constants.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    The existing UT `ArithmeticExpressionSuite#"SPARK-45786: Decimal multiply, divide, remainder, quot"` provides coverage.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #44941 from jlfsdtc/magic_value.

    Authored-by: longfei.jiang <1251489...@qq.com>
    Signed-off-by: Sean Owen
---
 .../scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index a0fb17cec812..9f1b42ad84d3 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -810,7 +810,7 @@ case class Divide(
       DecimalType.adjustPrecisionScale(prec, scale)
     } else {
       var intDig = min(DecimalType.MAX_SCALE, p1 - s1 + s2)
-      var decDig = min(DecimalType.MAX_SCALE, max(6, s1 + p2 + 1))
+      var decDig = min(DecimalType.MAX_SCALE, max(DecimalType.MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1))
       val diff = (intDig + decDig) - DecimalType.MAX_SCALE
       if (diff > 0) {
         decDig -= diff / 2 + 1
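A worked instance of the formula in the hunk above, with `DecimalType.MAX_SCALE = 38` and `MINIMUM_ADJUSTED_SCALE = 6` (the constant that replaces the literal `6`), dividing Decimal(10, 2) by Decimal(5, 2):

```scala
import scala.math.{max, min}

val (p1, s1, p2, s2) = (10, 2, 5, 2)
val intDig = min(38, p1 - s1 + s2)        // min(38, 10) = 10
val decDig = min(38, max(6, s1 + p2 + 1)) // min(38, 8)  = 8
// intDig + decDig = 18 <= 38, so no adjustment is needed and the
// result type is Decimal(precision = 18, scale = 8).
```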
(spark) branch master updated (0871a6f15623 -> e95e820ac137)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 0871a6f15623 [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test
     add e95e820ac137 [SPARK-46932] Clean up the imports in `pyspark.pandas.test_*`

No new revisions were added by this update.

Summary of changes:
 .../pandas/tests/connect/test_parity_categorical.py | 10 +-
 .../pandas/tests/connect/test_parity_extension.py   | 21 ++---
 .../tests/connect/test_parity_numpy_compat.py       | 20 ++--
 python/pyspark/pandas/tests/test_categorical.py     | 12 ++-
 python/pyspark/pandas/tests/test_extension.py       | 11 +--
 python/pyspark/pandas/tests/test_numpy_compat.py    | 12 ++-
 6 files changed, 46 insertions(+), 40 deletions(-)
(spark) branch master updated (5d87ac6214bb -> 0871a6f15623)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 5d87ac6214bb [SPARK-46926][PS] Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list
     add 0871a6f15623 [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/test_session.py      |  6 ++
 python/pyspark/sql/tests/test_udf.py          | 10 ++-
 python/pyspark/sql/tests/test_udf_profiler.py |  1 +
 python/pyspark/sql/tests/test_utils.py        | 35 +++---
 python/pyspark/testing/connectutils.py        | 99 ++-
 python/pyspark/testing/utils.py               |  9 +--
 6 files changed, 96 insertions(+), 64 deletions(-)
(spark) branch master updated (8f406212aa09 -> 5d87ac6214bb)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 8f406212aa09 [SPARK-46924][CORE] Fix `Load New` button in `Master/HistoryServer` Log UI
     add 5d87ac6214bb [SPARK-46926][PS] Add `convert_dtypes`, `infer_objects` and `set_axis` in fallback list

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/frame.py         | 8 +++-
 python/pyspark/pandas/missing/frame.py | 4 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)
(spark) branch master updated (2f917c32feeb -> 8f406212aa09)
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 2f917c32feeb [SPARK-46927][PYTHON] Make `assertDataFrameEqual` work properly without PyArrow
     add 8f406212aa09 [SPARK-46924][CORE] Fix `Load New` button in `Master/HistoryServer` Log UI

No new revisions were added by this update.

Summary of changes:
 .../deploy/{history/LogPage.scala => Utils.scala}  | 79 +++---
 .../spark/deploy/history/HistoryServer.scala       |  3 +-
 .../org/apache/spark/deploy/history/LogPage.scala  | 55 ++-
 .../apache/spark/deploy/master/ui/LogPage.scala    | 55 ++-
 .../spark/deploy/master/ui/MasterWebUI.scala       |  2 +
 5 files changed, 37 insertions(+), 157 deletions(-)
 copy core/src/main/scala/org/apache/spark/deploy/{history/LogPage.scala => Utils.scala} (57%)
(spark) branch branch-3.4 updated: [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 64115d9829ec [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects
64115d9829ec is described below

commit 64115d9829ec881b69ffe44f844845104484a025
Author: Kent Yao
AuthorDate: Tue Jan 30 20:34:28 2024 +0800

    [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects

    ### What changes were proposed in this pull request?
    [SPARK-46747](https://issues.apache.org/jira/browse/SPARK-46747) reported an issue that Postgres instances suffered from too many shared locks, caused by Spark's table-existence query. In this PR, we replaced `"SELECT 1 FROM $table LIMIT 1"` with `"SELECT 1 FROM $table WHERE 1=0"` to prevent data from being scanned.

    ### Why are the changes needed?
    Overhead reduction for JDBC data sources.

    ### Does this PR introduce _any_ user-facing change?
    no

    ### How was this patch tested?
    Existing JDBC v1/v2 datasource tests.

    ### Was this patch authored or co-authored using generative AI tooling?
    no

    Closes #44948 from yaooqinn/SPARK-46747.

    Authored-by: Kent Yao
    Signed-off-by: Kent Yao
    (cherry picked from commit 031df8fa62666f14f54cf0a792f7fa2acc43afee)
    Signed-off-by: Kent Yao
---
 .../src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala    | 2 +-
 .../src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala    | 4 ----
 .../src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala | 4 ----
 sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala  | 7 +++----
 4 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
index c58d8368150b..ab5bfdb7189a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
@@ -142,7 +142,7 @@ abstract class JdbcDialect extends Serializable with Logging {
    * @return The SQL query to use for checking the table.
    */
   def getTableExistsQuery(table: String): String = {
-    s"SELECT * FROM $table WHERE 1=0"
+    s"SELECT 1 FROM $table WHERE 1=0"
   }

   /**
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala
index 12882dc8e676..29eb8916bb79 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala
@@ -107,10 +107,6 @@ private case object MySQLDialect extends JdbcDialect with SQLConfHelper {
     schemaBuilder.result
   }

-  override def getTableExistsQuery(table: String): String = {
-    s"SELECT 1 FROM $table LIMIT 1"
-  }
-
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)

   // See https://dev.mysql.com/doc/refman/8.0/en/alter-table.html
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
index c2ca45d9143a..7f6f8dc00886 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
@@ -113,10 +113,6 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }

-  override def getTableExistsQuery(table: String): String = {
-    s"SELECT 1 FROM $table LIMIT 1"
-  }
-
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)

   /**
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
index aa66fcd53041..7c9306b65f1e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
@@ -1009,10 +1009,9 @@ class JDBCSuite extends QueryTest with SharedSparkSession {
     val h2 = JdbcDialects.get(url)
     val derby = JdbcDialects.get("jdbc:derby:db")
     val table = "weblogs"
-    val defaultQuery = s"SELECT * FROM $table WHERE 1=0"
-    val limitQuery = s"SELECT 1 FROM $table LIMIT 1"
-    assert(MySQL.getTableExistsQuery(table) == limitQuery)
-    assert(Postgres.getTableExistsQuery(table) == limitQuery)
+    val defaultQuery = s"SELECT 1 FROM $table WHERE 1=0"
+    assert(MySQL.getTableExistsQuery(table) == defaultQuery)
+    assert(Postgres.getTableExistsQuery(table) == defaultQuery)
     assert(db2.getTableExistsQuery(table) == defaultQuery)
     assert(h2.getTableExistsQuery(table) == defaultQuery)
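The design point, as a hypothetical dialect sketch: after this patch the inherited default probe is `s"SELECT 1 FROM $table WHERE 1=0"`, which the database can answer without scanning rows (or, per the report, taking shared locks on Postgres), so MySQL and Postgres no longer need their `LIMIT 1` overrides.

```scala
import org.apache.spark.sql.jdbc.JdbcDialect

// Hypothetical custom dialect; the only abstract member of JdbcDialect is
// canHandle. getTableExistsQuery is intentionally not overridden, so the
// no-scan default from this patch is used for the existence probe.
object MyNoOpDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")
}
```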
(spark) branch branch-3.5 updated (343ae8226161 -> d3b4537f8bd5)
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 343ae8226161 [SPARK-46893][UI] Remove inline scripts from UI descriptions
     add d3b4537f8bd5 [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala    | 2 +-
 .../src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala    | 4 ----
 .../src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala | 4 ----
 sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala  | 7 +++----
 4 files changed, 4 insertions(+), 13 deletions(-)
(spark) branch branch-3.3 updated: [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new e98872fb5d07 [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects
e98872fb5d07 is described below

commit e98872fb5d07d570e6d0516b49a5d2e58876d1a6
Author: Kent Yao
AuthorDate: Tue Jan 30 20:34:28 2024 +0800

    [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects

    ### What changes were proposed in this pull request?
    [SPARK-46747](https://issues.apache.org/jira/browse/SPARK-46747) reported an issue that Postgres instances suffered from too many shared locks, caused by Spark's table-existence query. In this PR, we replaced `"SELECT 1 FROM $table LIMIT 1"` with `"SELECT 1 FROM $table WHERE 1=0"` to prevent data from being scanned.

    ### Why are the changes needed?
    Overhead reduction for JDBC data sources.

    ### Does this PR introduce _any_ user-facing change?
    no

    ### How was this patch tested?
    Existing JDBC v1/v2 datasource tests.

    ### Was this patch authored or co-authored using generative AI tooling?
    no

    Closes #44948 from yaooqinn/SPARK-46747.

    Authored-by: Kent Yao
    Signed-off-by: Kent Yao
    (cherry picked from commit 031df8fa62666f14f54cf0a792f7fa2acc43afee)
    Signed-off-by: Kent Yao
---
 .../src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala    | 2 +-
 .../src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala    | 4 ----
 .../src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala | 4 ----
 sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala  | 7 +++----
 4 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
index 1e65542946af..0cd46dda62c3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
@@ -141,7 +141,7 @@ abstract class JdbcDialect extends Serializable with Logging{
    * @return The SQL query to use for checking the table.
    */
   def getTableExistsQuery(table: String): String = {
-    s"SELECT * FROM $table WHERE 1=0"
+    s"SELECT 1 FROM $table WHERE 1=0"
   }

   /**
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala
index 24f9bac74f86..73903d65b01a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala
@@ -94,10 +94,6 @@ private case object MySQLDialect extends JdbcDialect with SQLConfHelper {
     schemaBuilder.result
   }

-  override def getTableExistsQuery(table: String): String = {
-    s"SELECT 1 FROM $table LIMIT 1"
-  }
-
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)

   // See https://dev.mysql.com/doc/refman/8.0/en/alter-table.html
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
index a668d66ee2f9..6de4d6ec71a8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
@@ -141,10 +141,6 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }

-  override def getTableExistsQuery(table: String): String = {
-    s"SELECT 1 FROM $table LIMIT 1"
-  }
-
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)

   /**
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
index a222391c06fb..74b85f18ceef 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
@@ -989,10 +989,9 @@ class JDBCSuite extends QueryTest
     val h2 = JdbcDialects.get(url)
     val derby = JdbcDialects.get("jdbc:derby:db")
     val table = "weblogs"
-    val defaultQuery = s"SELECT * FROM $table WHERE 1=0"
-    val limitQuery = s"SELECT 1 FROM $table LIMIT 1"
-    assert(MySQL.getTableExistsQuery(table) == limitQuery)
-    assert(Postgres.getTableExistsQuery(table) == limitQuery)
+    val defaultQuery = s"SELECT 1 FROM $table WHERE 1=0"
+    assert(MySQL.getTableExistsQuery(table) == defaultQuery)
+    assert(Postgres.getTableExistsQuery(table) == defaultQuery)
     assert(db2.getTableExistsQuery(table) == defaultQuery)
     assert(derby.getTableExistsQuery(table)
(spark) branch master updated (4f0a9df534a4 -> 2f917c32feeb)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 4f0a9df534a4 [SPARK-46925][PYTHON][CONNECT] Add a warning that instructs to install memory_profiler for memory profiling
     add 2f917c32feeb [SPARK-46927][PYTHON] Make `assertDataFrameEqual` work properly without PyArrow

No new revisions were added by this update.

Summary of changes:
 python/pyspark/testing/utils.py | 42 -
 1 file changed, 21 insertions(+), 21 deletions(-)
(spark) branch master updated (ecdb38e2d059 -> 4f0a9df534a4)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from ecdb38e2d059 [SPARK-46923][DOCS] Limit width of configuration tables
     add 4f0a9df534a4 [SPARK-46925][PYTHON][CONNECT] Add a warning that instructs to install memory_profiler for memory profiling

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/session.py | 17 -
 python/pyspark/sql/session.py         | 15 ++-
 2 files changed, 30 insertions(+), 2 deletions(-)
(spark) branch master updated (8e29c0d8c5cd -> ecdb38e2d059)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 8e29c0d8c5cd [SPARK-46921][BUILD] Move `ProblemFilters` that do not belong to `defaultExcludes` to `v40excludes`
     add ecdb38e2d059 [SPARK-46923][DOCS] Limit width of configuration tables

No new revisions were added by this update.

Summary of changes:
 connector/profiler/README.md                   |  2 +-
 docs/configuration.md                          | 38 -
 docs/css/custom.css                            | 56 --
 docs/monitoring.md                             |  2 +-
 docs/running-on-kubernetes.md                  |  2 +-
 docs/running-on-yarn.md                        |  6 +--
 docs/security.md                               | 18 -
 docs/spark-standalone.md                       | 20 +
 docs/sql-data-sources-avro.md                  |  8 ++--
 docs/sql-data-sources-hive-tables.md           |  2 +-
 docs/sql-data-sources-orc.md                   |  4 +-
 docs/sql-data-sources-parquet.md               |  2 +-
 docs/sql-performance-tuning.md                 | 16
 docs/sql-ref-ansi-compliance.md                | 38 +++--
 docs/structured-streaming-kafka-integration.md |  8 ++--
 sql/gen-sql-config-docs.py                     |  4 +-
 16 files changed, 143 insertions(+), 83 deletions(-)