[GitHub] [spark] SparkQA commented on pull request #32883: [SPARK-35725][SQL] Support repartition expand partitions in AQE
SparkQA commented on pull request #32883: URL: https://github.com/apache/spark/pull/32883#issuecomment-859647049 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32874: [SPARK-35699][K8S] Improve error message when creating k8s pod failed.
SparkQA commented on pull request #32874: URL: https://github.com/apache/spark/pull/32874#issuecomment-859239174
[GitHub] [spark] sunchao commented on pull request #32576: [SPARK-35429][CORE] Remove commons-httpclient due to EOL and CVEs
sunchao commented on pull request #32576: URL: https://github.com/apache/spark/pull/32576#issuecomment-859920490 @sumeetgajjar do you wanna pick up this one again? we just upgraded Hive to 2.3.9 which removes the `commons-httpclient` from Hive side.
[GitHub] [spark] suryanshagnihotri commented on pull request #26619: [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles
suryanshagnihotri commented on pull request #26619: URL: https://github.com/apache/spark/pull/26619#issuecomment-859243045 @dongjoon-hyun yes, I have tried it, but it fails with class-not-found exceptions like
```
java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.plan.LoadTableDesc$LoadFileType
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:255)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:244)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  at org.apache.spark.sql.hive.client.Shim_v3_0.clazzLoadFileType$lzycompute(HiveShim.scala:1272)
  at org.apache.spark.sql.hive.client.Shim_v3_0.clazzLoadFileType(HiveShim.scala:1271)
  at org.apache.spark.sql.hive.client.Shim_v3_0.loadTable(HiveShim.scala:1352)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$loadTable$1(HiveClientImpl.scala:894)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:295)
  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:228)
  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:227)
  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:277)
  at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:889)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$loadTable$1(HiveExternalCatalog.scala:884)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
  at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:878)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadTable(ExternalCatalogWithListener.scala:167)
  at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:327)
  at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:102)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
```
[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps and make `isnull` method data-type-based
SparkQA commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-859239324
[GitHub] [spark] mridulm commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark
mridulm commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-859263950 I am trying to catch up on this discussion, and it is a very long thread already :-) Thanks for all the discussion ! Can we update the doc @viirya given that there seems to be some consensus developing ? Based on that, I will revisit comments to understand the rationales better for the various conclusion points.
[GitHub] [spark] cloud-fan commented on pull request #32850: [SPARK-34920][CORE][SQL] Add error classes with SQLSTATE
cloud-fan commented on pull request #32850: URL: https://github.com/apache/spark/pull/32850#issuecomment-859584261 cc @viirya @maropu @dongjoon-hyun
[GitHub] [spark] sigmod closed pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules
sigmod closed pull request #32463: URL: https://github.com/apache/spark/pull/32463
[GitHub] [spark] yikf opened a new pull request #32877: wait until something does get queued.
yikf opened a new pull request #32877: URL: https://github.com/apache/spark/pull/32877 ### What changes were proposed in this pull request? Currently, `ContextCleaner` continues its cleanup loop after a wait timeout even when the queue is empty, which makes the loop ineffective. This PR prevents that: when the queue is empty, we wait until something actually gets queued instead of looping again after the timeout. ### Why are the changes needed? To avoid an ineffective loop. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests.
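The wait-until-queued pattern described in that PR can be sketched with a condition variable. This is a minimal, hypothetical illustration, not Spark's actual `ContextCleaner` (which drains a JVM reference queue); all names here are illustrative:

```python
import threading
from collections import deque

class CleanupQueue:
    """Hypothetical stand-in for a cleanup queue. The consumer blocks
    until an item arrives instead of waking on a timeout only to find
    the queue still empty."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()  # wake the waiting cleaner thread

    def take(self):
        with self._cond:
            # wait() with no timeout: zero spurious loop iterations
            # while the queue stays empty
            while not self._items:
                self._cond.wait()
            return self._items.popleft()
```

A producer calling `put` wakes the cleaner exactly when there is work to do, which is the behavior the PR description is after.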
[GitHub] [spark] cloud-fan closed pull request #32873: [SPARK-35718][SQL] Support casting of Date to timestamp without time zone type
cloud-fan closed pull request #32873: URL: https://github.com/apache/spark/pull/32873
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32835: [SPARK-35591][PYTHON][DOCS] Rename "Koalas" to "pandas API on Spark" in the documents
AmplabJenkins removed a comment on pull request #32835: URL: https://github.com/apache/spark/pull/32835#issuecomment-859252319
[GitHub] [spark] SparkQA removed a comment on pull request #32885: [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical
SparkQA removed a comment on pull request #32885: URL: https://github.com/apache/spark/pull/32885#issuecomment-859924503 **[Test build #139711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139711/testReport)** for PR 32885 at commit [`73aa6f2`](https://github.com/apache/spark/commit/73aa6f23c1280b8b047d177ddbb6b41d3d6e02cc).
[GitHub] [spark] ueshin closed pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.
ueshin closed pull request #32871: URL: https://github.com/apache/spark/pull/32871
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState
HyukjinKwon commented on a change in pull request #32410: URL: https://github.com/apache/spark/pull/32410#discussion_r649958997

## File path: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java

@@ -141,7 +141,8 @@ public void open(Map sessionConfMap) throws HiveSQLException {
     sessionState = new SessionState(hiveConf, username);
     sessionState.setUserIpAddress(ipAddress);
     sessionState.setIsHiveServerQuery(true);
-    SessionState.start(sessionState);
+    // Use setCurrentSessionState to avoid creating useless SessionDirs.
+    SessionState.setCurrentSessionState(sessionState);

Review comment: @wangyum, last question. What about this? Is it safe to remove?
```
// Hive object instance should be created with a copy of the conf object. If the conf is
// shared with SessionState, other parts of the code might update the config, but
// Hive.get(HiveConf) would not recognize the case when it needs refreshing
Hive.get(new HiveConf(startSs.conf)).getMSC();
```
[GitHub] [spark] viirya commented on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value
viirya commented on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-859243654
[GitHub] [spark] AmplabJenkins commented on pull request #32858: [SPARK-35706][SQL] Consider making the ':' in STRUCT data type definition optional
AmplabJenkins commented on pull request #32858: URL: https://github.com/apache/spark/pull/32858#issuecomment-859553066
[GitHub] [spark] AmplabJenkins commented on pull request #32845: [SPARK-35691][CORE] addFile/addJar/addDirectory should put CanonicalFile
AmplabJenkins commented on pull request #32845: URL: https://github.com/apache/spark/pull/32845#issuecomment-859245462
[GitHub] [spark] gengliangwang commented on pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT/Maven
gengliangwang commented on pull request #32838: URL: https://github.com/apache/spark/pull/32838#issuecomment-859341582 @LuciferYang Yes, I saw that too. Let me increase it to 64MB and have multiple retries before merging it.
[GitHub] [spark] dgd-contributor commented on pull request #32863: [SPARK-35652][SQL] joinWith on two table generated from same one
dgd-contributor commented on pull request #32863: URL: https://github.com/apache/spark/pull/32863#issuecomment-859211187
[GitHub] [spark] AmplabJenkins commented on pull request #32862: [SPARK-35695][SQL] Collect observed metrics from cached and adaptive execution sub-trees
AmplabJenkins commented on pull request #32862: URL: https://github.com/apache/spark/pull/32862#issuecomment-859302268
[GitHub] [spark] Yikun edited a comment on pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests
Yikun edited a comment on pull request #32867: URL: https://github.com/apache/spark/pull/32867#issuecomment-858668874 cc @HyukjinKwon @xinrong-databricks It would be good if you could give some input on this.
[GitHub] [spark] eejbyfeldt commented on pull request #27153: [SPARK-20384][SQL] Support value class in schema of Dataset (without breaking existing current projection)
eejbyfeldt commented on pull request #27153: URL: https://github.com/apache/spark/pull/27153#issuecomment-859592841 Hi @mickjermsurawong-stripe ! Great to hear, looking forward to the updated changes. Also, my PR https://github.com/apache/spark/pull/32783 got merged, so rebasing this branch on master should no longer fail tests.
[GitHub] [spark] github-actions[bot] closed pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function
github-actions[bot] closed pull request #31504: URL: https://github.com/apache/spark/pull/31504
[GitHub] [spark] SparkQA commented on pull request #32849: [WIP][SPARK-35704][SQL] Add fields to `DayTimeIntervalType`
SparkQA commented on pull request #32849: URL: https://github.com/apache/spark/pull/32849#issuecomment-859182161 **[Test build #139659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139659/testReport)** for PR 32849 at commit [`32fae30`](https://github.com/apache/spark/commit/32fae304389bbf5f3d9ae7cfb1a2496ae9e84a35). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions
AmplabJenkins commented on pull request #32870: URL: https://github.com/apache/spark/pull/32870#issuecomment-859195282
[GitHub] [spark] AmplabJenkins commented on pull request #32750: [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9
AmplabJenkins commented on pull request #32750: URL: https://github.com/apache/spark/pull/32750#issuecomment-859195284
[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
otterc commented on a change in pull request #32140: URL: https://github.com/apache/spark/pull/32140#discussion_r649603473

## File path: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala

@@ -22,31 +22,40 @@ import java.nio.ByteBuffer
 import java.util.UUID
 import java.util.concurrent.{CompletableFuture, Semaphore}
+import scala.collection.mutable
 import scala.concurrent.ExecutionContext.Implicits.global
 import scala.concurrent.Future
 import io.netty.util.internal.OutOfDirectMemoryError
 import org.mockito.ArgumentMatchers.{any, eq => meq}
-import org.mockito.Mockito.{mock, times, verify, when}
+import org.mockito.Mockito.{doThrow, mock, times, verify, when}
+import org.mockito.invocation.InvocationOnMock
 import org.mockito.stubbing.Answer
+import org.roaringbitmap.RoaringBitmap
 import org.scalatest.PrivateMethodTester
-import org.apache.spark.{SparkFunSuite, TaskContext}
+import org.apache.spark.{MapOutputTracker, SparkFunSuite, TaskContext}
+import org.apache.spark.MapOutputTracker.SHUFFLE_PUSH_MAP_ID
 import org.apache.spark.network._
 import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer}
-import org.apache.spark.network.shuffle.{BlockFetchingListener, DownloadFileManager, ExternalBlockStoreClient}
+import org.apache.spark.network.shuffle.{BlockFetchingListener, DownloadFileManager, ExternalBlockStoreClient, MergedBlockMeta, MergedBlocksMetaListener}
 import org.apache.spark.network.util.LimitedInputStream
 import org.apache.spark.shuffle.{FetchFailedException, ShuffleReadMetricsReporter}
-import org.apache.spark.storage.ShuffleBlockFetcherIterator.FetchBlockInfo
+import org.apache.spark.storage.BlockManagerId.SHUFFLE_MERGER_IDENTIFIER
+import org.apache.spark.storage.ShuffleBlockFetcherIterator._
 import org.apache.spark.util.Utils

 class ShuffleBlockFetcherIteratorSuite extends SparkFunSuite with PrivateMethodTester {

Review comment: I have added a test for (b) `failed to fetch merged block as well as fallback block should throw a FetchFailedException`

## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala

@@ -871,6 +1054,82 @@ final class ShuffleBlockFetcherIterator(
         "Failed to get block " + blockId + ", which is not a shuffle block", e)
     }
   }
+
+  /**
+   * All the below methods are used by [[PushBasedFetchHelper]] to communicate with the iterator
+   */
+  private[storage] def addToResultsQueue(result: FetchResult): Unit = {
+    results.put(result)
+  }
+
+  private[storage] def foundMoreBlocksToFetch(moreBlocksToFetch: Int): Unit = {
+    numBlocksToFetch += moreBlocksToFetch
+  }
+
+  /**
+   * Currently used by [[PushBasedFetchHelper]] to fetch fallback blocks when there is a fetch
+   * failure with a shuffle merged block/chunk.
+   */
+  private[storage] def fetchFallbackBlocks(
+      fallbackBlocksByAddr: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]): Unit = {
+    val fallbackLocalBlocks = mutable.LinkedHashSet[(BlockId, Int)]()
+    val fallbackHostLocalBlocksByExecutor =
+      mutable.LinkedHashMap[BlockManagerId, Seq[(BlockId, Long, Int)]]()
+    val fallbackMergedLocalBlocks = mutable.LinkedHashSet[BlockId]()
+    val fallbackRemoteReqs = partitionBlocksByFetchMode(fallbackBlocksByAddr,
+      fallbackLocalBlocks, fallbackHostLocalBlocksByExecutor, fallbackMergedLocalBlocks)
+    // Add the remote requests into our queue in a random order
+    fetchRequests ++= Utils.randomize(fallbackRemoteReqs)
+    logInfo(s"Started ${fallbackRemoteReqs.size} fallback remote requests for merged")
+    // If there is any fall back block that's a local block, we get them here. The original
+    // invocation to fetchLocalBlocks might have already returned by this time, so we need
+    // to invoke it again here.
+    fetchLocalBlocks(fallbackLocalBlocks)
+    // Merged local blocks should be empty during fallback
+    assert(fallbackMergedLocalBlocks.isEmpty,
+      "There should be zero merged blocks during fallback")
+    // Some of the fallback local blocks could be host local blocks
+    fetchAllHostLocalBlocks(fallbackHostLocalBlocksByExecutor)
+  }
+
+  /**
+   * Removes all the pending shuffle chunks that are on the same host as the block chunk that had
+   * a fetch failure.
+   *
+   * @return set of all the removed shuffle chunk Ids.
+   */
+  private[storage] def removePendingChunks(
+      failedBlockId: ShuffleBlockChunkId,
+      address: BlockManagerId): mutable.HashSet[ShuffleBlockChunkId] = {
+    val removedChunkIds = new mutable.HashSet[ShuffleBlockChunkId]()
+
+    def sameShuffleBlockChunk(block: BlockId): Boolean = {
+      val chunkId = block.asInstanceOf[ShuffleBlockChunkId]
+      chunkId.shuffleId == failedBlockId.shuffleId && chunkId.reduceId == failedBlockId.reduceId
+    }
+
+    def filterRequests(queue: mutable.Queue[FetchRequest]): Unit = {
+      val
[GitHub] [spark] dongjoon-hyun commented on pull request #26619: [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles
dongjoon-hyun commented on pull request #26619: URL: https://github.com/apache/spark/pull/26619#issuecomment-859155211 Did you try `spark.sql.hive.metastore.version`, @suryanshagnihotri ? - https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore For some known issues, we are upgrading the built-in Hive module to 2.3.9 here. - https://github.com/apache/spark/pull/32750
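For readers following the suggestion above, the metastore version can be set per session via configuration; a hedged sketch (the version value and invocation here are examples, not taken from this thread):

```shell
# Talk to a specific Hive metastore version without rebuilding Spark;
# spark.sql.hive.metastore.jars=maven downloads the matching Hive jars.
spark-shell \
  --conf spark.sql.hive.metastore.version=2.3.9 \
  --conf spark.sql.hive.metastore.jars=maven
```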
[GitHub] [spark] AmplabJenkins commented on pull request #32847: [SPARK-35616][PYTHON] Make `astype` method data-type-based
AmplabJenkins commented on pull request #32847: URL: https://github.com/apache/spark/pull/32847#issuecomment-859175443
[GitHub] [spark] github-actions[bot] commented on pull request #31685: [SPARK-26797][SQL][WIP][test-maven] Start using the new logical types API of Parquet 1.11.1 instead of the deprecated one
github-actions[bot] commented on pull request #31685: URL: https://github.com/apache/spark/pull/31685#issuecomment-859166765 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #30878: [SPARK-33872][SQL] Throw more specific exceptions when relations/namespaces are not found
github-actions[bot] commented on pull request #30878: URL: https://github.com/apache/spark/pull/30878#issuecomment-859166786 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] maropu commented on a change in pull request #32470: [SPARK-35712][SQL] Simplify ResolveAggregateFunctions
maropu commented on a change in pull request #32470: URL: https://github.com/apache/spark/pull/32470#discussion_r649623086

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -2457,164 +2450,127 @@ class Analyzer(override val catalogManager: CatalogManager)
       _.containsPattern(AGGREGATE), ruleId) {
       // Resolve aggregate with having clause to Filter(..., Aggregate()). Note, to avoid wrongly
       // resolve the having condition expression, here we skip resolving it in ResolveReferences
-      // and transform it to Filter after aggregate is resolved. See more details in SPARK-31519.
+      // and transform it to Filter after aggregate is resolved. Basically columns in HAVING should
+      // be resolved with `agg.child.output` first. See more details in SPARK-31519.
       case UnresolvedHaving(cond, agg: Aggregate) if agg.resolved =>
-        resolveHaving(Filter(cond, agg), agg)
-
-      case f @ Filter(_, agg: Aggregate) if agg.resolved =>
-        resolveHaving(f, agg)
-
-      case sort @ Sort(sortOrder, global, aggregate: Aggregate) if aggregate.resolved =>
-
-        // Try resolving the ordering as though it is in the aggregate clause.
-        try {
-          // If a sort order is unresolved, containing references not in aggregate, or containing
-          // `AggregateExpression`, we need to push down it to the underlying aggregate operator.
-          val unresolvedSortOrders = sortOrder.filter { s =>
-            !s.resolved || !s.references.subsetOf(aggregate.outputSet) || containsAggregate(s)
+        resolveOperatorWithAggregate(Seq(cond), agg, (newExprs, newChild) => {
+          Filter(newExprs.head, newChild)
+        })
+
+      case Filter(cond, agg: Aggregate) if agg.resolved =>
+        // We should resolve the references normally based on child.output first.
+        val maybeResolved = resolveExpressionByPlanOutput(cond, agg)
+        resolveOperatorWithAggregate(Seq(maybeResolved), agg, (newExprs, newChild) => {
+          Filter(newExprs.head, newChild)
+        })
+
+      case Sort(sortOrder, global, agg: Aggregate) if agg.resolved =>
+        // We should resolve the references normally based on child.output first.
+        val maybeResolved = sortOrder.map(_.child).map(resolveExpressionByPlanOutput(_, agg))
+        resolveOperatorWithAggregate(maybeResolved, agg, (newExprs, newChild) => {
+          val newSortOrder = sortOrder.zip(newExprs).map {
+            case (sortOrder, expr) => sortOrder.copy(child = expr)
           }
-          val aliasedOrdering = unresolvedSortOrders.map(o => Alias(o.child, "aggOrder")())
-
-          val aggregateWithExtraOrdering = aggregate.copy(
-            aggregateExpressions = aggregate.aggregateExpressions ++ aliasedOrdering)
-
-          val resolvedAggregate: Aggregate =
-            executeSameContext(aggregateWithExtraOrdering).asInstanceOf[Aggregate]
-
-          val (reResolvedAggExprs, resolvedAliasedOrdering) =
-            resolvedAggregate.aggregateExpressions.splitAt(aggregate.aggregateExpressions.length)
-
-          // If we pass the analysis check, then the ordering expressions should only reference to
-          // aggregate expressions or grouping expressions, and it's safe to push them down to
-          // Aggregate.
-          checkAnalysis(resolvedAggregate)
-
-          val originalAggExprs = aggregate.aggregateExpressions.map(trimNonTopLevelAliases)
-
-          // If the ordering expression is same with original aggregate expression, we don't need
-          // to push down this ordering expression and can reference the original aggregate
-          // expression instead.
-          val needsPushDown = ArrayBuffer.empty[NamedExpression]
-          val orderToAlias = unresolvedSortOrders.zip(aliasedOrdering)
-          val evaluatedOrderings =
-            resolvedAliasedOrdering.asInstanceOf[Seq[Alias]].zip(orderToAlias).map {
-              case (evaluated, (order, aliasOrder)) =>
-                val index = reResolvedAggExprs.indexWhere {
-                  case Alias(child, _) => child semanticEquals evaluated.child
-                  case other => other semanticEquals evaluated.child
-                }
-
-                if (index == -1) {
-                  if (hasCharVarchar(evaluated)) {
-                    needsPushDown += aliasOrder
-                    order.copy(child = aliasOrder)
-                  } else {
-                    needsPushDown += evaluated
-                    order.copy(child = evaluated.toAttribute)
-                  }
-                } else {
-                  order.copy(child = originalAggExprs(index).toAttribute)
-                }
+          Sort(newSortOrder, global, newChild)
+        })
+    }
+
+    def resolveExprsWithAggregate(
+        exprs: Seq[Expression],
+        agg: Aggregate): (Seq[NamedExpression], Seq[Expression]) = {

Review comment: How about
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26619: [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles
dongjoon-hyun edited a comment on pull request #26619: URL: https://github.com/apache/spark/pull/26619#issuecomment-859155211 Did you try `spark.sql.hive.metastore.version`, @suryanshagnihotri ? - https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore For some known issues, we are upgrading the built-in hive module to 2.3.9 here. - https://github.com/apache/spark/pull/32750 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
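For reference, the configuration dongjoon-hyun points to can be set when building a session; a minimal PySpark sketch (the version string, app name, and jar-resolution mode here are illustrative, not taken from the thread):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("external-metastore-example")
    # Talk to a Hive 1.2.1 metastore without rebuilding Spark against Hive 1.2.
    .config("spark.sql.hive.metastore.version", "1.2.1")
    # "maven" downloads matching Hive jars; a classpath string can be used instead.
    .config("spark.sql.hive.metastore.jars", "maven")
    .enableHiveSupport()
    .getOrCreate()
)
```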
[GitHub] [spark] AmplabJenkins commented on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.
AmplabJenkins commented on pull request #32871: URL: https://github.com/apache/spark/pull/32871#issuecomment-859195281
[GitHub] [spark] SparkQA commented on pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions
SparkQA commented on pull request #32870: URL: https://github.com/apache/spark/pull/32870#issuecomment-859153502
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32847: [SPARK-35616][PYTHON] Make `astype` method data-type-based
AmplabJenkins removed a comment on pull request #32847: URL: https://github.com/apache/spark/pull/32847#issuecomment-859175443
[GitHub] [spark] SparkQA commented on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.
SparkQA commented on pull request #32871: URL: https://github.com/apache/spark/pull/32871#issuecomment-859176870
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32750: [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9
AmplabJenkins removed a comment on pull request #32750: URL: https://github.com/apache/spark/pull/32750#issuecomment-859195284
[GitHub] [spark] maropu commented on pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.
maropu commented on pull request #32841: URL: https://github.com/apache/spark/pull/32841#issuecomment-859193291 late lgtm. Thank you for this fix, @cfmcgrady.
[GitHub] [spark] maropu closed pull request #32860: [SPARK-35709][DOCS] Remove the reference to third party Nomad integration project
maropu closed pull request #32860: URL: https://github.com/apache/spark/pull/32860
[GitHub] [spark] maropu commented on pull request #32860: [SPARK-35709][DOCS] Remove the reference to third party Nomad integration project
maropu commented on pull request #32860: URL: https://github.com/apache/spark/pull/32860#issuecomment-859153880
[GitHub] [spark] AmplabJenkins commented on pull request #32849: [WIP][SPARK-35704][SQL] Add fields to `DayTimeIntervalType`
AmplabJenkins commented on pull request #32849: URL: https://github.com/apache/spark/pull/32849#issuecomment-859153228
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32849: [WIP][SPARK-35704][SQL] Add fields to `DayTimeIntervalType`
AmplabJenkins removed a comment on pull request #32849: URL: https://github.com/apache/spark/pull/32849#issuecomment-859153228
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions
dongjoon-hyun commented on a change in pull request #32870: URL: https://github.com/apache/spark/pull/32870#discussion_r649609136

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

```diff
@@ -229,3 +208,27 @@ class EquivalentExpressions {
     sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is smaller
+ * than parent expression. If there is child-parent relationships among the subexpressions,
+ * we want the child expressions come first than parent expressions, so we can replace
+ * child expressions in parent expressions with subexpression evaluation. Note that
+ * this is not for general expression ordering. For example, two irrelevant expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the usage here,
+ * the order of irrelevant expressions does not matter.
+ */
+class ExpressionContainmentOrdering extends Ordering[Expression] {
```

Review comment: Just curious, is there a reason of this move?

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

```diff
@@ -229,3 +208,27 @@ class EquivalentExpressions {
     sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is smaller
+ * than parent expression. If there is child-parent relationships among the subexpressions,
+ * we want the child expressions come first than parent expressions, so we can replace
+ * child expressions in parent expressions with subexpression evaluation. Note that
+ * this is not for general expression ordering. For example, two irrelevant expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the usage here,
```

Review comment: Shall we change this to `0` according to the new logic?

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

```diff
@@ -229,3 +208,27 @@ class EquivalentExpressions {
     sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is smaller
+ * than parent expression. If there is child-parent relationships among the subexpressions,
+ * we want the child expressions come first than parent expressions, so we can replace
+ * child expressions in parent expressions with subexpression evaluation. Note that
+ * this is not for general expression ordering. For example, two irrelevant expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the usage here,
+ * the order of irrelevant expressions does not matter.
+ */
+class ExpressionContainmentOrdering extends Ordering[Expression] {
```

Review comment: Never mind. New one also looks good~

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

```diff
@@ -229,3 +208,27 @@ class EquivalentExpressions {
     sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is smaller
+ * than parent expression. If there is child-parent relationships among the subexpressions,
+ * we want the child expressions come first than parent expressions, so we can replace
+ * child expressions in parent expressions with subexpression evaluation. Note that
+ * this is not for general expression ordering. For example, two irrelevant expressions
+ * will be considered as equal by this ordering. But for the usage here, the order of
+ * irrelevant expressions does not matter.
```

Review comment: To be complete, could you add some description about the semantically-equal expressions?

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

```diff
@@ -229,3 +208,27 @@ class EquivalentExpressions {
     sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is smaller
+ * than parent expression. If there is child-parent relationships among the subexpressions,
+ * we want the child expressions come first than parent expressions, so we can replace
+ * child expressions in parent expressions with subexpression evaluation. Note that
+ * this is not for general expression ordering. For example, two irrelevant or semantically-equal
+ * expressions will be considered as equal by this ordering. But for the usage here, the order of
+ * irrelevant expressions does not matter.
+ */
+class ExpressionContainmentOrdering extends Ordering[Expression] {
+  override def compare(x: Expression, y: Expression): Int = {
+    if (x.find(_.semanticEquals(y)).isDefined) {
+      // `y` is child expression of `x`.
+      1
+    } else if (y.find(_.semanticEquals(x)).isDefined) {
+      // `x` is child expression of `y`.
+      -1
+    } else {
+      // Irrelevant expressions
```

Review comment:
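The containment ordering under review can be illustrated outside Catalyst. This is a minimal Python sketch (not Spark's implementation): a comparator that places child expressions before the expressions that contain them, leaving unrelated expressions tied, which is enough for sorting subexpressions before replacement.

```python
import functools

class Expr:
    """Tiny expression tree standing in for Catalyst's Expression."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def contains(self, other):
        # True if `other` appears anywhere inside this tree (including self).
        if self is other:
            return True
        return any(c.contains(other) for c in self.children)

def containment_cmp(x, y):
    if x is y:
        return 0
    if x.contains(y):
        return 1   # y is a child of x, so x sorts after it
    if y.contains(x):
        return -1  # x is a child of y
    return 0       # irrelevant expressions: order does not matter

a = Expr("a")
b = Expr("b")
add = Expr("add", [a, b])    # a + b
mul = Expr("mul", [add, a])  # (a + b) * a

exprs = sorted([mul, add, a], key=functools.cmp_to_key(containment_cmp))
print([e.name for e in exprs])  # children first: ['a', 'add', 'mul']
```

Because unrelated expressions compare as equal, this is only a partial order, which is exactly the caveat the reviewers discuss: it is safe for this usage, not for general-purpose sorting.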
[GitHub] [spark] ueshin opened a new pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.
ueshin opened a new pull request #32871: URL: https://github.com/apache/spark/pull/32871
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown
AmplabJenkins removed a comment on pull request #32868: URL: https://github.com/apache/spark/pull/32868#issuecomment-859153229 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139660/
[GitHub] [spark] AmplabJenkins commented on pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown
AmplabJenkins commented on pull request #32868: URL: https://github.com/apache/spark/pull/32868#issuecomment-859153229 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139660/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.
AmplabJenkins removed a comment on pull request #32871: URL: https://github.com/apache/spark/pull/32871#issuecomment-859195281
[GitHub] [spark] SparkQA commented on pull request #32750: [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9
SparkQA commented on pull request #32750: URL: https://github.com/apache/spark/pull/32750#issuecomment-859153636
[GitHub] [spark] viirya commented on a change in pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions
viirya commented on a change in pull request #32870: URL: https://github.com/apache/spark/pull/32870#discussion_r649615316

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

```diff
@@ -229,3 +208,27 @@ class EquivalentExpressions {
     sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is smaller
+ * than parent expression. If there is child-parent relationships among the subexpressions,
+ * we want the child expressions come first than parent expressions, so we can replace
+ * child expressions in parent expressions with subexpression evaluation. Note that
+ * this is not for general expression ordering. For example, two irrelevant expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the usage here,
+ * the order of irrelevant expressions does not matter.
+ */
+class ExpressionContainmentOrdering extends Ordering[Expression] {
```

Review comment: Oh, as it is a nested class, I cannot allocate it separately, but

```scala
val equivalence = new EquivalentExpressions
val exprOrdering = new equivalence.ExpressionContainmentOrdering
```

I can revert to nested class if you think it's unnecessary change.

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

```diff
@@ -229,3 +208,27 @@ class EquivalentExpressions {
     sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is smaller
+ * than parent expression. If there is child-parent relationships among the subexpressions,
+ * we want the child expressions come first than parent expressions, so we can replace
+ * child expressions in parent expressions with subexpression evaluation. Note that
+ * this is not for general expression ordering. For example, two irrelevant expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the usage here,
```

Review comment: Right, missing the doc. Fixed.

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

```diff
@@ -229,3 +208,27 @@ class EquivalentExpressions {
     sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is smaller
+ * than parent expression. If there is child-parent relationships among the subexpressions,
+ * we want the child expressions come first than parent expressions, so we can replace
+ * child expressions in parent expressions with subexpression evaluation. Note that
+ * this is not for general expression ordering. For example, two irrelevant expressions
+ * will be considered as equal by this ordering. But for the usage here, the order of
+ * irrelevant expressions does not matter.
```

Review comment: Sure. Added.

## File path:
[GitHub] [spark] SparkQA removed a comment on pull request #32849: [WIP][SPARK-35704][SQL] Add fields to `DayTimeIntervalType`
SparkQA removed a comment on pull request #32849: URL: https://github.com/apache/spark/pull/32849#issuecomment-859038318 **[Test build #139659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139659/testReport)** for PR 32849 at commit [`32fae30`](https://github.com/apache/spark/commit/32fae304389bbf5f3d9ae7cfb1a2496ae9e84a35).
[GitHub] [spark] maropu commented on a change in pull request #32865: [SPARK-35701] Use copy-on-write semantics for SQLConf registered configurations.
maropu commented on a change in pull request #32865: URL: https://github.com/apache/spark/pull/32865#discussion_r649630875

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

```diff
@@ -53,28 +54,60 @@ import org.apache.spark.util.Utils
 object SQLConf {
-  private[sql] val sqlConfEntries =
-    new ConcurrentHashMap[String, ConfigEntry[_]]()
+  private[this] val sqlConfEntriesUpdateLock = new Object
-  val staticConfKeys: java.util.Set[String] =
-    java.util.Collections.synchronizedSet(new java.util.HashSet[String]())
+  @volatile
+  private[this] var sqlConfEntries: util.Map[String, ConfigEntry[_]] = util.Collections.emptyMap()
-  private def register(entry: ConfigEntry[_]): Unit = sqlConfEntries.merge(entry.key, entry,
-    (existingConfigEntry, newConfigEntry) => {
-      require(existingConfigEntry == null,
-        s"Duplicate SQLConfigEntry. ${newConfigEntry.key} has been registered")
-      newConfigEntry
-    }
-  )
+  private[this] val staticConfKeysUpdateLock = new Object
```

Review comment: Q: we need the two separate locks for the static/non-static configs?
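The copy-on-write semantics this PR introduces can be sketched outside Scala. This is an illustrative Python analogue (not Spark's code): readers access an immutable snapshot without locking, while writers copy the map, mutate the copy, and atomically publish it under a lock.

```python
import threading

class ConfRegistry:
    """Copy-on-write registry: lock-free reads, locked copy-and-swap writes."""
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # treated as immutable once published

    def register(self, key, value):
        with self._lock:
            if key in self._entries:
                raise ValueError(
                    f"Duplicate SQLConfigEntry. {key} has been registered")
            new_entries = dict(self._entries)  # copy the current snapshot...
            new_entries[key] = value
            self._entries = new_entries        # ...then publish atomically

    def get(self, key):
        # No lock needed: a reader always sees one consistent snapshot.
        return self._entries.get(key)

reg = ConfRegistry()
reg.register("spark.sql.example.enabled", "true")  # hypothetical key
print(reg.get("spark.sql.example.enabled"))
```

The trade-off matches the review discussion: registration (rare) pays the cost of copying, so lookups (frequent) need no synchronization at all.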
[GitHub] [spark] maropu commented on pull request #32783: [SPARK-35653][SQL] Fix CatalystToExternalMap interpreted path fails for Map with case classes as keys or values
maropu commented on pull request #32783: URL: https://github.com/apache/spark/pull/32783#issuecomment-859152984 Thank you, all~
[GitHub] [spark] HyukjinKwon commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when running multiple Hive version related tests
HyukjinKwon commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-859193309 cc @wangyum
[GitHub] [spark] SparkQA removed a comment on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.
SparkQA removed a comment on pull request #32871: URL: https://github.com/apache/spark/pull/32871#issuecomment-859176870 **[Test build #139664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139664/testReport)** for PR 32871 at commit [`81a1042`](https://github.com/apache/spark/commit/81a10425a0d4feaa08532e0138c293e9a013337e).
[GitHub] [spark] zhengruifeng commented on a change in pull request #32822: [SPARK-35678][ML] add a common softmax function
zhengruifeng commented on a change in pull request #32822: URL: https://github.com/apache/spark/pull/32822#discussion_r649624193

## File path: mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala

```diff
@@ -94,4 +95,43 @@ private[spark] object Utils {
     math.log1p(math.exp(x))
   }
 }
+
+  /**
+   * Perform in-place softmax conversion.
+   */
+  def softmax(values: Array[Double]): Unit = {
+    var maxValue = Double.MinValue
+    var i = 0
+    while (i < values.length) {
+      val value = values(i)
+      if (value.isPosInfinity) {
+        java.util.Arrays.fill(values, 0)
+        values(i) = 1.0
+        return
+      } else if (value > maxValue) {
+        maxValue = value
+      }
+      i += 1
+    }
+
+    var sum = 0.0
+    i = 0
+    if (maxValue > 0) {
```

Review comment: sounds reasonable

Review comment:

```py
def softmax(X, copy=True):
    """
    Calculate the softmax function.

    The softmax function is calculated by
    np.exp(X) / np.sum(np.exp(X), axis=1)

    This will cause overflow when large values are exponentiated.
    Hence the largest value in each row is subtracted from each data
    point to prevent this.

    Parameters
    ----------
    X : array-like of float of shape (M, N)
        Argument to the logistic function.
    copy : bool, default=True
        Copy X or not.

    Returns
    -------
    out : ndarray of shape (M, N)
        Softmax function evaluated at every point in x.
    """
    if copy:
        X = np.copy(X)
    max_prob = np.max(X, axis=1).reshape((-1, 1))
    X -= max_prob
    np.exp(X, X)
    sum_prob = np.sum(X, axis=1).reshape((-1, 1))
    X /= sum_prob
    return X
```

softmax in scikit-learn does not check whether the max value is positive or not
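The max-subtraction trick discussed above is worth stating on its own: subtracting the largest value before exponentiating prevents `exp` from overflowing, and the result is unchanged because softmax is shift-invariant. A plain-Python sketch of the idea (illustrative, not Spark's or scikit-learn's code):

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    # Subtract the max unconditionally (as scikit-learn does): every
    # exponent is then <= 0, so exp() cannot overflow.
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]

# Naive exp(1000.0) would overflow to inf; the shifted version is fine.
probs = softmax([1000.0, 1001.0, 1002.0])
print(probs)
print(abs(sum(probs) - 1.0) < 1e-12)
```

This also reflects zhengruifeng's point in the thread: the subtraction is applied regardless of whether the maximum is positive, since shifting by any constant leaves the probabilities unchanged.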
[GitHub] [spark] otterc commented on pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
otterc commented on pull request #32140: URL: https://github.com/apache/spark/pull/32140#issuecomment-859181282 This is still dependent on the changes in https://github.com/apache/spark/pull/32140 which has the protocol side of changes.