[GitHub] [spark] SparkQA commented on pull request #32883: [SPARK-35725][SQL] Support repartition expand partitions in AQE

2021-06-14 Thread GitBox


SparkQA commented on pull request #32883:
URL: https://github.com/apache/spark/pull/32883#issuecomment-859647049






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32874: [SPARK-35699][K8S] Improve error message when creating k8s pod failed.

2021-06-14 Thread GitBox


SparkQA commented on pull request #32874:
URL: https://github.com/apache/spark/pull/32874#issuecomment-859239174






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on pull request #32576: [SPARK-35429][CORE] Remove commons-httpclient due to EOL and CVEs

2021-06-14 Thread GitBox


sunchao commented on pull request #32576:
URL: https://github.com/apache/spark/pull/32576#issuecomment-859920490


   @sumeetgajjar do you wanna pick up this one again? we just upgraded Hive to 
2.3.9 which removes the `commons-httpclient` from Hive side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] suryanshagnihotri commented on pull request #26619: [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles

2021-06-14 Thread GitBox


suryanshagnihotri commented on pull request #26619:
URL: https://github.com/apache/spark/pull/26619#issuecomment-859243045


   @dongjoon-hyun yes have tried it but it fails with class not found 
exceptions like 
   ```
   java.lang.ClassNotFoundException: 
org.apache.hadoop.hive.ql.plan.LoadTableDesc$LoadFileType
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:255)
at 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:244)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at 
org.apache.spark.sql.hive.client.Shim_v3_0.clazzLoadFileType$lzycompute(HiveShim.scala:1272)
at 
org.apache.spark.sql.hive.client.Shim_v3_0.clazzLoadFileType(HiveShim.scala:1271)
at 
org.apache.spark.sql.hive.client.Shim_v3_0.loadTable(HiveShim.scala:1352)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$loadTable$1(HiveClientImpl.scala:894)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:295)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:228)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:227)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:277)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:889)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$loadTable$1(HiveExternalCatalog.scala:884)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:878)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadTable(ExternalCatalogWithListener.scala:167)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:327)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:102)
at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)
at 
org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps and make `isnull` method data-type-based

2021-06-14 Thread GitBox


SparkQA commented on pull request #32821:
URL: https://github.com/apache/spark/pull/32821#issuecomment-859239324






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-06-14 Thread GitBox


mridulm commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-859263950


   I am trying to catch up on this discussion, and it is a very long thread 
already :-) Thanks for all the discussion !
   Can we update the doc @viirya given that there seems to be some consensus 
developing ?
   Based on that, I will revisit comments to understand the rationales better 
for the various conclusion points.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #32850: [SPARK-34920][CORE][SQL] Add error classes with SQLSTATE

2021-06-14 Thread GitBox


cloud-fan commented on pull request #32850:
URL: https://github.com/apache/spark/pull/32850#issuecomment-859584261


   cc @viirya @maropu @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sigmod closed pull request #32463: [WIP][SPARK-35147][SQL] Migrate to resolveWithPruning for two command rules

2021-06-14 Thread GitBox


sigmod closed pull request #32463:
URL: https://github.com/apache/spark/pull/32463


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yikf opened a new pull request #32877: wait until something does get queued.

2021-06-14 Thread GitBox


yikf opened a new pull request #32877:
URL: https://github.com/apache/spark/pull/32877


   ### What changes were proposed in this pull request?
   Currently, we continue the loop after wait timeout when `ContextCleaner` 
cleanUp if the queue is empty, It is an ineffective loop because the queue is 
empty.
   
   In the PR, it prevent ineffective loop, if the queue is empty, we wait until 
something does get queued instead of ineffective loop after wait timeout .
   
   ### Why are the changes needed?
   avoid inefeective loop
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Exist test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #32873: [SPARK-35718][SQL] Support casting of Date to timestamp without time zone type

2021-06-14 Thread GitBox


cloud-fan closed pull request #32873:
URL: https://github.com/apache/spark/pull/32873


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32835: [SPARK-35591][PYTHON][DOCS] Rename "Koalas" to "pandas API on Spark" in the documents

2021-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32835:
URL: https://github.com/apache/spark/pull/32835#issuecomment-859252319






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32885: [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical

2021-06-14 Thread GitBox


SparkQA removed a comment on pull request #32885:
URL: https://github.com/apache/spark/pull/32885#issuecomment-859924503


   **[Test build #139711 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139711/testReport)**
 for PR 32885 at commit 
[`73aa6f2`](https://github.com/apache/spark/commit/73aa6f23c1280b8b047d177ddbb6b41d3d6e02cc).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ueshin closed pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox


ueshin closed pull request #32871:
URL: https://github.com/apache/spark/pull/32871


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-06-14 Thread GitBox


HyukjinKwon commented on a change in pull request #32410:
URL: https://github.com/apache/spark/pull/32410#discussion_r649958997



##
File path: 
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
##
@@ -141,7 +141,8 @@ public void open(Map sessionConfMap) throws 
HiveSQLException {
 sessionState = new SessionState(hiveConf, username);
 sessionState.setUserIpAddress(ipAddress);
 sessionState.setIsHiveServerQuery(true);
-SessionState.start(sessionState);
+// Use setCurrentSessionState to avoid creating useless SessionDirs.
+SessionState.setCurrentSessionState(sessionState);

Review comment:
   @wangyum, last question. What about this? Is it safe to remove?
   
   ```
 // Hive object instance should be created with a copy of the conf 
object. If the conf is
 // shared with SessionState, other parts of the code might update the 
config, but
 // Hive.get(HiveConf) would not recognize the case when it needs 
refreshing
 Hive.get(new HiveConf(startSs.conf)).getMSC();
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox


viirya commented on pull request #32828:
URL: https://github.com/apache/spark/pull/32828#issuecomment-859243654






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32858: [SPARK-35706][SQL] Consider making the ':' in STRUCT data type definition optional

2021-06-14 Thread GitBox


AmplabJenkins commented on pull request #32858:
URL: https://github.com/apache/spark/pull/32858#issuecomment-859553066






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32845: [SPARK-35691][CORE] addFile/addJar/addDirectory should put CanonicalFile

2021-06-14 Thread GitBox


AmplabJenkins commented on pull request #32845:
URL: https://github.com/apache/spark/pull/32845#issuecomment-859245462






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT/Maven

2021-06-14 Thread GitBox


gengliangwang commented on pull request #32838:
URL: https://github.com/apache/spark/pull/32838#issuecomment-859341582


   @LuciferYang Yes, I saw that too. Let me increase it to 64MB and have 
multiple retries before merging it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dgd-contributor commented on pull request #32863: [SPARK-35652][SQL] joinWith on two table generated from same one

2021-06-14 Thread GitBox


dgd-contributor commented on pull request #32863:
URL: https://github.com/apache/spark/pull/32863#issuecomment-859211187






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32862: [SPARK-35695][SQL] Collect observed metrics from cached and adaptive execution sub-trees

2021-06-14 Thread GitBox


AmplabJenkins commented on pull request #32862:
URL: https://github.com/apache/spark/pull/32862#issuecomment-859302268






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Yikun edited a comment on pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-06-14 Thread GitBox


Yikun edited a comment on pull request #32867:
URL: https://github.com/apache/spark/pull/32867#issuecomment-858668874


   cc @HyukjinKwon @xinrong-databricks 
   
   It would be good if you could give some inputs on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eejbyfeldt commented on pull request #27153: [SPARK-20384][SQL] Support value class in schema of Dataset (without breaking existing current projection)

2021-06-14 Thread GitBox


eejbyfeldt commented on pull request #27153:
URL: https://github.com/apache/spark/pull/27153#issuecomment-859592841


   Hi @mickjermsurawong-stripe ! Great to hear, looking forward to updated 
changes. Also my PR https://github.com/apache/spark/pull/32783 got merged, so 
rebasing this branch on master should no longer fail tests. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] closed pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function

2021-06-14 Thread GitBox


github-actions[bot] closed pull request #31504:
URL: https://github.com/apache/spark/pull/31504


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32849: [WIP][SPARK-35704][SQL] Add fields to `DayTimeIntervalType`

2021-06-14 Thread GitBox


SparkQA commented on pull request #32849:
URL: https://github.com/apache/spark/pull/32849#issuecomment-859182161


   **[Test build #139659 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139659/testReport)**
 for PR 32849 at commit 
[`32fae30`](https://github.com/apache/spark/commit/32fae304389bbf5f3d9ae7cfb1a2496ae9e84a35).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions

2021-06-14 Thread GitBox


AmplabJenkins commented on pull request #32870:
URL: https://github.com/apache/spark/pull/32870#issuecomment-859195282






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32750: [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9

2021-06-14 Thread GitBox


AmplabJenkins commented on pull request #32750:
URL: https://github.com/apache/spark/pull/32750#issuecomment-859195284






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-14 Thread GitBox


otterc commented on a change in pull request #32140:
URL: https://github.com/apache/spark/pull/32140#discussion_r649603473



##
File path: 
core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
##
@@ -22,31 +22,40 @@ import java.nio.ByteBuffer
 import java.util.UUID
 import java.util.concurrent.{CompletableFuture, Semaphore}
 
+import scala.collection.mutable
 import scala.concurrent.ExecutionContext.Implicits.global
 import scala.concurrent.Future
 
 import io.netty.util.internal.OutOfDirectMemoryError
 import org.mockito.ArgumentMatchers.{any, eq => meq}
-import org.mockito.Mockito.{mock, times, verify, when}
+import org.mockito.Mockito.{doThrow, mock, times, verify, when}
+import org.mockito.invocation.InvocationOnMock
 import org.mockito.stubbing.Answer
+import org.roaringbitmap.RoaringBitmap
 import org.scalatest.PrivateMethodTester
 
-import org.apache.spark.{SparkFunSuite, TaskContext}
+import org.apache.spark.{MapOutputTracker, SparkFunSuite, TaskContext}
+import org.apache.spark.MapOutputTracker.SHUFFLE_PUSH_MAP_ID
 import org.apache.spark.network._
 import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, 
ManagedBuffer}
-import org.apache.spark.network.shuffle.{BlockFetchingListener, 
DownloadFileManager, ExternalBlockStoreClient}
+import org.apache.spark.network.shuffle.{BlockFetchingListener, 
DownloadFileManager, ExternalBlockStoreClient, MergedBlockMeta, 
MergedBlocksMetaListener}
 import org.apache.spark.network.util.LimitedInputStream
 import org.apache.spark.shuffle.{FetchFailedException, 
ShuffleReadMetricsReporter}
-import org.apache.spark.storage.ShuffleBlockFetcherIterator.FetchBlockInfo
+import org.apache.spark.storage.BlockManagerId.SHUFFLE_MERGER_IDENTIFIER
+import org.apache.spark.storage.ShuffleBlockFetcherIterator._
 import org.apache.spark.util.Utils
 
 
 class ShuffleBlockFetcherIteratorSuite extends SparkFunSuite with 
PrivateMethodTester {
 

Review comment:
   I have added test for (b) `failed to fetch merged block as well as 
fallback block should throw a FetchFailedException`

##
File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -871,6 +1054,82 @@ final class ShuffleBlockFetcherIterator(
   "Failed to get block " + blockId + ", which is not a shuffle block", 
e)
 }
   }
+
+  /**
+   * All the below methods are used by [[PushBasedFetchHelper]] to communicate 
with the iterator
+   */
+  private[storage] def addToResultsQueue(result: FetchResult): Unit = {
+results.put(result)
+  }
+
+  private[storage] def foundMoreBlocksToFetch(moreBlocksToFetch: Int): Unit = {
+numBlocksToFetch += moreBlocksToFetch
+  }
+
+  /**
+   * Currently used by [[PushBasedFetchHelper]] to fetch fallback blocks when 
there is a fetch
+   * failure with a shuffle merged block/chunk.
+   */
+  private[storage] def fetchFallbackBlocks(
+  fallbackBlocksByAddr: Iterator[(BlockManagerId, Seq[(BlockId, Long, 
Int)])]): Unit = {
+val fallbackLocalBlocks = mutable.LinkedHashSet[(BlockId, Int)]()
+val fallbackHostLocalBlocksByExecutor =
+  mutable.LinkedHashMap[BlockManagerId, Seq[(BlockId, Long, Int)]]()
+val fallbackMergedLocalBlocks = mutable.LinkedHashSet[BlockId]()
+val fallbackRemoteReqs = partitionBlocksByFetchMode(fallbackBlocksByAddr,
+  fallbackLocalBlocks, fallbackHostLocalBlocksByExecutor, 
fallbackMergedLocalBlocks)
+// Add the remote requests into our queue in a random order
+fetchRequests ++= Utils.randomize(fallbackRemoteReqs)
+logInfo(s"Started ${fallbackRemoteReqs.size} fallback remote requests for 
merged")
+// If there is any fall back block that's a local block, we get them here. 
The original
+// invocation to fetchLocalBlocks might have already returned by this 
time, so we need
+// to invoke it again here.
+fetchLocalBlocks(fallbackLocalBlocks)
+// Merged local blocks should be empty during fallback
+assert(fallbackMergedLocalBlocks.isEmpty,
+  "There should be zero merged blocks during fallback")
+// Some of the fallback local blocks could be host local blocks
+fetchAllHostLocalBlocks(fallbackHostLocalBlocksByExecutor)
+  }
+
+  /**
+   * Removes all the pending shuffle chunks that are on the same host as the 
block chunk that had
+   * a fetch failure.
+   *
+   * @return set of all the removed shuffle chunk Ids.
+   */
+  private[storage] def removePendingChunks(
+  failedBlockId: ShuffleBlockChunkId,
+  address: BlockManagerId): mutable.HashSet[ShuffleBlockChunkId] = {
+val removedChunkIds = new mutable.HashSet[ShuffleBlockChunkId]()
+
+def sameShuffleBlockChunk(block: BlockId): Boolean = {
+  val chunkId = block.asInstanceOf[ShuffleBlockChunkId]
+  chunkId.shuffleId == failedBlockId.shuffleId && chunkId.reduceId == 
failedBlockId.reduceId
+}
+
+def filterRequests(queue: mutable.Queue[FetchRequest]): Unit = {
+  val 

[GitHub] [spark] dongjoon-hyun commented on pull request #26619: [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles

2021-06-14 Thread GitBox


dongjoon-hyun commented on pull request #26619:
URL: https://github.com/apache/spark/pull/26619#issuecomment-859155211


   Did you try `spark.sql.hive.metastore.version`, @suryanshagnihotri ?
   - 
https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
   
   For some known issues, we are upgrade the built-in hive module to 2.3.9 here.
   - https://github.com/apache/spark/pull/32750


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32847: [SPARK-35616][PYTHON] Make `astype` method data-type-based

2021-06-14 Thread GitBox


AmplabJenkins commented on pull request #32847:
URL: https://github.com/apache/spark/pull/32847#issuecomment-859175443






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] commented on pull request #31685: [SPARK-26797][SQL][WIP][test-maven] Start using the new logical types API of Parquet 1.11.1 instead of the deprecated one

2021-06-14 Thread GitBox


github-actions[bot] commented on pull request #31685:
URL: https://github.com/apache/spark/pull/31685#issuecomment-859166765


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] commented on pull request #30878: [SPARK-33872][SQL] Throw more specific exceptions when relations/namespaces are not found

2021-06-14 Thread GitBox


github-actions[bot] commented on pull request #30878:
URL: https://github.com/apache/spark/pull/30878#issuecomment-859166786


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #32470: [SPARK-35712][SQL] Simplify ResolveAggregateFunctions

2021-06-14 Thread GitBox


maropu commented on a change in pull request #32470:
URL: https://github.com/apache/spark/pull/32470#discussion_r649623086



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -2457,164 +2450,127 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   _.containsPattern(AGGREGATE), ruleId) {
   // Resolve aggregate with having clause to Filter(..., Aggregate()). 
Note, to avoid wrongly
   // resolve the having condition expression, here we skip resolving it in 
ResolveReferences
-  // and transform it to Filter after aggregate is resolved. See more 
details in SPARK-31519.
+  // and transform it to Filter after aggregate is resolved. Basically 
columns in HAVING should
+  // be resolved with `agg.child.output` first. See more details in 
SPARK-31519.
   case UnresolvedHaving(cond, agg: Aggregate) if agg.resolved =>
-resolveHaving(Filter(cond, agg), agg)
-
-  case f @ Filter(_, agg: Aggregate) if agg.resolved =>
-resolveHaving(f, agg)
-
-  case sort @ Sort(sortOrder, global, aggregate: Aggregate) if 
aggregate.resolved =>
-
-// Try resolving the ordering as though it is in the aggregate clause.
-try {
-  // If a sort order is unresolved, containing references not in 
aggregate, or containing
-  // `AggregateExpression`, we need to push down it to the underlying 
aggregate operator.
-  val unresolvedSortOrders = sortOrder.filter { s =>
-!s.resolved || !s.references.subsetOf(aggregate.outputSet) || 
containsAggregate(s)
+resolveOperatorWithAggregate(Seq(cond), agg, (newExprs, newChild) => {
+  Filter(newExprs.head, newChild)
+})
+
+  case Filter(cond, agg: Aggregate) if agg.resolved =>
+// We should resolve the references normally based on child.output 
first.
+val maybeResolved = resolveExpressionByPlanOutput(cond, agg)
+resolveOperatorWithAggregate(Seq(maybeResolved), agg, (newExprs, 
newChild) => {
+  Filter(newExprs.head, newChild)
+})
+
+  case Sort(sortOrder, global, agg: Aggregate) if agg.resolved =>
+// We should resolve the references normally based on child.output 
first.
+val maybeResolved = 
sortOrder.map(_.child).map(resolveExpressionByPlanOutput(_, agg))
+resolveOperatorWithAggregate(maybeResolved, agg, (newExprs, newChild) 
=> {
+  val newSortOrder = sortOrder.zip(newExprs).map {
+case (sortOrder, expr) => sortOrder.copy(child = expr)
   }
-  val aliasedOrdering = unresolvedSortOrders.map(o => Alias(o.child, 
"aggOrder")())
-
-  val aggregateWithExtraOrdering = aggregate.copy(
-aggregateExpressions = aggregate.aggregateExpressions ++ 
aliasedOrdering)
-
-  val resolvedAggregate: Aggregate =
-
executeSameContext(aggregateWithExtraOrdering).asInstanceOf[Aggregate]
-
-  val (reResolvedAggExprs, resolvedAliasedOrdering) =
-
resolvedAggregate.aggregateExpressions.splitAt(aggregate.aggregateExpressions.length)
-
-  // If we pass the analysis check, then the ordering expressions 
should only reference to
-  // aggregate expressions or grouping expressions, and it's safe to 
push them down to
-  // Aggregate.
-  checkAnalysis(resolvedAggregate)
-
-  val originalAggExprs = 
aggregate.aggregateExpressions.map(trimNonTopLevelAliases)
-
-  // If the ordering expression is same with original aggregate 
expression, we don't need
-  // to push down this ordering expression and can reference the 
original aggregate
-  // expression instead.
-  val needsPushDown = ArrayBuffer.empty[NamedExpression]
-  val orderToAlias = unresolvedSortOrders.zip(aliasedOrdering)
-  val evaluatedOrderings =
-
resolvedAliasedOrdering.asInstanceOf[Seq[Alias]].zip(orderToAlias).map {
-  case (evaluated, (order, aliasOrder)) =>
-val index = reResolvedAggExprs.indexWhere {
-  case Alias(child, _) => child semanticEquals evaluated.child
-  case other => other semanticEquals evaluated.child
-}
-
-if (index == -1) {
-  if (hasCharVarchar(evaluated)) {
-needsPushDown += aliasOrder
-order.copy(child = aliasOrder)
-  } else {
-needsPushDown += evaluated
-order.copy(child = evaluated.toAttribute)
-  }
-} else {
-  order.copy(child = originalAggExprs(index).toAttribute)
-}
+  Sort(newSortOrder, global, newChild)
+})
+}
+
+def resolveExprsWithAggregate(
+exprs: Seq[Expression],
+agg: Aggregate): (Seq[NamedExpression], Seq[Expression]) = {

Review comment:
   How about 

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26619: [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles

2021-06-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #26619:
URL: https://github.com/apache/spark/pull/26619#issuecomment-859155211


   Did you try `spark.sql.hive.metastore.version`, @suryanshagnihotri ?
   - 
https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
   
   For some known issues, we are upgrading the built-in hive module to 2.3.9 
here.
   - https://github.com/apache/spark/pull/32750


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox


AmplabJenkins commented on pull request #32871:
URL: https://github.com/apache/spark/pull/32871#issuecomment-859195281






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32870: [[SPARK-35439][SQL]][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions

2021-06-14 Thread GitBox


SparkQA commented on pull request #32870:
URL: https://github.com/apache/spark/pull/32870#issuecomment-859153502






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32847: [SPARK-35616][PYTHON] Make `astype` method data-type-based

2021-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32847:
URL: https://github.com/apache/spark/pull/32847#issuecomment-859175443






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox


SparkQA commented on pull request #32871:
URL: https://github.com/apache/spark/pull/32871#issuecomment-859176870






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32750: [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9

2021-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32750:
URL: https://github.com/apache/spark/pull/32750#issuecomment-859195284






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #32841: [SPARK-35673][SQL] Fix user-defined hint and unrecognized hint in subquery.

2021-06-14 Thread GitBox


maropu commented on pull request #32841:
URL: https://github.com/apache/spark/pull/32841#issuecomment-859193291


   late lgtm. Thank you for this fix, @cfmcgrady .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu closed pull request #32860: [SPARK-35709][DOCS] Remove the reference to third party Nomad integration project

2021-06-14 Thread GitBox


maropu closed pull request #32860:
URL: https://github.com/apache/spark/pull/32860


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #32860: [SPARK-35709][DOCS] Remove the reference to third party Nomad integration project

2021-06-14 Thread GitBox


maropu commented on pull request #32860:
URL: https://github.com/apache/spark/pull/32860#issuecomment-859153880






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32849: [WIP][SPARK-35704][SQL] Add fields to `DayTimeIntervalType`

2021-06-14 Thread GitBox


AmplabJenkins commented on pull request #32849:
URL: https://github.com/apache/spark/pull/32849#issuecomment-859153228






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32849: [WIP][SPARK-35704][SQL] Add fields to `DayTimeIntervalType`

2021-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32849:
URL: https://github.com/apache/spark/pull/32849#issuecomment-859153228






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions

2021-06-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #32870:
URL: https://github.com/apache/spark/pull/32870#discussion_r649609136



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant 
expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the 
usage here,
+ * the order of irrelevant expressions does not matter.
+ */
+class ExpressionContainmentOrdering extends Ordering[Expression] {

Review comment:
   Just curious, is there a reason of this move?

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant 
expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the 
usage here,

Review comment:
   Shall we change this to `0` according to the new logic?

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant 
expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the 
usage here,
+ * the order of irrelevant expressions does not matter.
+ */
+class ExpressionContainmentOrdering extends Ordering[Expression] {

Review comment:
   Never mind. New one also looks good~

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant 
expressions
+ * will be considered as equal by this ordering. But for the usage here, the 
order of
+ * irrelevant expressions does not matter.

Review comment:
   To be complete, could you add some description about the 
semantically-equal expressions?

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant or 
semantically-equal
+ * expressions will be considered as equal by this ordering. But for the usage 
here, the order of
+ * irrelevant expressions does not matter.
+ */
+class ExpressionContainmentOrdering extends Ordering[Expression] {
+  override def compare(x: Expression, y: Expression): Int = {
+if (x.find(_.semanticEquals(y)).isDefined) {
+  // `y` is child expression of `x`.
+  1
+} else if (y.find(_.semanticEquals(x)).isDefined) {
+  // `x` is child expression of `y`.
+  -1
+} else {
+  // Irrelevant expressions

Review comment:
   

[GitHub] [spark] ueshin opened a new pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox


ueshin opened a new pull request #32871:
URL: https://github.com/apache/spark/pull/32871






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown

2021-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32868:
URL: https://github.com/apache/spark/pull/32868#issuecomment-859153229


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139660/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown

2021-06-14 Thread GitBox


AmplabJenkins commented on pull request #32868:
URL: https://github.com/apache/spark/pull/32868#issuecomment-859153229


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139660/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #32871:
URL: https://github.com/apache/spark/pull/32871#issuecomment-859195281






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32750: [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9

2021-06-14 Thread GitBox


SparkQA commented on pull request #32750:
URL: https://github.com/apache/spark/pull/32750#issuecomment-859153636






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions

2021-06-14 Thread GitBox


viirya commented on a change in pull request #32870:
URL: https://github.com/apache/spark/pull/32870#discussion_r649615316



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant 
expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the 
usage here,
+ * the order of irrelevant expressions does not matter.
+ */
+class ExpressionContainmentOrdering extends Ordering[Expression] {

Review comment:
   Oh, as it is a nested class, I cannot allocate it separately, but
   
   ```scala
   val equivalence = new EquivalentExpressions
   val exprOrdering = new equivalence.ExpressionContainmentOrdering
   ```
   
   I can revert to nested class if you think it's unnecessary change.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant 
expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the 
usage here,

Review comment:
   Right, missing the doc. Fixed.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant 
expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the 
usage here,
+ * the order of irrelevant expressions does not matter.
+ */
+class ExpressionContainmentOrdering extends Ordering[Expression] {

Review comment:
   Oh, as it is a nested class, I cannot allocate it separately, but
   
   ```scala
   val equivalence = new EquivalentExpressions
   val exprOrdering = new equivalence.ExpressionContainmentOrdering
   ```
   
   I can revert to nested class if you think it's unnecessary change.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant 
expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the 
usage here,

Review comment:
   Right, missing the doc. Fixed.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##
@@ -229,3 +208,27 @@ class EquivalentExpressions {
 sb.toString()
   }
 }
+
+/**
+ * Orders `Expression` by parent/child relations. The child expression is 
smaller
+ * than parent expression. If there is child-parent relationships among the 
subexpressions,
+ * we want the child expressions come first than parent expressions, so we can 
replace
+ * child expressions in parent expressions with subexpression evaluation. Note 
that
+ * this is not for general expression ordering. For example, two irrelevant 
expressions
+ * will be considered as equal by this ordering. But for the usage here, the 
order of
+ * irrelevant expressions does not matter.

Review comment:
   Sure. Added.

##
File path: 

[GitHub] [spark] SparkQA removed a comment on pull request #32849: [WIP][SPARK-35704][SQL] Add fields to `DayTimeIntervalType`

2021-06-14 Thread GitBox


SparkQA removed a comment on pull request #32849:
URL: https://github.com/apache/spark/pull/32849#issuecomment-859038318


   **[Test build #139659 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139659/testReport)**
 for PR 32849 at commit 
[`32fae30`](https://github.com/apache/spark/commit/32fae304389bbf5f3d9ae7cfb1a2496ae9e84a35).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #32865: [SPARK-35701] Use copy-on-write semantics for SQLConf registered configurations.

2021-06-14 Thread GitBox


maropu commented on a change in pull request #32865:
URL: https://github.com/apache/spark/pull/32865#discussion_r649630875



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -53,28 +54,60 @@ import org.apache.spark.util.Utils
 
 object SQLConf {
 
-  private[sql] val sqlConfEntries =
-new ConcurrentHashMap[String, ConfigEntry[_]]()
+  private[this] val sqlConfEntriesUpdateLock = new Object
 
-  val staticConfKeys: java.util.Set[String] =
-java.util.Collections.synchronizedSet(new java.util.HashSet[String]())
+  @volatile
+  private[this] var sqlConfEntries: util.Map[String, ConfigEntry[_]] = 
util.Collections.emptyMap()
 
-  private def register(entry: ConfigEntry[_]): Unit = 
sqlConfEntries.merge(entry.key, entry,
-(existingConfigEntry, newConfigEntry) => {
-  require(existingConfigEntry == null,
-s"Duplicate SQLConfigEntry. ${newConfigEntry.key} has been registered")
-  newConfigEntry
-}
-  )
+  private[this] val staticConfKeysUpdateLock = new Object

Review comment:
   Q: we need the two separate locks for the static/non-static configs?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #32783: [SPARK-35653][SQL] Fix CatalystToExternalMap interpreted path fails for Map with case classes as keys or values

2021-06-14 Thread GitBox


maropu commented on pull request #32783:
URL: https://github.com/apache/spark/pull/32783#issuecomment-859152984


   Thank you, all~ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when running multiple Hive version related tests

2021-06-14 Thread GitBox


HyukjinKwon commented on pull request #32693:
URL: https://github.com/apache/spark/pull/32693#issuecomment-859193309


   cc @wangyum


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox


SparkQA removed a comment on pull request #32871:
URL: https://github.com/apache/spark/pull/32871#issuecomment-859176870


   **[Test build #139664 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139664/testReport)**
 for PR 32871 at commit 
[`81a1042`](https://github.com/apache/spark/commit/81a10425a0d4feaa08532e0138c293e9a013337e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a change in pull request #32822: [SPARK-35678][ML] add a common softmax function

2021-06-14 Thread GitBox


zhengruifeng commented on a change in pull request #32822:
URL: https://github.com/apache/spark/pull/32822#discussion_r649624193



##
File path: mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala
##
@@ -94,4 +95,43 @@ private[spark] object Utils {
   math.log1p(math.exp(x))
 }
   }
+
+  /**
+   * Perform in-place softmax conversion.
+   */
+  def softmax(values: Array[Double]): Unit = {
+var maxValue = Double.MinValue
+var i = 0
+while (i < values.length) {
+  val value = values(i)
+  if (value.isPosInfinity) {
+java.util.Arrays.fill(values, 0)
+values(i) = 1.0
+return
+  } else if (value > maxValue) {
+maxValue = value
+  }
+  i += 1
+}
+
+var sum = 0.0
+i = 0
+if (maxValue > 0) {

Review comment:
   sounds reasonable

##
File path: mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala
##
@@ -94,4 +95,43 @@ private[spark] object Utils {
   math.log1p(math.exp(x))
 }
   }
+
+  /**
+   * Perform in-place softmax conversion.
+   */
+  def softmax(values: Array[Double]): Unit = {
+var maxValue = Double.MinValue
+var i = 0
+while (i < values.length) {
+  val value = values(i)
+  if (value.isPosInfinity) {
+java.util.Arrays.fill(values, 0)
+values(i) = 1.0
+return
+  } else if (value > maxValue) {
+maxValue = value
+  }
+  i += 1
+}
+
+var sum = 0.0
+i = 0
+if (maxValue > 0) {

Review comment:
   ```py
   def softmax(X, copy=True):
   """
   Calculate the softmax function.
   
   The softmax function is calculated by
   np.exp(X) / np.sum(np.exp(X), axis=1)
   
   This will cause overflow when large values are exponentiated.
   Hence the largest value in each row is subtracted from each data
   point to prevent this.
   
   Parameters
   --
   X : array-like of float of shape (M, N)
   Argument to the logistic function.
   
   copy : bool, default=True
   Copy X or not.
   
   Returns
   ---
   out : ndarray of shape (M, N)
   Softmax function evaluated at every point in x.
   """
   if copy:
   X = np.copy(X)
   max_prob = np.max(X, axis=1).reshape((-1, 1))
   X -= max_prob
   np.exp(X, X)
   sum_prob = np.sum(X, axis=1).reshape((-1, 1))
   X /= sum_prob
   return X
   ```
   
   softmax in scikit-learn does not check whether the maxvalue is positive or 
not




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-14 Thread GitBox


otterc commented on pull request #32140:
URL: https://github.com/apache/spark/pull/32140#issuecomment-859181282


   This is still dependent on the changes in 
https://github.com/apache/spark/pull/32140 which has the protocol side of 
changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



<    4   5   6   7   8   9