[GitHub] [spark] Ngone51 commented on pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-07-05 Thread GitBox


Ngone51 commented on PR #36162:
URL: https://github.com/apache/spark/pull/36162#issuecomment-1175852257

   This is a good idea to me. @weixiuli Could you help do some initial investigation on it? We can continue to improve this if it benefits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-07-05 Thread GitBox


AngersZhuuuu commented on PR #36564:
URL: https://github.com/apache/spark/pull/36564#issuecomment-1175851865

   Yea





[GitHub] [spark] grundprinzip commented on a diff in pull request #37075: [SparkConnect] Initial Protobuf Definitions

2022-07-05 Thread GitBox


grundprinzip commented on code in PR #37075:
URL: https://github.com/apache/spark/pull/37075#discussion_r914475557


##
pom.xml:
##
@@ -116,7 +116,7 @@
 2.17.2
 
 3.3.3
-2.5.0
+3.21.1

Review Comment:
   Absolutely, the shading and relocation rules haven't been updated yet. I'm still a bit unclear about the best way to proceed to avoid conflicts with third-party packages or Spark consumers. I've discussed some approaches; one way would be to produce a shaded Spark Connect artifact that is then consumed in its shaded version by Spark itself, or to shade and relocate after the build.
   
   However, this PR is mostly for discussing the proto interface.
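   As a sketch of the kind of rule that would eventually need updating, a `maven-shade-plugin` relocation for protobuf could look like the fragment below; the target package `org.sparkproject.connect.protobuf` is purely illustrative and not something this PR decides:

   ```xml
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-shade-plugin</artifactId>
     <configuration>
       <relocations>
         <relocation>
           <!-- rewrite protobuf classes into a Spark-private package -->
           <pattern>com.google.protobuf</pattern>
           <!-- hypothetical target package; the actual choice is still under discussion -->
           <shadedPattern>org.sparkproject.connect.protobuf</shadedPattern>
         </relocation>
       </relocations>
     </configuration>
   </plugin>
   ```

   With such a relocation, consumers that bring their own (possibly older) protobuf would not clash with the classes the shaded artifact carries.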






[GitHub] [spark] allisonwang-db opened a new pull request, #37099: [SPARK-37287][SQL] Pull out dynamic partition and bucket sort from FileFormatWriter

2022-07-05 Thread GitBox


allisonwang-db opened a new pull request, #37099:
URL: https://github.com/apache/spark/pull/37099

   
   
   ### What changes were proposed in this pull request?
   
   This PR develops on top of the original PR 
https://github.com/apache/spark/pull/34568. It
   - Adds a new trait `V1WriteCommand` that extends the `DataWritingCommand` 
with an abstract method `requiredOrdering`.
   - Changes each of the following commands to extend `V1WriteCommand` and 
implement `requiredOrdering`:
 - CreateDataSourceTableAsSelectCommand
 - InsertIntoHadoopFsRelationCommand
 - InsertIntoHiveTable
 - CreateHiveTableAsSelectBase
   - Adds a new rule `V1Writes` (similar to `V2Writes`) to decide whether we 
should add a logical `Sort` operator based on the required ordering of a V1 
data writing command.
   - Adds a new config **spark.sql.plannedWrite.enabled** (default: true) to control whether logical sorts are added during the query optimization stage.
   
   ### Why are the changes needed?
   
   Simplify `FileFormatWriter` and unify the logic of V1 and V2 write commands.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   New unit tests.
   





[GitHub] [spark] cloud-fan commented on pull request #37087: [SPARK-39447][SQL][3.2] Avoid AssertionError in AdaptiveSparkPlanExec.doExecuteBroadcast

2022-07-05 Thread GitBox


cloud-fan commented on PR #37087:
URL: https://github.com/apache/spark/pull/37087#issuecomment-1175840358

   > Shall we also backport https://github.com/apache/spark/pull/36953 into 
branch-3.2 ?
   
   +1





[GitHub] [spark] LuciferYang commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

2022-07-05 Thread GitBox


LuciferYang commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1175830680

   Sorry, there's a problem I didn't notice before.
   
   I found that `Generate an input dataset for TPCDSQueryBenchmark with SF=1` is always executed even if I didn't specify `TPCDSQueryBenchmark` to run. Is this what we want? For example:
   
   https://github.com/LuciferYang/spark/runs/7207453029?check_suite_focus=true
   
   I just want to run `MapStatusesConvertBenchmark`.
   
   





[GitHub] [spark] LuciferYang closed pull request #37086: [SPARK-39680][CORE] Remove `hasSpaceForAnotherRecord` from `UnsafeExternalSorter`

2022-07-05 Thread GitBox


LuciferYang closed pull request #37086: [SPARK-39680][CORE] Remove 
`hasSpaceForAnotherRecord` from `UnsafeExternalSorter` 
URL: https://github.com/apache/spark/pull/37086





[GitHub] [spark] LuciferYang commented on pull request #37086: [SPARK-39680][CORE] Remove `hasSpaceForAnotherRecord` from `UnsafeExternalSorter`

2022-07-05 Thread GitBox


LuciferYang commented on PR #37086:
URL: https://github.com/apache/spark/pull/37086#issuecomment-1175821690

   Thanks for your review @dongjoon-hyun 





[GitHub] [spark] LuciferYang commented on pull request #37086: [SPARK-39680][CORE] Remove `hasSpaceForAnotherRecord` from `UnsafeExternalSorter`

2022-07-05 Thread GitBox


LuciferYang commented on PR #37086:
URL: https://github.com/apache/spark/pull/37086#issuecomment-1175821199

   > May I ask if you are currently doing some other refactoring in this area? I'm wondering if you want to encapsulate it back for some reason or not.
   
   Not yet, I just saw that this method is no longer used. I will close this PR.





[GitHub] [spark] mskapilks opened a new pull request, #37098: [SPARK-39690][SQL] Fixes Reuse exchange across subqueries with AQE if subquery side exchange materialized first

2022-07-05 Thread GitBox


mskapilks opened a new pull request, #37098:
URL: https://github.com/apache/spark/pull/37098

   
   
   ### What changes were proposed in this pull request?
   
   When trying to reuse the Exchange of a subquery in the main plan, if the Exchange inside the subquery materializes first, then the main AdaptiveSparkPlanExec node won't have that stage info (in [stageToReplace](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala#L243)) to replace in the current logical plan. This causes AQE to produce a new candidate physical plan without reusing the exchange present inside the subquery, and depending on how complex the inner plan is (number of exchanges), AQE could choose a plan without ReusedExchange.
   
   We have seen this with multiple queries in our private build. This can happen with DPP as well.
   
   ### Why are the changes needed?
   
   To reuse exchange from subqueries in AQE
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Manually verified in the unit test `SPARK-34637: DPP side broadcast query stage is created firstly` ([Code](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala#L1426)). The last candidate physical plan created by AQE in this test does not contain a ReusedExchange before this change.





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37086: [SPARK-39680][CORE] Remove `hasSpaceForAnotherRecord` from `UnsafeExternalSorter`

2022-07-05 Thread GitBox


dongjoon-hyun commented on code in PR #37086:
URL: https://github.com/apache/spark/pull/37086#discussion_r914446291


##
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:
##
@@ -562,10 +562,6 @@ public UnsafeSorterIterator getSortedIterator() throws 
IOException {
 }
   }
 
-  @VisibleForTesting boolean hasSpaceForAnotherRecord() {

Review Comment:
   Thank you for making a PR, but I'd rather not remove this. Although it is not used currently, it was added deliberately. I feel this removal might be too hasty a decision, because `UnsafeExternalSorter` is designed to be based on `UnsafeInMemorySorter`, and `UnsafeInMemorySorter.hasSpaceForAnotherRecord()` is a useful public method. It still looks okay to me to keep this `VisibleForTesting` method to expose the underlying structure.










[GitHub] [spark] LuciferYang commented on a diff in pull request #37075: [SparkConnect] Initial Protobuf Definitions

2022-07-05 Thread GitBox


LuciferYang commented on code in PR #37075:
URL: https://github.com/apache/spark/pull/37075#discussion_r914442245


##
pom.xml:
##
@@ -116,7 +116,7 @@
 2.17.2
 
 3.3.3
-2.5.0
+3.21.1

Review Comment:
   Maybe we should shade and relocate `protobuf-java` in Spark to avoid potential conflicts with other third-party libraries, such as Hadoop.






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37087: [SPARK-39447][SQL][3.2] Avoid AssertionError in AdaptiveSparkPlanExec.doExecuteBroadcast

2022-07-05 Thread GitBox


dongjoon-hyun commented on code in PR #37087:
URL: https://github.com/apache/spark/pull/37087#discussion_r914436645


##
sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala:
##
@@ -1597,6 +1597,25 @@ class DynamicPartitionPruningV1SuiteAEOff extends 
DynamicPartitionPruningV1Suite
 class DynamicPartitionPruningV1SuiteAEOn extends DynamicPartitionPruningV1Suite
   with EnableAdaptiveExecutionSuite {
 
+  test("SPARK-39447: Avoid AssertionError in 
AdaptiveSparkPlanExec.doExecuteBroadcast") {

Review Comment:
   Do you mean this test case doesn't fail currently in branch-3.2 because we 
don't have SPARK-39551, @ulysses-you ?






[GitHub] [spark] dongjoon-hyun commented on pull request #37097: [SPARK-39616][BUILD][ML][FOLLOWUP] Fix flaky doctests

2022-07-05 Thread GitBox


dongjoon-hyun commented on PR #37097:
URL: https://github.com/apache/spark/pull/37097#issuecomment-1175804926

   Thank you, @zhengruifeng , @HyukjinKwon , @Yikun .





[GitHub] [spark] dongjoon-hyun closed pull request #37097: [SPARK-39616][BUILD][ML][FOLLOWUP] Fix flaky doctests

2022-07-05 Thread GitBox


dongjoon-hyun closed pull request #37097: [SPARK-39616][BUILD][ML][FOLLOWUP] 
Fix flaky doctests
URL: https://github.com/apache/spark/pull/37097





[GitHub] [spark] singhpk234 commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-05 Thread GitBox


singhpk234 commented on PR #37083:
URL: https://github.com/apache/spark/pull/37083#issuecomment-1175798196

   Thanks @wangyum !
   
   > So enabling spark.sql.cbo.enabled is what you want?
   
   I believe setting `spark.sql.cbo.enabled` to true by default could help (what I wanted was for this valuable row-count stat, bubbled up from the V2 connector, to be accounted for in default Spark behaviour), but I think it requires some additional effort, since our other defaults such as auto-BHJ need to be adjusted accordingly.
   
   > I think it's by design
   
   For my knowledge, could you please point me to some JIRAs? Happy to learn more.
   
   Love to know your thoughts on the same. Happy to close this as well if we consider this not a problem at all.
   





[GitHub] [spark] Yikun commented on a diff in pull request #37005: [SPARK-39522][INFRA]Uses Docker image cache over a custom image in pyspark job

2022-07-05 Thread GitBox


Yikun commented on code in PR #37005:
URL: https://github.com/apache/spark/pull/37005#discussion_r914420987


##
.github/workflows/build_and_test.yml:
##
@@ -805,3 +853,27 @@ jobs:
   with:
 name: unit-tests-log-docker-integration--8-${{ inputs.hadoop }}-hive2.3
 path: "**/target/unit-tests.log"
+
+  # Note that: there are only GHCR creation permission for secrets.GITHUB_TOKEN
+  # If you want to clean up the CI images, you need to:
+  # - 1. Generate the token from https://github.com/settings/tokens with 
`write:packages` and `delete:packages`
+  # - 2. Add the token as secrets `GHCR_DEL` in https://github.com/{username}/spark/settings/secrets/actions
+  infra-image-post:
+# Always runs after pyspark/sparkr/lint have completed, regardless of 
whether they were successful
+if: always()
+needs: [pyspark, sparkr, lint]
+runs-on: ubuntu-20.04
+env:
+  GHCR_DEL: ${{ secrets.GHCR_DEL }}

Review Comment:
   
https://docs.github.com/en/actions/using-workflows/reusing-workflows#using-inputs-and-secrets-in-a-reusable-workflow
   
   We need to define the secrets in the main workflow.
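   For illustration, assuming `build_and_test.yml` is invoked as a reusable workflow, the caller has to forward the secret explicitly (the job and file names here are only a sketch):

   ```yaml
   # caller workflow (hypothetical sketch)
   jobs:
     build-and-test:
       uses: ./.github/workflows/build_and_test.yml
       secrets:
         GHCR_DEL: ${{ secrets.GHCR_DEL }}
   ```

   Alternatively, `secrets: inherit` forwards all of the caller's secrets to the reusable workflow.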






[GitHub] [spark] ulysses-you commented on pull request #37087: [SPARK-39447][SQL][3.2] Avoid AssertionError in AdaptiveSparkPlanExec.doExecuteBroadcast

2022-07-05 Thread GitBox


ulysses-you commented on PR #37087:
URL: https://github.com/apache/spark/pull/37087#issuecomment-1175757982

   @cloud-fan oh, this patch is based on #36953, but that one has not been backported to branch-3.2. Shall we also backport #36953 into branch-3.2?





[GitHub] [spark] zhengruifeng commented on a diff in pull request #37097: [SPARK-39616][BUILD][ML][FOLLOWUP] Skip flaky doctests

2022-07-05 Thread GitBox


zhengruifeng commented on code in PR #37097:
URL: https://github.com/apache/spark/pull/37097#discussion_r914375796


##
python/pyspark/mllib/linalg/distributed.py:
##
@@ -423,7 +423,7 @@ def computeSVD(
 [DenseVector([-0.7071, 0.7071]), DenseVector([-0.7071, -0.7071])]
 >>> svd_model.s
 DenseVector([3.4641, 3.1623])
->>> svd_model.V # doctest: +ELLIPSIS
+>>> svd_model.V # doctest: +SKIP
 DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 0.0], 0)

Review Comment:
   good idea






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37097: [SPARK-39616][BUILD][ML][FOLLOWUP] Skip flaky doctests

2022-07-05 Thread GitBox


HyukjinKwon commented on code in PR #37097:
URL: https://github.com/apache/spark/pull/37097#discussion_r914374186


##
python/pyspark/mllib/linalg/distributed.py:
##
@@ -423,7 +423,7 @@ def computeSVD(
 [DenseVector([-0.7071, 0.7071]), DenseVector([-0.7071, -0.7071])]
 >>> svd_model.s
 DenseVector([3.4641, 3.1623])
->>> svd_model.V # doctest: +ELLIPSIS
+>>> svd_model.V # doctest: +SKIP
 DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 0.0], 0)

Review Comment:
   Or you can change it like this too, I believe. 
   
   ```suggestion
   DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 
...0.0], 0)
   ```






[GitHub] [spark] zhengruifeng opened a new pull request, #37097: [SPARK-39616][BUILD][ML][FOLLOWUP] Skip flaky doctests

2022-07-05 Thread GitBox


zhengruifeng opened a new pull request, #37097:
URL: https://github.com/apache/spark/pull/37097

   ### What changes were proposed in this pull request?
   Skip flaky doctests
   
   
   ### Why are the changes needed?
   ```
   File "/__w/spark/spark/python/pyspark/mllib/linalg/distributed.py", line 
859, in __main__.IndexedRowMatrix.computeSVD
   Failed example:
   svd_model.V # doctest: +ELLIPSIS
   Expected:
   DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 0.0], 0)
   Got:
   DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, -0.0], 0)
   **
   File "/__w/spark/spark/python/pyspark/mllib/linalg/distributed.py", line 
426, in __main__.RowMatrix.computeSVD
   Failed example:
   svd_model.V # doctest: +ELLIPSIS
   Expected:
   DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 0.0], 0)
   Got:
   DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, -0.0], 0)
   **
  1 of   6 in __main__.IndexedRowMatrix.computeSVD
  1 of   6 in __main__.RowMatrix.computeSVD
   ***Test Failed*** 2 failures.
   Had test failures in pyspark.mllib.linalg.distributed with python3.9; see 
logs.
   ```
   
   https://github.com/apache/spark/pull/37002 occasionally causes the above tests to output `-0.0` instead of `0.0`; I think both are acceptable.
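   For context, the flakiness stems from IEEE 754 signed zeros: `-0.0` compares equal to `0.0`, but the two render differently, which is exactly what trips a doctest that matches printed output. A minimal, Spark-independent illustration:

   ```python
   import math

   # -0.0 and 0.0 are numerically equal under IEEE 754 comparison,
   # but their textual representations differ, so string-matching tests see a failure.
   print(-0.0 == 0.0)               # True: numerically indistinguishable
   print(repr(0.0), repr(-0.0))     # '0.0' and '-0.0': rendered differently
   print(math.copysign(1.0, -0.0))  # -1.0: the sign bit is still present
   ```

   Given that, `+SKIP` (or widening the expected output) is a reasonable way to make the doctest deterministic.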
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   updated doctests
   





[GitHub] [spark] xinrong-databricks commented on a diff in pull request #37038: [SPARK-39648][PYTHON][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column

2022-07-05 Thread GitBox


xinrong-databricks commented on code in PR #37038:
URL: https://github.com/apache/spark/pull/37038#discussion_r914357361


##
python/pyspark/sql/column.py:
##
@@ -656,9 +668,9 @@ def __iter__(self) -> None:
 """
 
 contains = _bin_op("contains", _contains_doc)
-rlike = _bin_op("rlike", _rlike_doc)
-like = _bin_op("like", _like_doc)
-ilike = _bin_op("ilike", _ilike_doc)
+rlike = _bin_op("rlike", _rlike_doc, is_other_str=True)

Review Comment:
   Sounds good! Defined `_bin_op_other_str`.









[GitHub] [spark] pan3793 commented on a diff in pull request #36991: [SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-07-05 Thread GitBox


pan3793 commented on code in PR #36991:
URL: https://github.com/apache/spark/pull/36991#discussion_r914354354


##
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala:
##
@@ -859,7 +859,8 @@ private[yarn] class YarnAllocator(
 // 
.com/apache/hadoop/blob/228156cfd1b474988bc4fedfbf7edddc87db41e3/had
 // 
oop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/ap
 // ache/hadoop/yarn/util/Apps.java#L273 for details)
-if 
(NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status)) {
+if 
(NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status) ||
+SparkContext.getActive.forall(_.isStopped)) {

Review Comment:
   The SparkContext sets the `stopped` flag at the beginning of the stop procedure; do you have other suggestions for checking whether the SparkContext is stopped?
   
   
https://github.com/apache/spark/blob/c1d1ec5f3bde0b46bc34a161935ebfc2c41ec153/core/src/main/scala/org/apache/spark/SparkContext.scala#L2056-L2065






[GitHub] [spark] Yikun commented on pull request #37003: [SPARK-39522][INFRA] Add Apache Spark infra GA image cache

2022-07-05 Thread GitBox


Yikun commented on PR #37003:
URL: https://github.com/apache/spark/pull/37003#issuecomment-1175683700

   - Cache build works well: 
https://github.com/apache/spark/actions/workflows/build_infra_images_cache.yml
   - Image has been published in: 
https://github.com/apache/spark/pkgs/container/spark%2Fapache-spark-github-action-image-cache
   - Image build from cache works well: 
https://github.com/Yikun/spark/runs/7206936628?check_suite_focus=true#step:7:102





[GitHub] [spark] zhengruifeng commented on pull request #37092: [SPARK-39687][PYTHON][DOCS] Make sure new catalog methods listed in API reference

2022-07-05 Thread GitBox


zhengruifeng commented on PR #37092:
URL: https://github.com/apache/spark/pull/37092#issuecomment-1175681321

   Thank you @HyukjinKwon for reminding me to check the doc and helping review





[GitHub] [spark] beliefer commented on pull request #37090: [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT`

2022-07-05 Thread GitBox


beliefer commented on PR #37090:
URL: https://github.com/apache/spark/pull/37090#issuecomment-1175679627

   ping @huaxingao cc @cloud-fan 





[GitHub] [spark] wangyum commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-07-05 Thread GitBox


wangyum commented on PR #36696:
URL: https://github.com/apache/spark/pull/36696#issuecomment-1175662846

   Could you update the `FilterPushdownBenchmark` results?





[GitHub] [spark] wangyum commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-05 Thread GitBox


wangyum commented on PR #37083:
URL: https://github.com/apache/spark/pull/37083#issuecomment-1175649786

   I think it's by design. So enabling `spark.sql.cbo.enabled` is what you want?





[GitHub] [spark] zhengruifeng commented on pull request #37002: [SPARK-39616][BUILD][ML] Upgrade Breeze to 2.0

2022-07-05 Thread GitBox


zhengruifeng commented on PR #37002:
URL: https://github.com/apache/spark/pull/37002#issuecomment-1175640643

   @HyukjinKwon @dongjoon-hyun let me take a look, thanks





[GitHub] [spark] dongjoon-hyun commented on pull request #37002: [SPARK-39616][BUILD][ML] Upgrade Breeze to 2.0

2022-07-05 Thread GitBox


dongjoon-hyun commented on PR #37002:
URL: https://github.com/apache/spark/pull/37002#issuecomment-1175634920

   Thank you for reporting that, @HyukjinKwon .





[GitHub] [spark] HyukjinKwon closed pull request #37003: [SPARK-39522][INFRA] Add Apache Spark infra GA image cache

2022-07-05 Thread GitBox


HyukjinKwon closed pull request #37003: [SPARK-39522][INFRA] Add Apache Spark 
infra GA image cache
URL: https://github.com/apache/spark/pull/37003





[GitHub] [spark] HyukjinKwon commented on pull request #37003: [SPARK-39522][INFRA] Add Apache Spark infra GA image cache

2022-07-05 Thread GitBox


HyukjinKwon commented on PR #37003:
URL: https://github.com/apache/spark/pull/37003#issuecomment-1175633739

   Merged to master.





[GitHub] [spark] Yikun commented on pull request #37003: [SPARK-39522][INFRA] Add Apache Spark infra GA image cache

2022-07-05 Thread GitBox


Yikun commented on PR #37003:
URL: https://github.com/apache/spark/pull/37003#issuecomment-1175633700

   @HyukjinKwon Yes, I think it's OK to go! Thanks!





[GitHub] [spark] HyukjinKwon commented on pull request #37075: [SparkConnect] Initial Protobuf Definitions

2022-07-05 Thread GitBox


HyukjinKwon commented on PR #37075:
URL: https://github.com/apache/spark/pull/37075#issuecomment-1175628073

   Let me turn this to a draft since it's WIP





[GitHub] [spark] HyukjinKwon commented on pull request #37002: [SPARK-39616][BUILD][ML] Upgrade Breeze to 2.0

2022-07-05 Thread GitBox


HyukjinKwon commented on PR #37002:
URL: https://github.com/apache/spark/pull/37002#issuecomment-1175625302

   @zhengruifeng seems like there's one flaky test, presumably after this change:
   
   ```
   
   File "/__w/spark/spark/python/pyspark/mllib/linalg/distributed.py", line 859, in __main__.IndexedRowMatrix.computeSVD
   Failed example:
   svd_model.V # doctest: +ELLIPSIS
   Expected:
   DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 0.0], 0)
   Got:
   DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, -0.0], 0)
   **
   File "/__w/spark/spark/python/pyspark/mllib/linalg/distributed.py", line 426, in __main__.RowMatrix.computeSVD
   Failed example:
   svd_model.V # doctest: +ELLIPSIS
   Expected:
   DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 0.0], 0)
   Got:
   DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, -0.0], 0)
   **
  1 of   6 in __main__.IndexedRowMatrix.computeSVD
  1 of   6 in __main__.RowMatrix.computeSVD
   ***Test Failed*** 2 failures.
   Had test failures in pyspark.mllib.linalg.distributed with python3.9; see logs.
   ```
   
   Mind taking a look please?
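   For context on why only the doctest fails: the two matrices differ only in the sign of a final zero, and IEEE-754 negative zero compares equal to positive zero, so the numerical result is unchanged — it is doctest's textual comparison that breaks. A quick Python illustration:

```python
import math

neg_zero = -0.0
print(neg_zero == 0.0)               # True: the values are numerically equal
print(repr(neg_zero))                # '-0.0': but the printed form differs
print(math.copysign(1.0, neg_zero))  # -1.0: only the sign bit changed
```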





[GitHub] [spark] kazuyukitanimura commented on a diff in pull request #37096: [SPARK-39584][SQL][TESTS] Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results

2022-07-05 Thread GitBox


kazuyukitanimura commented on code in PR #37096:
URL: https://github.com/apache/spark/pull/37096#discussion_r914297412


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala:
##
@@ -65,23 +67,19 @@ object TPCDSQueryBenchmark extends SqlBasedBenchmark with Logging {
 "web_returns", "web_site", "reason", "call_center", "warehouse", "ship_mode", "income_band",
 "time_dim", "web_page")
 
-  def setupTables(dataLocation: String, createTempView: Boolean): Map[String, Long] = {
+  def setupTables(dataLocation: String, tableColumns: Map[String, StructType]): Map[String, Long] =
 tables.map { tableName =>
-  if (createTempView) {

Review Comment:
   It was initially introduced by #31011
   My understanding is that the temp view was disabled for CBO because CBO runs
   `spark.sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR ALL COLUMNS")`
   
https://github.com/apache/spark/pull/31011/files#diff-f0ef9be2f138cb947253f07c4285a8cf6b054355cf53beb2ba70ce82a380356bR173
   
   `ANALYZE TABLE` requires an actual table rather than a temp view. Otherwise, it will throw `org.apache.spark.sql.AnalysisException: Temporary view is not cached for analyzing columns.`
   
   @maropu please chime in if I said anything wrong.
   
   With this PR, all input data is loaded into tables created with the forced schema.






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37096: [SPARK-39584][SQL][TESTS] Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results

2022-07-05 Thread GitBox


dongjoon-hyun commented on code in PR #37096:
URL: https://github.com/apache/spark/pull/37096#discussion_r914282181


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala:
##
@@ -65,23 +67,19 @@ object TPCDSQueryBenchmark extends SqlBasedBenchmark with Logging {
 "web_returns", "web_site", "reason", "call_center", "warehouse", "ship_mode", "income_band",
 "time_dim", "web_page")
 
-  def setupTables(dataLocation: String, createTempView: Boolean): Map[String, Long] = {
+  def setupTables(dataLocation: String, tableColumns: Map[String, StructType]): Map[String, Long] =
 tables.map { tableName =>
-  if (createTempView) {

Review Comment:
   Do you know the reason why we used `createTempView = !benchmarkArgs.cboEnabled` before?






[GitHub] [spark] HyukjinKwon commented on pull request #37003: [SPARK-39522][INFRA] Add Apache Spark infra GA image cache

2022-07-05 Thread GitBox


HyukjinKwon commented on PR #37003:
URL: https://github.com/apache/spark/pull/37003#issuecomment-1175598372

   @Yikun is it good to go?





[GitHub] [spark] HyukjinKwon closed pull request #37092: [SPARK-39687][PYTHON][DOCS] Make sure new catalog methods listed in API reference

2022-07-05 Thread GitBox


HyukjinKwon closed pull request #37092: [SPARK-39687][PYTHON][DOCS] Make sure 
new catalog methods listed in API reference
URL: https://github.com/apache/spark/pull/37092





[GitHub] [spark] HyukjinKwon commented on pull request #37092: [SPARK-39687][PYTHON][DOCS] Make sure new catalog methods listed in API reference

2022-07-05 Thread GitBox


HyukjinKwon commented on PR #37092:
URL: https://github.com/apache/spark/pull/37092#issuecomment-1175597057

   Merged to master.





[GitHub] [spark] HyukjinKwon closed pull request #37091: [SPARK-39686][INFRA] Disable scheduled builds that did not pass even once

2022-07-05 Thread GitBox


HyukjinKwon closed pull request #37091: [SPARK-39686][INFRA] Disable scheduled 
builds that did not pass even once
URL: https://github.com/apache/spark/pull/37091





[GitHub] [spark] HyukjinKwon commented on pull request #37091: [SPARK-39686][INFRA] Disable scheduled builds that did not pass even once

2022-07-05 Thread GitBox


HyukjinKwon commented on PR #37091:
URL: https://github.com/apache/spark/pull/37091#issuecomment-1175596012

   Thanks all!





[GitHub] [spark] HyukjinKwon commented on pull request #37091: [SPARK-39686][INFRA] Disable scheduled builds that did not pass even once

2022-07-05 Thread GitBox


HyukjinKwon commented on PR #37091:
URL: https://github.com/apache/spark/pull/37091#issuecomment-1175595952

   Merged to master.





[GitHub] [spark] Jonathancui123 commented on pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-07-05 Thread GitBox


Jonathancui123 commented on PR #36871:
URL: https://github.com/apache/spark/pull/36871#issuecomment-1175592703

   > would appreciate it if you summarize any unaddressed comments or concerns. 
I am fine given that we disable this here by default.
   
   @HyukjinKwon I don't see any remaining unaddressed comments - I believe we 
are ready to merge this PR. Here are the resolutions to potential concerns:
   
   - No backwards compatibility concern: `Legacy` parser policy cannot be used 
with `inferDate`. We can open another ticket to add a 
`LegacyStrictSimpleDateFormatter` [as Bruce 
suggested](https://github.com/apache/spark/pull/36871#discussion_r905348742) if 
necessary. 
   - Performance degradation: Benchmark results ([in the bottom of PR 
description](https://github.com/apache/spark/pull/36871#issue-1271523396)) show 
that date inference greatly slows down schema inference for CSVs with dates. 
This feature is guarded by the CSV Option `inferDate` and is off by default. 
   - Logical errors for strict parsing and converting Date to TimestampNTZ have been fixed and are covered by unit tests
   
   





[GitHub] [spark] Jonathancui123 commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-07-05 Thread GitBox


Jonathancui123 commented on code in PR #36871:
URL: https://github.com/apache/spark/pull/36871#discussion_r909084899


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala:
##
@@ -197,34 +198,46 @@ class UnivocityParser(
 Decimal(decimalParser(datum), dt.precision, dt.scale)
   }
 
-case _: TimestampType => (d: String) =>
+case _: DateType => (d: String) =>
   nullSafeDatum(d, name, nullable, options) { datum =>
 try {
-  timestampFormatter.parse(datum)
+  dateFormatter.parse(datum)
 } catch {
   case NonFatal(e) =>
 // If fails to parse, then tries the way used in 2.0 and 1.x for backwards
 // compatibility.
 val str = DateTimeUtils.cleanLegacyTimestampStr(UTF8String.fromString(datum))
-DateTimeUtils.stringToTimestamp(str, options.zoneId).getOrElse(throw e)
+DateTimeUtils.stringToDate(str).getOrElse(throw e)
 }
   }
 
-case _: TimestampNTZType => (d: String) =>
-  nullSafeDatum(d, name, nullable, options) { datum =>
-timestampNTZFormatter.parseWithoutTimeZone(datum, false)
-  }
-
-case _: DateType => (d: String) =>
+case _: TimestampType => (d: String) =>
   nullSafeDatum(d, name, nullable, options) { datum =>
 try {
-  dateFormatter.parse(datum)
+  timestampFormatter.parse(datum)
 } catch {
   case NonFatal(e) =>
 // If fails to parse, then tries the way used in 2.0 and 1.x for backwards
 // compatibility.
 val str = DateTimeUtils.cleanLegacyTimestampStr(UTF8String.fromString(datum))
-DateTimeUtils.stringToDate(str).getOrElse(throw e)
+DateTimeUtils.stringToTimestamp(str, options.zoneId).getOrElse {
+  // There may be date type entries in timestamp column due to schema inference
+  if (options.inferDate) {
+daysToMicros(dateFormatter.parse(datum), options.zoneId)
+  } else {
+throw(e)
+  }
+}
+}
+  }
+
+case _: TimestampNTZType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options) { datum =>
+try {
+  timestampNTZFormatter.parseWithoutTimeZone(datum, false)
+} catch {
+  case NonFatal(e) if (options.inferDate) =>
+daysToMicros(dateFormatter.parse(datum), options.zoneId)

Review Comment:
   Wow great catch! I hadn't fully considered the effect of user timeZone on 
TimestampNTZ parsing. 
   
   I've fixed the error and I've modified this test to have a PST user and 
check that the parsed date is converted to a timestamp in UTC. 
   
   
https://github.com/Jonathancui123/spark/blob/8df7ebbd6c1c0d5bb875ce554026f9b9aeae148e/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/UnivocityParserSuite.scala#L368-L374
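   The fallback under discussion — try the timestamp pattern first and, when `inferDate` is enabled, reparse the value as a date — can be sketched outside Spark roughly as follows (the function name and format strings are illustrative, not Spark's actual API):

```python
from datetime import datetime

def parse_timestamp_with_date_fallback(value,
                                       ts_fmt="%Y-%m-%d %H:%M:%S",
                                       date_fmt="%Y-%m-%d"):
    try:
        return datetime.strptime(value, ts_fmt)
    except ValueError:
        # Date-typed entries can end up in a timestamp column after schema
        # inference; widen them to midnight instead of failing the row.
        return datetime.strptime(value, date_fmt)

print(parse_timestamp_with_date_fallback("2022-07-05 10:30:00"))  # full timestamp
print(parse_timestamp_with_date_fallback("2022-07-05"))           # date, widened
```

   Spark additionally converts the parsed date using the session time zone (`daysToMicros(..., options.zoneId)`), which this sketch omits — that conversion is exactly what the time-zone test above exercises.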
 






[GitHub] [spark] kazuyukitanimura commented on pull request #37096: [SPARK-39584][SQL][TESTS] Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results

2022-07-05 Thread GitBox


kazuyukitanimura commented on PR #37096:
URL: https://github.com/apache/spark/pull/37096#issuecomment-1175537588

   cc @dongjoon-hyun @sunchao @viirya @wangyum @LuciferYang @HyukjinKwon 
@peter-toth @maropu 
   Appreciate your feedback!





[GitHub] [spark] kazuyukitanimura opened a new pull request, #37096: [SPARK-39584][SQL][TESTS] Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results

2022-07-05 Thread GitBox


kazuyukitanimura opened a new pull request, #37096:
URL: https://github.com/apache/spark/pull/37096

   ### What changes were proposed in this pull request?
   `GenTPCDSData` uses the schema defined in `TPCDSSchema`, which contains `char(N)` columns. When `GenTPCDSData` generates parquet files, it pads strings shorter than `N` with spaces.
   
   When `TPCDSQueryBenchmark` reads data from parquet generated by `GenTPCDSData`, it uses the schema from the parquet file and keeps the padding. Due to the extra spaces, the TPC-DS string filter queries fail to match. For example, the `q13` query results are all nulls and the query returns too fast because the string filter does not match any rows.
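   A minimal illustration of the padding mismatch, independent of Spark (the value is made up):

```python
# A char(10) column physically stores the value right-padded with spaces.
stored = "D".ljust(10)
print(len(stored))             # 10
print(stored == "D")           # False: the padded value != the filter literal
print(stored.rstrip() == "D")  # True: it matches only after trimming the pad
```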
   
   To fix the issue, this PR proposes to pass the schema definition to the table creation before reading, similar to what the Spark TPC-DS unit tests do. In particular, this PR uses the `createTable(tableName, source, schema, options)` interface.
   
   History related to the `char` issue 
https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn
   
   
   ### Why are the changes needed?
   Currently, `TPCDSQueryBenchmark` benchmarks queries that return wrong results, which yields inaccurate performance numbers. That is why the `Per Row(ns)` column of `TPCDSQueryBenchmark-results.txt` is `Infinity`. With this PR, the column shows real numbers.
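   One plausible reading of those `Infinity` entries: a per-row figure is elapsed time divided by the number of rows processed, and a filter that matches nothing leaves zero rows (this helper is illustrative, not Spark's `Benchmark` API):

```python
def per_row_ns(elapsed_ns, num_rows):
    # A zero row count is reported as infinity rather than raising
    # ZeroDivisionError, mirroring the Infinity entries in the results file.
    return elapsed_ns / num_rows if num_rows else float("inf")

print(per_row_ns(1_000_000, 500))  # 2000.0
print(per_row_ns(1_000_000, 0))    # inf
```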
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Tested on Github Actions.
   





[GitHub] [spark] dongjoon-hyun commented on pull request #37074: [SPARK-39672][SQL][3.1] Don't remove project before filter for IN or correlated EXISTS subquery

2022-07-05 Thread GitBox


dongjoon-hyun commented on PR #37074:
URL: https://github.com/apache/spark/pull/37074#issuecomment-1175515574

   FYI, cc @viirya and @huaxingao 





[GitHub] [spark] HeartSaVioR commented on pull request #37042: [SPARK-39564][SQL][FOLLOWUP] Consider the case of serde available in CatalogTable on explain string for LogicalRelation

2022-07-05 Thread GitBox


HeartSaVioR commented on PR #37042:
URL: https://github.com/apache/spark/pull/37042#issuecomment-1175475311

   Thanks all for reviewing and merging!





[GitHub] [spark] dongjoon-hyun closed pull request #37095: [SPARK-39688][K8S] `getReusablePVCs` should handle accounts with no PVC permission

2022-07-05 Thread GitBox


dongjoon-hyun closed pull request #37095: [SPARK-39688][K8S] `getReusablePVCs` 
should handle accounts with no PVC permission
URL: https://github.com/apache/spark/pull/37095





[GitHub] [spark] dongjoon-hyun commented on pull request #37095: [SPARK-39688][K8S] `getReusablePVCs` should handle accounts with no PVC permission

2022-07-05 Thread GitBox


dongjoon-hyun commented on PR #37095:
URL: https://github.com/apache/spark/pull/37095#issuecomment-1175466973

   Thank you, @viirya . K8s test passed.
   Merged to master.





[GitHub] [spark] huaxingao closed pull request #37071: [SPARK-39656][SQL][3.3] Fix wrong namespace in DescribeNamespaceExec

2022-07-05 Thread GitBox


huaxingao closed pull request #37071: [SPARK-39656][SQL][3.3] Fix wrong 
namespace in DescribeNamespaceExec
URL: https://github.com/apache/spark/pull/37071





[GitHub] [spark] huaxingao closed pull request #37072: [SPARK-39656][SQL][3.2] Fix wrong namespace in DescribeNamespaceExec

2022-07-05 Thread GitBox


huaxingao closed pull request #37072: [SPARK-39656][SQL][3.2] Fix wrong 
namespace in DescribeNamespaceExec
URL: https://github.com/apache/spark/pull/37072





[GitHub] [spark] kazuyukitanimura commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

2022-07-05 Thread GitBox


kazuyukitanimura commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1175433010

   Thank you Dongjoon! 





[GitHub] [spark] dongjoon-hyun closed pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

2022-07-05 Thread GitBox


dongjoon-hyun closed pull request #37020: [SPARK-34927][INFRA] Support 
TPCDSQueryBenchmark in Benchmarks
URL: https://github.com/apache/spark/pull/37020





[GitHub] [spark] huaxingao commented on pull request #37071: [SPARK-39656][SQL][3.3] Fix wrong namespace in DescribeNamespaceExec

2022-07-05 Thread GitBox


huaxingao commented on PR #37071:
URL: https://github.com/apache/spark/pull/37071#issuecomment-1175431435

   Merged to 3.3. Thanks @ulysses-you 





[GitHub] [spark] huaxingao commented on pull request #37072: [SPARK-39656][SQL][3.2] Fix wrong namespace in DescribeNamespaceExec

2022-07-05 Thread GitBox


huaxingao commented on PR #37072:
URL: https://github.com/apache/spark/pull/37072#issuecomment-1175429363

   Merged to 3.2. Thanks @ulysses-you 





[GitHub] [spark] dongjoon-hyun closed pull request #37002: [SPARK-39616][BUILD][ML] Upgrade Breeze to 2.0

2022-07-05 Thread GitBox


dongjoon-hyun closed pull request #37002: [SPARK-39616][BUILD][ML] Upgrade 
Breeze to 2.0
URL: https://github.com/apache/spark/pull/37002





[GitHub] [spark] dongjoon-hyun commented on pull request #37002: [SPARK-39616][BUILD][ML] Upgrade Breeze to 2.0

2022-07-05 Thread GitBox


dongjoon-hyun commented on PR #37002:
URL: https://github.com/apache/spark/pull/37002#issuecomment-1175426740

   Thank you, @zhengruifeng , @srowen , @HyukjinKwon . Merged to master for 
Apache Spark 3.4.





[GitHub] [spark] dongjoon-hyun commented on pull request #37095: [SPARK-39688][K8S] `getReusablePVCs` should handle accounts with no PVC permission

2022-07-05 Thread GitBox


dongjoon-hyun commented on PR #37095:
URL: https://github.com/apache/spark/pull/37095#issuecomment-1175425035

   Could you review this when you have some time, @viirya ?





[GitHub] [spark] dongjoon-hyun opened a new pull request, #37095: [SPARK-39688][K8S] getReusablePVCs should handle accounts with no PVC permission

2022-07-05 Thread GitBox


dongjoon-hyun opened a new pull request, #37095:
URL: https://github.com/apache/spark/pull/37095

   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] attilapiros commented on a diff in pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-05 Thread GitBox


attilapiros commented on code in PR #37052:
URL: https://github.com/apache/spark/pull/37052#discussion_r914058583


##
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala:
##
@@ -2177,6 +2177,43 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(kryoException.getMessage === "java.io.IOException: Input/output 
error")
   }
 
+  test("SPARK-39647: Failure to register with ESS should prevent registering 
the BM") {
+val timeoutExec = "timeoutExec"

Review Comment:
   nit: this val is not needed anymore.






[GitHub] [spark] attilapiros commented on a diff in pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-05 Thread GitBox


attilapiros commented on code in PR #37052:
URL: https://github.com/apache/spark/pull/37052#discussion_r914027452


##
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala:
##
@@ -2177,6 +2177,46 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(kryoException.getMessage === "java.io.IOException: Input/output 
error")
   }
 
+  test("SPARK-39647: Failure to register with ESS should prevent registering 
the BM") {
+val timeoutExec = "timeoutExec"
+val handler = new NoOpRpcHandler {
+  override def receive(
+  client: TransportClient,
+  message: ByteBuffer,
+  callback: RpcResponseCallback): Unit = {
+val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
+msgObj match {
+
+  case exec: RegisterExecutor if exec.execId == timeoutExec =>

Review Comment:
   As you are using this handler for only one executor ("timeoutExec"), you can 
remove the if part.






[GitHub] [spark] attilapiros commented on a diff in pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-05 Thread GitBox


attilapiros commented on code in PR #37052:
URL: https://github.com/apache/spark/pull/37052#discussion_r914025635


##
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala:
##
@@ -2177,6 +2177,46 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(kryoException.getMessage === "java.io.IOException: Input/output 
error")
   }
 
+  test("SPARK-39647: Failure to register with ESS should prevent registering 
the BM") {
+val timeoutExec = "timeoutExec"
+val handler = new NoOpRpcHandler {
+  override def receive(
+  client: TransportClient,
+  message: ByteBuffer,
+  callback: RpcResponseCallback): Unit = {
+val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
+msgObj match {
+
+  case exec: RegisterExecutor if exec.execId == timeoutExec =>
+() // No reply to generate client-side timeout
+}
+  }
+}
+val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", 
numUsableCores = 0)
+Utils.tryWithResource(new TransportContext(transConf, handler, true)) { 
transCtx =>
+  // a server which delays response 50ms and must try twice for success.

Review Comment:
   This comment is copy-pasted from "SPARK-20640: Shuffle registration timeout 
and maxAttempts conf are working" but here it is not a valid statement.






[GitHub] [spark] kazuyukitanimura commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

2022-07-05 Thread GitBox


kazuyukitanimura commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1175285111

   Thank you @dongjoon-hyun @wangyum @LuciferYang @HyukjinKwon for the review. 
Could you please help me merge this if there is no more feedback?





[GitHub] [spark] maryannxue commented on pull request #37014: [SPARK-39624][SQL] Support coalesce partition through CartesianProduct

2022-07-05 Thread GitBox


maryannxue commented on PR #37014:
URL: https://github.com/apache/spark/pull/37014#issuecomment-1175224895

   Can you please rework your PR description by stating what the behavior is 
before and after your change? That screenshot doesn't say anything here; you 
can remove it. Maybe use some before/after examples instead.





[GitHub] [spark] maryannxue commented on a diff in pull request #37014: [SPARK-39624][SQL] Support coalesce partition through CartesianProduct

2022-07-05 Thread GitBox


maryannxue commented on code in PR #37014:
URL: https://github.com/apache/spark/pull/37014#discussion_r913956638


##
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala:
##
@@ -132,6 +133,7 @@ case class CoalesceShufflePartitions(session: SparkSession) 
extends AQEShuffleRe
   Seq(collectShuffleStageInfos(r))
 case unary: UnaryExecNode => collectCoalesceGroups(unary.child)
 case union: UnionExec => union.children.flatMap(collectCoalesceGroups)
+case cartesian: CartesianProductExec => 
cartesian.children.flatMap(collectCoalesceGroups)

Review Comment:
   update the javadoc for this method.
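
   For context, the collection logic the diff above extends can be sketched 
with toy plan nodes (hypothetical names, not Spark's real `SparkPlan` 
classes): unary nodes are transparent, and `CartesianProductExec` — like 
`UnionExec` — collects a coalesce group per child.

   ```scala
   // Toy plan hierarchy standing in for Spark's physical plan (hypothetical names).
   sealed trait Plan { def children: Seq[Plan] }
   case class ShuffleStage(id: Int) extends Plan { val children = Nil }
   case class Unary(child: Plan) extends Plan { val children = Seq(child) }
   case class Cartesian(left: Plan, right: Plan) extends Plan {
     val children = Seq(left, right)
   }

   // Mirrors the rule's recursion: unary nodes pass through, cartesian
   // (like union) flatMaps a separate group out of each child.
   def collectCoalesceGroups(p: Plan): Seq[Seq[Int]] = p match {
     case s: ShuffleStage => Seq(Seq(s.id))
     case u: Unary        => collectCoalesceGroups(u.child)
     case c: Cartesian    => c.children.flatMap(collectCoalesceGroups)
   }

   val plan = Cartesian(Unary(ShuffleStage(1)), ShuffleStage(2))
   assert(collectCoalesceGroups(plan) == Seq(Seq(1), Seq(2)))
   ```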






[GitHub] [spark] maryannxue commented on a diff in pull request #37014: [SPARK-39624][SQL] Support coalesce partition through CartesianProduct

2022-07-05 Thread GitBox


maryannxue commented on code in PR #37014:
URL: https://github.com/apache/spark/pull/37014#discussion_r913956001


##
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala:
##
@@ -2602,6 +2602,59 @@ class AdaptiveQueryExecSuite
   assert(findTopLevelBroadcastNestedLoopJoin(adaptivePlan).size == 1)
 }
   }
+
+  test("SPARK-39624 Support coalesce partition through CartesianProduct") {
+def checkResultPartition(
+   df: Dataset[Row],
+   numShuffleReader: Int,
+   numPartition: Int): Unit = {
+  df.collect()
+  assert(collect(df.queryExecution.executedPlan) {
+case r: AQEShuffleReadExec => r
+  }.size === numShuffleReader)
+  assert(df.rdd.partitions.length === numPartition)
+}
+withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+  SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+  SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "1048576",
+  SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+  SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
+  withTempView("t1", "t2", "t3") {
+spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, 
i.toString)), 2)
+  .toDF().createOrReplaceTempView("t1")
+spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, 
i.toString)), 4)
+  .toDF().createOrReplaceTempView("t2")
+spark.sparkContext.parallelize((1 to 10).map(i => TestData(i, 
i.toString)), 4)
+  .toDF().createOrReplaceTempView("t3")
+// positive test that could be coalesced

Review Comment:
   can you explain a little more in the code comment why a query `can` (not 
could) be coalesced or why it can't?






[GitHub] [spark] MaxGekk closed pull request #37093: [SPARK-39677][SQL][DOCS][3.2] Fix args formatting of the regexp and like functions

2022-07-05 Thread GitBox


MaxGekk closed pull request #37093: [SPARK-39677][SQL][DOCS][3.2] Fix args 
formatting of the regexp and like functions
URL: https://github.com/apache/spark/pull/37093





[GitHub] [spark] MaxGekk commented on pull request #37093: [SPARK-39677][SQL][DOCS][3.2] Fix args formatting of the regexp and like functions

2022-07-05 Thread GitBox


MaxGekk commented on PR #37093:
URL: https://github.com/apache/spark/pull/37093#issuecomment-1175193736

   Merging to 3.2 since all tests passed, and the fixes have been merged to 
master already.
   (screenshot: https://user-images.githubusercontent.com/1580697/177363428-ec1d021b-91a0-4b2a-a70f-25784101c061.png)





[GitHub] [spark] MaxGekk closed pull request #37094: [SPARK-39677][SQL][DOCS][3.1] Fix args formatting of the regexp and like functions

2022-07-05 Thread GitBox


MaxGekk closed pull request #37094: [SPARK-39677][SQL][DOCS][3.1] Fix args 
formatting of the regexp and like functions
URL: https://github.com/apache/spark/pull/37094





[GitHub] [spark] MaxGekk commented on pull request #37094: [SPARK-39677][SQL][DOCS][3.1] Fix args formatting of the regexp and like functions

2022-07-05 Thread GitBox


MaxGekk commented on PR #37094:
URL: https://github.com/apache/spark/pull/37094#issuecomment-1175184867

   Merging this to 3.1 since the original PR was approved and has been merged 
to master already.





[GitHub] [spark] Yikun commented on a diff in pull request #37091: [SPARK-39686][INFRA] Disable scheduled builds that did not pass even once

2022-07-05 Thread GitBox


Yikun commented on code in PR #37091:
URL: https://github.com/apache/spark/pull/37091#discussion_r913892631


##
.github/workflows/build_branch32.yml:
##
@@ -36,12 +36,12 @@ jobs:
 {
   "SCALA_PROFILE": "scala2.13"
 }
+  # TODO(SPARK-39681): Reenable "lint": "true"
+  # TODO(SPARK-39685): Reenable "pyspark": "true"

Review Comment:
   ```suggestion
 # TODO(SPARK-39685): Reenable "lint": "true"
 # TODO(SPARK-39681): Reenable "pyspark": "true"
   ```






[GitHub] [spark] tgravescs commented on a diff in pull request #36991: [SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-07-05 Thread GitBox


tgravescs commented on code in PR #36991:
URL: https://github.com/apache/spark/pull/36991#discussion_r913871806


##
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala:
##
@@ -859,7 +859,8 @@ private[yarn] class YarnAllocator(
 // 
.com/apache/hadoop/blob/228156cfd1b474988bc4fedfbf7edddc87db41e3/had
 // 
oop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/ap
 // ache/hadoop/yarn/util/Apps.java#L273 for details)
-if 
(NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status)) {
+if 
(NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status) ||
+SparkContext.getActive.forall(_.isStopped)) {

Review Comment:
   I'd prefer to know we are in shutdown vs. checking the SparkContext active 
level, but I need to refresh my memory on that shutdown sequence.
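
   For context on the `SparkContext.getActive.forall(_.isStopped)` guard in the 
diff: `Option.forall` is vacuously true for `None`, so a missing context 
counts the same as a stopped one. A plain-Scala sketch (`Ctx` is a 
hypothetical stand-in, not `SparkContext`):

   ```scala
   // Hypothetical stand-in for SparkContext, just to show Option.forall semantics.
   case class Ctx(isStopped: Boolean)

   // forall on None is vacuously true: no active context => treated as stopped.
   assert((None: Option[Ctx]).forall(_.isStopped))

   // A stopped context also satisfies the guard...
   assert(Option(Ctx(isStopped = true)).forall(_.isStopped))

   // ...while a running one does not.
   assert(!Option(Ctx(isStopped = false)).forall(_.isStopped))
   ```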






[GitHub] [spark] LuciferYang commented on a diff in pull request #37060: [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method

2022-07-05 Thread GitBox


LuciferYang commented on code in PR #37060:
URL: https://github.com/apache/spark/pull/37060#discussion_r913829758


##
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##
@@ -219,11 +221,21 @@ private[v2] trait V2JDBCTest extends SharedSparkSession 
with DockerIntegrationFu
   s" The supported Index Types are:"))
 
 sql(s"CREATE index i1 ON $catalogName.new_table USING BTREE (col1)")
+assert(jdbcTable.indexExists("i1"))
+if (supportListIndexes) {
+  val indexes = jdbcTable.listIndexes()
+  assert(indexes.size == 1)
+  assert(indexes(0).indexName() == "i1")

Review Comment:
   Yes, `.head` is always recommended






[GitHub] [spark] panbingkun commented on pull request #37058: [SPARK-39661][SQL] Avoid creating unnecessary SLF4J Logger

2022-07-05 Thread GitBox


panbingkun commented on PR #37058:
URL: https://github.com/apache/spark/pull/37058#issuecomment-1175082503

   > Hm, tests timed out. Can you retrigger? I'm sure it's not related, but 
good to see tests pass to be sure.
   
   okay





[GitHub] [spark] srowen commented on a diff in pull request #37060: [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method

2022-07-05 Thread GitBox


srowen commented on code in PR #37060:
URL: https://github.com/apache/spark/pull/37060#discussion_r913808190


##
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##
@@ -219,11 +221,21 @@ private[v2] trait V2JDBCTest extends SharedSparkSession 
with DockerIntegrationFu
   s" The supported Index Types are:"))
 
 sql(s"CREATE index i1 ON $catalogName.new_table USING BTREE (col1)")
+assert(jdbcTable.indexExists("i1"))
+if (supportListIndexes) {
+  val indexes = jdbcTable.listIndexes()
+  assert(indexes.size == 1)
+  assert(indexes(0).indexName() == "i1")

Review Comment:
   While we're nit-picking, `indexes.head`? doesn't matter



##
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##
@@ -219,11 +221,21 @@ private[v2] trait V2JDBCTest extends SharedSparkSession 
with DockerIntegrationFu
   s" The supported Index Types are:"))
 
 sql(s"CREATE index i1 ON $catalogName.new_table USING BTREE (col1)")
+assert(jdbcTable.indexExists("i1"))
+if (supportListIndexes) {
+  val indexes = jdbcTable.listIndexes()
+  assert(indexes.size == 1)
+  assert(indexes(0).indexName() == "i1")
+}
+
 sql(s"CREATE index i2 ON $catalogName.new_table (col2, col3, col5)" +
   s" OPTIONS ($indexOptions)")
-
-assert(jdbcTable.indexExists("i1") == true)
-assert(jdbcTable.indexExists("i2") == true)
+assert(jdbcTable.indexExists("i2"))
+if (supportListIndexes) {
+  val indexes = jdbcTable.listIndexes()
+  assert(indexes.size == 2)
+  assert(indexes.map(index => index.indexName()).sorted === 
Array("i1", "i2"))

Review Comment:
   `.map(_.indexName())`
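
   For context, the two forms are equivalent; `_` is Scala's placeholder for a 
single-use lambda parameter (`Index` here is a hypothetical stand-in, not 
Spark's class):

   ```scala
   // Hypothetical Index class, standing in for whatever listIndexes() returns.
   case class Index(name: String) { def indexName(): String = name }

   val indexes = Array(Index("i2"), Index("i1"))

   // Explicit lambda vs. the underscore shorthand srowen suggests:
   val verbose = indexes.map(index => index.indexName()).sorted
   val short   = indexes.map(_.indexName()).sorted

   assert(verbose.sameElements(short))
   assert(short.sameElements(Array("i1", "i2")))
   ```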






[GitHub] [spark] grundprinzip commented on a diff in pull request #37075: [SparkConnect] Initial Protobuf Definitions

2022-07-05 Thread GitBox


grundprinzip commented on code in PR #37075:
URL: https://github.com/apache/spark/pull/37075#discussion_r913806860


##
sql/core/pom.xml:
##
@@ -145,6 +145,30 @@
   <groupId>org.apache.xbean</groupId>
   <artifactId>xbean-asm9-shaded</artifactId>
 </dependency>
+<dependency>
+  <groupId>io.grpc</groupId>
+  <artifactId>grpc-netty-shaded</artifactId>
+  <scope>runtime</scope>
+</dependency>
+<dependency>
+  <groupId>io.grpc</groupId>
+  <artifactId>grpc-protobuf</artifactId>
+</dependency>
+<dependency>
+  <groupId>io.grpc</groupId>
+  <artifactId>grpc-stub</artifactId>
+</dependency>
+<dependency>
+  <groupId>org.apache.tomcat</groupId>
+  <artifactId>annotations-api</artifactId>
+  <scope>provided</scope>
+</dependency>
+<dependency>
+  <groupId>com.thesamet.scalapb</groupId>
+  <artifactId>scalapb-runtime-grpc_2.12</artifactId>
+</dependency>

Review Comment:
   Thanks, fixed it.



##
sql/core/pom.xml:
##
@@ -265,6 +297,36 @@
   
 
   
+  

Review Comment:
   Yeah, I had to find a colleague with an M1 to test; I pushed a new version 
that actually works.






[GitHub] [spark] grundprinzip commented on pull request #37075: [SparkConnect] Initial Protobuf Definitions

2022-07-05 Thread GitBox


grundprinzip commented on PR #37075:
URL: https://github.com/apache/spark/pull/37075#issuecomment-1175070660

   Added the missing license headers and fixed the M1 build.





[GitHub] [spark] cloud-fan commented on pull request #37011: [SPARK-39625][SPARK-38904][SQL] Add Dataset.as(StructType)

2022-07-05 Thread GitBox


cloud-fan commented on PR #37011:
URL: https://github.com/apache/spark/pull/37011#issuecomment-1175058395

   @ravwojdyla I'm fine to support this use case. We can use the 
`AssertNotNull` expression to do this. What I was against is the opposite: the 
dataframe infers some columns as non-nullable, but users want to forcibly mark 
these columns as nullable, which would degrade performance.
   
   @dilipbiswal what do you think of this use case? When the column is nullable 
but the specified schema requires it to be non-nullable, shall we add a runtime 
null check to make it work? At least this could be under a config.
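
   As a minimal sketch of the runtime null check being discussed (plain Python 
with hypothetical names, not Spark's actual `AssertNotNull` implementation):

```python
def assert_not_null(value, field_name):
    """Fail fast when a value in a non-nullable field is null,
    mimicking what Spark's AssertNotNull expression does per value."""
    if value is None:
        raise ValueError(f"Null value appeared in non-nullable field: {field_name}")
    return value

def enforce_non_nullable(rows, non_nullable_fields):
    """Apply the null check to every row for the listed fields."""
    for row in rows:
        for name in non_nullable_fields:
            assert_not_null(row.get(name), name)
        yield row

# Rows without nulls pass through unchanged.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
checked = list(enforce_non_nullable(rows, ["id", "name"]))
```

   The sketch only illustrates the semantics: fail fast on the first null in a 
field that the user-specified schema marks as non-nullable.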





[GitHub] [spark] cloud-fan commented on pull request #36994: [SPARK-39606][SQL] Use child stats to estimate order operator

2022-07-05 Thread GitBox


cloud-fan commented on PR #36994:
URL: https://github.com/apache/spark/pull/36994#issuecomment-1175051179

   late LGTM





[GitHub] [spark] srowen commented on pull request #37058: [SPARK-39661][SQL] Avoid creating unnecessary SLF4J Logger

2022-07-05 Thread GitBox


srowen commented on PR #37058:
URL: https://github.com/apache/spark/pull/37058#issuecomment-1175050327

   Hm, tests timed out. Can you retrigger? I'm sure it's not related, but good 
to see tests pass to be sure.





[GitHub] [spark] cloud-fan commented on pull request #37082: [SPARK-39677][SQL][DOCS] Fix args formatting of the regexp and like functions

2022-07-05 Thread GitBox


cloud-fan commented on PR #37082:
URL: https://github.com/apache/spark/pull/37082#issuecomment-1175050026

   late LGTM





[GitHub] [spark] MaxGekk commented on pull request #37082: [SPARK-39677][SQL][DOCS] Fix args formatting of the regexp and like functions

2022-07-05 Thread GitBox


MaxGekk commented on PR #37082:
URL: https://github.com/apache/spark/pull/37082#issuecomment-1175007274

   Here are backports:
   - 3.2: https://github.com/apache/spark/pull/37093
   - 3.1: https://github.com/apache/spark/pull/37094





[GitHub] [spark] MaxGekk opened a new pull request, #37094: [SPARK-39677][SQL][DOCS][3.1] Fix args formatting of the regexp and like functions

2022-07-05 Thread GitBox


MaxGekk opened a new pull request, #37094:
URL: https://github.com/apache/spark/pull/37094

   ### What changes were proposed in this pull request?
   In the PR, I propose to fix args formatting of some regexp functions by 
adding explicit new lines. That fixes the following items in arg lists.
   
   Before:
   
   https://user-images.githubusercontent.com/1580697/177274234-04209d43-a542-4c71-b5ca-6f3239208015.png
   
   After:
   
   https://user-images.githubusercontent.com/1580697/177280718-cb05184c-8559-4461-b94d-dfaaafda7dd2.png
   
   
   
   ### Why are the changes needed?
   To improve readability of Spark SQL docs.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   By building docs and checking manually:
   ```
   $ SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_RDOC=1 bundle exec jekyll build
   ```
   
   Authored-by: Max Gekk 
   Signed-off-by: Max Gekk 
   (cherry picked from commit 4e42f8b12e8dc57a15998f22d508a19cf3c856aa)
   Signed-off-by: Max Gekk 





[GitHub] [spark] MaxGekk opened a new pull request, #37093: [SPARK-39677][SQL][DOCS][3.2] Fix args formatting of the regexp and like functions

2022-07-05 Thread GitBox


MaxGekk opened a new pull request, #37093:
URL: https://github.com/apache/spark/pull/37093

   ### What changes were proposed in this pull request?
   In the PR, I propose to fix args formatting of some regexp functions by 
adding explicit new lines. That fixes the following items in arg lists.
   
   Before:
   
   https://user-images.githubusercontent.com/1580697/177274234-04209d43-a542-4c71-b5ca-6f3239208015.png
   
   After:
   
   https://user-images.githubusercontent.com/1580697/177280718-cb05184c-8559-4461-b94d-dfaaafda7dd2.png
   
   
   
   ### Why are the changes needed?
   To improve readability of Spark SQL docs.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   By building docs and checking manually:
   ```
   $ SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_RDOC=1 bundle exec jekyll build
   ```
   
   Authored-by: Max Gekk 
   Signed-off-by: Max Gekk 
   (cherry picked from commit 4e42f8b12e8dc57a15998f22d508a19cf3c856aa)
   Signed-off-by: Max Gekk 





[GitHub] [spark] Yikun commented on pull request #37091: [SPARK-39686][INFRA] Disable scheduled builds that did not pass even once

2022-07-05 Thread GitBox


Yikun commented on PR #37091:
URL: https://github.com/apache/spark/pull/37091#issuecomment-1174989349

   Looks like all the failures are container-based jobs (maybe just a 
coincidence). Is our strategy to always make all tests pass with the latest 
images (that is, the latest deps)?





[GitHub] [spark] zhengruifeng commented on pull request #37092: [SPARK-39687][PYTHON][DOCS] Make sure new catalog methods listed in API reference

2022-07-05 Thread GitBox


zhengruifeng commented on PR #37092:
URL: https://github.com/apache/spark/pull/37092#issuecomment-1174987095

   cc @HyukjinKwon @itholic 





[GitHub] [spark] zhengruifeng opened a new pull request, #37092: [SPARK-39687][PYTHON][DOCS] Make sure new catalog methods listed in API reference

2022-07-05 Thread GitBox


zhengruifeng opened a new pull request, #37092:
URL: https://github.com/apache/spark/pull/37092

   ### What changes were proposed in this pull request?
   1. Add new methods to `catalog.rst`;
   2. Follow the Sphinx syntax: `AnalysisException` -> ``:class:`AnalysisException``
   
   
   ### Why are the changes needed?
   Make sure new catalog methods listed in API reference
   
   
   ### Does this PR introduce _any_ user-facing change?
   No, docs only
   
   
   ### How was this patch tested?
   Manually built the docs and checked; for example:
   
   
![image](https://user-images.githubusercontent.com/7322292/177324325-63280b21-9e68-4780-a095-84bf9fd404ac.png)
   
   
![image](https://user-images.githubusercontent.com/7322292/177323978-59c527f5-108f-4559-821a-fa9480d76c1f.png)
   
   
![image](https://user-images.githubusercontent.com/7322292/177324089-669fdb2f-8696-4e50-b725-0af9f8516365.png)
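
   For item 2, the change looks roughly like this in reStructuredText (an 
illustrative fragment, not the exact docstring from the PR):

```rst
Raises
------
:class:`AnalysisException`
    If the function does not exist.
```

   With the ``:class:`` role, Sphinx renders a cross-reference link to the 
`AnalysisException` API page instead of plain text.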
   





[GitHub] [spark] zhengruifeng commented on pull request #37088: [SPARK-39579][PYTHON][FOLLOWUP] fix functionExists(functionName, dbName) when dbName is not None

2022-07-05 Thread GitBox


zhengruifeng commented on PR #37088:
URL: https://github.com/apache/spark/pull/37088#issuecomment-1174972340

   merged to master





[GitHub] [spark] HyukjinKwon closed pull request #37079: [SPARK-39610][INFRA] Add GITHUB_WORKSPACE to git trust safe.directory for container based job

2022-07-05 Thread GitBox


HyukjinKwon closed pull request #37079: [SPARK-39610][INFRA] Add 
GITHUB_WORKSPACE to git trust safe.directory for container based job
URL: https://github.com/apache/spark/pull/37079





[GitHub] [spark] HyukjinKwon commented on pull request #37079: [SPARK-39610][INFRA] Add GITHUB_WORKSPACE to git trust safe.directory for container based job

2022-07-05 Thread GitBox


HyukjinKwon commented on PR #37079:
URL: https://github.com/apache/spark/pull/37079#issuecomment-1174972063

   Merged to master.





[GitHub] [spark] zhengruifeng closed pull request #37088: [SPARK-39579][PYTHON][FOLLOWUP] fix functionExists(functionName, dbName) when dbName is not None

2022-07-05 Thread GitBox


zhengruifeng closed pull request #37088: [SPARK-39579][PYTHON][FOLLOWUP] fix 
functionExists(functionName, dbName) when dbName is not None
URL: https://github.com/apache/spark/pull/37088





[GitHub] [spark] HyukjinKwon closed pull request #37078: [SPARK-39611][PYTHON][PS] Fix wrong aliases in __array_ufunc__

2022-07-05 Thread GitBox


HyukjinKwon closed pull request #37078: [SPARK-39611][PYTHON][PS] Fix wrong 
aliases in __array_ufunc__
URL: https://github.com/apache/spark/pull/37078





[GitHub] [spark] HyukjinKwon commented on pull request #37078: [SPARK-39611][PYTHON][PS] Fix wrong aliases in __array_ufunc__

2022-07-05 Thread GitBox


HyukjinKwon commented on PR #37078:
URL: https://github.com/apache/spark/pull/37078#issuecomment-1174970899

   Merged to master, branch-3.3 and branch-3.2.




