HyukjinKwon commented on code in PR #41711:
URL: https://github.com/apache/spark/pull/41711#discussion_r1240603584
##
dev/error_message_refiner.py:
##
@@ -0,0 +1,235 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+#
HyukjinKwon commented on code in PR #41711:
URL: https://github.com/apache/spark/pull/41711#discussion_r1240603499
##
dev/api_key.txt:
##
@@ -0,0 +1 @@
+# Please REMOVE this comment and enter the API key here. You can obtain an API key from
HyukjinKwon commented on code in PR #41711:
URL: https://github.com/apache/spark/pull/41711#discussion_r1240603443
##
dev/error_message_refiner.py:
##
@@ -0,0 +1,235 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+#
bersprockets commented on code in PR #41712:
URL: https://github.com/apache/spark/pull/41712#discussion_r1240569151
##
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##
@@ -1685,4 +1685,24 @@ class JoinSuite extends QueryTest with SharedSparkSession with
itholic commented on code in PR #41711:
URL: https://github.com/apache/spark/pull/41711#discussion_r1240578537
##
dev/error_message_refiner.py:
##
@@ -0,0 +1,219 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor
szehon-ho commented on code in PR #41614:
URL: https://github.com/apache/spark/pull/41614#discussion_r1240571397
##
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala:
##
@@ -578,7 +594,9 @@ case class ShuffledHashJoinExec(
|
dongjoon-hyun commented on PR #36374:
URL: https://github.com/apache/spark/pull/36374#issuecomment-1605232031
Thank you for confirming. Apache Spark 3.4.1 is released officially.
- https://lists.apache.org/list.html?d...@spark.apache.org
- https://spark.apache.org/downloads.html
szehon-ho commented on code in PR #41614:
URL: https://github.com/apache/spark/pull/41614#discussion_r1240569908
##
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala:
##
@@ -578,7 +594,9 @@ case class ShuffledHashJoinExec(
|
bersprockets commented on PR #41712:
URL: https://github.com/apache/spark/pull/41712#issuecomment-1605228576
For the title, may I suggest:
```
[SPARK-44132][SQL] Materialize `Stream` of join column names to avoid codegen failure
```
For the description, may I suggest:
###
szehon-ho commented on code in PR #41614:
URL: https://github.com/apache/spark/pull/41614#discussion_r1240567843
##
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala:
##
@@ -364,7 +366,13 @@ case class ShuffledHashJoinExec(
override def
amaliujia commented on PR #41716:
URL: https://github.com/apache/spark/pull/41716#issuecomment-1605197640
@hvanhovell @cloud-fan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
amaliujia opened a new pull request, #41716:
URL: https://github.com/apache/spark/pull/41716
### What changes were proposed in this pull request?
Extract the `toAttribute` method from `StructField` into a Util class.
### Why are the changes needed?
StructField should be
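The refactor proposed in #41716 can be illustrated with a minimal Python analogy (this is a sketch of the pattern, not Spark's actual Scala code; all names besides `StructField`/`toAttribute` are hypothetical): the conversion logic moves out of the schema data class into a standalone helper, so the data class itself carries no dependency on the converter's heavier types.

```python
# Hedged analogy of "extract toAttribute from StructField to a Util class".
# The data class stays a plain schema description; the conversion lives in
# a separate helper function (the "Util class" role from the PR).
from dataclasses import dataclass


@dataclass(frozen=True)
class StructField:
    """Plain schema field: knows nothing about attribute/expression types."""
    name: str
    data_type: str
    nullable: bool = True


@dataclass(frozen=True)
class Attribute:
    """Stand-in for the attribute type produced from a field."""
    name: str
    data_type: str
    nullable: bool


def to_attribute(field: StructField) -> Attribute:
    """Extracted helper: converts a schema field into an attribute."""
    return Attribute(field.name, field.data_type, field.nullable)
```

With this split, `StructField` can live in a lightweight module while only the helper's module needs to know about `Attribute`, which mirrors the dependency-reduction rationale in the PR description.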
github-actions[bot] closed pull request #38202: [SPARK-40763][K8S] Should
expose driver service name to config for user features
URL: https://github.com/apache/spark/pull/38202
github-actions[bot] commented on PR #38885:
URL: https://github.com/apache/spark/pull/38885#issuecomment-1605183251
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
github-actions[bot] closed pull request #39985: [SPARK-42412][WIP] Initial
prototype implementation of PySpark ML via Spark connect
URL: https://github.com/apache/spark/pull/39985
github-actions[bot] closed pull request #40391: [SPARK-42766][YARN]
YarnAllocator filter excluded nodes when launching containers
URL: https://github.com/apache/spark/pull/40391
github-actions[bot] closed pull request #40122: [SPARK-42349][PYTHON] Support
pandas cogroup with multiple df
URL: https://github.com/apache/spark/pull/40122
github-actions[bot] closed pull request #40411: [SPARK-42781][DOCS][PYTHON]
provide one format for writing to kafka
URL: https://github.com/apache/spark/pull/40411
github-actions[bot] closed pull request #40419: [SPARK-42789][SQL] Rewrite
multiple GetJsonObjects to a JsonTuple if their json expressions are the same
URL: https://github.com/apache/spark/pull/40419
dongjoon-hyun closed pull request #41715: [SPARK-44163][PYTHON] Handle
`ModuleNotFoundError` in addition to `ImportError`
URL: https://github.com/apache/spark/pull/41715
dtenedor commented on PR #41486:
URL: https://github.com/apache/spark/pull/41486#issuecomment-1605060822
Hi @MaxGekk can we trouble you to help us merge this please when you have a
moment :)
dongjoon-hyun opened a new pull request, #41715:
URL: https://github.com/apache/spark/pull/41715
…
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
dtenedor commented on code in PR #41486:
URL: https://github.com/apache/spark/pull/41486#discussion_r1240463584
##
core/src/main/resources/error/error-classes.json:
##
@@ -757,6 +757,21 @@
"The expression cannot be used as a grouping expression
because its data type
dtenedor commented on code in PR #41486:
URL: https://github.com/apache/spark/pull/41486#discussion_r1240463321
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datasketchesExpressions.scala:
##
@@ -98,15 +104,22 @@ case class HllUnion(first: Expression,
dtenedor commented on code in PR #41486:
URL: https://github.com/apache/spark/pull/41486#discussion_r1240463194
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala:
##
@@ -189,21 +189,20 @@ object HllSketchAgg {
mkaravel commented on code in PR #41486:
URL: https://github.com/apache/spark/pull/41486#discussion_r1240413693
##
core/src/main/resources/error/error-classes.json:
##
@@ -757,6 +757,21 @@
"The expression cannot be used as a grouping expression
because its data type
szehon-ho commented on PR #41683:
URL: https://github.com/apache/spark/pull/41683#issuecomment-1604982731
Hm, the build failure doesn't seem related:
```
python/pyspark/mllib/clustering.py:781: error: Decorated property not
supported [misc]
python/pyspark/mllib/clustering.py:949:
amaliujia commented on PR #41714:
URL: https://github.com/apache/spark/pull/41714#issuecomment-1604972107
cc @hvanhovell @cloud-fan
siying commented on PR #41578:
URL: https://github.com/apache/spark/pull/41578#issuecomment-1604966953
@MaxGekk I addressed the comments. The CI failure is in the Python lint. I don't know why it happens, but I don't see how it is related.
amaliujia opened a new pull request, #41714:
URL: https://github.com/apache/spark/pull/41714
### What changes were proposed in this pull request?
The StructType has some methods that require CatalystParser and Catalyst
expression. We are not planning to move the parser and
dongjoon-hyun closed pull request #41713: [SPARK-44158][K8S] Remove unused
`spark.kubernetes.executor.lostCheckmaxAttempts`
URL: https://github.com/apache/spark/pull/41713
dongjoon-hyun commented on PR #41713:
URL: https://github.com/apache/spark/pull/41713#issuecomment-1604941186
Thank you so much! Merged to master/3.4/3.3.
vicennial commented on PR #41701:
URL: https://github.com/apache/spark/pull/41701#issuecomment-1604932773
The PR is ready to review (the failing tests are flaky; I've submitted a rerun request).
dongjoon-hyun commented on PR #41713:
URL: https://github.com/apache/spark/pull/41713#issuecomment-1604904486
Could you review this PR when you have some time, @viirya ?
jdesjean commented on PR #41443:
URL: https://github.com/apache/spark/pull/41443#issuecomment-1604796450
> > @beliefer the event API is open by design, it is made for others to use
it. For example thriftserver uses it. This particular PR will be foundation on
which we will be building a UI
dongjoon-hyun opened a new pull request, #41713:
URL: https://github.com/apache/spark/pull/41713
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was
amaliujia commented on code in PR #41559:
URL: https://github.com/apache/spark/pull/41559#discussion_r1240066813
##
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataTypeExpression.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
dongjoon-hyun commented on PR #41707:
URL: https://github.com/apache/spark/pull/41707#issuecomment-1604550160
Merged to master for Apache Spark 3.5.0. Thank you, @panbingkun and @HyukjinKwon.
dongjoon-hyun closed pull request #41707: [SPARK-44151][BUILD] Upgrade
`commons-codec` to 1.16.0
URL: https://github.com/apache/spark/pull/41707
dongjoon-hyun commented on code in PR #41709:
URL: https://github.com/apache/spark/pull/41709#discussion_r1240040307
##
core/src/main/scala/org/apache/spark/util/Utils.scala:
##
@@ -2287,6 +2287,22 @@ private[spark] object Utils extends Logging with
SparkClassUtils {
dongjoon-hyun commented on code in PR #41700:
URL: https://github.com/apache/spark/pull/41700#discussion_r1240037361
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/GroupBasedRowLevelOperationScanPlanning.scala:
##
@@ -56,29 +62,67 @@ object
aokolnychyi commented on code in PR #41700:
URL: https://github.com/apache/spark/pull/41700#discussion_r1240020952
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/GroupBasedRowLevelOperationScanPlanning.scala:
##
@@ -56,29 +62,67 @@ object
aokolnychyi commented on PR #41700:
URL: https://github.com/apache/spark/pull/41700#issuecomment-1604524387
Thank you, @dongjoon-hyun!
gatorsmile commented on code in PR #41711:
URL: https://github.com/apache/spark/pull/41711#discussion_r1239940518
##
dev/error_message_refiner.py:
##
@@ -0,0 +1,219 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+#
steven-aerts commented on PR #41688:
URL: https://github.com/apache/spark/pull/41688#issuecomment-1604244490
Eclipsed by #41712.
steven-aerts closed pull request #41688: [WIP][SPARK-44132][SQL][TESTS] nesting
full outer join confuses codegen
URL: https://github.com/apache/spark/pull/41688
steven-aerts opened a new pull request, #41712:
URL: https://github.com/apache/spark/pull/41712
When nesting multiple joins using column names that are a `Stream`, the lazily evaluated stream can sometimes generate faulty codegen data, resulting in an NPE or a bad access of the data.
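The hazard described in #41712 is Scala-specific (a lazily evaluated `Stream` feeding codegen), but the general failure mode — consuming a lazy sequence more than once and getting inconsistent results — can be sketched with a Python generator as a loose analogy. Materializing the sequence up front (the `list(...)` call below) is the analogue of the "materialize `Stream`" fix in the suggested PR title; the column names are illustrative.

```python
# Illustrative analogy, not the Spark code: a lazy sequence consumed twice
# yields different results, while a materialized copy is stable.
def join_column_names():
    # Lazy: values are produced on demand, and the iterator is exhausted
    # after one full pass -- akin to deferred evaluation at codegen time.
    return (name for name in ["id", "dept", "salary"])

lazy = join_column_names()
first_pass = list(lazy)    # all three names
second_pass = list(lazy)   # empty: the lazy sequence is already spent

# Forcing evaluation once, up front, gives every later consumer the same view.
materialized = list(join_column_names())
```

In the Scala case the symptom is subtler (faulty generated code rather than an empty result), but the remedy is the same: evaluate the sequence eagerly before it is reused.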
0xdarkman commented on PR #36374:
URL: https://github.com/apache/spark/pull/36374#issuecomment-1604226893
I am running this version 3.4.1 and it looks good.
itholic commented on PR #41711:
URL: https://github.com/apache/spark/pull/41711#issuecomment-1604003249
BTW, I set the category as `SQL` because it works based on the SQL error
class, but I'm not sure if this is correct.
itholic commented on PR #41711:
URL: https://github.com/apache/spark/pull/41711#issuecomment-1603986988
cc @gengliangwang @allisonwang-db @MaxGekk @cloud-fan could you take a look
at this PR when you find some time?
itholic opened a new pull request, #41711:
URL: https://github.com/apache/spark/pull/41711
### What changes were proposed in this pull request?
This PR proposes adding a utility script that can help increase productivity
in error message improvement tasks.
When passing the
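As a rough sketch of the plumbing such a refiner script needs (the real `dev/error_message_refiner.py` is in PR #41711; the JSON layout below is assumed from `core/src/main/resources/error/error-classes.json`, where each error class maps to an object whose `message` field is an array of lines — treat the sample data and helper name as illustrative):

```python
import json

# Assumed layout: { "ERROR_CLASS": { "message": ["line 1", "line 2"], ... }, ... }
SAMPLE = """
{
  "CANNOT_PARSE_DECIMAL": {
    "message": ["Cannot parse decimal.", "Please check the input value."]
  },
  "DIVIDE_BY_ZERO": {
    "message": ["Division by zero."]
  }
}
"""


def load_error_messages(raw_json: str) -> dict:
    """Join each error class's message lines into one template string,
    ready to be handed to whatever refinement step the script applies."""
    classes = json.loads(raw_json)
    return {name: " ".join(entry["message"]) for name, entry in classes.items()}


messages = load_error_messages(SAMPLE)
```

A script built on this would then iterate over `messages` and propose refined wording for each class; the loading step shown here is the only part the error-class format itself dictates.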
pan3793 commented on code in PR #41709:
URL: https://github.com/apache/spark/pull/41709#discussion_r1239566530
##
core/src/main/scala/org/apache/spark/util/Utils.scala:
##
@@ -2287,6 +2287,22 @@ private[spark] object Utils extends Logging with
SparkClassUtils {
ming95 commented on code in PR #41154:
URL: https://github.com/apache/spark/pull/41154#discussion_r1239560157
##
sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala:
##
@@ -1275,4 +1275,27 @@ class DataFrameReaderWriterSuite extends QueryTest
dongjoon-hyun commented on code in PR #41709:
URL: https://github.com/apache/spark/pull/41709#discussion_r1239558578
##
core/src/main/scala/org/apache/spark/util/Utils.scala:
##
@@ -2287,6 +2287,22 @@ private[spark] object Utils extends Logging with
SparkClassUtils {
peter-toth commented on PR #41677:
URL: https://github.com/apache/spark/pull/41677#issuecomment-1603934838
The failure in `[Run / Linters, licenses, dependencies and documentation
generation]` seems unrelated.
pan3793 commented on code in PR #41709:
URL: https://github.com/apache/spark/pull/41709#discussion_r1239521669
##
core/src/main/scala/org/apache/spark/util/Utils.scala:
##
@@ -2287,6 +2287,22 @@ private[spark] object Utils extends Logging with
SparkClassUtils {
dongjoon-hyun commented on PR #41709:
URL: https://github.com/apache/spark/pull/41709#issuecomment-1603874002
Merged to master for Apache Spark 3.5.0.
Thank you, @viirya and @pan3793 .
dongjoon-hyun closed pull request #41709: [SPARK-44153][CORE][UI] Support `Heap
Histogram` column in `Executors` tab
URL: https://github.com/apache/spark/pull/41709
EnricoMi commented on PR #40122:
URL: https://github.com/apache/spark/pull/40122#issuecomment-1603851057
@HyukjinKwon @cloud-fan @sunchao @zhengruifeng this is so sad.
Can we get some feedback on whether this contribution is welcome, or whether any more effort is just wasted?
amaliujia commented on PR #41710:
URL: https://github.com/apache/spark/pull/41710#issuecomment-1603761304
I did a search, and it looks like `constructorNotFoundError` may fit into the internal errors that are thrown when a constructor is not found during a reflection-related operation, which is also at
dongjoon-hyun commented on code in PR #41709:
URL: https://github.com/apache/spark/pull/41709#discussion_r1239373825
##
core/src/main/scala/org/apache/spark/util/Utils.scala:
##
@@ -2287,6 +2287,22 @@ private[spark] object Utils extends Logging with
SparkClassUtils {
HyukjinKwon closed pull request #41708: [SPARK-44135][PYTHON][CONNECT][DOCS]
Document Spark Connect only API in PySpark
URL: https://github.com/apache/spark/pull/41708
HyukjinKwon commented on PR #41708:
URL: https://github.com/apache/spark/pull/41708#issuecomment-1603750548
Merged to master
dongjoon-hyun commented on code in PR #41709:
URL: https://github.com/apache/spark/pull/41709#discussion_r1239373194
##
core/src/main/scala/org/apache/spark/util/Utils.scala:
##
@@ -2287,6 +2287,22 @@ private[spark] object Utils extends Logging with
SparkClassUtils {
pan3793 commented on code in PR #41709:
URL: https://github.com/apache/spark/pull/41709#discussion_r1239357126
##
core/src/main/scala/org/apache/spark/util/Utils.scala:
##
@@ -2287,6 +2287,22 @@ private[spark] object Utils extends Logging with
SparkClassUtils {