Github user henryr closed the pull request at:
https://github.com/apache/spark/pull/22211
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/22211
Merged to 2.1
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/22211
[SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL][BACKPORT-2.…
## What changes were proposed in this pull request?
Back port of #20393 and #22079.
Currently
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21482
Any further comments here?
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21482#discussion_r197234189
--- Diff: python/pyspark/sql/functions.py ---
@@ -468,6 +468,18 @@ def input_file_name():
return Column(sc._jvm.functions.input_file_name
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21482
I think consistency in Spark's naming convention (and therefore increased
discoverability by users) outweighs the advantage of naming it exactly after
the Impala equivalent. I do agree
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21482#discussion_r193898812
--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
"initcap",
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21482
@rxin, that in itself is a bit weird, but there are ways to express inf
values in Scala, so inf values can show up flowing through Spark plans.
I'm not sure MySQL has any such facility
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21482
@rxin Other engines are all over the place:
* MySQL doesn't have support for infinity (based on my cursory look) - 1.0
/ 0.0 is written as `null`. Also seems to be true of SQLite
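The point about infinity flowing through plans can be illustrated outside Spark. A minimal sketch of the IEEE 754 semantics being discussed (Scala/Java doubles follow IEEE 754, so `1.0 / 0.0` evaluates to `Infinity`; Python instead raises `ZeroDivisionError` for float division, but the same IEEE values are reachable via `math.inf`):

```python
import math

# Scala/Java doubles follow IEEE 754, where 1.0 / 0.0 evaluates to
# Infinity, so inf values can flow into Spark plans. Python raises
# ZeroDivisionError for float division by zero, but the same IEEE
# values are reachable via math.inf.
pos_inf = math.inf
assert math.isinf(pos_inf)
assert pos_inf > 1.7e308               # exceeds any finite double
assert math.isnan(pos_inf - pos_inf)   # inf - inf is NaN under IEEE 754
```

Engines like MySQL and SQLite sidestep these values entirely by mapping them to `null`, which is the design tension raised in the comment above.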
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21482#discussion_r192522027
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
---
@@ -199,6 +199,50 @@ case class Nvl2(expr1
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21482#discussion_r192520713
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala
---
@@ -56,6 +56,16 @@ class
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21482#discussion_r192521881
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
---
@@ -199,6 +199,50 @@ case class Nvl2(expr1
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21482#discussion_r192520834
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1107,6 +1107,14 @@ object functions {
*/
def input_file_name
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21482#discussion_r192520566
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala
---
@@ -24,7 +24,7 @@ import
Github user henryr closed the pull request at:
https://github.com/apache/spark/pull/21302
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21302
Done.
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21302#discussion_r188042296
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
---
@@ -602,6 +602,16 @@ class
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/21323
[SPARK-23582][SQL] Add withSQLConf(...) to test case
## What changes were proposed in this pull request?
Add a `withSQLConf(...)` wrapper to force Parquet filter pushdown for a
test
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21302
Sounds good, done.
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/21302
[SPARK-23852][SQL] Upgrade to Parquet 1.8.3
## What changes were proposed in this pull request?
Upgrade Parquet dependency to 1.8.3 to avoid PARQUET-1217
## How was this patch
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21049
@dilipbiswal I tried hard to find something in the SQL standard that
clarified the situation, but couldn't (of course that could be because the
standard is pretty hard to parse... :)).
So
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/21284
[SPARK-23852][SQL] Add test that fails if PARQUET-1217 is not fixed
## What changes were proposed in this pull request?
Add a new test that triggers if PARQUET-1217 - a predicate pushdown
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21201
Any chance to get this merged now the tests are working again? Thanks!
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21049
I might be a bit of a hardliner on this, but I think it's correct to
eliminate the `ORDER BY` from common table expressions (e.g. MSSQL agrees
with me, see [this
link](https://docs.microsoft.com
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21145#discussion_r186143060
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java ---
@@ -22,20 +22,20 @@
import
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21145
I don't mind `ReadTask`. It's imperfect because 'task' implies that this is
a thing that can be executed, whereas this interface doesn't have a way to pass
control to the task object. It's more like
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21201
No problem! It's a small usability fix.
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21201
retest this please
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/21201
[SPARK-24128][SQL] Mention configuration option in implicit CROSS JOIN error
## What changes were proposed in this pull request?
Mention `spark.sql.crossJoin.enabled` in error message when
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21049
@dilipbiswal thanks for the clarification.
I agree that this particular case - where the alias is the root of a
logical plan - might need special handling. Is there any reason to actually
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21122
cc @rdblue
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21070
yes, thanks @maropu! +1 to the idea of making this a jenkins job.
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21073
@gatorsmile this looks ready for your review (asking because you filed the
JIRA) if you have time, thanks!
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21070
Ok, thanks for the context. I've worked on other projects where it's
sometimes ok to take a small risk of a perf hit on trunk as long as the
community is committed to addressing the issues before
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21070
@cloud-fan since there's probably quite some time before this lands in a
release, what do you think about merging this now if it's ready, and filing
the perf jira as a blocker against 2.4? My
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21073#discussion_r183560436
--- Diff: python/pyspark/sql/functions.py ---
@@ -2186,6 +2186,29 @@ def map_values(col):
return Column(sc._jvm.functions.map_values(_to_java_column
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21073#discussion_r183558371
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -116,6 +118,154 @@ case class MapValues
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21073#discussion_r183559826
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -376,6 +376,35 @@ class DataFrameFunctionsSuite extends
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21073#discussion_r183559190
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -116,6 +118,154 @@ case class MapValues
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21073#discussion_r183560201
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -116,6 +118,154 @@ case class MapValues
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21073#discussion_r183559429
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -116,6 +118,154 @@ case class MapValues
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21073#discussion_r183559663
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
---
@@ -56,6 +58,26 @@ class
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21049
@dilipbiswal Thanks! Although Spark doesn't necessarily parse the query in
the `from` clause as a subquery, is it fair to say it plans it as one? (Since
the planner puts the alias under
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21070
This looks pretty good to me - are there any committers that can give it a
(hopefully) final review?
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21073#discussion_r182547477
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -115,6 +116,62 @@ case class MapValues
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21070
@scottcarey I agree that's important. Perhaps it could be done as a
follow-up PR?
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21073#discussion_r181915827
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -115,6 +116,62 @@ case class MapValues
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21068#discussion_r181908529
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala
---
@@ -0,0 +1,155
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21068#discussion_r181911744
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/FailureWithinTimeIntervalTracker.scala
---
@@ -0,0 +1,80
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21068#discussion_r181910914
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/FailureWithinTimeIntervalTracker.scala
---
@@ -0,0 +1,80
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21068#discussion_r181907316
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala
---
@@ -0,0 +1,155
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21068#discussion_r181912300
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/FailureWithinTimeIntervalTracker.scala
---
@@ -0,0 +1,80
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21068#discussion_r181907395
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala
---
@@ -0,0 +1,155
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21070#discussion_r181901031
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -63,115 +59,159 @@ public final
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21070#discussion_r181902045
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -63,115 +59,159 @@ public final
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21070#discussion_r181889287
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -63,115 +58,139 @@ public final
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21070#discussion_r181882476
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -63,115 +58,139 @@ public final
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21072#discussion_r181847445
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantSortsSuite.scala
---
@@ -98,4 +98,31 @@ class
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21072#discussion_r181851593
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -736,12 +736,22 @@ object EliminateSorts extends Rule
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21072#discussion_r181849951
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantSortsSuite.scala
---
@@ -98,4 +98,31 @@ class
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21070#discussion_r181846514
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -63,115 +58,139 @@ public final
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21049#discussion_r181839715
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -307,6 +309,32 @@ object RemoveRedundantProject
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21049
In SQL, the sort in a subquery doesn't make sense because of the relational
model - the output of a subquery is an unordered bag of tuples. Some engines
still allow the sort, some silently drop
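The optimization described here (and implemented by SPARK-23957's changes to `EliminateSorts`) can be sketched on a toy plan tree. This is an illustrative Python model, not Spark's actual Catalyst rule; the class names are invented for the example:

```python
# Toy logical-plan nodes; the class names are illustrative, not Spark's.
class Node:
    def __init__(self, child=None):
        self.child = child

class Scan(Node): pass
class Sort(Node): pass
class Limit(Node): pass
class SubqueryAlias(Node): pass

def remove_redundant_sorts(plan):
    """Splice out a Sort sitting directly under a subquery alias: with bag
    semantics its ordering is unobservable, unless a Limit depends on it."""
    if plan is None:
        return None
    if isinstance(plan, SubqueryAlias) and isinstance(plan.child, Sort):
        plan.child = plan.child.child  # drop the Sort node
    plan.child = remove_redundant_sorts(plan.child)
    return plan

# (SELECT ... ORDER BY ...) AS t  ->  the ORDER BY is removed
plan = remove_redundant_sorts(SubqueryAlias(Sort(Scan())))
assert isinstance(plan.child, Scan)

# ORDER BY ... LIMIT n is preserved: there the sort changes the result
plan = remove_redundant_sorts(SubqueryAlias(Limit(Sort(Scan()))))
assert isinstance(plan.child, Limit) and isinstance(plan.child.child, Sort)
```

The `Limit` case captures why the rule cannot drop every sort: once a row count cut-off depends on the order, the sort becomes observable again.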
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21072#discussion_r181563918
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -736,12 +736,15 @@ object EliminateSorts extends Rule
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21070#discussion_r181540952
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -63,115 +58,139 @@ public final
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21070#discussion_r181528729
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -63,115 +58,139 @@ public final
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r181253369
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -733,6 +735,17 @@ object EliminateSorts extends Rule
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21049#discussion_r180964957
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -307,6 +309,32 @@ object RemoveRedundantProject
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/21049
[SPARK-23957][SQL] Remove redundant sort operators from subqueries
## What changes were proposed in this pull request?
Subqueries (at least in SQL) have 'bag of tuples' semantics. Ordering
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r179003707
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
---
@@ -274,3 +279,7 @@ abstract class BinaryNode
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r180601594
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -733,6 +735,17 @@ object EliminateSorts extends Rule
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r180592716
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -22,10 +22,11 @@ import org.apache.spark.sql.{execution, Row
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/20687
@gatorsmile ok, I think the coverage right now is a reasonable start - the
other test cases I can think of would act more like they're exercising the
expression-walking code, not the actual
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/20687
@gatorsmile thank you for the reviews! Are there specific test cases you'd
like to see? I've checked correlated and uncorrelated subqueries, various
flavours of join, aggregates with HAVING clauses
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r174637789
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
---
@@ -22,54 +22,34 @@ import
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r173561557
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
---
@@ -22,32 +22,24 @@ import
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r173561516
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
---
@@ -22,32 +22,24 @@ import
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r173561415
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
---
@@ -22,32 +22,24 @@ import
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r173323846
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
---
@@ -331,4 +330,31 @@ class ComplexTypesSuite
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r173268999
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
---
@@ -331,4 +330,31 @@ class ComplexTypesSuite
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r173013094
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
---
@@ -331,4 +330,31 @@ class ComplexTypesSuite
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r173007679
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
---
@@ -331,4 +330,31 @@ class ComplexTypesSuite
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r172996211
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
---
@@ -22,32 +22,24 @@ import
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r172992262
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
---
@@ -331,4 +330,30 @@ class ComplexTypesSuite
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/20687
This is failing because of SPARK-23606, which seems unrelated (I haven't been
able to trigger it in local builds, at least
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/20687
retest this please
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/20740
argh, wrong PR, sorry for retest-spam.
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/20740
Retest this please
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/20740
[SPARK-23604][SQL] Change Statistics.isEmpty to !Statistics.hasNonNullValue
## What changes were proposed in this pull request?
Parquet 1.9 will change the semantics
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r172300096
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
---
@@ -331,4 +331,24 @@ class ComplexTypesSuite
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/20687
retest this please
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/20687
[SPARK-25000][SQL] Fix complex type simplification rules to apply to entire
plan
## What changes were proposed in this pull request?
Complex type simplification optimizer rules were
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/20443
[SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkDataFrame in R comment
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/henryr/spark SPARK-23
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/20429
[SPARK-23157][SQL] Explain restriction on column expression in withColumn()
## What changes were proposed in this pull request?
It's not obvious from the comments that any added column must
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20355#discussion_r163455727
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -78,4 +78,20 @@ class FileBasedDataSourceSuite extends
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20355#discussion_r163422934
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -68,13 +68,16 @@ class FileBasedDataSourceSuite extends
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20355#discussion_r163411267
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala
---
@@ -172,6 +172,14 @@ class TextSuite extends
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20355#discussion_r163146077
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala
---
@@ -172,6 +172,14 @@ class TextSuite extends
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/20355
SPARK-23148: [SQL] Allow pathnames with special characters for CSV / JSON / text
## What changes were proposed in this pull request?
Fix for JSON and CSV data sources when
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/20254
Thanks all for the pointers and feedback! I've removed the references to
the behavior before 2.0, and now the changes just make it explicit that this is
`EXCEPT DISTINCT` (I appreciate that that's