GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/19905
[SPARK-22710] ConfigBuilder.fallbackConf should trigger onCreate function
## What changes were proposed in this pull request?
I was looking at the config code today and found that configs defined
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19468
For future pull requests, can you create subtasks under
https://issues.apache.org/jira/browse/SPARK-18278 ?
Repository: spark
Updated Branches:
refs/heads/master 475a29f11 -> e9b2070ab
http://git-wip-us.apache.org/repos/asf/spark/blob/e9b2070a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19468
Thanks - merging in master!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
-on-k8s.github.io/userdocs/running-on-kubernetes.html
cc rxin felixcheung mateiz (shepherd)
k8s-big-data SIG members & contributors: mccheah ash211 ssuchter varunkatta
kimoonkim erikerlandson liyinan926 tnachen ifilonenko
Author: Yinan Li <liyinan...@gmail.com>
Author: foxish
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/11994#discussion_r153321690
--- Diff: core/src/main/scala/org/apache/spark/metrics/sink/Sink.scala ---
@@ -17,8 +17,48 @@
package org.apache.spark.metrics.sink
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/11994
Hey, so my main question is whether we should expose the Coda Hale metrics
library directly. In the past, we have done this and it has come back to bite
us. For example, exposing the Hadoop
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19468
I went through the changes to make sure the non-k8s changes are ok. They do
look ok to me. From that perspective, LGTM
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148605519
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34 @@
/**
* Creates
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148553358
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34 @@
/**
* Creates
Repository: spark
Updated Branches:
refs/heads/master b2463fad7 -> 41b60125b
[SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark
## What changes were proposed in this pull request?
This PR proposes to add a link from `spark.catalog(..)` to `Catalog` and expose
Catalog
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19596
Merging in master.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148545942
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34 @@
/**
* Creates
Repository: spark
Updated Branches:
refs/heads/master 849b465bb -> 277b1924b
[SPARK-22408][SQL] RelationalGroupedDataset's distinct pivot value calculation
launches unnecessary stages
## What changes were proposed in this pull request?
Adding a global limit on top of the distinct values
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19629
Merging in master.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148527451
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34 @@
/**
* Creates
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19629
Jenkins, test this please.
How was this patch tested?
This is a doc only change.
Author: Reynold Xin <r...@databricks.com>
Closes #19626 from rxin/dsv2-update.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d43e1f06
Tree: http://git-wip-us.apache.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19626
Merging in master.
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/19626
[minor] Data source v2 docs update.
## What changes were proposed in this pull request?
This patch includes some doc updates for data source API v2. I was reading
the code and noticed some minor
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19626
cc @cloud-fan
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19596
Yea definitely.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19592
Is this complexity worth it? Can we just document it as a behavior and
users need to be careful with it?
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18828#discussion_r147544081
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkPlanSuite.scala ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19468#discussion_r146429113
--- Diff: pom.xml ---
@@ -2649,6 +2649,13 @@
+ kubernetes
+
+resource-managers/kubernetes/core
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19498
cc @tdas
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19535
Looks good at high level.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19512
Seems fine to backport into 2.2.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19269#discussion_r145579513
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java
---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19521
LGTM
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19524
seems fine.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19419
Yea, in general it seems good to turn security features on by default.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19419
Is there a reason why this cannot be always enabled?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18732
Grouped UDFs, or Grouped Vectorized UDFs.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19451#discussion_r144687542
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -1242,6 +1244,51 @@ object ReplaceIntersectWithSemiJoin
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19451
If we had to do this all over again, I'd put all rules in their own files.
Replace isn't really a great high level category because all rules at some
level replace something
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19451
Actually, you already have it in the classdoc, so please just update the PR
description with it.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19451#discussion_r144461898
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -1242,6 +1244,53 @@ object ReplaceIntersectWithSemiJoin
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19451#discussion_r144461913
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -1242,6 +1244,53 @@ object ReplaceIntersectWithSemiJoin
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19451#discussion_r144461813
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -1242,6 +1244,53 @@ object ReplaceIntersectWithSemiJoin
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19451
Can you update the PR description with an example plan before / after this
optimization, and also put that example in the comment section of the doc
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18805
Does the package include a binary distribution for Linux?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/6751
Isn't the hint available in SQL?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19454
Honestly I don't think it is worth doing this.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18732
I'm OK with the naming. We can change them later if needed before the
release.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19454
I actually think this can be confusing on Dataset[T], when the Dataset is
just untyped and a DataFrame. Do we throw a runtime exception
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19454
Is this worth doing?
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r143340681
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
---
@@ -519,3 +519,18 @@ case class CoGroup
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19250#discussion_r143243338
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
---
@@ -1213,6 +1213,71 @@ case class
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19250#discussion_r143122895
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TimestampTableTimeZone.scala
---
@@ -0,0 +1,213 @@
+/*
+ * Licensed
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19250#discussion_r143122657
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -230,6 +230,13 @@ case class AlterTableSetPropertiesCommand
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19250#discussion_r143122503
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -266,6 +267,10 @@ final class DataFrameWriter[T] private[sql](ds:
Dataset
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19250#discussion_r143122396
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
---
@@ -1015,6 +1020,10 @@ object DateTimeUtils {
guess
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19250#discussion_r143122317
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
---
@@ -1213,6 +1213,71 @@ case class
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19394
What's the other value?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19394
Not sure - maybe print the chi-value of the test and see if they make
sense. If they do, we can change the threshold
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18732
What's the difference between this one and the transform function you also
proposed? I'm trying to see if all the naming makes sense when considered
together
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19393
LGTM but I wrote most of the code so perhaps we should find somebody else
to review.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18732
Is this just a mapGroups function?
sed on chi square test ...
Author: Reynold Xin <r...@databricks.com>
Closes #19387 from rxin/SPARK-22160.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/323806e6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3238
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19387#discussion_r141786663
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -108,11 +108,21 @@ class HashPartitioner(partitions: Int) extends
Partitioner
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19387
Merging in master.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19387
I put up a comment saying this test result should be deterministic, since
the sampling uses a fixed seed based on partition id
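As a rough illustration of the point above (a fixed seed derived from the
partition id makes the sampling repeatable), here is a minimal Python sketch.
It is not Spark's RangePartitioner code: the function name `sample_partition`,
the `base_seed` parameter, and the way the seed is combined with the partition
id are all assumptions made for this example.

```python
import random


def sample_partition(partition_id, rows, k, base_seed=0):
    """Draw up to k rows from one partition, reproducibly.

    Hypothetical helper, not a Spark API: the seed depends only on
    base_seed and partition_id, so repeated runs pick the same rows.
    """
    # Assumed seed mixing; any deterministic function of the id would do.
    rng = random.Random(base_seed * 1_000_003 + partition_id)
    return rng.sample(rows, min(k, len(rows)))


if __name__ == "__main__":
    rows = list(range(100))
    first = sample_partition(3, rows, 5)
    second = sample_partition(3, rows, 5)
    # Same partition id means same seed, so the two samples are compared here.
    print(first == second)
```

Because nothing in the seed depends on wall-clock time or global random state,
a test built on such a sample would be expected to give the same result on
every run, which is the property the comment relies on.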
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19387#discussion_r141764431
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19387#discussion_r141764415
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19387#discussion_r141755874
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -108,9 +108,17 @@ class HashPartitioner(partitions: Int) extends
Partitioner
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/19387
[SPARK-22160][SQL] Allow changing sample points per partition in range
shuffle exchange
## What changes were proposed in this pull request?
Spark's RangePartitioner hard codes the number
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19384
I reverted the 2nd commit. Should be good for merge now.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19384
hm the 2nd commit is not meant for this one.
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/19384
[SPARK-22159][SQL] Make config names consistently end with "enabled".
## What changes were proposed in this pull request?
spark.sql.execution.ar
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/19376
[SPARK-22153][SQL] Rename ShuffleExchange -> ShuffleExchangeExec
## What changes were proposed in this pull request?
For some reason when we added the Exec suffix to all physical operators,
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19362#discussion_r141403817
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -136,6 +134,8 @@ abstract class Optimizer(sessionCatalog
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19269#discussion_r140379971
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19269#discussion_r139889741
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Command.scala
---
@@ -0,0 +1,114
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19269#discussion_r139889045
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/SupportsWriteUnsafeRow.java
---
@@ -0,0 +1,44 @@
+/*
+ * Licensed
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19269#discussion_r13951
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java
---
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18704
cc @michal-databricks any thoughts on this?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19261
What does this even mean?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19136
LGTM.
Still some feedback that can be addressed later. We should also document
all the APIs as Evolving
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138947707
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138947426
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Statistics.java
---
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138947297
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupportWithSchema.java
---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138946124
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupport.java ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138945691
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java ---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138709319
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Statistics.java
---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138665881
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java
---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138624261
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java
---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138623586
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/ColumnPruningSupport.java
---
@@ -0,0 +1,36 @@
+/*
+ * Licensed
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138622262
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java
---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138622067
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java
---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138621970
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java
---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138621700
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138621506
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16578
I tried this and this is definitely super useful! It's a big patch and most
of the people working in this area are either doing something else that's not
Spark, or working on a few high priority SPIPs
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19086#discussion_r136640563
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
---
@@ -495,15 +495,16 @@ private[hive] class HiveClientImpl
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19064#discussion_r135590939
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
---
@@ -23,6 +23,12 @@ import
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19064#discussion_r135590830
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
---
@@ -23,6 +23,12 @@ import
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18906
I understand why you are using Python. What I don't understand is why you'd
need to annotate nullability, because those are typically annotated for the
purpose of performance improvement, but Python
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18999#discussion_r135132622
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -659,19 +659,77 @@ def distinct(self):
return DataFrame(self._jdf.distinct(), self.sql_ctx
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18906
@ptkool have you seen a real use case so far where you need this? I'm a bit
surprised you'd care about this, since Python UDFs are already pretty slow.
Are there other cases you run