Github user aray commented on the issue:
https://github.com/apache/spark/pull/21699
Using either `Column` or `String` type was actually in my original PR:
https://github.com/apache/spark/pull/7841
@rxin later modified the API to only take a `String` prior to the release
as part
Github user aray commented on the issue:
https://github.com/apache/spark/pull/21187
LGTM, thanks for doing this!
Github user aray commented on the issue:
https://github.com/apache/spark/pull/19629
diff LGTM
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18306#discussion_r143345713
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -562,6 +563,8 @@ class SparkContext(config: SparkConf) extends Logging
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/19226#discussion_r138917350
--- Diff: python/pyspark/serializers.py ---
@@ -343,6 +343,8 @@ def _load_stream_without_unbatching(self, stream):
key_batch_stream
Github user aray commented on the issue:
https://github.com/apache/spark/pull/19226
@holdenk I'm not going to be able to solve this tonight (short of just
removing the failing test).
Github user aray commented on the issue:
https://github.com/apache/spark/pull/19226
It's actually this one that is failing:
https://github.com/aray/spark/blob/0d64a6d11237383c2a6ea21275dc9daa5cc8d634/python/pyspark/tests.py#L964
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/19226
[SPARK-21985][PySpark] PairDeserializer is broken for double-zipped RDDs
## What changes were proposed in this pull request?
This removes the mostly unnecessary test that each individual
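A minimal reproduction sketch (my construction, not the PR's test; assumes an active SparkContext `sc`):
```
rdd = sc.parallelize(range(100), 2)
pair = rdd.zip(rdd.map(lambda x: x * 2))  # first zip: read back via PairDeserializer
double = pair.zip(pair)                   # second zip over an already-zipped RDD
print(double.count())                     # could fail before this fix when key/value batch sizes disagree
```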
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16121
I'll take a look, sorry about that.
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18306#discussion_r136436631
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -562,6 +563,8 @@ class SparkContext(config: SparkConf) extends Logging
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18818#discussion_r136421644
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
---
@@ -582,6 +582,7 @@ class CodegenContext
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/19080#discussion_r136419947
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
---
@@ -284,24 +241,17 @@ case class RangePartitioning
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18306
ping @zsxwing
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18818
ping @viirya @gatorsmile
GitHub user aray reopened a pull request:
https://github.com/apache/spark/pull/18786
[SPARK-21584][SQL][SparkR] Update R method for summary to call new
implementation
## What changes were proposed in this pull request?
SPARK-21100 introduced a new `summary` method
Github user aray closed the pull request at:
https://github.com/apache/spark/pull/18786
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18786
Closing and reopening to trigger the AppVeyor test that timed out.
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18818
@viirya @gatorsmile I have addressed your comments; could you take another
look?
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18818#discussion_r133116720
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
---
@@ -465,7 +475,7 @@ abstract class BinaryComparison
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18818
retest this please
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18786
I'm pushing for it to stay as is because it's the more logical layout of
the data: min=0%, 25%, 50%, 75%, max=100%. It's also more consistent with
summary of native R dataframes (and for Python
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18306
@zsxwing, can you take another look at this?
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18786
@rxin Any thoughts on whether it's OK to change the output of `summary` in
R in a non-"additive" way?
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18818#discussion_r131913185
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
---
@@ -453,6 +453,14 @@ case class Or(left: Expression
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18818#discussion_r131808912
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
---
@@ -453,6 +453,14 @@ case class Or(left: Expression
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18818#discussion_r131656840
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
---
@@ -79,18 +79,6 @@ private[sql] class TypeCollection(private val
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18835
Thanks, I see it now.
Github user aray closed the pull request at:
https://github.com/apache/spark/pull/18835
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/18835
[SPARK-21628][BUILD] Explicitly specify Java version in maven compiler
plugin so IntelliJ imports project correctly
## What changes were proposed in this pull request?
Explicitly specify
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18786
No, the changes to `summary` are not additive; it inserts the 25%, 50%, and 75%
percentiles before max (the last row). People who want the previous behavior
can use `describe`. Or if they are trying
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/18818
[SPARK-21110][SQL] Structs, arrays, and other orderable datatypes should be
usable in inequalities
## What changes were proposed in this pull request?
Allows `BinaryComparison` operators
GitHub user aray reopened a pull request:
https://github.com/apache/spark/pull/18786
[SPARK-21584][SQL][SparkR] Update R method for summary to call new
implementation
## What changes were proposed in this pull request?
SPARK-21100 introduced a new `summary` method
Github user aray closed the pull request at:
https://github.com/apache/spark/pull/18786
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/18800
[SPARK-21330][SQL] Bad partitioning does not allow to read a JDBC table
with extreme values on the partition column
## What changes were proposed in this pull request?
An overflow
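A hedged sketch of the kind of read that could hit such an overflow (placeholder URL and table; assumes an active SparkSession `spark`):
```
# Extreme Long bounds on the partition column could overflow the
# per-partition stride computation before this fix.
df = spark.read.jdbc(
    url="jdbc:postgresql://host/db",   # placeholder
    table="t",                         # placeholder
    column="id",
    lowerBound=-9223372036854775808,   # Long.MinValue
    upperBound=9223372036854775807,    # Long.MaxValue
    numPartitions=10,
)
```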
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18786#discussion_r130620399
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2973,15 +2974,51 @@ setMethod("describe",
dataFrame(sdf)
})
+
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18786#discussion_r130618566
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2500,8 +2500,15 @@ test_that("describe() and summarize() on a
DataFrame", {
ex
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/18786
[SPARK-21584][SQL][SparkR] Update R method for summary to call new
implementation
## What changes were proposed in this pull request?
SPARK-21100 introduced a new `summary` method
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18697
@viirya We could certainly make that improvement. I believe it would be a
fairly trivial change to this PR if we were just considering expressions that
have the same canonical representation. However
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18697#discussion_r130396904
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -65,6 +65,10 @@ abstract class SparkPlan extends QueryPlan[SparkPlan
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18762
retest this please
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/18762
[SPARK-21566][SQL][Python] Python method for summary
## What changes were proposed in this pull request?
Adds the recently added `summary` method to the Python DataFrame interface
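A usage sketch (assumes an existing DataFrame `df`; the statistic names follow the Scala `summary` added by SPARK-21100):
```
df.summary().show()  # count, mean, stddev, min, 25%, 50%, 75%, max
df.summary("count", "min", "25%", "75%", "max").show()  # caller-chosen statistics
```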
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18697
ping @rxin. Can someone look at this correctness fix?
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16577
Closing since it does not look like there is any interest in changing this.
Thanks everyone!
Github user aray closed the pull request at:
https://github.com/apache/spark/pull/16577
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18697
retest this please
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18697
Plan for the example query before the patch (with partitioning as suffix):
```
*HashAggregate(keys=[parent#228], functions=[], output=[level2#274])
hashpartitioning(parent#228, 5
```
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/18697
[SPARK-16683][SQL] Repeated joins to same table can leak attributes via
partitioning
## What changes were proposed in this pull request?
In some complex queries where the same table
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18306
ping @rxin @marmbrus @zsxwing @felixcheung. Can anyone look at this?
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18307#discussion_r125053122
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2205,37 +2205,170 @@ class Dataset[T] private[sql](
* // max 92.0
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18307
@rxin @felixcheung Thanks for the feedback; I revamped this PR to leave
`describe` unchanged and added two new methods, `describeExtended` and
`describeAdvanced` (the latter is used to implement all
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18306
@zsxwing @marmbrus @rxin This is ready for review. I have changed the
approach so that queries from all SparkSessions are stopped. I was not able to
use a SparkListener as @zsxwing suggested or even
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18307
@rxin Yes, it slows things down quite a bit. Informal testing on 10M-row,
2-column synthetic data puts this implementation at around 10s vs 0.5s in
2.2-rc4. I can speed it up some by doing only a single
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18306#discussion_r122112073
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
---
@@ -690,6 +690,7 @@ class SparkSession private(
* @since 2.0.0
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/18307
[SPARK-21100][SQL] describe should give quartiles similar to Pandas
## What changes were proposed in this pull request?
Modify the describe method to include quartiles (25th, 50th, and 75th
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/18306
[SPARK-21029][SS] All StreamingQuery should be stopped when the
SparkSession is stopped
## What changes were proposed in this pull request?
Adds method to `StreamingQueryManager` that stops
Github user aray commented on the issue:
https://github.com/apache/spark/pull/18001
Yes, it does not work without
```
Andrews-MacBook-Pro:spark-2.1.1-bin-hadoop2.7 andrew$ jupyter --version
4.0.6
Andrews-MacBook-Pro:spark-2.1.1-bin-hadoop2.7 andrew
```
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/18001
[SPARK-20769][Doc] Incorrect documentation for using Jupyter notebook
## What changes were proposed in this pull request?
SPARK-13973 incorrectly removed the required
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17348
LGTM
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16483
@thunterdb The extra step -- as implemented -- is only at the end as that
gives the same result as doing it after every iteration but without the extra
overhead.
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/16483#discussion_r106548090
--- Diff:
graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala ---
@@ -68,26 +69,34 @@ class PageRankSuite extends SparkFunSuite
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/16483#discussion_r106546448
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala
---
@@ -322,13 +335,12 @@ object PageRank extends Logging {
def
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16483
@rxin, can anyone else review this? It would be nice to get this correctness
fix into 2.2.
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
@HyukjinKwon There is an inconsistency/regression, but it's not being
introduced in this PR; it's already there. Take an example without null as a
pivot column value, like the one below. The only difference
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
@HyukjinKwon We're not introducing a regression in this PR by fixing the
NPE; the answer given by 1.6 was incorrect under any interpretation. Again,
there is a completely separate issue of what
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
BTW, for 3 above, if we decide it should be 0, we can add an initial value
for `PivotFirst` to make the fix.
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
There are three things going on here in your one example.
1. Spark 1.6 [first version with pivot] (and Spark 2.0+ with an aggregate
output type unsupported by PivotFirst) gives incorrect
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105324124
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -522,7 +522,7 @@ class Analyzer(
} else
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
@HyukjinKwon As stated in 17226#discussion_r105322758, I think we should
open a second JIRA to discuss whether or not `count(1)` of no
values in a pivot should be filled with 0s
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105322758
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
@@ -216,4 +216,10 @@ class DataFramePivotSuite extends QueryTest
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/17226
[SPARK-19882][SQL] Pivot with null as a distinct pivot value throws NPE
## What changes were proposed in this pull request?
Allows null values of the pivot column to be included in the pivot
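A minimal sketch of the failing shape (hypothetical data; assumes an active SparkSession `spark`; before the fix, a null among the distinct pivot values triggered the NPE):
```
from pyspark.sql import Row

df = spark.createDataFrame([Row(a=1, b="x"), Row(a=1, b=None), Row(a=2, b="x")])
df.groupBy("a").pivot("b").count().show()  # null is one of the distinct pivot values
```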
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/15415#discussion_r97168170
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/15415#discussion_r97162464
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/15415#discussion_r97168311
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/15415#discussion_r97166816
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user aray closed the pull request at:
https://github.com/apache/spark/pull/16539
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16483
@rxin, can you take a look?
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16577
[SPARK-19214][SQL] Typed aggregate count output field name should be "count"
## What changes were proposed in this pull request?
Changes the output field name of typed aggreg
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16559
It can already be done with the `posexplode` UDTF, like:
```
with t as (values (array(1,2,3)), (array(4,5,6)) as v(a))
select col from t lateral view posexplode(a) tt where pos = 2
```
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16555
The title should say 2.
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16539
[SPARK-8855][MLlib][PySpark] Python API for Association Rules
## What changes were proposed in this pull request?
This patch adds a `generateAssociationRules(confidence)` method
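A hedged sketch of the proposed usage (the method name comes from this PR's description; the data and thresholds are made up, and `sc` is an assumed SparkContext):
```
from pyspark.mllib.fpm import FPGrowth

data = sc.parallelize([["a", "b"], ["a", "c"], ["a", "b", "c"]])
model = FPGrowth.train(data, minSupport=0.5, numPartitions=2)
rules = model.generateAssociationRules(0.8)  # proposed API: rules with confidence >= 0.8
```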
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16483
ping @srowen @ankurdave. Can you take a look at this?
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16483
[SPARK-18847][GraphX] PageRank gives incorrect results for graphs with sinks
## What changes were proposed in this pull request?
Graphs with sinks (vertices with no outgoing edges) don't have
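For context, a hedged sketch of the issue in the standard unnormalized formulation (my notation, not the PR's; $\rho$ is the reset probability and $N$ the vertex count):

$$r_i^{(t+1)} = \rho + (1 - \rho) \sum_{j \to i} \frac{r_j^{(t)}}{\deg^{+}(j)}$$

Summing over all $i$ on a sink-free graph gives the fixed-point invariant $\sum_i r_i = N$; a sink $j$ has $\deg^{+}(j) = 0$ and redistributes its rank to no vertex, so with sinks the total drifts below $N$ and the computed ranks are biased.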
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16271
Yes, the improvement is from the sum of magnitudes of the initial values being
closer to the (known) sum of the solution. Fiddling with resetProb controls a
completely different thing. The current
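A hedged sketch of that reasoning, using the unnormalized formulation sketched above (where the solution satisfies $\sum_i r_i = N$ on sink-free graphs) and assuming the change is a starting rank of $1$ rather than $\rho$:

$$r_i^{(0)} = 1 \;\Rightarrow\; \sum_i r_i^{(0)} = N, \qquad r_i^{(0)} = \rho = 0.15 \;\Rightarrow\; \sum_i r_i^{(0)} = 0.15\,N.$$

Starting at the correct total means the iteration only has to redistribute rank rather than grow it toward $N$, while the reset probability $\rho$ itself plays no role in choosing the starting total.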
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/16271#discussion_r92621591
--- Diff:
graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala ---
@@ -70,10 +70,10 @@ class PageRankSuite extends SparkFunSuite
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/16240#discussion_r92546082
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala
---
@@ -100,31 +100,76 @@ abstract class SQLImplicits {
// Seqs
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16271
**References**
[PageRank paper](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf)
> We need to make an initial assignment of the ranks. This assignment can
be made by one of several strateg
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16271
ping @srowen @dbtsai @rxin @ankurdave @jegonzal
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16271
Updated the above benchmark code with a log-normal random graph on 10,000
vertices; the difference is much more drastic.
![](http://i.imgur.com/Zo56dEO.png)
(take the very bottom of the graph
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16271
[SPARK-18845][GraphX] PageRank has incorrect initialization value that
leads to slow convergence
## What changes were proposed in this pull request?
Change the initial value in all PageRank
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16161
I would be happy to create a separate PR for adding support for
`mutable.Map` (and `List`) if that is wanted. But there is no _generic_
solution, as there is no type that is assignable to both
Github user aray closed the pull request at:
https://github.com/apache/spark/pull/16197
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16197
[SPARK-17760][SQL][Backport] AnalysisException with dataframe pivot when
groupBy column is not attribute
## What changes were proposed in this pull request?
Backport of #16177 to branch-2.0
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16161
Right now it's not supported to have the following:
```
case class Foo(a: Map[Int, Int])
```
(using the Scala `Predef` version of `Map`)
The
[documented](http://spark.apache.org
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16177
[SPARK-17760][SQL] AnalysisException with dataframe pivot when groupBy
column is not attribute
## What changes were proposed in this pull request?
Fixes AnalysisException for pivot queries
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16161
The approach is to change the deserializer (via
`ScalaReflection#deserializerFor`) to return the more specific type
`scala.collection.immutable.Map` instead of `scala.collection.Map` as it does
now
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16161
[SPARK-18717][SQL] Make code generation for Scala Map work with
immutable.Map also
## What changes were proposed in this pull request?
Fixes compile errors in generated code when user has
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16121
@davies, @zero323, and @holdenk: this is in a good place for review if you
want to take a look.
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16121
@davies I was trying to make minimal changes to `PairDeserializer`, but you
are right that it needs to be changed as well. I'll update the PR shortly.
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16121
[SPARK-16589][PYTHON] Chained cartesian produces incorrect number of records
## What changes were proposed in this pull request?
Fixes a bug in the Python implementation of RDD cartesian
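A minimal reproduction sketch (my example, not the PR's test; assumes an active SparkContext `sc`):
```
rdd = sc.parallelize(range(10))
chained = rdd.cartesian(rdd).cartesian(rdd)  # chained cartesian product
print(chained.count())  # expected 10 * 10 * 10 = 1000; undercounted before this fix
```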
Github user aray commented on the issue:
https://github.com/apache/spark/pull/15898
@tejasapatil Yes, that is the use case where this applies. It's only tested
against whatever version is included in the hadoop2.7+hive build configuration
listed above. Is there anything in particular