Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/1698#issuecomment-50890613
reduceByKey being the same as reduce, and cartesian being the same as
broadcast is the whole point, the difference being that reduceByKey and
cartesian
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/1698#issuecomment-50896323
why do you use treeReduce + broadcast? the data per partition is small no?
only a few aggregates per partition
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/1698#issuecomment-50900320
i can see your point about 10M columns.
would be really nice if we had a lazy and efficient allReduce(RDD[T], (T,
T) => T): RDD[T]
as an RDD transform
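A minimal sketch of the allReduce idea mentioned above, assuming a naive, eager implementation (the comment asks for a lazy one); the name allReduce and this body are illustrative, not an existing Spark API.
```
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// naive allReduce sketch: reduce to a single value (eagerly, unlike the lazy
// transform the comment asks for) and hand that value back to every partition
def allReduce[T: ClassTag](rdd: RDD[T], f: (T, T) => T): RDD[T] = {
  val combined = rdd.reduce(f)
  rdd.mapPartitions(_ => Iterator.single(combined), preservesPartitioning = true)
}
```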
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/2962
implement secondary sort: sorting by values in addition to keys
see:
https://issues.apache.org/jira/browse/SPARK-3655
this is the first of 2 competing pullreqs that try to address
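For context, a sketch of the secondary-sort pattern being proposed, built only on the existing repartitionAndSortWithinPartitions API: partition on the key alone, but sort on the full (key, value) pair so each key's values come back in value order. The partitioner class, method name, and data are illustrative, not the code in this pull request.
```
import org.apache.spark.{HashPartitioner, Partitioner, SparkContext}

// routes a composite (key, value) key to a partition using only the key part
class KeyPartitioner(n: Int) extends Partitioner {
  private val hash = new HashPartitioner(n)
  def numPartitions: Int = n
  def getPartition(key: Any): Int = key match {
    case (k, _) => hash.getPartition(k)
  }
}

def example(sc: SparkContext): Unit = {
  val rdd = sc.parallelize(Seq(("a", 3), ("a", 1), ("b", 2), ("a", 2)))
  val sorted = rdd
    .map { case (k, v) => ((k, v), ()) }                       // composite key
    .repartitionAndSortWithinPartitions(new KeyPartitioner(2)) // sorts by (k, v)
    .map { case ((k, v), _) => (k, v) }                        // values now ordered per key
  sorted.collect().foreach(println)
}
```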
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/2963
add foldLeftByKey to PairRDDFunctions for reduce algorithms that by key need to process values in a particular order
see:
https://issues.apache.org/jira/browse/SPARK-3655
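A sketch of the semantics the foldLeftByKey title describes, assuming values are folded per key in their sorted order. This naive version groups and sorts each key's values in memory, which is presumably what the pull request avoids; foldLeftByKey is not an existing Spark API and the concrete types are placeholders.
```
import org.apache.spark.rdd.RDD

// naive illustration of foldLeftByKey semantics: per key, fold values in order
def foldLeftByKey[B](rdd: RDD[(String, Int)], zero: B)(f: (B, Int) => B): RDD[(String, B)] =
  rdd.groupByKey().mapValues(_.toSeq.sorted.foldLeft(zero)(f))
```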
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/735#issuecomment-46716428
hey sorry somehow missed this conversation thread. sure will update
defaults and docs
On Wed, Jun 4, 2014 at 1:48 AM, Patrick Wendell notificati
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/735#issuecomment-46865956
not sure if i am supposed to deal with these failures?
On Sat, Jun 21, 2014 at 1:52 PM, UCB AMPLab notificati...@github.com
wrote:
Refer
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/735#issuecomment-48548156
i updated docs and defaults as requested. currently waiting for feedback or
a merge
On Wed, Jul 9, 2014 at 6:46 PM, mingyukim notificati
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/735#issuecomment-49251684
https://issues.apache.org/jira/browse/SPARK-2543
On Wed, Jul 16, 2014 at 9:53 PM, Apache Spark QA notificati...@github.com
wrote:
QA tests
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/609#issuecomment-49559153
on the command line i can get this to work now, but it's still way beyond my
bash skills to use exec spark-submit inside a script with multiple java
options
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/113#issuecomment-38056370
thanks
On Tue, Mar 18, 2014 at 2:55 AM, Reynold Xin
notificati...@github.com wrote:
We are reverting this pull request in
#167 https
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/735
Feat kryo max buffersize
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tresata/spark feat-kryo-max-buffersize
Alternatively you can
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/735#issuecomment-42896061
hey matei,
i think they always had this feature in kryo, at least in 2.x.
created jira here:
https://issues.apache.org/jira/browse/SPARK-1811
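For context, a hedged usage sketch of the setting this pull request introduces; the key below is the name later Spark releases settled on, and the exact property name in this era (e.g. with a .mb suffix) may differ, so treat it as illustrative.
```
import org.apache.spark.SparkConf

// enable Kryo and raise the maximum serialization buffer size
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max", "512m")
```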
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/764
SPARK-1801. expose InterruptibleIterator and TaskKilledException in developer api
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/2963#discussion_r21387829
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
---
@@ -460,6 +461,63 @@ class PairRDDFunctions[K, V](self: RDD[(K, V
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/2963#issuecomment-65828969
Hey @zsxwing,
In a Scala Seq, the order in which the values get processed in foldLeft is
well defined.
But can we make any assumptions at all about
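A one-line illustration of the foldLeft ordering point made above:
```
// foldLeft on a Seq visits elements strictly left to right, so an
// order-sensitive fold has a well-defined result
val digits = Seq(1, 2, 3).foldLeft("")((acc, x) => acc + x)  // "123"
```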
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/3632
SPARK-3655 GroupByKeyAndSortValues
See https://issues.apache.org/jira/browse/SPARK-3655
This pullreq is based on the approach that uses
repartitionAndSortWithinPartitions, but only
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/2962#issuecomment-67754336
i am going to close this pullreq. i get the impression there is no
interest in changing spark internal sort routines to support sorting by (key,
value) pairs
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/2962
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/2963
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/2963#issuecomment-67754464
i am going to close this pullreq. i hope to pick up foldLeft later again
(together with a proper java version), but for SPARK-3655 the focus for now
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/3632#issuecomment-68011681
hey @markhamstra
i assume you are referring to the one method groupByKeyAndSortValues that
has an implicit Ordering[V] parameter, since the other
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/3632#issuecomment-68081488
mhhh i dont really agree with you. i find OrderedRDD confusing because:
1) you kind of have to know that there is an implicit conversion to
OrderedRDD somewhere
GitHub user koertkuipers reopened a pull request:
https://github.com/apache/spark/pull/3632
SPARK-3655 GroupByKeyAndSortValues
See https://issues.apache.org/jira/browse/SPARK-3655
This pullreq is based on the approach that uses
repartitionAndSortWithinPartitions, but only
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/3632#issuecomment-68081763
i will work on an updated version early january
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/3632
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/3632#issuecomment-68081928
woops sorry i hit the wrong button there. didnt mean to close this pullreq.
@markhamstra
i will try to update this pullreq sometime in first few weeks
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/3632#issuecomment-68150977
@markhamstra take a look now.
i ignored the situation of K and V having the same type, since i think it can
be dealt with by using a simple wrapper (value) class
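A sketch of the "simple wrapper (value) class" idea, assuming the goal is only to let the type system tell a value apart from a key of the same type; the names are illustrative, not code from this pull request.
```
object SecondarySortWrappers {
  // wrap the value so it can be distinguished from a key of the same type
  case class ValueWrapper[V](value: V)

  // order wrappers by the wrapped value
  implicit def valueWrapperOrdering[V](implicit ord: Ordering[V]): Ordering[ValueWrapper[V]] =
    Ordering.by((w: ValueWrapper[V]) => w.value)
}
```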
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/3632#discussion_r22428452
--- Diff: core/src/main/scala/org/apache/spark/util/Ordering.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/3632#discussion_r22423736
--- Diff: core/src/main/scala/org/apache/spark/util/Ordering.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/3632#discussion_r22423573
--- Diff: core/src/main/scala/org/apache/spark/util/Ordering.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/3632#discussion_r22770867
--- Diff: core/src/main/scala/org/apache/spark/util/Ordering.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/6883
SPARK-4644 blockjoin
Although the discussion (and design doc) under SPARK-4644 seems focussed on
other aspects of skew (OOM mostly) than this pullreq (which focusses on
avoiding a single
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6848#issuecomment-113789091
i see MiMa failed. what binary compatibility promise does spark make? all
minor versions are binary compatible?
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6848#issuecomment-113242162
ok i will look into JavaSparkContext and a few simple regression tests.
will probably need some help with python.
On Wed, Jun 17, 2015 at 12:34 AM
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6848#issuecomment-114350614
i see:
[info] spark-core: found 2 potential binary incompatibilities (filtered 488)
[error] * method
saveAsTextFile(java.lang.String
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/6848
SPARK-8398 hadoop input/output format advanced control
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tresata/spark
feat-hadoop-input
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6361#issuecomment-116739363
@JoshRosen one issue i see with publishing a modified chill package: we
read files in spark that were written by scalding using chill/kryo for
serialization
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6361#issuecomment-118637079
thats fair enough. however keep in mind that kryo is a transitive
dependency of spark, and one that does not upgrade well and has not been
shaded, so you
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6361#issuecomment-116299142
i am not so sure it is safe to bump the kryo version like that. chill
0.5.0 doesnt compile against kryo 2.24.0, so what guarantees do you have that
chill
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6883#issuecomment-133778896
i put this in a spark package together with skewjoin in case anyone wants
to use it.
see here:
http://spark-packages.org/package/tresata/spark-skewjoin
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6848#issuecomment-121468577
@andrewor14 @JoshRosen anything i need to do, besides fixing trivial
conflicts?
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/8416#issuecomment-152095666
You could create dataframe per path and then union them.
On Oct 28, 2015 19:14, "Jon Edvald" <notificati...@github.com> wrote:
>
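A minimal sketch of that suggestion against the Spark 1.x API of the time; the parquet format and the paths argument are placeholders.
```
import org.apache.spark.sql.{DataFrame, SQLContext}

// read one DataFrame per path, then union them all
def readAll(sqlContext: SQLContext, paths: Seq[String]): DataFrame =
  paths.map(sqlContext.read.parquet(_)).reduce(_ unionAll _)
```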
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/8416#issuecomment-148589922
i believe this is done
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/8416#discussion_r42308755
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -123,6 +124,24 @@ class DataFrameReader private[sql](sqlContext
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/6883
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6883#issuecomment-148917448
sure
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/8416#issuecomment-145902167
goals (copied over from SPARK-5741 comments by @marmbrus ):
It was originally just parquet that would support more than one file, but
now all HadoopFSRelations
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6848#issuecomment-120564648
i will resolve the conflicts when someone says this is good to go.
otherwise i keep merging from master every few days.
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/3632
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/8416#issuecomment-135486744
Can you point me to the jira where that decision was made?
Hadoop globbing only covers a small subset of all use cases. For example
for timeseries analysis
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/8416#issuecomment-135488186
I am not sure Union is a good idea at all, since i would have to union
DataFrames for hundreds of partitions and the Union logical operator only takes
left
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/8416#issuecomment-136034683
i updated this pullreq based on the conversation at
https://issues.apache.org/jira/browse/SPARK-5741
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/4449#issuecomment-138716589
i would like to have something like this in core
On Fri, Sep 4, 2015 at 6:22 AM, rapen <notificati...@github.com> wrote:
> @danielhav
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/8416
[SPARK-10185] [SQL] Feat sql comma separated paths
Make sure comma-separated paths get processed correctly in
ResolvedDataSource for a HadoopFsRelationProvider
You can merge this pull request
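A hedged sketch of the behavior the pull request targets: a comma-separated path string handed to DataFrameReader.load and resolved for a HadoopFsRelationProvider. The format and paths are placeholders.
```
import org.apache.spark.sql.{DataFrame, SQLContext}

// a single load call with two paths joined by a comma
def readBoth(sqlContext: SQLContext): DataFrame =
  sqlContext.read.format("parquet").load("hdfs:///data/day=01,hdfs:///data/day=02")
```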
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
**[Test build #5 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)**
for PR 13512 at commit
[`077f782`](https://github.com/apache/spark
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
Build finished. Test FAILed.
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
**[Test build #5 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)**
for PR 13512 at commit
[`077f782`](https://github.com/apache/spark
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/5/
Test FAILed
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/13512
[SPARK-15769][SQL] Add Encoder for input type to Aggregator
## What changes were proposed in this pull request?
Aggregator also has an Encoder for the input type
## How
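For context, a minimal Aggregator against the three-type-parameter API (IN, BUF, OUT) as it exists in Spark 2.x; per the PR description, the change is for the Aggregator to also carry an Encoder for its input type, so this example only shows the baseline API being extended.
```
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// simple typed aggregator summing Long values
object SumLongs extends Aggregator[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(buffer: Long, value: Long): Long = buffer + value
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(buffer: Long): Long = buffer
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}
```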
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
can you explain a bit what is inefficient and would need an optimizer rule?
is it mapValues being called twice? once for the key and then for the new
values?
thanks!
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
see this conversation:
https://mail-archives.apache.org/mod_mbox/spark-user/201602.mbox/%3ccaaswr-7kqfmxd_cpr-_wdygafh+rarecm9olm5jkxfk14fc...@mail.gmail.com%3E
mapGroups
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
could we "rewind"/undo the append for the key and change it to a map that
inserts new values and key? so remove one append and replace it with another
operation?
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
the tricky part with that is that (ds: Dataset[(K,
V)]).groupBy(_._1).mapValues(_._2) should return a
KeyValueGroupedDataset[K, V]
On Tue, Jun 7, 2016 at 8:22 PM, Wenchen Fan
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
```
scala> val x = Seq(("a", 1), ("b", 2)).toDS
x: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
scala> x.groupByKey(_._1).ma
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
If Aggregator is designed for typed Dataset only then that is a bit of a
shame, because it's an elegant and generic api that should be useful for
DataFrame too. this causes fragmentation
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/13526
[SPARK-15780][SQL] Support mapValues on KeyValueGroupedDataset
## What changes were proposed in this pull request?
Add mapValues to KeyValueGroupedDataset
## How
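A hedged usage sketch of the method this pull request adds: keep the grouping key but transform the grouped values before aggregating. The data and the spark value are placeholders.
```
import org.apache.spark.sql.SparkSession

def example(spark: SparkSession): Unit = {
  import spark.implicits._
  val grouped = Seq(("a", 1), ("a", 3), ("b", 2)).toDS()
    .groupByKey(_._1)   // KeyValueGroupedDataset[String, (String, Int)]
    .mapValues(_._2)    // KeyValueGroupedDataset[String, Int]
  grouped.reduceGroups(_ + _).show()
}
```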
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13526
ok i will study the physical plans for both and try to understand why one
would be slower
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
well that was sort of what i was trying to achieve. the unit tests i added
were for using Aggregator for untyped grouping (```groupBy```).
and i think for it to be useful within
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/13512
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
for example with this branch you can do:
```
val df3 = Seq(("a", "x", 1), ("a", "y", 3), ("b", "x", 3)).toDF("i"
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13526#discussion_r65972115
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -65,6 +65,44 @@ class KeyValueGroupedDataset[K, V] private
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/13532
[SPARK-15204][SQL] improve nullability inference for Aggregator
## What changes were proposed in this pull request?
TypedAggregateExpression sets nullable based on the schema
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/8416
this patch should not have broken reading files that include a comma.
i also added a unit test for this:
https://github.com/apache/spark/pull/8416/files#diff
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13532#discussion_r65986613
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala
---
@@ -51,7 +52,8 @@ object
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
@cloud-fan from the (added) unit tests:
```
val df2 = Seq("a" -> 1, "a" -> 3, "b" -> 3).toDF("i", "j")
checkAnswer(df2.grou
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
@cloud-fan i am running into some trouble updating my branch to the latest
master. i get errors in tests due to Analyzer.validateTopLevelTupleFields
the issue seems
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13727#discussion_r68672691
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13727#discussion_r68624316
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/13727#discussion_r68645998
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -135,7 +129,7 @@ class DataFrameReader private[sql](sparkSession
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11980#issuecomment-202588020
@cloud-fan i tried to do that, but i don't think i am familiar enough with
the code gen, because it breaks other unit tests. it seems to me i am messing
up
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/11980
SPARK-14139 Dataset loses nullability in operations with RowEncoder
## What changes were proposed in this pull request?
RowEncoder now respects nullability for struct fields when
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11508#issuecomment-202111907
it might seem easiest to put a defaultSize on ObjectType, but i think that
is masking the real problem, which is that the optimizer replaces the real
types
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11508#issuecomment-202136461
would it be possible to have a variation of ObjectType that can take in
info like defaultSize which it takes from the real type?
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193877543
i believe the need to pass all files along (e.g. inputFiles:
Array[FileStatus]) instead of just the input paths came from the need to cache
it so that stuff
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193921325
if it did then it was not always in the apis i think? i remember the apis
having paths: Seq[String] instead of files: Seq[FileStatus]. by explicitly
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/11980#discussion_r58319194
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala
---
@@ -120,17 +120,19 @@ object RowEncoder
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11980#issuecomment-203070056
@cloud-fan i pushed an attempt at this, but i am having trouble with
RowEncoderSuite encode/decode: Product
this test uses a Product value with a StructType
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/11980#discussion_r57829829
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala
---
@@ -120,17 +120,19 @@ object RowEncoder
Github user koertkuipers commented on a diff in the pull request:
https://github.com/apache/spark/pull/11980#discussion_r57829840
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala
---
@@ -680,3 +680,54 @@ case class AssertNotNull(child
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/12359#issuecomment-209535957
great, thanks for this
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/6848#issuecomment-213661475
@holdenk ok i tried to make it look all pretty
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11947#issuecomment-215192562
hello!
why is there no stringNullValue?
basically i want, for a column of type string, to read in all empty strings
as nulls. this is what the old option
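What the comment asks for, sketched with the csv reader's nullValue option as it exists in later releases; whether that option applies to string columns is exactly the question being raised here, and the path and spark value are placeholders.
```
import org.apache.spark.sql.{DataFrame, SparkSession}

// treat empty strings in the input as nulls on read
def readWithNulls(spark: SparkSession): DataFrame =
  spark.read
    .option("header", "true")
    .option("nullValue", "")
    .csv("hdfs:///data/input.csv")
```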
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11947#issuecomment-215194241
do these settings roundtrip correctly? say i set doubleNaNValue to "XY",
and i create a dataframe with a Double.NaN in it, does it get written out
corre
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11947#issuecomment-215196735
i personally would have been happy with a single, simple value for nulls
for all datatypes.
and the usage of that single value should be consistent across
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11947#issuecomment-215979899
please also provide a way for strings to be converted to null upon reading
Github user koertkuipers closed the pull request at:
https://github.com/apache/spark/pull/11980
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/12877#issuecomment-216670925
i made it a lazy val since SparkSession.wrapped is effectively lazy too:
protected[sql] def wrapped: SQLContext = {
if (_wrapped == null
GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/12877
[SPARK-15097][SQL] make Dataset.sqlContext a stable identifier for imports
## What changes were proposed in this pull request?
Make Dataset.sqlContext a lazy val so that it's a stable
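A short illustration of why the stable identifier matters, assuming a placeholder Dataset ds: importing implicits through it only compiles when sqlContext is a val (stable), not a def.
```
import org.apache.spark.sql.Dataset

def useImplicits(ds: Dataset[Long]): Unit = {
  // fails to compile if ds.sqlContext is a def rather than a (lazy) val
  import ds.sqlContext.implicits._
  ds.map(_ + 1).toDF("n").show()
}
```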
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/12877#issuecomment-216675245
if a SparkSession sits inside a Dataset does that mean _wrapped is always
already initialized (because you cannot have a Dataset without a
SparkContext