Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/21596
FYI - I have found nondeterministic flakes with RDDOperationScope under newer
Jackson; you can see the fix at https://github.com/palantir/spark/pull/379. What
happens is that the Jackson object mapper
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/20914
@gatorsmile how does it look now?
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user robert3005 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20914#discussion_r178168714
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala
---
@@ -107,7 +112,7 @@ class
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/20914
`org.apache.spark.sql.execution.streaming.RateSourceV2Suite.basic
microbatch execution` failed which looks like a flake to me
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/20914
[SPARK-23802][SQL] PropagateEmptyRelation can leave query plan in
unresolved state
## What changes were proposed in this pull request?
Add cast to nulls introduced
Github user robert3005 closed the pull request at:
https://github.com/apache/spark/pull/18176
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/18406
@jerryshao sorry I missed your comment. Somehow I didn't get a notification
for it
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/19669
#18406 isn't stale, thanks
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/18406
Yes, the key point is to register dynamic metrics, since enumerating all of
them up front is a lot of hassle and has to be kept in sync with external libraries
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/18621
Fixed in #18689
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user robert3005 closed the pull request at:
https://github.com/apache/spark/pull/18621
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/18716
LGTM, thanks for looking into this.
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/18689
Shouldn't
https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala#L786
also be removed? As I understand it, it checks whether
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/18621
[SPARK-21400][SQL] Don't overwrite output committers on append
## What changes were proposed in this pull request?
Stop ignoring user defined output committers in append mode
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/18406
I don't see how this can be worked out. Let's say I am Parquet and I want
to register my metrics since they're part of application execution. Right now I
have to statically define all metrics
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/18406
This is to facilitate using metrics in libraries that integrate with Spark.
Since Spark already has metrics reporting infrastructure and lets you register
sources with it, this seems like a natural extension
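The idea in these comments can be sketched without Spark's actual Source API
(all names below are illustrative, not Spark's MetricsSystem): instead of a
library enumerating its metrics up front, the registry notifies listeners as
metrics are registered, so reporting sinks discover metrics added later.

```scala
import scala.collection.mutable

// Illustrative sketch, not Spark's actual API: a registry that notifies
// listeners as gauges are added, so sinks pick up library metrics
// dynamically instead of from a static, pre-enumerated list.
class DynamicRegistry {
  private val gauges = mutable.Map.empty[String, () => Long]
  private val listeners = mutable.Buffer.empty[String => Unit]

  def addListener(onAdded: String => Unit): Unit = listeners += onAdded

  def register(name: String, gauge: () => Long): Unit = {
    gauges(name) = gauge
    listeners.foreach(_(name)) // sinks learn of the new metric immediately
  }

  def snapshot: Map[String, Long] =
    gauges.map { case (n, g) => n -> g() }.toMap
}
```

With this shape, a library such as a file-format reader can register a metric
like `parquet.rowsRead` long after the sink attached, and the sink still sees it.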
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/18406
[SPARK-21195] Automatically register new metrics from sources and wire
default registry
## What changes were proposed in this pull request?
Registers metric listeners on sources metrics
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/18176
[SPARK-20952] Make TaskContext an InheritableThreadLocal
## What changes were proposed in this pull request?
Make TaskContext reference an InheritableThreadLocal so thread pools spun up
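The description is truncated, but the mechanism it relies on can be shown in
plain Scala: unlike a `ThreadLocal`, an `InheritableThreadLocal` copies the
parent thread's value into threads spawned from it, which is how a task context
would survive into threads started inside a task. The names below are
illustrative, not Spark's actual fields.

```scala
object InheritableContextDemo {
  // A plain ThreadLocal would return null in the child thread;
  // InheritableThreadLocal copies the parent's value at thread creation.
  private val ctx = new InheritableThreadLocal[String]

  def main(args: Array[String]): Unit = {
    ctx.set("task-42")
    var seen: String = null
    val child = new Thread(() => { seen = ctx.get })
    child.start()
    child.join() // join gives a happens-before edge, so the read is safe
    println(seen) // prints "task-42": the child inherited the parent's value
  }
}
```

One caveat: inheritance happens when a thread is created, so pool threads
created before the value was set will not see it, which limits this approach
for long-lived thread pools.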
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14615
thanks @gatorsmile, updated
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14615
ping? @rxin
Github user robert3005 closed the pull request at:
https://github.com/apache/spark/pull/16575
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/16648
@bdrillard if you don't have time to finish this up I am happy to update
this to latest. I would really like to see this fixed since it's silly that you
can't have more than 3k columns
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14615
It indeed does look like a flake.
Github user robert3005 closed the pull request at:
https://github.com/apache/spark/pull/16963
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/16963
[SPARK-19632] Non hive external catalogs
## What changes were proposed in this pull request?
Open up ExternalCatalog and SessionState in order to allow integrating
other catalogs
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/16575
Removed the caching logic. It was there because I wasn't sure how often we
call inputRDDs and how many times the resulting RDD would get created overall,
since it's a def now
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/16575
This was posted mostly to get comments on the expected behaviour. What's
unclear is whether a Dataset can be shared across SparkSessions and, if so,
what the semantics and behaviour are
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/16575
[SPARK-19213] DatasourceScanExec uses runtime sparksession
## What changes were proposed in this pull request?
The physical plan for a Hadoop FS relation uses the active session at the moment
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14615
@rxin any chance you or someone else can take a look?
Github user robert3005 closed the pull request at:
https://github.com/apache/spark/pull/15033
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/15033
[SPARK-17478] create event log dir if it does not exist
## What changes were proposed in this pull request?
Create spark.eventLog.dir if it does not exist
## How was this patch
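The change itself is conceptually tiny; an illustrative local-filesystem
equivalent is below. Spark would resolve the directory through Hadoop's
FileSystem API so HDFS or S3 paths also work, and the path used here is made up
for the sketch.

```scala
import java.nio.file.{Files, Paths}

object EnsureEventLogDir {
  def main(args: Array[String]): Unit = {
    // Illustrative stand-in for the spark.eventLog.dir setting; real Spark
    // goes through org.apache.hadoop.fs.FileSystem rather than java.nio.
    val eventLogDir = Paths.get(sys.props("java.io.tmpdir"), "spark-events-demo")
    if (!Files.exists(eventLogDir)) {
      Files.createDirectories(eventLogDir) // creates parent dirs as needed
    }
    println(Files.isDirectory(eventLogDir)) // prints "true"
  }
}
```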
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14573
Agreed, it would be subsumed, and it looks pretty cool. I didn't know you
could make it asynchronous. Also, you want to avoid spinning up too many tasks,
since these consume resources and block other jobs
Github user robert3005 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14573#discussion_r77336634
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -116,6 +116,14 @@ object SQLConf {
.longConf
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14900
Have you seen #14791? It should fix the biggest offender, but a full clean-up
is definitely useful
Github user robert3005 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14573#discussion_r76245512
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1296,6 +1296,7 @@ abstract class RDD[T: ClassTag](
* an exception if called
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14573
@hvanhovell made all the suggested changes. I initially misunderstood what
getByteArrayRdd does. Should be good now
Github user robert3005 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14573#discussion_r76230260
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1296,6 +1296,7 @@ abstract class RDD[T: ClassTag](
* an exception if called
Github user robert3005 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14573#discussion_r76229774
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -311,30 +311,32 @@ abstract class SparkPlan extends QueryPlan
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14791
One more screenshot with more values from the UI.
![screen shot 2016-08-24 at 4 09 41 pm](https://cloud.githubusercontent.com/assets/512084/17936025/4ef2c4c4-6a15-11e6-9776-fba181f7d3af.png)
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/14791
[SPARK-17216][UI] fix event timeline bars
## What changes were proposed in this pull request?
Make event timeline bar expand to full length of the bar (which is total
time
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14573
Ping, anything else?
Github user robert3005 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14733#discussion_r75597217
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala
---
@@ -125,12 +129,37 @@ case class
Github user robert3005 commented on the issue:
https://github.com/apache/spark/pull/14615
@rxin anything else? I added docs to the best of my understanding; let me
know if you meant something else.
Github user robert3005 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14615#discussion_r74789001
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2494,16 +2494,18 @@ class Dataset[T] private[sql](
* @since 2.0.0
Github user robert3005 commented on a diff in the pull request:
https://github.com/apache/spark/pull/14615#discussion_r74788858
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
---
@@ -84,7 +84,7 @@ class JsonFileFormat
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/14615
Make toJSON not go through RDD form but operate on the Dataset always
## What changes were proposed in this pull request?
Don't convert toRdd when doing toJSON
## How
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/14573
[SPARK-16984][SQL] don't try whole dataset immediately when first partition
doesn't have…
## What changes were proposed in this pull request?
Try increasing the number of partitions to try
Github user robert3005 commented on a diff in the pull request:
https://github.com/apache/spark/pull/10210#discussion_r48899006
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala ---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache
Github user robert3005 commented on the pull request:
https://github.com/apache/spark/pull/10210#issuecomment-163446091
jenkins retest
Github user robert3005 commented on the pull request:
https://github.com/apache/spark/pull/10210#issuecomment-163371356
Thanks for pointers. I will try that locally. Sorry for all the noise.
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/10210
[SPARK-9843] Make catalyst optimizer pass pluggable at runtime
Let me know whether you'd like to see it in another place
You can merge this pull request into a Git repository by running
GitHub user robert3005 opened a pull request:
https://github.com/apache/spark/pull/8146
[SPARK-9843] allow pluggable optimizers
This is to allow adding optimization passes that might be valid for a
specific application.
You can merge this pull request into a Git repository
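The shape of a runtime-pluggable optimizer can be sketched generically. This
is not Catalyst's actual API (the `Plan` and `Rule` types below are
illustrative); the idea is simply that rules are plan-to-plan functions
appended at runtime and folded over the plan.

```scala
object PluggableOptimizer {
  // Illustrative types, standing in for Catalyst's LogicalPlan and Rule.
  final case class Plan(ops: List[String])
  type Rule = Plan => Plan

  // Extra passes registered by the application at runtime.
  var extraRules: Seq[Rule] = Seq.empty

  // Apply each registered rule in order to the incoming plan.
  def optimize(plan: Plan): Plan =
    extraRules.foldLeft(plan)((p, rule) => rule(p))

  def main(args: Array[String]): Unit = {
    // Register an application-specific pass that drops no-op projections.
    extraRules :+= ((p: Plan) => Plan(p.ops.filterNot(_ == "NoopProject")))
    println(optimize(Plan(List("Scan", "NoopProject", "Filter"))))
  }
}
```

The same idea later surfaced in Spark itself as experimental extra
optimizations that can be appended to the Catalyst optimizer at runtime.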
Github user robert3005 closed the pull request at:
https://github.com/apache/spark/pull/8146