[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-09-10 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/21596 FYI - I have found nondeterministic flakes with RDDOperationScope in newer Jackson; you can see the fix at https://github.com/palantir/spark/pull/379. What happens is that the Jackson object mapper

[GitHub] spark issue #20914: [SPARK-23802][SQL] PropagateEmptyRelation can leave quer...

2018-04-03 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/20914 @gatorsmile how does it look now? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #20914: [SPARK-23802][SQL] PropagateEmptyRelation can lea...

2018-03-29 Thread robert3005
Github user robert3005 commented on a diff in the pull request: https://github.com/apache/spark/pull/20914#discussion_r178168714 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala --- @@ -107,7 +112,7 @@ class

[GitHub] spark issue #20914: [SPARK-23802][SQL] PropagateEmptyRelation can leave quer...

2018-03-27 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/20914 `org.apache.spark.sql.execution.streaming.RateSourceV2Suite.basic microbatch execution` failed, which looks like a flake to me

[GitHub] spark pull request #20914: [SPARK-23802][SQL] PropagateEmptyRelation can lea...

2018-03-27 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/20914 [SPARK-23802][SQL] PropagateEmptyRelation can leave query plan in unresolved state ## What changes were proposed in this pull request? Add cast to nulls introduced

[GitHub] spark pull request #18176: [SPARK-20952] ParquetFileFormat should forward Ta...

2018-02-15 Thread robert3005
Github user robert3005 closed the pull request at: https://github.com/apache/spark/pull/18176

[GitHub] spark issue #18406: [SPARK-21195] Automatically register new metrics from so...

2017-11-06 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/18406 @jerryshao sorry I missed your comment. Somehow I didn't get a notification for it

[GitHub] spark issue #19669: [BUILD] Close stale PRs

2017-11-06 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/19669 #18406 isn't stale, thanks

[GitHub] spark issue #18406: [SPARK-21195] Automatically register new metrics from so...

2017-11-06 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/18406 Yes, the key point is to register dynamic metrics since enumerating all of them can be a lot of hassle and needs to be kept in sync with external libraries

[GitHub] spark issue #18621: [SPARK-21400][SQL] Don't overwrite output committers on ...

2017-07-26 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/18621 Fixed in #18689 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #18621: [SPARK-21400][SQL] Don't overwrite output committ...

2017-07-26 Thread robert3005
Github user robert3005 closed the pull request at: https://github.com/apache/spark/pull/18621

[GitHub] spark issue #18716: [SPARK-10063] Follow-up: remove a useless test related t...

2017-07-23 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/18716 LGTM, Thanks for looking into this.

[GitHub] spark issue #18689: [SPARK-10063] Follow-up: remove dead code related to an ...

2017-07-22 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/18689 Shouldn't https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala#L786 also be removed? As I understand it, it checks whether

[GitHub] spark pull request #18621: [SPARK-21400][SQL] Don't overwrite output committ...

2017-07-13 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/18621 [SPARK-21400][SQL] Don't overwrite output committers on append ## What changes were proposed in this pull request? Stop ignoring user defined output committers in append mode

[GitHub] spark issue #18406: [SPARK-21195] Automatically register new metrics from so...

2017-06-27 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/18406 I don't see how this can be worked out. Let's say I am parquet and I want to register my metrics since they're part of application execution. Right now I have to statically define all metrics

[GitHub] spark issue #18406: [SPARK-21195] Automatically register new metrics from so...

2017-06-27 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/18406 This is to facilitate using metrics in libraries that integrate with Spark. Since Spark already has metric reporting infrastructure and lets you register sources with it, it seems a natural extension

[GitHub] spark pull request #18406: [SPARK-21195] Automatically register new metrics ...

2017-06-23 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/18406 [SPARK-21195] Automatically register new metrics from sources and wire default registry ## What changes were proposed in this pull request? Registers metric listeners on sources metrics

[GitHub] spark pull request #18176: [SPARK-20952] Make TaskContext an InheritableThea...

2017-06-01 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/18176 [SPARK-20952] Make TaskContext an InheritableTheadLocal ## What changes were proposed in this pull request? Make TaskContext reference an InheritableTheadLocal so thread pools spun up
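
For readers unfamiliar with the JDK class named in the pull request above, here is a minimal sketch of how `java.lang.InheritableThreadLocal` differs from a plain `ThreadLocal`. It is not taken from the pull request and does not use Spark's `TaskContext`; the class name `InheritableContextDemo`, the two holder fields, and the string value are hypothetical stand-ins chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

public class InheritableContextDemo {
    // Hypothetical per-thread holders; stand-ins for a task context, not Spark code.
    static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
    static final InheritableThreadLocal<String> INHERITED = new InheritableThreadLocal<>();

    /**
     * Sets both holders on the calling thread, starts a new child thread,
     * and returns what the child observed: {plainValue, inheritedValue}.
     */
    static String[] valuesSeenByChild(String value) {
        PLAIN.set(value);
        INHERITED.set(value);
        AtomicReference<String> plainSeen = new AtomicReference<>();
        AtomicReference<String> inheritedSeen = new AtomicReference<>();
        // The child is created after the values are set, so it receives a copy
        // of the inheritable value at construction time.
        Thread child = new Thread(() -> {
            plainSeen.set(PLAIN.get());
            inheritedSeen.set(INHERITED.get());
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Clean up so repeated calls on the same thread start fresh.
            PLAIN.remove();
            INHERITED.remove();
        }
        return new String[] { plainSeen.get(), inheritedSeen.get() };
    }

    public static void main(String[] args) {
        String[] seen = valuesSeenByChild("task-1");
        System.out.println("plain=" + seen[0] + " inherited=" + seen[1]);
    }
}
```

Note that the inheritable value is copied when the child thread is constructed, so threads that already exist in a pool before the value is set would not see it.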

[GitHub] spark issue #14615: [SPARK-17029] make toJSON not go through rdd form but op...

2017-05-10 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14615 thanks @gatorsmile, updated

[GitHub] spark issue #14615: [SPARK-17029] make toJSON not go through rdd form but op...

2017-05-10 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14615 ping? @rxin

[GitHub] spark pull request #16575: [SPARK-19213] DatasourceScanExec uses runtime spa...

2017-05-10 Thread robert3005
Github user robert3005 closed the pull request at: https://github.com/apache/spark/pull/16575

[GitHub] spark issue #16648: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-04-27 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/16648 @bdrillard if you don't have time to finish this up I am happy to update this to latest. I would really like to see this fixed since it's silly that you can't have more than 3k columns

[GitHub] spark issue #14615: [SPARK-17029] make toJSON not go through rdd form but op...

2017-03-17 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14615 It indeed does look like a flake.

[GitHub] spark pull request #16963: [SPARK-19632] Non hive external catalogs

2017-02-16 Thread robert3005
Github user robert3005 closed the pull request at: https://github.com/apache/spark/pull/16963

[GitHub] spark pull request #16963: [SPARK-19632] Non hive external catalogs

2017-02-16 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/16963 [SPARK-19632] Non hive external catalogs ## What changes were proposed in this pull request? Open up ExternalCatalog and SessionState in order to allow integrating other catalogs

[GitHub] spark issue #16575: [SPARK-19213] DatasourceScanExec uses runtime sparksessi...

2017-02-09 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/16575 Removed the caching logic. It was there since I wasn't sure how often we call inputRDDs and how many times the resulting rdd would get created overall since it's a def now

[GitHub] spark issue #16575: [SPARK-19213] DatasourceScanExec uses runtime sparksessi...

2017-01-13 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/16575 This was posted mostly to get comments on what's the expected behaviour. What's unclear is whether dataset can be shared across sparksessions and if so what are the semantics and behaviour

[GitHub] spark pull request #16575: [SPARK-19213] DatasourceScanExec uses runtime spa...

2017-01-13 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/16575 [SPARK-19213] DatasourceScanExec uses runtime sparksession ## What changes were proposed in this pull request? Physical plan for hadoop fs relation uses active session at the moment

[GitHub] spark issue #14615: [SPARK-17029] make toJSON not go through rdd form but op...

2016-10-20 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14615 @rxin any chance you or someone else can take a look?

[GitHub] spark pull request #15033: [SPARK-17478] create event log dir if it does not...

2016-09-09 Thread robert3005
Github user robert3005 closed the pull request at: https://github.com/apache/spark/pull/15033

[GitHub] spark pull request #15033: [SPARK-17478] create event log dir if it does not...

2016-09-09 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/15033 [SPARK-17478] create event log dir if it does not exist ## What changes were proposed in this pull request? Create spark.eventLog.dir if it does not exist ## How was this patch
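
The "create the directory if it does not exist" behaviour described in the pull request above can be sketched as follows. This is only an illustration of the idea on a local filesystem using `java.nio.file`; it is an assumption that this resembles the intent, and it is not the pull request's code (which is not shown in this listing). The class name `EnsureLogDir`, the method `ensureDirectory`, and the `spark-events` path are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EnsureLogDir {
    /**
     * Creates the directory and any missing parents. Calling it again on an
     * existing directory is a no-op; it fails if the path exists but is a file.
     */
    static Path ensureDirectory(Path dir) {
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location standing in for a configured event log directory.
        Path base = Files.createTempDirectory("eventlog-demo");
        Path logDir = base.resolve("spark-events");
        ensureDirectory(logDir);
        ensureDirectory(logDir); // second call succeeds without error
        System.out.println(Files.isDirectory(logDir));
    }
}
```

Using an idempotent create call avoids a separate "exists" check followed by "mkdir", which could race with another process creating the same directory.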

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-09-02 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14573 Agree it would be subsumed and it looks pretty cool. I didn't know you can make it asynchronous. Also, you want to avoid spinning up too many tasks, since these consume resources and block other jobs

[GitHub] spark pull request #14573: [SPARK-16984][SQL] don't try whole dataset immedi...

2016-09-02 Thread robert3005
Github user robert3005 commented on a diff in the pull request: https://github.com/apache/spark/pull/14573#discussion_r77336634 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -116,6 +116,14 @@ object SQLConf { .longConf

[GitHub] spark issue #14900: [WEBUI] Style of event timeline is broken

2016-08-31 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14900 Have you seen #14791? It should fix the biggest offender, but a full cleanup is definitely useful

[GitHub] spark pull request #14573: [SPARK-16984][SQL] don't try whole dataset immedi...

2016-08-25 Thread robert3005
Github user robert3005 commented on a diff in the pull request: https://github.com/apache/spark/pull/14573#discussion_r76245512 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1296,6 +1296,7 @@ abstract class RDD[T: ClassTag]( * an exception if called

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-08-25 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14573 @hvanhovell made all the suggested changes. I initially misunderstood what getByteArrayRdd does. Should be good now

[GitHub] spark pull request #14573: [SPARK-16984][SQL] don't try whole dataset immedi...

2016-08-25 Thread robert3005
Github user robert3005 commented on a diff in the pull request: https://github.com/apache/spark/pull/14573#discussion_r76230260 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1296,6 +1296,7 @@ abstract class RDD[T: ClassTag]( * an exception if called

[GitHub] spark pull request #14573: [SPARK-16984][SQL] don't try whole dataset immedi...

2016-08-25 Thread robert3005
Github user robert3005 commented on a diff in the pull request: https://github.com/apache/spark/pull/14573#discussion_r76229774 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -311,30 +311,32 @@ abstract class SparkPlan extends QueryPlan

[GitHub] spark issue #14791: [SPARK-17216][UI] fix event timeline bars length

2016-08-24 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14791 One more screenshot with more values from the ui. ![screen shot 2016-08-24 at 4 09 41 pm](https://cloud.githubusercontent.com/assets/512084/17936025/4ef2c4c4-6a15-11e6-9776-fba181f7d3af.png

[GitHub] spark pull request #14791: [SPARK-17216][UI] fix event timeline bars

2016-08-24 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/14791 [SPARK-17216][UI] fix event timeline bars ## What changes were proposed in this pull request? Make event timeline bar expand to full length of the bar (which is total time

[GitHub] spark issue #14573: [SPARK-16984][SQL] don't try whole dataset immediately w...

2016-08-23 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14573 Ping, anything else?

[GitHub] spark pull request #14733: [SPARK-17170] [SQL] InMemoryTableScanExec driver-...

2016-08-21 Thread robert3005
Github user robert3005 commented on a diff in the pull request: https://github.com/apache/spark/pull/14733#discussion_r75597217 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -125,12 +129,37 @@ case class

[GitHub] spark issue #14615: [SPARK-17029] make toJSON not go through rdd form but op...

2016-08-16 Thread robert3005
Github user robert3005 commented on the issue: https://github.com/apache/spark/pull/14615 @rxin anything else? I added docs to the best of my understanding; let me know if you meant something else.

[GitHub] spark pull request #14615: [SPARK-17029] make toJSON not go through rdd form...

2016-08-15 Thread robert3005
Github user robert3005 commented on a diff in the pull request: https://github.com/apache/spark/pull/14615#discussion_r74789001 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2494,16 +2494,18 @@ class Dataset[T] private[sql]( * @since 2.0.0

[GitHub] spark pull request #14615: [SPARK-17029] make toJSON not go through rdd form...

2016-08-15 Thread robert3005
Github user robert3005 commented on a diff in the pull request: https://github.com/apache/spark/pull/14615#discussion_r74788858 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala --- @@ -84,7 +84,7 @@ class JsonFileFormat

[GitHub] spark pull request #14615: make toJSON not go through rdd form but operate o...

2016-08-11 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/14615 make toJSON not go through rdd form but operate on dataset always ## What changes were proposed in this pull request? Don't convert toRdd when doing toJSON ## How

[GitHub] spark pull request #14573: [SPARK-16984][SQL] don't try whole dataset immedi...

2016-08-09 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/14573 [SPARK-16984][SQL] don't try whole dataset immediately when first partition doesn't have… ## What changes were proposed in this pull request? Try increase number of partitions to try

[GitHub] spark pull request: [SPARK-9843][SQL] Make catalyst optimizer pass...

2016-01-05 Thread robert3005
Github user robert3005 commented on a diff in the pull request: https://github.com/apache/spark/pull/10210#discussion_r48899006 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala --- @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-9843] Make catalyst optimizer pass plug...

2015-12-09 Thread robert3005
Github user robert3005 commented on the pull request: https://github.com/apache/spark/pull/10210#issuecomment-163446091 jenkins retest

[GitHub] spark pull request: [SPARK-9843] Make catalyst optimizer pass plug...

2015-12-09 Thread robert3005
Github user robert3005 commented on the pull request: https://github.com/apache/spark/pull/10210#issuecomment-163371356 Thanks for pointers. I will try that locally. Sorry for all the noise.

[GitHub] spark pull request: [SPARK-9843] Make catalyst optimizer pass plug...

2015-12-08 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/10210 [SPARK-9843] Make catalyst optimizer pass pluggable at runtime Let me know whether you'd like to see it in other place You can merge this pull request into a Git repository by running

[GitHub] spark pull request: [SPARK-9843] allow pluggable optimizers

2015-08-12 Thread robert3005
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/8146 [SPARK-9843] allow pluggable optimizers This is to allow adding optimization passes that might be valid for specific application.

[GitHub] spark pull request: [SPARK-9843][SQL] allow pluggable optimizers

2015-08-12 Thread robert3005
Github user robert3005 closed the pull request at: https://github.com/apache/spark/pull/8146