[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore driver accumulator updates don'...

2017-03-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17009 I created a Spark 2.1 backport at #17418. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #17418: [SPARK-19674][SQL] Ignore driver accumulator upda...

2017-03-24 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/17418 [SPARK-19674][SQL] Ignore driver accumulator updates don't belong to the execution when merging all accumulator updates

[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore driver accumulator updates don'...

2017-03-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17009 It looks like this will fix a bug we're experiencing in Spark 2.1. Given that this PR is a bug fix, any chance we can get a backport into `branch-2.1`? I can work on it myself if @carsonwang

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-03-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Rebased to latest master.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-03-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I'm still working actively on this PR (as I have time), but I wanted to share that I will be away and unavailable from tonight, March 24th, until Tuesday, April 11th. If you post a comment

[GitHub] spark pull request #17390: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-03-24 Thread mallman
Github user mallman closed the pull request at: https://github.com/apache/spark/pull/17390

[GitHub] spark issue #16499: [SPARK-17204][CORE] Fix replicated off heap storage

2017-03-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16499 Backport PR is #17390.

[GitHub] spark issue #17390: [SPARK-17204][CORE] Fix replicated off heap storage

2017-03-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17390 This is a backport of #16499 to branch-2.0.

[GitHub] spark pull request #17390: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-03-22 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/17390 [SPARK-17204][CORE] Fix replicated off heap storage (Jira: https://issues.apache.org/jira/browse/SPARK-17204) There are a couple of bugs in the `BlockManager` with respect to support

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-03-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 @felixcheung We haven't heard from @jkbradley or @ankurdave in a week. Should we give them more time or can we merge to master?

[GitHub] spark issue #16499: [SPARK-17204][CORE] Fix replicated off heap storage

2017-03-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16499 > @mallman can you send a new PR for 2.0? thanks! Will do. Do I need to open a new JIRA ticket for that?

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-03-20 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r107028767 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1048,7 +1065,7 @@ private[spark] class BlockManager( try

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-03-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @viirya A month has gone by since my last update. I've added much more comprehensive coverage to the `SelectedFieldSuite`; however, I haven't yet fixed the `SelectedField` extractor to pass all

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-03-16 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r106552361 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1048,7 +1065,7 @@ private[spark] class BlockManager( try

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-03-07 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r104770778 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1048,7 +1065,7 @@ private[spark] class BlockManager( try

[GitHub] spark issue #16499: [SPARK-17204][CORE] Fix replicated off heap storage

2017-03-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16499 I looked into simply cleaning up the `StorageUtils.dispose` method to only dispose memory-mapped buffers. However, I did find legitimate uses of that method to dispose of direct/non-memory-mapped

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-03-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 @felixcheung Can you take another look and merge if LGTY? I think we've addressed all of the open reviewer requests.

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-28 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r103602156 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1018,7 +1025,9 @@ private[spark] class BlockManager( try

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-28 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r103509816 --- Diff: docs/graphx-programming-guide.md --- @@ -708,7 +708,9 @@ messages remaining. > messaging function. These constraints allow additio

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-28 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r103509038 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1018,7 +1025,9 @@ private[spark] class BlockManager( try

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r102293681 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -317,6 +317,9 @@ private[spark] class BlockManager

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r102293219 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -843,7 +852,15 @@ private[spark] class BlockManager

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r102292537 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -122,27 +125,39 @@ object Pregel extends Logging { require

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r102290972 --- Diff: docs/graphx-programming-guide.md --- @@ -708,7 +708,9 @@ messages remaining. > messaging function. These constraints allow additio

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r102272981 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -813,7 +813,14 @@ private[spark] class BlockManager

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r102271763 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1018,7 +1025,9 @@ private[spark] class BlockManager( try

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 @dding3, thank you for your continued patience and dedication to this PR, despite the continued change requests. We are getting closer to a merge.

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-20 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r102057156 --- Diff: docs/graphx-programming-guide.md --- @@ -708,7 +708,9 @@ messages remaining. > messaging function. These constraints allow additio

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-20 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r102056438 --- Diff: docs/graphx-programming-guide.md --- @@ -708,7 +708,9 @@ messages remaining. > messaging function. These constraints allow additio

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-20 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r102053462 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -154,7 +169,9 @@ object Pregel extends Logging { // count

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-17 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r101819321 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/util/PeriodicGraphCheckpointer.scala --- @@ -87,10 +87,10 @@ private[mllib] class

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-17 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r101818789 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala --- @@ -362,12 +362,14 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-17 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r101809872 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1018,7 +1025,9 @@ private[spark] class BlockManager( try

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-17 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r101809099 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -813,7 +813,14 @@ private[spark] class BlockManager

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-16 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r101675669 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1018,7 +1025,9 @@ private[spark] class BlockManager( try

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-16 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r101675576 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -813,7 +813,14 @@ private[spark] class BlockManager

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 LGTM. @felixcheung are we good to merge?

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 @dding3 I submitted a PR against your `cp2_pregel` branch. If you merge that PR into your branch, it will be reflected in this PR. This is my PR: https://github.com/dding3/spark/pull/1

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 > I think @mallman is saying he would merge changes to @dding3 branch Yes, or I could do them in a follow up PR. Or @dding3 could do them without my PR. I'm not hung up on getting cre

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-16 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r101602604 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -813,7 +813,14 @@ private[spark] class BlockManager

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-16 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r101602014 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -813,7 +813,14 @@ private[spark] class BlockManager

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-02-16 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r101592331 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1018,7 +1025,9 @@ private[spark] class BlockManager( try

[GitHub] spark issue #16942: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-02-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16942 Weird. I think I've seen that behavior once before. But I think the only time I force push on a PR is to rebase. Maybe that's the only kind of force push allowed for GitHub PRs.

[GitHub] spark issue #16942: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-02-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16942 Force pushing your branch shouldn't close the PR. You didn't close it manually?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-02-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @viirya I've added a commit to address some of your feedback. I will have another commit to address the others, but I'm not sure when I'll have it in. Hopefully by the end of next week

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 Our connected components computation completed successfully, with performance as expected. I've created a PR against @dding3's PR branch to incorporate a couple of simple things. Then I think we're

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 @dding3 These latest changes look great. I'll run our big connected components job today and report back.

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 @viirya @dding3 I'm going to rerun our big connected components computation with the changes I've suggested to validate that it still performs and completes as expected. Given the time required

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r100641170 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/util/PeriodicGraphCheckpointer.scala --- @@ -87,10 +88,7 @@ private[mllib] class

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r100640256 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -123,16 +127,25 @@ object Pregel extends Logging { s" bu

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r100638292 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -123,16 +127,25 @@ object Pregel extends Logging { s" bu

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r100638130 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/util/PeriodicGraphCheckpointer.scala --- @@ -76,7 +77,7 @@ import

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r100632148 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/impl/PeriodicRDDCheckpointerSuite.scala --- @@ -23,7 +23,7 @@ import org.apache.spark.{SparkContext

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r100631975 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/impl/PeriodicGraphCheckpointerSuite.scala --- @@ -21,6 +21,7 @@ import org.apache.hadoop.fs.Path

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r100612840 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -123,16 +127,25 @@ object Pregel extends Logging { s" bu

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r100609529 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -123,16 +127,25 @@ object Pregel extends Logging { s" bu

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r100608839 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -123,16 +127,25 @@ object Pregel extends Logging { s" bu

[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...

2017-02-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16797 BTW @budde, given that this represents a regression in behavior from previous versions of Spark, I think it is too generous of you to label the Jira issue as an "improvement" instead of

[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...

2017-02-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16797 >> Like you said, users can still create a hive table with mixed-case-schema parquet/orc files, by hive or other systems like presto. This table is readable for hive, and for Spark prior
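The case-sensitivity problem discussed in #16797 can arise when a table's metastore schema is lower-cased while the underlying Parquet files use mixed-case field names, so a case-sensitive lookup finds no matching column. A minimal Python sketch of case-insensitive resolution follows. `resolve_case_insensitive` is a hypothetical helper, not a Spark or Parquet API, and it assumes that unmatched columns are read as nulls and that a name matching more than one file field should be rejected as ambiguous.

```python
from typing import Dict, Iterable, List, Optional


def resolve_case_insensitive(requested: Iterable[str],
                             file_fields: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each requested column to the file field whose name matches ignoring case.

    Hypothetical helper: unmatched columns map to None (read as nulls), and a
    column matching several file fields case-insensitively raises ValueError.
    """
    # Group physical field names by their lower-cased form.
    by_lower: Dict[str, List[str]] = {}
    for name in file_fields:
        by_lower.setdefault(name.lower(), []).append(name)

    resolved: Dict[str, Optional[str]] = {}
    for column in requested:
        matches = by_lower.get(column.lower(), [])
        if len(matches) > 1:
            raise ValueError(f"ambiguous column {column!r}: {matches}")
        resolved[column] = matches[0] if matches else None
    return resolved


# Lower-cased metastore columns resolved against mixed-case Parquet field names.
mapping = resolve_case_insensitive(["userid", "eventtime", "country"], ["userId", "eventTime"])
# mapping == {"userid": "userId", "eventtime": "eventTime", "country": None}
```

Rejecting ambiguous matches, rather than silently picking one, is a design choice of this sketch: a file containing both `ID` and `Id` cannot be mapped unambiguously onto a single lower-cased `id` column.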

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16775 @viirya I believe this PR meshes with the refactoring and application to Pregel GraphX algorithms in #15125. Basically, it moves the periodic checkpointing code from mllib into core and uses
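The periodic checkpointing that #16775 and #15125 build on can be illustrated with a small, language-neutral sketch. The Python below is not Spark's `PeriodicGraphCheckpointer` or any other Spark API; the `PeriodicCheckpointer` class, its `interval` parameter, and the `checkpoint_fn`/`remove_fn` callbacks are hypothetical stand-ins. The idea it shows: checkpoint every `interval`-th iteration so the lineage an iterative job (such as a Pregel computation) would have to recompute after a failure stays bounded, and discard older checkpoints once a newer one exists.

```python
from typing import Any, Callable, List, Optional


class PeriodicCheckpointer:
    """Hypothetical sketch: checkpoint every `interval`-th update and keep only the newest."""

    def __init__(self, interval: int,
                 checkpoint_fn: Callable[[Any], str],
                 remove_fn: Callable[[str], None]) -> None:
        self.interval = interval            # interval <= 0 disables checkpointing in this sketch
        self.checkpoint_fn = checkpoint_fn  # persists the data and returns a checkpoint id
        self.remove_fn = remove_fn          # deletes a checkpoint that is no longer needed
        self.update_count = 0
        self.checkpoints: List[str] = []

    def update(self, data: Any) -> Optional[str]:
        """Register one iteration's output; return the new checkpoint id, if any."""
        self.update_count += 1
        if self.interval <= 0 or self.update_count % self.interval != 0:
            return None
        new_id = self.checkpoint_fn(data)
        self.checkpoints.append(new_id)
        # Once a newer checkpoint exists, older ones are no longer needed for recovery.
        while len(self.checkpoints) > 1:
            self.remove_fn(self.checkpoints.pop(0))
        return new_id


# Usage: simulate 7 iterations with a checkpoint every 3rd one.
written: List[int] = []
removed: List[str] = []


def fake_checkpoint(data: int) -> str:
    written.append(data)
    return f"ckpt-{data}"


cp = PeriodicCheckpointer(interval=3, checkpoint_fn=fake_checkpoint, remove_fn=removed.append)
for iteration in range(1, 8):
    cp.update(iteration)
# Checkpoints are written at iterations 3 and 6; "ckpt-3" is removed once "ckpt-6" exists.
```

Deleting the older checkpoint only after the newer one has been written means there is always at least one checkpoint to recover from.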

[GitHub] spark pull request #16785: [SPARK-19443][SQL] The function to generate const...

2017-02-09 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16785#discussion_r100364260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -314,19 +322,29 @@ abstract class UnaryNode

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-02-09 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r100360523 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-02-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r100229358 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-02-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r100229300 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed

[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...

2017-02-06 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16797 The proposal to restore schema inference with finer grained control on when it is performed sounds reasonable to me. The case I'm most interested in is turning off schema inference entirely

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-02-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r99174674 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-02-01 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r98920657 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-01-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r98819150 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed

[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16751 FYI, there are at least two workarounds in the Spark codebase which can potentially be removed as a consequence of this upgrade. For example: https://github.com/apache/spark/blob

[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0

2017-01-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16281 FYI, we've been using 1.9.0 patched with a fix for https://issues.apache.org/jira/browse/PARQUET-783 without problems.

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-01-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 LGTM. @srowen, can you recommend an mllib committer to review these changes? I'm not familiar with that team.

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-01-17 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 Hi @dding3. Thanks for working on this! I was able to rebase and apply your patch to our build of Spark 2.1 to successfully compute the connected components of a graph with 5.2 billion vertices

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-01-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > Does this take over #14957? If so, we might need Closes #14957 in the PR description for the merge script to close that one or let the author know this takes over that. I don't k

[GitHub] spark issue #16499: [SPARK-17204][CORE] Fix replicated off heap storage

2017-01-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16499 Josh, can you take a look at this when you have a chance?

[GitHub] spark issue #16499: [SPARK-17204][CORE] Fix replicated off heap storage

2017-01-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16499 @rxin, can you recommend someone I can reach out to for help reviewing this PR?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-01-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 cc @rxin @ericl @cloud-fan @marmbrus I would love to get your feedback on this if you have the time.

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-01-13 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/16578 [SPARK-4502][SQL] Parquet nested column pruning (Link to Jira: https://issues.apache.org/jira/browse/SPARK-4502) ## What changes were proposed in this pull request? One
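Nested column pruning, as proposed in #16578, means requesting only the nested fields a query actually touches instead of whole top-level struct columns. The following Python sketch illustrates that idea under simplifying assumptions: schemas are modeled as nested dicts (a leaf name maps to a type name, a struct name maps to a sub-dict), and selections are dotted paths such as `name.first`. The `prune_schema` function is hypothetical and is neither the `SelectedField` extractor nor any Spark API.

```python
import copy
from typing import Any, Dict, Iterable


def prune_schema(schema: Dict[str, Any], paths: Iterable[str]) -> Dict[str, Any]:
    """Return a schema containing only the fields named by the dotted `paths`.

    Hypothetical model: leaf fields map to a type name (str) and struct fields
    map to a nested dict. Selecting a whole struct keeps all of its children.
    """
    pruned: Dict[str, Any] = {}
    for path in paths:
        src, dst = schema, pruned
        *parents, leaf = path.split(".")
        for part in parents:
            child = src[part]  # raises KeyError for unknown fields
            if not isinstance(child, dict):
                raise ValueError(f"{part!r} in {path!r} is not a struct")
            src = child
            dst = dst.setdefault(part, {})
        # Copy so the pruned schema never aliases the original one.
        dst[leaf] = copy.deepcopy(src[leaf])
    return pruned


schema = {
    "id": "long",
    "name": {"first": "string", "last": "string"},
    "address": {"city": "string", "zip": "string"},
}
only_first_name = prune_schema(schema, ["id", "name.first"])
# only_first_name == {"id": "long", "name": {"first": "string"}}
```

In this sketch the field order of the result follows the order of `paths`; a real implementation would also need to respect the file schema's field order and handle arrays and maps, which are ignored here.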

[GitHub] spark pull request #16514: [SPARK-19128] [SQL] Refresh Cache after Set Locat...

2017-01-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16514#discussion_r95490619 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -555,6 +557,61 @@ class HiveDDLSuite

[GitHub] spark pull request #16514: [SPARK-19128] [SQL] Refresh Cache after Set Locat...

2017-01-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16514#discussion_r95489960 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -119,7 +119,30 @@ private[hive] class HiveMetastoreCatalog

[GitHub] spark issue #16514: [SPARK-19128] [SQL] Refresh Cache after Set Location

2017-01-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16514 > A good suggestion. Will do the code changes tomorrow. Thanks! I look forward to seeing this. Thanks for taking this on.

[GitHub] spark pull request #16500: [SPARK-19120] [SPARK-19121] Refresh Metadata Cach...

2017-01-09 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16500#discussion_r95206030 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -392,7 +392,9 @@ case class InsertIntoHiveTable

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-01-07 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r95066452 --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala --- @@ -375,7 +375,8 @@ class BlockManagerReplicationSuite

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-01-07 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r95066296 --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala --- @@ -387,12 +388,23 @@ class BlockManagerReplicationSuite

[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...

2017-01-07 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/16499 [SPARK-17204][CORE] Fix replicated off heap storage (Jira: https://issues.apache.org/jira/browse/SPARK-17204) ## What changes were proposed in this pull request

[GitHub] spark issue #15480: [SPARK-16845][SQL] `GeneratedClass$SpecificOrdering` gro...

2017-01-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15480 Hi @lw-lin. Just FYI we use this patch at VideoAmp and would love to see it merged in. I notice this PR has gone a little cold. I'm sorry I can't offer much concrete help, but I wanted to check

[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0

2016-12-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16281 I'd love to see frequent, conservative patch releases. From my experience, parquet bugs cause significant trouble for downstream consumers. For example, we encountered a data corruption bug writing

[GitHub] spark issue #16274: [SPARK-18853][SQL] Project (UnaryNode) is way too aggres...

2016-12-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16274 Outside of some comment grooming, LGTM.

[GitHub] spark pull request #16274: [SPARK-18853][SQL] Project (UnaryNode) is way too...

2016-12-14 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16274#discussion_r92448788 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/ArrayType.scala --- @@ -78,10 +78,10 @@ case class ArrayType(elementType: DataType

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 @wangyum Thank you for this important bug fix! :)

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 I believe that may be the case, unfortunately. At least, I have no immediate ideas otherwise. > On Dec 7, 2016, at 5:25 PM, Eric Liang <notificati...@github.com> wrote: >

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 I think that's exactly what I tried and got the `NoSuchMethodException`. On Dec 7, 2016, at 3:35 PM, Eric Liang <notificati...@github.com> wrote: I did some d

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 I'm not sure configuration-level rollback will guarantee an absence of interactions with other tests. For one thing, I think we need to create and clean up an independent metastore directory

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 It does, yes. My concern around that test is that its behavior doesn't seem to be independent of other tests. For example, the value of ```hive.getConf.getBoolean

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 There may be a way to do it, but the classloader tricks being used in the hive client implementation are beyond my comprehension.

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 I tried that but got a `NoSuchMethodException` in the call to `getMSC`.

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-06 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 I haven't been able to get a proper unit test environment running where the embedded metastore conf is different from the client conf. I did validate that Spark without this patch failed to execute

[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-06 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 @wangyum I'm going to see if I can help with the unit testing on this PR.

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 I suspect this is a spurious, unrelated test failure. Can we get a rebuild, please?

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 @gatorsmile I've applied your patch and reverted the change I made in the previous commit to work around that defect. The failed test now passes for me. Let's see what Jenkins says.

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 > #15998 (comment) found a bug. If this PR will not be merged to Spark 2.1 branch, I think we need to submit a separate PR for resolving the bug. I would like to get this patch into Sp
