[GitHub] spark pull request #18055: [Core][WIP] Make the object in TorrentBroadcast a...

2017-05-21 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/18055#discussion_r117666471 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -54,7 +54,7 @@ import org.apache.spark.util.io.{ChunkedByteBuffer

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-25 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97783672 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,25 +95,101 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97701247 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97700863 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97700723 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97700670 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97700568 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -230,6 +230,21 @@ case object SinglePartition

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 @viirya I suggest fixing issue 2 in this PR; let's wait for some comments on issue 1. /cc @rxin and @wzhfy, who may comment on the first case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 For issue 1, my idea is not to use the proposal in this PR. 1. How do you determine that `total rows in all partitions are (much) more than limit number` and then go into this code path, and how to decide
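The check scwf is asking about could be sketched roughly as below. This is a minimal illustration, not Spark code: it assumes per-partition row counts are available from some statistics source (Spark's `MapOutputStatistics` tracked bytes per partition, so the row-count input here is an assumption), and the object and method names are made up.

```scala
// Hypothetical helper: decide whether the rows collected across all
// partitions after the local limit exceed the global limit number.
object LimitCheck {
  def exceedsLimit(rowsPerPartition: Seq[Long], limit: Long): Boolean = {
    // Keep a running total and stop early once the limit is passed.
    var total = 0L
    val it = rowsPerPartition.iterator
    while (it.hasNext && total <= limit) total += it.next()
    total > limit
  }
}
```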

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 All partitions after the local limit hold about/nearly 100,000,000 rows.

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 Again, to be clear, I am against the performance regression in the following case: 0. the limit number is 100,000,000; 1. the original table's row count is very big, much larger than 100,000,000 rows; 2. after

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 I think the shuffle is OK, but shuffling to one partition leads to the performance issue.

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 Assume the local limit outputs 100,000,000 rows; then the global limit will take them in a single partition, so it is very slow and cannot use other free cores to improve parallelism.

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 @viirya my team member posted to the mailing list; we actually mean the case I listed above. The main issue is the single-partition issue in the global limit; if in that case you fall back to the old global limit

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 I think the local limit cost is important. Say the number of recomputed partitions is m and the total number of partitions is n. m = 1, n = 100 is a favorable case, but there are also cases where m is very close to n (even m = n

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 Your proposal avoids the cost of computing and shuffling all partitions for the local limit, but introduces recomputation of some partitions in the local limit stage. We cannot decide which cost is cheaper

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 I think before comparing our proposals, we should first make sure our proposal does not bring a performance regression.

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 I don't quite get you, but let me explain more. If we use map output statistics to decide how many elements each global limit should take: 1. the local limit shuffles with the mailing-list partitioner and returns
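The scheme this comment describes — let map output statistics decide how many elements each GlobalLimit task takes — could look roughly like this. A hedged sketch only: the object and method names are made up for illustration, and the per-partition row counts are assumed to come from some (hypothetical) row-count statistics collected after the local-limit shuffle.

```scala
// Given the row count of each shuffle partition after the local limit,
// assign each GlobalLimit task how many rows it should take so the
// total equals the limit, instead of funneling all rows into one partition.
object GlobalLimitPlanning {
  def takePerPartition(rowsPerPartition: Seq[Long], limit: Long): Seq[Long] = {
    var remaining = limit
    rowsPerPartition.map { n =>
      val take = math.min(n, remaining) // take what this partition has, capped
      remaining -= take
      take
    }
  }
}
```

Each task would then apply its assigned count locally, keeping parallelism instead of shipping every row to a single reducer.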

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-18 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 We need to define a new kind of map output statistics to do this.

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-18 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 Yes, you are right: we cannot ensure a uniform distribution for the global limit. One idea is to not use a special partitioner; after the shuffle we should get the map output statistics for the row count

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-18 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 Referring to the mailing list: >One issue left is how to decide shuffle partition number. We can have a config of the maximum number of elements for each GlobalLimit task to process, then
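The quoted mailing-list idea turns into a one-line calculation; a minimal sketch assuming such a per-task row cap config exists (`maxRowsPerTask` is a hypothetical name, not an actual Spark config):

```scala
// Derive the shuffle partition number for GlobalLimit from the limit and a
// configured per-task row cap, using ceiling division and never going below
// one partition.
def limitShufflePartitions(limit: Long, maxRowsPerTask: Long): Int =
  math.max(1L, (limit + maxRowsPerTask - 1) / maxRowsPerTask).toInt
```

For example, a limit of 100,000,000 with a 10,000,000-row cap per task would yield 10 GlobalLimit partitions instead of one.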

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-18 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 To be clear, we now have these issues: 1. the local limit computes all partitions, meaning it launches many tasks when a few small tasks may actually be enough; 2. the global limit uses a single partition

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96784321 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-18 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/16633 @viirya @rxin I support the idea of @wzhfy on the mailing list, http://apache-spark-developers-list.1001551.n3.nabble.com/Limit-Query-Performance-Suggestion-td20570.html; it solves the single-partition

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96782626 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96782094 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96781278 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96780810 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96780571 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96779648 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96773557 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96773174 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16633: [SPARK-19274][SQL] Make GlobalLimit without shuff...

2017-01-18 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16633#discussion_r96673145 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,21 +94,74 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark issue #15240: [SPARK-17556] [CORE] [SQL] Executor side broadcast for b...

2017-01-03 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15240 retest this please

[GitHub] spark issue #15240: [SPARK-17556] [CORE] [SQL] Executor side broadcast for b...

2017-01-03 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15240 retest this please

[GitHub] spark issue #15240: [SPARK-17556] [CORE] [SQL] Executor side broadcast for b...

2017-01-02 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15240 retest this please

[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-08 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15297 @YuhuWang2002 We should limit the use cases for outer joins: for a left outer join such as `A left join B`, this implementation cannot handle skew in table B. That's because

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-20 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 `CoarseGrainedSchedulerBackend.removeExecutor` also uses ask, but that does not matter, right? Because it just sends the message once and logs the error on failure.

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-18 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 Updated, can you review again?

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-18 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 OK, I will revert to the initial commit.

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 retest this please.

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 retest this please.

[GitHub] spark pull request #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrai...

2016-10-14 Thread scwf
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/15481 [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-17929 Now

[GitHub] spark issue #15297: [WIP][SPARK-9862]Handling data skew

2016-10-07 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15297 retest this please

[GitHub] spark issue #15297: [WIP][SPARK-9862]Handling data skew

2016-10-07 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15297 retest this please.

[GitHub] spark issue #15240: [SPARK-17556] [CORE] [SQL] Executor side broadcast for b...

2016-10-07 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15240 /cc @rxin can you help review this?

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-28 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15213 retest this please

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-28 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15213 @kayousterhout Thanks for your comments; I have updated based on all of them.

[GitHub] spark pull request #15213: [SPARK-17644] [CORE] Do not add failedStages when...

2016-09-28 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/15213#discussion_r80865465 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1256,11 +1257,13 @@ class DAGScheduler

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-26 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15213 @markhamstra in my fix I just want to make minimal changes to the DAGScheduler; your fix is also OK to me, and I can update this according to your comment. Thanks :) /cc @zsxwing, who may also have

[GitHub] spark pull request #15240: [SPARK-17556] Executor side broadcast for broadca...

2016-09-26 Thread scwf
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/15240 [SPARK-17556] Executor side broadcast for broadcast joins ## What changes were proposed in this pull request? Design doc : https://issues.apache.org/jira/secure/attachment/12830286/executor

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15213 > actual problem is not in abortStage but rather in improper additions to failedStages Correct; I think a more accurate description for this issue is "do not add `failedStages`

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15213 Actually, this is the only place in Spark where `failedStages` is added to.

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15213 Thanks @zsxwing for explaining this. @markhamstra the issue happens in the case described in my PR description. It usually depends on multi-threaded job submission and the order of fetch failures, so I said

[GitHub] spark pull request #15213: [SPARK-17644] [CORE] Fix the race condition when ...

2016-09-23 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/15213#discussion_r80274817 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -2105,6 +2109,54 @@ class DAGSchedulerSuite extends SparkFunSuite

[GitHub] spark pull request #15213: [SPARK-17644] [CORE] Fix the race condition when ...

2016-09-23 Thread scwf
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/15213 [SPARK-17644] [CORE] Fix the race condition when DAGScheduler handle the FetchFailed event ## What changes were proposed in this pull request? | Time|Thread 1 , Job1 | Thread 2

[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...

2016-08-31 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r77014850 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -168,6 +169,107 @@ class StatisticsSuite extends QueryTest

[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-08-19 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/14712 /cc @cloud-fan @rxin

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2016-01-20 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7336#issuecomment-173435459 @yhuai thanks

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2016-01-20 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/7336#discussion_r50269169 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala --- @@ -198,33 +241,99 @@ private[spark] class

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2016-01-20 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7336#issuecomment-173241662 retest this please

[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...

2016-01-15 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5827#issuecomment-172144781 @rxin Our parser is an extended version of `SqlParser`; the main difference is that we add support for subqueries (both correlated and uncorrelated), exists

[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...

2016-01-15 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5827#issuecomment-172151467 Actually, we did try to contribute these improvements; unfortunately the community did not want them in the past, for maintenance (or Hive QL compatibility) reasons

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2016-01-15 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7336#issuecomment-172144917 Ping @rxin

[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...

2016-01-14 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5827#issuecomment-171667688 @rxin, yes, we used this, and we implemented a new SQL parser based on this interface to support ANSI TPC-DS SQL.

[GitHub] spark pull request: [SPARK-12742] [SQL] org.apache.spark.sql.hive....

2016-01-11 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/10682#discussion_r49404076 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/LogicalPlanToSQLSuite.scala --- @@ -24,6 +24,9 @@ class LogicalPlanToSQLSuite extends

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2016-01-10 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7336#issuecomment-170359121 Back to update this. @marmbrus @rxin please help review it when you have time.

[GitHub] spark pull request: [SPARK-12742] [SQL] org.apache.spark.sql.hive....

2016-01-10 Thread scwf
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/10682 [SPARK-12742] [SQL] org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists exception ``` [info] Exception encountered when attempting to run a suite with class name

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2016-01-10 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7336#issuecomment-170412209 @rxin, yes, this PR tries to fix the same issue on the Hive support side.

[GitHub] spark pull request: [SPARK-6190][core] create LargeByteBuffer for ...

2016-01-08 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5400#issuecomment-170037893 >>The cached size cannot be greater than 2GB. @rxin how should I understand the `cached size`? The partition size of a cached RDD?

[GitHub] spark pull request: [SPARK-6190][core] create LargeByteBuffer for ...

2016-01-06 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5400#issuecomment-169517123 Hi @squito, can you explain in which situations users will hit the 2 GB limit? Will a job processing very large data (such as PB-level data) reach this limit?

[GitHub] spark pull request: [SPARK-12321][SQL] JSON format for TreeNode (u...

2015-12-22 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/10311#issuecomment-166772051 Got it, thanks @marmbrus :)

[GitHub] spark pull request: [SPARK-12321][SQL] JSON format for TreeNode (u...

2015-12-22 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/10311#issuecomment-166611132 Hi @cloud-fan, can you explain in which cases we can use this feature, or the motivation for it?

[GitHub] spark pull request: [SPARK-12222] [Core] Deserialize RoaringBitmap...

2015-12-10 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/10253#issuecomment-163590407 LGTM

[GitHub] spark pull request: [SPARK-11016] Move RoaringBitmap to explicit K...

2015-12-08 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/9748#issuecomment-162860246 @davies here are some problems when deserializing a RoaringBitmap; see the example below. Run this piece of code: ``` import com.esotericsoftware.kryo.io.{Input

[GitHub] spark pull request: [SPARK-11016] Move RoaringBitmap to explicit K...

2015-12-08 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/9748#issuecomment-163074120 OK, should I send the PR to both master and branch-1.6?

[GitHub] spark pull request: [SPARK-12222] [Core] Deserialize RoaringBitmap...

2015-12-08 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/10213#issuecomment-163089761 /cc @davies

[GitHub] spark pull request: [SPARK-12222] [Core] Deserialize RoaringBitmap...

2015-12-08 Thread scwf
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/10213 [SPARK-12222] [Core] Deserializing a RoaringBitmap using the Kryo serializer throws a Buffer underflow exception

[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

2015-10-26 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/9215#issuecomment-151041137 Should this be merged to branch-1.5?

[GitHub] spark pull request: [SPARK-9281] [SQL] use decimal or double when ...

2015-10-13 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7642#issuecomment-147906001 Hi @davies, it seems this is not compatible with HiveQL; HiveQl still parses float numbers as double. https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org

[GitHub] spark pull request: [SPARK-4226][SQL]Add subquery (not) in/exists ...

2015-10-11 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/9055#issuecomment-147273499 OK, does this support multiple exists and in predicates in the where clause?

[GitHub] spark pull request: [SPARK-4226][SQL]Add subquery (not) in/exists ...

2015-10-11 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/9055#issuecomment-147272550 what's the difference with #4812?

[GitHub] spark pull request: [SQL] Add toString to DataFrame/Column

2015-09-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/4436#discussion_r40289134 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -67,6 +68,17 @@ abstract class Expression extends

[GitHub] spark pull request: [SQL] Add toString to DataFrame/Column

2015-09-23 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/4436#discussion_r40283225 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -67,6 +68,17 @@ abstract class Expression extends

[GitHub] spark pull request: [SPARK-4131][SQL] support writing data into th...

2015-09-20 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4380#issuecomment-141853176 @litao-buptsse, I will update this soon, thanks.

[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-09-08 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7417#issuecomment-138544481 @Sephiroth-Lin can you rebase this?

[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-09-05 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7417#issuecomment-138037270 @zsxwing putting the small table on the left side of `RDD.cartesian` definitely improves the performance. You can run a simple test that does a cartesian with a big data
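The intuition behind the operand-order claim can be sketched without Spark. In a nested-loop cartesian product of the kind `CartesianRDD` performs, the right side is re-materialized once per element of the left side, so putting the smaller dataset on the left means far fewer re-materializations of the expensive side. This is a hedged plain-Scala model of that behavior, not Spark's actual implementation:

```scala
// Minimal model of a nested-loop cartesian product: the right-side
// iterator is re-created once per left-side element, so operand order
// changes how often the right side is re-materialized.
object CartesianOrder {
  var rightMaterializations = 0

  // Simulates re-creating the right-side iterator for each left element.
  def rightIter(right: Seq[Int]): Iterator[Int] = {
    rightMaterializations += 1
    right.iterator
  }

  def cartesian(left: Seq[Int], right: Seq[Int]): Seq[(Int, Int)] =
    left.flatMap(x => rightIter(right).map(y => (x, y)))

  def main(args: Array[String]): Unit = {
    val small = Seq(1, 2)
    val big   = (1 to 1000).toSeq

    rightMaterializations = 0
    cartesian(small, big)          // small on the left
    println(rightMaterializations) // 2 re-materializations of the big side

    rightMaterializations = 0
    cartesian(big, small)          // big on the left
    println(rightMaterializations) // 1000 re-materializations of the small side
  }
}
```

Both orders produce the same 2000 pairs, but the second re-creates the right-side iterator 500 times more often; in Spark, where re-materializing a partition can mean recomputation or a remote fetch, that overhead is what the comment is pointing at.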

[GitHub] spark pull request: [SPARK-8813][SQL] Combine files when there're ...

2015-08-16 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/8125#issuecomment-131669839 @liancheng we have this case: the production system produces small text/csv files every five minutes, and we use Spark SQL to do some ETL work (such as agg) on these small

[GitHub] spark pull request: [SPARK-4131][SQL] support writing data into th...

2015-08-12 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4380#issuecomment-130185516 retest this please

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2015-08-12 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7336#issuecomment-130316086 retest this please

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2015-08-12 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7336#issuecomment-130312583 retest this please

[GitHub] spark pull request: [SPARK-4131][SQL] support writing data into th...

2015-08-11 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4380#issuecomment-130153626 Retest this please

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2015-08-11 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7336#issuecomment-130127052 /cc @marmbrus can you take a look at this?

[GitHub] spark pull request: [SPARK-4131][SQL] support writing data into th...

2015-08-11 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4380#issuecomment-130152470 yes, since we upgraded the Hive version to 1.2.1, we should adapt the token tree in HiveQL; the old one is not correct in 1.2.1. Updated

[GitHub] spark pull request: [SPARK-8968] [SQL] external sort by the partit...

2015-08-09 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7336#issuecomment-129266361 /cc @marmbrus

[GitHub] spark pull request: [SPARK-7190] [SPARK-8804] [SPARK-7815] [SQL] u...

2015-08-06 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7197#issuecomment-128555369 @davies https://issues.apache.org/jira/browse/SPARK-9725

[GitHub] spark pull request: [SPARK-7190] [SPARK-8804] [SPARK-7815] [SQL] u...

2015-08-06 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/7197#issuecomment-128352195 @davies there is a bug once this PR is in: when executor memory is set to 32g, all queries on string fields have problems. It seems they return empty/garbled

[GitHub] spark pull request: [SPARK-4131][SQL] support writing data into th...

2015-07-29 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4380#issuecomment-125853720 retest this please

[GitHub] spark pull request: [SPARK-4131][SQL] support writing data into th...

2015-07-28 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4380#issuecomment-125565512 retest this please

[GitHub] spark pull request: [SPARK-4131][SQL] support writing data into th...

2015-07-23 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4380#issuecomment-124285788 /cc @marmbrus
