[GitHub] spark pull request #19330: [SPARK-18134][SQL] Orderable MapType

2018-10-28 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/19330 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

2018-10-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/22712 > BTW, how does hive implement comparable maps? Is it below piece of code ? https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspec

[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...

2018-10-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/22712#discussion_r225628985 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala --- @@ -53,6 +53,10 @@ class InterpretedOrdering(ordering

[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...

2018-10-13 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/22712#discussion_r224961789 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/MapType.scala --- @@ -73,6 +74,90 @@ case class MapType( override private[spark

[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

2018-10-13 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/22712 @maropu I want to split https://github.com/apache/spark/pull/19330 to two parts: 1. Approach to compare two Maps with themselves already sorted. (This PR) 2. Approach to sort

[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...

2018-10-13 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/22712 [SPARK-25724] Add sorting functionality in MapType. ## What changes were proposed in this pull request? This is related to https://github.com/apache/spark/pull/19330. As subtask

[GitHub] spark issue #19330: [SPARK-18134][SQL] Orderable MapType

2018-10-10 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19330 @maropu Thanks, and yes I'm still here and I can keep going if this pr is interested. I will update this pr this weekend

[GitHub] spark issue #19868: [SPARK-22676] Replace spark.sql.hive.verifyPartitionPath...

2018-09-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19868 Sure, updated.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19868 Sure, updated.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19868 @cloud-fan Thanks for ping~ I updated the description. Let me know if I should refine it.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19868 Sure, let me do it today.

[GitHub] spark issue #21330: [SPARK-22234] Support distinct window functions

2018-09-02 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21330 If this feature is interested, could you please help start the review @jiangxb1987 Thanks a lot.

[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...

2018-08-27 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/22202 Thanks for ping~ Seems that `ShuffleMapTask0.1` is a speculation, please update the description. The change seems fine for me. But give https://github.com/apache/spark/pull/21019

[GitHub] spark pull request #21330: [SPARK-22234] Support distinct window functions

2018-08-27 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21330#discussion_r213000357 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1883,7 +1883,19 @@ class Analyzer

[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...

2018-07-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21772 Jenkins, test this please

[GitHub] spark issue #21712: [SPARK-22384][SQL][followup] Refine partition pruning wh...

2018-07-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21712 Thanks for ping me. LGTM

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-06-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19602 Thanks for merging !

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-06-01 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r192550679 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -207,65 +271,68 @@ class HiveClientSuite(version: String

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-06-01 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r192550486 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -59,38 +61,62 @@ class HiveClientSuite(version: String

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-06-01 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19602 And also I think we have same problem for datasource table.

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-06-01 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19602 @cloud-fan Sorry for late reply, so busy these days. In current change: 1. I follow `Cast.mayTruncate` strictly when extract partition Attribute; 2. I created new test data

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-06-01 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r192312477 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -207,65 +271,68 @@ class HiveClientSuite(version: String

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-06-01 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r192311969 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -207,65 +271,68 @@ class HiveClientSuite(version: String

[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

2018-05-28 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/21424

[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-28 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21424 > Broadcast is special because it's run on driver side, so SparkOutOfMemoryError is same as OutOfMemoryError, we need to kill the driver anyway. Thanks a lot for deep explanation

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-05-28 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r191206457 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -642,11 +641,11 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-05-28 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r191204885 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -657,18 +656,41 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-28 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19602 @cloud-fan I removed some cases in `ExtractAttribute`. But to make it clear, do you mean that in below scenario, we don't prune partitions? ``` create table test(id int, string

[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-28 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21424 @cloud-fan > I also found that we may throw OOM My previous understanding is that Spark throw `SparkOutOfMemoryError` when expect there's no memory -- such expectation is from Sp

[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

2018-05-27 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21424#discussion_r191107018 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -106,11 +108,20 @@ private[execution] object

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-05-27 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r191078137 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -657,18 +656,46 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-05-27 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r191078043 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -53,7 +52,7 @@ class HiveClientSuite(version: String

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2018-05-27 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19602#discussion_r191078009 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -657,18 +656,46 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #21424: [SPARK-24379] BroadcastExchangeExec should catch SparkOu...

2018-05-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21424 cc @cloud-fan @JoshRosen Would you please help take a look at this when you have time ? --- - To unsubscribe, e-mail

[GitHub] spark pull request #21424: [SPARK-24379] BroadcastExchangeExec should catch ...

2018-05-24 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/21424 [SPARK-24379] BroadcastExchangeExec should catch SparkOutOfMemory and re-throw SparkFatalException, which wraps SparkOutOfMemory inside. ## What changes were proposed in this pull request

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19602 @cloud-fan Thanks a lot for looking into this. I updated the change and generalized `ExtractAttribute

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21342 Thanks for merging !

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21342 Thanks again for comments ! I updated this pr and added small defensive logic in SparkUncaughtExceptionHandler.scala. Please take another look

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-23 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r190237452 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -111,12 +112,18 @@ case class

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21342 https://issues.scala-lang.org/browse/SI-9554?orig=1 is still "OPEN", not sure which scala version can fi

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-22 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189900950 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -111,12 +112,18 @@ case class

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189482538 --- Diff: core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryException.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #21330: [SPARK-22234] Support distinct window functions

2018-05-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21330 Jenkins, retest this please.

[GitHub] spark pull request #21330: [SPARK-22234] Support distinct window functions

2018-05-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21330#discussion_r189456451 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala --- @@ -213,10 +218,24 @@ case class WindowExec

[GitHub] spark issue #21330: [SPARK-22234] Support distinct window functions

2018-05-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21330 @hvanhovell Thanks for looking into this. >So this only works for global window frames? It's for entire-partition/global window frame, growing frame, shrinking frame and mov

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21342 @kiszk @cloud-fan Thanks a lot for comments. I tested manually and found that the bug exists in all fatal throwable. In current change, I catch all fatal throwable and wrap

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21342 I will update the pr title if the change is on the right direction.

[GitHub] spark issue #21330: [SPARK-22234] Support distinct window functions

2018-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21330 Also cc @kiszk @HyukjinKwon Really appreciate if you can leave some comments :)

[GitHub] spark issue #21252: [SPARK-24193] create TakeOrderedAndProjectExec only when...

2018-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21252 Thanks for merging !

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21342 Thanks a lot for looking into this. The issue is that, sometimes user would configure `spark.sql.broadcastTimeout` as bigger value, because the `relationFuture` in `BroadcastExchangeExec

[GitHub] spark issue #21286: [SPARK-24238][SQL] HadoopFsRelation can't append the sam...

2018-05-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21286 Cool, so do I understand correctly that there are only two things we need to do: 1. create a unique jobID, perhaps `timestamp+uuid`? 2. disable cleanup in FileOutputCommitter

[GitHub] spark issue #21342: [SPARK-24294] Throw SparkException when OOM in Broadcast...

2018-05-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21342 cc @sameeragarwal @hvanhovell @cloud-fan @jiangxb1987 Please take a look at this when you have time

[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-16 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/21342 [SPARK-24294] Throw SparkException when OOM in BroadcastExchangeExec ## What changes were proposed in this pull request? When OutOfMemoryError thrown from BroadcastExchangeExec

[GitHub] spark issue #21286: [SPARK-24238][SQL] HadoopFsRelation can't append the sam...

2018-05-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21286 Does Spark have a jobID in writing path? Below path is an example in my debugging log: ``` parquettest2/_temporary/0/_temporary/attempt_20180515215310__m_00_0/part-0-9104445e

[GitHub] spark issue #21330: [SPARK-22234] Support distinct window functions

2018-05-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21330 @cloud-fan @jiangxb1987 @cenyuhai Do you think this change makes sense? I can keep working

[GitHub] spark pull request #21330: [SPARK-22234] Support distinct window functions

2018-05-15 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/21330 [SPARK-22234] Support distinct window functions ## What changes were proposed in this pull request? This pr proposes to support distinct window functions. After this change, query like below

[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...

2018-05-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21289 @cloud-fan @adrian-ionescu I added a test, please check when you have time.

[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21252#discussion_r188187631 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1238,6 +1238,15 @@ object SQLConf { .booleanConf

[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...

2018-05-11 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21289 I mean the memory refresh always happen after writing data, which I think is not necessary ?

[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...

2018-05-11 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21289 @cloud-fan @adrian-ionescu Thanks for looking into this ! The issue is that when I append into a table with sql below: ``` insert into X select ``` It always refresh

[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...

2018-05-10 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21289 cc @cloud-fan @gengliangwang @adrian-ionescu Do you have any thoughts? Please take a look when you have time, thanks

[GitHub] spark pull request #21289: [SPARK-24240] Add a config to control whether InM...

2018-05-10 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/21289 [SPARK-24240] Add a config to control whether InMemoryFileIndex should update cache when refresh. ## What changes were proposed in this pull request? In current code(https://github.com

[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21252 PR description updated.

[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21252#discussion_r186966678 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1238,6 +1238,14 @@ object SQLConf { .booleanConf

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-05-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19602 I rebased this pr and resolved conflicts. cc @cloud-fan @jiangxb1987 Not sure if you have interest on this. Take a look if have time. Thanks

[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21252 I will add a suite tomorrow.

[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21252 @cloud-fan @viirya Thanks for comments. I refined accordingly. Please check~

[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...

2018-05-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21212 Thanks for merging !

[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-06 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21252 > Instead of touching inside of TakeOrderedAndProjectExec, how about we don't replace Sort + Limit with TakeOrderedAndProjectExec when reaching the threshold? Yes, the code will be m

[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-06 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21252#discussion_r186320836 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -127,13 +127,36 @@ case class TakeOrderedAndProjectExec

[GitHub] spark issue #21252: [SPARK-24193] Sort by disk when number of limit is big i...

2018-05-06 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21252 cc @cloud-fan @viirya I'm not sure if you are interested in this config. Could you please give some advice? Thanks a lot

[GitHub] spark pull request #21252: [SPARK-24193] Sort by disk when number of limit i...

2018-05-06 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/21252 [SPARK-24193] Sort by disk when number of limit is big in TakeOrderedAndProjectExec ## What changes were proposed in this pull request? Physical plan of `select colA from t order

[GitHub] spark pull request #21212: [SPARK-24143] filter empty blocks when convert ma...

2018-05-06 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21212#discussion_r186293064 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -267,28 +269,30 @@ final class

[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...

2018-05-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21212 @squito I used "YourKit Memory Inspections" to analyze the heap, but didn't find many duplicate objects in ArrayBuffer. I guess your concern is ArrayBuffer will do lots of co

[GitHub] spark pull request #21212: [SPARK-24143] filter empty blocks when convert ma...

2018-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21212#discussion_r186254447 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -267,28 +269,28 @@ final class

[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...

2018-05-03 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21212 @squito @cloud-fan @jiangxb1987 Thanks a lot for review. > shall we also optimize the space usage for MapStatus @cloud-fan do you mean optimize space usage for MapStatus w

[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...

2018-05-02 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21212 Jenkins, retest this please

[GitHub] spark issue #21212: [SPARK-24143] filter empty blocks when convert mapstatus...

2018-05-01 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21212 @squito @cloud-fan @jiangxb1987 Do you think this make sense?

[GitHub] spark pull request #21212: [SPARK-24143] filter empty blocks when convert ma...

2018-05-01 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/21212 [SPARK-24143] filter empty blocks when convert mapstatus to (blockId, size) pair. ## What changes were proposed in this pull request? In current code

[GitHub] spark issue #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21091 Thanks for merging !

[GitHub] spark pull request #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/21091#discussion_r182677298 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala --- @@ -34,79 +34,81 @@ class QueryPartitionSuite extends QueryTest

[GitHub] spark issue #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21091 cc @cloud-fan

[GitHub] spark pull request #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-17 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/21091 [SPARK-22676][FOLLOW-UP] fix code style for test. ## What changes were proposed in this pull request? This pr address comments in https://github.com/apache/spark/pull/19868 ; Fix

[GitHub] spark issue #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21091 Jenkins, test this please.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19868 @cloud-fan Thanks a lot for merging. I will address the left comments today.

[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...

2018-04-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21019 @squito @jiangxb1987 Thanks for merging.

[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...

2018-04-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21019 Thanks comments from Imran and Xingbo. I made some change and please take another look when you have time

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181939704 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -195,6 +205,10 @@ class NewHadoopRDD[K, V]( e

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181939661 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -276,6 +292,12 @@ class HadoopRDD[K, V]( try

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19868 @cloud-fan Thanks again for the review. I updated the PR according to your comments; please take another look

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181718520 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -279,6 +293,10 @@ class HadoopRDD[K, V]( case e: IOException

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181716009 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -197,17 +200,24 @@ class HadoopRDD[K, V]( val jobConf = getJobConf

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19868 @cloud-fan @jiangxb1987 I updated the PR and added a config, `spark.files.ignoreMissingFiles`. It works for HadoopRDD and NewHadoopRDD in two cases: 1. "file not found" when `getPartiti

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181571713 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -197,17 +200,24 @@ class HadoopRDD[K, V]( val jobConf = getJobConf

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181571746 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -124,17 +126,25 @@ class NewHadoopRDD[K, V

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-13 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181538066 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -176,12 +176,13 @@ class HadoopTableReader( val

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-13 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181294385 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -176,12 +176,13 @@ class HadoopTableReader( val

[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...

2018-04-12 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21019 @squito Thanks a lot. I will add a test.

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-12 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181014951 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -176,12 +176,13 @@ class HadoopTableReader( val

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-11 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r180733361 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -176,12 +176,13 @@ class HadoopTableReader( val
