[GitHub] spark issue #20825: add impurity stats in tree leaf node debug string

2018-03-15 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/20825 cc @zsxwing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20825: add impurity stats in tree leaf node debug string

2018-03-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/20825 LGTM

[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...

2018-02-10 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/20561 lgtm

[GitHub] spark pull request #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorte...

2018-02-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/20561#discussion_r167300330 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeKVExternalSorterSuite.scala --- @@ -205,4 +206,42 @@ class

[GitHub] spark pull request #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorte...

2018-02-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/20561#discussion_r167299807 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java --- @@ -98,10 +99,20 @@ public UnsafeKVExternalSorter

[GitHub] spark pull request #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorte...

2018-02-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/20561#discussion_r167299716 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java --- @@ -98,10 +99,20 @@ public UnsafeKVExternalSorter

[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/18052 Personally, I think less is more: don't add everything into every piece of software, otherwise every piece of software will eventually be able to write email. The RDD API is kind of frozen; we don't add more APIs

[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/18052 It seems that it's also easy for users themselves or third-party libraries to implement these outside of PySpark, right? If that's the case, I'd rather not add it to PySpark. --- If your project
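
As an illustration of how asynchronous actions could live outside PySpark, here is a minimal sketch using only the standard library: a blocking action is submitted to a thread pool and a `concurrent.futures.Future` is returned. The helper `async_action` and the pool size are hypothetical, not part of PySpark.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable

# Hypothetical helper, not a PySpark API: a shared pool that runs blocking
# actions (for example rdd.count or rdd.collect) in background threads.
_pool = ThreadPoolExecutor(max_workers=4)


def async_action(action: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
    """Run a blocking action in the background and return a Future for its result."""
    return _pool.submit(action, *args, **kwargs)
```

With a live SparkContext, `async_action(rdd.count)` would return immediately and `.result()` would block only when the count is needed; Spark's scheduler accepts jobs submitted from multiple threads, which is what makes this user-side approach workable.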

[GitHub] spark issue #18244: [SPARK-20211][SQL] Fix the Precision and Scale of Decima...

2017-06-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/18244 lgtm

[GitHub] spark pull request #18244: [SPARK-20211][SQL] Fix the Precision and Scale of...

2017-06-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/18244#discussion_r121050241 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala --- @@ -126,7 +126,15 @@ final class Decimal extends Ordered[Decimal

[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...

2017-03-21 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/17375 lgtm

[GitHub] spark issue #17374: [SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked `collecti...

2017-03-21 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/17374 lgtm

[GitHub] spark issue #17282: [SPARK-19872][PYTHON] Use the correct deserializer for R...

2017-03-15 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/17282 lgtm, merging into master and the 2.1 and 2.0 branches.

[GitHub] spark pull request #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeR...

2017-03-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16909#discussion_r105007318 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala --- @@ -341,25 +364,27 @@ private[window] final class

[GitHub] spark pull request #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeR...

2017-03-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16909#discussion_r105006782 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala --- @@ -164,9 +176,12 @@ private[window] final class

[GitHub] spark pull request #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeR...

2017-03-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16909#discussion_r105006306 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -674,6 +675,24 @@ object SQLConf { .stringConf

[GitHub] spark issue #16896: [SPARK-19561][Python] cast TimestampType.toInternal outp...

2017-03-07 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16896 My bad, I did not realize that, sorry.

[GitHub] spark issue #16896: [SPARK-19561][Python] cast TimestampType.toInternal outp...

2017-03-07 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16896 Merged into master and the 2.1 branch.

[GitHub] spark issue #16896: [SPARK-19561][Python] cast TimestampType.toInternal outp...

2017-03-06 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16896 lgtm, will merge it when I get a chance.

[GitHub] spark pull request #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only d...

2017-02-27 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16782#discussion_r103276349 --- Diff: python/pyspark/__init__.py --- @@ -96,9 +96,11 @@ def keyword_only(func): """ @wraps(func) def wrapper(

[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark

2017-02-23 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/17036 lgtm

[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark

2017-02-22 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/17036 lgtm

[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...

2017-02-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16865 I still think it's not worth it

[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...

2017-02-17 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16844 Merging into master and the 2.1 and 2.0 branches.

[GitHub] spark pull request #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesTo...

2017-02-15 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16844#discussion_r101360777 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -742,7 +742,7 @@ public boolean append(Object kbase, long koff, int

[GitHub] spark pull request #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeR...

2017-02-14 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16909#discussion_r101214678 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala --- @@ -0,0 +1,218 @@ +/* + * Licensed

[GitHub] spark issue #16896: [SPARK-19561][Python] cast TimestampType.toInternal outp...

2017-02-14 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16896 Just one minor comment

[GitHub] spark pull request #16896: [SPARK-19561][Python] cast TimestampType.toIntern...

2017-02-14 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16896#discussion_r101204910 --- Diff: python/pyspark/sql/types.py --- @@ -189,7 +189,7 @@ def toInternal(self, dt): if dt is not None: seconds

[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...

2017-02-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16865 @viirya This is a general OOM; it should not be caused by cached bytecode, which is way smaller than other things in the executor. I think this patch will not help either.

[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...

2017-02-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16865 I understand the motivation here; could you show the benefit of this change for a real use case?
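
The idea in the SPARK-19530 title, evicting cached generated code by total weight instead of by entry count, can be sketched as a small weight-bounded LRU cache. This mirrors the concept behind Guava's `maximumWeight` plus `Weigher`; the class `WeightedLRUCache` and the use of byte length as the weight are assumptions for illustration, not Spark or Guava code.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class WeightedLRUCache(Generic[K, V]):
    """Hypothetical LRU cache bounded by total weight rather than entry count."""

    def __init__(self, max_weight: int, weigher: Callable[[V], int]) -> None:
        self.max_weight = max_weight
        self.weigher = weigher
        self.total_weight = 0
        self._entries: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> Optional[V]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: K, value: V) -> None:
        if key in self._entries:
            # Replacing an entry: drop the old value's weight first.
            self.total_weight -= self.weigher(self._entries.pop(key))
        self._entries[key] = value
        self.total_weight += self.weigher(value)
        # Evict least recently used entries until the weight budget is met.
        # A single value heavier than max_weight ends up evicting itself too.
        while self.total_weight > self.max_weight and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.total_weight -= self.weigher(evicted)
```

With `max_weight=10` and `weigher=len`, inserting values of 5, 4 and 3 bytes evicts the first one and leaves a total weight of 7; a count-based cache would instead keep all three regardless of size.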

[GitHub] spark pull request #16865: [SPARK-19530][SQL] Use guava weigher for code cac...

2017-02-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16865#discussion_r100428764 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1004,7 +1016,8 @@ object CodeGenerator

[GitHub] spark pull request #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesTo...

2017-02-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16844#discussion_r100387863 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -695,11 +690,16 @@ public boolean append(Object kbase, long koff, int

[GitHub] spark issue #16825: [SPARK-19481][REPL][maven]Avoid to leak SparkContext in ...

2017-02-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16825 lgtm, merging this into master and the 2.1 branch, thanks

[GitHub] spark pull request #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesTo...

2017-02-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16844#discussion_r100383151 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -741,14 +741,6 @@ public boolean append(Object kbase, long koff, int

[GitHub] spark pull request #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesTo...

2017-02-09 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16844#discussion_r100381544 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -695,11 +690,16 @@ public boolean append(Object kbase, long koff, int

[GitHub] spark pull request #16825: [SPARK-19481][REPL][maven]Avoid to leak SparkCont...

2017-02-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16825#discussion_r100206125 --- Diff: repl/src/main/scala/org/apache/spark/repl/Signaling.scala --- @@ -28,15 +28,17 @@ private[repl] object Signaling extends Logging { * when

[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...

2017-02-08 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16844 @viirya Addressed your comment, also fixed another bug (updated PR description).

[GitHub] spark issue #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMa...

2017-02-07 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16844 cc @joshrosen, @viirya

[GitHub] spark pull request #16844: [SPARK-19500] [SQL] Fix off-by-one bug in BytesTo...

2017-02-07 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/16844 [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap ## What changes were proposed in this pull request? Radix sort requires half of the array to be free (as temporary space), so we use
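
To see why radix sort needs half of the array free, here is a minimal Python sketch of an LSD radix sort that keeps `n` keys in the first half of a buffer and uses the second half as scratch, ping-ponging between the two on every byte pass. The buffer layout and the function name are assumptions for illustration, not the BytesToBytesMap or Spark sorter code; the point is only that `n` records need `2 * n` slots.

```python
from typing import List


def radix_sort_in_buffer(buf: List[int], n: int, key_bytes: int = 8) -> int:
    """LSD radix sort of buf[0:n] using buf[n:2n] as scratch space.

    Keys are assumed to be non-negative integers below 256 ** key_bytes.
    Returns the offset (0 or n) of the half that holds the sorted keys.
    """
    if len(buf) < 2 * n:
        raise ValueError("radix sort needs half of the buffer free as scratch space")
    src, dst = 0, n
    for byte_idx in range(key_bytes):
        shift = 8 * byte_idx
        # Count how many keys have each byte value at this digit position.
        counts = [0] * 256
        for i in range(src, src + n):
            counts[(buf[i] >> shift) & 0xFF] += 1
        # Turn the counts into starting offsets inside the destination half.
        offsets, total = [0] * 256, 0
        for b in range(256):
            offsets[b], total = dst + total, total + counts[b]
        # Stable scatter from the source half into the destination half.
        for i in range(src, src + n):
            b = (buf[i] >> shift) & 0xFF
            buf[offsets[b]] = buf[i]
            offsets[b] += 1
        src, dst = dst, src  # the two halves swap roles for the next pass
    return src
```

For example, sorting `[170, 45, 75, 90, 802, 24, 2, 66]` with `key_bytes=2` in a 16-slot buffer leaves the sorted keys at offset 0 after two passes; a buffer with fewer than 16 slots is rejected, which is why a map feeding such a sort has to stop growing once it is half full.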

[GitHub] spark issue #13808: [SPARK-14480][SQL] Remove meaningless StringIteratorRead...

2017-01-26 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13808 @HyukjinKwon @rxin This patch has a regression: a column that has an escaped newline can no longer be parsed correctly. Should we revert this patch or figure out a way to fix that?
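
The regression can be illustrated by analogy with Python's `csv` module, assuming the escaped newline sits inside a quoted field: a reader that splits the input into physical lines before parsing cuts such a record in two, while a parser that sees the whole stream keeps it intact. This uses Python's standard library only and is not Spark's CSV data source.

```python
import csv
import io
from typing import List

# A quoted field containing a newline: one logical record spanning two lines.
DATA = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'


def split_physical_lines(text: str) -> List[str]:
    """What a purely line-based reader hands to the parser: one chunk per line."""
    return text.splitlines()


def parse_records(text: str) -> List[List[str]]:
    """Let the CSV parser see the whole stream so quoted newlines survive."""
    return list(csv.reader(io.StringIO(text, newline="")))
```

The line-based view yields four chunks for three records (header included), so the record with the embedded newline is split; the stream parser returns `["1", "first line\nsecond line"]` as a single row.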

[GitHub] spark issue #16581: [SPARK-18589] [SQL] Fix Python UDF accessing attributes ...

2017-01-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16581 Cherry-picked into the 2.1 branch.

[GitHub] spark issue #15467: [SPARK-17912][SQL] Refactor code generation to get data ...

2017-01-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15467 Merging this into master, thanks!

[GitHub] spark pull request #16581: [SPARK-18589] [SQL] Fix Python UDF accessing attr...

2017-01-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16581#discussion_r96469102 --- Diff: python/pyspark/sql/tests.py --- @@ -342,6 +342,15 @@ def test_udf_in_filter_on_top_of_outer_join(self): df = df.withColumn('b', udf

[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-17 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16429 lgtm, merging into master and the 2.1 branch.

[GitHub] spark pull request #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.n...

2017-01-14 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16429#discussion_r96130191 --- Diff: python/pyspark/serializers.py --- @@ -382,18 +382,30 @@ def _hijack_namedtuple(): return global _old_namedtuple

[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 2 in Of...

2017-01-13 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16555 lgtm, merging it into master and the 2.1 and 2.0 branches.
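
The SPARK-19180 title says the offset of a short should be 2; in a contiguous buffer of 2-byte values, row `i` starts at byte `i * 2`. A minimal sketch with `struct`, assuming a little-endian packed layout; the helper names are hypothetical and this is not the OffHeapColumnVector code.

```python
import struct
from typing import List

SHORT_SIZE = struct.calcsize("<h")  # 2 bytes for a signed 16-bit short


def write_shorts(values: List[int]) -> bytearray:
    """Pack signed 16-bit values contiguously, little-endian."""
    buf = bytearray(SHORT_SIZE * len(values))
    for row_id, value in enumerate(values):
        struct.pack_into("<h", buf, row_id * SHORT_SIZE, value)
    return buf


def read_short(buf: bytearray, row_id: int) -> int:
    """Read row `row_id`, whose byte offset is row_id * SHORT_SIZE."""
    return struct.unpack_from("<h", buf, row_id * SHORT_SIZE)[0]
```

Three shorts occupy six bytes, and reading row 2 starts at byte 4; using any other element size would read across value boundaries.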

[GitHub] spark issue #16581: [SPARK-18589] [SQL] Fix Python UDF accessing attributes ...

2017-01-13 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16581 cc @hvanhovell

[GitHub] spark pull request #16581: [SPARK-18589] [SQL] Fix Python UDF accessing attr...

2017-01-13 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/16581 [SPARK-18589] [SQL] Fix Python UDF accessing attributes from both side of join ## What changes were proposed in this pull request? PythonUDF is unevaluable, so it cannot be used inside

[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-12-24 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/14452 @viirya For a duplicated CTE, without some optimization (pushing down different predicates in different positions), the physical plans should be identical. So I'm wondering whether some aggressive pushing down

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-12-22 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r93671999 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +316,84 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark issue #16211: [SPARK-18576][PYTHON] Add basic TaskContext information ...

2016-12-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16211 Looks good to me in general, cc @rxin

[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Remove timeout for reading d...

2016-12-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16263 Merging this into master and the 2.1 branch, thanks!

[GitHub] spark issue #15980: [SPARK-18528][SQL] Fix a bug to initialise an iterator o...

2016-12-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15980 lgtm

[GitHub] spark issue #16232: [SPARK-18800][SQL] Correct the assert in UnsafeKVExterna...

2016-12-20 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16232 lgtm

[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...

2016-12-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16232 @viirya Without a repro, I don't think this is the root cause. There could be random corruption that causes the error.

[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...

2016-12-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16232 That makes sense; we should update the assert. But this is still not a bug, and the other changes are not needed.

[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...

2016-12-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16232 @viirya That's not correct; the values do not have entries in the array.

[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...

2016-12-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16232 @viirya A map with multiple values for the same key is only used for the hashed relation, not for aggregation, and will not be passed into UnsafeKVExternalSorter either. So I think this is not actually a bug

[GitHub] spark pull request #16263: [SPARK-18281][SQL][PySpark] Consumes the returned...

2016-12-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16263#discussion_r93127142 --- Diff: python/pyspark/sql/tests.py --- @@ -558,6 +558,18 @@ def test_create_dataframe_from_objects(self): self.assertEqual(df.dtypes, [("

[GitHub] spark pull request #16263: [SPARK-18281][SQL][PySpark] Consumes the returned...

2016-12-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16263#discussion_r93126957 --- Diff: python/pyspark/rdd.py --- @@ -135,12 +135,12 @@ def _load_from_socket(port, serializer): break if not sock: raise

[GitHub] spark pull request #16263: [SPARK-18281][SQL][PySpark] Consumes the returned...

2016-12-16 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16263#discussion_r92833679 --- Diff: python/pyspark/rdd.py --- @@ -2349,7 +2352,12 @@ def toLocalIterator(self): """ with SCCallSiteSync(self.

[GitHub] spark pull request #16263: [SPARK-18281][SQL][PySpark] Consumes the returned...

2016-12-16 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16263#discussion_r92830957 --- Diff: python/pyspark/rdd.py --- @@ -2349,7 +2352,12 @@ def toLocalIterator(self): """ with SCCallSiteSync(self.

[GitHub] spark issue #16274: [SPARK-18853][SQL] Project (UnaryNode) is way too aggres...

2016-12-13 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16274 lgtm

[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 Pushing down predicates into a data source is also an optimization done in the planner, so I think this one is not the first rule to do optimization outside the Optimizer.

[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 We moved the PythonUDFEvaluator from the logical plan into the physical plan because this one-off broke many things; many rules needed to treat it specially.

[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 @cloud-fan It's not trivial to do this in the optimizer. For example, we would have to split one Filter into two, which conflicts with another optimizer rule that combines two Filters into one.
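
Splitting one Filter into two amounts to partitioning its conjuncts: those that do not read any Python UDF output can be evaluated below the UDF evaluation node, and the rest must stay above it. A minimal sketch with a toy predicate representation; the `Predicate` class, `split_conjuncts` and the column names are hypothetical, not Catalyst classes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Predicate:
    """Toy stand-in for one conjunct of a Filter condition."""
    sql: str
    references: FrozenSet[str]  # columns the conjunct reads


def split_conjuncts(
    conjuncts: List[Predicate], udf_outputs: Set[str]
) -> Tuple[List[Predicate], List[Predicate]]:
    """Return (pushable, remaining).

    Pushable conjuncts only read columns available below the UDF evaluation;
    remaining conjuncts read at least one UDF output column.
    """
    pushable = [p for p in conjuncts if not (p.references & udf_outputs)]
    remaining = [p for p in conjuncts if p.references & udf_outputs]
    return pushable, remaining
```

Once the two halves become separate, adjacent Filter nodes, a filter-combining rule (in Catalyst, `CombineFilters`) would merge them back, which is presumably the conflict the comment refers to and one reason to do the split during planning instead.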

[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 If there are no objections in the next two hours, I will merge this one into master.

[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16193 @cloud-fan There is no R UDF at this point.

[GitHub] spark issue #16235: [SPARK-18745][SQL] Fix signed integer overflow due to to...

2016-12-09 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16235 LGTM

[GitHub] spark pull request #16193: [SPARK-18766] [SQL] Push Down Filter Through Batc...

2016-12-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16193#discussion_r91641264 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -165,4 +167,25 @@ object ExtractPythonUDFs extends

[GitHub] spark pull request #16193: [SPARK-18766] [SQL] Push Down Filter Through Batc...

2016-12-08 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16193#discussion_r91609923 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -165,4 +167,31 @@ object ExtractPythonUDFs extends

[GitHub] spark issue #16121: [SPARK-16589][PYTHON] Chained cartesian produces incorre...

2016-12-08 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16121 LGTM, merging into master and the 2.1 branch, thanks!

[GitHub] spark issue #15923: [SPARK-4105] retry the fetch or stage if shuffle block i...

2016-12-07 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15923 @JoshRosen Added a test for the case where `detectCorrupt` is false.

[GitHub] spark pull request #16193: [SPARK-18766] [SQL] Push Down Filter Through Batc...

2016-12-07 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16193#discussion_r91394979 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -166,3 +174,40 @@ object ExtractPythonUDFs extends

[GitHub] spark issue #15923: [SPARK-4105] retry the fetch or stage if shuffle block i...

2016-12-06 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15923 ping @JoshRosen

[GitHub] spark issue #16151: [SPARK-18719] Add spark.ui.showConsoleProgress to config...

2016-12-05 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16151 lgtm, merging into master

[GitHub] spark issue #16121: [SPARK-16589][PYTHON] Chained cartesian produces incorre...

2016-12-02 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/16121 It's pretty tricky to make the chained CartesianDeserializer work; maybe it's easier to have a workaround in RDD.cartesian() that adds a _reserialize() between chained cartesian (or zipped

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-11-29 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r90085570 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -56,8 +59,10 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-11-29 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r90085604 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +312,82 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark issue #15935: [SPARK-18188] add checksum for blocks of broadcast

2016-11-28 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15935 @zsxwing Added.

[GitHub] spark issue #15935: [SPARK-18188] add checksum for blocks of broadcast

2016-11-28 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15935 Manually tested this patch with a job that usually failed with corrupt streams: ``` 136 26993 0 FAILED PROCESS_LOCAL 0 / 10.1.108.161 2016/11/20 08:39:11 7 s 98 ms
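
The SPARK-18188 title describes storing a checksum for each broadcast block and verifying it when the block is read back. A minimal sketch of that idea with `zlib.adler32`; the checksum choice, block size and helper names are assumptions for illustration, not Spark's TorrentBroadcast code.

```python
import zlib
from typing import List, Tuple


def split_with_checksums(data: bytes, block_size: int) -> List[Tuple[bytes, int]]:
    """Cut data into blocks and record an Adler-32 checksum for each one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [(block, zlib.adler32(block)) for block in blocks]


def reassemble(blocks: List[Tuple[bytes, int]]) -> bytes:
    """Verify every block before joining them; raise if any block is corrupt."""
    for index, (block, expected) in enumerate(blocks):
        actual = zlib.adler32(block)
        if actual != expected:
            raise IOError(f"corrupt block {index}: checksum {actual} != {expected}")
    return b"".join(block for block, _ in blocks)
```

Detecting the mismatch on read turns silent corruption into an explicit error at a known block, which the reader can then react to (for example by fetching that block again) instead of failing later while deserializing.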

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-11-28 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r89889964 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +312,82 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark issue #15923: [SPARK-4105] retry the fetch or stage if shuffle block i...

2016-11-28 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15923 Manually tested this patch with a job that usually failed because of a corrupt stream, as the logging shows: ``` 16/11/20 08:32:07 WARN ShuffleBlockFetcherIterator: got an corrupted block
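
The retry named in the SPARK-4105 title can be sketched as: if a fetched block fails to decompress, fetch it again; if it is still corrupt, raise a fetch failure so the scheduler can rerun the stage that produced it. This sketch uses `zlib` as the codec and a caller-supplied `fetch` function; both, and the `FetchFailed` class, are assumptions rather than the `ShuffleBlockFetcherIterator` code.

```python
import zlib
from typing import Callable


class FetchFailed(Exception):
    """Hypothetical stand-in for a fetch failure that leads to a stage retry."""


def fetch_and_decompress(fetch: Callable[[], bytes], max_attempts: int = 2) -> bytes:
    """Fetch a compressed block, re-fetching it when the stream turns out corrupt."""
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(max_attempts):
        raw = fetch()
        try:
            return zlib.decompress(raw)
        except zlib.error as err:
            last_error = err  # corrupt stream: try fetching the block again
    raise FetchFailed(f"block still corrupt after {max_attempts} attempts") from last_error
```

A transient corruption (for example a bad network transfer) is absorbed by the second fetch, while a block that is corrupt at its source keeps failing and escalates to the stage level.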

[GitHub] spark pull request #15980: [SPARK-18528][SQL] Fix a bug to initialise an ite...

2016-11-23 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15980#discussion_r89365636 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -67,23 +67,14 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark issue #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-11-19 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13065 LGTM

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-11-18 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r88759884 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +312,82 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-11-18 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r88759763 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +312,82 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-11-18 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r88755507 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala --- @@ -103,5 +109,192 @@ case class GenerateExec

[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-11-18 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r88754155 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala --- @@ -113,4 +117,25 @@ class WholeStageCodegenSuite extends

[GitHub] spark pull request #15935: [SPARK-] add checksum for blocks of broadcast

2016-11-18 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/15935 [SPARK-] add checksum for blocks of broadcast ## What changes were proposed in this pull request? A TorrentBroadcast is first serialized and compressed, then split into fixed-size blocks
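The per-block checksum idea in this PR can be sketched as below. This is a minimal illustration, not the PR's code: it assumes the broadcast value is already serialized and compressed into a byte array, uses a hypothetical block size, and picks Adler32 as the checksum (the excerpt does not say which checksum the PR uses). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Adler32;

// Hypothetical helper, not Spark's API: split an already serialized and
// compressed broadcast payload into fixed-size blocks and record one
// checksum per block, so a reader can detect corruption block by block.
final class BlockChecksums {
    private BlockChecksums() {}

    // Split data into blocks of at most blockSize bytes; the last block may be shorter.
    static List<byte[]> split(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int start = 0; start < data.length; start += blockSize) {
            int end = Math.min(start + blockSize, data.length);
            blocks.add(Arrays.copyOfRange(data, start, end));
        }
        return blocks;
    }

    // Adler32 is an assumption here; any fast checksum would serve the same purpose.
    static long checksum(byte[] block) {
        Adler32 adler = new Adler32();
        adler.update(block, 0, block.length);
        return adler.getValue();
    }

    // Writer side: compute one checksum per block, to be stored with the broadcast metadata.
    static long[] checksums(List<byte[]> blocks) {
        long[] sums = new long[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            sums[i] = checksum(blocks.get(i));
        }
        return sums;
    }

    // Reader side: accept a fetched block only if its checksum matches the recorded one.
    static boolean verify(byte[] block, long expected) {
        return checksum(block) == expected;
    }
}
```

Checking each block independently means a reader can pinpoint which fetched block is bad instead of only discovering corruption when the reassembled value fails to decompress.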

[GitHub] spark issue #15923: [SPARK-4105] retry the fetch or stage if shuffle block i...

2016-11-18 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15923 @joshrosen @zsxwing Could you help review this one?

[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-11-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r88581043 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala --- @@ -103,5 +109,182 @@ case class GenerateExec

[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-11-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r88580906 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala --- @@ -113,4 +117,25 @@ class WholeStageCodegenSuite extends

[GitHub] spark issue #15894: [SPARK-18188] Add checksum for shuffle blocks

2016-11-17 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15894 Closing this because of the complexity and overhead here, in favor of https://github.com/apache/spark/pull/15923/.

[GitHub] spark pull request #15894: [SPARK-18188] Add checksum for shuffle blocks

2016-11-17 Thread davies
Github user davies closed the pull request at: https://github.com/apache/spark/pull/15894

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-11-17 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/15923 [SPARK-4105] retry the fetch or stage if shuffle block is corrupt ## What changes were proposed in this pull request? There is an outstanding issue that has existed for a long time: Sometimes
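The "retry the fetch or stage" behaviour named in this PR title can be sketched as follows. This is a hedged illustration, not ShuffleBlockFetcherIterator: it assumes corruption can be detected on the fetched bytes (for example by a checksum or a failed decompression), that the fetcher refetches a bounded number of times, and that it then raises an error the scheduler would answer by rerunning the stage. `CorruptAwareFetcher`, `CorruptBlockException`, and the injected fetch and check functions are all hypothetical.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: fetch a block, check it for corruption, refetch up to
// maxAttempts times, and escalate if it is still corrupt so that the caller
// can fall back to retrying the whole stage.
final class CorruptAwareFetcher {

    // Hypothetical exception meaning "refetching did not help"; in this sketch
    // a scheduler would react to it by rerunning the stage that produced the block.
    static final class CorruptBlockException extends RuntimeException {
        CorruptBlockException(String blockId, int attempts) {
            super("block " + blockId + " still corrupt after " + attempts + " attempt(s)");
        }
    }

    private final Function<String, byte[]> fetch;  // assumed remote fetch by block id
    private final Predicate<byte[]> isCorrupt;     // assumed corruption check
    private final int maxAttempts;                 // e.g. 2 = first fetch plus one retry

    CorruptAwareFetcher(Function<String, byte[]> fetch, Predicate<byte[]> isCorrupt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.fetch = fetch;
        this.isCorrupt = isCorrupt;
        this.maxAttempts = maxAttempts;
    }

    // Return the first fetched copy that passes the check, or escalate.
    byte[] fetchVerified(String blockId) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            byte[] data = fetch.apply(blockId);
            if (!isCorrupt.test(data)) {
                return data;
            }
        }
        throw new CorruptBlockException(blockId, maxAttempts);
    }
}
```

A cheap refetch handles corruption introduced in transit, while escalating to a stage retry covers the case where the stored map output itself is bad.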

[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-11-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r88550711 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala --- @@ -40,6 +42,10 @@ private[execution] sealed case class LazyIterator

[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-11-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r88525329 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -149,29 +167,52 @@ case class Stack(children: Seq

[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-11-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r88524027 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala --- @@ -40,6 +42,10 @@ private[execution] sealed case class LazyIterator

[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-11-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r88523492 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala --- @@ -103,5 +109,182 @@ case class GenerateExec
