spark git commit: [SPARK-19872] [PYTHON] Use the correct deserializer for RDD construction for coalesce/repartition

2017-03-15 Thread davies
``` [u'a', u'b', u'', u'd', u'e', u'f', u'g', u'h', u'i', u'j', u'k', u'l', u''] ``` ## How was this patch tested? Unit test in `python/pyspark/tests.py`. Author: hyukjinkwon <gurwls...@gmail.com> Closes #17282 from HyukjinKwon/SPARK-19872. (cherry picked from commit 7387126f83dc0489eb1

spark git commit: [SPARK-19872] [PYTHON] Use the correct deserializer for RDD construction for coalesce/repartition

2017-03-15 Thread davies
85be Author: hyukjinkwon <gurwls...@gmail.com> Authored: Wed Mar 15 10:17:18 2017 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Wed Mar 15 10:17:18 2017 -0700 -- python/pyspark/
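The failure mode fixed here was an RDD built by coalesce/repartition with a deserializer that did not match the serializer used to write the data. A minimal pure-Python stand-in (these classes are illustrative, not PySpark's actual serializer API) showing why the reader and writer must agree on batching:

```python
import pickle

class BatchedSerializer:
    """Minimal stand-in for a batched pickle serializer: values are written
    as pickled batches, so the reader must unwrap batches the same way."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size

    def dump_stream(self, items):
        items = list(items)
        return [pickle.dumps(items[i:i + self.batch_size])
                for i in range(0, len(items), self.batch_size)]

    def load_stream(self, chunks):
        for chunk in chunks:
            # Each chunk is a *batch*; a mismatched (unbatched) deserializer
            # would yield whole lists instead of individual records.
            for item in pickle.loads(chunk):
                yield item

data = [u'a', u'b', u'', u'd', u'e', u'f', u'g', u'h', u'i', u'j', u'k', u'l', u'']
ser = BatchedSerializer()
assert list(ser.load_stream(ser.dump_stream(data))) == data
```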

spark git commit: [SPARK-19561] [PYTHON] cast TimestampType.toInternal output to long

2017-03-07 Thread davies
ion in DataFrames near the epoch. ## How was this patch tested? Added a new test that fails without the change. dongjoon-hyun davies Mind taking a look? The contribution is my original work and I license the work to the project under the project’s open source license. Author: Jason Wh

spark git commit: [SPARK-19561] [PYTHON] cast TimestampType.toInternal output to long

2017-03-07 Thread davies
amp creation in DataFrames near the epoch. ## How was this patch tested? Added a new test that fails without the change. dongjoon-hyun davies Mind taking a look? The contribution is my original work and I license the work to the project under the project’s open source license. Author: Jason Wh
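The idea behind the fix can be sketched in plain Python (a hedged illustration, not PySpark's actual `TimestampType.toInternal`): Spark's internal timestamp is an integer count of microseconds since the epoch, and building it with float arithmetic misbehaves for timestamps near the epoch, so the result is cast to an integer (Python 2's long) explicitly.

```python
import calendar
import datetime

def timestamp_to_internal(dt):
    # Integer microseconds since the epoch; int() guards against float
    # arithmetic leaking into the result near the epoch.
    seconds = calendar.timegm(dt.utctimetuple())
    return int(seconds) * 1000000 + dt.microsecond

epoch_plus = datetime.datetime(1970, 1, 1, 0, 0, 0, 123)
assert timestamp_to_internal(epoch_plus) == 123
assert isinstance(timestamp_to_internal(epoch_plus), int)
```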

spark git commit: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap

2017-02-17 Thread davies
2.2, or fail to insert into InMemorySorter in 2.1). This PR fixes the off-by-one bug in BytesToBytesMap. It also fixes a bug where the array never grows if it fails to grow once (staying at the initial capacity), introduced by #15722. ## How was this patch tested? Added regression test. Author: Dav

spark git commit: [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap

2017-02-17 Thread davies
ail to insert into InMemorySorter in 2.1). This PR fixes the off-by-one bug in BytesToBytesMap. It also fixes a bug where the array never grows if it fails to grow once (staying at the initial capacity), introduced by #15722. ## How was this patch tested? Added regression test. Author: Davies
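The invariant behind the off-by-one fix is a classic one for open-addressing hash tables: the table must never become completely full, or probe loops for absent keys never terminate. A hedged, language-agnostic illustration (not the actual BytesToBytesMap code):

```python
class TinyMap:
    """Toy open-addressing map: at least one slot always stays empty,
    so lookups that miss are guaranteed to hit a None and stop."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.size = 0

    def put(self, key, value):
        # An off-by-one here (>= len(self.slots)) would allow a full table.
        if self.size >= len(self.slots) - 1:
            self._grow()
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (key, value)

    def get(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        return None

    def _grow(self):
        old = [s for s in self.slots if s is not None]
        self.slots = [None] * (len(self.slots) * 2)
        self.size = 0
        for k, v in old:
            self.put(k, v)

m = TinyMap()
for k in range(20):
    m.put(k, k * 10)
assert all(m.get(k) == k * 10 for k in range(20))
assert m.get(999) is None
```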

spark git commit: [SPARK-19481] [REPL] [MAVEN] Avoid to leak SparkContext in Signaling.cancelOnInterrupt

2017-02-09 Thread davies
er Commit: 303f00a4bf6660dd83c8bd9e3a107bb3438a421b Parents: 6287c94 Author: Shixiong Zhu <shixi...@databricks.com> Authored: Thu Feb 9 11:16:51 2017 -0800 Committer: Davies Liu <davies@gmail.com> Committed: Thu Feb 9 11:16:51 2017 -0800 --

spark git commit: [SPARK-17912] [SQL] Refactor code generation to get data for ColumnVector/ColumnarBatch

2017-01-19 Thread davies
asf/spark/diff/148a84b3 Branch: refs/heads/master Commit: 148a84b37082697c7f61c6a621010abe4b12f2eb Parents: 63d8390 Author: Kazuaki Ishizaki <ishiz...@jp.ibm.com> Authored: Thu Jan 19 15:16:05 2017 -0800 Committer: Davies Liu <davies@gmail.com> Committed: Thu Jan 1

spark git commit: [SPARK-19019] [PYTHON] Fix hijacked `collections.namedtuple` and port cloudpickle changes for PySpark to work with Python 3.6.0

2017-01-17 Thread davies
inished test(python3.6): pyspark.sql.window (4s) Finished test(python3.6): pyspark.sql.readwriter (35s) Tests passed in 433 seconds ``` Author: hyukjinkwon <gurwls...@gmail.com> Closes #16429 from HyukjinKwon/SPARK-19019. (cherry picked from commit 20e6280626fe243b170a2e7c5e018c67f3dac1db)

spark git commit: [SPARK-19019] [PYTHON] Fix hijacked `collections.namedtuple` and port cloudpickle changes for PySpark to work with Python 3.6.0

2017-01-17 Thread davies
git-wip-us.apache.org/repos/asf/spark/commit/20e62806 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/20e62806 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/20e62806 Branch: refs/heads/master Commit: 20e6280626fe243b170a2e7c5e018c67f3dac1db Parents: b79cc7c Author: hyukjinkwon <
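The core idea of the namedtuple patch can be sketched as follows (a hedged illustration of the approach, not PySpark's actual `_hijack_namedtuple` code): serialize instances by value, as (class name, field names, contents), so namedtuple classes defined interactively can be rebuilt on a worker that cannot import them.

```python
import collections
import pickle

def _restore(name, fields, values):
    # Recreate the namedtuple class by value, then the instance.
    return collections.namedtuple(name, fields)(*values)

def make_picklable(cls):
    # Replace the default pickling (which looks the class up by module
    # attribute) with a by-value __reduce__.
    def __reduce__(self):
        return (_restore, (self.__class__.__name__,
                           list(self._fields), tuple(self)))
    cls.__reduce__ = __reduce__
    return cls

def roundtrip():
    # Defined inside a function, so default pickling would fail.
    Point = make_picklable(collections.namedtuple("Point", ["x", "y"]))
    p = pickle.loads(pickle.dumps(Point(1, 2)))
    return (p.x, p.y)
```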

spark git commit: [SPARK-19180] [SQL] the offset of short should be 2 in OffHeapColumn

2017-01-13 Thread davies
e 2. ## How was this patch tested? unit test Author: Yucai Yu <yucai...@intel.com> Closes #16555 from yucai/offheap_short. (cherry picked from commit ad0dadaa251b031a480fc2080f792a54ed7dfc5f) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/

spark git commit: [SPARK-19180] [SQL] the offset of short should be 2 in OffHeapColumn

2017-01-13 Thread davies
asf/spark/tree/ad0dadaa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ad0dadaa Branch: refs/heads/master Commit: ad0dadaa251b031a480fc2080f792a54ed7dfc5f Parents: b0e8eb6 Author: Yucai Yu <yucai...@intel.com> Authored: Fri Jan 13 13:40:53 2017 -0800 Committer: Davies Liu <da
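The invariant fixed here is that a byte offset into an off-heap buffer is element index times element width, and a short is two bytes wide. A hedged sketch of the arithmetic (illustrative helpers, not the OffHeapColumnVector code):

```python
import struct

SHORT_SIZE = 2  # a short occupies two bytes

def put_shorts(buf, dst_index, values):
    # Using the element index directly as a byte offset (multiplier 1)
    # would make consecutive shorts overwrite each other's bytes.
    for i, v in enumerate(values):
        struct.pack_into("<h", buf, (dst_index + i) * SHORT_SIZE, v)

def get_short(buf, index):
    return struct.unpack_from("<h", buf, index * SHORT_SIZE)[0]

buf = bytearray(16)
put_shorts(buf, 2, [7, -7])
assert get_short(buf, 2) == 7
assert get_short(buf, 3) == -7
```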

spark git commit: [SPARK-18281] [SQL] [PYSPARK] Remove timeout for reading data through socket for local iterator

2016-12-20 Thread davies
..@gmail.com> Authored: Tue Dec 20 13:12:16 2016 -0800 Committer: Davies Liu <davies@gmail.com> Committed: Tue Dec 20 13:12:16 2016 -0800 -- python/pyspark/rdd.py | 11 +-- python/pyspark/tests.py | 12

spark git commit: [SPARK-18281] [SQL] [PYSPARK] Remove timeout for reading data through socket for local iterator

2016-12-20 Thread davies
ded tests into PySpark. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Liang-Chi Hsieh <vii...@gmail.com> Closes #16263 from viirya/fix-pyspark-localiterator. (cherry picked from commit 95c95b71ed31b2971475aec6d7776dc234845d0a) Signed-off-b
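The behavior described above can be sketched with plain sockets (a hedged illustration of the idea, not PySpark's `rdd.py` code): the local-iterator reader blocks without a timeout, since the consumer may pull rows slowly and a fixed timeout would kill an otherwise healthy read.

```python
import socket

def read_fully(sock):
    # Block indefinitely instead of raising socket.timeout on slow consumers.
    sock.settimeout(None)
    chunks = []
    while True:
        data = sock.recv(8192)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)

a, b = socket.socketpair()
a.sendall(b"partial results")
a.close()
payload = read_fully(b)
b.close()
assert payload == b"partial results"
```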

spark git commit: [SPARK-18281] [SQL] [PYSPARK] Remove timeout for reading data through socket for local iterator

2016-12-20 Thread davies
ded tests into PySpark. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Liang-Chi Hsieh <vii...@gmail.com> Closes #16263 from viirya/fix-pyspark-localiterator. (cherry picked from commit 95c95b71ed31b2971475aec6d7776dc234845d0a) Signed-off-b

spark git commit: [SPARK-16589] [PYTHON] Chained cartesian produces incorrect number of records

2016-12-08 Thread davies
*de*serialization. ## How was this patch tested? Additional unit tests (sourced from #14248) plus one for testing a cartesian with zip. Author: Andrew Ray <ray.and...@gmail.com> Closes #16121 from aray/fix-cartesian. (cherry picked from commit 3c68944b229aaaeeaee3efcbae3e3be9a2914855) Signed-off-b

spark git commit: [SPARK-16589] [PYTHON] Chained cartesian produces incorrect number of records

2016-12-08 Thread davies
..@gmail.com> Authored: Thu Dec 8 11:08:12 2016 -0800 Committer: Davies Liu <davies@gmail.com> Committed: Thu Dec 8 11:08:12 2016 -0800 -- python/pyspark/serializers.py | 58 +++--- py
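The failure mode can be illustrated without Spark (a hedged sketch; the real fix lives in `python/pyspark/serializers.py`): records travel in pickled batches, so a cartesian, and especially a chained one, must flatten batches back to records before pairing, or whole batches get paired and the record count comes out wrong.

```python
from itertools import chain, product

def cartesian_of_batches(a_batches, b_batches):
    # Flatten each batched stream to records, then pair record-with-record.
    a = list(chain.from_iterable(a_batches))
    b = list(chain.from_iterable(b_batches))
    return list(product(a, b))

pairs = cartesian_of_batches([[1, 2], [3]], [["x"], ["y"]])
assert len(pairs) == 6  # 3 records x 2 records, not 2 batches x 2 batches
```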

spark git commit: [SPARK-18719] Add spark.ui.showConsoleProgress to configuration docs

2016-12-05 Thread davies
0:50 2016 -0800 Committer: Davies Liu <davies@gmail.com> Committed: Mon Dec 5 14:40:50 2016 -0800 -- docs/configuration.md | 9 + 1 file

spark git commit: [SPARK-17817] [PYSPARK] [FOLLOWUP] PySpark RDD Repartitioning Results in Highly Skewed Partition Sizes

2016-10-18 Thread davies
park/diff/1e35e969 Branch: refs/heads/master Commit: 1e35e96930dda02cb0788c8143e5f2e1944b Parents: cd106b0 Author: Liang-Chi Hsieh <vii...@gmail.com> Authored: Tue Oct 18 14:25:10 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Tue Oct 1

spark git commit: [SPARK-17388] [SQL] Support for inferring type date/timestamp/decimal for partition column

2016-10-18 Thread davies
ds/master Commit: 37686539f546ac7a3657dbfc59b7ac982b4b9bce Parents: e59df62 Author: hyukjinkwon <gurwls...@gmail.com> Authored: Tue Oct 18 13:20:42 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Tue Oct 1

spark git commit: [SPARK-17845] [SQL] More self-evident window function frame boundary API

2016-10-12 Thread davies
ld Xin <r...@databricks.com> Authored: Wed Oct 12 16:45:10 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Wed Oct 12 16:45:10 2016 -0700 -- python/pyspark/sql/tests.py | 25 +

spark git commit: [SPARK-17806] [SQL] fix bug in join key rewritten in HashJoin

2016-10-07 Thread davies
ber of columns and different data types. Manually tested the reported case and confirmed that this PR fixes the bug. Author: Davies Liu <dav...@databricks.com> Closes #15390 from davies/rewrite_key. (cherry picked from commit 94b24b84a666517e31e9c9d693f92d9bbfd7f9ad) Signed-off-by: Davies Li

spark git commit: [SPARK-17806] [SQL] fix bug in join key rewritten in HashJoin

2016-10-07 Thread davies
ber of columns and different data types. Manually tested the reported case and confirmed that this PR fixes the bug. Author: Davies Liu <dav...@databricks.com> Closes #15390 from davies/rewrite_key. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch

2016-10-03 Thread davies
d: Mon Oct 3 14:12:03 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Mon Oct 3 14:12:03 2016 -0700 -- python/pyspark/java_gateway.py | 9 - python/pyspark/ml/common.py| 4 ++-- python/pyspark

spark git commit: [SPARK-17738] [SQL] fix ARRAY/MAP in columnar cache

2016-09-30 Thread davies
ong. ## How was this patch tested? The flaky test should be fixed. Author: Davies Liu <dav...@databricks.com> Closes #15305 from davies/fix_MAP. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f327e168 Tree: http:

spark git commit: [SPARK-17100] [SQL] fix Python udf in filter on top of outer join

2016-09-19 Thread davies
ome expressions are not evaluable; we should check that before evaluating it. ## How was this patch tested? Added regression tests. Author: Davies Liu <dav...@databricks.com> Closes #15103 from davies/udf_join. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-17100] [SQL] fix Python udf in filter on top of outer join

2016-09-19 Thread davies
ome expressions are not evaluable; we should check that before evaluating it. ## How was this patch tested? Added regression tests. Author: Davies Liu <dav...@databricks.com> Closes #15103 from davies/udf_join. (cherry picked from commit d8104158a922d86dd4f00e50d5d7dddc7b777a21) S

spark git commit: [SPARK-16439] [SQL] bring back the separator in SQL UI

2016-09-19 Thread davies
xisting tests. ![metrics](https://cloud.githubusercontent.com/assets/40902/14573908/21ad2f00-030d-11e6-9e2c-c544f30039ea.png) Author: Davies Liu <dav...@databricks.com> Closes #15106 from davies/metric_sep. (cherry picked from commit e0632062635c37cbc77df7ebd2a1846655193e12) Signed-off-by:

spark git commit: [SPARK-16439] [SQL] bring back the separator in SQL UI

2016-09-19 Thread davies
xisting tests. ![metrics](https://cloud.githubusercontent.com/assets/40902/14573908/21ad2f00-030d-11e6-9e2c-c544f30039ea.png) Author: Davies Liu <dav...@databricks.com> Closes #15106 from davies/metric_sep. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.

spark git commit: [SPARK-17472] [PYSPARK] Better error message for serialization failures of large objects in Python

2016-09-14 Thread davies
x: len(b.value)).count() run() ``` Before: ``` SystemError: error return without exception set ``` After: ``` cPickle.PicklingError: Could not serialize broadcast: SystemError: error return without exception set ``` ## How was this patch tested? Manually tried out these cases cc davies Author
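The before/after snippet above is about wrapping an opaque low-level failure in a message that names what was being serialized. A hedged sketch of that approach (illustrative helper, not PySpark's `Broadcast` code):

```python
import pickle

def serialize_broadcast(value):
    # Re-raise low-level serialization failures as a PicklingError whose
    # message says *what* failed, instead of surfacing a bare
    # "error return without exception set".
    try:
        return pickle.dumps(value)
    except Exception as e:
        raise pickle.PicklingError(
            "Could not serialize broadcast: %s: %s" % (type(e).__name__, e))

try:
    serialize_broadcast(lambda x: x)   # lambdas are not picklable
    message = None
except pickle.PicklingError as e:
    message = str(e)
assert message is not None and message.startswith("Could not serialize broadcast")
```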

spark git commit: [SPARK-17514] df.take(1) and df.limit(1).collect() should perform the same in Python

2016-09-14 Thread davies
sql/tests.py` which asserts that the expected number of jobs, stages, and tasks are run for both queries. Author: Josh Rosen <joshro...@databricks.com> Closes #15068 from JoshRosen/pyspark-collect-limit. (cherry picked from commit 6d06ff6f7e2dd72ba8fe96cd875e83eda6ebb2a9) Signed-off-by: Davies Li

spark git commit: [SPARK-17514] df.take(1) and df.limit(1).collect() should perform the same in Python

2016-09-14 Thread davies
om> Authored: Wed Sep 14 10:10:01 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Wed Sep 14 10:10:01 2016 -0700 -- python/pyspark/sql/dataframe.py | 5 + python

spark git commit: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProjectExec

2016-09-12 Thread davies
, I changed the TakeOrderedAndProjectExec to not use Option[Seq[Expression]], to work around it. cc JoshRosen ## How was this patch tested? Added regression test. Author: Davies Liu <dav...@databricks.com> Closes #15030 from davies/all_expr. Project: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProjectExec

2016-09-12 Thread davies
, I changed the TakeOrderedAndProjectExec to not use Option[Seq[Expression]], to work around it. cc JoshRosen ## How was this patch tested? Added regression test. Author: Davies Liu <dav...@databricks.com> Closes #15030 from davies/all_expr. (cherry picked fr

spark git commit: [SPARK-17354] [SQL] Partitioning by dates/timestamps should work with Parquet vectorized reader

2016-09-09 Thread davies
sts in `SQLQuerySuite`. Author: hyukjinkwon <gurwls...@gmail.com> Closes #14919 from HyukjinKwon/SPARK-17354. (cherry picked from commit f7d2143705c8c1baeed0bc62940f9dba636e705b) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-17354] [SQL] Partitioning by dates/timestamps should work with Parquet vectorized reader

2016-09-09 Thread davies
Sep 9 14:23:05 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Fri Sep 9 14:23:05 2016 -0700 -- .../execution/vectorized/ColumnVectorUtils.java | 5 +- .../sql/execution/vectorized/ColumnarBatch.java | 6

spark git commit: [SPARK-16334] [BACKPORT] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error

2016-09-06 Thread davies
park/tree/53438048 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/53438048 Branch: refs/heads/branch-2.0 Commit: 534380484ac5f56bd3e14a8917a24ca6cccf198f Parents: 95e44dc Author: Sameer Agarwal <samee...@cs.berkeley.edu> Authored: Tue Sep 6 10:48:53 2016 -0700 Committer: Dav

spark git commit: [SPARK-16922] [SPARK-17211] [SQL] make the address of values portable in LongToUnsafeRowMap

2016-09-06 Thread davies
out Platform.LONG_ARRAY_OFFSET). ## How was this patch tested? Added a test case with randomly generated keys to improve the coverage. This is not a regression test, since that would require a Spark cluster whose driver or executor has at least a 32G heap. Author: Davies Liu <dav...@databricks.com> Closes #1

spark git commit: [SPARK-16922] [SPARK-17211] [SQL] make the address of values portable in LongToUnsafeRowMap

2016-09-06 Thread davies
SET). ## How was this patch tested? Added a test case with randomly generated keys to improve the coverage. This is not a regression test, since that would require a Spark cluster whose driver or executor has at least a 32G heap. Author: Davies Liu <dav...@databricks.com> Closes #1

spark git commit: Revert "[SPARK-16334] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error"

2016-09-02 Thread davies
ch-2.0 Commit: c0ea7707127c92ecb51794b96ea40d7cdb28b168 Parents: b8f65da Author: Davies Liu <davies@gmail.com> Authored: Fri Sep 2 16:05:37 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Fri Sep 2 16:05:37 2016 -0700 --

spark git commit: Fix build

2016-09-02 Thread davies
iff: http://git-wip-us.apache.org/repos/asf/spark/diff/b8f65dad Branch: refs/heads/branch-2.0 Commit: b8f65dad7be22231e982aaec3bbd69dbeacc20da Parents: a3930c3 Author: Davies Liu <davies@gmail.com> Authored: Fri Sep 2 15:40:02 2016 -0700 Committer: Davies Liu <davies@gmail.com> Com

spark git commit: [SPARK-16334] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error

2016-09-02 Thread davies
er Commit: a2c9acb0e54b2e38cb8ee6431f1ea0e0b4cd959a Parents: ed9c884 Author: Sameer Agarwal <samee...@cs.berkeley.edu> Authored: Fri Sep 2 15:16:16 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Fri Sep

spark git commit: [SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in DataFrameWriter

2016-09-02 Thread davies
ain. Right now, we do not know whether a logical plan has been optimized; we could introduce a flag for that and make sure an optimized logical plan will not be analyzed again. Added regression tests. Author: Davies Liu <dav...@databricks.com> Closes #14797 from davies/fix_writer. (cherry pi

spark git commit: [SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in DataFrameWriter

2016-09-02 Thread davies
so they will not be optimized and analyzed again. Right now, we do not know whether a logical plan has been optimized; we could introduce a flag for that and make sure an optimized logical plan will not be analyzed again. ## How was this patch tested? Added regression tests. Author: Davies Liu &
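The flag idea floated in the description can be sketched in a few lines (a hedged illustration of the general pattern, not Spark's QueryExecution code): record that a plan has been through the optimizer so a writer never feeds an already-optimized plan back through it.

```python
class LogicalPlan:
    """Toy plan carrying an 'already optimized' flag."""
    def __init__(self):
        self.optimized = False
        self.optimizer_runs = 0

def optimize(plan):
    # Idempotent: an already-optimized plan is returned untouched.
    if plan.optimized:
        return plan
    plan.optimizer_runs += 1
    plan.optimized = True
    return plan

plan = optimize(optimize(LogicalPlan()))
assert plan.optimizer_runs == 1
```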

spark git commit: [SPARK-17261] [PYSPARK] Using HiveContext after re-creating SparkContext in Spark 2.0 throws "Java.lang.illegalStateException: Cannot call methods on a stopped sparkContext"

2016-09-02 Thread davies
/asf/spark/tree/ea662286 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ea662286 Branch: refs/heads/master Commit: ea662286561aa9fe321cb0a0e10cdeaf60440b90 Parents: 6bcbf9b Author: Jeff Zhang <zjf...@apache.org> Authored: Fri Sep 2 10:08:14 2016 -0700 Committer: Davies Liu <davies

spark git commit: [SPARK-17261] [PYSPARK] Using HiveContext after re-creating SparkContext in Spark 2.0 throws "Java.lang.illegalStateException: Cannot call methods on a stopped sparkContext"

2016-09-02 Thread davies
ark.sql("show databases").show() ``` Author: Jeff Zhang <zjf...@apache.org> Closes #14857 from zjffdu/SPARK-17261. (cherry picked from commit ea662286561aa9fe321cb0a0e10cdeaf60440b90) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/repo

spark git commit: [SPARK-16525] [SQL] Enable Row Based HashMap in HashAggregateExec

2016-09-01 Thread davies
refs/heads/master Commit: 03d77af9ec4ce9a42affd6ab4381ae5bd3c79a5a Parents: 15539e5 Author: Qifan Pu <qifan...@gmail.com> Authored: Thu Sep 1 16:56:35 2016 -0700 Committer: Davies Liu <davies@gmail.com> Com

spark git commit: [SPARK-16926] [SQL] Remove partition columns from partition metadata.

2016-09-01 Thread davies
repos/asf/spark/tree/473d7864 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/473d7864 Branch: refs/heads/master Commit: 473d78649dec7583bcc4ec24b6f38303c38e81a2 Parents: edb4573 Author: Brian Cho <b...@fb.com> Authored: Thu Sep 1 14:13:17 2016 -0700 Committer: Davies Liu <da

spark git commit: [SPARK-16926] [SQL] Remove partition columns from partition metadata.

2016-09-01 Thread davies
How was this patch tested? Existing unit tests. Author: Brian Cho <b...@fb.com> Closes #14515 from dafrista/partition-columns-metadata. (cherry picked from commit 473d78649dec7583bcc4ec24b6f38303c38e81a2) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/

spark git commit: [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore

2016-08-29 Thread davies
nds in total, 2.5X faster (with a larger cluster, gathering will be much faster). Author: Davies Liu <dav...@databricks.com> Closes #14607 from davies/repair_batch. (cherry picked from commit 48caec2516ef35bfa1a3de2dc0a80d0dc819e6bd) Signed-off-by: Davies Liu <davies@gmail.com> Project:

spark git commit: [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore

2016-08-29 Thread davies
ing these partitions took 25 seconds (most of the time spent in object store), 59 seconds in total, 2.5X faster (with a larger cluster, gathering will be much faster). Author: Davies Liu <dav...@databricks.com> Closes #14607 from davies/repair_batch. Project: http://git-wip-us.apache.org/repos/asf/s
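The optimization described above is, at heart, batching: instead of one metastore call per discovered partition, register partitions in fixed-size groups. A hedged sketch of that shape (illustrative helper, not the Hive client code):

```python
def in_batches(partitions, batch_size):
    # Yield fixed-size slices so each metastore RPC carries many partitions.
    for i in range(0, len(partitions), batch_size):
        yield partitions[i:i + batch_size]

batches = list(in_batches(["ds=%d" % d for d in range(10)], 4))
assert [len(b) for b in batches] == [4, 4, 2]
```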

spark git commit: [SPARK-16700][PYSPARK][SQL] create DataFrame from dict/Row with schema

2016-08-25 Thread davies
der in provided schema; this PR fixes that by ignoring the order of fields in this case. Created regression tests for them. Author: Davies Liu <dav...@databricks.com> Closes #14469 from davies/py_dict. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.

spark git commit: [SPARK-13286] [SQL] add the next expression of SQLException as cause

2016-08-23 Thread davies
ver, so did not add a regression test. Author: Davies Liu <dav...@databricks.com> Closes #14722 from davies/keep_cause. (cherry picked from commit 9afdfc94f49395e69a7959e881c19d787ce00c3e) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-13286] [SQL] add the next expression of SQLException as cause

2016-08-23 Thread davies
use). ## How was this patch tested? Can't reproduce this on the default JDBC driver, so did not add a regression test. Author: Davies Liu <dav...@databricks.com> Closes #14722 from davies/keep_cause. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.a

spark git commit: [SPARK-17113] [SHUFFLE] Job failure due to Executor OOM in offheap mode

2016-08-19 Thread davies
italkedia/fix_offheap_oom. (cherry picked from commit cf0cce90364d17afe780ff9a5426dfcefa298535) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ae89c8e1 Tree: http://git-wip-us.a

spark git commit: [SPARK-17113] [SHUFFLE] Job failure due to Executor OOM in offheap mode

2016-08-19 Thread davies
er Commit: cf0cce90364d17afe780ff9a5426dfcefa298535 Parents: 071eaaf Author: Sital Kedia <ske...@fb.com> Authored: Fri Aug 19 11:27:30 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Fri Aug 1

spark git commit: [SPARK-17106] [SQL] Simplify the SubqueryExpression interface

2016-08-17 Thread davies
efs/heads/master Commit: 0b0c8b95e3594db36d87ef0e59a30eefe8508ac1 Parents: 56d8674 Author: Herman van Hovell <hvanhov...@databricks.com> Authored: Wed Aug 17 07:03:24 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Wed Aug 1

spark git commit: [SPARK-17035] [SQL] [PYSPARK] Improve Timestamp not to lose precision for all cases

2016-08-16 Thread davies
ark/tree/12a89e55 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/12a89e55 Branch: refs/heads/master Commit: 12a89e55cbd630fa2986da984e066cd07d3bf1f7 Parents: 6f0988b Author: Dongjoon Hyun <dongj...@apache.org> Authored: Tue Aug 1

spark git commit: [SPARK-16958] [SQL] Reuse subqueries within the same query

2016-08-11 Thread davies
-subquery](https://cloud.githubusercontent.com/assets/40902/17573229/e578d93c-5f0d-11e6-8a3c-0150d81d3aed.png) ## How was this patch tested? Existing tests. Author: Davies Liu <dav...@databricks.com> Closes #14548 from davies/subq. Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-16928] [SQL] Recursive call of ColumnVector::getInt() breaks JIT inlining

2016-08-10 Thread davies
Pu <qifan...@gmail.com> Authored: Wed Aug 10 14:45:13 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Wed Aug 10 14:45:13 2016 -0700 -- .../parquet/VectorizedColumnReader.java | 22 +

spark git commit: [SPARK-15639] [SPARK-16321] [SQL] Push down filter at RowGroups level for parquet reader

2016-08-10 Thread davies
om viirya/vectorized-reader-push-down-filter2. (cherry picked from commit 19af298bb6d264adcf02f6f84c8dc1542b408507) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/977fbbfc Tree: http://git-wi

spark git commit: [SPARK-15639] [SPARK-16321] [SQL] Push down filter at RowGroups level for parquet reader

2016-08-10 Thread davies
b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/19af298b Branch: refs/heads/master Commit: 19af298bb6d264adcf02f6f84c8dc1542b408507 Parents: 11a6844 Author: Liang-Chi Hsieh <sim...@tw.ibm.com> Authored: Wed Aug 10 10:03:55 2016 -0700 Commit

spark git commit: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread davies
ONS The implementation in this PR will only list partitions (not the files with a partition) in driver (in parallel if needed). Added unit tests for it and Hive compatibility test suite. Author: Davies Liu <dav...@databricks.com> Closes #14500 from davies/repair_table. Project: http:

spark git commit: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread davies
tem. Another syntax is: ALTER TABLE table RECOVER PARTITIONS The implementation in this PR will only list partitions (not the files with a partition) in driver (in parallel if needed). ## How was this patch tested? Added unit tests for it and Hive compatibility test suite. Author: Davies Liu &

spark git commit: [SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.createDirectStream for python3

2016-08-09 Thread davies
hor: Mariusz Strzelecki <mariusz.strzele...@allegrogroup.com> Closes #14540 from szczeles/kafka_pyspark. (cherry picked from commit 29081b587f3423bf5a3e0066357884d0c26a04bf) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.createDirectStream for python3

2016-08-09 Thread davies
iff: http://git-wip-us.apache.org/repos/asf/spark/diff/29081b58 Branch: refs/heads/master Commit: 29081b587f3423bf5a3e0066357884d0c26a04bf Parents: 182e119 Author: Mariusz Strzelecki <mariusz.strzele...@allegrogroup.com> Authored: Tue Aug 9 09:44:43 2016 -0700 Committer: Davies Liu <da

spark git commit: [SPARK-16884] Move DataSourceScanExec out of ExistingRDD.scala file

2016-08-04 Thread davies
n't necessarily depend on an existing RDD. cc davies ## How was this patch tested? Existing tests. Author: Eric Liang <e...@databricks.com> Closes #14487 from ericl/split-scan. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit

spark git commit: [SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap

2016-08-04 Thread davies
uch smaller than minKey, for example, key is Long.MinValue, minKey is > 0). ## How was this patch tested? Added regression test (also for SPARK-16740) Author: Davies Liu <dav...@databricks.com> Closes #14464 from davies/fix_overflow. Project: http://git-wip-us.apache.org/repos/as

spark git commit: [SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap

2016-08-04 Thread davies
uch much smaller than minKey, for example, key is Long.MinValue, minKey is > 0). ## How was this patch tested? Added regression test (also for SPARK-16740) Author: Davies Liu <dav...@databricks.com> Closes #14464 from davies/fix_overflow. (cherry picked
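The overflow can be reproduced by simulating Java's wrapping signed 64-bit arithmetic (a hedged sketch of the idea, not the LongToUnsafeRowMap code): computing `key - minKey` first can wrap to a large positive index when the key is far below minKey, so the fix is to range-check the key before subtracting.

```python
INT64_MIN = -(1 << 63)

def as_int64(x):
    # Simulate Java's wrapping signed 64-bit arithmetic with Python ints.
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

def slot_index(key, min_key, num_slots):
    # Guard the range *before* subtracting, so no overflow can occur.
    if key < min_key or key >= min_key + num_slots:
        return -1
    return key - min_key

# The naive subtraction wraps around to a large positive "index":
assert as_int64(INT64_MIN - 5) > 0
# The guarded version rejects the out-of-range key instead:
assert slot_index(INT64_MIN, 5, 100) == -1
```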

spark git commit: [SPARK-16596] [SQL] Refactor DataSourceScanExec to do partition discovery at execution instead of planning time

2016-08-03 Thread davies
repos/asf/spark/tree/e6f226c5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e6f226c5 Branch: refs/heads/master Commit: e6f226c5670d9f332b49ca40ff7b86b81a218d1b Parents: b55f343 Author: Eric Liang <e...@databricks.com> Authored: Wed Aug 3 11:19:55 2016 -0700 Committer: Dav

spark git commit: [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs

2016-08-02 Thread davies
teDataFrame([(i % 3, [PythonOnlyPoint(float(i), float(i))]) for i in range(10)], schema=schema) df.show() ## How was this patch tested? PySpark's sql tests. Author: Liang-Chi Hsieh <sim...@tw.ibm.com> Closes #13778 from viirya/fix-pyudt. (cherry picked from commit 146001a9ffefc7aa

spark git commit: [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs

2016-08-02 Thread davies
d Author: Liang-Chi Hsieh <sim...@tw.ibm.com> Authored: Tue Aug 2 10:08:18 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Tue Aug 2 10:08:18 2016 -0700 -- python/pyspark/sql/tests.py

spark git commit: [SPARK-13850] Force the sorter to Spill when number of elements in th…

2016-06-30 Thread davies
ort failing on large buffer size. Tested by running a job which was failing without this change due to TimSort bug. Author: Sital Kedia <ske...@fb.com> Closes #13107 from sitalkedia/fix_TimSort. (cherry picked from commit 07f46afc733b1718d528a6ea5c0d774f047024fa) Signed-off-by: Davies Li

spark git commit: [SPARK-13850] Force the sorter to Spill when number of elements in th…

2016-06-30 Thread davies
er Commit: 07f46afc733b1718d528a6ea5c0d774f047024fa Parents: 5344bad Author: Sital Kedia <ske...@fb.com> Authored: Thu Jun 30 10:53:18 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Thu Jun 30 10:53:18 2016 -0700 ---

spark git commit: [SPARK-16301] [SQL] The analyzer rule for resolving using joins should respect the case sensitivity setting.

2016-06-29 Thread davies
uld respect the case sensitivity setting. ## How was this patch tested? New tests in ResolveNaturalJoinSuite Author: Yin Huai <yh...@databricks.com> Closes #13977 from yhuai/SPARK-16301. (cherry picked from commit 8b5a8b25b9d29b7d0949d5663c7394b26154a836) Signed-off-by: Davies Li

spark git commit: [SPARK-16301] [SQL] The analyzer rule for resolving using joins should respect the case sensitivity setting.

2016-06-29 Thread davies
: Wed Jun 29 14:42:58 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Wed Jun 29 14:42:58 2016 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 26 - .../analysis/ResolveN

spark git commit: [TRIVIAL] [PYSPARK] Clean up orc compression option as well

2016-06-29 Thread davies
nly in https://github.com/apache/spark/pull/13948. ## How was this patch tested? N/A Author: hyukjinkwon <gurwls...@gmail.com> Closes #13963 from HyukjinKwon/minor-orc-compress. (cherry picked from commit d8a87a3ed211dd08f06eeb9560661b8f11ce82fa) Signed-off-by: Davies Liu <davies..

spark git commit: [TRIVIAL] [PYSPARK] Clean up orc compression option as well

2016-06-29 Thread davies
Jun 29 13:32:03 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Wed Jun 29 13:32:03 2016 -0700 -- python/pyspark/sql/readwriter.py | 3 +-- 1 file changed, 1 inse

spark git commit: [SPARK-16175] [PYSPARK] handle None for UDT

2016-06-28 Thread davies
ate the Python UDT to do this as well. ## How was this patch tested? Added tests. Author: Davies Liu <dav...@databricks.com> Closes #13878 from davies/udt_null. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/35438fb0 T

spark git commit: [SPARK-16175] [PYSPARK] handle None for UDT

2016-06-28 Thread davies
PR update the Python UDT to do this as well. ## How was this patch tested? Added tests. Author: Davies Liu <dav...@databricks.com> Closes #13878 from davies/udt_null. (cherry picked from commit 35438fb0ad3bcda5c5a3a0ccde1a620699d012db) Signed-off-by: Davies Liu <davies@gmail.com>
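The rule the fix enforces is simple enough to state in one line (a hedged sketch; the hypothetical `serialize_nullable` helper below is illustrative, not PySpark's UDT API): SQL values are nullable, so a Python UDT's conversion must pass None through rather than handing it to the UDT's serializer.

```python
def serialize_nullable(serialize, obj):
    # None stands for SQL NULL and must survive conversion untouched.
    return None if obj is None else serialize(obj)

assert serialize_nullable(str, None) is None
assert serialize_nullable(str, 3.0) == "3.0"
```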

spark git commit: [SPARK-16224] [SQL] [PYSPARK] SparkSession builder's configs need to be set to the existing Scala SparkContext's SparkConf

2016-06-28 Thread davies
from commit 0923c4f5676691e28e70ecb05890e123540b91f0) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c5e16f5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4c5e16f5 Diff: http://git-wip-us.a

spark git commit: [SPARK-16224] [SQL] [PYSPARK] SparkSession builder's configs need to be set to the existing Scala SparkContext's SparkConf

2016-06-28 Thread davies
78 Author: Yin Huai <yh...@databricks.com> Authored: Tue Jun 28 07:54:44 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Tue Jun 28 07:54:44 2016 -0700 -- python/pyspark/context.py | 2 +

spark git commit: [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread davies
park/diff/576265f8 Branch: refs/heads/branch-1.5 Commit: 576265f838fe74879b5f6b0367f759d1c37c9468 Parents: 6001138 Author: Dongjoon Hyun <dongj...@apache.org> Authored: Fri Jun 24 22:30:52 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Fri Jun 2

spark git commit: [SPARK-16186] [SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec

2016-06-24 Thread davies
9499 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7d29499 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7d29499 Branch: refs/heads/master Commit: a7d29499dca5b86e776abc225ece84391f09353a Parents: d2e44d7 Author: Dongjoon Hyun <dongj...@apache.org> Authored: Fri Jun 24 17:13:13 2

spark git commit: [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread davies
park/diff/b7acc1b7 Branch: refs/heads/branch-1.6 Commit: b7acc1b71c5d4b163a7451e8c6430afe920a04e0 Parents: d7223bb Author: Dongjoon Hyun <dongj...@apache.org> Authored: Fri Jun 24 22:30:52 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Fri Jun 2

spark git commit: [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread davies
Closes #13900 from dongjoon-hyun/SPARK-16173. (cherry picked from commit e5d0928e2473d1838ff5420c6a8964557c33135e) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9de09551 T

spark git commit: [SPARK-16173] [SQL] Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread davies
e Branch: refs/heads/master Commit: e5d0928e2473d1838ff5420c6a8964557c33135e Parents: 20768da Author: Dongjoon Hyun <dongj...@apache.org> Authored: Fri Jun 24 17:26:39 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Fri Jun 2

spark git commit: Revert "[SPARK-16186] [SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec"

2016-06-24 Thread davies
bce6e Parents: a65bcbc Author: Davies Liu <davies@gmail.com> Authored: Fri Jun 24 17:21:18 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Fri Jun 24 17:21:18 2016 -0700 -- .../columnar/In

spark git commit: [SPARK-16186] [SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec

2016-06-24 Thread davies
cbc2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a65bcbc2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a65bcbc2 Branch: refs/heads/master Commit: a65bcbc27dcd9b3053cb13c5d67251c8d48f4397 Parents: 4435de1 Author: Dongjoon Hyun <dongj...@apache.org> Authored: Fri Jun 24 17:13:13 2

spark git commit: [SPARK-16077] [PYSPARK] catch the exception from pickle.whichmodule()

2016-06-24 Thread davies
find the module). ## How was this patch tested? Manually tested. Can't have a unit test for this. Author: Davies Liu <dav...@databricks.com> Closes #13788 from davies/whichmodule. (cherry picked from commit d48935400ca47275f677b527c636976af09332c8) Signed-off-by: Davies Liu <davies..

spark git commit: [SPARK-16077] [PYSPARK] catch the exception from pickle.whichmodule()

2016-06-24 Thread davies
find the module). ## How was this patch tested? Manually tested. Can't have a unit test for this. Author: Davies Liu <dav...@databricks.com> Closes #13788 from davies/whichmodule. (cherry picked from commit d48935400ca47275f677b527c636976af09332c8) Signed-off-by: Davies Liu <davies..
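The pattern in this fix is a defensive wrapper around the lookup: `pickle.whichmodule()` scans `sys.modules`, and a badly behaved module can raise during that scan. A hedged sketch of the approach (the fallback module name is an assumption for illustration, not the value Spark's cloudpickle uses):

```python
import pickle


def safe_whichmodule(obj, name):
    # pickle.whichmodule() iterates over sys.modules looking for the
    # module that defines `obj`; guard it so a misbehaving module makes
    # pickling degrade gracefully instead of crashing.
    try:
        return pickle.whichmodule(obj, name)
    except Exception:
        return "__main__"
```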

spark git commit: [SPARK-16163] [SQL] Cache the statistics for logical plans

2016-06-23 Thread davies
It's only useful when used in another query (before planning), because once we finished the planning, the statistics will not be used anymore. ## How was this patch tested? Tested with TPC-DS Q64, it could be planned in a second after the patch. Author: Davies Liu <dav...@databricks.com>

spark git commit: [SPARK-16163] [SQL] Cache the statistics for logical plans

2016-06-23 Thread davies
only useful when used in another query (before planning), because once we finished the planning, the statistics will not be used anymore. ## How was this patch tested? Tested with TPC-DS Q64, it could be planned in a second after the patch. Author: Davies Liu <dav...@databricks.com> Clos
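The change amounts to memoizing the statistics on each plan node so repeated planning passes reuse the first computation. A toy sketch of that caching pattern (these classes are illustrative, not Spark's Catalyst types):

```python
class LogicalPlan:
    """Toy logical plan node illustrating per-node statistics caching."""

    def __init__(self, children=()):
        self.children = list(children)
        self._stats = None       # cached statistics, computed lazily
        self.compute_calls = 0   # instrumentation for the example

    def _compute_stats(self):
        # Stand-in for the expensive recursive estimation over the tree.
        self.compute_calls += 1
        return 1 + sum(c.stats() for c in self.children)

    def stats(self):
        # Compute once; every later call during planning hits the cache.
        if self._stats is None:
            self._stats = self._compute_stats()
        return self._stats
```

Invalidating `_stats` when the plan is rewritten is the subtle part; the cache is only safe because logical plans are treated as immutable.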

spark git commit: [SPARK-16003] SerializationDebugger runs into infinite loop

2016-06-22 Thread davies
cc davies cloud-fan ## How was this patch tested? Unit tests for SerializationDebugger. Author: Eric Liang <e...@databricks.com> Closes #13814 from ericl/spark-16003. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit

spark git commit: [SPARK-16003] SerializationDebugger runs into infinite loop

2016-06-22 Thread davies
ava cc davies cloud-fan ## How was this patch tested? Unit tests for SerializationDebugger. Author: Eric Liang <e...@databricks.com> Closes #13814 from ericl/spark-16003. (cherry picked from commit 6f915c9ec24003877d1ef675a59145699780a2ff) Signed-off-by: Davies Liu <davies@gmail.co

spark git commit: [SPARK-16104] [SQL] Do not create CSV writer object for every flush when writing

2016-06-21 Thread davies
er Commit: 7580f3041a1a3757a0b14b9d8afeb720f261fff6 Parents: d77c4e6 Author: hyukjinkwon <gurwls...@gmail.com> Authored: Tue Jun 21 21:58:38 2016 -0700 Committer: Davies Liu <davies@gmail.com> Committed: Tue Jun 2

spark git commit: [SPARK-16086] [SQL] [PYSPARK] create Row without any fields

2016-06-21 Thread davies
test for empty row and udf without arguments. Author: Davies Liu <dav...@databricks.com> Closes #13812 from davies/no_argus. (cherry picked from commit 2d6919bea9fc213b5af530afab7793b63c6c8b51) Signed-off-by: Davies Liu <davies@gmail.com> Project: http://git-wip-us.apache.org/

spark git commit: [SPARK-16086] [SQL] [PYSPARK] create Row without any fields

2016-06-21 Thread davies
test for empty row and udf without arguments. Author: Davies Liu <dav...@databricks.com> Closes #13812 from davies/no_argus. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2d6919be Tree: http://git-wip-us.apache.org/repos/
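The behavior being tested is that constructing a row with zero fields must work. A minimal stand-in showing the idea (a hypothetical tuple-backed `Row`, not PySpark's actual `pyspark.sql.Row` implementation):

```python
class Row(tuple):
    """Tuple-backed row that also accepts zero fields (illustrative)."""

    def __new__(cls, *values):
        # Positional values become the row's fields; an empty call
        # produces a valid zero-field row instead of raising.
        return super().__new__(cls, values)


empty = Row()  # zero fields: valid after the fix described above
```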
