[jira] [Assigned] (SPARK-17100) pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException

2016-09-14 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-17100: -- Assignee: Davies Liu > pyspark filter on a udf column after join gives >

[jira] [Resolved] (SPARK-17472) Better error message for serialization failures of large objects in Python

2016-09-14 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17472. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15026

[jira] [Commented] (SPARK-17544) Timeout waiting for connection from pool, DataFrame Reader's not closing S3 connections?

2016-09-14 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491397#comment-15491397 ] Davies Liu commented on SPARK-17544: Could you post some code to reproduce the issue? > Timeout

[jira] [Resolved] (SPARK-17514) df.take(1) and df.limit(1).collect() perform differently in Python

2016-09-14 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17514. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Resolved] (SPARK-17474) Python UDF does not work between Sort and Limit

2016-09-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17474. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Assigned] (SPARK-15621) BatchEvalPythonExec fails with OOM

2016-09-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-15621: -- Assignee: Davies Liu > BatchEvalPythonExec fails with OOM >

[jira] [Resolved] (SPARK-17354) java.lang.ClassCastException: java.lang.Integer cannot be cast to java.sql.Date

2016-09-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17354. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Created] (SPARK-17482) Analyzer should be able run on top of optimized rule

2016-09-09 Thread Davies Liu (JIRA)
Davies Liu created SPARK-17482: -- Summary: Analyzer should be able run on top of optimized rule Key: SPARK-17482 URL: https://issues.apache.org/jira/browse/SPARK-17482 Project: Spark Issue Type:

[jira] [Updated] (SPARK-17474) Python UDF does not work between Sort and Limit

2016-09-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-17474: --- Summary: Python UDF does not work between Sort and Limit (was: expressions of QueryPlan does not

[jira] [Updated] (SPARK-17474) expressions of QueryPlan does not include those inside Option[Seq[Expression]]

2016-09-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-17474: --- Affects Version/s: (was: 1.6.2) (was: 1.5.2) > expressions of

[jira] [Created] (SPARK-17474) expressions of QueryPlan does not include those inside Option[Seq[Expression]]

2016-09-09 Thread Davies Liu (JIRA)
Davies Liu created SPARK-17474: -- Summary: expressions of QueryPlan does not include those inside Option[Seq[Expression]] Key: SPARK-17474 URL: https://issues.apache.org/jira/browse/SPARK-17474 Project:

[jira] [Commented] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468738#comment-15468738 ] Davies Liu commented on SPARK-17381: cc [~cloud_fan] > Memory leak

[jira] [Closed] (SPARK-17384) SQL - Running query with outer join from 1.6 fails

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu closed SPARK-17384. -- Resolution: Duplicate Assignee: Herman van Hovell > SQL - Running query with outer join from 1.6

[jira] [Commented] (SPARK-17384) SQL - Running query with outer join from 1.6 fails

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468636#comment-15468636 ] Davies Liu commented on SPARK-17384: This is caused by the SQL parser change, the parsed plan in 1.6:

[jira] [Commented] (SPARK-17377) Joining Datasets read and aggregated from a partitioned Parquet file gives wrong results

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468583#comment-15468583 ] Davies Liu commented on SPARK-17377: Tested this with latest master and 2.0 on databricks[1], they

[jira] [Assigned] (SPARK-17377) Joining Datasets read and aggregated from a partitioned Parquet file gives wrong results

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-17377: -- Assignee: Davies Liu > Joining Datasets read and aggregated from a partitioned Parquet file

[jira] [Updated] (SPARK-17377) Joining Datasets read and aggregated from a partitioned Parquet file gives wrong results

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-17377: --- Description: Reproduction: 1) Read two Datasets from a partitioned Parquet file with different

[jira] [Commented] (SPARK-17403) Fatal Error: Scan cached strings

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468449#comment-15468449 ] Davies Liu commented on SPARK-17403: [~rhernando] Could you pull out the string column (SL_RD_ColR_N)

[jira] [Updated] (SPARK-17403) Fatal Error: Scan cached strings

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-17403: --- Summary: Fatal Error: Scan cached strings (was: Fatal Error: SIGSEGV on Jdbc joins) > Fatal Error:

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468364#comment-15468364 ] Davies Liu commented on SPARK-16922: Is there any performance difference comparing to

[jira] [Resolved] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16922. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Resolved] (SPARK-17211) Broadcast join produces incorrect results when compressed Oops differs between driver, executor

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17211. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Updated] (SPARK-17409) Query in CTAS is Optimized Twice

2016-09-06 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-17409: --- Assignee: Xiao Li > Query in CTAS is Optimized Twice > > >

[jira] [Commented] (SPARK-17211) Broadcast join produces incorrect results when compressed Oops differs between driver, executor

2016-09-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459835#comment-15459835 ] Davies Liu commented on SPARK-17211: Could you try the patch?

[jira] [Resolved] (SPARK-16334) SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException

2016-09-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16334. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Resolved] (SPARK-17230) Writing decimal to csv will result empty string if the decimal exceeds (20, 18)

2016-09-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17230. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 > Writing decimal to csv

[jira] [Updated] (SPARK-17261) Using HiveContext after re-creating SparkContext in Spark 2.0 throws "Java.lang.illegalStateException: Cannot call methods on a stopped sparkContext"

2016-09-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-17261: --- Assignee: Jeff Zhang > Using HiveContext after re-creating SparkContext in Spark 2.0 throws >

[jira] [Resolved] (SPARK-17261) Using HiveContext after re-creating SparkContext in Spark 2.0 throws "Java.lang.illegalStateException: Cannot call methods on a stopped sparkContext"

2016-09-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17261. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Commented] (SPARK-17211) Broadcast join produces incorrect results when compressed Oops differs between driver, executor

2016-09-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459040#comment-15459040 ] Davies Liu commented on SPARK-17211: [~migtor] Could you try this patch?

[jira] [Resolved] (SPARK-16525) Enable Row Based HashMap in HashAggregateExec

2016-09-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16525. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14176

[jira] [Resolved] (SPARK-16926) Partition columns are present in columns metadata for partition but not table

2016-09-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16926. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-09-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456605#comment-15456605 ] Davies Liu commented on SPARK-16922: [~sitalke...@gmail.com] I think I found the cause and fixed it,

[jira] [Assigned] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-09-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-16922: -- Assignee: Davies Liu > Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

[jira] [Assigned] (SPARK-17211) Broadcast join produces incorrect results on EMR with large driver memory

2016-09-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-17211: -- Assignee: Davies Liu > Broadcast join produces incorrect results on EMR with large driver

[jira] [Resolved] (SPARK-17063) MSCK REPAIR TABLE is super slow with Hive metastore

2016-08-29 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17063. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Created] (SPARK-17230) Writing decimal to csv will result empty string if the decimal exceeds (20, 18)

2016-08-24 Thread Davies Liu (JIRA)
Davies Liu created SPARK-17230: -- Summary: Writing decimal to csv will result empty string if the decimal exceeds (20, 18) Key: SPARK-17230 URL: https://issues.apache.org/jira/browse/SPARK-17230 Project:

[jira] [Commented] (SPARK-14560) Cooperative Memory Management for Spillables

2016-08-23 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433200#comment-15433200 ] Davies Liu commented on SPARK-14560: Even with SPARK-4452, we still can not say that we fixed the OOM

[jira] [Resolved] (SPARK-13286) JDBC driver doesn't report full exception

2016-08-23 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-13286. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Closed] (SPARK-16569) Use Cython to speed up Pyspark internals

2016-08-19 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu closed SPARK-16569. -- Resolution: Won't Fix > Use Cython to speed up Pyspark internals >

[jira] [Commented] (SPARK-16569) Use Cython to speed up Pyspark internals

2016-08-19 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428843#comment-15428843 ] Davies Liu commented on SPARK-16569: Agreed with [~robert3005]. Another option could be to just use PyPy,

[jira] [Updated] (SPARK-17113) Job failure due to Executor OOM in offheap mode

2016-08-19 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-17113: --- Assignee: Sital Kedia > Job failure due to Executor OOM in offheap mode >

[jira] [Resolved] (SPARK-17113) Job failure due to Executor OOM in offheap mode

2016-08-19 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17113. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 > Job failure due to

[jira] [Assigned] (SPARK-13286) JDBC driver doesn't report full exception

2016-08-19 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-13286: -- Assignee: Davies Liu > JDBC driver doesn't report full exception >

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-18 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427264#comment-15427264 ] Davies Liu commented on SPARK-16922: Which serializer are you using? Java serializer or Kryo? >

[jira] [Comment Edited] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-18 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427241#comment-15427241 ] Davies Liu edited comment on SPARK-16922 at 8/18/16 9:58 PM: - Is this failure

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-18 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427241#comment-15427241 ] Davies Liu commented on SPARK-16922: Is this failure deterministic or not? Happened on every task or

[jira] [Created] (SPARK-17115) Improve the performance of UnsafeProjection for wide table

2016-08-17 Thread Davies Liu (JIRA)
Davies Liu created SPARK-17115: -- Summary: Improve the performance of UnsafeProjection for wide table Key: SPARK-17115 URL: https://issues.apache.org/jira/browse/SPARK-17115 Project: Spark Issue

[jira] [Resolved] (SPARK-17106) Simplify subquery interface

2016-08-17 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17106. Resolution: Fixed Fix Version/s: 2.1.0 > Simplify subquery interface >

[jira] [Resolved] (SPARK-17035) Conversion of datetime.max to microseconds produces incorrect value

2016-08-16 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-17035. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14631

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-15 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421484#comment-15421484 ] Davies Liu commented on SPARK-16922: Do you also have this one?

[jira] [Created] (SPARK-17063) MSCK REPAIR TABLE is super slow with Hive metastore

2016-08-15 Thread Davies Liu (JIRA)
Davies Liu created SPARK-17063: -- Summary: MSCK REPAIR TABLE is super slow with Hive metastore Key: SPARK-17063 URL: https://issues.apache.org/jira/browse/SPARK-17063 Project: Spark Issue Type:

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419438#comment-15419438 ] Davies Liu commented on SPARK-16922: I think it's fixed by

[jira] [Commented] (SPARK-16922) Query with Broadcast Hash join fails due to executor OOM in Spark 2.0

2016-08-12 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419434#comment-15419434 ] Davies Liu commented on SPARK-16922: [~sitalke...@gmail.com] There are two integer overflow bugs

[jira] [Resolved] (SPARK-16958) Reuse subqueries within single query

2016-08-11 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16958. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14548

[jira] [Commented] (HADOOP-12455) fs.Globber breaks on colon in filename; doesn't use Path's handling for colons

2016-08-10 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416141#comment-15416141 ] Davies Liu commented on HADOOP-12455: - I think the root cause is that `fs.Path` tries to parse `A` in

[jira] [Resolved] (SPARK-16928) Recursive call of ColumnVector::getInt() breaks JIT inlining

2016-08-10 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16928. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14513

[jira] [Commented] (SPARK-16227) Json schema inference fails when `:` exists in file path

2016-08-10 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415994#comment-15415994 ] Davies Liu commented on SPARK-16227: Can reproduce this by changing `jsont` to `jsont:1` > Json schema

[jira] [Commented] (SPARK-16227) Json schema inference fails when `:` exists in file path

2016-08-10 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415991#comment-15415991 ] Davies Liu commented on SPARK-16227: [~brkyvz] I can't reproduce this in master (2.1-snapshot),

[jira] [Assigned] (SPARK-14887) Generated SpecificUnsafeProjection Exceeds JVM Code Size Limits

2016-08-10 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-14887: -- Assignee: Davies Liu > Generated SpecificUnsafeProjection Exceeds JVM Code Size Limits >

[jira] [Commented] (SPARK-14887) Generated SpecificUnsafeProjection Exceeds JVM Code Size Limits

2016-08-10 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415710#comment-15415710 ] Davies Liu commented on SPARK-14887: Do you have a large CaseWhen in this query? > Generated

[jira] [Comment Edited] (SPARK-15639) Try to push down filter at RowGroups level for parquet reader

2016-08-10 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415591#comment-15415591 ] Davies Liu edited comment on SPARK-15639 at 8/10/16 5:07 PM: - Merged

[jira] [Resolved] (SPARK-15639) Try to push down filter at RowGroups level for parquet reader

2016-08-10 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-15639. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Target

[jira] [Commented] (SPARK-16093) Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1

2016-08-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414361#comment-15414361 ] Davies Liu commented on SPARK-16093: It also could be possible that the stat of Hive table is broken

[jira] [Commented] (SPARK-16093) Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1

2016-08-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414360#comment-15414360 ] Davies Liu commented on SPARK-16093: Could you `set spark.sql.autoBroadcastJoinThreshold` to verify

[jira] [Commented] (SPARK-16766) TakeOrderedAndProjectExec easily cause OOM

2016-08-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414345#comment-15414345 ] Davies Liu commented on SPARK-16766: What's the use case of this queue? I'd imagine that it's not

[jira] [Updated] (SPARK-16766) TakeOrderedAndProjectExec easily cause OOM

2016-08-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-16766: --- Priority: Minor (was: Critical) > TakeOrderedAndProjectExec easily cause OOM >

[jira] [Updated] (SPARK-16905) Support SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-16905: --- Fix Version/s: 2.0.1 > Support SQL DDL: MSCK REPAIR TABLE > -- > >

[jira] [Updated] (SPARK-16905) Support SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-16905: --- Issue Type: Bug (was: New Feature) > Support SQL DDL: MSCK REPAIR TABLE >

[jira] [Commented] (SPARK-15354) Topology aware block replication strategies

2016-08-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413969#comment-15413969 ] Davies Liu commented on SPARK-15354: Should we make sure that there will be two copies on the same host (ip

[jira] [Resolved] (SPARK-16905) Support SQL DDL: MSCK REPAIR TABLE

2016-08-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16905. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14500

[jira] [Resolved] (SPARK-16950) fromOffsets parameter in Kafka's Direct Streams does not work in python3

2016-08-09 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16950. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Created] (SPARK-16958) Reuse subqueries within single query

2016-08-08 Thread Davies Liu (JIRA)
Davies Liu created SPARK-16958: -- Summary: Reuse subqueries within single query Key: SPARK-16958 URL: https://issues.apache.org/jira/browse/SPARK-16958 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-11150) Dynamic partition pruning

2016-08-08 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-11150: --- Assignee: (was: Davies Liu) > Dynamic partition pruning > - > >

[jira] [Updated] (SPARK-11150) Dynamic partition pruning

2016-08-08 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-11150: --- Target Version/s: (was: 2.1.0) > Dynamic partition pruning > - > >

[jira] [Updated] (SPARK-11150) Dynamic partition pruning

2016-08-08 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-11150: --- Affects Version/s: 2.0.0 Target Version/s: 2.1.0 Issue Type: New Feature (was: Bug)

[jira] [Assigned] (SPARK-11150) Dynamic partition pruning

2016-08-08 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-11150: -- Assignee: Davies Liu > Dynamic partition pruning > - > >

[jira] [Commented] (SPARK-15354) Topology aware block replication strategies

2016-08-05 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410269#comment-15410269 ] Davies Liu commented on SPARK-15354: This strategy used in HDFS is to balance the write traffic (for

[jira] [Created] (SPARK-16905) Support SQL DDL: MSCK REPAIR TABLE

2016-08-04 Thread Davies Liu (JIRA)
Davies Liu created SPARK-16905: -- Summary: Support SQL DDL: MSCK REPAIR TABLE Key: SPARK-16905 URL: https://issues.apache.org/jira/browse/SPARK-16905 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-16884) Move DataSourceScanExec out of ExistingRDD.scala file

2016-08-04 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-16884: --- Assignee: Eric Liang > Move DataSourceScanExec out of ExistingRDD.scala file >

[jira] [Resolved] (SPARK-16884) Move DataSourceScanExec out of ExistingRDD.scala file

2016-08-04 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16884. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14487

[jira] [Resolved] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException

2016-08-04 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16802. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Resolved] (SPARK-16596) Refactor DataSourceScanExec to do partition discovery at execution instead of planning time

2016-08-03 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16596. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14241

[jira] [Commented] (SPARK-16700) StructType doesn't accept Python dicts anymore

2016-08-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405043#comment-15405043 ] Davies Liu commented on SPARK-16700: Sent PR https://github.com/apache/spark/pull/14469 to address

[jira] [Assigned] (SPARK-16700) StructType doesn't accept Python dicts anymore

2016-08-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-16700: -- Assignee: Davies Liu > StructType doesn't accept Python dicts anymore >

[jira] [Updated] (SPARK-15639) Try to push down filter at RowGroups level for parquet reader

2016-08-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-15639: --- Target Version/s: 2.0.1 > Try to push down filter at RowGroups level for parquet reader >

[jira] [Updated] (SPARK-15639) Try to push down filter at RowGroups level for parquet reader

2016-08-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-15639: --- Priority: Blocker (was: Major) > Try to push down filter at RowGroups level for parquet reader >

[jira] [Resolved] (SPARK-16062) PySpark SQL python-only UDTs don't work well

2016-08-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16062. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Resolved] (SPARK-15989) PySpark SQL python-only UDTs don't support nested types

2016-08-02 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-15989. Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull

[jira] [Assigned] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException

2016-08-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-16802: -- Assignee: Davies Liu > joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException >

[jira] [Commented] (SPARK-16700) StructType doesn't accept Python dicts anymore

2016-08-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402739#comment-15402739 ] Davies Liu commented on SPARK-16700: There are two separate problems here: 1) Spark 2.0 enforces data

[jira] [Resolved] (SPARK-16175) Handle None for all Python UDT

2016-06-28 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16175. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13878

[jira] [Created] (SPARK-16259) Cleanup options for DataFrame reader API in Python

2016-06-28 Thread Davies Liu (JIRA)
Davies Liu created SPARK-16259: -- Summary: Cleanup options for DataFrame reader API in Python Key: SPARK-16259 URL: https://issues.apache.org/jira/browse/SPARK-16259 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-16224) Hive context created by HiveContext can't access Hive databases when used in a script launched be spark-submit

2016-06-28 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-16224. Resolution: Fixed Fix Version/s: 2.0.1 Issue resolved by pull request 13931

[jira] [Commented] (SPARK-15700) Spark 2.0 dataframes using more memory (reading/writing parquet)

2016-06-27 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352016#comment-15352016 ] Davies Liu commented on SPARK-15700: My guess is that the SQL metrics required more memory than

[jira] [Commented] (SPARK-15621) BatchEvalPythonExec fails with OOM

2016-06-27 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352004#comment-15352004 ] Davies Liu commented on SPARK-15621: The number of rows in the queue will be bounded by the number of

[jira] [Updated] (SPARK-15621) BatchEvalPythonExec fails with OOM

2016-06-27 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-15621: --- Priority: Major (was: Critical) > BatchEvalPythonExec fails with OOM >

[jira] [Updated] (SPARK-16224) Hive context created by HiveContext can't access Hive databases when used in a script launched be spark-submit

2016-06-27 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-16224: --- Description: Hi, This is a continuation of a resolved bug

[jira] [Updated] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-16173: --- Fix Version/s: 1.5.3 > Can't join describe() of DataFrame in Scala 2.10 >

[jira] [Updated] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-16173: --- Fix Version/s: 2.0.1 > Can't join describe() of DataFrame in Scala 2.10 >

[jira] [Updated] (SPARK-16173) Can't join describe() of DataFrame in Scala 2.10

2016-06-24 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-16173: --- Assignee: Dongjoon Hyun > Can't join describe() of DataFrame in Scala 2.10 >
