[jira] [Commented] (SPARK-15911) Remove additional Project to be consistent with SQL when insert into table

2017-02-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851488#comment-15851488 ] Liang-Chi Hsieh commented on SPARK-15911: - [~hyukjin.kwon] Thanks! > Remove additional Project

[jira] [Updated] (SPARK-19425) Make ExtractEquiJoinKeys support UDT columns

2017-02-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-19425: Description: DataFrame.except doesn't work for UDT columns. It is because

[jira] [Updated] (SPARK-19425) Make ExtractEquiJoinKeys support UDT columns

2017-02-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-19425: Summary: Make ExtractEquiJoinKeys support UDT columns (was: Make df.except work for UDT)

[jira] [Updated] (SPARK-19443) The function to generate constraints takes too long when the query plan grows continuously

2017-02-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-19443: Description: This issue is originally reported and discussed at

[jira] [Updated] (SPARK-19433) ML Pipeline with long stages takes long time to finish

2017-02-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-19433: Description: This issue is originally reported and discussed at

[jira] [Created] (SPARK-19443) The function to generate constraints takes too long when the query plan grows continuously

2017-02-02 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-19443: --- Summary: The function to generate constraints takes too long when the query plan grows continuously Key: SPARK-19443 URL: https://issues.apache.org/jira/browse/SPARK-19443

[jira] [Created] (SPARK-19433) ML Pipeline with long stages takes long time to finish

2017-02-01 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-19433: --- Summary: ML Pipeline with long stages takes long time to finish Key: SPARK-19433 URL: https://issues.apache.org/jira/browse/SPARK-19433 Project: Spark

[jira] [Commented] (SPARK-19425) Make df.except work for UDT

2017-02-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848483#comment-15848483 ] Liang-Chi Hsieh commented on SPARK-19425: - I remember affects version can be None before. But

[jira] [Created] (SPARK-19425) Make df.except work for UDT

2017-02-01 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-19425: --- Summary: Make df.except work for UDT Key: SPARK-19425 URL: https://issues.apache.org/jira/browse/SPARK-19425 Project: Spark Issue Type: Bug

[jira] (SPARK-19411) Remove the metadata used to mark optional columns in merged Parquet schema for filter predicate pushdown

2017-01-31 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created an issue

[jira] (SPARK-6307) Executers fetches the same rdd-block 100's or 1000's of times

2017-01-30 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh commented on SPARK-6307

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2017-01-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840857#comment-15840857 ] Liang-Chi Hsieh commented on SPARK-18539: - [~lian cheng] Yea, I see. The term {{optional}} is

[jira] [Created] (SPARK-19355) Use map output statistices to improve global limit's parallelism

2017-01-24 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-19355: --- Summary: Use map output statistices to improve global limit's parallelism Key: SPARK-19355 URL: https://issues.apache.org/jira/browse/SPARK-19355 Project:

[jira] [Comment Edited] (SPARK-19311) UDFs disregard UDT type hierarchy

2017-01-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831888#comment-15831888 ] Liang-Chi Hsieh edited comment on SPARK-19311 at 1/20/17 3:06 PM: --

[jira] [Commented] (SPARK-19311) UDFs disregard UDT type hierarchy

2017-01-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831888#comment-15831888 ] Liang-Chi Hsieh commented on SPARK-19311: - [~Gregor Moehler] I think you already have the fixing.

[jira] [Closed] (SPARK-19274) Make GlobalLimit without shuffling data to single partition

2017-01-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-19274. --- Resolution: Won't Fix > Make GlobalLimit without shuffling data to single partition >

[jira] [Created] (SPARK-19274) Make GlobalLimit without shuffling data to single partition

2017-01-18 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-19274: --- Summary: Make GlobalLimit without shuffling data to single partition Key: SPARK-19274 URL: https://issues.apache.org/jira/browse/SPARK-19274 Project: Spark

[jira] [Created] (SPARK-19244) Sort MemoryConsumers according to their memory usage when spilling

2017-01-16 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-19244: --- Summary: Sort MemoryConsumers according to their memory usage when spilling Key: SPARK-19244 URL: https://issues.apache.org/jira/browse/SPARK-19244 Project:

[jira] [Commented] (SPARK-19223) InputFileBlockHolder doesn't work with Python UDF for datasource other than FileFormat

2017-01-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822733#comment-15822733 ] Liang-Chi Hsieh commented on SPARK-19223: - Hi [~someonehere15], For the issue on spark-xml

[jira] [Created] (SPARK-19223) InputFileBlockHolder doesn't work with Python UDF for datasource other than FileFormat

2017-01-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-19223: --- Summary: InputFileBlockHolder doesn't work with Python UDF for datasource other than FileFormat Key: SPARK-19223 URL: https://issues.apache.org/jira/browse/SPARK-19223

[jira] [Comment Edited] (SPARK-18667) input_file_name function does not work with UDF

2017-01-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822631#comment-15822631 ] Liang-Chi Hsieh edited comment on SPARK-18667 at 1/14/17 2:00 AM: --

[jira] [Commented] (SPARK-18667) input_file_name function does not work with UDF

2017-01-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822631#comment-15822631 ] Liang-Chi Hsieh commented on SPARK-18667: - [~someonehere15], Yeah, I can reproduce that the last

[jira] [Commented] (SPARK-18667) input_file_name function does not work with UDF

2017-01-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821852#comment-15821852 ] Liang-Chi Hsieh commented on SPARK-18667: - Hi [~someonehere15], Thanks for providing the info. I

[jira] [Commented] (SPARK-18667) input_file_name function does not work with UDF

2017-01-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821135#comment-15821135 ] Liang-Chi Hsieh commented on SPARK-18667: - Hi Ben, I've just tried the example codes in current

[jira] [Comment Edited] (SPARK-7768) Make user-defined type (UDT) API public

2017-01-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807248#comment-15807248 ] Liang-Chi Hsieh edited comment on SPARK-7768 at 1/9/17 6:42 AM: Hi

[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public

2017-01-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807248#comment-15807248 ] Liang-Chi Hsieh commented on SPARK-7768: Hi Randall, With the {{UDTRegistration}} added since

[jira] [Comment Edited] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdat

2017-01-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800637#comment-15800637 ] Liang-Chi Hsieh edited comment on SPARK-19068 at 1/5/17 7:38 AM: - Does it

[jira] [Commented] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,

2017-01-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800637#comment-15800637 ] Liang-Chi Hsieh commented on SPARK-19068: - Does it affect the correctness of the results of the

[jira] [Commented] (SPARK-19081) spark sql use HIVE UDF throw exception when return a Map value

2017-01-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800602#comment-15800602 ] Liang-Chi Hsieh commented on SPARK-19081: - I want to make sure that this issue is happened on

[jira] [Created] (SPARK-19082) The config ignoreCorruptFiles doesn't work for Parquet

2017-01-04 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-19082: --- Summary: The config ignoreCorruptFiles doesn't work for Parquet Key: SPARK-19082 URL: https://issues.apache.org/jira/browse/SPARK-19082 Project: Spark

[jira] [Comment Edited] (SPARK-7768) Make user-defined type (UDT) API public

2017-01-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15797443#comment-15797443 ] Liang-Chi Hsieh edited comment on SPARK-7768 at 1/4/17 7:35 AM: I would

[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public

2017-01-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15797443#comment-15797443 ] Liang-Chi Hsieh commented on SPARK-7768: I would like to push this forward and make

[jira] [Created] (SPARK-19055) SparkSession initialization will be associated with invalid SparkContext when new SparkContext is created to replace stopped SparkContext

2017-01-02 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-19055: --- Summary: SparkSession initialization will be associated with invalid SparkContext when new SparkContext is created to replace stopped SparkContext Key: SPARK-19055 URL:

[jira] [Commented] (SPARK-18781) Allow MatrixFactorizationModel.predict to skip user/product approximation count

2016-12-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787786#comment-15787786 ] Liang-Chi Hsieh commented on SPARK-18781: - [~eyal] Do you have estimated time cost of the

[jira] [Comment Edited] (SPARK-19032) Non-deterministic results using aggregation first across multiple workers

2016-12-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786883#comment-15786883 ] Liang-Chi Hsieh edited comment on SPARK-19032 at 12/30/16 4:50 AM: --- I

[jira] [Commented] (SPARK-19032) Non-deterministic results using aggregation first across multiple workers

2016-12-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786883#comment-15786883 ] Liang-Chi Hsieh commented on SPARK-19032: - I think you can not guarantee the sort order per group

[jira] [Commented] (SPARK-19032) Non-deterministic results using aggregation first across multiple workers

2016-12-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786824#comment-15786824 ] Liang-Chi Hsieh commented on SPARK-19032: - There is a related discussion at dev mailing list:

[jira] [Issue Comment Deleted] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results

2016-12-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Comment: was deleted (was: Design doc v1) > Improve subquery execution by deduplicating

[jira] [Updated] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results

2016-12-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Attachment: de-duplicating subqueries.pdf > Improve subquery execution by deduplicating

[jira] [Updated] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results

2016-12-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Attachment: (was: de-duplicating subqueries.pdf) > Improve subquery execution by

[jira] [Updated] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results

2016-12-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Attachment: de-duplicating subqueries.pdf Design doc v1 > Improve subquery execution by

[jira] [Commented] (SPARK-18997) Recommended upgrade libthrift to 0.9.3

2016-12-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15777815#comment-15777815 ] Liang-Chi Hsieh commented on SPARK-18997: - I've checked the dependency and seems there is no

[jira] [Commented] (SPARK-18978) Spark streaming ClassCastException

2016-12-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772185#comment-15772185 ] Liang-Chi Hsieh commented on SPARK-18978: - Actually I can't reproduce this in master branch with

[jira] [Updated] (SPARK-18986) ExternalAppendOnlyMap shouldn't fail when forced to spill before calling its iterator

2016-12-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18986: Component/s: (was: SQL) Spark Core > ExternalAppendOnlyMap shouldn't

[jira] [Created] (SPARK-18986) ExternalAppendOnlyMap shouldn't fail when forced to spill before calling its iterator

2016-12-22 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18986: --- Summary: ExternalAppendOnlyMap shouldn't fail when forced to spill before calling its iterator Key: SPARK-18986 URL: https://issues.apache.org/jira/browse/SPARK-18986

[jira] [Commented] (SPARK-18956) Python API should reuse existing SparkSession while creating new SQLContext instances

2016-12-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766987#comment-15766987 ] Liang-Chi Hsieh commented on SPARK-18956: - Yeah, I think so. > Python API should reuse existing

[jira] [Commented] (SPARK-18800) Correct the assert in UnsafeKVExternalSorter which ensures array size

2016-12-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766039#comment-15766039 ] Liang-Chi Hsieh commented on SPARK-18800: - Note: this jira is motivated by the issue reported on

[jira] [Updated] (SPARK-18800) Correct the assert in UnsafeKVExternalSorter which ensures array size

2016-12-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18800: Issue Type: Improvement (was: Bug) > Correct the assert in UnsafeKVExternalSorter which

[jira] [Updated] (SPARK-18800) Correct the assert in UnsafeKVExternalSorter which ensures array size

2016-12-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18800: Description: UnsafeKVExternalSorter uses UnsafeInMemorySorter to sort the records of

[jira] [Updated] (SPARK-18800) Correct the assert in UnsafeKVExternalSorter which ensures array size

2016-12-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18800: Summary: Correct the assert in UnsafeKVExternalSorter which ensures array size (was:

[jira] [Commented] (SPARK-18281) toLocalIterator yields time out error on pyspark2

2016-12-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750281#comment-15750281 ] Liang-Chi Hsieh commented on SPARK-18281: - [~mwdus...@us.ibm.com] BTW, I updated the fixing and

[jira] [Commented] (SPARK-18281) toLocalIterator yields time out error on pyspark2

2016-12-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750246#comment-15750246 ] Liang-Chi Hsieh commented on SPARK-18281: - [~mwdus...@us.ibm.com] Thanks for this test case! It

[jira] [Commented] (SPARK-18281) toLocalIterator yields time out error on pyspark2

2016-12-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750158#comment-15750158 ] Liang-Chi Hsieh commented on SPARK-18281: - Hi [~holdenk], what you meant for "we immediately do a

[jira] [Comment Edited] (SPARK-18281) toLocalIterator yields time out error on pyspark2

2016-12-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747120#comment-15747120 ] Liang-Chi Hsieh edited comment on SPARK-18281 at 12/14/16 3:48 AM: ---

[jira] [Commented] (SPARK-18281) toLocalIterator yields time out error on pyspark2

2016-12-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747120#comment-15747120 ] Liang-Chi Hsieh commented on SPARK-18281: - [~mwdus...@us.ibm.com] Thanks for reporting this

[jira] [Commented] (SPARK-18281) toLocalIterator yields time out error on pyspark2

2016-12-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744047#comment-15744047 ] Liang-Chi Hsieh commented on SPARK-18281: - [~mwdus...@us.ibm.com] I can reproduce your issue. I

[jira] [Created] (SPARK-18824) Add optimizer rule to reorder expensive Filter predicates like ScalaUDF

2016-12-11 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18824: --- Summary: Add optimizer rule to reorder expensive Filter predicates like ScalaUDF Key: SPARK-18824 URL: https://issues.apache.org/jira/browse/SPARK-18824

[jira] [Updated] (SPARK-18800) UnsafeInMemorySorter throws exception when used in UnsafeKVExternalSorter

2016-12-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18800: Description: UnsafeKVExternalSorter uses UnsafeInMemorySorter to sort the records of

[jira] [Created] (SPARK-18800) UnsafeInMemorySorter throws exception when used in UnsafeKVExternalSorter

2016-12-09 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18800: --- Summary: UnsafeInMemorySorter throws exception when used in UnsafeKVExternalSorter Key: SPARK-18800 URL: https://issues.apache.org/jira/browse/SPARK-18800

[jira] [Closed] (SPARK-18759) when use spark streaming with sparksql, lots of temp directories are created.

2016-12-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-18759. --- Resolution: Duplicate duplicate to SPARK-18703 > when use spark streaming with sparksql,

[jira] [Commented] (SPARK-18756) Memory leak in Spark streaming

2016-12-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727975#comment-15727975 ] Liang-Chi Hsieh commented on SPARK-18756: - As we already upgrade to 4.0.42.Final, this should not

[jira] [Commented] (SPARK-18756) Memory leak in Spark streaming

2016-12-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727972#comment-15727972 ] Liang-Chi Hsieh commented on SPARK-18756: - I believe this bug is fixed by

[jira] [Commented] (SPARK-18759) when use spark streaming with sparksql, lots of temp directories are created.

2016-12-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727899#comment-15727899 ] Liang-Chi Hsieh commented on SPARK-18759: - I think this is duplicate to SPARK-18703. > when use

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727542#comment-15727542 ] Liang-Chi Hsieh commented on SPARK-18539: - [~lian cheng], in Parquet's code, looks like a null

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724110#comment-15724110 ] Liang-Chi Hsieh commented on SPARK-18539: - That's cool. > Cannot filter by nonexisting column in

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723960#comment-15723960 ] Liang-Chi Hsieh commented on SPARK-18539: - Actually I am not sure if this is a valid usage. I

[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2016-12-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723949#comment-15723949 ] Liang-Chi Hsieh commented on SPARK-18539: - Because we respect user-specified schema, we won't

[jira] [Commented] (SPARK-18681) Throw Filtering is supported only on partition keys of type string exception

2016-12-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713871#comment-15713871 ] Liang-Chi Hsieh commented on SPARK-18681: - Looks like you create two Jiras (SPARK-18680,

[jira] [Updated] (SPARK-18666) Remove the codes checking deprecated config spark.sql.unsafe.enabled

2016-11-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18666: Description: spark.sql.unsafe.enabled is deprecated since 1.6. There still are codes in

[jira] [Created] (SPARK-18666) Remove the codes checking deprecated config spark.sql.unsafe.enabled

2016-11-30 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18666: --- Summary: Remove the codes checking deprecated config spark.sql.unsafe.enabled Key: SPARK-18666 URL: https://issues.apache.org/jira/browse/SPARK-18666 Project:

[jira] [Commented] (SPARK-17897) not isnotnull is converted to the always false condition isnotnull && not isnotnull

2016-11-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705627#comment-15705627 ] Liang-Chi Hsieh commented on SPARK-17897: - And I think Attribute is the input of itself? > not

[jira] [Commented] (SPARK-17897) not isnotnull is converted to the always false condition isnotnull && not isnotnull

2016-11-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705621#comment-15705621 ] Liang-Chi Hsieh commented on SPARK-17897: - I think the original idea should be, in

[jira] [Closed] (SPARK-18089) Remove CollectLimitExec operator

2016-11-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-18089. --- Resolution: Won't Fix > Remove CollectLimitExec operator >

[jira] [Updated] (SPARK-18487) Add task completion listener to HashAggregate to avoid memory leak

2016-11-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18487: Summary: Add task completion listener to HashAggregate to avoid memory leak (was: Consume

[jira] [Updated] (SPARK-18487) Consume all elements for Dataset.show/take to avoid memory leak

2016-11-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18487: Description: The methods such as Dataset.show and take use Limit (CollectLimitExec) which

[jira] [Created] (SPARK-18487) Consume all elements for Dataset.show/take to avoid memory leak

2016-11-17 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18487: --- Summary: Consume all elements for Dataset.show/take to avoid memory leak Key: SPARK-18487 URL: https://issues.apache.org/jira/browse/SPARK-18487 Project: Spark

[jira] [Created] (SPARK-18395) Evaluate common subexpression like lazy variable with a function approach

2016-11-09 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18395: --- Summary: Evaluate common subexpression like lazy variable with a function approach Key: SPARK-18395 URL: https://issues.apache.org/jira/browse/SPARK-18395

[jira] [Closed] (SPARK-18376) Skip subexpression elimination for conditional expressions

2016-11-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-18376. --- Resolution: Won't Fix > Skip subexpression elimination for conditional expressions >

[jira] [Created] (SPARK-18376) Skip subexpression elimination for conditional expressions

2016-11-08 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18376: --- Summary: Skip subexpression elimination for conditional expressions Key: SPARK-18376 URL: https://issues.apache.org/jira/browse/SPARK-18376 Project: Spark

[jira] [Closed] (SPARK-18130) Don't add inferred redundant isnotnull condition from constraints

2016-11-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-18130. --- Resolution: Won't Fix > Don't add inferred redundant isnotnull condition from constraints >

[jira] [Commented] (SPARK-18209) More robust view canonicalization without full SQL expansion

2016-11-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631952#comment-15631952 ] Liang-Chi Hsieh commented on SPARK-18209: - Ya. I see that. > More robust view canonicalization

[jira] [Comment Edited] (SPARK-18209) More robust view canonicalization without full SQL expansion

2016-11-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631292#comment-15631292 ] Liang-Chi Hsieh edited comment on SPARK-18209 at 11/3/16 2:30 AM: -- I

[jira] [Comment Edited] (SPARK-18209) More robust view canonicalization without full SQL expansion

2016-11-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631292#comment-15631292 ] Liang-Chi Hsieh edited comment on SPARK-18209 at 11/3/16 2:29 AM: -- I

[jira] [Commented] (SPARK-18209) More robust view canonicalization without full SQL expansion

2016-11-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631292#comment-15631292 ] Liang-Chi Hsieh commented on SPARK-18209: - I think the disallowed one is to use CTEs in a

[jira] [Commented] (SPARK-18209) More robust view canonicalization without full SQL expansion

2016-11-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628128#comment-15628128 ] Liang-Chi Hsieh commented on SPARK-18209: - So I think permanent views shouldn't be allowed to use

[jira] [Commented] (SPARK-18209) More robust view canonicalization without full SQL expansion

2016-11-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628099#comment-15628099 ] Liang-Chi Hsieh commented on SPARK-18209: - Can we guarantee temp UDFs still valid in next run? >

[jira] [Commented] (SPARK-18209) More robust view canonicalization without full SQL expansion

2016-11-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628070#comment-15628070 ] Liang-Chi Hsieh commented on SPARK-18209: - I just think of not built-in UDFs which are registered

[jira] [Commented] (SPARK-18107) Insert overwrite statement runs much slower in spark-sql than it does in hive-client

2016-10-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612044#comment-15612044 ] Liang-Chi Hsieh commented on SPARK-18107: - Looks like HIVE-11940 largely improves insert

[jira] [Commented] (SPARK-18107) Insert overwrite statement runs much slower in spark-sql than it does in hive-client

2016-10-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611083#comment-15611083 ] Liang-Chi Hsieh commented on SPARK-18107: - I can create a PR for this. But it may require

[jira] [Commented] (SPARK-18107) Insert overwrite statement runs much slower in spark-sql than it does in hive-client

2016-10-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610979#comment-15610979 ] Liang-Chi Hsieh commented on SPARK-18107: - I found a PR at Hive which should be the one to

[jira] [Commented] (SPARK-18107) Insert overwrite statement runs much slower in spark-sql than it does in hive-client

2016-10-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610962#comment-15610962 ] Liang-Chi Hsieh commented on SPARK-18107: - I checked the current codes for inserting into Hive

[jira] [Created] (SPARK-18130) Don't add inferred redundant isnotnull condition from constraints

2016-10-26 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18130: --- Summary: Don't add inferred redundant isnotnull condition from constraints Key: SPARK-18130 URL: https://issues.apache.org/jira/browse/SPARK-18130 Project:

[jira] [Commented] (SPARK-18107) Insert overwrite statement runs much slower in spark-sql than it does in hive-client

2016-10-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607896#comment-15607896 ] Liang-Chi Hsieh commented on SPARK-18107: - Can you provide more information about this? Like is

[jira] [Issue Comment Deleted] (SPARK-18100) Improve the performance of get_json_object using Gson

2016-10-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18100: Comment: was deleted (was: Looks like Gson doesn't have native support for json path?) >

[jira] [Commented] (SPARK-18100) Improve the performance of get_json_object using Gson

2016-10-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607875#comment-15607875 ] Liang-Chi Hsieh commented on SPARK-18100: - Looks like Gson doesn't have native support for json

[jira] [Commented] (SPARK-18100) Improve the performance of get_json_object using Gson

2016-10-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607204#comment-15607204 ] Liang-Chi Hsieh commented on SPARK-18100: - Looks like Gson has no native support for json path?

[jira] [Created] (SPARK-18089) Remove CollectLimitExec operator

2016-10-25 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18089: --- Summary: Remove CollectLimitExec operator Key: SPARK-18089 URL: https://issues.apache.org/jira/browse/SPARK-18089 Project: Spark Issue Type:
