[jira] [Commented] (SPARK-28672) [UDF] Duplicate function creation should not allow

2019-08-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911007#comment-16911007 ] Liang-Chi Hsieh commented on SPARK-28672: - Is there any rule in Hive regarding this? like

[jira] [Resolved] (SPARK-28426) Metadata Handling in Thrift Server

2019-08-19 Thread Xiao Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-28426. - Resolution: Fixed > Metadata Handling in Thrift Server > -- > >

[jira] [Commented] (SPARK-28672) [UDF] Duplicate function creation should not allow

2019-08-19 Thread pavithra ramachandran (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911003#comment-16911003 ] pavithra ramachandran commented on SPARK-28672: --- [~maropu] [~viirya]  The intention of

[jira] [Commented] (SPARK-28672) [UDF] Duplicate function creation should not allow

2019-08-19 Thread pavithra ramachandran (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911002#comment-16911002 ] pavithra ramachandran commented on SPARK-28672: --- [~abhishek.akg] -  When we execute show

[jira] [Commented] (SPARK-28774) ReusedExchangeExec cannot be columnar

2019-08-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910999#comment-16910999 ] Hyukjin Kwon commented on SPARK-28774: -- Please avoid to set target version which is usually

[jira] [Updated] (SPARK-28774) ReusedExchangeExec cannot be columnar

2019-08-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28774: - Target Version/s: (was: 3.0.0) > ReusedExchangeExec cannot be columnar >

[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910994#comment-16910994 ] Dongjoon Hyun commented on SPARK-28699: --- Thank you for the update, [~XuanYuan]! > Cache an

[jira] [Updated] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Yuanjian Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-28699: Description: It's another case for the indeterminate stage/RDD rerun while stage rerun happened.

[jira] [Updated] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28699: -- Affects Version/s: 2.3.3 2.4.3 > Cache an indeterminate RDD could lead

[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Yuanjian Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910990#comment-16910990 ] Yuanjian Li commented on SPARK-28699: - [~dongjoon] Sure, the affects version is spark-2.1 after

[jira] [Updated] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Yuanjian Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-28699: Affects Version/s: 2.3.3 2.4.3 > Cache an indeterminate RDD could lead to

[jira] [Updated] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28699: -- Target Version/s: 2.3.4, 2.4.4 Affects Version/s: (was: 2.4.3)

[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910987#comment-16910987 ] Dongjoon Hyun commented on SPARK-28699: --- :) BTW, I updated this to `Blocker` according to

[jira] [Updated] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28699: -- Priority: Blocker (was: Major) > Cache an indeterminate RDD could lead to incorrect result

[jira] [Updated] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28699: -- Description: It's another case for the indeterminate stage/RDD rerun while stage rerun

[jira] [Updated] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28699: -- Description: Related with SPARK-23207 SPARK-23243 It's another case for the indeterminate

[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Kazuaki Ishizaki (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910983#comment-16910983 ] Kazuaki Ishizaki commented on SPARK-28699: -- [~dongjoon] Thank you for pointing out my typo. You

[jira] [Comment Edited] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Yuanjian Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905250#comment-16905250 ] Yuanjian Li edited comment on SPARK-28699 at 8/20/19 4:09 AM: -- -The current

[jira] [Updated] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Yuanjian Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-28699: Description: Related with SPARK-23207 SPARK-23243 It's another case for the indeterminate

[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910974#comment-16910974 ] Dongjoon Hyun commented on SPARK-28699: --- ? [~kiszk]. `2.4.4-rc1` is `branch-2.4` and mine. You

[jira] [Resolved] (SPARK-28777) Pyspark sql function "format_string" has the wrong parameters in doc string

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28777. --- Fix Version/s: 2.4.4 2.3.4 3.0.0 Resolution:

[jira] [Assigned] (SPARK-28777) Pyspark sql function "format_string" has the wrong parameters in doc string

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28777: - Assignee: Darren Tirto > Pyspark sql function "format_string" has the wrong parameters

[jira] [Closed] (SPARK-28712) spark structured stream with kafka don't really delete temp files in spark standalone cluster

2019-08-19 Thread 凭落 (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 凭落 closed SPARK-28712. -- solved by SPARK-28025  > spark structured stream with kafka don't really delete temp files in spark > standalone

[jira] [Commented] (SPARK-28712) spark structured stream with kafka don't really delete temp files in spark standalone cluster

2019-08-19 Thread 凭落 (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910957#comment-16910957 ] 凭落 commented on SPARK-28712: [~kabhwan] thanks a lot! It really helps! > spark structured stream with kafka

[jira] [Commented] (SPARK-28712) spark structured stream with kafka don't really delete temp files in spark standalone cluster

2019-08-19 Thread 凭落 (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910956#comment-16910956 ] 凭落 commented on SPARK-28712: [~hyukjin.kwon] I'm sorry about it, next time I'll use mail list first. >

[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Kazuaki Ishizaki (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910907#comment-16910907 ] Kazuaki Ishizaki commented on SPARK-28699: -- [~smilegator] Thank you for cc. I wait for fixing

[jira] [Commented] (SPARK-28777) Pyspark sql function "format_string" has the wrong parameters in doc string

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910893#comment-16910893 ] Dongjoon Hyun commented on SPARK-28777: --- Welcome, [~darrentirto]. Thank you for filing a new JIRA

[jira] [Resolved] (SPARK-28775) DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone database

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28775. --- Fix Version/s: 2.4.4 2.3.4 3.0.0 Resolution:

[jira] [Resolved] (SPARK-28224) Check overflow in decimal Sum aggregate

2019-08-19 Thread Takeshi Yamamuro (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-28224. -- Fix Version/s: 3.0.0 Assignee: Mick Jermsurawong Resolution: Fixed

[jira] [Commented] (SPARK-28777) Pyspark sql function "format_string" has the wrong parameters in doc string

2019-08-19 Thread Darren Tirto (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910865#comment-16910865 ] Darren Tirto commented on SPARK-28777: -- Sorry, I'm a little new to this Jira board. I created a git

[jira] [Commented] (SPARK-27648) In Spark2.4 Structured Streaming:The executor storage memory increasing over time

2019-08-19 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910864#comment-16910864 ] Jungtaek Lim commented on SPARK-27648: -- Even better if you could reproduce with local filesystem or

[jira] [Updated] (SPARK-28777) Pyspark sql function "format_string" has the wrong parameters in doc string

2019-08-19 Thread Darren Tirto (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darren Tirto updated SPARK-28777: - Shepherd: (was: Darren Tirto) > Pyspark sql function "format_string" has the wrong parameters

[jira] [Updated] (SPARK-28777) Pyspark sql function "format_string" has the wrong parameters in doc string

2019-08-19 Thread Darren Tirto (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darren Tirto updated SPARK-28777: - Shepherd: Darren Tirto > Pyspark sql function "format_string" has the wrong parameters in doc

[jira] [Commented] (SPARK-27648) In Spark2.4 Structured Streaming:The executor storage memory increasing over time

2019-08-19 Thread Puneet Loya (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910861#comment-16910861 ] Puneet Loya commented on SPARK-27648: - Cassandra Sink is nothing but Cassandra Foreach batch(by

[jira] [Commented] (SPARK-22390) Aggregate push down

2019-08-19 Thread Huaxin Gao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910860#comment-16910860 ] Huaxin Gao commented on SPARK-22390: I haven't looked this Datasource V2 implementation for a while.

[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Xiao Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910859#comment-16910859 ] Xiao Li commented on SPARK-28699: - Also cc [~kiszk] Let us wait for this before starting RC1 for 2.3 >

[jira] [Assigned] (SPARK-28749) Fix PySpark tests not to require kafka-0-8 in branch-2.4

2019-08-19 Thread Sean Owen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-28749: - Assignee: Matt Foley > Fix PySpark tests not to require kafka-0-8 in branch-2.4 >

[jira] [Resolved] (SPARK-28749) Fix PySpark tests not to require kafka-0-8 in branch-2.4

2019-08-19 Thread Sean Owen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28749. --- Fix Version/s: 2.4.5 Resolution: Fixed Issue resolved by pull request 25482

[jira] [Updated] (SPARK-28775) DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone database

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28775: -- Issue Type: Bug (was: Improvement) > DateTimeUtilsSuite fails for JDKs using the tzdata2018i

[jira] [Updated] (SPARK-28775) DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone database

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28775: -- Component/s: Tests > DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer

[jira] [Created] (SPARK-28779) CSV writer doesn't handle older Mac line endings

2019-08-19 Thread nicolas paris (Jira)
nicolas paris created SPARK-28779: - Summary: CSV writer doesn't handle older Mac line endings Key: SPARK-28779 URL: https://issues.apache.org/jira/browse/SPARK-28779 Project: Spark Issue

[jira] [Updated] (SPARK-28778) Shuffle jobs fail due to incorrect advertised address when running in virtual network

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28778: -- Summary: Shuffle jobs fail due to incorrect advertised address when running in virtual

[jira] [Created] (SPARK-28778) [MESOS] Shuffle jobs fail due to incorrect advertised address when running in virtual network

2019-08-19 Thread Anton Kirillov (Jira)
Anton Kirillov created SPARK-28778: -- Summary: [MESOS] Shuffle jobs fail due to incorrect advertised address when running in virtual network Key: SPARK-28778 URL: https://issues.apache.org/jira/browse/SPARK-28778

[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910795#comment-16910795 ] Dongjoon Hyun commented on SPARK-28699: --- Hi, [~XuanYuan]. Could you check old Spark versions and

[jira] [Commented] (SPARK-28466) FileSystem closed error when to call Hive.moveFile

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910790#comment-16910790 ] Dongjoon Hyun commented on SPARK-28466: --- Hi, [~angerszhuuu]. For the `Improvement` JIRA issue,

[jira] [Updated] (SPARK-28466) FileSystem closed error when to call Hive.moveFile

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28466: -- Affects Version/s: (was: 2.4.0) (was: 2.3.0) > FileSystem

[jira] [Updated] (SPARK-28590) Add sort_stats Setter for Custom Profiler

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28590: -- Affects Version/s: (was: 2.4.0) 3.0.0 > Add sort_stats Setter for

[jira] [Updated] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28594: -- Affects Version/s: (was: 2.4.0) (was: 2.2.1)

[jira] [Updated] (SPARK-28590) Add sort_stats Setter for Custom Profiler

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28590: -- Target Version/s: (was: 2.4.0) > Add sort_stats Setter for Custom Profiler >

[jira] [Updated] (SPARK-28597) Spark streaming terminated when close meta data log error

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28597: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Spark streaming terminated

[jira] [Updated] (SPARK-28547) Make it work for wide (> 10K columns data)

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28547: -- Affects Version/s: (was: 2.4.4) (was: 2.4.3)

[jira] [Updated] (SPARK-28560) Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28560: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Optimize shuffle reader to

[jira] [Updated] (SPARK-28715) Introduce collectInPlanAndSubqueries and subqueriesAll in QueryPlan

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28715: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Introduce

[jira] [Updated] (SPARK-28552) The URL prefix lowercase of MySQL is not necessary, but it is necessary in spark

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28552: -- Affects Version/s: (was: 2.4.3) 3.0.0 > The URL prefix lowercase

[jira] [Updated] (SPARK-28631) Update Kinesis dependencies to the Apache version licensed versions

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28631: -- Affects Version/s: (was: 2.4.3) > Update Kinesis dependencies to the Apache version

[jira] [Updated] (SPARK-28678) Specify that start index is 1-based in docstring of pyspark.sql.functions.slice

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28678: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Specify that start index

[jira] [Updated] (SPARK-28746) Add repartitionby hint to support RepartitionByExpression

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28746: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Add repartitionby hint to

[jira] [Updated] (SPARK-28596) Use Java 8 time API in date_trunc

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28596: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Use Java 8 time API in

[jira] [Updated] (SPARK-28771) Join partitioned dataframes on superset of partitioning columns without shuffle

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28771: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Join partitioned

[jira] [Updated] (SPARK-28577) Ensure executorMemoryHead requested value not less than MEMORY_OFFHEAP_SIZE when MEMORY_OFFHEAP_ENABLED is true

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28577: -- Affects Version/s: (was: 2.4.3) (was: 2.3.3)

[jira] [Updated] (SPARK-28727) Request for partial least square (PLS) regression model

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28727: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Request for partial least

[jira] [Updated] (SPARK-28415) Add messageHandler to Kafka 10 direct stream API

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28415: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Add messageHandler to

[jira] [Updated] (SPARK-28716) Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28716: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Add id to Exchange and

[jira] [Updated] (SPARK-28762) Read JAR main class if JAR is not located in local file system

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28762: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Read JAR main class if JAR

[jira] [Updated] (SPARK-28573) Convert InsertIntoTable(HiveTableRelation) to Datasource inserting for partitioned table

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28573: -- Affects Version/s: (was: 2.4.3) (was: 2.3.3)

[jira] [Updated] (SPARK-28751) Imporve java serializer deserialization performance

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28751: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Imporve java serializer

[jira] [Updated] (SPARK-28655) Support to cut the event log, and solve the history server was too slow when event log is too large.

2019-08-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28655: -- Affects Version/s: (was: 2.4.3) 3.0.0 > Support to cut the event

[jira] [Resolved] (SPARK-28434) Decision Tree model isn't equal after save and load

2019-08-19 Thread Sean Owen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28434. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25485

[jira] [Assigned] (SPARK-28775) DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone database

2019-08-19 Thread Herman van Hovell (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell reassigned SPARK-28775: - Assignee: Sean Owen (was: Herman van Hovell) > DateTimeUtilsSuite fails for

[jira] [Created] (SPARK-28777) Pyspark sql function "format_string" has the wrong parameters in doc string

2019-08-19 Thread Darren Tirto (Jira)
Darren Tirto created SPARK-28777: Summary: Pyspark sql function "format_string" has the wrong parameters in doc string Key: SPARK-28777 URL: https://issues.apache.org/jira/browse/SPARK-28777 Project:

[jira] [Updated] (SPARK-28775) DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone database

2019-08-19 Thread Sean Owen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-28775: -- Summary: DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone database (was:

[jira] [Created] (SPARK-28776) SparkML MLWriter gets hadoop conf from spark context instead of session

2019-08-19 Thread Helen Yu (Jira)
Helen Yu created SPARK-28776: Summary: SparkML MLWriter gets hadoop conf from spark context instead of session Key: SPARK-28776 URL: https://issues.apache.org/jira/browse/SPARK-28776 Project: Spark

[jira] [Created] (SPARK-28775) DateTimeUtilsSuite fails for JDKs using the tzdata2018h or newer timezone database

2019-08-19 Thread Herman van Hovell (Jira)
Herman van Hovell created SPARK-28775: - Summary: DateTimeUtilsSuite fails for JDKs using the tzdata2018h or newer timezone database Key: SPARK-28775 URL: https://issues.apache.org/jira/browse/SPARK-28775

[jira] [Commented] (SPARK-25603) Generalize Nested Column Pruning

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910738#comment-16910738 ] Nicholas Chammas commented on SPARK-25603: -- [~dbtsai] - Just watched [your Spark Summit talk on

[jira] [Comment Edited] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910655#comment-16910655 ] Nicholas Chammas edited comment on SPARK-4502 at 8/19/19 7:55 PM: --

[jira] [Comment Edited] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910691#comment-16910691 ] Nicholas Chammas edited comment on SPARK-25150 at 8/19/19 7:39 PM: --- I

[jira] [Commented] (SPARK-27648) In Spark2.4 Structured Streaming:The executor storage memory increasing over time

2019-08-19 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910692#comment-16910692 ] Jungtaek Lim commented on SPARK-27648: -- [~ploya] Looks like you're running load test, then can the

[jira] [Updated] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-25150: - Affects Version/s: 2.4.3 Labels: correctness (was: ) I haven't been

[jira] [Commented] (SPARK-27648) In Spark2.4 Structured Streaming:The executor storage memory increasing over time

2019-08-19 Thread Puneet Loya (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910680#comment-16910680 ] Puneet Loya commented on SPARK-27648: - Had posted about high storage memory issue on the mailing

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19248: - Labels: correctness (was: ) Tagging this as a correctness issue since Spark 2+'s

[jira] [Updated] (SPARK-18084) write.partitionBy() does not recognize nested columns that select() can access

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18084: - Affects Version/s: 2.4.3 Retested and confirmed that this issue is still present in

[jira] [Updated] (SPARK-10892) Join with Data Frame returns wrong results

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-10892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-10892: - Affects Version/s: 2.4.0 Labels: correctness (was: ) Updating affected

[jira] [Commented] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910655#comment-16910655 ] Nicholas Chammas commented on SPARK-4502: - Thanks for your notes [~Bartalos]. Just FYI, nested

[jira] [Assigned] (SPARK-28634) Failed to start SparkSession with Keytab file

2019-08-19 Thread Marcelo Vanzin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-28634: -- Assignee: Marcelo Vanzin > Failed to start SparkSession with Keytab file >

[jira] [Resolved] (SPARK-28634) Failed to start SparkSession with Keytab file

2019-08-19 Thread Marcelo Vanzin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-28634. Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25467

[jira] [Resolved] (SPARK-25262) Support tmpfs for local dirs in k8s

2019-08-19 Thread Marcelo Vanzin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-25262. Fix Version/s: 3.0.0 Assignee: Rob Vesse Resolution: Fixed > Support

[jira] [Updated] (SPARK-25262) Support tmpfs for local dirs in k8s

2019-08-19 Thread Marcelo Vanzin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-25262: --- Summary: Support tmpfs for local dirs in k8s (was: Make Spark local dir volumes

[jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes

2019-08-19 Thread Marcelo Vanzin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910616#comment-16910616 ] Marcelo Vanzin commented on SPARK-25262: Full configurability was actually added in SPARK-28042.

[jira] [Created] (SPARK-28774) ReusedExchangeExec cannot be columnar

2019-08-19 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-28774: --- Summary: ReusedExchangeExec cannot be columnar Key: SPARK-28774 URL: https://issues.apache.org/jira/browse/SPARK-28774 Project: Spark Issue

[jira] [Created] (SPARK-28773) NULL Handling

2019-08-19 Thread Xiao Li (Jira)
Xiao Li created SPARK-28773: --- Summary: NULL Handling Key: SPARK-28773 URL: https://issues.apache.org/jira/browse/SPARK-28773 Project: Spark Issue Type: Sub-task Components: Documentation

[jira] [Resolved] (SPARK-28734) Create a table of content in the left hand side bar for SQL doc.

2019-08-19 Thread Xiao Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-28734. - Fix Version/s: 3.0.0 Resolution: Fixed > Create a table of content in the left hand side bar for

[jira] [Assigned] (SPARK-28734) Create a table of content in the left hand side bar for SQL doc.

2019-08-19 Thread Xiao Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-28734: --- Assignee: Dilip Biswal > Create a table of content in the left hand side bar for SQL doc. >

[jira] [Created] (SPARK-28772) Upgrade breeze to 1.0

2019-08-19 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-28772: --- Summary: Upgrade breeze to 1.0 Key: SPARK-28772 URL: https://issues.apache.org/jira/browse/SPARK-28772 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Josh Rosen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28699: --- Labels: correctness (was: ) > Cache an indeterminate RDD could lead to incorrect result while