[jira] [Assigned] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-30 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman reassigned SPARK-20877: Assignee: Felix Cheung > Investigate if tests will time out on CRAN >

[jira] [Resolved] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-30 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-20877. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request

[jira] [Updated] (SPARK-20854) extend hint syntax to support any expression, not just identifiers or strings

2017-05-30 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20854: Issue Type: Improvement (was: Bug) > extend hint syntax to support any expression, not just identifiers

[jira] [Assigned] (SPARK-20854) extend hint syntax to support any expression, not just identifiers or strings

2017-05-30 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-20854: Assignee: Bogdan Raducanu Target Version/s: 2.2.0 Priority: Blocker (was:

[jira] [Comment Edited] (SPARK-20199) GradientBoostedTreesModel doesn't have featureSubsetStrategy parameter

2017-05-30 Thread pralabhkumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030629#comment-16030629 ] pralabhkumar edited comment on SPARK-20199 at 5/31/17 4:56 AM: please

[jira] [Commented] (SPARK-20199) GradientBoostedTreesModel doesn't have featureSubsetStrategy parameter

2017-05-30 Thread pralabhkumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030629#comment-16030629 ] pralabhkumar commented on SPARK-20199: Please review the pull request.

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20392: Target Version/s: 2.3.0 > Slow performance when calling fit on ML pipeline for dataset with many

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20392: Priority: Blocker (was: Major) > Slow performance when calling fit on ML pipeline for dataset

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20392: Issue Type: Improvement (was: Bug) > Slow performance when calling fit on ML pipeline for dataset

[jira] [Reopened] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reopened SPARK-20392: will re-merge it at the end of Spark 2.3, to avoid conflicts when backporting analyzer related PRs

[jira] [Updated] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20392: Fix Version/s: (was: 2.3.0) > Slow performance when calling fit on ML pipeline for dataset

[jira] [Commented] (SPARK-20876) If the input parameter is float type for ceil or floor ,the result is not we expected

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030604#comment-16030604 ] Apache Spark commented on SPARK-20876: User '10110346' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20932) CountVectorizer support handle persistence

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20932: Assignee: (was: Apache Spark) > CountVectorizer support handle persistence >

[jira] [Assigned] (SPARK-20932) CountVectorizer support handle persistence

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20932: Assignee: Apache Spark > CountVectorizer support handle persistence >

[jira] [Commented] (SPARK-20932) CountVectorizer support handle persistence

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030601#comment-16030601 ] Apache Spark commented on SPARK-20932: User 'zhengruifeng' has created a pull request for this

[jira] [Created] (SPARK-20932) CountVectorizer support handle persistence

2017-05-30 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20932: Summary: CountVectorizer support handle persistence Key: SPARK-20932 URL: https://issues.apache.org/jira/browse/SPARK-20932 Project: Spark Issue Type:

[jira] [Issue Comment Deleted] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-20931: Comment: was deleted (was: I'm working on.) > Built-in SQL Function - ABS support string type >

[jira] [Assigned] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20931: Assignee: Apache Spark > Built-in SQL Function - ABS support string type >

[jira] [Assigned] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20931: Assignee: (was: Apache Spark) > Built-in SQL Function - ABS support string type >

[jira] [Commented] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030577#comment-16030577 ] Apache Spark commented on SPARK-20931: User 'wangyum' has created a pull request for this issue:

[jira] [Resolved] (SPARK-20275) HistoryServer page shows incorrect complete date of inprogress apps

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20275. Resolution: Fixed Assignee: Saisai Shao Fix Version/s: 2.2.0

[jira] [Created] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-20931: Summary: Built-in SQL Function - ABS support string type Key: SPARK-20931 URL: https://issues.apache.org/jira/browse/SPARK-20931 Project: Spark Issue Type:

[jira] [Commented] (SPARK-20931) Built-in SQL Function - ABS support string type

2017-05-30 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030562#comment-16030562 ] Yuming Wang commented on SPARK-20931: I'm working on. > Built-in SQL Function - ABS support string

[jira] [Assigned] (SPARK-20930) Destroy broadcasted centers after computing cost

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20930: Assignee: Apache Spark > Destroy broadcasted centers after computing cost >

[jira] [Assigned] (SPARK-20930) Destroy broadcasted centers after computing cost

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20930: Assignee: (was: Apache Spark) > Destroy broadcasted centers after computing cost >

[jira] [Commented] (SPARK-20930) Destroy broadcasted centers after computing cost

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030559#comment-16030559 ] Apache Spark commented on SPARK-20930: User 'zhengruifeng' has created a pull request for this

[jira] [Created] (SPARK-20930) Destroy broadcasted centers after computing cost

2017-05-30 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20930: Summary: Destroy broadcasted centers after computing cost Key: SPARK-20930 URL: https://issues.apache.org/jira/browse/SPARK-20930 Project: Spark Issue

[jira] [Assigned] (SPARK-20213) DataFrameWriter operations do not show up in SQL tab

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-20213: Assignee: Wenchen Fan > DataFrameWriter operations do not show up in SQL tab >

[jira] [Resolved] (SPARK-20213) DataFrameWriter operations do not show up in SQL tab

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20213. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18064

[jira] [Assigned] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20929: Assignee: Apache Spark (was: Joseph K. Bradley) > LinearSVC should not use shared Param

[jira] [Assigned] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20929: Assignee: Joseph K. Bradley (was: Apache Spark) > LinearSVC should not use shared Param

[jira] [Commented] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030437#comment-16030437 ] Apache Spark commented on SPARK-20929: User 'jkbradley' has created a pull request for this issue:

[jira] [Updated] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20929: Priority: Minor (was: Major) > LinearSVC should not use shared Param HasThresholds >

[jira] [Created] (SPARK-20929) LinearSVC should not use shared Param HasThresholds

2017-05-30 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-20929: Summary: LinearSVC should not use shared Param HasThresholds Key: SPARK-20929 URL: https://issues.apache.org/jira/browse/SPARK-20929 Project: Spark

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-05-30 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030379#comment-16030379 ] Nan Zhu commented on SPARK-20928: Hi, is there any description on what does it mean? > Continuous

[jira] [Resolved] (SPARK-20651) Speed up the new app state listener

2017-05-30 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-20651. Resolution: Won't Do I've done some perf work to make sure live applications don't

[jira] [Closed] (SPARK-2183) Avoid loading/shuffling data twice in self-join query

2017-05-30 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-2183. Resolution: Fixed Assignee: Reynold Xin This shouldn't be an issue anymore with reuse exchange in

[jira] [Commented] (SPARK-20178) Improve Scheduler fetch failures

2017-05-30 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030284#comment-16030284 ] Sital Kedia commented on SPARK-20178: https://github.com/apache/spark/pull/18150 > Improve

[jira] [Resolved] (SPARK-20883) Improve StateStore APIs for efficiency

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-20883. Resolution: Fixed Fix Version/s: 2.3.0 > Improve StateStore APIs for efficiency >

[jira] [Commented] (SPARK-19753) Remove all shuffle files on a host in case of slave lost of fetch failure

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030270#comment-16030270 ] Apache Spark commented on SPARK-19753: User 'sitalkedia' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20894) Error while checkpointing to HDFS

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20894: Assignee: (was: Apache Spark) > Error while checkpointing to HDFS >

[jira] [Assigned] (SPARK-20894) Error while checkpointing to HDFS

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20894: Assignee: Apache Spark > Error while checkpointing to HDFS >

[jira] [Commented] (SPARK-20894) Error while checkpointing to HDFS

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030261#comment-16030261 ] Apache Spark commented on SPARK-20894: User 'zsxwing' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20926: Assignee: Apache Spark > Exposure to Guava libraries by directly accessing

[jira] [Commented] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030257#comment-16030257 ] Apache Spark commented on SPARK-20926: User 'rezasafi' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20926: Assignee: (was: Apache Spark) > Exposure to Guava libraries by directly accessing

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030197#comment-16030197 ] Jeffrey Quinn commented on SPARK-20925: Apologies, will move to the mailing list next time I have

[jira] [Created] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-05-30 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-20928: Summary: Continuous Processing Mode for Structured Streaming Key: SPARK-20928 URL: https://issues.apache.org/jira/browse/SPARK-20928 Project: Spark

[jira] [Commented] (SPARK-19236) Add createOrReplaceGlobalTempView

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030165#comment-16030165 ] Apache Spark commented on SPARK-19236: User 'gatorsmile' has created a pull request for this issue:

[jira] [Commented] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030160#comment-16030160 ] Shixiong Zhu commented on SPARK-20894: The root issue here is the driver uses the local file system

[jira] [Updated] (SPARK-20894) Error while checkpointing to HDFS

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20894: Summary: Error while checkpointing to HDFS (was: Error while checkpointing to HDFS (similar to

[jira] [Updated] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20894: Issue Type: Improvement (was: Bug) > Error while checkpointing to HDFS (similar to JIRA

[jira] [Reopened] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-20894: > Error while checkpointing to HDFS (similar to JIRA SPARK-19268) >

[jira] [Resolved] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20924. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 18146

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030108#comment-16030108 ] Sean Owen commented on SPARK-20925: This is better for the mailing list. Spark allocates off heap

[jira] [Commented] (SPARK-19732) DataFrame.fillna() does not work for bools in PySpark

2017-05-30 Thread Ruben Berenguel (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030102#comment-16030102 ] Ruben Berenguel commented on SPARK-19732: I'll give this a go! > DataFrame.fillna() does not

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030089#comment-16030089 ] Jeffrey Quinn commented on SPARK-20925: Thanks Sean, Sorry to continue to comment on a resolved

[jira] [Resolved] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20925. Resolution: Not A Problem That doesn't mean the JVM is out of memory; it kind of means the opposite.

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030059#comment-16030059 ] Thomas Graves commented on SPARK-20923: taking a quick look at the history of the

[jira] [Comment Edited] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails

2017-05-30 Thread Mathieu D (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030025#comment-16030025 ] Mathieu D edited comment on SPARK-18881 at 5/30/17 7:52 PM: Just to mention a

[jira] [Commented] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails

2017-05-30 Thread Mathieu D (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030025#comment-16030025 ] Mathieu D commented on SPARK-18881: Just to mention a workaround for those experiencing the problem:

[jira] [Commented] (SPARK-19044) PySpark dropna() can fail with AnalysisException

2017-05-30 Thread Ruben Berenguel (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030021#comment-16030021 ] Ruben Berenguel commented on SPARK-19044: Oh, there's a typo in the "equivalent Scala code" in

[jira] [Commented] (SPARK-20803) KernelDensity.estimate in pyspark.mllib.stat.KernelDensity throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is n

2017-05-30 Thread Bettadapura Srinath Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030004#comment-16030004 ] Bettadapura Srinath Sharma commented on SPARK-20803: In Java, the (correct) result

[jira] [Created] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming

2017-05-30 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-20927: Summary: Add cache operator to Unsupported Operations in Structured Streaming Key: SPARK-20927 URL: https://issues.apache.org/jira/browse/SPARK-20927 Project:

[jira] [Commented] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Reza Safi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029996#comment-16029996 ] Reza Safi commented on SPARK-20926: I will post a pull request for this issue soon, by tonight at the

[jira] [Updated] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Quinn updated SPARK-20925: Description: Observed under the following conditions: Spark Version: Spark 2.1.0 Hadoop

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029993#comment-16029993 ] Jeffrey Quinn commented on SPARK-20925: Hi Sean, Sorry for not providing adequate information.

[jira] [Created] (SPARK-20926) Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures

2017-05-30 Thread Reza Safi (JIRA)
Reza Safi created SPARK-20926: Summary: Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures Key: SPARK-20926 URL: https://issues.apache.org/jira/browse/SPARK-20926

[jira] [Commented] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029984#comment-16029984 ] Sean Owen commented on SPARK-20925: Not enough info here -- is the JVM running out of memory? is YARN

[jira] [Updated] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Quinn updated SPARK-20925: Description: Observed under the following conditions: Spark Version: Spark 2.1.0 Hadoop

[jira] [Created] (SPARK-20925) Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy

2017-05-30 Thread Jeffrey Quinn (JIRA)
Jeffrey Quinn created SPARK-20925: Summary: Out of Memory Issues With org.apache.spark.sql.DataFrameWriter#partitionBy Key: SPARK-20925 URL: https://issues.apache.org/jira/browse/SPARK-20925

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029962#comment-16029962 ] Josh Rosen commented on SPARK-20923: It doesn't seem to be used, as far as I can tell from a quick

[jira] [Assigned] (SPARK-20333) Fix HashPartitioner in DAGSchedulerSuite

2017-05-30 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-20333: Assignee: jin xing > Fix HashPartitioner in DAGSchedulerSuite >

[jira] [Commented] (SPARK-20802) kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not n

2017-05-30 Thread Bettadapura Srinath Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029950#comment-16029950 ] Bettadapura Srinath Sharma commented on SPARK-20802: In Java, (Correct behavior)

[jira] [Resolved] (SPARK-20333) Fix HashPartitioner in DAGSchedulerSuite

2017-05-30 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-20333. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 17634

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029917#comment-16029917 ] Thomas Graves commented on SPARK-20923: [~joshrosen] [~zsxwing] [~eseyfe] I think you have looked

[jira] [Closed] (SPARK-15905) Driver hung while writing to console progress bar

2017-05-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed SPARK-15905. Resolution: Cannot Reproduce > Driver hung while writing to console progress bar >

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029847#comment-16029847 ] Ryan Blue commented on SPARK-20923: I didn't look at the code path up to writing history files. I just

[jira] [Commented] (SPARK-15905) Driver hung while writing to console progress bar

2017-05-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029846#comment-16029846 ] Tejas Patil commented on SPARK-15905: I haven't seen this in a while with Spark 2.0. Closing. If

[jira] [Updated] (SPARK-20597) KafkaSourceProvider falls back on path as synonym for topic

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20597: Labels: starter (was: ) > KafkaSourceProvider falls back on path as synonym for topic >

[jira] [Updated] (SPARK-20599) ConsoleSink should work with write (batch)

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20599: Labels: starter (was: ) > ConsoleSink should work with write (batch) >

[jira] [Updated] (SPARK-20919) Simplificaiton of CachedKafkaConsumer using guava cache.

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20919: Affects Version/s: (was: 2.3.0) 2.2.0 Target Version/s: 2.3.0 >

[jira] [Updated] (SPARK-20919) Simplificaiton of CachedKafkaConsumer using guava cache.

2017-05-30 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-20919: Issue Type: Improvement (was: Bug) > Simplificaiton of CachedKafkaConsumer using guava cache. >

[jira] [Assigned] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20924: Assignee: Apache Spark (was: Xiao Li) > Unable to call the function registered in the

[jira] [Assigned] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20924: Assignee: Xiao Li (was: Apache Spark) > Unable to call the function registered in the

[jira] [Commented] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029815#comment-16029815 ] Apache Spark commented on SPARK-20924: User 'gatorsmile' has created a pull request for this issue:

[jira] [Created] (SPARK-20924) Unable to call the function registered in the not-current database

2017-05-30 Thread Xiao Li (JIRA)
Xiao Li created SPARK-20924: --- Summary: Unable to call the function registered in the not-current database Key: SPARK-20924 URL: https://issues.apache.org/jira/browse/SPARK-20924 Project: Spark

[jira] [Commented] (SPARK-20832) Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs

2017-05-30 Thread Jiang Xingbo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029778#comment-16029778 ] Jiang Xingbo commented on SPARK-20832: -- I'm working on this. > Standalone master should explicitly

[jira] [Updated] (SPARK-20899) PySpark supports stringIndexerOrderType in RFormula

2017-05-30 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-20899: Component/s: ML > PySpark supports stringIndexerOrderType in RFormula >

[jira] [Resolved] (SPARK-20899) PySpark supports stringIndexerOrderType in RFormula

2017-05-30 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang resolved SPARK-20899. - Resolution: Fixed Assignee: Wayne Zhang Fix Version/s: 2.3.0 > PySpark supports

[jira] [Commented] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029701#comment-16029701 ] Thomas Graves commented on SPARK-20923: --- [~rdblue] with SPARK-20084, did you see anything using

[jira] [Created] (SPARK-20923) TaskMetrics._updatedBlockStatuses uses a lot of memory

2017-05-30 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-20923: - Summary: TaskMetrics._updatedBlockStatuses uses a lot of memory Key: SPARK-20923 URL: https://issues.apache.org/jira/browse/SPARK-20923 Project: Spark

[jira] [Commented] (SPARK-20922) Unsafe deserialization in Spark LauncherConnection

2017-05-30 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029693#comment-16029693 ] Marcelo Vanzin commented on SPARK-20922: Yeah, it's not as simple to exploit, but I guess we'll

[jira] [Assigned] (SPARK-20909) Build-in SQL Function Support - DAYOFWEEK

2017-05-30 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-20909: --- Assignee: Yuming Wang > Build-in SQL Function Support - DAYOFWEEK >

[jira] [Commented] (SPARK-20633) FileFormatWriter wrap the FetchFailedException which breaks job's failover

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029635#comment-16029635 ] Apache Spark commented on SPARK-20633: -- User 'squito' has created a pull request for this issue:

[jira] [Commented] (SPARK-18683) REST APIs for standalone Master、Workers and Applications

2017-05-30 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029591#comment-16029591 ] Imran Rashid commented on SPARK-18683: -- [~stanzhai] commented here

[jira] [Resolved] (SPARK-20900) ApplicationMaster crashes if SPARK_YARN_STAGING_DIR is not set

2017-05-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20900. --- Resolution: Not A Problem > ApplicationMaster crashes if SPARK_YARN_STAGING_DIR is not set >

[jira] [Comment Edited] (SPARK-20922) Unsafe deserialization in Spark LauncherConnection

2017-05-30 Thread Aditya Sharad (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029455#comment-16029455 ] Aditya Sharad edited comment on SPARK-20922 at 5/30/17 3:16 PM: Yes, this

[jira] [Assigned] (SPARK-20912) map function with columns as strings

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20912: Assignee: Apache Spark > map function with columns as strings >

[jira] [Assigned] (SPARK-20912) map function with columns as strings

2017-05-30 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20912: Assignee: (was: Apache Spark) > map function with columns as strings >
