[jira] [Commented] (SPARK-24437) Memory leak in UnsafeHashedRelation

2018-11-07 Thread David Vogelbacher (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679253#comment-16679253 ] David Vogelbacher commented on SPARK-24437: --- [~eyalfa] There might be hundreds of cached

[jira] [Created] (SPARK-25970) Add Instrumentation to PrefixSpan

2018-11-07 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-25970: Summary: Add Instrumentation to PrefixSpan Key: SPARK-25970 URL: https://issues.apache.org/jira/browse/SPARK-25970 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-25676) Refactor BenchmarkWideTable to use main method

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679312#comment-16679312 ] Apache Spark commented on SPARK-25676: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Resolved] (SPARK-25955) Porting JSON test for CSV functions

2018-11-07 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-25955. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22960

[jira] [Comment Edited] (SPARK-25958) error: [Errno 97] Address family not supported by protocol in dataframe.take()

2018-11-07 Thread Ruslan Dautkhanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678894#comment-16678894 ] Ruslan Dautkhanov edited comment on SPARK-25958 at 11/7/18 10:35 PM: -

[jira] [Commented] (SPARK-25958) error: [Errno 97] Address family not supported by protocol in dataframe.take()

2018-11-07 Thread Ruslan Dautkhanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678894#comment-16678894 ] Ruslan Dautkhanov commented on SPARK-25958: --- We do have ipv6 disabled on our hadoop servers,
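On Linux, errno 97 (EAFNOSUPPORT, "Address family not supported by protocol") is returned when a process asks for a socket in an address family the kernel does not provide. One example is AF_INET6 on a host booted with IPv6 disabled, which matches the hadoop servers described in this comment. A common client-side pattern is to try every address that `socket.getaddrinfo` returns and skip the ones that fail. The sketch below assumes that is the failure mode. It is not PySpark's actual code, and `connect_any_family` is a hypothetical helper.

```python
import socket


def connect_any_family(host, port, timeout=5.0):
    """Return a socket connected to the first working address for host:port.

    Hypothetical helper. Every (family, address) pair from getaddrinfo is
    tried in order. A family the host cannot create sockets for (for example
    AF_INET6 with IPv6 disabled, which fails with errno 97 on Linux) is skipped
    instead of aborting the whole connection attempt.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, socket.AF_UNSPEC, socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure, release the socket, try the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    if last_error is None:
        raise OSError("no addresses returned for %s:%s" % (host, port))
    raise last_error
```

Only the final error is re-raised after every candidate address has been tried. A single unsupported family therefore no longer hides an IPv4 address that would have worked.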

[jira] [Updated] (SPARK-25958) error: [Errno 97] Address family not supported by protocol in dataframe.take()

2018-11-07 Thread Ruslan Dautkhanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruslan Dautkhanov updated SPARK-25958: -- Issue Type: Bug (was: New Feature) > error: [Errno 97] Address family not supported

[jira] [Assigned] (SPARK-25956) Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25956: Assignee: (was: Apache Spark) > Make Scala 2.12 as default Scala version in Spark

[jira] [Commented] (SPARK-25956) Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678848#comment-16678848 ] Apache Spark commented on SPARK-25956: -- User 'dbtsai' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25956) Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25956: Assignee: Apache Spark > Make Scala 2.12 as default Scala version in Spark 3.0 >

[jira] [Commented] (SPARK-25925) Spark 2.3.1 retrieves all partitions from Hive Metastore by default

2018-11-07 Thread Adam Budde (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678861#comment-16678861 ] Adam Budde commented on SPARK-25925: [~axenol] I would definitely support making the documentation

[jira] [Commented] (SPARK-25956) Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678849#comment-16678849 ] Apache Spark commented on SPARK-25956: -- User 'dbtsai' has created a pull request for this issue:

[jira] [Resolved] (SPARK-25897) Cannot run k8s integration tests in sbt

2018-11-07 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-25897. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22909

[jira] [Commented] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678728#comment-16678728 ] Ryan Blue commented on SPARK-25966: --- [~andrioni], were there any failed tasks or executors in the job

[jira] [Commented] (SPARK-25967) sql.functions.trim() should remove trailing and leading tabs

2018-11-07 Thread kevin yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678700#comment-16678700 ] kevin yu commented on SPARK-25967: -- Hello Victor: I see, by SQL2003 standard, the TRIM function removes

[jira] [Comment Edited] (SPARK-25967) sql.functions.trim() should remove trailing and leading tabs

2018-11-07 Thread Victor Sahin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678657#comment-16678657 ] Victor Sahin edited comment on SPARK-25967 at 11/7/18 7:13 PM: --- That is

[jira] [Commented] (SPARK-25967) sql.functions.trim() should remove trailing and leading tabs

2018-11-07 Thread Victor Sahin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678657#comment-16678657 ] Victor Sahin commented on SPARK-25967: -- That is not very intuitive to manually specify especially

[jira] [Commented] (SPARK-25967) sql.functions.trim() should remove trailing and leading tabs

2018-11-07 Thread kevin yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678614#comment-16678614 ] kevin yu commented on SPARK-25967: -- Hello Victor: You can specify the tabs as specified characters to
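This exchange turns on the default trim set. Following the SQL:2003 behaviour cited here, TRIM removes only leading and trailing space characters, so tabs must be named explicitly as trim characters. In SQL this is written with the standard `TRIM(BOTH trimStr FROM str)` form. The sketch below mirrors those semantics on plain Python strings and does not call Spark. `sql_trim` is a hypothetical helper.

```python
def sql_trim(value, trim_chars=" "):
    """Mimic SQL TRIM(BOTH trim_chars FROM value) on a Python string.

    Hypothetical helper. With the default trim set of a single space, only
    leading and trailing spaces are removed, so tabs survive. Passing the tab
    character explicitly removes tabs as well. Each character of trim_chars
    is treated as a member of a set, not as a substring.
    """
    if value is None:
        return None  # TRIM of NULL is NULL
    return value.strip(trim_chars)


print(repr(sql_trim("\t hello \t")))         # tabs at both ends are kept
print(repr(sql_trim("\t hello \t", " \t")))  # spaces and tabs are removed
```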

[jira] [Commented] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678595#comment-16678595 ] Cheng Lian commented on SPARK-25966: [~andrioni], just realized that I might misunderstand this part

[jira] [Issue Comment Deleted] (SPARK-25959) Difference in featureImportances results on computed vs saved models

2018-11-07 Thread shahid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shahid updated SPARK-25959: --- Comment: was deleted (was: Thanks. I will analyze the issue. ) > Difference in featureImportances results

[jira] [Comment Edited] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678542#comment-16678542 ] Cheng Lian edited comment on SPARK-25966 at 11/7/18 5:34 PM: - Hey,

[jira] [Commented] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678542#comment-16678542 ] Cheng Lian commented on SPARK-25966: Hey, [~andrioni], if you still have the original (potentially)

[jira] [Commented] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678527#comment-16678527 ] Xiao Li commented on SPARK-25966: - Do you still have the file that fail your job? Can you use the

[jira] [Created] (SPARK-25967) sql.functions.trim() should remove trailing and leading tabs

2018-11-07 Thread Victor Sahin (JIRA)
Victor Sahin created SPARK-25967: Summary: sql.functions.trim() should remove trailing and leading tabs Key: SPARK-25967 URL: https://issues.apache.org/jira/browse/SPARK-25967 Project: Spark

[jira] [Updated] (SPARK-25967) sql.functions.trim() should remove trailing and leading tabs

2018-11-07 Thread Victor Sahin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Sahin updated SPARK-25967: - Description: sql.functions.trim removes only trailing and leading whitespaces. Removing tabs as

[jira] [Commented] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678475#comment-16678475 ] Yuming Wang commented on SPARK-25966: - Thanks [~andrioni] Is there an easy way to reproduce it? >

[jira] [Updated] (SPARK-25958) error: [Errno 97] Address family not supported by protocol in dataframe.take()

2018-11-07 Thread Ruslan Dautkhanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruslan Dautkhanov updated SPARK-25958: -- Description: Following error happens on a heavy Spark job after 4 hours of runtime..

[jira] [Issue Comment Deleted] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-11-07 Thread bharath kumar avusherla (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath kumar avusherla updated SPARK-23050: Comment: was deleted (was: [~ste...@apache.org], I can start working on

[jira] [Created] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2018-11-07 Thread Alessandro Andrioni (JIRA)
Alessandro Andrioni created SPARK-25966: --- Summary: "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets Key: SPARK-25966 URL:

[jira] [Updated] (SPARK-25908) Remove old deprecated items in Spark 3

2018-11-07 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-25908: -- Description: There are many deprecated methods and classes in Spark. They _can_ be removed in Spark

[jira] [Assigned] (SPARK-25964) Revise OrcReadBenchmark/DataSourceReadBenchmark case names and execution instructions

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25964: Assignee: Apache Spark > Revise OrcReadBenchmark/DataSourceReadBenchmark case names and

[jira] [Assigned] (SPARK-25964) Revise OrcReadBenchmark/DataSourceReadBenchmark case names and execution instructions

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25964: Assignee: (was: Apache Spark) > Revise OrcReadBenchmark/DataSourceReadBenchmark case

[jira] [Created] (SPARK-25965) Add read benchmark for Avro

2018-11-07 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-25965: -- Summary: Add read benchmark for Avro Key: SPARK-25965 URL: https://issues.apache.org/jira/browse/SPARK-25965 Project: Spark Issue Type: Sub-task

[jira] [Resolved] (SPARK-25885) HighlyCompressedMapStatus deserialization optimization

2018-11-07 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25885. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22894

[jira] [Assigned] (SPARK-25885) HighlyCompressedMapStatus deserialization optimization

2018-11-07 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-25885: - Assignee: Artem Kupchinskiy > HighlyCompressedMapStatus deserialization optimization >

[jira] [Commented] (SPARK-25964) Revise OrcReadBenchmark/DataSourceReadBenchmark case names and execution instructions

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678353#comment-16678353 ] Apache Spark commented on SPARK-25964: -- User 'gengliangwang' has created a pull request for this

[jira] [Created] (SPARK-25964) Revise OrcReadBenchmark/DataSourceReadBenchmark case names and execution instructions

2018-11-07 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-25964: -- Summary: Revise OrcReadBenchmark/DataSourceReadBenchmark case names and execution instructions Key: SPARK-25964 URL: https://issues.apache.org/jira/browse/SPARK-25964

[jira] [Assigned] (SPARK-25963) Optimize generate followed by window

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25963: Assignee: Apache Spark > Optimize generate followed by window >

[jira] [Commented] (SPARK-25963) Optimize generate followed by window

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678149#comment-16678149 ] Apache Spark commented on SPARK-25963: -- User 'uzadude' has created a pull request for this issue:

[jira] [Commented] (SPARK-25963) Optimize generate followed by window

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678148#comment-16678148 ] Apache Spark commented on SPARK-25963: -- User 'uzadude' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25963) Optimize generate followed by window

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25963: Assignee: (was: Apache Spark) > Optimize generate followed by window >

[jira] [Created] (SPARK-25963) Optimize generate followed by window

2018-11-07 Thread Ohad Raviv (JIRA)
Ohad Raviv created SPARK-25963: -- Summary: Optimize generate followed by window Key: SPARK-25963 URL: https://issues.apache.org/jira/browse/SPARK-25963 Project: Spark Issue Type: Improvement

[jira] [Resolved] (SPARK-25904) Avoid allocating arrays too large for JVMs

2018-11-07 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-25904. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22818

[jira] [Assigned] (SPARK-25904) Avoid allocating arrays too large for JVMs

2018-11-07 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-25904: Assignee: Imran Rashid > Avoid allocating arrays too large for JVMs >

[jira] [Commented] (SPARK-25962) Specify minimum versions for both pydocstyle and flake8 in 'lint-python' script

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678035#comment-16678035 ] Apache Spark commented on SPARK-25962: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Commented] (SPARK-25962) Specify minimum versions for both pydocstyle and flake8 in 'lint-python' script

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678034#comment-16678034 ] Apache Spark commented on SPARK-25962: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-25962) Specify minimum versions for both pydocstyle and flake8 in 'lint-python' script

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25962: Assignee: (was: Apache Spark) > Specify minimum versions for both pydocstyle and

[jira] [Assigned] (SPARK-25962) Specify minimum versions for both pydocstyle and flake8 in 'lint-python' script

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25962: Assignee: Apache Spark > Specify minimum versions for both pydocstyle and flake8 in

[jira] [Created] (SPARK-25962) Specify minimum versions for both pydocstyle and flake8 in 'lint-python' script

2018-11-07 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-25962: Summary: Specify minimum versions for both pydocstyle and flake8 in 'lint-python' script Key: SPARK-25962 URL: https://issues.apache.org/jira/browse/SPARK-25962

[jira] [Commented] (SPARK-25921) Python worker reuse causes Barrier tasks to run without BarrierTaskContext

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677959#comment-16677959 ] Apache Spark commented on SPARK-25921: -- User 'xuanyuanking' has created a pull request for this

[jira] [Assigned] (SPARK-25921) Python worker reuse causes Barrier tasks to run without BarrierTaskContext

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25921: Assignee: (was: Apache Spark) > Python worker reuse causes Barrier tasks to run

[jira] [Commented] (SPARK-25921) Python worker reuse causes Barrier tasks to run without BarrierTaskContext

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677954#comment-16677954 ] Apache Spark commented on SPARK-25921: -- User 'xuanyuanking' has created a pull request for this

[jira] [Assigned] (SPARK-25921) Python worker reuse causes Barrier tasks to run without BarrierTaskContext

2018-11-07 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25921: Assignee: Apache Spark > Python worker reuse causes Barrier tasks to run without

[jira] [Created] (SPARK-25961) Using random numbers is not supported when handling data skew

2018-11-07 Thread zengxl (JIRA)
zengxl created SPARK-25961: -- Summary: Using random numbers is not supported when handling data skew Key: SPARK-25961 URL: https://issues.apache.org/jira/browse/SPARK-25961 Project: Spark Issue Type: Bug Components: SQL Affects