[jira] [Commented] (SPARK-26713) PipedRDD may holds stdin writer and stdout read threads even if the task is finished

2019-01-23 Thread Xianjin YE (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750836#comment-16750836 ] Xianjin YE commented on SPARK-26713: I have fixed and tested this issue in our internal cluster,

[jira] [Created] (SPARK-26713) PipedRDD may holds stdin writer and stdout read threads even if the task is finished

2019-01-23 Thread Xianjin YE (JIRA)
Xianjin YE created SPARK-26713: -- Summary: PipedRDD may holds stdin writer and stdout read threads even if the task is finished Key: SPARK-26713 URL: https://issues.apache.org/jira/browse/SPARK-26713

[jira] [Commented] (SPARK-26699) Dataset column output discrepancies

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750761#comment-16750761 ] Hyukjin Kwon commented on SPARK-26699: -- WrappedArray is a {{Seq}} anyway. So shouldn't be a big

[jira] [Resolved] (SPARK-26699) Dataset column output discrepancies

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26699. -- Resolution: Invalid Questions should go to mailing list. You could have a better answer from

[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2019-01-23 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750751#comment-16750751 ] Felix Cheung commented on SPARK-24615: -- We are interested to know as well. [~mengxr] touched on

[jira] [Commented] (SPARK-26701) spark thrift server driver memory leak

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750759#comment-16750759 ] Hyukjin Kwon commented on SPARK-26701: -- Please include "memory analysis" and reproducible steps. >

[jira] [Updated] (SPARK-26701) spark thrift server driver memory leak

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-26701: - Priority: Major (was: Blocker) > spark thrift server driver memory leak >

[jira] [Commented] (SPARK-26703) Hive record writer will always depends on parquet-1.6 writer should fix it

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750756#comment-16750756 ] Hyukjin Kwon commented on SPARK-26703: -- To do this, it should upgrade Hive rather than switching

[jira] [Updated] (SPARK-26705) UnsafeHashedRelation changed after broadcasted

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-26705: - Priority: Major (was: Critical) > UnsafeHashedRelation changed after broadcasted >

[jira] [Assigned] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the files with zero record

2019-01-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-26709: --- Assignee: Gengliang Wang (was: Xiao Li) > OptimizeMetadataOnlyQuery does not correctly handle the

[jira] [Commented] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750748#comment-16750748 ] Hyukjin Kwon commented on SPARK-26711: -- Hm, the results say something is wrong hm. 50 sec <> 7 mins

[jira] [Resolved] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-26706. - Resolution: Resolved > Fix Cast$mayTruncate for bytes > -- > >

[jira] [Commented] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750741#comment-16750741 ] Bruce Robbins commented on SPARK-26711: --- [~hyukjin.kwon] inferTimestamp=: ~13 min

[jira] [Commented] (SPARK-26413) SPIP: RDD Arrow Support in Spark Core and PySpark

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750740#comment-16750740 ] Hyukjin Kwon commented on SPARK-26413: -- I think SPARK-26412 can be resolved together if this one is

[jira] [Updated] (SPARK-26712) Disk broken causing YarnShuffleSerivce not available

2019-01-23 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26712: - Description: Currently, `ExecutorShuffleInfo` can be recovered from file if NM recovery

[jira] [Updated] (SPARK-26712) Disk broken causing YarnShuffleSerivce not available

2019-01-23 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26712: - Description: Currently, `ExecutorShuffleInfo` can be recovered from file if NM recovery

[jira] [Commented] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the files with zero record

2019-01-23 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750729#comment-16750729 ] Takeshi Yamamuro commented on SPARK-26709: -- I looked over the code though, not yet. plz do it

[jira] [Updated] (SPARK-26712) Disk broken causing YarnShuffleSerivce not available

2019-01-23 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26712: - Description: Currently, `ExecutorShuffleInfo` can be recovered from file if NM recovery

[jira] [Updated] (SPARK-26712) Disk broken causing YarnShuffleSerivce not available

2019-01-23 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26712: - Description: Currently, `ExecutorShuffleInfo` can be recovered from file if NM recovery

[jira] [Updated] (SPARK-26712) Disk broken causing YarnShuffleSerivce not available

2019-01-23 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26712: - Description: Currently, `ExecutorShuffleInfo` can be recovered from file if NM recovery

[jira] [Resolved] (SPARK-26682) Task attempt ID collision causes lost data

2019-01-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-26682. - Resolution: Fixed Fix Version/s: 2.4.1 3.0.0 Issue resolved by pull

[jira] [Assigned] (SPARK-26682) Task attempt ID collision causes lost data

2019-01-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-26682: --- Assignee: Ryan Blue > Task attempt ID collision causes lost data >

[jira] [Commented] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the files with zero record

2019-01-23 Thread Gengliang Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750714#comment-16750714 ] Gengliang Wang commented on SPARK-26709: [~maropu] I can take it. Are you working on it? >

[jira] [Commented] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750713#comment-16750713 ] Hyukjin Kwon commented on SPARK-26711: -- So how was the time if {{inferTimestamp}} was

[jira] [Updated] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26711: -- Description: I noticed that the first benchmark/case of JSONBenchmark ("JSON schema

[jira] [Commented] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750704#comment-16750704 ] Bruce Robbins commented on SPARK-26711: --- ping [~maxgekk] [~hyukjin.kwon] > JSON Schema inference

[jira] [Updated] (SPARK-26689) Disk broken causing broadcast failure

2019-01-23 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26689: - Summary: Disk broken causing broadcast failure (was: Bad disk causing broadcast failure) >

[jira] [Updated] (SPARK-26712) Disk broken causing YarnShuffleSerivce not available

2019-01-23 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26712: - Issue Type: Bug (was: Improvement) > Disk broken causing YarnShuffleSerivce not available >

[jira] [Updated] (SPARK-26712) Disk broken causing YarnShuffleSerivce not available

2019-01-23 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26712: - Summary: Disk broken causing YarnShuffleSerivce not available (was: Disk broken caused NM

[jira] [Commented] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

2019-01-23 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750696#comment-16750696 ] Felix Cheung commented on SPARK-26679: -- I find the fraction configs very confusing and there are

[jira] [Created] (SPARK-26712) Disk broken caused NM recovery failure causing YarnShuffleSerivce not available

2019-01-23 Thread liupengcheng (JIRA)
liupengcheng created SPARK-26712: Summary: Disk broken caused NM recovery failure causing YarnShuffleSerivce not available Key: SPARK-26712 URL: https://issues.apache.org/jira/browse/SPARK-26712

[jira] [Commented] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750694#comment-16750694 ] Hyukjin Kwon commented on SPARK-26679: -- Also, if it controls a ratio comparing to the whole

[jira] [Commented] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750689#comment-16750689 ] Hyukjin Kwon commented on SPARK-26679: -- What does the current {{spark.memory.fraction}} exactly

[jira] [Comment Edited] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750594#comment-16750594 ] Hyukjin Kwon edited comment on SPARK-26679 at 1/24/19 3:30 AM: --- {quote}

[jira] [Commented] (SPARK-24437) Memory leak in UnsafeHashedRelation

2019-01-23 Thread David Vogelbacher (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750674#comment-16750674 ] David Vogelbacher commented on SPARK-24437: --- [~DaveDeCaprio] I have not tested it yet but

[jira] [Assigned] (SPARK-26710) ImageSchemaSuite has some errors when running it in local laptop

2019-01-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26710: Assignee: Apache Spark > ImageSchemaSuite has some errors when running it in local

[jira] [Commented] (SPARK-26689) Bad disk causing broadcast failure

2019-01-23 Thread liupengcheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750669#comment-16750669 ] liupengcheng commented on SPARK-26689: -- [~tgraves] We use yarn as the resource manager, and we run

[jira] [Commented] (SPARK-26389) temp checkpoint folder at executor should be deleted on graceful shutdown

2019-01-23 Thread Fengyu Cao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750667#comment-16750667 ] Fengyu Cao commented on SPARK-26389: a force clean-up flag maybe help if not use hdfs the size of

[jira] [Assigned] (SPARK-26710) ImageSchemaSuite has some errors when running it in local laptop

2019-01-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26710: Assignee: (was: Apache Spark) > ImageSchemaSuite has some errors when running it in

[jira] [Created] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26711: - Summary: JSON Schema inference takes 15 times longer Key: SPARK-26711 URL: https://issues.apache.org/jira/browse/SPARK-26711 Project: Spark Issue Type:

[jira] [Updated] (SPARK-26710) ImageSchemaSuite has some errors when running it in local laptop

2019-01-23 Thread xubo245 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated SPARK-26710: Description: ImageSchemaSuite and org.apache.spark.ml.source.image.ImageFileFormatSuite has some errors

[jira] [Created] (SPARK-26710) ImageSchemaSuite has some errors when running it in local laptop

2019-01-23 Thread xubo245 (JIRA)
xubo245 created SPARK-26710: --- Summary: ImageSchemaSuite has some errors when running it in local laptop Key: SPARK-26710 URL: https://issues.apache.org/jira/browse/SPARK-26710 Project: Spark

[jira] [Commented] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

2019-01-23 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750660#comment-16750660 ] Imran Rashid commented on SPARK-26679: -- the java side has the same problem. Spark has no idea how

[jira] [Resolved] (SPARK-26617) CacheManager blocks during requery

2019-01-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-26617. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23539

[jira] [Assigned] (SPARK-26617) CacheManager blocks during requery

2019-01-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-26617: --- Assignee: Dave DeCaprio > CacheManager blocks during requery >

[jira] [Resolved] (SPARK-25713) Implement copy() for ColumnarArray

2019-01-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-25713. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23569

[jira] [Comment Edited] (SPARK-26413) SPIP: RDD Arrow Support in Spark Core and PySpark

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731500#comment-16731500 ] Hyukjin Kwon edited comment on SPARK-26413 at 1/24/19 1:39 AM: --- I agree

[jira] [Commented] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the files with zero record

2019-01-23 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750595#comment-16750595 ] Takeshi Yamamuro commented on SPARK-26709: -- Anyone is already working on this? >

[jira] [Commented] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

2019-01-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750594#comment-16750594 ] Hyukjin Kwon commented on SPARK-26679: -- {quote} There are two extreme cases: (1) an app which does

[jira] [Updated] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-26706: Affects Version/s: (was: 3.0.0) > Fix Cast$mayTruncate for bytes > -- > >

[jira] [Updated] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-26706: Affects Version/s: (was: 2.4.1) (was: 2.3.3) 2.3.2

[jira] [Updated] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-26706: Fix Version/s: (was: 2.3.4) 2.3.3 > Fix Cast$mayTruncate for bytes >

[jira] [Updated] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-26706: Fix Version/s: 2.4.1 2.3.4 > Fix Cast$mayTruncate for bytes >

[jira] [Assigned] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the files with zero record

2019-01-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-26709: --- Assignee: Xiao Li > OptimizeMetadataOnlyQuery does not correctly handle the files with zero record

[jira] [Updated] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the files with zero record

2019-01-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-26709: Summary: OptimizeMetadataOnlyQuery does not correctly handle the files with zero record (was:

[jira] [Commented] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

2019-01-23 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750504#comment-16750504 ] Imran Rashid commented on SPARK-26679: -- I agree the old name "spark.python.worker.memory" is very

[jira] [Updated] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the empty files

2019-01-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-26709: Description: {code:java} import org.apache.spark.sql.functions.lit

[jira] [Updated] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the empty files

2019-01-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-26709: Description: {code:java} import org.apache.spark.sql.functions.lit

[jira] [Created] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the empty files

2019-01-23 Thread Xiao Li (JIRA)
Xiao Li created SPARK-26709: --- Summary: OptimizeMetadataOnlyQuery does not correctly handle the empty files Key: SPARK-26709 URL: https://issues.apache.org/jira/browse/SPARK-26709 Project: Spark

[jira] [Updated] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26706: -- Affects Version/s: 2.0.2 2.1.3 2.2.3 > Fix

[jira] [Commented] (SPARK-26688) Provide configuration of initially blacklisted YARN nodes

2019-01-23 Thread Sergey (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750498#comment-16750498 ] Sergey commented on SPARK-26688: Hi There! I'm very glad that the community paid attention to my

[jira] [Commented] (SPARK-26682) Task attempt ID collision causes lost data

2019-01-23 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750486#comment-16750486 ] Shixiong Zhu commented on SPARK-26682: -- IIUC, this issue will cause a file deletion (delete the

[jira] [Commented] (SPARK-26688) Provide configuration of initially blacklisted YARN nodes

2019-01-23 Thread Mridul Muralidharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750477#comment-16750477 ] Mridul Muralidharan commented on SPARK-26688: - If this is a legitimate usecase, we should

[jira] [Commented] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750466#comment-16750466 ] Dongjoon Hyun commented on SPARK-26706: --- I updated the affected versions, too. > Fix

[jira] [Updated] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26706: -- Affects Version/s: 1.6.3 > Fix Cast$mayTruncate for bytes > -- >

[jira] [Updated] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-26706: Priority: Blocker (was: Critical) > Fix Cast$mayTruncate for bytes > -- > >

[jira] [Updated] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-26706: Labels: correctness (was: ) > Fix Cast$mayTruncate for bytes > -- > >

[jira] [Updated] (SPARK-26708) Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan

2019-01-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-26708: Description: When performing non-cascading cache invalidation, {{recache}} is called on the other cache

[jira] [Updated] (SPARK-26708) Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan

2019-01-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-26708: Labels: correctness (was: ) > Incorrect result caused by inconsistency between a SQL cache's cached RDD

[jira] [Created] (SPARK-26708) Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan

2019-01-23 Thread Xiao Li (JIRA)
Xiao Li created SPARK-26708: --- Summary: Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan Key: SPARK-26708 URL: https://issues.apache.org/jira/browse/SPARK-26708

[jira] [Assigned] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26706: Assignee: Anton Okolnychyi (was: Apache Spark) > Fix Cast$mayTruncate for bytes >

[jira] [Updated] (SPARK-26682) Task attempt ID collision causes lost data

2019-01-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-26682: --- Target Version/s: 2.3.3, 2.4.1 Labels: data-loss (was: )

[jira] [Assigned] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26706: Assignee: Apache Spark (was: Anton Okolnychyi) > Fix Cast$mayTruncate for bytes >

[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2019-01-23 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750375#comment-16750375 ] Thomas Graves commented on SPARK-24615: --- [~jerryshao] just curious where this is at, are you

[jira] [Updated] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-26706: Priority: Critical (was: Major) > Fix Cast$mayTruncate for bytes > -- > >

[jira] [Assigned] (SPARK-26706) Fix Cast$mayTruncate for bytes

2019-01-23 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-26706: --- Assignee: Anton Okolnychyi > Fix Cast$mayTruncate for bytes > -- > >

[jira] [Resolved] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images

2019-01-23 Thread Rob Vesse (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Vesse resolved SPARK-26704. --- Resolution: Not A Problem > docker-image-tool.sh should copy custom Dockerfiles into the build

[jira] [Commented] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images

2019-01-23 Thread Rob Vesse (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750366#comment-16750366 ] Rob Vesse commented on SPARK-26704: --- Yes sorry I'm conflating with the build context with the image

[jira] [Commented] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images

2019-01-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750337#comment-16750337 ] Marcelo Vanzin commented on SPARK-26704: bq. they will be present and thus packaged into the

[jira] [Commented] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images

2019-01-23 Thread Rob Vesse (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750323#comment-16750323 ] Rob Vesse commented on SPARK-26704: --- For me it's a question of build reproducibility (I've been

[jira] [Updated] (SPARK-26379) Structured Streaming - Exception on adding column to Dataset

2019-01-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26379: -- Affects Version/s: 3.0.0 > Structured Streaming - Exception on adding column to Dataset >

[jira] [Created] (SPARK-26707) Insert into table with single struct column fails

2019-01-23 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26707: - Summary: Insert into table with single struct column fails Key: SPARK-26707 URL: https://issues.apache.org/jira/browse/SPARK-26707 Project: Spark Issue

[jira] [Commented] (SPARK-26379) Structured Streaming - Exception on adding column to Dataset

2019-01-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750304#comment-16750304 ] Dongjoon Hyun commented on SPARK-26379: --- Thank you, [~kailashgupta1012] and [~kabhwan]. >

[jira] [Updated] (SPARK-26379) Structured Streaming - Exception on adding column to Dataset

2019-01-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26379: -- Affects Version/s: 2.3.1 2.3.2 2.4.0 >

[jira] [Comment Edited] (SPARK-17914) Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp

2019-01-23 Thread Chaitanya P Chandurkar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750292#comment-16750292 ] Chaitanya P Chandurkar edited comment on SPARK-17914 at 1/23/19 6:20 PM:

[jira] [Commented] (SPARK-17914) Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp

2019-01-23 Thread Chaitanya P Chandurkar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750292#comment-16750292 ] Chaitanya P Chandurkar commented on SPARK-17914: I'm still seeing this issue in Spark

[jira] [Comment Edited] (SPARK-17914) Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp

2019-01-23 Thread Chaitanya P Chandurkar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750292#comment-16750292 ] Chaitanya P Chandurkar edited comment on SPARK-17914 at 1/23/19 6:25 PM:

[jira] [Updated] (SPARK-26699) Dataset column output discrepancies

2019-01-23 Thread Praveena (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Praveena updated SPARK-26699: - Issue Type: Question (was: Bug) > Dataset column output discrepancies >

[jira] [Commented] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images

2019-01-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750249#comment-16750249 ] Marcelo Vanzin commented on SPARK-26704: You mentioned in the discussion that there is no need

[jira] [Issue Comment Deleted] (SPARK-20162) Reading data from MySQL - Cannot up cast from decimal(30,6) to decimal(38,18)

2019-01-23 Thread Franco Bonazza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franco Bonazza updated SPARK-20162: --- Comment: was deleted (was:   I can reproduce this error without using Avro, as you can see

[jira] [Commented] (SPARK-26389) temp checkpoint folder at executor should be deleted on graceful shutdown

2019-01-23 Thread Gabor Somogyi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750223#comment-16750223 ] Gabor Somogyi commented on SPARK-26389: --- Good to hear with HDFS it's working. Prohibiting the

[jira] [Comment Edited] (SPARK-24417) Build and Run Spark on JDK11

2019-01-23 Thread Michael Atef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750208#comment-16750208 ] Michael Atef edited comment on SPARK-24417 at 1/23/19 4:51 PM: --- Hello, I

[jira] [Commented] (SPARK-24417) Build and Run Spark on JDK11

2019-01-23 Thread Michael Atef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750208#comment-16750208 ] Michael Atef commented on SPARK-24417: -- Hello, I am facing problems with Pyspark when I moved to

[jira] [Resolved] (SPARK-25101) Creating leaderLatch with id for getting info about spark master nodes from zk

2019-01-23 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25101. --- Resolution: Won't Fix > Creating leaderLatch with id for getting info about spark master nodes from

[jira] [Commented] (SPARK-26413) SPIP: RDD Arrow Support in Spark Core and PySpark

2019-01-23 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750174#comment-16750174 ] Thomas Graves commented on SPARK-26413: --- Just a note I think this overlaps with 

[jira] [Assigned] (SPARK-26649) Noop Streaming Sink using DSV2

2019-01-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26649: Assignee: Apache Spark > Noop Streaming Sink using DSV2 > --

[jira] [Assigned] (SPARK-26649) Noop Streaming Sink using DSV2

2019-01-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26649: Assignee: (was: Apache Spark) > Noop Streaming Sink using DSV2 >

[jira] [Updated] (SPARK-26699) Dataset column output discrepancies

2019-01-23 Thread Praveena (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Praveena updated SPARK-26699: - Description: Hi, When I run my job in Local mode (meaning as standalone in Eclipse) with same

[jira] [Updated] (SPARK-26699) Dataset column output discrepancies

2019-01-23 Thread Praveena (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Praveena updated SPARK-26699: - Description: Hi, When I run my job in Local mode (meaning as standalone in Eclipse) with same

[jira] [Commented] (SPARK-26689) Bad disk causing broadcast failure

2019-01-23 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750125#comment-16750125 ] Thomas Graves commented on SPARK-26689: --- Can you add more details about your setup?  Which
