[jira] [Resolved] (SPARK-31060) Handle column names containing `dots` in data source `Filter`
[ https://issues.apache.org/jira/browse/SPARK-31060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31060. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27728 [https://github.com/apache/spark/pull/27728] > Handle column names containing `dots` in data source `Filter` > - > > Key: SPARK-31060 > URL: https://issues.apache.org/jira/browse/SPARK-31060 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31060) Handle column names containing `dots` in data source `Filter`
[ https://issues.apache.org/jira/browse/SPARK-31060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31060: --- Assignee: DB Tsai > Handle column names containing `dots` in data source `Filter` > - > > Key: SPARK-31060 > URL: https://issues.apache.org/jira/browse/SPARK-31060 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31026) Parquet predicate pushdown on columns with dots
[ https://issues.apache.org/jira/browse/SPARK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31026. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27728 [https://github.com/apache/spark/pull/27728] > Parquet predicate pushdown on columns with dots > --- > > Key: SPARK-31026 > URL: https://issues.apache.org/jira/browse/SPARK-31026 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > > Parquet predicate pushdown on columns with dots was disabled in -SPARK-20364- > because Parquet's APIs don't support it. A new set of APIs is proposed in > PARQUET-1809 to generalize the support of nested columns, which can address this > issue. This implementation will be merged into the Spark repo first, until we get > a new release from the Parquet community. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
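For context on SPARK-31026 above, a minimal sketch of the behavior it targets (spark-shell; the dotted column name and path are illustrative, not taken from the ticket):

{code:java}
import org.apache.spark.sql.functions.col
import spark.implicits._  // spark-shell provides `spark`

// A column whose name literally contains dots.
val df = Seq((1, "a"), (2, "b")).toDF("col.with.dots", "value")
df.write.mode("overwrite").parquet("/tmp/dotted")

// Backticks make the reference mean "the column named col.with.dots",
// not a nested field. Before this change such a predicate was not pushed
// down to Parquet; with it, the scan's PushedFilters should include it.
spark.read.parquet("/tmp/dotted")
  .filter(col("`col.with.dots`") > 1)
  .explain()
{code}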
[jira] [Resolved] (SPARK-17636) Parquet predicate pushdown for nested fields
[ https://issues.apache.org/jira/browse/SPARK-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-17636. - Resolution: Fixed Issue resolved by pull request 27728 [https://github.com/apache/spark/pull/27728] > Parquet predicate pushdown for nested fields > > > Key: SPARK-17636 > URL: https://issues.apache.org/jira/browse/SPARK-17636 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 1.6.2, 1.6.3, 2.0.2 >Reporter: Mitesh >Assignee: DB Tsai >Priority: Minor > Fix For: 3.0.0 > > > There's a *PushedFilters* for a simple numeric field, but not for a numeric > field inside a struct. Not sure if this is a limitation Spark inherits from > Parquet, or a Spark-only limitation. > {noformat} > scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", > "sale_id") > res5: org.apache.spark.sql.DataFrame = [day_timestamp: > struct, sale_id: bigint] > scala> res5.filter("sale_id > 4").queryExecution.executedPlan > res9: org.apache.spark.sql.execution.SparkPlan = > Filter[23814] [args=(sale_id#86324L > > 4)][outPart=UnknownPartitioning(0)][outOrder=List()] > +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: > s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)] > scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan > res10: org.apache.spark.sql.execution.SparkPlan = > Filter[23815] [args=(day_timestamp#86302.timestamp > > 4)][outPart=UnknownPartitioning(0)][outOrder=List()] > +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: > s3a://some/parquet/file > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
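As a rough, hedged sketch of how the fix above can be checked (the path is illustrative, and the exact plan rendering depends on the Spark version):

{code:java}
import spark.implicits._  // spark-shell provides `spark`

val df = spark.read.parquet("/tmp/sales")  // hypothetical path

// Top-level predicates were already pushed down, e.g.
//   PushedFilters: [GreaterThan(sale_id,4)]
df.filter($"sale_id" > 4).explain()

// After the fix, a predicate on a nested field should also appear in the
// scan's PushedFilters instead of surviving only as a separate Filter node.
df.filter($"day_timestamp.timestamp" > 4).explain()
{code}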
[jira] [Assigned] (SPARK-31026) Parquet predicate pushdown on columns with dots
[ https://issues.apache.org/jira/browse/SPARK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31026: --- Assignee: DB Tsai > Parquet predicate pushdown on columns with dots > --- > > Key: SPARK-31026 > URL: https://issues.apache.org/jira/browse/SPARK-31026 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > > Parquet predicate pushdown on columns with dots was disabled in -SPARK-20364- > because Parquet's APIs don't support it. A new set of APIs is proposed in > PARQUET-1809 to generalize the support of nested columns, which can address this > issue. This implementation will be merged into the Spark repo first, until we get > a new release from the Parquet community. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31088) Add back HiveContext and createExternalTable
[ https://issues.apache.org/jira/browse/SPARK-31088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-31088. - Fix Version/s: 3.0.0 Resolution: Fixed > Add back HiveContext and createExternalTable > > > Key: SPARK-31088 > URL: https://issues.apache.org/jira/browse/SPARK-31088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25556) Predicate Pushdown for Nested fields
[ https://issues.apache.org/jira/browse/SPARK-25556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-25556. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27728 [https://github.com/apache/spark/pull/27728] > Predicate Pushdown for Nested fields > > > Key: SPARK-25556 > URL: https://issues.apache.org/jira/browse/SPARK-25556 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > > This is an umbrella JIRA to support predicate pushdown for nested fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31086) Add Back the Deprecated SQLContext methods
[ https://issues.apache.org/jira/browse/SPARK-31086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-31086. - Fix Version/s: 3.0.0 Resolution: Fixed > Add Back the Deprecated SQLContext methods > -- > > Key: SPARK-31086 > URL: https://issues.apache.org/jira/browse/SPARK-31086 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31282) Supplement version for configurations appearing in the security doc
jiaan.geng created SPARK-31282: -- Summary: Supplement version for configurations appearing in the security doc Key: SPARK-31282 URL: https://issues.apache.org/jira/browse/SPARK-31282 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.1.0 Reporter: jiaan.geng docs/security.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31281) Hit OOM Error - GC Limit
[ https://issues.apache.org/jira/browse/SPARK-31281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HongJin updated SPARK-31281: Description: MemoryStore is 2.6GB conf = new SparkConf().setAppName("test") //.set("spark.sql.codegen.wholeStage", "false") .set("spark.driver.host", "localhost") .set("spark.driver.memory", "4g") .set("spark.executor.cores","1") .set("spark.num.executors","1") .set("spark.executor.memory", "4g") .set("spark.executor.memoryOverhead", "400m") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","1") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") //enable spark UI .set("spark.sql.shuffle.partitions",defaultPartitions) .setMaster("local[2]") sparkSession = SparkSession.builder.config(conf).getOrCreate() val df = SparkFactory.sparkSession.sqlContext .read .option("header", "true") .option("delimiter", delimiter) .csv(textFileLocation) joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*) data = joinedDf.take(maxRecords) was: conf = new SparkConf().setAppName("test") //.set("spark.sql.codegen.wholeStage", "false") .set("spark.driver.host", "localhost") .set("spark.driver.memory", "4g") .set("spark.executor.cores","1") .set("spark.num.executors","1") .set("spark.executor.memory", "4g") .set("spark.executor.memoryOverhead", "400m") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","1") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") //enable spark UI .set("spark.sql.shuffle.partitions",defaultPartitions) .setMaster(numCores) sparkSession = SparkSession.builder.config(conf).getOrCreate() val df = SparkFactory.sparkSession.sqlContext .read .option("header", "true") .option("delimiter", delimiter) .csv(textFileLocation) joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*) data = joinedDf.take(maxRecords) > Hit OOM Error - GC Limit > > > Key: SPARK-31281 > URL: https://issues.apache.org/jira/browse/SPARK-31281 > Project: Spark > Issue Type: Question > Components: Java API >Affects Versions: 2.4.4 >Reporter: HongJin >Priority: Critical > > MemoryStore is 2.6GB > conf = new SparkConf().setAppName("test") > //.set("spark.sql.codegen.wholeStage", "false") > .set("spark.driver.host", "localhost") > .set("spark.driver.memory", "4g") > .set("spark.executor.cores","1") > .set("spark.num.executors","1") > .set("spark.executor.memory", "4g") > .set("spark.executor.memoryOverhead", "400m") > .set("spark.dynamicAllocation.enabled", "true") > .set("spark.dynamicAllocation.minExecutors","1") > .set("spark.dynamicAllocation.maxExecutors","2") > .set("spark.ui.enabled","true") //enable spark UI > .set("spark.sql.shuffle.partitions",defaultPartitions) > .setMaster("local[2]") > sparkSession = SparkSession.builder.config(conf).getOrCreate() > > val df = SparkFactory.sparkSession.sqlContext > .read > .option("header", "true") > .option("delimiter", delimiter) > .csv(textFileLocation) > > joinedDf = upperCaseLeft.as("l") > .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") > .select(compositeKeysCol ::: nonKeyCols.map(col => > mapHelper(col,toleranceValue,caseSensitive)): _*) > > data = joinedDf.take(maxRecords) > > > > -- This 
message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31281) Hit OOM Error - GC Limit
HongJin created SPARK-31281: --- Summary: Hit OOM Error - GC Limit Key: SPARK-31281 URL: https://issues.apache.org/jira/browse/SPARK-31281 Project: Spark Issue Type: Question Components: Java API Affects Versions: 2.4.4 Reporter: HongJin conf = new SparkConf().setAppName("test") //.set("spark.sql.codegen.wholeStage", "false") .set("spark.driver.host", "localhost") .set("spark.driver.memory", "4g") .set("spark.executor.cores","1") .set("spark.num.executors","1") .set("spark.executor.memory", "4g") .set("spark.executor.memoryOverhead", "400m") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","1") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") //enable spark UI .set("spark.sql.shuffle.partitions",defaultPartitions) .setMaster(numCores) sparkSession = SparkSession.builder.config(conf).getOrCreate() val df = SparkFactory.sparkSession.sqlContext .read .option("header", "true") .option("delimiter", delimiter) .csv(textFileLocation) joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*) data = joinedDf.take(maxRecords) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31243) add ANOVATest and FValueTest to PySpark
[ https://issues.apache.org/jira/browse/SPARK-31243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-31243. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 28012 [https://github.com/apache/spark/pull/28012] > add ANOVATest and FValueTest to PySpark > --- > > Key: SPARK-31243 > URL: https://issues.apache.org/jira/browse/SPARK-31243 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.1.0 > > > Add ANOVATest and FValueTest to the Python side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
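For reference, a hedged sketch of the Scala API these Python wrappers mirror (toy data; the PySpark counterparts added here are expected to take the same dataset/featuresCol/labelCol arguments, though the exact Python signatures are not quoted in the ticket):

{code:java}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.stat.{ANOVATest, FValueTest}
import spark.implicits._  // spark-shell provides `spark`

// Toy data: a numeric label column and a feature vector column.
val df = Seq(
  (1.0, Vectors.dense(1.0, 2.0)),
  (0.0, Vectors.dense(3.0, 4.0)),
  (1.0, Vectors.dense(5.0, 6.0))
).toDF("label", "features")

// ANOVA F-test for a categorical label, F-value test for a continuous one;
// each returns a DataFrame of p-values, degrees of freedom and statistics.
ANOVATest.test(df, "features", "label").show(false)
FValueTest.test(df, "features", "label").show(false)
{code}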
[jira] [Assigned] (SPARK-31243) add ANOVATest and FValueTest to PySpark
[ https://issues.apache.org/jira/browse/SPARK-31243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-31243: Assignee: Huaxin Gao > add ANOVATest and FValueTest to PySpark > --- > > Key: SPARK-31243 > URL: https://issues.apache.org/jira/browse/SPARK-31243 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > > Add ANOVATest and FValueTest to the Python side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31275) Improve the metrics format in ExecutionPage for StageId
[ https://issues.apache.org/jira/browse/SPARK-31275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31275. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28039 [https://github.com/apache/spark/pull/28039] > Improve the metrics format in ExecutionPage for StageId > --- > > Key: SPARK-31275 > URL: https://issues.apache.org/jira/browse/SPARK-31275 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.0.0 > > > In ExecutionPage, the metrics for stageId and attemptId are displayed like > "stageId (attempt)" but the format "stageId.attempt" is more standard in > Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-30443) "Managed memory leak detected" even with no calls to take() or limit()
[ https://issues.apache.org/jira/browse/SPARK-30443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068290#comment-17068290 ] Xiaoju Wu edited comment on SPARK-30443 at 3/27/20, 5:50 AM: - I also see this kind of warning log. SPARK-21492 may relate to this warning. Does your code base contain it? And I'm afraid there could be other consumers that do not release memory by themselves but instead let the task release all memory related to the taskId at the end of the task. was (Author: xiaojuwu): I also see this kind of warning log. SPARK-21492 may relate to this warning. Does your code base contain it? > "Managed memory leak detected" even with no calls to take() or limit() > -- > > Key: SPARK-30443 > URL: https://issues.apache.org/jira/browse/SPARK-30443 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.4, 3.0.0 >Reporter: Luke Richter >Priority: Major > Attachments: a.csv.zip, b.csv.zip, c.csv.zip > > > Our Spark code is causing a "Managed memory leak detected" warning to appear, > even though we are not calling take() or limit(). > According to SPARK-14168 https://issues.apache.org/jira/browse/SPARK-14168 > managed memory leaks should only be caused by not reading an iterator to > completion, i.e. take() or limit() > Our exact warning text is: "2020-01-06 14:54:59 WARN Executor:66 - Managed > memory leak detected; size = 2097152 bytes, TID = 118" > The size of the managed memory leak is always 2MB. > I have created a minimal test program that reproduces the warning: > {code:java} > import pyspark.sql > import pyspark.sql.functions as fx > def main(): > builder = pyspark.sql.SparkSession.builder > builder = builder.appName("spark-jira") > spark = builder.getOrCreate() > reader = spark.read > reader = reader.format("csv") > reader = reader.option("inferSchema", "true") > reader = reader.option("header", "true") > table_c = reader.load("c.csv") > table_a = reader.load("a.csv") > table_b = reader.load("b.csv") > primary_filter = fx.col("some_code").isNull() > new_primary_data = table_a.filter(primary_filter) > new_ids = new_primary_data.select("some_id") > new_data = table_b.join(new_ids, "some_id") > new_data = new_data.select("some_id") > result = table_c.join(new_data, "some_id", "left") > result.repartition(1).write.json("results.json", mode="overwrite") > spark.stop() > if __name__ == "__main__": > main() > {code} > Our code isn't anything out of the ordinary, just some filters, selects and > joins. > The input data is made up of 3 CSV files. The input data files are quite > large, roughly 2.6GB in total uncompressed. I attempted to reduce the number > of rows in the CSV input files but this caused the warning to no longer > appear. After compressing the files I was able to attach them below. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30443) "Managed memory leak detected" even with no calls to take() or limit()
[ https://issues.apache.org/jira/browse/SPARK-30443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068290#comment-17068290 ] Xiaoju Wu commented on SPARK-30443: --- I also see this kind of warning log. SPARK-21492 may relate to this warning. Does your code base contain it? > "Managed memory leak detected" even with no calls to take() or limit() > -- > > Key: SPARK-30443 > URL: https://issues.apache.org/jira/browse/SPARK-30443 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.4, 3.0.0 >Reporter: Luke Richter >Priority: Major > Attachments: a.csv.zip, b.csv.zip, c.csv.zip > > > Our Spark code is causing a "Managed memory leak detected" warning to appear, > even though we are not calling take() or limit(). > According to SPARK-14168 https://issues.apache.org/jira/browse/SPARK-14168 > managed memory leaks should only be caused by not reading an iterator to > completion, i.e. take() or limit() > Our exact warning text is: "2020-01-06 14:54:59 WARN Executor:66 - Managed > memory leak detected; size = 2097152 bytes, TID = 118" > The size of the managed memory leak is always 2MB. > I have created a minimal test program that reproduces the warning: > {code:java} > import pyspark.sql > import pyspark.sql.functions as fx > def main(): > builder = pyspark.sql.SparkSession.builder > builder = builder.appName("spark-jira") > spark = builder.getOrCreate() > reader = spark.read > reader = reader.format("csv") > reader = reader.option("inferSchema", "true") > reader = reader.option("header", "true") > table_c = reader.load("c.csv") > table_a = reader.load("a.csv") > table_b = reader.load("b.csv") > primary_filter = fx.col("some_code").isNull() > new_primary_data = table_a.filter(primary_filter) > new_ids = new_primary_data.select("some_id") > new_data = table_b.join(new_ids, "some_id") > new_data = new_data.select("some_id") > result = table_c.join(new_data, "some_id", "left") > result.repartition(1).write.json("results.json", mode="overwrite") > spark.stop() > if __name__ == "__main__": > main() > {code} > Our code isn't anything out of the ordinary, just some filters, selects and > joins. > The input data is made up of 3 CSV files. The input data files are quite > large, roughly 2.6GB in total uncompressed. I attempted to reduce the number > of rows in the CSV input files but this caused the warning to no longer > appear. After compressing the files I was able to attach them below. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31280) Propagate empty relations after RewritePredicateSubquery
Kent Yao created SPARK-31280: Summary: Propagate empty relations after RewritePredicateSubquery Key: SPARK-31280 URL: https://issues.apache.org/jira/browse/SPARK-31280 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Kent Yao {code:java} scala> spark.sql(" select * from values(1), (2) t(key) where key in (select 1 as key where 1=0)").queryExecution res15: org.apache.spark.sql.execution.QueryExecution = == Parsed Logical Plan == 'Project [*] +- 'Filter 'key IN (list#39 []) : +- Project [1 AS key#38] : +- Filter (1 = 0) :+- OneRowRelation +- 'SubqueryAlias t +- 'UnresolvedInlineTable [key], [List(1), List(2)] == Analyzed Logical Plan == key: int Project [key#40] +- Filter key#40 IN (list#39 []) : +- Project [1 AS key#38] : +- Filter (1 = 0) :+- OneRowRelation +- SubqueryAlias t +- LocalRelation [key#40] == Optimized Logical Plan == Join LeftSemi, (key#40 = key#38) :- LocalRelation [key#40] +- LocalRelation , [key#38] == Physical Plan == *(1) BroadcastHashJoin [key#40], [key#38], LeftSemi, BuildRight :- *(1) LocalTableScan [key#40] +- Br... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
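A hedged illustration of the motivation (spark-shell; the expected plan shapes are illustrative): when the empty relation is visible to the optimizer directly, the join is expected to collapse, but the IN-subquery form shown above only becomes a LeftSemi join in RewritePredicateSubquery, after PropagateEmptyRelation has already run, so the empty side survives into the physical plan:

{code:java}
// Directly joining against an empty relation: PropagateEmptyRelation
// should fold the whole query into an empty LocalRelation.
spark.sql(
  """select * from values (1), (2) t(key)
    |join (select 1 as key where 1 = 0) s on t.key = s.key""".stripMargin)
  .queryExecution.optimizedPlan

// The semantically similar IN-subquery is rewritten into a LeftSemi join
// too late for that rule to fire, which is what this ticket proposes to fix.
spark.sql(
  """select * from values (1), (2) t(key)
    |where key in (select 1 as key where 1 = 0)""".stripMargin)
  .queryExecution.optimizedPlan
{code}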
[jira] [Created] (SPARK-31279) Add version information to the configuration of Hive
jiaan.geng created SPARK-31279: -- Summary: Add version information to the configuration of Hive Key: SPARK-31279 URL: https://issues.apache.org/jira/browse/SPARK-31279 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: jiaan.geng sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31204) HiveResult compatibility for DatasourceV2 command
[ https://issues.apache.org/jira/browse/SPARK-31204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31204: --- Assignee: Terry Kim > HiveResult compatibility for DatasourceV2 command > - > > Key: SPARK-31204 > URL: https://issues.apache.org/jira/browse/SPARK-31204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Terry Kim >Priority: Major > > HiveResult performs some compatibility matches and conversions for commands > to be compatible with Hive output, e.g.: > {code} > case ExecutedCommandExec(_: DescribeCommandBase) => > // If it is a describe command for a Hive table, we want to have the > output format > // be similar with Hive. > ... > // SHOW TABLES in Hive only output table names, while ours output > database, table name, isTemp. > case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended > => > {code} > It is needed for DatasourceV2 commands as well (e.g. ShowTablesExec...). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31204) HiveResult compatibility for DatasourceV2 command
[ https://issues.apache.org/jira/browse/SPARK-31204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31204. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28004 [https://github.com/apache/spark/pull/28004] > HiveResult compatibility for DatasourceV2 command > - > > Key: SPARK-31204 > URL: https://issues.apache.org/jira/browse/SPARK-31204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Terry Kim >Priority: Major > Fix For: 3.0.0 > > > HiveResult performs some compatibility matches and conversions for commands > to be compatible with Hive output, e.g.: > {code} > case ExecutedCommandExec(_: DescribeCommandBase) => > // If it is a describe command for a Hive table, we want to have the > output format > // be similar with Hive. > ... > // SHOW TABLES in Hive only output table names, while ours output > database, table name, isTemp. > case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended > => > {code} > It is needed for DatasourceV2 commands as well (e.g. ShowTablesExec...). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31170) Spark Cli does not respect hive-site.xml and spark.sql.warehouse.dir
[ https://issues.apache.org/jira/browse/SPARK-31170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31170. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27969 [https://github.com/apache/spark/pull/27969] > Spark Cli does not respect hive-site.xml and spark.sql.warehouse.dir > > > Key: SPARK-31170 > URL: https://issues.apache.org/jira/browse/SPARK-31170 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > In Spark CLI, we create a hive CliSessionState and it does not load the > hive-site.xml. So the configurations in hive-site.xml will not take effect > like in other spark-hive integration apps. > Also, the warehouse directory is not correctly picked. If the `default` > database does not exist, the CliSessionState will create one during the first > time it talks to the metastore. The `Location` of the default DB will be > neither the value of spark.sql.warehouse.dir nor the user-specified value of > hive.metastore.warehouse.dir, but the default value of > hive.metastore.warehouse.dir, which will always be `/user/hive/warehouse`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
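A quick, hedged sketch of how the fixed behavior can be observed from the CLI (the warehouse path is illustrative):

{noformat}
$ bin/spark-sql --conf spark.sql.warehouse.dir=/path/to/warehouse
spark-sql> DESCRIBE DATABASE default;
-- With the fix, the reported Location should sit under /path/to/warehouse
-- (or the hive-site.xml setting), not the hard-coded /user/hive/warehouse.
{noformat}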
[jira] [Resolved] (SPARK-31186) toPandas fails on simple query (collect() works)
[ https://issues.apache.org/jira/browse/SPARK-31186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31186. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28025 [https://github.com/apache/spark/pull/28025] > toPandas fails on simple query (collect() works) > > > Key: SPARK-31186 > URL: https://issues.apache.org/jira/browse/SPARK-31186 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: Michael Chirico >Assignee: L. C. Hsieh >Priority: Minor > Fix For: 3.0.0 > > > My pandas is 0.25.1. > I ran the following simple code (cross joins are enabled): > {code:python} > spark.sql(''' > select t1.*, t2.* from ( > select explode(sequence(1, 3)) v > ) t1 left join ( > select explode(sequence(1, 3)) v > ) t2 > ''').toPandas() > {code} > and got a ValueError from pandas: > > ValueError: The truth value of a Series is ambiguous. Use a.empty, > > a.bool(), a.item(), a.any() or a.all(). > Collect works fine: > {code:python} > spark.sql(''' > select * from ( > select explode(sequence(1, 3)) v > ) t1 left join ( > select explode(sequence(1, 3)) v > ) t2 > ''').collect() > # [Row(v=1, v=1), > # Row(v=1, v=2), > # Row(v=1, v=3), > # Row(v=2, v=1), > # Row(v=2, v=2), > # Row(v=2, v=3), > # Row(v=3, v=1), > # Row(v=3, v=2), > # Row(v=3, v=3)] > {code} > I imagine it's related to the duplicate column names, but this doesn't fail: > {code:python} > spark.sql("select 1 v, 1 v").toPandas() > # v v > # 0 1 1 > {code} > Also no issue for multiple rows: > spark.sql("select 1 v, 1 v union all select 1 v, 2 v").toPandas() > It also works when not using a cross join but a janky > programmatically generated union all query: > {code:python} > cond = [] > for ii in range(3): > for jj in range(3): > cond.append(f'select {ii+1} v, {jj+1} v') > spark.sql(' union all '.join(cond)).toPandas() > {code} > As near as I can tell, the output is identical to the explode output, making > this issue all the more peculiar, as I thought toPandas() is applied to the > output of collect(), so if collect() gives the same output, how can > toPandas() fail in one case and not the other? Further, the lazy DataFrame is > the same: DataFrame[v: int, v: int] in both cases. I must be missing > something. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31186) toPandas fails on simple query (collect() works)
[ https://issues.apache.org/jira/browse/SPARK-31186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-31186: Assignee: L. C. Hsieh > toPandas fails on simple query (collect() works) > > > Key: SPARK-31186 > URL: https://issues.apache.org/jira/browse/SPARK-31186 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: Michael Chirico >Assignee: L. C. Hsieh >Priority: Minor > > My pandas is 0.25.1. > I ran the following simple code (cross joins are enabled): > {code:python} > spark.sql(''' > select t1.*, t2.* from ( > select explode(sequence(1, 3)) v > ) t1 left join ( > select explode(sequence(1, 3)) v > ) t2 > ''').toPandas() > {code} > and got a ValueError from pandas: > > ValueError: The truth value of a Series is ambiguous. Use a.empty, > > a.bool(), a.item(), a.any() or a.all(). > Collect works fine: > {code:python} > spark.sql(''' > select * from ( > select explode(sequence(1, 3)) v > ) t1 left join ( > select explode(sequence(1, 3)) v > ) t2 > ''').collect() > # [Row(v=1, v=1), > # Row(v=1, v=2), > # Row(v=1, v=3), > # Row(v=2, v=1), > # Row(v=2, v=2), > # Row(v=2, v=3), > # Row(v=3, v=1), > # Row(v=3, v=2), > # Row(v=3, v=3)] > {code} > I imagine it's related to the duplicate column names, but this doesn't fail: > {code:python} > spark.sql("select 1 v, 1 v").toPandas() > # v v > # 0 1 1 > {code} > Also no issue for multiple rows: > spark.sql("select 1 v, 1 v union all select 1 v, 2 v").toPandas() > It also works when not using a cross join but a janky > programmatically generated union all query: > {code:python} > cond = [] > for ii in range(3): > for jj in range(3): > cond.append(f'select {ii+1} v, {jj+1} v') > spark.sql(' union all '.join(cond)).toPandas() > {code} > As near as I can tell, the output is identical to the explode output, making > this issue all the more peculiar, as I thought toPandas() is applied to the > output of collect(), so if collect() gives the same output, how can > toPandas() fail in one case and not the other? Further, the lazy DataFrame is > the same: DataFrame[v: int, v: int] in both cases. I must be missing > something. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25641) Change the spark.shuffle.server.chunkFetchHandlerThreadsPercent default to 100
[ https://issues.apache.org/jira/browse/SPARK-25641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068232#comment-17068232 ] Dongjoon Hyun commented on SPARK-25641: --- This commit is technically reverted via SPARK-30623. > Change the spark.shuffle.server.chunkFetchHandlerThreadsPercent default to 100 > -- > > Key: SPARK-25641 > URL: https://issues.apache.org/jira/browse/SPARK-25641 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Sanket Reddy >Assignee: Sanket Reddy >Priority: Minor > Fix For: 3.0.0 > > > We want to change the default percentage for > spark.shuffle.server.chunkFetchHandlerThreadsPercent to 100. The reason is that > currently this is set to 0, which means that if server.ioThreads > 0, > the default number of threads would be 2 * #cores instead of > server.io.Threads. We want the default to be server.io.Threads in case this is > not set at all; here a default of 0 would also mean 2 * #cores -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31191) Spark SQL and hive metastore are incompatible
[ https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068200#comment-17068200 ] Yuming Wang commented on SPARK-31191: - [~leishuiyu] Add spark.sql.hive.metastore.jars=maven? > Spark SQL and hive metastore are incompatible > - > > Key: SPARK-31191 > URL: https://issues.apache.org/jira/browse/SPARK-31191 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: the spark version 2.3.0 > the hive version 2.3.3 >Reporter: leishuiyu >Priority: Major > Attachments: image-2020-03-23-21-37-17-663.png > > > # > h3. When I execute bin/spark-sql, an exception occurs > > {code:java} > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientCaused by: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > ... 12 moreCaused by: java.lang.reflect.InvocationTargetException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > ... 
18 moreCaused by: MetaException(message:Hive Schema version 1.2.0 does > not match metastore's schema version 2.3.0 Metastore is not upgraded or > corrupt) at > org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679) > at > org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) > at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:66) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:199) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) > ... 23 more > {code} > h3. 2. Finding the reason > Querying the source code, the spark jars directory has > hive-metastore-1.2.1.spark2.jar; > the 1.2.1 version matches 1.2.0, so the exception is generated > > > {code:java} > // code placeholder > private static final Map EQUIVALENT_VERSIONS = > ImmutableMap.of("0.13.1", "0.13.0", > "1.0.0", "0.14.0", > "1.0.1", "1.0.0", > "1.1.1", "1.1.0", > "1.2.1", "1.2.0" > ); > {code} > > h3. 3. Is there any solution to this problem? > One can edit hive-site.xml to set hive.metastore.schema.verification to true, but > new problems may arise > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
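Building on the suggestion above, a hedged sketch of how those properties are typically combined (assuming the Spark version in use supports metastore version 2.3.3; the version value mirrors the reporter's Hive 2.3.3):

{noformat}
$ bin/spark-sql \
    --conf spark.sql.hive.metastore.version=2.3.3 \
    --conf spark.sql.hive.metastore.jars=maven
{noformat}

This tells Spark to talk to the 2.3.3 metastore with a matching client downloaded from Maven, instead of the built-in Hive 1.2.1 client whose schema check fails here.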
[jira] [Updated] (SPARK-31276) Contrived working example that works with multiple URI file storages for Spark cluster mode
[ https://issues.apache.org/jira/browse/SPARK-31276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Huang updated SPARK-31276: -- Description: This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write a small limited subset of that DataFrame back to Driver's local file system. (This is to avoid the anti-pattern of writing a large file, which is out of scope for this example. The small limited DataFrame would be some basic statistics, not the actual complete dataset.) The examples I found on the internet only use simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local stand alone vs YARN client mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations where a Spark program needs to deal with both local file system and YARN client mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. If there is any existing Spark documentation that provides examples traversing the different URIs in Spark YARN client mode, or a better or smarter Spark pattern or API more suited for this, I am happy to accept that as well. Thanks! was: This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write that same final DataFrame back to Driver's local file system The examples I found on the internet only use simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local stand alone vs cluster mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations where a Spark program needs to deal with both local file system and cluster mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. If there is any existing Spark documentation that provides examples of different URIs, I am happy to accept that as well. Thanks! > Contrived working example that works with multiple URI file storages for > Spark cluster mode > --- > > Key: SPARK-31276 > URL: https://issues.apache.org/jira/browse/SPARK-31276 > Project: Spark > Issue Type: Wish > Components: Examples >Affects Versions: 2.4.5 >Reporter: Jim Huang >Priority: Major > > This Spark SQL Guide --> Data sources --> Generic Load/Save Functions > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > described a very simple "local file system load of an example file". 
> > I am looking for an example that demonstrates a workflow that exercises > different file systems. For example, > # Driver loads an input file from local file system > # Add a simple column using lit() and stores that DataFrame in cluster mode > to HDFS > # Write a small limited subset of that DataFrame back to Driver's local > file system. (This is to avoid the anti-pattern of writing a large file, which is > out of scope for this example. The small limited DataFrame would be some > basic statistics, not the actual complete dataset.) > > The examples I found on the internet only use simple paths without the > explicit URI prefixes. > Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) > was called, local stand alone vs YARN client mode. So a "filepath" will be > read/write locally (file system) vs cluster mode HDFS, without these explicit > URIs. > There are situations where a Spark program needs to deal with both local file > system and YARN client mode (big data) in the same Spark application, like > producing a summary table stored on the local file system of the driver at > the end. > If there is any existing Spark documentation that provides examples traversing > the different URIs in Spark YARN client mode, or a better or smarter Spark > pattern or API more suited for this, I am happy to accept that as well. > Thanks!
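As a hedged sketch of the workflow the reporter describes (all paths, options, and column names are illustrative; note that file:/ paths are resolved on whichever nodes run the tasks, so the small summary is collected and written on the driver with plain Java I/O rather than a distributed write):

{code:java}
import java.io.{File, PrintWriter}
import org.apache.spark.sql.functions.lit

// 1. Load an input file; the explicit URI pins the file system regardless
//    of how Spark was launched (a file:/ read requires the path to be
//    readable on the nodes where the read tasks run).
val df = spark.read.option("header", "true").csv("file:///data/input.csv")

// 2. Add a simple column with lit() and store the DataFrame on HDFS.
df.withColumn("source", lit("example"))
  .write.mode("overwrite").parquet("hdfs:///user/me/output")

// 3. Bring a small summary back to the driver and write it to the
//    driver's local file system with ordinary Java I/O, avoiding the
//    anti-pattern of a distributed write for a tiny result.
val rowCount = df.count()
val pw = new PrintWriter(new File("/tmp/summary.txt"))
try pw.println(s"rows=$rowCount") finally pw.close()
{code}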
[jira] [Updated] (SPARK-31262) A test case that imports another test case containing bracketed comments can't display the bracketed comments in golden files well.
[ https://issues.apache.org/jira/browse/SPARK-31262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-31262: - Component/s: Tests > A test case that imports another test case containing bracketed comments can't display > the bracketed comments in golden files well. > -- > > Key: SPARK-31262 > URL: https://issues.apache.org/jira/browse/SPARK-31262 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.0.0 > > > The content of > {code:java} > nested-comments.sql > {code} is shown below: > {code:java} > -- This test case just used to test imported bracketed comments. > -- the first case of bracketed comment > --QUERY-DELIMITER-START > /* This is the first example of bracketed comment. > SELECT 'ommented out content' AS first; > */ > SELECT 'selected content' AS first; > --QUERY-DELIMITER-END > {code} > The test case > {code:java} > comments.sql > {code} imports > {code:java} > nested-comments.sql > {code} > as shown below: > {code:java} > --IMPORT nested-comments.sql > {code} > The output will be: > {code:java} > -- !query > /* This is the first example of bracketed comment. > SELECT 'ommented out content' AS first > -- !query schema > struct<> > -- !query output > org.apache.spark.sql.catalyst.parser.ParseException > mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', > 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', > 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', > 'REVOKE', ' > ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', > 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) > == SQL == > /* This is the first example of bracketed comment. > ^^^ > SELECT 'ommented out content' AS first > -- !query > */ > SELECT 'selected content' AS first > -- !query schema > struct<> > -- !query output > org.apache.spark.sql.catalyst.parser.ParseException > extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', > 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', > 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == > */ > ^^^ > SELECT 'selected content' AS first > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31262) A test case that imports another test case containing bracketed comments can't display the bracketed comments in golden files well.
[ https://issues.apache.org/jira/browse/SPARK-31262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-31262. -- Fix Version/s: 3.0.0 Assignee: jiaan.geng Resolution: Fixed Resolved by [https://github.com/apache/spark/pull/28018] > A test case that imports another test case containing bracketed comments can't display > the bracketed comments in golden files well. > -- > > Key: SPARK-31262 > URL: https://issues.apache.org/jira/browse/SPARK-31262 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.0.0 > > > The content of > {code:java} > nested-comments.sql > {code} is shown below: > {code:java} > -- This test case just used to test imported bracketed comments. > -- the first case of bracketed comment > --QUERY-DELIMITER-START > /* This is the first example of bracketed comment. > SELECT 'ommented out content' AS first; > */ > SELECT 'selected content' AS first; > --QUERY-DELIMITER-END > {code} > The test case > {code:java} > comments.sql > {code} imports > {code:java} > nested-comments.sql > {code} > as shown below: > {code:java} > --IMPORT nested-comments.sql > {code} > The output will be: > {code:java} > -- !query > /* This is the first example of bracketed comment. > SELECT 'ommented out content' AS first > -- !query schema > struct<> > -- !query output > org.apache.spark.sql.catalyst.parser.ParseException > mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', > 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', > 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', > 'REVOKE', ' > ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', > 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) > == SQL == > /* This is the first example of bracketed comment. > ^^^ > SELECT 'ommented out content' AS first > -- !query > */ > SELECT 'selected content' AS first > -- !query schema > struct<> > -- !query output > org.apache.spark.sql.catalyst.parser.ParseException > extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', > 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', > 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == > */ > ^^^ > SELECT 'selected content' AS first > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31238) Incompatible ORC dates with Spark 2.4
[ https://issues.apache.org/jira/browse/SPARK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31238: - Assignee: Maxim Gekk > Incompatible ORC dates with Spark 2.4 > - > > Key: SPARK-31238 > URL: https://issues.apache.org/jira/browse/SPARK-31238 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Bruce Robbins >Assignee: Maxim Gekk >Priority: Blocker > > Using Spark 2.4.5, write pre-1582 date to ORC file and then read it: > {noformat} > $ export TZ=UTC > $ bin/spark-shell --conf spark.sql.session.timeZone=UTC > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.4.5-SNAPSHOT > /_/ > > Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_161) > Type in expressions to have them evaluated. > Type :help for more information. > scala> sql("select cast('1200-01-01' as date) > dt").write.mode("overwrite").orc("/tmp/datefile") > scala> spark.read.orc("/tmp/datefile").show > +--+ > |dt| > +--+ > |1200-01-01| > +--+ > scala> :quit > {noformat} > Using Spark 3.0 (branch-3.0 at commit a934142f24), read the same file: > {noformat} > $ export TZ=UTC > $ bin/spark-shell --conf spark.sql.session.timeZone=UTC > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_161) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.read.orc("/tmp/datefile").show > +--+ > |dt| > +--+ > |1200-01-08| > +--+ > scala> > {noformat} > Dates are off. > Timestamps, on the other hand, appear to work as expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31238) Incompatible ORC dates with Spark 2.4
[ https://issues.apache.org/jira/browse/SPARK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31238. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28016 [https://github.com/apache/spark/pull/28016] > Incompatible ORC dates with Spark 2.4 > - > > Key: SPARK-31238 > URL: https://issues.apache.org/jira/browse/SPARK-31238 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Bruce Robbins >Assignee: Maxim Gekk >Priority: Blocker > Fix For: 3.0.0 > > > Using Spark 2.4.5, write pre-1582 date to ORC file and then read it: > {noformat} > $ export TZ=UTC > $ bin/spark-shell --conf spark.sql.session.timeZone=UTC > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.4.5-SNAPSHOT > /_/ > > Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_161) > Type in expressions to have them evaluated. > Type :help for more information. > scala> sql("select cast('1200-01-01' as date) > dt").write.mode("overwrite").orc("/tmp/datefile") > scala> spark.read.orc("/tmp/datefile").show > +--+ > |dt| > +--+ > |1200-01-01| > +--+ > scala> :quit > {noformat} > Using Spark 3.0 (branch-3.0 at commit a934142f24), read the same file: > {noformat} > $ export TZ=UTC > $ bin/spark-shell --conf spark.sql.session.timeZone=UTC > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_161) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.read.orc("/tmp/datefile").show > +--+ > |dt| > +--+ > |1200-01-08| > +--+ > scala> > {noformat} > Dates are off. > Timestamps, on the other hand, appear to work as expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30320) Insert overwrite to DataSource table with dynamic partition error when running multiple task attempts
[ https://issues.apache.org/jira/browse/SPARK-30320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068005#comment-17068005 ] koert kuipers commented on SPARK-30320: --- I believe we are seeing this issue. It shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. Pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so the task throws FileAlreadyExistsException). As a result the entire job fails. So I don't think this is just a speculative execution issue; this is a general issue with dynamic partition overwrite not being able to recover from task failure. > Insert overwrite to DataSource table with dynamic partition error when > running multiple task attempts > - > > Key: SPARK-30320 > URL: https://issues.apache.org/jira/browse/SPARK-30320 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Du Ripeng >Priority: Major > > Inserting overwrite to a DataSource table with dynamic partition might fail > when running multiple task attempts. Suppose there are a task attempt and a > speculative task attempt; the speculative attempt would raise > FileAlreadyExistsException -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
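For reference, a minimal sketch of the write pattern under discussion (table path and column names are illustrative; spark.sql.sources.partitionOverwriteMode is the config that enables dynamic overwrite):

{code:java}
import spark.implicits._  // spark-shell provides `spark`

// Dynamic mode: only partitions present in the incoming data are
// replaced, instead of the whole table being truncated first.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

val df = Seq((1, "2020-03-26"), (2, "2020-03-27")).toDF("id", "dt")

// With speculation or pre-emption, two attempts of the same task can race
// on the same deterministic output path, which is the reported failure mode.
df.write.mode("overwrite").partitionBy("dt").parquet("hdfs:///tmp/events")
{code}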
[jira] [Comment Edited] (SPARK-30320) Insert overwrite to DataSource table with dynamic partition error when running multiple task attempts
[ https://issues.apache.org/jira/browse/SPARK-30320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068005#comment-17068005 ] koert kuipers edited comment on SPARK-30320 at 3/26/20, 8:11 PM: - I believe we are seeing this issue. It shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. Pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so the task throws FileAlreadyExistsException). As a result the entire job fails. So I don't think this is just a speculative execution issue; this is a general issue with dynamic partition overwrite not being able to recover from task failure. See also SPARK-29302, which I believe is the same issue. was (Author: koert): I believe we are seeing this issue. It shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. Pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so the task throws FileAlreadyExistsException). As a result the entire job fails. So I don't think this is just a speculative execution issue; this is a general issue with dynamic partition overwrite not being able to recover from task failure. > Insert overwrite to DataSource table with dynamic partition error when > running multiple task attempts > - > > Key: SPARK-30320 > URL: https://issues.apache.org/jira/browse/SPARK-30320 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Du Ripeng >Priority: Major > > Inserting overwrite to a DataSource table with dynamic partition might fail > when running multiple task attempts. Suppose there are a task attempt and a > speculative task attempt; the speculative attempt would raise > FileAlreadyExistsException -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31278) numOutputRows shows value from last micro batch when there is no new data
Burak Yavuz created SPARK-31278: --- Summary: numOutputRows shows value from last micro batch when there is no new data Key: SPARK-31278 URL: https://issues.apache.org/jira/browse/SPARK-31278 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Burak Yavuz In Structured Streaming, we provide progress updates every 10 seconds when a stream doesn't have any new data upstream. When providing this progress though, we zero out the input information but not the output information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-29302) dynamic partition overwrite with speculation enabled
[ https://issues.apache.org/jira/browse/SPARK-29302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067972#comment-17067972 ] koert kuipers edited comment on SPARK-29302 at 3/26/20, 7:23 PM: - i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so task throws FileAlreadyExistsException). as a result entire job fails. so i dont think this is just a speculative execution issue. this is a general issue with dynamic partition overwrite not being able to recover from task failure. was (Author: koert): i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so task throws FileAlreadyExistsException). as a result entire job fails. > dynamic partition overwrite with speculation enabled > > > Key: SPARK-29302 > URL: https://issues.apache.org/jira/browse/SPARK-29302 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > Now, for a dynamic partition overwrite operation, the filename of a task > output is determinable. > So, if speculation is enabled, would a task conflict with its relative > speculation task? > Would the two tasks concurrent write a same file? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31277) Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId`
Maxim Gekk created SPARK-31277: -- Summary: Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId` Key: SPARK-31277 URL: https://issues.apache.org/jira/browse/SPARK-31277 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Currently, Spark SQL's date-time expressions and functions are ported to the Java 8 time API, but tests still use the old time APIs. In particular, DateTimeTestUtils exposes functions that accept only TimeZone instances. This is inconvenient and CPU-consuming because TimeZone instances need to be converted to ZoneId instances via strings (zone ids). This ticket aims to replace the TimeZone parameters of DateTimeTestUtils functions with the ZoneId type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
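A minimal sketch of the conversion overhead described above, using the standard java.util and java.time APIs (the zone id is arbitrary):

{code:scala}
import java.time.ZoneId
import java.util.TimeZone

// Old-style parameter: callers hold a TimeZone, which must be converted to a
// ZoneId through its string id before java.time-based code can use it.
val tz: TimeZone = TimeZone.getTimeZone("America/Los_Angeles")
val viaString: ZoneId = ZoneId.of(tz.getID) // the repeated conversion the ticket wants to avoid

// Migrated parameter: accept a ZoneId directly, so no conversion is needed.
val zid: ZoneId = ZoneId.of("America/Los_Angeles")
{code}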
[jira] [Commented] (SPARK-29302) dynamic partition overwrite with speculation enabled
[ https://issues.apache.org/jira/browse/SPARK-29302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067972#comment-17067972 ] koert kuipers commented on SPARK-29302: --- i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so task throws FileAlreadyExistsException). as a result entire job fails. > dynamic partition overwrite with speculation enabled > > > Key: SPARK-29302 > URL: https://issues.apache.org/jira/browse/SPARK-29302 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > Now, for a dynamic partition overwrite operation, the filename of a task > output is determinable. > So, if speculation is enabled, would a task conflict with its relative > speculation task? > Would the two tasks concurrent write a same file? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-29302) dynamic partition overwrite with speculation enabled
[ https://issues.apache.org/jira/browse/SPARK-29302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067972#comment-17067972 ] koert kuipers edited comment on SPARK-29302 at 3/26/20, 7:16 PM: - i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so task throws FileAlreadyExistsException). as a result entire job fails. was (Author: koert): i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory alreay exsists (so task throws FileAlreadyExistsException). as a result entire job fails. > dynamic partition overwrite with speculation enabled > > > Key: SPARK-29302 > URL: https://issues.apache.org/jira/browse/SPARK-29302 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > Now, for a dynamic partition overwrite operation, the filename of a task > output is determinable. > So, if speculation is enabled, would a task conflict with its relative > speculation task? > Would the two tasks concurrent write a same file? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31276) Contrived working example that works with multiple URI file storages for Spark cluster mode
[ https://issues.apache.org/jira/browse/SPARK-31276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Huang updated SPARK-31276: -- Description: This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write that same final DataFrame back to Driver's local file system The examples I found on the internet only use simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local standalone vs cluster mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations where a Spark program needs to deal with both local file system and cluster mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. If there is any existing alternative Spark documentation that provides examples of different URIs, I am happy to accept that as well. Thanks! was: This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write that same final DataFrame back to Driver's local file system The examples I found on the internet only uses simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local stand alone vs cluster mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations were a Spark program needs to deal with both local file system and cluster mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. Thanks. > Contrived working example that works with multiple URI file storages for > Spark cluster mode > --- > > Key: SPARK-31276 > URL: https://issues.apache.org/jira/browse/SPARK-31276 > Project: Spark > Issue Type: Wish > Components: Examples >Affects Versions: 2.4.5 >Reporter: Jim Huang >Priority: Major > > This Spark SQL Guide --> Data sources --> Generic Load/Save Functions > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > described a very simple "local file system load of an example file". > > I am looking for an example that demonstrates a workflow that exercises > different file systems. For example, > # Driver loads an input file from local file system > # Add a simple column using lit() and stores that DataFrame in cluster mode > to HDFS > # Write that same final DataFrame back to Driver's local file system > > The examples I found on the internet only use simple paths without the > explicit URI prefixes.
> Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) > was called, local standalone vs cluster mode. So a "filepath" will be > read/write locally (file system) vs cluster mode HDFS, without these explicit > URIs. > There are situations where a Spark program needs to deal with both local file > system and cluster mode (big data) in the same Spark application, like > producing a summary table stored on the local file system of the driver at > the end. > If there is any existing alternative Spark documentation that provides > examples of different URIs, I am happy to accept that as well. Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
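A hedged sketch of the kind of example being requested, spark-shell style, with explicit URI schemes; the paths and the HDFS namenode address are illustrative assumptions, and a file:// path must be reachable from wherever the read or write actually runs:

{code:scala}
import org.apache.spark.sql.functions.lit

// 1. Load an input file from the local file system (explicit file:// scheme).
val local = spark.read.option("header", "true").csv("file:///tmp/input.csv")

// 2. Add a simple column with lit() and store the DataFrame on HDFS
//    (explicit hdfs:// scheme, unambiguous regardless of deploy mode).
val withSource = local.withColumn("source", lit("local"))
withSource.write.mode("overwrite").parquet("hdfs://namenode:8020/user/spark/out")

// 3. Write a small summary back to the driver-side local file system.
withSource.groupBy("source").count()
  .coalesce(1)
  .write.mode("overwrite").csv("file:///tmp/summary")
{code}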
[jira] [Created] (SPARK-31276) Contrived working example that works with multiple URI file storages for Spark cluster mode
Jim Huang created SPARK-31276: - Summary: Contrived working example that works with multiple URI file storages for Spark cluster mode Key: SPARK-31276 URL: https://issues.apache.org/jira/browse/SPARK-31276 Project: Spark Issue Type: Wish Components: Examples Affects Versions: 2.4.5 Reporter: Jim Huang This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write that same final DataFrame back to Driver's local file system The examples I found on the internet only uses simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local stand alone vs cluster mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations were a Spark program needs to deal with both local file system and cluster mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31275) Improve the metrics format in ExecutionPage for StageId
Kousuke Saruta created SPARK-31275: -- Summary: Improve the metrics format in ExecutionPage for StageId Key: SPARK-31275 URL: https://issues.apache.org/jira/browse/SPARK-31275 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.0.0, 3.1.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In ExecutionPage, the metrics for stageId and attemptId are displayed like "stageId (attempt)" but the format "stageId.attempt" is more standard in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31274) Support .r files in 2.x version
[ https://issues.apache.org/jira/browse/SPARK-31274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurangi Saxena updated SPARK-31274: Priority: Minor (was: Trivial) > Support .r files in 2.x version > --- > > Key: SPARK-31274 > URL: https://issues.apache.org/jira/browse/SPARK-31274 > Project: Spark > Issue Type: Question > Components: Input/Output >Affects Versions: 2.3.4 >Reporter: Gaurangi Saxena >Priority: Minor > Fix For: 3.1.0 > > > Hello, > We are currently using Spark 2.3.4, which does not allow .r files in > Spark-Submit. However, the latest versions of Spark do. It is a bit difficult > for us at the moment to upgrade the Spark version we are using. > Can you point me to the Jira that added support for .r? I am not able to find > it in the issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31274) Support .r files in 2.x version
[ https://issues.apache.org/jira/browse/SPARK-31274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurangi Saxena updated SPARK-31274: Issue Type: Question (was: Bug) > Support .r files in 2.x version > --- > > Key: SPARK-31274 > URL: https://issues.apache.org/jira/browse/SPARK-31274 > Project: Spark > Issue Type: Question > Components: Input/Output >Affects Versions: 2.3.4 >Reporter: Gaurangi Saxena >Priority: Trivial > Fix For: 3.1.0 > > > Hello, > We are currently using Spark 2.3.4, which does not allow .r files in > Spark-Submit. However, the latest versions of Spark do. It is a bit difficult > for us at the moment to upgrade the Spark version we are using. > Can you point me to the Jira that added support for .r? I am not able to find > it in the issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31274) Support .r files in 2.x version
Gaurangi Saxena created SPARK-31274: --- Summary: Support .r files in 2.x version Key: SPARK-31274 URL: https://issues.apache.org/jira/browse/SPARK-31274 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.3.4 Reporter: Gaurangi Saxena Fix For: 3.1.0 Hello, We are currently using Spark 2.3.4, which does not allow .r files in Spark-Submit. However, the latest versions of Spark do. It is a bit difficult for us at the moment to upgrade the Spark version we are using. Can you point me to the Jira that added support for .r? I am not able to find it in the issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-30095) create function syntax has to be enhance in Doc for multiple dependent jars
[ https://issues.apache.org/jira/browse/SPARK-30095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067892#comment-17067892 ] Huaxin Gao edited comment on SPARK-30095 at 3/26/20, 5:45 PM: -- I took a look at the create function doc; it has the following syntax for jar and file: {code:java} resource_locations Specifies the list of resources that contain the implementation of the function along with its dependencies. Syntax: USING { { (JAR | FILE ) resource_uri } , ... } {code} The syntax has ... which means the preceding elements can be repeated. It seems to me that we cover the multiple jars OK. It also seems to me that we cover the file syntax OK. I am thinking of closing this ticket. was (Author: huaxingao): I took a look of the create function doc, it has the following syntax for jar and file: resource_locations Specifies the list of resources that contain the implementation of the function along with its dependencies. Syntax: USING { { (JAR | FILE ) resource_uri } , ... } The syntax has ... which means the preceding elements can be repeated. It seems to me that we cover the multiple jars OK. It also seems to me that we cover the file syntax OK. I am thinking of closing this ticket. > create function syntax has to be enhance in Doc for multiple dependent jars > > > Key: SPARK-30095 > URL: https://issues.apache.org/jira/browse/SPARK-30095 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > Create Function Example and Syntax has to be enhance as below > 1. Case 1: How to use multiple dependent jars in the path while creating > function is not clear. -- Syntax to be given > 2. Case 2: What are the different schema supported like file:/// is not > updated in doc - Supported Schema to be provided -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
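For illustration, the repeated-resource syntax quoted in the comment permits statements like the following; the function name, class, and jar paths are hypothetical:

{code:scala}
// Register a function whose implementation spans multiple dependent jars,
// mixing file:/// and hdfs:// resource URIs.
spark.sql("""
  CREATE FUNCTION my_udf AS 'com.example.MyUDF'
  USING JAR 'file:///opt/udfs/my-udf.jar',
        JAR 'hdfs://namenode:8020/libs/udf-dependency.jar'
""")
{code}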
[jira] [Commented] (SPARK-30095) create function syntax has to be enhance in Doc for multiple dependent jars
[ https://issues.apache.org/jira/browse/SPARK-30095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067892#comment-17067892 ] Huaxin Gao commented on SPARK-30095: I took a look at the create function doc; it has the following syntax for jar and file: resource_locations Specifies the list of resources that contain the implementation of the function along with its dependencies. Syntax: USING { { (JAR | FILE ) resource_uri } , ... } The syntax has ... which means the preceding elements can be repeated. It seems to me that we cover the multiple jars OK. It also seems to me that we cover the file syntax OK. I am thinking of closing this ticket. > create function syntax has to be enhance in Doc for multiple dependent jars > > > Key: SPARK-30095 > URL: https://issues.apache.org/jira/browse/SPARK-30095 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > Create Function Example and Syntax has to be enhance as below > 1. Case 1: How to use multiple dependent jars in the path while creating > function is not clear. -- Syntax to be given > 2. Case 2: What are the different schema supported like file:/// is not > updated in doc - Supported Schema to be provided -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31273) Support for BEGIN/COMMIT/ROLLBACK TRANSACTION in SparkSQL
Sergio Sainz created SPARK-31273: Summary: Support for BEGIN/COMMIT/ROLLBACK TRANSACTION in SparkSQL Key: SPARK-31273 URL: https://issues.apache.org/jira/browse/SPARK-31273 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.5 Reporter: Sergio Sainz Looking for support for atomic transactions: BEGIN/COMMIT/ROLLBACK. Such as here: BEGIN TRANSACTION: [https://docs.microsoft.com/en-us/sql/t-sql/language-elements/begin-transaction-transact-sql?view=sql-server-ver15] COMMIT TRANSACTION: [https://docs.microsoft.com/en-us/sql/t-sql/language-elements/commit-transaction-transact-sql?view=sql-server-ver15] ROLLBACK TRANSACTION: [https://docs.microsoft.com/en-us/sql/t-sql/language-elements/rollback-transaction-transact-sql?view=sql-server-ver15] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31259) Fix log error of curRequestSize in ShuffleBlockFetcherIterator
[ https://issues.apache.org/jira/browse/SPARK-31259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31259: - Assignee: wuyi > Fix log error of curRequestSize in ShuffleBlockFetcherIterator > -- > > Key: SPARK-31259 > URL: https://issues.apache.org/jira/browse/SPARK-31259 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Minor > > The log of curRequestSize is incorrect, because curRequestSize may be the > total size of several groups of blocks while we log it for each group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31259) Fix log error of curRequestSize in ShuffleBlockFetcherIterator
[ https://issues.apache.org/jira/browse/SPARK-31259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31259. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28028 [https://github.com/apache/spark/pull/28028] > Fix log error of curRequestSize in ShuffleBlockFetcherIterator > -- > > Key: SPARK-31259 > URL: https://issues.apache.org/jira/browse/SPARK-31259 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Minor > Fix For: 3.0.0 > > > The log of curRequestSize is incorrect, because curRequestSize may be the > total size of several groups of blocks while we log it for each group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
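The logging problem can be illustrated with a simplified sketch; the names and the grouping logic below are illustrative, not the actual ShuffleBlockFetcherIterator code:

{code:scala}
import scala.collection.mutable.ArrayBuffer

var curRequestSize = 0L
val curBlocks = ArrayBuffer.empty[(String, Long)]

def createFetchRequest(blocks: Seq[(String, Long)]): Unit = {
  // Correct value to log: the size of this group alone.
  val groupSize = blocks.map(_._2).sum
  println(s"Creating fetch request of $groupSize bytes for ${blocks.size} blocks")
}

for ((blockId, size) <- Seq(("b1", 100L), ("b2", 200L), ("b3", 300L))) {
  curBlocks += ((blockId, size))
  curRequestSize += size
  if (curRequestSize >= 250L) { // a maxBytesInFlight-style threshold
    // The reported bug was logging curRequestSize at this point: it may be a
    // running total over several groups of blocks, not this group's size.
    createFetchRequest(curBlocks.toSeq)
    curBlocks.clear()
    curRequestSize = 0L
  }
}
{code}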
[jira] [Created] (SPARK-31272) Support DB2 Kerberos login in JDBC connector
Gabor Somogyi created SPARK-31272: - Summary: Support DB2 Kerberos login in JDBC connector Key: SPARK-31272 URL: https://issues.apache.org/jira/browse/SPARK-31272 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31272) Support DB2 Kerberos login in JDBC connector
[ https://issues.apache.org/jira/browse/SPARK-31272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067781#comment-17067781 ] Gabor Somogyi commented on SPARK-31272: --- Started to work on this. > Support DB2 Kerberos login in JDBC connector > > > Key: SPARK-31272 > URL: https://issues.apache.org/jira/browse/SPARK-31272 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-31191) Spark SQL and hive metastore are incompatible
[ https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leishuiyu reopened SPARK-31191: --- > Spark SQL and hive metastore are incompatible > - > > Key: SPARK-31191 > URL: https://issues.apache.org/jira/browse/SPARK-31191 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: the spark version 2.3.0 > the hive version 2.3.3 >Reporter: leishuiyu >Priority: Major > Attachments: image-2020-03-23-21-37-17-663.png > > > # > h3. When I execute bin/spark-sql, an exception occurs > > {code:java} > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientCaused by: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > ... 12 moreCaused by: java.lang.reflect.InvocationTargetException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > ... 18 moreCaused by: MetaException(message:Hive Schema version 1.2.0 does > not match metastore's schema version 2.3.0 Metastore is not upgraded or > corrupt) at > org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679) > at > org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) > at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) > ...
23 more > {code} > h3. 2. Find the reason > Querying the source code: the spark jars directory contains > hive-metastore-1.2.1.spark2.jar, and > the 1.2.1 version matches 1.2.0, so the exception is generated. > > > {code:java} > // code placeholder > private static final Map<String, String> EQUIVALENT_VERSIONS = > ImmutableMap.of("0.13.1", "0.13.0", > "1.0.0", "0.14.0", > "1.0.1", "1.0.0", > "1.1.1", "1.1.0", > "1.2.1", "1.2.0" > ); > {code} > > h3. 3. Is there any solution to this problem > You can edit hive-site.xml and set hive.metastore.schema.verification to true, but > new problems may arise > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29154) Update Spark scheduler for stage level scheduling
[ https://issues.apache.org/jira/browse/SPARK-29154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-29154. --- Fix Version/s: 3.1.0 Assignee: Thomas Graves Resolution: Fixed > Update Spark scheduler for stage level scheduling > - > > Key: SPARK-29154 > URL: https://issues.apache.org/jira/browse/SPARK-29154 > Project: Spark > Issue Type: Story > Components: Scheduler >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.1.0 > > > Make the changes to DAGScheduler, stage, task set manager, task scheduler to > support scheduling based on the resource profiles. Note that the logic to > merge profiles has a separate jira. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
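Usage of the stage-level scheduling enabled by this work looks roughly like the following, spark-shell style; the resource amounts are illustrative:

{code:scala}
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// Describe what the stage needs from executors and per task.
val execReqs = new ExecutorResourceRequests().cores(4).memory("8g")
val taskReqs = new TaskResourceRequests().cpus(2)
val profile = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build()

// Attach the profile to an RDD; the scheduler changes in this ticket make the
// DAGScheduler/TaskSetManager/TaskScheduler place tasks only on executors
// that satisfy the attached profile.
val rdd = sc.parallelize(1 to 100).withResources(profile)
rdd.map(_ * 2).collect()
{code}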
[jira] [Resolved] (SPARK-31263) Enable yarn shuffle service to close the idle connections
[ https://issues.apache.org/jira/browse/SPARK-31263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwang resolved SPARK-31263. - Resolution: Duplicate > Enable yarn shuffle service to close the idle connections > -- > > Key: SPARK-31263 > URL: https://issues.apache.org/jira/browse/SPARK-31263 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: feiwang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31201) add an individual config for skewed partition threshold
[ https://issues.apache.org/jira/browse/SPARK-31201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31201. -- Target Version/s: 3.0.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/27967 > add an individual config for skewed partition threshold > --- > > Key: SPARK-31201 > URL: https://issues.apache.org/jira/browse/SPARK-31201 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
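A usage sketch of the resulting knob; the config key below is the one this change added for 3.0 as far as I can tell, so treat the exact name as an assumption and check the docs for your version:

{code:scala}
// Tune the skew detection threshold independently of the advisory partition size.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256m")
{code}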
[jira] [Created] (SPARK-31271) fix web ui for driver side SQL metrics
Wenchen Fan created SPARK-31271: --- Summary: fix web ui for driver side SQL metrics Key: SPARK-31271 URL: https://issues.apache.org/jira/browse/SPARK-31271 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31268) TaskEnd event with zero Executor Metrics when task duration less than poll interval
[ https://issues.apache.org/jira/browse/SPARK-31268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067661#comment-17067661 ] angerszhu commented on SPARK-31268: --- raise a pr soon > TaskEnd event with zero Executor Metrics when task duration less than poll > interval > --- > > Key: SPARK-31268 > URL: https://issues.apache.org/jira/browse/SPARK-31268 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > Attachments: screenshot-1.png > > > TaskEnd event with zero Executor Metrics when task duration less than poll > interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31270) Expose executor memory metrics at the task detail, in the Stages tab
[ https://issues.apache.org/jira/browse/SPARK-31270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067657#comment-17067657 ] angerszhu commented on SPARK-31270: --- Raise a pr soon > Expose executor memory metrics at the task detail, in the Stages tab > --- > > Key: SPARK-31270 > URL: https://issues.apache.org/jira/browse/SPARK-31270 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31270) Expose executor memory metrics at the task detail, in the Stages tab
angerszhu created SPARK-31270: - Summary: Expose executor memory metrics at the task detail, in the Stages tab Key: SPARK-31270 URL: https://issues.apache.org/jira/browse/SPARK-31270 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31269) Supplement version for configuration only appear in configuration doc
jiaan.geng created SPARK-31269: -- Summary: Supplement version for configuration only appear in configuration doc Key: SPARK-31269 URL: https://issues.apache.org/jira/browse/SPARK-31269 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.1.0 Reporter: jiaan.geng The configuration doc contains some configs that are not organized by ConfigEntry. We need to supplement the version for configurations that only appear in the configuration doc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
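For configs that are organized by ConfigEntry, the version is declared in Spark's internal builder DSL; a hedged sketch of what that looks like (private[spark] API, illustrative key):

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

object ExampleConfigs {
  // A ConfigEntry carries its version as metadata; doc-only configs lack this
  // and need their versions supplemented by hand in the documentation.
  val MY_FLAG = ConfigBuilder("spark.example.myFlag")
    .doc("Illustrative flag showing where version metadata is declared.")
    .version("3.1.0")
    .booleanConf
    .createWithDefault(false)
}
{code}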
[jira] [Updated] (SPARK-31268) TaskEnd event with zero Executor Metrics when task duration less than poll interval
[ https://issues.apache.org/jira/browse/SPARK-31268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-31268: -- Attachment: screenshot-1.png > TaskEnd event with zero Executor Metrics when task duration less than poll > interval > --- > > Key: SPARK-31268 > URL: https://issues.apache.org/jira/browse/SPARK-31268 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > Attachments: screenshot-1.png > > > TaskEnd event with zero Executor Metrics when task duration less than poll > interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31268) TaskEnd event with zero Executor Metrics when task duration less than poll interval
angerszhu created SPARK-31268: - Summary: TaskEnd event with zero Executor Metrics when task duration less than poll interval Key: SPARK-31268 URL: https://issues.apache.org/jira/browse/SPARK-31268 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31268) TaskEnd event with zero Executor Metrics when task duration less than poll interval
[ https://issues.apache.org/jira/browse/SPARK-31268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-31268: -- Description: TaskEnd event with zero Executor Metrics when task duration less than poll interval > TaskEnd event with zero Executor Metrics when task duration less than poll > interval > --- > > Key: SPARK-31268 > URL: https://issues.apache.org/jira/browse/SPARK-31268 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > > TaskEnd event with zero Executor Metrics when task duration less than poll > interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31228) Add version information to the configuration of Kafka
[ https://issues.apache.org/jira/browse/SPARK-31228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31228. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 27989 [https://github.com/apache/spark/pull/27989] > Add version information to the configuration of Kafka > - > > Key: SPARK-31228 > URL: https://issues.apache.org/jira/browse/SPARK-31228 > Project: Spark > Issue Type: Sub-task > Components: DStreams >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.1.0 > > > external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/package.scala > external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31228) Add version information to the configuration of Kafka
[ https://issues.apache.org/jira/browse/SPARK-31228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-31228: Assignee: jiaan.geng > Add version information to the configuration of Kafka > - > > Key: SPARK-31228 > URL: https://issues.apache.org/jira/browse/SPARK-31228 > Project: Spark > Issue Type: Sub-task > Components: DStreams >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/package.scala > external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31242) Clone SparkSession should respect spark.sql.legacy.sessionInitWithConfigDefaults
[ https://issues.apache.org/jira/browse/SPARK-31242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31242. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28014 [https://github.com/apache/spark/pull/28014] > Clone SparkSession should respect > spark.sql.legacy.sessionInitWithConfigDefaults > > > Key: SPARK-31242 > URL: https://issues.apache.org/jira/browse/SPARK-31242 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.0.0 > > > In SQL tests, a conf specified by `withSQLConf` can be reverted to the "original > value" after cloning a SparkSession if the "original value" is already set at the > SparkConf level, because `WithTestConf` doesn't respect > spark.sql.legacy.sessionInitWithConfigDefaults and always merges SQLConf with > SparkConf. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31242) Clone SparkSession should respect spark.sql.legacy.sessionInitWithConfigDefaults
[ https://issues.apache.org/jira/browse/SPARK-31242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31242: --- Assignee: wuyi > Clone SparkSession should respect > spark.sql.legacy.sessionInitWithConfigDefaults > > > Key: SPARK-31242 > URL: https://issues.apache.org/jira/browse/SPARK-31242 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > In SQL tests, a conf specified by `withSQLConf` can be reverted to the "original > value" after cloning a SparkSession if the "original value" is already set at the > SparkConf level, because `WithTestConf` doesn't respect > spark.sql.legacy.sessionInitWithConfigDefaults and always merges SQLConf with > SparkConf. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31267) Flaky test: WholeStageCodegenSparkSubmitSuite.Generated code on driver should not embed platform-specific constant
Gabor Somogyi created SPARK-31267: - Summary: Flaky test: WholeStageCodegenSparkSubmitSuite.Generated code on driver should not embed platform-specific constant Key: SPARK-31267 URL: https://issues.apache.org/jira/browse/SPARK-31267 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.0.0, 3.1.0 Reporter: Gabor Somogyi https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120363/testReport/ {code} Error Message org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to failAfter did not complete within 1 minute. Stacktrace sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to failAfter did not complete within 1 minute. at java.lang.Thread.getStackTrace(Thread.java:1559) at org.scalatest.concurrent.TimeLimits.failAfterImpl(TimeLimits.scala:234) at org.scalatest.concurrent.TimeLimits.failAfterImpl$(TimeLimits.scala:233) at org.apache.spark.deploy.SparkSubmitSuite$.failAfterImpl(SparkSubmitSuite.scala:1416) at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:230) at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:229) at org.apache.spark.deploy.SparkSubmitSuite$.failAfter(SparkSubmitSuite.scala:1416) at org.apache.spark.deploy.SparkSubmitSuite$.runSparkSubmit(SparkSubmitSuite.scala:1435) at org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.$anonfun$new$1(WholeStageCodegenSparkSubmitSuite.scala:53) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at 
org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.c
[jira] [Created] (SPARK-31266) Flaky test: KafkaDataConsumerSuite.SPARK-25151 Handles multiple tasks in executor fetching same (topic, partition) pair and same offset (edge-case) - data not in use
Gabor Somogyi created SPARK-31266: - Summary: Flaky test: KafkaDataConsumerSuite.SPARK-25151 Handles multiple tasks in executor fetching same (topic, partition) pair and same offset (edge-case) - data not in use Key: SPARK-31266 URL: https://issues.apache.org/jira/browse/SPARK-31266 Project: Spark Issue Type: Bug Components: Structured Streaming, Tests Affects Versions: 3.0.0, 3.1.0 Reporter: Gabor Somogyi https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120363/testReport/ {code} Error Message java.util.concurrent.TimeoutException: Timeout after waiting for 1 ms. Stacktrace sbt.ForkMain$ForkError: java.util.concurrent.TimeoutException: Timeout after waiting for 1 ms. at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:78) at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30) at org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$sendMessages$3(KafkaTestUtils.scala:425) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.kafka010.KafkaTestUtils.sendMessages(KafkaTestUtils.scala:424) at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumerSuite.prepareTestTopicHavingTestMessages(KafkaDataConsumerSuite.scala:377) at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumerSuite.$anonfun$new$17(KafkaDataConsumerSuite.scala:320) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at 
org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAf
[jira] [Updated] (SPARK-31234) ResetCommand should not wipe out all configs
[ https://issues.apache.org/jira/browse/SPARK-31234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-31234: - Issue Type: Bug (was: Improvement) > ResetCommand should not wipe out all configs > > > Key: SPARK-31234 > URL: https://issues.apache.org/jira/browse/SPARK-31234 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > Currently, ResetCommand clears all configurations, including sql configs, > static sql configs and spark context level configs. > for example: > ``` > spark-sql> set xyz=abc; > xyz abc > spark-sql> set; > spark.app.id local-1585055396930 > spark.app.name SparkSQL::10.242.189.214 > spark.driver.host 10.242.189.214 > spark.driver.port 65094 > spark.executor.id driver > spark.jars > spark.master local[*] > spark.sql.catalogImplementation hive > spark.sql.hive.version 1.2.1 > spark.submit.deployMode client > xyz abc > spark-sql> reset; > spark-sql> set; > spark-sql> set spark.sql.hive.version; > spark.sql.hive.version 1.2.1 > spark-sql> set spark.app.id; > spark.app.id > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31254) `HiveResult.toHiveString` does not use the current session time zone
[ https://issues.apache.org/jira/browse/SPARK-31254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31254. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28024 [https://github.com/apache/spark/pull/28024] > `HiveResult.toHiveString` does not use the current session time zone > > > Key: SPARK-31254 > URL: https://issues.apache.org/jira/browse/SPARK-31254 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, date/timestamp formatters in `HiveResult.toHiveString` are > initialized once on instantiation of the `HiveResult` object, and pick up the > session time zone. If the session's time zone is changed, the formatters still > use the previous one. > See the discussion at > https://github.com/apache/spark/pull/23391#discussion_r397347820 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
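The stale-formatter behavior can be shown with plain java.time; this is a simplified sketch, not the actual HiveResult code:

{code:scala}
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

var sessionTimeZone = "UTC" // stand-in for spark.sql.session.timeZone

// Captured once at initialization: keeps UTC even after the session changes.
val staleFormatter =
  DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of(sessionTimeZone))

sessionTimeZone = "America/Los_Angeles"

// Fix: resolve the formatter per call so it tracks the current session zone.
def currentFormatter =
  DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of(sessionTimeZone))

val ts = Instant.parse("2020-03-26T12:00:00Z")
println(staleFormatter.format(ts))   // still rendered in UTC
println(currentFormatter.format(ts)) // rendered in America/Los_Angeles
{code}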
[jira] [Assigned] (SPARK-31254) `HiveResult.toHiveString` does not use the current session time zone
[ https://issues.apache.org/jira/browse/SPARK-31254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31254: --- Assignee: Maxim Gekk > `HiveResult.toHiveString` does not use the current session time zone > > > Key: SPARK-31254 > URL: https://issues.apache.org/jira/browse/SPARK-31254 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, date/timestamp formatters in `HiveResult.toHiveString` are > initialized once on instantiation of the `HiveResult` object, and pick up the > session time zone. If the session's time zone is changed, the formatters still > use the previous one. > See the discussion at > https://github.com/apache/spark/pull/23391#discussion_r397347820 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31147) forbid CHAR type in non-Hive-Serde tables
[ https://issues.apache.org/jira/browse/SPARK-31147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-31147: Fix Version/s: (was: 3.1.0) 3.0.0 > forbid CHAR type in non-Hive-Serde tables > - > > Key: SPARK-31147 > URL: https://issues.apache.org/jira/browse/SPARK-31147 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31227) Non-nullable null type should not coerce to nullable type
[ https://issues.apache.org/jira/browse/SPARK-31227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31227. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27991 [https://github.com/apache/spark/pull/27991] > Non-nullable null type should not coerce to nullable type > - > > Key: SPARK-31227 > URL: https://issues.apache.org/jira/browse/SPARK-31227 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 3.0.0 > > > {code} > scala> spark.range(10).selectExpr("array()").printSchema() > root > |-- array(): array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array()) as arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array(), array(1)) as > arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: integer (containsNull = true) > {code} > The last case should not coerce to nullable type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31227) Non-nullable null type should not coerce to nullable type
[ https://issues.apache.org/jira/browse/SPARK-31227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31227: --- Assignee: Hyukjin Kwon > Non-nullable null type should not coerce to nullable type > - > > Key: SPARK-31227 > URL: https://issues.apache.org/jira/browse/SPARK-31227 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > > {code} > scala> spark.range(10).selectExpr("array()").printSchema() > root > |-- array(): array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array()) as arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array(), array(1)) as > arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: integer (containsNull = true) > {code} > The last case should not coerce to nullable type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31234) ResetCommand should not wipe out all configs
[ https://issues.apache.org/jira/browse/SPARK-31234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31234. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28003 [https://github.com/apache/spark/pull/28003] > ResetCommand should not wipe out all configs > > > Key: SPARK-31234 > URL: https://issues.apache.org/jira/browse/SPARK-31234 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > Currently, ResetCommand clears all configurations, including SQL configs, > static SQL configs, and Spark context-level configs. > For example: > ``` > spark-sql> set xyz=abc; > xyz abc > spark-sql> set; > spark.app.id local-1585055396930 > spark.app.name SparkSQL::10.242.189.214 > spark.driver.host 10.242.189.214 > spark.driver.port 65094 > spark.executor.id driver > spark.jars > spark.master local[*] > spark.sql.catalogImplementation hive > spark.sql.hive.version 1.2.1 > spark.submit.deployMode client > xyz abc > spark-sql> reset; > spark-sql> set; > spark-sql> set spark.sql.hive.version; > spark.sql.hive.version 1.2.1 > spark-sql> set spark.app.id; > spark.app.id > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
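For illustration, a hedged sketch of the intended behavior in spark-shell (which configs RESET should preserve is summarized from the description above, not quoted from the patch): runtime SQL configs go back to their defaults, while static SQL configs and SparkContext-level configs such as spark.app.id survive.
{code}
// A runtime SQL config is set and then reset back to its default...
spark.sql("SET spark.sql.shuffle.partitions=10")
spark.sql("RESET")
spark.sql("SET spark.sql.shuffle.partitions").show(truncate = false)

// ...but context-level configs should not be wiped out by RESET:
spark.sql("SET spark.app.id").show(truncate = false)
{code}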
[jira] [Assigned] (SPARK-31234) ResetCommand should not wipe out all configs
[ https://issues.apache.org/jira/browse/SPARK-31234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31234: --- Assignee: Kent Yao > ResetCommand should not wipe out all configs > > > Key: SPARK-31234 > URL: https://issues.apache.org/jira/browse/SPARK-31234 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > Currently, ResetCommand clears all configurations, including SQL configs, > static SQL configs, and Spark context-level configs. > For example: > ``` > spark-sql> set xyz=abc; > xyz abc > spark-sql> set; > spark.app.id local-1585055396930 > spark.app.name SparkSQL::10.242.189.214 > spark.driver.host 10.242.189.214 > spark.driver.port 65094 > spark.executor.id driver > spark.jars > spark.master local[*] > spark.sql.catalogImplementation hive > spark.sql.hive.version 1.2.1 > spark.submit.deployMode client > xyz abc > spark-sql> reset; > spark-sql> set; > spark-sql> set spark.sql.hive.version; > spark.sql.hive.version 1.2.1 > spark-sql> set spark.app.id; > spark.app.id > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31247) Flaky test: KafkaContinuousSourceSuite.assign from latest offsets (failOnDataLoss: false)
[ https://issues.apache.org/jira/browse/SPARK-31247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31247: - Description: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120336/testReport/ {code} Error Message org.scalatest.exceptions.TestFailedException: Error adding data: Timeout after waiting for 1 ms. org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:78) org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30) org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$sendMessages$3(KafkaTestUtils.scala:425) scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) scala.collection.TraversableLike.map(TraversableLike.scala:238) scala.collection.TraversableLike.map$(TraversableLike.scala:231) scala.collection.AbstractTraversable.map(Traversable.scala:108) == Progress ==AssertOnQuery(, )AddKafkaData(topics = Set(topic-13), data = WrappedArray(1, 2, 3), message = )CheckAnswer: [2],[3],[4]StopStream StartStream(ContinuousTrigger(1000),org.apache.spark.util.SystemClock@1f1a9495,Map(),null) CheckAnswer: [2],[3],[4]StopStreamAddKafkaData(topics = Set(topic-13), data = WrappedArray(4, 5, 6), message = ) StartStream(ContinuousTrigger(1000),org.apache.spark.util.SystemClock@2b3bec2c,Map(),null) CheckAnswer: [2],[3],[4],[5],[6],[7] => AddKafkaData(topics = Set(topic-13), data = WrappedArray(7, 8), message = )CheckAnswer: [2],[3],[4],[5],[6],[7],[8],[9]AssertOnQuery(, Add partitions) AddKafkaData(topics = Set(topic-13), data = WrappedArray(9, 10, 11, 12, 13, 14, 15, 16), message = )CheckAnswer: [2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17] == Stream == Output Mode: Append Stream state: {KafkaSource[Assign[topic-13-4, topic-13-3, topic-13-2, topic-13-1, topic-13-0]]: {"topic-13":{"2":2,"4":2,"1":1,"3":1,"0":1}}} Thread state: alive Thread stack trace: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242) scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258) scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:187) org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:336) org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:746) org.apache.spark.SparkContext.runJob(SparkContext.scala:2104) org.apache.spark.SparkContext.runJob(SparkContext.scala:2125) org.apache.spark.SparkContext.runJob(SparkContext.scala:2144) org.apache.spark.SparkContext.runJob(SparkContext.scala:2169) org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1006) org.apache.spark.rdd.RDD$$Lambda$2999/724038556.apply(Unknown Source) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) 
org.apache.spark.rdd.RDD.withScope(RDD.scala:390) org.apache.spark.rdd.RDD.collect(RDD.scala:1005) org.apache.spark.sql.execution.streaming.continuous.WriteToContinuousDataSourceExec.doExecute(WriteToContinuousDataSourceExec.scala:57) org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) org.apache.spark.sql.execution.SparkPlan$$Lambda$2791/4135277.apply(Unknown Source) org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) org.apache.spark.sql.execution.SparkPlan$$Lambda$2823/504830038.apply(Unknown Source) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.$anonfun$runContinuous$4(ContinuousExecution.scala:256) org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$Lambda$2765/297007729.apply(Unknown Source) org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100) org.apache.spark.sql.execution.SQLExecution$$$Lambda$2773/697863343.apply(Unknown Source) org.apache.spark.sql.execution.SQLExecution$.withSQLCon
[jira] [Updated] (SPARK-31252) Flaky test: ElementTrackingStoreSuite.asynchronous tracking single-fire
[ https://issues.apache.org/jira/browse/SPARK-31252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31252: - Description: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120353/testReport {code} Error Message org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1 times over 230.305107 milliseconds. Last failure message: false did not equal true. Stacktrace sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1 times over 230.305107 milliseconds. Last failure message: false did not equal true. at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) at org.apache.spark.status.ElementTrackingStoreSuite.eventually(ElementTrackingStoreSuite.scala:31) at org.apache.spark.status.ElementTrackingStoreSuite.$anonfun$new$1(ElementTrackingStoreSuite.scala:64) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at 
org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: false did not equal true at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:343) at org.scalatest.Matchers$AnyShouldWrapper.shouldEqual(Matchers.scala:6797) at org.apache.spark.status.ElementTrackingStoreSuite.$anonfun$new$3(ElementTrackingStoreSuite.scala:65)
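The failure above is `eventually` giving up after a single attempt in roughly 230 ms. A hedged sketch of the usual remedy for this kind of flakiness (illustrative; whether the actual fix adjusted the patience or the test logic is not stated here): give `eventually` an explicit, more generous patience so slow CI machines get more than one attempt.
{code}
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.{Millis, Seconds, Span}

// Explicit patience: retry for up to 10 seconds, checking every 100 ms,
// instead of relying on a short default patience.
eventually(timeout(Span(10, Seconds)), interval(Span(100, Millis))) {
  // the asynchronous condition under test, e.g.:
  // flushed shouldEqual true
}
{code}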